A Proposal for Searching Desktop Data

Kayest, Mamta; Jain, S. K.

doi:10.1007/978-981-10-0419-3_14


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 413)

Abstract

Managing personal desktop data has become a necessity, as the data stored on one’s PC grows day by day. This data is both large and heterogeneous. Users often need to locate specific data on their desktop system, so finding the required data items efficiently has become an emerging research issue. Various desktop search engines and tools have been developed to provide search over desktop data. In this paper, we propose a solution for managing heterogeneous desktop data.

Keywords

1 Introduction

The capacity of hard disk drives has increased tremendously; as a result, users store a large number of files on their personal computers. Users therefore often have difficulty finding a desired document even though they know it is saved somewhere on the disk. Nowadays, searching for documents can be faster on the World Wide Web than on a personal computer. Owing to the availability of a variety of web search engines and ranking algorithms such as the PageRank algorithm introduced by Google [1], web search has become more efficient than PC search. There is therefore a need for efficient search over desktop data. The main motivation of this work is to search files on a desktop system efficiently so that required data can be retrieved easily. Users also need to retrieve partial information from files. In this paper, we propose several ways of searching heterogeneous desktop data. The proposed system also retrieves partial contents from semi-structured data files, e.g., XML files. The rest of the paper is organized as follows: Sect. 2 describes the related work; Sect. 3 discusses the proposed system for desktop data search; and Sect. 4 concludes the paper.

2 Related Work

Personal data refers to digital data that a person owns and accesses during his/her lifetime. It is a heterogeneous mix of Word documents, pictures, XML data, audio files, video files, emails, and so on. This large amount of personal data may be spread over various devices and services such as desktop systems, laptops, homepage servers, e-mail servers, official websites, digital cameras, and mobile phones. Retrieving relevant information requires effective management of personal data [2]. Desktop data is the personal data on one’s desktop; unlike personal data in general, it is centralized. Various desktop search engines (DSEs) have been developed for managing desktop data, including Windows Search [3], Google Desktop Search [1], Yahoo! Desktop Search [4], Copernic Desktop Search [5], and many more; some of them are compared on various parameters in [6]. DSEs are based on the file system of the underlying operating system and cannot retrieve partial contents from files [7]. A DSE employs one or more crawler programs that crawl desktop files and extract information, which an indexer uses to create an indexed database. To search with a DSE, the user submits a query to the search engine, which passes it to the indexed database to obtain the result [8]. The problems with DSEs are that they do not provide partial retrieval of information [7], do not support complex queries or semantic integration, and take significant initial indexing time. Modeling and querying heterogeneous desktop data is another important research issue. In iMeMex [9, 10], a graph data model has been proposed for modeling personal data. A new XPath-like query language named iQL is proposed to query over the uniform view; it is difficult for a novice to use, as users are expected to know the underlying structure of the personal data. Similarly, various methods have been proposed to query XML data [11–13].
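The crawl, index, and query pipeline of a DSE described above can be illustrated with a small Python sketch. It is a generic illustration under stated assumptions, not the design of any product named in this section: the function names `build_index` and `lookup`, the word-level tokenization, and the treatment of every file as plain text are all hypothetical simplifications.

```python
import os
import re
from collections import defaultdict


def build_index(root):
    """Crawl `root` and map each lowercase word to the set of files containing it.

    Every file under `root` is read once, which is why the initial
    indexing pass of a DSE takes time proportional to the data size.
    """
    index = defaultdict(set)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(path)
    return index


def lookup(index, word):
    """Answer a one-word query from the index without touching the disk."""
    return sorted(index.get(word.lower(), ()))
```

A call such as `lookup(build_index(some_folder), "report")` returns whole file paths only. This mirrors the two limitations noted above: the index must be built before the first query, and results are complete files, not parts of their contents.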

3 A Proposal for Searching Desktop Data

This section discusses a solution for managing desktop data that covers several aspects of searching: metadata, relationships, and the contents of XML files. The proposed system searches the file system based on metadata and on the contents of semi-structured files. Figure 1 depicts a context diagram of the proposed system. Users submit queries to the desktop search system, which in turn interacts with the file system to retrieve the necessary information. The system returns results to the user after processing the queries. Figure 2 depicts a detailed data flow diagram (DFD) of the proposed desktop search system. The system is divided into two main modules: the first module searches files/folders based on their metadata, and the second module processes queries on XML documents. The proposed system offers options for searching based on the metadata of files and folders, relationships, and the contents of XML files. These options are summarized as follows:

A file is searched based on its metadata: name, size, extension, and last modified date.
A folder is searched based on its metadata: name, size, and last modified date.
The relationship hasfile restricts the search to files.
The relationship hasfolder restricts the search to folders.
The full contents of an XML file are retrieved.
Partial contents of XML files are retrieved based on tags and field names.

To query the metadata of files/folders, the user first enters the path of the file/folder and a relationship, either hasfile or hasfolder, to search files or folders, respectively. For example, a user may search drive “d” for all files that were last modified on January 10, 2015. Once the path and relationship are given, a hash table is created in memory containing the metadata entries of the files/folders, and the user gets the files/folders that match the metadata given in the query. Because the hash table is created in memory after the query is entered, results reflect the current state of the file system (an update guarantee). This approach also reduces the initial indexing time incurred by desktop search engines. Algorithm 1 searches files/folders based on their metadata, and Algorithm 2 retrieves contents from XML files.
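The in-memory hash table described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `build_metadata_table`, the dictionary layout, and the decision to scan only the direct children of the path are assumptions made for illustration.

```python
import os
from datetime import date


def build_metadata_table(path, relationship):
    """Scan `path` once and map each entry name to its metadata.

    `relationship` is "hasfile" (collect files) or "hasfolder" (collect
    folders). The table is rebuilt on every call, so it reflects the
    file system at query time. Only direct children are scanned; the
    paper does not say whether the scan recurses (assumption).
    """
    if relationship not in ("hasfile", "hasfolder"):
        raise ValueError("relationship must be 'hasfile' or 'hasfolder'")
    want_files = relationship == "hasfile"
    table = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if want_files and not entry.is_file():
                continue
            if not want_files and not entry.is_dir():
                continue
            info = entry.stat()
            meta = {
                "name": entry.name,
                # For folders this is the size of the directory entry itself,
                # not the total size of its contents (simplification).
                "size": info.st_size,
                "modified": date.fromtimestamp(info.st_mtime),
            }
            if want_files:
                meta["extension"] = os.path.splitext(entry.name)[1].lower()
            table[entry.name] = meta
    return table
```

Since nothing is persisted between queries, there is no index to keep in sync with the disk; the cost is one directory scan per query.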

Algorithm 1 (Metadata-based search)

Step 1:
enter the path and relationship
Step 2:
map the corresponding metadata entries into a hash table: name, extension, size, and last modified date for files; name, size, and last modified date for folders
Step 3:
if relationship is “hasfile” then
    read choice ch for metadata from 1 to 4
    (1. name 2. size 3. extension 4. last modified date)
else if relationship is “hasfolder” then
    read choice ch for metadata from 1 to 3
    (1. name 2. size 3. last modified date)
end if
Step 4:
if (ch == 1) then
    search hash map entries for file/folder name and print result
else if (ch == 2) then
    search hash map entries for file/folder size and print result
else if (ch == 3) then
    search hash map entries for file’s extension/folder’s last modified date and print result
else if (ch == 4) then
    search hash map entries for file’s last modified date and print result
end if
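Steps 3 and 4 of Algorithm 1 can be sketched in Python as shown below. This is an illustrative sketch under assumptions, not the authors' code: the function name `search_metadata`, the metadata key names, exact-match comparison, and the hand-built sample table (standing in for the hash table of Step 2) are all hypothetical.

```python
from datetime import date

# Menu choices from Algorithm 1 mapped to metadata keys (key names are assumptions).
FILE_FIELDS = {1: "name", 2: "size", 3: "extension", 4: "modified"}
FOLDER_FIELDS = {1: "name", 2: "size", 3: "modified"}


def search_metadata(table, relationship, ch, value):
    """Return the names of entries whose chosen metadata field equals `value`."""
    fields = FILE_FIELDS if relationship == "hasfile" else FOLDER_FIELDS
    if ch not in fields:
        raise ValueError(f"invalid choice {ch} for {relationship}")
    field = fields[ch]
    return sorted(name for name, meta in table.items() if meta.get(field) == value)


# Hand-built table with made-up entries, standing in for the output of Step 2.
files = {
    "report.xls": {"name": "report.xls", "size": 2048, "extension": ".xls",
                   "modified": date(2015, 1, 11)},
    "notes.txt": {"name": "notes.txt", "size": 512, "extension": ".txt",
                  "modified": date(2015, 1, 10)},
}

print(search_metadata(files, "hasfile", 3, ".xls"))             # ['report.xls']
print(search_metadata(files, "hasfile", 4, date(2015, 1, 10)))  # ['notes.txt']
```

The two lookup dictionaries replace the if/else chain of Steps 3 and 4, so choice 3 resolves to the extension for files and to the last modified date for folders without a separate branch. In this sketch every choice scans all entries, because the table is keyed by name only.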

Algorithm 2 (Content-based search on XML files)

Step 1:
enter path of file
Step 2:
read choice ch for file’s contents
    (1) full contents (2) tag’s data (3) subtag/subfield’s data
Step 3:
parse the query
Step 4:
if (ch == 1) then
    get and print full contents of XML file
else if (ch == 2) then
    get and print all data of tag name
else if (ch == 3) then
    get and print data of subtag
end if
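The three retrieval options of Algorithm 2 can be sketched with Python's standard `xml.etree.ElementTree` module. This is a sketch under assumptions and not the authors' implementation: the function name `query_xml` is hypothetical, and the sample document, its element names, and its values are invented, since the paper does not give the structure of its XML files.

```python
import io
import xml.etree.ElementTree as ET


def query_xml(source, ch, tag=None):
    """Content retrieval in the spirit of Algorithm 2.

    ch == 1: full contents of the document as a string
    ch == 2: every element named `tag`, serialized with all of its data
    ch == 3: only the text of every element named `tag` (a subtag/subfield)
    """
    root = ET.parse(source).getroot()
    if ch == 1:
        return ET.tostring(root, encoding="unicode")
    if tag is None:
        raise ValueError("a tag name is required for choices 2 and 3")
    matches = root.iter(tag)  # all matching elements, in document order
    if ch == 2:
        return [ET.tostring(el, encoding="unicode").strip() for el in matches]
    if ch == 3:
        return [(el.text or "").strip() for el in matches]
    raise ValueError(f"invalid choice {ch}")


# Invented sample data; a real call would pass a file path instead.
sample = """<employees>
  <employee><name><first>Asha</first><last>Rao</last></name><address>Delhi</address></employee>
  <employee><name><first>Ravi</first><last>Gupta</last></name><address>Pune</address></employee>
</employees>"""

print(query_xml(io.StringIO(sample), 3, "last"))  # ['Rao', 'Gupta']
```

Choices 2 and 3 return only the matching fragments, which is the partial retrieval that whole-file search results cannot provide.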

Some sample queries that the proposed system processes are:

1. Search files from drive d where the file size is 500 MB.
2. Search file named nisha from e drive.
3. Search all folders from drive g which are modified on January 10, 2015.
4. Search for folder named nishafol from f drive.
5. Search files from drive d which are modified on January 11, 2015.
6. Search all .xls files from d drive.
7. Display employee names from file EmpData.xml located in drive d.
8. Display employee’s postal addresses from file EmpData.xml located in drive d.
9. Display all information related to employees from Empdata.xml, which is located in f drive.
10. Display contents of file nisha.xml from g drive.
11. Display employee’s last names from file Empdata.xml from d drive.

4 Conclusion

Managing a user’s desktop data is a present-day need, as desktop data is large in volume and changes frequently. Various desktop search systems, such as Google Desktop Search and Copernic Desktop Search, have been developed for managing personal desktop data. However, these search engines require extra indexing time before they can start work and do not support partial retrieval of contents from files. In this paper, we propose the design of a desktop data search system that allows search over desktop data using metadata as well as partial and full retrieval of contents from XML files. The implementation of the proposed system is at an advanced stage; extending its functionality is part of our future plan.

References

1. “Google Desktop”, a desktop search engine from Google, available at http://desktop.google.com, http://googledesktop.blogspot.in/, last visited on August 25, 2014.
2. Dittrich J. P., Blunschi L., Färber M., Girard O. R., Karakashian S. K., Salles M. A. V., “From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System”, Proceedings of BTW 2007, 2007, pp. 292–308.
3. “Windows Desktop Search”, a desktop search engine from Microsoft, available at http://www.microsoft.com/windows/products/winfamily/desktopsearch/default.mspx, last visited on December 29, 2014.
4. “Yahoo! Desktop Search”, a desktop search engine from Yahoo, available at http://info.yahoo.com/privacy/in/yahoo/desktopsearch/, last visited on December 1, 2014.
5. “Copernic Desktop Search”, a desktop search engine from Copernic, available at http://www.copernic.com/en/products/desktop-search/home/download.html, last visited on January 9, 2015.
6. Markscheffel B., Buttner D., Fischer D., “Desktop Search Engines - A State of the Art Comparison”, Proceedings of the 6th International Conference on Internet Technology and Secured Transactions, 11–14 December 2011, pp. 707–711.
7. Pradhan S., “Towards a Novel Desktop Search Technique”, Proceedings of the 18th International Conference on Database and Expert Systems Applications, DEXA 2007, held on September 3–7, 2007 at Regensburg, Germany, LNCS 4653, pp. 192–201.
8. Cole B., “Search engines tackle the desktop”, IEEE, 2005.
9. Dittrich J. P., “iMeMex: A Platform for Personal Dataspace Management”, Proceedings of the 2nd NSF-sponsored Workshop on Personal Information Management, ACM SIGIR 2006.
10. Dittrich J. P., Salles M. A. V., “iDM: A unified and versatile data model for personal dataspace management”, Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB 2006, held on September 12–15, 2006 at Seoul, Korea, pp. 367–378.
11. Florescu D., Kossmann D., Manolescu I., “Integrating keyword search into XML query processing”, Proceedings of the International World Wide Web Conference, pp. 119–135 (2000).
12. Pradhan S., “An algebraic query model for effective and efficient retrieval of XML fragments”, VLDB, pp. 295–306 (2006).
13. Li Y., Yu C., Jagadish H. V., “Schema-free XQuery”, Proceedings of the 30th VLDB, pp. 72–83 (2004).

Author information

Authors and Affiliations

Computer Engineering Department, National Institute of Technology, Kurukshetra, India
Mamta Kayest & S. K. Jain

Corresponding author

Correspondence to Mamta Kayest.

Editor information

Editors and Affiliations

Guru Nanak Institutions, Professor & Managing Director, Ibrahimpatnam, Andhra Pradesh, India
H. S. Saini
Guru Nanak Institutions, Professor & Associate Director, Ibrahimpatnam, Andhra Pradesh, India
Rishi Sayal
Guru Nanak Institutions, Professor and Head – CSE and IT, Ibrahimpatnam, India
Sandeep Singh Rawat

About this paper

Cite this paper

Kayest, M., Jain, S.K. (2016). A Proposal for Searching Desktop Data. In: Saini, H., Sayal, R., Rawat, S. (eds) Innovations in Computer Science and Engineering. Advances in Intelligent Systems and Computing, vol 413. Springer, Singapore. https://doi.org/10.1007/978-981-10-0419-3_14

DOI: https://doi.org/10.1007/978-981-10-0419-3_14
Published: 20 February 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0417-9
Online ISBN: 978-981-10-0419-3
eBook Packages: Engineering, Engineering (R0)
