1 Introduction

A study with knowledge workers based in the UK and the US found that 83 % of them felt that they wasted time each day on issues of document collaboration [16]. 73 % of knowledge workers reported wasting work time looking for files. Another study observed that knowledge workers spent 20 % of their time searching for hard copies of documents, and that 50 % of the time they did not find what they wanted [3]. It is estimated that the average organization makes 19 copies of each document (37 % being unnecessary, 45 % being duplicates) and loses one out of every 20 documents [15]. Our work addresses the superfluous printing and copying of duplicate documents, as well as the problem of re-finding previously printed copies.

Digital documents are typically managed electronically, while paper documents are mostly organised and managed manually. This leaves users to develop their own strategies for storage and retrieval of physical documents. Ironically, the use of computers often compounds this problem by making it easier to print a new copy of a document that is not found immediately. Additionally, reading paper-based documents is preferred by many, as paper offers the flexibility to read anywhere and is also easier to mark up [9]. So even though the majority of documents may now be digital, people still maintain physical copies, which then have to be kept track of and located.

This paper describes the design, implementation and evaluation of our Human-centred workplace (HCW) – a system that enables the tracking of physical printouts of documents using a personal digital library. The concept of this system had been briefly introduced previously [8]. This paper contributes a description of the actual implementation, deployment and evaluation.

The remainder of this paper is organized as follows: Sect. 2 describes the design and implementation of the HCW, while Sect. 3 illustrates the interface and interaction design. Section 4 describes the system evaluation. Section 5 discusses related approaches, while Sect. 6 addresses differences from related approaches, insights from the evaluation for further research, and the planned extensions and further steps in our research. The paper concludes with a brief summary.

2 Design and Implementation

The design concept of the HCW was briefly introduced in [8]; here we provide more details and implementation information. We identified five functional requirements, based on our discussion in the introduction (an extensive discussion of requirements and implementation can be found in [4]). These form the basis for our implementation as well as the exploration of related approaches (see Sect. 5). The first three requirements refer to the system's core functionality of tracking, searching and recording print-outs:

(R1) Tracking Document Location: Tracking the location of physical documents is the core functionality we aim for; it supports the task of re-finding documents and avoids having to re-print them.

(R2) Digital Search: There needs to be a search interface to support re-searching and re-finding of physical documents.

(R3) Keeping Record of Printed Documents: We wish to track mostly printed documents, but also other physical documents. Keeping track of print-outs would avoid the need to reprint a document and thus avoid duplication of the document.

The remaining two requirements refer to the manner in which R1–R3 are to be achieved:

(R4) No Order to Follow: Approaches that require users to follow a pre-defined archival methodology or to be generally orderly have been shown to fail; many people will not follow procedures, however sensible these may be.

(R5) No Special Hardware: The system should not require any special hardware so that it can be installed in ordinary offices of knowledge workers.

Fig. 1. HCW architecture and data flow

Figure 1 shows the architecture of the HCW system, designed to fulfil these five requirements. It consists of three elements: Document Manager, Document Tracker, and Document Search. Not shown are the pre-existing elements, namely the office document printer and the web cams that are used for monitoring documents. The dataflow sequence of HCW is as follows: when the user signals the intention to print a digital document (step 1), the document’s metadata are obtained (step 2), encoded into a unique QR code (step 3), and added automatically to the document’s front page (step 4). The document metadata are added to the HCW database (step 5), and the user may then read or move the document within the workplace. The cameras monitor the workplace (step 6) and continuously record images (step 7); the images are analysed for QR codes (step 8), and after error correction (step 9), the codes are decoded and the document’s location is recorded in the digital library based on the areas covered by the cameras (step 10). The user searches for and re-finds the document via the HCW search interface (step 11).
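The print side and the tracking side of this data flow can be illustrated with a small sketch. It is not the HCW code (which is written in C# and backed by a digital library database): the names `PrintRecord`, `Library` and `doc_id`, the JSON payload and the in-memory dictionary are all assumptions made for illustration, and the QR rendering and camera decoding are replaced by passing the payload string directly.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PrintRecord:
    """One printed copy known to the library (hypothetical structure)."""
    doc_id: str
    title: str
    printed_at: str
    locations: list = field(default_factory=list)  # (timestamp, area) pairs


class Library:
    """In-memory stand-in for the HCW database (assumption)."""

    def __init__(self):
        self.records = {}

    def register_print(self, doc_id, title, when):
        # Steps 2-5: collect metadata, build the code payload, store the record.
        payload = json.dumps({"id": doc_id, "title": title})
        self.records[doc_id] = PrintRecord(doc_id, title, when.isoformat())
        return payload  # this string would be rendered as a QR code

    def record_sighting(self, payload, area, when):
        # Steps 8-10: a decoded payload updates the document's location.
        doc_id = json.loads(payload)["id"]
        record = self.records.get(doc_id)
        if record is None:
            return False  # the code does not belong to a known printout
        record.locations.append((when.isoformat(), area))
        return True

    def last_location(self, doc_id):
        # Step 11: the search interface reports the latest sighting.
        record = self.records.get(doc_id)
        if record is None or not record.locations:
            return None
        return record.locations[-1][1]


library = Library()
code = library.register_print("doc-42", "Thesis draft", datetime(2016, 5, 2, 9, 0))
library.record_sighting(code, "desk", datetime(2016, 5, 2, 9, 5))
library.record_sighting(code, "table", datetime(2016, 5, 2, 14, 30))
print(library.last_location("doc-42"))
```

Keeping the full list of sightings, rather than only the latest one, is a design choice of this sketch; it would allow a search interface to show where a printout has been over time.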

Fig. 2. Front page

The implementation uses Microsoft .NET and C#. Two web cams are used: a simple wired web cam as found in typical office settings and a wireless high-resolution camera. The cameras’ fields of vision are semantically encoded to refer to different office areas such as desk, floor, and table. Printer++ is used as a virtual printer to receive the user’s print request (www.printerplusplus.com) and Stroke Scribe (http://strokescribe.com) is used for QR code generation. The QR code uses document header information as metadata. It is placed on a separate front page of the printed document. We experimented with different sizes for the QR code; the one seen in Fig. 2 is the minimal size for recognition in a typical office environment in which cameras are between two and five meters from the documents. QR decoding in the Document Tracker takes an image of the desk surface generated by the camera and performs a simple five-step algorithm. First, the image is converted to grey-scale to reduce the processing load. Second, a Canny operation highlights the object edges (leaving the background black). Third, the barcode is extracted from the edge image. Fourth, the barcode is read and decoded (using the Aspose SDK, www.aspose.com). Finally, the result is sent to the Library to check for a matching document. The documents are included in a personal Digital Library (using Greenstone software, www.greenstone.org) with an extended metadata database to capture the location images and QR code information. More technical details are available in [4].
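The first three image-processing steps can be illustrated on a toy image. The sketch below is written in Python rather than C# and is deliberately simplified: a plain intensity-difference threshold stands in for the Canny operator, a bounding box stands in for barcode extraction, and decoding is omitted entirely. The function names and the camera identifiers in `CAMERA_AREAS` are hypothetical.

```python
def to_grey(image):
    """Convert rows of (r, g, b) tuples to rows of grey values (0-255)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]


def edge_map(grey, threshold=100):
    """Mark pixels whose intensity differs strongly from the left or upper neighbour.

    Simplified stand-in for a Canny operator: edges are 1, background is 0.
    """
    height, width = len(grey), len(grey[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = abs(grey[y][x] - grey[y][x - 1]) if x > 0 else 0
            dy = abs(grey[y][x] - grey[y - 1][x]) if y > 0 else 0
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges


def bounding_box(edges):
    """Return (top, left, bottom, right) of all edge pixels, or None."""
    points = [(y, x) for y, row in enumerate(edges) for x, v in enumerate(row) if v]
    if not points:
        return None
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    return (min(ys), min(xs), max(ys), max(xs))


# Camera fields of view mapped to office areas (hypothetical identifiers).
CAMERA_AREAS = {"cam_wired": "desk", "cam_wireless": "table"}

# A 6 x 6 white "desk" with a dark 2 x 2 patch standing in for a code.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
frame = [[WHITE] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        frame[y][x] = BLACK

box = bounding_box(edge_map(to_grey(frame)))
print(box, CAMERA_AREAS["cam_wired"])
```

In a real setting the region inside the bounding box would be handed to a QR decoder, and the area name associated with the capturing camera would be stored as the document's location.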

Fig. 3. Annotation of print record

3 Interface and Interaction

In this section, we show the HCW interface and user interactions, and highlight the benefits of using the HCW for managing and re-finding paper documents using a scenario.

Let’s consider a student printing documents for their Master’s studies. The initial print dialogues (using the HCW printer) integrate seamlessly into the established workflow. The student is prompted by the system to enter a short description of the print-out’s purpose (see Fig. 3), indicating whether this copy is for their own reading or for someone else. If this document has already been printed, HCW warns about this potential duplication (see Fig. 4(a)), allowing our student to cancel printing and find the previous copy or to print again, e.g., to give a copy to someone else. Finally, when a previously printed document cannot easily be re-found, HCW can be used to trigger the student’s memory with the purpose and last location of the print-out (see Fig. 4(b)). Additionally, the use of HCW builds a personal digital library of reading material, which can be searched and browsed using the existing library interface.
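The duplicate warning in this scenario amounts to a lookup in the print history before a job is sent to the printer. The following Python sketch shows one way such a check could look; the `Printout` structure, the matching by title and the wording of the warning are assumptions for illustration and are not taken from the HCW implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Printout:
    title: str
    purpose: str                         # annotation entered in the print dialogue
    last_location: Optional[str] = None  # filled in by the tracker, if seen


def check_before_printing(history, title):
    """Return a warning string if `title` was printed before, else None."""
    earlier = [p for p in history if p.title == title]
    if not earlier:
        return None
    latest = earlier[-1]
    where = latest.last_location or "unknown location"
    return (f"'{title}' was already printed ({latest.purpose}); "
            f"last seen: {where}. Print again?")


history = [Printout("Research methods reader", "own reading", "desk")]
print(check_before_printing(history, "Research methods reader"))
print(check_before_printing(history, "Assignment 2 brief"))
```

Returning the stored purpose and last location together mirrors the scenario above, in which both pieces of information help the student decide whether to look for the earlier copy or to print again.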

Fig. 4. (a) Duplicate printing warning, (b) search for printouts

4 Evaluation

We evaluated the HCW software prototype to explore to what extent it satisfies our goals of helping knowledge workers in ordinary office environments to re-find their documents. We carried out three studies: (1) an office-based single-user study over two weeks, (2) a lab-based study with 10 participants, (3) qualitative functionality tests.

4.1 Single-User Study (Office-Based)

The prototype was used by one academic knowledge worker regularly for two weeks for printing and tracking of student submissions, project work and publications. The software was set up in their office (see Fig. 5) on a Dell OptiPlex 9020 with two cameras (USB 2.0 camera with 1600 \(\times \) 1200 colour images at 25 frames per second; wireless web cam 1280 \(\times \) 800 colour images at 30 frames per second) mounted 100 cm above the table and 125 cm above the desk, respectively. The participant kept a diary of events and incidents and was interviewed at the end of the first and the second week to obtain a deeper understanding of the participant’s experiences and gain feedback about the system.

Fig. 5. Study setup: cameras circled red, anonymised participant (Color figure online)

Feedback and Results. The study was performed during a very busy period in the participant’s work. Even though they did not fill in the diary as diligently as was hoped by the researchers, detailed oral feedback was obtained. During the study period, more than 35 documents were printed (and thus entered into the HCW system). Four documents were purposely printed twice to be shared with colleagues. The participant expressed satisfaction with the front page of the document printout, stating it “provides sufficient information to identify the document” and “makes it easy to differentiate from other documents.” They noticed that the print phase took a “little more time” than for ordinary printing, as the HCW processing delayed the printing start by a few seconds. The participant observed that they looked for a number of document print-outs several times “for referencing purposes” during the study period. As this was a very busy time the participant failed to note how many documents and printouts they tried to locate.

When a printout was not immediately visible on the desk, the participant confessed to the habit of reprinting the document. They found HCW’s automatic warning about document duplication was a “useful feature” to reduce reprinting, and reported that the printout annotation and location information given by HCW helped trigger their memory as to the purpose of the document and also helped them find the printout if it was in the office. The participant explicitly praised the “simplicity of user interface” for finding physical documents, stating that “it was easy to understand” and a “simple to interact user interface.” They noted that the availability of different searching parameters (such as keyword search and between-two-dates search) made the search “more accurate” and “targeted.” They reported that “document search was generally successful” but that sometimes the recorded camera images would “show two documents at one place” (i.e., more than one document is shown in the image) in which case they “did not know which one is mine.” They suggested highlighting the correct physical document in the image.
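The two search parameters mentioned by the participant, keyword search and search between two dates, can be combined in a single filter. The Python sketch below illustrates this under stated assumptions: the record fields (`title`, `purpose`, `printed`, `location`) and the sample records are invented for the example and do not reflect the HCW database schema.

```python
from datetime import date


def search_printouts(records, keyword=None, start=None, end=None):
    """Filter print records by keyword and/or an inclusive date range."""
    results = []
    for record in records:
        text = (record["title"] + " " + record["purpose"]).lower()
        if keyword and keyword.lower() not in text:
            continue
        if start and record["printed"] < start:
            continue
        if end and record["printed"] > end:
            continue
        results.append(record)
    return results


# Invented sample records, for illustration only.
records = [
    {"title": "Project report", "purpose": "for review",
     "printed": date(2016, 3, 1), "location": "desk"},
    {"title": "Student submission", "purpose": "marking",
     "printed": date(2016, 3, 8), "location": "table"},
    {"title": "Project budget", "purpose": "meeting",
     "printed": date(2016, 3, 12), "location": "desk"},
]

hits = search_printouts(records, keyword="project",
                        start=date(2016, 3, 5), end=date(2016, 3, 31))
print([(r["title"], r["location"]) for r in hits])
```

Each additional parameter narrows the result set, which is consistent with the participant's impression that combining parameters made the search more targeted.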

Overall, the participant found the HCW system “convenient” and “useful.” They emphasized that “the software makes sense” and felt it helped them manage their documents in the workplace.

Functionality and Changes. During the two weeks of running the HCW prototype, occasional malfunctions were observed. Very long documents would sometimes not print; this was due to a malfunctioning print spooler service, which was fixed during the study. Occasionally the QR code would appear to be shrunk, which led to difficulties in decoding. This was traced back to documents with more than 500 characters in the first few lines on the first page. The problem was addressed by lowering the error correction parameter in the QR encoding to allow for greater storage capacity of the QR code.
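The trade-off between error correction and capacity can be made concrete with a short calculation. The sketch below is an illustration under assumptions, not a description of the HCW encoder: the capacity figures are byte-mode values for three QR versions as recalled from the QR code specification and should be verified before use, the table is deliberately sparse (intermediate versions exist), and the 100 mm print size is chosen only as an example. The relation of 17 + 4 × version modules per side is part of the QR symbol definition.

```python
# Byte-mode capacities for a few QR versions, by error correction level.
# Values recalled from the QR code specification; verify before relying on them.
CAPACITY = {
    10: {"L": 271, "M": 213, "Q": 151, "H": 119},
    20: {"L": 858, "M": 666, "Q": 482, "H": 382},
    40: {"L": 2953, "M": 2331, "Q": 1663, "H": 1273},
}


def smallest_version(payload_bytes, level):
    """Smallest listed version that holds the payload at the given level."""
    for version in sorted(CAPACITY):
        if CAPACITY[version][level] >= payload_bytes:
            return version
    return None  # payload does not fit in any listed version


def module_size_mm(version, printed_side_mm, quiet_zone=4):
    """Edge length of one module when the symbol is printed at a fixed size."""
    modules = 17 + 4 * version + 2 * quiet_zone
    return printed_side_mm / modules


# Compare the highest (H) and lowest (L) error correction level for 500 bytes.
for level in ("H", "L"):
    version = smallest_version(500, level)
    print(level, version, round(module_size_mm(version, 100), 2))
```

With a fixed printed size, a lower error correction level lets the same payload fit into a symbol with fewer and therefore larger modules, which is one plausible reason why lowering the parameter eased decoding from a distance.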

4.2 Lab-Based Study

The lab-based study used the improved software. Again, cameras were mounted 100 cm above a table and 125 cm above a main desk respectively, see Fig. 6. The study had 10 participants UP1–UP10 (6 female, 4 male) aged 18 to 50 years. We invited participants from a variety of backgrounds who were familiar with computer use (2 arts & social science, 4 management, 3 ICT and 1 earth science; 9 students and 1 professional). In an introductory interview, each of them reported often having to search for documents they had previously printed, spending up to three hours on document search in some cases. The study was designed around a set of tasks, and followed by a short interview. Each participant was given three tasks: (1) print the first copy of a document, (2) print a second copy of the document, and (3) find the location of the document.

Fig. 6. Study setup: cameras circled red (Color figure online)

Printing 1st Copy. All 10 participants found this process simple. Seven participants mentioned that while they appreciated the request for annotations on the print-out, they felt they needed greater familiarity with the system in order to better predict what sort of annotations would prove most helpful. Two participants felt the request for additional information held them back in their purpose of printing a document. They were not sure if the information they provided would help later. One participant wished to use language-specific characters, which were not supported. Four participants found the printing less convenient due to the delay in having to enter additional information and the short additional delay for QR code encoding. UP8 and UP9 suggested the use of a progress bar to indicate the impending commencement of printing. Eight participants found the front page sufficient to identify the document; the other two participants did not provide specifics about which information they would wish to include.

Printing 2nd Copy. All 10 participants noted that HCW’s notification of an earlier printout together with its location caused them to reconsider whether a second copy was indeed needed. Nine participants re-found the previous printout and one participant reprinted the document. UP5 expressed that “avoiding unnecessary reprints of the document is a very useful feature, as it would help me to avoid having multiple copies of the same document around.” UP8 commented that HCW “encourag[ed] using the existing copies [rather] than printing [a] new copy.” Five participants felt that the process of re-finding a document was not time-consuming, the other five felt that re-printing would have been faster. UP4 observed that the front page of a reprinted document is identical to the original printout and suggested providing copy number and date of reprinting to distinguish physical copies of the same documents. UP8 suggested providing more information about the document on the front page.

Finding a Printed Copy. Seven participants were successful in finding a printout the researchers had placed in the lab based on the information provided in HCW. Eight found the process effective and not time-consuming. Three had difficulty using the search window efficiently and needed to ask the researcher for help; these users suggested that the user interface layout should be more informative. UP3 suggested an option to check the functionality of every connected camera placed in the workplace.

All ten participants stated they were excited about the idea of automatically keeping track of their desk papers. Five found the system convenient and described the system as “very useful.” UP1 gave feedback that “the system is amazing; it will help to keep track of each and every document” and that the system made it “easier to find papers on the desk, simply by showing the picture of the desk the paper is on.” UP9 expressed that they found the “system convenient and useful for a forgetful person like myself. Not only does it help to find printed document or where my file is, it also helps the environment by avoiding re-printing.” UP8 found the system “very useful as I could see which documents have been printed earlier.”

4.3 Functional Quality Evaluation

Reading QR codes at an angle was found to have a higher reading error rate. We tested a 10 \(\times \) 10 cm QR code at a distance of 110 cm from the camera. A document presented to the camera at an angle of \(0^{\circ }\) deviation was read successfully in all tested cases. At an angle of \(10^{\circ }\), 4 of 5 documents were read, and at \(20^{\circ }\), 3 of 5 attempts were successful. At \(30^{\circ }\) or more, successful reading cannot be guaranteed (only 1 in 5 for \(30^{\circ }\), none for \(45^{\circ }\)). When the document was positioned at 125 cm from the camera, the success rate at \(20^{\circ }\) dropped to 2 of 5. These rates can be improved by enlarging the QR code, but 16 \(\times \) 16 cm is a natural limit for a QR code on A4 paper. The system still takes about 2 to 3 s to recognize a QR code. Best results are therefore achieved when users pause briefly between adding each document to a pile of papers. Additional tests are described in [8].
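For a quick comparison, the reported counts can be converted into success rates. The Python sketch below only tabulates the figures given in the text; the \(0^{\circ }\) case is left out because the number of attempts is not stated, and five attempts are assumed for the \(45^{\circ }\) case by analogy with the other angles.

```python
# (distance_cm, angle_deg) -> (successful reads, attempts), as reported in the text.
# The 0 degree case is omitted because the number of attempts is not stated;
# five attempts at 45 degrees is an assumption based on the other test cases.
reported = {
    (110, 10): (4, 5),
    (110, 20): (3, 5),
    (110, 30): (1, 5),
    (110, 45): (0, 5),
    (125, 20): (2, 5),
}


def success_rate(distance_cm, angle_deg):
    """Fraction of successful reads for one tested configuration."""
    reads, attempts = reported[(distance_cm, angle_deg)]
    return reads / attempts


for distance, angle in sorted(reported):
    print(f"{distance} cm, {angle:>2} deg: {success_rate(distance, angle):.0%}")
```

With only five attempts per configuration, these rates are indicative rather than statistically robust.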

5 Related Approaches

We here present an analysis of related work based on our five requirements (see Sect. 2). The subsequent Sect. 6 then provides a comparison to our HCW system and discusses implications and open research issues.

SOPHYA is a physical document collection system which utilises a wired technology for the management and retrieval of physical documents and artefacts within the collection [13]. SOPHYA thus provides a means of linking the management of real-world document artefacts (e.g. folders) with their electronic counterparts, so that document management activities such as filing, locating and retrieving documents can be supported. The system uses specially designed hardware shelves and physical document containers for holding documents. SOPHYA supports unordered (piling) [12] and ordered (filing) [13] document collections in two different system implementations. Our notion of filing and piling of documents follows Henderson and Srinivasan’s concepts [7]. The connection between the container and the location of the container is established with electronic circuitry. Each folder has an allocated physical location within a container. An LED on the surface of the container acts as a user interface to indicate that the required document is in the container. Firmware embedded in the physical storage location communicates with the container (e.g., by reading the IDs of the containers) and controls the user interface. The firmware also communicates with the middleware, which maintains a simple database to keep track of information in the container and the physical location of the container. For our scenario of non-disciplined knowledge workers, SOPHYA has a number of limitations. First, the documents still have to be placed in a particular container to be located, so the system does not provide flexibility and a particular procedure needs to be followed. Second, metadata need to be entered and maintained manually, which is time-consuming. HCW aims to cater for real-life situations in which people deposit their physical documents anywhere in the office and need to recover them easily.

PaperSpace is a document management system that maintains a link between the printed document and its digital counterpart [17, 18]. PaperSpace works with operation codes (in the shape of small graphic icons) printed in the margins of each page of the document. PaperSpace uses a medium-resolution webcam to recognise the papers. The system features other functionalities such as capturing and parsing gestured operations performed on the (paper) command bar. The bar image provides linking functions between the paper document and its digital counterpart, and users can directly manipulate the digital document using their printouts. The PaperSpace system provides an innovative interface for linking physical and digital copies. Its approach of enhancing the print copy with annotations is most closely related to HCW’s use of QR codes. However, PaperSpace does not provide any assistance in re-finding the paper version of a document once printed.

Video-based document tracking identifies paper documents on a desk and automatically links them to the corresponding electronic documents [14]. A camera is mounted above the desk to capture and track the document movements. The video is analysed using a computer vision technique for document recognition that enables every paper document on the desk to be linked to its electronic copy. In the system, the document representations can be searched using keywords or by manipulating the image of the desk. The system’s advantage is its technical simplicity: it does not involve tags or special readers. However, only one document can be placed or removed from the stack at a time. It is also assumed that every document placed on the desk is unique.

DocuDesk uses interactive desk technology to establish relationships between digital and physical documents [5]. DocuDesk uses an interactive desk and an overhead infrared video camera. In DocuDesk there are two ways of linking the document with its digital counterpart: by 2D barcode or by 1D barcode. A camera above the desk records an image of the document and, using image recognition, a link with the digital counterpart is created. On placing the document on the DocuDesk, the user is given various options such as email and link. The email option sends the digital copy of the physical book, while the link option attaches additional digital media to the book. DocuDesk does not provide tracking and search functions for physical documents.

Limpid Desk is a visualization tool that allows its users to “see” the contents of a stack of documents; in particular, it allows a user to “see” the contents of documents further down in the stack without the top layer needing to be removed [10, 11]. The upper layer is transparentized and users can find desired documents even if they are hidden in the document stack. The hardware used in Limpid Desk includes a projector, a camera and a thermo-camera. When the user touches a document on the desk, the system detects the touch (via the thermo-camera) and the upper-layer document is then virtually transparentized by projection. Limpid Desk supports physical search interaction techniques, such as ‘stack browsing’, in which the upper-layer documents are transparentized one by one through to the bottom of the stack. The Limpid Desk system meets our requirement of giving simple access to physical documents. As the user can visually access a lower-layer document without removing the document on top, Limpid Desk is a possible solution to the problem of finding a document in a pile.

The Fused Library uses RFID tags to link physical items with content in a digital library [2]. RFID tags are placed underneath a desk, allowing identification of the user’s location (using laptop-based RFID readers). Depending upon the user’s current location, the library catalogue will present the user with a tailored home page including a quick link to related useful sections in that location. The library catalogue will highlight the books near the user’s location. The Fused Library uses concepts of physical hypermedia, for which a user’s context (e.g., their location) triggers links to digital material [6]. The Fused Library is a library-based system that meets our requirement of tracking the location of users and documents. However, it does not keep track of printed documents as such. As offices are typically much less structured than, say, a traditional library, locating physical and digital objects across the workplace would be challenging using the Fused Library approach.

6 Discussion

This section brings together the discussion of related work in light of the requirements and the HCW system, further comments on the user studies, and aspects of future work.

Related Work. Table 1 provides an overview of the main results of our related work discussion with respect to the system requirements. For comparison, the table also contains information about our HCW system (last row). As can be seen from the table, most related systems provide document tracking and digital search. However, only PaperSpace (in addition to HCW) keeps records of printed documents. Additionally, most systems require the user to employ special hardware and/or to follow some pre-defined methodology. Some hardware is required in all cases; however, PaperSpace, video-based tracking and HCW use simple hardware that already exists or is easily installed in ordinary offices instead of custom-built gear. Tracking document locations using these low-key hardware options is harder to implement and remains quite challenging. Overall, none of the existing systems is suited to address the problems described and the requirements identified previously. HCW addresses all five requirements and, similar to issues discussed for PaperSpace, its tracking of document locations could be improved through further research.

Table 1. Systems for re-finding physical documents

Implications of User Studies. Although 10 participants are not sufficient for statistical evaluation, they provide indicative observations. The participants with backgrounds other than computer science focused more on the overall outcome and benefit of the system (e.g., “It is very cool, [it] will be of great help to organize and search the physical documents”), while participants with an IT background were more critical of the operational aspects. They seemed to find it harder to accommodate even a small system delay and were more analytical of the system’s performance. Participants from other backgrounds took 20 min on average to complete the user study; the CS participants took about 30 min. The researchers had the impression that both studies were somewhat hampered by the use of the system in a one-off, limited-time manner. The true benefit will only become apparent after sufficient time has elapsed for the location of paper copies and the purpose of printouts to have been forgotten. This would change the motivation for the participants, especially if they could be sure that the system functionality, the digital copies in the Digital Library and the provided information about printouts would be available in future. In this respect the system is akin to augmented memory systems, which encounter similar challenges for effective evaluation. Furthermore, the aspect of building a personal library has not yet been studied in any detail, as its benefits would similarly be of a more long-term nature.

QR Code Quality. Similar to Sallam’s observations [17], we noted that even small delays, as caused by our QR code reading and their tag reading, are irritating to users and will not be easily accommodated through changed user behaviour. We are therefore exploring a number of ideas for improving the readability of QR codes from a distance beyond the simple (and limiting) increase in QR size. Alternative methods for marking paper print-outs for tracking to be explored are marginal markings, similar to the tags used in PaperSpace [17], in combination with QR codes.

Integration into Personal Digital Library. The HCW system would be best used not as a stand-alone digital library merely for printed documents but for tracking reading material. In [1], we introduced such a system for tracking academic reading, which currently only covers digital documents. Merging these two approaches to personal digital libraries is one of our future research goals. Similarly, a closer integration into scholarly workflows (finding, reading, annotating, writing) is desirable. We wish to improve the current user interface and explore whether a closer integration into the Digital Library interface would be beneficial. The current annotation of locations is only very rudimentary – greater flexibility seems desirable but its impact on non-technical end-users needs to be explored.

7 Summary and Conclusions

We live in a digital age though many still use paper copies of documents every day for convenience. Our research is motivated by a number of factors: lost documents with valuable annotations, time wasted searching for print copies of documents, and the wish to save trees by reducing the number of duplicate paper printouts. We aimed to find a solution that does not require knowledge workers to follow yet another well-intentioned new methodology or structure in ordering their material, nor does it necessitate the acquisition of expensive hardware. We are further interested in automatically building a personal digital library not through explicit ingest of documents but through the use of previously available information from the users’ workflow.

This paper described our HCW prototype that supports the management and re-finding of physical documents. We implemented a software prototype and explored its effectiveness in two user studies together with an exploration of its functional qualities. Our current studies focused on testing the convenience and feasibility of the HCW system itself and the explicit interactions. Studies of longer-term use of the system would allow an exploration of the annotation types used to describe the print-outs (possibly allowing for predefined categories to speed up this step), and a test of the impact of workflow patterns on the personal digital library and its use. However, already from these three studies it becomes clear that the concept of the Human-centred workplace may successfully address the issues of re-finding printed documents and help avoid repeated re-printing.

Its better integration with a personal digital library for managing reading material opens up further applications beyond tracking documents, and would make this system a useful element in the established workflow of academics and other knowledge workers. We also identified areas for software improvement such as more effective frame rate for QR recognition, and support for reading documents at greater distance and at an angle. Future work plans are manifold, such as the exploration of methods to track the document piles, and the plans outlined in the discussion.