Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

1.1 Early History

The initial challenge came from one of the authors of the present paper (J. Si.), who needed a mobile application for Android, capable of storing stratigraphic data collected during excavations, consisting of textual descriptions of archaeological contexts, and chronological earlier-than and later-than relations between them. The data was intended to be transferred for further analysis to a standard, Windows-based application (running under Wine on a Linux machine). So from the very beginning the application was intended to be used in an environment with at least two (or perhaps even three) operating systems involved. This is nothing particular. Nowadays users routinely switch between devices, operating systems, and between online and offline mode of work.

We decided to implement the archaeological application, later named Strati5, as a multi-platform spreadsheet. The application was described from the user’s point of view in [10]. We also published a short, nontechnical paper [9], advocating the general idea of rapid development of mobile multiplatform applications in the form of spreadsheets with Strati5 as a working example.

1.2 Why a Spreadsheet?

In the paper [11] the other two of us (J. Sr. and J. Ty., with other co-authors) made a claim, that spreadsheet formulas in fact constitute a platform-independent programming language, even though there is no common formal standard in this respect. Applications written in this language run on virtual machines, which are spreadsheet management systems, like Microsoft Excel (available for Windows desktop and phone, Mac, Android and iOS), Apache OpenOffice and LibreOffice calc (available for Windows, Mac and Linux), WPS Office spreadsheet (available for Windows desktop, Linux, iOS and Android), and many other.Footnote 1

When we had a chance to verify our belief in practice, we did so, treating the archaeological application as a test, if spreadsheet technology can indeed serve for multi-platform programming.

Next, using spreadsheet as an application saved us a lot of implementation work. Programming the user interface is typically one of the most laborious parts of each project. In our case, a vast majority of the user interface is always provided directly by the spreadsheet system used. It is responsible for all functionalities related to data navigation, typing, editing, undo, redo, file opening and saving, etc. It also adapts the application to different screen sizes and resolutions, mouse or touch as a pointing device, etc. Finally, it seems almost impossible to find another technology which would make the very same code run on Android, iOS, Windows, Linux and MacOS.

Having one application for all systems, we did not have to implement any protocols to facilitate data transfer between different applications. Copying data between spreadsheets is generally considered to be error-prone, so sharing the same spreadsheet between devices and systems prevents many potential errors and risks.

Last but not least, spreadsheet technology is very conservative and backward compatibility has always been a major concern. Therefore, we expect our application to remain fully functional for many years without any need of modifications, and even to get ported to new operating systems, should they appear on the market—most likely without a single keystroke on our part.

1.3 Development History

After testing a few systems, WPS Office for Android was chosen as the optimal one to start with and within a few days the first working version of Strati5 for the tablet was available.

Tests took place during regular archaeological excavations in Ostrowite (Pomerania, Poland) led by J. Si. A number of improvements resulted in a tool, running on Android tablet (for collecting data in the field) and on Linux laptop at the base camp (for exporting the data to an external application). The whole spreadsheet was routinely used and transferred between the devices, causing no problems during many months of excavations. Strati5 proved to be reasonably comfortable and intuitive in everyday operation.

Then we decided to offer the application to other users, making it as independent of the operating system as possible. Achieving this goal required a number of changes and appeared to be an interesting programming experience.

In the present paper we discuss the technical issues of programming the interaction with the user in a spreadsheet intended to be transferred between many platforms. We hope that this knowledge and developed know-how will be useful in other cases. Requests for help in porting existing spreadsheets into mobile environments already show up at MrExcel.com, a very active spreadsheet-related forum, as witnessed by recent posts [1, 3, 7, 8]. We expect that this demand will grow. There have been a few papers which deal with the usability of mobile spreadsheets [2, 4, 5], but they all discuss spreadsheet management systems rather than individual spreadsheets, so we are probably pioneers in this respect.

As a by-product of our technical developments, in Sect. 3.2 we describe a simple design pattern, which can be used to control location of cyclic references in all types of Excel: desktop, mobile and online.

2 The Overall Structure of the Interface – Strati5

In this section we describe the spatial organization of the interface and spreadsheet functionalities necessary to implement it. Our working example is Strati5.

2.1 Fundamental Requirements

The requirement was to develop a mobile application, capable of storing stratigraphic data, consisting of archaeological contexts with textual descriptions, a set of chronological earlier-than and later-than relations between the contexts, which are edges of a directed acyclic graph of interest to archaeologists, called Harris matrix. Later we added groups, i.e., named sets of contexts, to the data model. We identified the following main requirements:

  1. A.

    Entry and storage of the data, with estimated growth rate of at most 20 records a day and average of 200 records per month.

  2. B.

    Warn about/prevent duplicated context id.

  3. C.

    Warn about/prevent cycles in the relations between contexts.

  4. D.

    Warn about/prevent using undefined contexts in the relations.

  5. E.

    Warn about/prevent duplicate group id.

  6. F.

    Warn about/prevent assigning contexts to undefined groups.

  7. G.

    Operating on a mobile devices and laptops, with strong preference toward network-independent operation.

  8. H.

    Automatic or semi-automatic data export in a format accepted by Stratify [6], a popular free desktop application for maintaining and processing stratigraphic data.

  9. I.

    Low resource consumption, to enable smooth operation on a tablet and prevent draining the battery, assuming load from item A above.

As it can be recognized, most of the above items were related to data validation. While analyzing the methods to implement them, we came up with the conclusion, that it was impossible to support all popular spreadsheet systems.

Instead, we decided to work toward a more realistic goal of supporting sufficiently many spreadsheet systems to offer our tool on all major operating systems, and to cover a few most important spreadsheet systems: Microsoft Excel for Windows desktop and Mac, LibreOffice and Apache OpenOffice for Windows, Linux and Mac, and Google sheets.

Fig. 1.
figure 1

Schematic structure of the user interface. The gray top row(s) and leftmost column(s) are frozen panes.

2.2 Structure of the Interface

After some consideration, we decided that the key services we needed from the spreadsheet systems were:

  • Reporting emergence of cyclic references (required for reporting cyclic relations between contexts);

  • Support for array formulas;

  • Support for “freeze panes” (i.e., making the top row(s) and/or the leftmost column(s) always visible on the screen);

  • Support for data validation by a list of allowed values;

  • Support for conditional formatting.

They were needed for a very schematic idea of the interface, shown on Fig. 1. It was based on the freeze panes function applied to a few rows and columns. Referring to the colors on the figure, the roles of the particular areas are as follows:

  • The dark gray area in the top left corner is always present on the screen. It can display messages about spreadsheet’s global state, by formulas producing texts of messages and conditional formatting indicating their presence.

  • The medium gray area on the top edge consists of cells which are visible whenever any fragment of their column is visible. They can be used to display column-related messages.

  • The medium gray area on the left edge consists of cells which are visible whenever any fragment of their row is visible. They can be used to display row-related messages.

  • The light gray area are individual cells and the functions used in them are most likely data validations and perhaps conditional formatting.

Eventually, the following operating systems and spreadsheet systems passed tests of compliance with the above requirements:Footnote 2

  • Windows Excel, LibreOffice, OpenOffice, WPS Office

  • MacOS X Excel, LibreOffice, OpenOffice, WPS Office

  • Linux LibreOffice, OpenOffice, WPS Office

  • Android WPS Office

  • iOS WPS Office

  • Windows mobile MS Office

  • All systems Google sheets.

Fig. 2.
figure 2

The structure of the data entry worksheet of Strati5 (top). The colors follow Fig. 1. There are no global state messages. The cells in the white areas undergo no validation. Below Strati5 in Microsoft Excel under Windows 7 desktop (middle) and in WPS Office under Android on a smartphone (bottom).

3 Interaction with the User

In this section we point out the main issues we encountered while designing and programming the way Strati5 interacts with the user.

First of all, we separated the spreadsheet into a number of worksheets. The bulk of user interaction is performed on worksheet intended for data entry,Footnote 3 whose structure is presented in Fig. 2 below and follows the schema from Fig. 1. The headers of rows do not produce any messages, but contain the id of the archaeological context being described in the row. Column headers are used to display column-related messages. The descriptions of the contexts area contains no validations, as it is intended for textual descriptions. The corresponding column headers are merely names of columns.

The requirements presented above determine, that the dominating activity was to either block entries violating data integrity, or at least to warn the user about them. The formulas of Strati5 we present below come from a variant with 200 contexts, at most 500 contexts and relations together, at most 12 relations per context and at most 20 groups.

Requirements D and F can be very efficiently programmed using data validation of type “List”, where the user has to choose the data element from a list declared in the data validation form. This form of data validation is widely supported by spreadsheet systems. On a majority of systems, the spreadsheet displays a drop-down list of permitted values, from which the user can choose the right one. It is particularly convenient on mobile devices, where typing is much less comfortable and produces more typos than on machines with hardware keyboards. Such lists are used in the ids of contexts earlier than the present one and group id areas. The drop-down list can be potentially quite long, but we did not decide to do anything about that.

Concerning requirements B and E, the only method to block a new data entry in spreadsheets is to use data validation tools. Initially we intended to use them in the context ids area. It is possible and not too difficult to write a custom formula in the Excel data validation form, which will permit entering only a new context id in a cell. However, data validation in Apache OpenOffice and LibreOffice does not permit custom formulas. Worse still, upon reading an Excel-created file with such a data validation, it is corrupted and produces a faulty data validation, which permits duplicates, but blocks completely some entries. Consequently, we submitted bug reports and had to choose another method to satisfy our need.

Being unable to block duplicates, we decided to indicate them instead, leaving corrections to the user. There are two tools, which can be used for this purpose: conditional formatting and formulas. The former can be used to mark duplicates. However, Google sheets does not support marking duplicates, so the only way was to enter a custom formula there. After the unpleasant experience with formula-based data validation, we decided to use the header of context id column cell to insert a single array formula, verifying if there were any duplicates on the list below. It was written to yield a warning message if there are duplicates, and the usual column header otherwise:

figure a

A single simple conditional formatting rule applied to this cell turns it red when the warning is displayed, to increase its visibility. This form is sufficient, because in archaeological practice, context ids are created, stored and almost never changed. Consequently, if a new entry is a duplicate, it is clear that this entry must be changed, not the other. Very recently an update of LibreOffice introduced a faulty mechanism of array formula computation, which in our case produces a constant duplicate warning. Therefore we decided to replace the array formula by a formula-based conditional formatting in column A.

3.1 Reporting Emergence of Cyclic References

This was probably the source of the largest problem we had in the development of Strati5. Our implementation of breadth-first-search (BFS) graph traversal (presented for completeness in Appendix A) produces a cyclic reference between spreadsheet cells as a manifestation of a cycle in the “earlier than” relation created by the user. Therefore, we needed to notify the user of this event. Unfortunately, the way spreadsheet management systems react on cyclic references is very diverse. We have noted the following basic groups:

pop-up Displaying a pop-up window with an appropriate message. This form is exhibited by Microsoft Excel for Windows desktop and Mac OS, WPS Office for Windows and for Android. Additionally, the cells which are lying on the cycle, as well as those which depend on them, are not recomputed.

error Evaluating the cells which are lying on the cycle to an error value. This form is exhibited by OpenOffice, LibreOffice and Google sheets.

stop The cells on the cycle, as well as those which depend on them, are not recomputed. This form is exhibited by Microsoft Excel for Windows Mobile, Microsoft Excel Online, WPS Office for iOS.

Systems exhibiting pop-up message notify the user themselves.

For error group we used the top row to display the message. The common header of the columns where the user was supposed to enter the ids of later contexts were a single cell merged from several individual ones, whose text was constructed by a formula of the form =IF(ISERROR(SUM(’Cycle test’!I2:I500)),"Cycle detected!", "Earlier than"). The column I on the worksheet Cycle test is the place where cyclic references, and consequently, error values, emerge (see Appendix A). SUM function evaluates to an error if there are any errors in the summation area, and to a number otherwise. The whole formula thus produces the message, whose visibility is increased by conditional formatting. This gave us a common solution for pop-up and error systems.

Supporting stop was very important for us, because all available spreadsheet management systems for iOS and Windows Phone were of this type.

We divided the header into two cells. The first of them contains the formula

=IF(ISERROR(SUM(’Cycle test’!I2:I500)),"Cycle detected!",

"Earlier than relations"&SUM(’Cycle test’!H2:H500)),

while the other cell contains

="Earlier than relations"&SUM(’Cycle test’!H2:H500)).

Then conditional formatting is applied to both cells, turning them red if their values differ.

The cells in the range ’Cycle test’!H2:H500 contain numbers related to each tuple in the earlier-than relation such that any single change of the relations causes a change of their sum. ’Cycle test’!I2:I500 contains distances of the contexts from the sterile layer and is, as before, the place where cyclic references emerge.

In systems with pop-up and error responses the new formulas work very much as before. In systems with stop response, the two results are obviously equal if there are no cyclic references. However, if the user adds to, or modifies a tuple in the relation creating a cyclic reference, the system stops the evaluation of some cells in the range ’Cycle test’!I2:I500. One of the header formulas depends on this range, the other does not. The latter is computed normally and its value changes. The former is not recomputed due to its dependence on unevaluated cells. Crucially, after the recomputation is finished, conditional formatting is applied to all cells, irrespectively of their evaluation status and the two cells become red and issue a visual warning to the user, although no text message is produced.

Fig. 3.
figure 3

Strati5 in WPS Office under iOS (top) and in Excel Online (bottom), showing a warning about cyclic references. They are stop systems, so the indication is the red color of the header. The bottom screenshot shows the the second cell of the header, with a different value in it, triggering the conditional formatting.

3.2 By-Product: A Cycle Indicator

A by-product of the above mechanism is our construction of a universal detector of cyclic references in Excel desktop and online.

Excel desktop reports the emergence of cyclic references by a pop-up window and indicates one, more-or-less randomly selected cycle, among all that are created. This gives access to one cycle at a time. If such references are not enabled, but already present somewhere in the workbook, there will be no warning or indication of the subsequently created ones. This is especially likely if formulas with OFFSET or INDIRECT are used and edited, but INDEX function used in the reference form can cause the same effect.

Excel online does not report emergence of cyclic references in any way and have no error checking tool.

The design pattern described below allows the user to set up cycle warnings for any number of cells. All of them are activated simultaneously and visually indicate ones which lie on cycles or depend on such cells. The form presented here works for cells which do not evaluate to error values, but this can be overcome with minor modifications.

As a matter of example, let us assume that the range A1:A10 are the cells suspect of becoming elements of cycles we want to monitor.

We set up a single reference cell, let it be

figure b

The monitors are installed by formulas (we assume that the formula is entered into the top cell of the range and copied down, with automatic modifications introduced by the spreadsheet)

figure c

and conditional formatting rules applied to cells in B1:B10, so that they change the formatting of cell Bi iff its value is smaller than the value of the reference C1.

Let us assume the user makes an edit, which might cause some of the cells to become members of cycles themselves or to depend on cells which are now on cycles. Then the following events happen:

  • Some of the cells in A1:A10 are attempted to be recomputed. Ones which are now on cycles are not recomputed and the recomputation of their dependents, including those in B1:B10, is also blockedFootnote 4.

  • The formula in C1 is volatile, hence it is recomputed and its value increases.

  • Formulas in B1:B10 depend on cell C1 whose value has changed, hence their recomputation is attempted. It succeeds for Bi iff Ai is not on a cycle and does not depend on a cycle, otherwise it is blocked and the old timestamp is retained.

  • Conditional formatting is applied to all cells in B1:B10. Those which have been recomputed are equal to C1 and are not formatted; those which have not been recomputed contain a timestamp older than the present value of C1 and are therefore formatted.

A small variation allows a single monitor to be applied to several cells, e.g.,

figure d

collectively monitors the whole range A1:A10.

This design pattern incurs only low computational overhead and does not require modifying the monitored spreadsheet computation. In particular, it can be easily removed, when it is no longer needed.

3.3 Data Export

Requirement H was to provide data export from Strati5 to Stratify [6]. This tool can read csv files of a specific structure, defined by the number of columns, their data types and headers. The solution we implemented was to produce such a csv file by concatenating values of certain cells, in a worksheet intended for data export. The user is supposed to either copy and paste its content into a text editor, or save it directly from the spreadsheet system as a csv file. Subsequently Stratify can import such a file.

3.4 “Soft” Methods to Reduce Resource Consumption

We used two “soft” tricks to reduce the resource consumption of Strati5, both related to the interaction with the user. This was done to satisfy requirement I.

The first one is that the user, while describing a context, is allowed to specify only contexts which are later than the present one, while relations in both directions make perfect sense in archaeology, and Stratify permits them to be specified. Specifying that a context c is earlier than the presently edited context d can still be done: by going to the row with context c and entering there d as a later context. This way we avoided expensive sorting by spreadsheet formulas to group the tuples of the earlier-than relation by the first coordinate, which is crucial for our implementation of BFS.

The next trick is that we introduced two predefined contexts: the chronologically oldest context “sterile layer” and the top context “surface”, which are crucially not processed in the acyclicity test (but are processed in the duplicate-freeness test). The typical structure of many archaeological sites causes these two contexts to be present in a very large fraction of the earlier-than tuples, while they have no impact on the cyclicity of the relation. By eliminating them, we get a significant reduction of the number of tuples Strati5 must process.

We expect that in many other contexts, domain-specific knowledge about data to be processed can help devising analogous layout solutions to reduce the computational cost of the spreadsheet application.

4 Scalability Problem

This is a problem we did not find any good solution for. Spreadsheets come always with certain fixed number of rows and columns of formulas, and thus are capable of processing a predefined maximal number of data items. Too small a spreadsheet is therefore bad, too large one slows down the applications and drains the battery—the opposite of requirement I. The designer has two basic methods to overcome this problem: either to produce a couple of spreadsheets of different sizes and let the user transfer the data between them when necessary (which is error prone), or to assume that the user will modify the spreadsheet, adding or removing rows of formulas. Preparing the spreadsheet for the latter action is not trivial and requires a good deal of additional design work. It also interferes with the good practice of hiding or locking those portions of the spreadsheet which are not supposed to be edited by the end user.

5 Availability

Strati5 is available from http://bit.ly/Strati5, and is an open-source software with a BSD license. Let us note here that, as far as we know, there is no technology to close the code of a multiplatform spreadsheet application, so multiplatform spreadsheets are open source by necessity.

6 Standardization Issues

We were asked by the anonymous reviewers to discuss the issue of standardization between spreadsheet management systems.

The following list indicates the main difficulties encountered while developing our multiplatform spreadsheet:

  1. a.

    Limited mutual compatibility of spreadsheet systems.

  2. b.

    Highly insufficient documentation.

  3. c.

    Technical and legal problems with using SDK tools for mobile systems.

It is clear that the more functionalities are compatible between spreadsheet systems, the easier is to program multiplatform spreadsheets. Issues a and b are two sides of the same coin. Definitely, they are real: during the whole development process of Strati5 we had problems with mutual compatibility between spreadsheet management systems and with their (lack of) documentation. Therefore we had to rely on experiments choosing solutions in many cases, and we could not use SDK tools for that.

A lot of work was necessary to find a workaround (not a solution!) of the problem that cyclic references, a fundamental property of a spreadsheet, are reported in so many different ways. Even spreadsheet systems coming from the same vendor differ in this respect: Microsoft Excel for Windows (pop-up) differs from its Online and Windows mobile versions (stop), WPS Office for Android and Windows (pop-up) differ from WPS Office for iOS (stop).

Cross-vendor problems concern other fundamental issues: LibreOffice and OpenOffice do not permit formula-based data validations. Even the syntax and behavior of formulas tends to differ from vendor to vendor like, e.g., error encoding and handling between various Excel variants and LibreOffice on the one hand and OpenOffice on the other hand.

The methods to create references to external workbooks are not transferable between spreadsheet systems, in particular if one wants to use INDIRECT or some other programmable, foolproof mechanism.

Indeed, as commented during the workshop, the relations between different spreadsheet systems and their vendors seem to resemble those between Web browsers and their vendors during the browser wars, with similar consequences for the users and programmers.

A significant problem is related to the performance of the whole systems and of their particular functions. Working with hardware of potentially low performance, the programmer must know how spreadsheet systems implement specific functions.Footnote 5 E.g., in desktop Excel =MATCH(val,rng,0) is of linear time complexity, while =MATCH(val,rng,1) is logarithmic, assuming that rng is sorted. It would be desired to have guarantees that these complexities carry over to other spreadsheet systems. In desktop Excel =COUNTIFS(rng1,val1,rng2,val2) is much faster than =SUMPRODUCT((rng1=val1)*(rng2=val2)). Again, one would like to know that the same relation holds in other systems. If many formulas are being evaluated, the ability of the spreadsheet systems under consideration to perform multi-threaded computations becomes an important factor, too.

We can also list two particular functionalities which would be very useful in programming multiplatform spreadsheets.

Widely adopted and extended INFO function to determine the identity of the spreadsheet system. That would help in programming spreadsheets in a clean way, with clearly indicated portions of the code to be executed on particular spreadsheet systems. The present functionality of INFO is insufficient: it returns the version number of the system, but not its identity. E.g., =INFO("RELEASE") returns the same value 11 in the latest WPS Office on Windows and on Android, even though both spreadsheet systems differ in behavior and functionality, and in quite different and much older Excel 2003. The situation is even worse with Excel, whose online and Android versions do not support this function at all.

Row-wide and column-wide conditional formatting – in particular, setting conditionally the width of columns and height of rows. This way one could set a column to be a (part of a) frozen pane and be always present on the screen, fill it with formulas computing warning messages, and set its width conditionally to be 0 if it does not contain any warnings. Effectively, the column would then play the role of a pop-up window produced without any scripts or macros.

Built-in array function sorting its input (like the one present in Google sheets) in \(O(n\log n)\) time would be a great tool to reduce resource consumption by many algorithms implemented by spreadsheet formulas. The typical sorting algorithms built from formulas are quadratic. There is a spreadsheet sorting algorithm of \(O(n\log ^2n)\) time complexity, but it requires \(O(n\log n)\) formulas and is quite complex [11]. Reducing resource consumption is particularly important on mobile systems, due to their relatively low performance and dependence on battery.