1 Introduction

Spreadsheet software, such as Excel, is convenient for checking simple statistics and calculations. Entries can be easily typed into corresponding cells, and by selecting a range, such things as data number, average, and total can be displayed on the status bar and confirmed immediately. Scatter charts and line graphs can also be displayed on the screen if selected from the menu. However, as this report will show, [1] can be used to handle complicated data processing with greater efficiency than spreadsheets such as Excel.

The advantage of for data processing can be illustrated with an example from a project the author participated in. The project analyzed pollutants found in a certain area since 2013 onwards. Figure 1 shows part of the data (April 1\(^\mathrm{st}\) 2015 to February 29\(^\mathrm{th}\) 2016) as displayed in an Excel spreadsheet. The names of pollutants are displayed in the first row, e.g. PM2.5, PM10, etc.

To obtain the average level of each pollutant measured in ppm over the period, the following formula

figure a

was entered in cell B337 of the Excel file and applied to all cells to the right.

Also, in order to obtain the moving average [2] of PM2.5, the following formula

Fig. 1.
figure 1

Table of pollutants

figure b

was entered in cell J8 and applied to all the other cells in the column. By applying

figure c

to the column, the average of every week could be displayed in every 7th cell.

However, it was not easy to make an average list every week. For example, by entering either

figure d

in cell J2 and copying down, the formula tended to be more error prone and difficult to correct. Furthermore, calculation of the monthly average was made more difficult because the number of days of each month varied. In the case of Excel, a macro can be used but with difficulty.

To avoid the above difficulties encountered while attempting to use Excel, the author turned to Cinderella in order to complete the task more efficiently. Cinderella is a type of DGS (dynamic geometry software) developed by J. R-Gebert and U. Kortenkamp [3]. Unlike other DGS, it incorporates CindyScript, an easy-to-use programming language. CindyScript handles data in addition to numerical values and character strings called lists. A vector (sequence data), for example, can be expressed as vec=[1,3,5]. In addition, a matrix (table data) can be expressed as a double nested list. For example, mat=[[1,3,5],[2,4,6]] is a matrix of 2 rows and 3 columns. Also we can use functions of CindyScript to handle lists as explained in [4]. The following is some samples of functions in CindyScript which are subsequently used to define new functions in this paper.

figure e

Here # represents the running variable, successively taking the value of all elements in the list.

While was developed as a Cinderella plug-in to generate LaTeX source code for high quality mathematical artwork [5], it can perform several data processing functions as well. Data processing with is in fact quite easy and very flexible as the next sections show.

2 Data Handling with

2.1 Data Input, Output and Display

The following is an example of how can be used to handle data. First, converts the csv file (file.csv in this case) as shown in Fig. 2

Fig. 2.
figure 2

View in excel

into the data named dt by the command

figure f

if file.csv exists in a working directory. Also this command takes dt from the directory and arranges it into a matrix. Inputting and executing the command

figure g

on the edit area (the area on the top right of Fig. 3), the resulting matrix is displayed on the console (lower right of Fig. 3).

Fig. 3.
figure 3

Readcsv and println

Substituting Dispmat(dt) for println(dt), the matrix is displayed as tab-separated strings on the console, as shown in Fig. 4.

Fig. 4.
figure 4

Readcsv and Dispmat

Conversely, tab-separated strings can be translated into a matrix using the command, Tab2list. For example, Fig. 5 shows the data copied from the Excel window in Fig. 2 onto a CindyScript screen.

Fig. 5.
figure 5

Tab2list (edit area)

Fig. 6.
figure 6

Tab2list (console)

Here the data, in double quotes following dt, is converted into a list using the command Tab2list. Figure 6 shows the matrix which is made from dt and displayed on the console.

When the amount of data is large, it is necessary to scroll down the screen and look for the start and end of data to select and copy the range, which might cause mistakes. On the other hand, the command Readcsv is useful because it automatically selects the data and copies it into the edit area.

2.2 Data Processing with Commands

While it is necessary to scroll down the window to select a wide data range in Excel, the commands Nrow and Ncol added by the author enable us to select it automatically in . Also, the translation of row data into column data is accomplished simply by using the CindyScript command transpose. Moreover, we can easily keep or delete some part of a large table simply by using commands prepared to store and substitute variables. Time consuming cell-by-cell work is needed to accomplish these tasks in Excel. Thus, much labor can be saved by adding commands to .

The author will introduce the commands added to using file.csv as an example. Commands beginning with lower case letters were of Cindyscript origin. Those newly defined by the author in begin with capital letters like other commands in . The commands added to to handle matrices are as follows

figure h

For example, let dt be the data of file.csv obtained using the command

figure i

We can easily delete the first row of dt with the command

figure j

Also we can easily find the number of rows and columns in dt1 using the following commands (left side of Fig. 7) whose result is on the right of Fig. 7.

figure k
Fig. 7.
figure 7

Remove 1st Row and Return Nrow, Ncol

Similarly, the results obtained by the following commands

figure l

can be seen on the right of Fig. 8.

Fig. 8.
figure 8

Remove 1st and 3rd Columns, and Return Nrow, Ncol

Also the command, Removemat was added by the author. The combination of RemoveR and RemoveC returns the same results as Removemat. The following commands

figure m

create the matrix dt4, [[1,3],[4,6]], and the next command

figure n

creates a matrix dt5 identical to dt4. Furthermore, the command Submat was added which leaves the specified part of the matrix.

Important commands that draw and label graphs are defined as follows

figure o

Here the commands Linegraph, Bargraph, and Putlabel have many arguments as explained below.

For example, let dt be the list [21,"",22,23,"","",24,25,26,"",27,28,""].

Then, the following command

figure p

produces dt1 which consists of four lists

figure q

Furthermore, let ma be the matrix made by the following command

figure r

which consists of [101,1], [102,””], [103,3], [104,4], [105,””], [106,””], [107,7], [108,8], [109,9], [110,””], [111,11], [112,12], [113,””]. Then, ma1 made by

figure s

consists of four lists

figure t

Based on these commands, the commands Linegraph, Bargraph and Putlabel were defined to generate line graphs and bar graphs from data like dt1 and create graph labels from data like mat1. Both Linegraph and Bargraph take the following arguments: “name”, “hscale”, “vscale” and, “valuelist” which has the structure of dt1. Linegraph connects points that are continuously measured, while points with gaps are not connected. The argument “hscale” gives the horizontal scale of the graph, while “vscale” gives the vertical scale. Putlabel takes the following arguments: “gap”, “scale”, “depth”, and “labellist” which has the structure of mat1 whose element is a list consisting of pairs of natural numbers and labels (strings or real numbers). If there are too many labels, labels can be skipped by specifying gaps between the labels. Finally, “hscale” gives the horizontal scale of labels for gap=1, and “depth” adjusts the height of the labels with respect to the horizontal axis.

2.3 Other Functions

In addition to the before-mentioned commands, other functions are defined as follows

figure u

Although CindyScript does not have a function to obtain averages, they can be calculated using the following formula

figure v

However, by this calculation method, the expected value cannot be obtained in the case of a list that includes blanks and strings. As in Excel, the author’s function Average gives an average value by ignoring blanks and strings.

In this way, labor intensive work in Excel can be replaced by the addition of efficient and easy-to-use commands.

3 Practical Example

For the data on air pollution introduced earlier, examples of data processing are shown. The data “PM20160229.csv” is given in csv format and read by the following command

figure w

to give dtorg. The only data needed in that section is the date and part of the measured value for PM2.5. By the following command

figure x

the author made a submatrix dtall from the second to the last row of the data, the first and second columns of dtorg, using the command Submat.

3.1 Transition of Daily Values and of Moving Average per Week

The graph in Fig. 9 shows the daily change in levels of PM2.5 as measured in ppm and the change in the moving average over the week.

Fig. 9.
figure 9

Average and moving average

The graph was made using the following commands

figure y

Here the commands ApplymatR, ApplymatC were added to modify the matrix to behave like apply. Without ApplymatC, the command

figure z

would be more complicated.

For the date expressed as a decimal number Sprintf was used, so that April 10 is displayed as “4.10” not “4.1”. Putlabel was used to add labels. And to prevent overlap, gaps between labels were specified by skipping 14 and rotating the labels by 90\(^{\circ }\) with the option "rot".

3.2 Monthly Averages

To see the change in average levels of PM2.5 month by month, there was the difficulty of identifying the months consisting of different numbers of days.

To draw the graph (Fig. 10), the prepared data, dtall, was processed in the following way

Fig. 10.
figure 10

Averages of each month

figure aa

Though the list, monlist, is not in the natural order, it caused no problem since forall in CindyScript runs a temporary variable in the order listed.

4 Conclusions and Future Work

The author found that using to process data by commands according to complicated specifications was superior to using spreadsheet software such as Excel. Also, documents, including accurate graphs, could be created more efficiently using .

Furthermore, there is no need to buy expensive software to improve the functionality of spreadsheet software, which is already expensive enough, because all related software is freely accessible, making especially valuable for education.

Finally, there is room to improve the ease of use, processing speed and modifiability of commands and functions introduced. In addition, the author would like to integrate similar commands that are still confusing.