Abstract
is helpful not only for making documents needing mathematically precise artwork but also for performing data processing efficiently. Furthermore, graphs obtained using can be used in documents without conversion.
Access provided by CONRICYT-eBooks. Download conference paper PDF
Similar content being viewed by others
1 Introduction
Spreadsheet software, such as Excel, is convenient for checking simple statistics and calculations. Entries can be easily typed into corresponding cells, and by selecting a range, such things as data number, average, and total can be displayed on the status bar and confirmed immediately. Scatter charts and line graphs can also be displayed on the screen if selected from the menu. However, as this report will show, [1] can be used to handle complicated data processing with greater efficiency than spreadsheets such as Excel.
The advantage of for data processing can be illustrated with an example from a project the author participated in. The project analyzed pollutants found in a certain area since 2013 onwards. Figure 1 shows part of the data (April 1\(^\mathrm{st}\) 2015 to February 29\(^\mathrm{th}\) 2016) as displayed in an Excel spreadsheet. The names of pollutants are displayed in the first row, e.g. PM2.5, PM10, etc.
To obtain the average level of each pollutant measured in ppm over the period, the following formula
was entered in cell B337 of the Excel file and applied to all cells to the right.
Also, in order to obtain the moving average [2] of PM2.5, the following formula
was entered in cell J8 and applied to all the other cells in the column. By applying
to the column, the average of every week could be displayed in every 7th cell.
However, it was not easy to make an average list every week. For example, by entering either
in cell J2 and copying down, the formula tended to be more error prone and difficult to correct. Furthermore, calculation of the monthly average was made more difficult because the number of days of each month varied. In the case of Excel, a macro can be used but with difficulty.
To avoid the above difficulties encountered while attempting to use Excel, the author turned to Cinderella in order to complete the task more efficiently. Cinderella is a type of DGS (dynamic geometry software) developed by J. R-Gebert and U. Kortenkamp [3]. Unlike other DGS, it incorporates CindyScript, an easy-to-use programming language. CindyScript handles data in addition to numerical values and character strings called lists. A vector (sequence data), for example, can be expressed as vec=[1,3,5]. In addition, a matrix (table data) can be expressed as a double nested list. For example, mat=[[1,3,5],[2,4,6]] is a matrix of 2 rows and 3 columns. Also we can use functions of CindyScript to handle lists as explained in [4]. The following is some samples of functions in CindyScript which are subsequently used to define new functions in this paper.
Here # represents the running variable, successively taking the value of all elements in the list.
While was developed as a Cinderella plug-in to generate LaTeX source code for high quality mathematical artwork [5], it can perform several data processing functions as well. Data processing with is in fact quite easy and very flexible as the next sections show.
2 Data Handling with
2.1 Data Input, Output and Display
The following is an example of how can be used to handle data. First, converts the csv file (file.csv in this case) as shown in Fig. 2
into the data named dt by the command
if file.csv exists in a working directory. Also this command takes dt from the directory and arranges it into a matrix. Inputting and executing the command
on the edit area (the area on the top right of Fig. 3), the resulting matrix is displayed on the console (lower right of Fig. 3).
Substituting Dispmat(dt) for println(dt), the matrix is displayed as tab-separated strings on the console, as shown in Fig. 4.
Conversely, tab-separated strings can be translated into a matrix using the command, Tab2list. For example, Fig. 5 shows the data copied from the Excel window in Fig. 2 onto a CindyScript screen.
Here the data, in double quotes following dt, is converted into a list using the command Tab2list. Figure 6 shows the matrix which is made from dt and displayed on the console.
When the amount of data is large, it is necessary to scroll down the screen and look for the start and end of data to select and copy the range, which might cause mistakes. On the other hand, the command Readcsv is useful because it automatically selects the data and copies it into the edit area.
2.2 Data Processing with Commands
While it is necessary to scroll down the window to select a wide data range in Excel, the commands Nrow and Ncol added by the author enable us to select it automatically in . Also, the translation of row data into column data is accomplished simply by using the CindyScript command transpose. Moreover, we can easily keep or delete some part of a large table simply by using commands prepared to store and substitute variables. Time consuming cell-by-cell work is needed to accomplish these tasks in Excel. Thus, much labor can be saved by adding commands to .
The author will introduce the commands added to using file.csv as an example. Commands beginning with lower case letters were of Cindyscript origin. Those newly defined by the author in begin with capital letters like other commands in . The commands added to to handle matrices are as follows
For example, let dt be the data of file.csv obtained using the command
We can easily delete the first row of dt with the command
Also we can easily find the number of rows and columns in dt1 using the following commands (left side of Fig. 7) whose result is on the right of Fig. 7.
Similarly, the results obtained by the following commands
can be seen on the right of Fig. 8.
Also the command, Removemat was added by the author. The combination of RemoveR and RemoveC returns the same results as Removemat. The following commands
create the matrix dt4, [[1,3],[4,6]], and the next command
creates a matrix dt5 identical to dt4. Furthermore, the command Submat was added which leaves the specified part of the matrix.
Important commands that draw and label graphs are defined as follows
Here the commands Linegraph, Bargraph, and Putlabel have many arguments as explained below.
For example, let dt be the list [21,"",22,23,"","",24,25,26,"",27,28,""].
Then, the following command
produces dt1 which consists of four lists
Furthermore, let ma be the matrix made by the following command
which consists of [101,1], [102,””], [103,3], [104,4], [105,””], [106,””], [107,7], [108,8], [109,9], [110,””], [111,11], [112,12], [113,””]. Then, ma1 made by
consists of four lists
Based on these commands, the commands Linegraph, Bargraph and Putlabel were defined to generate line graphs and bar graphs from data like dt1 and create graph labels from data like mat1. Both Linegraph and Bargraph take the following arguments: “name”, “hscale”, “vscale” and, “valuelist” which has the structure of dt1. Linegraph connects points that are continuously measured, while points with gaps are not connected. The argument “hscale” gives the horizontal scale of the graph, while “vscale” gives the vertical scale. Putlabel takes the following arguments: “gap”, “scale”, “depth”, and “labellist” which has the structure of mat1 whose element is a list consisting of pairs of natural numbers and labels (strings or real numbers). If there are too many labels, labels can be skipped by specifying gaps between the labels. Finally, “hscale” gives the horizontal scale of labels for gap=1, and “depth” adjusts the height of the labels with respect to the horizontal axis.
2.3 Other Functions
In addition to the before-mentioned commands, other functions are defined as follows
Although CindyScript does not have a function to obtain averages, they can be calculated using the following formula
However, by this calculation method, the expected value cannot be obtained in the case of a list that includes blanks and strings. As in Excel, the author’s function Average gives an average value by ignoring blanks and strings.
In this way, labor intensive work in Excel can be replaced by the addition of efficient and easy-to-use commands.
3 Practical Example
For the data on air pollution introduced earlier, examples of data processing are shown. The data “PM20160229.csv” is given in csv format and read by the following command
to give dtorg. The only data needed in that section is the date and part of the measured value for PM2.5. By the following command
the author made a submatrix dtall from the second to the last row of the data, the first and second columns of dtorg, using the command Submat.
3.1 Transition of Daily Values and of Moving Average per Week
The graph in Fig. 9 shows the daily change in levels of PM2.5 as measured in ppm and the change in the moving average over the week.
The graph was made using the following commands
Here the commands ApplymatR, ApplymatC were added to modify the matrix to behave like apply. Without ApplymatC, the command
would be more complicated.
For the date expressed as a decimal number Sprintf was used, so that April 10 is displayed as “4.10” not “4.1”. Putlabel was used to add labels. And to prevent overlap, gaps between labels were specified by skipping 14 and rotating the labels by 90\(^{\circ }\) with the option "rot".
3.2 Monthly Averages
To see the change in average levels of PM2.5 month by month, there was the difficulty of identifying the months consisting of different numbers of days.
To draw the graph (Fig. 10), the prepared data, dtall, was processed in the following way
Though the list, monlist, is not in the natural order, it caused no problem since forall in CindyScript runs a temporary variable in the order listed.
4 Conclusions and Future Work
The author found that using to process data by commands according to complicated specifications was superior to using spreadsheet software such as Excel. Also, documents, including accurate graphs, could be created more efficiently using .
Furthermore, there is no need to buy expensive software to improve the functionality of spreadsheet software, which is already expensive enough, because all related software is freely accessible, making especially valuable for education.
Finally, there is room to improve the ease of use, processing speed and modifiability of commands and functions introduced. In addition, the author would like to integrate similar commands that are still confusing.
References
Takato, S.: What is and how to use KeTCindy – linkage between dynamic geometry software and KeTCindy graphics capabilities. In: Greuel, G.-M., Koch, T., Paule, P., Sommese, A. (eds.) ICMS 2016. LNCS, vol. 9725, pp. 371–379. Springer, Cham (2016). doi:10.1007/978-3-319-42432-3_46
Moving Average. https://en.wikipedia.org/wiki/Moving_average
CinderellaJapan. https://sites.google.com/site/cinderellajapan/
Cinderella.2 Documentation. https://doc.cinderella.de/tiki-index.php
KETpic.com. http://ketpic.com/
Acknowledgments
I am deeply grateful to Professor Setsuo Takato of Toho University for supporting and advising me in this research. I am also grateful to Mr. Akira Iritani, representative of Cinderella Japan, for advice on Cinderella and CindyScript input manners.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kubo, Y. (2017). Data Processing with . In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science(), vol 10407. Springer, Cham. https://doi.org/10.1007/978-3-319-62401-3_18
Download citation
DOI: https://doi.org/10.1007/978-3-319-62401-3_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62400-6
Online ISBN: 978-3-319-62401-3
eBook Packages: Computer ScienceComputer Science (R0)