An Analytics Tool for Exploring Scientific Software and Related Publications

Hoppe, Anett; Hagen, Jascha; Holzmann, Helge; Kniesel, Günter; Ewerth, Ralph

doi:10.1007/978-3-030-00066-0_27

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11057))

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

1864 Accesses
2 Citations

Abstract

Scientific software is one of the key elements for reproducible research. However, classic publications and related scientific software are typically not (sufficiently) linked, and tools are missing to jointly explore these artefacts. In this paper, we report on our work on developing the analytics tool SciSoftX (https://labs.tib.eu/info/projekt/scisoftx/) for jointly exploring software and publications. The presented prototype, a concept for automatic code discovery, and two use cases demonstrate the feasibility and usefulness of the proposal.

Access provided by CONRICYT-eBooks. Download conference paper PDF

ISSA: Generic Pipeline, Knowledge Model and Visualization Tools to Help Scientists Search and Make Sense of a Scientific Archive

Scholarly Knowledge Extraction from Published Software Packages

ACE 2.0: A Comprehensive tool for automatic extraction, analysis, and digital profiling of the researchers in Scientific Communities

Article Open access 05 May 2023

Keywords

1 Introduction

The open science movement works towards the general availability of scientific insight and is considered one answer to the so-called “reproducibility crisis” [2]. Science results are often generated by a combination of software, data, and parameters, all of which contribute to the final result (and its interpretation). The complexity of all these elements is hardly describable in a single article – and often the publication does not allow the full reproduction of the achieved results. In the line of work towards consequent reproducibility of scientific results, there are three main tasks to be tackled: (a) motivate researchers to reproduce past results; (b) develop novel ways for the integrated presentation of scientific results; (c) develop tools which allow for exploration of existing scientific works.

The work at hand focuses on the two latter objectives. It presents a tool which facilitates the examination of existing research involving software by joint exploration of a scientific article and the respective source code. The prototype allows the exploration of both in one interface, and the semi-automatic creation of semantic relations between them. The software is extended by basic visualisations. This kind of work is related to research areas, which have been active for decades: (a) automatic code analysis, and (b) automatic analysis of scientific publications. Solutions for automatic code analysis aim at generating textual documentation [7], summarising code [8], or at generating visualisations [4]. Also common is the generation of formal code models using semantic technologies [1] or logical constructs as realised in tools such as JTransformer^{Footnote 1}. While there is much work on linking code to other (textual) resources (e.g. traceability [3]), to documentation [4], on the automatic understanding of scientific publications [5], or on linking publications with software and archiving them [6], there has been little work on joint analytics of scientific software and publications, yet [9].

2 SciSoftX: Scientific Software Explorer

The Scientific Software Explorer provides researchers with functionalities for the exploration of external article-software ensembles and/or annotation of own works for better comprehensibility. Its final version will provide functionalities such as (a) manual annotation of article-software relations, (b) semi-automatic discovery of relations, and (c) visualisations for relation exploration.

2.1 Functionality

SciSoftX allows the user to open and simultaneously view a software project and a publication (Fig. 1). Parsing and processing of source code is realised using ANTLR^{Footnote 2} (Another tool for Language Recognition) that supports most of the relevant programming languages, while publications are processed via PDF.js^{Footnote 3}. The user can manually link code identifiers to relevant locations in the publication. When the user moves the mouse over a linked identifier in the publications, a tool tip shows the relevant source code positions.

Automatic Discovery of Code Identifiers and Snippets: At the current stage, the tool contains a basic method for the detection of code-relevant text snippets: It relies on the common convention of setting code elements in monospace fonts. The found identifiers are used to search the code model produced by ANTLR, multiple finds are disambiguated based on vicinity. In a random sample of 24 articles from computer science, the monospace-based linker was able to correctly detect 89.9% of the links annotated by a human expert.

Manual Annotation of Links: As a facilitator of exchange between scientists the tool also allows for the manual annotation of resources. In a step-wise process, the user marks article snippets, code elements and annotates the established link with one of the pre-defined labels. The created set of links can be exported to an XML format and imported by an interested reader.

Visualisations: Graph-based visualisations illustrate relations between software and publication on different levels of abstraction. Figure 2 shows an example displaying the connections at the package (software) and page (publication) level.

2.2 Use Cases

Use Case 1 – Reader-side: A researcher reads a publication that refers to a blob of software and then tries to understand the structure and rationale of the software. This time-consuming task can be supported by the automatic creation of links between textual description and actual source code, and the visualisations provided by SciSoftX. The user can click on nodes in the visualisation or on text elements that are highlighted in the publication and explore the implementation details, discover additional parameters, and understand the relevant code part step by step. Furthermore, it is possible to manually add and save useful information and metadata, which can help future users to explore the software.

Use Case 2 – Author-side: Paper authors can use SciSoftX to ensure their software is easily understood, e.g. in a reviewing process or for re-use. Therefore, they make use of the manual and automatic methods to annotate the semantic relations between their paper and the underlying software and publish the annotations. The visualisation of cross-modal relations can aid the authors (and the reviewers) to decide whether all relevant code parts and parameters are covered by the publication. In this way, the tool helps to evaluate the quality of the software description in a paper.

3 Conclusion

Reproducibility is one of the major issues of today’s scientific landscape. In this paper, we have reported on work in progress for an analytics tool that allows users to explore relations between scientific software and publications. To this date, the tool features simple mechanisms for detecting links between software and publications which serve as a proof of concept. Future work will explore (a) more powerful infrastructures for code analysis, (b) more sophisticated means for text/image analysis, e.g. mapping diagrams and formulas to source code.

Notes

References

Atzeni, M., Atzori, M.: Codeontology: RDF-ization of source code. In: d’Amato, C. (ed.) ISWC 2017. LNCS, vol. 10588, pp. 20–28. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68204-4_2
Chapter Google Scholar
Baker, M.: 1,500 scientists lift the lid on reproducibility. Nat. News 533(7604), 452 (2016)
Article Google Scholar
Borg, M., Runeson, P., Ardö, A.: Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empir. Softw. Eng. 19(6), 1565–1616 (2014). https://doi.org/10.1007/s10664-013-9255-y
Article Google Scholar
Chen, X., Hosking, J.G., Grundy, J.: Visualizing traceability links between source code and documentation. In: IEEE Symposium on Visual Languages and Human-Centric Computing, Innsbruck, Austria, pp. 119–126 (2012). https://doi.org/10.1109/VLHCC.2012.6344496
Constantin, A.: Automatic structure and keyphrase analysis of scientific publications. Ph.D. thesis, University of Manchester, UK (2014). http://www.manchester.ac.uk/escholar/uk-ac-man-scw:230124
Holzmann, H., Sperber, W., Runnwerth, M.: Archiving software surrogates on the web for future reference. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds.) TPDL 2016. LNCS, vol. 9819, pp. 215–226. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43997-6_17
Chapter Google Scholar
Moser, M., Pichler, J.: Documentation generation from annotated source code of scientific software: position paper. In: Proceedings of the International Workshop on Software Engineering for Science, SE4Science@ICSE 2016, 14 May 2016–22 May 2016, Austin, Texas, USA, pp. 12–15. ACM (2016). https://doi.org/10.1145/2897676.2897679
Nazar, N., Hu, Y., Jiang, H.: Summarizing software artifacts: a literature review. J. Comput. Sci. Technol. 31(5), 883–909 (2016). https://doi.org/10.1007/s11390-016-1671-1
Article Google Scholar
Witte, R., Li, Q., Zhang, Y., Rilling, J.: Text mining and software engineering: an integrated source code and document analysis approach. IET Softw. 2(1), 3–16 (2008). https://doi.org/10.1049/iet-sen:20070110
Article Google Scholar

Download references

Author information

Authors and Affiliations

Leibniz Information Centre for Science and Technology (TIB), Hannover, Germany
Anett Hoppe & Ralph Ewerth
Leibniz Universität Hannover, Hannover, Germany
Jascha Hagen
L3S Research Center, Leibniz Universität Hannover, Hannover, Germany
Helge Holzmann & Ralph Ewerth
University of Bonn, Bonn, Germany
Günter Kniesel

Authors

Anett Hoppe
View author publications
You can also search for this author in PubMed Google Scholar
Jascha Hagen
View author publications
You can also search for this author in PubMed Google Scholar
Helge Holzmann
View author publications
You can also search for this author in PubMed Google Scholar
Günter Kniesel
View author publications
You can also search for this author in PubMed Google Scholar
Ralph Ewerth
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Anett Hoppe .

Editor information

Editors and Affiliations

University Carlos III, Madrid, Spain
Eva Méndez
USI, Università della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
Cristina Ribeiro
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
Gabriel David
INESC TEC, Faculty of Engineering, University of Porto, Porto, Portugal
João Correia Lopes

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hoppe, A., Hagen, J., Holzmann, H., Kniesel, G., Ewerth, R. (2018). An Analytics Tool for Exploring Scientific Software and Related Publications. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science(), vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_27

Download citation

DOI: https://doi.org/10.1007/978-3-030-00066-0_27
Published: 05 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00065-3
Online ISBN: 978-3-030-00066-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

An Analytics Tool for Exploring Scientific Software and Related Publications

Abstract

Similar content being viewed by others

ISSA: Generic Pipeline, Knowledge Model and Visualization Tools to Help Scientists Search and Make Sense of a Scientific Archive

Scholarly Knowledge Extraction from Published Software Packages

ACE 2.0: A Comprehensive tool for automatic extraction, analysis, and digital profiling of the researchers in Scientific Communities

Keywords

1 Introduction

2 SciSoftX: Scientific Software Explorer

2.1 Functionality

2.2 Use Cases

3 Conclusion

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

An Analytics Tool for Exploring Scientific Software and Related Publications

Abstract

Similar content being viewed by others

ISSA: Generic Pipeline, Knowledge Model and Visualization Tools to Help Scientists Search and Make Sense of a Scientific Archive

Scholarly Knowledge Extraction from Published Software Packages

ACE 2.0: A Comprehensive tool for automatic extraction, analysis, and digital profiling of the researchers in Scientific Communities

Keywords

1 Introduction

2 SciSoftX: Scientific Software Explorer

2.1 Functionality

2.2 Use Cases

3 Conclusion

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation