1 Introduction

The interoperability of proof assistants and the integration of their libraries is a long-standing goal in theorem proving. One of the biggest prizes here is the Mizar Mathematical Library (MML) [BBG+18]. It has been exported multiple times [Urb03, IKRU13] and other efforts are ongoing [KP19]. Mizar stores all kernel data structures in externally readable XML files [Urb05] so that exports are reduced to interpreting the XML files. However, Mizar generates about a dozen XML files per MML article, and these use ad-hoc, under-documented, and evolving XML schemas that are as complex as Mizar’s feature-rich language and contain many parts that are only needed internally.

This was a major difficulty in our first translation of the MML to OMDoc [IKRU13], an XML-based representation format for mathematical knowledge and made the translation, while successful, prohibitively difficult to maintain. In the ten years since then, two things have changed. The Mizar XML data structures have been heavily improved, both for internal reasons and in response to the woes of translation developers. And we have developed much better routines for prover library translations [KR20]. The present system description redoes the export from scratch.

Like [IKRU13], we manually formalize the Mizar logic as a theory M in a logical framework of the LF family, which is realized in Mmt [RK13]. Then we generate one Mmt theory relative to M for every article in the MML. All these Mmt theories are stored in OMDoc format [Koh06] and are available online. During the export, we aim at both preserving all Mizar features exactly as they are (as opposed to implementing logically complex feature eliminations) and simplifying the language by reducing features to the primitives provided by the Mmt framework.

Besides rejuvenating the export, we go beyond [IKRU13] in multiple ways: (i) We implement the relevant parts of Mizar’s XML schema as a set of Scala data types, from which the parser is automatically generated, thus massively simplifying the maintenance of the translation. (ii) Our translation is almost entirely context-free (every file can be exported without processing its dependencies), which is critical for scalability of the export, and we identify the last remaining context-sensitivity issues in the XML schemas. (iii) We represent Mizar’s rich set of declarations as Mmt patterns in the sense of [HR15] and Mizar’s structures (akin to record types) as an Mmt structural feature in the sense of [MRRK20]. (iv) We cover many recondite Mizar features that were not handled well in [IKRU13] such as redefinitions.

2 Design

2.1 Formalizing the Mizar Logic

At the object-level, we formalize Mizar’s softly-typed set theory in a logical framework in Mmt. We omit technical details that are already part of [IKRU13] and only point out that we use LF-style HOAS with \(\{\_\}\_\) and \([\_]\_\) representing \(\Pi \) and \(\lambda \). At the declaration-level, we represent Mizar’s many conservative extension principles such as definitions and registrations as Mmt patterns [HKR12] and structural features [MRRK20]. The formalization is available at https://gl.mathhub.info/MMT/LATIN2/.

For example, Mizar’s direct partial predicate definitions are formalized as the Mmt pattern below. Its header takes natural numbers (the arity of the new predicate ) and (the number of cases in its definition), a list of n argument types (each potentially depending on the other arguments), m cases (each consisting of a condition \(cases_i\) on the n arguments and the resulting predicate ), a default result if no case applies, and a proof that the results of cases agree when their conditions overlap (we omit the definition of ). These are the argument given in Mizar and exported in the XML files. The body of the pattern contains the elaboration performed by Mizar: the n-ary predicate and its defining axiom . Typical for all pattern is the heavy use of Mmt ’s support for flexary operators such as conjunction and mapping over argument sequences.

figure a

2.2 Exporting the MML as XML

The Mizar verifier creates several XML files per source file that store information of the various processing phases. The most important are Weakly Strict Mizar (WSM, .wsx files) [BA12, NP16], which contain the syntax trees of statically analyzed articles and More Strict Mizar (MSM, .msx files), which extends WSM with the resolution of variables, constants, and labels. To additionally keep the original syntax, Even more Strict Mizar (EMSM, .msx files) is being developed, which adds semantic information about used constructors and Mizar patterns (which store the used format, constructor, argument types, and positions of visible arguments for a definition).

For example, the definition of subset below contains the formula , which results in EMSM in the XML fragment underneath. Here the attribute gives the original syntax. The other blue-highlighted attributes are what we can use to form identifiers in OMDoc for the three different kinds of references: the constant defined in the MML, and the kinds of bound variables (bound by the declaration) and (bound by a quantifier). The other attributes ( , etc.) represent internally used numbers that we ignore. The decision which attributes to use/ignore requires a Mizar expert but is now documented in our export.

figure h
figure i

2.3 Reading the XML into Scala Classes

Contrary to other parser generators, which generate the source code of the classes and the parser from a grammar, we directly implement the classes in Scala and then generate the parser. For example, the (heavily simplified) classes below are used to pick out the relevant attributes from the element visible above. is the abstract class of all Mizar expressions, and is an auxiliary class that groups XML attributes that often occur together.

figure m

This is sufficient to generate the XML parser using Scala reflection. This has the advantage that the Scala classes, which are what the translation developer interacts with primarily in the next step, are much more easily maintainable, can be better documented than generated code, and can be manually tweaked better to be practical.

These classes are available at https://github.com/UniFormal/MMT/ in the package . Note that this XML parsing step is independent of Mmt. Thus, other developers can easily reuse these classes and our XML parser as a starting point for translations into other target languages.

2.4 Translating the Scala Classes to MMT

The logical heart of the translation is now isolated in an inductive function that traverses the Scala classes holding the MML XML and producing corresponding Mmt classes. This happens in memory, and Mmt ’s existing emitters for OMDoc and Mmt source syntax can be used out of the box. Critically, it requires handling all idiosyncrasies of the Mizar language. The resulting export of the MML is available at https://gl.mathhub.info/Mizar/MML.

We translate each Mizar article to an Mmt theory (relative to those from Sect. 2.1) that contains include declarations for all Mizar articles it depends on. Each theorems is translated to a single Mmt declaration whose type gives the claim and whose definiens the proof. Each scheme, functor/predicate/mode, and attribute definitions as well as synonyms/antonyms and registrations are translated into instances of the corresponding patterns mentioned in Sect. 2.1. Where applicable, they are followed by declarations stating and proving the specially treated properties such as reflexivity for binary predicates. Redefinitions are translated into fresh constants with a new definition; if only the type is changed and no new definiens is provided, we synthesize a definition by applying the original constant to the corresponding arguments (this works, as the new types are required to be subtypes). Structure definitions (record types) are translated using an Mmt structural feature in the style of [MRRK20] that reimplements in Mmt Mizar’s conservative extension principle for adding named record types. Mizar’s forgetful functors between structures become record subtyping.

Proofs, which Mizar does not store entirely anyway, are translated only partially by using a special constant for a proof oracle: it takes a claim and a number of references to used theorems and returns a proof of the claim. This way proof dependencies are preserved in the export.

Improving on [IKRU13], our translation covers correctness conditions and properties of definitions, all registrations ([IKRU13] covered only existential registrations), forgetful-functors between structures, and partial proofs.

For example, consider the definition of the subset predicate from Sect. 2.2, which uses \(n=2\) arguments, whose types are just and do not depend on other arguments, \(m=0\) cases, and one default case for the actual definition. Its name is build from a character characterizing the kind of declaration (R) and the article-scoped counters contained in the and , which yields a unique identifier within the article. It is translated to the following instance of the Mmt pattern from Sect. 2.1 (the formula is deliberately not simplified, to emphasize how it is derived by elaborating the pattern):

figure t

The translation is context-free except for a few cases where the EMSM files do not quite contain enough information yet—in those cases some static analysis of Mizar must be reimplemented in Mmt and therefore the depended-upon articles must have been processed already:

  • Some functor redefinitions and functorial registrations require type inference of the return type

  • The arity of the original declaration of a redefinition without definiens must be determined.

The Mizar developers plan to address this issue with a new set of files in a new extension of EMSM.

Format

Size

Gen. time

MML

100 MB

-

XML

4.7 GB

15 min

MMT (zipped OMDoc)

18 MB

2 h

MMT (text syntax)

200 MB

30 h

3 Conclusion and Future Work

We have presented a thorough overhaul of the 10-year old export of the MML into OMDoc, leveraging all lessons learned and improvements made since then. Our export covers the entire MML with the only exception being the partial translation of proofs. The table on the right gives an overview of the sizes and generation times of the digital artifacts, all of which are available online as referenced throughout the paper. The export has so far been run only on simple hardware, and we expect shorter times when parallelizing on a server as soon as all issues of context-sensitivity have been removed. The generation time of the Mizar XML is so short because it is parallelized and Mizar only needs to resolve identifiers, which does not require verifying the proofs. The generation time of the MMT text syntax is excessive due to a scalability issue in MMT that was uncovered by the present export; it is unrelated to the Mizar export and will be fixed in a future release. The sizes of the Mizar and the MMT text are not directly comparable: the latter lacks full proofs, but includes some longer generated variable names, includes instances with their elaborations, and uses a less optimized syntax for conciseness and readability.

The XML produced by Mizar has much higher quality, the representation uses modern Mmt feature for declaratively mimicking Mizars’s highly idiosyncratic language, and the export implementation is substantially more maintainable, easier to use, and scalable. This provides promising evidence that investments into proof assistant library export workflows (albeit costly ones at glacial pace) are putting library translations and the thus-enabled system integrations ever more feasible.