Keywords

1 Introduction

Databases being, essentially, collections of (possibly interrelated) tables of data, a foundational question is how to best represent such collections of tables mathematically in order to study their properties and ways of manipulating them. The relational model, essentially treating tables as structures of first-order relational signatures, is a simple and powerful representation. Nevertheless, areas exist in which the relational model is less adequate than in others. One familiar example is the question of how to represent partially filled out rows or missing information. Another, more fundamental perhaps, is how to relate instances of different schemas, as opposed to the relatively well understood relations between instances of the same schema. Adding to this an increasing need to improve the ability to relate and map data structured in different ways suggests looking for alternative and supplemental ways of modelling tables more suitable to “dynamic” settings. It seems natural, in that case, to try to model tables of different shapes as living in a single mathematical structure, facilitating their manipulation across different schemas.

We investigate, here, a novel way of representing data structured in systems of tables which is based on simplicial sets and type theory rather than sets of relations and first-order logic. Formally, we present a soundness theorem (Theorem 1) for a certain dependent type theory with respect to a rather simple category of (finite, abstract) simplicial complexes. An interesting type-theoretic feature of this is that the type theory has context constants, mirroring that our choice of “display maps” does not include all maps to the terminal object. But from the database perspective the interesting aspect is that this category can in a natural way be seen as a category of tables; collecting in a single mathematical structure—an indexed or fibered category—the totality of schemas and instances.

This representation can be introduced as follows. Let a schema S be presented as a finite set \(\mathbf{A}\) of attributes and a set of relation variables over those attributes. One way of allowing for missing information or partially filled out rows is to assume that whenever the schema has a relation variable R, say over attributes \(A_0,\ldots ,A_n\), it also has relation variables over all non-empty subsets of \(\{A_0,\ldots ,A_n\}\). So a partially filled out row over R is a full row over such a “partial relation” or “part-relation” of R. To this we add the requirement that the schema does not have two relation variables over exactly the same attributesFootnote 1. This requirement means that a relation variable can be identified with the set of its attributes. Together with the first requirement, this means that the schema can be seen as a downward closed sub-poset of the positive power set of the set of attributes \(\mathbf{A}\). Thus a schema is an (abstract) simplicial complex—a combinatorial and geometric object familiar from algebraic topology.

The key observation is now that an instance of the schema S can also be regarded as a simplicial complex, by regarding the data as attributes and the tuples as relation variables. Accordingly, an instance over S is a schema of its own, and the fact that it is an instance of S is “displayed” by a certain projection to S. Thus the category \(\mathbb {S}\) of finite simplicial complexes and morphisms between them form a category of schemas which includes, at the same time, all instances of those schemas; where the connection between schema and instance is given by a collection D of maps in \(\mathbb {S}\) called display maps.

We show, essentially, that \(\mathbb {S}\) together with this collection D of maps form a so-called display-map category [7], a notion originally developed in connection with categorical models of dependent type theory. First, this means that the category \(\mathbb {S}\) has a rich variety of ready-made operations that can be applied to schemas and instances. For example, the so-called dependent product operation can be used to model the natural join operation. Second, it is a model of dependent type theory. We specify a dependent type theory with context constants and a sound interpretation which interprets contexts as schemas and types as instances. This interpretation is with respect to the display-map category \((\mathbb {S}, D)\) in its equivalent form as an indexed category. The context constants are interpreted as distinguished single relation variable schemas (or relation schemas in the terminology of [1]), reflecting the special status of such schemas. The type theory allows for the syntactic specification of both schemas and instances, and formally derives the answers to queries; for instance, using the dependent product of the type theory, the elements of the natural join of two instances can be derived in the type theory.

We focus, in the space available here, on the basic presentation of the model and the type theory (Sects. 2 and 3, respectively). In Sect. 4 we then give a few brief indications of the use and further development of the model and the type theory: we give a suggestion of how the dependent structure of the model can be put to use to model e.g. updates; and we sketch the introduction and use of large types such as, and in particular, a universe. The universe allows reasoning generically about classes of instances of in the type theory itself, without having to resort to the metalanguage, and provides the basis for e.g. a precise, formal definition and analysis of the notion of query in this setting.

2 The Model

2.1 Complexes

We fix the following terminology and notation, adjusting the standard terminology somewhat for our purposes. More details can be found in [4]. A background on simplicial complexes and simplicial sets can be found in e.g. [5, 6]. The question of whether vertices or attributes should be ordered is not essential for our presentation here, and is swept under the rug.

A simplicial complex, or just complex, X consists of the union of a finite set \(X_0\) and a collection \(X_{\ge 1}\) of non-empty, non-singleton subsets of \(X_0\), satisfying the condition that if x is a set in \(X_{\ge 1}\), then all non-empty, non-singleton subsets of x are in \(X_{\ge 1}\). It is convenient to also allow singleton subsets, and identify them with the elements of \(X_0\). The natural order on X is then given by subset inclusion. The elements of \(X_0\) are referred to as vertices. The elements of \(X_{\ge 1}\) as faces. For \(n\ge 0\), we write \(X_{n}\) for the set of elements of X with size \(n+1\), and refer to them as faces of dimension n. Accordingly, vertices are seen as having dimension 0. We use square rather than curly brackets for faces, e.g. \([A,B]\) rather than \(\{A,B\}\).

A morphism \(f\! :\! X \longrightarrow Y \) of complexes is a function \(f_0\! :\! X_0 \longrightarrow Y_0 \) satisfying the condition that for all x in X the image \(f_0(x)\) is in Y (again, with a singleton identified with its element). Thus a morphism of complexes can be seen as a morphism of posets by setting f(x) to be the image \(f_0(x)\).

A morphism \(f\! :\! X \longrightarrow Y \) of complexes is said to be a display map if \(x\in X_n\) implies \(f(x)\in Y_n\) for all n. Thus display maps are the maps that “preserve dimension”. Display maps are also the “local isomorphisms”, in the sense that for \(x\in {X}\) the restriction \(f\upharpoonright _{(\downarrow x)}: (\downarrow x)\rightarrow (\downarrow f(x))\) is an isomorphism. Note that a display map need not be a injection of vertices.

A poset P that is isomorphic to a complex can clearly be uniquely rewritten to a complex with the same set of vertices as P. For that reason, and as it is occasionally notationally convenient and can yield more intuitive examples, we allow ourselves to extend the notion of complex to any poset that is isomorphic to a complex. We say that it is a strict complex if we need to emphasize that it is of the form defined above.

2.2 Schemas and Instances

Schemas. A simplicial schema is a complex with the natural order reversed. We consider the resulting poset as a category, in the usual way. If \(x\le y\) in the natural order, we write \(\delta ^y_x\! :\! y \longrightarrow x \) for the corresponding arrow in the simplicial schema. In the context of simplicial schemas, we use “attribute” and “relation variable” synonomously with “vertex” and “face”, respectively. Let \(\mathbb {S}\) be the category of simplicial schemas and morphisms, and \(\mathbb {S}_d\) be the category of simplicial schemas and display maps.

With respect to the traditional notion of schema, a simplicial schema X can be thought of as given in the usual way by a finite set of attributes \(X_0=\{A_0,\ldots ,A_{n-1}\}\) and a set of relational variables \(X_{\ge 1}=\{R_0,\ldots R_{m-1}\}\), each with a specification of column names in the form of a subset of \(X_0\), but with the restrictions (1) that no two relation variables are over exactly the same attributes; and (2) for any non-empty subset of the attributes of a relation variable there exists a relation variable over (exactly) those attributes. As with “complex”, we henceforth mostly drop the word “simplicial” and simply say “schema”.

The category \(\mathbb {S}_d\) contains in particular the n-simplices \(\varDelta _n\) and the face maps. Recall that the the n-simplex \(\varDelta _n\) is the complex given by the set \(\{0,\ldots ,n\}\) as vertices and all non-empty, non-singleton subsets as faces. For \(0\le i\le n+1\), the face map \(d^n_i: \varDelta _n\longrightarrow \varDelta _{n+1}\) is the morphism of complexes defined by the vertex function \(k\mapsto k\), if \(k<i\) and \(k\mapsto k+1\) else. These satisfy the simplicial identities \(d^{n+1}_i\circ d^{n}_j = d^{n+1}_{j-1}\circ d^n_i\) if \(i < j\). As a schema, \(\varDelta _n\) is the schema of a single relation on \(n+1\) attributes named by numbers \(0,\ldots ,n\) (and all its “generated” part-relations). The face map \(d^n_i: \varDelta _n\longrightarrow \varDelta _{n+1}\) can be seen as the inclusion of the relation variable \([0, \ldots , i-1, i+1,\ldots , n+1]\) in \(\varDelta _{n+1}\). These schemas and morphisms play a special role in Sect. 3 where they are used to specify general schemas and instances syntactically.

Example 1

Let S be the schema the attributes of which are ABC and the relation variables R : AB and Q : BC, with indicated column names. From a “simplicial” point of view, S is the category

figure a

Replacing R with \([A,B]\) and Q with \([B,C]\) (and inverting the order) yields a strict complex. For another example, the 2-simplex \(\varDelta _2\) can be seen as a schema on attributes 0,1, and 2, with relation variables \([0,1,2]\), and its part-relations. The function \(f_0\) given by \(A\mapsto 0\), \(B\mapsto 1\), and \(C\mapsto 2\) defines a morphism \(f\! :\! S \longrightarrow \varDelta _2 \) of schemas/complexes. f is a display map. \(f_0^{-1}\) does not define a morphism of schemas.

Instances. Let X be a schema, say with attributes \(X_0=\{A_0,\ldots ,A_{n-1}\}\). A functor \(F:X\longrightarrow \mathbf {FinSet}\) from X to the category of finite sets and functions can be regarded as an instance of the schema X. For \(x=[A_{i_0},\dots ,A_{i_{m-1}}]\in X\), the set F(x) can be regarded as a set of “keys” or “row-names”. The “value” \(k[A_{i_j}]\) of such a key \(k\in F(x)\) at attribute \(A_{i_j}\) is then the element \(k[A_{i_j}]:=F(d^x_{A_{i_j}})(k)\). Accordingly, F(x) maps to the set of tuples \(F(A_{i_0})\times \ldots \times F(A_{i_{m-1}})\) by \(k\mapsto \left\langle {k[A_{i_0}],\ldots ,k[A_{i_{m-1}}]} \right\rangle \). For arbitrary F, this function is not 1–1; that is, there can be distinct keys with the same values at all attributes. We say that F is a relational instance if this does not happen. That is, a relational instance is a functor \(F:X\longrightarrow \mathbf {FinSet}\) such that for all \(x\in X\) the functions \(\left\{ {\delta ^{x}_{A}}\ \; | \; \ {A\in x}\right\} \) are jointly injective. Say that a relational instance is strict if the keys are actually tuples and the \(\delta \)’s are the expected projections.

Example 2

Let S be the schema of Example 1. Let an instance I be given by

figure b

Then I is the functor

figure c

with \(I(\delta ^R_A)(1)=a\), \(I(\delta ^R_B)(1)=b\) and so on.

Let J be the strict instance \(J:\varDelta _2\longrightarrow \mathbf {FinSet}\) given in tabular form by

figure d

Explicitly, then, J is the functor which e.g. maps \([0,1]\) to \(\{\left\langle {a,b} \right\rangle ,\left\langle {a',b} \right\rangle \}\) and such that \(J(\delta ^{[0,1]}_{1})(\left\langle {a,b} \right\rangle )=b\).

Substition, Strictification, and Induced Schemas. Let \(f\! :\! X \longrightarrow Y \) be a morphism of schemas, and let \(I\! :\! Y \longrightarrow \mathbf {FinSet} \) be a relational instance. Then it is easily seen that the composite \(I\circ f\! :\! X \longrightarrow \mathbf {FinSet} \) is a relational instance. We write \(I[f]:=I\circ f\) and say it is the substitution of I along f.

It is clear that a relational instance is naturally isomorphic to exactly one strict relational instance with the same values. We say that the latter is the strictification of the former.

Example 3

Consider the morphism \(f\! :\! S \longrightarrow \varDelta _2 \) of Example 1 and the instances I and J of Example 2. Then J[f] is the strictification of I.

Let \(\mathrm {Rel}(X)\) be the category of strict relational instances and natural transformations between them. For convenience and brevity (as with complexes) we often disregard the requirement that instances need to be strict in the sequel. However, working with relational instances up to strictification, or restricting to the strict ones, resolves the coherence issues so typical of categorical models of type theory. To have the “strict” instances be those “on tuple form” presents itself as a natural choice, both by the connection to the relational model and by the following formal connection between such instances and display maps.

Lemma 1

Let \(f:X\longrightarrow Y\) be a morphism of schemas. Then f is display if and only if for all strict instances J of Y the instance J[f] is also strict.

The connection between display maps, relational instances and simplicial schemas is given by the following. Let X be a schema and \(F:X\longrightarrow \mathbf {FinSet}\) an arbitrary functor. Recall, e.g. from [8], that the category of elements \(\int _{{X}}F\) has objects \(\left\langle {x,a} \right\rangle \) with \(x\in X\) and \(a\in F(x)\). A morphism \(\delta ^{\left\langle {x,a} \right\rangle }_{\left\langle {y,b} \right\rangle }: \left\langle {x,a} \right\rangle \longrightarrow \left\langle {y,b} \right\rangle \) is a morphism \(\delta ^{x}_{y}: x\longrightarrow y\) with \(F(\delta ^{x}_{y})(a)=b\). The projection \(p:\int _{X}{F}\longrightarrow X\) is defined by \(\left\langle {x,a} \right\rangle \mapsto x\) and \(\delta ^{\left\langle {x,a} \right\rangle }_{\left\langle {y,b} \right\rangle }\mapsto \delta ^{x}_{y}\). We then have

Lemma 2

Let X be a simplicial schema and \(F:X\longrightarrow \mathbf {FinSet}\) be a functor. Then F is a relational instance if and only if \(\int _X F\) is a simplicial schema and \(p:\int _X F\rightarrow X\) is a display morphism.

When F is a relational instance we write \(X.F\) for \(\int _X F\), and refer to it as the canonical schema corresponding to F. We refer to p as the canonical projection.

Example 4

The canonical schema of instance J of Example 2 has attribute set \(\{\left\langle {0,a} \right\rangle , \left\langle {0,a'} \right\rangle , \left\langle {1,b} \right\rangle , \left\langle {1,d} \right\rangle , \left\langle {2,e} \right\rangle , \left\langle {2,c} \right\rangle \}\) and relation variables e.g.

\(\left\langle {[0,1,2],\left\langle {a,b,c} \right\rangle } \right\rangle \) (or, strictly, \([\left\langle {0,a} \right\rangle ,\left\langle {1,b} \right\rangle ,\left\langle {2,c} \right\rangle ]\)).

Full Tuples. A schema X induces a canonical instance of itself by filling out the relations by a single row each, consisting of the attributes of the relation. This instance is terminal in the category of instances of X; that is, every other instance of X has a unique morphism to it. Accordingly, we define the terminal instance \(1_X:X\longrightarrow \mathbf {FinSet}\) to be the functor defined by \(x\mapsto \{x\}\).Footnote 2

A full or matching tuple t of an instance I over schema X is a natural transformation \(t: 1_X \Rightarrow I \). We write \(\mathrm {Trm}_{X}(I)\) for the set of full tuples (indicating that we see them as terms type-theoretically).

Given a full tuple \(t: 1_X \Rightarrow I \), the induced section is the morphism \(\hat{t}:X\longrightarrow X.I\) defined by \(x\mapsto \left\langle {x, t_x(x)} \right\rangle \). Notice that the induced section is always a display morphism.

Example 5

The instance I of Example 2 has precisely two full tuples. A full tuple can be seen as a tuple over the full attribute set of the schema with the property that for all relation variables the projection of the tuple is a row of that relation. The two full tuples of I are, then, \(\left\langle {a,b,c} \right\rangle \) and \(\left\langle {a',b,c} \right\rangle \). The instance J of Example 2 has precisely one full tuple \(\left\langle {a,b,c} \right\rangle \).

2.3 Simplicial Databases

We have a functor \(\mathrm {Rel}(-):{\mathbb {S}_d}^\mathrm {op}\longrightarrow Cat\) which maps X to \( \mathrm {Rel}(X)\) and \(f:X\longrightarrow Y\) to \(\mathrm {Rel}(f)= (-)[f]:\mathrm {Rel}(Y)\longrightarrow \mathrm {Rel}(X)\). We denote this indexed category by \(\mathfrak {R}\), and think of it as a “category of databases” in which the totality of databases and schemas are collected. It is a model of a certain dependent type theory with context constants which we give in Sect. 3. We briefly outline some of the relevant structure available in \(\mathfrak {R}\).

Definition 1

For \(f:X\longrightarrow Y\) in \(\mathbb {S}_d\) and \(J\in \mathrm {Rel}(Y)\) and \(t\! :\! 1_Y \Rightarrow J \) in \(\mathrm {Trm}_{Y}(J)\):

  1. 1.

    Define \(t[f]\in \mathrm {Trm}_{X}(J[f])\) by \(x\mapsto t(f(x))\in J[f](x)\). Note that for \(g:Z\longrightarrow ~X\) we have \(t[f][g]=t[f\circ g]\).

  2. 2.

    With \(p_J\! :\! Y.J \longrightarrow Y \) the canonical projection, let \(v_J\! :\! 1_{Y.J} \Rightarrow J[p_J] \) be the full tuple defined by \(\left\langle {y,a} \right\rangle \mapsto a\). (This term is needed for the type theory. We elsewhere leave subscripts on v and p determined by context.)

  3. 3.

    Denote by \(\tilde{f}\! :\! X.J[f] \longrightarrow Y.J \) the schema morphism defined by \(\left\langle {x,a} \right\rangle \mapsto \left\langle {f(x),a} \right\rangle \). Notice that since f is display, so is \(\tilde{f}\).

Lemma 3

The following equations hold:

  1. 1.

    For X in \(\mathbb {S}_d\) and \(I\in \mathrm {Rel}(X)\) and \(t\in \mathrm {Trm}_{X}(I)\) we have \(p\circ \hat{t}=\mathrm {id}_X\) and \(t=v[\hat{t}]\).

  2. 2.

    For \(f:X\longrightarrow Y\) in \(\mathbb {S}_d\) and \(J\in \mathrm {Rel}(Y)\) and \(t\in \mathrm {Trm}_{Y}(J)\) we have

    1. (a)

      \(p\circ \tilde{f}= f\circ p\! :\! X.J[f] \longrightarrow Y \);

    2. (b)

      \(\tilde{f}\circ \widehat{t[f]}=\hat{t}\circ f\! :\! X \longrightarrow Y.J \); and

    3. (c)

      \(v_J[\tilde{f}]=v_{J[f]}\! :\! 1_{X.J[f]} \Rightarrow J[f][p] \).

  3. 3.

    For \(f:X\longrightarrow Y\) and \(g:Y\longrightarrow Z\) in \(\mathbb {S}_d\) and \(J\in \mathrm {Rel}(Z)\) we have \(\widetilde{g\circ f}=\tilde{g}\circ \tilde{f}\).

  4. 4.

    For \(X \in \mathbb {S}_d\) and \(I\in \mathrm {Rel}(X)\) we have \(\tilde{p}\circ \hat{v}=\mathrm {Id}_{X.I}\).

The following instance-forming operations exist and commute with substitution.

0 and 1 instances: Given \(X\in \mathbb {S}_d\) the terminal instance \(1_X\) has already been defined. The initial instance \(0_X\) is the constant empty functor, \(x\mapsto \emptyset \).

Dependent Sum: Let \(X\in \mathbb {S}_d\), \(J\in \mathrm {Rel}(X)\), and \(G\in \mathrm {Rel}(X.J)\). We define the instance \(\varSigma _JG\! :\! X \longrightarrow \mathbf {FinSet} \) up to strictification by

\(x\mapsto \left\{ {\left\langle {a,b} \right\rangle }\ \; | \; \ {a\in J(x),\ b\in G(x,a)}\right\} \). For \(\delta ^x_y\) in X, let \(\varSigma _JG(\delta ^x_y)(a,b)=\) \(\left\langle {\delta ^x_y(a),\delta ^{x,a}_{y,\delta ^x_y(a)}(b)} \right\rangle \).

Identity: Given \(X\in \mathbb {S}_d\) and \(J\in \mathrm {Rel}(X)\) the Identity instance \(\mathrm {Id}_J\in \mathrm {Rel}(X.J.J[p])\) is defined, up to strictification, by \(\left\langle {\left\langle {x,a} \right\rangle ,b} \right\rangle \mapsto \star \) if \(a=b\) and \(\left\langle {\left\langle {x,a} \right\rangle ,b} \right\rangle \mapsto \emptyset \) else. \(\star \) being e.g. the empty tuple. The full tuple \({\text {refl}}\in \mathrm {Trm}_{(X.J)}(\mathrm {Id}_J[\hat{v}])\) is defined by \(\left\langle {x,a} \right\rangle \mapsto \star \).

Disjoint Union: Given \(X\in \mathbb {S}_d\) and \(I,J\in \mathrm {Rel}(X)\), the instance \(I+J\in \mathrm {Rel}(X)\) is defined up to strictification by

\(x\mapsto \left\{ {\left\langle {n,a} \right\rangle }\ \; | \; \ {(n=0\wedge a\in I(x)) \vee (n=1\wedge a\in J(x))}\right\} \). We have full tuples \({\text {left}} \) \( \in \mathrm {Trm}_{X.I}((I+J)[p])\) defined by \(\left\langle {x,a} \right\rangle \mapsto \left\langle {0,a} \right\rangle \) and \({\text {right}}\in \mathrm {Trm}_{X.J}((I+J)[p])\) defined by \(\left\langle {x,a} \right\rangle \mapsto \left\langle {1,a} \right\rangle \).

Dependent Product: Let \(X\in \mathbb {S}_d\), \(J\in \mathrm {Rel}(X)\), and \(G\in \mathrm {Rel}(X.J)\). We define the instance \(\varPi _JG\! :\! X \longrightarrow \mathbf {FinSet} \) as strictification of the right Kan-extension (in the sense of e.g. [8]) of G along p. See [4] for an explicit construction. There are operations \(\mathrm {Ap}\) and \(\lambda \) which for any full tuple \(t\in \mathrm {Trm}_{X.J}(G)\) yields a full tuple \(\lambda t\in \mathrm {Trm}_{X}(\varPi _JG)\), and for any full tuple \(s\in \mathrm {Trm}_{X}(\varPi _JG)\) yields a full tuple \(\mathrm {Ap}_s\in \mathrm {Trm}_{X.J}(G)\). Moreover, \(\mathrm {Ap}_{\lambda t}= t\). We further indicate the relationship between the dependent product and full tuples, and the way in which the dependent product models the natural join operation, with the following example.

Example 6

Consider the schema S and instance I of Examples 1 and 2. Corresponding to the display map \(f\! :\! S \longrightarrow \varDelta _2 \), we can present S an instance of \(\varDelta _2\) as (ignoring strictification for readability) \(S\! :\! \varDelta _2 \longrightarrow \mathbf {FinSet} \) by \(S(0)=\{A\}\), \(S(1)=\{B\}\), \(S(2)=\{C\}\), \(S(01)=\{R\}\), \(S(12)=\{Q\}\), and \(S(02)=S(012)=\emptyset \). Notice that, modulo the isomorphism between S as presented in Example 1 and \(\varDelta _2.S\), the morphism \(f\! :\! S \longrightarrow \varDelta _2 \) is the canonical projection \(p\! :\! \varDelta _2.S \longrightarrow \varDelta _2 \). Similarly we have \(I\in \varDelta _2.S\) as (in tabular form, using subscript instead of pairing for elements in \( \varDelta _2.S\), and omitting the three single-column tables)

figure e

Notice that the three-column “top” table of \(\varPi _{S}I\) is the natural join \(R_{01}\bowtie Q_{12}\). The type theory of the next section will syntactically derive the rows of this table from the syntactic specification of S and I and the rules for the dependent product (see [4]).

3 The Type Theory

We introduce a Martin-Löf style type theory [9], with explicit substitutions (in the style of [3]), extended with context and substitution constants representing simplices and face maps. The type theory contains familiar constructs such as \(\varSigma \)- and \(\varPi \)-types. For this type theory we give an interpretation in the indexed category \(\mathfrak {R}\) of the previous section. The goal is to use the type theory as a formal language for databases. We give examples how to specify instances and schemas formally in the theory. Further details can be found in [4].

3.1 The Type Theory \(\mathcal {T}\)

The type system has the following eight judgements, with intended interpretations.

figure f

The type theory \(\mathcal {T}\) has the rules listed in Figs. 1 and 2. The interpretation of these are given by the constructions in the previous section, and summarised in Fig. 3.

Fig. 1.
figure 1

Rules of the type theory: contexts and substitution

Fig. 2.
figure 2

Rules of the type theory: Types

Each rule introduces a context, substitution, type or element. We will apply usual abbreviations such as \(A \longrightarrow B\) for \(\varPi _AB[\downarrow _A]\) and \(A \times B\) for \(\varSigma _A B[\downarrow _A]\). In addition to these term introducing rules there are a number of equalities which should hold; such as the simplicial identities \(d_i^{n+1} \circ d_j^n \equiv d_{j-1}^{n+1} \circ d_i^n: \varDelta _n \longrightarrow \varDelta _{n+2}\). We list the definitional equalities in Fig. 4.

Fig. 3.
figure 3

Interpretation of the type theory

Fig. 4.
figure 4

Definitional equalities in the type theory

These all hold in our model. (The equalities for substitution are verified in Lemma 3. The remaining equations are mostly routine verifications.) We display this for reference.

Theorem 1

The intended interpretation yields a sound interpretation of the type theory \(\mathcal {T}\) in \(\mathfrak {R}\).

3.2 Instance Specification as Type Introduction

The intended interpretation of \(A : \mathtt {type}(\varGamma )\) is that A is an instance of the schema \(\varGamma \). However, context extension allows us to view every instance as a schema in its own right; for every instance \(A : \mathtt {type}(\varGamma )\), we get a schema \(\varGamma .A\). It turns out that the most convenient way to specify a schema is by introducing a new type/instance over one of the simplex schemas \(\varDelta _n\). To specify a schema, with a maximum of n attributes, may be seen as introducing a type in the context \(\varDelta _n\). A relation variable with k attributes in the schema is introduced as an element of the schema substituted into \(\varDelta _k\). Names of attributes are given as elements of the schema substituted down to \(\varDelta _0\).

Example 7

We construct the rules of the schema S presented as an instance of \(\varDelta _2\) as in Example 6. The introduction rules tells us the names of tables and attributes in S.

figure g

From these introduction rules, we can generate an elimination rule. The elimination rule tells us how to construct full tuples in an instance over the schema S. Another interpretation of the elimination rule is that it formulates that the schema S contains only what is specified by the above introduction rules; it specifies the schema up to isomorphism.

An instance of a schema is a type depending in the context of the schema. Therefore instance specification is completely analoguous to schema specification. See [4] for an example. In [4] one can also find a derivation of the terms in \(\mathcal T\) corresponding to the full tuples of the natural join in Example 6.

4 Dependent Structure, Large Types, and Queries

Most of this paper has been devoted to explaining the basic structure of the display map category of finite simplicial complexes seen as encoding systems of tables, and to stating the type theory which it models. In the space that remains, we briefly indicate some approaches of ongoing and future work, in particular emphasizing the definitions and roles of large types and universes. Before introducing additional types, however, we point to the use that can be made of the dependent structure itself. It is a cornerstone of the model that an instance over a schema can itself, by context extension, be seen as a schema over which new instances can be constructed. Thus context extension provides, for a given instance, the built in possibility to enter data related to the instance into tables formed by its rows. One immediate suggestion for the potential use of this feature is for updates; an update \(I'\) of an instance I over \(\varGamma \) is the instance over \(\varGamma .I\) obtained by writing the new (or old or empty) row in the table formed by the row to be replaced (or kept or deleted). Adding new rows can be done by writing \(I'\) over \(\varGamma .I+1\) instead, as \(I+1\) has a copy of \(\varGamma \) over which new additions can be entered. (Multiple copies of \(\varGamma \), and indeed of I, can be added if need be; notice that polynomial expressions over I such as \(2I+3\) yield meaningful instances over \(\varGamma \)). In this way a current update occurs in a context formed by a string of previous updates. Applying the dependent product operation gives an instance over the original schema \(\varGamma \), if desired.

Returning to large types, Example 6 gives a glimpse of a “type-theoretic” operation and its relation to one of the standard queries of relational databases. A formal investigation of queries (and dependencies and views) in the setting of the type theory and model of the previous sections requires a formal understanding of what constitutes a query in this setting. For (partly) this purpose, we introduce a universe. This is a large type, corresponding to an infinite instance which encodes all finite instances over a fixed domain. Thus, given a schema X, the universe \(U_X\) of finite instances of X is an infinite instance of X where the full tuples encode the finite instances of X (over the fixed domain). The universe comes equipped with a decoding instance \(T_X\) over \(X.{U_X}\) such that given a full tuple \(t \in \mathrm {Trm}_{X}(U_X)\), the instance it encodes is decoded by \(T_X[\widehat{t}]\),

figure h

The universe and decoding instance are stable under substitution, and are closed under the other type constructions, such as \(\varPi \)- and \(\varSigma \)-types. We omit the details of the constructions.

The universe, \(U_\varGamma \) along with \(T_\varGamma : \mathtt {Type}(\varGamma .U_\varGamma )\), allows reasoning generically about classes of instances of \(\varGamma \) in the type theory itself, without having to resort to the metalanguage. Since schemas can be though of as instances, they too can be constructed using the universe. In particular, given a schema \(\varGamma \), the type \(\varOmega _\varGamma := \varSigma _{U_\varGamma } \varPi _{T_\varGamma } \varPi _{T_{\varGamma }[\downarrow _{T_{\varGamma }}]} \text {Id}_{T_{\varGamma }}\) is the large type of subschemas of \(\varGamma \). Its elements are decoded to instances by the family \(\mathcal O_{\varGamma } := T[\downarrow _\varOmega .U][\pi _0\mathord {\uparrow }]\). Given \(t : \mathtt {Elem}(\varOmega _\varGamma )\), the subschema it encodes is \(\varGamma .\mathcal O[t\mathord {\uparrow }]\).

A query can then be seen as an operation which takes an instance of a schema to another instance of a related schema. Given codes for a source subschema \(t : \mathtt {Elem}(\varOmega )\) and a target subschema \(u : \mathtt {Elem}(\varOmega )\), the type of queries from t to u is thus \((\mathcal O [t\mathord {\uparrow }] \rightarrow U) \rightarrow (\mathcal O [u\mathord {\uparrow }] \rightarrow U)\). Having given a concrete type of queries leads the way to investigations as to exactly which queries can be expressed in the language. For illustration, we present an example query formulated in this way.

Example 8

In the spririt of Example 6, let \(a : \mathtt {Elem}(\varOmega )\) be the code for a subschema covering the schema \(\varGamma \), in the sense that the set of attributes are the same. The query taking the dependent product, or natural join, of an instance of this subschema is expressed by the term

$$\begin{aligned} q := \lambda \lambda (\pi [(\downarrow _{\varOmega }.U)\circ (\pi _0\mathord {\uparrow })][a\mathord {\uparrow }.(T \rightarrow U)][\downarrow _{1}]) : \mathtt {Elem}((\mathcal O [a\mathord {\uparrow }] \rightarrow U) \rightarrow (1_\varGamma \rightarrow U)). \end{aligned}$$

Further new types of interest and use are the type \(\varLambda _{\varGamma }\) of “tables”, or faces, of a schema and the (large) type \(\mathbb {N}_{\varGamma }\) of “finite” instances, or instances of the form \(1+1+\ldots +1\). The former can be constructed so as to be both small and stable (whereas \(\varOmega _{\varGamma }\) seems to have to be large in order to be stable), and is sufficient for schema and instance specification (cf. Example 7). The latter is of relevance e.g. with respect to instances determined by their full tuples (or “with no nulls”).