1 Introduction
Proof assistants have become invaluable tools in the hands of mathematicians, computer scientists and proof engineers who aim to build certified theories, software, systems and hardware, as evidenced by successful, large formalization projects ranging from famously hard mathematical results ([21], [24]) to realistic compilers ([34], [51]), program logics ([29, 6]), operating systems ([31, 22]) and even hardware design ([3, 37, 15]). Ultimately, however, all these formalizations rely on a Trusted Theory Base (TTB), which consists of the mathematical foundations of the proof assistant (most often a variant of Higher-Order Logic, Set Theory or Type Theory), and a Trusted Code Base (TCB): its actual implementation in a general-purpose programming language. To obtain the highest guarantees on the proof assistant's results, one should in principle also verify the consistency of the foundations, usually by building models in a trusted theory (Zermelo-Fraenkel Set Theory being the most common one), the adequacy of the implementation with this theory, and the correct compilation of this implementation.
1.1 A little history
To date, only the HOL family of provers has benefitted from such a guarantee, due to the seminal work of Kumar et al. [32], who built a self-formalization of Higher-Order Logic modeled by set theory (ZF) in HOL Light (building on Harrison's work [25]), itself implemented in CakeML. In contrast, for the dependent type theories at the basis of proof assistants like Coq, Agda, Idris or Lean, self-certification, or more informally "type theory eating itself" [14], is a long-standing open problem. A rather large fragment of the theory at the basis of Coq, the Calculus of Constructions with universes, was formalized in Coq by Barras during his PhD [8] and extended in his habilitation thesis [9], culminating in a proof of Strong Normalization (SN), and hence of relative consistency with IZF with a number of inaccessible cardinals, for this theory. His development includes a proof of subject reduction and a model-theoretic proof using a specification of Intuitionistic Zermelo-Fraenkel Set Theory in Type Theory, resulting from a long line of work relating the two [4, 55]. Due to Gödel's second incompleteness theorem, one can only hope to prove the consistency of the theory with universes in a theory with universes. These results show that a quite large fragment of Coq can be proven relatively consistent with a standard foundation. Since Barras' work, both pen-and-paper and formalized model-theoretic proofs have been constructed for many variants of dependent type theories, from decidability of typechecking for a type theory with universes [2] to canonicity [27] and normalization [49] for cubical type theories. We hence consider Coq's core type theory to be well-studied; the most recent reference for a consistency proof of the calculus with inductive types, universe polymorphism and cumulativity is [54, 53].
1.2 Goals of the project
The theory behind Coq's calculus, the Polymorphic Cumulative Calculus of (Co)Inductive Constructions (Pcuic), is rather well-studied and well-tested by now: the most important fragments have accepted consistency proofs, and no inconsistency has been found in the rules that have been implemented in the last 30 years. That is, until a new axiom like Univalence breaks implicit assumptions in the calculus, as happened recently for the guard-checking algorithm. More worryingly, blatant inconsistencies (accepted proofs of False) are reported at least once a year, due to bugs in the implementation (see https://github.com/coq/coq/blob/master/dev/doc/critical-bugs). The source of these bugs falls generally under the category of programming errors and unforeseen interactions between subsystems. Coq is indeed a complex piece of software resulting from 37 years of research and development, incorporating many features in a single system. Its architecture was last revised in the V7 series (1999-2004) by Jean-Christophe Filliâtre [18, 17], following the de Bruijn criterion. This means that Coq does have a well-delimited, trustable proof-checking kernel, so these errors are not to be attributed to bad design in general. Rather, the problem is that this "small" kernel already comprises around 20kLoC of complex OCaml code, plus around 10kLoC of C code implementing a virtual machine for conversion. The system also relies on the whole OCaml compiler when used with the native_compute tactic for fast computation/conversion. To be fair (and grateful!), one should note that we never had to blame OCaml for a miscompilation resulting in an inconsistency. In conclusion, to avoid these errors, we should rather apply program verification to Coq's implementation.
This is a slightly different endeavor from the works above. Indeed, mechanized or not, the aforementioned proofs are concerned with idealized versions of the calculus that do not correspond to the actual implementation of Pcuic in OCaml, nor do they have any bearing on its compiled version, a priori. The goal of the MetaCoq project (https://metacoq.github.io) is to bridge the gap between the model-theoretic justification of the theory and the actual implementation of the Coq kernel. To do so, we need to answer the following questions in a formal, concrete way:

- What calculus is implemented by Coq, exactly?
- Which metatheoretical properties hold of the implementation?
To answer these questions, we develop a specification of Coq's type theory (§2) and a Coq definition of a typechecker and conversion procedure that corresponds to the current Coq implementation, and we verify both the sanity of the specification and the correctness and completeness of the implementation.
Plan of the article.
To verify the sanity of the specification, we develop the metatheory (§3) of the Pcuic calculus and show that it enjoys type preservation (§3.2.2) and principality of types (§3.2.4), along with the expected confluence of reduction (§3.2.3) and a proof that conversion is a congruence. We can then construct a corresponding type checker (§4) that is shown correct and complete with respect to the specification. Finally, to be able to execute this typechecker on large examples, we can extract it to OCaml and compare it to the typechecker implemented in Coq. Ideally, this last step should also be verified: we present a verified erasure procedure (§5) that takes a Coq environment and definition and produces a program in λ□, an (untyped) weak call-by-value calculus extended with a dummy constructor □. We prove that erasure preserves observations of values, starting from a well-typed closed term. In combination with the CertiCoq compiler [5] from λ□ to Clight and the CompCert [19] compiler from Clight to assembly, this will provide an end-to-end verified implementation of Coq's kernel. We discuss the remaining hurdles to realizing this, and further extensions, in section 6.
Attribution
The results surveyed in this article are due to the MetaCoq team as a whole: Abhishek Anand, Danil Annenkov, Simon Boulier, Cyril Cohen, Yannick Forster, Meven Lennon-Bertrand, Gregory Malecha, Jakob Botsch Nielsen, Matthieu Sozeau, Nicolas Tabareau and Théo Winterhalter.
Link to the formal development
This article is best read online, as links in the text point to the definitions of the formal development, which are generally too large to include in the presentation.
2 Syntax and Semantics
The MetaCoq project initially started as an extension of Gregory Malecha's TemplateCoq plugin, developed during his PhD on efficient and extensible reflection in dependent type theory [38]. TemplateCoq provided a reification of Coq's term and environment structures in Coq itself, i.e. Coq Inductive datatype declarations that model the syntax of terms and the global declarations of definitions and inductive types, as close as possible to the OCaml definitions used in Coq's kernel. In addition, it implemented, in an OCaml Coq plugin, the meta-definitions of quoting and unquoting of terms between the two representations, similarly to Agda's reflection mechanism. The syntax of terms is given in figure 1. It corresponds closely to Coq's internal syntax, handling local variables (tRel, with a de Bruijn index), free named variables (tVar), existential variables (tEvar), sorts (tSort), the type-cast construct (tCast), dependent products, lambda-abstraction, let-ins and n-ary application (tApp), application of global references (constants, inductives, and constructors), a dependent case analysis construct (tCase, see §3.5 for more details), primitive projections (tProj), fixpoints and cofixpoints and finally, primitive integers and floating-point values (a recent [12] addition to Coq).
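For readers without access to figure 1, the following abridged sketch of the Template term datatype may help fix ideas. Constructor names follow the MetaCoq development, but auxiliary types (ident, aname, inductive, case_info, mfixpoint, ...) are elided and details may vary across versions:

```coq
(* Abridged sketch of MetaCoq's Template term syntax. *)
Inductive term : Type :=
| tRel (n : nat)                          (* local variable, de Bruijn index *)
| tVar (id : ident)                       (* free named variable *)
| tEvar (ev : nat) (args : list term)     (* existential variable *)
| tSort (s : Universe.t)                  (* sort: Prop, Set, Type@{i} *)
| tCast (t : term) (k : cast_kind) (ty : term)
| tProd (na : aname) (ty body : term)     (* dependent product *)
| tLambda (na : aname) (ty body : term)
| tLetIn (na : aname) (def ty body : term)
| tApp (f : term) (args : list term)      (* n-ary application *)
| tConst (c : kername) (u : Instance.t)
| tInd (ind : inductive) (u : Instance.t)
| tConstruct (ind : inductive) (idx : nat) (u : Instance.t)
| tCase (ci : case_info) (p : predicate term) (scrut : term)
        (brs : list (branch term))        (* dependent case analysis *)
| tProj (p : projection) (t : term)
| tFix (mfix : mfixpoint term) (idx : nat)
| tCoFix (mfix : mfixpoint term) (idx : nat)
| tInt (i : PrimInt63.int)                (* primitive integers *)
| tFloat (f : PrimFloat.float).           (* primitive floats *)
```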
On top of the term syntax, the MetaCoq.Template library also defines the type of local contexts, which are lists of typing assumptions or local definitions, and global contexts: associative lists mapping global reference names to declarations of constants/axioms or of inductive types and their constructors.
2.1 The Template Monad
On top of these syntactic objects, one can define an API much like Coq's OCaml API to interact with the kernel of Coq: adding and looking up global definitions, calling the typechecker, or higher-level primitives like proof search for specific typeclass instances. In [45], we show how this can be organized by defining a general TemplateMonad typeclass that describes the structure of programs interacting with Coq's kernel. We need a monadic abstraction to encapsulate the handling of the global state of the Coq system being manipulated: updates to the global environment, but also the obligation-handling machinery of Program [44]. An interpreter for actions in this monad (e.g. adding a new definition with a given name and proof term) is metaprogrammed in OCaml, using continuations to handle the interactive aspect of execution and maintaining the coherence of the system state, similarly to the proof engine monad of the MTac2 [30] tactic language.
Using this monad, one can metaprogram plugins in Coq that take input from the user, ask the user to provide definitions or proofs of particular types, and update the environment from computed definitions. This can be used to define user-extensible translations [45, §4], or to derive lenses for a record definition [45, §5]. The TemplateMonad has a variant that also allows extraction of the monadic programs so they can be run efficiently in OCaml, putting it on par with the development of plugins for Coq directly in OCaml. MetaCoq plugins deriving parametricity theorems [45, §4.3.1] and induction principles and subterm relations from inductive definitions [13] can be defined this way, opening the possibility of verifying their implementations. For parametricity, for example, one can show that there is a procedure constructing, from any well-typed Coq term, a proof that the parametricity predicate derived from its type holds on the term. As shown by Pédrot and Tabareau in [41], this can in turn be used to build internal syntactic models of richer, effectful type theories.
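As a small flavor of what such monadic metaprograms look like, here is a minimal session; the exact module paths, notations and string types (e.g. bytestrings in recent releases) vary across MetaCoq versions:

```coq
From MetaCoq.Template Require Import All.
Import MCMonadNotation.

(* Quote a Coq term into its reified Template syntax and print it:
   this displays a value of the [term] datatype (tLambda, tApp, ...). *)
MetaCoq Run (t <- tmQuote (fun x : nat => x + 1) ;;
             tmPrint t).

(* Add a new global definition from inside the monad. *)
MetaCoq Run (tmDefinition "forty_two" 42 ;;
             tmMsg "defined forty_two").
```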
In the rest of this article, we will review how we built the certified foundations needed for such efforts, that is, the typing specification and the type inference algorithm used to check well-typedness.
2.2 Typing, Reduction and Conversion
The calculus at the basis of Coq is the Polymorphic Cumulative Calculus of (Co)Inductive Constructions (Pcuic). Pcuic is a general dependently-typed programming language, with pattern-matching and (co)recursive definitions, and universe-polymorphic global declarations (constants or inductive types). Its origin is the Calculus of Constructions of Coquand and Huet [16] with conversion, extended by the (co)inductive type definition scheme of Paulin-Mohring [39], guarded (co)fixpoint definitions [20], universe polymorphism [47] and cumulative inductive types [54]. The latest additions to the core calculus are a definitionally proof-irrelevant sort SProp and primitive types [7]; while they are supported in the syntax, they are not yet covered by our specification. The sort system includes an infinite hierarchy of predicative universes Type@{i} and an impredicative sort Prop. We consider Set to be a synonym for Type@{0}, hence its interpretation is always predicative (Coq supports an -impredicative-set flag to switch to an impredicative interpretation, but it is seldom used today). A specificity of Coq is the treatment of Prop, and in particular the singleton elimination rule that allows eliminating propositional content into computational content when it is trivial (a proof of absurdity, an equality, a conjunction or an accessibility proof): we will see in the section on erasure (§5) how this is justified from the computational standpoint.
2.2.1 Conversion, Cumulativity
In dependent type theory, conversion and typing are usually intertwined, through the conversion rule which states that a term of type T can be seen as a term of type U for any (well-formed) type U convertible to T. Pcuic is presented in the style of Pure Type Systems, where this conversion relation is untyped and can be defined independently on raw terms as the reflexive, symmetric and transitive closure of one-step reduction. We hence first define a reduction relation as an inductive predicate that includes all the reduction rules of Pcuic: β for application, ι for cases, ζ for let-ins, fix and cofix unfoldings, δ for constants, and congruence rules allowing to apply reductions under any context (under lambdas, dependent products, etc.). We take its closure with an additional twist: rather than a strict reflexivity rule, we define an equivalence relation that ignores the name annotations of binders (they are included in the core syntax for printing purposes only). Moreover, this relation is parameterized by a relation on universes to implement syntactic cumulativity of universes and inductive types. Indeed, in Coq's theory we have:

  Type@{i} ≤ Type@{j}  whenever  i ≤ j
A similar rule applies to cumulative inductive types. We call this relation cumulativity when it is instantiated with the large inequality of universes, in which case it is only a preorder. When we instantiate it with equality of universes, we recover the usual conversion relation, which is an equivalence.
Two terms are hence in the cumulativity relation if they can be linked by reductions or expansions up to cumulativity.
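Concretely, cumulativity is what makes a definition like the following typecheck; this is a minimal illustration using the explicit universe-binder and constraint syntax available in Coq since version 8.10:

```coq
(* Any inhabitant of Type@{i} is also one of Type@{j} whenever i <= j:
   the body [A] is accepted at the larger type by cumulativity. *)
Polymorphic Definition up@{i j | i <= j} (A : Type@{i}) : Type@{j} := A.

(* Instantiating both universes identically recovers plain conversion. *)
Check (up nat : Type).
```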
2.2.2 Typing
The typing relation of Pcuic is a fairly standard inductively defined relation that mostly corresponds to usual ”on paper” treatments (e.g. Coq’s reference manual [52]):
The typing judgement is a predicate taking as parameters the global context extended with a universe declaration, a local context of assumptions, and two terms corresponding to the subject and type of the typing rules. Derivations are defined in the Type sort to allow for easy inductions on the size of derivations. The typing rules are explained in detail in [45, §2] and [46, §2.2]. They are syntax-directed, i.e. there is one rule per head constructor of terms, except for the cumulativity rule, which can apply anywhere in a derivation. Note that we use a standard de Bruijn encoding for local variables, along with lifting and parallel substitution operations. As can be seen from the definition in figure 1, we used a nested datatype definition (list term, list (branch term)), hence some care must be taken to define recursive definitions and user-friendly induction principles on terms and derivations, lifting a predicate on terms to lists in the appropriate way. This is done by first defining a size measure on terms and then using well-founded induction on the sizes of terms and derivations to derive easy-to-use induction principles.
Global environments
Typing extends straightforwardly to local and global contexts. We formalize in particular which invariants should hold on the definition of inductive types, including strict positivity and the invariants enjoyed by cumulative inductive types. This is one point where we depart from the pen-and-paper treatments: indeed, in [54], the theory that is studied is an idealization of Coq's implementation where equality is a judgement and inductive declarations do not carry parameters. In contrast, the implementation cannot rely on typing conditions to decide the cumulativity judgement, and subtyping is rather defined on two different instances of the same inductive type, e.g. list@{Set} nat and list@{i} nat. We hence had to reverse-engineer, from Coq's OCaml code, a proper treatment of cumulative inductive types. We will see in section 3.5 that this was a non-trivial endeavor.
2.3 Translation from Template to PCUIC
The tApp constructor represents the n-ary application of a term to a list of arguments. This follows rather closely Coq's implementation, where the application node takes an array of arguments, for an even more compact representation of applications. Immediate access to the head of applications is an important optimization in practice, but this representation, while memory-efficient, imposes a hidden invariant on the term representation: the head term should not itself be an application, and the list of arguments should always be non-empty. The application typing rule is likewise complicated, as we have to consider application to a spine of arguments rather than to a single argument.
In Coq's kernel, this is handled by making the term type abstract and using smart constructors to enforce these invariants. Replicating this in Coq is tedious, as we would have to either:

- work everywhere with an abstract/subset type of terms, precluding the use of primitive fixpoints and case analysis,
- or work with the raw syntax and add well-formedness preconditions everywhere.
Our solution is to interface with Coq using the raw Template term syntax, keeping close to the implementation. To avoid dealing with well-formedness assumptions, we define a translation from this syntax to the Pcuic term syntax, where application is a binary constructor. We define similar reduction and typing judgments on the simplified syntax and show the equivalence of the two systems, Template and Pcuic. This crucially relies on well-founded induction on the size of derivations to "reassociate" between binary applications and n-ary ones. The metatheory hereafter is developed on the more proof-friendly Pcuic syntax, but its theorems also apply to the original system. We simplify the Pcuic syntax further by removing the tCast constructor and translating it as an application of the identity function: this is observationally equivalent. The cast syntax in Coq is solely used to mark a place where a specific conversion algorithm should be used to convert the inferred type of a term with a specified expected type. This is used to call vm_compute or native_compute to perform the cumulativity check, rather than Coq's standard "call-by-need" conversion algorithm. As we do not formalize these fast conversion checks, this is not a loss. Note also that this has no bearing on the typeability of the term, in theory; only in practice might performing conversion with the default algorithm be infeasible.
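The key case of this translation flattens n-ary applications into nested binary ones. The following sketch shows the idea; names are illustrative and all other constructors are mapped structurally:

```coq
(* Binary applications from n-ary ones: mkApps f [a1; ...; an] builds
   the nested application (...((f a1) a2) ... an). *)
Fixpoint mkApps (f : term) (args : list term) : term :=
  match args with
  | [] => f
  | a :: args' => mkApps (tApp f a) args'
  end.

(* Sketch of the Template -> PCUIC translation on applications. *)
Fixpoint trans (t : Ast.term) : term :=
  match t with
  | Ast.tRel n => tRel n
  | Ast.tApp f args => mkApps (trans f) (List.map trans args)
  (* tCast is translated as an application of an identity function;
     the remaining constructors are translated structurally (elided). *)
  | _ => tRel 0 (* placeholder for the elided cases *)
  end.
```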
3 Metatheory
Now armed with the definition of typing and reduction in Pcuic, we can start proving the usual metatheoretical results of dependent type theories. We first derive the theory of our binding structures: the usual lemmas about de Bruijn operations of lifting and substitution are proven easily.
3.1 Digression on binding
Unfortunately, at the time we started the project (circa 2017), the Autosubst framework [42] could not be used to automatically derive this theory for us, due to the use of nested lists. We however learned from their work [48] and developed the more expressive σ-calculus, defined from operations of renaming (for a renaming ρ : ℕ → ℕ) and instantiation (for a substitution function σ : ℕ → term), which provide a more proof-friendly interface to reason on the de Bruijn representation. We show that Coq's kernel functions of lifting and substitution (carrying just the number of crossed binders) are simulated by specific renaming and instantiation operations. Both variants are still of interest: it would be out of the question to use the σ-calculus operations, which build closures while traversing terms, in Coq's implementation. However, the structured nature of the calculus and its amenability to automation, having a decidable equational theory [48], is a clear advantage in practice.
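The renaming side of this interface can be sketched as follows, mirroring the shiftn/rename/lift combinators of the development (abridged: only a few term constructors are shown):

```coq
(* Protect the [n] binders we have gone under: indices below [n] are
   bound and untouched, the rest are renamed by [f]. *)
Definition shiftn (n : nat) (f : nat -> nat) : nat -> nat :=
  fun i => if i <? n then i else n + f (i - n).

Fixpoint rename (f : nat -> nat) (t : term) : term :=
  match t with
  | tRel i => tRel (f i)
  | tApp u v => tApp (rename f u) (rename f v)
  | tLambda na A b => tLambda na (rename f A) (rename (shiftn 1 f) b)
  | _ => t  (* remaining constructors traversed structurally, elided *)
  end.

(* The kernel's lifting is the renaming adding [n] above [k] binders. *)
Definition lift (n k : nat) (t : term) : term :=
  rename (shiftn k (Nat.add n)) t.
```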
One example where this shines is the treatment of (dependent) let-bindings in the calculus. Dependent let-bindings are an entirely different beast from ML-like let-bindings, which can be simulated with abstraction and application. In particular, three rules of reduction can apply on local definitions:

  let x := t in u  ▷  u{x := t}                                  (ζ)
  let x := t in u  ▷  let x := t in u'   when u ▷ u'             (cong-let-body)
  x  ▷  ↑ⁿ⁺¹ t   when the context binds x := t at position n     (δ-local)

Here ↑ⁿ t represents the shifting of the indices of the free variables of t by n, and u{x := t} the usual capture-avoiding substitution. The first rule is the usual let-reduction, the second is a congruence rule allowing to reduce under a let, and the last allows to expand a local definition from the context. In the course of the metatheoretical development, we must prove lemmas that allow us to "squeeze" or smash the let-bindings in a context. This results in a reduced context with no let-bindings anymore, and a corresponding substitution, mixing the properly substituted bodies of the removed let-bindings and regular variables, to apply to terms in the original context. This involves interchanging ζ, δ-local and cong-let-body rules, combined with the proper liftings and substitutions. This kind of reasoning appears in particular as soon as we want to invert an application judgment, as let-bindings can appear anywhere in the type of the functional argument. Using σ-calculus reasoning and building the right abstractions tremendously helped simplify the proofs, which would otherwise easily become indecipherable algebraic rewritings with low-level indices in liftings and substitutions.
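To give an idea of the "smashing" substitution, here is a hypothetical sketch (the actual development uses a more general combinator): each let-binding contributes its substituted body, while each ordinary assumption contributes a variable and shifts the remaining entries:

```coq
(* Hypothetical sketch of the substitution from a context [Γ] to the
   let-free context obtained by smashing its local definitions. *)
Fixpoint lets_subst (Γ : context) : list term :=
  match Γ with
  | [] => []
  | {| decl_body := Some b |} :: Γ' =>
      (* a local definition: substitute the previous entries in its body *)
      let s := lets_subst Γ' in subst0 s b :: s
  | {| decl_body := None |} :: Γ' =>
      (* an ordinary assumption: keep a variable, shift the rest *)
      tRel 0 :: List.map (lift 1 0) (lets_subst Γ')
  end.
```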
3.2 Properties
3.2.1 Structural Properties
The usual Weakening and Substitution theorems can be proven straightforwardly by induction on derivations. We use a beefed-up eliminator that automatically lifts the typing property we want to prove to well-formedness of the global environment, which also contains typing derivations. Likewise, we always prove properties simultaneously on well-formed local contexts and type derivations, so our theorems provide a conjunction of properties. When we moved to the σ-calculus representation, we factorized these proofs by first stating renaming and instantiation lemmas, from which weakening and substitution follow as corollaries. We also verify that typing is invariant by alpha-conversion, so name annotations on binders are indeed irrelevant.
3.2.2 Type Preservation
Proving subject reduction (a.k.a. type preservation) for dependent type theories can be rather difficult in a setting where definitional equality is typed, as it usually requires a logical relation argument or model construction, see e.g. [2]. However, the syntactic theory is relatively well-understood for PTSs: one can first independently prove context conversion/cumulativity and injectivity of dependent products (i.e. if ∀ x : A. B ≤ ∀ x : A'. B' then A is convertible to A' and B ≤ B'), to prove type preservation in the application case. Similarly, we have injectivity of inductive type applications, up to the cumulativity relation on universes.
However, two other difficulties arise for Pcuic. First, we are considering a realistic type theory, with full-blown inductive family declarations, with a distinction between parameters and indices (which can contain let-bindings), and cumulative, universe-polymorphic inductive types. To our knowledge, nobody ever attempted to formalize the proof at this level of detail before, even on paper. There is a good reason for that: the level of complexity is very high. Showing that the dependent case analysis reduction rule is sound mixes features such as let-bindings, the de Bruijn representation of binding adding various liftings and substitutions, and the usual indigestible nesting of quantifications on indices in the typing rules of inductive types, constructors and branches. This is also ample reason to verify the code: many critical bugs concerning let-bindings and inductive types were reported over the years. To tame this complexity, we tried to modularize the proof and provide the most abstract inversion lemmas on the typing of inductive values and match constructs.
Second, Coq's theory is known to be broken regarding positive coinductive types and subject reduction. We hence parameterize the proof: subject reduction holds only on judgments where no dependent case analysis is performed on coinductive types. Negative coinductive types, implemented with primitive projections, can however be shown to enjoy subject reduction without restriction.
3.2.3 Confluence
To support inversion lemmas such as type injectivity, we need to show that reduction is confluent. From this proof, it follows that the abstract, undirected conversion relation is equivalent to reduction of the two sides to terms u and v that are in the syntactic α-cumulativity relation. We extend here Takahashi's refinement [50] of Tait and Martin-Löf's seminal proof of confluence for the λ-calculus. The basic idea of the confluence proof is to consider parallel reduction ⇒ instead of one-step reduction and to prove the following triangle lemma:

  if t ⇒ u, then u ⇒ ρ(t)
The function ρ here is an "optimal" reduction function that simultaneously reduces all the parallel redexes in a term or context. The fact that we need to consider contexts is due to let-bindings again: in one step of parallel reduction, we might reduce a local definition to an abstraction, expand it in the function position of an application, and reduce the produced beta-redex. Using the triangle property twice, it is trivial to derive the diamond lemma, and hence confluence of parallel reduction. By inclusion of one-step reduction in parallel reduction, and of parallel reduction in the transitive closure of one-step reduction (the "squashing" lemma), reduction is also confluent. This last part of the reasoning is done abstractly, but accounting for the generalization to reduction on pairs of a context and a term. It then suffices to show commutation lemmas for reduction and cumulativity to establish the equivalence of reduction up to cumulativity and the undirected cumulativity relation. Confluence is crucial to show transitivity of the directed version.
Using this characterization of cumulativity, we can easily show that it is a congruence and that it enjoys the expected inversion lemmas: if two dependent product types are convertible, then they both reduce to products that are in the cumulativity relation, so their domains are convertible and their codomains are in the cumulativity relation. Similarly, product types cannot be convertible to sorts or inductive types: there is no confusion between distinct type constructors.
3.2.4 Principality
As Pcuic is based on the notion of cumulativity rather than mere conversion, type uniqueness does not hold in the system. However, the refined property of principality does: for any context Γ and well-typed term t, there is a type T, unique up to conversion, such that for any other typing Γ ⊢ t : T' we have T ≤ T'. This property is essential for deciding typechecking: to check that t has type T, it suffices to infer the principal type of t and check it against T for cumulativity. Principal types are also used by the erasure procedure to take sound decisions based on the principal type of a given term.
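The way principality supports checking can be sketched in a few lines, assuming infer and check_cumul functions in an exception monad (both names are illustrative, not the actual MetaCoq identifiers):

```coq
(* Checking reduced to inference plus one cumulativity test: [infer]
   returns the principal type T', and t : T holds iff T' <= T. *)
Definition check (Sigma : global_env) (Gamma : context) (t T : term)
  : typing_result unit :=
  T' <- infer Sigma Gamma t ;;
  check_cumul Sigma Gamma T' T.
```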
3.3 Strengthening
The last expected structural property of the system is strengthening (a.k.a. thinning), which can be stated as: if Γ, x : A, Δ ⊢ t : T and x occurs in neither Δ, t nor T, then Γ, Δ ⊢ t : T (with de Bruijn indices adjusted accordingly).
This property ensures that typing is not influenced by unused variables, and it is at the basis of Coq's clear tactic, a quite essential building block in proof scripts. Unfortunately, this property cannot be proven by a simple induction on the typing derivation: the free-floating conversion rule allows the derivation to go through types mentioning the variables to clear, even if they do not appear in the term and type in the conclusion of the whole derivation.
3.4 Bidirectional Type Checking To The Rescue
This unfortunate situation can be resolved by reifying the principality property and the type-checking strategy as a bidirectional typing inductive definition. This variant of typing explicitly keeps track of the information flow in typing rules. In practice, it separates the syntax-directed rules into a synthesis (a.k.a. inference) judgment (the term position is an input, while the type is an output), and turns the non-directed ones into checking rules (both positions are inputs). In [33], Meven Lennon-Bertrand develops a bidirectional variant of Pcuic, shown equivalent to the original Pcuic, in which strengthening and principality become trivial to derive.
The crux of this argument is that bidirectional typing derivations are "canonical" for a given term, and move uses of the conversion rule to the top of the derivation, where they are strictly necessary. In addition, multiple cumulativity rules get "compressed" into a single change-of-phase rule, relying on the transitivity of cumulativity. In a bidirectional synthesis derivation, if a variable does not appear in the term position, then it cannot appear in the inferred type. Simultaneously, in a checking derivation, if a variable appears in neither the term nor the type, then it cannot appear in these positions in any of the subderivations.
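In standard bidirectional notation (writing ▷ for synthesis and ◁ for checking; the notation is ours, not necessarily that of [33]), the single change-of-phase rule compressing all uses of cumulativity reads:

```latex
\frac{\Gamma \vdash t \triangleright T' \qquad \Gamma \vdash T' \leq T}
     {\Gamma \vdash t \triangleleft T}
```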
3.5 Case In Point
This detour through bidirectional typechecking is not accidental. In [46], we only proved the soundness of a typechecking algorithm for Pcuic (§4). It is in the course of formalizing the completeness of the typechecker (§4) that we discovered a problem in the typing rules of Coq. The problem appears in the dependent case analysis construct match. The gist of the typing rule was to typecheck the scrutinee at some unspecified inductive type I@{u} p a, where u is a universe instance, p the parameters and a the indices of the inductive family. The match construct also takes an elimination predicate, expected to be of a type of the shape ∀ a' : indices, I@{u'} p' a' → s, for some universe instance u', parameters p' and sort s.
Looking at this type, we would extract the universe instance u' and the parameters p' from the I@{u'} p' assumption. The typing rule checked that the universe instance u of the scrutinee was convertible to u', rather than only in the cumulativity relation according to the subtyping rules for cumulative inductive types. It also compared the parameters p and p' for convertibility, by first lifting p into a context extended with the index bindings, but these two instances did not necessarily live in the same type!
These mistakes lead to a loss of subject reduction: if cumulativity is used to lower the universes of the scrutinee, the whole pattern-matching can become untypeable (https://github.com/coq/coq/issues/13495). The problem appeared immediately while trying to prove the completeness of typechecking, at the stage of designing the bidirectional typing rules: the flow of information was unclear and led us to the bug. We also realized that, to justify the comparison of the parameters, we would need to verify that they do not depend on the extra bindings and apply strengthening, which, as we have just seen, is not directly provable on undirected typing rules. This motivated us to push for a change in the term representation of match in Coq (CEP 34 by Hugo Herbelin, Coq PR 13563 by Pierre-Marie Pédrot, integrated in Coq 8.14) that solves both problems at once, by storing at the match node the universe instance and parameters that define the eliminator, and performing a sound cumulativity test between the inferred type of the scrutinee and the (reconstructed) expected type of the eliminator. We are currently finishing the update of the whole MetaCoq development to handle this change of representation (https://github.com/MetaCoq/metacoq/pull/534).
4 A TypeChecker for PCUIC
4.1 Cumulativity
In [46, §3], we present a sound typechecker for Pcuic. To implement typechecking, we first had to develop a reasonably efficient reduction machine and algorithms to decide cumulativity. There are three separate algorithms at play in the implementation of the cumulativity test.
Universes
in Coq are floating variables subject to constraints [26], not directly natural numbers. To ensure consistency, one hence needs to verify that the constraints always have a valuation in the natural numbers. This boils down to deciding the (in)equational theory of the tropical algebra (ℕ, max, +). We develop a longest-simple-paths algorithm to check the consistency of universe constraints: the valuation of each variable can be read off as the weight of its longest simple path from the source. This is a naïve model and implementation of the state-of-the-art algorithm implemented in Coq, which derives from an incremental cycle detection algorithm [11] and whose formal verification is a work in progress [23]. Our specification is more expressive than Coq's current implementation, as it is able to handle arbitrary constraints between universe expressions, avoiding the distinction of so-called algebraic universes and the costly universe-refreshing operations performed when building terms. We hope to integrate this generalization back into Coq's implementation. Using this consistency check, it is easy to define cumulativity by structural recursion on terms.
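As a small worked example of this model (with Set as the source vertex of weight 0, and a constraint $l + n \le l'$ read as an edge of weight $n$ from $l$ to $l'$):

```latex
\{\ \mathrm{Set}+1 \le i,\quad i+1 \le j\ \}
\quad\leadsto\quad
v(i) = 1,\qquad v(j) = 2
```

Adding the constraint $j \le i$ would close a cycle of strictly positive weight through $i$ and $j$, so no valuation into the natural numbers exists and the constraints are reported inconsistent.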
Reduction
We implement a weak-head reduction stack machine that can efficiently find the weak-head normal form of a term. To define this function, we must assume an axiom of strong normalization, which implies that reduction is well-founded on well-typed terms. This is the only axiom used in the development.
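The weak-head strategy can be illustrated by a small Python sketch on untyped λ-terms with de Bruijn indices (a toy model, not the formalized machine): it β-reduces only head redexes, using an explicit argument stack, and leaves subterms under binders and the arguments themselves untouched.

```python
# Terms: ("var", i) | ("lam", body) | ("app", f, a), with de Bruijn indices.

def lift(t, k, n):
    # lift free variables >= k by n
    if t[0] == "var":
        return ("var", t[1] + n) if t[1] >= k else t
    if t[0] == "lam":
        return ("lam", lift(t[1], k + 1, n))
    return ("app", lift(t[1], k, n), lift(t[2], k, n))

def subst(t, k, u):
    # substitute u for variable k in t
    if t[0] == "var":
        if t[1] == k:
            return lift(u, 0, k)
        return ("var", t[1] - 1) if t[1] > k else t
    if t[0] == "lam":
        return ("lam", subst(t[1], k + 1, u))
    return ("app", subst(t[1], k, u), subst(t[2], k, u))

def whnf(t):
    # weak-head reduction with an explicit argument stack
    stack = []
    while True:
        if t[0] == "app":              # push the argument, focus on the head
            stack.append(t[2])
            t = t[1]
        elif t[0] == "lam" and stack:  # beta-reduce the head redex
            t = subst(t[1], 0, stack.pop())
        else:                          # stuck head: rebuild the spine
            while stack:
                t = ("app", t, stack.pop())
            return t
```

For instance, applying the identity to itself reduces to the identity, while a redex under a binder is left as-is, as expected of a weak strategy.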
Conversion
Coq uses a smart, mostly call-by-name conversion algorithm that relies on performance-critical heuristics to decide which constants to unfold and when.
Coq’s algorithm does not naïvely reduce both terms to normal form to compare them, but rather incrementally reduces them to weak-head normal forms (without δ-reduction), compares their heads and calls itself recursively. When faced with the same defined constant on both sides, it first tries to unify their arguments before resorting to unfolding, resulting in faster successes but also potentially costly backtracking. The main difficulty in the development of the conversion algorithm is that its termination and correctness are intertwined, so it is developed as a dependently-typed program that takes well-typed terms as arguments (ensuring termination of recursive calls assuming SN) and returns a proof of their convertibility (or non-convertibility). In other words, it is proven sound and complete by construction. The design of the termination measure also involves a delicate construction of a dependent lexicographic ordering on terms in a stack, due to Théo Winterhalter [56].
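The unfold-or-compare-arguments heuristic can be sketched in Python on a toy first-order fragment (constants and variables only, no β-reduction, and an acyclic environment of definitions; all names here are illustrative):

```python
def head_spine(t):
    # split an application spine ("app", ("app", h, a1), a2) into (h, [a1, a2])
    args = []
    while t[0] == "app":
        args.append(t[2])
        t = t[1]
    args.reverse()
    return t, args

def mkapps(h, args):
    for a in args:
        h = ("app", h, a)
    return h

def conv(env, t, u):
    ht, ta = head_spine(t)
    hu, ua = head_spine(u)
    if ht[0] == "const" and hu[0] == "const" and ht[1] == hu[1]:
        # same constant on both sides: try the arguments first (fast path)...
        if len(ta) == len(ua) and all(conv(env, a, b) for a, b in zip(ta, ua)):
            return True
        # ...and backtrack to (potentially costly) unfolding on failure
        if ht[1] not in env:
            return False
        return conv(env, mkapps(env[ht[1]], ta), mkapps(env[hu[1]], ua))
    if ht[0] == "const" and ht[1] in env:     # unfold the left constant
        return conv(env, mkapps(env[ht[1]], ta), u)
    if hu[0] == "const" and hu[1] in env:     # unfold the right constant
        return conv(env, t, mkapps(env[hu[1]], ua))
    # rigid heads (variables, opaque constants): compare arguments pointwise
    return ht == hu and len(ta) == len(ua) and all(
        conv(env, a, b) for a, b in zip(ta, ua))
```

With definitions a := z and b := z, the terms f a and f b are identified without ever unfolding f: the shared head succeeds on its arguments, which themselves need one unfolding step each.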
4.2 Type Checking
On top of conversion, implementing a type inference algorithm is straightforward: it is a simple structurally recursive function that takes well-formed global and local contexts and a raw term, and checks whether the rule determined by the head of the term applies. Figure 3 shows the type and the beginning of the inference algorithm.
Again, the function is strongly typed: its result lives in the typing_result monad, which is an exception monad, returning (ret) a sigma-type of an inferred type together with a “squashed” proof that the term has this type, or failing with a type error (raise). As all our derivations are in Type by default, we use explicit squashing into Prop when writing programs manipulating terms.
The elimination rules for propositional inductives ensure that our programs cannot make computational choices based on the shape of the squashed derivations, and that extraction will remove these arguments. The extracted version of infer hence only takes a context (assumed to be well-formed) and a term, and returns an error or an inferred type, just like Coq’s implementation.
Using the bidirectional presentation of the system, we can simplify the correctness and completeness proofs of [46], as the algorithm follows the very structure of bidirectional derivations. Type-checking is then simply defined as type inference followed by a conversion test, as usual.
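As an illustration of this structure, here is a Python sketch of inference in an exception monad for a mini simply-typed calculus. The tuple-based ret/raise_/bind only loosely mirror the typing_result monad, and the calculus and names are ours, not Pcuic's.

```python
# A toy exception monad: ("ok", value) or ("err", message).
def ret(x): return ("ok", x)
def raise_(e): return ("err", e)
def bind(r, f): return f(r[1]) if r[0] == "ok" else r

# Terms: ("var", i) | ("lam", domain, body) | ("app", f, a).
# Types: base-type names, or ("arrow", dom, cod).
def infer(ctx, t):
    # structural recursion on the head of the term
    if t[0] == "var":
        i = t[1]
        return ret(ctx[i]) if i < len(ctx) else raise_("unbound variable")
    if t[0] == "lam":
        return bind(infer([t[1]] + ctx, t[2]),
                    lambda cod: ret(("arrow", t[1], cod)))
    if t[0] == "app":
        def check_app(tf):
            if tf[0] != "arrow":
                return raise_("not a function")
            return bind(infer(ctx, t[2]),
                        lambda ta: ret(tf[2]) if ta == tf[1]
                        else raise_("argument type mismatch"))
        return bind(infer(ctx, t[1]), check_app)
    return raise_("ill-formed term")

def check(ctx, t, ty):
    # type checking = inference followed by a conversion test
    # (plain syntactic equality stands in for cumulativity here)
    return bind(infer(ctx, t),
                lambda it: ret(ty) if it == ty else raise_("type mismatch"))
```

Inferring the type of the annotated identity λ(x:A).x yields A → A, and an ill-scoped variable produces an error value instead of an exception, as in the extracted checker.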
4.3 Verifying Global Environments
Once the type-checker for terms is defined, we can lift it to verify global environment declarations. For constants and axioms, this is straightforward. However, declarations of inductive types are more complex: they require first defining a sound context cumulativity test and a strict positivity check, and turning the universe constraints into a graph structure. This is again done using a monad EnvCheck.
Given a global environment, this produces either an error, or a universe graph together with a proof that it models the constraints of the environment and a (squashed) proof that the environment is well-formed.
5 Erasure from PCUIC to λcalculus
The type-checker can be extracted to OCaml and run on reasonably large programs. For example, it can be used to successfully check the prelude of the HoTT library [10], including a large proof that adjoint equivalences can be promoted to homotopy equivalences. However, our first attempt at extraction was unsuccessful: we had to change the Coq definitions so that OCaml could type-check the generated code, as we hit a limitation of the extraction mechanism in the presence of dependently-typed variadic functions. The obvious next step was hence to verify the erasure procedure itself!
In [46, §4], we present a sound erasure procedure from Pcuic to an untyped, call-by-value λ-calculus. This corresponds to the first part of Coq’s extraction mechanism [35], which additionally tries to maintain simple types corresponding to the original program. Erasure is performed by a single traversal of the term, which is expected to be well-typed. It checks whether the given subterm is a type (its type is a sort or an arity) or a proof of a proposition (its type has sort Prop), in which case it returns a dummy term □; otherwise it proceeds recursively, copying the structure of the original term. The result of erasure hence contains no type information anymore, and all propositional content is replaced with □.
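The traversal can be sketched in Python, abstracting away the typing oracle that decides whether a subterm is a type or a proof (the predicate is_irrelevant and the term encoding are illustrative assumptions, not the formalized procedure):

```python
BOX = ("box",)  # the dummy term replacing types and proofs

def erase(is_irrelevant, t):
    # Single traversal: replace computationally irrelevant subterms by a
    # box, copy the structure of the rest. `is_irrelevant` stands in for
    # the typing oracle of the real procedure.
    if is_irrelevant(t):
        return BOX
    if t[0] == "lam":
        return ("lam", erase(is_irrelevant, t[1]))
    if t[0] == "app":
        return ("app", erase(is_irrelevant, t[1]), erase(is_irrelevant, t[2]))
    return t
```

Running it on a term applying a function to a proof replaces only the proof argument with the box, preserving the computational skeleton.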
We can prove a correctness statement for this procedure: if a well-typed term erases to some term, and the original reduces to a value using weak call-by-value reduction, then the erased term also reduces to an observationally equivalent value. The extraction_pre precondition enforces that the environment is well-formed. The proof closely follows Letouzey’s original pen-and-paper proof [36]. Since [46], we proved two additional verified optimization passes on the erased term and environment:

- We remove from the global environment the definitions that are not used for evaluation, pruning it of useless declarations that are no longer needed for the computation.

- We remove dummy pattern-matchings on erased terms, which always trivially reduce to their single branch.
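The first pass, pruning the global environment, can be sketched in Python as a reachability computation over constant names (the encoding and names are illustrative):

```python
def used_consts(t, acc):
    # collect the constants occurring in a term into acc
    if t[0] == "const":
        acc.add(t[1])
    elif t[0] == "lam":
        used_consts(t[1], acc)
    elif t[0] == "app":
        used_consts(t[1], acc)
        used_consts(t[2], acc)
    return acc

def prune(env, main):
    # keep only the declarations transitively reachable from the main term
    keep = set()
    todo = used_consts(main, set())
    while todo:
        c = todo.pop()
        if c in keep or c not in env:
            continue
        keep.add(c)
        used_consts(env[c], todo)   # enqueue the dependencies of c's body
    return {c: b for c, b in env.items() if c in keep}
```

A declaration never mentioned, directly or transitively, by the program being evaluated is dropped; the result evaluates identically since removed constants can never be unfolded.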
The end result of erasure is an untyped term that contains only the raw computational content of the original definition. It can be further compiled with the CertiCoq compiler and CompCert to produce certified assembly code from Coq definitions.
6 Going further
We have presented the whole MetaCoq project, which spans from the reification of Coq terms down to the erasure of well-typed terms to an untyped λ-calculus. The whole project weighs about 100kLoC of OCaml and Coq code, and is still under active development. We think this is a convincing example that we can move from a Trusted Code Base consisting of Coq’s unverified kernel to a Trusted Theory Base that consists of the formalized typing rules of Pcuic and its axiom of Strong Normalization.
The MetaCoq (and CertiCoq) projects are both ongoing work subject to limitations; we summarize here the current state of affairs.
6.1 Limitations
While Pcuic models a large part of Coq’s implementation, it still misses a few important features of the theory:

- The η-conversion rule is not supported in our formalization, preventing us from checking most of the standard library. Dealing with η-rules in an untyped conversion setting is a notoriously hard issue. We are however hopeful that we have found a solution to this problem by quotienting definitional equality with η-reduction, and hope to present this result soon.

- Similarly, we do not yet handle the new SProp sort of Coq. Our methodology for conversion should however also apply to this case.

- We do not yet formalize the guard checking of fixpoint and cofixpoint definitions, relying instead on oracles. Our strong normalization assumption hence includes an assumption of correctness of the guard checkers. We are currently working on integrating a definition of the guard-checking algorithm and verifying its basic metatheory (invariance by renaming, substitution, etc.).

- We did not consider the module system of Coq, which is mostly orthogonal to the core typing algorithm but represents a significant share of Coq’s kernel implementation; we leave this to future work.

- We leave out the so-called “template” polymorphism feature of Coq, which is a somewhat fragile (i.e., prone to bugs), non-modular alternative to cumulative inductive types. This prevents us from checking most of the Coq standard library today, as it makes intensive use of this feature. We are working with the Coq development team to move the standard library to universe polymorphism to sidestep this issue.
6.2 Conclusion and Perspectives
There are many directions in which we plan to extend the project:

- On the specification side, we would like to link our typing judgment to the “Coq en Coq” formalization of Barras [9], which provides the Strong Normalization proof we are assuming, for a slightly different variant of the calculus. This formalization is based on a sized-typing discipline for inductive types, which will require showing an equivalence with Pcuic’s guardedness checker, or updating Pcuic to handle sized typing altogether.

- Proving that the theory is equivalent to a variant where conversion is typed, i.e., where definitional equality is a judgment, would also bring our theory closer to categorical models of type theory, e.g., Categories with Families. This can build on recent results in this direction by Siles and Herbelin [43], updating them to handle cumulativity.

- We have concentrated our verification efforts on the core type-checking algorithm of Coq, but higher-level components like unification, elaboration and the proof engine would also benefit from formal treatment. We hope to tackle these components in the future.

- Finally, on the user side, we are still at the beginning of the exploration of the metaprogramming features of MetaCoq. They could be used, for example, to justify the foundations of the Mtac2 language [30], turning this typed tactic language into a definitional extension of Coq’s theory.
References
 [2] Andreas Abel, Joakim Öhman & Andrea Vezzosi (2018): Decidability of conversion for type theory in type theory. PACMPL 2(POPL), pp. 23:1–23:29, doi:http://dx.doi.org/10.1145/3158111.
 [3] Behzad Akbarpour, Amr T. Abdel-Hamid, Sofiène Tahar & John Harrison (2010): Verifying a Synthesized Implementation of IEEE-754 Floating-Point Exponential Function using HOL. Comput. J. 53(4), pp. 465–488, doi:http://dx.doi.org/10.1093/comjnl/bxp023.
 [4] Thorsten Altenkirch (1993): Constructions, Inductive Types and Strong Normalization. Ph.D. thesis, University of Edinburgh.
 [5] Abhishek Anand, Andrew Appel, Greg Morrisett, Zoe Paraskevopoulou, Randy Pollack, Olivier Savary Belanger, Matthieu Sozeau & Matthew Weaver (2017): CertiCoq: A verified compiler for Coq. In: CoqPL, Paris, France.
 [6] Andrew W. Appel (2014): Program Logics  for Certified Compilers. Cambridge University Press, doi:http://dx.doi.org/10.1017/CBO9781107256552.
 [7] Michaël Armand, Benjamin Grégoire, Arnaud Spiwack & Laurent Théry (2010): Extending Coq with Imperative Features and Its Application to SAT Verification. In Matt Kaufmann & Lawrence C. Paulson, editors: Interactive Theorem Proving, Springer, pp. 83–98, doi:http://dx.doi.org/10.1016/j.jal.2007.07.003.
 [8] Bruno Barras (1999): Autovalidation d’un système de preuves avec familles inductives. Thèse de doctorat, Université Paris 7.
 [9] Bruno Barras (2012): Semantical Investigations in Intuitionistic Set Theory and Type Theories with Inductive Families. Unpublished.
 [10] Andrej Bauer, Jason Gross, Peter LeFanu Lumsdaine, Michael Shulman, Matthieu Sozeau & Bas Spitters (2017): The HoTT library: a formalization of homotopy type theory in Coq. In Yves Bertot & Viktor Vafeiadis, editors: Proceedings of the 6th ACM SIGPLAN Conference on Certified Programs and Proofs, CPP 2017, Paris, France, January 16–17, 2017, ACM, pp. 164–172, doi:http://dx.doi.org/10.1145/3018610.3018615.
 [11] Michael A. Bender, Jeremy T. Fineman, Seth Gilbert & Robert E. Tarjan (2016): A New Approach to Incremental Cycle Detection and Related Problems. ACM Trans. Algorithms 12(2), pp. 14:1–14:22, doi:http://dx.doi.org/10.1145/2756553.
 [12] Guillaume Bertholon, Érik Martin-Dorel & Pierre Roux (2019): Primitive Floats in Coq. In John Harrison, John O’Leary & Andrew Tolmach, editors: 10th International Conference on Interactive Theorem Proving, ITP 2019, September 9–12, 2019, Portland, OR, USA, LIPIcs 141, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, pp. 7:1–7:20, doi:http://dx.doi.org/10.4230/LIPIcs.ITP.2019.7.
 [13] Bohdan Liesnikov, Marcel Ullrich & Yannick Forster (2020): Generating induction principles and subterm relations for inductive types using MetaCoq. The Coq Workshop 2020.
 [14] James Chapman (2009): Type Theory Should Eat Itself. Electron. Notes Theor. Comput. Sci. 228, pp. 21–36, doi:http://dx.doi.org/10.1016/j.entcs.2008.12.114.
 [15] Adam Chlipala (2020): Proof assistants at the hardwaresoftware interface (invited talk). In Jasmin Blanchette & Catalin Hritcu, editors: CPP 2020, ACM, p. 2, doi:http://dx.doi.org/10.1145/3372885.3378575.
 [16] Thierry Coquand & Gérard Huet (1988): The Calculus of Constructions. Information and Computation 76(2–3), pp. 95–120, doi:http://dx.doi.org/10.1016/08905401(88)900053.
 [17] Jean-Christophe Filliâtre (2000): Design of a proof assistant: Coq version 7. Research Report, Université Paris-Sud.
 [18] Jean-Christophe Filliâtre (2020): A Coq retrospective, at the heart of Coq architecture, the genesis of version 7.0. Invited talk at the Coq Workshop 2020.
 [19] Gallium, Marelle, CEDRIC & PPS (2008): The CompCert project. Compilers You Can Formally Trust.
 [20] Eduardo Giménez (1998): Structural Recursive Definitions in Type Theory. In Kim Guldstrand Larsen, Sven Skyum & Glynn Winskel, editors: ICALP, LNCS 1443, Springer, pp. 397–408.

 [21] Georges Gonthier, Andrea Asperti, Jeremy Avigad, Yves Bertot, Cyril Cohen, François Garillot, Stéphane Le Roux, Assia Mahboubi, Russell O’Connor, Sidi Ould Biha, Ioana Pasca, Laurence Rideau, Alexey Solovyev, Enrico Tassi & Laurent Théry (2013): A Machine-Checked Proof of the Odd Order Theorem. In Sandrine Blazy, Christine Paulin-Mohring & David Pichardie, editors: ITP 2013, LNCS 7998, Springer, pp. 163–179, doi:http://dx.doi.org/10.1007/9783642396342_14.
 [22] Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan (Newman) Wu, Jieung Kim, Vilhelm Sjöberg & David Costanzo (2016): CertiKOS: An Extensible Architecture for Building Certified Concurrent OS Kernels. In Kimberly Keeton & Timothy Roscoe, editors: 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2–4, 2016, USENIX Association, pp. 653–669, doi:http://dx.doi.org/10.5555/3026877.3026928.
 [23] Armaël Guéneau, Jacques-Henri Jourdan, Arthur Charguéraud & François Pottier (2019): Formal Proof and Analysis of an Incremental Cycle Detection Algorithm. In: ITP 2019 – 10th Conference on Interactive Theorem Proving, Portland, United States.
 [24] Thomas C. Hales, Mark Adams, Gertrud Bauer, Dat Tat Dang, John Harrison, Truong Le Hoang, Cezary Kaliszyk, Victor Magron, Sean McLaughlin, Thang Tat Nguyen, Truong Quang Nguyen, Tobias Nipkow, Steven Obua, Joseph Pleso, Jason M. Rute, Alexey Solovyev, An Hoai Thi Ta, Trung Nam Tran, Diep Thi Trieu, Josef Urban, Ky Khac Vu & Roland Zumkeller (2015): A formal proof of the Kepler conjecture. CoRR abs/1501.02155.
 [25] John Harrison (2006): Towards self-verification of HOL Light. In Ulrich Furbach & Natarajan Shankar, editors: Proceedings of the third International Joint Conference, IJCAR 2006, LNCS 4130, Springer-Verlag, Seattle, WA, pp. 177–191.
 [26] Hugo Herbelin (2005): Type Inference with Algebraic Universes in the Calculus of Inductive Constructions. Manuscript.

 [27] Simon Huber (2019): Canonicity for Cubical Type Theory. Journal of Automated Reasoning 63(2), pp. 173–210, doi:http://dx.doi.org/10.1007/s1081701894691.
 [28] Guilhem Jaber, Gabriel Lewertowski, Pierre-Marie Pédrot, Matthieu Sozeau & Nicolas Tabareau (2016): The Definitional Side of the Forcing. In Martin Grohe, Eric Koskinen & Natarajan Shankar, editors: LICS ’16, ACM, pp. 367–376, doi:http://dx.doi.org/10.1145/2933575.2935320.
 [29] Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers & Derek Dreyer (2021): Safe systems programming in Rust. Commun. ACM 64(4), pp. 144–152, doi:http://dx.doi.org/10.1145/3418295.
 [30] JanOliver Kaiser, Beta Ziliani, Robbert Krebbers, Yann RégisGianas & Derek Dreyer (2018): Mtac2: typed tactics for backward reasoning in Coq. PACMPL 2(ICFP), pp. 78:1–78:31, doi:http://dx.doi.org/10.1145/3236773.
 [31] Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch & Simon Winwood (2009): seL4: formal verification of an OS kernel. In Jeanna Neefe Matthews & Thomas E. Anderson, editors: SOSP, ACM, pp. 207–220, doi:http://dx.doi.org/10.1145/1629575.1629596.
 [32] Ramana Kumar, Rob Arthan, Magnus O. Myreen & Scott Owens (2016): Self-Formalisation of Higher-Order Logic – Semantics, Soundness, and a Verified Implementation. J. Autom. Reason. 56(3), pp. 221–259, doi:http://dx.doi.org/10.1007/s108170159357x.
 [33] Meven Lennon-Bertrand (2021): Complete Bidirectional Typing for the Calculus of Inductive Constructions. In Liron Cohen & Cezary Kaliszyk, editors: ITP 2021, LIPIcs 193, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, pp. 24:1–24:19, doi:http://dx.doi.org/10.4230/LIPIcs.ITP.2021.24.
 [34] Xavier Leroy (2006): Formal certification of a compiler backend, or: programming a compiler with a proof assistant. In: 33rd symposium Principles of Programming Languages, ACM Press, pp. 42–54.
 [35] Pierre Letouzey (2002): A New Extraction for Coq. In Herman Geuvers & Freek Wiedijk, editors: TYPES’02, LNCS 2646, Springer, pp. 200–219.
 [36] Pierre Letouzey (2004): Programmation fonctionnelle certifiée : l’extraction de programmes dans l’assistant Coq. Thèse de doctorat, Université Paris-Sud.
 [37] Andreas Lööw, Ramana Kumar, Yong Kiam Tan, Magnus O. Myreen, Michael Norrish, Oskar Abrahamsson & Anthony Fox (2019): Verified Compilation on a Verified Processor. In: PLDI 2019, ACM, New York, NY, USA, pp. 1041–1053, doi:http://dx.doi.org/10.1145/3314221.3314622.
 [38] Gregory Michael Malecha (2014): Extensible Proof Engineering in Intensional Type Theory. Ph.D. thesis, Harvard University.
 [39] Christine Paulin-Mohring (1993): Inductive Definitions in the System Coq – Rules and Properties. In Marc Bezem & Jan Friso Groote, editors: Typed Lambda Calculi and Applications, doi:http://dx.doi.org/10.1007/BFb0037116.
 [40] Pierre-Marie Pédrot & Nicolas Tabareau (2017): An effectful way to eliminate addiction to dependence. In: LICS 2017, IEEE Computer Society, pp. 1–12, doi:http://dx.doi.org/10.1109/LICS.2017.8005113.
 [41] Pierre-Marie Pédrot & Nicolas Tabareau (2020): The fire triangle: how to mix substitution, dependent elimination, and effects. Proc. ACM Program. Lang. 4(POPL), pp. 58:1–58:28, doi:http://dx.doi.org/10.1145/3371126.
 [42] Steven Schäfer, Tobias Tebbi & Gert Smolka (2015): Autosubst: Reasoning with de Bruijn Terms and Parallel Substitutions. In Christian Urban & Xingyuan Zhang, editors: ITP 2015, LNCS 9236, Springer, pp. 359–374, doi:http://dx.doi.org/10.1007/9783319221021_24.
 [43] Vincent Siles & Hugo Herbelin (2012): Pure Type System conversion is always typable. J. Funct. Program. 22(2), pp. 153–180, doi:http://dx.doi.org/10.1017/S0956796812000044.
 [44] Matthieu Sozeau (2007): Subset Coercions in Coq. In Thorsten Altenkirch & Conor McBride, editors: TYPES’06, LNCS 4502, Springer, pp. 237–252, doi:http://dx.doi.org/10.1007/9783540744641_16.
 [45] Matthieu Sozeau, Abhishek Anand, Simon Boulier, Cyril Cohen, Yannick Forster, Fabian Kunze, Gregory Malecha, Nicolas Tabareau & Théo Winterhalter (2020): The MetaCoq Project. Journal of Automated Reasoning 64(5), pp. 947–999, doi:http://dx.doi.org/10.1007/s10817019095400.
 [46] Matthieu Sozeau, Simon Boulier, Yannick Forster, Nicolas Tabareau & Théo Winterhalter (2020): Coq Coq Correct! Verifying Typechecking and Erasure for Coq, in Coq. Proceedings of the ACM on Programming Languages 4(POPL), doi:http://dx.doi.org/10.1145/3371076.
 [47] Matthieu Sozeau & Nicolas Tabareau (2014): Universe Polymorphism in Coq. In Gerwin Klein & Ruben Gamboa, editors: ITP 2014, LNCS 8558, Springer, pp. 499–514, doi:http://dx.doi.org/10.1007/9783319089706_32.

 [48] Kathrin Stark, Steven Schäfer & Jonas Kaiser (2019): Autosubst 2: reasoning with multi-sorted de Bruijn terms and vector substitutions. In Assia Mahboubi & Magnus O. Myreen, editors: CPP 2019, ACM, pp. 166–180, doi:http://dx.doi.org/10.1145/3293880.3294101.
 [49] Jonathan Sterling & Carlo Angiuli (2021): Normalization for Cubical Type Theory. CoRR abs/2101.11479.
 [50] Masako Takahashi (1995): Parallel Reductions in λ-Calculus. Inf. Comput. 118(1), pp. 120–127, doi:http://dx.doi.org/10.1006/inco.1995.1057.
 [51] Yong Kiam Tan, Magnus O. Myreen, Ramana Kumar, Anthony C. J. Fox, Scott Owens & Michael Norrish (2019): The verified CakeML compiler backend. J. Funct. Program. 29, p. e2, doi:http://dx.doi.org/10.1017/S0956796818000229.
 [52] The Coq Development Team (2021): The Coq Proof Assistant, doi:http://dx.doi.org/10.5281/zenodo.4501022.
 [53] Amin Timany & Matthieu Sozeau (2017): Consistency of the Predicative Calculus of Cumulative Inductive Constructions (pCuIC). Research Report RR-9105, KU Leuven, Belgium; Inria Paris.
 [54] Amin Timany & Matthieu Sozeau (2018): Cumulative Inductive Types In Coq. In Hélène Kirchner, editor: FSCD, LIPIcs 108, pp. 29:1–29:16, doi:http://dx.doi.org/10.4230/LIPIcs.FSCD.2018.29.
 [55] Benjamin Werner (1997): Sets in types, types in sets. In Martín Abadi & Takayasu Ito, editors: Theoretical Aspects of Computer Software, Springer, pp. 530–546, doi:http://dx.doi.org/10.1007/BFb0014566.
 [56] Théo Winterhalter (2020): Formalisation and MetaTheory of Type Theory. Ph.D. thesis, Université de Nantes. 2020NANT4012.