Almost three weeks ago, I proposed a collaborative project, combining the efforts of professional and amateur mathematicians, automated theorem provers, AI tools, and the proof assistant language Lean, to describe the implication graph relating the 4694 equational laws for magmas that can be expressed using up to four invocations of the magma operation. That is to say, one needs to determine the truth or falsity of each possible implication between pairs of these 4694 laws.
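To get a rough sense of scale (a back-of-the-envelope count on my part; the project's bookkeeping convention may differ slightly, for instance in whether the trivial implication of each law from itself is counted), each ordered pair of distinct laws gives one candidate implication, for a total of

$\displaystyle 4694 \times 4693 = 22{,}028{,}942$

candidate implications (or $4694^2 = 22{,}033{,}636$ if the self-implications are also included).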
The project was launched on the day of the blog post, and has been running for a hectic 19 days thus far; see my personal log of the project for a day-by-day summary of events. From the perspective of raw implications resolved, the project is (at the time of writing) 99.9963% complete: of the implications to resolve, the vast majority have been proven to be true or to be false, and only a few hundred remain open, although even within this set there are implications that we conjecture to be false and expect to be able to formally disprove soon. For reasons of compilation efficiency, we do not record the proof of every single one of these assertions in Lean; we only prove a smaller set of implications in Lean, which then imply the broader set through transitivity (for instance, using the fact that if one equation implies a second, and the second implies a third, then the first implies the third); we will also shortly implement a further reduction utilizing a duality symmetry of the implication graph (coming from reversing the order of the magma operation).
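As a minimal Lean sketch of this transitivity reduction (purely schematic, with placeholder law names rather than the project's actual infrastructure), a composed implication never needs to be stored as a separate proof:

```lean
-- Schematic: if proofs of LawX → LawY and LawY → LawZ are already recorded,
-- then LawX → LawZ follows by composing them, so it need not be proved
-- (or stored) separately.
theorem implies_trans {LawX LawY LawZ : Prop}
    (hXY : LawX → LawY) (hYZ : LawY → LawZ) : LawX → LawZ :=
  fun hX => hYZ (hXY hX)
```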
Thanks to the tireless efforts of many volunteer contributors to the project, we now have a number of nice visualization tools to inspect various portions of the (not quite completed) implication graph. For instance, this graph depicts all the consequences of Equation 1491, which I have nicknamed the “Obelix law” (it has a companion, the “Asterix law”, Equation 65). And here is a table of all the equational laws we are studying, together with a count of how many laws each implies, or is implied by. These interfaces are also somewhat integrated with Lean: for instance, you can click here to try your hand at showing that the Obelix law implies Equation 359; I’ll leave this as a challenge (a four-line proof in Lean is possible).
Over the last few weeks, I’ve learned that many of these laws have previously appeared in the literature, and have compiled a “tour” of these equations here. For instance, in addition to the very well known commutative law (Equation 43) and associative law (Equation 4512), several equations (Equation 14, Equation 29, Equation 381, Equation 3722, and Equation 3744) have appeared in Putnam math competitions; Equation 168 defines a fascinating structure, known as a “central groupoid”, that was studied in particular by Evans and by Knuth, and was a key inspiration for the Knuth-Bendix completion algorithm; and Equation 1571 classifies abelian groups of exponent two.
Thanks to the Birkhoff completeness theorem, if one equational law implies another, then the implication can be proven by a finite number of rewrite operations; however, the chain of rewrites needed can be quite long. The implication of Equation 359 from Equation 1491 mentioned above is already moderately challenging, requiring four or five rewrites; the implication of Equation 2 from Equation 1689 is incredibly long (try it!). Nevertheless, standard automated theorem provers, such as Vampire, are quite capable of proving the vast majority of these implications.
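To give the flavour of such a rewriting proof in Lean (a toy example of my own, with a self-contained magma structure and made-up laws, not one of the project's numbered implications or its actual API), one introduces the variables and then rewrites with the hypothesis until both sides agree:

```lean
-- A toy magma is just a carrier with a binary operation.
structure ToyMagma (α : Type) where
  op : α → α → α

-- Toy implication: if the operation always returns its second argument
-- ("Law A"), then (x ◇ y) ◇ z = z for all x, y, z ("Law B").
theorem lawA_implies_lawB {α : Type} (m : ToyMagma α)
    (h : ∀ x y, m.op x y = y) :
    ∀ x y z, m.op (m.op x y) z = z := by
  intro x y z
  -- rewrite with the hypothesis until the two sides coincide
  simp only [h]
```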
More subtle are the anti-implications, in which we have to show that one law does not imply another. In principle, one just has to exhibit a magma that obeys the first law but not the second. In a large fraction of cases, one can simply search through small finite magmas – such as magmas on two, three, or four elements – to obtain such an anti-implication; but small finite magmas do not always suffice, and in fact we know of anti-implications that can only be proven through the construction of an infinite magma. For instance, the “Asterix law” is now known (from the efforts of this project) not to imply the “Obelix law”, but all counterexamples are necessarily infinite. Curiously, the known constructions have some affinity with the famous technique of forcing in set theory, in that we continually add “generic” elements to a (partial) magma in order to force a counterexample with certain specified properties to exist, though the constructions here are certainly far simpler than their set-theoretic counterparts.
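Here is a minimal sketch of the finite-counterexample strategy (again with toy laws of my own choosing, not the project's numbering or its search tooling): a finite magma is specified by its multiplication table, and on a two-element carrier Lean can verify by exhaustive evaluation that one law holds while another fails:

```lean
import Mathlib
-- (importing Mathlib guarantees the decidability instances that `decide`
-- needs for quantifiers over `Fin 2`)

-- A two-element magma given by its multiplication table: x ◇ y = x
-- (the "left projection" table).
def tbl : Fin 2 → Fin 2 → Fin 2
  | x, _ => x

-- This magma is idempotent but not commutative, so the idempotent law cannot
-- imply the commutative law; both checks are decidable on a finite carrier.
example : (∀ x : Fin 2, tbl x x = x) ∧ ¬ (∀ x y : Fin 2, tbl x y = tbl y x) := by
  decide
```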
We have also obtained profitable mileage out of constructions of “linear” magmas over both commutative and non-commutative rings, and of free magmas associated to “confluent” equational laws (and, more generally, to laws with complete rewriting systems). As such, the number of unresolved implications continues to shrink at a steady pace, although we are not yet ready to declare victory on the project.
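As a small illustration of the linear-magma idea (a toy instance chosen by me for concreteness, and assuming a toolchain where the `omega` tactic is available, as it is with recent Lean or Mathlib; the project's constructions over commutative and non-commutative rings are considerably more general), take the integers with the operation $x \diamond y := x - y$. The law $x = y \diamond (y \diamond x)$ then holds identically, yet commutativity fails, so this infinite magma witnesses the corresponding anti-implication:

```lean
import Mathlib

-- A "linear" magma on the integers: x ◇ y := x - y.
def linOp (x y : Int) : Int := x - y

-- The law  x = y ◇ (y ◇ x)  holds identically (a linear identity)...
example : ∀ x y : Int, x = linOp y (linOp y x) := by
  intro x y
  simp only [linOp]
  omega

-- ...but the commutative law fails (e.g. at the pair (0, 1)), so the first
-- law cannot imply the second.
example : ¬ ∀ x y : Int, linOp x y = linOp y x := by
  intro h
  have h01 := h 0 1
  simp only [linOp] at h01
  omega
```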
After a quite hectic amount of back-end setup and “putting out fires”, the project is now running fairly smoothly, with activity coordinated on a Lean Zulip channel, all contributions going through a pull request process on GitHub, and progress tracked via an issues-based GitHub project, with invaluable oversight provided by the other two maintainers of the project, Pietro Monticone and Shreyas Srinivas. In contrast to the prior PFR formalization project, the workflow follows standard GitHub practices and proceeds roughly as follows: if, during the course of the Zulip discussion, it becomes clear that some specific task needs to be done to move the project forward (e.g., to formalize in Lean the proof of an implication that had been worked out in the discussion threads), an “issue” is created (often by myself or one of the other maintainers), which other contributors can then “claim”, work on separately (using a local copy of the main GitHub repository), and then submit a “pull request” to merge their contribution back into the main repository. This request can then be reviewed by both maintainers and other contributors, and if approved, closes the relevant issue.
More generally, we are trying to document all of the processes and lessons learned from this setup; this will form part of a forthcoming paper on the project, which we are now in the preliminary stages of planning and which will likely include dozens of authors.
While the project is still ongoing, I can say that I am quite satisfied with the progress accomplished to date, and that many of my hopes for such a project have already been realized. On the scientific side, we have discovered some new techniques and constructions to show that a given equational theory does not imply another one, and have also discovered some exotic algebraic structures, such as the “Asterix” and “Obelix” pair, that have interesting features and would likely not have been discovered by any means other than the type of systematic search conducted here. The participants are very diverse, ranging from mathematicians and computer scientists at all stages of their careers, to interested students and amateurs. The Lean platform has worked well in integrating both human-generated and machine-generated contributions; the latter are numerically by far the largest source of contributions, but many of the automatically generated results were first obtained in special cases by humans, and then generalized and formalized (often by different members of the project). We still make many informal mathematical arguments on the discussion thread, but they tend to be rapidly formalized in Lean, at which point disputes about correctness disappear, and we can instead focus on how best to deploy various verified techniques to tackle the remaining implications.
Perhaps the only thing that I was expecting to see at this point that has not yet materialized is significant contributions from modern AI tools. They are being used in a number of secondary ways in this project, for instance through tools such as GitHub Copilot to speed up the writing of Lean proofs, the LaTeX blueprint, and other software code, and several of our visualization tools were also largely co-written using large language models such as Claude. However, for the core task of resolving implications, the more “good old-fashioned” automated theorem provers have so far proven superior. That said, most of the remaining 700 or so implications are not amenable to these older tools, and several (particularly the ones involving “Asterix” and “Obelix”) had stymied the human collaborators for several days, so I can still see modern AI playing a more active role in finishing off the hardest and most stubborn of the remaining implications.