On Compositional Learning Behaviours in Formal Mathematics

Abstract

Self-evolving scientific agents capable of conquering the hard tail of formal mathematics require Compositional Learning Behaviours (CLBs) – the capacity to ground and recombine novel symbolic structures in context, beyond mere recombination of prelearned atoms. We propose S2B-LM, an adaptation of the Symbolic Behaviour Benchmark that removes numerical processing as a confound and adds chain-of-thought scaffolding to elicit rather than merely probe latent CLB competency. Cross-evaluating ten Lean 4 theorem provers on CLB competency (adj-ZSCT) and miniF2F whole-proof performance, exact permutation tests establish a hierarchical necessity structure: search-heavy models cover the tractable bulk without detectable CLBs, yet every model breaking into the Olympiad-level tier (miniF2F >75%) is among the five highest CLB scorers (p=0.004). After ruling out model scale as a confound, our results show that CLB competency is necessary but not sufficient for the hard tail of formal mathematical verification.

Kevin Denamganaï
Kevin Denamganaï
Independent Researcher

My research investigates the conditions under which AI systems acquire and deploy structured symbolic representations — towards in-context grounding of novel atomic symbols and their systematic recombination into unseen configurations — spanning Compositional Generalisation, Formal Mathematics, Differentiable Language Models, and Physical Simulation. I have also investigated Language Emergence & Grounding (Emergent Communication), Unsupervised Representation Learning, Natural Language Processing, and Multi-Agent Deep Reinforcement Learning.

Related