An exploration of what AI can do without frontier-scale compute — focused on the capabilities current systems, at any size, still fundamentally lack.
A child learns chess in dozens of games — not millions. They carry that knowledge for decades. They notice when they're confused. They ask clarifying questions. They correct themselves when shown they were wrong.
No AI system demonstrates any of this, at any scale. The assumption that frontier compute is the path to genuine intelligence leaves fundamental capabilities unsolved. We think those capabilities are accessible through a different route — smaller, more structured, more honest about what current systems still cannot do.
Current AI has optimized one axis — general competence through scale — more aggressively than any technology in recent memory. The axis it has not optimized is the one children excel at: learning, remembering, and correcting itself efficiently.
Can a system accumulate knowledge across dozens of different structured tasks without earlier knowledge degrading as new tasks are added? No neural approach has yet solved this cleanly at any meaningful scale.
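One classical mitigation for this degradation is rehearsal: interleave a stored sample of earlier tasks while training on new ones. The sketch below illustrates the idea on an invented toy (a bias-free perceptron and two linearly separable tasks); the task setup, mixing rate, and buffer size are assumptions for illustration, not the project's benchmark or method.

```python
import random

random.seed(0)

def make_task(w_true):
    # 2-D points labelled by a task-specific separating direction.
    data = []
    for _ in range(200):
        x = (random.uniform(-1, 1), random.uniform(-1, 1))
        y = 1 if w_true[0] * x[0] + w_true[1] * x[1] > 0 else 0
        data.append((x, y))
    return data

def train(model, data, replay, lr=0.1, mix=0.5):
    # Interleave fresh examples with replayed ones from earlier tasks.
    for x, y in data:
        batch = [(x, y)]
        if replay and random.random() < mix:
            batch.append(random.choice(replay))
        for bx, by in batch:
            pred = 1 if model[0] * bx[0] + model[1] * bx[1] > 0 else 0
            err = by - pred                     # perceptron update
            model[0] += lr * err * bx[0]
            model[1] += lr * err * bx[1]

def accuracy(model, data):
    hits = sum((1 if model[0] * x[0] + model[1] * x[1] > 0 else 0) == y
               for x, y in data)
    return hits / len(data)

task_a = make_task((1.0, 0.3))
task_b = make_task((-0.4, 1.0))

model = [0.0, 0.0]
buffer = []
train(model, task_a, buffer)
buffer.extend(task_a[:50])        # keep a small sample of A for rehearsal
train(model, task_b, buffer)

print(f"task A retention: {accuracy(model, task_a):.2f}")
print(f"task B accuracy:  {accuracy(model, task_b):.2f}")
```

The evaluation protocol matters as much as the fix: retention on task A is measured after training on task B, which is exactly the measurement the question above asks for.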
Can a system learn the rules, structure, and strategy of a new domain from the kind of sparse examples a child works with — tens, not millions — and play competently thereafter?
When told its understanding is wrong, can a system identify the specific rule at fault, revise it, generalize appropriately without over-generalizing, and carry the correction forward?
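A minimal sketch of what "carry the correction forward without over-generalizing" can mean mechanically: store rules in priority order, patch the faulting rule with a specific exception, and regression-test the patch against every previously confirmed case. The pawn-movement rule and the dict-based case format are illustrative assumptions, not the project's actual representation.

```python
def predict(rules, case):
    # First matching rule wins; exceptions sit ahead of the base rule.
    for cond, verdict in rules:
        if cond(case):
            return verdict
    return False

def revise(rules, memory, case, truth):
    # Prepend a specific exception for the counterexample, then confirm
    # every previously confirmed case still comes out right.
    revised = [(lambda c, case=case: c == case, truth)] + rules
    assert all(predict(revised, c) == t for c, t in memory), "over-generalized"
    memory.append((case, truth))
    return revised

# Base hypothesis: a pawn may advance exactly one rank.
rules = [(lambda c: c["to"] - c["frm"] == 1, True)]
memory = [({"frm": 3, "to": 4}, True)]

case = {"frm": 2, "to": 4}          # two-square advance from the start rank
print(predict(rules, case))          # → False: the old rule calls it illegal
rules = revise(rules, memory, case, True)
print(predict(rules, case))          # → True: the correction carries forward
```

A real system would generalize the exception (e.g. to all start-rank pawns) rather than memorize one case, but the regression check against memory is the piece that guards against over-generalizing.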
Can a system estimate how much thought a decision warrants, know when it is uncertain, and ask clarifying questions — rather than generating fluent output indistinguishable from confidence?
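One simple way to operationalize "know when it is uncertain" is confidence gating: act when the predictive distribution over actions is peaked, defer and ask when it is flat. The sketch below uses normalized entropy; the threshold and the example distributions are illustrative, not calibrated values.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decide(probs, actions, threshold=0.6):
    # Normalized entropy: 0 when one action dominates, 1 when the model
    # is effectively guessing and should ask a clarifying question.
    h = entropy(probs) / math.log2(len(probs))
    if h > threshold:
        return ("ask", round(h, 2))
    return (actions[probs.index(max(probs))], round(h, 2))

print(decide([0.9, 0.05, 0.05], ["a", "b", "c"]))   # peaked: acts
print(decide([0.4, 0.35, 0.25], ["a", "b", "c"]))   # flat: asks
```

The same gate can also allocate effort: a peaked distribution warrants a shallow search, a flat one warrants deeper deliberation before committing.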
Games factor out perception and physics while preserving the cognitive capabilities that matter: rule parsing, state tracking, strategic reasoning, memory across tasks, and transfer between related problems.
The benchmark is tiered. Each game must be learned within a tight sample budget — the claim is not that the system can learn, but that it can learn efficiently.
No single technique handles all of them. The research specifically targets games that force architectural diversity — a system that only masters one category has not demonstrated general learning.
Chess, Reversi, Connect Four. Both players see the full board. The AI challenge is searching deeply enough to see tactical threats while evaluating positions accurately. Raw computation helps, but learned evaluation helps more.
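The search-plus-evaluation combination this paragraph describes can be sketched as negamax with alpha-beta pruning. Tic-tac-toe is used here so the search is exact and small; in a deeper game like chess, the recursion would cut off at a depth limit and call a learned evaluation function instead of searching to the end. The position below is invented for the example.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] and b[i] == b[j] == b[k]:
            return b[i]
    return None

def negamax(b, player, alpha=-2, beta=2):
    # Returns (score, move) from the perspective of the player to move.
    w = winner(b)
    if w:
        return ((1 if w == player else -1), None)
    if all(b):
        return (0, None)                    # draw
    best = (-2, None)
    for m in range(9):
        if b[m]:
            continue
        b[m] = player
        score = -negamax(b, "O" if player == "X" else "X", -beta, -alpha)[0]
        b[m] = None
        if score > best[0]:
            best = (score, m)
        alpha = max(alpha, score)
        if alpha >= beta:
            break       # prune: the opponent will never allow this line
    return best

board = [None] * 9
board[0] = "X"; board[4] = "X"; board[1] = "O"
score, move = negamax(board, "X")
print(score, move)      # X to move has a forced win
```

Alpha-beta supplies the "searching deeply enough" half; the quality of the evaluation at the cutoff supplies the "evaluating positions accurately" half.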
Card games — poker, gin rummy, Go Fish. The opponent's hand is hidden. The system must reason about probability distributions, infer hidden state from observed behavior, and make decisions under genuine uncertainty. Minimax breaks here.
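A common replacement for minimax under hidden information is determinization: sample hidden states consistent with what has been observed, evaluate each action across the sampled worlds, and act on the average. The one-card betting game below is an invented toy to show the loop, not one of the benchmark games; the payoffs and sample count are assumptions.

```python
import random

random.seed(1)

def win_probability(my_card, seen, n_samples=2000):
    # The opponent's card is uniform over the cards we have not seen.
    unseen = [c for c in range(1, 14) if c not in seen]
    wins = sum(my_card > random.choice(unseen) for _ in range(n_samples))
    return wins / n_samples

def act(my_card, seen):
    p = win_probability(my_card, seen)
    # Betting pays +1 on a win, -1 on a loss; folding forfeits a 0.25 ante.
    ev_bet = p * 1 + (1 - p) * -1
    return "bet" if ev_bet > -0.25 else "fold"

print(act(11, {11, 2}))     # → bet: a high card against mostly-lower unseen
print(act(3, {3, 12}))      # → fold: the sampled worlds rarely favor us
```

Inferring hidden state from observed behavior would sharpen the sampling distribution further: an opponent who raised is unlikely to hold a low card, so the samples should be weighted accordingly rather than drawn uniformly.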
Backgammon, Chutes & Ladders, dice games. Outcomes depend partly on chance. The AI must reason about expected values, manage risk, and adapt strategy to probability rather than certainty. Plans must be robust to bad luck.
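The piece that distinguishes search here from plain minimax is the chance node: dice outcomes are averaged by probability rather than chosen adversarially. A minimal expectimax sketch, with a mini-game invented for illustration (a guaranteed 2 points versus a one-in-three shot at 9):

```python
from fractions import Fraction

def expectimax(node):
    kind = node[0]
    if kind == "leaf":
        return Fraction(node[1])
    if kind == "max":        # our move: pick the best child
        return max(expectimax(c) for c in node[1])
    if kind == "chance":     # dice: probability-weighted average of children
        return sum(Fraction(p) * expectimax(c) for p, c in node[1])

risky = ("chance", [(Fraction(1, 3), ("leaf", 9)),
                    (Fraction(2, 3), ("leaf", 0))])
tree = ("max", [("leaf", 2), risky])
print(expectimax(tree))      # → 3: E[risky] = 9/3 beats the sure 2
```

Expected value alone is not the whole story, though: robustness to bad luck means a policy may reasonably prefer the sure 2 points when trailing variance is costly, which is where risk management enters.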
Bridge, Spades, Euchre. You share a goal with a partner but can't communicate freely. The AI must model what its partner knows, signal through play, and coordinate strategy through limited information channels. The hardest category.
Everything — parsing, training, inference, evaluation — must run on one commodity machine. No cloud APIs as hidden capability, no pre-trained frontier models doing the real work. The constraint is the point.
Sample budgets are published. Novel test games are generated after the system is frozen. Failure on the stated benchmark is not a footnote but the primary scientific outcome.
This is research, not a product launch. Most individual techniques are classical. The novelty is in integration and the capabilities demonstrated — not in invented mathematics. We do not claim more than we have shown.