An exploration of what AI can do without frontier-scale compute — focused on the capabilities current systems, at any size, still fundamentally lack.
A child learns chess in dozens of games — not millions. They carry that knowledge for decades. They notice when they're confused. They ask clarifying questions. They correct themselves when shown they were wrong.
No AI system demonstrates any of this, at any scale. The assumption that frontier compute is the path to genuine intelligence leaves fundamental capabilities unsolved. We think those capabilities are accessible through a different route — smaller, more structured, more honest about what current systems still cannot do.
Current AI has optimized one axis — general competence through scale — more aggressively than any technology in recent memory. The axis it has not optimized is the one children excel at: learning, remembering, and correcting itself efficiently.
Can a system accumulate knowledge across dozens of different structured tasks without earlier knowledge degrading as new tasks are added? No neural approach has yet solved this cleanly at any meaningful scale.
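One classical mitigation for this degradation is rehearsal: interleave a stored sample of earlier tasks while training on new ones. The sketch below illustrates the idea on an invented toy (a bias-free perceptron and two linearly separable tasks); the task setup, mixing rate, and buffer size are assumptions for illustration, not the project's benchmark or method.

```python
import random

random.seed(0)

def make_task(w_true):
    # 2-D points labelled by a task-specific separating direction.
    data = []
    for _ in range(200):
        x = (random.uniform(-1, 1), random.uniform(-1, 1))
        y = 1 if w_true[0] * x[0] + w_true[1] * x[1] > 0 else 0
        data.append((x, y))
    return data

def train(model, data, replay, lr=0.1, mix=0.5):
    # Interleave fresh examples with replayed ones from earlier tasks.
    for x, y in data:
        batch = [(x, y)]
        if replay and random.random() < mix:
            batch.append(random.choice(replay))
        for bx, by in batch:
            pred = 1 if model[0] * bx[0] + model[1] * bx[1] > 0 else 0
            err = by - pred                     # perceptron update
            model[0] += lr * err * bx[0]
            model[1] += lr * err * bx[1]

def accuracy(model, data):
    hits = sum((1 if model[0] * x[0] + model[1] * x[1] > 0 else 0) == y
               for x, y in data)
    return hits / len(data)

task_a = make_task((1.0, 0.3))
task_b = make_task((-0.4, 1.0))

model = [0.0, 0.0]
buffer = []
train(model, task_a, buffer)
buffer.extend(task_a[:50])        # keep a small sample of A for rehearsal
train(model, task_b, buffer)

print(f"task A retention: {accuracy(model, task_a):.2f}")
print(f"task B accuracy:  {accuracy(model, task_b):.2f}")
```

The evaluation protocol matters as much as the fix: retention on task A is measured after training on task B, which is exactly the measurement the question above asks for.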
Can a system learn the rules, structure, and strategy of a new domain from the kind of sparse examples a child works with — tens, not millions — and play competently thereafter?
When told its understanding is wrong, can a system identify the specific rule at fault, revise it, generalize appropriately without over-generalizing, and carry the correction forward?
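A minimal sketch of what "carry the correction forward without over-generalizing" can mean mechanically: store rules in priority order, patch the faulting rule with a specific exception, and regression-test the patch against every previously confirmed case. The pawn-movement rule and the dict-based case format are illustrative assumptions, not the project's actual representation.

```python
def predict(rules, case):
    # First matching rule wins; exceptions sit ahead of the base rule.
    for cond, verdict in rules:
        if cond(case):
            return verdict
    return False

def revise(rules, memory, case, truth):
    # Prepend a specific exception for the counterexample, then confirm
    # every previously confirmed case still comes out right.
    revised = [(lambda c, case=case: c == case, truth)] + rules
    assert all(predict(revised, c) == t for c, t in memory), "over-generalized"
    memory.append((case, truth))
    return revised

# Base hypothesis: a pawn may advance exactly one rank.
rules = [(lambda c: c["to"] - c["frm"] == 1, True)]
memory = [({"frm": 3, "to": 4}, True)]

case = {"frm": 2, "to": 4}          # two-square advance from the start rank
print(predict(rules, case))          # → False: the old rule calls it illegal
rules = revise(rules, memory, case, True)
print(predict(rules, case))          # → True: the correction carries forward
```

A real system would generalize the exception (e.g. to all start-rank pawns) rather than memorize one case, but the regression check against memory is the piece that guards against over-generalizing.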
Can a system estimate how much thought a decision warrants, know when it is uncertain, and ask clarifying questions — rather than generating fluent output indistinguishable from confidence?
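One simple way to operationalize "know when it is uncertain" is confidence gating: act when the predictive distribution over actions is peaked, defer and ask when it is flat. The sketch below uses normalized entropy; the threshold and the example distributions are illustrative, not calibrated values.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decide(probs, actions, threshold=0.6):
    # Normalized entropy: 0 when one action dominates, 1 when the model
    # is effectively guessing and should ask a clarifying question.
    h = entropy(probs) / math.log2(len(probs))
    if h > threshold:
        return ("ask", round(h, 2))
    return (actions[probs.index(max(probs))], round(h, 2))

print(decide([0.9, 0.05, 0.05], ["a", "b", "c"]))   # peaked: acts
print(decide([0.4, 0.35, 0.25], ["a", "b", "c"]))   # flat: asks
```

The same gate can also allocate effort: a peaked distribution warrants a shallow search, a flat one warrants deeper deliberation before committing.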
Games factor out perception and physics while preserving the cognitive capabilities that matter: rule parsing, state tracking, strategic reasoning, memory across tasks, and transfer between related problems.
The benchmark is tiered. Each game must be learned within a tight sample budget — the claim is not that the system can learn, but that it can learn efficiently.
No single technique handles all of them. The research specifically targets games that force architectural diversity — a system that only masters one category has not demonstrated general learning.
Chess, Reversi, Connect Four. Both players see the full board. The AI challenge is searching deeply enough to see tactical threats while evaluating positions accurately. Raw computation helps, but learned evaluation helps more.
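The search-plus-evaluation combination this paragraph describes can be sketched as negamax with alpha-beta pruning. Tic-tac-toe is used here so the search is exact and small; in a deeper game like chess, the recursion would cut off at a depth limit and call a learned evaluation function instead of searching to the end. The position below is invented for the example.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] and b[i] == b[j] == b[k]:
            return b[i]
    return None

def negamax(b, player, alpha=-2, beta=2):
    # Returns (score, move) from the perspective of the player to move.
    w = winner(b)
    if w:
        return ((1 if w == player else -1), None)
    if all(b):
        return (0, None)                    # draw
    best = (-2, None)
    for m in range(9):
        if b[m]:
            continue
        b[m] = player
        score = -negamax(b, "O" if player == "X" else "X", -beta, -alpha)[0]
        b[m] = None
        if score > best[0]:
            best = (score, m)
        alpha = max(alpha, score)
        if alpha >= beta:
            break       # prune: the opponent will never allow this line
    return best

board = [None] * 9
board[0] = "X"; board[4] = "X"; board[1] = "O"
score, move = negamax(board, "X")
print(score, move)      # X to move has a forced win
```

Alpha-beta supplies the "searching deeply enough" half; the quality of the evaluation at the cutoff supplies the "evaluating positions accurately" half.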
Card games — poker, gin rummy, Go Fish. The opponent's hand is hidden. The system must reason about probability distributions, infer hidden state from observed behavior, and make decisions under genuine uncertainty. Minimax breaks here.
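A common replacement for minimax under hidden information is determinization: sample hidden states consistent with what has been observed, evaluate each action across the sampled worlds, and act on the average. The one-card betting game below is an invented toy to show the loop, not one of the benchmark games; the payoffs and sample count are assumptions.

```python
import random

random.seed(1)

def win_probability(my_card, seen, n_samples=2000):
    # The opponent's card is uniform over the cards we have not seen.
    unseen = [c for c in range(1, 14) if c not in seen]
    wins = sum(my_card > random.choice(unseen) for _ in range(n_samples))
    return wins / n_samples

def act(my_card, seen):
    p = win_probability(my_card, seen)
    # Betting pays +1 on a win, -1 on a loss; folding forfeits a 0.25 ante.
    ev_bet = p * 1 + (1 - p) * -1
    return "bet" if ev_bet > -0.25 else "fold"

print(act(11, {11, 2}))     # → bet: a high card against mostly-lower unseen
print(act(3, {3, 12}))      # → fold: the sampled worlds rarely favor us
```

Inferring hidden state from observed behavior would sharpen the sampling distribution further: an opponent who raised is unlikely to hold a low card, so the samples should be weighted accordingly rather than drawn uniformly.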
Backgammon, Chutes & Ladders, dice games. Outcomes depend partly on chance. The AI must reason about expected values, manage risk, and adapt strategy to probability rather than certainty. Plans must be robust to bad luck.
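The piece that distinguishes search here from plain minimax is the chance node: dice outcomes are averaged by probability rather than chosen adversarially. A minimal expectimax sketch, with a mini-game invented for illustration (a guaranteed 2 points versus a one-in-three shot at 9):

```python
from fractions import Fraction

def expectimax(node):
    kind = node[0]
    if kind == "leaf":
        return Fraction(node[1])
    if kind == "max":        # our move: pick the best child
        return max(expectimax(c) for c in node[1])
    if kind == "chance":     # dice: probability-weighted average of children
        return sum(Fraction(p) * expectimax(c) for p, c in node[1])

risky = ("chance", [(Fraction(1, 3), ("leaf", 9)),
                    (Fraction(2, 3), ("leaf", 0))])
tree = ("max", [("leaf", 2), risky])
print(expectimax(tree))      # → 3: E[risky] = 9/3 beats the sure 2
```

Expected value alone is not the whole story, though: robustness to bad luck means a policy may reasonably prefer the sure 2 points when trailing variance is costly, which is where risk management enters.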
Bridge, Spades, Euchre. You share a goal with a partner but can't communicate freely. The AI must model what its partner knows, signal through play, and coordinate strategy through limited information channels. The hardest category.
Everything — parsing, training, inference, evaluation — must run on one commodity machine. No cloud APIs as hidden capability, no pre-trained frontier models doing the real work. The constraint is the point.
Sample budgets are published. Novel test games are generated after the system is frozen. Failure on the stated benchmark is not a footnote but the primary scientific outcome.
This is research, not a product launch. Most individual techniques are classical. The novelty is in integration and the capabilities demonstrated — not in invented mathematics. We do not claim more than we have shown.