A Research Program

Small models. Genuine learning. A different path for machine intelligence.

An exploration of what AI can do without frontier-scale compute — focused on the capabilities current systems, at any size, still fundamentally lack.

Substrate: one laptop
Target: continual learning
Status: Phase 5 — card games
Demo: 7 games playable
FIG. 01 — THE LEARNING LOOP: parse new rules from few examples → reason step by step with a readable trace → reflect and correct from human feedback → retain structured knowledge without forgetting. Evaluated on novel rules generated after build; asks when unsure; runs on one machine.

The brain runs on twenty watts.

A child learns chess in dozens of games — not millions. They carry that knowledge for decades. They notice when they're confused. They ask clarifying questions. They correct themselves when shown they were wrong.

No AI system demonstrates any of this, at any scale. The assumption that frontier compute is the path to genuine intelligence leaves fundamental capabilities unsolved. We think those capabilities are accessible through a different route — smaller, more structured, more honest about what current systems still cannot do.

A different kind of machine intelligence

Frontier LLMs

Large. Trained once. Frozen.

  • Require megawatts of training compute and billions of parameters to function
  • Cannot meaningfully learn from experience after deployment
  • Forget corrections at the end of each conversation
  • Hallucinate confidently — cannot tell what they don't know
  • Contamination means benchmarks measure memorization, not capability
  • Opaque weights; internals resistant to inspection or audit
Thin AI

Small. Structured. Continually learning.

  • Runs end-to-end on a laptop, trained and deployed on the same machine
  • Learns new domains from few examples and retains them indefinitely
  • Persistently integrates corrections into its working models
  • Tracks its own confidence; knows when to ask rather than guess
  • Evaluated on novel tasks generated after the system is built
  • Interpretable by construction — reasoning is traceable step by step

The capabilities no current system demonstrates — at any scale.

Current AI has optimized one axis — general competence through scale — more aggressively than any technology in recent memory. The axis it has not optimized is the one children excel at: learning, remembering, and correcting efficiently.

Q.01

Continual learning without catastrophic forgetting.

Can a system accumulate knowledge across dozens of different structured tasks without earlier knowledge degrading as new tasks are added? No neural approach has yet solved this cleanly at any meaningful scale.
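The failure this question targets has a standard measurement: a forgetting score per task, defined as peak accuracy minus final accuracy as later tasks are trained. A minimal sketch (the task names and accuracy numbers are illustrative, not results):

```python
def forgetting(history):
    """Per-task forgetting: peak accuracy minus final accuracy.

    history maps a task name to the accuracies measured for that task
    after each successive training phase.
    """
    return {task: max(accs) - accs[-1] for task, accs in history.items() if accs}

# Illustrative numbers: the first game erodes as later games are added.
history = {
    "tic-tac-toe": [0.90, 0.85, 0.60],  # learned first, then degraded
    "nim": [0.95, 0.95],                # learned second, fully retained
}
scores = forgetting(history)  # {"tic-tac-toe": 0.30, "nim": 0.0}
```

A system that meets the benchmark keeps every score near zero while the task count grows.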

Q.02

Sample-efficient acquisition of novel rule systems.

Can a system learn the rules, structure, and strategy of a new domain from the kind of sparse examples a child works with — tens, not millions — and play competently thereafter?

Q.03

Corrections as first-class updates.

When told its understanding is wrong, can a system identify the specific rule at fault, revise it, generalize appropriately without over-generalizing, and carry the correction forward?
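One concrete shape for this is a rule store where a correction targets a single named rule, keeps the superseded text on an audit trail, and applies everywhere that rule is consulted afterward. A hypothetical sketch; the Rule and RuleBook names are illustrative, not the project's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    name: str
    text: str
    revisions: list = field(default_factory=list)  # (old_text, reason) pairs

class RuleBook:
    def __init__(self):
        self.rules = {}

    def add(self, name, text):
        self.rules[name] = Rule(name, text)

    def correct(self, name, new_text, reason):
        """Revise one named rule; the old version stays on the audit trail."""
        rule = self.rules[name]  # the specific rule at fault, not the whole model
        rule.revisions.append((rule.text, reason))
        rule.text = new_text

book = RuleBook()
book.add("castling", "the king may castle at any time")
book.correct("castling", "the king may not castle after it has moved",
             reason="human correction, game 12")
```

The point of the structure is that the correction is scoped: one rule changes, its history is preserved, and nothing else is disturbed.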

Q.04

Metacognitive effort allocation.

Can a system estimate how much thought a decision warrants, know when it is uncertain, and ask clarifying questions — rather than generating fluent output indistinguishable from confidence?
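Operationally, metacognition can start as a confidence gate over candidate moves: if no option clears a threshold, the system asks instead of playing. A toy sketch, with the threshold value and question text as placeholders:

```python
def act_or_ask(move_probs, threshold=0.6):
    """Play the highest-confidence move, or ask when nothing clears the bar."""
    move, conf = max(move_probs.items(), key=lambda kv: kv[1])
    if conf < threshold:
        return ("ask", "Can you clarify the rule this position depends on?")
    return ("play", move)

act_or_ask({"e4": 0.91, "d4": 0.09})              # -> ("play", "e4")
act_or_ask({"e4": 0.40, "d4": 0.35, "c4": 0.25})  # no clear winner: asks
```

The hard research problem is making the confidence estimate calibrated; the gate itself is trivial once the estimate can be trusted.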

Games are the cleanest testbed.

Games factor out perception and physics while preserving the cognitive capabilities that matter: rule parsing, state tracking, strategic reasoning, memory across tasks, and transfer between related problems.

The benchmark is tiered. Each game must be learned within a tight sample budget — the claim is not that the system can learn, but that it can learn efficiently.


Tier  Games                                   Status    Budget
I     tic-tac-toe · nim · chutes & ladders    playable  ≤ 10 games
II    connect four · reversi · mancala        playable  ≤ 30 games
III   checkers · backgammon · gin rummy       upcoming  ≤ 50 games
IV    chess · scrabble · risk                 upcoming  ≤ 100 games
V     novel games · procedurally generated    upcoming  transfer test

Each category demands a different kind of intelligence.

No single technique handles all of them. The research specifically targets games that force architectural diversity — a system that only masters one category has not demonstrated general learning.

Perfect information

Everything is visible. Depth is the challenge.

Chess, Reversi, Connect Four. Both players see the full board. The AI challenge is searching deeply enough to see tactical threats while evaluating positions accurately. Raw computation helps, but learned evaluation helps more.
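The classical engine for this category is alpha-beta search. Using Nim (remove 1 to 3 stones, last stone wins) as the smallest perfect-information example, a negamax sketch; in a larger game the exact terminal score would be replaced by a learned evaluation at a depth cutoff:

```python
def negamax(stones, alpha=-1, beta=1):
    """Exact negamax with alpha-beta pruning for 1-3-stone Nim.

    Returns +1 if the player to move can force a win, -1 otherwise.
    """
    if stones == 0:
        return -1  # the opponent just took the last stone; we have lost
    best = -1
    for take in (1, 2, 3):
        if take > stones:
            break
        score = -negamax(stones - take, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # opponent will never allow this line: prune
            break
    return best

# Known theory: any multiple of 4 is lost for the player to move.
negamax(4)   # -> -1
negamax(5)   # -> 1
```

Nim is small enough to solve exhaustively; the "learned evaluation" claim in the text is precisely about what replaces the exact recursion when the game tree is too deep to exhaust.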

Hidden information

You can't search what you can't see.

Card games — poker, gin rummy, Go Fish. The opponent's hand is hidden. The system must reason about probability distributions, infer hidden state from observed behavior, and make decisions under genuine uncertainty. Minimax breaks here.
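A first step is an explicit belief over the hidden hand. Before any behavioral evidence arrives, that belief is hypergeometric: the chance the opponent holds at least one card of a rank, given which cards you have already seen. A sketch of the uniform baseline only; real play would further update on observed asks and discards:

```python
from collections import Counter
from math import comb

def rank_beliefs(seen_cards, opp_hand_size):
    """P(opponent holds >= 1 card of each rank), uniform over unseen cards."""
    unseen = Counter({rank: 4 for rank in "A23456789TJQK"})
    for card in seen_cards:
        unseen[card] -= 1
    total = sum(unseen.values())
    beliefs = {}
    for rank, copies in unseen.items():
        # Hypergeometric: probability none of the remaining copies were dealt
        p_none = comb(total - copies, opp_hand_size) / comb(total, opp_hand_size)
        beliefs[rank] = 1 - p_none
    return beliefs

b = rank_beliefs(seen_cards=["A", "A", "7"], opp_hand_size=7)
# Two aces are already accounted for, so b["A"] < b["7"].
```

Decisions then maximize expected value against this distribution rather than against a single known state, which is exactly where minimax stops applying.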

Stochastic

Randomness changes everything.

Backgammon, Chutes & Ladders, dice games. Outcomes depend partly on chance. The AI must reason about expected values, manage risk, and adapt strategy to probability rather than certainty. Plans must be robust to bad luck.
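Under chance, even one-ply lookahead becomes an expectation over rolls rather than a max over moves. A toy Chutes & Ladders sketch; the overshoot rule (forfeit the move if the roll passes the final square) is one common variant, assumed here:

```python
def expected_landing(pos, jumps, last_square=100, die=6):
    """Average square reached after one roll, applying chutes and ladders.

    jumps maps a landing square to wherever its chute or ladder sends you.
    """
    landings = []
    for roll in range(1, die + 1):
        square = pos + roll
        if square > last_square:
            square = pos  # overshoot: forfeit the move (one common rule)
        landings.append(jumps.get(square, square))
    return sum(landings) / die

expected_landing(0, {})        # -> 3.5 on an empty board
expected_landing(0, {3: 22})   # -> 40/6: the ladder on square 3 lifts the average
```

Extending this expectation through deeper plies gives expectimax, the stochastic counterpart of minimax, and "robust to bad luck" becomes a statement about the whole distribution of landings, not just its mean.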

Cooperative / Partnership

Your partner knows things you don't.

Bridge, Spades, Euchre. You share a goal with a partner but can't communicate freely. The AI must model what its partner knows, signal through play, and coordinate strategy through limited information channels. The hardest category.

Commitments the research cannot compromise.

i.

The laptop is the limit.

Everything — parsing, training, inference, evaluation — must run on one commodity machine. No cloud APIs as hidden capability, no pre-trained frontier models doing the real work. The constraint is the point.

ii.

Claims must be falsifiable.

Sample budgets are published. Novel test games are generated after the system is frozen. Failure on the stated benchmark is not a footnote but the primary scientific outcome.

iii.

Honesty over hype.

This is research, not a product launch. Most individual techniques are classical. The novelty is in integration and the capabilities demonstrated — not in invented mathematics. We do not claim more than we have shown.