ThinAI — Research Notes

A plain-language log of what's been built, what works, what doesn't, and what's next.

Last updated: June 2026

What ThinAI Is

ThinAI is a system that learns to play arbitrary turn-based games from a natural-language description of the rules. It runs entirely on a single laptop — no cloud APIs, no pre-trained models, no GPU clusters. The system parses game rules into a structured format, generates evaluation features automatically, and learns through self-play.

The core question: can classical AI techniques (search, evaluation learning, structured representation) match or exceed what neural approaches do for game learning — at a fraction of the compute?

Architecture

Natural language parser — converts plain English game descriptions into a Game Description Language (GDL) JSON format
Game engine — loads any GDL spec and handles legal moves, state transitions, and win/draw detection
Auto-feature discovery — analyzes GDL structure to generate evaluation features without game-specific knowledge
Auto-priors ("intuition") — derives initial weight biases from rule structure (e.g., "line-based wins suggest lines matter")
Minimax search with alpha-beta pruning and selective deepening — at depth 1-2, considers all moves; at depth 3+, focuses on the top 8 most promising moves (like a human ignoring obviously bad options)
Sampling-based search — for card games with hidden information, samples possible opponent hands and evaluates moves across multiple possible worlds
Adaptive effort allocation — adjusts search depth per position based on branching factor (how many choices are available) and a node budget
Self-play training — learns feature weights through win/loss reinforcement over ~40 games, similar to how a human improves by playing practice matches
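The sampling-based search listed above can be sketched roughly as follows. This is a minimal illustration, not ThinAI's code: the toy "higher card wins" game, the function names (sample_opponent_hand, evaluate, choose_move), and the sample count are all assumptions made for the example.

```python
import random

def sample_opponent_hand(unseen, hand_size, rng):
    """Draw one plausible opponent hand from the cards we cannot see."""
    return rng.sample(unseen, hand_size)

def evaluate(card, opponent_hand):
    """Toy evaluation: fraction of the opponent's cards that our card beats."""
    return sum(card > other for other in opponent_hand) / len(opponent_hand)

def choose_move(my_hand, unseen, hand_size, samples=50, seed=0):
    """Score every candidate card across many sampled worlds, keep the best total."""
    rng = random.Random(seed)
    totals = {card: 0.0 for card in my_hand}
    for _ in range(samples):
        world = sample_opponent_hand(unseen, hand_size, rng)
        for card in my_hand:
            totals[card] += evaluate(card, world)
    return max(my_hand, key=lambda card: totals[card])

# Hypothetical position: we hold 2 and 13; the opponent holds 3 of the cards 3..12.
best = choose_move(my_hand=[2, 13], unseen=list(range(3, 13)), hand_size=3)
print(best)  # 13 beats every unseen card, 2 beats none
```

The point of the sketch is the shape of the loop: the opponent's hand is never known, so each move is judged by its average result over many guesses at that hand.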

Phase Status

Phase	Description	Status
0	Foundations — GDL design, stack, initial games	done
1	Core learning — minimax, features, self-play training	done
2	More games — Nim, Chutes & Ladders, benchmarks	done
3	Corrections — rule revision, error detection	done
4	Metacognition — adaptive search, confidence, self-assessment	done
5	Card games — hidden information, 11 card games, sampling search, opponent modeling	done
6	New game types — Hex, Backgammon, Wizard, Scrabble, Spades, race games, novel game pipeline	done
7	Auto-discovery — feature auto-generation, luck detection, progressive depth, training transparency	done
8	Visualization — 17 board components, SVG boards, commentary, parser reference	~90%
9	Generic mechanics — meld system, multi-phase turns, double deck, strategy hints	done
10	Polish — chess support, opponent modeling in training, partnership games	planned

Key Numbers

22 playable games across 14 categories
418+ automated tests (including targeted regression tests for C4 and Checkers)
~12,000 lines of engine code
~2,700-word Scrabble dictionary
40 training games to reach competence (board games)
< 2 seconds per AI move (typically under 1 second)
1,500 / 2,500 max nodes per move (card / board games)
350+ game objects recognized by the parser (58 with SVG icons)
0 GPUs required

Recent Changes (June 2026)

Cribbage (new game): Full 2-player Cribbage — deal 6, discard 2 to crib, cut starter (his heels), pegging toward 31 with pairs/runs/15s/go, hand scoring (15s, pairs, runs with multiplicity, flush, nobs), crib scoring, dealer alternation, race to 121. Custom UI with peg count display, play area, Go button, and phase-aware prompts.
Generic card scoring module: New reusable engine/gdl/card_scoring.py with primitives for any card game: find_subsets_summing_to() (card subsets hitting a target value), count_rank_pairs(), count_rank_run_points() (cross-suit runs with duplicate multiplicity), count_flush(), and sequential play scoring (sequential_play_pairs/runs()). Configurable rank values (Cribbage A=1 vs standard A=14).
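To make the scoring primitives concrete, here is a small sketch of how two of them might work. The function names match the ones listed above, but the signatures, the integer rank encoding (1..13), and the bodies are assumptions for illustration; the real engine/gdl/card_scoring.py may differ.

```python
from itertools import combinations

def cribbage_value(rank):
    """Counting value for Cribbage: ace = 1, face cards = 10 (ranks are 1..13)."""
    return min(rank, 10)

def find_subsets_summing_to(ranks, target, value=cribbage_value):
    """Return every subset of cards (as index tuples) whose values add up to target."""
    hits = []
    for size in range(1, len(ranks) + 1):
        for combo in combinations(range(len(ranks)), size):
            if sum(value(ranks[i]) for i in combo) == target:
                hits.append(combo)
    return hits

def count_rank_pairs(ranks):
    """Count pairs of equal rank; three of a kind counts as three pairs."""
    return sum(a == b for a, b in combinations(ranks, 2))

# Hand 5, 5, 10, J with a 5 as the starter (suits ignored in this sketch).
hand = [5, 5, 10, 11, 5]
fifteens = find_subsets_summing_to(hand, 15)
pairs = count_rank_pairs(hand)
points = 2 * len(fifteens) + 2 * pairs
print(len(fifteens), pairs, points)  # 7 fifteens, 3 pairs, 20 points
```

Passing a different value function is one way the "configurable rank values" idea could be expressed, for example ace-high values for a game other than Cribbage.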
Cross-suit runs: melds.py find_runs() gains same_suit and rank_values parameters (backward compatible). Enables run detection across suits for Cribbage and similar games.
Score-racing features: Auto-detected for any game with score state variables — score_racing_lead, score_racing_progress, is_dealer. Benefits Cribbage, Canasta, and future score-target games without game-specific code.
Training regression fixes: Three regressions from the training stability commit (f06cb35) fixed:
- Connect Four depth: Board game node budget restored to 2500 (was 2000). C4 needs depth 4 to see 3-in-a-row blocking threats (7⁴=2401 nodes).
- Checkers nosedive: Games with hand-crafted features (Checkers, Mancala, Reversi) restored to independent ReasonerOpponent instead of snapshot-of-self. Snapshot opponents caused a death spiral — learner fights frozen copy, degrades weights on every loss, never recovers.
- Learning rate decay: Loss decay restored to 0.995 (was 0.92). The aggressive 8% decay killed learning capacity by game 30.
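The Connect Four arithmetic above can be checked with a few lines. This sketch uses the same simplified count as the note (branching factor raised to the depth, ignoring alpha-beta savings and interior nodes); the helper name max_full_depth is made up for the example.

```python
def max_full_depth(branching_factor, node_budget):
    """Deepest level d such that branching_factor ** d positions fit in the budget."""
    if branching_factor < 2:
        raise ValueError("branching factor must be at least 2")
    depth = 0
    while branching_factor ** (depth + 1) <= node_budget:
        depth += 1
    return depth

# Connect Four has up to 7 legal moves per turn.
print(max_full_depth(7, 2500))  # 4, because 7**4 = 2401 fits in 2500
print(max_full_depth(7, 2000))  # 3, because 7**4 = 2401 does not fit in 2000
```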
New regression tests: Targeted tests for C4 (search depth ≥ 4, training improves) and Checkers (uses hand-crafted features, no nosedive, correct opponent type) to prevent recurrence.

Changes (May 27, 2026)

Canasta (new game): Full 2-player Canasta with double deck (108 cards), multi-phase turns (draw → meld → discard), wild cards (2s, Jokers), discard pile pickup, canasta bonuses (7+ card meld: natural 500 pts, mixed 300 pts), and going out detection. Custom UI with table melds, meld action buttons showing actual cards, and discard prompts.
Generic meld system: Reusable module (engine/gdl/melds.py) with find_sets(), find_runs(), best_melds(), deadwood_value(), and can_add_to_meld(). Configurable wild cards, minimum meld size, and scoring bonuses. Any card game mentioning "melds" or "sets" can use it.
Multi-phase turns: Generic turn phase system — games define a phase list (e.g. ["draw", "meld", "discard"]) and phases advance automatically. Phase resets to the first phase when the turn passes to the next player.
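A minimal sketch of that phase logic, assuming a small state object. The class name TurnState and its fields are hypothetical and not taken from the engine.

```python
from dataclasses import dataclass

@dataclass
class TurnState:
    phases: list            # e.g. ["draw", "meld", "discard"]
    num_players: int = 2
    player: int = 0
    phase_index: int = 0

    @property
    def phase(self):
        return self.phases[self.phase_index]

    def advance(self):
        """Move to the next phase; after the last one, pass the turn and reset."""
        self.phase_index += 1
        if self.phase_index == len(self.phases):
            self.phase_index = 0
            self.player = (self.player + 1) % self.num_players

state = TurnState(phases=["draw", "meld", "discard"])
trace = []
for _ in range(4):
    trace.append((state.player, state.phase))
    state.advance()
print(trace)  # player 0 goes through all three phases, then player 1 starts at "draw"
```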
Double deck support: New "double_deck" deck type (2×52 + 4 jokers = 108 cards) in the card system.
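For illustration, a deck builder along these lines produces the 108-card count. Only the "double_deck" type name comes from the notes; the tuple card representation and the build_deck helper are assumptions.

```python
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]

def build_deck(deck_type="standard"):
    """Return a list of (rank, suit) tuples for the requested deck type."""
    single = list(product(RANKS, SUITS))          # 52 cards
    if deck_type == "standard":
        return single
    if deck_type == "double_deck":
        return single * 2 + [("Joker", None)] * 4  # 2 x 52 + 4 jokers
    raise ValueError(f"unknown deck type: {deck_type}")

deck = build_deck("double_deck")
print(len(deck))  # 108
```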
Strategy hints: Users can type plain-English strategy advice before training (e.g. "control the center", "save high cards"). Keywords are matched to feature priors, giving the AI a head start on what matters.
Spades (new game): 2-player trick-taking with bidding and spades-always-trump. Shares UI with Wizard.
Deep copy fix: State copying now properly handles nested lists (list-of-lists like melds), preventing card duplication during AI search.
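The kind of bug described here is easy to reproduce in plain Python. The sketch below is not the engine's copy routine; it uses a hypothetical one-level copy helper and the standard library's copy.deepcopy to show the difference on a list-of-lists.

```python
import copy

def copy_one_level(state):
    """Copies each top-level list, but the inner lists are still shared."""
    return {key: list(value) for key, value in state.items()}

state = {"hand": ["7H", "8H"], "melds": [["4S", "4D", "4C"]]}

# One-level copy: appending to a meld in the copy also changes the original.
trial = copy_one_level(state)
trial["melds"][0].append("4H")
leaked = len(state["melds"][0])   # 4, the original meld grew during "search"
state["melds"][0].pop()           # undo the leak

# Deep copy: nested lists are duplicated, so the original stays intact.
trial = copy.deepcopy(state)
trial["melds"][0].append("4H")
intact = len(state["melds"][0])   # 3

print(leaked, intact)
```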
Untrained AI speed: Card games without trained weights now use depth 1 instead of 4, making untrained AI respond instantly instead of getting stuck on large branching factors.
413 tests all passing.

Changes (May 23, 2026)

Visual training replay: After training, watch sampled game replays with the actual game board UI — pieces moving, cards being played. Play/Pause, speed slider, Prev/Next game navigation. 5 games sampled evenly across training to show the AI's progression from clumsy to competent.
Training stability: Fixed weight corruption spiral where consecutive losses degraded the AI's evaluation function. Learning rate now decays 8% per loss (was 0.5%) with per-update weight clamping. Feature-based opponents use graduated difficulty (random → self-snapshot) instead of fixed depth-3 that caused nosedives. 8 new regression tests verify no game's late win rate collapses to 0%.
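To see how differently the two decay rates behave over a losing streak, here is a quick calculation. It assumes the decay is multiplicative per loss (0.92 for 8%, 0.995 for 0.5%) and uses a made-up clamp limit; neither detail is spelled out in the notes.

```python
def learning_rate_after_losses(base_rate, decay_per_loss, losses):
    """Learning rate left after a number of losses, with multiplicative decay."""
    return base_rate * decay_per_loss ** losses

def clamp(weight, limit=5.0):
    """Per-update weight clamping; the limit of 5.0 is an assumed value."""
    return max(-limit, min(limit, weight))

for losses in (10, 20, 30):
    gentle = learning_rate_after_losses(1.0, 0.995, losses)
    aggressive = learning_rate_after_losses(1.0, 0.92, losses)
    print(losses, round(gentle, 3), round(aggressive, 3))

gentle_30 = learning_rate_after_losses(1.0, 0.995, 30)
aggressive_30 = learning_rate_after_losses(1.0, 0.92, 30)
```

After 30 losses the 0.5% schedule keeps roughly 86% of the starting rate, while the 8% schedule keeps roughly 8%.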
Stop training: New "Stop Training" button lets users halt training mid-run while keeping all results accumulated so far.
GameBoardRenderer refactor: Extracted unified board routing component used by both play mode and training replay — replays now look identical to actual gameplay for all 20 games.
Click-to-move UI: Novel movement/capture games support click-to-select-piece → click-destination in the grid board with movable/selected/destination highlighting.
Card special powers: Parser detects wild cards, Skip, Reverse, Draw Two keywords and routes to appropriate engine. Uno deck auto-detected from description.
413 tests including 8 training quality regression tests across all major games.

Changes (May 21, 2026)

Wizard (new game): Trick-taking card game with trump suit and bidding. Multi-round with cumulative scoring.
Simplified Scrabble (new game): Word/tile placement on a 9×9 board with bonus squares and a ~2,700-word dictionary. Click-to-place UI.
Generic movement engine: Novel games can now describe movement + capture mechanics: diagonal, orthogonal, or all-direction movement with jump capture. Forward-only movement supported. Piece promotion (reaching back row → becomes king with backward movement). Mandatory capture enforced (jumps override regular moves).
Novel game pipeline expansion: Seven novel game types now work end-to-end:
- Grid placement (N-in-a-row, territory)
- Gravity/column drop (Connect Four-style)
- Diagonal capture with promotion (Checkers-like)
- Forward movement to goal (race on grid)
- Flanking/flipping with pass-when-stuck (Reversi-like)
- Card comparing with escalating stakes
- Dice race with bumping
Adaptive training: Novel games graduate from random opponent to self-snapshot at game 10. L2 feature discovery runs mid-training to find correlated features. Escalating-stakes card games get a "card conservation" feature with strong prior.
Clarification system wired: Parser clarification answers now actually modify the saved game definition via API. Movement games prompt for piece setup (5 layout options). Clarification choices preserved when renaming games.
Click-to-move UI: Novel movement/capture games now support click-to-select-piece → click-destination in the grid board. Movable pieces highlighted, valid destinations shown on selection.
Card special powers: Parser detects wild cards ("eights are wild"), Skip, Reverse, Draw Two/Four keywords. Routes to Uno engine when action cards detected, Crazy Eights otherwise. Uno-style deck auto-detected from "four colors numbered 0-9".
Extra turn rules: "Take another turn", "go again", "bonus turn" keywords detected and generate conditional turn rules.
Training transparency: Training results show the training partner's strategy. Dice games train at depth 1. Novel games can reach depth 3 with time guard.

Known limitations in novel games

Card action effects (Skip, Reverse, Draw Two) work through the built-in Uno engine; fully custom card effects cannot yet be expressed generically
Extra-turn triggers are detected, but the specific conditions that set the flag (e.g., "landing in store") require game-specific code
Capture games default to filling all squares — dark-square-only placement requires "dark squares" in the description

Games Implemented (22)

Game	Type	Training Quality	Notes
Reversi	Flanking	strong	Learns corner strategy, mobility. Consistently good play.
Connect Four	Placement	strong	Auto-generated features. Blocks threats, plays center.
Checkers	Movement	strong	Captures, kings, advancement. Solid play.
Hex	Territory	decent	Connection progress + center control. Developing.
Backgammon	Race + Dice	decent	Pip advantage, blot safety. Depth-1 training (dice variance).
Mancala	Sowing	strong	Store lead, captures, extra turns. Reliable.
Hearts	Trick-taking	decent	Follows suit, avoids points. Reasonable play.
Wizard	Trick-taking + Bidding	decent	Trump + bidding. Learns to bid based on hand strength.
Scrabble	Word/Tile	decent	Simplified 9×9. Finds valid words, uses bonus squares.
Gin Rummy	Collecting	decent	Deadwood, melds, knocking. Hidden information.
Five-Card Draw	Comparing	decent	Hand strength evaluation. Discard decisions.
Uno	Shedding	decent	~55% win rate. Mostly luck, some card management.
Crazy Eights	Shedding	decent	Similar to Uno. Color/rank matching.
Blackjack	Comparing	strong	Hit/stand decisions. Learns basic strategy.
Go Fish	Collecting	strong	Near-sets, pairs tracking. 70%+ win rate.
War	Comparing	luck	Pure luck. Detected and flagged automatically.
Spades	Trick-taking + Bidding	decent	Spades always trump. 2-player with bidding. Shares UI with Wizard.
Canasta	Rummy/Melding	decent	Double deck, multi-phase turns, wild cards, meld management, canasta bonuses.
Cribbage	Pegging/Scoring	decent	Pegging toward 31, hand scoring (15s/pairs/runs/flush/nobs), crib, race to 121. Uses generic card scoring module.
Tic-Tac-Toe	Placement	decent	Auto-features. Learning against strong opponent.
Nim	Take-away	strong	Pile balance, single-pile detection.
Chutes & Ladders	Race	luck	Pure luck. Detected and flagged automatically.
Custom/Novel Games	Various	varies	Parser-generated. Supports: placement, gravity drop, movement/capture with promotion, flanking, territory, card games, dice races. Auto-features + graduated opponent.

What Works Well

Learning from scratch: Games like Reversi, Mancala, Go Fish, and Checkers show clear improvement from zero knowledge to competent play in 20-40 training games.
Feature auto-discovery: Auto-generated features for line games, capture games (piece_advantage), territory (count_pieces), and escalating-stakes card games (card_conservation). L2 correlation discovery finds additional features mid-training.
Luck detection: Pure-luck games (War, Chutes & Ladders) are automatically identified and flagged — no false "mastery" claims.
Game variety: 22 built-in games across 14 categories, plus novel games parsed from English descriptions.
Novel game pipeline: Describe a game in English → parse → auto-generate features → train → play. Works for grid placement, gravity drop, movement/capture, flanking, territory, card games, dice races, and tile placement.
Movement and capture: Generic movement engine handles diagonal, orthogonal, and all-direction movement with jump capture, mandatory capture, and piece promotion (kinging).
Interpretable AI: Every decision can be traced to feature weights. Commentary explains moves. Training partner strength is disclosed.

What Needs Work

Complex dice: Rolls of two independent dice (as in Backgammon, where one piece moves by die 1 and another by die 2) and doubles that grant extra moves are not supported for novel games.
Draw-then-play turns: Multi-phase turns are now supported for built-in games (Canasta) but not yet auto-detected by the parser for novel card games.
Meld detection for novel games: The generic meld system exists (find_sets, find_runs) but the parser doesn't yet auto-wire it for novel game descriptions mentioning "melds" or "sets".
Novel game strategy depth: Novel games graduate from random to self-snapshot opponent, but strategy discovery remains shallow for complex games.
Custom conditional effects: "When you land on a red space, draw a card" — arbitrary effect triggers not yet parseable beyond built-in patterns (Skip, Reverse, Draw Two).

Changes (May 20, 2026)

Auto-discovery replaces hand-crafted features: Tic-Tac-Toe, Connect Four, and Chutes & Ladders no longer use hand-coded evaluation features. Instead, the system auto-generates features from the game's rules (line_threats, center_control, position_lead, etc.) and learns their importance through self-play. This means these games show real learning curves — starting weak and improving — rather than being pre-programmed to play well.
Smarter auto-features: For line-win games (N-in-a-row), the system now skips noisy features like edge_presence and corner_presence that don't help, keeping only relevant ones (center_control, longest_line, line_threats, mobility). This improves the signal-to-noise ratio during learning.
Progressive search depth: Training now starts at depth 1 and deepens every 5 games, while the opponent stays at full depth (the "adult"). This mirrors how a kid learns — start with simple thinking, gradually look further ahead. The training curve shows both weight learning and depth progression.
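The depth schedule can be written as a one-line rule. In this sketch the "every 5 games" step comes from the note above, while the cap of depth 4 and the function name training_depth are assumptions.

```python
def training_depth(game_index, step=5, max_depth=4):
    """Search depth the learner uses in a given training game (0-based index)."""
    return min(1 + game_index // step, max_depth)

schedule = [training_depth(game) for game in range(0, 40, 5)]
print(schedule)  # [1, 2, 3, 4, 4, 4, 4, 4]
```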
Luck detection: The system now identifies pure-luck games (like Chutes & Ladders and War) using two checks — L0 analyzes the rules for absence of player decisions, L1 checks post-training for flat weights and ~50% win rate. Luck games show a notice instead of misleading self-assessment or mastery claims.
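The post-training (L1) check might look something like the sketch below. It reads "flat weights" as "weights barely moved from their starting values", which is an interpretation, and both tolerance values are invented for the example.

```python
def looks_like_pure_luck(initial_weights, trained_weights, wins, games,
                         weight_tol=0.05, win_rate_tol=0.10):
    """True when training changed the weights very little and wins hover near 50%."""
    largest_shift = max(
        abs(after - before)
        for before, after in zip(initial_weights, trained_weights)
    )
    win_rate = wins / games
    return largest_shift <= weight_tol and abs(win_rate - 0.5) <= win_rate_tol

# A War-like result: weights did not move, about half the games were won.
print(looks_like_pure_luck([0.1, 0.1], [0.12, 0.09], wins=21, games=40))
# A skill-game result: weights shifted and the win rate climbed.
print(looks_like_pure_luck([0.1, 0.1], [0.6, 0.3], wins=30, games=40))
```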
Race game support: Track/race games with forward/backward choice and opponent bumping. AI evaluates positions by distance to finish. Track board UI with winding path, directional arrows, and roll/choice buttons.
Parser reference: New modal overlay documenting all 12 supported game categories, recognized piece vocabulary (~60 words), colors, cosmetics, and tips for writing game descriptions.
UI improvements: Game list split into "Trained" / "Not yet trained" groups. Training dashboard properly resets when switching games. "Back to Games" returns to menu. Server restart script (restart.sh).
Stronger auto-priors: Line features start at 0.5 (was 0.15) and center control at 0.2 (was 0.05), giving the learner a better starting point for line-win games.

Changes (May 19, 2026)

Selective deepening: At search depth 3+, only the top 8 most promising moves (scored by quick eval) are explored. This lets the AI look 3-4 moves ahead on large boards without exponential blowup — like a human focusing on "interesting" moves. All 20 AI competency tests pass.
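Here is a compact sketch of alpha-beta search with that kind of move filtering. It assumes "depth 3+" means the third ply from the root and deeper, and it uses a made-up toy game (SumGame) so the example runs on its own; the engine's actual interfaces are not shown in these notes.

```python
import math

class SumGame:
    """Toy game for this sketch: players alternately pick a number 0..11.
    The first player's picks are added to a total, the second player's subtracted."""

    def legal_moves(self, state):
        return list(range(12))

    def apply(self, state, move):
        total, sign = state
        return (total + sign * move, -sign)

    def evaluate(self, state):
        return state[0]  # always scored from the first player's point of view

def search(game, state, depth, ply=1, alpha=-math.inf, beta=math.inf,
           maximizing=True, top_k=8, selective_from=3):
    """Alpha-beta minimax that keeps only the top_k moves from ply selective_from on."""
    moves = game.legal_moves(state)
    if depth == 0 or not moves:
        return game.evaluate(state)
    if ply >= selective_from and len(moves) > top_k:
        # Rank moves by a quick static evaluation of the position they lead to.
        moves = sorted(moves, key=lambda m: game.evaluate(game.apply(state, m)),
                       reverse=maximizing)[:top_k]
    best = -math.inf if maximizing else math.inf
    for move in moves:
        value = search(game, game.apply(state, move), depth - 1, ply + 1,
                       alpha, beta, not maximizing, top_k, selective_from)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the remaining moves cannot change the result
    return best

game = SumGame()
print(search(game, (0, 1), depth=3))  # 11: best play is +11, -11, +11
print(search(game, (0, 1), depth=4))  # 0: best play is +11, -11, +11, -11
```

In this toy game the quick evaluation ranks moves perfectly, so trimming to 8 candidates loses nothing. In a real game the trimmed moves could occasionally matter, which is the tradeoff selective deepening accepts.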
AI commentary: After each AI move, a brief explanation appears — "Blocked opponent's 3-in-a-row threat", "Took a central position for control", "Best for center control (improved)". Generated from feature deltas and pattern detection.
Game object library: 238 words recognized across 9 categories (pieces, chess, military, nature, characters, resources, symbols, board, cards). 58 SVG icons for visual rendering. Games described with "wizards" or "knights" show actual icons on the board.
Cosmetics system: Parser detects piece colors ("red and blue stones"), board styles ("wooden board", "checkerboard"), and applies them. Custom games render with the described visual theme.
Clarification questions: After parsing, the system detects missing details (no draw condition? no board size? one color for both players?) and asks the user before training.
Game naming: Users can name their custom games before training. Names display in the game selector and training dashboard.
Parser expansion: Now handles 10 game categories — added movement/capture, sowing/mancala, and expanded trick-taking, flanking, and matching/shedding detection. 26 parser tests.
Novel game training: Games without hand-crafted features now train against a random opponent (was default_eval, which was too strong for a learner starting from scratch). 10W 0L vs random on novel 6x6 grid game.
Generic track board: Novel track/race games auto-render with a winding numbered path.

Changes (May 18, 2026)

3 new games: Hex (territory, hex grid with SVG board), Backgammon (race+dice, click-to-move with SVG triangles and pip dice), Hearts (trick-taking, first of its kind in the system).
Auto-feature discovery: Two-level system for novel games — Level 1 infers features from GDL rule structure (e.g., "track game with bear-off → pip count matters"), Level 2 discovers features via win/loss correlation from play data. Outperforms hand-crafted features on Backgammon (12W vs 7W) and Hex (16W vs 13W).
Node budget reduced: Board games 5000→2500, card games 1500. All 20 AI competency tests still pass. Moving toward the goal of smart evaluation over brute-force search.
Hand-crafted features for Gin Rummy (deadwood, melds, near-knock), Poker (hand strength), Backgammon (pip count, blots, home progress), and Hex (connection progress, blocking).
UI refactoring: Extracted 14 board components from App.jsx (2300→1100 lines). Custom card game fallback. Gin Rummy shows AI's hand and melds at game over. Backgammon click-to-move with highlighted destinations.
Backgammon training fix: Move limit raised from 200 to 500 for dice games (the lower limit was causing false draws). Proper features instead of generic auto-features.
Game-over card reveal: All hidden cards revealed at end of game across all card games.

Game Categories and Coverage

Turn-based games fall into several structural categories. ThinAI's architecture can handle some natively and others with extensions. Here's where we are and where we're headed:

Category	Examples	Status	Coverage
Placement — place pieces on a grid	Tic-Tac-Toe, Connect Four, Go, Gomoku	works	Handles any N-in-a-row or territory game
Flipping/Flanking — capture by surrounding	Reversi, Othello	works	Fully supported
Movement/Capture — move pieces, capture opponents	Checkers, Chess, Shogi	partial	Checkers works; chess-level complexity is a stretch goal
Sowing/Mancala — distribute tokens around a track	Mancala, Oware, Kalah	works	Standard mancala variants supported
Nim-like — remove items from piles	Nim, Wythoff's, Sprouts	works	Any take-away game
Matching/Shedding — match cards, empty your hand	Crazy Eights, Uno, Rummy	works	Color/rank matching, action cards, melds
Collecting/Melding — gather sets, lay melds	Go Fish, Canasta, Rummy	works	Set detection, meld system with wilds, multi-phase turns, canasta bonuses
Comparing — compare hands for best rank	Poker, Blackjack, War	works	Hand ranking, hit/stand, draw/discard
Trick-taking — play cards to win tricks	Hearts, Wizard, Spades, Bridge	works	Hearts, Wizard, Spades (Wizard and Spades with trump and bidding). Follow suit, trick resolution.
Auction/Bidding — bid resources for advantage	Bridge (bidding), Power Grid	partial	Wizard and Spades have per-round bidding. Full auction mechanics planned.
Race — move pieces to finish line	Chutes & Ladders, Backgammon, Parcheesi	works	Backgammon, custom race games with forward/backward choice and bumping, luck detection
Territory — control areas of the board	Go, Hex, Blokus	partial	Hex works (7×7, connect sides). Go-level complexity is a stretch goal
Word/Tile — form words or patterns with tiles	Scrabble, Bananagrams	works	Simplified Scrabble: 9×9 board, ~2,700-word dictionary, bonus squares, cross-word scoring
Partnership — teams of players cooperate	Bridge, Tichu, Euchre	planned	Needs multi-agent cooperation model

Estimated coverage of popular games

Of the ~50 most commonly played tabletop/card games worldwide, ThinAI can represent roughly 85% with its existing game types. The remaining ~15% require partnership dynamics, real-time mechanics, or negotiation.

Technical Decisions

Why not neural networks?

Neural approaches like AlphaZero (DeepMind's game-playing system) and MuZero (its successor that learns without knowing the rules) need millions of training games and significant compute per game. They learn one game at a time and don't transfer knowledge between games. ThinAI's structured approach learns from rules, not just from playing, and transfers features across games via a shared game description format. The tradeoff: less raw playing strength, but vastly more sample-efficient and interpretable.

Why minimax instead of MCTS?

Minimax (a search algorithm that evaluates moves by assuming the opponent plays optimally) with alpha-beta pruning (a technique to skip branches that can't affect the outcome) is simpler, more predictable, and works well with learned evaluation functions. MCTS (Monte Carlo Tree Search — a probabilistic approach that samples random playouts) would be better for very high branching factor games like Go, but none of our current games require it. The evaluation function is the learning target — depth is just how far ahead we look.

Why JSON for GDL?

Machine-parseable, human-readable, and web-friendly. GDL (Game Description Language) is a formal way to describe game rules so a computer can understand them. Our version uses JSON: the parser converts English to JSON, the engine loads JSON, the frontend displays JSON. One format throughout. The tradeoff: JSON can't express truly arbitrary game logic — complex conditions need built-in functions.
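As a rough picture of what such a spec could look like, here is a hypothetical Tic-Tac-Toe description and a loader. Every field name in it (board, turn, win, builtin, and so on) is invented for illustration and is not ThinAI's actual schema; the "builtin" entries stand in for the built-in functions mentioned above.

```python
import json

# Hypothetical GDL-style spec; the key names are assumptions, not the real format.
SPEC = """
{
  "name": "Tic-Tac-Toe",
  "players": 2,
  "board": {"type": "grid", "rows": 3, "cols": 3},
  "turn": {"action": "place", "target": "empty_cell"},
  "win": {"builtin": "n_in_a_row", "n": 3},
  "draw": {"builtin": "board_full"}
}
"""

REQUIRED_KEYS = {"name", "players", "board", "turn", "win"}

def load_spec(text):
    """Parse a JSON game spec and check that the required sections are present."""
    spec = json.loads(text)
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        raise ValueError(f"spec is missing keys: {sorted(missing)}")
    return spec

spec = load_spec(SPEC)
print(spec["name"], spec["win"]["builtin"], spec["board"]["rows"])
```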