Portfolio Project
Nonogram Solver
Reinforcement Learning (RL)
Context
I wanted to see whether an RL agent could learn to solve Nonogram puzzles.
Approach
- Trained a hybrid CNN + Transformer policy network on 25M+ generated 5×5 puzzles.
- Shaped rewards around unique guesses, row/column completions, and full-board solves to guide exploration.
Impact
- Reached 94% accuracy on unseen 5×5 boards.
Environment Design
I built a reinforcement learning environment for Nonograms covering puzzle generation, clue computation, and reward shaping.
- Generated large batches of unique 5×5 puzzles and computed row/column clues automatically.
- Represented state as board + clue embeddings so the agent can act on constraints.
- Rewarded progress (row/column completions) and penalized repeated guesses.
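The environment pieces above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (`compute_clues`, `generate_puzzle`, `step_reward`) and the specific reward values are hypothetical.

```python
import random

def compute_clues(line):
    """Run lengths of filled cells, e.g. [1, 1, 0, 1, 1] -> [2, 2]."""
    clues, run = [], 0
    for cell in line:
        if cell:
            run += 1
        elif run:
            clues.append(run)
            run = 0
    if run:
        clues.append(run)
    return clues or [0]

def generate_puzzle(size=5, fill=0.6, rng=random):
    """Random board plus its row/column clues."""
    board = [[1 if rng.random() < fill else 0 for _ in range(size)]
             for _ in range(size)]
    rows = [compute_clues(r) for r in board]
    cols = [compute_clues([board[r][c] for r in range(size)])
            for c in range(size)]
    return board, rows, cols

def step_reward(board, guesses, r, c):
    """Shaped reward: penalize repeated guesses, reward correct cells,
    and add a bonus for completing a row or column (illustrative values)."""
    if (r, c) in guesses:
        return -1.0
    guesses.add((r, c))
    if board[r][c] != 1:
        return -0.1
    row_done = all((r, j) in guesses for j in range(len(board)) if board[r][j])
    col_done = all((i, c) in guesses for i in range(len(board)) if board[i][c])
    return 0.1 + (1.0 if row_done else 0.0) + (1.0 if col_done else 0.0)
```

The key design point is that `guesses` is tracked by the environment, so the penalty for re-guessing a cell can be computed without any history in the agent.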
Policy Network
- Combined a CNN (board) with a Transformer (clues) to capture both spatial and sequence structure.
- Output an action distribution over grid cells so the agent learns a strategy, not a hard-coded solver.
- Kept the network small enough to train over millions of puzzles.
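The shape of such a hybrid can be sketched in plain NumPy. This is only a structural illustration with random, untrained weights (all names and sizes here are hypothetical, not the trained network): a small convolution over the board, one head of self-attention over embedded clue tokens, and a softmax over the 25 cells.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE, EMB = 5, 16

# Hypothetical random weights; the real network learns these via training.
conv_k = rng.normal(scale=0.1, size=(3, 3))             # one 3x3 conv filter
clue_emb = rng.normal(scale=0.1, size=(SIZE + 1, EMB))  # embeddings for clue values 0..5
w_q = rng.normal(scale=0.1, size=(EMB, EMB))
w_k = rng.normal(scale=0.1, size=(EMB, EMB))
w_v = rng.normal(scale=0.1, size=(EMB, EMB))
w_out = rng.normal(scale=0.1, size=(SIZE * SIZE + EMB, SIZE * SIZE))

def conv2d(board, kernel):
    """'Same' convolution of the board with one 3x3 kernel (zero padding)."""
    padded = np.pad(board, 1)
    out = np.zeros((SIZE, SIZE))
    for i in range(SIZE):
        for j in range(SIZE):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def attend(tokens):
    """One head of scaled dot-product self-attention over clue tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(EMB)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def policy(board, clues):
    """Action distribution over the 25 cells from board + clue features."""
    spatial = conv2d(np.asarray(board, dtype=float), conv_k).ravel()  # CNN branch
    tokens = clue_emb[np.asarray(clues)]                              # clue branch
    pooled = attend(tokens).mean(axis=0)
    logits = np.concatenate([spatial, pooled]) @ w_out
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()
```

The output is a probability vector over grid cells, so the agent samples an action rather than running a deterministic solver; this is what makes the policy-gradient training below applicable.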
Training
- Trained with policy gradients at scale (25M+ puzzles; large batched episodes).
- Used reward shaping around unique guesses, row/column completions, and full-board solves.
- Tracked learning curves and saved checkpoints to compare policy improvements over time.
What I'd Improve
- Scale beyond 5×5 with curriculum learning (5×5 → 10×10) and stronger value baselines.
- Add search (like MCTS) on top of the learned policy for harder puzzles.
- Evaluate on benchmark Nonogram sets and compare against classical solvers.