Portfolio Project

Nonogram Solver

Reinforcement Learning (RL)

Machine Learning, Python, PyTorch, AWS, Docker

Context

I wanted to see whether an RL agent could learn to solve Nonograms: logic puzzles where each row and column clue lists the lengths of that line's runs of filled cells.

Approach

  • Trained a hybrid CNN + Transformer policy network on 25M+ generated 5×5 puzzles.
  • Shaped rewards around unique guesses, row/column completions, and full-board solves to guide exploration.

Impact

  • Reached 94% accuracy on unseen 5×5 boards.

Environment Design

I built a reinforcement learning environment for Nonograms that handles puzzle generation, clue computation, and reward shaping.

  • Generated large batches of unique 5×5 puzzles and computed row/column clues automatically.
  • Represented the state as the board plus clue embeddings so the agent could condition its actions on the puzzle's constraints.
  • Rewarded progress (row/column completions) and penalized repeated guesses.
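A minimal sketch of how an environment like this could look in plain Python. The function and class names (`line_clue`, `compute_clues`, `generate_puzzles`, `NonogramEnv`) and the reward weights are hypothetical, since the project's actual values aren't stated. It assumes an action fills one cell, and it reads "unique" as "no duplicate boards in a batch" (it does not check that each puzzle has a single solution).

```python
import random


def line_clue(cells):
    """Run lengths of consecutive filled cells, e.g. [1, 1, 0, 1, 0] -> [2, 1]."""
    runs, count = [], 0
    for cell in cells:
        if cell:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs or [0]  # an empty line is conventionally clued as 0


def compute_clues(board):
    """Row clues and column clues for a square 0/1 board."""
    rows = [line_clue(row) for row in board]
    cols = [line_clue(col) for col in zip(*board)]
    return rows, cols


def generate_puzzles(count, n=5, density=0.5, seed=0):
    """Sample `count` distinct random n x n boards together with their clues."""
    rng = random.Random(seed)
    seen, puzzles = set(), []
    while len(puzzles) < count:
        board = [[int(rng.random() < density) for _ in range(n)] for _ in range(n)]
        key = tuple(map(tuple, board))
        if key not in seen:  # skip duplicate boards
            seen.add(key)
            puzzles.append((board, compute_clues(board)))
    return puzzles


class NonogramEnv:
    # Hypothetical reward weights, chosen only for illustration.
    NEW_GUESS = 0.1
    REPEAT = -0.5
    LINE_DONE = 1.0
    SOLVED = 5.0

    def __init__(self, row_clues, col_clues):
        self.row_clues, self.col_clues = row_clues, col_clues
        self.n = len(row_clues)
        self.reset()

    def reset(self):
        self.grid = [[0] * self.n for _ in range(self.n)]
        return self.grid

    def step(self, action):
        """Fill cell `action` (row-major index). Returns (grid, reward, done)."""
        r, c = divmod(action, self.n)
        if self.grid[r][c]:  # repeated guess: penalize, board unchanged
            return self.grid, self.REPEAT, False
        self.grid[r][c] = 1
        reward = self.NEW_GUESS
        if line_clue(self.grid[r]) == self.row_clues[r]:
            reward += self.LINE_DONE  # row now matches its clue
        if line_clue([row[c] for row in self.grid]) == self.col_clues[c]:
            reward += self.LINE_DONE  # column now matches its clue
        solved = compute_clues(self.grid) == (self.row_clues, self.col_clues)
        if solved:
            reward += self.SOLVED
        return self.grid, reward, solved
```

Checking the board's clues rather than a hidden solution means any board consistent with every clue counts as solved. Because filled cells only accumulate, each row or column can earn its completion bonus at most once.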

Policy Network

  • Combined a CNN (board) with a Transformer (clues) to capture both spatial and sequence structure.
  • Output an action distribution over grid cells so the agent learns its own strategy instead of following a hard-coded solver.
  • Kept the network small enough to train over millions of puzzles.
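One way a CNN + Transformer policy like this could be wired up in PyTorch, as a sketch only. The layer sizes, the padded clue-token encoding (the hypothetical `encode_clues`, with `MAX_RUNS = 3` because a 5-cell line has at most three runs), and the mean pooling over clue tokens are assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

N = 5         # board size
MAX_RUNS = 3  # longest possible clue on a 5-cell line, e.g. [1, 1, 1]


def encode_clues(row_clues, col_clues, max_runs=MAX_RUNS):
    """Flatten 2N clues into 2N * max_runs integer tokens, zero-padded."""
    tokens = []
    for clue in row_clues + col_clues:
        tokens.extend(clue + [0] * (max_runs - len(clue)))
    return torch.tensor(tokens, dtype=torch.long)


class CNNTransformerPolicy(nn.Module):
    def __init__(self, n=N, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        # CNN branch: the board as a 1-channel n x n image
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Transformer branch: one token per clue slot
        num_tokens = 2 * n * MAX_RUNS
        self.clue_embed = nn.Embedding(n + 1, d_model)  # clue values 0..n
        self.pos_embed = nn.Parameter(torch.randn(1, num_tokens, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Fuse both branches into one logit per grid cell
        self.head = nn.Sequential(
            nn.Linear(32 * n * n + d_model, 128), nn.ReLU(),
            nn.Linear(128, n * n),
        )

    def forward(self, board, clues):
        # board: (B, n, n) float 0/1; clues: (B, 2n * MAX_RUNS) long
        spatial = self.cnn(board.unsqueeze(1))
        tokens = self.clue_embed(clues) + self.pos_embed
        clue_feat = self.encoder(tokens).mean(dim=1)
        logits = self.head(torch.cat([spatial, clue_feat], dim=-1))
        return torch.distributions.Categorical(logits=logits)
```

Returning a `Categorical` lets training sample actions and take log-probabilities from the same object.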

Training

  • Trained with policy gradients at scale (25M+ puzzles; large batched episodes).
  • Used reward shaping around unique guesses, row/column completions, and full-board solves.
  • Tracked learning curves and saved checkpoints to compare policy improvements over time.
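The write-up doesn't name the policy-gradient variant. As an assumed example, this is a REINFORCE loss with a simple per-timestep batch-mean baseline, written for batches of zero-padded episodes; the function name `reinforce_loss` and its arguments are hypothetical.

```python
import torch


def reinforce_loss(log_probs, rewards, mask, gamma=0.99):
    """REINFORCE loss with a per-timestep batch-mean baseline.

    log_probs, rewards, mask: (B, T) tensors for B padded episodes.
    mask is 1.0 on real steps and 0.0 on padding after an episode ends.
    """
    B, T = rewards.shape
    rewards = rewards * mask
    returns = torch.zeros_like(rewards)
    running = torch.zeros(B, dtype=rewards.dtype, device=rewards.device)
    for t in reversed(range(T)):  # discounted return-to-go
        running = rewards[:, t] + gamma * running
        returns[:, t] = running
    # Baseline: mean return over the episodes still active at each step
    active = mask.sum(dim=0).clamp(min=1.0)
    baseline = (returns * mask).sum(dim=0) / active
    advantage = ((returns - baseline) * mask).detach()
    # Average over real steps; minimizing this ascends expected return
    return -(log_probs * advantage).sum() / mask.sum().clamp(min=1.0)
```

In a training loop, `log_probs` would be collected as `dist.log_prob(action)` at each step, followed by `optimizer.zero_grad()`, `loss.backward()`, and `optimizer.step()`, with periodic `torch.save(policy.state_dict(), path)` calls for checkpoints.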

What I'd Improve

  • Scale beyond 5×5 with curriculum learning (5×5 → 10×10) and stronger value baselines.
  • Add search (such as Monte Carlo Tree Search, MCTS) on top of the learned policy to tackle harder puzzles.
  • Evaluate on benchmark Nonogram sets and compare against classical solvers.
