Teaching Claude & Gemini to Play StarCraft II: The General and the Soldier


The Setup

I’m building a StarCraft II bot. Protoss. I’ve been playing StarCraft for over 15 years, so yes, I take this seriously. The mission: teach Claude and Gemini to beat the Blizzard Elite AI. Not Very Hard. Elite. The highest non-cheating difficulty Blizzard ships with the game, the one that makes seasoned players question their life choices.

But the interesting part isn’t the game. It’s the architecture I’m experimenting with.

The idea is this: what if an LLM — Claude or Gemini — acts as the General, making high-level strategic decisions, while traditional hardcoded AI handles the micro-execution? Mine minerals. Build pylons. Train soldiers. The fast, repetitive, rule-based stuff that needs to happen in milliseconds.

Two kinds of intelligence. One army.

Why Two Systems?

StarCraft II is a brutal test for AI because it demands two completely different types of thinking happening simultaneously.

Macro strategy — the big picture. When do I expand? Am I building the right unit composition for what the Zerg is throwing at me? Should I pressure early or turtle and tech up? These decisions require reading the game state, anticipating the opponent, and adapting. This is where LLMs shine — reasoning, pattern recognition, contextual decision-making.

Micro execution — the tiny stuff. Move this Zealot. Kite that Zergling. Keep the Probe alive while it builds. These decisions happen dozens of times per second and need to be near-instant and precise. This is where hardcoded, rule-based AI is still king. No LLM is fast enough to handle this in real time.

The hybrid approach tries to get the best of both worlds. The LLM, whether Claude or Gemini, thinks like a General. The hardcoded layer fights like a soldier.
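To make the split concrete, here is a minimal sketch of how the two layers could be separated. It is an illustration only, not the project's actual code: `Stance`, `Directive`, `Soldier`, and the flat 100-mineral check are all hypothetical names and placeholders. The point it shows is that the rule-based layer runs on every game step with whatever directive it last received, so it never has to wait for the LLM.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    DEFEND = "defend"
    EXPAND = "expand"
    ATTACK = "attack"


@dataclass
class Directive:
    """High-level order from the General (the LLM). Hypothetical shape."""
    stance: Stance
    build_queue: list[str]


class Soldier:
    """Rule-based layer: called every game step, never blocks on the LLM."""

    def __init__(self) -> None:
        # Safe default so the bot can act before the General has spoken.
        self.directive = Directive(Stance.DEFEND, ["Zealot"])

    def receive(self, directive: Directive) -> None:
        # The General only swaps the standing order; it issues no unit commands.
        self.directive = directive

    def step(self, minerals: int, idle_workers: int) -> list[str]:
        """Return this step's actions as plain strings (stand-ins for game API calls)."""
        actions = []
        if idle_workers > 0:
            actions.append(f"send {idle_workers} idle workers to mine")
        # Placeholder affordability check; a real bot would look up each unit's cost.
        if self.directive.build_queue and minerals >= 100:
            actions.append(f"train {self.directive.build_queue[0]}")
        if self.directive.stance is Stance.ATTACK:
            actions.append("move army toward enemy base")
        return actions
```

In this sketch the only thing that crosses the boundary is the `Directive`. Everything about how an order gets carried out stays inside `Soldier.step`.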

The Zerg Problem

Here’s the thing about Zerg: they are designed to punish indecision.

An early Zergling rush hits before you’ve had time to think. A Roach/Hydra composition can overwhelm a Protoss player who didn’t scout properly. And the Blizzard Elite AI plays Zerg with relentless aggression.

My bot, being Protoss, has tools to deal with this — Zealots, Stalkers, the Shield Battery, eventually Colossi and Storm. But getting Claude to understand when to build what, why certain units counter others, and how to read aggression cues from a Zerg opponent — that’s the real work.

It turns out describing a real-time strategy game to a language model is harder than it sounds.

The Hardest Part: Balance

The biggest challenge so far isn’t getting Claude to make decisions. It’s getting the two systems to talk to each other in a way that makes sense.

The hardcoded layer is fast but dumb. It executes instructions but doesn’t understand context. The LLM is smart but slow. It understands context but can’t react in milliseconds.

Bridging that gap — defining the right abstraction layer between “General gives order” and “Soldier executes order” — is where most of the complexity lives.

Too much control to the LLM and the bot hesitates, over-thinks, misses timing windows. Too much control to the hardcoded layer and you’re just playing a scripted bot with an expensive narrator.

The goal is a clean handoff. Claude decides: “We need to push now, build two more Immortals, and set up a contain.” The hardcoded layer handles: exactly which units move where, in what formation, avoiding what obstacles, attacking what targets first.
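One way to sketch that handoff is a small, strict order format: the LLM is asked to reply in JSON, and anything outside a whitelist is dropped before the hardcoded layer sees it. This is an assumption about how the boundary could be enforced, not a description of the actual integration. The `Order` fields, the whitelists, and `parse_order` are hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical vocabulary the Soldier layer knows how to execute.
ALLOWED_STANCES = {"defend", "expand", "attack", "contain"}
ALLOWED_UNITS = {"Zealot", "Stalker", "Immortal", "Colossus", "High Templar"}


@dataclass
class Order:
    stance: str
    train: dict  # unit name -> how many more to build


def parse_order(reply: str, fallback: Order) -> Order:
    """Turn an LLM reply into a validated Order, or keep the fallback.

    Malformed JSON, unknown stances, and bad counts all fall back, so the
    Soldier layer never receives an order it cannot execute.
    """
    try:
        data = json.loads(reply)
        stance = data["stance"]
        train = data.get("train", {})
        if stance not in ALLOWED_STANCES or not isinstance(train, dict):
            return fallback
        # Silently drop units outside the whitelist and non-positive counts.
        clean = {
            unit: int(count)
            for unit, count in train.items()
            if unit in ALLOWED_UNITS and int(count) > 0
        }
    except (ValueError, KeyError, TypeError, AttributeError):
        return fallback
    return Order(stance, clean)
```

With this shape, "push now, build two more Immortals, and set up a contain" would arrive as `{"stance": "contain", "train": {"Immortal": 2}}`, and a reply that rambles in free text simply leaves the previous order in place.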

Getting that boundary right is genuinely hard. Every time I think I’ve found it, the Zerg does something unexpected and the whole thing falls apart.

Where Things Stand

I have working code — Claude and Gemini integrations that can read the game state and issue strategic commands. The hardcoded micro layer handles the basics: resource gathering, unit production, basic combat response.

What I’m testing now is the decision loop. How often does Claude get to “think”? What information does it need to make good decisions? How do you translate a real-time game state into something an LLM can reason about quickly enough to be useful?
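Those questions can be made concrete with a small sketch: compress the game state into one line of text, and only consult the LLM when enough game time has passed. Everything here is an assumption for illustration, including the `Snapshot` fields, the 10-second interval, and the `ask_llm` callable, which stands in for whatever Claude or Gemini client is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Snapshot:
    """Hypothetical slice of game state the General gets to see."""
    game_seconds: float
    minerals: int
    gas: int
    supply_used: int
    supply_cap: int
    army: dict        # unit name -> count
    enemy_seen: dict  # unit name -> count


def summarize(s: Snapshot) -> str:
    """Compress a snapshot into one short line an LLM can read quickly."""
    def fmt(units: dict) -> str:
        return ", ".join(f"{n} {name}" for name, n in sorted(units.items())) or "none"

    minutes, seconds = divmod(int(s.game_seconds), 60)
    return (
        f"t={minutes}:{seconds:02d} | minerals={s.minerals} gas={s.gas} | "
        f"supply={s.supply_used}/{s.supply_cap} | "
        f"army: {fmt(s.army)} | enemy seen: {fmt(s.enemy_seen)}"
    )


class GeneralClock:
    """Lets the LLM 'think' at most once every interval_s game seconds."""

    def __init__(self, ask_llm: Callable[[str], str], interval_s: float = 10.0) -> None:
        self.ask_llm = ask_llm
        self.interval_s = interval_s
        self._last_asked: Optional[float] = None

    def maybe_consult(self, s: Snapshot) -> Optional[str]:
        # Skip the call if the last consultation was too recent.
        if self._last_asked is not None and s.game_seconds - self._last_asked < self.interval_s:
            return None
        self._last_asked = s.game_seconds
        return self.ask_llm(summarize(s))
```

The call here is synchronous to keep the sketch short. In a live game it would need to run in the background so the micro layer keeps stepping while the LLM is still answering.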

Early results are humbling. The bot makes interesting decisions sometimes. Other times it does things that would make any Protoss player cry. But that’s the process — every failure teaches you something about where the boundary between the two systems needs to move.

What’s Coming

This is an ongoing project and I’ll be sharing more as it develops — the technical architecture, the prompting strategies, the specific challenges of getting an LLM to think in terms of unit supply, map control, and timing attacks.

The goal is still to beat the Blizzard Elite AI consistently. We’re not there yet. The Zerg are still winning more than I’d like.

But we’re learning. And that’s the whole point.

More updates coming soon.
