10  What GTO Really Is: Nash, Indifference, Unexploitability

Few terms in modern poker are used more often and understood less than “GTO.” You will hear a player at a $1/$2 table announce that “GTO says you always bet your flush draws,” or that “GTO is unbeatable,” or that a play was “not GTO” as though that settled the matter. Almost all of these statements are wrong, and they are wrong in ways that actively cost people money. This chapter strips the buzzword down to its mathematical core. By the end you should understand precisely what game-theory-optimal play is, what it guarantees, what it deliberately gives up, and why the same hand can correctly be played two different ways on the same board.

We will lean on three ideas (Nash equilibrium, the indifference principle, and unexploitability) and we will keep returning to one uncomfortable truth: GTO is not the strategy that wins the most money. It is the strategy whose worst case is the least bad, which is what you want when you have no information about your opponent. Knowing the difference is the foundation for everything in the exploitative chapters that follow.

10.1 Nash Equilibrium: A Strategy That Answers Itself

The term Game-Theory-Optimal is poker’s branding for a concept from John Nash’s work in the 1950s. A Nash equilibrium is a set of strategies — one for each player — with a single defining property:

No player can increase their expected value (EV) by unilaterally changing their own strategy, given that everyone else keeps theirs fixed.

Read that again, because the conditional clause at the end is the whole game. At equilibrium, your strategy is the best possible response to your opponent’s strategy, and theirs is simultaneously the best response to yours. The two strategies lock together. Neither player has any incentive to deviate, so the situation is stable — it is an equilibrium in the literal sense.

Heads-up No-Limit Hold’em is a two-player zero-sum game: every chip you win is a chip your opponent loses. A deep theorem (the minimax theorem) guarantees that such games have an equilibrium, and that playing your half of it gives you a mathematical floor on your expected results. This is why “solvers” (software such as PioSOLVER or GTO+, and browser-based trainers such as GTO Wizard) exist at all. You feed the machine a board, ranges, and stack depths, and it grinds toward the equilibrium strategy over many iterations, refining frequencies until neither side can find a meaningfully profitable deviation. The output is what people mean when they say “the solver says.”
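To give a feel for what "grinding toward equilibrium" means, here is a toy sketch of regret matching on Rock-Paper-Scissors. It is an illustration only, not how any particular commercial solver is implemented; the function names, the lopsided starting regrets, and the iteration count are all assumptions chosen for the demonstration. Each player repeatedly shifts weight toward actions they regret not having played, and the average strategy drifts toward one-third each.

```python
# Toy illustration: regret matching self-play on Rock-Paper-Scissors.
# Index 0 = rock, 1 = paper, 2 = scissors.
# PAYOFF[i][j] is the payoff for playing i against an opponent playing j.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1 / 3] * 3
    return [p / total for p in positive]


def action_values(opponent_strategy):
    """Expected payoff of each of our actions against a mixed opponent."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
        for a in range(3)
    ]


def solve_rps(iterations=20000):
    """Return each player's average strategy after repeated self-play."""
    # Assumption: lopsided starting regrets, so play begins far from balance
    # (player 0 starts on pure rock, player 1 on pure scissors).
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            current = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(3):
                # Regret: how much better action a would have done.
                regrets[player][a] += values[a] - current
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(strategy):
    """Best payoff an opponent can achieve against this strategy."""
    return max(action_values(strategy))


average = solve_rps()
print([round(p, 3) for p in average[0]])
print(round(exploitability(average[0]), 4))
```

The quantity to watch is the exploitability: as the iterations accumulate, the best counter-strategy earns less and less against the averaged frequencies, which is the same stopping idea a solver reports for a poker spot.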

Warning: Common mistake

“GTO” is not a fixed playbook you can memorize, and it is not a property of a single hand. It is a pair of strategies in balance. A bet sizing is only “GTO” relative to a specific board, range, and stack depth. Change any input — make the stacks 40bb instead of 100bb, narrow the preflop range — and the equilibrium changes. There is no universal “GTO bet.”

A crucial caveat: the guarantees above hold strictly only for heads-up, zero-sum play. Real poker is usually multiway (three or more players), and multiplayer games can have multiple equilibria, with no guarantee that your “GTO” line is best when two opponents are also deviating. We will treat the heads-up case as the rigorous foundation and the multiway case as a useful but looser approximation.

10.2 Unexploitable Does Not Mean Unbeatable

Here is the single most important sentence in this chapter:

Tip: Key idea

An unexploitable strategy is one whose expected result no counter-strategy can push below its guaranteed floor. It is not a strategy that wins the maximum, and against weak opponents it deliberately leaves money on the table.

To make this concrete, take one of the simplest games there is: Rock-Paper-Scissors. If you throw each option exactly one-third of the time at random, you are playing the GTO strategy. What is your expected result against any opponent, whether a world champion, a child, or a random number generator? Exactly break-even. In expectation you cannot lose, and you also cannot win. If your opponent has a horrible habit of throwing Rock 80% of the time, your unexploitable one-third strategy still just breaks even against them, while a player who adjusts and throws Paper every time prints money.

That is GTO in miniature. It guarantees you cannot be exploited; it does not guarantee you profit from your opponent’s mistakes. The break-even result against the Rock-spammer is the cost of refusing to adjust.
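The Rock-Paper-Scissors numbers can be checked directly. The sketch below assumes the Rock-spammer splits the remaining 20% evenly between Paper and Scissors (the text does not say), and scores a win as +1, a loss as -1, and a tie as 0.

```python
from fractions import Fraction

# Index 0 = rock, 1 = paper, 2 = scissors.
# PAYOFF[i][j] is our result when we throw i and the opponent throws j.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def expected_value(ours, theirs):
    """Expected result per throw for our mix against the opponent's mix."""
    return sum(
        ours[i] * theirs[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


uniform = [Fraction(1, 3)] * 3
# Assumption: the leftover 20% is split evenly between paper and scissors.
rock_spammer = [Fraction(8, 10), Fraction(1, 10), Fraction(1, 10)]
always_paper = [Fraction(0), Fraction(1), Fraction(0)]

print(expected_value(uniform, rock_spammer))       # 0
print(expected_value(always_paper, rock_spammer))  # 7/10
```

The uniform mix earns exactly zero no matter what the opponent does, while the adjusted player wins 0.7 units per throw under these assumptions. That gap is the price of refusing to adjust.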

Poker is not zero-EV like Rock-Paper-Scissors. There is real money in the pot and many errors lose chips outright, so an equilibrium poker strategy does win against weak fields, because the weak players beat themselves by deviating. But the principle stands: every time you play the unadjusted equilibrium against a flawed opponent, you win less than you could have by exploiting them. GTO is the strategy you fall back on when you do not know how your opponent is flawed, or when your opponent is good enough that trying to exploit them would expose you to being exploited back.

WarningCommon mistake

“GTO is a money-printing machine that beats everyone.” No. GTO is a defensive equilibrium. Its promise is “you can’t get exploited,” not “you’ll crush.” A pure GTO bot heads-up against a calling station will still win — because the station’s mistakes lose money on their own — but it will win far less than an exploitative human who tightens up bluffs and value-bets the station relentlessly. Maximum profit lives in deviating from GTO in response to reads. GTO is the baseline you deviate from.

10.3 The Indifference Principle: The Engine Under the Hood

If equilibrium is the what, the indifference principle is the how. It is the mechanism that produces all those mysterious solver frequencies, and once you see it you can never unsee it.

The principle is this: at equilibrium, you choose your action frequencies so that your opponent is made indifferent between their options — their EV is identical whether they call or fold.

Why would you want your opponent to be indifferent? Because if they are genuinely indifferent — if calling and folding earn them exactly the same amount — then nothing they do can hurt you. They can call always, fold always, or flip a coin, and your EV is unchanged. You have removed their decision’s power. That is exactly what “unexploitable” means at the level of a single spot. You are not trying to trick them; you are constructing a situation where their read on you is worthless because both of their options are mathematically equal.

Worked example: how often to bluff

Let us make this rigorous with the cleanest possible case — a river bet on the final street, where there are no future cards to complicate the math.

The setup. The pot is 100bb. You are last to act. You bet 50bb — a half-pot bet. Your opponent has a “bluff catcher”: a hand that beats your bluffs but loses to your value bets. They must decide whether to call.

You want to choose a ratio of value bets to bluffs that makes your opponent indifferent to calling. Their call risks 50bb to win the 150bb that would be in the pot (the 100bb pot plus your 50bb bet). Let \(b\) be the fraction of your betting range that is a bluff, that is, the probability that any given bet is a bluff.

  • If they call, they win 150bb when you are bluffing (probability \(b\)) and lose 50bb when you are value betting (probability \(1-b\)).
  • If they fold, they win 0.

Set the EV of calling equal to the EV of folding (which is 0):

\[ EV_{call} = b \cdot (150) - (1-b) \cdot (50) = 0 \]

Solving: \(150b - 50 + 50b = 0\), so \(200b = 50\), and \(b = 0.25\).

The answer: your betting range on the river should be 25% bluffs and 75% value. At that ratio, your opponent’s bluff-catcher makes exactly 0bb whether they call or fold. You have made them indifferent. They cannot exploit you by calling more, and they cannot exploit you by folding more.
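A quick numerical check of the worked example shows what happens on either side of the 25% mark. The helper name `caller_ev` is an illustrative choice, not something from a solver; it simply evaluates the EV equation above for any bluff fraction.

```python
from fractions import Fraction


def caller_ev(pot, bet, bluff_fraction):
    """EV (in bb) of calling with a bluff-catcher.

    The caller wins pot + bet when the bettor is bluffing and
    loses the bet when the bettor is value betting.
    """
    return bluff_fraction * (pot + bet) - (1 - bluff_fraction) * bet


# Pot 100bb, bet 50bb, as in the worked example.
for b in (Fraction(1, 5), Fraction(1, 4), Fraction(3, 10)):
    print(f"bluff fraction {float(b):.2f}: EV of calling = {float(caller_ev(100, 50, b)):+.1f}bb")
```

At 20% bluffs the call loses 10bb, so the opponent should always fold; at 30% bluffs the call wins 10bb, so they should always call. Only at 25% does the read stop mattering.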

Notice the elegant symmetry hiding in this. The other half of the equilibrium is the defender’s job: the bettor’s optimal bluff frequency is governed by the pot odds the bet lays, and the caller’s optimal calling frequency is governed by minimum defense frequency (MDF) — they must defend often enough that the bettor’s pure bluffs (which risk 50bb to win 100bb) cannot profit by betting any two cards. For a half-pot bet, MDF works out to roughly 67%: the defender folds at most one-third of the time. Bluff frequency and defense frequency are two sides of the same indifference coin, each making the other player unable to exploit.
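The defender's side of the calculation can be sketched the same way. The function names here are illustrative assumptions; the formulas are the standard ones implied by the paragraph above: MDF is pot divided by pot plus bet, and a pure bluff wins the pot when the defender folds and loses the bet when called.

```python
from fractions import Fraction


def minimum_defense_frequency(pot, bet):
    """Fraction of the time the defender must continue against a bet."""
    return Fraction(pot, pot + bet)


def pure_bluff_ev(pot, bet, fold_frequency):
    """EV (in bb) of betting a hand that never wins at showdown."""
    return fold_frequency * pot - (1 - fold_frequency) * bet


mdf = minimum_defense_frequency(100, 50)
print(mdf)                                       # 2/3
print(pure_bluff_ev(100, 50, 1 - mdf))           # 0
print(pure_bluff_ev(100, 50, Fraction(1, 2)))    # 25
```

Against a defender who folds exactly one-third of the time the bluff breaks even. Against one who folds half the time, the same bluff earns 25bb per attempt, which is why over-folders invite extra bluffs.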

Tip: Key idea

Bet sizing and bluff frequency are linked, not independent. A bigger bet lays your opponent worse pot odds, which means you are allowed — in fact required — to bluff more to keep them indifferent. The classic ratios:

Bet size (fraction of pot)    Value : bluff ratio    Bluff % of betting range
1/2 pot                       3 : 1                  25%
2/3 pot                       2.5 : 1                ~29%
Full pot                      2 : 1                  ~33%
2x pot (overbet)              1.5 : 1                40%

These are river, polarized-range approximations. On earlier streets the math shifts because draws still have equity, but the direction of the relationship (bigger bet, more bluffs) generally holds.
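The ratios follow from the same indifference equation used in the worked example. With the pot normalized to 1 and a bet of size \(s\), setting \(b(1+s) - (1-b)s = 0\) gives \(b = s/(1+2s)\). The sketch below, whose function names are illustrative, evaluates that for the four sizings discussed.

```python
from fractions import Fraction


def bluff_fraction(bet_as_pot_fraction):
    """Bluff share of a polarized river betting range that makes the caller indifferent."""
    s = Fraction(bet_as_pot_fraction)
    return s / (1 + 2 * s)


def value_to_bluff_ratio(bet_as_pot_fraction):
    """Number of value combos per bluff combo at that bluff share."""
    b = bluff_fraction(bet_as_pot_fraction)
    return (1 - b) / b


for label, size in [("1/2 pot", Fraction(1, 2)), ("2/3 pot", Fraction(2, 3)),
                    ("full pot", Fraction(1)), ("2x pot", Fraction(2))]:
    b = bluff_fraction(size)
    ratio = value_to_bluff_ratio(size)
    print(f"{label:>8}: {float(ratio):.1f} : 1 value to bluff, {float(b):.1%} bluffs")
```

Because the bluff share is \(s/(1+2s)\), it rises with bet size but can never reach 50%: even an enormous overbet still needs more value than bluffs.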

This is also why the lazy claim “GTO says always bet your draws” is nonsense. GTO says you bet draws at a frequency, balanced against value, sized so the opponent is indifferent. The draw is a candidate bluff; whether this specific draw bets this time is a separate question — which brings us to mixed strategies.

10.4 Mixed Strategies: Why the Same Hand Bets and Checks

Open a solver output and you will see something that baffles newcomers: a single hand, in a single spot, marked “bet 62%, check 38%.” How can the correct play be to bet a hand only some of the time? Isn’t there one right answer?

No — and the reason is the indifference principle again, viewed from the player’s own side. A mixed strategy is a strategy that randomizes between actions according to fixed probabilities. Pure strategies (always bet, always check) are just the special case where one probability is 100%.

Mixed strategies appear at equilibrium whenever a hand is a genuinely close decision, meaning betting and checking have the same EV once the opponent responds optimally. If a hand were clearly better as a bet, the solver would bet it 100%. The hands that get mixed are precisely the ones on the knife’s edge, where the EV of betting and the EV of checking have been driven equal by the opponent’s optimal response. And when two options are exactly equal in EV, any mix between them is equally good for that hand in isolation, including a 62/38 split.

Crucially, the equilibrium mix is the one that also protects your other holdings. Consider A♠Q♠ on a Q-7-2 rainbow flop. This is top pair, good kicker. The solver might bet it 70% of the time and check 30%. Why ever check such a strong hand?

  1. Range protection. If you bet every strong hand, your checking range becomes capped — full of weak holdings — and a sharp opponent will mercilessly bet you off the pot whenever you check. By checking AQ sometimes, your checking range contains traps, so your opponent cannot blast away at it for free.
  2. Board coverage on later streets. Mixing AQ into your check now keeps strong hands available to call or raise on the turn and river, so your turn checking range isn’t pure air.
  3. Inducing and balancing. The 30% checks let you check-raise and check-call with a credible range, which in turn lets your bluffs in those lines get paid.

Tip: Key idea

You, a human, do not need a random number generator at the table. The frequencies in solver outputs describe what an unexploitable population of you would do across many identical spots. In practice you approximate a mix with simple randomizers — the second hand on your watch, the suits of your cards (“I’ll take the spade-suited combo as my bet”), or your hole-card position. The point of mixing is not mysticism; it is that your range in each line stays balanced even though each individual hand goes one way.
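As a sketch of the watch-hand randomizer, the rule below converts a target betting frequency into a cutoff on the second hand. The function name and the choice to round the cutoff to the nearest whole second are assumptions; any fixed rule decided before you look at the watch works equally well.

```python
def action_from_second_hand(seconds, bet_frequency):
    """Map the watch's second hand (0-59) to 'bet' or 'check'.

    Bets when the second hand is below the cutoff, so roughly
    bet_frequency of all glances produce a bet.
    """
    if not 0 <= seconds < 60:
        raise ValueError("seconds must be in the range 0-59")
    cutoff = round(bet_frequency * 60)
    return "bet" if seconds < cutoff else "check"


# A 70% betting frequency becomes "bet if the second hand reads 0-41".
print(action_from_second_hand(17, 0.70))  # bet
print(action_from_second_hand(50, 0.70))  # check
```

The key discipline is fixing the rule first and consulting the watch second, so that the choice cannot be steered by how you feel about the hand.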

A natural objection: “If betting and checking are exactly equal EV, why bother mixing at all — why not just always bet?” Against an opponent who is also playing equilibrium, it genuinely does not matter for this hand’s EV. But the moment you pick a pure strategy, you change the composition of your ranges in both lines, and a thinking opponent can attack that imbalance. Mixing is insurance for your range, not for the hand. (And against a non-equilibrium opponent, you should often abandon the mix entirely and pick the pure exploit — more on that in the exploitative chapters.)

10.5 Why GTO Is Defensive, Not Maximal

Let us gather the threads. GTO is defensive by construction because of how it is derived: the solver assumes your opponent will find and punish any weakness, and it builds a strategy with no weaknesses to punish. That is a paranoid design philosophy. It optimizes for the worst case — an opponent playing perfectly against you — and pays for that safety by not capitalizing on the (very common) case where the opponent plays badly.

Think of it as the difference between a fortress and an army. GTO builds an impregnable fortress: nobody can break in. But fortresses do not conquer territory. Exploitative play is the army that marches out to take advantage of the enemy’s mistakes — and in doing so it leaves the gates open, becoming exploitable in return if the enemy is secretly stronger than you assumed.

Tip: Key idea

The strategic decision tree in real poker is:

  1. Is my opponent making a clear, identifiable mistake? (Calling too much? Folding too much? Never bluffing?) If yes and you are confident in the read, deviate from GTO to exploit it — this earns more than GTO.
  2. Am I unsure how they’re flawed, or are they strong enough to counter-exploit me? If yes, default to GTO — it guarantees you can’t be the one who gets exploited.

GTO is the baseline. Exploitation is the upside. You need both. A player who only knows GTO leaves money on the table against weak fields; a player who only exploits gets destroyed by good regulars who notice the imbalances and punish them.

This framing dissolves the tired “GTO vs. exploitative” debate. They are not rivals. GTO is the center of gravity you deviate from, and the amount you deviate is proportional to how confident you are in your read. The better you understand the equilibrium, the better you understand which direction a given opponent’s mistake points — and therefore exactly how to exploit it. You cannot know that a player “folds too much to river bets” without an internalized sense of how much is correct. GTO knowledge is what makes exploitation precise instead of guesswork.

10.6 A Reality Check on Variance and Limits

Because poker is a game of incomplete information, even perfect equilibrium play loses pots constantly — that is by design. Indifference means your value bets get called and your bluffs get caught at exactly the breakeven rate. You will stack off with the best hand and lose to a river card; you will run your balanced bluff into the top of their range. Over a single session, a perfectly GTO strategy can lose a great deal. Its guarantee is asymptotic — it holds in expectation, over a large sample, not on any given night. Never confuse a losing session with a wrong strategy, or a winning session with a right one.
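To put a number on how large the swings are even in a zero-EV spot, the sketch below reuses the river example from Section 10.3: calling a 50bb bet into a 100bb pot against a range that is 25% bluffs. The function name is an illustrative assumption; it computes the mean and standard deviation of a single call.

```python
import math
from fractions import Fraction


def call_outcome_stats(pot, bet, bluff_fraction):
    """Mean and standard deviation (in bb) of one bluff-catcher call."""
    win = pot + bet   # result when the bettor was bluffing
    loss = -bet       # result when the bettor had value
    mean = bluff_fraction * win + (1 - bluff_fraction) * loss
    variance = (bluff_fraction * (win - mean) ** 2
                + (1 - bluff_fraction) * (loss - mean) ** 2)
    return mean, math.sqrt(variance)


mean, sd = call_outcome_stats(100, 50, Fraction(1, 4))
print(f"mean per call: {float(mean):.1f}bb")
print(f"standard deviation per call: {sd:.1f}bb")
print(f"standard deviation over 100 independent calls: {sd * math.sqrt(100):.0f}bb")
```

The call is worth exactly nothing on average, yet a single instance swings by about 87bb, and a hundred independent instances still leave a typical deviation of several hundred big blinds. Results over a session say very little about whether the strategy was right.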

And the honest practical caveats:

  • Solvers solve simplified games. They use discrete bet sizes, assume both players know the exact starting ranges, and (usually) solve one isolated spot. Real opponents misjudge ranges and pick non-solver sizings. Solver output is a brilliant guide, not gospel.
  • True GTO for full-ring, multiway, ICM-laden tournament poker has never been fully solved. What we call “GTO study” is really study of close approximations to heads-up and a handful of common multiway spots.
  • Humans cannot execute true equilibrium. You cannot recall exact frequencies for thousands of nodes. The realistic goal is to absorb the principles — balanced bluff ratios, range protection, MDF, indifference — so your play is robust and hard to exploit, then layer reads on top.

Note: Drill

For each scenario, state (a) whether to play GTO or to deviate, and (b) the direction of the deviation.

  1. You are heads-up on the river, pot 100bb, against an unknown reg in a tough online pool. You have a marginal bluff-catcher. What’s your default and why?
  2. Same spot, but you have 600 hands on this opponent and their river check-raise has shown up only with the nuts, never as a bluff. They check-raise you. What do you do, and is it GTO?
  3. You bet 1/2 pot as a bluff against a player you know folds his bluff-catchers about 50% of the time (versus the ~33% an unexploitable defender would). Should you bluff more or less than the 25% balanced frequency, and why?
  4. A solver tells you to bet A♠Q♠ on Q-7-2r exactly 70% of the time. You are at a live table and cannot randomize precisely. How do you approximate this, and does it matter for this one hand’s EV?

Answers. (1) Default to GTO: call at the equilibrium defense frequency (roughly MDF for the bet size you face); you have no read, so refuse to be exploited. (2) Deviate hard and fold. This is a massive, confident read that their raising range is pure value; folding is more profitable than the GTO call, and yes, it is exploitable in return, but you have the data to justify it. (3) Bluff more than 25%. He over-folds, so each bluff is more profitable than breakeven; you exploit by tilting your range toward bluffs (and abandon balance because he isn’t punishing imbalance). (4) Use a randomizer (card suits, watch hand) or simply pick one action; for this hand’s EV against an equilibrium opponent it doesn’t matter at all, because the two options are equal EV by definition. The mix only matters for keeping your overall range balanced.

10.7 Summary

  • A Nash equilibrium is a pair of strategies where neither player can profit by unilaterally changing theirs. “GTO” is poker’s name for playing your half of it.
  • Unexploitable ≠ unbeatable. GTO guarantees you cannot be exploited; it does not extract maximum value, and it deliberately passes up the extra profit available from mistakes it could punish.
  • The indifference principle is the engine: you choose frequencies that make your opponent equally happy calling or folding, which strips their decisions of power. This produces the canonical bluff-to-value ratios (25% bluffs at half pot, ~33% at full pot, more for overbets).
  • Mixed strategies arise because close-EV hands are equally good as bets or checks; mixing keeps your ranges balanced even though each hand goes one way. You approximate the mix with simple table randomizers.
  • GTO is defensive — a fortress optimized against a perfect opponent. The money is made by deviating toward exploitation when you have reads, using GTO as the baseline that tells you which way the opponent’s mistake points.

Master the equilibrium not so you can robotically reproduce it, but so you understand exactly what you are giving up every time you choose to exploit — and exactly when you should refuse.