Ben, Old School computer nerd

Game Theory In Two Lines Of Code

Can we simplify the Theory? I’m Game.

Wanna skip to the end? The summary is, game theory studies suggest that your best strategy set for life consists of:

Being ‘nice’ to others
Forgiving others
Being provokable by others

Sounds like grand-parent wisdom, ja? Except for the last point. Read on if you’d like to break down Game Theory and what the formal studies have shown. I can promise you some surprises.

As a long-time reader of Dubner and Levitt, I’ve become a listener of the Freakonomics Radio Network.

It’s a great resource. The newer episodes feature Steve Levitt and Angela Duckworth, who have an awesome rapport and chemistry. I recommend all of the Freakonomics shows, including People I (Mostly) Admire, No Stupid Questions and Sudhir Breaks the Internet.

Enter Robert Axelrod

This writing was inspired by a recent People I (Mostly) Admire podcast, EP47: Why Being Nice, Forgiving, and Provokable are the Best Strategies for Life.

In my living study of audio learning, I’m amazed to find that salient points, such as the names of authors, will ‘stick’ even when I lose most of the content by listening while working or being otherwise distracted.

I’d heard Robert Axelrod’s name many times before. Steve explains this early in the podcast.

Steve - Roughly 40 years ago, you had the idea to run a little tournament for 13 academic Game Theorists. Did you ever imagine at the time that this would launch a research agenda that would get over 50,000 academic citations?

Robert - No, I just did it for fun.

The interview begins with Steve and Robert attempting to give a primer on Game Theory. We’ve all heard something like this before, no doubt. It’s actually not that clear an explanation.

The show’s producer, [Morgan Levey](https://morganlevey.com/), steps in to clarify. (I’m paraphrasing slightly here.)

The Prisoner’s Dilemma is a hypothetical scenario in Game Theory. Imagine Steve and Morgan are in custody for jointly committing a crime. The evidence the police have is weak. They are arrested at the same time and taken to separate holding cells at a police station.

The police explain to Morgan that there are four possible outcomes to the current situation, and that Morgan’s outcome will depend on what she chooses to do and what Steve, separately, chooses to do.

Two more key pieces of information are: Steve and Morgan cannot talk to each other. Morgan has only two courses of action - Rat Out on Steve, or Stay silent.

The joint scenario set looks like this:

Morgan Rats Out on Steve. Steve stays silent. In this case, Morgan is Defecting on her partnership with Steve. Morgan walks free. Steve goes to jail for 20 years.
Neither party Rats Out on the other. Both parties are Cooperating with each other. As the evidence is weak, both are sentenced to 1 year of jail time.
Both Steve and Morgan Rat Out on each other. Both go to jail for 10 years, a lower sentence as a reward for testifying against each other.
Steve Rats Out on Morgan. Morgan stays silent. He walks free. Morgan gets 20 years in prison.

What’s the best course of action for Morgan?

Robert explains: Each party is better off Defecting, double crossing the other, acting on their own interest only. But if both of them do that, they both end up worse off as a result. From a Game Theory perspective, Defection is called a Dominant Strategy, meaning it’s the best strategy, no matter what the other party does.

If the other guy Cooperates (stays silent), you can exploit their cooperation by Defecting. If the other party Defects, then you’d be a sucker to Cooperate.
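
To see why Defecting dominates, it can help to put the four outcomes into a small lookup table. The sketch below is my own illustration in Python, not something from the podcast: the names `YEARS` and `morgans_best_reply` are made up, and the numbers are simply the jail terms from the four scenarios above.

```python
# Years in jail as (Morgan, Steve), keyed by (Morgan's choice, Steve's choice).
# The figures are the sentences from the four scenarios described above.
YEARS = {
    ("rat", "silent"): (0, 20),
    ("silent", "silent"): (1, 1),
    ("rat", "rat"): (10, 10),
    ("silent", "rat"): (20, 0),
}


def morgans_best_reply(steves_choice):
    """Return the choice that gives Morgan the fewest years, given Steve's choice."""
    return min(("rat", "silent"), key=lambda mine: YEARS[(mine, steves_choice)][0])


for steves_choice in ("silent", "rat"):
    reply = morgans_best_reply(steves_choice)
    print(f"If Steve chooses {steves_choice!r}, Morgan's best reply is {reply!r}")
```

Whatever Steve does, Morgan serves less time by ratting, which is what makes it a Dominant Strategy. Yet if both follow that logic they get 10 years each instead of 1.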

Robert goes on to point out that these kinds of scenarios (which I’ve heard many times before) describe the situation where the Game is Played only Once. The emphasis on once is where the term Dilemma is earned.

Steve sums up:

So, this scenario describes a situation where the best thing we could do is Cooperate. But since our private incentives get in the way, we can’t do that. We’re left stuck at the only other outcome - we Both Defect, and end up with a pretty bad outcome, but not as bad as if we were made the Sucker by the Other.

The Set of All Games

This podcast took me to new places in terms of game theory by distinguishing this hypothetical single-pass game from the far more common human experience of…behaviours that arise over time in circumstances that repeat, at least partially.

Robert: The original analysis I read when I was 17 said, if you know when the game is going to end, you are better off Defecting right from the beginning. If you don’t know when the game is going to end, you can use the history of the game to update your strategy.

Steve: “If you play the game over and over, you can use the future rounds to Punish the other party if they Defect on you now. You are demonstrating to the other, ‘I’ll cooperate with you, as long as you don’t Defect on me.’ Over time, a Game-Theoretical Equilibrium can arise as both parties learn to exploit the benefits of Cooperation.”

Robert: “I call that the Shadow of the Future, the idea that the Future can affect what you’re doing now.”
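
A rough way to feel the Shadow of the Future is to count jail years over several rounds. The Python sketch below is my own toy, using the 0, 1, 10 and 20 year sentences from the scenario earlier. The `echo_partner` function is a hypothetical partner who Cooperates first and then copies whatever I did last round. It only compares two fixed plans (always Cooperate, always Defect) and ignores the end-game reasoning about a known final round.

```python
# Years in jail for (me, them) in one round. C = Cooperate (stay silent), D = Defect (rat).
YEARS = {("C", "C"): (1, 1), ("C", "D"): (20, 0), ("D", "C"): (0, 20), ("D", "D"): (10, 10)}


def echo_partner(my_past_moves):
    """Hypothetical partner: Cooperates first, then copies my previous move."""
    return "C" if not my_past_moves else my_past_moves[-1]


def my_total_years(my_plan, rounds):
    """Total years I serve if I play `my_plan` in every round against echo_partner."""
    my_past_moves, total = [], 0
    for _ in range(rounds):
        their_move = echo_partner(my_past_moves)
        total += YEARS[(my_plan, their_move)][0]
        my_past_moves.append(my_plan)
    return total


for rounds in (1, 2, 5, 10):
    cooperate, defect = my_total_years("C", rounds), my_total_years("D", rounds)
    print(f"{rounds:>2} rounds: always Cooperate = {cooperate} years, always Defect = {defect} years")
```

With a single round, Defecting costs me nothing. From two rounds on, the partner’s payback makes Defecting the more expensive plan, which is the future reaching back into the present.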

The Repeated Prisoner’s Dilemma Tournament

Robert ran a contest modelled on Computerised Chess Tournaments. He involved 13 academic researchers, who each submitted a codified or algorithmic solution to the Repeated Prisoner’s Dilemma. Robert collected results over a number of iterations of this Tournament approach, and found the following ‘winning’ strategies.

Insights

None of the submissions included the two simplest and most obvious strategies. Always Cooperating or Always Defecting leads to the worst set of pay-offs.

Reciprocity (or Tit for Tat) was far and away the most successful strategy found in the Tournament(s). The summary, from Robert, is:

“Tit for Tat - A player Cooperates on the First move. Then does what the other ‘player’ does on each subsequent move”

What you are communicating is: “I’m willing to Cooperate if you are. If you’re not, I’m going to Defect. I’m just gonna Echo what you do. Hopefully you realise that you are better off Cooperating with me, then I’ll Cooperate the next time. Then we can both do pretty well.”

Robert goes on to give explanations of tragic strategic failures, the worst of which is Massive Retaliation, or the Grim Reaper strategy. This is when a ‘player’ responds to a Defection by always Defecting from then on, getting their ‘payback’ over and over. Does this remind you of anyone you know?
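
The tournament format itself is easy to sketch. Below is a toy round robin in Python with four hand-written strategies. It is an illustration under my own assumptions (the jail sentences from the scenario above as payoffs, 10 rounds per match, every strategy also plays a copy of itself), not a reconstruction of the actual tournament entries or scoring. All function names are mine.

```python
# Years in jail for (me, them) in one round. C = Cooperate, D = Defect.
YEARS = {("C", "C"): (1, 1), ("C", "D"): (20, 0), ("D", "C"): (0, 20), ("D", "D"): (10, 10)}


def tit_for_tat(mine, theirs):
    return "C" if not theirs else theirs[-1]  # Cooperate first, then echo


def grim(mine, theirs):
    return "D" if "D" in theirs else "C"  # one Defection is never forgiven


def always_cooperate(mine, theirs):
    return "C"


def always_defect(mine, theirs):
    return "D"


def play(a, b, rounds=10):
    """Play strategy a against strategy b; return total years for each."""
    moves_a, moves_b, years_a, years_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = a(moves_a, moves_b), b(moves_b, moves_a)
        ya, yb = YEARS[(move_a, move_b)]
        years_a, years_b = years_a + ya, years_b + yb
        moves_a.append(move_a)
        moves_b.append(move_b)
    return years_a, years_b


strategies = [tit_for_tat, grim, always_cooperate, always_defect]
totals = {s.__name__: 0 for s in strategies}
for i, a in enumerate(strategies):
    for b in strategies[i:]:  # each pairing once, including self-play
        years_a, years_b = play(a, b)
        totals[a.__name__] += years_a
        if a is not b:
            totals[b.__name__] += years_b

for name, years in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{name}: {years} years")
```

Lower is better here, since the score is years in jail. In this tiny field the two always-the-same strategies collect the most jail time, while Tit for Tat and Grim tie, so four strategies are too few to separate those two.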

Robert finds himself stuck for words other than Nice or ‘Niceness’ to describe a key element of a successful strategy: signalling to the other ‘player’ your preparedness to Cooperate. There is a corresponding ‘Not-Nice’, with the opposite level of success in the Tournament.

The Tournament was won by Anatol Rapoport, who liaised with Robert after the tournament, reminding him that this thought experiment would have a poor representation in Daylight Life, due to the immense complexity of human behaviour in ‘the Wild’.

The key findings from the tournament are, in order:

Be Nice. Keep Cooperating as long as the ‘other guy’ does. Don’t ‘start trouble’.
Be forgiving. Leaning into ‘payback’ or ‘punishment’ will cost you as a strategy.
Be provokable. If you’re a ‘sucker’ all the time, you are setting yourself up for exploitation from ‘the other guy’

Why so many quoted ‘other guys’ and ‘other players’ in my writing above? Here, I’m giving credence to Robert’s advice through the interview. Be careful of Zero Sum thinking. There is no room for ‘opponents’ in successful Game Theoretical Strategy. In Robert’s experience, there are almost no real life examples of Zero Sum situations - “Sports and Warfare” are about the only two.

This last thought called up something pretty deep for me. I am notoriously un-interested in Games that have victors. I will write this up one day with a working title of “How Winners behave like Losers”.

Larger Tournaments

Axelrod went on to involve thousands of people in tournaments to find the best strategies for the Repeated Prisoner’s Dilemma. The results follow the same findings, published by Stanford.

Daylight World examples of The Prisoner’s Dilemma

Here is a brief summary of ‘real world’ examples of the Reciprocity (tit for tat) strategy.

Trench Warfare.

Robert reviewed a sociology journal referring to the Live and Let Live strategy that was reported during WWI. Artillery would shoot between the first and second trench lines. In doing so, they demonstrated their accuracy and showed the other side how to avoid killing or maiming, all while keeping their Generals happy.

The Selfish Gene and Altruism

Robert and Steve discuss Richard Dawkins’s glowing review of The Evolution of Cooperation (a book I can’t wait to read). Dawkins describes it as the best replacement for the Gideon Bible. But Steve asks, don’t the key findings around Cooperation run contrary to the Selfish Gene ideas?

Steve - “When I interact with strangers, I, for a variety of complicated reasons around identity and wanting to feel like I’m a good person… I act like I care about them a little. I’ll be nice to them even if it’s unlikely we’ll play games over and over again.”

Robert - “Kinship and ethnocentrism are well-studied bases for Cooperation” (BB: Mmmm - I don’t think these ideas are likely to fly in academia these days)

After some back and forth on altruism, Robert suggests that a strong bias towards ‘nice-ness’/tendency-to-cooperate will result in “being taken advantage of”. (BB - I think we’d all have seen or experienced this behavioral dynamic)

The Key Finding for me

Robert: “You can do really well without ever doing better than the ‘other player’. In fact, ‘tit for tat’ cannot possibly do better, because it never Defects until the other ‘player’ does. This is quite counter-intuitive to most people.”

Here’s the kicker (IMO) from Robert: “Zero sum thinking can’t make sense of this at all. But it makes sense in this context, where what works well is, if you can elicit Cooperation, not necessarily to hurt the ‘other guy’.”

For me, this helps me understand my instinct to call Winners Losers.

Robert: “It’s obvious after you Say it. You can do really well without outdoing the ‘other guy’. It’s not obvious before you say it and in fact, it even seems nuts.”
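
The claim that Tit for Tat can never do better than the other player is easy to check in a small simulation. The Python sketch below is my own, with made-up names and the jail sentences from the earlier scenario as payoffs. It pits Tit for Tat against three hypothetical players, one of which flips a (seeded) coin, and compares the years served by each side.

```python
import random

# Years in jail for (tit for tat, other player) in one round. C = Cooperate, D = Defect.
YEARS = {("C", "C"): (1, 1), ("C", "D"): (20, 0), ("D", "C"): (0, 20), ("D", "D"): (10, 10)}


def tit_for_tat(their_moves):
    return "C" if not their_moves else their_moves[-1]


def match(other, rounds=50):
    """Tit for Tat versus `other`; return (tit_for_tat_years, other_years)."""
    tft_moves, other_moves, tft_years, other_years = [], [], 0, 0
    for _ in range(rounds):
        tft_move, other_move = tit_for_tat(other_moves), other(tft_moves)
        yt, yo = YEARS[(tft_move, other_move)]
        tft_years, other_years = tft_years + yt, other_years + yo
        tft_moves.append(tft_move)
        other_moves.append(other_move)
    return tft_years, other_years


rng = random.Random(0)  # seeded so the coin flips are repeatable
others = {
    "always_cooperate": lambda their_moves: "C",
    "always_defect": lambda their_moves: "D",
    "coin_flip": lambda their_moves: rng.choice("CD"),
}

results = {name: match(other) for name, other in others.items()}
for name, (tft_years, other_years) in results.items():
    print(f"vs {name}: Tit for Tat {tft_years} years, other player {other_years} years")
```

In every match Tit for Tat serves at least as many years as the other player, never fewer. It only Defects after being Defected on, so it can be the sucker one more time than it gets to exploit, but never the other way round.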

Wrapping Up

You can finish off the episode with some application of John’s work to the method of action in cancer cells through to the design of Cyber warfare strategies.

Steve’s coda: “My first Life Principle is to be Nice. My second Life Principle is to be provokable - when someone tries to take advantage of me, I’m willing to fight. My third Life Principle is to be Forgiving, though I still have some work to do to reach the unlimited forgiveness embodied by Tit for Tat. All in all, these principles seem a pretty good framework to build a Life. Not bad for a strategy that only requires two lines of code.”

I’m looking forward to reading The Evolution of Cooperation

Oh, I realise that I haven’t crystallised the Two Lines of Code thing. The podcast assumes you can visualise this or grok it in your own way.

I believe it would resemble the following (which could be written on 2 lines, but I’m not feeling the relevance of that right now.)

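def tit_for_tat(other_players_moves):
    # start: Cooperate on the first move
    my_moves = ["Cooperate"]
    # while playing: echo each move the other player makes
    for their_move in other_players_moves:
        if their_move == "Cooperate":
            my_moves.append("Cooperate")
        else:
            my_moves.append("Defect")
    # the last entry is the reply to the other player's final move
    return my_moves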
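
For what it’s worth, here is one way the strategy might be squeezed into two lines of Python. This is only my guess at what the “two lines” could look like: the function name `next_move` and the "Cooperate"/"Defect" strings are placeholders of mine, not anything from the podcast.

```python
def next_move(their_moves):
    return "Cooperate" if not their_moves else their_moves[-1]


print(next_move([]))                       # first move, nothing to echo yet
print(next_move(["Cooperate", "Defect"]))  # echoes the other player's last move
```

The function only needs the other player’s history: an empty history means it is the first move, and anything else gets echoed straight back.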

10 Oct 2021

winning-is-boring


Ben's musings on life