Coding Challenge 154: Tic Tac Toe AI with Minimax Algorithm

[WHISTLE] Hello, and welcome to
a coding challenge– Tic-Tac-Toe, oh, with the Minimax algorithm. That’s why I’m here, because
I made this other coding challenge, Tic-Tac-Toe, where
I created a kind of a big– it was kind of a mess if
I’m being perfectly honest. But I made a working version
of the Tic-Tac-Toe game that just played with two
players picking random spots. So since then, you can
check one of my live streams if you want to find
where I did this. I made some adjustments to it so
that I could, as a human being, play the game. So right now I’m going to play
the random computer picker. I’m going to go here. And then I’m going to block X.
And then I’m going to go here. And ha, ha, I win. O wins. So the adjustment that I made is that I added a mousePressed function where I find where I clicked, and I put my human variable, which is the letter O, onto the board. And then I called nextTurn, where nextTurn picks a random spot on the board and makes that the AI’s spot, X’s spot.
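Roughly, that adjustment looks like this– a minimal sketch, assuming a 3x3 board array of '' / 'X' / 'O', cell sizes w and h, human set to 'O', and the nextTurn() function from the previous challenge:

    // a minimal sketch of the adjustment, not the exact code from the video;
    // assumes board is a 3x3 array of '' / 'X' / 'O', w and h are the cell sizes,
    // human is 'O', and nextTurn() comes from the previous challenge
    function mousePressed() {
      let i = floor(mouseX / w);
      let j = floor(mouseY / h);
      if (board[i][j] == '') {   // only allow clicking an empty spot
        board[i][j] = human;     // place the human's O
        nextTurn();              // originally: pick a random spot for X
      }
    }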
So the whole point of this video is for me to implement
something called Minimax, which is an algorithm,
a search algorithm, if you will, to find the optimal
next move for the AI. I mean, it could
find it for myself. And then I could implement it. But the idea is I want this
player, the computer player, to beat me at the game, or at least tie by playing a perfect game. If you want to learn more
about the Minimax algorithm, I would suggest two
resources that you can look at that I actually
looked at before beginning this coding challenge. I haven’t programmed
this before. But I watched this video
by Sebastian Lague that explains the Minimax algorithm,
also something called alpha-beta pruning, which
I’m not going to implement but could be a good
exercise, next step for you. And I also found this article on
the geeksforgeeks.org website, which is a three-part series
about the Minimax algorithm and how to apply
it to Tic-Tac-Toe. So those resources are
probably your best bet for learning about the theory
of the Minimax algorithm and how it can be applied to
a wide variety of scenarios. But I’m going to look at the
specifics of Tic-Tac-Toe, which is a great starting example, because it’s a simple enough game that I can actually diagram out all of the possibilities. For bigger games, even when computing every outcome is possible, it’s very hard to diagram. And there are lots of scenarios where you couldn’t even compute all of the possible outcomes; chess is an example of that. So Minimax is typically
visualized as a tree. So the root of the tree
is the current state of the game board. [MUSIC PLAYING] So I’ve drawn a configuration
here of the Tic-Tac-Toe board midway through
the game because I don’t want to start
at the beginning where there’s so
many possibilities that I couldn’t diagram. So right now it’s X’s turn. And X has three possible moves. So I could express that in
a tree diagram as move one, move two, and move three. So let’s take a look
at what I mean by that. [MUSIC PLAYING] I’m going to use a blue marker
so you can see the new moves. So let’s say one possible
move is X going here. Let me diagram out the
next two possible moves. [MUSIC PLAYING] Another possible
move is X going here. [MUSIC PLAYING] And another possible
move is X going here. So we need to look
at each of these and determine is the game
over, or is it continuing? So only in this case
is the game over. And in this case, the game
is over, and X has won. So I’m going to mark this
green for the state of X having won the game. Now, it’s O’s turn. And for each of
these spots, O could make two possible options. [MUSIC PLAYING] O could go here,
or O could go here. O could go here,
or O could go here. And look at these. In this case and
this case, O has won. Remember, we’re trying to
solve for the optimal move for X. X is making
the first turn. So this means I can mark
these where O wins as red, as, like, those are bad outcomes. So while these are
not terminal states, there’s only one move possible
for X. So I could draw an arrow and put that down here. But ultimately I can
just consider this as X has to go here. So these, you could consider
as terminal, in which case X will win. Now, you can see that I’ve visualized
all the possible outcomes of the Tic-Tac-Toe game based
on this starting configuration. So how does X pick
which path to go down, where we can see that the goal is for X to win the game– to get to here, here, or there? How does it do that? And the way that it does that– by the way, spoiler alert: this is the move it should pick. It should just win instantly. But the idea of the
Minimax algorithm is if we could score
every possible outcome, those scores can
ripple back up the tree and help X decide which way
to go to find the highest possible score. The nice thing
about Tic-Tac-Toe is there’s not a range of scores. The endpoints are either you
won, you lost, or you tied. I could say if X wins,
it’s a score of plus 1. If O wins, it’s a
score of negative 1. And if it’s a tie,
it’s a score of 0. So in other words, we can
give this a score of plus 1. This a score of plus 1. This a score of negative 1. This a score of negative 1,
and this a score of plus 1. But we’ve got an issue. I’m trying to pick between
this option, this option, and this option. These two don’t have a score. How do I pick their score? Well, I can pick their score
based on which one of these would be picked. But who’s picking these? This is why the algorithm
is called Minimax, because we have two
players each trying to make an optimal score. X is what’s known as
the maximizing player. So to go from here to
here, it’s X’s turn. And X is maximizing. But here, it’s now O’s turn. O wants to win. X can assume that O is going
to play their optimal move. Now, you could have a situation
where your player doesn’t play optimally. And that’s something you
might have to consider in terms of game theory. But in this case
of the algorithm, we can assume that O is going
to make the best possible move. I mean, if it makes a worse
move, that’s only better for us anyway so it’s all
going to work out. So O is what’s referred to
as the minimizing player– the minimizing player. So O has the option of
picking negative 1 or plus 1, negative 1 or plus 1. Which one is O going to pick
as the minimizing player? The option to win. So we can then take the best outcome for O, the minimizing outcome for O, and pass that back up– here’s the score– and do the same thing here. Now, X is the maximizing player. Which one of these
is it going to pick– negative 1, plus
1, or negative 1. These are negative
1, because even though it could win
here, if X goes this way, O is going to win. If X goes this way, O is going
to win if O plays optimally. So this is the way to go. This scenario that
I picked might not be the best demonstration
of the idea. In fact, there’s no ties here. There’s only two
turns being played. So maybe what you might want
to do is pause this video and do the same exact
diagram but have X and O with two
starting positions and see if you can diagram
that whole thing out yourself. So how do we implement
this in code? So your first clue should be the fact that this is a tree diagram. Anytime you see a tree diagram, typically speaking, a recursive algorithm– a function that calls itself– is something that will apply. And this case is no different. I want to call Minimax on this board position, try each possible move, and call Minimax on each of those resulting board positions, which in turn loop through all of their possible moves. If I ever get to an end point, return the score, and have that score backtrack up, or ripple back up, the chain. So let’s see if we can
write the code to do this. So this here is the place where
I want to write the new code. Currently, a move
is chosen by picking a random available spot, and
I no longer want to do that. I’m actually going to change
the name of this function to Best Move. And I don’t need to worry
about this, what’s available. I just want to actually look
at all the possible spots. So basically, I just want to
know is the spot available? If the spot is available,
I want to try going there. So I’m going to say board[i][j] is the AI player. And then I want to call
Minimax on the board. Why do I want to call Minimax? Because I want to get
what is the score? I want Minimax to return to me the score for this particular move. Now, it’s going to have
to recursively check these two spots. But we’ll get there. We’re getting there. Ultimately, though,
I need to figure out which one was the best. So I need to keep
track by saying, like the best score
is negative infinity. If score is greater
than the best score, then that’s the new best score. And the best move is
this particular IJ. And let me declare best move. So, again, I haven’t
implemented Minimax yet. I’m just getting
the basic idea down. For this first turn, I need to start by looking at all the possible
moves, get the score for those by calling this
mysterious Minimax function I’m about to write,
and find the best one, then apply that move. There’s a big
issue here, though, because I am changing the global variable, board. I am altering the game. I am moving X to that spot and then moving X to the next spot. So I could make a copy of the board. But I think an easy way to deal with this would actually just be to undo it. So right afterwards, I’m going to just quickly undo that move.
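Put together, the bestMove function described so far looks something like this– a sketch, with the inner variable named move rather than bestMove, and with the minimax call picking up more arguments in a moment:

    // a sketch of bestMove as described so far; ai is 'X',
    // and minimax() is the mysterious function written next
    function bestMove() {
      let bestScore = -Infinity;
      let move;
      for (let i = 0; i < 3; i++) {
        for (let j = 0; j < 3; j++) {
          if (board[i][j] == '') {       // is the spot available?
            board[i][j] = ai;            // try going there
            let score = minimax(board);  // how good is that move?
            board[i][j] = '';            // undo it right afterwards
            if (score > bestScore) {
              bestScore = score;
              move = { i, j };
            }
          }
        }
      }
      board[move.i][move.j] = ai;        // actually play the best move found
    }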
The next step would be to write the Minimax function– to truly write the algorithm itself. But first, I kind of want to run what I have to see
if there’s any errors here. So actually, what
I’m going to do is I’m just going to put
a placeholder function in, Minimax, which receives a board. And then I’m just going
to return the score of 1. So Minimax is always going
to return the score of 1, meaning it’s not going
to be able to pick. They’re all going to be a tie. So it still is going
to pick the first one. But let’s see how that goes. Next turn is not defined
because I called it best move. And it’s in one more
place here in Setup. OK, where? It went there. Watch, it’s going to go to
the next available spot, to the next available spot,
to the next available spot. So you can see this is working. It’s working in a sense that the
algorithm is making a choice. But no matter what,
it’s always just going to go in order of the
board and oh, X won. There’s also a big issue here, though thankfully it caused no major problems. I named this function bestMove, and then I created a variable inside of the function with the same name, which
can always run into trouble. So let me actually just
change this to move. Sort of by definition,
it is the best move– make sure this still works. All right. X is just picking straight down. Now, the hard part. I need to write the
Minimax function itself. So one thing that
I’ve skipped here, which is typical of
a Minimax algorithm, is I would also keep
track of the depth, because ultimately, I might want to consider, like, how long is it taking
me to get somewhere? What is the depth of
the tree that I’m on? So that’s something that you’ll
see often within the algorithm implementation. So let me add that. So I add an argument depth here
and then give it a 0 at first. Oh, you know what. I need another really
important argument. This is a turn-based game. And the algorithm is going
to behave differently, whether it is the
maximizing player’s turn, X, or the minimizing players
turn, O, in this case. And, again, you could have
more than two players. Or there could be a lot
of weird stuff going on. But that’s ultimately
what I need here. So let me also add
isMaximizing– isMaximizing. And this here would be true. All right, so I want
to perform Minimax on this board with a
starting depth of 0. And the first move is
the maximizing player. So let’s add maximizing. What do I want to do? Actually, what I want to
do even before this is check to see if somebody won. [MUSIC PLAYING] I’m pretty sure I already
have a check winner function. That’s something that I wrote in
the previous coding challenge. Let me double-check that– checkWinner. The winner starts as null, and then by the end it returns null, a tie, or the winner.
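The internals of checkWinner aren’t shown in this video, but it’s along these lines– a sketch of one way such a function could work, not the exact code from the previous challenge:

    // a sketch of a checkWinner() in that spirit: it returns 'X' or 'O' if
    // someone has three in a row, 'tie' if the board is full, or null otherwise
    function checkWinner() {
      let winner = null;
      let lines = [
        [[0,0],[0,1],[0,2]], [[1,0],[1,1],[1,2]], [[2,0],[2,1],[2,2]], // rows
        [[0,0],[1,0],[2,0]], [[0,1],[1,1],[2,1]], [[0,2],[1,2],[2,2]], // columns
        [[0,0],[1,1],[2,2]], [[0,2],[1,1],[2,0]],                      // diagonals
      ];
      for (let [a, b, c] of lines) {
        let v = board[a[0]][a[1]];
        if (v != '' && v == board[b[0]][b[1]] && v == board[c[0]][c[1]]) {
          winner = v;
        }
      }
      let openSpots = board.flat().filter((spot) => spot == '').length;
      if (winner == null && openSpots == 0) return 'tie';
      return winner;
    }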
So the score is– let’s make a little lookup table that’s basically exactly what I wrote here. If X wins, it’s a plus 1. If O wins, it’s a negative 1. If it’s a tie, it’s a 0. [MUSIC PLAYING] All right, so this is my lookup
table for what the score is– and, again, it’s a
very simple scenario. There’s only three
possible scores. If the result is
not equal to null, the score is the
number associated with that particular result. Oh, it got rid of those– I didn’t need those quotes. Visual Studio Code just cleaned that up for me, thank you very much. The score is based
on whoever won, and then I can return that. So this would be the terminal condition. If I am calling Minimax on this particular board configuration at this particular depth, and it’s an end state– a terminal state– just return the score. That’s what it’s supposed to do. [RINGING]
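So the top of the function now looks something like this– a sketch of the lookup table and the terminal condition as described so far:

    // the lookup table from the whiteboard: X wins = +1, O wins = -1, tie = 0
    let scores = { X: 1, O: -1, tie: 0 };

    function minimax(board, depth, isMaximizing) {
      let result = checkWinner();
      if (result !== null) {
        return scores[result]; // terminal state: just hand the score back up
      }
      // otherwise, check all the possible next moves (written next)
    }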
If it’s not an end state, I can’t just return the score. I need to check all the
possible next moves. So I got this wrong, actually. Up here, it’s the AI player trying a spot– that’s their move– so the next call to Minimax is actually not for the maximizing player. This should be false. But I’m going to write
this as if it is. So if it’s the
maximizing player, I can copy, paste
that from above, check all the
possible spots again. If it’s available, try. Now, that’s the human, right? Then, oh, no, this is the AI. I’m confused, because
clearly the way I’m stepping through this,
in my mind it’s O’s turn. But I’m writing the code for both, whether it’s X’s turn or O’s turn. So for the sake of argument,
it’s X’s turn right now. So I’m checking
all of those spots. And then I’m going
to say return. I need to find the best move. I’m kind of doing what I did before, again. Is there a way for me to reduce
the amount of code I’m writing? I’m not going to
worry about this. I want to, once
again, find the best score, which starts, in this case, at negative infinity. And I want to say the score is Minimax of the board at depth plus 1. And now isMaximizing is false for the next turn, because right now I’m the maximizing player. And then I’m going to undo
this just like I did before. Why do I have to
write this code twice? I’ll think about it later
if I can refactor this, only do it once. If score is greater
than best score, best score is equal to score. And then after all of this– for loop, for loop,
if, it’s an empty spot, boom, boom, return
the best score. So this is finding
the best score for all the possible next
turns by the AI player. Or if it’s not the AI player,
if it’s the minimizing player, we can do exactly
the same thing. And, again, maybe there’s a way. I’m missing a curly bracket? Yes, thank you. There’s a curly bracket. So this is very important. There’s a lot of ifs and
for state for loops here. So maybe there’s also a nice
way to refactor this and make it more readable. But if it’s the maximizing
player, check all the spots, find the best possible
outcome, and return it. But call Minimax recursively
at the next future move. And then here, the
minimizing player wants to find the best score for
it, which is the lowest score– and that’s the human player. So if the score is less than the best score, that becomes the new best score, and then return that best score. Well, oh, forget about this placeholder return 1. I’m going to stare
at this to make sure all my curly brackets line up– if, if, for, for, and then
return the best score. Otherwise, if, if, for, for,
and return the best score. I think this might
actually be good. Should we just try it? [DRUM ROLL] Error on line 33,
return– hold on. Oh, this should be–
ah, look at this. So this is a huge mistake here. Return true– why did
I have that there? I don’t know why I did that. That’s return score. This should be a return score. The whole point of
this– and I don’t even need a separate variable here. I can just say return scores[result]. Right, so– any kind
of recursive function needs a terminal condition,
needs to exit, right? This function is always going to
keep calling itself and calling itself– Minimax, Minimax, Minimax, Minimax. I don’t know what this extra code is doing in here. OK. Oh, yes. Simon is pointing
out something to me, which is great: I don’t need this if statement to find the best score– that’s the whole point of min. And also, there’s a great chance to use the min function and the max function, because it’s the Minimax algorithm. So I can actually say
the score is the lower one– the minimum between score and best score– and that’s going to make it way easier to read. Thank you. And oh, no, but
this one is max– is max. And then this one–
oh, thank you for this. I don’t know why I
didn’t think of this– I’m sure it’s in every implementation: the best score is the minimum between score and best score. So this makes it much
easier to read here, OK. Let’s try it. Let’s try it. Why not? Caution to the wind. [DRUM ROLL] OK, X went in the top left. That’s definitely the
optimal place to go. I’m going to go here. [DRUM ROLL] No, no bad X– bad X. You’re not going
in the right place. [MUSIC PLAYING] Yeah, I beat you. You’re still going in order. Why are you still
going in order? Oh, whoa, you’re
not going in order. You’re making weird decisions. Oh, oh, oh. I see the error. I see the error of my ways. This has to be true, right? After the maximizing
player’s turn, it’s the minimizing
player’s turn. After the minimizing
player’s turn, it’s the maximizing
player’s turn, OK. [DRUM ROLL] That’s a good move, X. I
see what you did there. [DRUM ROLL] [MUSIC PLAYING] Why can’t you figure
out to go there, X? Oh, oh. [BUZZ] OK. This– I’m finding the score here, but it’s the best score that should be the bigger one, the maximum between score and best score, not score. This has to be best score, because
I’m returning best score. And this has to be this. OK, OK. [WHISTLING] Let’s see. I really should not continue
to play this drum sound. [DRUM ROLL] Come on, X– figure
out where to go. OK, I’m going to go here. Oh, shit, come on. Oh, no, no, no, no, no, no, no,
no, no, no, no, no, no, no, no, no– [BUZZ] That’s in the wrong place. It’s in the wrong part. I knew. Look, my brackets–
bad brackets. I need to return the
best score, hello, after I checked all
of the possibilities, meaning both for loops, the i and the j, have completed. I got that right up
here but not down here. Oh, OK. I really think we’ve
got it this time. Tympani sound. [DRUM ROLL] Ah, yes. [DRUM ROLL] [MUSIC PLAYING] Ha, ha, tied. All right. All right, you. We’re going to go with
the competition now. I’m going to let you in by going here. Ah. So if I go in a corner– if I don’t
go to the middle, it’s always going to win. So X is always going
to win, unless I go to the middle, in which case
there’s always going to be a tie unless I make a mistake. Then, it’s going to win. But if I go to any
other spot, X is going to be able to win, because it’s always going to– [LAUGHS] OK, this works. Hey, man. So that’s the Minimax algorithm. [RING]
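For reference, here’s roughly what the whole thing looks like once those fixes are in– a sketch with the two branches folded together using the isMaximizing flag, which is one way to do the refactor mentioned earlier; the version on screen keeps the maximizing and minimizing blocks separate:

    // a sketch of the finished function, assuming ai = 'X', human = 'O',
    // a checkWinner() that returns 'X', 'O', 'tie', or null,
    // and scores = { X: 1, O: -1, tie: 0 }
    function minimax(board, depth, isMaximizing) {
      let result = checkWinner();
      if (result !== null) {
        return scores[result];                     // terminal state
      }
      let bestScore = isMaximizing ? -Infinity : Infinity;
      for (let i = 0; i < 3; i++) {
        for (let j = 0; j < 3; j++) {
          if (board[i][j] == '') {
            board[i][j] = isMaximizing ? ai : human;              // try the move
            let score = minimax(board, depth + 1, !isMaximizing); // recurse
            board[i][j] = '';                                     // undo it
            bestScore = isMaximizing
              ? max(score, bestScore)
              : min(score, bestScore);
          }
        }
      }
      return bestScore; // only after both for loops have finished
    }

And in bestMove, the call becomes minimax(board, 0, false), since the AI has just placed its X and the next turn belongs to the minimizing player.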
So one thing that might be interesting for me to try here, really quickly, before I sort of finish off this challenge would just be to comment out the AI going first. So I know this is
technically correct because X is supposed to go first. But let’s see what happens
if I start with a fresh board and I go first. So I’m going to go here. I’m going to go here. I’m going to go here. Oh. I’m going to go here. Oh, that’s high. Ooh, and something
became undefined. Oh, if I go last, I guess
there’s no move for X. And I’ve got an error. So that’s [INAUDIBLE]. Can I beat– No, I lost. I don’t believe there’s any way
for me to beat the AI player. It’s a solved game
because it’s always going to go to the center. So I could try going here. I could go here, go here. The best I could do is a tie. So you’ve watched
this coding challenge. Maybe you have a
creative idea about how to make your own
version of this. But, thanks to the chat in the livestream, I also just quickly whipped up some suggestions of things you could try. So first of all, what
happens if you just make two AI players that
just play against each other over and over again? Well, in that case, it’s going
to be a tie every single time. It’s a solved game. But that might lead to some
interesting possibilities. One thing that I’m not doing is using the depth in the score. So, for example, if I go back to the code and change these to, say, 10 and negative 10, I could account for the depth, meaning I could have a higher score if I win more quickly than if it takes longer to win. Right now, I’m not accounting for that. So where in the code would you subtract out the depth? That’s something you could try.
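One possible sketch of that: bump the scores up to, say, { X: 10, O: -10, tie: 0 } and fold the depth into the terminal score, so that faster wins are worth more.

    // a possible sketch, assuming scores = { X: 10, O: -10, tie: 0 };
    // use this in place of `return scores[result]` inside minimax
    function terminalScore(result, depth) {
      let score = scores[result];
      if (score > 0) return score - depth; // winning sooner scores higher
      if (score < 0) return score + depth; // losing later is less bad
      return score;                        // a tie stays at 0
    }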
Certainly it would be interesting to make a larger board. So can you play Tic-Tac-Toe
in a 5 by 5 or a 7 by 7? The winning conditions
would change. The optimal play would change. That would be exciting to try. I hope somebody does that. You could try more players. Like, how would you
have a Tic-Tac-Toe game with three players? That’s something you could try with a larger board. You could try another game– maybe Connect Four would work. You might be able to apply Minimax to it, too. Just thinking about the
interface, animation. Like, right now whenever
I go, the next turn happens immediately. You could think about timing
and that sort of thing. But then, I would say, high degree of difficulty– and maybe worth me coming back to in a future video at some point– would be these two last items. One is known as
alpha-beta pruning. Alpha-beta pruning refers to an aspect of the Minimax algorithm where, once you’ve found an optimal path and you know that all the other possibilities are going to be worse, you don’t need to go on and check every possibility. So you could research how that works and add it to this algorithm. It’s explained in the Sebastian Lague video.
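The shape of that change is roughly this– a sketch, not a tested implementation: pass alpha and beta down through the recursion and stop checking moves once beta is less than or equal to alpha.

    // a rough sketch of minimax with alpha-beta pruning added;
    // bestMove would call it as minimax(board, 0, -Infinity, Infinity, false)
    function minimax(board, depth, alpha, beta, isMaximizing) {
      let result = checkWinner();
      if (result !== null) return scores[result];
      let bestScore = isMaximizing ? -Infinity : Infinity;
      outer:
      for (let i = 0; i < 3; i++) {
        for (let j = 0; j < 3; j++) {
          if (board[i][j] != '') continue;
          board[i][j] = isMaximizing ? ai : human;
          let score = minimax(board, depth + 1, alpha, beta, !isMaximizing);
          board[i][j] = '';
          if (isMaximizing) {
            bestScore = max(score, bestScore);
            alpha = max(alpha, bestScore);
          } else {
            bestScore = min(score, bestScore);
            beta = min(beta, bestScore);
          }
          if (beta <= alpha) break outer; // the remaining moves can't matter
        }
      }
      return bestScore;
    }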
Then there’s the possibility of a game where the number of moves is so vast you couldn’t possibly
compute the entire tree. Chess is the classic
example of that. So you need some heuristic, a way of making an estimated guess of what the score could be for any given state. So one way of doing this,
which I’m not saying is a good strategy. But a simple thing you
could do with chess is you could add up the
total number of white pieces versus black pieces. And you could say,
like, well, the score is higher if the
maximizing player has more pieces than the opponent. So that’s one way
of approaching it. So you don’t actually stop at the end of the tree. You just go to some predetermined depth, and then calculate an estimate and have that ripple back up the tree.
So maybe give that a try with a more complex game. Certainly, other types of AI
algorithms and neural networks could play a role at
some point with how you make that estimate. But ultimately,
that is something that would be interesting
to try as well. One more idea that just
came from the chat. So this is quite similar to
the idea of a larger board. But instead of just a larger
two-dimensional board, you could create a
three-dimensional board. So Tic-Tac-Toe, that
happens in a cube. I should just do that as a
coding challenge separately myself. So hopefully, you’ve
learned something. And maybe this algorithm looked
a little terrifying to you at one point, now seems
quite accessible and doable. Maybe your creative idea
is not on this list. Please make your
own version of this. Share it with me at
the codingtrain.com. There’ll be a link in
this video’s description with instructions
for how to share it. If you don’t know how to share
it, just write a comment, file an issue. We, the Coding Train community,
are here to help you. [WHISTLE] And I can’t wait
to see you in a future video. Goodbye. [MUSIC PLAYING]

31 thoughts to “Coding Challenge 154: Tic Tac Toe AI with Minimax Algorithm”

  1. I know you briefly covered this but if you intentionally chose a sub-optimal move (for one or two moves), would this hinder the algorithm more than it would your own chances?

  2. This reminds me of one time that i made Tic Tac Toe in Flash using AS2. I wanted to create unbeatable AI so i just hardcoded every possible move combination. It was about 4K lines of if statements.

  3. I’ve been wanting to do this for Chess for a long time with minmax so this was exciting to see in subscriptions tab. My goal: make a chess game that can beat me (in Python). The first i,j loop you wrote which calls minmax seems redundant because you can just call minmax with the opposite isMaximizing value I think.

  4. Hi Dani, i would like to ask you a quick question. How do you usually find an algorithm to solve a problem like in this video? Do you just research in the google about a particular problem or need a deep research? Thank you Dani

  5. Hey Dan…
    A thought just hit my mind and before asking on stackoverflow … I am going to put it here..

    Is it possible to use evolutionary algorithms to break a hash? *

    If yes, can you please give it a try…

    *If I sound silly then you are absolutely right because I don't have much knowledge about hashing 😅 and mathematical stuff behind it.

  6. great video! one tip: not copying the entire matrix worked in this case, but won't work in games when tokens move or change like in chess or reversi

  7. I think, as you say, this algorithm could well do with some optimisation Dan. I tried increasing to a 4×4 and the code wouldn't run. Eventually I put a counter on the original 3x example and got the minimax() function being recursively re-entered this many times just on setup:-
    Minimax entered: 59705
    Minimax entered: 63905
    Minimax entered: 59705
    Minimax entered: 63905
    Minimax entered: 55505
    Minimax entered: 63905
    Minimax entered: 59705
    Minimax entered: 63905
    Minimax entered: 59705
    I think anything larger than 3×3 just blows the CPU.

  8. Dan talking to the ai is the most adorable thing I've seen on the internet for a while 🤣
    19:14 "No, no, Bad X! Bad X! 😡"
    19:28 "Oh Woah! You're not going in order, you're making weird decisions"
    19:56 "That's a good move X, I see what you did there 😏"
