Python Connect4 AI with MTD(f) algorithm

Connect_Four

My favorite three games to implement when learning a new programming are: TicTacToe, Connect Four, and Minesweeper. Last month, I wrote a post about implementing TicTacToe in Python. This time I implemented “Connect Four” (also known as “Four in a Row” and “Four in a Line”). When implementing some AI for TicTacToe I chose the basic “minimax” algorithm without any optimization. This time I went a little further and improved the minimax algorithm by adding a “horizon”, “alpha-beta pruning”, a “null-window”, “iterative deepening”, and a “transposition table” turning the minimax into a “Memory-enhanced Test Driver” also known as: MTD(f).

MTD(f) = minimax + horizon + alpha-beta pruning + null-window + iterative deepening + transposition table

MTD(f), is a higly optimized minimax algorithm. It was developed in 1994 by Aske Plaat, Jonathan Schaeffer, Wim Pijls, and Arie de Bruin.  The name MTD(f) is an abbreviation for MTD(n,f) (Memory-enhanced Test Driver with node n and value f). It is a variant of minimax and an alternative to the alpha-beta pruning algorithm.

Minimax

The minimax theorem states (source: Wikipedia)

For every two-person, zero-sum game with finitely many strategies, there exists a value V and a mixed strategy for each player, such that:

(a) Given player 2’s strategy, the best payoff possible for player 1 is V, and
(b) Given player 1’s strategy, the best payoff possible for player 2 is −V.

300px-Minimax.svg

A game with just two possible moves on each turn will have a search tree as shown in the above picture. The circles represent possible moves by the current player (max), while the squares represent the opponents moves (min).

Horizon

One could limit the tree to a look-ahead of four moves to limit the time the AI is “thinking”. This is done in the tree above. This is called the “horizon effect”, since the algorithm will not look any further. On the 4th level a “heuristic” function will be used to score the nodes. The better the heuristic function, the better the AI will play. Values for won and lost games are typically +infinity and -infinity, but in MTD(f) they should probably be a valid integer, like +1000 and -1000.

Alpha-beta pruning

Using alpha-beta pruning one can reduce that amount of nodes that have to be visited by the minimax algorithm by keep track of the best and worst values. If it finds a move to be worse than a previously visited move it will not evaluate that move any further. Effectively the algorithm “prunes” away the branches of the search tree that do not influence the final decision. Below is an example of a search tree in which the moves are evaluated from left to right. The red marks are where the branches are pruned away, so that the gray branches are not evaluated.

400px-AB_pruning.svg

Null window

The concept of a “null window” is also known as a “zero”, “scout” or “narrow” window. This is a way to reduce the search space by performing a series of boolean tests, thereby searching for a move that produces a worse or better score. With integers a null-window is defined as alpha == beta – 1. Note that a null window search does not return an exact score, but either an upper bound with score < beta (score <= alpha) or a lower bound with score >= beta (score > alpha). It is kind of hard to explain and I believe that if you want to truly understand null window you should read the code.

Iterative deepening

Iterative deepening means that we limit the search time to some fixed amount and then we evaluate the nodes in the search tree. We start evaluating with a horizon or look ahead of one move, then we look ahead for two moves, then three, etc. Whenever we hit the time limit we continue the search with the current look ahead value, but stop afterwards. This makes the AI “thinking” time more or less constant and the look ahead variable, but dependent on the (remaining) size of the tree. This way we avoid that the AI is very slow for the first moves (when the search tree is still large) and very fast at the end (when there are not many possible moves left).

Transposition table

A transposition table stores the outcome (upper and lower bound scores) of the evaluation of a board state in a hash table. This allows to reduce the search space if a single board state can be reached with different sequences of moves. The transposition table also stores the mirrored board state, since these are supposed to be equally scored. This may greatly reduce the size of the search tree in the early stage of the game.

Connect Four Python source

connect4

          # coding=UTF8
          from Tkinter import Tk, Button, Frame, Canvas
          from tkFont import Font
          from copy import deepcopy
          from time import time

          class Board:

            nodes = {}

            def __init__(self,other=None):
              self.player = 'X'
              self.opponent = 'O'
              self.empty = '.'
              self.width = 7
              self.height = 6
              self.fields = {}
              for y in range(self.height):
                for x in range(self.width):
                  self.fields[x,y] = self.empty
              # copy constructor
              if other:
                self.__dict__ = deepcopy(other.__dict__)

            def move(self,x):
              board = Board(self)
              for y in range(board.height):
                if board.fields[x,y] == board.empty:
                  board.fields[x,y] = board.player
                  break
              board.player,board.opponent = board.opponent,board.player
              return board

            def __heuristic(self):
              return self.__heuristic_score(self.player)-self.__heuristic_score(self.opponent)

            def __heuristic_score(self, player):
              lines = self.__winlines(player)
              winpositions = self.__winpositions(lines,player)
              score = 0
              for x in range(self.width):
                for y in range(self.height-1,0,-1):
                  win = winpositions.get("{0},{1}".format(x,y),False)
                  below = winpositions.get("{0},{1}".format(x,y-1),False)
                  if win and below:
                    score+=self.height-y*100
              for line in lines:
                pieces = 0
                height = []
                for x,y in line:
                  if self.fields[x,y] == player:
                    pieces = pieces + 1
                  elif self.fields[x,y] == self.empty:
                    height.append(y)
                heightscore = self.height - int(sum(height) / float(len(height)))
                score=score+pieces*heightscore
              return score

            def __winpositions(self, lines, player):
              lines = self.__winlines(player)
              winpositions = {}
              for line in lines:
                pieces = 0
                empty = None
                for x,y in line:
                  if self.fields[x,y] == player:
                    pieces = pieces + 1
                  elif self.fields[x,y] == self.empty:
                    if not empty == None:
                      break
                    empty = (x,y)
                if pieces==3:
                  winpositions["{0},{1}".format(x,y)]=True
              return winpositions

            def __winlines(self, player):
              lines = []
              # horizontal
              for y in range(self.height):
                winning = []
                for x in range(self.width):
                  if self.fields[x,y] == player or self.fields[x,y] == self.empty:
                    winning.append((x,y))
                    if len(winning) >= 4:
                      lines.append(winning[-4:])
                  else:
                    winning = []
              # vertical
              for x in range(self.width):
                winning = []
                for y in range(self.height):
                  if self.fields[x,y] == player or self.fields[x,y] == self.empty:
                    winning.append((x,y))
                    if len(winning) >= 4:
                      lines.append(winning[-4:])
                  else:
                    winning = []
              # diagonal
              winning = []
              for cx in range(self.width-1):
                sx,sy = max(cx-2,0),abs(min(cx-2,0))
                winning = []
                for cy in range(self.height):
                  x,y = sx+cy,sy+cy
                  if x<0 or y<0 or x>=self.width or y>=self.height:
                    continue
                  if self.fields[x,y] == player or self.fields[x,y] == self.empty:
                    winning.append((x,y))
                    if len(winning) >= 4:
                      lines.append(winning[-4:])
                  else:
                    winning = []
              # other diagonal
              winning = []
              for cx in range(self.width-1):
                sx,sy = self.width-1-max(cx-2,0),abs(min(cx-2,0))
                winning = []
                for cy in range(self.height):
                  x,y = sx-cy,sy+cy
                  if x<0 or y<0 or x>=self.width or y>=self.height:
                    continue
                  if self.fields[x,y] == player or self.fields[x,y] == self.empty:
                    winning.append((x,y))
                    if len(winning) >= 4:
                      lines.append(winning[-4:])
                  else:
                    winning = []
              # return
              return lines

            def __iterative_deepening(self,think):
              g = (3,None)
              start = time()
              for d in range(1,10):
                g = self.__mtdf(g, d)
                if time()-start>think:
                  break
              return g;

            def __mtdf(self, g, d):
              upperBound = +1000
              lowerBound = -1000
              best = g
              while lowerBound < upperBound:
                if g[0] == lowerBound:
                  beta = g[0]+1
                else:
                  beta = g[0]
                g = self.__minimax(True, d, beta-1, beta)
                if g[1]!=None:
                  best = g
                if g[0] < beta:
                  upperBound = g[0]
                else:
                  lowerBound = g[0]
              return best

            def __minimax(self, player, depth, alpha, beta):
              lower = Board.nodes.get(str(self)+str(depth)+'lower',None)
              upper = Board.nodes.get(str(self)+str(depth)+'upper',None)
              if lower != None:
                if lower >= beta:
                  return (lower,None)
                alpha = max(alpha,lower)
              if upper != None:
                if upper <= alpha:
                  return (upper,None)
                beta = max(beta,upper)
              if self.won():
                if player:
                  return (-999,None)
                else:
                  return (+999,None)
              elif self.tied():
                return (0,None)
              elif depth==0:
                return (self.__heuristic(),None)
              elif player:
                best = (alpha,None)
                for x in range(self.width):
                  if self.fields[x,self.height-1]==self.empty:
                    value = self.move(x).__minimax(not player,depth-1,best[0],beta)[0]
                    if value>best[0]:
                      best = value,x
                    if value>beta:
                      break
              else:
                best = (beta,None)
                for x in range(self.width):
                  if self.fields[x,self.height-1]==self.empty:
                    value = self.move(x).__minimax(not player,depth-1,alpha,best[0])[0]
                    if value<best[0]:
                      best = value,x
                    if alpha>value:
                      break
              if best[0] <= alpha:
                Board.nodes[str(self)+str(depth)+"upper"] = best[0]
                Board.nodes[self.__mirror()+str(depth)+"upper"] = best[0]
              elif best[0] >= beta:
                Board.nodes[str(self)+str(depth)+"lower"] = best[0]
                Board.nodes[self.__mirror()+str(depth)+"lower"] = best[0]
              return best

            def best(self):
              return self.__iterative_deepening(2)[1]

            def tied(self):
              for (x,y) in self.fields:
                if self.fields[x,y]==self.empty:
                  return False
              return True

            def won(self):
              # horizontal
              for y in range(self.height):
                winning = []
                for x in range(self.width):
                  if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                    if len(winning) == 4:
                      return winning
                  else:
                    winning = []
              # vertical
              for x in range(self.width):
                winning = []
                for y in range(self.height):
                  if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                    if len(winning) == 4:
                      return winning
                  else:
                    winning = []
              # diagonal
              winning = []
              for cx in range(self.width-1):
                sx,sy = max(cx-2,0),abs(min(cx-2,0))
                winning = []
                for cy in range(self.height):
                  x,y = sx+cy,sy+cy
                  if x<0 or y<0 or x>=self.width or y>=self.height:
                    continue
                  if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                    if len(winning) == 4:
                      return winning
                  else:
                    winning = []
              # other diagonal
              winning = []
              for cx in range(self.width-1):
                sx,sy = self.width-1-max(cx-2,0),abs(min(cx-2,0))
                winning = []
                for cy in range(self.height):
                  x,y = sx-cy,sy+cy
                  if x<0 or y<0 or x>=self.width or y>=self.height:
                    continue
                  if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                    if len(winning) == 4:
                      return winning
                  else:
                    winning = []
              # default
              return None

            def __mirror(self):
              string = ''
              for y in range(self.height):
                for x in range(self.width):
                  string+=' '+self.fields[self.width-1-x,self.height-1-y]
                string+="\n"
              return string

            def __str__(self):
              string = ''
              for y in range(self.height):
                for x in range(self.width):
                  string+=' '+self.fields[x,self.height-1-y]
                string+="\n"
              return string

          class GUI:

            def __init__(self):
              self.app = Tk()
              self.app.title('Connect4')
              self.app.resizable(width=False, height=False)
              self.board = Board()
              self.buttons = {}
              self.frame = Frame(self.app, borderwidth=1, relief="raised")
              self.tiles = {}
              for x in range(self.board.width):
                handler = lambda x=x: self.move(x)
                button = Button(self.app, command=handler, font=Font(family="Helvetica", size=14), text=x+1)
                button.grid(row=0, column=x, sticky="WE")
                self.buttons[x] = button
              self.frame.grid(row=1, column=0, columnspan=self.board.width)
              for x,y in self.board.fields:
                tile = Canvas(self.frame, width=60, height=50, bg="navy", highlightthickness=0)
                tile.grid(row=self.board.height-1-y, column=x)
                self.tiles[x,y] = tile
              handler = lambda: self.reset()
              self.restart = Button(self.app, command=handler, text='reset')
              self.restart.grid(row=2, column=0, columnspan=self.board.width, sticky="WE")
              self.update()

            def reset(self):
              self.board = Board()
              self.update()

            def move(self,x):
              self.app.config(cursor="watch")
              self.app.update()
              self.board = self.board.move(x)
              self.update()
              move = self.board.best()
              if move!=None:
                self.board = self.board.move(move)
                self.update()
              self.app.config(cursor="")

            def update(self):
              for (x,y) in self.board.fields:
                text = self.board.fields[x,y]
                if (text=='.'):
                  self.tiles[x,y].create_oval(10, 5, 50, 45, fill="black", outline="blue", width=1)
                if (text=='X'):
                  self.tiles[x,y].create_oval(10, 5, 50, 45, fill="yellow", outline="blue", width=1)
                if (text=='O'):
                  self.tiles[x,y].create_oval(10, 5, 50, 45, fill="red", outline="blue", width=1)
              for x in range(self.board.width):
                if self.board.fields[x,self.board.height-1]==self.board.empty:
                  self.buttons[x]['state'] = 'normal'
                else:
                  self.buttons[x]['state'] = 'disabled'
              winning = self.board.won()
              if winning:
                for x,y in winning:
                  self.tiles[x,y].create_oval(25, 20, 35, 30, fill="black")
                for x in range(self.board.width):
                  self.buttons[x]['state'] = 'disabled'

            def mainloop(self):
              self.app.mainloop()

          if __name__ == '__main__':
            GUI().mainloop()

Links

http://en.wikipedia.org/wiki/MTD-f

http://chessprogramming.wikispaces.com/MTD%28f%29

http://people.csail.mit.edu/plaat/mtdf.html

Share

Python TicTacToe with Tk and minimax AI

tictactoe

Whenever I’m learning a new language I always implement some little games. My favorite 3 games to are:

  • TicTacToe
  • Connect4
  • Minesweeper

These are easy to implement since they are easy point-and-click games that do not require complex graphics or animation. They will require some multi-dimensional arrays fiddling and basic logic to implement. The GUI needed is pretty straight-forward and can be build using a window and a grid of buttons. Any window toolkit will provide these. If you like the idea of programming a game to learn a language, but you don’t like these specific games, check out “49 ideas for game clones to code“. Note that learning Python by programming small computer games is also suggested on “inventwithpython.com” (free e-book).

If you are not attracted to game programming you might want to check out “Python koans“.

Koan: a paradoxical anecdote or riddle without a solution, used in Zen Buddhism to demonstrate the inadequacy of logical reasoning and provoke enlightenment.

The Python Koans is a set of exercises that will make you learn Python. It is inspired by the “Ruby Koans“, which are also very good. You will run the tests and see them fail (red) and by changing the code you can make the tests pass (green). This is just as the Test Driven Development (TDD) method prescribes. So the Koans make you learn programming and TDD at the same time. Awesome!

TicTacToe in Python with “minimax” AI

This is my code for TicTacToe. It uses Tk since this is included in the Windows and MacOSX distributions of Python. In Linux “python-tk” is easy to install using the package manager. The game implements the “minimax” algorithm for Artificial Intelligence (AI). It has no heuristic (depth parameter and horizon effect) so it plays at infinite strength. With this game that is possible, because of it limited set of possible states.

          # coding=UTF8
          from Tkinter import Tk, Button
          from tkFont import Font
          from copy import deepcopy

          class Board:

            def __init__(self,other=None):
              self.player = 'X'
              self.opponent = 'O'
              self.empty = '.'
              self.size = 3
              self.fields = {}
              for y in range(self.size):
                for x in range(self.size):
                  self.fields[x,y] = self.empty
              # copy constructor
              if other:
                self.__dict__ = deepcopy(other.__dict__)

            def move(self,x,y):
              board = Board(self)
              board.fields[x,y] = board.player
              (board.player,board.opponent) = (board.opponent,board.player)
              return board

            def __minimax(self, player):
              if self.won():
                if player:
                  return (-1,None)
                else:
                  return (+1,None)
              elif self.tied():
                return (0,None)
              elif player:
                best = (-2,None)
                for x,y in self.fields:
                  if self.fields[x,y]==self.empty:
                    value = self.move(x,y).__minimax(not player)[0]
                    if value>best[0]:
                      best = (value,(x,y))
                return best
              else:
                best = (+2,None)
                for x,y in self.fields:
                  if self.fields[x,y]==self.empty:
                    value = self.move(x,y).__minimax(not player)[0]
                    if value<best[0]:
                      best = (value,(x,y))
                return best

            def best(self):
              return self.__minimax(True)[1]

            def tied(self):
              for (x,y) in self.fields:
                if self.fields[x,y]==self.empty:
                  return False
              return True

            def won(self):
              # horizontal
              for y in range(self.size):
                winning = []
                for x in range(self.size):
                  if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                if len(winning) == self.size:
                  return winning
              # vertical
              for x in range(self.size):
                winning = []
                for y in range(self.size):
                  if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                if len(winning) == self.size:
                  return winning
              # diagonal
              winning = []
              for y in range(self.size):
                x = y
                if self.fields[x,y] == self.opponent:
                  winning.append((x,y))
              if len(winning) == self.size:
                return winning
              # other diagonal
              winning = []
              for y in range(self.size):
                x = self.size-1-y
                if self.fields[x,y] == self.opponent:
                  winning.append((x,y))
              if len(winning) == self.size:
                return winning
              # default
              return None

            def __str__(self):
              string = ''
              for y in range(self.size):
                for x in range(self.size):
                  string+=self.fields[x,y]
                string+="\n"
              return string

          class GUI:

            def __init__(self):
              self.app = Tk()
              self.app.title('TicTacToe')
              self.app.resizable(width=False, height=False)
              self.board = Board()
              self.font = Font(family="Helvetica", size=32)
              self.buttons = {}
              for x,y in self.board.fields:
                handler = lambda x=x,y=y: self.move(x,y)
                button = Button(self.app, command=handler, font=self.font, width=2, height=1)
                button.grid(row=y, column=x)
                self.buttons[x,y] = button
              handler = lambda: self.reset()
              button = Button(self.app, text='reset', command=handler)
              button.grid(row=self.board.size+1, column=0, columnspan=self.board.size, sticky="WE")
              self.update()

            def reset(self):
              self.board = Board()
              self.update()

            def move(self,x,y):
              self.app.config(cursor="watch")
              self.app.update()
              self.board = self.board.move(x,y)
              self.update()
              move = self.board.best()
              if move:
                self.board = self.board.move(*move)
                self.update()
              self.app.config(cursor="")

            def update(self):
              for (x,y) in self.board.fields:
                text = self.board.fields[x,y]
                self.buttons[x,y]['text'] = text
                self.buttons[x,y]['disabledforeground'] = 'black'
                if text==self.board.empty:
                  self.buttons[x,y]['state'] = 'normal'
                else:
                  self.buttons[x,y]['state'] = 'disabled'
              winning = self.board.won()
              if winning:
                for x,y in winning:
                  self.buttons[x,y]['disabledforeground'] = 'red'
                for x,y in self.buttons:
                  self.buttons[x,y]['state'] = 'disabled'
              for (x,y) in self.board.fields:
                self.buttons[x,y].update()

            def mainloop(self):
              self.app.mainloop()

          if __name__ == '__main__':
            GUI().mainloop()

Copy the above code into a file and save that as “tictactoe.py”. Then execute it using “python tictactoe.py”. Enjoy!

Python3 compatibility and dependencies

The code is fully Python3 compatible, you only need to change the names of the external modules, so that:

from Tkinter import Tk, Button
from tkFont import Font

Becomes:

from tkinter import Tk, Button
from tkinter.font import Font

The packages you need to install on Ubuntu Linux for Python3 may be called “python3” and “python3-tk”. In order to run the code in Python3 you need to use the “python3” command instead of the normal “python” one.

Share