## Python Connect4 AI with MTD(f) algorithm

My three favorite games to implement when learning a new programming language are TicTacToe, Connect Four, and Minesweeper. Last month I wrote a post about implementing TicTacToe in Python. This time I implemented “Connect Four” (also known as “Four in a Row” and “Four in a Line”). For the TicTacToe AI I chose the basic “minimax” algorithm without any optimization. This time I went a little further and improved minimax by adding a “horizon”, “alpha-beta pruning”, a “null window”, “iterative deepening”, and a “transposition table”, turning minimax into a “Memory-enhanced Test Driver”, also known as MTD(f).

MTD(f) = minimax + horizon + alpha-beta pruning + null-window + iterative deepening + transposition table

MTD(f) is a highly optimized minimax algorithm. It was developed in 1994 by Aske Plaat, Jonathan Schaeffer, Wim Pijls, and Arie de Bruin. The name MTD(f) is an abbreviation of MTD(n,f) (Memory-enhanced Test Driver with node n and value f). It is a variant of minimax and an alternative to the alpha-beta pruning algorithm.
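The driver itself is surprisingly small. Below is a minimal, hypothetical sketch: `search` stands in for a fail-soft alpha-beta search with a transposition table (here it simply reports a fixed true value of 42 so the sketch is runnable), and the ±1000 bounds are the finite win/loss scores used throughout this post.

```python
def mtdf(search, first_guess):
    # Converge on the minimax value with a sequence of null-window searches.
    g = first_guess
    upper, lower = +1000, -1000
    while lower < upper:
        beta = g + 1 if g == lower else g
        g = search(beta - 1, beta)      # null window: alpha == beta - 1
        if g < beta:
            upper = g                   # search failed low: g is an upper bound
        else:
            lower = g                   # search failed high: g is a lower bound
    return g

# Stand-in for a fail-soft alpha-beta search of a position whose true
# minimax value is 42; a real search would return a bound instead when
# the value falls inside the window.
def search(alpha, beta):
    return 42
```

Each pass tightens either the upper or the lower bound; when they meet, the minimax value is known. The closer `first_guess` is to the real value, the fewer passes are needed.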

## Minimax

The minimax theorem states (source: Wikipedia):

For every two-person, zero-sum game with finitely many strategies, there exists a value V and a mixed strategy for each player, such that:

(a) Given player 2’s strategy, the best payoff possible for player 1 is V, and
(b) Given player 1’s strategy, the best payoff possible for player 2 is −V.

A game with just two possible moves on each turn will have a search tree as shown in the above picture. The circles represent possible moves by the current player (max), while the squares represent the opponent’s moves (min).
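A tree like the one in the picture can be evaluated with a few lines of Python. This is a minimal sketch; the nested lists and their leaf scores are made up for illustration:

```python
def minimax(node, maximizing):
    # Leaves carry a score for the maximizing player; inner nodes are move lists.
    if isinstance(node, int):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# A tree with two possible moves on each turn, three plies deep.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
value = minimax(tree, True)  # → 5
```

The maximizing player picks the child with the highest value, the minimizing player the one with the lowest, and the scores bubble up from the leaves to the root.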

## Horizon

One could limit the tree to a look-ahead of four moves to limit the time the AI spends “thinking”, as is done in the tree above. This cut-off is called the “horizon”, since the algorithm will not look any further; its blindness to anything beyond the cut-off is known as the “horizon effect”. On the 4th level a “heuristic” function is used to score the nodes. The better the heuristic function, the better the AI will play. Values for won and lost games are typically +infinity and -infinity, but in MTD(f) they should be valid integers, like +1000 and -1000.
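To see the horizon and the integer win/loss scores in isolation, here is a hypothetical toy game (not Connect Four): players alternately take 1 or 2 stones from a pile, and whoever takes the last stone wins. The depth limit, the ±1000 win/loss scores, and the crude ±500 heuristic are all illustrative:

```python
def minimax(pile, maximizing, depth):
    if pile == 0:
        # No stones left: the previous player took the last one and won.
        return -1000 if maximizing else +1000
    if depth == 0:
        return heuristic(pile, maximizing)  # horizon reached: estimate instead
    scores = [minimax(pile - take, not maximizing, depth - 1)
              for take in (1, 2) if take <= pile]
    return max(scores) if maximizing else min(scores)

def heuristic(pile, maximizing):
    # Crude estimate: a pile that is a multiple of 3 is lost for the side to move.
    score = -500 if pile % 3 == 0 else +500
    return score if maximizing else -score
```

With a deep enough search the exact ±1000 result is found; once the horizon is hit, the ±500 estimate takes over, and a better heuristic directly means better play.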

## Alpha-beta pruning

Using alpha-beta pruning one can reduce the number of nodes that have to be visited by the minimax algorithm, by keeping track of the best and worst values found so far. If a move turns out to be worse than a previously visited move, it will not be evaluated any further. Effectively the algorithm “prunes” away the branches of the search tree that cannot influence the final decision. Below is an example of a search tree in which the moves are evaluated from left to right. The red marks are where the branches are pruned away, so that the gray branches are not evaluated.

## Null window

The concept of a “null window” is also known as a “zero”, “scout” or “narrow” window. It reduces the search space by performing a series of boolean tests, each checking whether a move produces a worse or a better score than some bound. With integers a null window is defined as alpha == beta − 1. Note that a null-window search does not return an exact score, but either an upper bound when score < beta (score <= alpha) or a lower bound when score >= beta (score > alpha). It is hard to explain in words; if you want to truly understand the null window, read the code.
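The boolean test is easier to see in code. A minimal sketch, assuming a fail-hard alpha-beta over a toy tree of nested lists (the names and the tree are made up for illustration):

```python
def alphabeta(node, maximizing, alpha, beta):
    # Fail-hard alpha-beta over nested lists with integer leaves.
    if isinstance(node, int):
        return max(alpha, min(beta, node))  # clamp into the window (fail-hard)
    for child in node:
        score = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            alpha = max(alpha, score)
        else:
            beta = min(beta, score)
        if alpha >= beta:
            break  # prune the remaining children
    return alpha if maximizing else beta

def at_least(node, value):
    # Null window around `value`: alpha == beta - 1, so the search can only
    # fail low (score <= value - 1) or fail high (score >= value).
    return alphabeta(node, True, value - 1, value) >= value

tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]  # true minimax value: 5
```

`at_least(tree, 5)` answers only the yes/no question “is the score at least 5?”; because the window has zero width, far more branches are pruned than in a full-window search.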

## Iterative deepening

Iterative deepening means that we limit the total search time to some fixed amount and evaluate the search tree in rounds: first with a horizon (look-ahead) of one move, then two moves, then three, and so on. Whenever we hit the time limit we finish the search at the current look-ahead, but stop afterwards. This makes the AI’s “thinking” time more or less constant and the look-ahead variable, dependent on the (remaining) size of the tree. This way we avoid the AI being very slow during the first moves (when the search tree is still large) and very fast at the end (when there are not many possible moves left).
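A sketch of the deepening loop, with a hypothetical `search` function standing in for a fixed-depth minimax (here it merely burns time proportional to the depth, so the sketch is runnable):

```python
from time import time

def search(depth):
    # Stand-in for a fixed-depth search: busy-work that grows with depth.
    for _ in range(100000 * depth):
        pass
    return depth  # a real search would return (score, move)

def iterative_deepening(budget, max_depth=10):
    # Deepen one ply at a time; always finish the current depth,
    # then stop once the time budget is spent.
    best = None
    start = time()
    for depth in range(1, max_depth):
        best = search(depth)
        if time() - start > budget:
            break
    return best

result = iterative_deepening(0.05)
```

In the full program below, `Board.__iterative_deepening` does the same with a think time of 2 seconds and a depth cap of 9 plies.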

## Transposition table

A transposition table stores the outcome (upper and lower bound scores) of the evaluation of a board state in a hash table. This allows the algorithm to reduce the search space when a single board state can be reached via different sequences of moves (transpositions). The table also stores the mirrored board state, since mirrored positions score equally. This can greatly reduce the size of the search tree in the early stage of the game.
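A minimal sketch of the mirror-aware table, using a single board row as a stand-in for a full Connect Four position (the representation and names are illustrative):

```python
def mirror(board):
    # A Connect Four position and its left-right mirror image score the same,
    # so both serve as keys for one table entry. Board: list of cell strings.
    return tuple(reversed(board))

table = {}

def store(board, lower, upper):
    # One evaluation pays for two cache entries: the position and its mirror.
    table[tuple(board)] = (lower, upper)
    table[mirror(board)] = (lower, upper)

def lookup(board):
    return table.get(tuple(board))

# Storing one position makes its mirror image a cache hit as well.
store(['X', 'O', '.', '.', '.', '.', '.'], lower=-3, upper=7)
```

The real table in the code below keys entries on the board string plus the remaining search depth, and stores upper and lower bounds separately.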

## Connect Four Python source

```
# coding=UTF8
from Tkinter import Tk, Button, Frame, Canvas
from tkFont import Font
from copy import deepcopy
from time import time

class Board:

    # transposition table, shared by all boards
    nodes = {}

    def __init__(self,other=None):
        self.player = 'X'
        self.opponent = 'O'
        self.empty = '.'
        self.width = 7
        self.height = 6
        self.fields = {}
        for y in range(self.height):
            for x in range(self.width):
                self.fields[x,y] = self.empty
        # copy constructor
        if other:
            self.__dict__ = deepcopy(other.__dict__)

    def move(self,x):
        # drop a piece in column x and return the resulting board
        board = Board(self)
        for y in range(board.height):
            if board.fields[x,y] == board.empty:
                board.fields[x,y] = board.player
                break
        board.player,board.opponent = board.opponent,board.player
        return board

    def __heuristic(self):
        return self.__heuristic_score(self.player)-self.__heuristic_score(self.opponent)

    def __heuristic_score(self, player):
        lines = self.__winlines(player)
        winpositions = self.__winpositions(lines,player)
        score = 0
        for x in range(self.width):
            for y in range(self.height-1,0,-1):
                win = winpositions.get("{0},{1}".format(x,y),False)
                below = winpositions.get("{0},{1}".format(x,y-1),False)
                # two winning positions stacked on top of each other are very strong
                if win and below:
                    score += (self.height-y)*100
        for line in lines:
            pieces = 0
            height = []
            for x,y in line:
                if self.fields[x,y] == player:
                    pieces = pieces + 1
                elif self.fields[x,y] == self.empty:
                    height.append(y)
            if height:
                heightscore = self.height - int(sum(height) / float(len(height)))
            else:
                heightscore = self.height
            score = score + pieces*heightscore
        return score

    def __winpositions(self, lines, player):
        # empty cells that would complete a line of three
        winpositions = {}
        for line in lines:
            pieces = 0
            empty = None
            for x,y in line:
                if self.fields[x,y] == player:
                    pieces = pieces + 1
                elif self.fields[x,y] == self.empty:
                    if not empty == None:
                        break
                    empty = (x,y)
            if pieces == 3 and empty:
                winpositions["{0},{1}".format(empty[0],empty[1])] = True
        return winpositions

    def __winlines(self, player):
        # all four-in-a-row lines still open for this player
        lines = []
        # horizontal
        for y in range(self.height):
            winning = []
            for x in range(self.width):
                if self.fields[x,y] == player or self.fields[x,y] == self.empty:
                    winning.append((x,y))
                    if len(winning) >= 4:
                        lines.append(winning[-4:])
                else:
                    winning = []
        # vertical
        for x in range(self.width):
            winning = []
            for y in range(self.height):
                if self.fields[x,y] == player or self.fields[x,y] == self.empty:
                    winning.append((x,y))
                    if len(winning) >= 4:
                        lines.append(winning[-4:])
                else:
                    winning = []
        # diagonal
        for cx in range(self.width-1):
            sx,sy = max(cx-2,0),abs(min(cx-2,0))
            winning = []
            for cy in range(self.height):
                x,y = sx+cy,sy+cy
                if x<0 or y<0 or x>=self.width or y>=self.height:
                    continue
                if self.fields[x,y] == player or self.fields[x,y] == self.empty:
                    winning.append((x,y))
                    if len(winning) >= 4:
                        lines.append(winning[-4:])
                else:
                    winning = []
        # other diagonal
        for cx in range(self.width-1):
            sx,sy = self.width-1-max(cx-2,0),abs(min(cx-2,0))
            winning = []
            for cy in range(self.height):
                x,y = sx-cy,sy+cy
                if x<0 or y<0 or x>=self.width or y>=self.height:
                    continue
                if self.fields[x,y] == player or self.fields[x,y] == self.empty:
                    winning.append((x,y))
                    if len(winning) >= 4:
                        lines.append(winning[-4:])
                else:
                    winning = []
        # return
        return lines

    def __iterative_deepening(self,think):
        g = (3,None)
        start = time()
        for d in range(1,10):
            g = self.__mtdf(g, d)
            if time()-start>think:
                break
        return g

    def __mtdf(self, g, d):
        upperBound = +1000
        lowerBound = -1000
        best = g
        while lowerBound < upperBound:
            if g[0] == lowerBound:
                beta = g[0]+1
            else:
                beta = g[0]
            # null-window search around beta
            g = self.__minimax(True, d, beta-1, beta)
            if g[1] != None:
                best = g
            if g[0] < beta:
                upperBound = g[0]
            else:
                lowerBound = g[0]
        return best

    def __minimax(self, player, depth, alpha, beta):
        # returns a (score, move) tuple; move is None when no move improves alpha
        lower = Board.nodes.get(str(self)+str(depth)+'lower',None)
        upper = Board.nodes.get(str(self)+str(depth)+'upper',None)
        if lower != None:
            if lower >= beta:
                return (lower,None)
            alpha = max(alpha,lower)
        if upper != None:
            if upper <= alpha:
                return (upper,None)
            beta = min(beta,upper)
        if self.won():
            if player:
                return (-999,None)
            else:
                return (+999,None)
        elif self.tied():
            return (0,None)
        elif depth==0:
            # horizon reached: score from the maximizing player's perspective
            if player:
                return (self.__heuristic(),None)
            else:
                return (-self.__heuristic(),None)
        elif player:
            best = (alpha,None)
            for x in range(self.width):
                if self.fields[x,self.height-1]==self.empty:
                    value = self.move(x).__minimax(not player,depth-1,best[0],beta)
                    if value[0]>best[0]:
                        best = (value[0],x)
                    if best[0]>=beta:
                        break
        else:
            best = (beta,None)
            for x in range(self.width):
                if self.fields[x,self.height-1]==self.empty:
                    value = self.move(x).__minimax(not player,depth-1,alpha,best[0])
                    if value[0]<best[0]:
                        best = (value[0],x)
                    if best[0]<=alpha:
                        break
        # store the bound for this position and its mirror image
        if best[0] <= alpha:
            Board.nodes[str(self)+str(depth)+"upper"] = best[0]
            Board.nodes[self.__mirror()+str(depth)+"upper"] = best[0]
        elif best[0] >= beta:
            Board.nodes[str(self)+str(depth)+"lower"] = best[0]
            Board.nodes[self.__mirror()+str(depth)+"lower"] = best[0]
        return best

    def best(self):
        return self.__iterative_deepening(2)[1]

    def tied(self):
        for (x,y) in self.fields:
            if self.fields[x,y]==self.empty:
                return False
        return True

    def won(self):
        # horizontal
        for y in range(self.height):
            winning = []
            for x in range(self.width):
                if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                    if len(winning) == 4:
                        return winning
                else:
                    winning = []
        # vertical
        for x in range(self.width):
            winning = []
            for y in range(self.height):
                if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                    if len(winning) == 4:
                        return winning
                else:
                    winning = []
        # diagonal
        for cx in range(self.width-1):
            sx,sy = max(cx-2,0),abs(min(cx-2,0))
            winning = []
            for cy in range(self.height):
                x,y = sx+cy,sy+cy
                if x<0 or y<0 or x>=self.width or y>=self.height:
                    continue
                if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                    if len(winning) == 4:
                        return winning
                else:
                    winning = []
        # other diagonal
        for cx in range(self.width-1):
            sx,sy = self.width-1-max(cx-2,0),abs(min(cx-2,0))
            winning = []
            for cy in range(self.height):
                x,y = sx-cy,sy+cy
                if x<0 or y<0 or x>=self.width or y>=self.height:
                    continue
                if self.fields[x,y] == self.opponent:
                    winning.append((x,y))
                    if len(winning) == 4:
                        return winning
                else:
                    winning = []
        # default
        return None

    def __mirror(self):
        string = ''
        for y in range(self.height):
            for x in range(self.width):
                string += ' '+self.fields[self.width-1-x,self.height-1-y]
            string += "\n"
        return string

    def __str__(self):
        string = ''
        for y in range(self.height):
            for x in range(self.width):
                string += ' '+self.fields[x,self.height-1-y]
            string += "\n"
        return string

class GUI:

    def __init__(self):
        self.app = Tk()
        self.app.title('Connect4')
        self.app.resizable(width=False, height=False)
        self.board = Board()
        self.buttons = {}
        self.frame = Frame(self.app, borderwidth=1, relief="raised")
        self.tiles = {}
        for x in range(self.board.width):
            handler = lambda x=x: self.move(x)
            button = Button(self.app, command=handler, font=Font(family="Helvetica", size=14), text=x+1)
            button.grid(row=0, column=x, sticky="WE")
            self.buttons[x] = button
        self.frame.grid(row=1, column=0, columnspan=self.board.width)
        for x,y in self.board.fields:
            tile = Canvas(self.frame, width=60, height=50, bg="navy", highlightthickness=0)
            tile.grid(row=self.board.height-1-y, column=x)
            self.tiles[x,y] = tile
        handler = lambda: self.reset()
        self.restart = Button(self.app, command=handler, text='reset')
        self.restart.grid(row=2, column=0, columnspan=self.board.width, sticky="WE")
        self.update()

    def reset(self):
        self.board = Board()
        self.update()

    def move(self,x):
        self.app.config(cursor="watch")
        self.app.update()
        self.board = self.board.move(x)
        self.update()
        move = self.board.best()
        if move != None:
            self.board = self.board.move(move)
            self.update()
        self.app.config(cursor="")

    def update(self):
        for (x,y) in self.board.fields:
            text = self.board.fields[x,y]
            if (text=='.'):
                self.tiles[x,y].create_oval(10, 5, 50, 45, fill="black", outline="blue", width=1)
            if (text=='X'):
                self.tiles[x,y].create_oval(10, 5, 50, 45, fill="yellow", outline="blue", width=1)
            if (text=='O'):
                self.tiles[x,y].create_oval(10, 5, 50, 45, fill="red", outline="blue", width=1)
        for x in range(self.board.width):
            if self.board.fields[x,self.board.height-1]==self.board.empty:
                self.buttons[x]['state'] = 'normal'
            else:
                self.buttons[x]['state'] = 'disabled'
        winning = self.board.won()
        if winning:
            for x,y in winning:
                self.tiles[x,y].create_oval(25, 20, 35, 30, fill="black")
            for x in range(self.board.width):
                self.buttons[x]['state'] = 'disabled'

    def mainloop(self):
        self.app.mainloop()

if __name__ == '__main__':
    GUI().mainloop()
```