player

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


def player(board):
    """
    Returns player who has the next turn on a board.
    """
    x_count = sum(row.count(X) for row in board)
    o_count = sum(row.count(O) for row in board)
    if x_count == o_count:
        return X
    else:
        return O

这里通过遍历 board，统计 X 与 O 的数量来确定当前的回合的 player 是谁

copilot 在审查我的代码的时候给出了比较巧妙的统计方法，我其实对 python 的一些类的方法不是很熟悉，因此自己写的会比较笨

actions

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


def actions(board):
    """
    Returns set of all possible actions (i, j) available on the board.
    """
    actions = set()
    for i in range(3):
        for j in range(3):
            if board[i][j] is EMPTY:
                actions.add((i,j))
    return actions

这里依旧是遍历，然后把空的位置作为 action 给添加

按照要求，这里的 actions 应该是 set 类型

result

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


def result(board, action):
    """
    Returns the board that results from making move (i, j) on the board.
    """
    if not isinstance(action, tuple) or len(action) != 2:
        raise ValueError("Invalid action")
    
    i,j = action
    
    if i < 0 or i > 2 or j < 0 or j > 2:
        raise ValueError("Invalid action")
        
    if board[i][j] is not EMPTY:
        raise ValueError("Invalid action: cell already occupied")
    
    new_board = copy.deepcopy(board)
    symbol = player(board)
    new_board[i][j] = symbol
    return new_board

这里是通过当前的 board 和 action，来返回 action 后的结果

按照文档的要求，需要对异常情况进行处理，但我一开始忽视了

同时，这里需要用到深度拷贝，防止更改原本的 board（需要导入 copy）

winner

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


def winner(board):
    """
    Returns the winner of the game, if there is one.
    """
    symbol = board[0][0]
    if symbol is not EMPTY and symbol == board[1][1] and symbol == board[2][2]:
        return symbol
    symbol = board[0][2]
    if symbol is not EMPTY and symbol == board[1][1] and symbol == board[2][0]:
        return symbol
    for i in range(3):
        row_symbol = board[i][0]
        col_symbol = board[0][i]
        if row_symbol is not EMPTY and row_symbol == board[i][1] and row_symbol == board[i][2]:
            return row_symbol
        if col_symbol is not EMPTY and col_symbol == board[1][i] and col_symbol == board[2][i]:
            return col_symbol
    return None

这里需要检查是否获胜，也就是某个方向能三个连起来

在检查的时候需要注意，在赋值给 symbol ，比较之前，需要查看是否为 empty，不然的话三个 empty 连着也会 return

terminal

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


def terminal(board):
    """
    Returns True if game is over, False otherwise.
    """
    if winner(board) is not None:
        return True
    else:
        for i in range(3):
            for j in range(3):
                if board[i][j] is EMPTY:
                    return False
        return True

这里是检查是否结束

有赢家的话，就结束

没赢家，且没空位，结束

没有加，有空位，没结束

utility

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


def utility(board):
    """
    Returns 1 if X has won the game, -1 if O has won, 0 otherwise.
    """
    result = winner(board)
    if result is X:
        return 1
    elif result is O:
        return -1
    else:
        return 0

给分环节

minimax

这里是最重要的部分，也是最难的部分，需要实现 minimax

这里我根据课上给的伪代码，实现了min value 和 max value 的函数，用于作为辅助函数帮着递归接下来的情况

对于 minimax 函数，需要根据 player 来分情况讨论

X 是追求 max，O 最求 min

这里由于要求是返回 action，而不是 value，因此我们在 minimax 中不能直接套用 min/max value

而是需要遍历 actions，在 value 改变时，同时改变 action 的值，实现记录，以便最终返回

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43


def minimax(board):
    """
    Returns the optimal action for the current player on the board.
    """
    current_player = player(board)
    possible_actions = actions(board)
    best_action = None
    
    if terminal(board):
        return None
    if current_player is X:
        best_value = float('-inf')
        for action in actions(board):
            value = min_value(result(board,action))
            if value > best_value:
                best_value = value
                best_action = action
    
    if current_player is O:
        best_value = float('inf')
        for action in actions(board):
            value = min_value(result(board,action))
            if value < best_value:
                best_value = value
                best_action = action
    
    return best_action

def max_value(board):
    v =  float('-inf')
    if terminal(board):
        return utility(board)
    for action in actions(board):
        v = max(v, min_value(result(board,action)))
    return v

def min_value(board):
    v = float('inf')
    if terminal(board):
        return utility(board)
    for action in actions(board):
        v = min(v, max_value(result(board,action)))
    return v