API
This page documents our Xiangqi environment and other APIs that users may find useful.
XiangQiEnv
XiangQiEnv provides the core attributes and methods that every OpenAI Gym environment has.
-
class
gym_xiangqi.envs.xiangqi_env.XiangQiEnv(ally_color=0)
This is the Xiangqi (Chinese chess) game implemented as a reinforcement learning environment using the OpenAI Gym framework. Xiangqi is played on a board of 10 rows and 9 columns with 16 pieces on each side (7 unique pieces called General, Advisor, Elephant, Horse, Chariot, Cannon and Soldier).
Starting State: The initial board state, with all pieces laid out in their standard starting positions. See our GitHub Wiki page for an illustration of the initial board.
Episode Termination: Either the red or black general is captured by the opponent.
-
observation_space
The observation space is the state of the board and its pieces. Each item in the space corresponds to a single coordinate on the board and holds a value in the range -16 to 16 that represents a piece. Negative integers are enemy pieces and positive integers are ally pieces. For the specific piece ID mapping, please refer to gym_xiangqi/constants.py.
- Type
gym.spaces.Box with shape (10, 9)
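As a rough illustration of the sign convention above, the sketch below counts ally and enemy pieces in a board array. It assumes that empty squares hold 0, which this page does not state explicitly; count_pieces is a hypothetical helper and not part of gym_xiangqi.

```python
from typing import Tuple

import numpy as np


def count_pieces(state: np.ndarray) -> Tuple[int, int]:
    """Return (ally_count, enemy_count) for a 10 x 9 board array.

    Hypothetical helper. Assumes 0 marks an empty square, positive IDs are
    ally pieces and negative IDs are enemy pieces.
    """
    if state.shape != (10, 9):
        raise ValueError(f"expected a (10, 9) board, got {state.shape}")
    ally = int(np.count_nonzero(state > 0))
    enemy = int(np.count_nonzero(state < 0))
    return ally, enemy


# Toy board with one ally piece and two enemy pieces (IDs chosen arbitrarily).
board = np.zeros((10, 9), dtype=int)
board[9, 4] = 1
board[0, 4] = -1
board[0, 0] = -5
print(count_pieces(board))  # (1, 2)
```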
-
action_space
The action space is an aggregation of all possible moves, including illegal ones. Each action encodes three pieces of information: which piece moves, where it moves from, and where it moves to.
In the size 16 * 10 * 9 * 10 * 9, 16 is the number of pieces and each 10 * 9 factor covers every grid position on the board: the first 10 * 9 is the start position and the second is the position the piece wants to move to.
On top of this, the environment computes which moves in the action space are legal and which are illegal, so that it can forbid illegal moves, penalize an agent that tries to perform them, and correctly implement Xiangqi rules.
- Type
gym.spaces.Discrete(16 * 10 * 9 * 10 * 9)
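Multiplying out the factors gives the size of the discrete action space. The arithmetic below simply restates the formula from this section.

```python
# Dimensions taken from the action_space description above.
NUM_PIECES = 16
ROWS, COLS = 10, 9

squares = ROWS * COLS                # 90 grid positions
moves_per_piece = squares * squares  # every (start, end) pair: 8100
ACTION_SPACE_SIZE = NUM_PIECES * moves_per_piece

print(ACTION_SPACE_SIZE)  # 129600
```

Because a piece can reach only a few squares from any given position, most of these action IDs are illegal at any moment, which is why the environment tracks legality separately.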
-
ally_color
Current environment’s ally color
RED = 0 and BLACK = 1
- Type
int
-
enemy_color
Current environment’s enemy color
RED = 0 and BLACK = 1
- Type
int
-
turn
Player whose turn it currently is
ALLY = 0 and ENEMY = 1
- Type
int
-
done
Flag indicating whether the current game has ended
- Type
bool
-
state
Two-dimensional numpy array representing the current board state
- Type
np.array
-
ally_actions
One-dimensional numpy array marking which actions in the ally’s action space are legal or illegal.
The piece ID, start position, and target position are encoded in an action ID, which is the index into the ally_actions array. Action IDs can be encoded and decoded with the move_to_action_space and action_space_to_move functions in utils.py.
Values of the array are 0 and 1 indicating legal and illegal actions respectively.
- Type
np.array
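A legality mask like this is typically turned into a list of candidate action IDs before an agent samples from it. The sketch below does this with NumPy. legal_action_ids and sample_legal_action are hypothetical helpers, and they take the marker value for a legal action as an argument so that it can be matched to the convention stated above.

```python
import numpy as np


def legal_action_ids(mask: np.ndarray, legal_value: int) -> np.ndarray:
    """Return the indices (action IDs) whose mask entry equals legal_value.

    Hypothetical helper: pass the value that marks a legal action in
    ally_actions / enemy_actions.
    """
    return np.flatnonzero(mask == legal_value)


def sample_legal_action(mask: np.ndarray, legal_value: int,
                        rng: np.random.Generator) -> int:
    """Pick one legal action ID uniformly at random."""
    candidates = legal_action_ids(mask, legal_value)
    if candidates.size == 0:
        raise ValueError("no legal actions in mask")
    return int(rng.choice(candidates))


# Toy mask over a 6-action space where entries equal to 1 are treated as legal.
toy_mask = np.array([0, 1, 0, 1, 1, 0])
print(legal_action_ids(toy_mask, legal_value=1))  # [1 3 4]
```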
-
enemy_actions
One-dimensional numpy array marking which actions in the enemy’s action space are legal or illegal.
The piece ID, start position, and target position are encoded in an action ID, which is the index into the enemy_actions array. Action IDs can be encoded and decoded with the move_to_action_space and action_space_to_move functions in utils.py.
Values of the array are 0 and 1 indicating legal and illegal actions respectively.
- Type
np.array
-
ally_piece
List of all ally piece objects
- Type
list
-
enemy_piece
List of all enemy piece objects
- Type
list
-
check_jiang()
Checks whether the general is under threat (i.e. in check, or “jiang”) from any of the current player’s pieces
- Returns
list of actions that lead to Jiang based on current board state
- Return type
list
-
close()
Free up resources and gracefully exit the Xiangqi environment
-
get_possible_actions(player)
Searches for all valid actions that each of the given player’s pieces can perform
- Parameters
player (int) – -1 for ENEMY and 1 for ALLY
-
get_possible_actions_by_piece(piece_id)
Given a piece ID, saves the possible actions of that piece inside the piece object.
- Parameters
piece_id (int) – ID of the piece whose possible actions are computed
-
init_pieces()
Initialize and store all ally and enemy pieces
-
render(mode='human')
Render the current game state with PyGame
For more information on the ‘mode’ parameter, refer to gym.Env.render() in the OpenAI Gym repository. Currently, only ‘human’ mode is supported.
- Parameters
mode (str) – string to indicate render mode
-
reset()
Reset all environment components to their initial state
- Returns
the initial state
- Return type
np.array
-
seed(seed=None)
Generate a random seed value used to reproduce the current game
- Parameters
seed – User-defined input seed. If this is None, a seed is generated by this method.
-
step(action)
Run one turn of the Xiangqi game (the ally or enemy side plays a move) by processing the given action on behalf of the player who currently owns the turn
- Parameters
action (int) – a valid action in Xiangqi action space
- Returns
observation, reward, done, info
observation (object): current game state of the environment
reward (float): amount of reward returned after given action
Each piece type is assigned a point value following the most widely used standard:
General: 100.0 (AKA win)
Advisor: 2.0
Elephant: 2.0
Horse: 4.0
Chariot: 9.0
Cannon: 4.5
Soldier: 1.0 (2.0 if it has crossed the river)
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
tuple
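The point table above can be expressed as a simple lookup. This is only a sketch of the listed values: capture_points and PIECE_POINTS are hypothetical names, and the sketch does not model how the environment signs, combines, or penalizes rewards (for example for illegal moves).

```python
# Point values copied from the step() reward table above.
PIECE_POINTS = {
    "General": 100.0,  # capturing the general wins the game
    "Advisor": 2.0,
    "Elephant": 2.0,
    "Horse": 4.0,
    "Chariot": 9.0,
    "Cannon": 4.5,
    "Soldier": 1.0,
}


def capture_points(piece_type: str, crossed_river: bool = False) -> float:
    """Points for capturing a piece of the given type (hypothetical helper).

    A Soldier that has crossed the river is worth 2.0 instead of 1.0.
    """
    if piece_type == "Soldier" and crossed_river:
        return 2.0
    return PIECE_POINTS[piece_type]


print(capture_points("Cannon"))                       # 4.5
print(capture_points("Soldier", crossed_river=True))  # 2.0
```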
-
step_user()
This method works like the environment’s step() method, but it is designed specifically for users who play a Xiangqi game themselves (user VS agent mode). The method first renders the game GUI and listens for user input. When the user enters a piece movement, it is converted into the action space and passed to the environment’s step() method, which handles it just like any action from an RL agent.
- Returns
observation, reward, done, info: the return values are the same as those of the step() method.
- Return type
tuple
-
property
unwrapped
Completely unwrap this env.
- Returns
The base non-wrapped gym.Env instance
- Return type
gym.Env
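Putting reset() and step() together gives the usual Gym episode loop. The sketch below is written against the reset()/step() contract documented above rather than against a specific installation. run_episode and ToyEnv are hypothetical; ToyEnv only stands in for XiangQiEnv so that the loop can be run without the game.

```python
from typing import Any, Callable, Tuple


def run_episode(env: Any, choose_action: Callable[[Any], int],
                max_steps: int = 1000) -> Tuple[float, int]:
    """Play one episode via reset()/step(); return (total_reward, steps)."""
    env.reset()
    total_reward, steps, done = 0.0, 0, False
    # Stop at done: further step() calls would return undefined results.
    while not done and steps < max_steps:
        action = choose_action(env)
        _obs, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


class ToyEnv:
    """Hypothetical stand-in: every step yields 1.0, episode ends after 3 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> None:
        self.t = 0

    def step(self, action: int):
        self.t += 1
        return None, 1.0, self.t >= 3, {}


print(run_episode(ToyEnv(), lambda env: 0))  # (3.0, 3)
```

With a real XiangQiEnv, choose_action could inspect env.turn and pick a legal ID from env.ally_actions or env.enemy_actions, since step() plays the move for whichever side currently owns the turn.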
-
From Piece Move to XiangQiEnv Action Space
Intuitively, a piece’s move carries three pieces of information:
Piece: unique identifiable piece on the board
Start Position: the selected piece’s current position
End Position: the desired position the selected piece wants to move to
Given the piece ID, its current position, and its target position, move_to_action_space() encodes this information into XiangQiEnv’s action space. The output is an action ID, which is a single integer.
-
gym_xiangqi.utils.move_to_action_space(piece_id, start, end)
The action space is a flat 1D array, so the piece ID, start position, and end position can be converted into a corresponding index within it.
- Parameters
piece_id (int) – a piece ID integer
start (tuple(int)) – (row, col) start coordinate
end (tuple(int)) – (row, col) end coordinate
- Returns
The action ID, i.e. the index within the action space
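The exact ordering used by gym_xiangqi.utils is not specified on this page. The sketch below shows one natural way to flatten (piece, start, end) into a single integer, assuming piece IDs 1 to 16 (mapped to a 0-based block index) and row-major (row, col) flattening. encode_move is hypothetical, so use move_to_action_space() itself with the real environment.

```python
from typing import Tuple

ROWS, COLS = 10, 9
SQUARES = ROWS * COLS                # 90 grid positions
MOVES_PER_PIECE = SQUARES * SQUARES  # 8100 (start, end) pairs per piece


def encode_move(piece_id: int, start: Tuple[int, int],
                end: Tuple[int, int]) -> int:
    """Hypothetical encoder: flatten (piece, start, end) into one action ID.

    Assumes piece IDs 1..16 and 0-based (row, col) board coordinates.
    """
    if not 1 <= piece_id <= 16:
        raise ValueError("piece_id must be in 1..16")
    for r, c in (start, end):
        if not (0 <= r < ROWS and 0 <= c < COLS):
            raise ValueError(f"coordinate {(r, c)} is off the board")
    start_idx = start[0] * COLS + start[1]  # 0..89
    end_idx = end[0] * COLS + end[1]        # 0..89
    return (piece_id - 1) * MOVES_PER_PIECE + start_idx * SQUARES + end_idx


print(encode_move(2, (0, 0), (0, 1)))  # 8101
```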
From XiangQiEnv Action Space to Piece Move
If we can encode piece move information into the action space, we should also be able to decode the encoded action ID back into original information.
-
gym_xiangqi.utils.action_space_to_move(action)
This is the exact inverse of the move_to_action_space() method. Given an index value, it converts it back into the piece ID, start position, and end position.
- Parameters
action (int) – index value within action space
- Returns
piece ID, start coordinate, end coordinate
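Decoding reverses the flattening with repeated divmod calls. Like any encoder sketch, this assumes a particular layout that this page does not confirm: piece IDs 1 to 16 stored as a 0-based block index, followed by row-major start and end squares. decode_action is hypothetical; use action_space_to_move() with the real environment.

```python
from typing import Tuple

ROWS, COLS = 10, 9
SQUARES = ROWS * COLS                # 90 grid positions
MOVES_PER_PIECE = SQUARES * SQUARES  # 8100 (start, end) pairs per piece
NUM_PIECES = 16


def decode_action(action: int) -> Tuple[int, Tuple[int, int], Tuple[int, int]]:
    """Hypothetical decoder: action ID -> (piece_id, (row, col), (row, col))."""
    if not 0 <= action < NUM_PIECES * MOVES_PER_PIECE:
        raise ValueError("action is outside the action space")
    piece_idx, rest = divmod(action, MOVES_PER_PIECE)  # which piece block
    start_idx, end_idx = divmod(rest, SQUARES)         # which (start, end) pair
    return piece_idx + 1, divmod(start_idx, COLS), divmod(end_idx, COLS)


print(decode_action(8101))  # (2, (0, 0), (0, 1))
```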