API¶

Gym Xiangqi provides core environment methods that all OpenAI Gym environments provide.

XiangQiEnv¶

class gym_xiangqi.envs.xiangqi_env.XiangQiEnv(ally_color=0)¶

This is the game of Xiangqi (Chinese chess) implemented as a reinforcement learning environment using the OpenAI Gym framework. Xiangqi is played on a board of 10 rows and 9 columns with 16 pieces on each side, drawn from 7 unique piece types: General, Advisor, Elephant, Horse, Chariot, Cannon, and Soldier.

Starting State: The initial board state, with all pieces laid out in their starting positions. See the README for an illustration of the initial board.

Episode Termination: The episode ends when either red or black runs out of moves, which is also described as that side's general being captured. See the README for more details.

observation_space¶

The observation space is the state of the board and its pieces. Each item in the space corresponds to a single coordinate on the board and holds an integer in the range -16 to 16 that encodes a piece. Negative integers are enemy pieces and positive integers are ally pieces.

Type: gym.spaces.Box(10, 9)
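As an illustration of the sign convention described above, the following minimal sketch separates ally and enemy pieces in a 10 x 9 board. The helper name `split_sides` is hypothetical and not part of the package. The sketch assumes that 0 marks an empty square, which this page does not state, and the piece codes in the sample board are arbitrary placeholders.

```python
def split_sides(board):
    """Return (ally, enemy) lists of (row, col, code) tuples.

    ASSUMPTION: 0 marks an empty square. Positive codes are ally pieces
    and negative codes are enemy pieces, as described for observation_space.
    """
    ally, enemy = [], []
    for r, row in enumerate(board):
        for c, code in enumerate(row):
            if code > 0:
                ally.append((r, c, code))
            elif code < 0:
                enemy.append((r, c, code))
    return ally, enemy


# A mostly empty 10 x 9 board with one placeholder piece per side.
board = [[0] * 9 for _ in range(10)]
board[9][4] = 1    # placeholder ally piece code
board[0][4] = -1   # placeholder enemy piece code

ally, enemy = split_sides(board)
print(len(ally), len(enemy))
```

The function only iterates over rows and cells, so it should work equally on nested lists and on a 2 dimensional numpy array such as the one held in `state`.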

action_space¶

The action space is the aggregation of all possible moves, including illegal ones. Each action encodes three pieces of information: which piece, from where, and to where. In 16 * 10 * 9 * 10 * 9, 16 is the number of pieces and 10 * 9 is the number of grid positions on the board. The first 10 * 9 represents the start position and the second 10 * 9 represents the end position, i.e. the position the piece wants to move to.

In addition to this, the environment will calculate legal and illegal moves within the action space to penalize an agent trying to perform illegal moves and to correctly implement Xiangqi rules.

Type: gym.spaces.Discrete(16 * 10 * 9 * 10 * 9)
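To make the size of the action space concrete, here is a minimal sketch of how a (piece, start, end) triple could be packed into a single integer and unpacked again. The function names `encode_action` and `decode_action` are hypothetical, and the ordering (piece first, then start, then end), the 0-based piece index, and the row-major position index are all assumptions. The package's own encoding may differ.

```python
N_PIECES, N_ROWS, N_COLS = 16, 10, 9
N_POS = N_ROWS * N_COLS                 # 90 grid positions
N_ACTIONS = N_PIECES * N_POS * N_POS    # 16 * 10 * 9 * 10 * 9 = 129600


def encode_action(piece, start, end):
    """Pack a move into one integer in [0, N_ACTIONS).

    ASSUMPTIONS: piece is 0-based (0..15), start and end are (row, col)
    pairs, and positions are flattened in row-major order.
    """
    (r0, c0), (r1, c1) = start, end
    start_idx = r0 * N_COLS + c0
    end_idx = r1 * N_COLS + c1
    return (piece * N_POS + start_idx) * N_POS + end_idx


def decode_action(action):
    """Inverse of encode_action: return (piece, (row, col), (row, col))."""
    rest, end_idx = divmod(action, N_POS)
    piece, start_idx = divmod(rest, N_POS)
    return piece, divmod(start_idx, N_COLS), divmod(end_idx, N_COLS)


action = encode_action(3, (9, 1), (7, 2))
print(action, decode_action(action))
```

Because every combination of piece, start, and end receives an index, most indices correspond to moves that are illegal in a given position, which is why the environment tracks legality separately.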

ally_color¶

Current environment’s ally color: RED = 0, BLACK = 1.

Type: int

enemy_color¶

Current environment’s enemy color: RED = 0, BLACK = 1.

Type: int

turn¶

The player whose turn it currently is: ALLY = 0, ENEMY = 1.

Type: int

done¶

Flag indicating whether the current game has terminated.

Type: bool

state¶

2 dimensional numpy array representing the current board state.

Type: np.array

ally_actions¶

1 dimensional numpy array indicating legal and illegal actions across the ally’s entire action space. Possible values of the array are 0 and 1, indicating legal and illegal actions respectively.

Type: np.array

enemy_actions¶

1 dimensional numpy array indicating legal and illegal actions across the enemy’s entire action space. Possible values of the array are 0 and 1, indicating legal and illegal actions respectively.

Type: np.array

ally_piece¶

List of all ally piece objects

Type: list

enemy_piece¶

List of all enemy piece objects

Type: list

check_jiang()¶: Check whether the general is under threat (i.e. in check, or “jiang”) from any of the current player’s pieces

close()¶: Free up resources and gracefully exit the Xiangqi environment

get_possible_actions(player)¶

Search for all valid actions each piece can perform

Parameters: player (int) – -1 for ENEMY, 1 for ALLY

get_possible_actions_by_piece(piece_id)¶

Given a piece_id, returns only the possible actions that can be taken by the piece.

Parameters: piece_id (int) – Piece ID to filter possible actions.
Returns: actions that can be taken by the piece.

init_pieces()¶: Initialize and store all ally and enemy pieces

render(mode='human')¶

Render current game state with PyGame

For more information refer to gym.Env.render() in OpenAI Gym repository.

Parameters: mode (str) – string to indicate render mode

reset()¶

Reset all environment components to initial state

Returns: the initial observation.
Return type: observation (object)
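Since reset() returns the first observation and step() returns the (observation, reward, done, info) tuple documented in this section, a typical episode loop can be sketched as below. `run_episode` and `policy` are hypothetical names introduced for illustration. The function makes no assumption about the environment beyond those two methods, and the `max_steps` cap is an added safeguard rather than something the environment requires.

```python
def run_episode(env, policy, max_steps=1000):
    """Play one episode and return (total_reward, steps_taken).

    env    -- any object with reset() and step(action) as documented here
    policy -- hypothetical callable mapping an observation to an action
    """
    observation = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            # Further step() calls would return undefined results.
            break
    return total_reward, steps
```

With the package installed, this could be called as `run_episode(XiangQiEnv(), policy)`, where `policy` might, for example, sample an integer from the environment's `action_space`.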

seed(seed=None)¶

Generate a random seed value that can be used to reproduce the current game

Parameters: seed – User-defined input seed. If this is None, a seed is generated by this method

step(action)¶

Run one turn of the Xiangqi game (the ally or enemy side plays a move) by processing the given action for whichever side owns the current turn

Parameters

action (int) – a valid action in Xiangqi action space

Returns

observation, reward, done, info

observation (object): current game state of the environment

reward (float): amount of reward returned after given action

We assign points to every type of piece, following the most widely used standard.

General: infinity

Advisor: 2.0

Elephant: 2.0

Horse: 4.0

Chariot: 9.0

Cannon: 4.5

Soldier: 1.0 (2.0 if it has crossed the river)

done (bool): whether the episode has ended, in which case further step() calls will return undefined results

info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type

tuple
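The piece point values listed for the reward can be kept in a small lookup table, as in the sketch below. `PIECE_POINTS` and `piece_points` are hypothetical names, not the package's API, and how the environment combines these values into the reward returned by step() is not specified on this page.

```python
import math

# Point values copied from the list in the step() documentation.
PIECE_POINTS = {
    "General": math.inf,
    "Advisor": 2.0,
    "Elephant": 2.0,
    "Horse": 4.0,
    "Chariot": 9.0,
    "Cannon": 4.5,
    "Soldier": 1.0,
}
SOLDIER_CROSSED_RIVER_POINTS = 2.0


def piece_points(name, crossed_river=False):
    """Return the point value of a piece type.

    The crossed_river flag only matters for a Soldier, which is worth
    2.0 instead of 1.0 once it has crossed the river.
    """
    if name == "Soldier" and crossed_river:
        return SOLDIER_CROSSED_RIVER_POINTS
    return PIECE_POINTS[name]


print(piece_points("Cannon"), piece_points("Soldier", crossed_river=True))
```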

step_user()¶

This method works like the environment’s step() method, but it is intended for a human user who is playing the Xiangqi game. The method first renders the game GUI and listens for user input. The user input, which is a piece movement, is then converted into an action in the action space and passed to the environment’s step() method. The environment can then handle the input action just as it handles any action from an RL agent.

Returns: The return values are the same as those of the step() method.
Return type: tuple

property unwrapped¶

Completely unwrap this env.

Returns: The base non-wrapped gym.Env instance
Return type: gym.Env