Home

Training Data Format

There are three formats of training data for the learning programs of this project. One is for supervised learning (SL), one is for offline reinforcement learning (offline RL), and the last one is for predicting something from the situation of a given round. The first format can be obtained by converting Mahjong Soul game records using cryolite/kanachan.annotate. The second one can be obtained by converting the annotated data of the first format using bin/annotate4rl/annotate4rl.py. The last one can be obtained by converting the annotated data of the first format using (TODO).

Common Conventions

Before explaining the details of annotations, the following explains the conventions used in annotations.

Seat

Each of the four players in a 4-player mahjong game is identified by a "seat". The 0th seat is the dealer (zhuang jia, 荘家) at the start of the game (qi jia, 起家); the 1st seat is the player to the right of the 0th seat (xia jia of qi jia, 起家の下家); the 2nd seat is the player across from the 0th seat (dui mian of qi jia, 起家の対面); and the 3rd seat is the player to the left of the 0th seat (shang jia of qi jia, 起家の上家).

Seat	Meaning
`0`	the dealer at the start of the game
`1`	the player to the right of the 0th seat
`2`	the player across from the 0th seat
`3`	the player to the left of the 0th seat

Relative Seat (Relseat)

There are cases where the relative positions of two players need to be represented. For example, complete information about a pon (peng, 碰, ポン) includes who melds the pon and who discarded the melded tile. In such a case, one player is represented by a seat index, and the other is represented by the position relative to the former.

Relseat	Meaning
`0`	the player to the right of the player of interest
`1`	the player across from the player of interest
`2`	the player to the left of the player of interest
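The relation between seats and relseats can be sketched as follows. This is an illustration only: it assumes that seat `(s + 1) % 4` sits to the right of seat `s`, as the Seat table suggests, and the function names `to_relseat` and `from_relseat` are hypothetical, not part of the project.

```python
def to_relseat(seat: int, other: int) -> int:
    """Position of `other` relative to `seat`: 0 = right, 1 = across, 2 = left.

    Assumes seat (s + 1) % 4 sits to the right of seat s.
    """
    if seat == other:
        raise ValueError("a player has no relseat relative to themselves")
    return (other - seat) % 4 - 1


def from_relseat(seat: int, relseat: int) -> int:
    """Inverse of to_relseat: absolute seat of the player at `relseat`."""
    if not 0 <= relseat <= 2:
        raise ValueError(f"relseat out of range: {relseat}")
    return (seat + relseat + 1) % 4
```

For example, under this assumption the player at seat `0` is at relseat `0` when seen from seat `3`.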

Tile

The type of a tile is represented by an integer from 0 to 36, inclusive.

Tile	Value
0m ~ 9m	`0` ~ `9`
0p ~ 9p	`10` ~ `19`
0s ~ 9s	`20` ~ `29`
1z ~ 7z	`30` ~ `36`
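As a small sketch of the table above, the following converts a tile name such as `0m` or `7z` into its integer value. The function name `encode_tile` is hypothetical and chosen only for illustration.

```python
SUIT_OFFSET = {"m": 0, "p": 10, "s": 20}


def encode_tile(name: str) -> int:
    """Map a tile name like '0m', '5p' or '1z' to its integer in 0..36."""
    number, suit = int(name[0]), name[1]
    if suit == "z":
        # Honor tiles 1z..7z occupy 30..36.
        if not 1 <= number <= 7:
            raise ValueError(f"invalid honor tile: {name}")
        return 30 + number - 1
    if suit not in SUIT_OFFSET or not 0 <= number <= 9:
        raise ValueError(f"invalid tile: {name}")
    # Numbered suits 0x..9x occupy ten consecutive values each.
    return SUIT_OFFSET[suit] + number
```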

Tile'

A closed kong (an gang, 暗槓) uses all four copies of a tile, so there is no need to distinguish between black and red tiles to indicate its type. In such a case, the 34 types of tiles excluding red ones are represented by integers from 0 to 33, inclusive.

Tile	Value
1m ~ 9m	`0` ~ `8`
1p ~ 9p	`9` ~ `17`
1s ~ 9s	`18` ~ `26`
1z ~ 7z	`27` ~ `33`
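The two tile numberings can be related as in the following sketch, which collapses a red five onto the ordinary five of the same suit. It assumes that `0m`, `0p`, and `0s` in the Tile table denote the red fives; the function name `to_tile_prime` is hypothetical.

```python
def to_tile_prime(tile: int) -> int:
    """Convert a Tile value (0..36) into a Tile' value (0..33).

    Assumes 0m/0p/0s (values 0, 10, 20) are the red fives.
    """
    if not 0 <= tile <= 36:
        raise ValueError(f"tile out of range: {tile}")
    if tile >= 30:
        # Honors: 30..36 -> 27..33.
        return tile - 3
    suit, number = divmod(tile, 10)
    if number == 0:
        number = 5  # a red five counts as an ordinary five
    return suit * 9 + number - 1
```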

Grade

The grade (段位) is represented by integers from 0 to 15, inclusive.

Grade	Value
Novice (初心) 1~3	`0` ~ `2`
Adept (雀士) 1~3	`3` ~ `5`
Expert (雀傑) 1~3	`6` ~ `8`
Master (雀豪) 1~3	`9` ~ `11`
Saint (雀聖) 1~3	`12` ~ `14`
Celestial (魂天)	`15`

Chow (Chi, 吃, チー)

Chows are represented by integers from 0 to 89, inclusive.

Value	Chow (The last element represents the discarded tile)
`0`	(2m, 3m, 1m)
`1`	(1m, 3m, 2m)
`2`	(3m, 4m, 2m)
`3`	(1m, 2m, 3m)
`4`	(2m, 4m, 3m)
`5`	(4m, 5m, 3m)
`6`	(4m, 0m, 3m)
`7`	(2m, 3m, 4m)
`8`	(3m, 5m, 4m)
`9`	(3m, 0m, 4m)
`10`	(5m, 6m, 4m)
`11`	(0m, 6m, 4m)
`12`	(3m, 4m, 5m)
`13`	(3m, 4m, 0m)
`14`	(4m, 6m, 5m)
`15`	(4m, 6m, 0m)
`16`	(6m, 7m, 5m)
`17`	(6m, 7m, 0m)
`18`	(4m, 5m, 6m)
`19`	(4m, 0m, 6m)
`20`	(5m, 7m, 6m)
`21`	(0m, 7m, 6m)
`22`	(7m, 8m, 6m)
`23`	(5m, 6m, 7m)
`24`	(0m, 6m, 7m)
`25`	(6m, 8m, 7m)
`26`	(8m, 9m, 7m)
`27`	(6m, 7m, 8m)
`28`	(7m, 9m, 8m)
`29`	(7m, 8m, 9m)
`30` ~ `59`	Likewise for Circle tiles (筒子)
`60` ~ `89`	Likewise for Bamboo tiles (索子)
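The chow table can be turned into a lookup as sketched below. The 30 patterns are copied from the table for Character tiles; the sketch assumes that "likewise" means the same order is reused with an offset of 30 per suit. The names `CHOW_PATTERNS` and `decode_chow` are hypothetical.

```python
# Number patterns for values 0..29; the last element is the discarded tile.
CHOW_PATTERNS = [
    (2, 3, 1), (1, 3, 2), (3, 4, 2), (1, 2, 3), (2, 4, 3), (4, 5, 3),
    (4, 0, 3), (2, 3, 4), (3, 5, 4), (3, 0, 4), (5, 6, 4), (0, 6, 4),
    (3, 4, 5), (3, 4, 0), (4, 6, 5), (4, 6, 0), (6, 7, 5), (6, 7, 0),
    (4, 5, 6), (4, 0, 6), (5, 7, 6), (0, 7, 6), (7, 8, 6), (5, 6, 7),
    (0, 6, 7), (6, 8, 7), (8, 9, 7), (6, 7, 8), (7, 9, 8), (7, 8, 9),
]


def decode_chow(chi: int) -> tuple[str, str, str]:
    """Return the tile names of a chow value; the last one was discarded."""
    if not 0 <= chi <= 89:
        raise ValueError(f"chow out of range: {chi}")
    suit = "mps"[chi // 30]  # assumed: 30 values per suit, same order
    a, b, c = CHOW_PATTERNS[chi % 30]
    return (f"{a}{suit}", f"{b}{suit}", f"{c}{suit}")
```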

Pon (Peng, 碰, ポン)

Pons are represented by integers from 0 to 39, inclusive.

Value	Pon (The last element represents the discarded tile)
`0`	(1m, 1m, 1m)
`1`	(2m, 2m, 2m)
`2`	(3m, 3m, 3m)
`3`	(4m, 4m, 4m)
`4`	(5m, 5m, 5m)
`5`	(0m, 5m, 5m)
`6`	(5m, 5m, 0m)
`7`	(6m, 6m, 6m)
`8`	(7m, 7m, 7m)
`9`	(8m, 8m, 8m)
`10`	(9m, 9m, 9m)
`11`	(1p, 1p, 1p)
`12`	(2p, 2p, 2p)
`13`	(3p, 3p, 3p)
`14`	(4p, 4p, 4p)
`15`	(5p, 5p, 5p)
`16`	(0p, 5p, 5p)
`17`	(5p, 5p, 0p)
`18`	(6p, 6p, 6p)
`19`	(7p, 7p, 7p)
`20`	(8p, 8p, 8p)
`21`	(9p, 9p, 9p)
`22`	(1s, 1s, 1s)
`23`	(2s, 2s, 2s)
`24`	(3s, 3s, 3s)
`25`	(4s, 4s, 4s)
`26`	(5s, 5s, 5s)
`27`	(0s, 5s, 5s)
`28`	(5s, 5s, 0s)
`29`	(6s, 6s, 6s)
`30`	(7s, 7s, 7s)
`31`	(8s, 8s, 8s)
`32`	(9s, 9s, 9s)
`33`	(1z, 1z, 1z)
`34`	(2z, 2z, 2z)
`35`	(3z, 3z, 3z)
`36`	(4z, 4z, 4z)
`37`	(5z, 5z, 5z)
`38`	(6z, 6z, 6z)
`39`	(7z, 7z, 7z)
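The pon table follows a regular layout (11 values per numbered suit, then 7 honors), which the following sketch reproduces. The function name `decode_pon` is hypothetical and used only for illustration.

```python
def decode_pon(peng: int) -> tuple[str, str, str]:
    """Return the tile names of a pon value; the last one was discarded."""
    if not 0 <= peng <= 39:
        raise ValueError(f"pon out of range: {peng}")
    if peng >= 33:
        # Honors: 33..39 -> 1z..7z.
        tile = f"{peng - 33 + 1}z"
        return (tile, tile, tile)
    suit = "mps"[peng // 11]
    index = peng % 11
    if index == 5:
        # A red five from the hand, an ordinary five discarded.
        return (f"0{suit}", f"5{suit}", f"5{suit}")
    if index == 6:
        # The red five itself was the discarded tile.
        return (f"5{suit}", f"5{suit}", f"0{suit}")
    # Indices 0..4 are ranks 1..5; indices 7..10 are ranks 6..9.
    rank = index + 1 if index <= 4 else index - 1
    tile = f"{rank}{suit}"
    return (tile, tile, tile)
```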

Training Data Format for Supervised Learning (SL)

Roughly speaking, the training data format for supervised learning represents the set of triplets, which consist of the situation of a decision-making point (see Annotate for the definition of a decision-making point), the actual action taken by the player at that point, and the results of the round and game where that point appears.

In this format, the annotation of a decision-making point is represented by one text line. Each line is tab-separated into 8 fields, and each field is in turn comma-separated into elements. In each line, the first column is for debugging purposes only, the next 4 columns represent the situation of a decision-making point, the next column represents the actual action taken by the player at that point, and the final two columns represent the round and game results.
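A line in this format could be split as in the following minimal sketch. It is not the project's own loader: the names `SLRecord` and `parse_sl_line` are hypothetical, and the sketch assumes that every element other than the UUID is written as a decimal integer.

```python
from typing import NamedTuple


class SLRecord(NamedTuple):
    uuid: str
    sparse: list[int]
    numeric: list[int]
    progression: list[int]
    candidates: list[int]
    action: int
    round_summary: list[int]
    results: list[int]


def _ints(field: str) -> list[int]:
    # Assumption: elements are comma-separated decimal integers.
    return [int(x) for x in field.split(",")] if field else []


def parse_sl_line(line: str) -> SLRecord:
    """Split one annotation line into its 8 tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    uuid, sparse, numeric, progression, candidates, action, summary, results = fields
    return SLRecord(
        uuid, _ints(sparse), _ints(numeric), _ints(progression),
        _ints(candidates), int(action), _ints(summary), _ints(results),
    )
```

Since the actual action is an index into the candidates, the chosen candidate of a parsed record `rec` would be `rec.candidates[rec.action]`.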

0th Column: Game UUID

The 0th column consists of the game UUID, which uniquely identifies the game in which the decision-making point appears. This column is for debugging purposes only and is not used for training at all.

1st Column: Sparse Features

The 1st column consists of sparse features. All the elements in this column are non-negative integers. These integers are used as indices for embeddings, which are finally used as a part of inputs to models. The meaning of each integer is as follows.

Element Index	Title	Value	Note
0	Room	`0`: Bronze Room (銅の間) `1`: Silver Room (銀の間) `2`: Gold Room (金の間) `3`: Jade Room (玉の間) `4`: Throne Room (王座の間)
1	Game Style	`5`: quarter-length game (dong feng zhan, 東風戦) `6`: half-length game (ban zhuang zhan, 半荘戦)
2	Grade of the player at the seat `0`	`7` ~ `22`	`7 + grade`
3	Grade of the player at the seat `1`	`23` ~ `38`	`23 + grade`
4	Grade of the player at the seat `2`	`39` ~ `54`	`39 + grade`
5	Grade of the player at the seat `3`	`55` ~ `70`	`55 + grade`
6	Seat	`71` ~ `74`	`71 + seat`
7	Game Wind (Chang, 場)	`75`: East (東場) `76`: South (南場) `77`: West (西場)
8	Round (Ju, 局)	`78` ~ `81`	`78 + round`
9	# of Left Tiles to Draw	`82` ~ `151`	`82 + (# of left tiles)`
10	Dora Indicator	`152` ~ `188`	`152 + tile`
11	2nd Dora Indicator	`189` ~ `225`	optional, `189 + tile`
12	3rd Dora Indicator	`226` ~ `262`	optional, `226 + tile`
13	4th Dora Indicator	`263` ~ `299`	optional, `263 + tile`
14	5th Dora Indicator	`300` ~ `336`	optional, `300 + tile`
15	Hand (shou pai, 手牌)	`337` ~ `472`	(combination, see below)
16	Drawn Tile (zimo pai, 自摸牌)	`473` ~ `509`	optional, `473 + tile`
17	<PADDING>	`510`	(does not appear in annotation)

The following is how a tile in the hand is represented:

Tile	Value
Red 5m	337
First 1m	338
Second 1m	339
Third 1m	340
Fourth 1m	341
First 2m	342
.....	...
First black 5m	354
Second black 5m	355
Third black 5m	356
First 6m	357
.....	...
Fourth 9m	372
Red 5p	373
First 1p	374
.....	...
Red 5s	409
First 1s	410
.....	...
Fourth 9s	444
First East	445
Second East	446
Third East	447
Fourth East	448
First South	449
.....	...
First White Dragon (白)	461
.....	...
Fourth Red Dragon (中)	472
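The layout of the hand values can be sketched as follows. The sketch is derived from the range `337` ~ `472` and the rows that the table lists explicitly; the elided rows are assumed to follow the same regular pattern (36 values per numbered suit, starting with the red five, with only three black fives). The function name `encode_hand_tile` is hypothetical.

```python
HAND_BASE = 337


def encode_hand_tile(name: str, copy: int = 0) -> int:
    """Value of the `copy`-th (0-based) copy of tile `name` in the hand.

    '0m'/'0p'/'0s' are red fives; '5m'/'5p'/'5s' are black fives.
    """
    rank, suit = int(name[0]), name[1]
    if suit == "z":
        if not (1 <= rank <= 7 and 0 <= copy <= 3):
            raise ValueError(f"invalid honor tile: {name} #{copy}")
        # Honors start after the three numbered suits (3 * 36 values).
        return HAND_BASE + 108 + (rank - 1) * 4 + copy
    base = HAND_BASE + 36 * "mps".index(suit)
    if rank == 0:
        if copy != 0:
            raise ValueError("there is only one red five per suit")
        return base
    copies = 3 if rank == 5 else 4
    if not (1 <= rank <= 9 and 0 <= copy < copies):
        raise ValueError(f"invalid tile: {name} #{copy}")
    if rank <= 5:
        return base + 1 + (rank - 1) * 4 + copy
    # Ranks 6..9 come after the three black fives.
    return base + 20 + (rank - 6) * 4 + copy
```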

2nd Column: Numeric Features

The 2nd column consists of numeric features. This column consists of exactly 6 elements. These features are numerically meaningful and directly used as a part of inputs to models. The meaning of each element is as follows.

Element Index	Explanation
0	The number of counter sticks (ben chang, 本場)
1	The number of riichi deposits (供託本数)
2	The score of the player at the seat `0`
3	The score of the player at the seat `1`
4	The score of the player at the seat `2`
5	The score of the player at the seat `3`

3rd Column: Progression Features

The 3rd column consists of progression features. This column represents a sequence of non-negative integers. Each integer stands for some event in a round of a game. The order of the integers in the sequence directly represents the order in which the events occurred up to the decision-making point. These integers are used as indices for embeddings, which are finally used as a part of inputs to models. Note, however, that positional encoding must be applied to the embeddings if they are to be used as a part of inputs to models such as transformers, which discard the positional/order information of the input embeddings. The meaning of each integer is as follows.

Element Index	Title	Values	Note
0	Beginning of Round	`0`	Always starts with this feature
1	Discard of Tile (打牌)	`5` ~ `596`	`5 + seat * 148 + tile * 4 + a * 2 + b`, where `a = 0`: not moqi (手出し), `a = 1`: moqi (自摸切り), `b = 0`: w/o riichi declaration, `b = 1`: w/ riichi declaration
2	Chow (Chi, チー, 吃)	`597` ~ `956`	`597 + seat * 90 + chi`
3	Pon (peng, ポン, 碰)	`957` ~ `1436`	`957 + seat * 120 + relseat * 40 + peng`
4	Da Ming Gang (大明槓)	`1437` ~ `1880`	`1437 + seat * 111 + relseat * 37 + tile`
5	An Gang (暗槓)	`1881` ~ `2016`	`1881 + seat * 34 + tile'`
6	Jia Gang (加槓)	`2017` ~ `2164`	`2017 + seat * 37 + tile`
7	<PADDING>	`2165`	(does not appear in annotation)
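The formulas in the table above translate directly into code. The following sketch uses hypothetical function names; the arguments follow the conventions for seat, relseat, tile, tile', chow, and pon defined earlier.

```python
def encode_discard(seat: int, tile: int, moqi: bool, riichi: bool) -> int:
    # a = 1 for moqi (自摸切り), b = 1 for a riichi declaration.
    return 5 + seat * 148 + tile * 4 + int(moqi) * 2 + int(riichi)


def encode_chow(seat: int, chi: int) -> int:
    return 597 + seat * 90 + chi


def encode_pon(seat: int, relseat: int, peng: int) -> int:
    return 957 + seat * 120 + relseat * 40 + peng


def encode_da_ming_gang(seat: int, relseat: int, tile: int) -> int:
    return 1437 + seat * 111 + relseat * 37 + tile


def encode_an_gang(seat: int, tile_prime: int) -> int:
    return 1881 + seat * 34 + tile_prime


def encode_jia_gang(seat: int, tile: int) -> int:
    return 2017 + seat * 37 + tile
```

Plugging the largest arguments into each function reproduces the upper bound of the corresponding range in the table, which is a quick way to check the offsets.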

4th Column: Candidate Features

The 4th column consists of all the possible actions at that decision-making point. They are called candidate features (or simply candidates).

Element Index	Type of Actions	Value	Note
0	Discarding tile	`0` ~ `147`	`tile * 4 + a * 2 + b`, where `a = 0`: not moqi (手出し), `a = 1`: moqi (自摸切り), `b = 0`: w/o riichi declaration, `b = 1`: w/ riichi declaration
1	An Gang (暗槓)	`148` ~ `181`	`148 + tile'`
2	Jia Gang (加槓)	`182` ~ `218`	Represented by the tile newly added to an existing peng. `182 + tile`
3	Zimo Hu (自摸和)	`219`
4	Jiu Zhong Jiu Pai (九種九牌)	`220`
5	Skip	`221`
6	Chow (chi, チー, 吃)	`222` ~ `311`	`222 + chi`
7	Pon (peng, ポン, 碰)	`312` ~ `431`	`312 + relseat * 40 + peng`
8	Da Ming Gang (大明槓)	`432` ~ `542`	Represented by the discarded tile. `432 + relseat * 37 + tile`
9	Rong (栄和)	`543` ~ `545`	`543`: from xia jia (下家から) `544`: from dui mian (対面から) `545`: from shang jia (上家から)
10	<VALUE>	`546`	(does not appear in annotation)
11	<PADDING>	`547`	(does not appear in annotation)
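A candidate value can be mapped back to its action type by checking which range of the table it falls into, as in the sketch below. The names `CANDIDATE_RANGES`, `classify_candidate`, and `decode_discard` are hypothetical.

```python
# (first value, last value, action type), taken from the table above.
CANDIDATE_RANGES = [
    (0, 147, "discard"),
    (148, 181, "an_gang"),
    (182, 218, "jia_gang"),
    (219, 219, "zimo_hu"),
    (220, 220, "jiu_zhong_jiu_pai"),
    (221, 221, "skip"),
    (222, 311, "chow"),
    (312, 431, "pon"),
    (432, 542, "da_ming_gang"),
    (543, 545, "rong"),
]


def classify_candidate(value: int) -> tuple[str, int]:
    """Return the action type and the offset within its range."""
    for first, last, name in CANDIDATE_RANGES:
        if first <= value <= last:
            return name, value - first
    raise ValueError(f"not a candidate that appears in annotation: {value}")


def decode_discard(offset: int) -> tuple[int, bool, bool]:
    """Invert `tile * 4 + a * 2 + b` into (tile, moqi, riichi)."""
    tile, rest = divmod(offset, 4)
    moqi, riichi = divmod(rest, 2)
    return tile, bool(moqi), bool(riichi)
```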

5th Column: Actual Action

The 5th column indicates the actual action chosen by the player (indicated by Seat) at that decision-making point. This column is the index to one of the possible actions enumerated in the 4th column.

6th Column: Round Summary

The 6th column indicates the summary of the round where the decision-making point appears. This column consists of a maximum of 7 elements. This column consists of multiple elements only in the case of double or triple deal-ins (ダブロン, トリプルロン), or the end of a round due to an exhaustive draw (荒牌平局).

Element Index	Value	Explanation
0	`0`	Win of the player at the seat `0` by drawing a tile (席`0`の自摸和)
1	`1`	Win of the player at the seat `1` by drawing a tile (席`1`の自摸和)
2	`2`	Win of the player at the seat `2` by drawing a tile (席`2`の自摸和)
3	`3`	Win of the player at the seat `3` by drawing a tile (席`3`の自摸和)
4	`4`	Win of the player at the seat `0` by a deal-in from the player at the seat `1` (席`1`から席`0`への放銃)
5	`5`	Win of the player at the seat `0` by a deal-in from the player at the seat `2` (席`2`から席`0`への放銃)
6	`6`	Win of the player at the seat `0` by a deal-in from the player at the seat `3` (席`3`から席`0`への放銃)
7	`7`	Win of the player at the seat `1` by a deal-in from the player at the seat `0` (席`0`から席`1`への放銃)
8	`8`	Win of the player at the seat `1` by a deal-in from the player at the seat `2` (席`2`から席`1`への放銃)
9	`9`	Win of the player at the seat `1` by a deal-in from the player at the seat `3` (席`3`から席`1`への放銃)
10	`10`	Win of the player at the seat `2` by a deal-in from the player at the seat `0` (席`0`から席`2`への放銃)
11	`11`	Win of the player at the seat `2` by a deal-in from the player at the seat `1` (席`1`から席`2`への放銃)
12	`12`	Win of the player at the seat `2` by a deal-in from the player at the seat `3` (席`3`から席`2`への放銃)
13	`13`	Win of the player at the seat `3` by a deal-in from the player at the seat `0` (席`0`から席`3`への放銃)
14	`14`	Win of the player at the seat `3` by a deal-in from the player at the seat `1` (席`1`から席`3`への放銃)
15	`15`	Win of the player at the seat `3` by a deal-in from the player at the seat `2` (席`2`から席`3`への放銃)
16	`16`	No left tile without any ready hand of the player at the seat `0` (席`0`の不聴を伴う荒牌平局)
17	`17`	No left tile with a ready hand of the player at the seat `0` (席`0`の聴牌を伴う荒牌平局)
18	`18`	No left tile with Liuju Manguan (流し満貫) by the player at the seat `0`
19	`19`	No left tile without any ready hand of the player at the seat `1` (席`1`の不聴を伴う荒牌平局)
20	`20`	No left tile with a ready hand of the player at the seat `1` (席`1`の聴牌を伴う荒牌平局)
21	`21`	No left tile with Liuju Manguan (流し満貫) by the player at the seat `1`
22	`22`	No left tile without any ready hand of the player at the seat `2` (席`2`の不聴を伴う荒牌平局)
23	`23`	No left tile with a ready hand of the player at the seat `2` (席`2`の聴牌を伴う荒牌平局)
24	`24`	No left tile with Liuju Manguan (流し満貫) by the player at the seat `2`
25	`25`	No left tile without any ready hand of the player at the seat `3` (席`3`の不聴を伴う荒牌平局)
26	`26`	No left tile with a ready hand of the player at the seat `3` (席`3`の聴牌を伴う荒牌平局)
27	`27`	No left tile with Liuju Manguan (流し満貫) by the player at the seat `3`
28	`28`	Interruption of the game
29	`29`	<PADDING> (does not appear in annotation)
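The round summary values follow a regular pattern, which the following sketch decodes. The function name `decode_round_summary` and the returned labels are hypothetical and chosen only for illustration.

```python
def decode_round_summary(value: int) -> tuple:
    """Decode one round summary element into a labeled tuple."""
    if 0 <= value <= 3:
        return ("zimo", value)  # (label, winner seat)
    if 4 <= value <= 15:
        # Three values per winner, one for each other seat in ascending order.
        winner, k = divmod(value - 4, 3)
        loser = [s for s in range(4) if s != winner][k]
        return ("rong", winner, loser)  # (label, winner seat, deal-in seat)
    if 16 <= value <= 27:
        # Three values per seat at an exhaustive draw.
        seat, kind = divmod(value - 16, 3)
        return (("not_ready", "ready", "liuju_manguan")[kind], seat)
    if value == 28:
        return ("interruption",)
    raise ValueError(f"not a value that appears in annotation: {value}")
```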

7th Column: Results

The 7th column represents the result of the round where the decision-making point appears and the result of the game. This column consists of exactly 12 elements.

Element Index	Explanation
0	Round delta of the score of the player at the seat `0`
1	Round delta of the score of the player at the seat `1`
2	Round delta of the score of the player at the seat `2`
3	Round delta of the score of the player at the seat `3`
4	End-of-round score of the player at the seat `0`
5	End-of-round score of the player at the seat `1`
6	End-of-round score of the player at the seat `2`
7	End-of-round score of the player at the seat `3`
8	End-of-game score of the player at the seat `0`
9	End-of-game score of the player at the seat `1`
10	End-of-game score of the player at the seat `2`
11	End-of-game score of the player at the seat `3`

Training Data Format for Offline Reinforcement Learning (Offline RL)

Roughly speaking, the training data format for offline reinforcement learning consists of a set of triplets (s, a, s') or (s, a, o), which represent state transitions from a decision-making point to either the next consecutive decision-making point or the "terminal state" of the game.

In the former, (s, a, s'), s and s' represent the situations at two consecutive decision-making points as seen from one player's perspective. It follows that s is never the last decision-making point of the game for that player. a represents the action taken by the player at s. In other words, (s, a, s') represents a state transition from s to s', from the perspective of one player.

In the latter, (s, a, o), s represents the situation at the last decision-making point from the perspective of a player in each game. Note that (s, a, o) represents the last decision-making point "from the perspective of a player", so there exist four (s, a, o) in each game of a 4-player mahjong. a represents the action taken by the player at s. In other words, (s, a) represents a state transition from s to the "terminal state" of each game, where a is the last action taken by the player in that game. o is the result of the game.

Let me describe this format in more detail. The annotation of a state transition from a decision-making point to the next consecutive decision-making point or the terminal state of the game is represented by one text line. Each line is tab-separated into either 9 or 7 fields, and each field is in turn comma-separated into elements. Lines with 9 tab-separated fields are annotations of state transitions from a decision-making point to the next consecutive decision-making point. Lines with 7 tab-separated fields are annotations of state transitions from a decision-making point to the terminal state of the game.

Each line with 9 tab-separated fields is as follows:

(FIRST SPARSE FEATURES)\t(FIRST NUMERIC FEATURES)\t(FIRST PROGRESSION FEATURES)\t(FIRST OPTION FEATURES)\t(ACTION INDEX)\t(SECOND SPARSE FEATURES)\t(SECOND NUMERIC FEATURES)\t(SECOND PROGRESSION FEATURES)\t(SECOND OPTION FEATURES)

In each line with 9 tab-separated fields, the first 4 fields (FIRST SPARSE FEATURES, FIRST NUMERIC FEATURES, FIRST PROGRESSION FEATURES, and FIRST OPTION FEATURES) represent the situation of the decision-making point before the transition, the next field (ACTION INDEX) represents the action taken by the player causing the transition, and the final 4 fields (SECOND SPARSE FEATURES, SECOND NUMERIC FEATURES, SECOND PROGRESSION FEATURES, and SECOND OPTION FEATURES) represent the situation of the decision-making point after the transition.

Each line with 7 tab-separated fields is as follows:

(SPARSE FEATURES)\t(NUMERIC FEATURES)\t(PROGRESSION FEATURES)\t(OPTION FEATURES)\t(ACTION INDEX)\t(GAME RANK)\t(GAME SCORE)

In each line with 7 tab-separated fields, the first 4 fields (SPARSE FEATURES, NUMERIC FEATURES, PROGRESSION FEATURES, and OPTION FEATURES) represent the situation of the decision-making point before the transition, the next field (ACTION INDEX) represents the action taken by the player causing the transition to the terminal state, and the final 2 fields represent the result of the game, i.e., the final rank and score at the game end.
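The two kinds of lines can be told apart by their field count, as in the following sketch. The function name `parse_rl_line` and the dictionary keys are hypothetical. The feature fields and the action index are assumed to be decimal integers, while GAME RANK and GAME SCORE are kept as raw strings because their exact encoding is not specified here.

```python
def _ints(field: str) -> list[int]:
    # Assumption: elements are comma-separated decimal integers.
    return [int(x) for x in field.split(",")] if field else []


def parse_rl_line(line: str) -> dict:
    """Parse one offline RL annotation line (9 or 7 tab-separated fields)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (7, 9):
        raise ValueError(f"expected 7 or 9 fields, got {len(fields)}")
    record = {
        "state": [_ints(f) for f in fields[0:4]],
        "action": int(fields[4]),
        "terminal": len(fields) == 7,
    }
    if record["terminal"]:
        # Transition to the terminal state: (s, a, o).
        record["game_rank"] = fields[5]
        record["game_score"] = fields[6]
    else:
        # Transition to the next decision-making point: (s, a, s').
        record["next_state"] = [_ints(f) for f in fields[5:9]]
    return record
```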

Training Data Format for Round

0th Column: Game UUID

1st Column: Sparse Features

The 1st column consists of sparse features. All the elements in this column are non-negative integers. These integers are used as indices for embeddings, which are finally used as a part of inputs to models. The meaning of each integer is as follows.

Element Index	Title	Value	Note
0	Room	`0`: Bronze Room (銅の間) `1`: Silver Room (銀の間) `2`: Gold Room (金の間) `3`: Jade Room (玉の間) `4`: Throne Room (王座の間)
1	Game Style	`5`: quarter-length game (dong feng zhan, 東風戦) `6`: half-length game (ban zhuang zhan, 半荘戦)
2	Grade of the player at the seat `0`	`7` ~ `22`	`7 + grade`
3	Grade of the player at the seat `1`	`23` ~ `38`	`23 + grade`
4	Grade of the player at the seat `2`	`39` ~ `54`	`39 + grade`
5	Grade of the player at the seat `3`	`55` ~ `70`	`55 + grade`
6	Game Wind (Chang, 場)	`71`: East (東場) `72`: South (南場) `73`: West (西場)
7	Round (Ju, 局)	`74` ~ `77`	`74 + round`

2nd Column: Numeric Features

The 2nd column consists of numeric features. This column consists of exactly 6 elements. All the values in this column are those at the very beginning of the round. These features are numerically meaningful and directly used as a part of inputs to models. The meaning of each element is as follows.

Element Index	Explanation
0	The beginning-of-round number of counter sticks (ben chang, 本場)
1	The number of riichi deposits (供託本数)
2	The beginning-of-round score of the player at the seat `0`
3	The beginning-of-round score of the player at the seat `1`
4	The beginning-of-round score of the player at the seat `2`
5	The beginning-of-round score of the player at the seat `3`

3rd Column: Result

Element Index	Explanation
0	The round score delta of the player at the seat `0`
1	The round score delta of the player at the seat `1`
2	The round score delta of the player at the seat `2`
3	The round score delta of the player at the seat `3`
4	The end-of-game score of the player at the seat `0`
5	The end-of-game score of the player at the seat `1`
6	The end-of-game score of the player at the seat `2`
7	The end-of-game score of the player at the seat `3`

Notes on Training Data

All the learning programs in this project assume that the training data may be very large. This includes the possibility that the training data will not fit in main memory (not GPU memory) or even on disk. Therefore, the learning programs do not load the whole training data into memory at startup, but read it sequentially from the beginning as needed. This way, the learning programs consume very little main memory, no matter how large the training data is. The learning programs also support the case where training data is compressed using gzip or bzip2. If the file name of training data ends with ".gz" or ".bz2", the learning programs automatically decompress the training data as they read it.
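Sequential reading with decompression chosen by file extension can be sketched with the Python standard library as follows. This is only an illustration of the idea, not the project's actual loader; the function name `iter_lines` is hypothetical.

```python
import bz2
import gzip
from pathlib import Path
from typing import Iterator


def iter_lines(path: str) -> Iterator[str]:
    """Yield lines one at a time, decompressing on the fly if needed."""
    suffix = Path(path).suffix
    if suffix == ".gz":
        opener = gzip.open
    elif suffix == ".bz2":
        opener = bz2.open
    else:
        opener = open
    # Text mode streams the file, so memory use stays small
    # regardless of the file size.
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")
```

Because the function is a generator, a caller can consume it with a plain `for` loop and never holds more than one line at a time.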

On the other hand, there is a downside to always accessing training data sequentially from the beginning, i.e., users need to shuffle training data before inputting them to a learning program. In particular, it is strongly discouraged to input annotated data created by annotate into learning programs without shuffling. This is because, in annotated data created using annotate, the annotations for each round are clustered together in a certain part of training data, and it is quite likely for very similar training samples to appear in a certain mini-batch of training. In general, training samples in machine learning are assumed to be independent and identically distributed (i.i.d.), and it is best to avoid such a bias in training samples.
