There are three formats of training data for the learning programs of this project: one for supervised learning (SL), one for offline reinforcement learning (offline RL), and one for predicting something from the situation of a given round. Data in the first format can be obtained by converting Mahjong Soul game records using cryolite/kanachan.annotate. Data in the second format can be obtained by converting annotated data in the first format using bin/annotate4rl/annotate4rl.py. Data in the last format can be obtained by converting annotated data in the first format using (TODO).
Before describing the annotation formats in detail, the following explains the conventions they use.
Each of the four players in a 4-player mahjong game is identified by a "seat". The 0th seat is the dealer (zhuang jia, 荘家) at the start of the game (qi jia, 起家); the 1st seat is the player to the right of the 0th seat (xia jia of qi jia, 起家の下家); the 2nd seat is the player across from the 0th seat (dui mian of qi jia, 起家の対面); and the 3rd seat is the player to the left of the 0th seat (shang jia of qi jia, 起家の上家).

| Seat | Meaning |
|---|---|
| 0 | the dealer at the start of the game |
| 1 | the player to the right of the 0th seat |
| 2 | the player across from the 0th seat |
| 3 | the player to the left of the 0th seat |
In some cases, the relative positions of two players need to be represented. For example, complete information about a pon (peng, 碰, ポン) includes who melds the pon and who discarded the melded tile. In such cases, one player is represented by a seat index, and the other is represented by the position relative to the former (relseat).

| Relseat | Meaning |
|---|---|
| 0 | the player to the right of the player of interest |
| 1 | the player across from the player of interest |
| 2 | the player to the left of the player of interest |
The type of a tile is represented by an integer from 0 to 36, inclusive, where 0m, 0p, and 0s denote the red fives.

| Tile | Value |
|---|---|
| 0m ~ 9m | 0 ~ 9 |
| 0p ~ 9p | 10 ~ 19 |
| 0s ~ 9s | 20 ~ 29 |
| 1z ~ 7z | 30 ~ 36 |
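The tile codes above follow Mahjong Soul's string notation (`5m`, `0p`, `7z`, ...). The following is a minimal Python sketch of the conversion in both directions, assuming tiles are given as two-character strings; `tile_to_int` and `int_to_tile` are hypothetical helper names, not part of the project.

```python
# Offset of each numbered suit in the 0-36 tile encoding.
SUIT_BASE = {"m": 0, "p": 10, "s": 20}


def tile_to_int(tile: str) -> int:
    """Convert a tile such as "5m", "0p" (red 5p) or "7z" to its 0-36 code."""
    number, suit = int(tile[0]), tile[1]
    if suit == "z":
        if not 1 <= number <= 7:
            raise ValueError(f"invalid honor tile: {tile}")
        return 30 + (number - 1)  # 1z (East) -> 30, ..., 7z (Red Dragon) -> 36
    return SUIT_BASE[suit] + number  # 0 stands for the red five of the suit


def int_to_tile(code: int) -> str:
    """Inverse of tile_to_int."""
    if 30 <= code <= 36:
        return f"{code - 29}z"
    if 0 <= code < 30:
        suit, number = divmod(code, 10)
        return f"{number}{'mps'[suit]}"
    raise ValueError(f"tile code out of range: {code}")
```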
When indicating the type of a closed kong (an gang, 暗槓), there is no need to distinguish between black and red tiles of the same kind. In such cases, the 34 tile types excluding red ones are represented by integers from 0 to 33, inclusive.

| Tile | Value |
|---|---|
| 1m ~ 9m | 0 ~ 8 |
| 1p ~ 9p | 9 ~ 17 |
| 1s ~ 9s | 18 ~ 26 |
| 1z ~ 7z | 27 ~ 33 |
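The two encodings are related by a simple mapping: a red five becomes an ordinary five, and every other tile keeps its position within its suit. A minimal sketch, assuming the 0-36 and 0-33 tables above; `to_tile34` is a hypothetical name.

```python
def to_tile34(tile: int) -> int:
    """Map a 0-36 tile code to the 0-33 code that ignores red fives."""
    if not 0 <= tile <= 36:
        raise ValueError(f"tile code out of range: {tile}")
    if tile >= 30:          # honors: 30-36 -> 27-33
        return tile - 3
    suit, number = divmod(tile, 10)
    if number == 0:         # a red five is treated as an ordinary five
        number = 5
    return suit * 9 + (number - 1)
```

For example, both `0m` (code 0) and `5m` (code 5) map to 4.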
The grade (段位) is represented by an integer from 0 to 15, inclusive.

| Grade | Value |
|---|---|
| Novice (初心) 1~3 | 0 ~ 2 |
| Adept (雀士) 1~3 | 3 ~ 5 |
| Expert (雀傑) 1~3 | 6 ~ 8 |
| Master (雀豪) 1~3 | 9 ~ 11 |
| Saint (雀聖) 1~3 | 12 ~ 14 |
| Celestial (魂天) | 15 |
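Each title other than Celestial spans three consecutive values, one per sub-level. A minimal sketch of that arithmetic, using the English title names from the table; `grade_to_int` and `TITLES` are hypothetical names.

```python
# Title order follows the grade table above; "Celestial" has no sub-level.
TITLES = ["Novice", "Adept", "Expert", "Master", "Saint"]


def grade_to_int(title: str, level: int = 1) -> int:
    """Return the 0-15 grade code for a title and its sub-level (1-3)."""
    if title == "Celestial":
        return 15
    if level not in (1, 2, 3):
        raise ValueError(f"level must be 1, 2 or 3, got {level}")
    # TITLES.index raises ValueError for an unknown title.
    return TITLES.index(title) * 3 + (level - 1)
```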
Chows are represented by integers from 0 to 89, inclusive.
| Value | Chow (the last element is the discarded tile) |
|---|---|
| 0 | (2m, 3m, 1m) |
| 1 | (1m, 3m, 2m) |
| 2 | (3m, 4m, 2m) |
| 3 | (1m, 2m, 3m) |
| 4 | (2m, 4m, 3m) |
| 5 | (4m, 5m, 3m) |
| 6 | (4m, 0m, 3m) |
| 7 | (2m, 3m, 4m) |
| 8 | (3m, 5m, 4m) |
| 9 | (3m, 0m, 4m) |
| 10 | (5m, 6m, 4m) |
| 11 | (0m, 6m, 4m) |
| 12 | (3m, 4m, 5m) |
| 13 | (3m, 4m, 0m) |
| 14 | (4m, 6m, 5m) |
| 15 | (4m, 6m, 0m) |
| 16 | (6m, 7m, 5m) |
| 17 | (6m, 7m, 0m) |
| 18 | (4m, 5m, 6m) |
| 19 | (4m, 0m, 6m) |
| 20 | (5m, 7m, 6m) |
| 21 | (0m, 7m, 6m) |
| 22 | (7m, 8m, 6m) |
| 23 | (5m, 6m, 7m) |
| 24 | (0m, 6m, 7m) |
| 25 | (6m, 8m, 7m) |
| 26 | (8m, 9m, 7m) |
| 27 | (6m, 7m, 8m) |
| 28 | (7m, 9m, 8m) |
| 29 | (7m, 8m, 9m) |
| 30 ~ 59 | Likewise for Circle tiles (筒子) |
| 60 ~ 89 | Likewise for Bamboo tiles (索子) |
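The per-suit ordering in the chow table can be reproduced by a rule inferred from the table itself (not taken from the project's source): iterate over the discarded number from 1 to 9, then over the sequences containing it from lowest to highest, and whenever a five is involved, list the variant using the red five (0) right after the all-black one. The sketch below, with the hypothetical function `build_chow_table`, generates all 90 entries in that order so that `table[value]` gives the chow for a value.

```python
def build_chow_table() -> list[tuple[str, str, str]]:
    """Return 90 chows as (tile, tile, discarded tile), indexed by value."""
    table = []
    for suit in "mps":                      # manzu, pinzu, souzu
        for discarded in range(1, 10):
            # Every sequence start..start+2 (within 1-9) that contains the discard.
            for start in range(discarded - 2, discarded + 1):
                if start < 1 or start + 2 > 9:
                    continue
                others = [n for n in (start, start + 1, start + 2) if n != discarded]
                variants = [(others, discarded)]
                if 5 in others:             # the hand may hold the red five instead
                    variants.append(([0 if n == 5 else n for n in others], discarded))
                elif discarded == 5:        # the discarded five may be the red one
                    variants.append((others, 0))
                for meld, tile in variants:
                    table.append((f"{meld[0]}{suit}", f"{meld[1]}{suit}", f"{tile}{suit}"))
    return table
```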
Pons are represented by integers from 0 to 39, inclusive.
| Value | Pon (the last element is the discarded tile) |
|---|---|
| 0 | (1m, 1m, 1m) |
| 1 | (2m, 2m, 2m) |
| 2 | (3m, 3m, 3m) |
| 3 | (4m, 4m, 4m) |
| 4 | (5m, 5m, 5m) |
| 5 | (0m, 5m, 5m) |
| 6 | (5m, 5m, 0m) |
| 7 | (6m, 6m, 6m) |
| 8 | (7m, 7m, 7m) |
| 9 | (8m, 8m, 8m) |
| 10 | (9m, 9m, 9m) |
| 11 | (1p, 1p, 1p) |
| 12 | (2p, 2p, 2p) |
| 13 | (3p, 3p, 3p) |
| 14 | (4p, 4p, 4p) |
| 15 | (5p, 5p, 5p) |
| 16 | (0p, 5p, 5p) |
| 17 | (5p, 5p, 0p) |
| 18 | (6p, 6p, 6p) |
| 19 | (7p, 7p, 7p) |
| 20 | (8p, 8p, 8p) |
| 21 | (9p, 9p, 9p) |
| 22 | (1s, 1s, 1s) |
| 23 | (2s, 2s, 2s) |
| 24 | (3s, 3s, 3s) |
| 25 | (4s, 4s, 4s) |
| 26 | (5s, 5s, 5s) |
| 27 | (0s, 5s, 5s) |
| 28 | (5s, 5s, 0s) |
| 29 | (6s, 6s, 6s) |
| 30 | (7s, 7s, 7s) |
| 31 | (8s, 8s, 8s) |
| 32 | (9s, 9s, 9s) |
| 33 | (1z, 1z, 1z) |
| 34 | (2z, 2z, 2z) |
| 35 | (3z, 3z, 3z) |
| 36 | (4z, 4z, 4z) |
| 37 | (5z, 5z, 5z) |
| 38 | (6z, 6z, 6z) |
| 39 | (7z, 7z, 7z) |
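The pon table follows the same pattern in every numbered suit: nine all-black pons, with two extra entries right after the black 5 pon for the cases where the red five is in the hand or is the discarded tile. A minimal sketch generating the table in that order; `build_pon_table` is a hypothetical name.

```python
def build_pon_table() -> list[tuple[str, str, str]]:
    """Return 40 pons as (tile, tile, discarded tile), indexed by value."""
    table = []
    for suit in "mps":
        for number in range(1, 10):
            tile = f"{number}{suit}"
            table.append((tile, tile, tile))
            if number == 5:
                red = f"0{suit}"
                table.append((red, tile, tile))   # red five held in the hand
                table.append((tile, tile, red))   # red five is the discarded tile
    for number in range(1, 8):                    # honors have no red tiles
        tile = f"{number}z"
        table.append((tile, tile, tile))
    return table
```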
Roughly speaking, the training data format for supervised learning represents a set of triplets, each consisting of the situation at a decision-making point (see Annotate for the definition of a decision-making point), the action actually taken by the player at that point, and the results of the round and the game in which that point appears.
In this format, the annotation of a decision-making point is represented by one text line. Each line is split by tabs into 8 fields (columns), and each field is in turn split by commas into elements. The first column is for debugging purposes only, the next 4 columns represent the situation at the decision-making point, the next column represents the action actually taken by the player at that point, and the final two columns represent the round and game results.
The 0th column consists of the game UUID, which uniquely identifies the game in which the decision-making point appears. This column is for debugging purposes only and is not used for training at all.
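A minimal sketch of splitting one supervised-learning line into its 8 columns. `SLAnnotation` and `parse_sl_line` are hypothetical names; as a precaution, any element that is not a plain integer is parsed as a float.

```python
from typing import NamedTuple


class SLAnnotation(NamedTuple):
    game_uuid: str            # column 0, debugging only
    sparse: list              # column 1
    numeric: list             # column 2
    progression: list         # column 3
    candidates: list          # column 4
    action: int               # column 5, index into `candidates`
    round_summary: list       # column 6
    results: list             # column 7


def _parse_numbers(field: str) -> list:
    """Split a comma-separated field; integers stay int, anything else becomes float."""
    if not field:
        return []
    return [int(x) if x.lstrip("-").isdigit() else float(x) for x in field.split(",")]


def parse_sl_line(line: str) -> SLAnnotation:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-separated fields, got {len(fields)}")
    return SLAnnotation(
        game_uuid=fields[0],
        sparse=_parse_numbers(fields[1]),
        numeric=_parse_numbers(fields[2]),
        progression=_parse_numbers(fields[3]),
        candidates=_parse_numbers(fields[4]),
        action=int(fields[5]),
        round_summary=_parse_numbers(fields[6]),
        results=_parse_numbers(fields[7]),
    )
```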
The 1st column consists of sparse features. Every element in this column is a non-negative integer. These integers are used as indices for embeddings, which are then used as part of the inputs to models. The meaning of each integer is as follows.
| Element Index | Title | Value | Note |
|---|---|---|---|
| 0 | Room | 0: Bronze Room (銅の間)<br>1: Silver Room (銀の間)<br>2: Gold Room (金の間)<br>3: Jade Room (玉の間)<br>4: Throne Room (王座の間) | |
| 1 | Game Style | 5: quarter-length game (dong feng zhan, 東風戦)<br>6: half-length game (ban zhuang zhan, 半荘戦) | |
| 2 | Grade of the player at seat 0 | 7 ~ 22 | 7 + grade |
| 3 | Grade of the player at seat 1 | 23 ~ 38 | 23 + grade |
| 4 | Grade of the player at seat 2 | 39 ~ 54 | 39 + grade |
| 5 | Grade of the player at seat 3 | 55 ~ 70 | 55 + grade |
| 6 | Seat | 71 ~ 74 | 71 + seat |
| 7 | Game Wind (Chang, 場) | 75: East (東場)<br>76: South (南場)<br>77: West (西場) | |
| 8 | Round (Ju, 局) | 78 ~ 81 | 78 + round |
| 9 | # of Left Tiles to Draw | 82 ~ 151 | 82 + (# of left tiles) |
| 10 | Dora Indicator | 152 ~ 188 | 152 + tile |
| 11 | 2nd Dora Indicator | 189 ~ 225 | optional, 189 + tile |
| 12 | 3rd Dora Indicator | 226 ~ 262 | optional, 226 + tile |
| 13 | 4th Dora Indicator | 263 ~ 299 | optional, 263 + tile |
| 14 | 5th Dora Indicator | 300 ~ 336 | optional, 300 + tile |
| 15 | Hand (shou pai, 手牌) | 337 ~ 472 | (combination, see below) |
| 16 | Drawn Tile (zimo pai, 自摸牌) | 473 ~ 509 | optional, 473 + tile |
| 17 | <PADDING> | 510 | (does not appear in annotation) |
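The offsets in the table can be applied mechanically. The following minimal sketch builds the sparse features of one decision-making point, assuming the elements are emitted in element-index order, that wind and round are 0-based as the table implies, and that the hand has already been encoded into values in 337 ~ 472; `encode_sparse` and `DORA_OFFSETS` are hypothetical names.

```python
from typing import Optional

# Offsets of the 1st to 5th dora indicator features (37 values each).
DORA_OFFSETS = (152, 189, 226, 263, 300)


def encode_sparse(room: int, half_length: bool, grades: list, seat: int,
                  wind: int, round_: int, left_tiles: int,
                  dora_indicators: list, hand_features: list,
                  drawn_tile: Optional[int] = None) -> list:
    if not 0 <= room <= 4:
        raise ValueError("room must be 0-4")
    if len(grades) != 4:
        raise ValueError("expected one grade per seat")
    if not 1 <= len(dora_indicators) <= 5:
        raise ValueError("expected 1-5 dora indicators")
    features = [room, 6 if half_length else 5]
    features += [7 + 16 * s + g for s, g in enumerate(grades)]  # 7/23/39/55 + grade
    features.append(71 + seat)
    features.append(75 + wind)          # 0: East, 1: South, 2: West
    features.append(78 + round_)        # round 0-3 within the wind
    features.append(82 + left_tiles)
    features += [off + t for off, t in zip(DORA_OFFSETS, dora_indicators)]
    features += hand_features           # already-encoded values in 337-472
    if drawn_tile is not None:
        features.append(473 + drawn_tile)
    return features
```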
The following is how a tile in the hand is represented:
| Tile | Value |
|---|---|
| Red 5m | 337 |
| First 1m | 338 |
| Second 1m | 339 |
| Third 1m | 340 |
| Fourth 1m | 341 |
| First 2m | 342 |
| ... | ... |
| First black 5m | 354 |
| Second black 5m | 355 |
| Third black 5m | 356 |
| First 6m | 357 |
| ... | ... |
| Fourth 9m | 372 |
| Red 5p | 373 |
| First 1p | 374 |
| ... | ... |
| Red 5s | 409 |
| First 1s | 410 |
| ... | ... |
| Fourth 9s | 444 |
| First East | 445 |
| Second East | 446 |
| Third East | 447 |
| Fourth East | 448 |
| First South | 449 |
| ... | ... |
| First White Dragon (白) | 461 |
| ... | ... |
| Fourth Red Dragon (中) | 472 |
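Each numbered suit occupies 36 consecutive values (one red five, four copies each of 1 to 4 and 6 to 9, and three black fives), followed by 28 values for the honors. A minimal sketch that encodes a hand given as 0-36 tile codes; `encode_hand` is a hypothetical name, and emitting the values in ascending order is an assumption for illustration.

```python
from collections import Counter


def encode_hand(tiles: list) -> list:
    """Encode a hand of 0-36 tile codes into sparse feature values 337-472."""
    seen = Counter()
    features = []
    for tile in sorted(tiles):
        if tile >= 30:                      # honors 1z-7z: four copies each
            first, copies = 445 + (tile - 30) * 4, 4
        else:
            suit, number = divmod(tile, 10)
            base = 337 + 36 * suit          # value of this suit's red five
            if number == 0:                 # red five: a single copy
                first, copies = base, 1
            elif number <= 5:               # 1-4: four copies, black 5: three
                first = base + 1 + (number - 1) * 4
                copies = 3 if number == 5 else 4
            else:                           # 6-9 come after the three black 5s
                first, copies = base + (number - 1) * 4, 4
        if seen[tile] >= copies:
            raise ValueError(f"too many copies of tile {tile}")
        features.append(first + seen[tile])  # n-th copy -> first + n
        seen[tile] += 1
    return features
```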
The 2nd column consists of numeric features. This column consists of exactly 6 elements. These features are numerically meaningful and directly used as a part of inputs to models. The meaning of each element is as follows.
| Element Index | Explanation |
|---|---|
| 0 | The number of counter sticks (ben chang, 本場) |
| 1 | The number of riichi deposits (供託本数) |
| 2 | The score of the player at seat 0 |
| 3 | The score of the player at seat 1 |
| 4 | The score of the player at seat 2 |
| 5 | The score of the player at seat 3 |
The 3rd column consists of progression features. This column represents a sequence of non-negative integers, each of which stands for an event in the round. The order of the integers in the sequence is the order in which the events occurred up to the decision-making point. These integers are used as indices for embeddings, which are then used as part of the inputs to models. Note, however, that positional encoding must be applied to these embeddings if they are fed to models such as transformers, which do not by themselves take the order of the input embeddings into account. The meaning of each integer is as follows.
| Element Index | Title | Values | Note |
|---|---|---|---|
| 0 | Beginning of Round | 0 | Always starts with this feature |
| 1 | Discard of Tile (打牌) | 5 ~ 596 | 5 + seat * 148 + tile * 4 + a * 2 + b, where<br>a = 0: not moqi (手出し)<br>a = 1: moqi (自摸切り)<br>b = 0: without riichi declaration<br>b = 1: with riichi declaration |
| 2 | Chow (Chi, チー, 吃) | 597 ~ 956 | 597 + seat * 90 + chi |
| 3 | Pon (peng, ポン, 碰) | 957 ~ 1436 | 957 + seat * 120 + relseat * 40 + peng |
| 4 | Da Ming Gang (大明槓) | 1437 ~ 1880 | 1437 + seat * 111 + relseat * 37 + tile |
| 5 | An Gang (暗槓) | 1881 ~ 2016 | 1881 + seat * 34 + tile' (tile' is the 34-type code without red tiles) |
| 6 | Jia Gang (加槓) | 2017 ~ 2164 | 2017 + seat * 37 + tile |
| 7 | <PADDING> | 2165 | (does not appear in annotation) |
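The progression values are mixed-radix encodings, so each event can be built with the formulas in the table and recovered with `divmod`. A minimal sketch; the `*_event` and `decode_event` names and the returned tuple layout are hypothetical.

```python
def discard_event(seat: int, tile: int, moqi: bool, riichi: bool) -> int:
    return 5 + seat * 148 + tile * 4 + int(moqi) * 2 + int(riichi)

def chi_event(seat: int, chi: int) -> int:
    return 597 + seat * 90 + chi

def peng_event(seat: int, relseat: int, peng: int) -> int:
    return 957 + seat * 120 + relseat * 40 + peng

def daminggang_event(seat: int, relseat: int, tile: int) -> int:
    return 1437 + seat * 111 + relseat * 37 + tile

def angang_event(seat: int, tile34: int) -> int:
    return 1881 + seat * 34 + tile34

def jiagang_event(seat: int, tile: int) -> int:
    return 2017 + seat * 37 + tile


def decode_event(e: int) -> tuple:
    """Invert the encodings above into (kind, fields...)."""
    if e == 0:
        return ("begin",)
    if 5 <= e <= 596:
        seat, rest = divmod(e - 5, 148)
        tile, flags = divmod(rest, 4)       # flags = a * 2 + b
        return ("discard", seat, tile, bool(flags & 2), bool(flags & 1))
    if 597 <= e <= 956:
        return ("chi", *divmod(e - 597, 90))
    if 957 <= e <= 1436:
        seat, rest = divmod(e - 957, 120)
        return ("peng", seat, *divmod(rest, 40))
    if 1437 <= e <= 1880:
        seat, rest = divmod(e - 1437, 111)
        return ("daminggang", seat, *divmod(rest, 37))
    if 1881 <= e <= 2016:
        return ("angang", *divmod(e - 1881, 34))
    if 2017 <= e <= 2164:
        return ("jiagang", *divmod(e - 2017, 37))
    raise ValueError(f"unknown progression value: {e}")
```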
The 4th column consists of all the possible actions at that decision-making point. They are called candidate features (or simply candidates).
| Element Index | Type of Actions | Value | Note |
|---|---|---|---|
| 0 | Discarding tile | 0 ~ 147 | tile * 4 + a * 2 + b, where<br>a = 0: not moqi (手出し)<br>a = 1: moqi (自摸切り)<br>b = 0: without riichi declaration<br>b = 1: with riichi declaration |
| 1 | An Gang (暗槓) | 148 ~ 181 | 148 + tile' |
| 2 | Jia Gang (加槓) | 182 ~ 218 | 182 + tile, represented by the tile newly added to an existing peng |
| 3 | Zimo Hu (自摸和) | 219 | |
| 4 | Jiu Zhong Jiu Pai (九種九牌) | 220 | |
| 5 | Skip | 221 | |
| 6 | Chow (chi, チー, 吃) | 222 ~ 311 | 222 + chi |
| 7 | Pon (peng, ポン, 碰) | 312 ~ 431 | 312 + relseat * 40 + peng |
| 8 | Da Ming Gang (大明槓) | 432 ~ 542 | 432 + relseat * 37 + tile, represented by the discarded tile |
| 9 | Rong (栄和) | 543 ~ 545 | 543: from xia jia (下家から)<br>544: from dui mian (対面から)<br>545: from shang jia (上家から) |
| 10 | <VALUE> | 546 | (does not appear in annotation) |
| 11 | <PADDING> | 547 | (does not appear in annotation) |
The 5th column indicates the action actually chosen at that decision-making point by the player indicated by the Seat sparse feature. Its value is an index into the list of candidates in the 4th column.
The 6th column is a summary of the round in which the decision-making point appears. This column consists of at most 7 elements. It contains multiple elements only in the case of a double or triple ron (ダブロン, トリプルロン), i.e., two or three players winning on the same discard, or when the round ends in an exhaustive draw (荒牌平局).
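Since the 5th column is an index into the candidate list, the chosen action is `candidates[action]`. A minimal sketch that decodes a candidate value back into its components using the ranges in the candidate table; `decode_candidate` and its tuple layout are hypothetical.

```python
def decode_candidate(c: int) -> tuple:
    """Turn a candidate value (0-545) into (kind, fields...)."""
    if 0 <= c <= 147:
        tile, flags = divmod(c, 4)                  # flags = a * 2 + b
        return ("discard", tile, bool(flags & 2), bool(flags & 1))
    if 148 <= c <= 181:
        return ("angang", c - 148)                  # 34-type tile code
    if 182 <= c <= 218:
        return ("jiagang", c - 182)
    if c == 219:
        return ("zimo",)
    if c == 220:
        return ("jiuzhongjiupai",)
    if c == 221:
        return ("skip",)
    if 222 <= c <= 311:
        return ("chi", c - 222)
    if 312 <= c <= 431:
        return ("peng", *divmod(c - 312, 40))       # (relseat, peng)
    if 432 <= c <= 542:
        return ("daminggang", *divmod(c - 432, 37))  # (relseat, tile)
    if 543 <= c <= 545:
        return ("rong", c - 543)                    # relseat of the discarder
    raise ValueError(f"unknown candidate value: {c}")


# Usage: the chosen action of a hypothetical annotation line.
candidates, action = [8, 12, 221], 2
chosen = decode_candidate(candidates[action])       # ("skip",)
```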
| Element Index | Value | Explanation |
|---|---|---|
| 0 | 0 | Win by self-draw of the player at seat 0 (席0 の自摸和) |
| 1 | 1 | Win by self-draw of the player at seat 1 (席1 の自摸和) |
| 2 | 2 | Win by self-draw of the player at seat 2 (席2 の自摸和) |
| 3 | 3 | Win by self-draw of the player at seat 3 (席3 の自摸和) |
| 4 | 4 | Win of the player at seat 0 on a discard by the player at seat 1 (席1 から席0 への放銃) |
| 5 | 5 | Win of the player at seat 0 on a discard by the player at seat 2 (席2 から席0 への放銃) |
| 6 | 6 | Win of the player at seat 0 on a discard by the player at seat 3 (席3 から席0 への放銃) |
| 7 | 7 | Win of the player at seat 1 on a discard by the player at seat 0 (席0 から席1 への放銃) |
| 8 | 8 | Win of the player at seat 1 on a discard by the player at seat 2 (席2 から席1 への放銃) |
| 9 | 9 | Win of the player at seat 1 on a discard by the player at seat 3 (席3 から席1 への放銃) |
| 10 | 10 | Win of the player at seat 2 on a discard by the player at seat 0 (席0 から席2 への放銃) |
| 11 | 11 | Win of the player at seat 2 on a discard by the player at seat 1 (席1 から席2 への放銃) |
| 12 | 12 | Win of the player at seat 2 on a discard by the player at seat 3 (席3 から席2 への放銃) |
| 13 | 13 | Win of the player at seat 3 on a discard by the player at seat 0 (席0 から席3 への放銃) |
| 14 | 14 | Win of the player at seat 3 on a discard by the player at seat 1 (席1 から席3 への放銃) |
| 15 | 15 | Win of the player at seat 3 on a discard by the player at seat 2 (席2 から席3 への放銃) |
| 16 | 16 | Exhaustive draw, with the player at seat 0 not ready (席0 の不聴を伴う荒牌平局) |
| 17 | 17 | Exhaustive draw, with the player at seat 0 ready (席0 の聴牌を伴う荒牌平局) |
| 18 | 18 | Exhaustive draw, with Liuju Manguan (流し満貫) by the player at seat 0 |
| 19 | 19 | Exhaustive draw, with the player at seat 1 not ready (席1 の不聴を伴う荒牌平局) |
| 20 | 20 | Exhaustive draw, with the player at seat 1 ready (席1 の聴牌を伴う荒牌平局) |
| 21 | 21 | Exhaustive draw, with Liuju Manguan (流し満貫) by the player at seat 1 |
| 22 | 22 | Exhaustive draw, with the player at seat 2 not ready (席2 の不聴を伴う荒牌平局) |
| 23 | 23 | Exhaustive draw, with the player at seat 2 ready (席2 の聴牌を伴う荒牌平局) |
| 24 | 24 | Exhaustive draw, with Liuju Manguan (流し満貫) by the player at seat 2 |
| 25 | 25 | Exhaustive draw, with the player at seat 3 not ready (席3 の不聴を伴う荒牌平局) |
| 26 | 26 | Exhaustive draw, with the player at seat 3 ready (席3 の聴牌を伴う荒牌平局) |
| 27 | 27 | Exhaustive draw, with Liuju Manguan (流し満貫) by the player at seat 3 |
| 28 | 28 | Interruption of the game |
| 29 | 29 | <PADDING> (does not appear in annotation) |
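The round summary values follow a regular layout: 0 ~ 3 are self-draw wins by seat, 4 ~ 15 are wins on a discard grouped by winner (with the discarder taken from the other three seats in ascending order), and 16 ~ 27 are exhaustive-draw outcomes grouped by seat. A minimal decoding sketch, with the rule inferred from the table; `decode_round_summary` and its labels are hypothetical.

```python
def decode_round_summary(code: int) -> tuple:
    """Turn one element of the 6th column into (kind, seats...)."""
    if 0 <= code <= 3:
        return ("zimo", code)                         # winner's seat
    if 4 <= code <= 15:
        winner, k = divmod(code - 4, 3)
        discarder = [s for s in range(4) if s != winner][k]
        return ("rong", winner, discarder)
    if 16 <= code <= 27:
        seat, kind = divmod(code - 16, 3)
        return (("noten", "tenpai", "liuju_manguan")[kind], seat)
    if code == 28:
        return ("interrupted",)
    raise ValueError(f"unknown round summary value: {code}")
```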
The 7th column represents the result of the round where the decision-making point appears and the result of the game. This column consists of exactly 12 elements.
| Element Index | Explanation |
|---|---|
| 0 | Round delta of the score of the player at seat 0 |
| 1 | Round delta of the score of the player at seat 1 |
| 2 | Round delta of the score of the player at seat 2 |
| 3 | Round delta of the score of the player at seat 3 |
| 4 | End-of-round score of the player at seat 0 |
| 5 | End-of-round score of the player at seat 1 |
| 6 | End-of-round score of the player at seat 2 |
| 7 | End-of-round score of the player at seat 3 |
| 8 | End-of-game score of the player at seat 0 |
| 9 | End-of-game score of the player at seat 1 |
| 10 | End-of-game score of the player at seat 2 |
| 11 | End-of-game score of the player at seat 3 |
Roughly speaking, the training data format for offline reinforcement learning consists of a set of triplets (s, a, s') or (s, a, o), which represent state transitions from a decision-making point to either the next consecutive decision-making point or the "terminal state" of the game.
In the former, (s, a, s'), s and s' represent the situations at two consecutive decision-making points as seen from one player's perspective. Consequently, s is never the last decision-making point of the game for that player. a represents the action taken by the player at s. In other words, (s, a, s') represents a state transition from s to s' from the perspective of one player.
In the latter, (s, a, o), s represents the situation at the last decision-making point of the game from the perspective of a player. Because this is the last decision-making point "from the perspective of a player", each game of 4-player mahjong contains four (s, a, o) triplets. a represents the action taken by the player at s. In other words, (s, a) represents a state transition from s to the "terminal state" of the game, where a is the last action taken by the player in that game. o is the result of the game.
Let me describe this format in more detail. The annotation of a state transition from a decision-making point to the next consecutive decision-making point or the terminal state of the game is represented by one text line. Each line is tab-separated into either 9 or 7 fields, and each field is in turn comma-separated into elements. Lines with 9 tab-separated fields are annotations of state transitions from a decision-making point to the next consecutive decision-making point. Lines with 7 tab-separated fields are annotations of state transitions from a decision-making point to the terminal state of the game.
Each line with 9 tab-separated fields is as follows:
```
(FIRST SPARSE FEATURES)\t(FIRST NUMERIC FEATURES)\t(FIRST PROGRESSION FEATURES)\t(FIRST OPTION FEATURES)\t(ACTION INDEX)\t(SECOND SPARSE FEATURES)\t(SECOND NUMERIC FEATURES)\t(SECOND PROGRESSION FEATURES)\t(SECOND OPTION FEATURES)
```
In each line with 9 tab-separated fields, the first 4 fields (FIRST SPARSE FEATURES, FIRST NUMERIC FEATURES, FIRST PROGRESSION FEATURES, and FIRST OPTION FEATURES) represent the situation at the decision-making point before the transition, the next field (ACTION INDEX) represents the action taken by the player causing the transition, and the last 4 fields (SECOND SPARSE FEATURES, SECOND NUMERIC FEATURES, SECOND PROGRESSION FEATURES, and SECOND OPTION FEATURES) represent the situation at the decision-making point after the transition.
Each line with 7 tab-separated fields is as follows:
```
(SPARSE FEATURES)\t(NUMERIC FEATURES)\t(PROGRESSION FEATURES)\t(OPTION FEATURES)\t(ACTION INDEX)\t(GAME RANK)\t(GAME SCORE)
```
In each line with 7 tab-separated fields, the first 4 fields (SPARSE FEATURES, NUMERIC FEATURES, PROGRESSION FEATURES, and OPTION FEATURES) represent the situation of the decision-making point before the transition, the next field (ACTION INDEX) represents the action taken by the player causing the transition to the terminal state, and the final 2 fields represent the result of the game, i.e., the final rank and score at the game end.
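Because the two kinds of lines differ only in their field count, a reader can dispatch on that count. A minimal sketch, assuming a single number each in GAME RANK and GAME SCORE; `parse_rl_line` and the returned tuple layout are hypothetical.

```python
def _num(x: str):
    """Parse one element: plain integers stay int, anything else becomes float."""
    return int(x) if x.lstrip("-").isdigit() else float(x)


def _parse_numbers(field: str) -> list:
    return [_num(x) for x in field.split(",")] if field else []


def parse_rl_line(line: str) -> tuple:
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 9:
        # (s, a, s'): transition to the next decision-making point.
        state = tuple(_parse_numbers(f) for f in fields[0:4])
        next_state = tuple(_parse_numbers(f) for f in fields[5:9])
        return ("transition", state, int(fields[4]), next_state)
    if len(fields) == 7:
        # (s, a, o): transition to the terminal state; o = (rank, score).
        state = tuple(_parse_numbers(f) for f in fields[0:4])
        return ("terminal", state, int(fields[4]), int(fields[5]), _num(fields[6]))
    raise ValueError(f"expected 9 or 7 tab-separated fields, got {len(fields)}")
```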
The 1st column consists of sparse features. Every element in this column is a non-negative integer. These integers are used as indices for embeddings, which are then used as part of the inputs to models. The meaning of each integer is as follows.
| Element Index | Title | Value | Note |
|---|---|---|---|
| 0 | Room | 0: Bronze Room (銅の間)<br>1: Silver Room (銀の間)<br>2: Gold Room (金の間)<br>3: Jade Room (玉の間)<br>4: Throne Room (王座の間) | |
| 1 | Game Style | 5: quarter-length game (dong feng zhan, 東風戦)<br>6: half-length game (ban zhuang zhan, 半荘戦) | |
| 2 | Grade of the player at seat 0 | 7 ~ 22 | 7 + grade |
| 3 | Grade of the player at seat 1 | 23 ~ 38 | 23 + grade |
| 4 | Grade of the player at seat 2 | 39 ~ 54 | 39 + grade |
| 5 | Grade of the player at seat 3 | 55 ~ 70 | 55 + grade |
| 6 | Game Wind (Chang, 場) | 71: East (東場)<br>72: South (南場)<br>73: West (西場) | |
| 7 | Round (Ju, 局) | 74 ~ 77 | 74 + round |
The 2nd column consists of numeric features. This column consists of exactly 6 elements, all of which are taken at the very beginning of the round. These features are numerically meaningful and are used directly as part of the inputs to models. The meaning of each element is as follows.
| Element Index | Explanation |
|---|---|
| 0 | The beginning-of-round number of counter sticks (ben chang, 本場) |
| 1 | The number of riichi deposits (供託本数) |
| 2 | The beginning-of-round score of the player at seat 0 |
| 3 | The beginning-of-round score of the player at seat 1 |
| 4 | The beginning-of-round score of the player at seat 2 |
| 5 | The beginning-of-round score of the player at seat 3 |
| Element Index | Explanation |
|---|---|
| 0 | The round score delta of the player at seat 0 |
| 1 | The round score delta of the player at seat 1 |
| 2 | The round score delta of the player at seat 2 |
| 3 | The round score delta of the player at seat 3 |
| 4 | The end-of-game score of the player at seat 0 |
| 5 | The end-of-game score of the player at seat 1 |
| 6 | The end-of-game score of the player at seat 2 |
| 7 | The end-of-game score of the player at seat 3 |
All the learning programs in this project assume that the training data may be very large, possibly too large to fit in main memory (as opposed to GPU memory) or even on disk. Therefore, the learning programs do not load the whole training data into memory at startup; instead, they read it sequentially from the beginning as needed. This way, the learning programs consume very little main memory, no matter how large the training data is. The learning programs also support training data compressed with gzip or bzip2: if the file name of the training data ends with ".gz" or ".bz2", the learning programs automatically decompress it as they read it.
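The project's own reader is not reproduced here. The following is a minimal sketch of the same idea using only Python's standard `gzip` and `bz2` modules: the file is opened in text mode according to its extension and yielded one line at a time, so memory use does not grow with file size. `open_training_data` and `iterate_annotations` are hypothetical names.

```python
import bz2
import gzip
from typing import Iterator


def open_training_data(path: str):
    """Open an annotation file as text, decompressing by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iterate_annotations(path: str) -> Iterator[str]:
    """Yield annotation lines sequentially without loading the whole file."""
    with open_training_data(path) as f:
        for line in f:
            yield line.rstrip("\n")
```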
On the other hand, always reading training data sequentially from the beginning has a downside: users need to shuffle the training data before feeding it to a learning program. In particular, feeding annotated data created by annotate into the learning programs without shuffling is strongly discouraged. In annotated data created by annotate, the annotations for each round are clustered together in one part of the data, so very similar training samples are quite likely to end up in the same mini-batch. In general, training samples in machine learning are assumed to be independent and identically distributed (i.i.d.), and such a bias in mini-batches is best avoided.
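This page does not prescribe a shuffling tool. For data too large to shuffle in memory, one common approach is the two-pass "bucket" shuffle sketched below: each line is written to a randomly chosen temporary bucket file, then each bucket is shuffled in memory and appended to the output. This is a minimal sketch, assuming each bucket fits in memory (increase `num_buckets` otherwise); `shuffle_lines` and its parameters are hypothetical.

```python
import os
import random
import tempfile


def shuffle_lines(src_lines, dst_path: str, num_buckets: int = 16, seed: int = 0) -> None:
    """Two-pass shuffle: scatter lines into random buckets, then shuffle each bucket."""
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket{i}.txt") for i in range(num_buckets)]
        buckets = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            # Pass 1: send every line to a uniformly random bucket.
            for line in src_lines:
                buckets[rng.randrange(num_buckets)].write(line.rstrip("\n") + "\n")
        finally:
            for b in buckets:
                b.close()
        # Pass 2: shuffle each bucket in memory and concatenate the results.
        with open(dst_path, "w", encoding="utf-8") as out:
            for p in paths:
                with open(p, encoding="utf-8") as f:
                    lines = f.readlines()
                rng.shuffle(lines)
                out.writelines(lines)
```

Since only one bucket is held in memory at a time, peak memory is roughly the data size divided by `num_buckets`.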