diff --git a/database/rekordbox_pdb.ksy b/database/rekordbox_pdb.ksy new file mode 100644 index 000000000..ba9eca0a2 --- /dev/null +++ b/database/rekordbox_pdb.ksy @@ -0,0 +1,975 @@ +meta: + id: rekordbox_pdb + title: DeviceSQL database export (probably generated by rekordbox) + application: rekordbox + file-extension: + - pdb + license: EPL-1.0 + endian: le + +doc: | + This is a relational database format designed to be efficiently used + by very low power devices (there were deployments on 16 bit devices + with 32K of RAM). Today you are most likely to encounter it within + the Pioneer Professional DJ ecosystem, because it is the format that + their rekordbox software uses to write USB and SD media which can be + mounted in DJ controllers and used to play and mix music. + + It has been reverse-engineered to facilitate sophisticated + integrations with light and laser shows, videos, and other musical + instruments, by supporting deep knowledge of what is playing and + what is coming next through monitoring the network communications of + the players. + + The file is divided into fixed-size blocks. The first block has a + header that establishes the block size, and lists the tables + available in the database, identifying their types and the index of + the first of the series of linked pages that make up that table. + + Each table is made up of a series of rows which may be spread across + any number of pages. The pages start with a header describing the + page and linking to the next page. The rest of the page is used as a + heap: rows are scattered around it, and located using an index + structure that builds backwards from the end of the page. Each row + of a given type has a fixed size structure which links to any + variable-sized strings by their offsets within the page. + + As changes are made to the table, some records may become unused, + and there may be gaps within the heap that are too small to be used + by other data. 
There is a bit map in the row index that identifies + which rows are actually present. Rows that are not present must be + ignored: they do not contain valid (or even necessarily well-formed) + data. + + The majority of the work in reverse-engineering this format was + performed by @henrybetts and @flesniak, for which I am hugely + grateful. @GreyCat helped me learn the intricacies (and best + practices) of Kaitai far faster than I would have managed on my own. + +doc-ref: https://github.com/Deep-Symmetry/crate-digger/blob/master/doc/Analysis.pdf + +seq: + - type: u4 + doc: | + Unknown purpose, perhaps an unoriginal signature, seems to + always have the value 0. + - id: len_page + type: u4 + doc: | + The database page size, in bytes. Pages are referred to by + index, so this size is needed to calculate their offset, and + table pages have a row index structure which is built from the + end of the page backwards, so finding that also requires this + value. + - id: num_tables + type: u4 + doc: | + Determines the number of table entries that are present. Each + table is a linked list of pages containing rows of a particular + type. + - id: next_unused_page + type: u4 + doc: | + @flesniak said: "Not used as any `empty_candidate`, points + past the end of the file." + - type: u4 + - id: sequence + type: u4 + doc: | + @flesniak said: "Always incremented by at least one, + sometimes by two or three." + - contents: [0, 0, 0, 0] + - id: tables + type: table + repeat: expr + repeat-expr: num_tables + doc: | + Describes and links to the tables present in the database. + +types: + table: + doc: | + Each table is a linked list of pages containing rows of a single + type. This header describes the nature of the table and links to + its pages by index. + seq: + - id: type + type: u4 + enum: page_type + doc: | + Identifies the kind of rows that are found in this table. 
+ - id: empty_candidate + type: u4 + - id: first_page + type: page_ref + doc: | + Links to the chain of pages making up that table. The first + page seems to always contain similar garbage patterns and + zero rows, but the next page it links to contains the start + of the meaningful data rows. + - id: last_page + type: page_ref + doc: | + Holds the index of the last page that makes up this table. + When following the linked list of pages of the table, you + either need to stop when you reach this page, or when you + notice that the `next_page` link you followed took you to a + page of a different `type`. + -webide-representation: '{type}' + + page_ref: + doc: | + An index which points to a table page (its offset can be found + by multiplying the index by the `len_page` value in the file + header). This type allows the linked page to be lazy loaded. + seq: + - id: index + type: u4 + doc: | + Identifies the desired page number. + instances: + body: + doc: | + When referenced, loads the specified page and parses its + contents appropriately for the type of data it contains. + io: _root._io + pos: _root.len_page * index + size: _root.len_page + type: page + + page: + doc: | + A table page, consisting of a short header describing the + content of the page and linking to the next page, followed by a + heap in which row data is found. At the end of the page there is + an index which locates all rows present in the heap via their + offsets past the end of the page header. + seq: + - contents: [0, 0, 0, 0] + - id: page_index + doc: Matches the index we used to look up the page, sanity check? + type: u4 + - id: type + type: u4 + enum: page_type + doc: | + Identifies the type of information stored in the rows of this page. + - id: next_page + doc: | + Index of the next page containing this type of rows. Points past + the end of the file if there are no more. 
+ type: page_ref + - type: u4 + doc: | + @flesniak said: "sequence number (0->1: 8->13, 1->2: 22, 2->3: 27)" + - size: 4 + - id: num_rows_small + type: u1 + doc: | + Holds the value used for `num_rows` (see below) unless + `num_rows_large` is larger (but not equal to `0x1fff`). This + seems like some strange mechanism to deal with the fact that + lots of tiny entries, such as are found in the + `playlist_entries` table, are too big to count with a single + byte. But why not just always use `num_rows_large`, then? + - type: u1 + doc: | + @flesniak said: "a bitmask (1st track: 32)" + - type: u1 + doc: | + @flesniak said: "often 0, sometimes larger, esp. for pages + with high real_entry_count (e.g. 12 for 101 entries)" + - id: page_flags + type: u1 + doc: | + @flesniak said: "strange pages: 0x44, 0x64; otherwise seen: 0x24, 0x34" + - id: free_size + type: u2 + doc: | + Unused space (in bytes) in the page heap, excluding the row + index at end of page. + - id: used_size + type: u2 + doc: | + The number of bytes that are in use in the page heap. + - type: u2 + doc: | + @flesniak said: "(0->1: 2)" + - id: num_rows_large + type: u2 + doc: | + Holds the value used for `num_rows` (as described above) + when that is too large to fit into `num_rows_small`, and + that situation seems to be indicated when this value is + larger than `num_rows_small`, but not equal to `0x1fff`. + This seems like some strange mechanism to deal with the fact + that lots of tiny entries, such as are found in the + `playlist_entries` table, are too big to count with a single + byte. But why not just always use this value, then? + - type: u2 + doc: | + @flesniak said: "1004 for strange blocks, 0 otherwise" + - type: u2 + doc: | + @flesniak said: "always 0 except 1 for history pages, num + entries for strange pages?" 
+ - id: heap + size-eos: true + if: false # never true, but stores pos + instances: + is_data_page: + value: page_flags & 0x40 == 0 + -webide-parse-mode: eager + heap_pos: + value: _io.pos + num_rows: + value: | + (num_rows_large > num_rows_small) and (num_rows_large != 0x1fff) ? num_rows_large : num_rows_small + doc: | + The number of rows on this page (controls the number of row + index entries there are, but some of those may not be marked + as present in the table due to deletion). + -webide-parse-mode: eager + num_groups: + value: '(num_rows - 1) / 16 + 1' + doc: | + The number of row groups that are present in the index. Each + group can hold up to sixteen rows. All but the final one + will hold sixteen rows. + row_groups: + type: 'row_group(_index)' + repeat: expr + repeat-expr: num_groups + doc: | + The actual row groups making up the row index. Each group + can hold up to sixteen rows. Non-data pages do not have + actual rows, and attempting to parse them can crash. + if: is_data_page + + row_group: + doc: | + A group of row indices, which are built backwards from the end + of the page. Holds up to sixteen row offsets, along with a bit + mask that indicates whether each row is actually present in the + table. + params: + - id: group_index + type: u2 + doc: | + Identifies which group is being generated. They build backwards + from the end of the page. + instances: + base: + value: '_root.len_page - (group_index * 0x24)' + doc: | + The starting point of this group of row indices. + row_present_flags: + pos: base - 4 + type: u2 + doc: | + Each bit specifies whether a particular row is present. The + low order bit corresponds to the first row in this index, + whose offset immediately precedes these flag bits. The + second bit corresponds to the row whose offset precedes + that, and so on. + -webide-parse-mode: eager + rows: + type: row_ref(_index) + repeat: expr + repeat-expr: '(group_index < (_parent.num_groups - 1)) ? 
16 : ((_parent.num_rows - 1) % 16 + 1)' + doc: | + The row offsets in this group. + + row_ref: + doc: | + An offset which points to a row in the table, whose actual + presence is controlled by one of the bits in + `row_present_flags`. This instance allows the row itself to be + lazily loaded, unless it is not present, in which case there is + no content to be loaded. + params: + - id: row_index + type: u2 + doc: | + Identifies which row within the row index this reference + came from, so the correct flag can be checked for the row + presence and the correct row offset can be found. + instances: + ofs_row: + pos: '_parent.base - (6 + (2 * row_index))' + type: u2 + doc: | + The offset of the start of the row (in bytes past the end of + the page header). + row_base: + value: ofs_row + _parent._parent.heap_pos + doc: | + The location of this row relative to the start of the page. + A variety of pointers (such as all device_sql_string values) + are calculated with respect to this position. + present: + value: '(((_parent.row_present_flags >> row_index) & 1) != 0 ? true : false)' + doc: | + Indicates whether the row index considers this row to be + present in the table. Will be `false` if the row has been + deleted. + -webide-parse-mode: eager + body: + pos: row_base + type: + switch-on: _parent._parent.type + cases: + 'page_type::albums': album_row + 'page_type::artists': artist_row + 'page_type::artwork': artwork_row + 'page_type::colors': color_row + 'page_type::genres': genre_row + 'page_type::keys': key_row + 'page_type::labels': label_row + 'page_type::playlist_tree': playlist_tree_row + 'page_type::playlist_entries': playlist_entry_row + 'page_type::tracks': track_row + if: present + doc: | + The actual content of the row, as long as it is present. + -webide-parse-mode: eager + -webide-representation: '{body.name.body.text}{body.title.body.text} ({body.id})' + + album_row: + doc: | + A row that holds an album name and ID. 
+ seq: + - type: u2 + doc: | + Some kind of magic word? Usually 0x80, 0x00. + - id: index_shift + type: u2 + doc: TODO name from @flesniak, but what does it mean? + - type: u4 + - id: artist_id + type: u4 + doc: | + Identifies the artist associated with the album. + - id: id + type: u4 + doc: | + The unique identifier by which this album can be requested + and linked from other rows (such as tracks). + - type: u4 + - type: u1 + doc: | + @flesniak says: "always 0x03, maybe an unindexed empty string" + - id: ofs_name + type: u1 + doc: | + The location of the variable-length name string, relative to + the start of this row. + instances: + name: + type: device_sql_string + pos: _parent.row_base + ofs_name + doc: | + The name of this album. + -webide-parse-mode: eager + + artist_row: + doc: | + A row that holds an artist name and ID. + seq: + - id: subtype + type: u2 + doc: | + Usually 0x60, but 0x64 means we have a long name offset + embedded in the row. + - id: index_shift + type: u2 + doc: TODO name from @flesniak, but what does it mean? + - id: id + type: u4 + doc: | + The unique identifier by which this artist can be requested + and linked from other rows (such as tracks). + - type: u1 + doc: | + @flesniak says: "always 0x03, maybe an unindexed empty string" + - id: ofs_name_near + type: u1 + doc: | + The location of the variable-length name string, relative to + the start of this row, unless subtype is 0x64. + instances: + ofs_name_far: + pos: _parent.row_base + 0x0a + type: u2 + doc: | + For names that might be further than 0xff bytes from the + start of this row, this holds a two-byte offset, and is + signalled by the subtype value. + if: subtype == 0x64 + name: + pos: '_parent.row_base + (subtype == 0x64? ofs_name_far : ofs_name_near)' + type: device_sql_string + doc: | + The name of this artist. + -webide-parse-mode: eager + + artwork_row: + doc: | + A row that holds the path to an album art image file and the + associated artwork ID. 
+ seq: + - id: id + type: u4 + doc: | + The unique identifier by which this art can be requested + and linked from other rows (such as tracks). + - id: path + type: device_sql_string + doc: | + The variable-length file path string at which the art file + can be found. + -webide-representation: '{path.body.text}' + + color_row: + doc: | + A row that holds a color name and the associated ID. + seq: + - size: 5 + - id: id + type: u2 + doc: | + The unique identifier by which this color can be requested + and linked from other rows (such as tracks). + - type: u1 + - id: name + type: device_sql_string + doc: | + The variable-length string naming the color. + + genre_row: + doc: | + A row that holds a genre name and the associated ID. + seq: + - id: id + type: u4 + doc: | + The unique identifier by which this genre can be requested + and linked from other rows (such as tracks). + - id: name + type: device_sql_string + doc: | + The variable-length string naming the genre. + + key_row: + doc: | + A row that holds a musical key and the associated ID. + seq: + - id: id + type: u4 + doc: | + The unique identifier by which this key can be requested + and linked from other rows (such as tracks). + - id: id2 + type: u4 + doc: | + Seems to be a second copy of the ID? + - id: name + type: device_sql_string + doc: | + The variable-length string naming the key. + + label_row: + doc: | + A row that holds a label name and the associated ID. + seq: + - id: id + type: u4 + doc: | + The unique identifier by which this label can be requested + and linked from other rows (such as tracks). + - id: name + type: device_sql_string + doc: | + The variable-length string naming the label. + + playlist_tree_row: + doc: | + A row that holds a playlist name, ID, indication of whether it + is an ordinary playlist or a folder of other playlists, a link + to its parent folder, and its sort order. 
+ seq: + - id: parent_id + type: u4 + doc: | + The ID of the `playlist_tree_row` in which this one can be + found, or `0` if this playlist exists at the root level. + - size: 4 + - id: sort_order + type: u4 + doc: | + The order in which the entries of this playlist are sorted. + - id: id + type: u4 + doc: | + The unique identifier by which this playlist or folder can + be requested and linked from other rows. + - id: raw_is_folder + type: u4 + doc: | + Has a non-zero value if this is actually a folder rather + than a playlist. + - id: name + type: device_sql_string + doc: | + The variable-length string naming the playlist. + instances: + is_folder: + value: raw_is_folder != 0 + -webide-parse-mode: eager + + playlist_entry_row: + doc: | + A row that associates a track with a position in a playlist. + seq: + - id: entry_index + type: u4 + doc: | + The position within the playlist represented by this entry. + - id: track_id + type: u4 + doc: | + The track found at this position in the playlist. + - id: playlist_id + type: u4 + doc: | + The playlist to which this entry belongs. + + track_row: + doc: | + A row that describes a track that can be played, with many + details about the music, and links to other tables like artists, + albums, keys, etc. + seq: + - type: u2 + doc: | + Some kind of magic word? Usually 0x24, 0x00. + - id: index_shift + type: u2 + doc: TODO name from @flesniak, but what does it mean? + - id: bitmask + type: u4 + doc: TODO what do the bits mean? + - id: sample_rate + type: u4 + doc: | + Playback sample rate of the audio file. + - id: composer_id + type: u4 + doc: | + References a row in the artists table if the composer is + known. + - id: file_size + type: u4 + doc: | + The length of the audio file, in bytes. + - type: u4 + doc: | + Some ID? Purpose as yet unknown. + - type: u2 + doc: | + From @flesniak: "always 19048?" + - type: u2 + doc: | + From @flesniak: "always 30967?" 
+ - id: artwork_id + type: u4 + doc: | + References a row in the artwork table if there is album art. + - id: key_id + type: u4 + doc: | + References a row in the keys table if the track has a known + main musical key. + - id: original_artist_id + type: u4 + doc: | + References a row in the artists table if this is a cover + performance and the original artist is known. + - id: label_id + type: u4 + doc: | + References a row in the labels table if the track has a + known record label. + - id: remixer_id + type: u4 + doc: | + References a row in the artists table if the track has a + known remixer. + - id: bitrate + type: u4 + doc: | + Playback bit rate of the audio file. + - id: track_number + type: u4 + doc: | + The position of the track within an album. + - id: tempo + type: u4 + doc: | + The tempo at the start of the track in beats per minute, + multiplied by 100. + - id: genre_id + type: u4 + doc: | + References a row in the genres table if the track has a + known musical genre. + - id: album_id + type: u4 + doc: | + References a row in the albums table if the track has a + known album. + - id: artist_id + type: u4 + doc: | + References a row in the artists table if the track has a + known performer. + - id: id + type: u4 + doc: | + The id by which this track can be looked up; players will + report this value in their status packets when they are + playing the track. + - id: disc_number + type: u2 + doc: | + The number of the disc on which this track is found, if it + is known to be part of a multi-disc album. + - id: play_count + type: u2 + doc: | + The number of times this track has been played. + - id: year + type: u2 + doc: | + The year in which this track was released. + - id: sample_depth + type: u2 + doc: | + The number of bits per sample of the audio file. + - id: duration + type: u2 + doc: | + The length, in seconds, of the track when played at normal + speed. + - type: u2 + doc: | + From @flesniak: "always 41?" 
+ - id: color_id + type: u1 + doc: | + References a row in the colors table if the track has been + assigned a color. + - id: rating + type: u1 + doc: | + The number of stars to display for the track, 0 to 5. + - type: u2 + doc: | + From @flesniak: "always 1?" + - type: u2 + doc: | + From @flesniak: "alternating 2 or 3" + - id: ofs_strings + type: u2 + repeat: expr + repeat-expr: 21 + doc: | + The location, relative to the start of this row, of a + variety of variable-length strings. + instances: + unknown_string_1: + type: device_sql_string + pos: _parent.row_base + ofs_strings[0] + doc: | + A string of unknown purpose, which has so far only been + empty. + -webide-parse-mode: eager + texter: + type: device_sql_string + pos: _parent.row_base + ofs_strings[1] + doc: | + A string of unknown purpose, which @flesniak named. + -webide-parse-mode: eager + unknown_string_2: + type: device_sql_string + pos: _parent.row_base + ofs_strings[2] + doc: | + A string of unknown purpose; @flesniak said "thought + tracknumber -> wrong!" + unknown_string_3: + type: device_sql_string + pos: _parent.row_base + ofs_strings[3] + doc: | + A string of unknown purpose; @flesniak said "strange + strings, often zero length, sometimes low binary values + 0x01/0x02 as content" + unknown_string_4: + type: device_sql_string + pos: _parent.row_base + ofs_strings[4] + doc: | + A string of unknown purpose; @flesniak said "strange + strings, often zero length, sometimes low binary values + 0x01/0x02 as content" + -webide-parse-mode: eager + message: + type: device_sql_string + pos: _parent.row_base + ofs_strings[5] + doc: | + A string of unknown purpose, which @flesniak named. + -webide-parse-mode: eager + kuvo_public: + type: device_sql_string + pos: _parent.row_base + ofs_strings[6] + doc: | + A string whose value is always either empty or "ON", and + which apparently for some insane reason is used, rather than + a single bit somewhere, to control whether the track + information is visible on Kuvo. 
+ -webide-parse-mode: eager + autoload_hotcues: + type: device_sql_string + pos: _parent.row_base + ofs_strings[7] + doc: | + A string whose value is always either empty or "ON", and + which apparently for some insane reason is used, rather than + a single bit somewhere, to control whether hot-cues are + auto-loaded for the track. + -webide-parse-mode: eager + unknown_string_5: + type: device_sql_string + pos: _parent.row_base + ofs_strings[8] + doc: | + A string of unknown purpose. + -webide-parse-mode: eager + unknown_string_6: + type: device_sql_string + pos: _parent.row_base + ofs_strings[9] + doc: | + A string of unknown purpose, usually empty. + -webide-parse-mode: eager + date_added: + type: device_sql_string + pos: _parent.row_base + ofs_strings[10] + doc: | + A string containing the date this track was added to the collection. + -webide-parse-mode: eager + release_date: + type: device_sql_string + pos: _parent.row_base + ofs_strings[11] + doc: | + A string containing the date this track was released, if known. + -webide-parse-mode: eager + mix_name: + type: device_sql_string + pos: _parent.row_base + ofs_strings[12] + doc: | + A string naming the remix of the track, if known. + -webide-parse-mode: eager + unknown_string_7: + type: device_sql_string + pos: _parent.row_base + ofs_strings[13] + doc: | + A string of unknown purpose, usually empty. + -webide-parse-mode: eager + analyze_path: + type: device_sql_string + pos: _parent.row_base + ofs_strings[14] + doc: | + The file path of the track analysis, which allows rapid + seeking to particular times in variable bit-rate files, + jumping to particular beats, visual waveform previews, and + stores cue points and loops. + -webide-parse-mode: eager + analyze_date: + type: device_sql_string + pos: _parent.row_base + ofs_strings[15] + doc: | + A string containing the date this track was analyzed by rekordbox. 
+ -webide-parse-mode: eager + comment: + type: device_sql_string + pos: _parent.row_base + ofs_strings[16] + doc: | + The comment assigned to the track by the DJ, if any. + -webide-parse-mode: eager + title: + type: device_sql_string + pos: _parent.row_base + ofs_strings[17] + doc: | + The title of the track. + -webide-parse-mode: eager + unknown_string_8: + type: device_sql_string + pos: _parent.row_base + ofs_strings[18] + doc: | + A string of unknown purpose, usually empty. + -webide-parse-mode: eager + filename: + type: device_sql_string + pos: _parent.row_base + ofs_strings[19] + doc: | + The file name of the track audio file. + -webide-parse-mode: eager + file_path: + type: device_sql_string + pos: _parent.row_base + ofs_strings[20] + doc: | + The file path of the track audio file. + -webide-parse-mode: eager + + device_sql_string: + doc: | + A variable length string which can be stored in a variety of + different encodings. + seq: + - id: length_and_kind + type: u1 + doc: | + Mangled length of an ordinary ASCII string if odd, or a flag + indicating another encoding with a longer length value to + follow. + - id: body + type: + switch-on: length_and_kind + cases: + 0x40: device_sql_long_ascii + 0x90: device_sql_long_utf16be + _: device_sql_short_ascii(length_and_kind) + -webide-parse-mode: eager + -webide-representation: '{body.text}' + + device_sql_short_ascii: + doc: | + An ASCII-encoded string up to 127 bytes long. + params: + - id: mangled_length + type: u1 + doc: | + Contains the actual length, incremented, doubled, and + incremented again. Go figure. + seq: + - id: text + type: str + size: length + encoding: ascii + if: '(mangled_length % 2 > 0) and (length >= 0)' # Skip invalid strings + doc: | + The content of the string. + instances: + length: + value: '((mangled_length - 1) / 2) - 1' + doc: | + The un-mangled length of the string, in bytes. 
+ -webide-parse-mode: eager + + device_sql_long_ascii: + doc: | + An ASCII-encoded string preceded by a two-byte length field. + seq: + - id: length + type: u2 + doc: | + Contains the length of the string in bytes. + - id: text + type: str + size: length + encoding: ascii + doc: | + The content of the string. + + device_sql_long_utf16be: + doc: | + A UTF-16BE-encoded string preceded by a two-byte length field. + seq: + - id: length + type: u2 + doc: | + Contains the length of the string in bytes, including two trailing nulls. + - id: text + type: str + size: length - 4 + encoding: utf-16be + doc: | + The content of the string. + +enums: + page_type: + 0: + id: tracks + doc: | + Holds rows describing tracks, such as their title, artist, + genre, artwork ID, playing time, etc. + 1: + id: genres + doc: | + Holds rows naming musical genres, for reference by tracks and searching. + 2: + id: artists + doc: | + Holds rows naming artists, for reference by tracks and searching. + 3: + id: albums + doc: | + Holds rows naming albums, for reference by tracks and searching. + 4: + id: labels + doc: | + Holds rows naming music labels, for reference by tracks and searching. + 5: + id: keys + doc: | + Holds rows naming musical keys, for reference by tracks and searching. + 6: + id: colors + doc: | + Holds rows naming color labels, for reference by tracks and searching. + 7: + id: playlist_tree + doc: | + Holds rows that describe the hierarchical tree structure of + available playlists and folders grouping them. + 8: + id: playlist_entries + doc: | + Holds rows that enumerate the tracks found in playlists and + the playlists they belong to. + 9: + id: unknown_9 + 10: + id: unknown_10 + 11: + id: unknown_11 + doc: | + The rows all seem to have history file names in them, such as "HISTORY 001". + 12: + id: unknown_12 + 13: + id: artwork + doc: | + Holds rows pointing to album artwork images. 
+ 14: + id: unknown_14 + 15: + id: unknown_15 + 16: + id: columns + doc: | + TODO figure out and explain + 17: + id: unknown_17 + 18: + id: unknown_18 + 19: + id: history + doc: | + Holds rows listing tracks played in performance sessions. diff --git a/media/rekordbox_anlz.ksy b/media/rekordbox_anlz.ksy new file mode 100644 index 000000000..6a53fef27 --- /dev/null +++ b/media/rekordbox_anlz.ksy @@ -0,0 +1,514 @@ +meta: + id: rekordbox_anlz + title: rekordbox track analysis file + application: rekordbox + file-extension: + - dat + - ext + license: EPL-1.0 + endian: be + +doc: | + These files are created by rekordbox when analyzing audio tracks + to facilitate DJ performance. They include waveforms, beat grids + (information about the precise time at which each beat occurs), + time indices to allow efficient seeking to specific positions + inside variable bit-rate audio streams, and lists of memory cues + and loop points. They are used by Pioneer professional DJ + equipment. + + The format has been reverse-engineered to facilitate sophisticated + integrations with light and laser shows, videos, and other musical + instruments, by supporting deep knowledge of what is playing and + what is coming next through monitoring the network communications + of the players. + +doc-ref: https://reverseengineering.stackexchange.com/questions/4311/help-reversing-a-edb-database-file-for-pioneers-rekordbox-software + +seq: + - contents: "PMAI" + - id: len_header + type: u4 + doc: | + The number of bytes of this header section. + - id: len_file + type: u4 + doc: | + The number of bytes in the entire file. + - size: len_header - _io.pos + - id: sections + type: tagged_section + repeat: eos + doc: | + The remainder of the file is a sequence of type-tagged sections, + identified by a four-byte magic sequence. 
+ +types: + tagged_section: + doc: | + A type-tagged file section, identified by a four-byte magic + sequence, with a header specifying its length, and whose payload + is determined by the type tag. + seq: + - id: fourcc + type: s4 + # enum: section_tags # Can't use this line until KSC supports switching on possibly-null enums in Java. + doc: | + A tag value indicating what kind of section this is. + - id: len_header + type: u4 + doc: | + The size, in bytes, of the header portion of the tag. + - id: len_tag + type: u4 + doc: | + The size, in bytes, of this entire tag, counting the header. + - id: body + size: len_tag - 12 + type: + switch-on: fourcc + cases: + 0x50434f32: cue_extended_tag #'section_tags::cues_2' (PCO2) + 0x50434f42: cue_tag #'section_tags::cues' (PCOB) + 0x50505448: path_tag #'section_tags::path' (PPTH) + 0x5051545a: beat_grid_tag #'section_tags::beat_grid' (PQTZ) + 0x50564252: vbr_tag #'section_tags::vbr' (PVBR) + 0x50574156: wave_preview_tag #'section_tags::wave_preview' (PWAV) + 0x50575632: wave_preview_tag #'section_tags::wave_tiny' (PWV2) + 0x50575633: wave_scroll_tag #'section_tags::wave_scroll' (PWV3, seen in .EXT) + 0x50575634: wave_color_preview_tag #'section_tags::wave_color_preview' (PWV4, in .EXT) + 0x50575635: wave_color_scroll_tag #'section_tags::wave_color_scroll' (PWV5, in .EXT) + 0x50535349: song_structure_tag #'section_tags::song_structure' (PSSI, in .EXT) + _: unknown_tag + -webide-representation: '{fourcc}' + + + beat_grid_tag: + doc: | + Holds a list of all the beats found within the track, recording + their bar position, the time at which they occur, and the tempo + at that point. + seq: + - type: u4 + - type: u4 # @flesniak says this is always 0x80000 + - id: len_beats + type: u4 + doc: | + The number of beat entries which follow. + - id: beats + type: beat_grid_beat + repeat: expr + repeat-expr: len_beats + doc: The entries of the beat grid. + + beat_grid_beat: + doc: | + Describes an individual beat in a beat grid. 
+ seq: + - id: beat_number + type: u2 + doc: | + The position of the beat within its musical bar, where beat 1 + is the down beat. + - id: tempo + type: u2 + doc: | + The tempo at the time of this beat, in beats per minute, + multiplied by 100. + - id: time + type: u4 + doc: | + The time, in milliseconds, at which this beat occurs when + the track is played at normal (100%) pitch. + + cue_tag: + doc: | + Stores either a list of ordinary memory cues and loop points, or + a list of hot cues and loop points. + seq: + - id: type + type: u4 + enum: cue_list_type + doc: | + Identifies whether this tag stores ordinary or hot cues. + - size: 2 + - id: len_cues + type: u2 + doc: | + The length of the cue list. + - id: memory_count + type: u4 + doc: | + Unsure what this means. + - id: cues + type: cue_entry + repeat: expr + repeat-expr: len_cues + + cue_entry: + doc: | + A cue list entry. Can either represent a memory cue or a loop. + seq: + - contents: "PCPT" + - id: len_header + type: u4 + - id: len_entry + type: u4 + - id: hot_cue + type: u4 + doc: | + If zero, this is an ordinary memory cue, otherwise this is a + hot cue with the specified number. + - id: status + type: u4 + enum: cue_entry_status + doc: | + If zero, this entry should be ignored. + - type: u4 # Seems to always be 0x10000 + - id: order_first + type: u2 + doc: | + @flesniak says: "0xffff for first cue, 0,1,3 for next" + - id: order_last + type: u2 + doc: | + @flesniak says: "1,2,3 for first, second, third cue, 0xffff for last" + - id: type + type: u1 + enum: cue_entry_type + doc: | + Indicates whether this is a memory cue or a loop. + - size: 3 # seems to always be 1000 + - id: time + type: u4 + doc: | + The position, in milliseconds, at which the cue point lies + in the track. + - id: loop_time + type: u4 + doc: | + The position, in milliseconds, at which the player loops + back to the cue time if this is a loop. 
+ - size: 16 + + cue_extended_tag: + doc: | + A variation of cue_tag which was introduced with the nxs2 line, + and adds descriptive names. (Still comes in two forms, either + holding memory cues and loop points, or holding hot cues and + loop points.) Also includes hot cues D through H and color assignment. + seq: + - id: type + type: u4 + enum: cue_list_type + doc: | + Identifies whether this tag stores ordinary or hot cues. + - id: len_cues + type: u2 + doc: | + The length of the cue comment list. + - size: 2 + - id: cues + type: cue_extended_entry + repeat: expr + repeat-expr: len_cues + + cue_extended_entry: + doc: | + A cue extended list entry. Can either describe a memory cue or a + loop. + seq: + - contents: "PCP2" + - id: len_header + type: u4 + - id: len_entry + type: u4 + - id: hot_cue + type: u4 + doc: | + If zero, this is an ordinary memory cue, otherwise this is a + hot cue with the specified number. + - id: type + type: u1 + enum: cue_entry_type + doc: | + Indicates whether this is a memory cue or a loop. + - size: 3 # seems to always be 1000 + - id: time + type: u4 + doc: | + The position, in milliseconds, at which the cue point lies + in the track. + - id: loop_time + type: u4 + doc: | + The position, in milliseconds, at which the player loops + back to the cue time if this is a loop. + - id: color_id + type: u1 + doc: | + References a row in the colors table if this is a memory cue or loop + and has been assigned a color. + - size: 11 # Loops seem to have some non-zero values in the last four bytes of this. + - id: len_comment + type: u4 + if: len_entry > 43 + - id: comment + type: str + size: len_comment + encoding: utf-16be + doc: | + The comment assigned to this cue by the DJ, if any, with a trailing NUL. + if: len_entry > 43 + - id: color_code + type: u1 + doc: | + A lookup value for a color table? We use this to index to the hot cue colors shown in rekordbox. 
+ if: (len_entry - len_comment) > 44 + - id: color_red + type: u1 + doc: | + The red component of the hot cue color to be displayed. + if: (len_entry - len_comment) > 45 + - id: color_green + type: u1 + doc: | + The green component of the hot cue color to be displayed. + if: (len_entry - len_comment) > 46 + - id: color_blue + type: u1 + doc: | + The blue component of the hot cue color to be displayed. + if: (len_entry - len_comment) > 47 + - size: len_entry - 48 - len_comment # The remainder after the color + if: (len_entry - len_comment) > 48 + + path_tag: + doc: | + Stores the file path of the audio file to which this analysis + applies. + seq: + - id: len_path + type: u4 + - id: path + type: str + size: len_path - 2 + encoding: utf-16be + if: len_path > 1 + + vbr_tag: + doc: | + Stores an index allowing rapid seeking to particular times + within a variable-bitrate audio file. + seq: + - type: u4 + - id: index + type: u4 + repeat: expr + repeat-expr: 400 + + wave_preview_tag: + doc: | + Stores a waveform preview image suitable for display above + the touch strip for jumping to a track position. + seq: + - id: len_preview + type: u4 + doc: | + The length, in bytes, of the preview data itself. This is + slightly redundant because it can be computed from the + length of the tag. + - type: u4 # This seems to always have the value 0x10000 + - id: data + size: len_preview + doc: | + The actual bytes of the waveform preview. + if: _parent.len_tag > _parent.len_header + + wave_scroll_tag: + doc: | + A larger waveform image suitable for scrolling along as a track + plays. + seq: + - id: len_entry_bytes + type: u4 + doc: | + The size of each entry, in bytes. Seems to always be 1. + - id: len_entries + type: u4 + doc: | + The number of waveform data points, each of which takes one + byte. + - type: u4 # Always 0x960000? 
+ - id: entries + size: len_entries * len_entry_bytes + + wave_color_preview_tag: + doc: | + A larger, colorful waveform preview image suitable for display + above the touch strip for jumping to a track position on newer + high-resolution players. + seq: + - id: len_entry_bytes + type: u4 + doc: | + The size of each entry, in bytes. Seems to always be 6. + - id: len_entries + type: u4 + doc: | + The number of waveform data points, each of which takes one + byte for each of six channels of information. + - type: u4 + - id: entries + size: len_entries * len_entry_bytes + + wave_color_scroll_tag: + doc: | + A larger, colorful waveform image suitable for scrolling along + as a track plays on newer high-resolution hardware. Also + contains a higher-resolution blue/white waveform. + seq: + - id: len_entry_bytes + type: u4 + doc: | + The size of each entry, in bytes. Seems to always be 2. + - id: len_entries + type: u4 + doc: | + The number of columns of waveform data (this matches the + non-color waveform length). + - type: u4 + - id: entries + size: len_entries * len_entry_bytes + + song_structure_tag: + doc: | + Stores the song structure, also known as phrases (intro, verse, + bridge, chorus, up, down, outro). + seq: + - id: len_entry_bytes + type: u4 + doc: | + The size of each entry, in bytes. Seems to always be 24. + - id: len_entries + type: u2 + doc: | + The number of phrases. + - id: style + type: u2 + # enum: phrase_style Can't use this line until KSC supports switching on possibly-null enums in Java. + doc: | + The phrase style. 1 is the up-down style + (white label text in rekordbox) where the main phrases consist + of up, down, and chorus. 2 is the bridge-verse style + (black label text in rekordbox) where the main phrases consist + of verse, chorus, and bridge. Style 3 is mostly identical to + bridge-verse style except verses 1-3 are labeled VERSE1 and verses + 4-6 are labeled VERSE2 in rekordbox. 
+ - size: 6 + - id: end_beat + type: u2 + doc: | + The beat number at which the last phrase ends. The track may + continue after the last phrase ends. If this is the case, it will + mostly be silence. + - size: 4 + - id: entries + type: song_structure_entry + repeat: expr + repeat-expr: len_entries + + song_structure_entry: + doc: | + A song structure entry, representing a single phrase. + seq: + - id: phrase_number + type: u2 + doc: | + The absolute number of the phrase, starting at one. + - id: beat_number + type: u2 + doc: | + The beat number at which the phrase starts. + - id: phrase_id + type: + switch-on: _parent.style + cases: + 1: phrase_up_down # 'phrase_style::up_down' + 2: phrase_verse_bridge # 'phrase_style::verse_bridge' + _: phrase_verse_bridge + doc: | + Identifier of the phrase label. + - size: _parent.len_entry_bytes - 9 + - id: fill_in + type: u1 + doc: | + If nonzero, fill-in is present. + - id: fill_in_beat_number + type: u2 + doc: | + The beat number at which fill-in starts. 
+ + phrase_up_down: + seq: + - id: id + type: u2 + enum: phrase_up_down_id + + phrase_verse_bridge: + seq: + - id: id + type: u2 + enum: phrase_verse_bridge_id + + unknown_tag: {} + +enums: + section_tags: # We can't use this enum until KSC supports default/unmatched values + 0x50434f42: cues # PCOB + 0x50434f32: cues_2 # PCO2 (seen in .EXT) + 0x50505448: path # PPTH + 0x50564252: vbr # PVBR + 0x5051545a: beat_grid # PQTZ + 0x50574156: wave_preview # PWAV + 0x50575632: wave_tiny # PWV2 + 0x50575633: wave_scroll # PWV3 (seen in .EXT) + 0x50575634: wave_color_preview # PWV4 (seen in .EXT) + 0x50575635: wave_color_scroll # PWV5 (seen in .EXT) + 0x50535349: song_structure # PSSI (seen in .EXT) + + cue_list_type: + 0: memory_cues + 1: hot_cues + + cue_entry_type: + 1: memory_cue + 2: loop + + cue_entry_status: + 0: disabled + 1: enabled + + phrase_style: + 1: up_down + 2: verse_bridge + 3: verse_bridge_2 + + phrase_verse_bridge_id: + 1: intro + 2: verse1 + 3: verse2 + 4: verse3 + 5: verse4 + 6: verse5 + 7: verse6 + 8: bridge + 9: chorus + 10: outro + + phrase_up_down_id: + 1: intro + 2: up + 3: down + 5: chorus + 6: outro
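For readers who want to check the `tagged_section` and `beat_grid_tag` layouts by hand, here is a minimal Python sketch of the same structure. It is not generated by Kaitai and is not part of this change. It assumes big-endian integers (the four-character codes such as PQTZ only read in order that way), and the function names `iter_sections` and `parse_beat_grid` are hypothetical. Where the first section starts within a real file is outside what this part of the spec describes, so the caller has to supply that offset.

```python
import struct


def iter_sections(data, offset=0):
    """Yield (fourcc, len_header, body) for consecutive tagged sections.

    Mirrors tagged_section: a 4-byte tag, u4 len_header, u4 len_tag, then a
    body of len_tag - 12 bytes. Big-endian byte order is an assumption.
    """
    while offset + 12 <= len(data):
        fourcc, len_header, len_tag = struct.unpack_from(">4sII", data, offset)
        if len_tag < 12 or offset + len_tag > len(data):
            break  # malformed or truncated section; stop rather than guess
        yield fourcc, len_header, data[offset + 12:offset + len_tag]
        offset += len_tag


def parse_beat_grid(body):
    """Decode a beat_grid_tag body into (beat_number, bpm, time_ms) tuples.

    The body starts with two unnamed u4 fields and u4 len_beats, followed by
    8-byte entries: u2 beat_number, u2 tempo (BPM * 100), u4 time in ms.
    """
    _, _, len_beats = struct.unpack_from(">III", body, 0)
    beats = []
    for i in range(len_beats):
        beat_number, tempo, time_ms = struct.unpack_from(">HHI", body, 12 + 8 * i)
        beats.append((beat_number, tempo / 100.0, time_ms))
    return beats
```

A caller would typically filter the sections by `fourcc == b"PQTZ"` and pass only those bodies to `parse_beat_grid`, leaving unrecognised tags untouched in the same way the `unknown_tag` fallback does.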