Better catalognum search, more accurate albumtypes and other little fixes
snejus committed Feb 10, 2022
1 parent fae5d6a commit 142218e
Showing 23 changed files with 1,485 additions and 2,114 deletions.
8 changes: 7 additions & 1 deletion .github/workflows/build.yml
@@ -10,21 +10,27 @@ on:
jobs:
sonarcloud:
runs-on: ubuntu-latest
strategy:
matrix:
python: [3.6, 3.7, 3.8, 3.9, "3.10"]
beets: [1.4.9, 1.5.0, 1.6.0]
steps:
- uses: actions/checkout@v2
with:
# Disabling shallow clone is recommended for improving relevancy of reporting
fetch-depth: 0

- name: Set up Python ${{ matrix.python-version }}
- name: Set up Python ${{ matrix.python }}
uses: actions/setup-python@v2

- name: Install dependencies
run: |
python -m pip install --upgrade pip setuptools wheel
pip install poetry==1.2.0a2
poetry install --sync
pip install beets==${{ matrix.beets }}
- name: Pytest
continue-on-error: true
run: |
poetry run pytest
- name: Coveralls
104 changes: 84 additions & 20 deletions CHANGELOG.md
@@ -1,3 +1,67 @@
## [0.12.0] 2022-02-10

### Added

- `album`: following the MusicBrainz [title format specification], the strings **EP** and **LP** are now kept in place in album names.
- `catalognum`: To find the catalog number, we previously looked at the release title and at pointers such as **Catalogue Number:** within the release description.

In addition to the above, we now apply a generic search pattern across the rest of the text, including media title, media description and the rest of the release description.

For those interested, at a high level the search pattern looks like this:

```perl
(
[A-Z .]+\d{3} # HANDS D300
| [A-z ][ ]0\d{2,3} # Persephonic Sirens 012
| [A-Z-]{2,}\d+ # RIV4
| [A-Z]+[A-Z.$-]+\d{2,} # USE202, HEY-101, LI$025
| [A-Z.]{2,}[ ]\d{1,3} # OBS.CUR 9
| \w+[A-z]0\d+ # 1ØPILLS018, fa036
| [a-z]+(cd|lp)\d+ # ostgutlp45
| [A-z]+\d+-\d+ # P90-003
)
( # optionally followed by
[ ]?[A-Z] # IBM001V
| [.][0-9]+ # ISMVA002.1
| -?[A-Z]+ # PLUS8024CD
)?
```

- `albumtype`: similarly to `catalognum`, the descriptions are now searched for the strings **EP** and **LP** to determine the `albumtype`.

- `track`: Support for tracks that use a character other than a dash (`-`) to separate pieces of information in track names. For example, consider the following [tracklist]:

```
A1 | WHITESHADOWHURTS x TOXICSPIKEBACK | Arcadia
A2 | WHITESHADOWHURTS | Corrupted Entity
A3 | WHITESHADOWHURTS | Colosseo
B1 | TOXICSPIKEBACK | Eclipse
B2 | TOXICSPIKEBACK | Eclipse [DJ LINT's Tribe Mix]
B3 | WHITESHADOWHURTS | Corrupted Entity [OAT.M's Oldschool Mix]
```

`beetcamp` now finds that `|` is being used as the delimiter and parses values for `track_alt`, `artist` and `title` accordingly.
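
The delimiter detection described above could be sketched roughly as follows. This is a minimal illustration and not the plugin's actual code: the names `find_delimiter` and `parse_track`, and the heuristic of counting punctuation characters surrounded by spaces, are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

TRACKLIST = [
    "A1 | WHITESHADOWHURTS x TOXICSPIKEBACK | Arcadia",
    "A2 | WHITESHADOWHURTS | Corrupted Entity",
    "B2 | TOXICSPIKEBACK | Eclipse [DJ LINT's Tribe Mix]",
]


def find_delimiter(names: List[str], default: str = "-") -> str:
    """Guess the separator (hypothetical heuristic): the punctuation
    character that the most track names contain surrounded by spaces."""
    counts: Counter = Counter()
    for name in names:
        # Count each candidate character once per track name
        counts.update(set(re.findall(r" ([^\w\s]) ", name)))
    return counts.most_common(1)[0][0] if counts else default


def parse_track(name: str, delim: str) -> Dict[str, Optional[str]]:
    """Split a track name into track_alt, artist and title."""
    parts = [part.strip() for part in name.split(f" {delim} ")]
    # Three parts: position, artist, title. Two parts: artist, title.
    track_alt = parts.pop(0) if len(parts) == 3 else None
    artist = parts.pop(0) if len(parts) == 2 else None
    # Anything left (including unexpected extra parts) is kept as the title
    title = f" {delim} ".join(parts)
    return {"track_alt": track_alt, "artist": artist, "title": title}
```

Splitting on the delimiter together with its surrounding spaces keeps titles that merely contain the character (for example a hyphenated word) intact.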

### Updated

- `singleton`: `album` and `albumartist` fields are no longer populated.
- `catalognum`: artists like **PROCESS 404** are no longer assumed to be catalogue numbers.
- `track_alt`: lowercase values such as **a1** are now parsed and converted to uppercase.
- `albumartist`: use **Various Artists** (or equivalent) when a release includes more than four different artists. Until now we've only done so for compilations.
- `genre`: genres are now sorted alphabetically
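
As a rough sketch of the `albumartist` rule above (more than four different artists means a Various Artists release), the check could look like the following. The function name `pick_albumartist`, the case-insensitive comparison and the default value of `va_name` are assumptions for illustration only, not the plugin's implementation.

```python
from typing import Iterable


def pick_albumartist(
    track_artists: Iterable[str],
    release_artist: str,
    va_name: str = "Various Artists",  # assumed default, normally taken from config
) -> str:
    """Return va_name when the release has more than four distinct artists."""
    # Compare case-insensitively and ignore empty names (an assumption)
    unique = {a.strip().casefold() for a in track_artists if a.strip()}
    return va_name if len(unique) > 4 else release_artist
```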

### Fixed

- Support for `beets<1.5` has been broken since `0.11.0`; it should now work fine. However, fields such as `comments` and `lyrics` are not available, and album-like metadata such as `catalognum` is not available for singletons. Thanks to **@zane-schaffer** for reporting this issue (Closes #22).
- `singleton`: `catalognum`, if found, is now reliably removed from the title.
- `track.title`: `-` delimiter is handled more appropriately when it is found in the song title.
- `albumartist`: for the Various Artists releases, the plugin will now use the globally configured `va_name` field instead of hard-coding _Various Artists_.
- `artist`: Recent Bandcamp updates to the JSON data removed artist names from most compilations, so we once again look at the raw HTML to fetch them.

[tracklist]: https://scumcllctv.bandcamp.com/album/scum002-arcadia
[title format specification]: https://beta.musicbrainz.org/doc/Style/Titles
[0.12.0]: https://github.com/snejus/beetcamp/releases/tag/0.12.0
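
The high-level `catalognum` pattern quoted in the 0.12.0 notes above can be tried out in Python with `re.VERBOSE`. This is only a sketch: the changelog gives the pattern "at a high level", so the two word-boundary lookarounds, the non-capturing groups and the helper name `find_catalognum` are assumptions added here to make the example behave sensibly on its own.

```python
import re
from typing import Optional

CATALOGNUM = re.compile(
    r"""
    (?<!\w)                     # assumption: do not start inside a word
    (?:
        [A-Z .]+\d{3}           # HANDS D300
      | [A-z ][ ]0\d{2,3}       # Persephonic Sirens 012
      | [A-Z-]{2,}\d+           # RIV4
      | [A-Z]+[A-Z.$-]+\d{2,}   # USE202, HEY-101, LI$025
      | [A-Z.]{2,}[ ]\d{1,3}    # OBS.CUR 9
      | \w+[A-z]0\d+            # 1ØPILLS018, fa036
      | [a-z]+(?:cd|lp)\d+      # ostgutlp45
      | [A-z]+\d+-\d+           # P90-003
    )
    (?:                         # optionally followed by
        [ ]?[A-Z]               # IBM001V
      | [.][0-9]+               # ISMVA002.1
      | -?[A-Z]+                # PLUS8024CD
    )?
    (?!\w)                      # assumption: do not stop inside a word
    """,
    re.VERBOSE,
)


def find_catalognum(text: str) -> Optional[str]:
    """Return the first catalogue-number-like string in text, if any."""
    match = CATALOGNUM.search(text)
    # The first alternative may pick up leading spaces, so trim them
    return match.group(0).strip() if match else None
```

Without the boundary lookarounds the first alternative would stop early on a value like **PLUS8024CD**, which is why they are included in this sketch.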

## [0.11.0] 2021-11-12

### Added
@@ -62,6 +126,7 @@
data - this is now handled gracefully.

[musicbrainz genres]: https://beta.musicbrainz.org/genres
[0.11.0]: https://github.com/snejus/beetcamp/releases/tag/0.11.0

## [0.10.1] 2021-09-13

@@ -70,7 +135,7 @@
- Fixed #18 by handling cases when a track duration is not given.
- Fixed #19 where artist names like **SUNN O)))** would get incorrectly mistreated by
the album name cleanup logic due to multiple consecutive parentheses. The fix involved
adding some rules around it: they are now deduped _only if_
adding some rules around it: they are now deduplicated _only if_

- they are preceded with a space
- or they enclose remix / edit info and are the last characters in the album name
@@ -154,7 +219,7 @@ Thanks @arogl for reporting each of the above!
- `label`: some releases embed the `recordLabel` field into the json data - it now gets
prioritized over the publisher name when it is available.
- `track.title`: clean up `*digital only*` properly. Previously we did not account for
asterixes
asterisks

### Fixed

@@ -180,7 +245,7 @@ Thanks @arogl for reporting each of the above!

### Added

- Added a github action to run ci for `master` and `dev` branches. For now it's just a minimal
- Added a github action to run CI for `master` and `dev` branches. For now it's just a minimal
configuration and will probably get updated soon.

## [0.9.1] 2021-06-04
@@ -194,19 +259,19 @@ Thanks @arogl for reporting each of the above!
been fixed regarding the MusicBrainz description: release composed of the same title
and multiple remixes is a single.
- Use `ep` only if _EP_ is mentioned either in the album name or the disc title.
- `album.catalognum`: Make the _DISCTITLE_ uppercase before looking for the catalogue
- `album.catalognum`: Make the _DISCTITLE_ uppercase before looking for the catalog
number.
- `album.media`: Exclude anything that contains _bundle_ in their names. These usually
contain additional releases that we do not need.
- `track.title`: Clean `- DIGITAL ONLY` (and similar) when it's preceded by a dash and not
enclosed by parens or square brackets.
enclosed by parentheses or square brackets.
- `track.track_alt`: Having witnessed a very creative track title **E7-E5**, limit the
`track_alt` field number to the range **0-6**.
- Committed a JSON testcase which was supposed to be part of `0.9.0`.
- Committed a JSON test case which was supposed to be part of `0.9.0`.

### Added

- Extend `url2json` with `--tracklist-for-tests` to ease adding new testcases.
- Extend `url2json` with `--tracklist-for-tests` to ease adding new test cases.

## [0.9.0] 2021-06-01

@@ -219,12 +284,12 @@ Thanks @arogl for reporting each of the above!
### Added

- The `comments` field now includes the media description and credits.
- The description is searched for artist and album names in addition to the catalogue
- The description is searched for artist and album names in addition to the catalog
number.

### Updated

- All testcases are now pretty JSON files - this should bring more transparency around
- All test cases are now pretty JSON files - this should bring more transparency around
the adjustments that Bandcamp make in the future (once they get updated). The `url2json`
tool has `-u` flag that updates them automatically.

@@ -248,7 +313,7 @@ Thanks @arogl for reporting each of the above!

- Parsing / logic:

- Token `feat.` is now recognised as a valid member of the `artist` field.
- Token `feat.` is now recognized as a valid member of the `artist` field.
- `free download`, `[EP|LP]`, `(EP|LP)`, `E.P.`, `LP` are now cleaned from the album name.
- Updated `albumtype` logic: in some `compilation` cases track artists would go missing
and get set to _Various Artists_ - instead it now defaults to the original
@@ -271,7 +336,7 @@ Thanks @arogl for reporting each of the above!

### Added

- Release description is now checked for the catalogue number.
- Release description is now checked for the catalog number.
- Added a test based on parsing _the JSON output_ directly without having to parse the
entire HTML. Bandcamp have been moving away from HTML luckily, so let's hope the trend
continues.
@@ -305,7 +370,7 @@ Thanks @arogl for reporting each of the above!

- Artist name (unless it's a singleton track)
- Label name
- Catalogue number
- Catalog number
- Strings
- **Various Artists**
- **limited edition**
@@ -328,7 +393,7 @@ Thanks @arogl for reporting each of the above!
- Added _recommended_ installation method in the readme.
- Added tox tests for `beets < 1.5` and `beets > 1.5` for python versions from 3.6 up to
3.9.
- Sped up reimporting bandcamp items by checking whether the URL is already available
- Sped up re-importing bandcamp items by checking whether the URL is already available
before searching.
- Parsing: If track's name includes _bandcamp digital (bonus|only) etc._, **bandcamp** part gets
removed as well.
@@ -391,7 +456,7 @@ Thanks @arogl for reporting each of the above!

### Updated

- Catalogue number parser now requires at least two digits to find a good match.
- Catalog number parser now requires at least two digits to find a good match.

## [0.5.5] 2021-01-30

@@ -419,8 +484,8 @@ Thanks @arogl for reporting each of the above!
- Handle a sold-out release where the track listing isn't available, which would otherwise
cause a KeyError.

- Catalogue number parser should now forget that cassette types like **C30** or **C90**
could be valid catalogue numbers.
- Catalog number parser should now forget that cassette types like **C30** or **C90**
could be valid catalog numbers.

### Updated

@@ -430,8 +495,7 @@ Thanks @arogl for reporting each of the above!

### Fixed

- For data that is parsed directly from the html, ampersands are now correctly
unescaped.
- For data that is parsed directly from the html, ampersands are now correctly unescaped.

## [0.5.2] 2021-01-18

@@ -487,7 +551,7 @@ Thanks @arogl for reporting each of the above!

### Fixed

- `catalognum` parser used to parse `Vol.30` or `Christmas 2020` as catalogue
- `catalognum` parser used to parse `Vol.30` or `Christmas 2020` as catalog
number - these are now excluded. It's likely that additional patterns will
come up later.

@@ -507,5 +571,5 @@ Thanks @arogl for reporting each of the above!

- The pipeline now uses generators, therefore the plug-in searches until it
finds a good fit and won't continue further (same as the musicbrainz autotagger)
- Extended the parsing functionality with data like catalogue number, label,
- Extended the parsing functionality with data like catalog number, label,
country etc. The full list is given in the readme.
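
The generator-based pipeline mentioned in the last entries above can be illustrated with a short sketch. The names `candidates` and `first_good_fit`, and the `fetch` / `is_good_fit` callables, are hypothetical and stand in for whatever the plug-in actually does; the point is only that results are produced lazily, so nothing is fetched after a good fit is found.

```python
from typing import Callable, Iterable, Iterator, Optional


def candidates(urls: Iterable[str], fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Lazily fetch and yield one candidate per URL."""
    for url in urls:
        yield fetch(url)


def first_good_fit(
    urls: Iterable[str],
    fetch: Callable[[str], dict],
    is_good_fit: Callable[[dict], bool],
) -> Optional[dict]:
    """Return the first acceptable candidate; later URLs are never fetched."""
    return next((c for c in candidates(urls, fetch) if is_good_fit(c)), None)
```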
4 changes: 2 additions & 2 deletions beetsplug/bandcamp/__init__.py
@@ -135,7 +135,7 @@ def guru(self, url: str, html: Optional[str] = None) -> Optional[Metaguru]:
if not html:
html = self._get(url)
if html:
self._gurucache[url] = Metaguru(html, self.config.flatten())
self._gurucache[url] = Metaguru.from_html(html, self.config.flatten())
return self._gurucache.get(url)

def loaded(self) -> None:
@@ -212,7 +212,7 @@ def track_for_id(self, track_id: str) -> Optional[TrackInfo]:
def handle(self, guru: Metaguru, attr: str, _id: str) -> Any:
try:
return getattr(guru, attr)
except (KeyError, ValueError, AttributeError):
except (KeyError, ValueError, AttributeError, IndexError):
self._info("Failed obtaining {}", _id)
return None
except Exception: # pylint: disable=broad-except