[Enhancement] Support parsing comma separated genres #3127

Open
NitzanNougat opened this issue Jul 6, 2024 · 21 comments · May be fixed by #3136
Labels
enhancement New feature or request possible plugin Features that could potentially be plugins

Comments

@NitzanNougat

What happened?

Most of my audiobooks are sourced from Audible by Libation.
When initially downloading these books, I did so without metadata.
However, for those downloaded with metadata, it is in CSV format rather than JSON (I have no idea whether that makes a difference).

The issue arises in both cases, with and without metadata:

For example, for the book Cosmos by Carl Sagan, the genres are currently listed as:
Genres: "Astronomy, Cosmology, Biological Sciences, Atmospheric Sciences, Physics"

The entire genre string is treated as a single genre, making it impossible to search for the book by individual genres.

What did you expect to happen?

I would expect the genres to be parsed as individual entries:
Genres: "Astronomy", "Cosmology", "Biological Sciences", "Atmospheric Sciences", "Physics"

Each genre should be recognized as a separate entry, so that filtering or searching by genre returns accurate results.

Steps to reproduce the issue

Import an Audible audiobook by Libation without metadata/with .csv metadata
Install ABS v2.10.1 via docker compose.
Scan the new libraries.

Audiobookshelf version

v2.10.1

How are you running audiobookshelf?

Docker

What OS is your Audiobookshelf server hosted from?

Linux

If the issue is being seen in the UI, what browsers are you seeing the problem on?

None

Logs

csv meta data example:

{"timestamp":"2024-06-22T18:15:35.804Z","message":"\"The Last Command\" Getting metadata with precedence [folderStructure, audioMetatags, nfoFile, txtFiles, opfFile, absMetadata]","levelName":"DEBUG","level":1}
{"timestamp":"2024-06-22T18:15:35.804Z","message":"setChapters: Using embedded chapters in first audio file /audiobooks/Timothy Zahn/The Thrawn Trilogy/Book 3 - The Last Command/The Last Command Track 1.m4b","levelName":"DEBUG","level":1}
{"timestamp":"2024-06-22T18:15:36.958Z","message":"Success saving abmetadata to \"/metadata/items/fae01eb8-881d-41fd-928e-35b3c58213c9/metadata.json\"","levelName":"DEBUG","level":1}
{"timestamp":"2024-06-22T18:15:36.958Z","message":"Created new library item \"Timothy Zahn/The Thrawn Trilogy/Book 3 - The Last Command\"","levelName":"INFO","level":2}

no metadata example:

{"timestamp":"2024-06-22T18:14:50.395Z","message":"\"Cosmos꞉ A Personal Voyage\" Getting metadata with precedence [folderStructure, audioMetatags, nfoFile, txtFiles, opfFile, absMetadata]","levelName":"DEBUG","level":1}
{"timestamp":"2024-06-22T18:14:50.396Z","message":"setChapters: Using embedded chapters in first audio file /audiobooks/Carl Sagan/Cosmos꞉ A Personal Voyage/Cosmos꞉ A Personal Voyage Track 1.m4b","levelName":"DEBUG","level":1}
{"timestamp":"2024-06-22T18:14:51.177Z","message":"Success saving abmetadata to \"/metadata/items/5fc031e3-52ba-4806-a693-16708693a3ba/metadata.json\"","levelName":"DEBUG","level":1}
{"timestamp":"2024-06-22T18:14:51.177Z","message":"Created new library item \"Carl Sagan/Cosmos꞉ A Personal Voyage\"","levelName":"INFO","level":2}

Additional Notes

No response

@NitzanNougat NitzanNougat added the bug Something isn't working label Jul 6, 2024
@mikiher
Contributor

mikiher commented Jul 8, 2024

Just to be clear - this has nothing to do with csv.
Audiobookshelf doesn't import metadata from csv files.
The metadata is usually read from the audio file itself (or from some other sources supported by ABS, which don't include csv). Libation by default embeds the metadata into the audio file (this is controlled in Libation by Settings -> Audio File Settings -> Allow Libation to fix up audiobook metadata).

Anyway, I did reproduce the behavior you describe, and I'll try to fix it.

@advplyr advplyr changed the title [Bug]: Incorrect Genre Parsing [Enhancement] Support parsing comma separated genres Jul 8, 2024
@advplyr advplyr added enhancement New feature or request and removed bug Something isn't working labels Jul 8, 2024
@advplyr
Owner

advplyr commented Jul 8, 2024

Comma was intentionally left out when I set this up a few years ago. I believe that some genres from Audible have commas in them, so if we split on comma it would break those genres. We should confirm this before adding comma. It may not actually be an issue, but I remember intentionally leaving comma out.

@advplyr
Owner

advplyr commented Jul 8, 2024

Found an example: https://api.audnex.us/books/B01CUKULGA

"genres": [
  {
    "asin": "18574597011",
    "name": "Mystery, Thriller & Suspense",
    "type": "genre"
  },
  {
    "asin": "18580606011",
    "name": "Science Fiction & Fantasy",
    "type": "genre"
  },
  {
    "asin": "18574621011",
    "name": "Thriller & Suspense",
    "type": "tag"
  }
]
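
To make the concern concrete, here is a minimal sketch (not Audiobookshelf code) of what a naive comma split would do to the names in the response above. It assumes, purely for illustration, that the names were embedded in a single tag joined with ", "; the variable names are hypothetical.

```python
# Genre/tag names copied from the audnex response above.
names = [
    "Mystery, Thriller & Suspense",
    "Science Fiction & Fantasy",
    "Thriller & Suspense",
]

# Assumption: the names end up in one tag value, joined with ", ".
embedded_tag = ", ".join(names)

# Naive parsing: split on every comma and trim whitespace.
naive = [part.strip() for part in embedded_tag.split(",") if part.strip()]

print(naive)
# ['Mystery', 'Thriller & Suspense', 'Science Fiction & Fantasy', 'Thriller & Suspense']
```

The first name is broken into "Mystery" and "Thriller & Suspense", which is the breakage described above.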

@mikiher
Contributor

mikiher commented Jul 8, 2024 via email

@advplyr
Owner

advplyr commented Jul 8, 2024

This issue has been brought up before with Libation #2539
I've never used it before, maybe they have an option to not use comma?

Even though there is no official spec for delimiters between multiple genres, the ones we use are pretty widely adopted, and I'm not aware of any meta tagging software that supports comma.

As far as data sources go, I would guess Audible is the vast majority. I'm not opposed to supporting comma delimiters if it can be non-disruptive, but this is certainly not a bug.

Related
#1864
#1998

@mikiher
Contributor

mikiher commented Jul 8, 2024

So, just to have some data points about this: Audible has a page that shows its Level 1 and 2 categories (which are used as genres in metadata). These aren't all the genres, since there are also some lower-level categories that don't appear on this page, but I think it gives some notion of what Audible genres look like. I scraped the data into a Google sheet and ran a couple of stats.

Out of 212 unique genres, 13 contain a comma (~6%). All of the ones containing a comma are of the form "A, B & C".

I'm not sure exactly what to do with this info yet, just wanted to share.

@NitzanNougat
Author

NitzanNougat commented Jul 9, 2024

Hi, thanks for the quick reply :)

tbh, I don't mind splitting these unique examples down the middle. For example, for the genre "Fitness, Diet & Nutrition," I'm okay if "Fitness" ends up as a separate genre. It might even help if I'm searching for just "Fitness," as it would show up in that category instead of only under "Fitness, Diet & Nutrition," which might be specific to Audible.

I'm thinking a possible (ugly) idea might be to special-case the comma-containing genres you mentioned, specifically the ones from Audible:

For genre tags that don't contain one of those genres, just use ',' as a regular separator. For tags that do, maybe remove the matching substring from the genreTag, split the rest by ',', and insert the removed genre back afterwards (or something like it but cleaner)?
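
A rough sketch of one way to realize this idea, assuming a hard-coded list of known comma-containing genres. Instead of removing substrings, it splits on commas first and then re-joins runs of tokens that match a known genre, which avoids matching across genre boundaries. The names `KNOWN_COMMA_GENRES` and `split_genres` are hypothetical and not part of Audiobookshelf, and the list holds only the two names mentioned in this thread.

```python
# Hypothetical allow-list; only the two names mentioned in this thread.
KNOWN_COMMA_GENRES = [
    "Mystery, Thriller & Suspense",
    "Fitness, Diet & Nutrition",
]


def split_genres(genre_tag, known=KNOWN_COMMA_GENRES):
    """Split a comma-separated tag, keeping known comma-containing genres whole."""
    tokens = [t.strip() for t in genre_tag.split(",") if t.strip()]
    # Map each known genre's comma-split form back to its full name.
    known_tokens = {tuple(p.strip() for p in k.split(",")): k for k in known}
    max_len = max((len(k) for k in known_tokens), default=1)

    result = []
    i = 0
    while i < len(tokens):
        # Try the longest possible run of tokens first.
        for n in range(max_len, 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in known_tokens:
                result.append(known_tokens[candidate])
                i += len(candidate)
                break
        else:
            # No known genre starts here: treat the token as its own genre.
            result.append(tokens[i])
            i += 1
    return result


print(split_genres("Mystery, Thriller & Suspense, Science Fiction & Fantasy"))
# ['Mystery, Thriller & Suspense', 'Science Fiction & Fantasy']
```

A tag with no known genre, such as the Cosmos example from the issue description, is simply split on every comma.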

Thanks!

@mikiher
Contributor

mikiher commented Jul 9, 2024

In the meantime, until this is resolved, running a match in Audiobookshelf with Audible.com as provider will get this fixed for you effortlessly.

@NitzanNougat
Author

I updated to the newest version of ABS and ran a match.
Afterward, I noticed that the genres are still the same. Do I need to delete all the genres and run a match again?

Anyhow, I noticed that book tags are separated by commas, though I didn't check this before the update.
And tbh, searching by tags instead of genres works well enough for me.

@mikiher
Contributor

mikiher commented Jul 9, 2024

In Audiobookshelf Settings, there's an option called "Prefer matched metadata". Turn that option on, and then matching will override existing metadata.

@NitzanNougat
Author

NitzanNougat commented Jul 9, 2024

Great, it has overridden the previous genres, and it didn't split up genres like Mystery, Thriller & Suspense.

FYI, I have found only 1 genre that it didn't split up: [Wars & Conflicts, Greece, Civilization],
which should be 3 separate genres, but that is a minor edge case.

Really appreciate the quick help!

@mikiher
Contributor

mikiher commented Jul 9, 2024

@advplyr going back to the original discussion - from my perspective, we're trying to get as much data as possible from the input audio file, with the highest accuracy possible.

With that view in mind, what I'm trying to do is to get genres from a Libation-encoded audio file with ~94% accuracy (given the stats we have from the Audible category page), instead of getting them wrong almost every time there's more than one genre. To check this, I looked at the Libation export data from my own Audible library. The library contains 451 books, of which 374 have more than 1 genre. This means that accuracy using the current scanning algorithm would be ~((451-374)/451)=~17%.

So I'm trying to trade 17% accuracy for 94% accuracy. Plus, I'm willing to scrape all genres containing a comma from Audible (I don't think their list of genres is very dynamic), and match against these, so we're 100% accurate on Libation-encoded books.
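
For reference, the arithmetic behind the two figures can be reproduced as follows. This only restates the numbers quoted in this thread. Note that the ~94% figure is the share of scraped genre names without a comma, so the per-book accuracy would also depend on how often the comma-containing genres occur, which is not measured here.

```python
# Numbers quoted in this thread.
total_books = 451          # books in the Libation export
multi_genre_books = 374    # books with more than one genre
total_genres = 212         # unique genres scraped from the Audible category page
genres_with_comma = 13     # of those, genres containing a comma

# Current scanner: only single-genre books are parsed correctly.
current_accuracy = (total_books - multi_genre_books) / total_books

# Share of genre names that a plain comma split would leave intact.
comma_free_share = 1 - genres_with_comma / total_genres

print(f"{current_accuracy:.1%}")   # 17.1%
print(f"{comma_free_share:.1%}")   # 93.9%
```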

Does this make sense?

@mikiher
Contributor

mikiher commented Jul 9, 2024

Great, it has overridden the previous genres, and it didn't split up genres like Mystery, Thriller & Suspense.

Yes, that's expected. The provider we use returns genres one by one, not as a comma-separated list, so we can tell the genres for sure.

FYI, I have found only 1 genre that it didn't split up: [Wars & Conflicts, Greece, Civilization], which should be 3 separate genres, but that is a minor edge case.

Can you tell me the book name and author for which this happened?

Really appreciate the quick help!

@NitzanNougat
Author

A War Like No Other
How the Athenians and Spartans Fought the Peloponnesian War
By: Victor Davis Hanson

@advplyr
Owner

advplyr commented Jul 9, 2024

I think it will be confusing if we split on comma-separated lists but leave the Audible genres with commas. Has anyone opened an issue with this software, which seems to be the only one embedding genres with commas?
The algorithm should be straightforward about which delimiters we support. I don't mind splitting those Audible genres up personally, but we may have other users using commas in their genres. I can ask in the Discord.

@DaeDroug

DaeDroug commented Jan 9, 2025

Per the author of Libation on that recently mentioned issue:
“This actually comes directly from audible; I don't touch 'genre'.

I looked through my code and the code of the audio library I use and nowhere is genre manipulated. Evidently it wasn't me after all. I was thinking of "Categories."”

@advplyr
Owner

advplyr commented Jan 9, 2025

What recent issue are you referring to?

Audible uses commas in their genres which is one of the reasons why we don't embed a comma separated list in the audio file. We use a semicolon delimiter instead so that the genres that have commas remain intact.
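
As a small illustration of why a semicolon delimiter keeps such genres intact, here is a hedged round-trip sketch. The exact formatting Audiobookshelf uses when embedding (for example, whether a space follows the semicolon) is an assumption here, and the variable names are hypothetical.

```python
genres = ["Mystery, Thriller & Suspense", "Science Fiction & Fantasy"]

# Assumption: genres are embedded as one string joined with "; ".
embedded = "; ".join(genres)

# Splitting on the semicolon recovers the original list, commas included.
parsed = [g.strip() for g in embedded.split(";") if g.strip()]

print(parsed == genres)  # True
```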

@DaeDroug

I mean the recent issue in which this issue was mentioned by CLHatch, just above my comment. Apparently Audible itself uses commas to separate genres in their metadata, despite the fact that some of their genres contain commas. This would mean that even if someone else created a similar program to convert Audible content, this would still be an issue unless they built something to deal with the commas.

@advplyr
Owner

advplyr commented Jan 13, 2025

That's not accurate. The audible API gives an array:
https://api.audible.com/1.0/catalog/products/B002V5A12Y?response_groups=category_ladders

"category_ladders": [
      {
        "ladder": [
          {
            "id": "18580715011",
            "name": "Teen & Young Adult"
          },
          {
            "id": "18580894011",
            "name": "Literature & Fiction"
          },
          {
            "id": "18580902011",
            "name": "Classics"
          }
        ],
        "root": "Genres"
      },
      {
        "ladder": [
          {
            "id": "18580715011",
            "name": "Teen & Young Adult"
          },
          {
            "id": "18581048011",
            "name": "Science Fiction & Fantasy"
          },
          {
            "id": "18581062011",
            "name": "Science Fiction"
          },
          {
            "id": "18581071011",
            "name": "Space Opera"
          }
        ],
        "root": "Genres"
      }
    ]
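
A minimal sketch of reading genre names out of a response shaped like the one above, where each name arrives as its own array element and no delimiter parsing is needed. The data is a trimmed copy of the excerpt. Flattening every ladder level into one de-duplicated list is an assumption made for illustration, not a description of how Audiobookshelf or any provider maps these levels, and `names_from_ladders` is a hypothetical helper.

```python
# Trimmed copy of the response excerpt above.
response = {
    "category_ladders": [
        {
            "ladder": [
                {"id": "18580715011", "name": "Teen & Young Adult"},
                {"id": "18580894011", "name": "Literature & Fiction"},
                {"id": "18580902011", "name": "Classics"},
            ],
            "root": "Genres",
        },
        {
            "ladder": [
                {"id": "18580715011", "name": "Teen & Young Adult"},
                {"id": "18581048011", "name": "Science Fiction & Fantasy"},
                {"id": "18581062011", "name": "Science Fiction"},
                {"id": "18581071011", "name": "Space Opera"},
            ],
            "root": "Genres",
        },
    ]
}


def names_from_ladders(ladders):
    """Collect category names in order, skipping duplicates."""
    names = []
    for ladder in ladders:
        if ladder.get("root") != "Genres":
            continue
        for node in ladder["ladder"]:
            if node["name"] not in names:
                names.append(node["name"])
    return names


print(names_from_ladders(response["category_ladders"]))
```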

@advplyr
Owner

advplyr commented Jan 13, 2025

I'm actually not sure how Audible is using meta tags in their audio files, so it could be that they are embedding genres in there with commas. I would find that surprising, though, since those audio files have a DRM wrapper, so why embed meta tags?

What I meant was that it is inaccurate to say Audible uses commas in their API, which doesn't appear to be what was being said in the original comment.

@DaeDroug

I'm actually not sure how Audible is using meta tags in their audio files, so it could be that they are embedding genres in there with commas. I would find that surprising, though, since those audio files have a DRM wrapper, so why embed meta tags?

What I meant was that it is inaccurate to say Audible uses commas in their API, which doesn't appear to be what was being said in the original comment.

To answer your question here's another excerpt from that issue mentioned from the author of libation:

"If it's embedded in the file (ie: id3 metadata tags) then it just means that audible populates these in the audio file, encrypts the file, and they're still there when I decrypt."
