[Enhancement] Support parsing comma separated genres #3127
Comments
Just to be clear - this has nothing to do with csv. Anyway, I did reproduce the behavior you describe, and I'll try to fix it. |
Comma was intentionally left out when I set this up a few years ago. I believe that some genres from Audible have commas in them so if we split on comma then it would break those genres. We should confirm this before adding comma, it may not actually be an issue but I remember intentionally leaving comma out. |
Found an example: https://api.audnex.us/books/B01CUKULGA
"genres": [
  {
    "asin": "18574597011",
    "name": "Mystery, Thriller & Suspense",
    "type": "genre"
  },
  {
    "asin": "18580606011",
    "name": "Science Fiction & Fantasy",
    "type": "genre"
  },
  {
    "asin": "18574621011",
    "name": "Thriller & Suspense",
    "type": "tag"
  }
] |
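To make the concern concrete, here is a minimal Python sketch of what a plain comma split would do to the genre names in the response above. This is not Audiobookshelf's actual parser; `naive_split` is a hypothetical helper written only for illustration.

```python
# Genre names copied from the audnex response above.
genres = [
    "Mystery, Thriller & Suspense",
    "Science Fiction & Fantasy",
    "Thriller & Suspense",
]


def naive_split(tag: str) -> list[str]:
    """Split a genre tag on commas and trim whitespace (hypothetical helper)."""
    return [part.strip() for part in tag.split(",") if part.strip()]


for name in genres:
    # Names without a comma come back unchanged; names with one are broken up.
    print(name, "->", naive_split(name))
```

Only the first name is affected: it comes back as two entries, "Mystery" and "Thriller & Suspense", and the second of those happens to coincide with a separate tag in the same response.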
We cannot ignore, though, a quite significant data source (Libation) that seems to always put commas between genres.
Between getting all Libation multi-genre tags wrong (which also pollutes the genres data in ABS) and occasionally splitting a genre by mistake, the latter seems preferable.
But let me first think about whether some heuristic would let us have our cake and eat it too.
|
This issue has been brought up before with Libation (#2539). Even though there is no official spec for delimiters between multiple genres, the ones we use are pretty widely adopted, and I'm not aware of any meta tagging software that supports comma. As far as data sources go, I would guess Audible is the vast majority. I'm not opposed to supporting comma delimiters if it can be non-disruptive, but this is certainly not a bug. |
So, just to have some data points about this: Audible has a page that shows its Level 1 and 2 categories (which are used as genres in metadata). These aren't all the genres, since there are also some lower-level categories that don't appear on this page, but I think it gives some notion of what Audible genres look like. I scraped the data into a Google sheet and ran a couple of stats. Out of 212 unique genres, 13 contain a comma (~6%). All of the ones containing a comma are of the form "A, B & C". I'm not sure exactly what to do with this info yet, just wanted to share. |
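One way to read the "A, B & C" observation is as a pattern that can be tested for. The sketch below is only an illustration of that idea, not something proposed in the thread: the regular expression and the name `looks_like_audible_list` are assumptions.

```python
import re

# Assumed pattern for names shaped like "A, B & C": at least one
# comma-separated part followed by a final "X & Y" pair.
LIST_WITH_AMPERSAND = re.compile(r"^[^,&]+(?:, [^,&]+)+ & [^,&]+$")


def looks_like_audible_list(name: str) -> bool:
    """Return True when the whole string has the "A, B & C" shape."""
    return LIST_WITH_AMPERSAND.match(name) is not None


print(looks_like_audible_list("Mystery, Thriller & Suspense"))
print(looks_like_audible_list("Astronomy, Cosmology, Physics"))
```

A check like this only says that a string could be a single Audible-style genre. It cannot tell a real genre apart from a list whose last entry happens to contain an ampersand.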
Hi, thanks for the quick reply :) tbh, I don't mind splitting these unique examples down the middle. For example, for the genre "Fitness, Diet & Nutrition," I'm okay if "Fitness" ends up as a separate genre. It might even help if I'm searching for just "Fitness," as it would show up in that category instead of only under "Fitness, Diet & Nutrition," which might be specific to Audible. A possible (ugly) idea might be to check for the unique cases you mentioned, specifically from Audible: for tags that don't contain one of the unique genres, just use ',' as a regular separator. For tags that do, maybe remove the unique genre's substring from the genreTag, separate the rest by ',', and insert the unique genre afterwards (or something like it but cleaner)? Thanks! |
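The idea in the comment above could be sketched roughly as follows. This is an illustration in Python, not Audiobookshelf code: `KNOWN_COMMA_GENRES` is a hypothetical two-entry list standing in for the scraped Audible genres, and `split_genres` is an assumed name.

```python
# Hypothetical list; a real one would come from the scraped Audible categories.
KNOWN_COMMA_GENRES = [
    "Mystery, Thriller & Suspense",
    "Fitness, Diet & Nutrition",
]


def split_genres(tag: str, known: list[str] = KNOWN_COMMA_GENRES) -> list[str]:
    """Split on commas while keeping known comma-containing genres intact."""
    protected = []
    # Check longer names first so a longer genre is not shadowed by a shorter one.
    for genre in sorted(known, key=len, reverse=True):
        if genre in tag:
            protected.append(genre)
            tag = tag.replace(genre, "")
    # Whatever is left is treated as an ordinary comma-separated list.
    rest = [part.strip() for part in tag.split(",") if part.strip()]
    # The protected genres are re-inserted after the split, as suggested above.
    return rest + protected


print(split_genres("Mystery, Thriller & Suspense, Science Fiction & Fantasy"))
```

Note that this version does not preserve the original order of the genres, since the protected names are appended at the end.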
In the meantime, until this is resolved, running a match in Audiobookshelf with Audible.com as provider will get this fixed for you effortlessly. |
I updated to the newest version of ABS and ran a match. Anyhow, I noticed that book tags are separated by commas, though I didn't check this before the update. |
In Audiobookshelf Settings, there's an option called "Prefer matched metadata". Turn that option on, and then matching will override existing metadata. |
Great, it has overridden the previous genres, and it didn't split up genres like Mystery, Thriller & Suspense. FYI, I have found only 1 genre that it didn't split up: [Wars & Conflicts, Greece, Civilization]. Really appreciate the quick help! |
@advplyr going back to the original discussion - from my perspective, we're trying to get as much data as possible from the input audio file, with the highest accuracy possible. With that view in mind, what I'm trying to do is to get genres from Libation-encoded audio files with ~94% accuracy (given the stats we have from the Audible category page), instead of getting them wrong almost every time there's more than one genre. To check this, I looked at the Libation export data from my own Audible library. The library contains 451 books, of which 374 have more than 1 genre. This means that accuracy using the current scanning algorithm would be ~((451-374)/451)=~17%. So I'm trying to trade 17% accuracy for 94% accuracy. Plus, I'm willing to scrape all genres containing a comma from Audible (I don't think their list of genres is very dynamic), and match against these, so we're 100% accurate on Libation-encoded books. Does this make sense? |
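For anyone who wants to check the arithmetic, the two figures can be reproduced from the numbers quoted in this thread. The variable names are made up for this sketch. Note also that the ~94% figure is the share of genres without a comma, used here as a rough proxy for per-book accuracy.

```python
# Numbers quoted in the thread.
total_books = 451          # books in the Libation export
multi_genre_books = 374    # books with more than one genre
unique_genres = 212        # genres on Audible's category page
genres_with_comma = 13     # of those, genres containing a comma

# Current scanner: only single-genre books are parsed correctly.
current_accuracy = (total_books - multi_genre_books) / total_books

# Comma splitting: genres without a comma survive the split intact.
comma_free_share = (unique_genres - genres_with_comma) / unique_genres

print(f"current: {current_accuracy:.1%}, comma-free genres: {comma_free_share:.1%}")
```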
Yes, that's expected. The provider we use returns genres one by one, not as a comma-separated list, so we can tell the genres for sure.
Can you tell me the book name and author for which this happened?
|
A War Like No Other |
I think it will be confusing if we split on comma-separated lists but leave the Audible genres with commas. Has anyone opened an issue with this software that is the only one embedding genres with commas? |
Per the author of Libation on that recently mentioned issue: "I looked through my code and the code of the audio library I use and nowhere is genre manipulated. Evidently it wasn't me after all. I was thinking of 'Categories.'" |
What recent issue are you referring to? Audible uses commas in their genres which is one of the reasons why we don't embed a comma separated list in the audio file. We use a semicolon delimiter instead so that the genres that have commas remain intact. |
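As a rough illustration of why a semicolon delimiter sidesteps the problem, here is a small round-trip sketch in Python. It is not the code Audiobookshelf uses when embedding metadata; `embed_genres` and `parse_genres` are hypothetical names.

```python
DELIMITER = ";"  # the semicolon delimiter described in the comment above


def embed_genres(genres: list[str]) -> str:
    """Join genres into a single tag value (hypothetical helper)."""
    return DELIMITER.join(genres)


def parse_genres(tag: str) -> list[str]:
    """Split a tag value back into genres, trimming whitespace."""
    return [part.strip() for part in tag.split(DELIMITER) if part.strip()]


original = ["Mystery, Thriller & Suspense", "Science Fiction & Fantasy"]
tag = embed_genres(original)
# Commas inside a genre name survive because they are not the delimiter.
print(tag, "->", parse_genres(tag))
```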
The recent issue I mean is the one that mentions this issue, linked by CLHatch just above my comment. Apparently Audible itself uses commas to separate genres in its metadata, despite the fact that some of its genres contain commas. This would mean that even if someone else created a similar program to convert Audible content, this would still be an issue unless they built something to deal with the commas. |
That's not accurate. The audible API gives an array:
"category_ladders": [
  {
    "ladder": [
      {
        "id": "18580715011",
        "name": "Teen & Young Adult"
      },
      {
        "id": "18580894011",
        "name": "Literature & Fiction"
      },
      {
        "id": "18580902011",
        "name": "Classics"
      }
    ],
    "root": "Genres"
  },
  {
    "ladder": [
      {
        "id": "18580715011",
        "name": "Teen & Young Adult"
      },
      {
        "id": "18581048011",
        "name": "Science Fiction & Fantasy"
      },
      {
        "id": "18581062011",
        "name": "Science Fiction"
      },
      {
        "id": "18581071011",
        "name": "Space Opera"
      }
    ],
    "root": "Genres"
  }
] |
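Because the API returns each category as its own object, a consumer can collect genre names without any delimiter parsing. The Python sketch below assumes the fragment above is wrapped in an enclosing object; `ladder_genres` is a hypothetical helper, and the data is an abbreviated copy of the excerpt.

```python
# Abbreviated copy of the excerpt above, wrapped in an enclosing object.
response = {
    "category_ladders": [
        {
            "ladder": [
                {"id": "18580715011", "name": "Teen & Young Adult"},
                {"id": "18580894011", "name": "Literature & Fiction"},
                {"id": "18580902011", "name": "Classics"},
            ],
            "root": "Genres",
        },
        {
            "ladder": [
                {"id": "18580715011", "name": "Teen & Young Adult"},
                {"id": "18581048011", "name": "Science Fiction & Fantasy"},
                {"id": "18581062011", "name": "Science Fiction"},
                {"id": "18581071011", "name": "Space Opera"},
            ],
            "root": "Genres",
        },
    ]
}


def ladder_genres(data: dict) -> list[str]:
    """Collect unique category names in first-seen order (hypothetical helper)."""
    names: list[str] = []
    for ladder in data.get("category_ladders", []):
        for node in ladder.get("ladder", []):
            if node["name"] not in names:
                names.append(node["name"])
    return names


print(ladder_genres(response))
```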
I'm actually not sure how Audible is using meta tags in their audio files, so it could be that they are embedding genres in there with commas. I would find that surprising, though, since those audio files have a DRM wrapper, so why embed meta tags? What I meant was that it is inaccurate to say Audible uses commas in their API, which doesn't appear to be what was being said in the original comment. |
To answer your question, here's another excerpt from that issue, from the author of Libation: "If it's embedded in the file (ie: id3 metadata tags) then it just means that audible populates these in the audio file, encrypts the file, and they're still there when I decrypt." |
What happened?
Most of my audiobooks are sourced from Audible by Libation.
When initially downloading these books, I did so without metadata.
However, for those downloaded with metadata, it is in CSV format rather than JSON (I have no idea if that would make a difference).
The issue arises in both cases, with and without metadata:
For example, for the book Cosmos by Carl Sagan, the genres are currently listed as:
Genres: "Astronomy, Cosmology, Biological Sciences, Atmospheric Sciences, Physics"
The entire genre string is treated as a single genre, making it impossible to search for the book by individual genres.
What did you expect to happen?
I would expect the genres to be parsed as individual entries:
Genres: "Astronomy", "Cosmology", "Biological Sciences", "Atmospheric Sciences", "Physics"
Each genre should be recognized as a separate entry, so that filtering or searching by genre gives accurate results.
Steps to reproduce the issue
Import an Audible audiobook by Libation without metadata/with .csv metadata
Install ABS v2.10.1 via docker compose.
Scan the new libraries.
Audiobookshelf version
v2.10.1
How are you running audiobookshelf?
Docker
What OS is your Audiobookshelf server hosted from?
Linux
If the issue is being seen in the UI, what browsers are you seeing the problem on?
None
Logs
Additional Notes
No response