Duplicate check is broken in certain cases of api autofill #72

Open
CptPie opened this issue Aug 16, 2020 · 4 comments
Labels
bug Something isn't working stream-issue An issue that would be nice to work on on stream.

Comments

@CptPie
Collaborator

CptPie commented Aug 16, 2020

https://moviepolls.zorchenhimer.com/movie/82 is available for voting, but https://moviepolls.zorchenhimer.com/movie/41 was watched thirteen months and two days ago? - abridgewater on Discord

In the reported case the movie "Princess Mononoke" was added twice and got past the duplicate check.
This happened because one entry was added using MAL autofill and the other using IMDB autofill.
Since the autofilled title is formatted differently depending on the API used, the titles didn't match and the duplicate check did not hit.

Possible solutions:

  • Modify the duplicate check to ignore the added parentheses (might be troublesome in this exact case [English vs. Japanese title])
  • Bring the title formatting of the APIs in line (might be impossible depending on the API data [i.e. I am not sure whether IMDB has both the Japanese and the English title])
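
A minimal sketch of the first option, assuming the autofilled titles only differ in their parenthesized suffixes. `normalizeTitle` is a hypothetical helper and the example titles are made up for illustration; they are not the exact strings the autofill produces.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// parenGroup matches one parenthesized group, e.g. "(Mononoke Hime)" or "(1997)".
var parenGroup = regexp.MustCompile(`\([^()]*\)`)

// normalizeTitle is a hypothetical helper: it drops parenthesized groups,
// lowercases the rest and collapses runs of whitespace.
func normalizeTitle(title string) string {
	stripped := parenGroup.ReplaceAllString(title, " ")
	return strings.Join(strings.Fields(strings.ToLower(stripped)), " ")
}

func main() {
	// Assumed title formats, for illustration only.
	a := normalizeTitle("Princess Mononoke (1997)")
	b := normalizeTitle("Princess Mononoke (Mononoke Hime) (1997)")
	fmt.Println(a == b)
}
```

This would not help when one API puts the Japanese title in front of the parentheses and the other puts the English one there, which is the caveat noted in the first bullet.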
@CptPie CptPie added the bug Something isn't working label Aug 16, 2020
@zorchenhimer
Owner

zorchenhimer commented Aug 16, 2020

Are multiple titles returned for either IMDB or MAL? I know AniDB returns multiple titles. Maybe we can store those and use them for the duplicate check?

Other than that, there would need to be some normalization of characters or a similarity check for titles. Maybe some sort of string metric for checking similar strings? (see https://en.wikipedia.org/wiki/String_metric). If a title is close enough to an existing one, we could prompt for confirmation or require mod/admin approval. Although this would probably break with sequels, e.g. "Deadpool" and "Deadpool 2" being only one character different (two including the space).
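
As a rough sketch of the string metric idea, here is a plain Levenshtein edit distance. It is just one possible metric, picked for illustration, and `levenshtein` and `min3` are hypothetical helper names.

```go
package main

import "fmt"

// levenshtein returns the edit distance between a and b, counted in runes.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	fmt.Println(levenshtein("Deadpool", "Deadpool 2")) // 2
}
```

The sequel pair scores a distance of 2, so any threshold loose enough to catch real near-duplicates would also flag it. That is why a prompt or mod approval seems safer than rejecting the entry outright.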

@CptPie
Collaborator Author

CptPie commented Aug 16, 2020

Regarding the API results:
Jikan (there is no guarantee that both title and title_english are filled; when the original title is already English, the "title_english" field is null):
[image: Jikan response]
TMDb:
[image: TMDb response]
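
A small sketch of how the nullable "title_english" field could be handled when decoding. Only the two field names mentioned above are taken from the thread; the `jikanTitles` type, the `candidateTitles` helper and the JSON payload in `main` are hand-written assumptions, not a real API response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jikanTitles holds only the two fields discussed here.
type jikanTitles struct {
	Title        string  `json:"title"`
	TitleEnglish *string `json:"title_english"` // nil when the API sends null
}

// candidateTitles returns every distinct, non-empty title in the payload.
func candidateTitles(payload []byte) ([]string, error) {
	var t jikanTitles
	if err := json.Unmarshal(payload, &t); err != nil {
		return nil, err
	}
	titles := []string{}
	if t.Title != "" {
		titles = append(titles, t.Title)
	}
	if t.TitleEnglish != nil && *t.TitleEnglish != "" && *t.TitleEnglish != t.Title {
		titles = append(titles, *t.TitleEnglish)
	}
	return titles, nil
}

func main() {
	// Hand-written sample payload, not copied from the API.
	sample := []byte(`{"title": "Mononoke Hime", "title_english": "Princess Mononoke"}`)
	titles, err := candidateTitles(sample)
	fmt.Println(titles, err)
}
```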

@zorchenhimer
Owner

So storing a single title for display then a bunch of alt titles is plausible then.
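
Roughly what that could look like, as a sketch only: `MovieTitles`, `keys` and `sharesTitle` are hypothetical names, and matching is a plain case-insensitive comparison across all stored titles.

```go
package main

import (
	"fmt"
	"strings"
)

// MovieTitles is a hypothetical shape: one display title plus alternates.
type MovieTitles struct {
	Title     string
	AltTitles []string
}

// keys returns the lowercased, trimmed set of all titles of m.
func (m MovieTitles) keys() map[string]struct{} {
	keys := make(map[string]struct{})
	for _, t := range append([]string{m.Title}, m.AltTitles...) {
		k := strings.ToLower(strings.TrimSpace(t))
		if k != "" {
			keys[k] = struct{}{}
		}
	}
	return keys
}

// sharesTitle reports whether a and b have at least one title in common.
func sharesTitle(a, b MovieTitles) bool {
	bk := b.keys()
	for k := range a.keys() {
		if _, ok := bk[k]; ok {
			return true
		}
	}
	return false
}

func main() {
	a := MovieTitles{Title: "Princess Mononoke", AltTitles: []string{"Mononoke Hime"}}
	b := MovieTitles{Title: "Mononoke Hime"}
	fmt.Println(sharesTitle(a, b))
}
```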

@CptPie
Collaborator Author

CptPie commented Aug 16, 2020

In theory, yes. But I am worried about the data quality of TMDb: its original title is in kanji (?) while the MAL title is in Latin script, so that won't help us much.

Regardless, I think it would be nice to have an "improved" movie struct with

Title string
Org_Title string
Year string (or int)

and then use a common title format for both APIs (i.e. "Movie.Title (Movie.Org_Title) (Movie.Year)" ).

With this struct we could assume that Movie.Title is always the English title (whenever possible) and use that field for the duplicate check.
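
A sketch of that struct with the proposed format and a check on Movie.Title. The field is spelled `OrgTitle` here to follow Go naming conventions. `DisplayTitle` and `isDuplicate` are hypothetical names, and leaving out the original title when it is empty or identical to the title is my own assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Movie follows the struct proposed above; Year is an int in this sketch.
type Movie struct {
	Title    string // English title whenever possible
	OrgTitle string // original title, may be empty
	Year     int
}

// DisplayTitle renders "Title (OrgTitle) (Year)". The original title is
// left out when it is empty or equal to Title (an assumption, see above).
func (m Movie) DisplayTitle() string {
	if m.OrgTitle == "" || strings.EqualFold(m.OrgTitle, m.Title) {
		return fmt.Sprintf("%s (%d)", m.Title, m.Year)
	}
	return fmt.Sprintf("%s (%s) (%d)", m.Title, m.OrgTitle, m.Year)
}

// isDuplicate compares only the Title field, ignoring case and
// surrounding whitespace.
func isDuplicate(a, b Movie) bool {
	return strings.EqualFold(strings.TrimSpace(a.Title), strings.TrimSpace(b.Title))
}

func main() {
	m := Movie{Title: "Princess Mononoke", OrgTitle: "Mononoke Hime", Year: 1997}
	fmt.Println(m.DisplayTitle())
}
```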

@CptPie CptPie added the stream-issue An issue that would be nice to work on on stream. label Sep 4, 2020