tf-idf example prints the right score but the wrong document #2

Open
KatMaff opened this issue Apr 14, 2021 · 1 comment

Comments

@KatMaff

KatMaff commented Apr 14, 2021

When I added more examples to the documents array in the tf-idf example, the wrong document was reported as the most similar. For me, with scikit-learn 0.24.1, the cosine similarities don't include the input document, so the index 'i' into the similarity scores is one less than the index of the corresponding document in the documents array. The most similar document is therefore documents[highest_score_index + 1], not documents[highest_score_index].
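
To illustrate the offset, here is a minimal sketch. It assumes the example scores documents[0] against documents[1:], which is my reading of the behaviour and may not match the example's code exactly. The tf-idf helper is a pure-Python stand-in rather than scikit-learn, the sample documents are made up, and the names tfidf_vectors and cosine are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (L2-normalized tf-idf)."""
    # Assumption: naive lowercase/whitespace tokenization, for illustration only.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed idf: ln((1 + n) / (1 + df)) + 1
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(term, 0.0) for term, w in a.items())


# Hypothetical data: documents[0] is the input, the rest are candidates.
documents = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "rain is expected over the weekend",
    "a cat sat on a mat",
]

vectors = tfidf_vectors(documents)

# The scores cover documents[1:] only, so scores[i] belongs to documents[i + 1].
scores = [cosine(vectors[0], v) for v in vectors[1:]]
highest_score_index = max(range(len(scores)), key=scores.__getitem__)

wrong = documents[highest_score_index]      # off by one
right = documents[highest_score_index + 1]  # shifted back into documents

print("score:", round(scores[highest_score_index], 3))
print("wrong:", wrong)
print("right:", right)
```

The score is correct in both cases because it is read from scores; only the lookup into documents needs the + 1. An alternative that avoids the manual offset is to slice once, e.g. candidates = documents[1:], and index candidates with highest_score_index.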

@eklavyadahiya

Thank you! I had the same issue and spent quite some time messing around until I saw your comment.
