tf-idf example prints the right score but the wrong document #2

KatMaff · 2021-04-14T07:39:27Z

When I added more examples to the documents array in the tf-idf example, the wrong document was shown as the most similar. For me, with scikit-learn version 0.24.1, the cosine similarities don't include the input document, so the index 'i' is actually one less than the corresponding document in the documents array. Therefore the most similar document turns out to be documents[highest_score_index + 1].
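A minimal sketch of the fix described above, not the repository's actual example code. It assumes the input document is stored as `documents[0]` and that the similarities are computed against `documents[1:]` only; the helper name `most_similar` and the sample sentences are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def most_similar(documents):
    """Return (index into documents, score) of the document closest to documents[0].

    Hypothetical helper: assumes documents[0] is the input document.
    """
    tfidf = TfidfVectorizer().fit_transform(documents)
    # Compare the input document (row 0) against the remaining rows only,
    # so the resulting array has len(documents) - 1 entries.
    scores = cosine_similarity(tfidf[0:1], tfidf[1:]).flatten()
    highest_score_index = int(scores.argmax())
    # scores[i] belongs to documents[i + 1] because row 0 was left out,
    # so shift the index by one before looking up the document.
    return highest_score_index + 1, float(scores[highest_score_index])


documents = [
    "the cat sat on the mat",           # input document
    "stock prices fell sharply today",
    "a cat sat on a mat",
    "rain is expected tomorrow",
]
index, score = most_similar(documents)
print(documents[index], score)
```

Without the `+ 1`, the lookup in this sample would return `documents[1]` (the stock-price sentence) while still reporting the score that belongs to `documents[2]`, which matches the symptom in the issue title.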

eklavyadahiya · 2021-12-03T01:47:29Z

Thank you! I had the same issue and spent quite some time messing around until I saw your comment.
