Release Topic Clustering Using Doc2Vec · minasmz/Persian-Summarization

Running this code on an arbitrary input text file (input.txt) clusters the document's paragraphs by topic and returns a summary of each cluster.
To do this, train a doc2vec model on the paragraphs of a training set and place it in the project under the name my_model_parags_from_wikiAggregate.doc2vec. The model is then used to obtain a vector for each input paragraph, and the cosine similarity between each pair of consecutive paragraphs is calculated. If the similarity is higher than a calculated threshold, the two paragraphs are assigned to the same cluster. Each cluster is then summarized separately, so the summary does not miss the important topics of the input text.
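The consecutive-paragraph clustering step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, the paragraph vectors are passed in as plain lists of floats (in practice they would presumably come from the trained doc2vec model, for example via gensim's `Doc2Vec.load` and `infer_vector`), and because the release notes do not say how the threshold is calculated, the sketch assumes the mean of the consecutive similarities when no threshold is given.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_consecutive(
    vectors: Sequence[Sequence[float]],
    threshold: Optional[float] = None,
) -> List[List[int]]:
    """Group paragraph indices into clusters of consecutive, similar paragraphs.

    Paragraph i joins the cluster of paragraph i - 1 when their cosine
    similarity is above the threshold; otherwise it starts a new cluster.
    """
    if not vectors:
        return []

    # Similarity between each paragraph and the one that follows it.
    sims = [
        cosine_similarity(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]

    # ASSUMPTION: the source only says the threshold is "calculated";
    # the mean consecutive similarity is used here as a stand-in.
    if threshold is None:
        threshold = sum(sims) / len(sims) if sims else 0.0

    clusters: List[List[int]] = [[0]]
    for i, sim in enumerate(sims, start=1):
        if sim > threshold:
            clusters[-1].append(i)   # same topic as the previous paragraph
        else:
            clusters.append([i])     # topic changed: open a new cluster
    return clusters
```

Each returned cluster is a list of paragraph indices, so the paragraphs of one cluster can be joined and handed to the summarizer separately from the others.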
