topic-analyzer

Group Project by Parvathy Mohan and Saranya Balaji for Python course Dependency: python -m nltk.downloader punkt

Step 1: Html Parsing: a) BeautifulSoap4 Library: BeautifulSoap4 Reference : http://www.pythonforbeginners.com/python-on-the-web/web-scraping-with-beautifulsoup/ commands: python -m pip install beautifulsoup4

b) Content Extraction: Library name - NewsPaper Reference: https://github.com/codelucas/newspaper Command: python -m pip install newspaper3k

Noun Chunker: Library: Spacy, Wikipedia For spacy, in windows, make sure that there Visual C++ 14 build tools installed. If not, install from here http://landinghub.visualstudio.com/visual-cpp-build-tools

Commands: python -m pip install -U spacy python -m spacy download en_core_web_sm python -m pip install wikipedia

Kmeans Clustering: Library: Numpy, Gensim python -m pip install numpy python -m pip install --upgrade gensim

Ranking: Library: requests, beautifulsoup4 commands: pip install beautifulsoup4 pip install requests reference : http://www.pythonforbeginners.com/beautifulsoup/beautifulsoup-4-python

Name		Name	Last commit message	Last commit date
Latest commit History 165 Commits
src		src
README.md		README.md
README.txt		README.txt

Provide feedback