url: https://dumps.wikimedia.org/enwiki/latest/
Note: I trained the model on enwiki-latest-pages-articles.xml.bz2, but many other dumps are available at the URL above. After extraction, the data file is around 11 GB.
Command: python wikidata_normalize.py enwiki-latest-pages-articles.xml.bz2 wiki.text
Note: This step took me 7 to 8 hours, running on an AWS EC2 instance with 2 GB of memory.
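The repository's wikidata_normalize.py is not reproduced here; below is a minimal sketch of what such a normalization step can look like, assuming gensim's WikiCorpus is used to strip the wiki markup and write one plain-text article per line. The file name and the choice of gensim are assumptions, not the author's confirmed implementation.

    # normalize_wiki_sketch.py -- hypothetical equivalent of wikidata_normalize.py
    # Streams the compressed XML dump and writes one tokenized article per line.
    import sys
    from gensim.corpora import WikiCorpus

    def normalize(input_bz2, output_txt):
        # dictionary={} skips building a vocabulary; we only want the token stream
        wiki = WikiCorpus(input_bz2, dictionary={})
        with open(output_txt, "w", encoding="utf-8") as out:
            for i, tokens in enumerate(wiki.get_texts()):
                out.write(" ".join(tokens) + "\n")
                if (i + 1) % 10000 == 0:
                    print(f"processed {i + 1} articles", file=sys.stderr)

    if __name__ == "__main__":
        normalize(sys.argv[1], sys.argv[2])

Usage would mirror the command above: python normalize_wiki_sketch.py enwiki-latest-pages-articles.xml.bz2 wiki.text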
Command: python word2vec_model.py wiki.text wiki.model
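word2vec_model.py is likewise not shown in this section; the following sketch assumes gensim's Word2Vec trained over the normalized one-article-per-line file via LineSentence. The hyperparameters are illustrative defaults, not the values used for the published model.

    # train_word2vec_sketch.py -- hypothetical equivalent of word2vec_model.py
    # Trains a Word2Vec model on the normalized plain-text corpus and saves it to disk.
    import sys
    import multiprocessing
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    def train(input_txt, output_model):
        model = Word2Vec(
            LineSentence(input_txt),  # streams the corpus line by line
            vector_size=200,          # embedding dimensionality (gensim >= 4.0; use size= on 3.x)
            window=5,                 # context window
            min_count=5,              # drop rare tokens
            workers=multiprocessing.cpu_count(),
        )
        model.save(output_model)      # reload later with Word2Vec.load(output_model)

    if __name__ == "__main__":
        train(sys.argv[1], sys.argv[2])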
It goes without saying that if you have data more closely targeted to your use case, training on it will give much better results.