song2vec is a Telegram bot that recommends YouTube songs through gensim's word2vec model.

About

Feature requests and bug reports are welcome; please open an issue.

Justin Bieber + Backstreet Boys + Ice Cube + Lil Jon = Justin Bieber with rappers.

Usage

COMMAND SYNTAX:
	Simply type /rec followed by a comma-separated list of artists.

EXAMPLE:
	/rec Metallica, Nirvana, Pink Floyd, Iron Maiden, Ice Cube, Bob Marley, Rolling Stones, U2
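
To make the syntax concrete, here is a minimal sketch of how a message like the one above could be split into artist names. The function name `parse_rec_command` is hypothetical and not taken from the bot's source; the handling of the `/rec@botname` form is an assumption based on how Telegram addresses commands in group chats.

```python
def parse_rec_command(text):
    """Return the artist names from a '/rec a, b, c' message.

    Hypothetical helper: returns [] when the message is not a /rec
    command or when no artists follow the command.
    """
    parts = text.strip().split(None, 1)
    # In group chats Telegram may send the command as '/rec@botname'.
    if not parts or parts[0].split("@")[0] != "/rec":
        return []
    if len(parts) == 1:
        return []
    # Split on commas and drop empty entries such as a trailing comma.
    return [name.strip() for name in parts[1].split(",") if name.strip()]


artists = parse_rec_command("/rec Metallica, Nirvana, Pink Floyd")
```

Multi-word names such as "Pink Floyd" survive because only commas separate artists.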

Installation

You can run song2vec from your own computer.

virtualenv song2vec_env -p `which python3.5`
cd ./song2vec_env/bin
source activate
./pip3.5 install datetime gensim numpy python-telegram-bot sympy yapi
cd ..
git clone https://github.com/ruanchaves/song2vec.git
cd ./song2vec/song2vec
bash install.sh

After that, you just have to edit settings.py with your YouTube and Telegram API keys. If you don't have them yet:

Then you can turn on the bot with:

python3.5 s2v_bot.py

Details

Currently the bot takes its recommendations from a gensim word2vec model, and that's all there is to it.

It's been trained on The Echo Nest Taste Profile Subset taken from the Million Song Dataset. The song IDs were matched to artist and title according to this file.
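
The README does not say how the dataset was fed to word2vec, so the following is only a sketch under one common assumption: each listener's history is treated as a "sentence" and each song ID as a "word". It relies on the Taste Profile triplet layout (user ID, song ID, play count, tab-separated); the function name `triplets_to_sentences` and the `min_songs` cutoff are hypothetical.

```python
from collections import defaultdict


def triplets_to_sentences(lines, min_songs=2):
    """Group 'user<TAB>song<TAB>play_count' lines into one song list per user.

    Assumption: a user's listening history plays the role of a sentence.
    Users with fewer than min_songs songs give word2vec no context, so
    they are dropped.
    """
    histories = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed lines
        user_id, song_id, _play_count = fields
        histories[user_id].append(song_id)
    return [songs for songs in histories.values() if len(songs) >= min_songs]


sample = ["u1\tS1\t3\n", "u1\tS2\t1\n", "u2\tS3\t5\n"]
sentences = triplets_to_sentences(sample)
```

The resulting list of lists is the shape that gensim's Word2Vec accepts as its training corpus.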

Some tricks I learned along the way:

  • This is not NLP, so we shouldn't use gensim's default parameters. Otherwise recommendations will be twice as bad.

  • Calling model.wv[word] for every word is painfully slow. It's much faster to do...

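      # Build the lookup table once, then index it like a plain dict.
      # (gensim 4 renamed these attributes to index_to_key and vectors.)
      model_words = model.wv.index2word
      model_vectors = model.wv.syn0
      model_dct = dict(zip(model_words, model_vectors))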
    

...and call model_dct[word]. It's in the source code.
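
As an illustration of what a dictionary like model_dct makes possible, here is a minimal sketch of the "artist + artist = recommendation" idea: add the vectors of the requested names and rank everything else by cosine similarity. This is an assumption about how such a lookup could be used, not the bot's actual code; `cosine`, `recommend`, and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(model_dct, liked, topn=3):
    """Rank names not in `liked` by similarity to the sum of the liked vectors."""
    known = [model_dct[name] for name in liked if name in model_dct]
    if not known:
        return []
    # Component-wise sum of all vectors the user asked for.
    query = [sum(component) for component in zip(*known)]
    scored = [
        (cosine(query, vec), name)
        for name, vec in model_dct.items()
        if name not in liked
    ]
    scored.sort(reverse=True)
    return [name for _score, name in scored[:topn]]


# Toy two-dimensional vectors standing in for real model vectors.
toy_dct = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
best = recommend(toy_dct, ["a"], topn=1)
```

Because every lookup is a plain dictionary access, the loop avoids the per-word overhead of model.wv[word] described above.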

TO-DO

  • train.py has to be rewritten as parallel code.
  • The model has to be further tested and fine-tuned to the dataset.