A Tri-Modal Video Genre Classification Dataset
0. Regroup for data loading - Friday
- Audio - LSTM (on manually extracted features) and 2D CNN (features learned by the CNN)
- Video - 3D CNN (already exists); tune hyperparameters, etc.
- Maybe text (optional) - train CNN, Transformer, LSTM.
- Speech to text
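
Rough sketch for the audio item, to show how the two planned inputs could differ in shape: per-frame hand-crafted features as a sequence for the LSTM, and the spectrogram as a 2D "image" for the CNN. Everything here is an assumption for illustration: the helper names (`log_spectrogram`, `frame_features`), the frame and hop sizes, and the choice of log-energy and spectral centroid as the manual features. The real pipeline would more likely use a dedicated audio library and richer features such as MFCCs.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a (num_frames, frame_len // 2 + 1) log-magnitude spectrogram.

    frame_len and hop are illustrative defaults, not values from the project.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)


def frame_features(spec):
    """Hand-crafted per-frame features: log-energy and spectral centroid (in bins)."""
    bins = np.arange(spec.shape[1])
    power = np.expm1(spec) ** 2  # undo log1p, then square the magnitude
    energy = power.sum(axis=1)
    centroid = (power * bins).sum(axis=1) / np.maximum(energy, 1e-12)
    return np.stack([np.log1p(energy), centroid], axis=1)


# Synthetic one-second 440 Hz tone stands in for a real audio track.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(tone)
lstm_input = frame_features(spec)      # shape (num_frames, 2): sequence of feature vectors
cnn_input = spec.T[np.newaxis, :, :]   # shape (1, freq_bins, num_frames): single-channel image
```

The LSTM branch sees a short feature vector per time step, while the 2D CNN branch gets the full time-frequency grid and learns its own features from it.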