Multi-modal-Deep-Learning

Recent Multi-modal Deep Learning Advances (list of papers and highlights).


Introduction

Prelude

Many recent advances use unified models (e.g., Transformers) to build representations for multiple modalities. Some also fuse modalities so that they inform one another. Here, modalities include not only natural language, vision, and speech, but also formal languages (e.g., code), (semi-)structured knowledge (e.g., tables, knowledge graphs), and biological/chemical compounds (e.g., proteins, molecules). This is a list of recent important papers in this field. Contributions are welcome.

Resources

Natural Language

Vision

Supervised Vision Tasks

Unsupervised Vision Representation Learning

Speech

Unsupervised Speech Representation Learning

Unsupervised Automatic Speech Recognition

Formal Language

Structured Knowledge

Table

Knowledge Graph

Retrieval Paragraphs as Knowledge

Biology and Chemistry

Protein

Molecular

Modality Fusion

Vision and Natural Language