I made this repo to record the papers I read every day and to organize them better.

Paper Reading Record in Computer Vision

Table of Contents

Knowledge Distillation

  • Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐⭐
  • Knowledge distillation (KD) transfers information learned from one model (the teacher) to another (the student)
  • The survey analyzes and categorizes existing KD methods by their student-teacher (S-T) structures for model compression and knowledge transfer, covering:
  • Distillation from one teacher
  • Data-free distillation
  • Online distillation
  • Cross-modal distillation
  • Semi/self-supervised learning
  • Structured Knowledge Distillation for Dense Prediction (ArXiv 2020) [Paper][Code] ⭐⭐⭐

  • Knowledge Adaptation for Efficient Semantic Segmentation (CVPR 2019) [Paper][Code] ⭐⭐

  • Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification (ECCV 2020) [Paper][Code] ⭐
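The soft-target idea behind the distillation papers above can be sketched in a few lines: the student is trained to match the teacher's temperature-softened output distribution. Below is a minimal, framework-free sketch of the Hinton-style distillation loss (pure Python for clarity; a real training loop would use a deep learning framework and add the usual cross-entropy term on ground-truth labels). The temperature value is illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs,
    scaled by T^2 so gradients keep a consistent magnitude across T."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # soft student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl
```

When the student's logits equal the teacher's, the loss is zero; it grows as the two output distributions diverge.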

Domain Adaptation

  • A Review of Single-Source Deep Unsupervised Visual Domain Adaptation (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐

  • A Comprehensive Survey on Transfer Learning (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐

  • Transfer Adaptation Learning: A Decade Survey (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐

  • Deep Visual Domain Adaptation: A Survey (Neurocomputing 2018) [Paper][Code] ⭐⭐

  • Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey (ArXiv 2018) [Paper][Code] ⭐⭐

  • DADA: Depth-Aware Domain Adaptation in Semantic Segmentation (ArXiv 2020) [Paper] [Code]

  • Multi-source Domain Adaptation for Semantic Segmentation (ArXiv 2020) [Paper] [Code]

  • A Robust Learning Approach to Domain Adaptive Object Detection (ICCV 2019) [Paper][Code]

Semi-Supervised Learning

  • Semi-Supervised Semantic Segmentation with Cross-Consistency Training (CVPR 2020) [Paper][Code]

  • Semi-Supervised Semantic Segmentation via Dynamic Self-Training and Class-Balanced Curriculum (ECCV 2020) [Paper][Code]

  • Guided Collaborative Training for Pixel-wise Semi-Supervised Learning (ECCV 2020) [Paper][Code]

  • FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning (ECCV 2020) [Paper][Code] ⭐⭐

Data Imbalance

  • Prime Sample Attention in Object Detection (CVPR 2020) [Paper][Code]

Multi-Task Learning

  • Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics (CVPR 2018) [Paper][Code]
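The uncertainty-weighting scheme named in the title above can be summarized as: each task loss is scaled by a learned precision 1/σ², with a log σ regularizer that stops the model from inflating σ to ignore a task. This is a minimal sketch of the regression form of that combination rule (the σ values would normally be learned parameters; here they are plain inputs for illustration):

```python
import math

def uncertainty_weighted_loss(task_losses, log_sigmas):
    """Combine per-task losses via homoscedastic uncertainty:
    total = sum_i  0.5 * exp(-2*log_sigma_i) * L_i  +  log_sigma_i.
    Parameterizing by log(sigma) keeps sigma positive and is
    numerically stable."""
    total = 0.0
    for loss, log_sigma in zip(task_losses, log_sigmas):
        precision = math.exp(-2.0 * log_sigma)       # 1 / sigma^2
        total += 0.5 * precision * loss + log_sigma  # weighted loss + regularizer
    return total
```

With all log σ set to 0 (σ = 1), the result is simply half the sum of the task losses; training then adjusts each σ to balance the tasks.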

Semantic Segmentation

  • Benchmarking the Robustness of Semantic Segmentation Models (CVPR 2020) [Paper][Code] ⭐⭐⭐

Object Detection

  • Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer (ECCV 2020) [Paper][Code]

  • HoughNet: Integrating near and long-range evidence for bottom-up object detection (ECCV 2020) [Paper][Code]

Visual Object Tracking

  • CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers (ECCV 2020) [Paper][Code] ⭐✔️

  • Scale Equivariance Improves Siamese Tracking (ECCV 2020) [Paper][Code]

  • Fully Convolutional Online Tracking (ArXiv 2020) [Paper][Code]

  • Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking (CVPR 2021) [Paper][Code]

  • Transformer Tracking (CVPR 2021) [Paper][Code]

  • Learning Spatio-Temporal Transformer for Visual Tracking (ICCV 2021) [Paper][Code]

  • TrTr: Visual Tracking with Transformer (ArXiv 2021) [Paper][Code]

Data Augmentation

  • Adversarial Semantic Data Augmentation for Human Pose Estimation (ECCV 2020) [Paper][Code]

  • Unsupervised Data Augmentation for Consistency Training (NeurIPS 2020) [Paper][Code]

Image-to-Image-Translation

  • Few-Shot Unsupervised Image-to-Image Translation (ICCV 2019) [Paper][Code]

  • Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training (ArXiv 2020) [Paper][Code]

  • DRIT++: Diverse Image-to-Image Translation via Disentangled Representations (ArXiv 2020) [Paper][Code] ⭐⭐

  • EDIT: Exemplar-Domain Aware Image-to-Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • GANHOPPER: Multi-Hop GAN for Unsupervised Image-to-Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • Exemplar Guided Unsupervised Image-To-Image Translation With Semantic Consistency (ArXiv 2020) [Paper][Code] ⭐

  • Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping (ArXiv 2020) [Paper][Code] ⭐

  • Domain Bridge for Unpaired Image-to-Image Translation and Unsupervised Domain Adaptation (ArXiv 2020) [Paper][Code] ⭐

  • On the Role of Receptive Field in Unsupervised Sim-to-Real Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • Multi-mapping Image-to-Image Translation via Learning Disentanglement (ArXiv 2020) [Paper][Code] ⭐

  • InstaGAN: Instance-Aware Image-to-Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • Towards Instance-level Image-to-Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images (ArXiv 2020) [Paper][Code] ⭐

Multi-Domain Learning

  • Budget-Aware Adapters for Multi-Domain Learning (ArXiv 2020) [Paper][Code]

  • Self-Supervised Representation Learning From Multi-Domain Data (ArXiv 2020) [Paper][Code]

  • Efficient parametrization of multi-domain deep neural networks (ArXiv 2020) [Paper][Code]

Corruption Robustness

  • ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness (ICLR 2019) [Paper][Code] ⭐⭐⭐⭐

  • Benchmarking Neural Network Robustness to Common Corruptions and Perturbations (ICLR 2019) [Paper][Code] ⭐⭐⭐⭐

  • Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming (NeurIPS 2019) [Paper][Code] ⭐⭐⭐⭐

  • Adversarial and Natural Perturbations for General Robustness (ArXiv 2020, rejected from ICLR 2021) [Paper][Code]

Adversarial Robustness

  • Adversarial Examples in Modern Machine Learning: A Review (ArXiv 2019) [Paper][Code] ⭐⭐⭐⭐

  • Opportunities and Challenges in Deep Learning Adversarial Robustness: A Survey (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐

  • Defending against adversarial examples using defense kernel network (BMVC 2019) [Paper][Code]

  • Towards Evaluating the Robustness of Neural Networks (ArXiv 2017) [Paper][Code]

  • A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions (ECCV 2020) [Paper][Code] ⭐⭐

Self-Supervised Learning

  • Supervised Contrastive Learning (ArXiv 2020) [Paper][Code] ⭐⭐⭐

  • Contrastive Representation Learning: A Framework and Review (IEEE ACCESS 2020) [Paper][Code] ⭐⭐
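The contrastive objective underlying both papers above boils down to: pull an anchor and its positive view together while pushing it away from negatives, via a softmax over similarities. Below is a minimal InfoNCE sketch in pure Python (the temperature of 0.1 is an illustrative choice; real implementations operate on batched embedding tensors):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: negative log-probability of the positive pair
    under a softmax over all candidate similarities."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

The loss is near zero when the positive is far more similar to the anchor than every negative, and large when a negative outscores the positive.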

Attention

  • An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR 2021) [Paper][Code] ⭐⭐⭐⭐

  • VOLO: Vision Outlooker for Visual Recognition (ArXiv 2021) [Paper][Code]

  • SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (ICASSP 2021) [Paper][Code]

  • Involution: Inverting the Inherence of Convolution for Visual Recognition (CVPR 2021) [Paper][Code] ⭐⭐

  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (ICCV 2021) [Paper][Code] ⭐⭐⭐

Precious Papers

  • Shortcut Learning in Deep Neural Networks (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐⭐

  • Scale-Equivariant Steerable Networks (ArXiv 2020) [Paper][Code]

Multimodal Large Language Models

Multimodal Instruction Tuning

  • InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions (arXiv 2024) [Paper]
  • StreamChat: Chatting with Streaming Video (arXiv 2024) [Paper]
  • CompCap: Improving Multimodal Large Language Models with Composite Captions (arXiv 2024) [Paper]

Multimodal Hallucination

  • LinVT: Empower Your Image-level Large Language Model to Understand Videos (arXiv 2024) [Paper]
  • Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling (Tech Report 2024) [Paper]

Multimodal Chain-of-Thought

  • NVILA: Efficient Frontier Visual Language Models (arXiv 2024) [Paper]
  • T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs (arXiv 2024) [Paper]
  • TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability (arXiv 2024) [Paper]

Contact & Feedback

Feel free to contact me or open a pull request.

License

CC0

This list is released into the public domain.
