I made this repo to record the papers I read every day and to organize them better.

Paper Reading Record in Computer Vision

Table of Contents

Knowledge Distillation

  • Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐⭐
  • Knowledge distillation (KD) transfers information learned from one model (the teacher) to another (the student)
  • The survey analyzes and categorizes existing KD methods by their student-teacher (S-T) structures for model compression and knowledge transfer, covering:
  • Distillation from one teacher
  • Data-free distillation
  • Online distillation
  • Cross-modal distillation
  • Semi/self-supervised learning
  • Structured Knowledge Distillation for Dense Prediction (ArXiv 2020) [Paper][Code] ⭐⭐⭐

  • Knowledge Adaptation for Efficient Semantic Segmentation (CVPR 2019) [Paper][Code] ⭐⭐

  • Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification (ECCV 2020) [Paper][Code] ⭐
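The soft-target idea behind the distillation papers above can be sketched in a few lines: the student is trained to match the teacher's temperature-softened output distribution. Below is a minimal, framework-free sketch of the Hinton-style distillation loss (pure Python for clarity; a real training loop would use a deep learning framework and add the usual cross-entropy term on ground-truth labels). The temperature value is illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs,
    scaled by T^2 so gradients keep a consistent magnitude across T."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # soft student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl
```

When the student's logits equal the teacher's, the loss is zero; it grows as the two output distributions diverge.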

Domain Adaptation

  • A Review of Single-Source Deep Unsupervised Visual Domain Adaptation (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐

  • A Comprehensive Survey on Transfer Learning (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐

  • Transfer Adaptation Learning: A Decade Survey (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐

  • Deep Visual Domain Adaptation: A Survey (Neurocomputing 2018) [Paper][Code] ⭐⭐

  • Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey (ArXiv 2018) [Paper][Code] ⭐⭐

  • DADA: Depth-Aware Domain Adaptation in Semantic Segmentation (ArXiv 2020) [Paper] [Code]

  • Multi-source Domain Adaptation for Semantic Segmentation (ArXiv 2020) [Paper] [Code]

  • A Robust Learning Approach to Domain Adaptive Object Detection (ICCV 2019) [Paper][Code]

Semi-Supervised Learning

  • Semi-Supervised Semantic Segmentation with Cross-Consistency Training (CVPR 2020) [Paper][Code]

  • Semi-Supervised Semantic Segmentation via Dynamic Self-Training and Class-Balanced Curriculum (ECCV 2020) [Paper][Code]

  • Guided Collaborative Training for Pixel-wise Semi-Supervised Learning (ECCV 2020) [Paper][Code]

  • FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning (ECCV 2020) [Paper][Code] ⭐⭐

Data Imbalance

  • Prime Sample Attention in Object Detection (CVPR 2020) [Paper][Code]

Multi-Task Learning

  • Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics (CVPR 2018) [Paper][Code]
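The uncertainty-weighting scheme named in the title above can be summarized as: each task loss is scaled by a learned precision 1/σ², with a log σ regularizer that stops the model from inflating σ to ignore a task. This is a minimal sketch of the regression form of that combination rule (the σ values would normally be learned parameters; here they are plain inputs for illustration):

```python
import math

def uncertainty_weighted_loss(task_losses, log_sigmas):
    """Combine per-task losses via homoscedastic uncertainty:
    total = sum_i  0.5 * exp(-2*log_sigma_i) * L_i  +  log_sigma_i.
    Parameterizing by log(sigma) keeps sigma positive and is
    numerically stable."""
    total = 0.0
    for loss, log_sigma in zip(task_losses, log_sigmas):
        precision = math.exp(-2.0 * log_sigma)       # 1 / sigma^2
        total += 0.5 * precision * loss + log_sigma  # weighted loss + regularizer
    return total
```

With all log σ set to 0 (σ = 1), the result is simply half the sum of the task losses; training then adjusts each σ to balance the tasks.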

Semantic Segmentation

  • Benchmarking the Robustness of Semantic Segmentation Models (CVPR 2020) [Paper][Code] ⭐⭐⭐

Object Detection

  • Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer (ECCV 2020) [Paper][Code]

  • HoughNet: Integrating near and long-range evidence for bottom-up object detection (ECCV 2020) [Paper][Code]

Visual Object Tracking

  • CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers (ECCV 2020) [Paper][Code] ⭐✔️

  • Scale Equivariance Improves Siamese Tracking (ECCV 2020) [Paper][Code]

  • Fully Convolutional Online Tracking (ArXiv 2020) [Paper][Code]

  • Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking (CVPR 2021) [Paper][Code]

  • Transformer Tracking (CVPR 2021) [Paper][Code]

  • Learning Spatio-Temporal Transformer for Visual Tracking (ICCV 2021) [Paper][Code]

  • TrTr: Visual Tracking with Transformer (ArXiv 2021) [Paper][Code]

Data Augmentation

  • Adversarial Semantic Data Augmentation for Human Pose Estimation (ECCV 2020) [Paper][Code]

  • Unsupervised Data Augmentation for Consistency Training (NeurIPS 2020) [Paper][Code]

Image-to-Image-Translation

  • Few-Shot Unsupervised Image-to-Image Translation (ICCV 2019) [Paper][Code]

  • Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training (ArXiv 2020) [Paper][Code]

  • DRIT++: Diverse Image-to-Image Translation via Disentangled Representations (ArXiv 2020) [Paper][Code] ⭐⭐

  • EDIT: Exemplar-Domain Aware Image-to-Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • GANHOPPER: Multi-Hop GAN for Unsupervised Image-to-Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • Exemplar Guided Unsupervised Image-To-Image Translation With Semantic Consistency (ArXiv 2020) [Paper][Code] ⭐

  • Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping (ArXiv 2020) [Paper][Code] ⭐

  • Domain Bridge for Unpaired Image-to-Image Translation and Unsupervised Domain Adaptation (ArXiv 2020) [Paper][Code] ⭐

  • On the Role of Receptive Field in Unsupervised Sim-to-Real Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • Multi-mapping Image-to-Image Translation via Learning Disentanglement (ArXiv 2020) [Paper][Code] ⭐

  • InstaGAN: Instance-Aware Image-to-Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • Towards Instance-level Image-to-Image Translation (ArXiv 2020) [Paper][Code] ⭐

  • TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images (ArXiv 2020) [Paper][Code] ⭐

Multi-Domain Learning

  • Budget-Aware Adapters for Multi-Domain Learning (ArXiv 2020) [Paper][Code]

  • Self-Supervised Representation Learning From Multi-Domain Data (ArXiv 2020) [Paper][Code]

  • Efficient parametrization of multi-domain deep neural networks (ArXiv 2020) [Paper][Code]

Corruption Robustness

  • ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness (ICLR 2019) [Paper][Code] ⭐⭐⭐⭐

  • Benchmarking Neural Network Robustness to Common Corruptions and Perturbations (ICLR 2019) [Paper][Code] ⭐⭐⭐⭐

  • Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming (NeurIPS 2019) [Paper][Code] ⭐⭐⭐⭐

  • Adversarial and Natural Perturbations for General Robustness (ArXiv 2020, rejected from ICLR 2021) [Paper][Code]

Adversarial Robustness

  • Adversarial Examples in Modern Machine Learning: A Review (ArXiv 2019) [Paper][Code] ⭐⭐⭐⭐

  • Opportunities and Challenges in Deep Learning Adversarial Robustness: A Survey (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐

  • Defending against adversarial examples using defense kernel network (BMVC 2019) [Paper][Code]

  • Towards Evaluating the Robustness of Neural Networks (ArXiv 2017) [Paper][Code]

  • A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions (ECCV 2020) [Paper][Code] ⭐⭐

Self-Supervised Learning

  • Supervised Contrastive Learning (ArXiv 2020) [Paper][Code] ⭐⭐⭐

  • Contrastive Representation Learning: A Framework and Review (IEEE ACCESS 2020) [Paper][Code] ⭐⭐
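The contrastive objective underlying both papers above boils down to: pull an anchor and its positive view together while pushing it away from negatives, via a softmax over similarities. Below is a minimal InfoNCE sketch in pure Python (the temperature of 0.1 is an illustrative choice; real implementations operate on batched embedding tensors):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: negative log-probability of the positive pair
    under a softmax over all candidate similarities."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

The loss is near zero when the positive is far more similar to the anchor than every negative, and large when a negative outscores the positive.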

Attention

  • An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR 2021) [Paper][Code] ⭐⭐⭐⭐

  • VOLO: Vision Outlooker for Visual Recognition (ArXiv 2021) [Paper][Code]

  • SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (ICASSP 2021) [Paper][Code]

  • Involution: Inverting the Inherence of Convolution for Visual Recognition (CVPR 2021) [Paper][Code] ⭐⭐

  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (ICCV 2021) [Paper][Code] ⭐⭐⭐

Precious Papers

  • Shortcut Learning in Deep Neural Networks (ArXiv 2020) [Paper][Code] ⭐⭐⭐⭐⭐

  • Scale-Equivariant Steerable Networks (ArXiv 2020) [Paper][Code]

Multimodal Large Language Models

Multimodal Instruction Tuning

  • InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions (arXiv 2024) [Paper]
  • StreamChat: Chatting with Streaming Video (arXiv 2024) [Paper]
  • CompCap: Improving Multimodal Large Language Models with Composite Captions (arXiv 2024) [Paper]

Multimodal Hallucination

  • LinVT: Empower Your Image-level Large Language Model to Understand Videos (arXiv 2024) [Paper]
  • Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling (Tech Report 2024) [Paper]

Multimodal Chain-of-Thought

  • NVILA: Efficient Frontier Visual Language Models (arXiv 2024) [Paper]
  • T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs (arXiv 2024) [Paper]
  • TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability (arXiv 2024) [Paper]

Contact & Feedback

Feel free to contact me or open a pull request.

License

CC0

This list is released into the public domain.
