Introduction

Distilling the Knowledge in a Neural Network [1] describes a kind of training used to transfer the knowledge of cumbersome models (teachers) to a small model (student) that is more suitable for deployment.
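
The core of the method is training the student on the teacher's softened output distribution in addition to the ground-truth labels. Below is a minimal sketch of that loss in plain NumPy; the temperature, alpha, and function names are illustrative choices, not part of EDL Distillation's API.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Softmax with a temperature; higher temperatures produce softer distributions.
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    # Soft-target term: cross entropy between the softened teacher and student
    # distributions, scaled by T^2 as suggested in the paper.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_loss = -(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean() * temperature ** 2

    # Hard-target term: ordinary cross entropy against the ground-truth labels.
    p_hard = softmax(student_logits)
    hard_loss = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()

    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example usage with random logits for a batch of 4 examples and 10 classes.
rng = np.random.default_rng(0)
loss = distillation_loss(rng.standard_normal((4, 10)), rng.standard_normal((4, 10)),
                         labels=np.array([1, 0, 3, 7]))
print(loss)
```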

EDL Distillation is a large-scale, general-purpose solution for knowledge distillation.

  • Decouples the teacher and student models.

    • Teachers and students can run on the same node or on different nodes, even heterogeneous machines, and transfer knowledge over the network (see the sketch after this list).
      Take distilling into ResNet-50 as an example: the teachers (e.g., ResNet-101) can be deployed on P4 GPU cards, since they only run the forward pass, while the student can be deployed on V100 GPU cards, since training it needs more GPU memory.
  • Flexible and efficient.

    • Teachers and students can be scaled elastically during training according to resource utilization.
  • Easy to use and deploy.

    • Only a few lines of code need to change.
    • End-to-end usage: we release a Kubernetes deployment solution for you.
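
To make the decoupling above concrete, here is a hypothetical sketch of a student pulling soft labels from remote teachers over the network. `TeacherClient`, `get_soft_labels`, and the endpoint strings are invented names for illustration only and are not the actual EDL Distillation API.

```python
import numpy as np

class TeacherClient:
    """Stands in for teacher models served on separate, possibly heterogeneous, nodes."""

    def __init__(self, endpoints):
        # In a real deployment the balancer would assign or discover these endpoints.
        self.endpoints = endpoints

    def get_soft_labels(self, images):
        # This would be an RPC to a teacher node that only runs the forward pass;
        # here we return random probabilities so the sketch is runnable.
        logits = np.random.randn(len(images), 1000)
        exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return exp / exp.sum(axis=-1, keepdims=True)

def student_training_step(images, labels, teacher):
    # The student requests soft labels for the current batch from the remote teacher,
    # then combines them with the hard labels in its loss (see the distillation loss above).
    soft_labels = teacher.get_soft_labels(images)
    return soft_labels.shape, labels.shape

teacher = TeacherClient(endpoints=["teacher-node-0:9000", "teacher-node-1:9000"])
batch_images = np.zeros((8, 3, 224, 224), dtype=np.float32)
batch_labels = np.zeros(8, dtype=np.int64)
print(student_training_step(batch_images, batch_labels, teacher))
```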

Design

Architecture

Student

Teacher

Reader

Balancer

Reference

1. Hinton, G., Vinyals, O., and Dean, J. Distilling the Knowledge in a Neural Network. arXiv:1503.02531, 2015.