We designed and optimized MAF-YOLOv2 (MHAF-YOLO) on top of the latest YOLO framework. The model achieves exceptionally high parameter efficiency and state-of-the-art performance among YOLO variants.
Latest paper: MHAF-YOLO: Multi-Branch Heterogeneous Auxiliary Fusion YOLO for accurate object detection
- 2025/2/7: Add Paper
- 2025/1/22: Add MAF-YOLOv2-cls, MAF-YOLOv2-seg
- 2024/11/11: Add MAF-YOLOv2
MS COCO
Model | Test Size | #Params | FLOPs | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup> | Latency
---|---|---|---|---|---|---
MAF-YOLOv2-Lite-N | 640 | 1.4M | 4.7G | 38.5% | 53.7% | 1.11ms |
MAF-YOLOv2-N | 640 | 2.2M | 7.2G | 42.3% | 58.5% | 1.28ms |
MAF-YOLOv2-S | 640 | 7.1M | 25.3G | 48.9% | 65.9% | 1.67ms |
MAF-YOLOv2-M | 640 | 15.3M | 65.2G | 52.7% | 69.5% | 2.79ms |
MS COCO with ImageNet Pretrain
Model | Test Size | #Params | FLOPs | AP<sup>val</sup> | AP<sub>50</sub><sup>val</sup>
---|---|---|---|---|---
MAF-YOLOv2-N-pretrained | 640 | 2.2M | 7.2G | 43.1% | 59.3% |
MAF-YOLOv2-S-pretrained | 640 | 7.1M | 25.3G | 49.4% | 66.5% |
COCO-seg
Model | Test Size | #Params | FLOPs | AP<sup>bbox</sup> | AP<sup>mask</sup>
---|---|---|---|---|---
MAF-YOLOv2-N-seg | 640 | 2.4M | 14.8G | 42.5% | 35.0% |
MAF-YOLOv2-S-seg | 640 | 7.8M | 40.4G | 48.8% | 39.7% |
ImageNet
Model | Size (pixels) | Acc top-1 | Acc top-5 | Params (M) | FLOPs (G)
---|---|---|---|---|---
MAF-YOLOv2-N-cls | 224 | 71.2 | 90.3 | 2.8 | 0.4 |
MAF-YOLOv2-S-cls | 224 | 75.5 | 92.6 | 5.6 | 1.4 |
MAF-YOLOv2-N-cls-finetune-384 | 384 | 73.8 | 91.8 | 2.8 | 1.3 |
MAF-YOLOv2-S-cls-finetune-384 | 384 | 77.5 | 93.8 | 5.6 | 4.2 |
DOTA-v1.0: TBD
conda create -n mafyolov2 python==3.9
conda activate mafyolov2
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -e .
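After installation, a quick sanity check can confirm that the CUDA build of PyTorch is active before training (a minimal sketch; the expected version string just follows the install commands above):

```python
# check_env.py — optional environment sanity check
import torch

print(torch.__version__)               # expect 2.1.0+cu121 from the install above
print(torch.cuda.is_available())       # should be True on a CUDA machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the GPU that device=0 refers to
```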
Single GPU training
# train.py
from ultralytics import YOLOv10

if __name__ == '__main__':
    # Build the nano model from its config and train on COCO.
    model = YOLOv10('MAF-YOLOv2-n.yaml')
    model.train(data='coco.yaml', batch=16, device=0)
# val.py
from ultralytics import YOLOv10

if __name__ == '__main__':
    # Evaluate a trained checkpoint on the COCO val split and save COCO-format JSON.
    model = YOLOv10('MAF-YOLOv2-N.pt')
    model.val(data='coco.yaml', device=0, split='val', save_json=True, batch=8)
End-to-End ONNX
yolo export model=MAF-YOLOv2-N.pt format=onnx opset=13 simplify
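To sanity-check the exported file, here is a minimal onnxruntime sketch (the 1×3×640×640 input shape is an assumption matching the 640 test size above; the input name is queried rather than hard-coded):

```python
# onnx_infer.py — minimal sketch to verify the exported ONNX model
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    'MAF-YOLOv2-N.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)
input_name = session.get_inputs()[0].name                   # query the real input name
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)   # assumed input shape
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])                           # inspect output shapes
```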
End-to-End TensorRT
yolo export model=MAF-YOLOv2-N.pt format=engine half=True simplify opset=13 workspace=16
or
trtexec --onnx=MAF-YOLOv2-N.onnx --saveEngine=MAF-YOLOv2-N.engine --fp16
Speed evaluation
trtexec --loadEngine=MAF-YOLOv2-N.engine --fp16
Problems
1. Multi-GPU distributed training
One issue inherited from the YOLOv10 framework is that multi-GPU training (launched as in the sketch after this list) sometimes fails to shut down cleanly, requiring manual intervention to kill the process.
2. Failed to load some pretrained weights.
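For reference, multi-GPU training is started by passing a list of device indices, which makes ultralytics spawn DDP workers internally (a minimal sketch; the 4-GPU setup and batch size are assumptions):

```python
# train_ddp.py — minimal multi-GPU sketch (assumes 4 visible GPUs)
from ultralytics import YOLOv10

if __name__ == '__main__':
    model = YOLOv10('MAF-YOLOv2-n.yaml')
    # A device list triggers DDP; batch is the total across all GPUs.
    model.train(data='coco.yaml', batch=64, device=[0, 1, 2, 3])
```

If a run hangs at exit, the stray worker processes have to be killed by hand, as noted above.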
Improvements
1. Try replacing the NMS-free head with NMS
MHAF-YOLO, like YOLOv10, uses a one-to-one head by default to achieve an NMS-free effect. However, for some smaller models or smaller datasets, combining the one-to-many head with NMS can bring significant improvements: on COCO the nano model gains about 1% AP, and on private smaller-scale datasets the gain can exceed 2%. If the speed overhead of NMS is not a concern for your model, you can make the following modification to see the accuracy improvement: edit the file ultralytics/models/yolov10/val.py and uncomment lines 11 to 19. A conceptual sketch of what this switch does is shown below.
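The sketch below only illustrates the concept (it is not the repo's actual val.py code): the one-to-many head emits many overlapping candidates, and standard torchvision NMS prunes the duplicates that the one-to-one head avoids by design. The box count and IoU threshold are arbitrary placeholders.

```python
# nms_concept.py — conceptual illustration only, not the repo's val.py
import torch
from torchvision.ops import nms

boxes = torch.rand(300, 4) * 640      # one-to-many head: many overlapping candidates
boxes[:, 2:] += boxes[:, :2]          # make valid (x1, y1, x2, y2) boxes
scores = torch.rand(300)              # placeholder confidence scores

keep = nms(boxes, scores, iou_threshold=0.7)  # suppress duplicate detections
print(f'{len(keep)} of {len(boxes)} boxes kept after NMS')
```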
If our code or model is helpful to your work, please cite our paper and consider giving us a star. We would be very grateful!
@article{yang2025mhaf,
  title={MHAF-YOLO: Multi-Branch Heterogeneous Auxiliary Fusion YOLO for accurate object detection},
  author={Yang, Zhiqiang and Guan, Qiu and Yu, Zhongwen and Xu, Xinli and Long, Haixia and Lian, Sheng and Hu, Haigen and Tang, Ying},
  journal={arXiv preprint arXiv:2502.04656},
  year={2025}
}