With two GPUs, should the iteration count be 100400 or 200800? Once training is running, can it be left alone? How long does training take? #37

Open
henbucuoshanghai opened this issue Oct 29, 2020 · 16 comments

Comments

@henbucuoshanghai

2x

#learning_rate=0.0005
#burn_in=2000
#max_batches = 100400
#max_batches = 200800

@Zzh-tju
Owner

Zzh-tju commented Oct 29, 2020

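In darknet, the number of training iterations does not depend on the number of GPUs. For VOC, 50k iterations is best in all cases; more than that and the results degrade. For COCO, use 500k iterations.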

@Zzh-tju
Owner

Zzh-tju commented Oct 29, 2020

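Training 50,000 iterations on two 1080 Ti cards takes about 25 hours, so VOC gives roughly one result per day, while COCO needs more than 10 days per result. That is why I could not run the COCO experiments. First, my hardware is limited and I cannot occupy it for long periods. Second, I could not reproduce the GIoU paper's results; my guess is that the regression loss weight needs tuning. (PS: the regression weight found by the YOLOv4 search is 0.07, which is quite far from 0.5.)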
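As a rough check on these numbers, the per-iteration time and the COCO estimate can be worked out from the figures quoted above. This is only a sketch: it assumes the time per iteration stays the same when moving from VOC to COCO, which the thread does not confirm, and the variable names are illustrative.

```python
# Figures quoted in the comment above: 50,000 iterations in about 25 hours
# on two 1080 Ti cards.
VOC_ITERATIONS = 50_000
VOC_HOURS = 25.0

# Assumption (not confirmed in the thread): per-iteration time is constant
# and carries over unchanged from VOC to COCO.
seconds_per_iteration = VOC_HOURS * 3600 / VOC_ITERATIONS

# 500k iterations is the COCO schedule recommended earlier in this thread.
COCO_ITERATIONS = 500_000
coco_hours = COCO_ITERATIONS * seconds_per_iteration / 3600
coco_days = coco_hours / 24

print(f"{seconds_per_iteration:.2f} s per iteration")
print(f"COCO estimate: {coco_hours:.0f} h, about {coco_days:.1f} days")
```

Under that assumption the estimate lands slightly above 10 days, in line with the "more than 10 days" quoted in the comment.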

@henbucuoshanghai
Author

Hello, and thank you for the helpful reply.
So for training on VOC, can I simply write the following in the cfg file?
learning_rate=0.0005
burn_in=2000
max_batches = 100400
max_batches = 200800

@henbucuoshanghai
Author

Why does training VOC on two GPUs take me twice as long? It needed 50 hours to finish.

@Zzh-tju
Owner

Zzh-tju commented Oct 29, 2020

I have updated the cfg. For two GPUs it now reads:

learning_rate=0.0005
burn_in=2000
max_batches = 50000

Also, it depends on which cards you are using.

@henbucuoshanghai
Author

The cards are 2080 Ti. May I ask how long the whole paper cycle took back then?

@Zzh-tju
Owner

Zzh-tju commented Oct 29, 2020

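In theory the 2080 Ti should be faster, but my dual 2080 Ti setup is actually slower: 28 hours for 50k iterations. The paper cycle was 5 months and 10 days.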
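To see how the three max_batches values discussed in this issue translate into wall-clock time, the two rates reported in the thread can be scaled linearly. This is a sketch under the assumption that training time grows in proportion to max_batches; `estimated_hours` is a hypothetical helper, not part of darknet.

```python
def estimated_hours(max_batches, hours_per_50k):
    """Scale a measured time for 50,000 iterations linearly to max_batches."""
    return max_batches * hours_per_50k / 50_000


# Rates reported in this thread, in hours per 50,000 iterations.
reported_rates = {"2x 1080 Ti": 25.0, "2x 2080 Ti": 28.0}

# The three max_batches values that appear in this issue.
candidates = [50_000, 100_400, 200_800]

for setup, rate in reported_rates.items():
    for max_batches in candidates:
        hours = estimated_hours(max_batches, rate)
        print(f"{setup}: max_batches={max_batches} -> about {hours:.1f} h")
```

At the 1080 Ti rate, max_batches = 100400 comes to about 50 hours, which is consistent with the 50 h run reported earlier in the thread. Whether that run actually used 100400 is not stated, so this is only one possible explanation.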

@henbucuoshanghai
Author

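./darknet detector train cfg/voc-ciou.data cfg/voc-ciou.cfg darknet53.conv.74 -gpus 0,1
Hello, I ran training with the command above and it finished today.
./darknet detector valid voc-ciou.data voc-ciou.cfg backup/voc-ciou-final.weights
The resulting AP is only 42.81 and AP75 is only 44. What is going on? Where is the problem?
Many thanks.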

@Zzh-tju
Owner

Zzh-tju commented Oct 30, 2020

What is your training set? How many images does it have? And what is your cfg?

@henbucuoshanghai
Author

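16551 images, and the validation set contains 4952 images, so the numbers match. I set up the dataset by following the tutorial and checked the counts; they are the same.
The cfg is the updated one from the git repository, with 50k iterations. Nothing else was changed.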

@Zzh-tju
Owner

Zzh-tju commented Oct 30, 2020

Please post your per-class results for the 20 classes at AP50.

@henbucuoshanghai
Author

aeroplane 0.7929968043056034
bicycle 0.8678608160298611
bird 0.7312500816479228
boat 0.655574079242459
bottle 0.6462959543045752
bus 0.8004222987227507
car 0.8732552180085336
cat 0.8624825817750525
chair 0.5207228521416307
cow 0.8094920886225478
diningtable 0.7041849580514079
dog 0.8215610117957911
horse 0.8611032074398459
motorbike 0.7970118607616075
person 0.8357146245011494
pottedplant 0.46349002358841396
sheep 0.6984532767303007
sofa 0.7412281410502517
train 0.8477495046324234
tvmonitor 0.7440153420517943

0.753743
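The final number is the mean of the 20 per-class values listed above. A short check, with the values copied from the listing and an illustrative variable layout:

```python
# Per-class AP at IoU 0.5, copied from the listing above.
per_class_ap = {
    "aeroplane": 0.7929968043056034,
    "bicycle": 0.8678608160298611,
    "bird": 0.7312500816479228,
    "boat": 0.655574079242459,
    "bottle": 0.6462959543045752,
    "bus": 0.8004222987227507,
    "car": 0.8732552180085336,
    "cat": 0.8624825817750525,
    "chair": 0.5207228521416307,
    "cow": 0.8094920886225478,
    "diningtable": 0.7041849580514079,
    "dog": 0.8215610117957911,
    "horse": 0.8611032074398459,
    "motorbike": 0.7970118607616075,
    "person": 0.8357146245011494,
    "pottedplant": 0.46349002358841396,
    "sheep": 0.6984532767303007,
    "sofa": 0.7412281410502517,
    "train": 0.8477495046324234,
    "tvmonitor": 0.7440153420517943,
}

# mAP at IoU 0.5 is the unweighted mean over the classes.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)

# The two weakest classes pull the mean down the most.
weakest = sorted(per_class_ap, key=per_class_ap.get)[:2]

print(f"mAP@0.5 = {mean_ap:.6f}")
print("lowest classes:", weakest)
```

The mean agrees with the 0.753743 reported in the comment; pottedplant and chair are the two lowest classes.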

@henbucuoshanghai
Author

Thanks. My AP50 does not reach the reported result. What is going on?

@Zzh-tju
Owner

Zzh-tju commented Oct 30, 2020

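  1. Was your training ever interrupted?
  2. Are the weights you tested the ones from 50k iterations?
  3. I suggest also testing the weights from 49k iterations.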

@henbucuoshanghai
Author

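After training with CIoU, AP is 49.04 and AP75 is 53.05, lower than the official 49.21 and 54.28.
With DIoU-NMS added, AP is 49.23 and AP75 is 53.52, lower than the official 49.32 and 54.74.

Is this within the normal range? How can I reach the official results?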
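For reference, the gaps to the official numbers can be tabulated directly from the figures in this comment. The dictionary layout is illustrative, and the second row is labeled DIoU-NMS on the assumption that this is the NMS variant meant.

```python
# (reproduced, official) pairs as reported in the comment above.
results = {
    "CIoU": {"AP": (49.04, 49.21), "AP75": (53.05, 54.28)},
    "CIoU + DIoU-NMS": {"AP": (49.23, 49.32), "AP75": (53.52, 54.74)},
}

# Gap = reproduced minus official, rounded to the precision of the inputs.
gaps = {
    name: {
        metric: round(reproduced - official, 2)
        for metric, (reproduced, official) in metrics.items()
    }
    for name, metrics in results.items()
}

for name, metric_gaps in gaps.items():
    for metric, gap in metric_gaps.items():
        print(f"{name} {metric}: {gap:+.2f}")
```

In both configurations the AP gap is below 0.2 points, while the AP75 gap is a little over 1.2 points.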

@chenzhongwang9811

May I ask why training must not be interrupted?
