I'm running CUDA 10.1 with the latest versions of TF and PyTorch on a Tesla K80 and a 1080 Ti.
I'm running the stable version (0.1.1 -- I was unable to get the ESPnet version running) with a patched train.py implementing data_parallel_workaround() from master.
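(For anyone unfamiliar with that helper: as far as I understand it, it just splits each batch across the visible GPUs using the torch.nn.parallel primitives. Rough sketch below -- the names and exact signature are mine, not necessarily what master does.)

```python
import torch
from torch.nn.parallel import replicate, scatter, parallel_apply, gather

def data_parallel_workaround(model, *inputs):
    # Assumed sketch only: split the batch across all visible GPUs,
    # run the forward pass on each replica, and gather the outputs
    # back on the first device. Signature is illustrative.
    device_ids = list(range(torch.cuda.device_count()))
    output_device = device_ids[0]
    replicas = replicate(model, device_ids)
    scattered = scatter(inputs, device_ids)
    replicas = replicas[:len(scattered)]
    outputs = parallel_apply(replicas, scattered)
    return gather(outputs, output_device)
```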
The model seems to be training -- but very inefficiently. If I watch GPU usage with nvidia-smi, I see only intermittent GPU-Util spikes, with CPU utilization at about 25% (8 cores @ 4.8 GHz).
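To put numbers on that, I'm planning to drop a crude probe into the training loop to see how long each step waits on the loader versus actually computing -- something like this, where data_loader and train_step are placeholders for whatever train.py builds:

```python
import time
import torch

# Crude probe (placeholder names): how long does each step wait on the
# loader vs. spend computing on the GPU?
load_start = time.time()
for i, batch in enumerate(data_loader):      # placeholder: the existing loader
    load_time = time.time() - load_start

    step_start = time.time()
    loss = train_step(batch)                 # placeholder: the existing train step
    torch.cuda.synchronize()                 # so GPU work is counted in step_time
    step_time = time.time() - step_start

    if i % 50 == 0:
        print(f"step {i}: data wait {load_time:.3f}s, compute {step_time:.3f}s")
    load_start = time.time()
```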
hparams that may be relevant:
# Data loader
pin_memory=True,
num_workers=12,
# Training:
batch_size=12,
Do I just need to dramatically increase num_workers to feed the GPUs more data? GPU temps look fine and the data is on a very fast SSD, so I'm not sure what I'm doing wrong.
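The experiment I have in mind is just sweeping num_workers on the loader and measuring batches/sec -- a sketch only, with dataset and collate_fn standing in for whatever train.py actually constructs:

```python
import time
from torch.utils.data import DataLoader

# Sketch: sweep num_workers and measure batches/sec to find the sweet spot.
# `dataset` and `collate_fn` are placeholders for the objects train.py builds.
for workers in (4, 8, 12, 16, 24):
    loader = DataLoader(dataset,
                        batch_size=12,
                        shuffle=True,
                        num_workers=workers,
                        pin_memory=True,
                        collate_fn=collate_fn)
    start, n = time.time(), 0
    for batch in loader:
        n += 1
        if n == 100:
            break
    print(f"num_workers={workers}: {n / (time.time() - start):.1f} batches/sec")
```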
FWIW, here's what shows up in the python.exe stack: