RuntimeError: cublas runtime error #2

Open
Bill-Cai opened this issue Jan 8, 2025 · 1 comment

Bill-Cai commented Jan 8, 2025

I followed the PDF instructions and completed the environment configuration and data preparation, but I encountered an error when running the training script traindPLHBV.py.

The error output is below:

$ python traindPLHBV.py 
loading package hydroDL
daymet tmean was used!
read usgs streamflow 7.540483236312866
read usgs streamflow 7.585710525512695
read usgs streamflow 11.400542974472046
daymet tmean was used!
read usgs streamflow 7.546891689300537
read usgs streamflow 7.621800184249878
write master file /workspace/output/rnnStreamflow/CAMELSDemo/dPLHBV/ALL/Testforc/daymet/BuffOpt0/RMSE_para0.25/111111/Fold1/T_19801001_19951001_BS_100_HS_256_RHO_365_NF_12_Buff_365_Mul_16/master.json
Traceback (most recent call last):
  File "traindPLHBV.py", line 306, in <module>
    bufftime=BUFFTIME)
  File "../../hydroDL/model/train.py", line 89, in trainModel
    yP = model(xTrain, zTrain)
  File "/opt/conda/envs/mhpihydrodl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "../../hydroDL/model/rnn.py", line 1315, in forward
    Gen = self.lstminv(z)
  File "/opt/conda/envs/mhpihydrodl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "../../hydroDL/model/rnn.py", line 368, in forward
    x0 = F.relu(self.linearIn(x))
  File "/opt/conda/envs/mhpihydrodl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/mhpihydrodl/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 67, in forward
    return F.linear(input, self.weight, self.bias)
  File "/opt/conda/envs/mhpihydrodl/lib/python3.6/site-packages/torch/nn/functional.py", line 1354, in linear
    output = input.matmul(weight.t())
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1550802451070/work/aten/src/THC/THCBlas.cu:258

I ran the experiment in a Docker container with CUDA 10.0 on an A800 GPU. The Python environment was built with conda as described in the PDF. My code and data are all mounted under the /workspace folder.

I would like to know what caused this error and how to fix it.

Thanks!

chaopengshen (Contributor) commented

Unknown issue. I will let other team members look at this issue.

btw, have you looked at our new release?
𝛿MG is our focus now and everything is much more streamlined and systematic. Maybe you won't run into issues like this with 𝛿MG.
https://mhpi.github.io/codes/frameworks/
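
One thing that may be worth ruling out, although nobody in this thread has confirmed it as the cause: whether the CUDA toolkit that PyTorch was built against can target the GPU at all. Ampere-class GPUs with compute capability 8.0 need CUDA 11.0 or newer, while the environment above uses CUDA 10.0. The sketch below assumes the A800 reports compute capability 8.0 like the A100. The table and the helper names `parse_version`, `cuda_can_target`, and `report` are illustrative and not part of hydroDL.

```python
# Minimum CUDA toolkit release able to target each compute capability.
# Only a few architectures are listed; extend as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere, e.g. A100 (assumed to apply to the A800 too)
    (8, 6): (11, 1),  # Ampere, consumer and workstation parts
}


def parse_version(text):
    """Turn a version string such as '10.0.130' into (10, 0)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_can_target(capability, cuda_version):
    """Return True or False, or None when the capability is not in the table."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required


def report():
    """Describe the local setup; degrades gracefully without torch or a GPU."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "torch cannot see a CUDA device"
    capability = torch.cuda.get_device_capability(0)
    cuda_version = torch.version.cuda  # CUDA version torch was built with
    supported = cuda_can_target(capability, cuda_version)
    return "capability={} cuda={} supported={}".format(
        capability, cuda_version, supported
    )


if __name__ == "__main__":
    print(report())
```

If `report()` prints `supported=False` inside the container, the environment would likely need a PyTorch build compiled against a newer CUDA toolkit before the training script can run on this GPU. That in turn may conflict with the package versions pinned in the PDF instructions.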
