AutoTabPFNClassifier using CUDA silently fails due to SIGSEGV in cuda malloc #20

Open
shaywinter opened this issue Jan 20, 2025 · 1 comment

@shaywinter

Using AutoTabPFNClassifier with a CUDA device (RTX 3090, 24 GB) works fine as long as the tuning time stays short (roughly 300 seconds or less).
Increasing it to 500-1000 seconds fails with a core dump. Loading the core dump in gdb shows:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fbc1c6e58d6 in c10::cuda::CUDACachingAllocator::Native::DeviceCachingAllocator::malloc(signed char, unsigned long, CUstream_st*) ()

Disabling TabPFN's memory optimizations slightly increases the maximum workable tuning time (to 1000), but the issue still happens with longer tuning times (2000).

@LeoGrin
Collaborator

LeoGrin commented Jan 22, 2025

Hey @shaywinter! Thanks for the report :)
I cannot reproduce this error on my machine. Could you please try running this on CPU with the same input and see if you get a more informative error?
If you can share more about your dataset (ideally the dataset itself if public, otherwise some characteristics) I'll be happy to investigate.
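
For anyone trying to collect the extra information asked for above, here is a minimal sketch of one way to get a more informative trace from a native crash. It is not taken from the TabPFN code base: the helper name `enable_crash_diagnostics` is hypothetical, and the approach assumes that Python's standard `faulthandler` module and PyTorch's `CUDA_LAUNCH_BLOCKING` environment variable are enough to localize the failing call.

```python
import faulthandler
import os
import sys


def enable_crash_diagnostics(device: str = "cpu") -> dict:
    """Hypothetical helper: switch on settings that help localize a native crash.

    Call it at the very top of the script, before CUDA is initialized
    (safest: before importing torch).
    """
    # On SIGSEGV, dump the Python traceback of every thread to stderr,
    # so the crash can be tied to the Python call that triggered it.
    faulthandler.enable(file=sys.__stderr__, all_threads=True)

    settings = {"device": device}
    if device.startswith("cuda"):
        # Run CUDA kernels synchronously so an error is reported at the
        # call that caused it rather than at a later, unrelated call.
        os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
        settings["CUDA_LAUNCH_BLOCKING"] = "1"
    return settings


if __name__ == "__main__":
    # First try the CPU run suggested above; switch to "cuda" afterwards.
    print(enable_crash_diagnostics("cpu"))
```

After calling the helper, build and fit the classifier exactly as in the failing run, once on CPU and once on the GPU. The traceback printed on a segfault shows which Python-level call was active, which complements the C++ frame reported by gdb.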
