
Unable to use ./inference/incr_decoding/incr_decoding on inference branch #1

hugolatendresse commented Nov 26, 2024

I am unable to run incremental decoding through the C++ interface without errors. I tried meta-llama/Llama-2-7b-hf, other Llama models, and an OPT model.

I get the error below (with the backtrace captured in gdb):

[10]29889
No small speculative model registered, using incremental decoding.
[0 - 7ffff4921000]    1.042109 {3}{RequestManager}: [1000358]New request tokens: 1 14350 263 26228 21256 1048 7535 17770 363 596 10462 29889
optimal_views.size = 294
views.size() = 294
###PEFT DEBUGGING### Operators reconstructed from optimized graph.
###PEFT DEBUGGING### Starting inplace optimizations.
###PEFT DEBUGGING### Mapping output tensors.
ndim(1) dims[1 0 0 0]
###PEFT DEBUGGING### Setting up NCCL communications.
###PEFT DEBUGGING### compile_inference completed successfully.
Loading weight file embed_tokens.weight
Loading weight file layers.0.input_layernorm.weight
Loading weight file layers.0.self_attn.q_proj.weight
incr_decoding: /home/ubuntu/FlexFlow/deps/legion/runtime/legion/runtime.cc:4991: void Legion::Internal::PhysicalRegionImpl::wait_until_valid(bool, const char*, bool, const char*): Assertion `implicit_context == context' failed.

Thread 10 "incr_decoding" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffefcf4000 (LWP 18011)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737216724992) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737216724992) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737216724992) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737216724992, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fffece42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fffece287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fffece2871b in __assert_fail_base (fmt=0x7fffecfdd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff3c00b7b "implicit_context == context",
    file=0x7ffff3bfd360 "/home/ubuntu/FlexFlow/deps/legion/runtime/legion/runtime.cc", line=4991, function=<optimized out>) at ./assert/assert.c:92
#6  0x00007fffece39e96 in __GI___assert_fail (assertion=0x7ffff3c00b7b "implicit_context == context", file=0x7ffff3bfd360 "/home/ubuntu/FlexFlow/deps/legion/runtime/legion/runtime.cc", line=4991,
    function=0x7ffff3c016f8 "void Legion::Internal::PhysicalRegionImpl::wait_until_valid(bool, const char*, bool, const char*)") at ./assert/assert.c:101
#7  0x00007ffff2ad68b8 in Legion::Internal::PhysicalRegionImpl::wait_until_valid (this=0x7ff754203e70, silence_warnings=false, warning_string=0x0, warn=false, source=0x0)
    at /home/ubuntu/FlexFlow/deps/legion/runtime/legion/runtime.cc:4991
#8  0x00007ffff2624402 in Legion::PhysicalRegion::wait_until_valid (this=0x7ff7542005d8, silence_warnings=false, warning_string=0x0) at /home/ubuntu/FlexFlow/deps/legion/runtime/legion/legion.cc:2772
#9  0x00007ffff65530eb in FlexFlow::ParallelTensorBase::set_tensor<__half> (this=0x7ff762541c30, ff=0x7ff7641e7db0, dim_sizes=std::vector of length 1, capacity 1 = {...}, data=0x7ff754201320)
    at /home/ubuntu/FlexFlow/src/runtime/parallel_tensor.cc:680
#10 0x00007ffff62dcb4a in FileDataLoader::load_single_weight_tensor<__half> (this=0x7ff765508840, ff=0x7ff7641e7db0, l=0x7ff76498f5e0, weight_idx=0) at /home/ubuntu/FlexFlow/src/runtime/file_loader.cc:849
#11 0x00007ffff62dad8c in FileDataLoader::load_weight_task (task=0x7ff724a142e0, regions=std::vector of length 0, capacity 0, ctx=0x7ff76c2effe0, runtime=0x555556b68000)
    at /home/ubuntu/FlexFlow/src/runtime/file_loader.cc:864
#12 0x00007ffff64cc78a in Legion::LegionTaskWrapper::legion_task_wrapper<&FileDataLoader::load_weight_task> (args=0x7ff724a24990, arglen=8, userdata=0x0, userlen=0, p=...)
    at /home/ubuntu/FlexFlow/deps/legion/runtime/legion/legion.inl:21215
#13 0x00007fffedee02cc in Realm::LocalTaskProcessor::execute_task (this=0x555556ad9bf0, func_id=19, task_args=...) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/proc_impl.cc:1176
#14 0x00007fffedf5fc4b in Realm::Task::execute_on_processor (this=0x7ff724a24810, p=...) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:326
#15 0x00007fffedf650fc in Realm::UserThreadTaskScheduler::execute_task (this=0x555556ad9f90, task=0x7ff724a24810) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:1687
#16 0x00007fffedf62deb in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x555556ad9f90) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:1160
#17 0x00007fffedf6b5be in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x555556ad9f90)
    at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/threads.inl:97
#18 0x00007fffedf7ab2d in Realm::UserThread::uthread_entry () at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/threads.cc:1428
#19 0x00007fffece5a130 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:90 from /lib/x86_64-linux-gnu/libc.so.6
#20 0x0000
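
For what it's worth, the request itself appears to be handled fine before the crash: the token IDs in the RequestManager log line above can be decoded back to the prompt text, and the failure only hits later, during weight loading (FileDataLoader::load_weight_task in frame #11). A quick sanity check of the tokens, assuming transformers is installed and access to the gated Llama-2 repo has been granted (huggingface-cli login, as in the steps below):

# Optional sanity check: decode the token IDs from the RequestManager log line.
# Requires `pip install transformers` and Llama-2 access on the HF Hub.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
ids = [1, 14350, 263, 26228, 21256, 1048, 7535, 17770, 363, 596, 10462, 29889]  # from the log above
print(tok.decode(ids, skip_special_tokens=True))  # token 1 is Llama's BOS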

Steps to reproduce

Create and SSH into a g4dn.8xlarge instance with the "Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.1 (Ubuntu 22.04) 20241119" AMI.

git clone --recursive https://github.com/flexflow/FlexFlow.git
export FF_GPU_BACKEND=cuda
export cuda_version=12.2 # AWS instance has CUDA 12.4, but only 12.2 is supported by FF
cd FlexFlow
curl https://sh.rustup.rs -sSf | sh -s -- -y
source ~/.bashrc
vim config/config.linux # change build type to Debug
mkdir build
cd build
../config/config.linux
make
cd ..
pip install .
huggingface-cli login
python3 ./inference/utils/download_hf_model.py  meta-llama/Llama-2-7b-hf
cd build
wget -O chatgpt.json https://specinfer.s3.us-east-2.amazonaws.com/prompts/chatgpt.json
gdb --args ./inference/incr_decoding/incr_decoding -ll:gpu 1 -ll:cpu 4 -ll:fsize 7000 -ll:zsize 32000 -llm-model meta-llama/Llama-2-7b-hf  -prompt chatgpt.json -tensor-parallelism-degree 1
r
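
If it helps triage, the same model could also be exercised through the Python interface to isolate whether the assertion is specific to the C++ incr_decoding driver or hits the shared runtime path. A minimal sketch, assuming the flexflow.serve API from the inference README (exact kwargs may differ across versions; the memory sizes mirror the -ll flags above):

# Hypothetical isolation test via the Python API; kwargs follow the
# FlexFlow inference README and may vary between versions.
import flexflow.serve as ff

ff.init(num_gpus=1, memory_per_gpu=7000, zero_copy_memory_per_node=32000,
        tensor_parallelism_degree=1, pipeline_parallelism_degree=1)
llm = ff.LLM("meta-llama/Llama-2-7b-hf")
config = ff.GenerationConfig(do_sample=False)
llm.compile(config, max_requests_per_batch=1, max_seq_length=256,
            max_tokens_per_batch=64)
# Newer builds may require llm.start_server() before generate().
result = llm.generate("Three tips for staying healthy.")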
lockshaw transferred this issue from flexflow/flexflow-train on Dec 16, 2024