I am unable to run incremental decoding with the C++ interface without errors. I tried meta-llama/Llama-2-7b-hf, other Llama models, and an OPT model.
I get the error below (with a backtrace from gdb):
[10]29889
No small speculative model registered, using incremental decoding.
[0 - 7ffff4921000] 1.042109 {3}{RequestManager}: [1000358]New request tokens: 1 14350 263 26228 21256 1048 7535 17770 363 596 10462 29889
optimal_views.size = 294
views.size() = 294
###PEFT DEBUGGING### Operators reconstructed from optimized graph.
###PEFT DEBUGGING### Starting inplace optimizations.
###PEFT DEBUGGING### Mapping output tensors.
ndim(1) dims[1 0 0 0]
###PEFT DEBUGGING### Setting up NCCL communications.
###PEFT DEBUGGING### compile_inference completed successfully.
Loading weight file embed_tokens.weight
Loading weight file layers.0.input_layernorm.weight
Loading weight file layers.0.self_attn.q_proj.weight
incr_decoding: /home/ubuntu/FlexFlow/deps/legion/runtime/legion/runtime.cc:4991: void Legion::Internal::PhysicalRegionImpl::wait_until_valid(bool, const char*, bool, const char*): Assertion `implicit_context == context' failed.
Thread 10 "incr_decoding" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffefcf4000 (LWP 18011)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737216724992) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737216724992) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737216724992) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737216724992, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007fffece42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007fffece287f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007fffece2871b in __assert_fail_base (fmt=0x7fffecfdd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff3c00b7b "implicit_context == context",
file=0x7ffff3bfd360 "/home/ubuntu/FlexFlow/deps/legion/runtime/legion/runtime.cc", line=4991, function=<optimized out>) at ./assert/assert.c:92
#6 0x00007fffece39e96 in __GI___assert_fail (assertion=0x7ffff3c00b7b "implicit_context == context", file=0x7ffff3bfd360 "/home/ubuntu/FlexFlow/deps/legion/runtime/legion/runtime.cc", line=4991,
function=0x7ffff3c016f8 "void Legion::Internal::PhysicalRegionImpl::wait_until_valid(bool, const char*, bool, const char*)") at ./assert/assert.c:101
#7 0x00007ffff2ad68b8 in Legion::Internal::PhysicalRegionImpl::wait_until_valid (this=0x7ff754203e70, silence_warnings=false, warning_string=0x0, warn=false, source=0x0)
at /home/ubuntu/FlexFlow/deps/legion/runtime/legion/runtime.cc:4991
#8 0x00007ffff2624402 in Legion::PhysicalRegion::wait_until_valid (this=0x7ff7542005d8, silence_warnings=false, warning_string=0x0) at /home/ubuntu/FlexFlow/deps/legion/runtime/legion/legion.cc:2772
#9 0x00007ffff65530eb in FlexFlow::ParallelTensorBase::set_tensor<__half> (this=0x7ff762541c30, ff=0x7ff7641e7db0, dim_sizes=std::vector of length 1, capacity 1 = {...}, data=0x7ff754201320)
at /home/ubuntu/FlexFlow/src/runtime/parallel_tensor.cc:680
#10 0x00007ffff62dcb4a in FileDataLoader::load_single_weight_tensor<__half> (this=0x7ff765508840, ff=0x7ff7641e7db0, l=0x7ff76498f5e0, weight_idx=0) at /home/ubuntu/FlexFlow/src/runtime/file_loader.cc:849
#11 0x00007ffff62dad8c in FileDataLoader::load_weight_task (task=0x7ff724a142e0, regions=std::vector of length 0, capacity 0, ctx=0x7ff76c2effe0, runtime=0x555556b68000)
at /home/ubuntu/FlexFlow/src/runtime/file_loader.cc:864
#12 0x00007ffff64cc78a in Legion::LegionTaskWrapper::legion_task_wrapper<&FileDataLoader::load_weight_task> (args=0x7ff724a24990, arglen=8, userdata=0x0, userlen=0, p=...)
at /home/ubuntu/FlexFlow/deps/legion/runtime/legion/legion.inl:21215
#13 0x00007fffedee02cc in Realm::LocalTaskProcessor::execute_task (this=0x555556ad9bf0, func_id=19, task_args=...) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/proc_impl.cc:1176
#14 0x00007fffedf5fc4b in Realm::Task::execute_on_processor (this=0x7ff724a24810, p=...) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:326
#15 0x00007fffedf650fc in Realm::UserThreadTaskScheduler::execute_task (this=0x555556ad9f90, task=0x7ff724a24810) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:1687
#16 0x00007fffedf62deb in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x555556ad9f90) at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/tasks.cc:1160
#17 0x00007fffedf6b5be in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x555556ad9f90)
at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/threads.inl:97
#18 0x00007fffedf7ab2d in Realm::UserThread::uthread_entry () at /home/ubuntu/FlexFlow/deps/legion/runtime/realm/threads.cc:1428
#19 0x00007fffece5a130 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:90 from /lib/x86_64-linux-gnu/libc.so.6
#20 0x0000
Steps to reproduce
Create and SSH into a g4dn.8xlarge instance using the AMI "Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.1 (Ubuntu 22.04) 20241119"
git clone --recursive https://github.com/flexflow/FlexFlow.git
export FF_GPU_BACKEND=cuda
export cuda_version=12.2 # aws instance has CUDA 12.4, but only 12.2 is supported by FF
cd FlexFlow
curl https://sh.rustup.rs -sSf | sh -s -- -y
source ~/.bashrc
vim config/config.linux # change build type to Debug
mkdir build
cd build
../config/config.linux
make
cd ..
pip install .
huggingface-cli login
python3 ./inference/utils/download_hf_model.py meta-llama/Llama-2-7b-hf
cd build
wget -O chatgpt.json https://specinfer.s3.us-east-2.amazonaws.com/prompts/chatgpt.json
gdb --args ./inference/incr_decoding/incr_decoding -ll:gpu 1 -ll:cpu 4 -ll:fsize 7000 -ll:zsize 32000 -llm-model meta-llama/Llama-2-7b-hf -prompt chatgpt.json -tensor-parallelism-degree 1
r
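For anyone who wants to capture the same backtrace without an interactive gdb session, the last two steps could be scripted roughly as below. This is only a sketch: it assumes the current directory is `FlexFlow/build` and that `chatgpt.json` has already been downloaded. The `gdb_repro_args` function name, the `RUN_REPRO` environment variable, and the `backtrace.log` output file are hypothetical names introduced here for illustration, not part of FlexFlow. The gdb options used (`-batch`, `-ex`, `--args`) are standard gdb flags.

```shell
#!/usr/bin/env bash
# Sketch: run the reproduction under gdb in batch mode and save the backtrace.
# Assumption: executed from FlexFlow/build with chatgpt.json present.

# Print the full gdb command line, one argument per line, so it can be
# inspected before anything is executed. (Hypothetical helper name.)
gdb_repro_args() {
  printf '%s\n' \
    gdb -batch \
    -ex run \
    -ex bt \
    --args ./inference/incr_decoding/incr_decoding \
    -ll:gpu 1 -ll:cpu 4 -ll:fsize 7000 -ll:zsize 32000 \
    -llm-model meta-llama/Llama-2-7b-hf \
    -prompt chatgpt.json \
    -tensor-parallelism-degree 1
}

# Only launch gdb when explicitly requested via the (hypothetical) RUN_REPRO
# variable, so sourcing this file has no side effects.
if [ "${RUN_REPRO:-0}" = "1" ]; then
  mapfile -t cmd < <(gdb_repro_args)
  # "run" starts the program; once it aborts, "bt" prints the backtrace.
  "${cmd[@]}" 2>&1 | tee backtrace.log
fi
```

Running it as `RUN_REPRO=1 bash repro.sh` (the script filename is arbitrary) should leave the program output and the backtrace in `backtrace.log`, which is easier to attach to a report than a copied terminal session.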