feat: QNN Multi Chunk Execution in New Frontend #191

oreomaker · 2024-11-13T09:42:03Z

Multi Chunk Execution

Tokenizer: Add a new tokenizePaddingByChunk method to SmolLMTokenizer, which tokenizes the input and pads it to the nearest multiple of chunk_size

auto [real_seq_length, input_tensor] = tokenizer.tokenizePaddingByChunk(input_str, chunk_size, config.vocab_size);
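The padding step can be illustrated with a minimal sketch. The helper name `padToChunkMultiple` and the `pad_id` parameter are hypothetical and are not taken from the mllm source; the sketch only shows rounding a token sequence up to the next multiple of `chunk_size` while keeping the real (unpadded) length, which is the pair shape the call above returns.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not the mllm implementation): pads token ids up to the
// next multiple of chunk_size and returns {real length, padded ids}.
std::pair<std::size_t, std::vector<std::int32_t>>
padToChunkMultiple(std::vector<std::int32_t> ids, std::size_t chunk_size, std::int32_t pad_id) {
    const std::size_t real_len = ids.size();
    if (chunk_size == 0) {
        return {real_len, std::move(ids)}; // nothing sensible to pad to
    }
    // Ceiling division gives the number of chunks needed to hold real_len tokens.
    const std::size_t chunks = (real_len + chunk_size - 1) / chunk_size;
    ids.resize(chunks * chunk_size, pad_id); // appends pad_id only if needed
    return {real_len, std::move(ids)};
}
```

With `chunk_size = 4`, a 5-token input is padded to 8 tokens, while a 4-token input is left unchanged. The real length lets the caller ignore the padded positions when reading results.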

Module Static States: Add Module::isMultiChunkPrefilling and Module::isFirstChunk to track the state of multi-chunk execution

Module Execution: Add a new tensor_status value, TENSOR_UNDEFINED, which is used in QNN chunk execution. If Module::isMultiChunkPrefilling is true, QNN modules skip reshape & setUp for every chunk after the first, while CPU modules still run reshape & setUp for each chunk

if (Tensor::tensor_status == TENSOR_STATIC_INIT && device_ != MLLM_CPU) { // backend-specific module reshape & setup
    if (Module::isMultiChunkPrefilling && !Module::isFirstChunk) { // set to TENSOR_UNDEFINED and skip executing QNN layers
        Tensor::tensor_status = TENSOR_UNDEFINED;
        auto outputs = Forward(inputs, anyArgs);
        Tensor::tensor_status = TENSOR_STATIC_INIT;
        return outputs;
    }
    ...
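The effect of this branch can be modelled with a small standalone sketch. Everything in it is a hypothetical stand-in: `ChunkState` mirrors the two static flags, `ToyModule` replaces the real Module, and a counter replaces the actual reshape & setUp work. It only demonstrates the intended outcome, assuming the flags are updated once per chunk by the caller: a non-CPU module sets up once, and a CPU module sets up for every chunk.

```cpp
#include <cstddef>
#include <vector>

enum class Device { CPU, QNN };

// Stand-ins for Module::isMultiChunkPrefilling and Module::isFirstChunk.
struct ChunkState {
    bool isMultiChunkPrefilling = false;
    bool isFirstChunk = true;
};

// Toy module: setup_calls counts how often reshape & setUp would run.
struct ToyModule {
    Device device;
    int setup_calls = 0;

    explicit ToyModule(Device d) : device(d) {}

    void initPass(const ChunkState &state) {
        // Same condition as the PR snippet: only non-CPU modules skip,
        // and only on chunks after the first during multi-chunk prefilling.
        const bool skip = device != Device::CPU
                          && state.isMultiChunkPrefilling
                          && !state.isFirstChunk;
        if (!skip) {
            ++setup_calls;
        }
    }
};

// Hypothetical driver: runs the init pass of every module once per chunk.
void runChunks(std::vector<ToyModule> &modules, std::size_t num_chunks) {
    ChunkState state;
    state.isMultiChunkPrefilling = num_chunks > 1;
    for (std::size_t chunk = 0; chunk < num_chunks; ++chunk) {
        state.isFirstChunk = (chunk == 0);
        for (auto &m : modules) {
            m.initPass(state);
        }
    }
}
```

Running three chunks over one QNN module and one CPU module leaves the QNN module with a single setup call and the CPU module with three.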

TODO

Multi-round input still produces incorrect results, which may be caused by stateful ops such as KVCache, RoPE, and CausalMask

oreomaker and others added 6 commits November 8, 2024 16:03

dev: qnn multi input inference developing

e428bb5

Merge branch 'main' into develop-zh

fca8584

feat: qnn multi chunk prefilling in new frontend

f94937d

refactor: qnn module setup skip in following chunks todo: multi input

Merge branch 'main' into develop-zh

83fefd3

fix: clearCache in RoPE

5ff02cd

fix: genarate with Padding

68ddb99

yirongjie changed the title ~~QNN Multi Chunk Execution in New Frontend~~ feat: QNN Multi Chunk Execution in New Frontend Nov 14, 2024

yirongjie approved these changes Nov 14, 2024

yirongjie merged commit 11a2fb2 into UbiquitousLearning:main Nov 14, 2024
1 check passed
