forked from mlc-ai/mlc-llm
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Integrate Flash-Decoding into engine (#181)
* test stub
* wip
* wip
* wip
* compiled
* wip
* fix
* fix
* wip, decode with flash decoding works
* all work
* add paged_kv_cache_type option
* read kv_type from artifact
* black
* refactor attention backend
* minor clean up
* Integrate flash-decoding into mlc-serve
* remove --use-vllm-attention
* wip decode_multi_query integration
* temp handling for multi-query logits
* remove tmp support for multi-query decode
* typo
* use block size 128 or 64 when possible
* remove unused var
* merge fix
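Flash-Decoding, as generally described, splits a long KV cache into chunks, computes attention for the single decode query against each chunk independently, and then merges the partial results with a log-sum-exp style rescaling. The sketch below illustrates only that reduction, assuming a single query and a single attention head in pure Python. The names `attend_chunk`, `flash_decode`, `reference_attention`, and `chunk_size` are hypothetical and for illustration only; they are not the engine's API or the kernels this commit integrates.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend_chunk(
    q: Vector, keys: Sequence[Vector], values: Sequence[Vector], scale: float
) -> Tuple[List[float], float, float]:
    """Hypothetical helper: attention of one query over one KV chunk.

    Returns the unnormalized weighted sum of values, the chunk-local max
    score, and the chunk-local sum of exp(score - max).
    """
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, m, sum(weights)


def flash_decode(
    q: Vector, keys: Sequence[Vector], values: Sequence[Vector], chunk_size: int
) -> List[float]:
    """Split-KV attention: each chunk is independent, then partials are merged."""
    if not keys or chunk_size <= 0:
        raise ValueError("need at least one key and a positive chunk_size")
    scale = 1.0 / math.sqrt(len(q))
    # Phase 1: per-chunk partial results (parallel across chunks in a real kernel).
    partials = [
        attend_chunk(q, keys[i:i + chunk_size], values[i:i + chunk_size], scale)
        for i in range(0, len(keys), chunk_size)
    ]
    # Phase 2: rescale every chunk to the global max, then normalize once.
    m_global = max(m for _, m, _ in partials)
    dim = len(values[0])
    total = [0.0] * dim
    denom = 0.0
    for out, m, s in partials:
        factor = math.exp(m - m_global)
        denom += s * factor
        for d in range(dim):
            total[d] += out[d] * factor
    return [x / denom for x in total]


def reference_attention(
    q: Vector, keys: Sequence[Vector], values: Sequence[Vector]
) -> List[float]:
    """Plain softmax attention over the whole KV cache, for comparison."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(dim)]


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0]
    keys = [[(i % 3) - 1.0, 0.1 * i, math.sin(i)] for i in range(10)]
    values = [[float(i), math.cos(i)] for i in range(10)]
    print(flash_decode(q, keys, values, chunk_size=4))
    print(reference_attention(q, keys, values))
```

Because each chunk only needs its own max score and exponent sum, the chunks can be processed in any order and merged afterwards; the two printed vectors should agree up to floating-point rounding for any `chunk_size`.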
Showing 5 changed files with 107 additions and 61 deletions.