feat: Inference speed(tokens/s) profiling #91

yirongjie · 2024-07-15T10:38:33Z

What's new

Introduces the module.profiling() function, which reports model load time along with prefilling and decoding speed in tokens/s.
For a usage example, see line 58 of example/demo_tinyllama.cpp. The resulting output is shown below:

===========================================
  Load time: 0.461832 s
  Prefilling speed: 5.78816 tokens/s
  Decoding speed: 5.23941 tokens/s
===========================================
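
For readers unfamiliar with how figures like these are usually derived, here is a minimal sketch. It does not reproduce the code in this PR. It assumes that prefilling speed is the prompt token count divided by prefill wall time, and that decoding speed is the generated token count divided by decode wall time. All names (SpeedReport, tokens_per_second, time_seconds, print_report) are hypothetical and are not part of the mllm API.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>

// Hypothetical container for the measurements; not an mllm type.
struct SpeedReport {
    double load_s = 0.0;              // wall time spent loading weights
    double prefill_s = 0.0;           // wall time spent on the prompt pass
    double decode_s = 0.0;            // wall time spent generating tokens
    std::size_t prompt_tokens = 0;    // tokens processed during prefill
    std::size_t generated_tokens = 0; // tokens produced during decoding
};

// Throughput in tokens/s; returns 0 when no time has elapsed
// so that an empty run does not divide by zero.
inline double tokens_per_second(std::size_t tokens, double seconds) {
    assert(seconds >= 0.0 && "elapsed time cannot be negative");
    return seconds > 0.0 ? static_cast<double>(tokens) / seconds : 0.0;
}

// Runs fn once and returns its wall-clock duration in seconds.
// steady_clock is monotonic, so clock adjustments cannot skew the result.
template <typename Fn>
double time_seconds(Fn &&fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Prints a report laid out like the output shown above.
inline void print_report(const SpeedReport &r) {
    std::printf("===========================================\n");
    std::printf("  Load time: %g s\n", r.load_s);
    std::printf("  Prefilling speed: %g tokens/s\n",
                tokens_per_second(r.prompt_tokens, r.prefill_s));
    std::printf("  Decoding speed: %g tokens/s\n",
                tokens_per_second(r.generated_tokens, r.decode_s));
    std::printf("===========================================\n");
}
```

A caller would wrap the load, prefill, and decode phases in time_seconds, store the durations and token counts in a SpeedReport, and pass it to print_report. Keeping the two speeds separate matters because prefill processes the whole prompt in one pass while decoding produces one token per step, so the two rates are not directly comparable.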

yirongjie added 4 commits July 15, 2024 09:52

fix: time profile

b2d0fb3

fix: sparse inference is not support for ARM NEON

9ef187c

fix: rename profiling

866d419

fix: Typo

be37bdc

yirongjie requested a review from UbiquitousLearning July 15, 2024 10:38

UbiquitousLearning approved these changes Jul 15, 2024

yirongjie merged commit ec3360d into UbiquitousLearning:main Jul 15, 2024
1 check passed
