input is not valid Modified UTF-8: illegal continuation byte 0 #1894
Comments
@csukuangfj I tested some fixes locally, like this:
and added my own implementation of the new methods (
We have already done that in sherpa-onnx/sherpa-onnx/csrc/offline-recognizer-impl.cc, lines 498 to 500 at commit 4801094.
Can you output the byte sequence of the invalid UTF-8 string?
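One way to produce such a byte dump is sketched below. `ToHexBytes` is a hypothetical helper written for illustration only; it is not part of sherpa-onnx. It formats every byte of the string as two hex digits, so an incomplete multi-byte sequence shows up directly in the log.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a sherpa-onnx API): returns the bytes of `s`
// as space-separated two-digit lowercase hex values.
std::string ToHexBytes(const std::string &s) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    unsigned char b = static_cast<unsigned char>(s[i]);
    if (i != 0) out.push_back(' ');
    out.push_back(kDigits[b >> 4]);    // high nibble
    out.push_back(kDigits[b & 0x0f]);  // low nibble
  }
  return out;
}
```

The returned string can be passed to whatever logging call the surrounding code already uses.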
This is what I get in the log:
So the string has only two bytes. We need to update our
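For illustration, here is a minimal sketch of filtering out malformed UTF-8 before a string is handed to the JVM. It assumes the failure comes from a multi-byte sequence that was cut short, which is what "illegal continuation byte 0" suggests. `ValidSequenceLength` and `DropInvalidUtf8` are hypothetical names; this is not the change made in the project. Note also that the sketch checks standard UTF-8 only: JNI's Modified UTF-8 does not accept 4-byte sequences, so text containing such characters would still need separate handling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length (1 to 4) of the well-formed UTF-8
// sequence starting at s[i], or 0 if the bytes there are not well-formed.
static std::size_t ValidSequenceLength(const std::string &s, std::size_t i) {
  auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
  auto is_cont = [&](std::size_t k) {
    return k < s.size() && (byte(k) & 0xC0) == 0x80;
  };
  unsigned char b0 = byte(i);
  if (b0 < 0x80) return 1;  // ASCII
  if (b0 >= 0xC2 && b0 <= 0xDF) return is_cont(i + 1) ? 2 : 0;
  if (b0 >= 0xE0 && b0 <= 0xEF) {
    if (!is_cont(i + 1) || !is_cont(i + 2)) return 0;
    unsigned char b1 = byte(i + 1);
    if (b0 == 0xE0 && b1 < 0xA0) return 0;  // overlong encoding
    if (b0 == 0xED && b1 > 0x9F) return 0;  // UTF-16 surrogate range
    return 3;
  }
  if (b0 >= 0xF0 && b0 <= 0xF4) {
    if (!is_cont(i + 1) || !is_cont(i + 2) || !is_cont(i + 3)) return 0;
    unsigned char b1 = byte(i + 1);
    if (b0 == 0xF0 && b1 < 0x90) return 0;  // overlong encoding
    if (b0 == 0xF4 && b1 > 0x8F) return 0;  // above U+10FFFF
    return 4;
  }
  return 0;  // stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF
}

// Hypothetical helper: copies only the well-formed sequences of `s`,
// skipping every byte that cannot start a valid sequence.
std::string DropInvalidUtf8(const std::string &s) {
  std::string out;
  std::size_t i = 0;
  while (i < s.size()) {
    std::size_t n = ValidSequenceLength(s, i);
    if (n == 0) {
      ++i;  // skip one bad byte and resynchronize
      continue;
    }
    out.append(s, i, n);
    i += n;
  }
  return out;
}
```

Dropping the bad bytes is only one possible policy; replacing them with a placeholder character would keep the output length closer to the input.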
No problem, I can do that. How do I run the unit tests for this project?
Thanks!
Please use:
git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build
cd build
cmake -DSHERPA_ONNX_ENABLE_TESTS=ON ..
make test-utils-test
./bin/test-utils-test
Please also update
After I successfully ran [...], I tried to run this and got this error:
I got it:
@csukuangfj here is the PR with a fix: #1904
Closing by #1904
On the latest master, I am seeing this new exception:
Steps to reproduce:
Run ASR with the Whisper small model, pass the language as cs (Czech), and use the utterance "check check". It reproduces almost 100% of the time. This is a regression introduced in the last 2-4 weeks.