Optionally ignore utf-8 decoding error when converting std::string to python str. (pytorch#97282)

Summary: When language models use a C++ tokenizer, the outputs are C++ strings that are not necessarily valid UTF-8. The default pybind11 casting uses strict UTF-8 decoding. We relax the decoding by using the 'ignore' argument.

Test Plan: https://www.internalfb.com/intern/testinfra/testrun/6473924609918070

Reviewed By: Nayef211

Differential Revision: D43970697

Pull Request resolved: pytorch#97282
Approved by: https://github.com/davidberard98
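To make the difference between strict and 'ignore' decoding concrete, here is a minimal standalone sketch. It is not the pybind11 or PyTorch code from this commit: `utf8SequenceLength` and `decodeUtf8Sketch` are hypothetical helpers that only approximate the idea that strict decoding rejects a string containing an invalid byte, while 'ignore' drops the offending bytes and keeps the rest.

```cpp
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical helper (not part of PyTorch or pybind11).
// Returns the length (1 to 4) of the well-formed UTF-8 sequence that starts
// at s[i], or 0 if the bytes at that position are not well-formed.
inline std::size_t utf8SequenceLength(const std::string& s, std::size_t i) {
  const auto byte = [&s](std::size_t k) {
    return static_cast<unsigned char>(s[k]);
  };
  const unsigned char b0 = byte(i);
  std::size_t len = 0;
  if (b0 < 0x80) {
    return 1; // ASCII
  } else if (b0 >= 0xC2 && b0 <= 0xDF) {
    len = 2;
  } else if (b0 >= 0xE0 && b0 <= 0xEF) {
    len = 3;
  } else if (b0 >= 0xF0 && b0 <= 0xF4) {
    len = 4;
  } else {
    return 0; // stray continuation byte or invalid lead byte
  }
  if (i + len > s.size()) {
    return 0; // sequence is truncated
  }
  for (std::size_t k = 1; k < len; ++k) {
    if ((byte(i + k) & 0xC0) != 0x80) {
      return 0; // expected a continuation byte (10xxxxxx)
    }
  }
  // Reject overlong encodings, surrogates, and values above U+10FFFF.
  const unsigned char b1 = byte(i + 1);
  if ((b0 == 0xE0 && b1 < 0xA0) || (b0 == 0xED && b1 > 0x9F) ||
      (b0 == 0xF0 && b1 < 0x90) || (b0 == 0xF4 && b1 > 0x8F)) {
    return 0;
  }
  return len;
}

// Hypothetical stand-in for the two decoding modes.
// Strict mode (ignoreErrors == false): returns false on the first invalid
// byte and leaves `out` unchanged.
// Ignore mode (ignoreErrors == true): skips invalid bytes one at a time and
// always succeeds.
inline bool decodeUtf8Sketch(
    const std::string& in,
    bool ignoreErrors,
    std::string& out) {
  std::string result;
  std::size_t i = 0;
  while (i < in.size()) {
    const std::size_t len = utf8SequenceLength(in, i);
    if (len > 0) {
      result.append(in, i, len);
      i += len;
    } else if (ignoreErrors) {
      ++i; // drop the offending byte and resynchronize
    } else {
      return false;
    }
  }
  out = result;
  return true;
}

} // namespace sketch
```

With the input bytes `a b 0xFF c d`, the strict mode of this sketch fails, while the ignore mode yields `abcd`. A tokenizer that emits partial multi-byte sequences produces exactly this kind of input.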
1 parent a524123, commit b45880c

Showing 5 changed files with 59 additions and 23 deletions.
    @@ -0,0 +1,16 @@
    #include <torch/csrc/jit/python/utf8_decoding_ignore.h>

    namespace torch::jit {

    namespace {
    thread_local bool kIgnore = false;
    }

    void setUTF8DecodingIgnore(bool o) {
      kIgnore = o;
    }
    bool getUTF8DecodingIgnore() {
      return kIgnore;
    }

    } // namespace torch::jit
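Because the flag is declared `thread_local`, each thread has its own copy that starts out `false`. The following sketch illustrates that behavior in isolation; the `sketch` namespace and the names `ignoreFlag`, `setIgnore`, `getIgnore`, and `flagSeenByNewThread` are hypothetical stand-ins, not the PyTorch symbols.

```cpp
#include <thread>

namespace sketch {

// Stand-in for the flag in the diff above: one independent copy per thread.
thread_local bool ignoreFlag = false;

inline void setIgnore(bool o) {
  ignoreFlag = o;
}

inline bool getIgnore() {
  return ignoreFlag;
}

// Enables the flag on the calling thread, then reports the value that a
// freshly started thread observes for its own copy.
inline bool flagSeenByNewThread() {
  setIgnore(true);
  bool seen = true;
  std::thread worker([&seen] { seen = getIgnore(); });
  worker.join();
  return seen;
}

} // namespace sketch
```

In this sketch `flagSeenByNewThread()` returns `false` even though the calling thread has just enabled the flag. If the real setter behaves the same way, a caller that wants relaxed decoding on a worker thread would need to enable it on that thread.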
    @@ -0,0 +1,8 @@
    #pragma once
    #include <torch/csrc/Export.h>
    namespace torch {
    namespace jit {
    TORCH_API void setUTF8DecodingIgnore(bool o);
    TORCH_API bool getUTF8DecodingIgnore();
    } // namespace jit
    } // namespace torch
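A setter/getter pair like this is often wrapped in a scoped guard so that the previous value is restored on every exit path. The sketch below shows one way that could look. `UTF8DecodingIgnoreGuard` is a hypothetical class that does not appear in this commit, and the two functions are re-declared locally in a `sketch` namespace so that the example compiles without PyTorch.

```cpp
namespace sketch {

namespace {
// Local stand-in for the thread-local flag behind the header's API.
thread_local bool ignoreFlag = false;
} // namespace

inline void setUTF8DecodingIgnore(bool o) {
  ignoreFlag = o;
}

inline bool getUTF8DecodingIgnore() {
  return ignoreFlag;
}

// Hypothetical RAII guard: sets the flag for the current scope and restores
// the previous value when the scope ends, including on exceptions.
class UTF8DecodingIgnoreGuard {
 public:
  explicit UTF8DecodingIgnoreGuard(bool enabled)
      : previous_(getUTF8DecodingIgnore()) {
    setUTF8DecodingIgnore(enabled);
  }

  ~UTF8DecodingIgnoreGuard() {
    setUTF8DecodingIgnore(previous_);
  }

  // Non-copyable so the saved value is restored exactly once.
  UTF8DecodingIgnoreGuard(const UTF8DecodingIgnoreGuard&) = delete;
  UTF8DecodingIgnoreGuard& operator=(const UTF8DecodingIgnoreGuard&) = delete;

 private:
  bool previous_;
};

} // namespace sketch
```

Constructing `sketch::UTF8DecodingIgnoreGuard guard(true);` at the top of a block enables the flag for that block only, which keeps the relaxed behavior from leaking into unrelated conversions on the same thread.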