[Bug? Request?] Force StringParser to not split special tokens #126

Open
aw632 opened this issue Jul 31, 2024 · 1 comment

Comments

@aw632
Contributor

aw632 commented Jul 31, 2024

Right now, StringParser is implemented at the character level, so if you give it a special token as the target string, it may generate the same string from non-special tokens instead. A flag that prevents the target string from being split would be very helpful. I can help write the PR, but I am not sure where to get started. I see this comment:

"It is a debugging / learning tool to show how CharacterLevelParser works together with TokenizerPrefixTree to filter the allowed tokens (some of whom may contain multiple characters)"

so I think it should be possible?
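
To make the behavior concrete, here is a minimal sketch of character-level token filtering. It does not use LMFE's actual classes; the vocabulary, the token ids, the `<|eot|>` special token, and the `allowed_token_ids` helper are all made up for illustration.

```python
# Toy vocabulary: token id -> decoded text. Ids and strings are hypothetical;
# id 0 stands in for a special token.
TOY_VOCAB = {
    0: "<|eot|>",  # hypothetical special token
    1: "<",
    2: "|",
    3: "eot",
    4: ">",
    5: "<|",
    6: "|>",
    7: "hello",
}


def allowed_token_ids(target: str, generated: str, vocab: dict) -> list:
    """Return ids of tokens whose text keeps the output a prefix of `target`.

    The check only looks at decoded characters, so it cannot tell a special
    token apart from ordinary tokens that spell the same text.
    """
    if not target.startswith(generated):
        return []
    remaining = target[len(generated):]
    return sorted(
        tid for tid, text in vocab.items()
        if text and remaining.startswith(text)
    )


# At the first step, the special token (0) and the ordinary tokens "<" (1)
# and "<|" (5) are all allowed.
first_step = allowed_token_ids("<|eot|>", "", TOY_VOCAB)
print(first_step)
```

In this toy setup the filter has no basis for preferring token 0 over tokens 1 or 5, which is the splitting described above.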

@noamgat
Owner

noamgat commented Sep 3, 2024

The idea of LMFE is to support any sequence of tokens whose string decoding is legal output. What you are requesting is essentially a violation of this. I'm not sure there's an elegant way to do this.
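
As a rough illustration of that principle (a sketch with a made-up vocabulary, not LMFE code), the snippet below enumerates every token sequence whose decoded text equals a target string. The `token_sequences` helper and the `<|eot|>` token are hypothetical.

```python
# Toy vocabulary: token id -> decoded text. Ids and strings are hypothetical;
# id 0 stands in for a special token.
TOY_VOCAB = {
    0: "<|eot|>",  # hypothetical special token
    1: "<",
    2: "|",
    3: "eot",
    4: ">",
    5: "<|",
    6: "|>",
}


def token_sequences(target: str, vocab: dict) -> list:
    """Enumerate all token-id sequences whose concatenated text is `target`."""
    if target == "":
        return [[]]
    sequences = []
    for tid, text in sorted(vocab.items()):
        if text and target.startswith(text):
            # Consume this token's text, then tokenize whatever is left.
            for rest in token_sequences(target[len(text):], vocab):
                sequences.append([tid] + rest)
    return sequences


seqs = token_sequences("<|eot|>", TOY_VOCAB)
for seq in seqs:
    print(seq, "->", "".join(TOY_VOCAB[t] for t in seq))
```

Every sequence printed decodes to the same string, so a string-level constraint accepts all of them. A "do not split" flag would have to single out the one-token sequence, which means reasoning about token ids rather than decoded characters.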
