[Bug? Request?] Force `StringParser` to not split special tokens #126

aw632 · 2024-07-31T02:26:43Z

Right now, `StringParser` is implemented at the character level, so if you give it a special token as the target string, the model can produce the same string out of non-special tokens instead. A flag that prevents the target string from being split would be very helpful. I can help write the PR, but I am not sure where exactly to get started. I see the comment:

> It is a debugging / learning tool to show how CharacterLevelParser works together with TokenizerPrefixTree to filter the allowed tokens (some of whom may contain multiple characters)

so I think it should be possible?
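To make the problem concrete, here is a minimal, self-contained sketch of character-level filtering over a toy vocabulary. It does not use LMFE's actual classes; `allowed_token_ids`, `all_token_paths`, and `TOY_VOCAB` are hypothetical names made up for illustration, and the vocabulary is invented.

```python
from typing import Dict, List, Tuple

# Hypothetical toy vocabulary: id 0 stands in for a special token, the
# rest are ordinary tokens whose text happens to spell the same string.
TOY_VOCAB: Dict[int, str] = {
    0: "<|eot|>",  # the "special" token
    1: "<|",
    2: "eot",
    3: "|>",
    4: "hello",
}


def allowed_token_ids(target: str, generated: str, vocab: Dict[int, str]) -> List[int]:
    """Return ids of tokens whose text continues `generated` toward `target`.

    Assumes `generated` is already a prefix of `target`. The check only
    looks at characters, so it cannot tell a special token apart from
    ordinary tokens that spell the same text.
    """
    remaining = target[len(generated):]
    return [tid for tid, text in vocab.items() if text and remaining.startswith(text)]


def all_token_paths(
    target: str,
    vocab: Dict[int, str],
    generated: str = "",
    path: Tuple[int, ...] = (),
) -> List[List[int]]:
    """Enumerate every token sequence the character-level filter accepts."""
    if generated == target:
        return [list(path)]
    paths: List[List[int]] = []
    for tid in allowed_token_ids(target, generated, vocab):
        paths.extend(all_token_paths(target, vocab, generated + vocab[tid], path + (tid,)))
    return paths


paths = all_token_paths("<|eot|>", TOY_VOCAB)
print(paths)  # [[0], [1, 2, 3]]
```

Both the single special token `[0]` and the split sequence `[1, 2, 3]` pass the filter, because both decode to the same characters. A "do not split" flag would need to drop the second path, which means filtering on token ids rather than on characters alone.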

noamgat · 2024-09-03T19:42:00Z

The idea of LMFE is to support any sequence of tokens whose decoded string is legal output. What you are requesting essentially violates this. I'm not sure there's an elegant way to do it.
