When llama_chat_apply_template doesn't work #11687
Conversation
Force-pushed from 27f1b9b to 7c0adb5
Try minja. With granite-code, if we fall back to jinja on failure, it's fine.
Co-authored-by: Michael Engel <[email protected]>
Signed-off-by: Eric Curtin <[email protected]>
Force-pushed from 7c0adb5 to d23abdc
@ngxson I also added this change to llama-server if you want to take it: some models fail with the default template engine, but if you fall back to minja, they just work.
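For context, a minimal sketch of that fallback pattern, assuming the public llama_chat_apply_template() signature (it returns the formatted length on success and a negative value for an unsupported template); apply_jinja_template and apply_with_fallback are hypothetical names for illustration, not the actual llama-server code:

#include <cstdint>
#include <string>
#include <vector>

#include "llama.h"

// Hypothetical: stands in for the minja/jinja rendering path.
bool apply_jinja_template(const char * tmpl, const llama_chat_message * msgs,
                          size_t n_msgs, std::string & out);

// Try the built-in template engine first; only fall back to jinja (minja)
// when it rejects the model's template.
static bool apply_with_fallback(const char * tmpl, const llama_chat_message * msgs,
                                size_t n_msgs, std::string & out) {
    std::vector<char> buf(4096);
    int32_t res = llama_chat_apply_template(tmpl, msgs, n_msgs,
                                            /*add_ass=*/true, buf.data(), (int32_t) buf.size());
    if (res > (int32_t) buf.size()) {
        // Buffer was too small: res reports the required size, so retry once.
        buf.resize(res);
        res = llama_chat_apply_template(tmpl, msgs, n_msgs,
                                        /*add_ass=*/true, buf.data(), (int32_t) buf.size());
    }
    if (res >= 0) {
        out.assign(buf.data(), (size_t) res);
        return true;
    }
    // Negative return: the built-in engine doesn't support this template.
    return apply_jinja_template(tmpl, msgs, n_msgs, out);
}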
        return false;
    }
}

bool validate_builtin_chat_template(bool use_jinja) const {
If we choose to fall back to jinja, logically we should remove the bool use_jinja from this function signature, but I'm not sure.
Also have a look at where this function is used:
if (params_base.chat_template.empty() && !validate_builtin_chat_template(params.use_jinja)) {
LOG_WRN("%s: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses\n", __func__);
chat_templates = common_chat_templates_from_model(model, "chatml");
} else {
chat_templates = common_chat_templates_from_model(model, params_base.chat_template);
}
GGML_ASSERT(chat_templates.template_default.get() != nullptr);
So now the LOG_WRN message is no longer valid; it probably needs to be changed too.
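Under the fall-back-to-jinja approach, that call site could end up looking roughly like this (a sketch only; the forced-jinja branch and the new warning text are assumptions, not code from this PR):

if (params_base.chat_template.empty() && !validate_builtin_chat_template(params.use_jinja)) {
    // Assumed behaviour: instead of degrading to chatml, switch over to
    // the jinja (minja) engine for this model's template.
    LOG_WRN("%s: the built-in engine does not support this model's chat template, "
            "falling back to the jinja engine\n", __func__);
    params.use_jinja = true;
}
chat_templates = common_chat_templates_from_model(model, params_base.chat_template);
GGML_ASSERT(chat_templates.template_default.get() != nullptr);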
Well, the bool still kinda makes sense; somebody might want to force the use of jinja... If llama_chat_apply_template works, it won't use jinja at all.
We could delete the validation and just fall back on failure. WDYT?
In this case, I think we can change the variable name to bool prefer_jinja to make it more intuitive.
Yes, a prefer_ or enforce_ prefix would be good.
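Concretely, the rename under discussion is just a signature change; sketched below (the semantics in the comments reflect this thread, not merged code):

// Before: reads as if jinja is used unconditionally.
bool validate_builtin_chat_template(bool use_jinja) const;

// After: expresses a preference; the built-in engine is still tried
// first, and jinja/minja is used when preferred or as a fallback.
bool validate_builtin_chat_template(bool prefer_jinja) const;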
Also, btw, I think we can now remove the whole fallback-to-chatml logic. I briefly discussed this with @ochafik in another PR. My point is that the fallback-to-chatml was useful when chat templates were a new thing and not many models had one. But now it has become a standard, so a model not having one can be considered "broken". When both the "normal" and "jinja" template systems fail, we should return an error.

Edit: this is a bit off-topic here, but it would be nice if we can do it in a next PR.
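If that cleanup lands, the chatml branch would go away and a model whose template is rejected by both engines would produce a hard error; a hedged sketch of what that could look like (validate_jinja_chat_template and the error text are hypothetical):

if (params_base.chat_template.empty()) {
    if (!validate_builtin_chat_template(/*prefer_jinja=*/false) &&
        !validate_jinja_chat_template()) { // hypothetical minja-side check
        // No chatml fallback anymore: fail loudly so the user provides a
        // template explicitly instead of getting degraded output.
        LOG_ERR("%s: the model's chat template is not supported by either engine; "
                "use --chat-template to provide one\n", __func__);
        return false;
    }
}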
Co-authored-by: Xuan-Son Nguyen <[email protected]>
Signed-off-by: Eric Curtin <[email protected]>
Force-pushed from 8ea082c to e00c9d1
sgtm, we can even suggest