
When llama_chat_apply_template doesn't work #11687

Open
wants to merge 3 commits into master

Conversation

ericcurtin
Collaborator

Try minja. With granite-code, if we fall back to jinja on failure, it's fine.

@ericcurtin
Collaborator Author

@engelmi @ochafik PTAL

Try minja. With granite-code if we fall back to jinja on failure,
it's fine.

Co-authored-by: Michael Engel <[email protected]>
Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin
Collaborator Author

@ngxson I also added this change to llama-server if you want to take it: some models fail with the default template engine, but if you fall back to minja, they just work.
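
For context, the fallback being proposed amounts to something like the sketch below. This is illustrative only, not the actual patch: try_native_template and render_with_minja are hypothetical stand-ins for the existing llama_chat_apply_template path and the minja-based jinja renderer.

```cpp
// Illustrative sketch of the fallback idea (not the actual patch).
// try_native_template() stands in for the llama_chat_apply_template path,
// render_with_minja() for the minja-based jinja renderer.
#include <optional>
#include <string>
#include <vector>

struct chat_msg { std::string role; std::string content; };

// Placeholder: the native engine only recognises a fixed set of templates,
// so e.g. granite-code's template would be rejected here.
static std::optional<std::string> try_native_template(const std::vector<chat_msg> & msgs) {
    (void) msgs;
    return std::nullopt;
}

// Placeholder: minja renders the raw jinja template from the model metadata.
static std::string render_with_minja(const std::vector<chat_msg> & msgs) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<|" + m.role + "|>\n" + m.content + "\n";
    }
    return out;
}

std::string format_chat(const std::vector<chat_msg> & msgs) {
    if (auto native = try_native_template(msgs)) {
        return *native;                 // the built-in template worked
    }
    return render_with_minja(msgs);     // otherwise fall back to minja
}
```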

examples/server/server.cpp
return false;
}
}

bool validate_builtin_chat_template(bool use_jinja) const {
Collaborator

If we choose to fall back to jinja, logically we should remove the bool use_jinja from this function signature, but I'm not sure.

Also have a look at where this function is used:

        if (params_base.chat_template.empty() && !validate_builtin_chat_template(params.use_jinja)) {
            LOG_WRN("%s: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses\n", __func__);
            chat_templates = common_chat_templates_from_model(model, "chatml");
        } else {
            chat_templates = common_chat_templates_from_model(model, params_base.chat_template);
        }
        GGML_ASSERT(chat_templates.template_default.get() != nullptr);

So now the LOG_WRN message is no longer valid and probably needs to be changed too.

Collaborator Author

Well, the bool still kind of makes sense: somebody might want to force the use of jinja... If llama_chat_apply_template works, it won't use jinja at all.

We could delete the validation and just fall back on failure. WDYT?

Collaborator

In this case, I think we can change the variable name to bool prefer_jinja to make it more intuitive.

Contributor

Yes, prefer_ or enforce_ prefix would be good.
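
To make the naming discussion concrete, here is a loose sketch of the prefer_jinja semantics, with the two template engines passed in as callables. All names here are illustrative; none of this is the actual llama.cpp API.

```cpp
// Sketch of the prefer_jinja semantics being discussed (hypothetical names,
// not the actual server code): with prefer_jinja=true the jinja/minja engine
// is tried first, otherwise it is only the fallback.
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct chat_msg { std::string role; std::string content; };

using engine_fn = std::function<std::optional<std::string>(const std::vector<chat_msg> &)>;

std::optional<std::string> format_chat(const std::vector<chat_msg> & msgs,
                                       bool prefer_jinja,
                                       const engine_fn & native_engine,
                                       const engine_fn & jinja_engine) {
    const engine_fn & first  = prefer_jinja ? jinja_engine  : native_engine;
    const engine_fn & second = prefer_jinja ? native_engine : jinja_engine;
    if (auto out = first(msgs))  { return out; }
    if (auto out = second(msgs)) { return out; }
    return std::nullopt; // both engines failed; the caller reports an error
}
```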

Co-authored-by: Xuan-Son Nguyen <[email protected]>
@ngxson
Collaborator

ngxson commented Feb 5, 2025

Also, btw, I think we can now remove the whole fallback-to-chatml logic. I briefly discussed this with @ochafik in another PR.

My point is that the fallback-to-chatml was useful when chat templates were a new thing and not many models had one. But now it has become a standard, so a model not having one can be considered "broken".

When both the "normal" and "jinja" template systems fail, we should return an error on /v1/chat/completions. For llama-run and llama-cli, they should show an error and exit. The error message can be something like: Chat template is not valid. Please use another chat template by using the --chat-template argument


Edit: this is a bit off-topic here, but it would be nice if we could do this in a follow-up PR.
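
For illustration, returning an error from /v1/chat/completions when both engines fail could look roughly like the sketch below. The handler shape and helper names are assumptions; only the error wording comes from the comment above, and the JSON payload is just an OpenAI-style example. The sketch assumes the cpp-httplib / nlohmann::json stack the server example uses.

```cpp
// Rough sketch only (handler shape and names are illustrative, not the actual
// server.cpp code): if neither the native nor the jinja/minja engine could
// render the conversation, answer with an error instead of silently falling
// back to chatml.
#include "httplib.h"
#include <nlohmann/json.hpp>

using json = nlohmann::json;

static void handle_chat_completions(bool native_ok, bool jinja_ok, httplib::Response & res) {
    if (!native_ok && !jinja_ok) {
        res.status = 500; // status code is a placeholder choice
        res.set_content(json{
            {"error", {
                {"message", "Chat template is not valid. Please use another chat "
                            "template by using the --chat-template argument"},
                {"type",    "invalid_request_error"},
            }},
        }.dump(), "application/json");
        return;
    }
    // ... normal chat completion path ...
}
```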

Co-authored-by: Xuan-Son Nguyen <[email protected]>
Signed-off-by: Eric Curtin <[email protected]>
@ochafik
Collaborator

ochafik commented Feb 8, 2025

> I think we can now remove the whole fallback-to-chatml logic. [...] When both the "normal" and "jinja" template systems fail, we should return an error on /v1/chat/completions. [...]

sgtm, we can even suggest --chat-template chatml to provide the default explicitly
