
When llama_chat_apply_template doesn't work #11687

Open
wants to merge 3 commits into master

Conversation

ericcurtin
Collaborator

Try minja. With granite-code, if we fall back to jinja on failure, it's fine.

@ericcurtin
Collaborator Author

@engelmi @ochafik PTAL

Try minja. With granite-code if we fall back to jinja on failure,
it's fine.

Co-authored-by: Michael Engel <[email protected]>
Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin
Collaborator Author

@ngxson I also added this change to llama-server if you want to take it: some models fail with the default template engine, but if you fall back to minja, they just work.
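
For context, the fallback being proposed amounts to something like the sketch below. This is illustrative only, not the actual patch: try_native_template and render_with_minja are hypothetical stand-ins for the existing llama_chat_apply_template path and the minja-based jinja renderer.

```cpp
// Illustrative sketch of the fallback idea (not the actual patch).
// try_native_template() stands in for the llama_chat_apply_template path,
// render_with_minja() for the minja-based jinja renderer.
#include <optional>
#include <string>
#include <vector>

struct chat_msg { std::string role; std::string content; };

// Placeholder: the native engine only recognises a fixed set of templates,
// so e.g. granite-code's template would be rejected here.
static std::optional<std::string> try_native_template(const std::vector<chat_msg> & msgs) {
    (void) msgs;
    return std::nullopt;
}

// Placeholder: minja renders the raw jinja template from the model metadata.
static std::string render_with_minja(const std::vector<chat_msg> & msgs) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<|" + m.role + "|>\n" + m.content + "\n";
    }
    return out;
}

std::string format_chat(const std::vector<chat_msg> & msgs) {
    if (auto native = try_native_template(msgs)) {
        return *native;                 // the built-in template worked
    }
    return render_with_minja(msgs);     // otherwise fall back to minja
}
```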

examples/server/server.cpp
return false;
}
}

bool validate_builtin_chat_template(bool use_jinja) const {
Collaborator

If we choose to fall back to jinja, logically we should remove the bool use_jinja from this function signature, but I'm not sure.

Also have a look at where this function is used:

        if (params_base.chat_template.empty() && !validate_builtin_chat_template(params.use_jinja)) {
            LOG_WRN("%s: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses\n", __func__);
            chat_templates = common_chat_templates_from_model(model, "chatml");
        } else {
            chat_templates = common_chat_templates_from_model(model, params_base.chat_template);
        }
        GGML_ASSERT(chat_templates.template_default.get() != nullptr);

So now the LOG_WRN message is no longer valid and probably needs to be changed too.

Collaborator Author

Well, the bool still kind of makes sense: somebody might want to force the use of jinja... If llama_chat_apply_template works, it won't use jinja at all.

We could delete the validation and just fall back on failure. WDYT?

Collaborator

In this case, I think we can change the variable name to bool prefer_jinja to make it more intuitive.

Contributor

Yes, prefer_ or enforce_ prefix would be good.
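
To make the naming discussion concrete, here is a loose sketch of the prefer_jinja semantics, with the two template engines passed in as callables. All names here are illustrative; none of this is the actual llama.cpp API.

```cpp
// Sketch of the prefer_jinja semantics being discussed (hypothetical names,
// not the actual server code): with prefer_jinja=true the jinja/minja engine
// is tried first, otherwise it is only the fallback.
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct chat_msg { std::string role; std::string content; };

using engine_fn = std::function<std::optional<std::string>(const std::vector<chat_msg> &)>;

std::optional<std::string> format_chat(const std::vector<chat_msg> & msgs,
                                       bool prefer_jinja,
                                       const engine_fn & native_engine,
                                       const engine_fn & jinja_engine) {
    const engine_fn & first  = prefer_jinja ? jinja_engine  : native_engine;
    const engine_fn & second = prefer_jinja ? native_engine : jinja_engine;
    if (auto out = first(msgs))  { return out; }
    if (auto out = second(msgs)) { return out; }
    return std::nullopt; // both engines failed; the caller reports an error
}
```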

Co-authored-by: Xuan-Son Nguyen <[email protected]>
@ngxson
Collaborator

ngxson commented Feb 5, 2025

Also, btw, I think we can now remove the whole fallback-to-chatml logic. I briefly discussed this with @ochafik in another PR.

My point is that the fallback-to-chatml was useful when chat templates were a new thing and not many models had one. But now it has become a standard, so a model not having one can be considered "broken".

When both the "normal" and "jinja" template systems fail, we should return an error on /v1/chat/completions. For llama-run and llama-cli, they should show an error and exit. The error message can be something like: Chat template is not valid. Please use another chat template by using the --chat-template argument


Edit: this is a bit off-topic here, but it would be nice if we could do this in a follow-up PR.
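
For illustration, returning an error from /v1/chat/completions when both engines fail could look roughly like the sketch below. The handler shape and helper names are assumptions; only the error wording comes from the comment above, and the JSON payload is just an OpenAI-style example. The sketch assumes the cpp-httplib / nlohmann::json stack the server example uses.

```cpp
// Rough sketch only (handler shape and names are illustrative, not the actual
// server.cpp code): if neither the native nor the jinja/minja engine could
// render the conversation, answer with an error instead of silently falling
// back to chatml.
#include "httplib.h"
#include <nlohmann/json.hpp>

using json = nlohmann::json;

static void handle_chat_completions(bool native_ok, bool jinja_ok, httplib::Response & res) {
    if (!native_ok && !jinja_ok) {
        res.status = 500; // status code is a placeholder choice
        res.set_content(json{
            {"error", {
                {"message", "Chat template is not valid. Please use another chat "
                            "template by using the --chat-template argument"},
                {"type",    "invalid_request_error"},
            }},
        }.dump(), "application/json");
        return;
    }
    // ... normal chat completion path ...
}
```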

Co-authored-by: Xuan-Son Nguyen <[email protected]>
Signed-off-by: Eric Curtin <[email protected]>
@ochafik
Collaborator

ochafik commented Feb 8, 2025

> I think we can now remove the whole fallback-to-chatml logic. [...] When both the "normal" and "jinja" template systems fail, we should return an error on /v1/chat/completions. [...]

sgtm, we can even suggest --chat-template chatml to provide the default explicitly
