Redo Bug fix for annotation/inference endpoint fails with long sequences as input #762
base: main
Conversation
…ssue-667-annotation-with-large-texts-fails-redo
…t by the modelclient based on HF config plausible params
```diff
@@ -99,7 +99,7 @@
     "use_fast": "{use_fast}",
     "trust_remote_code": "{trust_remote_code}",
     "torch_dtype": "{torch_dtype}",
-    "max_length": 500
+    "max_new_tokens": 500
```
Since we have this option in HfPipelineConfig, I would change it to:

```diff
-    "max_new_tokens": 500
+    "max_new_tokens": "{max_new_tokens}"
```
I tried that, but it throws an error since no default value is provided for max_new_tokens in inference_config (unlike use_fast, trust_remote_code, etc., which have default values).
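For context, the underlying idea would be to give the placeholder a default somewhere upstream so the template can always be rendered. Below is a minimal sketch of that idea, assuming a pydantic-style config class and a simple placeholder substitution; the field names, defaults, and the render helper are illustrative, not the project's actual API.

```python
from pydantic import BaseModel


class HfPipelineConfig(BaseModel):
    # Illustrative fields and defaults; the real HfPipelineConfig may differ.
    use_fast: bool = True
    trust_remote_code: bool = False
    torch_dtype: str = "auto"
    # A default for max_new_tokens means the "{max_new_tokens}" placeholder
    # in the template can always be resolved, avoiding the missing-value error.
    max_new_tokens: int = 500


def render_template(template: str, config: HfPipelineConfig) -> str:
    # Hypothetical helper: substitute config values into a template string.
    return template.format(**config.model_dump())
```

With a default in place, the reviewer's `"max_new_tokens": "{max_new_tokens}"` suggestion would render without further changes.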
"use_fast": "{use_fast}", | ||
"trust_remote_code": "{trust_remote_code}", | ||
"torch_dtype": "{torch_dtype}", | ||
"max_new_tokens": 142 |
Same here:

```diff
-    "max_new_tokens": 142
+    "max_new_tokens": "{max_new_tokens}"
```
```python
        Checks various possible max_length parameters, which vary with model architecture.
        """
        config = self._pipeline.model.config
        logger.info(f"Initial model_max_length {self._pipeline.tokenizer.model_max_length}")
```
In my opinion, don't say "initial ...". It kind of implies that you will definitely change it. You could also omit this log in favor of the other one below.

```diff
-        logger.info(f"Initial model_max_length {self._pipeline.tokenizer.model_max_length}")
+        logger.info(f"The maximum number of input tokens is set to {self._pipeline.tokenizer.model_max_length}")
```
Keeping this message, re-worded it.
Removing the one below, since both messages would provide the same info and the same value in the case self._pipeline.tokenizer.model_max_length != VERY_LARGE_INTEGER.
logger.info(f"Initial model_max_length {self._pipeline.tokenizer.model_max_length}") | ||
# If suitable model_max_length is already available, don't override it | ||
if self._pipeline.tokenizer.model_max_length != VERY_LARGE_INTEGER: | ||
logger.info(f"Using model_max_length = {self._pipeline.tokenizer.model_max_length} \ |
Make it a bit more human readable:

```diff
-            logger.info(f"Using model_max_length = {self._pipeline.tokenizer.model_max_length} \
+            logger.info(f"Setting the maximum length of input tokens to {self._pipeline.tokenizer.model_max_length} \
```
Removed this logging statement in favor of the above.
```python
                isinstance(value, int) and value < VERY_LARGE_INTEGER
            ):  # Sanity check for reasonable values
                self._pipeline.tokenizer.model_max_length = value
                logger.info(f"Setting model_max_length to {value} based on config.{param}")
```
logger.info(f"Setting model_max_length to {value} based on config.{param}") | |
logger.info(f"Setting the maximum length of input tokens to {value} based on the config.{param} attribute.") |
Updated as suggested
```python
                return

        # If no suitable parameter is found, warn the user and continue with the HF default
        logger.warning(
```
Inform the user what the default value is, if you have it available. I agree with you that this should be a warning; however, warnings are currently not included in job logs. I would still leave it as a warning and fix that later. Could you create an issue for that?
I see, I will add a GitHub issue for that. Added the default value being used in the log message.
Edit: #785

Hey @dpoulopoulos, I've made the necessary changes and added my responses. Please check and let me know if we are good to go.
What's changing
When testing with the Thunderbird dataset, @agpituk found that the BART model fails for long sequences with an error.
As pointed out by @dpoulopoulos, the error comes from a limitation on the maximum number of positional embeddings that the model can have (token embeddings will always be within range).
This could be fixed by setting the allowed model_max_length (the number of tokens that the model sees) for the respective model; e.g. for BART this is 1024.

Edit: Instead of hard-coding this in config_templates, we search for plausible params that are most likely to correspond to model_max_length if it is set to the default value of VERY_LARGE_INTEGER (int(1e30)).

Additionally, one would also need to set truncation = True in the HF pipeline to truncate the sequence at this max_length.

Closes #667
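The diff hunks in the conversation above only show fragments of this helper, so here is a rough reconstruction of the overall flow as a reading aid. The list of candidate config attributes, the function name, and the exact log wording are assumptions and may differ from the PR's actual code.

```python
from transformers.tokenization_utils_base import VERY_LARGE_INTEGER

# Candidate config attributes that commonly encode the positional limit;
# the exact list used in the PR may differ.
PLAUSIBLE_MAX_LENGTH_PARAMS = (
    "max_position_embeddings",  # e.g. BART (1024), RoBERTa
    "n_positions",              # e.g. GPT-2 style configs
    "max_sequence_length",
)


def _resolve_model_max_length(pipeline, logger) -> None:
    """Checks various possible max_length parameters, which vary with model architecture."""
    tokenizer = pipeline.tokenizer

    # If a suitable model_max_length is already available, don't override it.
    if tokenizer.model_max_length != VERY_LARGE_INTEGER:
        logger.info(f"The maximum number of input tokens is set to {tokenizer.model_max_length}")
        return

    config = pipeline.model.config
    for param in PLAUSIBLE_MAX_LENGTH_PARAMS:
        value = getattr(config, param, None)
        if isinstance(value, int) and value < VERY_LARGE_INTEGER:  # sanity check
            tokenizer.model_max_length = value
            logger.info(
                f"Setting the maximum length of input tokens to {value} "
                f"based on the config.{param} attribute."
            )
            return

    # If no suitable parameter is found, warn the user and continue with the HF default.
    logger.warning(
        "Could not infer a maximum input length from the model config; "
        "keeping the tokenizer default."
    )
```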
Additional notes for reviewers
See the Colab Notebook for a demo of these two changes, which fix the issue for long text input.
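Outside the project, the same two changes can be reproduced with a plain transformers pipeline. The model name and token counts below are only examples, not values taken from the PR.

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Change 1: cap the input length at what the positional embeddings allow
# (1024 tokens for BART). This checkpoint's tokenizer already ships with
# the right value, so the line is shown only for illustration.
summarizer.tokenizer.model_max_length = 1024

long_text = "Lorem ipsum dolor sit amet. " * 5_000  # stand-in for a very long document

# Change 2: truncation=True makes the tokenizer cut the input at
# model_max_length, so the model never sees out-of-range positions.
result = summarizer(long_text, truncation=True, max_new_tokens=142)
print(result[0]["summary_text"])
```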
Also removing max_length from the configs (this corresponds to the number of tokens in the generated output) - it was being hard-coded in default_infer_template and bart_infer_template. After removal, we would use the HF defaults provided in the respective model configs.

How to test it
Steps to test the changes:

Use mock_long_sequences_no_gt.csv (synthetically generated data with no GT) as the dataset.

I already...

(/docs)