minimal-trainer-zoo #62
I am trying to get started with summarization, question answering, and multilingual sequence classification tasks. Next steps are to play with the minimal-trainer-zoo examples and the HF book code examples.
Example Sequence Classification notebook using the imdb movie review dataset - Colab (a minimal loading sketch follows below)
Follow-up: next steps on Sequence Classification (L2)
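To get going on the IMDB classification piece, here is a minimal data/model loading sketch. This is my own starting point, assuming the distilbert-base-uncased checkpoint; the notebook above may use a different one.

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# IMDB movie reviews with binary sentiment labels
imdb = load_dataset("imdb")

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    # Truncate long reviews; leave padding to a data collator at batch time
    return tokenizer(batch["text"], truncation=True)

tokenized_imdb = imdb.map(tokenize, batched=True)
```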
User: Explain the following piece of Python code from the transformers library. Specifically, explain the axis parameter with an example in the line
Bing: The provided Python code is a function named compute_metrics that takes in a tuple eval_pred containing two elements: logits and labels. This function is typically used in the context of machine learning, where logits are the model's predictions and labels are the actual values or targets. For example, given a = np.array([[10, 11, 12], [13, 14, 15]]), calling np.argmax(a, axis=0) returns the indices of the maximum values along the columns (axis 0), which is [1, 1, 1]. This means the maximum value in each column is found in the second row (index 1).
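Putting that together, a minimal compute_metrics for the classification notebook could look like the sketch below. This assumes the line in question is the usual np.argmax(logits, axis=-1) call and uses plain NumPy accuracy instead of the evaluate library, just to keep it self-contained.

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair the Trainer passes in
    logits, labels = eval_pred
    # axis=-1 takes the argmax across the class dimension,
    # turning per-example logits into predicted class ids
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}
```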
Starting with the summarization task.
Questions
Very informative log messages from the transformers library when I did not provide max_length during tokenization; similarly, I got the one below when I only provided max_length without truncation.
User: Explain the warning note from the transformers library while calling the tokenizer. The following code snippet contains how the tokenizer is currently used in my code.
Warning Note:
Bing: The warning you're seeing is related to the efficiency of how the tokenizer is being used. In the Hugging Face Transformers library, the tokenizer can encode, truncate, and pad in a single call. In your code, you're currently using the tokenizer to encode the inputs and then separately handling the padding. The warning is suggesting that you can do both in one step by passing the padding argument directly to the tokenizer. Here's how you might adjust your code:

```python
from transformers import AutoTokenizer

# model_name and inputs come from the existing code above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model_inputs = tokenizer(inputs, padding='longest', max_length=1024, truncation=True)
```

In this adjusted code, the tokenizer handles encoding, truncation, and padding in a single call. Please note that the padding strategy ('longest') pads each batch to its longest sequence.

I asked Bing to respond in the format below:
User: Question that I posed
Bing: Your response
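For the billsum run itself, that tokenizer call would typically live inside a preprocessing function. A rough sketch follows; my assumptions here are the t5-small checkpoint, the billsum "text"/"summary" column names, and a 128-token target length.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

billsum = load_dataset("billsum", split="ca_test")
tokenizer = AutoTokenizer.from_pretrained("t5-small")
prefix = "summarize: "  # T5 expects a task prefix on the input text

def preprocess_function(examples):
    # Encode the documents; truncation plus max_length keeps inputs within the model limit
    inputs = [prefix + doc for doc in examples["text"]]
    model_inputs = tokenizer(inputs, max_length=1024, truncation=True)
    # Encode the target summaries as labels
    labels = tokenizer(text_target=examples["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_billsum = billsum.map(preprocess_function, batched=True)
```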
User: Explain the error message from the transformers library during generation.
Code:
Error Message:
Bing: The error message you're seeing is related to the device allocation of your model and data. In PyTorch, both your model and the data you're using for training or inference need to be on the same device, either CPU or GPU (CUDA). In your code, it seems like your model and data are on different devices, which is causing the error. To resolve this issue, you need to ensure that both your model and data are on the same device. You can do this by using the .to(device) method. Here's how you might adjust your code:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Specify the device (use 'cuda' if GPU is available, else 'cpu')
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')
model = model.to(device)  # Move model to the device
tokenizer = AutoTokenizer.from_pretrained('t5-small')
# `text` is the document to summarize, defined earlier in the notebook
encoded_input = tokenizer(text, return_tensors="pt").input_ids
encoded_input = encoded_input.to(device)  # Move encoded input to the device
outputs = model.generate(encoded_input, max_new_tokens=100, do_sample=False)
```

In this adjusted code, both the model and the encoded input are moved to the same device before generate is called.

Source: Conversation with Bing, 1/5/2024
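As a quick sanity check on top of this (my own addition, not part of the Bing answer), the devices of the model parameters and of the input tensor from the snippet above can be printed before calling generate:

```python
# Both should print the same device (e.g. cuda:0 or cpu) before generate is called
print(next(model.parameters()).device)
print(encoded_input.device)
```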
Example Summarization task using the billsum dataset - Colab
https://github.com/muellerzr/minimal-trainer-zoo
Checklist
Using data collators for training & error analysis (see the sketch below)
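For the data collator item, a minimal dynamic-padding sketch to start from. This is my own placeholder, assuming a distilbert-base-uncased tokenizer; it is not taken from the zoo examples.

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Tokenize without padding; the collator pads each batch to its longest member
features = [tokenizer(t, truncation=True) for t in ["a short review", "a somewhat longer movie review"]]
batch = data_collator(features)
print(batch["input_ids"].shape)  # padded to the longest sequence in this batch
```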