Ground truth generation currently fails for datasets that contain large columns of text. @aittalam suggests this might help
Reproduction
Call the annotation endpoint with the thunderbird.csv dataset:
{
  "dataset": "7c74038e-43c7-4527-842a-6252fd8db446", // this is the ID of the thunderbird.csv dataset in my local db
  "description": "Groundtruth generation for dataset 7c74038e-43c7-4527-842a-6252fd8db446",
  "max_samples": -1,
  "name": "Groundtruth for thunderbird.csv"
}
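For anyone who wants to script the reproduction, here is a minimal sketch that builds the same payload as a POST request using only the standard library. The URL is a placeholder (the host and route of the annotation endpoint are not stated in this issue), and `build_annotation_request` is a hypothetical helper, not part of the project.

```python
import json
import urllib.request

# Placeholder URL: the real host and route are not given in this issue.
ANNOTATION_URL = "http://localhost:8000/annotation-endpoint-placeholder"


def build_annotation_request(dataset_id, dataset_name, max_samples=-1,
                             url=ANNOTATION_URL):
    """Build (but do not send) a POST request carrying the payload above."""
    payload = {
        "dataset": dataset_id,
        "description": f"Groundtruth generation for dataset {dataset_id}",
        "max_samples": max_samples,
        "name": f"Groundtruth for {dataset_name}",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_annotation_request(
    "7c74038e-43c7-4527-842a-6252fd8db446", "thunderbird.csv"
)
# To actually send it against a running instance:
# urllib.request.urlopen(request)
```

The dataset ID is the one from my local db, so it will differ on other machines.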
Relevant log output
Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
  File "/tmp/ray/session_2025-01-17_02-13-42_842760_1/runtime_resources/pip/617697551215d8488f42ccab087d2364573ba842/virtualenv/lib/python3.11/site-packages/transformers/models/bart/modeling_bart.py", line 116, in forward
    return super().forward(positions + self.offset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ray/session_2025-01-17_02-13-42_842760_1/runtime_resources/pip/617697551215d8488f42ccab087d2364573ba842/virtualenv/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 190, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/tmp/ray/session_2025-01-17_02-13-42_842760_1/runtime_resources/pip/617697551215d8488f42ccab087d2364573ba842/virtualenv/lib/python3.11/site-packages/torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index out of range in self
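The traceback above ends in BART's positional embedding lookup, which is consistent with an input sequence longer than the model's position table, though that diagnosis is not confirmed here. The sketch below uses plain-Python stand-ins (no torch or transformers) to show the failure mode and two possible mitigations. `MAX_POSITIONS`, `lookup_positions`, `truncate` and `chunk` are hypothetical names for illustration, and 1024 is an assumed limit, not a value read from the job's model config.

```python
# Illustrative sketch only: plain-Python stand-ins, not the project's code.

MAX_POSITIONS = 1024  # assumed size of the model's position table


def lookup_positions(token_ids, max_positions=MAX_POSITIONS):
    """Mimic a positional-embedding lookup: one table row per token position."""
    table = list(range(max_positions))  # stand-in for the embedding weights
    # Raises IndexError as soon as the sequence is longer than the table.
    return [table[pos] for pos in range(len(token_ids))]


def truncate(token_ids, max_positions=MAX_POSITIONS):
    """Keep only as many tokens as the model has positions for."""
    return token_ids[:max_positions]


def chunk(token_ids, max_positions=MAX_POSITIONS):
    """Alternative to truncation: split a long sequence into model-sized pieces."""
    return [token_ids[i:i + max_positions]
            for i in range(0, len(token_ids), max_positions)]


long_row = list(range(3000))  # pretend token IDs for one very long text cell

try:
    lookup_positions(long_row)
    overflowed = False
except IndexError:
    overflowed = True  # same class of error as in the log above

safe = lookup_positions(truncate(long_row))  # fits within the table
pieces = chunk(long_row)                     # several model-sized sequences
```

With Hugging Face tokenizers, passing `truncation=True` together with `max_length` when encoding is the usual way to get the truncation behavior. Whether truncating or chunking is the right fix for this job is an open question.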
Expected behavior
It should generate the ground truth.
System Info
macOS, latest version
Have you searched for similar issues before submitting this one?
Yes, I have searched for similar issues