[BUG] Token Classification fails on value error text_column
#813
Comments
Could you print the column names in your dataset and the output of
You can run the reproduction here: https://colab.research.google.com/drive/1shka-nlusipnN6TTAlQPhcXhrvgehNF8?usp=sharing
This is the output:
{'data_path': 'data', 'model': 'FacebookAI/roberta-base', 'lr': 5e-05, 'epochs': 3,
 'max_seq_length': 128, 'batch_size': 8, 'warmup_ratio': 0.1,
 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear',
 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'train_split': 'train',
 'valid_split': None, 'tokens_column': 'tokens', 'tags_column': 'tags',
 'logging_steps': -1, 'project_name': 'project-name',
 'auto_find_batch_size': False, 'mixed_precision': None, 'save_total_limit': 1,
 'token': None, 'push_to_hub': False, 'eval_strategy': 'epoch', 'username': None,
 'log': 'none', 'early_stopping_patience': 5, 'early_stopping_threshold': 0.01}
Also, I noticed that the CSV on this page is broken: there is a space after the comma, which breaks CSV parsing.
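The CSV problem mentioned above can be reproduced with Python's standard `csv` module. This is a minimal sketch, not AutoTrain's own loading code (which is not shown in this thread); the helper name `parse_row` and the sample rows are illustrative assumptions. A quote character is only treated as opening a quoted field when it is the first character of that field, so a space after the comma makes the parser split the second list on its inner commas.

```python
import ast
import csv


def parse_row(line, skipinitialspace=False):
    """Parse a single CSV line into a list of fields (hypothetical helper)."""
    return next(csv.reader([line], skipinitialspace=skipinitialspace))


# Same row twice: without and with a space after the separating comma.
good = "\"['I', 'love', 'Paris']\",\"['O', 'O', 'B-LOC']\""
bad = "\"['I', 'love', 'Paris']\", \"['O', 'O', 'B-LOC']\""

good_fields = parse_row(good)
bad_fields = parse_row(bad)  # the second quoted field is no longer recognized
tolerant_fields = parse_row(bad, skipinitialspace=True)

# Each well-formed cell is the string form of a Python list.
tokens, tags = (ast.literal_eval(cell) for cell in good_fields)

print(len(good_fields), len(bad_fields), len(tolerant_fields))
print(tokens, tags)
```

Removing the space, or reading with `skipinitialspace=True` where the reader supports it, yields the expected two columns.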
fixed.
code:
import os
from autotrain.params import TokenClassificationParams
from autotrain.project import AutoTrainProject

# Create the data directory if it does not exist yet.
if not os.path.exists("data"):
    os.makedirs("data")

# Each row holds a stringified list of tokens and a matching list of tags.
with open("data/train.csv", "w") as f:
    print("tokens,tags", file=f)
    print("\"['I', 'love', 'Paris']\",\"['O', 'O', 'B-LOC']\"", file=f)
    print("\"['I', 'live', 'in', 'New', 'York']\",\"['O', 'O', 'O', 'B-LOC', 'I-LOC']\"", file=f)

with open("data/valid.csv", "w") as f:
    print("tokens,tags", file=f)
    print("\"['I', 'love', 'Paris']\",\"['O', 'O', 'B-LOC']\"", file=f)
    print("\"['I', 'live', 'in', 'New', 'York']\",\"['O', 'O', 'O', 'B-LOC', 'I-LOC']\"", file=f)

params = TokenClassificationParams(
    model="FacebookAI/roberta-base",
    data_path="data")

backend = "local"
project = AutoTrainProject(params=params, backend=backend, process=True)
project.create()

Note: I've changed the test filename. Apologies for the inconvenience.
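Before launching training, it can help to confirm that a file written this way has the expected header and one tag per token. The following is a rough sketch using only the standard library; `check_token_csv` is a hypothetical helper, not part of AutoTrain, and the column names are taken from the `tokens_column` and `tags_column` values shown earlier in the thread.

```python
import ast
import csv
import os
import tempfile


def check_token_csv(path, tokens_column="tokens", tags_column="tags"):
    """Return the header; raise ValueError on missing columns or misaligned rows."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        for name in (tokens_column, tags_column):
            if name not in columns:
                raise ValueError(f"column {name!r} not found in {columns}")
        for i, row in enumerate(reader):
            tokens = ast.literal_eval(row[tokens_column])
            tags = ast.literal_eval(row[tags_column])
            if len(tokens) != len(tags):
                raise ValueError(f"row {i}: {len(tokens)} tokens, {len(tags)} tags")
    return columns


# Write a small file in the same layout as the script above, then check it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.csv")
    with open(path, "w", newline="") as f:
        print("tokens,tags", file=f)
        print("\"['I', 'love', 'Paris']\",\"['O', 'O', 'B-LOC']\"", file=f)
    columns = check_token_csv(path)

print(columns)
```

A check like this only validates the file layout; it does not reproduce whatever validation AutoTrain performs internally.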
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 20 days since being marked as stale.
Prerequisites
Backend
Local
Interface Used
CLI
CLI Command
UI Screenshots & Parameters
No response
Error Logs
Additional Information
Data is formatted as in https://huggingface.co/docs/autotrain/en/tasks/token_classification
I also tried commenting out the offending lines and then ran into this error