I found two unconnected singleton-character annotations, which are invalid:
tsd_trial.csv line 1126, instance 658:
"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 21]","Ridiculous logic!
G&M sure seem hooked to Real Estate industry cash (propaganda pieces in exchange of ad cash), Trudeau and paying interest on massive Federal debt."
"[94, 95, 96, 97, 98, 241]","He went on a 'traveling the country vacation' there. I hope they have a swift court and swift death penalty. He is immigrated here, non citizen living with parents, in Colorado.
DO NOT give him back to us. No matter how much Hickenlooper pleads."
I found these when unitizing annotations from character-level to token-level.
My script found no other singleton characters.
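The check described above can be sketched roughly as follows. This is not the script used for this issue, only a minimal illustration under the assumption that a "singleton" is an annotated character offset with no annotated neighbour on either side. The function names `contiguous_runs`, `singleton_offsets`, and `drop_singletons` are hypothetical.

```python
from typing import List, Tuple


def contiguous_runs(offsets: List[int]) -> List[Tuple[int, int]]:
    """Group character offsets into inclusive (start, end) runs."""
    runs: List[Tuple[int, int]] = []
    for off in sorted(set(offsets)):
        if runs and off == runs[-1][1] + 1:
            # Offset extends the current run by one character.
            runs[-1] = (runs[-1][0], off)
        else:
            # Gap found: start a new run.
            runs.append((off, off))
    return runs


def singleton_offsets(offsets: List[int]) -> List[int]:
    """Return offsets that form a run of exactly one character."""
    return [start for start, end in contiguous_runs(offsets) if start == end]


def drop_singletons(offsets: List[int]) -> List[int]:
    """Return the offsets with all one-character runs removed."""
    bad = set(singleton_offsets(offsets))
    return [off for off in sorted(set(offsets)) if off not in bad]


# The two labels quoted in this issue.
trial_label = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 21]
train_label = [94, 95, 96, 97, 98, 241]
print(singleton_offsets(trial_label), singleton_offsets(train_label))
```

Note that this treats every isolated offset as suspicious. A one-character word that was annotated on purpose would also be flagged, so the hits still need a manual look.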
GillesJ
changed the title
Invalid annotation: singleton character l
Invalid annotation: singleton character tsd_trial.csv#L658 and tsd_train.csv#L4616
Oct 27, 2020
I found similar problems in several more lines.
For example, line 1143 in tsd_trial.csv (668 when newlines inside the quoted texts are not counted as separate lines) is:
"[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]",Because Trudeau knows that his federal Liberal Party survives on voters with brain damage from recreational pot use.
Here characters 5 and 6 are part of the word "Because", so the annotated span starts in the middle of a word.
How should such cases be handled?
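As a starting point for listing such cases, here is a minimal sketch that flags words only partly covered by an annotation. It assumes words can be approximated by the regular expression `\w+`, which is a simplification and not necessarily the tokenization used for the dataset. The function name `partially_covered_words` is hypothetical.

```python
import re
from typing import List, Tuple


def partially_covered_words(text: str, offsets: List[int]) -> List[Tuple[str, int, int]]:
    """Return (word, start, end) for words with some but not all characters annotated."""
    marked = set(offsets)
    partial = []
    for match in re.finditer(r"\w+", text):
        covered = sum(1 for i in range(match.start(), match.end()) if i in marked)
        # Flag the word if the annotation touches it without covering it fully.
        if 0 < covered < match.end() - match.start():
            partial.append((match.group(), match.start(), match.end()))
    return partial


# The example row quoted above.
text = ("Because Trudeau knows that his federal Liberal Party survives on "
        "voters with brain damage from recreational pot use.")
label = list(range(5, 21))
print(partially_covered_words(text, label))
```

Whether a flagged span should then be extended to the whole word or trimmed to the nearest word boundary is a decision for the dataset maintainers. The sketch only reports the candidates.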
GillesJ
changed the title
Invalid annotation: singleton character tsd_trial.csv#L658 and tsd_train.csv#L4616
Invalid annotation: singleton character tsd_trial.csv#L1126 and tsd_train.csv#L7895
Oct 27, 2020
I found two unconnected singleton-character annotations, which are invalid:
tsd_trial.csv line 1126, instance 658 -> corrected label by removing singleton 21: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
tsd_train.csv line 7895, instance 4616 -> corrected label by removing singleton 241: [94, 95, 96, 97, 98]