`iapp_thaiqa_xquad` dataset
Combines the training sets of `iapp_wiki_qa_squad`, `thaiqa_squad`, and `xquad`, and uses the validation and test sets from `iapp_wiki_qa_squad`. Contexts flagged as similar (mUSE cosine similarity > 0.8) are removed from the training sets.
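The combine-then-filter step can be sketched roughly as follows. This is a minimal illustration, not the script used to build the dataset. It assumes that "similar" means similar to a context in the validation or test set, which the description above does not state explicitly. The names `filter_similar_contexts` and `toy_embed` are hypothetical, and `toy_embed` is a character-count stand-in for a real mUSE encoder so that the example runs without external dependencies.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_similar_contexts(
    train_contexts: List[str],
    reference_contexts: List[str],
    embed: Callable[[str], Vector],
    threshold: float = 0.8,
) -> List[str]:
    """Keep training contexts whose similarity to every reference context
    is at most `threshold`.

    ASSUMPTION: the reference set is the validation and test contexts;
    the dataset description does not say what contexts are compared against.
    """
    reference_vectors = [embed(c) for c in reference_contexts]
    kept = []
    for context in train_contexts:
        vector = embed(context)
        if all(cosine_similarity(vector, r) <= threshold for r in reference_vectors):
            kept.append(context)
    return kept


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for mUSE: counts of the characters a, b, c."""
    return [float(text.count(ch)) for ch in "abc"]


# Toy stand-ins for the three training sets and the held-out splits.
iapp_train = ["aab"]
thaiqa_train = ["ccc"]
xquad_train = ["bcc"]
validation_contexts = ["aabb"]
test_contexts = ["aaab"]

# Step 1: combine the training sets.
combined_train = iapp_train + thaiqa_train + xquad_train

# Step 2: drop training contexts that are too close to held-out contexts.
kept = filter_similar_contexts(
    combined_train,
    validation_contexts + test_contexts,
    toy_embed,
    threshold=0.8,
)
print(kept)
```

In practice `embed` would wrap a multilingual sentence encoder, and the pairwise loop would be replaced by a batched matrix product, since comparing every training context against every held-out context in pure Python scales poorly.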
DatasetDict({
    train: Dataset({
        features: ['question_id', 'article_id', 'title', 'context', 'question', 'answers'],
        num_rows: 10916
    })
    validation: Dataset({
        features: ['question_id', 'article_id', 'title', 'context', 'question', 'answers'],
        num_rows: 742
    })
    test: Dataset({
        features: ['question_id', 'article_id', 'title', 'context', 'question', 'answers'],
        num_rows: 739
    })
})