From 14b4fb3739075837a40f7743ef5f8635b7117963 Mon Sep 17 00:00:00 2001
From: Joseph Liu
Date: Mon, 24 Jun 2024 18:18:04 -0700
Subject: [PATCH] Formatting

---
 README.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index cbfb7d1..3309060 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,13 @@
 ## Model Description
-The model is finetuned on the [WavLM base plus](https://arxiv.org/abs/2110.13900) with 2,374 hours of audio clips from
-voice chat for multilabel classification.
-The audio clips are automatically labeled using a synthetic data pipeline described in [our blog post](link to blog post here).
-A single output can have multiple labels.
-The model outputs a n by 6 output tensor where the inferred labels are `Profanity`, `DatingAndSexting`, `Racist`,
-`Bullying`, `Other`, `NoViolation`. `Other` consists of policy violation categories with low prevalence such as drugs
-and alcohol or self-harm that are combined into a single category.
+The model is fine-tuned on the [WavLM base plus](https://arxiv.org/abs/2110.13900) with 2,374 hours of audio clips from
+voice chat for multilabel classification. The audio clips are automatically labeled using a synthetic data pipeline
+described in [our blog post](link to blog post here). A single output can have multiple labels. The model outputs an
+n by 6 output tensor where the inferred labels are `Profanity`, `DatingAndSexting`, `Racist`, `Bullying`, `Other`,
+`NoViolation`. `Other` consists of policy violation categories with low prevalence such as drugs and alcohol or
+self-harm that are combined into a single category.

-We evaluated this model on a dataset with human annotated labels that contained a total of 9795 samples with the class
-distribution shown below. Note that we did not include the "other" category in this evaluation dataset.
+We evaluated this model on a data set with human annotated labels that contained a total of 9,795 samples with the class
+distribution shown below. Note that we did not include the "other" category in this evaluation data set.

 |Class|Number of examples| Duration (hours)|% of dataset|
 |---|---|---|---|
@@ -20,6 +19,8 @@ distribution shown below. Note that we did not include the "other" category in t

 If we set the same threshold across all classes and treat the model as a binary classifier across all 4 toxicity
 classes (`Profanity`, `DatingAndSexting`, `Racist`, `Bullying`), we get a binarized average precision of 94.48%.
 The precision recall curve is as shown below.
+
+<img src="..." alt="PR Curve">
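
As a reading aid for the evaluation paragraph in the patched README, here is a minimal sketch of how a binarized average precision could be computed from the model's n by 6 output. This is not part of the patch or the repository: the label order, the treatment of scores as per-class probabilities, the use of the max over the four toxicity classes as the single binary score, and `scikit-learn`'s `average_precision_score` as the metric implementation are all assumptions.

```python
# Hypothetical sketch, not from the patched repository: collapse (n, 6)
# multilabel scores into one toxic-vs-not score and compute average precision.
import numpy as np
from sklearn.metrics import average_precision_score

# Assumed label order, taken from the README's list of inferred labels.
LABELS = ["Profanity", "DatingAndSexting", "Racist", "Bullying", "Other", "NoViolation"]
TOXIC = slice(0, 4)  # the four toxicity classes the README binarizes over

def binarize(scores: np.ndarray, labels: np.ndarray):
    """Reduce (n, 6) per-class scores and 0/1 annotations to binary signals."""
    toxic_score = scores[:, TOXIC].max(axis=1)  # one shared threshold on this flags a clip if any class crosses it
    toxic_label = labels[:, TOXIC].max(axis=1)  # positive if any toxicity class was annotated
    return toxic_score, toxic_label

# Stand-in random data; real use would pass model outputs and human annotations.
rng = np.random.default_rng(0)
scores = rng.random((1000, 6))
labels = (rng.random((1000, 6)) < 0.2).astype(int)

toxic_score, toxic_label = binarize(scores, labels)
print(f"binarized average precision: {average_precision_score(toxic_label, toxic_score):.4f}")
```

Taking the max mirrors the shared-threshold setup the README describes: with one threshold applied across all four classes, a clip is flagged as soon as any class score crosses it, so the maximum score determines the binary decision.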