From 14b4fb3739075837a40f7743ef5f8635b7117963 Mon Sep 17 00:00:00 2001
From: Joseph Liu
Date: Mon, 24 Jun 2024 18:18:04 -0700
Subject: [PATCH] Formatting

---
 README.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index cbfb7d1..3309060 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,13 @@
 ## Model Description
-The model is finetuned on the [WavLM base plus](https://arxiv.org/abs/2110.13900) with 2,374 hours of audio clips from
-voice chat for multilabel classification.
-The audio clips are automatically labeled using a synthetic data pipeline described in [our blog post](link to blog post here).
-A single output can have multiple labels.
-The model outputs a n by 6 output tensor where the inferred labels are `Profanity`, `DatingAndSexting`, `Racist`,
-`Bullying`, `Other`, `NoViolation`. `Other` consists of policy violation categories with low prevalence such as drugs
-and alcohol or self-harm that are combined into a single category.
+The model is fine-tuned on the [WavLM base plus](https://arxiv.org/abs/2110.13900) with 2,374 hours of audio clips from
+voice chat for multilabel classification. The audio clips are automatically labeled using a synthetic data pipeline
+described in [our blog post](link to blog post here). A single output can have multiple labels. The model outputs an
+n by 6 output tensor where the inferred labels are `Profanity`, `DatingAndSexting`, `Racist`, `Bullying`, `Other`,
+`NoViolation`. `Other` consists of policy violation categories with low prevalence such as drugs and alcohol or
+self-harm that are combined into a single category.

-We evaluated this model on a dataset with human annotated labels that contained a total of 9795 samples with the class
-distribution shown below. Note that we did not include the "other" category in this evaluation dataset.
+We evaluated this model on a data set with human annotated labels that contained a total of 9,795 samples with the class
+distribution shown below. Note that we did not include the "other" category in this evaluation data set.

 |Class|Number of examples| Duration (hours)|% of dataset|
 |---|---|---|---|
@@ -20,6 +19,8 @@ distribution shown below. Note that we did not include the "other" category in t

 If we set the same threshold across all classes and treat the model as a binary classifier across all 4 toxicity
 classes (`Profanity`, `DatingAndSexting`, `Racist`, `Bullying`), we get a binarized average precision of 94.48%.
 The precision recall curve is as shown below.
+
+<img src="..." alt="PR Curve">
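
As a reading aid for the evaluation paragraph in the patched README, here is a minimal sketch of how a binarized average precision could be computed from the model's n by 6 output. This is not part of the patch or the repository: the label order, the treatment of scores as per-class probabilities, the use of the max over the four toxicity classes as the single binary score, and `scikit-learn`'s `average_precision_score` as the metric implementation are all assumptions.

```python
# Hypothetical sketch, not from the patched repository: collapse (n, 6)
# multilabel scores into one toxic-vs-not score and compute average precision.
import numpy as np
from sklearn.metrics import average_precision_score

# Assumed label order, taken from the README's list of inferred labels.
LABELS = ["Profanity", "DatingAndSexting", "Racist", "Bullying", "Other", "NoViolation"]
TOXIC = slice(0, 4)  # the four toxicity classes the README binarizes over

def binarize(scores: np.ndarray, labels: np.ndarray):
    """Reduce (n, 6) per-class scores and 0/1 annotations to binary signals."""
    toxic_score = scores[:, TOXIC].max(axis=1)  # one shared threshold on this flags a clip if any class crosses it
    toxic_label = labels[:, TOXIC].max(axis=1)  # positive if any toxicity class was annotated
    return toxic_score, toxic_label

# Stand-in random data; real use would pass model outputs and human annotations.
rng = np.random.default_rng(0)
scores = rng.random((1000, 6))
labels = (rng.random((1000, 6)) < 0.2).astype(int)

toxic_score, toxic_label = binarize(scores, labels)
print(f"binarized average precision: {average_precision_score(toxic_label, toxic_score):.4f}")
```

Taking the max mirrors the shared-threshold setup the README describes: with one threshold applied across all four classes, a clip is flagged as soon as any class score crosses it, so the maximum score determines the binary decision.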