added evaluation notebook for stt cases

microsoft · Jan 19, 2021 · 08571a8
1 parent e1d7797
commit 08571a8
Showing 1 changed file with 192 additions and 0 deletions.
diff --git a/notebooks/Eval - Speech Transcription.ipynb b/notebooks/Eval - Speech Transcription.ipynb
@@ -0,0 +1,192 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Evaluation of Custom Speech Transcription\n",
+    "This notebook serves to evaluate your Speech-to-Text transcriptions generated by [GLUE](https://github.com/microsoft/glue)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Import required packages\n",
+    "import sys\n",
+    "import pandas as pd\n",
+    "import configparser\n",
+    "\n",
+    "# Notebook specific functions\n",
+    "from matplotlib import cm, pyplot as plt \n",
+    "\n",
+    "# Custom functions\n",
+    "sys.path.append(\"../src\")\n",
+    "import evaluate as ev\n",
+    "\n",
+    "# Notebook configs\n",
+    "%matplotlib inline\n",
+    "%load_ext autoreload\n",
+    "%autoreload 2"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Input Data\n",
+    "Below, you will import the transcription file generated by GLUE in the `--do_transcribe` mode. <br>\n",
+    "The evaluation is equivalent to the one produced by the `--do_evaluation` mode, which only prints its results to the console. <br>\n",
+    "This notebook gives you a consistent view of the results.\n",
+    "\n",
+    "Make sure the file has the structure below. Files generated by GLUE already have it:\n",
+    "- Comma-separated (.csv)\n",
+    "- UTF-8 encoded\n",
+    "- Columns \"text\" for reference transcript and \"rec\" for recognition"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "execute_result",
+     "data": {
+      "text/plain": [
+       "              audio                                               text  \\\n",
+       "0    BookFlight.wav        I would like to book a flight to Frankfurt.   \n",
+       "1  CancelFlight.wav        I want to cancel my journey to Kuala Lumpur   \n",
+       "2  ChangeFlight.wav     I would like to change my flight to Singapore.   \n",
+       "3      BookSeat.wav  I would like to book a seat on my flight to St...   \n",
+       "\n",
+       "                                  rec  \n",
+       "0  Aber leicht über Flight Frankfurt.  \n",
+       "1                                Pur.  \n",
+       "2   I would like to change my flight?  \n",
+       "3                                      "
+      ],
+      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>audio</th>\n      <th>text</th>\n      <th>rec</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>BookFlight.wav</td>\n      <td>I would like to book a flight to Frankfurt.</td>\n      <td>Aber leicht über Flight Frankfurt.</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>CancelFlight.wav</td>\n      <td>I want to cancel my journey to Kuala Lumpur</td>\n      <td>Pur.</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>ChangeFlight.wav</td>\n      <td>I would like to change my flight to Singapore.</td>\n      <td>I would like to change my flight?</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>BookSeat.wav</td>\n      <td>I would like to book a seat on my flight to St...</td>\n      <td></td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+     },
+     "metadata": {},
+     "execution_count": 2
+    }
+   ],
+   "source": [
+    "# Import transcription file\n",
+    "res = pd.read_csv(\"../assets/examples/output_files/example_transcriptions_full.csv\", sep=\",\", encoding='utf-8')[['audio', 'text', 'rec']]\n",
+    "res['text'] = res['text'].fillna(\"\")\n",
+    "res['rec'] = res['rec'].fillna(\"\")\n",
+    "res.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Evaluate\n",
+    "Evaluation of transcription results by comparing them with reference transcripts.\n",
+    "- Calculates metrics such as [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate), Sentence Error Rate (SER), Word Recognition Rate (WRR).\n",
+    "- Implementation based on [github.com/belambert/asr-evaluation](https://github.com/belambert/asr-evaluation).\n",
+    "- See some hints on [how to improve your Custom Speech accuracy](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-evaluate-data).\n",
+    "\n",
+    "Generally, we recommend not taking the WER too seriously. Rather, see it as a tool to detect recurring patterns or issues in the speech model. Especially in combination with LUIS, end-to-end testing is more relevant.\n",
+    "\n",
+    "Print Verbosity:\n",
+    "- 0 -> Only summary metrics\n",
+    "- 1 -> Only errors\n",
+    "- 2 -> All\n",
+    "\n",
+    "Optional variable: query_keyword.   \n",
+    "This can be used to search for certain words in the reference text."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "eva = ev.EvaluateTranscription()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
      "REF: \u001b[31mI\u001b[0m \u001b[31mWOULD\u001b[0m \u001b[31mLIKE\u001b[0m \u001b[31mTO  \u001b[0m \u001b[31mBOOK  \u001b[0m \u001b[31mA   \u001b[0m flight \u001b[31mTO\u001b[0m frankfurt\nREC: \u001b[31m*\u001b[0m \u001b[31m*****\u001b[0m \u001b[31m****\u001b[0m \u001b[31mABER\u001b[0m \u001b[31mLEICHT\u001b[0m \u001b[31mÜBER\u001b[0m flight \u001b[31m**\u001b[0m frankfurt\nSENTENCE 2  BookFlight.wav\nCorrect          =  22.2%    2   (     9)\nErrors           =  77.8%    7   (     9)\nREF: \u001b[31mI\u001b[0m \u001b[31mWANT\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mCANCEL\u001b[0m \u001b[31mMY\u001b[0m \u001b[31mJOURNEY\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mKUALA\u001b[0m \u001b[31mLUMPUR\u001b[0m\nREC: \u001b[31m*\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*****\u001b[0m \u001b[31mPUR   \u001b[0m\nSENTENCE 3  CancelFlight.wav\nCorrect          =   0.0%    0   (     9)\nErrors           = 100.0%    9   (     9)\nREF: i would like to change my flight \u001b[31mTO\u001b[0m \u001b[31mSINGAPORE\u001b[0m\nREC: i would like to change my flight \u001b[31m**\u001b[0m \u001b[31m*********\u001b[0m\nSENTENCE 4  ChangeFlight.wav\nCorrect          =  77.8%    7   (     9)\nErrors           =  22.2%    2   (     9)\nREF: \u001b[31mI\u001b[0m \u001b[31mWOULD\u001b[0m \u001b[31mLIKE\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mBOOK\u001b[0m \u001b[31mA\u001b[0m \u001b[31mSEAT\u001b[0m \u001b[31mON\u001b[0m \u001b[31mMY\u001b[0m \u001b[31mFLIGHT\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mSTUTTGART\u001b[0m\nREC: \u001b[31m*\u001b[0m \u001b[31m*****\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m****\u001b[0m \u001b[31m*\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m**\u001b[0m \u001b[31m******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*********\u001b[0m\nSENTENCE 5  BookSeat.wav\nCorrect          =   0.0%    0   (    12)\nErrors           = 100.0%   12   (    12)\n\nSentence count: 4\nWER: 76.923% (30 / 39)\nWRR: 23.077% (9 / 39)\nSER: 100.000% (4 / 4)\n"
+     ]
+    },
+    {
+     "output_type": "execute_result",
+     "data": {
+      "text/plain": [
+       "(0.7692307692307693, 0.23076923076923078, 1.0)"
+      ]
+     },
+     "metadata": {},
+     "execution_count": 4
+    }
+   ],
+   "source": [
+    "eva.calculate_metrics(res.text.values, res.rec.values, label=res.audio.values, print_verbosiy=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "\n***DELETIONS:\nto                            6\ni                             3\nwould                         2\nlike                          2\nmy                            2\nwant                          1\ncancel                        1\njourney                       1\nkuala                         1\nsingapore                     1\nbook                          1\na                             1\nseat                          1\non                            1\nflight                        1\nstuttgart                     1\n\n***SUBSTITUTIONS:\nto                   -> aber                            1\nbook                 -> leicht                          1\na                    -> über                            1\nlumpur               -> pur                             1\n"
+     ]
+    }
+   ],
+   "source": [
+    "eva.print_errors(min_count=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "nlp",
+   "language": "python",
+   "name": "nlp"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.6-final"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
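
Note on the metrics: `ev.EvaluateTranscription` lives in `src/evaluate.py`, which is not part of this diff. To illustrate how WER, WRR and SER relate to the counts printed in the notebook output, here is a minimal sketch. It is not the GLUE or asr-evaluation implementation. The helper names (`tokenize`, `align_counts`, `summarize`) and the normalization (lowercasing, keeping only `\w+` tokens) are assumptions made for this example. The fourth reference is truncated in the table preview, so its wording is taken from the REF line of the alignment output.

```python
import re


def tokenize(text):
    # Assumed normalization: lowercase and drop punctuation.
    return re.findall(r"\w+", text.lower())


def align_counts(ref, hyp):
    """Return (errors, hits) of a minimum-edit-distance word alignment.

    errors = substitutions + deletions + insertions. Ties between
    alignments with equal errors are broken in favor of more hits.
    """
    # Each cell is (errors, -hits), so tuple comparison prefers fewer
    # errors first and more hits second.
    prev = [(j, 0) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        curr = [(i, 0)]
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                diag = (prev[j - 1][0], prev[j - 1][1] - 1)  # hit
            else:
                diag = (prev[j - 1][0] + 1, prev[j - 1][1])  # substitution
            deletion = (prev[j][0] + 1, prev[j][1])
            insertion = (curr[j - 1][0] + 1, curr[j - 1][1])
            curr.append(min(diag, deletion, insertion))
        prev = curr
    errors, neg_hits = prev[-1]
    return errors, -neg_hits


def summarize(refs, hyps):
    """Compute corpus-level WER, WRR and SER (assumes at least one reference word)."""
    total_errors = total_hits = total_words = wrong_sentences = 0
    for ref_text, hyp_text in zip(refs, hyps):
        ref, hyp = tokenize(ref_text), tokenize(hyp_text)
        errors, hits = align_counts(ref, hyp)
        total_errors += errors
        total_hits += hits
        total_words += len(ref)
        wrong_sentences += errors > 0
    return {
        "wer": total_errors / total_words,
        "wrr": total_hits / total_words,
        "ser": wrong_sentences / len(refs),
    }


# The four example rows shown in the notebook output.
refs = [
    "I would like to book a flight to Frankfurt.",
    "I want to cancel my journey to Kuala Lumpur",
    "I would like to change my flight to Singapore.",
    "I would like to book a seat on my flight to Stuttgart",
]
hyps = [
    "Aber leicht über Flight Frankfurt.",
    "Pur.",
    "I would like to change my flight?",
    "",
]
metrics = summarize(refs, hyps)
print(metrics)
```

On these four rows the sketch counts 30 errors and 9 hits over 39 reference words, and every sentence contains at least one error, which matches the WER, WRR and SER in the notebook's summary.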