-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
added evaluation notebook for stt cases
- Loading branch information
1 parent
e1d7797
commit 08571a8
Showing
1 changed file
with
192 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,192 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Evaluation of Custom Speech Transcription\n", | ||
"This notebook serves to evaluate your Speech-to-Tex transcriptions generated by [GLUE](https://github.com/microsoft/glue)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Import required packages\n", | ||
"import sys\n", | ||
"import pandas as pd\n", | ||
"import configparser\n", | ||
"\n", | ||
"# Notebook specific functions\n", | ||
"from matplotlib import cm, pyplot as plt \n", | ||
"\n", | ||
"# Custom functions\n", | ||
"sys.path.append(\"../src\")\n", | ||
"import evaluate as ev\n", | ||
"\n", | ||
"# Notebook configs\n", | ||
"%matplotlib inline\n", | ||
"%load_ext autoreload\n", | ||
"%autoreload 2" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Input Data\n", | ||
"Below, you will import the transcription file generated by GLUE in the `--do_transcribe` mode. <br>\n", | ||
"The evaluation will be equivalent to the one generated by the `--do_evaluation` mode, which is only printed in the console output. <br>\n", | ||
"Here, you will have a consistent view on the results. \n", | ||
"\n", | ||
"Make sure it has the structure below. If you used GLUE, it will have it either way:\n", | ||
"- Comma-separated (.csv)\n", | ||
"- UTF-8 encoded\n", | ||
"- Columns \"text\" for reference transcript and \"rec\" for recognition" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"output_type": "execute_result", | ||
"data": { | ||
"text/plain": [ | ||
" audio text \\\n", | ||
"0 BookFlight.wav I would like to book a flight to Frankfurt. \n", | ||
"1 CancelFlight.wav I want to cancel my journey to Kuala Lumpur \n", | ||
"2 ChangeFlight.wav I would like to change my flight to Singapore. \n", | ||
"3 BookSeat.wav I would like to book a seat on my flight to St... \n", | ||
"\n", | ||
" rec \n", | ||
"0 Aber leicht über Flight Frankfurt. \n", | ||
"1 Pur. \n", | ||
"2 I would like to change my flight? \n", | ||
"3 " | ||
], | ||
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>audio</th>\n <th>text</th>\n <th>rec</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>BookFlight.wav</td>\n <td>I would like to book a flight to Frankfurt.</td>\n <td>Aber leicht über Flight Frankfurt.</td>\n </tr>\n <tr>\n <th>1</th>\n <td>CancelFlight.wav</td>\n <td>I want to cancel my journey to Kuala Lumpur</td>\n <td>Pur.</td>\n </tr>\n <tr>\n <th>2</th>\n <td>ChangeFlight.wav</td>\n <td>I would like to change my flight to Singapore.</td>\n <td>I would like to change my flight?</td>\n </tr>\n <tr>\n <th>3</th>\n <td>BookSeat.wav</td>\n <td>I would like to book a seat on my flight to St...</td>\n <td></td>\n </tr>\n </tbody>\n</table>\n</div>" | ||
}, | ||
"metadata": {}, | ||
"execution_count": 2 | ||
} | ||
], | ||
"source": [ | ||
"# Import transcription file\n", | ||
"res = pd.read_csv(\"../assets/examples/output_files/example_transcriptions_full.csv\", sep=\",\", encoding='utf-8')[['audio', 'text', 'rec']]\n", | ||
"res.text.fillna(\"\", inplace=True)\n", | ||
"res.rec.fillna(\"\", inplace=True)\n", | ||
"res.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Evaluate\n", | ||
"Evaluation of transcription results by comparing them with reference transcripts.\n", | ||
"- Calculates metrics such as [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate), Sentence Error Rate (SER), Word Recognition Rate (WRR).\n", | ||
"- Implementation based on [github.com/belambert/asr-evaluation](https://github.com/belambert/asr-evaluation).\n", | ||
"- See some hints on [how to improve your Custom Speech accuracy](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-evaluate-data).\n", | ||
"\n", | ||
"Generally, we recommend not to take the WER too serious, rather see it as a tool to detect recurring patterns or issues in the speech model. Especially in combination with LUIS, an end-to-end testing is more relevant.\n", | ||
"\n", | ||
"Print Verbosity:\n", | ||
"- 0 -> Only summary metrics\n", | ||
"- 1 -> Only errors\n", | ||
"- 2 -> All\n", | ||
"\n", | ||
"Optional variable: query_keyword. \n", | ||
"This can be used to search for certain words in the reference text." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"eva = ev.EvaluateTranscription()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": { | ||
"scrolled": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"output_type": "stream", | ||
"name": "stdout", | ||
"text": [ | ||
"REF: \u001b[31mI\u001b[0m \u001b[31mWOULD\u001b[0m \u001b[31mLIKE\u001b[0m \u001b[31mTO \u001b[0m \u001b[31mBOOK \u001b[0m \u001b[31mA \u001b[0m flight \u001b[31mTO\u001b[0m frankfurt\nREC: \u001b[31m*\u001b[0m \u001b[31m*****\u001b[0m \u001b[31m****\u001b[0m \u001b[31mABER\u001b[0m \u001b[31mLEICHT\u001b[0m \u001b[31mÜBER\u001b[0m flight \u001b[31m**\u001b[0m frankfurt\nSENTENCE 2 BookFlight.wav\nCorrect = 22.2% 2 ( 9)\nErrors = 77.8% 7 ( 9)\nREF: \u001b[31mI\u001b[0m \u001b[31mWANT\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mCANCEL\u001b[0m \u001b[31mMY\u001b[0m \u001b[31mJOURNEY\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mKUALA\u001b[0m \u001b[31mLUMPUR\u001b[0m\nREC: \u001b[31m*\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*****\u001b[0m \u001b[31mPUR \u001b[0m\nSENTENCE 3 CancelFlight.wav\nCorrect = 0.0% 0 ( 9)\nErrors = 100.0% 9 ( 9)\nREF: i would like to change my flight \u001b[31mTO\u001b[0m \u001b[31mSINGAPORE\u001b[0m\nREC: i would like to change my flight \u001b[31m**\u001b[0m \u001b[31m*********\u001b[0m\nSENTENCE 4 ChangeFlight.wav\nCorrect = 77.8% 7 ( 9)\nErrors = 22.2% 2 ( 9)\nREF: \u001b[31mI\u001b[0m \u001b[31mWOULD\u001b[0m \u001b[31mLIKE\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mBOOK\u001b[0m \u001b[31mA\u001b[0m \u001b[31mSEAT\u001b[0m \u001b[31mON\u001b[0m \u001b[31mMY\u001b[0m \u001b[31mFLIGHT\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mSTUTTGART\u001b[0m\nREC: \u001b[31m*\u001b[0m \u001b[31m*****\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m****\u001b[0m \u001b[31m*\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m**\u001b[0m \u001b[31m******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*********\u001b[0m\nSENTENCE 5 BookSeat.wav\nCorrect = 0.0% 0 ( 12)\nErrors = 100.0% 12 ( 12)\n\nSentence count: 4\nWER: 76.923% (30 / 39)\nWRR: 23.077% (9 / 39)\nSER: 100.000% (4 / 4)\n" | ||
] | ||
}, | ||
{ | ||
"output_type": "execute_result", | ||
"data": { | ||
"text/plain": [ | ||
"(0.7692307692307693, 0.23076923076923078, 1.0)" | ||
] | ||
}, | ||
"metadata": {}, | ||
"execution_count": 4 | ||
} | ||
], | ||
"source": [ | ||
"eva.calculate_metrics(res.text.values, res.rec.values, label=res.audio.values, print_verbosiy=1)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": { | ||
"scrolled": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"output_type": "stream", | ||
"name": "stdout", | ||
"text": [ | ||
"\n***DELETIONS:\nto 6\ni 3\nwould 2\nlike 2\nmy 2\nwant 1\ncancel 1\njourney 1\nkuala 1\nsingapore 1\nbook 1\na 1\nseat 1\non 1\nflight 1\nstuttgart 1\n\n***SUBSTITUTIONS:\nto -> aber 1\nbook -> leicht 1\na -> über 1\nlumpur -> pur 1\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"eva.print_errors(min_count=1)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "nlp", | ||
"language": "python", | ||
"name": "nlp" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.6-final" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |