Commit
Lots more work on text content and formatting
Showing 10 changed files with 109 additions and 35 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,8 @@ | ||
<div class='container-fluid'> | ||
<div class='row'> | ||
<div class='col-12'> | ||
<div class='text' style='width: 1000; text-align:left'> | ||
<h2>Benchmark Objective</h2> | ||
<div class='text' style='margin: 0 100 0 100; text-align:left'> | ||
<h2><a name='objective'>Benchmark Objective</a></h2> | ||
|
||
<p>Our goal in running this benchmark was to provide accurate, automated data on how well various voice platforms perform.</p> | ||
|
||
|
@@ -11,16 +11,18 @@ | |
<p>Our goal is to provide interesting insight into how the platforms work, as well as show off our technology and processes at Bespoken.</p> | ||
|
||
<p>To that end, we combine:</p> | ||
<ul style='margin-top'> | ||
<li><a href='https://bespoken.io/test-robot'>Our core testing and tuning technology</a> | ||
<li><a href='https://github.com/bespoken/nlp-benchmark'>Github</a> for hosting and managing the test scripts | ||
<li><a href='https://github.com/bespoken/nlp-benchmark/actions?query=workflow%3Aprocess'>Github Actions</a> for executing the benchmark | ||
<li>MySQL and Metabase for reporting | ||
</ul> | ||
|
||
<ul> | ||
<li><a href='https://bespoken.io/test-robot'>Our core testing and tuning technology</a> | ||
<li><a href='https://github.com/bespoken/nlp-benchmark'>Github</a> for hosting and managing the test scripts | ||
<li><a href='https://github.com/bespoken/nlp-benchmark/actions?query=workflow%3Aprocess'>Github Actions</a> for executing the benchmark | ||
<li>MySQL and Metabase for reporting | ||
</ul> | ||
<p> | ||
All of this is public and open-source, part of our effort to "test in public". We want to show off not just our results, but also how we do our testing. We believe both are critical to successful automation. | ||
</p> | ||
<h2>Benchmark Dataset</h2> | ||
|
||
<h2><a name='dataset'>Benchmark Dataset</a></h2> | ||
|
||
<p>We leveraged the excellent <a href='http://qa.mpi-inf.mpg.de/comqa/'>ComQA</a> dataset for this, specifically their Dev dataset of questions. From their website:</p> | ||
|
||
|
@@ -32,17 +34,44 @@ | |
|
||
<p>Take a look at the detailed findings for the question types <a href='/questionTypes'>here</a> to see the results for the different platforms per question type, as well as in-depth explanations on each question type.</p> | ||
|
||
<h2>Benchmark Test Execution</h2> | ||
<h2><a name='execution'>Benchmark Test Execution</a></h2> | ||
|
||
<p>This dataset comprises 966 questions, which we run against the following three platforms:</p> | ||
|
||
<ul> | ||
<li>Amazon Echo Show 5</li> | ||
<li>Apple iPad Mini</li> | ||
<li>Google Nest Home Hub</li> | ||
</ul> | ||
|
||
<p>This dataset comprises 966 questions, which we run against the following three platforms: | ||
<ul> | ||
<li>Amazon Echo Show 5</li> | ||
<li>Apple iPad Mini</li> | ||
<li>Google Nest Home Hub</li> | ||
</ul> | ||
</p> | ||
|
||
<p>We used our <a href='https://bespoken.io/test-robot'>Bespoken Test Robots</a> to "talk" with these devices and record their audio and visual responses. With our Test Robots, we are able to execute these tests in a completely automated manner.</p> | ||
|
||
<p>For example, our Test Robot says: <i>"hey google when did bear bryant coach kentucky?"</i></p> | ||
|
||
<p><audio controls><source src='/web/media/Google-BearBryant.mp3' /></audio></p> | ||
|
||
<p>The audio response from Google is: <i>"sure here's some helpful information I found on the web"</i></p> | ||
|
||
<p><audio controls><source src='/web/media/Google-BearBryant-Answer.wav' /></audio></p> | ||
|
||
<p>The visual response from Google is:</p> | ||
|
||
<p><img src='/web/images/Google-BearBryant.jpg' height='400' /></p> | ||
|
||
<p>Our OCR sees that the display shows <i>1946-1953 Kentucky</i>, which is the correct answer, so this test is marked as a success!</p> | ||
|
||
<p>We performed this for every question in our dataset, across each of the devices listed above.</p> | ||
<h2><a name='results'>Benchmark Results</a></h2> | ||
|
||
<p>The complete results are viewable <a href='/details'>here</a> - you can see what happened with each utterance in detail, as well as how the question is annotated.</p> | ||
|
||
<p>You can also view what happened as part of the specific runs inside of Github. For example, here is the run for the Nest Home Hub:</p> | ||
|
||
<p><a href='https://github.com/bespoken/nlp-benchmark/runs/916650714?check_suite_focus=true'><img src='/web/images/GithubActions-Google.png' height='400'/></a></p> | ||
|
||
<p>As you can see, <a href='https://github.com/bespoken/nlp-benchmark'>Github</a> is not just a great environment for maintaining our code, but also for actually executing our tests. We commonly use this with customers. It allows for easy collaboration, operations and reporting.</p> | ||
|
||
<p>We can assist with your specialized testing and tuning needs at Bespoken - just reach out to <a href='mailto:[email protected]'>[email protected]</a> and we can show you how to not just measure, but measurably improve the performance of your voice experience.</p> | ||
</div> | ||
</div> | ||
</div> | ||
|
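
The page text in this diff describes marking a test as a success when the OCR output of the device display contains the correct answer. The diff does not show how that comparison is implemented, so the following is only a minimal sketch of one way such a check could work. The function names `normalize` and `is_success`, and the idea of also searching the audio transcript, are assumptions made for illustration and are not taken from the repository.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and strip punctuation so small OCR differences matter less."""
    text = text.lower()
    # Treat Unicode dash variants (e.g. an en dash in "1946–1953") as a plain hyphen.
    text = re.sub(r"[\u2010-\u2015]", "-", text)
    # Replace anything that is not a letter, digit, hyphen, or space with a space.
    text = re.sub(r"[^a-z0-9\- ]+", " ", text)
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


def is_success(expected_answers, transcript, ocr_text):
    """Hypothetical check: True if any expected answer appears in either response.

    expected_answers: acceptable answers from the dataset annotation.
    transcript: text of the device's spoken response.
    ocr_text: text read from the device's display.
    """
    haystack = normalize(transcript) + " " + normalize(ocr_text)
    # Skip answers that normalize to an empty string, which would match anything.
    candidates = [normalize(answer) for answer in expected_answers]
    return any(candidate in haystack for candidate in candidates if candidate)


if __name__ == "__main__":
    # Values taken from the Bear Bryant example in the page text.
    spoken = "sure here's some helpful information I found on the web"
    displayed = "1946-1953 Kentucky"
    print(is_success(["1946-1953"], spoken, displayed))
```

With the Bear Bryant example, the spoken response alone would not match, but the displayed text does, which is why the sketch searches both. A plain substring match is a deliberate simplification; a real benchmark would likely need fuzzier matching for dates, names, and OCR errors.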