Lots more work on text content and formatting
jkelvie committed Aug 11, 2020
1 parent ce3f2d8 commit 335aaec
Showing 10 changed files with 109 additions and 35 deletions.
10 changes: 6 additions & 4 deletions TODO.md
@@ -12,20 +12,22 @@
- [X] Cache queries
- [X] Deploy server
- [ ] Finish annotations
- [ ] Add home button
- [X] Add home button

### For Final
- [X] Figure out how to change handlebars root view directory
- [ ] Add text around reports
- [ ] Add protocol page
- [X] Add text around reports
- [X] Add protocol page
- [ ] Add favicon
- [ ] Add watermark - https://github.com/AlbinoDrought/chartjs-plugin-watermark#readme
- [ ] Add logos inside charts
- [ ] Add tooltips on question type table
- [ ] wordcloud chart on topics - https://github.com/sgratzl/chartjs-chart-wordcloud
- [ ] add drilldown reports
- [ ] links to underlying data

- [ ] Add link to a screenshot of metabase from the protocol page
- [ ] Change all external links to open in new tab?
- [ ] Add CTA to main page

workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes
https://docs.github.com/en/actions/getting-started-with-github-actions/about-github-actions#usage-limits
Binary file added web/images/Alexa-BearBryant.jpg
Binary file added web/images/GithubActions-Google.png
Binary file added web/images/Google-BearBryant.jpg
Binary file added web/images/favicon.ico
Binary file not shown.
Binary file added web/media/Google-BearBryant-Answer.wav
Binary file not shown.
Binary file added web/media/Google-BearBryant.mp3
Binary file not shown.
36 changes: 30 additions & 6 deletions web/views/layouts/main.handlebars
@@ -11,6 +11,9 @@
gtag('config', 'UA-99287066-6');
</script>

<!-- favicon -->
<link rel="shortcut icon" href="/web/images/favicon.ico" type="image/x-icon" />

<link href='https://fonts.googleapis.com/css?family=Khand' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Roboto+Condensed' rel='stylesheet' type='text/css'>
@@ -44,13 +47,25 @@
text-align: center;
}
h1 a, h1 a:hover {
color: #007bff
}
h2 {
color: black;
font-family: Roboto Condensed;
font-variant: small-caps;
margin: 10 0 0 0;
text-align: left;
}
h2 a, h2 a:hover {
border-width: 0;
color: black;
outline: none;
text-decoration: none;
}
p {
margin: 10 0 0 0
}
@@ -68,38 +83,47 @@
font-family: Open Sans;
font-size: 11pt;
}
ul {
margin-top: 5;
}
</style>
<script>
// We set certain styles programmatically because they are shared between charts generated in JS and HTML
window.addEventListener('load', () => {
$('body').css('font-family', ChartHelper.defaultFont())
$('body').css('font-size', ChartHelper.defaultFontSize())
$('.title').css('font-family', ChartHelper.titleFont())
$('.title').css('font-size', ChartHelper.titleFontSize())
$('.title').css('font-weight', 'bold')
// Automatically add an href to every named anchor
$('a[name]').each(function() {
this.href = '#' + this.name
})
})
</script>
</head>
<body>
<div class='container-fluid' style='background-color: black;padding: 25 10 10 10;'>
<div class='container-fluid' style='background-color: black;padding: 10 10 10 10;'>
<div class='row' style=''>
<div class='col-1'><a href='https://bespoken.io'><img src='/web/images/Logo-LlamaOnly.png' height='90%' /></a></div>
<div class='col-11'>

<div class='row'>
<div class='col-10'><h1 style='color: white; font-family: Khand;margin:0'><a href='https://bespoken.io' style="text-decoration: none">Bespoken</a> NLP Benchmark</h1></div>
<div class='col-10'><h1 style='color: #007bff; font-family: Khand;margin:0'><a href='/' style="text-decoration: none">Bespoken NLP Benchmark</a></h1></div>
<div class='col-1'></div>
</div>
<div class='row' >
<div class='col-10'><h2 style='color: white;font-weight: bold;font-size: 20pt;margin:0;text-align: center'> {{page}}</h2></div>
<div class='col-1'></div>
</div>
</div>
</div>



</div>
</div>

{{{body}}}

</body>
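
The load handler in this layout gives every named anchor an `href` that points at itself, so headings such as `<a name='objective'>` become self-links. A minimal sketch of the same idea without jQuery follows. The helper name `linkNamedAnchors` is hypothetical and not part of this commit; it accepts any iterable of objects with a `name` property so it can be exercised outside a browser.

```javascript
// Hypothetical helper, not part of the commit: mirrors the jQuery
// `$('a[name]').each(...)` loop in the layout's load handler.
function linkNamedAnchors(anchors) {
  for (const anchor of anchors) {
    // Assumption: anchors with an empty name are left untouched
    if (!anchor.name) continue
    anchor.href = '#' + anchor.name
  }
  return anchors
}
```

In a browser this could be called as `linkNamedAnchors(document.querySelectorAll('a[name]'))` once the page has loaded.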
65 changes: 47 additions & 18 deletions web/views/protocol.handlebars
@@ -1,8 +1,8 @@
<div class='container-fluid'>
<div class='row'>
<div class='col-12'>
<div class='text' style='width: 1000; text-align:left'>
<h2>Benchmark Objective</h2>
<div class='text' style='margin: 0 100 0 100; text-align:left'>
<h2><a name='objective'>Benchmark Objective</a></h2>

<p>Our goal in running this benchmark was to provide accurate, automated data on how well various voice platforms perform.</p>

@@ -11,16 +11,18 @@
<p>Our goal is to provide interesting insight into how the platforms work, as well as show off our technology and processes at Bespoken.</p>

<p>To that end, we combine:</p>
<ul style='margin-top'>
<li><a href='https://bespoken.io/test-robot'>Our core testing and tuning technology</a>
<li><a href='https://github.com/bespoken/nlp-benchmark'>Github</a> for hosting and managing the test scripts
<li><a href='https://github.com/bespoken/nlp-benchmark/actions?query=workflow%3Aprocess'>Github Actions</a> for executing the benchmark
<li>MySQL and Metabase for reporting
</ul>

<ul>
<li><a href='https://bespoken.io/test-robot'>Our core testing and tuning technology</a>
<li><a href='https://github.com/bespoken/nlp-benchmark'>Github</a> for hosting and managing the test scripts
<li><a href='https://github.com/bespoken/nlp-benchmark/actions?query=workflow%3Aprocess'>Github Actions</a> for executing the benchmark
<li>MySQL and Metabase for reporting
</ul>
<p>
All of this is public and open-source, part of our effort to "test in public". We want to show off not just our results, but also how we do our testing. We believe both are critical to successful automation.
</p>
<h2>Benchmark Dataset</h2>

<h2><a name='dataset'>Benchmark Dataset</a></h2>

<p>We leveraged the excellent <a href='http://qa.mpi-inf.mpg.de/comqa/'>ComQA</a> dataset for this, specifically their Dev dataset of questions. From their website:</p>

@@ -32,17 +34,44 @@

<p>Take a look at the detailed findings for the question types <a href='/questionTypes'>here</a> to see the results for the different platforms per question type, as well as in-depth explanations on each question type.</p>

<h2>Benchmark Test Execution</h2>
<h2><a name='execution'>Benchmark Test Execution</a></h2>

<p>This dataset comprises 966 questions, which we run against the following three platforms:</p>

<ul>
<li>Amazon Echo Show 5</li>
<li>Apple iPad Mini</li>
<li>Google Nest Home Hub</li>
</ul>

<p>This dataset comprises 966 questions, which we run against the following three platforms:
<ul>
<li>Amazon Echo Show 5</li>
<li>Apple iPad Mini</li>
<li>Google Nest Home Hub</li>
</ul>
</p>

<p>We used our <a href='https://bespoken.io/test-robot'>Bespoken Test Robots</a> to "talk" with these devices and record their audio and visual responses. With our Test Robots we are able to execute these tests in a completely automated manner.</p>

<p>For example, our Test Robot says: <i>"hey google when did bear bryant coach kentucky?"</i></p>

<p><audio controls><source src='/web/media/Google-BearBryant.mp3' /></audio></p>

<p>The audio response from Google is: <i>"sure here's some helpful information I found on the web"</i></p>

<p><audio controls><source src='/web/media/Google-BearBryant-Answer.wav' /></audio></p>

<p>The visual response from Google is:</p>

<p><img src='/web/images/Google-BearBryant.jpg' height='400' /></p>

<p>Our OCR sees that the display shows <i>1946-1953 Kentucky</i>, which is the correct answer, so this test is marked as a success!</p>

<p>We performed this for every question in our dataset, across each of the devices listed above.</p>
<h2><a name='results'>Benchmark Results</a></h2>

<p>The complete results are viewable <a href='/details'>here</a> - you can see what happened with each utterance in detail, as well as how the question is annotated.</p>

<p>You can also view what happened as part of the specific runs inside of Github. For example, here is the run for the Nest Home Hub:</p>

<p><a href='https://github.com/bespoken/nlp-benchmark/runs/916650714?check_suite_focus=true'><img src='/web/images/GithubActions-Google.png' height='400'/></a></p>

<p>As you can see, <a href='https://github.com/bespoken/nlp-benchmark'>Github</a> is not just a great environment for maintaining our code, but also for actually executing our tests. We commonly use this with customers. It allows for easy collaboration, operations and reporting.</p>

<p>We can assist with your specialized testing and tuning needs at Bespoken - just reach out to <a href='mailto:[email protected]'>[email protected]</a> and we can show you how to not just measure, but measurably improve the performance of your voice experience.</p>
</div>
</div>
</div>
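
The pass/fail step described in the protocol text above (the OCR output contains the expected answer) can be sketched as a normalized substring check. This is an illustration only: `normalize` and `isAnswerPresent` are hypothetical names, and the actual Bespoken matching logic is not shown in this commit.

```javascript
// Lowercase, unify dash variants, and collapse whitespace so that OCR
// quirks (en dashes, double spaces) do not cause false mismatches.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[\u2010-\u2015]/g, '-')
    .replace(/\s+/g, ' ')
    .trim()
}

// True when any non-empty accepted answer appears in the OCR transcript.
function isAnswerPresent(ocrText, expectedAnswers) {
  const haystack = normalize(ocrText)
  return expectedAnswers.some(answer => {
    const needle = normalize(answer)
    return needle.length > 0 && haystack.includes(needle)
  })
}

// Example based on the Bear Bryant question; the OCR string here uses an
// en dash and a double space to show why normalization helps.
const passed = isAnswerPresent('1946\u20131953  Kentucky', ['1946-1953'])
```

A substring check is the simplest choice; a real pipeline might need fuzzier matching to tolerate OCR character errors.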
33 changes: 26 additions & 7 deletions web/views/reports.handlebars
@@ -2,15 +2,19 @@
#reports .row {
margin-bottom: 20px;
}
.caption {
font-size: 10pt;
font-weight: bold;
}
</style>
<div id='reports' class='container-fluid' >
<div id='reports' class='container-fluid' >
<div class='row'>
<div class='col-12' >
<div class='text' style='font-size: 12pt'>
<div class='text' style='font-size: 12pt;margin-top: 10'>
We ran this benchmark using the following devices: Amazon Echo Show 5, Apple iPad Mini, Google Nest Home Hub.<br>
We leveraged the <a href='http://qa.mpi-inf.mpg.de/comqa/'>Dev Dataset from ComQA</a> - a set of complex questions derived from the crowd-sourced WikiAnswers website.<br>
We ran the tests using our <a href='https://bespoken.io/test-robot'>Bespoken Test Robots - if you can talk to it, we can test it.</a> Read more about our test protocol <a href='/protocol'>here</a>.
</ul>
</div>
</div>
</div>
@@ -19,15 +23,15 @@
<div>
{{> successByPlatform }}
</div>
<div class='text'>
<div class='text caption'>
The number of questions answered correctly by each platform.
</div>
</div>
<div class='col-12 col-lg-6'>
<div>
{{> successByComplexity }}
</div>
<div class='text'>
<div class='text caption'>
The number of questions answered correctly by each platform, grouped by simple versus complex questions.<br>
Complex questions are ones that involve comparative, compositional, or temporal reasoning. More...
</div>
@@ -38,7 +42,7 @@
<div>
{{> successByAnnotations }}
</div>
<div class='text'>
<div class='text caption'>
The number of questions answered correctly for each type of question.<br>
We classified questions as zero or more of the categories listed below. <br>
Hover for descriptions, and click here for more in-depth information...
@@ -48,10 +52,25 @@
<div>
{{> successByTopics }}
</div>
<div class='text'>
<div class='text caption'>
The number of questions answered correctly for each topic.<br>
Click here for a complete breakdown across all topics.
</div>
</div>
</div>
<div class='row' style='margin-top: 20' >
<div class='col-0 col-lg-3'></div>
<div class='col-12 col-lg-6' style='text-align: left'>
<h2 style='text-align: center'>What Is Next?</h2>
<p>
We plan to produce these benchmarks on a routine basis, both refreshing these results and running additional studies. Here is what we currently have planned:
</p>
<ul>
<li>ASR Benchmark - including Google Speech-To-Text, Twilio AutoPilot, Houndify, and others
<li>Personal Assistant Benchmark - questions of a personal nature, related to managing calendars, appointments, email, etc.
<li>IVR Benchmark - performance of various IVR platforms</li>
</ul>
</div>
<div class='col-0 col-lg-3'></div>
</div>
</div>
