<!doctype html>
<html class="no-js" lang="en" style="height: 100%;">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>Orbit - David Yang</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="apple-touch-icon" href="apple-touch-icon.png">
<!-- Place favicon.ico in the root directory -->
<link href='https://fonts.googleapis.com/css?family=Raleway:400,700|Merriweather:300,400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="css/normalize.css">
<link rel="stylesheet" href="css/skeleton.css">
<link rel="stylesheet" href="css/main.css">
</head>
<body style="height: 100%;">
<section class="bauhaus-header">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h1>Orbit</h1>
Case Study - VMware
</div>
</div>
</div>
</section>
<section class="section-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h3 class="secondary-font section-title">The Problem</h3>
</div>
</div>
</div>
</section>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
vSphere, VMware's main product, is a cloud computing OS that enables IT administrators to run virtual machines (VMs) on a connected pool of physical servers.
<br>
<br>
While vSphere makes it easy to rapidly configure and create VMs, the cloud scale makes performance monitoring and troubleshooting difficult.
<br>
<br>
</div>
</div>
</div>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text research-point" style="margin-top:0;">
The first line of defense: vSphere's built-in performance charts.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width card-shadow" src="images/performanceCharts.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
But our users had a hard time using these charts:
</div>
</div>
<div class="row">
<div class="eight columns offset-by-three explanation-text pullout-list-sm">
<ol>
<li>They were slow to load</li>
<li>Admins could see only one object (VM or server) at a time</li>
<li>Most of the metrics shown were meaningless for troubleshooting</li>
<li>There was no guidance on which metric values were good or bad</li>
</ol>
</div>
</div>
</div>
<div class="container research-section">
<div class="row">
<div class="eight columns offset-by-two explanation-text research-point" style="margin-top:0;">
ESXTOP was a command-line tool that experienced admins favored
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width card-shadow" src="images/esxtop.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
ESXTOP was extremely useful for power users. It showed vital metrics and was performant, but it also had similar shortcomings:
</div>
</div>
<div class="row">
<div class="eight columns offset-by-three explanation-text pullout-list-sm">
<ol>
<li>Admins needed to master the right commands</li>
<li>Admins still needed to know what to look for beforehand</li>
<li>Scale was an issue: ESXTOP could connect to only one physical server at a time</li>
</ol>
</div>
</div>
</div>
<section class="orbit-excerpt">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two exercept">
As a vSphere admin, I want to be told what is wrong with specific VMs or servers so I can begin remediation.
<br>
<br>
I need this information at scale, along with guidance on what to do.
</div>
<div class="eight columns offset-by-two exercept-attribute" style="text-align: center;">
Use Case
</div>
</div>
</div>
</section>
<section class="section-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h3 class="secondary-font section-title">Research</h3>
</div>
</div>
</div>
</section>
<section class="process-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h5 class="secondary-font process-title">Eat Dog Food</h5>
</div>
</div>
</div>
</section>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
I started to dogfood vSphere in order to better understand our users' frustration.
The User Experience Team had its own datacenter managed by vSphere, so I volunteered to be an admin.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width card-shadow" src="images/dogFood.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
I ran databases and applications used for prototyping in vSphere. Our designers and UI developers had never gotten their hands dirty with the inner workings of vSphere, so I naturally turned to Google for help...
</div>
</div>
</div>
<section class="process-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h5 class="secondary-font process-title">Understanding the Technology</h5>
</div>
</div>
</div>
</section>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
A sample of my Google search history:
</div>
</div>
<div class="row">
<div class="eight columns offset-by-three explanation-text pullout-list-sm">
<ul>
<li>What metrics matter in vSphere?</li>
<li>CPU ready vs CPU costop vs CPU wait</li>
<li>Memory Balloon?</li>
</ul>
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
After reading through dozens of blogs and articles, I realized that judging VM performance by the traditional computing paradigm was a mistake.
<br>
<br>
In traditional computing, <strong>consumption</strong> was the key metric to look at. You needed to track CPU and memory consumption rates.
<br>
<br>
But vSphere dynamically allocated resources where needed. Consumption was fine, but <strong>contention</strong> for the same resources between VMs was bad.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width card-shadow" src="images/cpuReady.jpg">
</div>
</div>
<div class="row">
<div class="six columns offset-by-three explanation-text explanation-text-cutout">
Diagrams like this helped me understand why contention was at the root of most performance issues.
</div>
</div>
</div>
<section class="process-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h5 class="secondary-font process-title">Find Domain Experts</h5>
</div>
</div>
</div>
</section>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
As I dived deeper into the world of virtualization, I came upon some brilliant vSphere experts.
<a href="http://www.yellow-bricks.com/">Duncan Epping</a>, a renowned "vExpert", had compiled a list of important contention metrics:
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width card-shadow" src="images/ducanEpping2.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
His blog was a virtual goldmine <em>(pun intended)</em>. Duncan laid out thresholds, explained possible causes and offered potential remediations.
<br>
<br>
I was surprised that none of this was baked into the current vSphere UI.
<br>
It was clear that I needed to build my prototype on top of this knowledge.
</div>
</div>
</div>
<section class="process-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h5 class="secondary-font process-title">Look for Designspiration</h5>
</div>
</div>
</div>
</section>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
The next challenge was figuring out how to represent all this data at scale. It was common for admins to maintain environments with dozens of servers and thousands of VMs.
<br>
<br>
My mentor at VMware, <a href="http://www.gizmometer.com/blog/">Conrad Albrecht-Buehler</a>, had written his <a href="http://pqdtopen.proquest.com/doc/250829536.html?FMT=ABS">doctoral thesis</a> on situation awareness and monitoring. I spent a day going through it for insight.
<br>
<br>
His concept, called Heeds, centered on eliminating noise through dynamic thresholds. Conrad described two principal thresholds:
</div>
</div>
<div class="row">
<div class="eight columns offset-by-three explanation-text groundrules">
<ol>
<li>A lower threshold, where everything below can be ignored</li>
<li>An upper threshold, where everything above must be addressed</li>
</ol>
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width card-shadow" src="images/heeds.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
Heeds was perfect for our use case.
<br>
By eschewing a linear representation of vSphere metrics, I could reduce noise and bring attention to issues (current or potential).
Furthermore, Duncan already provided us with the thresholds for each metric.
</div>
</div>
</div>
<section class="section-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h3 class="secondary-font section-title">Visualization Design</h3>
</div>
</div>
</div>
</section>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
Unfortunately, Orbit was never launched to the public and my deliverables are under NDA.
I cannot show the final prototype or the in-progress designs.
<br>
<br>
All is not lost! Let me explain the concept with my two favorite internet memes: <strong>Grumpy Cat</strong> &amp; <strong>Advice Dog</strong>.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
<br>
Using the <em>Heeds</em> principle, we have two thresholds.
</div>
</div>
<div class="row">
<div class="seven columns offset-by-three explanation-text groundrules">
<ol>
<li>
Grumpy Cat represents the upper threshold: be above this and life is harsh.
</li>
<li>
Advice Dog represents the lower threshold: be below this and life is good!
</li>
</ol>
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width" src="images/orbitViz1.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
I decided to eschew a timeline representation. The primary goal was to show datacenter health at the current moment. Thus, I could represent a VM (or server) as a point along a single axis.
<br>
<br>
We had roughly a dozen metrics to track based on Duncan's best practices. I developed a weighted aggregation algorithm to map this multi-dimensional space onto a single axis.
<br>
<br>
Scale was still an issue. As you start lining up hundreds of these <em>Heed Charts</em>, you run out of horizontal space.
</div>
</div>
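<div class="row">
<div class="eight columns offset-by-two explanation-text">
The actual Orbit algorithm is under NDA, so the following is only a rough TypeScript sketch of how a weighted aggregation against two thresholds could work. The metric names, threshold values, and weights are made-up placeholders, not Duncan's published values, and the linear scaling between thresholds is an assumption.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<pre><code>
```typescript
// Hypothetical metric definition: names, thresholds, and weights are
// illustrative placeholders, not the values used in Orbit.
interface Metric {
  name: string;
  lower: number;  // at or below this, the reading can be ignored
  upper: number;  // at or above this, the reading must be addressed
  weight: number; // relative importance in the aggregate
}

// Map one raw reading onto a 0..1 scale: 0 at or below the lower
// threshold, 1 at or above the upper threshold, linear in between.
function normalize(value: number, metric: Metric): number {
  const span = metric.upper - metric.lower;
  const scaled = (value - metric.lower) / span;
  return Math.min(1, Math.max(0, scaled));
}

// Weighted average of the normalized readings: one number per VM.
function aggregate(readings: { [name: string]: number }, metrics: Metric[]): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const metric of metrics) {
    const value = readings[metric.name];
    if (value === undefined) continue; // skip metrics with no reading
    weighted += metric.weight * normalize(value, metric);
    totalWeight += metric.weight;
  }
  return totalWeight > 0 ? weighted / totalWeight : 0;
}

const metrics: Metric[] = [
  { name: "cpuReadyPct", lower: 5, upper: 10, weight: 2 },
  { name: "balloonMb", lower: 0, upper: 100, weight: 1 },
];

const score = aggregate({ cpuReadyPct: 10, balloonMb: 50 }, metrics);
console.log(score.toFixed(3));
```
</code></pre>
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
A weighted average can mask a single metric that is far over its upper threshold. Taking the maximum of the normalized readings instead would be a stricter alternative.
</div>
</div>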
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width" src="images/orbitViz2.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
It gets difficult to fit the legions of Grumpy Cats and Advice Dogs in a limited screen space.
The design starts to get noisy.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width" src="images/orbitViz3.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
However, with some rearrangement magic, a radial orientation allows for a massive number of charts to be placed in a limited space.
</div>
</div>
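<div class="row">
<div class="eight columns offset-by-two explanation-text">
To make the radial arrangement concrete, here is a minimal TypeScript sketch of the geometry. It assumes each chart becomes a spoke at an evenly spaced angle, and that a score of 0 (healthy) sits near the centre while 1 (critical) sits at the rim. The function names and the direction of the axis are assumptions for illustration, not the Orbit implementation.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<pre><code>
```typescript
interface Point {
  x: number;
  y: number;
}

// Place chart `index` of `count` as a spoke around a circle and position
// a VM on that spoke by its aggregate score (0 = healthy, 1 = critical).
// Assumption: healthy VMs sit near the centre, critical ones near the rim.
function radialPosition(
  index: number,
  count: number,
  score: number,
  innerRadius: number,
  outerRadius: number,
): Point {
  const angle = (2 * Math.PI * index) / count;
  const clamped = Math.min(1, Math.max(0, score));
  const radius = innerRadius + clamped * (outerRadius - innerRadius);
  return { x: radius * Math.cos(angle), y: radius * Math.sin(angle) };
}

// Arc length available to each spoke at a given radius. It grows
// linearly with the radius, so outer points have more room.
function spacingAt(radius: number, count: number): number {
  return (2 * Math.PI * radius) / count;
}

const rim = radialPosition(0, 4, 1, 50, 200);
console.log(rim.x, rim.y, spacingAt(200, 400));
```
</code></pre>
</div>
</div>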
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width" src="images/orbitViz4.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
I minimized and dimmed <em>Advice-Dog-esque</em> VMs to reduce noise and increase readable informational density. There was no need to bring attention to VMs that were good.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width" src="images/orbitViz5.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
The good news was that this worked in harmony with the geometry of a circle: area grows with the square of the radius, so space becomes more plentiful toward the outer edge.
<br>
<br>
The bad news was the focus on Grumpy Cat's negativity at the expense of Advice Dog's positivity. I guess that's just how the world spins.
</div>
</div>
</div>
<section class="section-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h3 class="secondary-font section-title">Design Principles</h3>
</div>
</div>
</div>
</section>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
To sum up the design in an equation:
</div>
</div>
</div>
<div class="container highlight-container">
<div class="row">
<div class="eight columns offset-by-two highlight-text">
Best practice metrics/thresholds
<br>
<span style="color: gray; font-weight: 400;">(Reduce Dimensionality)</span>
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two highlight-text">
+
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two highlight-text">
Heeds
<br>
<span style="color: gray; font-weight: 400;">(Reduce Noise)</span>
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two highlight-text">
+
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two highlight-text">
Radial Topography
<br>
<span style="color: gray; font-weight: 400;">(Increase Informational-Density)</span>
</div>
</div>
</div>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
Furthermore, I found out that admins differ on acceptable thresholds based on the nature of their VM workloads.
<br>
<br>I added the ability to customize thresholds and filter on specific metrics. Duncan's values gave admins a solid starting point, and they were free to customize from there.
</div>
</div>
</div>
<section class="section-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h3 class="secondary-font section-title">Development</h3>
</div>
</div>
</div>
</section>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
I used The New York Times' "force-bubble" chart <a href="http://www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html?_r=0">visualization</a> of Obama's 2013 budget proposal as inspiration for development.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width" src="images/nytimes.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
The dynamic gravitational effect was good at conveying resource contention by showing VMs "fighting for the same spot".
<br>
<br>
Data-Viz-Wiz <a href="http://vallandingham.me/">Jim Vallandingham</a> kindly wrote a <a href="http://vallandingham.me/bubble_charts_in_d3.html">tutorial</a> on how to build this effect in D3. This was my starting point.
<br>
<br>
I built Orbit as a multi-tenant cloud service that could connect to and monitor any vSphere instance. This removed the barrier of installation and made it easy for internal teams to dogfood.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<img class="u-max-full-width" src="images/orbitDiagram.jpg">
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
Around this time, my partner, Paulo, joined the prototyping team. He vastly improved the architecture of the server by using threads to connect to all physical servers simultaneously. This made the visualization significantly faster than the standard charts.
<br>
<br>
Paulo also built a detailed drilldown view that provided time series data and recommendations based on Duncan's best practices. We used a multi-screen UI for this purpose and synced up the two browsers with WebSockets.
</div>
</div>
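<div class="row">
<div class="eight columns offset-by-two explanation-text">
The speedup from polling every server at once can be illustrated with a small TypeScript sketch. Note the substitution: this uses asynchronous calls with Promise.all rather than threads, and pollHost is a hypothetical stand-in that only simulates latency. It does not call any real vSphere API.
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two">
<pre><code>
```typescript
// Resolve after `ms` milliseconds.
const sleep = (ms: number) =>
  new Promise((resolve) => setTimeout(() => resolve(undefined), ms));

// Hypothetical stand-in for a network round trip to one physical server.
async function pollHost(host: string, latencyMs: number) {
  await sleep(latencyMs);
  return { host, polledAt: Date.now() };
}

// Start every poll before waiting on any of them, so the total wait is
// roughly one round trip instead of one round trip per host.
// Promise.all keeps the results in the same order as the input hosts.
async function pollAll(hosts: string[], latencyMs: number) {
  return Promise.all(hosts.map((host) => pollHost(host, latencyMs)));
}

const started = Date.now();
pollAll(["esx-01", "esx-02", "esx-03"], 50).then((samples) => {
  console.log(samples.length, "hosts polled in", Date.now() - started, "ms");
});
```
</code></pre>
</div>
</div>
<div class="row">
<div class="eight columns offset-by-two explanation-text">
With a simulated latency of 50 ms per host, the elapsed time should stay close to a single round trip rather than three, which is the effect the threaded server was after.
</div>
</div>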
</div>
<section class="section-title-section">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h3 class="secondary-font section-title">Outcomes</h3>
</div>
</div>
</div>
</section>
<div class="container">
<div class="row">
<div class="eight columns offset-by-two explanation-text">
We presented Orbit at VMware's internal innovation conference.
<br>
<br>
The project garnered a lot of interest and showcased how best practices and modern web technologies could be fused together to improve the vSphere user experience.
<br>
<br>
I was unable to find a technical team with the bandwidth to continue Orbit as a production-level feature; however, the work caught the eye of vSAN architect <a href="https://twitter.com/cdickmann">Christian Dickmann</a>.
<br>
<br>
At the time, Christian was developing a monitoring toolkit for vSAN (VMware's next money-maker). Christian followed our philosophy of bundling best practices into the UI. He also used the Heed concept and built on top of our D3 graphing code.
<br>
<br>
The beautiful part of prototyping is introducing new technologies; it was fun to see Christian (a backend engineer) dive into Angular and D3:
<div style="text-align:center;">
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Played with JavaScript and D3 today to create pretty cool performance graphs. Extremely nice.</p>— Christian Dickmann (@cdickmann) <a href="https://twitter.com/cdickmann/status/323620252531429376">April 15, 2013</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Trying to get a hang of <a href="https://twitter.com/hashtag/angularjs?src=hash">#angularjs</a> and how to use <a href="https://twitter.com/hashtag/d3js?src=hash">#d3js</a> with it. The data binding concept seems good, but my brain doesn't think this way.</p>— Christian Dickmann (@cdickmann) <a href="https://twitter.com/cdickmann/status/351407806982787072">June 30, 2013</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
</div>
<br>
We worked with Christian to flesh out the <a href="http://www.yellow-bricks.com/2013/10/21/configure-virtual-san-observer-monitoring/">vSAN Observer</a>, which shipped as an experimental feature to help admins monitor vSAN.
</div>
</div>
<div class="row" style="margin-bottom: 10rem;">
<div class="eight columns offset-by-two">
<img class="u-max-full-width" src="images/vsan.jpg">
</div>
</div>
</div>
<section class="additional-projects">
<div class="container">
<div class="row">
<div class="eight columns offset-by-two">
<h5 class="secondary-font section-title"> Next:</h5>
</div>
</div>
</div>
<a href="/fling.html" style="color:black;">
<div class="container">
<div class="row product-preview">
<div class="six columns">
<img class="u-max-full-width card-shadow" src="images/vmwareFlingThumbnail.jpg">
</div>
<div class="six columns">
<h5>VM Resource and Availability Service</h5>
<em class="case-study-label">Case Study</em>
VMware's first SaaS fling, designed to help IT admins understand their datacenter resiliency through hardware failure simulations.
</div>
</div>
</div>
</a>
<div class="container">
<div class="row">
<div class="twelve columns home">
<a href="/index.html">home</a>
</div>
</div>
</div>
</section>
</body>
</html>