<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Okta Developer</title>
<description>Secure, scalable, and highly available authentication and user management for any app.</description>
<link>http://developer.okta.com</link>
<atom:link href="http://developer.okta.com/feed.xml" rel="self" type="application/rss+xml" />
<item>
<title>How to use KentorIT AuthServices with Okta</title>
<description><p>If you’re wondering how to configure an ASP.NET application with <a href="https://github.com/KentorIT/authservices">KentorIT’s AuthServices</a> and Okta, you’ve come to the right place. But before delving into the specifics of how to make Okta work with a SAML-enabled ASP.NET application powered by KentorIT AuthServices, it is worth spending some time going over a critical, but easily fixable, issue:</p>
<p><strong>Important note</strong> : As of March 22nd, 2016, you have 2 choices:</p>
<ol>
<li>
<p>Either get the source code of the AuthServices assemblies and compile them on your own machine. In this case, no specific adjustment is necessary.</p>
</li>
<li>
<p>Or use the v0.17 KentorIT NuGet assemblies. In this case, if you plan to use the SampleApplication project (not the SampleMvcApplication) for testing purposes, make sure you remove the following line from the web.config file:</p>
<p><code class="highlighter-rouge">&lt;requestedAuthnContext classRef="Password" comparison="Minimum" /&gt;</code></p>
<p>If you don’t, the SP-initiated login flow will fail because Okta won’t manage to deserialize the SAMLRequest parameter (due to a case issue).</p>
</li>
</ol>
<p>Here’s how you should configure an app powered by Kentor AuthServices to make it work with Okta:</p>
<ol>
<li>Download the latest version of KentorIT’s AuthServices from <a href="https://github.com/KentorIT/authservices">https://github.com/KentorIT/authservices</a> and open the Kentor.AuthServices.sln solution in Visual Studio.</li>
<li>Identify the SampleApplication project and make a note of its URL property:
<img src="/assets/img/KentorOkta/VSProjectProperties.png" alt="Visual Studio Project properties" /></li>
<li>Go to your Okta organization and navigate to Admin =&gt; Applications.</li>
<li>Press the <strong>Add Application</strong> button and the green <strong>Create New App</strong> button
<img src="/assets/img/KentorOkta/CreateNewAppButton.png" alt="Press the Create a new Okta app" /></li>
<li>Select the <strong>SAML 2.0</strong> option and press the <strong>Create</strong> button.
<img src="/assets/img/KentorOkta/SAML2Option.png" alt="Choose the SAML 2.0 template" /></li>
<li>Give your application a name and optionally upload a custom logo. We’ll call it “<strong>Kentor AuthServices App 1</strong>”
<img src="/assets/img/KentorOkta/OktaAppName.png" alt="Give your Okta app a name" /></li>
<li>Press <strong>Next</strong>.</li>
<li>In the <strong>Single sign on URL</strong> field, enter the URL you noted in step #2 and append “<strong>/AuthServices/Acs</strong>”, for instance <strong>http://localhost:18714/SamplePath/AuthServices/Acs</strong></li>
<li>In the <strong>Audience URI</strong> field, enter the URL you noted in step #2 and append “<strong>/AuthServices</strong>”, for instance <strong>http://localhost:18714/SamplePath/AuthServices</strong></li>
<li>In the <strong>Name ID format</strong> field, select the default <strong>Unspecified</strong> (or select any other value of your choice).</li>
<li>
<p>Select the <strong>Show Advanced Settings</strong> link. For the <strong>Signature Algorithm</strong> field, we suggest that you leave the default value, SHA-256. However, if you do, you will need to add the following line of code to the Application_Start() method of your Global.asax.cs file:</p>
<p><code class="highlighter-rouge">Kentor.AuthServices.Configuration.Options.GlobalEnableSha256XmlSignatures();</code></p>
<p>Otherwise, you may switch to RSA-SHA1, though we do not recommend it (as it is less secure than SHA-256).</p>
</li>
<li>In the Attribute Statements section, optionally enter additional attributes, such as in the following screenshot:
<img src="/assets/img/KentorOkta/OptionalAttributeStatements.png" alt="Optional Attribute Statements" /></li>
<li>Press the <strong>Next</strong> button. Select the <strong>I’m a software vendor</strong> option (if you’re indeed a vendor - if you are developing an internal app, select the first option) and press the Finish button.
<img src="/assets/img/KentorOkta/VendorOrCustomerOption.png" alt="Select the customer or vendor option" /></li>
<li>Now edit the web.config file of the SampleApplication project.</li>
<li>In the <code class="highlighter-rouge">&lt;kentor.authServices&gt;</code> section, enter the following values:
<ul>
<li><strong>entityId</strong> = same value as the Audience URI for the Okta app, e.g. <a href="http://localhost:18714/SamplePath/AuthServices">http://localhost:18714/SamplePath/AuthServices</a></li>
<li><strong>returnUrl</strong> = value of the web application’s url, i.e. <a href="http://localhost:18714/SamplePath">http://localhost:18714/SamplePath</a></li>
</ul>
</li>
<li>In the <code class="highlighter-rouge">&lt;identityProviders&gt;</code> section, enter the following values:
<ul>
<li><strong>entityId</strong> = <strong>Identity Provider Issuer</strong> from <strong>Sign On</strong> =&gt; <strong>View Setup Instructions</strong>
<img src="/assets/img/KentorOkta/ViewSetupInstructions.png" alt="View setup instructions" />
<img src="/assets/img/KentorOkta/IdentityProviderIssuer.png" alt="Identity Provider Issuer" /></li>
<li><strong>signOnUrl</strong> = value of the <strong>Identity Provider Single Sign-On URL</strong> below
<img src="/assets/img/KentorOkta/IdPSSOUrl.png" alt="Identity Provider Single Sign-On URL" /></li>
<li>For the <code class="highlighter-rouge">&lt;signingCertificate&gt;</code> section, download the <strong>okta.cert</strong> X.509 certificate from the setup instructions page in the Okta app and put it in the <strong>App_Data</strong> folder of your web application. Then reference it in the web.config file (for example with <strong>fileName="~/App_Data/okta.cert"</strong>).</li>
</ul>
</li>
</ol>
<p>You should be good to go now! Don’t forget to assign users to your Okta application and test that you can sign in to your SAML application both from the Okta portal (IdP-initiated sign-in flow) and from your SAML application itself (SP-initiated sign-in flow).</p>
<p>If you run into any issues while using the SP-initiated login flow (when a user clicks on the “Sign In” link of the /SamplePath page), try recompiling the KentorIT.AuthServices project and make sure your project uses the recompiled assembly. If your project uses v0.17 of the corresponding NuGet library, make sure to comment out any <code class="highlighter-rouge">&lt;requestedAuthnContext&gt;</code> section in your web.config file.</p>
<p><strong>Happy Okta’ing!</strong></p>
</description>
<pubDate>Tue, 22 Mar 2016 00:00:00 -0700</pubDate>
<link>http://developer.okta.com/blog/2016/03/22/use-kentor-authservices-with-okta</link>
<guid isPermaLink="true">http://developer.okta.com/blog/2016/03/22/use-kentor-authservices-with-okta</guid>
</item>
<item>
<title>REST Service Authorization with JWTs</title>
<description><p>Many companies are adopting microservices-based architectures to promote
decoupling and separation of concerns in their applications. One inherent
challenge with breaking applications up into small services is that now each
service needs to deal with authenticating and authorizing requests made to it.
<a href="https://tools.ietf.org/html/rfc7519">JSON Web Tokens (JWTs)</a> offer a clean
solution to this problem along with
<a href="/blog/2015/12/02/tls-client-authentication-for-services">TLS client authentication</a>
lower down in the stack.</p>
<p>Wils Dawson and I presented these topics to the <a href="http://www.meetup.com/sfjava/">Java User Group</a>
at Okta’s HQ in December and are thrilled to offer the
<a href="http://www.slideshare.net/JonTodd1/rest-service-authetication-with-tls-jwts">slides</a>,
<a href="https://github.com/wdawson/dropwizard-auth-example">code</a>, and the following
recording of the presentation. In the talk, we cover authentication and
authorization both at a server level with TLS and a user level with OAuth 2.0.
In addition, we explain claims based auth and federation while walking through
demos for these concepts using Java and Dropwizard. We purposely skipped over
client (e.g. browser) side authentication as it’s enough material for a future
talk and focused on solutions for authentication and authorization between
services within an application.</p>
<iframe src="https://player.vimeo.com/video/150714428" width="500" height="281" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe>
</description>
<pubDate>Tue, 05 Jan 2016 00:00:00 -0800</pubDate>
<link>http://developer.okta.com/blog/2016/01/05/rest-service-auth-jwt</link>
<guid isPermaLink="true">http://developer.okta.com/blog/2016/01/05/rest-service-auth-jwt</guid>
</item>
<item>
<title>Demystifying OAuth</title>
<description><p>It seems that OAuth 2.0 is everywhere these days. Whether you are building a hot new single page web application (SPA), a native mobile experience, or just trying to integrate with the API economy, you can’t go far without running into the popular authorization framework for REST/APIs and social authentication.</p>
<p>During <a href="https://www.okta.com/oktane15/">Oktane15</a>, <a href="https://www.linkedin.com/in/karlmcguinness">Karl McGuinness</a>, our Senior Director of Identity, demystified the powerful, yet often misunderstood, world of OAuth 2.0 and shared details on Okta’s growing support for <a href="http://openid.net/connect/">OpenID Connect</a>.</p>
<p><a href="http://www.slideshare.net/karl_mcguinness/demystifying-oauth-20">Slides</a></p>
<iframe src="https://player.vimeo.com/video/148164438" width="500" height="281" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe>
</description>
<pubDate>Mon, 07 Dec 2015 00:00:00 -0800</pubDate>
<link>http://developer.okta.com/blog/2015/12/07/oauth</link>
<guid isPermaLink="true">http://developer.okta.com/blog/2015/12/07/oauth</guid>
</item>
<item>
<title>TLS Client Authentication for Internal Services</title>
<description><p>If you’re like me, the most aggravating thing is finding a Stack Overflow
question that exactly describes the issue you are facing, only to scroll down
and see that it has remained unanswered since 2011. I was recently trying to
configure Transport Layer Security (TLS) <a href="https://en.wikipedia.org/wiki/Transport_Layer_Security#Client-authenticated_TLS_handshake">client
authentication</a>
(also referred to as mutual SSL) between two internal services at Okta and
found the lack of complete examples astonishing. I hope that this blog post
provides a better understanding of how to accomplish client authentication in
your applications and makes all that hard security stuff a bit easier.</p>
<h2 id="tls-background">TLS Background</h2>
<p>In a normal TLS handshake, the server sends its certificate to the client so
that the client can verify the authenticity of the server. It does this by
following the certificate chain that issued the server’s certificate until it
arrives at a certificate that it trusts. If the client reaches the end of the
chain without finding a certificate that it trusts, it will reject the
connection. For an example of what a server might send, see <a href="https://gist.github.com/jpf/9282d558bcc105ae8e1a">this
gist</a>.</p>
<p style="text-align: center"><img src="/assets/img/2015-10-29-tls-client-authentication-for-services-tls-handshake.png" alt="TLS handshake" width="540px" /></p>
<p style="text-align: center;font-size: x-small;">Image reprinted with permission from
<a href="https://blog.cloudflare.com/protecting-the-origin-with-tls-authenticated-origin-pulls/">CloudFlare</a></p>
<p>In mutual SSL, the client also sends its certificate to the server
for the server to authenticate along with an additional message (called the
CertificateVerify message), which assures the server that the client is the true
owner of the certificate. The server follows the same process of checking the
certificate chain until it finds one it trusts, refusing the connection if it
can’t find such a certificate.</p>
<p>So why is that useful? You probably interact with typical TLS all the time in
your browser. For example, when you visit <a href="https://www.okta.com">https://www.okta.com</a>, your browser
is verifying that the server serving Okta’s site is authentic (that it’s not
impersonating a legitimate Okta server). But Okta’s server has no idea who your
browser is. In this case it doesn’t care too much, so it lets you connect.</p>
<p>When we start talking about services talking to each other, authenticating the
client becomes important because it lowers the risk of our servers divulging
information to machines impersonating our services. For example, let’s say we
have a service called the User Service that holds all the information about
users in our application. We have another service called the Home Page Service
that serves up the home page to the browser. The home page has the user’s name,
email, phone number, and other personal information. The Home Page Service needs
to talk to the User Service to get the user’s name to display on the page. In
this case, the Home Page Service is the client and the User Service is the
server. If we only used normal TLS, only the User Service would be
authenticated! We need TLS client authentication to make sure the User Service
doesn’t provide data to a random client.</p>
<h2 id="implementing-tls-client-authentication">Implementing TLS Client Authentication</h2>
<p>In our case, the client and server are internal services communicating with each
other. I won’t cover configuring a browser client or other clients that may
not be under your control. In this post, I’ll give examples for the technology we
use at Okta. Specifically, we use <a href="http://www.dropwizard.io/">Dropwizard</a> as
the server framework and <a href="https://jersey.java.net/">Jersey</a> for the client
framework. We’ll also use Java’s
<a href="https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html">keytool</a>
for building the key and trust stores in Java KeyStore (JKS) format. The
examples below use these technologies, but I hope they’ll be fairly transferable
to choices you make in your applications. In addition, these samples are not
meant to be complete, so you may need to modify them to fit in your environment.</p>
<h3 id="certificates-and-key-stores">Certificates and Key Stores</h3>
<p style="text-align: center"><img src="/assets/img/2015-10-29-tls-client-authentication-for-services-ca-chain.png" alt="CA hierarchy" width="540px" /></p>
<p>First, let’s set up our trust store, which is just a key store that will only
contain certificates. Let’s assume we have a layered Certificate Authority (CA)
structure, like the image above, with a root CA and a subordinate global CA. The
root CA has its private key stored offline and its certificate is the one we
want our services to trust. The root certificate is the <em>only</em> certificate we
want our services to trust on that channel. We don’t even want a certificate
issued by a reputable 3rd party CA to be trusted by our service. So our trust
store will contain only the root certificate, which means the server will only
establish connections from clients that have a certificate issued by the root CA
or its child, the global CA, which will be the issuer of our server’s
certificate. This way, it’s quite easy to rotate our server’s certificate,
either when it expires or if it is somehow compromised; we can just change it on
that service and don’t have to worry about the other services it communicates
with losing trust because they trust the root. If all our services trusted each
other explicitly, the rotation would be much more difficult, especially if you
can’t take downtime. We’ll use the trust store for both the client and the
server, so you only need to make one, which you can copy if you need to.</p>
<div class="language-shell highlighter-rouge"><pre class="highlight"><code><span class="c"># Import your root certificate into a new trust store and follow the prompts</span>
keytool -import -alias root -file root.crt -keystore truststore.jks
</code></pre>
</div>
<p>Now that we’ve set up trust, we want to issue the certificate for our service
that chains up to the root. We’ll use the global CA to issue our server its
certificate, and since the global CA’s certificate is issued by the root CA,
we have a chain of trust. When we create the server’s certificate, we’ll include
the chain as well for clients to verify. The <a href="http://tools.ietf.org/html/rfc5246#section-7.4.2">TLS
standard</a> specifies that the
certificate chain does not require the actual root of trust since the endpoints
will have it already, so we’ll omit it to save bandwidth. Once we have the
certificate we’ll put it in a JKS for our Dropwizard application to use. If
your client does not have a certificate for service-to-service communication,
you can follow a similar pattern to create its certificate. But if it does have
an existing certificate, you can just reuse that one.</p>
<div class="language-shell highlighter-rouge"><pre class="highlight"><code><span class="c"># Create our server's key</span>
openssl genrsa -out server.key 2048
<span class="c"># Create the csr and follow the prompts for country code, ou, etc</span>
openssl req -new -key server.key -sha256 -out server.csr
<span class="c"># Sign the csr with your CA</span>
openssl ca -in server.csr -days 365 -config my-ca-conf.cnf -out server.crt
<span class="c"># Cat the cert chain together (except the root)</span>
cat server.crt global.crt &gt; chain.crt
<span class="c"># Create pkcs12 file for key and cert chain</span>
openssl pkcs12 -export -name server-tls -in chain.crt -inkey server.key -out server.p12
<span class="c"># Create JKS for server</span>
keytool -importkeystore -destkeystore keystore.jks -srckeystore server.p12 -srcstoretype pkcs12 -alias server-tls
</code></pre>
</div>
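<p>Before wiring the result into Dropwizard, the same chain layout can be rehearsed end to end against a throwaway CA. The sketch below is a stand-in built on several assumptions, not the exact procedure above: it signs with <code>openssl x509 -req</code> instead of <code>openssl ca</code> so that no <code>my-ca-conf.cnf</code> is required, and the subject names, the <code>ca.ext</code> file, and the 30-day lifetime are all invented for the demo.</p>

```shell
set -e
# Scratch directory so the demo never touches real key material
cd "$(mktemp -d)"

# Extension file that marks a certificate as a CA (demo-only file name)
printf 'basicConstraints=critical,CA:TRUE\n' > ca.ext

# Root CA: key and CSR, then a self-signed certificate
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=Demo Root CA" \
  -keyout root.key -out root.csr
openssl x509 -req -in root.csr -signkey root.key -extfile ca.ext \
  -sha256 -days 30 -out root.crt

# Global CA: signed by the root
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=Demo Global CA" \
  -keyout global.key -out global.csr
openssl x509 -req -in global.csr -CA root.crt -CAkey root.key \
  -CAcreateserial -extfile ca.ext -sha256 -days 30 -out global.crt

# Server certificate: signed by the global CA
openssl genrsa -out server.key 2048
openssl req -new -key server.key -sha256 \
  -subj "/CN=server.example.test" -out server.csr
openssl x509 -req -in server.csr -CA global.crt -CAkey global.key \
  -CAcreateserial -sha256 -days 30 -out server.crt

# Chain sent to peers: leaf first, then the global CA, root omitted
cat server.crt global.crt > chain.crt

# Trust only the root; the global CA is offered as an untrusted intermediate
openssl verify -CAfile root.crt -untrusted global.crt server.crt
```

<p>If the chain is sound, the final command reports <code>server.crt: OK</code>. The <code>-untrusted</code> option mirrors what happens on the wire: the peer trusts only the root and receives the intermediate as part of the presented chain.</p>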
<h3 id="server-configuration">Server Configuration</h3>
<p>Now that we have our key and trust stores, let’s configure the server’s
Dropwizard application connector.</p>
<div class="language-conf highlighter-rouge"><pre class="highlight"><code><span class="n">server</span>:
  <span class="n">applicationConnectors</span>:
    - <span class="n">type</span>: <span class="n">https</span>
      <span class="n">port</span>: <span class="m">8443</span>
      <span class="c"># Key store settings
</span>      <span class="n">keyStorePath</span>: <span class="n">path</span>/<span class="n">to</span>/<span class="n">keystore</span>.<span class="n">jks</span>
      <span class="n">keyStorePassword</span>: <span class="s2">"notsecret"</span>
      <span class="n">certAlias</span>: <span class="n">server</span>-<span class="n">tls</span>
      <span class="n">enableCRLDP</span>: <span class="n">true</span>
      <span class="c"># Trust store settings
</span>      <span class="n">trustStorePath</span>: <span class="n">path</span>/<span class="n">to</span>/<span class="n">truststore</span>.<span class="n">jks</span>
      <span class="n">trustStorePassword</span>: <span class="s2">"notsecret"</span>
      <span class="c"># Fail fast at startup if the certificates are invalid
</span>      <span class="n">validateCerts</span>: <span class="n">true</span>
      <span class="c"># Whether or not to require authentication by peer certificate.
</span>      <span class="n">needClientAuth</span>: <span class="n">true</span>
      <span class="c"># Check peer certificates for validity when establishing a connection
</span>      <span class="n">validatePeers</span>: <span class="n">true</span>
      <span class="c"># The list of supported SSL/TLS protocols. You may need to modify
</span>      <span class="c"># this section to support clients that you have.
</span>      <span class="n">supportedProtocols</span>: [<span class="s2">"TLSv1.2"</span>]
      <span class="n">supportedCipherSuites</span>: [<span class="s2">"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"</span>]
      <span class="n">allowRenegotiation</span>: <span class="n">false</span>
</code></pre>
</div>
<p style="text-align: center;font-size: x-small;">Dropwizard code is Copyright © 2010-2013 Coda Hale, Yammer Inc., 2014-2015 Dropwizard Team and/or its affiliates.
<a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache 2.0</a>.</p>
<p>That was pretty easy, huh? No cryptic OpenSSL commands! Now our server should be
configured to refuse connections from clients not presenting a root-issued
certificate chain. We can test to make sure that happens! We can start our
server, telling Java to debug the SSL handshakes, and make sure we see it
refusing the connection for the right reason. In one terminal start the
Dropwizard server debugging SSL.</p>
<div class="language-shell highlighter-rouge"><pre class="highlight"><code><span class="gp">$ </span>java -Djavax.net.debug<span class="o">=</span>SSL,keymanager,trustmanager -jar your/jar.jar server config.yml
</code></pre>
</div>
<p>In another terminal run the following curl commands and verify you get the
expected results. First, make sure that the server does not talk HTTP over our
port.</p>
<div class="language-shell highlighter-rouge"><pre class="highlight"><code><span class="gp">$ </span>curl localhost:8443
curl: <span class="o">(</span>52<span class="o">)</span> Empty reply from server
<span class="c"># The server should print something like the following because of no TLS:</span>
<span class="c"># javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?</span>
</code></pre>
</div>
<p>Next, check that the server is sending your certificate back over HTTPS.
curl has a preconfigured list of trusted certs and chances are your
root certificate is not in there.</p>
<div class="language-shell highlighter-rouge"><pre class="highlight"><code><span class="gp">$ </span>curl https://localhost:8443
curl: <span class="o">(</span>60<span class="o">)</span> SSL certificate problem: Invalid certificate chain
<span class="c"># The server will print a bunch of stuff ending with something like:</span>
<span class="c"># javax.net.ssl.SSLException: Received close_notify during handshake</span>
</code></pre>
</div>
<p>Finally, ensure that the server terminates the connection if no client cert is
provided.</p>
<div class="language-shell highlighter-rouge"><pre class="highlight"><code><span class="gp">$ </span>curl -k https://localhost:8443
curl: <span class="o">(</span>35<span class="o">)</span> Server aborted the SSL handshake
<span class="c"># The server will, again, print a bunch of stuff ending with something like:</span>
<span class="c"># javax.net.ssl.SSLHandshakeException: null cert chain</span>
</code></pre>
</div>
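<p>The three checks above all end in a refused connection. The complementary case, a client that does present a certificate, could be exercised along these lines. This is only a sketch: <code>client.crt</code> and <code>client.key</code> are hypothetical file names for a certificate issued by your CA and its private key, <code>mtls_curl</code> is a helper name invented here, and the port is the 8443 from the connector configuration.</p>

```shell
# Hypothetical helper: call the server while presenting a client certificate.
# --cacert makes curl trust our private root instead of its built-in list;
# --cert and --key supply the client certificate and its private key.
mtls_curl() {
  curl --cacert root.crt --cert client.crt --key client.key "$@"
}
```

<p>With the server running, <code>mtls_curl https://localhost:8443/</code> should get past the handshake rather than fail with one of the errors shown earlier, provided the server certificate’s name matches the host you connect to. Because the server trusts only the root, the client may also need to send the global CA certificate along with its own.</p>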
<h3 id="client-configuration">Client Configuration</h3>
<p>Now we’ll configure our client to talk to the server. I’ll use the Jersey 2.X
API, but there are equivalents in the 1.X as well as in the Apache HTTP library.</p>
<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// Assume the following variables are initialized already</span>
<span class="n">String</span> <span class="n">password</span><span class="o">;</span>
<span class="n">RSAPrivateKey</span> <span class="n">clientKey</span><span class="o">;</span>
<span class="n">X509Certificate</span> <span class="n">clientCert</span><span class="o">;</span>
<span class="n">X509Certificate</span> <span class="n">globalCert</span><span class="o">;</span>
<span class="n">X509Certificate</span> <span class="n">rootCert</span><span class="o">;</span>
<span class="n">X509Certificate</span><span class="o">[]</span> <span class="n">certChain</span> <span class="o">=</span> <span class="o">{</span><span class="n">clientCert</span><span class="o">,</span> <span class="n">globalCert</span><span class="o">};</span>
<span class="c1">// setup key store</span>
<span class="n">KeyStore</span> <span class="n">clientKeyStore</span> <span class="o">=</span> <span class="n">KeyStore</span><span class="o">.</span><span class="na">getInstance</span><span class="o">(</span><span class="s">"JKS"</span><span class="o">);</span>
<span class="n">clientKeyStore</span><span class="o">.</span><span class="na">load</span><span class="o">(</span><span class="kc">null</span><span class="o">,</span> <span class="n">password</span><span class="o">.</span><span class="na">toCharArray</span><span class="o">());</span>
<span class="n">clientKeyStore</span><span class="o">.</span><span class="na">setKeyEntry</span><span class="o">(</span><span class="s">"service-tls"</span><span class="o">,</span> <span class="n">clientKey</span><span class="o">,</span> <span class="n">password</span><span class="o">.</span><span class="na">toCharArray</span><span class="o">(),</span> <span class="n">certChain</span><span class="o">);</span>
<span class="c1">// setup trust store</span>
<span class="n">KeyStore</span> <span class="n">clientTrustStore</span> <span class="o">=</span> <span class="n">KeyStore</span><span class="o">.</span><span class="na">getInstance</span><span class="o">(</span><span class="s">"JKS"</span><span class="o">);</span>
<span class="n">clientTrustStore</span><span class="o">.</span><span class="na">load</span><span class="o">(</span><span class="kc">null</span><span class="o">,</span> <span class="n">password</span><span class="o">.</span><span class="na">toCharArray</span><span class="o">());</span>
<span class="n">clientTrustStore</span><span class="o">.</span><span class="na">setCertificateEntry</span><span class="o">(</span><span class="s">"root-ca"</span><span class="o">,</span> <span class="n">rootCert</span><span class="o">);</span>
<span class="c1">// setup Jersey client</span>
<span class="n">SslConfigurator</span> <span class="n">sslConfig</span> <span class="o">=</span> <span class="n">SslConfigurator</span><span class="o">.</span><span class="na">newInstance</span><span class="o">()</span>
<span class="o">.</span><span class="na">keyStore</span><span class="o">(</span><span class="n">clientKeyStore</span><span class="o">)</span>
<span class="o">.</span><span class="na">keyStorePassword</span><span class="o">(</span><span class="n">password</span><span class="o">)</span>
<span class="o">.</span><span class="na">keyPassword</span><span class="o">(</span><span class="n">password</span><span class="o">)</span>
<span class="o">.</span><span class="na">trustStore</span><span class="o">(</span><span class="n">clientTrustStore</span><span class="o">)</span>
<span class="o">.</span><span class="na">trustStorePassword</span><span class="o">(</span><span class="n">password</span><span class="o">)</span>
<span class="o">.</span><span class="na">securityProtocol</span><span class="o">(</span><span class="s">"TLSv1.2"</span><span class="o">);</span>
<span class="n">SSLContext</span> <span class="n">sslContext</span> <span class="o">=</span> <span class="n">sslConfig</span><span class="o">.</span><span class="na">createSSLContext</span><span class="o">();</span>
<span class="n">Client</span> <span class="n">client</span> <span class="o">=</span> <span class="n">ClientBuilder</span><span class="o">.</span><span class="na">newBuilder</span><span class="o">().</span><span class="na">sslContext</span><span class="o">(</span><span class="n">sslContext</span><span class="o">).</span><span class="na">build</span><span class="o">();</span>
</code></pre>
</div>
<p style="text-align: center;font-size: x-small;">Jersey code is Copyright © 2010-2015 Oracle and/or its affiliates.
<a href="https://jersey.java.net/license.html">GPL 2.0 Selected</a>.</p>
<p>Hooray authentication!</p>
<p style="text-align: center"><a href="https://xkcd.com/1121/"><img src="http://imgs.xkcd.com/comics/identity.png" alt="xkcd-identity" /></a></p>
<p style="text-align: center;font-size: x-small;">Comic is Copyright © <a href="https://xkcd.com">xkcd.com</a>.
<a href="http://creativecommons.org/licenses/by-nc/2.5/">CC BY-NC 2.5</a>.</p>
<h2 id="tightening-things-up">Tightening Things Up</h2>
<p>Right now we allow any service with a certificate signed by our root CA to
talk to our server. Chances are we’d like to narrow this down to only the clients
that should be talking to the server, so we can refuse a service that has no
business with it even though its certificate was issued by our root CA. For
example, suppose that in addition to a User Service and a Home Page Service, we
have an Event Service. We may want to block the Event Service from communicating
with the User Service while still allowing the Home Page Service to do so.</p>
<p>To accomplish this, we could change our server’s trust store to only contain the
public key of the client, but this presents problems (and more work) when we try
to rotate that key pair. So, instead, let’s try having the server check that the
hostname of the client is one that it expects to hear from. We can also do this
in the other direction (client verifying the server).</p>
<p>Several options exist for verifying the hostname on the server side. The first
is built into Dropwizard, which supports this verification through a tricky
configuration change for the underlying Java SSL connection.</p>
<div class="language-conf highlighter-rouge"><pre class="highlight"><code><span class="n">server</span>:
<span class="n">applicationConnectors</span>:
- <span class="n">type</span>: <span class="n">https</span>
<span class="c">#...
</span> <span class="n">endpointIdentificationAlgorithm</span>: <span class="n">HTTPS</span>
</code></pre>
</div>
<p>The HTTPS endpoint identification algorithm will cause Java to do hostname
verification against your cert. Specifically, this will check the hostname of
the client that made the request against the DN that is given in the client’s
certificate. If they do not match, the connection will be refused. This is a
great, <a href="http://tools.ietf.org/html/rfc2818#section-3.1">standard</a> way to solve
this problem. However, it can be tricky to know what the hostnames will be or to
make a wildcard pattern (or <a href="https://tools.ietf.org/html/rfc3280#section-4.2.1.7">subject alternative name
extension</a>) for your
clients. We can take a higher-level approach than hostname comparison.</p>
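<p>At the JSSE level, endpoint identification is a property of <code>SSLParameters</code>. The sketch below sets that property by hand. It assumes the Dropwizard setting ends up in the same place, and <code>EndpointIdentificationSketch</code> is a hypothetical helper rather than anything Dropwizard provides.</p>
<div class="language-java highlighter-rouge"><pre class="highlight"><code>import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Hypothetical helper for illustration only.
class EndpointIdentificationSketch {

    // Asks the TLS layer to check the peer's hostname against its
    // certificate using the HTTPS (RFC 2818) rules.
    static SSLParameters withHttpsIdentification(SSLParameters params) {
        params.setEndpointIdentificationAlgorithm("HTTPS");
        return params;
    }

    public static void main(String[] args) throws Exception {
        // The host name and port are placeholders.
        SSLEngine engine = SSLContext.getDefault().createSSLEngine("service1.example.com", 443);
        engine.setSSLParameters(withHttpsIdentification(engine.getSSLParameters()));
        System.out.println(engine.getSSLParameters().getEndpointIdentificationAlgorithm());
    }
}
</code></pre>
</div>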
<p>We can, instead, provide our server with a regular expression that matches the
DNs that we expect in our certificates. This means we no longer have to worry
about hostnames. So as services move from host to host, they can keep the same
certificate and everything will Just Work™.
Additionally, a certificate can now belong to a service rather than an individual
host, so there’s less management that needs to happen. To do this, we just
need to set up a filter in our server and configure a regex that matches the DN of
each certificate allowed to communicate with our service. Requests with any other
certificate get a 403 response.</p>
<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">javax.annotation.Priority</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">javax.servlet.http.HttpServletRequest</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">javax.ws.rs.Priorities</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">javax.ws.rs.container.ContainerRequestContext</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">javax.ws.rs.container.ContainerRequestFilter</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">javax.ws.rs.container.PreMatching</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">javax.ws.rs.core.Context</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">javax.ws.rs.core.Response</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.io.IOException</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.security.cert.X509Certificate</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.util.regex.Pattern</span><span class="o">;</span>
<span class="cm">/**
 * A ContainerRequestFilter to do certificate validation beyond the TLS validation.
 * For example, the filter matches the subject against a regex and will 403 if it doesn't match.
*
* @author &lt;a href="mailto:[email protected]"&gt;wdawson&lt;/a&gt;
*/</span>
<span class="nd">@PreMatching</span>
<span class="nd">@Priority</span><span class="o">(</span><span class="n">Priorities</span><span class="o">.</span><span class="na">AUTHENTICATION</span><span class="o">)</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">CertificateValidationFilter</span> <span class="kd">implements</span> <span class="n">ContainerRequestFilter</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">X509_CERTIFICATE_ATTRIBUTE</span> <span class="o">=</span> <span class="s">"javax.servlet.request.X509Certificate"</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">Pattern</span> <span class="n">dnRegex</span><span class="o">;</span>
<span class="c1">// Although this is a class level field, Jersey actually injects a proxy</span>
<span class="c1">// which is able to simultaneously serve more requests.</span>
<span class="nd">@Context</span>
<span class="kd">private</span> <span class="n">HttpServletRequest</span> <span class="n">request</span><span class="o">;</span>
<span class="cm">/**
* Constructor for the CertificateValidationFilter.
*
* @param dnRegex The regular expression to match subjects of certificates with.
* E.g.: "^CN=service1\.example\.com$"
*/</span>
<span class="kd">public</span> <span class="nf">CertificateValidationFilter</span><span class="o">(</span><span class="n">String</span> <span class="n">dnRegex</span><span class="o">)</span> <span class="o">{</span>
<span class="k">this</span><span class="o">.</span><span class="na">dnRegex</span> <span class="o">=</span> <span class="n">Pattern</span><span class="o">.</span><span class="na">compile</span><span class="o">(</span><span class="n">dnRegex</span><span class="o">);</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">filter</span><span class="o">(</span><span class="n">ContainerRequestContext</span> <span class="n">requestContext</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="o">{</span>
<span class="n">X509Certificate</span><span class="o">[]</span> <span class="n">certificateChain</span> <span class="o">=</span> <span class="o">(</span><span class="n">X509Certificate</span><span class="o">[])</span> <span class="n">request</span><span class="o">.</span><span class="na">getAttribute</span><span class="o">(</span><span class="n">X509_CERTIFICATE_ATTRIBUTE</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">certificateChain</span> <span class="o">==</span> <span class="kc">null</span> <span class="o">||</span> <span class="n">certificateChain</span><span class="o">.</span><span class="na">length</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">certificateChain</span><span class="o">[</span><span class="mi">0</span><span class="o">]</span> <span class="o">==</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
<span class="n">requestContext</span><span class="o">.</span><span class="na">abortWith</span><span class="o">(</span><span class="n">buildForbiddenResponse</span><span class="o">(</span><span class="s">"No certificate chain found!"</span><span class="o">));</span>
<span class="k">return</span><span class="o">;</span>
<span class="o">}</span>
<span class="c1">// The certificate of the client is always the first in the chain.</span>
<span class="n">X509Certificate</span> <span class="n">clientCert</span> <span class="o">=</span> <span class="n">certificateChain</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span>
<span class="n">String</span> <span class="n">clientCertDN</span> <span class="o">=</span> <span class="n">clientCert</span><span class="o">.</span><span class="na">getSubjectDN</span><span class="o">().</span><span class="na">getName</span><span class="o">();</span>
<span class="k">if</span> <span class="o">(!</span><span class="n">dnRegex</span><span class="o">.</span><span class="na">matcher</span><span class="o">(</span><span class="n">clientCertDN</span><span class="o">).</span><span class="na">matches</span><span class="o">())</span> <span class="o">{</span>
<span class="n">requestContext</span><span class="o">.</span><span class="na">abortWith</span><span class="o">(</span><span class="n">buildForbiddenResponse</span><span class="o">(</span><span class="s">"Certificate subject is not recognized!"</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="kd">private</span> <span class="n">Response</span> <span class="nf">buildForbiddenResponse</span><span class="o">(</span><span class="n">String</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">return</span> <span class="n">Response</span><span class="o">.</span><span class="na">status</span><span class="o">(</span><span class="n">Response</span><span class="o">.</span><span class="na">Status</span><span class="o">.</span><span class="na">FORBIDDEN</span><span class="o">)</span>
<span class="o">.</span><span class="na">entity</span><span class="o">(</span><span class="s">"{\"message\":\""</span> <span class="o">+</span> <span class="n">message</span> <span class="o">+</span> <span class="s">"\"}"</span><span class="o">)</span>
<span class="o">.</span><span class="na">build</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre>
</div>
<p style="text-align: center;font-size: x-small;">Dropwizard code is Copyright © 2010-2013 Coda Hale, Yammer Inc., 2014-2015 Dropwizard Team and/or its affiliates.
<a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache 2.0</a>.
Jersey code is Copyright © 2010-2015 Oracle and/or its affiliates.
<a href="https://jersey.java.net/license.html">GPL 2.0 Selected</a>.</p>
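<p>To get a feel for how strict the regex check is, here is a small standalone sketch that applies the same full-match test to a few subject strings. The class name and the sample DNs are made up for illustration; the pattern is the one from the filter’s Javadoc.</p>
<div class="language-java highlighter-rouge"><pre class="highlight"><code>import java.util.regex.Pattern;

// Hypothetical demo class; it only exercises the matching rule.
class DnRegexSketch {

    // True only when the whole subject DN matches, mirroring the
    // matcher(...).matches() call in the filter above.
    static boolean isAllowed(Pattern dnRegex, String subjectDn) {
        return dnRegex.matcher(subjectDn).matches();
    }

    public static void main(String[] args) {
        Pattern allowed = Pattern.compile("^CN=service1\\.example\\.com$");
        String[] subjects = {
            "CN=service1.example.com",
            "CN=service1.example.com.attacker.test",
            "CN=service1Xexample.com"
        };
        for (String subject : subjects) {
            System.out.println(subject + " allowed? " + isAllowed(allowed, subject));
        }
    }
}
</code></pre>
</div>
<p>Because <code>matches()</code> must consume the entire string, a subject that merely starts with the expected CN is rejected, and the escaped dots keep <code>service1Xexample.com</code> out. If your certificates carry more attributes than CN (for example OU or O), the regex has to account for the whole DN string as Java renders it. In a Dropwizard application, the filter itself would typically be registered in <code>run</code> with <code>environment.jersey().register(new CertificateValidationFilter(dnRegex))</code>, where <code>dnRegex</code> is assumed to come from your own configuration.</p>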
<h2 id="circling-back">Circling Back</h2>
<p>We defined TLS client authentication and went over how it can help secure your
backend services. We walked through configuring a Dropwizard server with
mandatory TLS client authentication and creating a Jersey client to provide the
appropriate credentials when talking to that server. We also talked about
options to further restrict clients’ ability to talk to the server based on
their certificates. I hope you have a better understanding of how to implement
mutual SSL in your applications. Below are a few things to also keep in mind as
you implement these authentication concepts in your applications.</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Transport_Layer_Security#Security">TLS Protocol Security</a></li>
<li><a href="https://en.wikipedia.org/wiki/Transport_Layer_Security#Cipher">Cipher Suites</a></li>
<li><a href="https://en.wikipedia.org/wiki/Authentication#Authorization">Authorization</a></li>
<li><a href="https://www.java.com/en/download/faq/release_dates.xml">JVM Security Updates</a></li>
</ul>
<h4 id="references">References</h4>
<ol>
<li><a href="https://www.sslshopper.com/article-most-common-java-keytool-keystore-commands.html">Common keytool commands</a></li>
<li><a href="https://www.sslshopper.com/article-most-common-openssl-commands.html">Common openssl commands</a></li>
<li><a href="http://www.dropwizard.io/0.7.0/docs/manual/configuration.html#man-configuration-https">Dropwizard https configuration manual</a></li>
<li><a href="https://jersey.java.net/documentation/latest/client.html#d0e5128">Jersey client documentation</a></li>
</ol>
<p style="text-align: center;font-size: x-small;">Copyright © 2015 Okta, Inc. article licensed under
<a href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a> and code samples
licensed under <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 2.0</a>. All
rights reserved.</p>
</description>
<pubDate>Wed, 02 Dec 2015 00:00:00 -0800</pubDate>
<link>http://developer.okta.com/blog/2015/12/02/tls-client-authentication-for-services</link>
<guid isPermaLink="true">http://developer.okta.com/blog/2015/12/02/tls-client-authentication-for-services</guid>
</item>
<item>
<title>The New Age of Trust</title>
<description><p>I recently read an excellent <a href="http://www.viabilify.com/blog/trust">article</a> about how amazing products shape the <strong>trust relationship</strong> with customers. I think great products are the <em>first step</em> in building a trust relationship. And like other aspects of the product that are derived from the product but are not physically part of it, the trust relationship is now more important than ever before.</p>
<hr />
<blockquote>
<p>When you use a product, every engagement with that product has a direct correlation with your perception of the value of that product. — From <a href="http://www.viabilify.com/blog/trust">Product Loyalty Follows Trust Like Form Follows Function</a></p>
</blockquote>
<hr />
<p>These days, making sure your product solves problems that customers face every day is not just a <em>good idea</em>, it’s <strong>table stakes</strong>. And the bar is rising constantly. The more products improve, the less customers are willing to tolerate bad experiences. With the consumerization of the enterprise, good product design and a favorable user experience are indispensable to every kind of product distribution model.</p>
<p>Two of the most important aspects of the product are Product Status/Trust and Customer Support. Having dissected both of these from many angles in my recent work, I know that businesses neglect them at their peril.</p>
<h3 id="communicate-transparently">Communicate transparently</h3>
<p>The Product Status/Trust page is the first place customers visit on your site when they have a problem. Consider yourself lucky if the customer only has a problem with their browser (such as its cache) or an issue with their laptop. For more serious issues, customers expect transparency and effective communication from the Product Status/Trust page. They want information that communicates exactly what the problem is.</p>
<h3 id="serve-multiple-audiences">Serve multiple audiences</h3>
<p>A Product Status/Trust page is also one of the first places that prospects are likely to visit. In fact, when the product is gaining momentum, prospect traffic may exceed customer traffic.</p>
<p>I’ve rarely seen a Status/Trust page actually serve both customers and prospects effectively. The key is to find the correct balance. As the product becomes popular, customers and prospects visit the page with very different mindsets.</p>
<p>For customers, the Status/Trust page must be the source of truth for issues, downtime, and root cause analysis. This requires radical transparency and effective communication.</p>
<p>For prospects, the Status/Trust page must do at least the following:</p>
<ul>
<li>Display all the information they need to make an informed decision</li>
<li>Make them love the product even more</li>
<li>Serve as an effective talking point for your sales reps</li>
</ul>
<p>Once users get their hands on a product, they often find new ways to use it that the founders never imagined. This experimentation can shape the future of products and platforms. This is especially true of products that also provide APIs. The growth of B2B2C, B2C2B and other hybrid distribution models means that you are putting your product in front of many different types of audiences, including direct customers, partners, resellers, channel partners, and early and late stage prospects. Understanding how all of these audiences report on the service, interpret the SLA, and react to downtime is crucial if you want to create products that users value.</p>
<h3 id="be-highly-available-but-prepare-for-risk">Be highly available but prepare for risk</h3>
<p>It is a truism that one cannot solve for all technical constraints. Choosing the right platform on which to build your Product and Status/Trust pages is very important, but hosting both pages on the same platform risks both being down at the same time if your site crashes. </p>
<p>Obviously, you cannot afford to let your Trust/Status page go down, so <strong>high availability</strong> is key. But just in case the page ever <em>does</em> go down, you need a <strong>risk mitigation plan</strong>. Central to this is making sure that you are constantly monitoring your Trust/Status page with enterprise-class monitoring tools.</p>
<h3 id="send-rapid-robust-and-consistent-notifications">Send rapid, robust, and consistent notifications</h3>
<p>In the event of trust page problems, make sure that you have ways to easily and automatically notify customers and site ops as soon as issues are detected. Employ RSS, Twitter, and other channels to notify customers; invest in monitoring tools to notify site ops.</p>
<p>If you send notifications manually, take care not to introduce human error when the site is having issues, as these are usually chaotic periods for your ops team.</p>
<p>Realize that many of your customers will probably check your trust page <em>and</em> call support, so it’s important that the trust page and the support team provide the same information. This also argues for ensuring that your support team is a key stakeholder in the development of your trust page. </p>
<h3 id="conclusion">Conclusion</h3>
<p>Designing and developing the Trust Page has taught me much in the last few months. I’d be happy to hear from you on this topic, so please feel free to send comments or questions to <a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#118;&#105;&#109;&#097;&#114;&#115;&#104;&#046;&#107;&#097;&#114;&#098;&#104;&#097;&#114;&#105;&#064;&#111;&#107;&#116;&#097;&#046;&#099;&#111;&#109;">&#118;&#105;&#109;&#097;&#114;&#115;&#104;&#046;&#107;&#097;&#114;&#098;&#104;&#097;&#114;&#105;&#064;&#111;&#107;&#116;&#097;&#046;&#099;&#111;&#109;</a>.</p>
<p><em>The Trust Page is the work of an amazing team that includes Tim Gu, Shawn Gupta, Nathan Tate, Wendy Liao, and myself.</em></p>
</description>
<pubDate>Thu, 11 Jun 2015 00:00:00 -0700</pubDate>
<link>http://developer.okta.com/blog/2015/06/11/trustpage</link>
<guid isPermaLink="true">http://developer.okta.com/blog/2015/06/11/trustpage</guid>
</item>
<item>
<title>How Okta Chased Down Severe System CPU Contention in MySQL</title>
<description><p>Sometimes fixing a problem causes or reveals a new one. And sometimes this sets off a chain reaction of problems and fixes, where each solution exposes a deeper issue. In technology, cascades like these are common, often painful, and occasionally welcome.</p>
<p>Our battle against CPU contention last fall is a good example of such a cascade. What began as a buffer pool adjustment triggered a series of issues and fixes that generated plenty of stress, but ultimately strengthened our platform.</p>
<p>Underlying each of the challenges we faced in that period was the huge amount of business our Sales organization had closed in late summer and early fall of 2014. Growth brought a dramatic increase in the number of new customers running large import jobs and new orgs running agents.</p>
<p>As problems go, growing pains are good problems to have. But they usually come at a cost: the increased traffic caused significant CPU contention, as shown in the following image.</p>
<p><img style="width:55%" src="/assets/img/Pre-buffer_adjustment.png" alt="Before tuning the database" /></p>
<p>Those red and yellow spikes in late October 2014 seized our attention and spurred an aggressive response from Okta’s site operations team. The team took immediate action to prevent this situation from getting worse and potentially causing an issue with our site.</p>
<h2 id="tuning-the-database">Tuning the database</h2>
<p>As a first step, we tuned our MySQL database to fully utilize the amount of RAM in our server instances. We had been running with a relatively small buffer pool
compared to the amount of available RAM, which meant that we were sacrificing both performance and money. Increasing the size of the buffer pool decreased page response times and nearly eliminated disk reads.</p>
<p><img style="width:50%" src="/assets/img/EliminateDiskReads.png" alt="Almost eliminated disk reads" /></p>
<h2 id="doubling-hardware-resources">Doubling hardware resources</h2>
<p>Despite the buffer pool adjustment, we continued to see significant CPU contention. In response, we doubled the size of our servers (244 GB of RAM, 32 CPU cores, and 2 x 320 GB HDDs). CPU contention decreased (see the trough in the following image), but probably because of the Thanksgiving holiday, not the additional hardware.</p>
<p>After the holiday, CPU spikes returned, now worse than ever. Page render time slowed down, queries against the database took longer, and jobs backed up.</p>
<p><img style="width:50%" src="/assets/img/Thanksgiving.png" alt="Thanksgiving holiday" /></p>
<p><strong>Note:</strong> Flat areas in the graph showing no CPU usage indicate periods when we were running on a secondary server.</p>
<p>Why did CPU contention increase after we’d doubled the CPUs? Shouldn’t it have decreased?</p>
<h2 id="kernel-mutex-bottleneck">Kernel mutex bottleneck</h2>
<p>The alarming amount of yellow in our graphs showed extremely high <strong>system CPU usage</strong> (and <strong>user CPU usage</strong> was also too high). Clearly, the operating system was working very hard at <em>something</em>. The metrics we pulled revealed that all the InnoDB threads were busy waiting on the kernel mutex. We had known that kernel mutex was a bottleneck even before we’d doubled hardware resources, but we hadn’t understood why.</p>
<p>A closer look at the MySQL source code showed that kernel mutex was trying to allocate memory to all of our transactions. This is perfectly normal behavior, but it proved to be very limiting in our case because we perform approximately 85,000 transactions per minute. The kernel has to create a transaction ID for each transaction and allocate a tiny block of memory in RAM before giving it to the thread handling the transaction.</p>
<p>Now we knew why doubling the number of CPUs caused greater contention: instead of providing transaction IDs and associated memory to approximately 24 InnoDB threads, kernel mutex was now working like mad to provide IDs and memory to approximately 48 InnoDB threads. Imagine having a single toll booth on a 16-lane highway and then <em>doubling the number of lanes</em>.</p>
<p>In the discussions that followed, some called for rolling back to the smaller machines, reasoning that fewer threads would mean less CPU contention. Others believed that rolling backward would be a mistake, arguing that our business growth required the more powerful servers in any case, and that doubling the number of CPUs was not itself a problem, but rather part of the ultimate solution because it exposed the root cause of the extreme system CPU usage.</p>
<p>The right course – the one we ultimately took – was to stick with the more powerful servers and tune them properly.</p>
<h2 id="adopting-tcmalloc">Adopting TCMalloc</h2>
<p>We quickly found several resources online, including a <a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html">key blog post</a> about <strong>TCMalloc</strong> (Thread-Caching Memory Allocation) and an article about <a href="http://www.olivierdoucet.info/blog/2012/05/19/debugging-a-mysql-stall/">debugging MySQL</a>.</p>
<p>Traditional memory allocation schemes, like the <strong>glibc</strong> malloc that we were then using, employ a mutex to prevent concurrent access to the transaction ID counter. Preventing concurrency is totally wrong for a multi-core, multi-thread architecture like ours.</p>
<p>In contrast, TCMalloc allocates a small pool of memory to each CPU core. Individual processor threads obtain RAM directly from their core, ideally from the L2 cache nearest the thread’s section of the CPU. This sounded promising, so we switched to TCMalloc.</p>
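<p>The idea of thread caching can be illustrated with a toy ID allocator. In this sketch (an analogy only, not TCMalloc’s or MySQL’s code; every name in it is hypothetical), each worker reserves a batch of IDs under the shared lock and then serves requests from its own batch, so the lock is taken once per batch instead of once per ID.</p>
<div class="language-java highlighter-rouge"><pre class="highlight"><code>// Hypothetical toy allocator illustrating thread caching.
class BatchedIdAllocatorSketch {
    private final Object lock = new Object();
    private long nextUnreserved = 0;
    private int lockAcquisitions = 0;

    // Reserves count consecutive IDs while holding the shared lock.
    long reserve(int count) {
        synchronized (lock) {
            lockAcquisitions++;
            long first = nextUnreserved;
            nextUnreserved += count;
            return first;
        }
    }

    // Reports how often the shared lock was taken to reserve IDs.
    int lockAcquisitions() {
        synchronized (lock) {
            return lockAcquisitions;
        }
    }

    // Holds one worker's private batch; create one per thread and never share it.
    static class ThreadCache {
        private final BatchedIdAllocatorSketch shared;
        private final int batchSize;
        private long next = 0;
        private long limit = 0;

        ThreadCache(BatchedIdAllocatorSketch shared, int batchSize) {
            this.shared = shared;
            this.batchSize = batchSize;
        }

        // Touches the shared lock only when the private batch is used up.
        long nextId() {
            if (next == limit) {
                next = shared.reserve(batchSize);
                limit = next + batchSize;
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        BatchedIdAllocatorSketch shared = new BatchedIdAllocatorSketch();
        ThreadCache cache = new ThreadCache(shared, 4);
        long last = -1;
        for (int i = 0; i != 10; i++) {
            last = cache.nextId();
        }
        System.out.println("last id: " + last + ", lock acquisitions: " + shared.lockAcquisitions());
    }
}
</code></pre>
</div>
<p>With a batch size of 4, handing out 10 IDs takes the shared lock 3 times rather than 10. The trade-off is that IDs reserved by a worker but never used are wasted.</p>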
<p>Following the switch, things looked pretty good. User CPU decreased dramatically, never to return to the 50%+ usage we’d seen before. We had finally solved the memory allocation bottleneck. If we hadn’t doubled the number of CPUs, we wouldn’t have found the problem that led us to adopt TCMalloc.</p>
<p><em>Had we finally solved our scalability problem?</em></p>
<h2 id="transparent-huge-pages-thanks-for-your-helpplease-dont-help">Transparent Huge Pages: Thanks for your help…please don’t help</h2>
<p>By the next morning <strong>CPU contention was worse</strong>.</p>
<p>The alarmingly high system CPU usage that we’d seen in the previous 3 months was always due to MySQL using kernel mutex. But since we’d fixed that problem, <em>what the heck was this?</em></p>
<p>We discussed turning off TCMalloc, but that would’ve been a mistake. Implementing TCMalloc was a critical link in the chain of problems and solutions that ultimately strengthened our platform.</p>
<p>We discovered very quickly that the culprit this time was <em>khugepaged</em>, a kernel thread enabled by a Linux kernel feature called <strong>Transparent Huge Pages</strong> (THP; turned on by default in most Linux distributions). Huge pages are designed to improve performance by helping the operating system manage large amounts of memory. They effectively increase the page size from the standard 4 KB to 2 MB or 1 GB (depending on how they are configured).</p>
<p><strong>THP</strong> makes huge pages easier to use by, among other things, arranging your memory into larger chunks. It works great for app servers that are not performing memory-intensive operations.</p>
<p>Which is why THP is so wrong for our platform. By late 2014 we were using 95% of the RAM and 58% of the 32 CPU cores in our servers. In order to store all of those tiny transaction IDs, we were rewriting memory so rapidly that THP’s efforts to move pages around couldn’t keep up. Clearly, standard 4 KB pages were much more efficient for us than the larger page size that THP was “helping” us with. So we turned THP off. The following image tells the story.</p>
<p><img style="width:40%" src="/assets/img/TCMalloc.png" alt="TCMalloc" /></p>
<p><strong>Note:</strong> Flat areas in the graph showing no CPU usage indicate periods when we were running on a secondary server.</p>
<p>In a sense, encountering the dramatic effect of THP, an operating system problem, was clarifying. It validated our previous remedies, and turning it off definitely strengthened our platform.</p>
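<p>Whether THP is active can be read from sysfs. The following sketch assumes the mainline kernel path <code>/sys/kernel/mm/transparent_hugepage/enabled</code> (some distributions have used a different path) and a hypothetical helper class, <code>ThpStatusSketch</code>. The kernel lists every mode and marks the active one with square brackets.</p>
<div class="language-java highlighter-rouge"><pre class="highlight"><code>import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
class ThpStatusSketch {

    // Extracts the bracketed mode from a line such as "always madvise [never]".
    static String activeMode(String sysfsLine) {
        int open = sysfsLine.indexOf('[');
        int close = sysfsLine.indexOf(']', open + 1);
        if (open == -1 || close == -1) {
            return "unknown";
        }
        return sysfsLine.substring(open + 1, close);
    }

    public static void main(String[] args) throws Exception {
        // Assumed path; it may differ or be absent on some systems.
        Path enabled = Paths.get("/sys/kernel/mm/transparent_hugepage/enabled");
        if (Files.isReadable(enabled)) {
            String line = new String(Files.readAllBytes(enabled), StandardCharsets.UTF_8).trim();
            System.out.println("THP mode: " + activeMode(line));
        } else {
            System.out.println("THP setting is not exposed on this system");
        }
    }
}
</code></pre>
</div>
<p>Commonly, THP is switched off at runtime by writing <code>never</code> to that file as root, or at boot with the <code>transparent_hugepage=never</code> kernel parameter.</p>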
<h2 id="lessons-learned">Lessons learned</h2>
<p>Beyond the technical lessons we learned during this period, we were reminded that sometimes the best thing to do is stay the course. At times we were tempted to pull back, but moving forward ultimately paid off as each improvement we made exposed the inadequacy (for our platform) of a downstream component.</p>
</description>
<pubDate>Fri, 22 May 2015 00:00:00 -0700</pubDate>
<link>http://developer.okta.com/blog/2015/05/22/tcmalloc</link>
<guid isPermaLink="true">http://developer.okta.com/blog/2015/05/22/tcmalloc</guid>
</item>
<item>
<title>Okta Software Engineering Design Principles</title>
<description><p>Okta has been an agile development shop since the beginning. One important
aspect of being agile is enabling a mix of bottom-up and top-down decision
making. Specifically, high-level vision and strategy are clearly
communicated, enabling teams to autonomously deliver value while also feeding
learnings from the trenches back to inform the high-level
goals.<sup id="fnref:the-knowledge-creating-company"><a href="#fn:the-knowledge-creating-company" class="footnote">1</a></sup> Below are the tacit engineering design
principles we’ve used to guide development at Okta. They continue to evolve
as we experiment and learn.</p>
<h2 id="create-user-value">1. Create User Value</h2>
<p>First and foremost, writing software is about creating value for users. This
seems straightforward, but as systems evolve and become more complex, we
start introducing more abstraction and layering, which brings us further away
from the concrete problem we’re trying to solve. It’s important to keep in mind
the reason for writing software in the first place and use the understanding
of the audience to inform priority.</p>
<p>At Okta, our entire company is aligned on this principle because our #1 core
value is <a href="https://www.okta.com/customers/focus-on-customer-success.html">customer
success</a>. In
practice this means there are almost always a number of customers eager to beta a
new feature we’re working on. We collaborate closely with customers while
building features, allowing for continuous feedback as we iterate and get
changes out in weekly sprints.</p>
<p><img src="http://imgs.xkcd.com/comics/the_general_problem.png" alt="xkcd - pass the salt" /></p>
<h2 id="keep-it-simple">2. Keep it Simple</h2>
<blockquote>
<p>Everything should be made as simple as possible, but no simpler — Albert
Einstein</p>
</blockquote>
<p>This truism has been around for ages, and it goes hand-in-hand with the
first principle. If it doesn’t add value to users now, <a href="http://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it">you ain’t gonna
need it - YAGNI</a>!</p>
<p>We all encounter overly complex code where it’s nearly impossible to reason
about what it does. Part of this confusion is because it’s
generally <a href="http://www.joelonsoftware.com/articles/fog0000000069.html">harder to read code than to write
it</a> but beyond
that, there are clearly fundamental qualities of some code making it
more intuitive than other code. There’s a lot of prior art on this topic and a
great place to start is <a href="http://books.google.com/books?id=dwSfGQAACAAJ">Clean
Code</a> by Robert C. Martin, aka
Uncle Bob. The book breaks down the qualities of code which make it
intuitive, and provides a framework for reasoning about code quality.</p>
<p>Here are some guiding principles about writing clean code we use in practice
which are also covered in the book.</p>
<p>Clean code:</p>
<ul>
<li>Makes intent clear, using comments when the code isn’t expressive enough</li>
<li>Can be read and enhanced by others (or the author after a few
years)</li>
<li>Provides one way, rather than many, to do a particular task</li>
<li>Is idiomatic</li>
<li>Is broken into pieces that each do one thing and do it well</li>
</ul>
<p>At the end of the day there is no substitute for experience. Like any craft,
writing clean code takes practice. At Okta every engineer is constantly honing
their skills, and we rely heavily on code reviews and pair programming to help
sharpen each other’s skills.</p>
<p><img src="/assets/img/2015-05-08-software-engineering-design-principles-code_quality_wtfs_per_minute.jpg" alt="wtfs per minute" /></p>
<h2 id="know-thy-service-with-data">3. Know Thy-Service With Data</h2>
<p>In the world of “big data” this point needs little explanation. Okta collects
massive amounts of operational data about our systems to:</p>
<ul>
<li>Monitor health</li>
<li>Monitor performance</li>
<li>Debug issues</li>
<li>Audit security</li>
<li>Make decisions</li>
</ul>
<p>With every new feature we add, developers are responsible for ensuring that
their designs provide visibility into these dimensions. In order to make this an
efficient process we’ve invested in:</p>
<ul>
<li>Runtime logging control, with toggling by level, class, tenant, or user</li>
<li>Self-service creation of dashboards and alerts</li>
<li>Access for every developer to metrics and anonymous unstructured data</li>
<li>A request ID, generated at the edge and passed along at every layer of the
stack for correlation</li>
<li>An engineering control panel for common operational tasks like taking thread dumps</li>
</ul>
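<p>To make the request ID idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions rather than our production code: the names <code>request_id_var</code>, <code>RequestIdFilter</code>, <code>handle_at_edge</code>, and <code>make_logger</code> are hypothetical, and it only shows one process stamping an edge-assigned ID onto every log line so that entries can be correlated later.</p>

```python
import contextvars
import logging
import uuid

# Holds the ID of the request currently being handled (hypothetical name).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copies the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def handle_at_edge(handler, incoming_id=None):
    """Assign an ID at the edge (or reuse an upstream one), then run the handler."""
    token = request_id_var.set(incoming_id or uuid.uuid4().hex)
    try:
        return handler()
    finally:
        # Restore the previous value so IDs never leak between requests.
        request_id_var.reset(token)


def make_logger(name, stream):
    """Build a logger whose lines all start with the current request ID."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(logging.Formatter("%(request_id)s %(name)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

<p>Any layer that logs through such a logger while a request is being handled emits the same ID, which is what makes it possible to search for a single request across layers. Passing the ID between services (for example in a header) is left out of this sketch.</p>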
<p>Technologies we use to gain visibility include: PagerDuty, Redshift,
Zabbix, ThousandEyes, Boundary, Pingdom, AppDynamics, Splunk, ELK, S3.</p>
<h2 id="make-failure-cheap">4. Make Failure Cheap</h2>
<p>Every software system will experience failures and all code has bugs. While we
constantly work at having fewer, it’s unrealistic to assume they won’t occur. So,
in addition to investing in prevention, we invest in making failure cheap.</p>
<p>The cost of failure rises significantly
the further out it occurs on the development timeline. Making adjustments
during requirements gathering and design is significantly cheaper than
finding issues in production.<sup id="fnref:agile-cost-curve"><a href="#fn:agile-cost-curve" class="footnote">2</a></sup></p>
<p><img src="/assets/img/2015-05-08-software-engineering-design-principles-agile-cost-curve.png" alt="cost curve of development" /></p>
<p>One fundamental we take from both Agile and XP is to invest in pushing failure
as early in the development timeline as possible. We mitigate failures from
poor requirements gathering by iterating quickly with the customer as described in
Principle 1. Once we get to design and development we make failure cheap through:</p>
<ul>
<li>Design reviews with stakeholders ahead of writing code</li>
<li>TDD - developers write all tests for their code; test isn’t a separate phase
from development</li>
<li>Keeping master stable - check-in to master is gated by passing all unit,
functional and UI tests</li>
<li>Developers can trigger CI on any topic branch; CI is massively parallelized
over a cloud of fast machines</li>
</ul>
<p>Since our testing is done during development, the next phase is production
deployment. At this phase we reduce the cost of failure by:</p>
<ul>
<li>Hiding beta features behind flags in the code</li>
<li>Incremental rollout first to test accounts and then in batches of customers</li>
<li>Automated deployment process</li>
<li>Code and infrastructure that are forward and backward compatible, allowing
rollback</li>
<li>Health checks that automatically remove down nodes</li>
<li>Return a degraded / read-only response over nothing at all</li>
</ul>
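<p>Flags and batched rollout can be sketched in a few lines of Python. This is a hedged illustration, not a description of our flag system: <code>bucket_for</code>, <code>is_enabled</code>, and the idea of hashing a tenant into one of 100 buckets are assumptions chosen to show how test accounts can go first and customers can follow in stable batches.</p>

```python
import hashlib


def bucket_for(tenant_id, feature):
    """Map a tenant to a stable bucket in the range 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{tenant_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(feature, tenant_id, rollout_percent, test_tenants=frozenset()):
    """Test accounts always see the feature; others join as the percentage grows."""
    if tenant_id in test_tenants:
        return True
    # A tenant keeps the same bucket, so raising the percentage only adds tenants.
    return rollout_percent > bucket_for(tenant_id, feature)
```

<p>Because the bucket depends only on the feature and tenant, a customer who has the feature at 20% still has it at 50%, and setting the percentage back to 0 turns it off for everyone except the test accounts.</p>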
<blockquote>
<p>An escalator can never break; it can only become stairs – Mitch Hedberg</p>
</blockquote>
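<p>In the spirit of the escalator, a degraded read-only response might look like the following Python sketch. Everything here is hypothetical (the <code>PrimaryUnavailable</code> exception, the <code>get_profile</code> function, and the two reader callables); it only illustrates falling back to a read-only copy instead of returning nothing.</p>

```python
class PrimaryUnavailable(Exception):
    """Raised by the (hypothetical) primary store when it cannot serve a request."""


def get_profile(user_id, primary_read, replica_read):
    """Prefer the primary store; fall back to a read-only copy instead of failing."""
    try:
        return {"data": primary_read(user_id), "read_only": False}
    except PrimaryUnavailable:
        # Degraded mode: possibly stale data, flagged so callers can disable writes.
        return {"data": replica_read(user_id), "read_only": True}
```

<p>The <code>read_only</code> flag lets the caller hide edit controls while the primary is down, so the user still gets stairs.</p>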
<h2 id="automate-everything">5. Automate Everything</h2>
<p>All tasks performed routinely should
be automated. These are automation principles we follow:</p>
<ul>
<li>Automate every aspect of the deployment, including long-running database migrations</li>
<li>All artifacts are immutable and versioned</li>
<li>All code modules get dependencies automatically from central artifact server</li>
<li>Creation of base images and provisioning of new hardware is automated</li>
<li>All forms of testing are automated</li>
<li>Development environment setup is automated</li>
</ul>
<p>Tools we use:</p>
<ul>
<li>AWS - Automated provisioning of hardware</li>
<li>Chef - Configuration management</li>
<li>Ansible - Automated deployment orchestration</li>
<li>Jenkins - Continuous integration</li>
<li>Gearman - To get Jenkins to scale</li>
<li>Docker - Containerizing services</li>
</ul>
<h2 id="with-performance-less-is-more">6. With Performance, Less is More</h2>
<p>With performance especially, we find that up-front design decisions typically
offer huge wins at little to no cost. Our design
mantras for performance are:</p>
<ol>
<li>Don’t do it</li>
<li>Do it, but don’t do it again</li>
<li>Do it less</li>
<li>Do it later</li>
<li>Do it when they’re not looking</li>
<li>Do it concurrently</li>
<li>Do it cheaper</li>
</ol>
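<p>As one example, “Do it, but don’t do it again” often comes down to caching. The Python sketch below uses the standard <code>functools.lru_cache</code> decorator; the function <code>load_tenant_settings</code> and the <code>lookups</code> list are made up for illustration, and the approach assumes the result can safely be reused until the cache is cleared.</p>

```python
import functools

lookups = []  # records each real (expensive) lookup, for illustration only


@functools.lru_cache(maxsize=1024)
def load_tenant_settings(tenant_id):
    """Stand-in for an expensive call such as a database query."""
    lookups.append(tenant_id)
    return f"settings-for-{tenant_id}"
```

<p>Calling <code>load_tenant_settings("acme")</code> twice performs the expensive lookup once; <code>load_tenant_settings.cache_clear()</code> forces the next call to do the work again, which is the hook for invalidation when the underlying data changes.</p>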
<p>In practice we implement a number of strategies to limit the risk from poorly
performing code:</p>
<ul>
<li>Major new features and performance tunings live behind feature flags, allowing
slow rollout and tuning in a real-life environment</li>
<li>Chunk everything that scales on the order of N. When N is controlled by the customer,
enforce limits and design for infinity.</li>
<li>Slow query and frequent query monitoring to detect poor access patterns</li>
</ul>
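<p>Chunking with an enforced limit can be sketched as a small Python generator. The constants <code>CHUNK_SIZE</code> and <code>MAX_ITEMS</code>, the <code>LimitExceeded</code> exception, and the <code>chunked</code> function are all hypothetical; the point is only that work is processed in bounded pieces and that an unbounded, customer-controlled input fails predictably instead of exhausting memory.</p>

```python
from itertools import islice

CHUNK_SIZE = 500     # hypothetical batch size
MAX_ITEMS = 10_000   # hypothetical hard limit on customer-controlled input


class LimitExceeded(ValueError):
    """Raised when the input grows past the enforced limit."""


def chunked(items, chunk_size=CHUNK_SIZE, max_items=MAX_ITEMS):
    """Yield lists of at most chunk_size items, refusing input beyond max_items."""
    iterator = iter(items)
    seen = 0
    while True:
        # Only one chunk is materialized at a time, even for endless iterators.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        seen += len(chunk)
        if seen > max_items:
            raise LimitExceeded(f"more than {max_items} items")
        yield chunk
```

<p>Because the generator never holds more than one chunk, it behaves the same whether the input has five items or never ends; in the second case it stops with <code>LimitExceeded</code> once the limit is crossed.</p>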
<p><img style="max-width:300px" src="/assets/img/2015-05-08-software-engineering-design-principles-more_is_less.jpg" alt="if less is more, does that mean more is less?" /></p>
<h3 id="reference">Reference</h3>
<div class="footnotes">
<ol>
<li id="fn:the-knowledge-creating-company">
<p>Ikujiro Nonaka and Hirotaka Takeuchi. The Knowledge-Creating Company. Oxford University Press, 1995. Print. https://books.google.com/books/about/The_Knowledge_creating_Company.html?id=B-qxrPaU1-MC <a href="#fnref:the-knowledge-creating-company" class="reversefootnote">&#8617;</a></p>
</li>
<li id="fn:agile-cost-curve">
<p>Scott Ambler. Examining the Agile Cost of Change Curve. Website. http://www.agilemodeling.com/essays/costOfChange.htm <a href="#fnref:agile-cost-curve" class="reversefootnote">&#8617;</a></p>
</li>
</ol>
</div>
</description>
<pubDate>Fri, 08 May 2015 00:00:00 -0700</pubDate>
<link>http://developer.okta.com/blog/2015/05/08/software-engineering-design-principles</link>
<guid isPermaLink="true">http://developer.okta.com/blog/2015/05/08/software-engineering-design-principles</guid>
</item>
<item>
<title>Productionalizing ActiveMQ</title>
<description><p>This post describes our odyssey with ActiveMQ, an open-source implementation of the Java Message Service (JMS) API. We use ActiveMQ as the message broker among our app servers.</p>
<p>First, a word of thanks. To overcome the challenges we faced with ActiveMQ, we are greatly indebted to a very thorough description of an <a href="https://bugs.openjdk.java.net/browse/JDK-8054446">OpenJDK bug</a>, as well as some other <a href="https://svn.apache.org/repos/asf/harmony/standard/classlib/trunk/modules/concurrent/src/main/java/java/util/concurrent/ConcurrentLinkedQueue.java">online resources</a>. If you’re having problems with ActiveMQ, read on. Maybe our story can help you.</p>
<h2 id="growing-pains">Growing Pains</h2>
<p>Our problems with ActiveMQ date all the way back to 2012. They centered around high memory and CPU usage, message timeout errors, and message queue delays.</p>
<p>Let’s pick up the action in the spring of 2014. At that time we were battling a new wave of timeout storms and message queue delays caused by our mixed ActiveMQ configuration (broker <strong>5.4.1</strong>, client <strong>5.7</strong>) and increasing traffic on our site.</p>
<p>Of course we welcomed the growth in traffic as a byproduct of our growing business. And although we did plan to address our mixed ActiveMQ configuration, we decided to delay doing so at that time, opting instead to tune the configuration. So we increased the maximum session size from 500 to 2000, and the page size from 200 to 2000 messages. Increasing the page size served to minimize “hung queue” scenarios — a side effect of using <a href="http://docs.oracle.com/cd/E19798-01/821-1841/bncer/index.html">message selectors</a>.</p>
<h2 id="another-inflection-point">Another Inflection Point</h2>
<p>Business and site traffic continued to grow, contributing to another inflection point in the fall of 2014. Timeout storms, CPU spikes, and memory issues returned. It was clear that we could no longer put off upgrading to a newer version of ActiveMQ.</p>
<p>We decided to skip versions 5.7 and 5.8 in favor of 5.10, mainly because 5.7 was considered unstable, and 5.10 provided improved failover performance.</p>
<p>Would this upgrade finally deliver the stability that had eluded us for so long?</p>
<h2 id="when-upgrades-bite-back">When Upgrades Bite Back</h2>
<p>Unfortunately, no. Within 24 hours, memory usage soared, CPUs spiked, and instability returned. Note the dramatic CPU spikes in the following screenshot.</p>
<p><img style="width:50%" src="/assets/img/2015-05-08-productionalizing-active-mq-cpu-graph-1.png" alt="Active MQ CPU" /></p>
<p>To prevent these issues from impacting customers, we were forced to restart brokers, which is always an option of
last resort. Restarting brokers is a delicate operation, which can entail a less-than-smooth failover,
risking message loss.</p>
<p>We immediately increased memory, but within a day or two we ran out of memory again.</p>
<h2 id="searching-for-the-root-cause">Searching for the Root Cause</h2>
<p>An online search turned up an <a href="https://bugs.openjdk.java.net/browse/JDK-8054446">OpenJDK bug</a> that identified an out of memory issue in the
<em>ConcurrentLinkedQueue</em>, which is a class in the <strong>java.util.concurrent</strong> package included in <strong>JVM version 1.6</strong>.
When working properly, <em>ConcurrentLinkedQueue</em> allows elements to be added and removed from the queue in a
thread-safe manner.</p>
<p>The bug caused a null object to be created whenever an element at the end of the queue was added and
then deleted. This behavior is particularly unfavorable to the way we use queuing. We call ActiveMQ
to create and destroy objects in the queue very quickly, tens of millions of times a day, as users
and agents connect to Okta. As a result, null objects rapidly fill up the queue, memory usage soars,
and CPUs spike.</p>
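<p>The effect is easier to see in a toy model. The Python sketch below is not the JDK code and is single-threaded; it is a simplified linked queue, written under the assumption that removing the last element empties its node but leaves the node linked, which is how the behavior is described above. The <code>unlink_tail</code> switch exists only to contrast the two behaviors in this toy; it is not the JDK’s fix or anyone’s actual patch.</p>

```python
class Node:
    """One link in the queue; a removed element leaves item set to None."""

    __slots__ = ("item", "next")

    def __init__(self, item):
        self.item = item
        self.next = None


class ToyLinkedQueue:
    def __init__(self, unlink_tail=False):
        self.head = Node(None)  # sentinel, never holds an element
        self.tail = self.head
        self.unlink_tail = unlink_tail

    def offer(self, item):
        node = Node(item)
        self.tail.next = node
        self.tail = node

    def remove(self, item):
        pred, node = self.head, self.head.next
        while node is not None:
            if node.item is not None and node.item == item:
                node.item = None  # logically deleted
                if node.next is not None:
                    pred.next = node.next  # interior node: unlinked
                elif self.unlink_tail:
                    pred.next = None  # contrast case: last node is unlinked too
                    self.tail = pred
                # Otherwise the emptied last node stays linked.
                return True
            pred, node = node, node.next
        return False

    def linked_nodes(self):
        count, node = 0, self.head.next
        while node is not None:
            count += 1
            node = node.next
        return count


def churn(queue, cycles):
    """Add and immediately remove an element, many times over."""
    for i in range(cycles):
        queue.offer(i)
        queue.remove(i)
    return queue.linked_nodes()
```

<p>In this model, <code>churn(ToyLinkedQueue(), 1000)</code> leaves 1000 emptied nodes linked even though the queue holds no elements, while the contrast case leaves none. Each removal also has to walk past every emptied node, so the work per call grows along with the memory.</p>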
<h2 id="conference-call">Conference Call</h2>
<p>With the site at risk of impacting customer authentication, several key engineers, including Hector
Aguilar, Okta’s CTO and SVP of Engineering, met on a Saturday afternoon conference call. Discussion was intense, and our options were few and unappealing:
(a) revert all the way back to broker version 5.4.1, or (b) upgrade to broker version 5.11, which
was still unreleased and might introduce new problems.</p>
<p>As team members recall, Hector said very little during the first half of the meeting.</p>
<p>A bug in the JVM surprised Hector, as critical JVM bugs are relatively rare. Fortunately, the
OpenJDK bug we’d found included a very thorough description of the problem, as well as sample code
to reproduce it.</p>
<p>Initially motivated by curiosity, Hector analyzed the code and the bug description. He saw where the
problem was, and then checked online to see if it had been fixed in newer JDK versions. He noticed that
several things were changing in the class, and that others had attempted to resolve the bug in
different ways, but none that would solve our particular problem. Hector developed a very simple fix
of his own, trying to remain consistent with the work of others. He then verified his fix using the
provided sample code.</p>
<p>The JVM has a mechanism called <em>endorsed libraries</em> that allows developers to override an existing
class with a new class, effectively patching the JVM. Hector used this mechanism, packaged his fix
into a jar file, tried it against ActiveMQ, and found that it worked.</p>
<p>The mood and direction of the meeting shifted dramatically when Hector said, <em>“Guys, I have a wild
idea. What if we patch the JVM?”</em> As none of us had ever patched a JVM before, this seemed like a novel approach, even a long
shot.</p>
<h2 id="the-fix">The Fix</h2>
<p>Hector sent his JVM patch and sample code to the team and walked us through it. First, he explained
why the other attempted fixes wouldn’t solve our particular problem. He then demonstrated how his override effectively patched the original (faulty) removal method. Members of the
team volunteered to test the override at scale with our simulated environments. Within a few hours,
we were fairly sure that Hector’s fix would work.</p>