xk6-disruptor: add a first-steps monopage #1414
@@ -2,30 +2,212 @@
title: 'xk6-disruptor first steps'
excerpt: 'xk6-disruptor is a k6 extension providing fault injection capabilities to k6.'
weight: 01
aliases:
  - /docs/k6/latest/javascript-api/xk6-disruptor/get-started/
---

# xk6-disruptor first steps
# First steps

[xk6-disruptor](https://github.com/grafana/xk6-disruptor) is an extension that adds fault injection capabilities to k6.
This document will guide you through running your first fault injection test using xk6-disruptor, while providing pointers to the more detailed documentation sections along the way. We will cover the following sections:

It provides a JavaScript [API](https://grafana.com/docs/k6/<K6_VERSION>/javascript-api/xk6-disruptor/) to inject [faults](https://grafana.com/docs/k6/<K6_VERSION>/javascript-api/xk6-disruptor/faults) such as errors and delays into HTTP and gRPC requests served by selected Kubernetes [Pods](https://grafana.com/docs/k6/<K6_VERSION>/javascript-api/xk6-disruptor/poddisruptor) or [Services](https://grafana.com/docs/k6/<K6_VERSION>/javascript-api/xk6-disruptor/servicedisruptor).
1. [Checking requirements](#checking-requirements)
1. [Installing xk6-disruptor](#installing-xk6-disruptor)
1. [Creating a simple k6 test](#creating-a-simple-k6-test)
1. [Add fault injection capabilities](#add-fault-injection-capabilities)

# Checking requirements

xk6-disruptor focuses on injecting faults into applications running in Kubernetes. If you already have your application deployed to a development Kubernetes cluster, you are all set! If you don't, we will guide you through deploying a demo application on a local `minikube` cluster.

In order to follow this guide, you will need:

- A microservice-based application that communicates over HTTP or gRPC.
- The ability to reach at least one service exposed by the application from the machine where you will run the tests.

Review comment: I would also add a reference to "exposing your application".

- Privileged access to the Kubernetes cluster where your application is running from the machine where you will run the tests.

Review comment: Which privileges are needed?

## Use our demo application

Review comment: I think this section should go in its own high-level section, before the "creating a simple test" section.

If you don't have an application or a testing Kubernetes cluster set up, you can deploy the QuickPizza demo application. It only takes two commands, and it is designed to showcase Grafana tools, including xk6-disruptor. See the [instructions to create a local cluster and deploy QuickPizza](https://github.com/grafana/quickpizza/blob/main/docs/kubernetes-setup.md).

# Installing xk6-disruptor

xk6-disruptor is distributed as a statically-compiled, dependency-free binary for Linux, macOS, and Windows. For Linux and macOS, both x86 and arm64 architectures are supported.

You can download the latest release for your architecture and platform from https://github.com/grafana/xk6-disruptor/releases. Look for the files named `xk6-disruptor-v*`, not `xk6-disruptor-agent-v*`: the latter are for advanced users and developers only.

The xk6-disruptor binary includes the core k6 functionality. If you want to use xk6-disruptor together with other k6 extensions, you will need to follow a different procedure detailed in the [Installation](/docs/k6/<K6_VERSION>/testing-guides/injecting-faults-with-xk6-disruptor/installation/) page.

# Creating a simple k6 test

k6 was born as, and is often referred to as, a load testing tool. However, given the flexibility that testing as code provides and the large number of extensions available for it (such as xk6-disruptor), a better way to think about k6 is as a reliability testing tool. We will now write a k6 test that checks whether an API behaves correctly, without putting enough load on it to significantly alter its behavior.

This is a key part of the xk6-disruptor design philosophy: we want to test how our application behaves under adverse conditions by carefully injecting those conditions into the application, rather than attempting to produce them as a side effect of stressing it. This allows us to write reliable, reproducible tests that depend minimally on the environment and the transient state of the application.

A simple k6 test that checks that an API works as expected without stressing it could be written as follows:

```javascript
import http from "k6/http";
import { check, sleep } from "k6";

const BASE_URL = 'http://localhost:3333'; // <- Your reachable endpoint goes here.

export const options = {
  scenarios: {
    // Our test scenario: Runs 10 parallel virtual users, each executes the function 'test' 10 times.
    test: {
      exec: 'test',
      executor: 'per-vu-iterations',
      vus: 10,
      iterations: 10,
    }
  }
};

export function test() {
  // Make a POST request to our API endpoint.
  let res = http.post(`${BASE_URL}/api/pizza`, "{}", {
    headers: {
      'X-User-ID': 23423,
    },
  });

  // Check our API returns 200.
  check(res, { "status is 200": (res) => res.status === 200 });
  sleep(1);
}
```

You can run this test with `xk6-disruptor` and you should get an output similar to the following. Note that some lines have been elided for brevity.

```console
$ xk6-disruptor run first-steps-1.js
# ...
     ✓ status is 200
# ...
running (00m10.6s), 00/10 VUs, 100 complete and 0 interrupted iterations
test ✓ [======================================] 10 VUs 00m10.6s/10m0s 100/100 iters, 10 per VU
```
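
As a rough sanity check on these numbers, the iteration count and the minimum runtime follow directly from the scenario options. The sketch below is plain JavaScript, not a k6 script; the helper name `estimateRun` is hypothetical, and it assumes every iteration takes at least the one-second `sleep` while ignoring request latency.

```javascript
// Hypothetical helper (not part of k6): estimates totals for a
// 'per-vu-iterations' scenario in which every iteration sleeps a fixed time.
function estimateRun({ vus, iterations, sleepSeconds }) {
  return {
    // Each VU runs its own `iterations`, so the totals multiply.
    totalIterations: vus * iterations,
    // VUs run in parallel, so the lower bound on runtime is one VU's sleeps.
    minDurationSeconds: iterations * sleepSeconds,
  };
}

const estimate = estimateRun({ vus: 10, iterations: 10, sleepSeconds: 1 });
console.log(estimate.totalIterations);    // 100
console.log(estimate.minDurationSeconds); // 10
```

This lines up with the `100/100 iters` and the roughly 10.6s runtime in the output; the remainder is presumably request time and scheduling overhead.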

This barely scratches the surface of what k6 can do. To learn more about the different types of requests, checks, thresholds, and other features available, see the [API load testing](/docs/k6/<K6_VERSION>/testing-guides/api-load-testing/) page.

# Add fault injection capabilities

At this point we are ready to inject some faults into the application and see how it behaves. xk6-disruptor injects faults into the target application by altering how some pods or services respond to requests. A first approach could be to inject a fault causing the endpoint we are testing, `/api/pizza`, to return [503 Service Unavailable](https://http.cat/503) for 10% of the calls.

![Diagram consisting of three horizontal elements connected by arrows that point from each one to the one on its right. The first element is a drawing of a female-presenting crocodile, Bertha, which represents the user performing the test. An arrow points right to a service labeled Recommendations, but the arrow is interrupted by another cartoon crocodile snapping a cable, the xk6-disruptor project logo. Finally, from the Recommendations service, another arrow points right to a service labeled Catalog](https://grafana.com/media/docs/k6-oss/xk6-disruptor-get-started-naive.png)

While this can be an interesting test in some cases, this is not the real power of xk6-disruptor, as the results are largely predictable: we know our test results will report a failure rate of about 10%.

Review comment (on lines +95 to +99): I think that for a first contact the guide should focus on one objective. Explaining two approaches may confuse users more than it helps them understand how to use the disruptor.

A more interesting way of using xk6-disruptor is to check how faults propagate across a distributed application. For the QuickPizza example, we know the Recommendations service, which provides the `/api/pizza` endpoint we were hitting earlier, has to make several calls to another service named Catalog. We can keep our test as above, testing for failures in the Recommendations service, but inject faults in the Catalog service to see how Recommendations reacts to that:

![A very similar diagram as above, with the three components connected by right-facing arrows. This time, the xk6-disruptor logo is located between the Recommendations and the Catalog services, instead of between the user and the Recommendations service](https://grafana.com/media/docs/k6-oss/xk6-disruptor-get-started-real.png)

We can use a [`ServiceDisruptor`](/docs/k6/<K6_VERSION>/javascript-api/xk6-disruptor/servicedisruptor/) to inject faults into the Catalog service that Recommendations uses:

```javascript
import { ServiceDisruptor } from "k6/x/disruptor";

export function disrupt() {
  const disruptor = new ServiceDisruptor("quickpizza-catalog", "default");
  // Make 10% of HTTP requests fail with a 503 for 10 seconds.
  disruptor.injectHTTPFaults({errorRate: 0.1, errorCode: 503}, "10s");
}
```

The function above will create a disruption in the Kubernetes service called `quickpizza-catalog` in the `default` namespace, causing 10% of its requests to return [503 Service Unavailable](https://http.cat/503) for 10 seconds.
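
Fault descriptions are plain objects, so they can be built and sanity-checked before being handed to the disruptor. The following is a minimal sketch in plain JavaScript; `buildHTTPFault` is a hypothetical helper, not part of the xk6-disruptor API, and its validation rules are assumptions chosen for illustration. It only uses the `errorRate` and `errorCode` fields shown in this guide.

```javascript
// Hypothetical helper (not part of xk6-disruptor): builds an HTTP fault
// description and rejects values that cannot be meaningful.
function buildHTTPFault({ errorRate, errorCode }) {
  if (typeof errorRate !== "number" || errorRate < 0 || errorRate > 1) {
    throw new RangeError("errorRate must be a fraction between 0 and 1");
  }
  if (!Number.isInteger(errorCode) || errorCode < 100 || errorCode > 599) {
    throw new RangeError("errorCode must be an HTTP status code");
  }
  return { errorRate, errorCode };
}

// 10% of requests answered with 503, matching the fault used in this guide.
const fault = buildHTTPFault({ errorRate: 0.1, errorCode: 503 });
console.log(JSON.stringify(fault)); // {"errorRate":0.1,"errorCode":503}
```

In a k6 script, the resulting object would be passed as the first argument to `injectHTTPFaults`, for example `disruptor.injectHTTPFaults(fault, "10s")`.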

We can now integrate this disruption into the simple test we wrote earlier:

```javascript
import http from "k6/http";
import { check, sleep } from "k6";
import { ServiceDisruptor } from "k6/x/disruptor";

const BASE_URL = 'http://localhost:3333'; // <- Your reachable endpoint goes here.

export const options = {
  scenarios: {
    // Our test scenario: Runs 10 parallel virtual users, each executes the function 'test' 10 times.
    test: {
      exec: 'test',
      executor: 'per-vu-iterations',
      vus: 10,
      iterations: 10,
      startTime: "10s", // The disruption may take a while to kick in, so we wait 10s before starting the test.
    },
    // Our disruption scenario: A single virtual user executes the function 'disrupt' once.
    disrupt: {
      exec: 'disrupt',
      executor: 'shared-iterations',
      vus: 1,
      iterations: 1
    }
  }
};

export function test() {
  // Make a POST request to our API endpoint.
  let res = http.post(`${BASE_URL}/api/pizza`, "{}", {
    headers: {
      'X-User-ID': 23423,
    },
  });

  // Check our API returns 200.
  check(res, { "status is 200": (res) => res.status === 200 });
  sleep(1);
}

export function disrupt() {
  const disruptor = new ServiceDisruptor("quickpizza-catalog", "default");
  disruptor.injectHTTPFaults({errorRate: 0.1, errorCode: 503}, "20s");
}
```

And observe the output:

```console
$ xk6-disruptor run first-steps-2.js

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: first-steps-2.js
     output: -

  scenarios: (100.00%) 2 scenarios, 11 max VUs, 10m40s max duration (incl. graceful stop):
           * disrupt: 1 iterations shared among 1 VUs (maxDuration: 10m0s, exec: disrupt, gracefulStop: 30s)
           * test: 10 iterations for each of 10 VUs (maxDuration: 10m0s, exec: test, startTime: 10s, gracefulStop: 30s)


     ✗ status is 200
      ↳ 48% — ✓ 48 / ✗ 52

     checks.........................: 48.00% ✓ 48 ✗ 52
     data_received..................: 43 kB 1.8 kB/s
     data_sent......................: 13 kB 541 B/s
     http_req_blocked...............: avg=16.75µs min=1.07µs med=1.94µs max=157.83µs p(90)=66.74µs p(95)=141.84µs
     http_req_connecting............: avg=4.51µs min=0s med=0s max=52.44µs p(90)=3.89µs p(95)=45.7µs
     http_req_duration..............: avg=28.43ms min=474.51µs med=8.36ms max=102.8ms p(90)=75.93ms p(95)=93.84ms
       { expected_response:true }...: avg=57.2ms min=20.37ms med=56.99ms max=102.8ms p(90)=94.76ms p(95)=100.65ms
     http_req_failed................: 52.00% ✓ 52 ✗ 48
     http_req_receiving.............: avg=21.63µs min=10.14µs med=19.36µs max=70.34µs p(90)=30.96µs p(95)=32.96µs
     http_req_sending...............: avg=12.64µs min=6.42µs med=11.04µs max=39.91µs p(90)=21.64µs p(95)=28.52µs
     http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
     http_req_waiting...............: avg=28.4ms min=457µs med=8.31ms max=102.75ms p(90)=75.89ms p(95)=93.8ms
     http_reqs......................: 100 4.189952/s
     iteration_duration.............: avg=1.25s min=1s med=1s max=23.86s p(90)=1.07s p(95)=1.09s
     iterations.....................: 101 4.231852/s
     vus............................: 1 min=1 max=11
     vus_max........................: 11 min=11 max=11


running (00m23.9s), 00/11 VUs, 101 complete and 0 interrupted iterations
disrupt ✓ [======================================] 1 VUs 00m23.9s/10m0s 1/1 shared iters
test ✓ [======================================] 10 VUs 00m10.6s/10m0s 100/100 iters, 10 per VU
```

In this case, around 50% of requests failed, which is significantly more than the 10% failure rate we specified. The reason is that, for each request the Recommendations service receives, it makes several requests to the Catalog service. If any of those requests fails, the original request also fails, creating a failure amplification effect.
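
The size of this amplification can be estimated with a little arithmetic. The sketch below is plain JavaScript with hypothetical helper names; it assumes that each upstream request makes a fixed number of downstream calls, that those calls fail independently at the injected error rate, and that any single downstream failure fails the upstream request. The number of calls Recommendations actually makes is not stated in this guide, so the call count here is inferred, not measured.

```javascript
// Probability that an upstream request fails when it makes `calls`
// independent downstream calls, each failing with probability `errorRate`.
function amplifiedFailureRate(errorRate, calls) {
  return 1 - Math.pow(1 - errorRate, calls);
}

// Inverse: how many downstream calls would explain an observed failure rate.
function impliedCalls(errorRate, observedFailureRate) {
  return Math.log(1 - observedFailureRate) / Math.log(1 - errorRate);
}

// Failure rate seen by the caller for a few hypothetical call counts.
for (const calls of [1, 3, 5, 7]) {
  console.log(calls, amplifiedFailureRate(0.1, calls).toFixed(3));
}

// Observed in the run above: 52 of 100 requests failed.
console.log(impliedCalls(0.1, 0.52).toFixed(1));
```

Under these assumptions, a 10% downstream error rate reaches roughly 52% at seven calls per request, which is consistent with the observed result. With only 100 requests in the sample, this is a rough estimate at best.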

## Next steps

Explore the fault injection [API](https://grafana.com/docs/k6/<K6_VERSION>/javascript-api/xk6-disruptor/)

@@ -38,4 +220,4 @@ Learn the basics of using the disruptor in your test project:

- [Requirements](https://grafana.com/docs/k6/<K6_VERSION>/testing-guides/injecting-faults-with-xk6-disruptor/requirements)
- [Installation](https://grafana.com/docs/k6/<K6_VERSION>/testing-guides/injecting-faults-with-xk6-disruptor/installation)

Review comment: I think we should focus on the example. If the reader already has an application, how does this guide help?

Review comment: I was thinking the opposite, actually: I added the example so people can still follow if they don't have their application deployed and accessible right when they are reading the article, but my original intention was for the guide to be the point of reference for developers/SREs who do have an application and want to start using the disruptor. I think this was the feedback we got and originated this page: "I have an application but the multi-paged structure is hard to follow end to end to get me from zero to something"

Review comment: That is still the goal and I understand the intention, but my own experience reading the content was the opposite.
It was unclear what the general steps are and what is specific to the demo. And the example makes a very long chunk of text and code in the middle of the guide!
The main problem I see is that any example with the disruptor API would be application specific.
I still think it would be better to put all the steps together and then demo them in an example. In this way, if someone already has an application the demo won't be in the middle.