From 8bff949786619791ed3224addd3abe1c84155289 Mon Sep 17 00:00:00 2001
From: Daniel Garijo
Date: Wed, 15 Jul 2020 19:29:22 -0700
Subject: [PATCH 1/6] update performance

---
 docs/benchmarking.md | 51 ++++++++++++++++++++++++++++++++++----------
 mkdocs.yml           |  2 +-
 2 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/docs/benchmarking.md b/docs/benchmarking.md
index 0d05e12..664e3da 100644
--- a/docs/benchmarking.md
+++ b/docs/benchmarking.md
@@ -1,29 +1,58 @@
+!!! Recommendation
+    We recommend using a Reverse Proxy with Caching when deploying OBA in production.
-In this test, we evaluate three ways to obtain 100 resources from a endpoint,
-Methods:
+In this page, we illustrate two performance tests of OBA: 1) the overhead introduced by framing the SPARQL results; and 2) the performance of the system when retrieving results.
+
+## Overhead analysis
+In order to perform this test, we retrieved a series of results from a SPARQL endpoint (a Fuseki server) using regular SPARQL queries, and we compared them against an equivalent query through an OBA-generated API (GET queries, without cache enabled). The results show that OBA adds a slight overhead below 200ms for the majority of the queries; and between 200 and 250ms for 8% of the queries.

```
cat endpoint.json | ./../vegeta report -type="hist[0,50ms,100ms,150ms,200ms,250ms, 350ms]"
Bucket          #   %       Histogram
[0s, 50ms]      59  98.33%  #########################################################################
[50ms, 100ms]   1   1.67%   #
[100ms, 150ms]  0   0.00%
[150ms, 200ms]  0   0.00%
[200ms, 250ms]  0   0.00%
[250ms, 350ms]  0   0.00%
[350ms, +Inf]   0   0.00%

cat api_cached_disabled_60s_1_1.json | ./../vegeta report -type="hist[0,50ms,100ms,150ms,200ms,250ms, 350ms]"
Bucket          #   %       Histogram
[0s, 50ms]      0   0.00%
[50ms, 100ms]   0   0.00%
[100ms, 150ms]  0   0.00%
[150ms, 200ms]  55  91.67%  ####################################################################
[200ms, 250ms]  5   8.33%   ######
[250ms, 350ms]  0   0.00%
[350ms, +Inf]   0   0.00%
```

## Result retrieval performance

we evaluate three ways to obtain 100 resources from a SPARQL endpoint (we used a Fuseki server in this analysis).

**Methods**:

1. Using API generated by OBA (python) using pagination.
2. Using API generated by OBA (python) using pagination and Reverse Proxy with caching enabled (NGINX).
-3. Sending the SPARQL query to endpoint.
+3. Sending the SPARQL query directly to the endpoint.

### Summary

-!!! TLDR
-    We recommend to use Reverse Proxy with Caching cache in production.
-
-If you use a Reverse Proxy with Caching, you can use OBA to production environment. We recommend to use NGINX and follow the guide [NGINX Content Caching](https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/)
+When using a Reverse Proxy with Caching, OBA performs appropriately for deployment in a production environment. We recommend using NGINX and following the [NGINX Content Caching](https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/) guide.

-The following figures show the latency by percentile distribution in a test scenario, the client sent 60 requests per second over 600 seconds.
+The following figures show the latency by percentile distribution in a test scenario, where a client sent 60 requests per second over 600 seconds. Tests have been performed on a Fuseki server and the current Python implementation of OBA.
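The latency histograms above are the output of [vegeta](https://github.com/tsenart/vegeta)'s `report` command. As a rough sketch of how a comparable 60 requests-per-second scenario can be reproduced (the target URL, file names, and plot step are illustrative placeholders, not the exact setup used for the figures below):

```
# Sketch only: point the target at a resource of your own OBA-generated API.
echo "GET https://your-oba-api.example.org/v1.5.0/models" > targets.txt

# Send 60 requests per second for 600 seconds and keep the raw results.
vegeta attack -targets=targets.txt -rate=60 -duration=600s > results.bin

# Latency histogram with the same buckets used in the tables of this page.
vegeta report -type="hist[0,50ms,100ms,150ms,200ms,250ms,350ms]" results.bin

# Latency-by-percentile plot, similar in spirit to the figures below.
vegeta plot results.bin > plot.html
```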
![Diagram](figures/api_cached.png)

**Figure 1**: Submitting 60 requests per second using a Reverse Proxy with Caching

![Diagram](figures/endpoint.png)

**Figure 2**: Submitting 60 requests per second to the SPARQL endpoint

-We concluded the latencies are similar in the 99.9% of the requests.
+As shown in both figures, latencies are similar for 99.9% of the requests when enabling a reverse proxy with caching. Without the reverse proxy with caching, performance deteriorates when receiving more than 10 queries per second, as shown in the tables below. Therefore, we recommend **enabling a reverse proxy with caching** when deploying OBA.

### Tests

diff --git a/mkdocs.yml b/mkdocs.yml
index 62eb122..9daff22 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -15,7 +15,7 @@ nav:
   - 'Authentication': 'authentication.md'
   - Configuration file: 'configuration_file.md'
   - Examples: examples.md
-  - Benchmark and production recommendation: benchmarking.md
+  - Performance tips: benchmarking.md
 theme:
   name: material

From 1d674fccea20c982c045151c7d8d5bc7626dca32 Mon Sep 17 00:00:00 2001
From: Daniel Garijo
Date: Wed, 15 Jul 2020 21:45:19 -0700
Subject: [PATCH 2/6] Update benchmarking.md

---
 docs/benchmarking.md | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/docs/benchmarking.md b/docs/benchmarking.md
index 664e3da..be93e11 100644
--- a/docs/benchmarking.md
+++ b/docs/benchmarking.md
@@ -2,10 +2,13 @@
We recommend using a Reverse Proxy with Caching when deploying OBA in production.
-In this page, we illustrate two performance tests of OBA: 1) the overhead introduced by framing the SPARQL results; and 2) the performance of the system when retrieving results.
+In this page, we illustrate two performance tests of OBA:
+1. The overhead introduced by framing SPARQL results into JSON; and
+2. The performance of the API when retrieving results while multiple requests are received at the same time.
+
## Overhead analysis
-In order to perform this test, we retrieved a series of results from a SPARQL endpoint (a Fuseki server) using regular SPARQL queries, and we compared them against an equivalent query through an OBA-generated API (GET queries, without cache enabled). The results show that OBA adds a slight overhead below 200ms for the majority of the queries; and between 200 and 250ms for 8% of the queries.
+In order to perform this test, we retrieved a series of results from a SPARQL endpoint (a Fuseki server) using regular SPARQL queries, and we compared them against an equivalent query through an OBA-generated API (GET queries, without cache enabled). The results show that OBA adds a slight overhead below 150ms for the majority of the queries with respect to the SPARQL endpoint (below 50ms); and between 150 and 200ms for 8% of the queries.

## Result retrieval performance

-we evaluate three ways to obtain 100 resources from a SPARQL endpoint (we used a Fuseki server in this analysis).
+We evaluate three ways to obtain 100 resources from a SPARQL endpoint (we used a Fuseki server in this analysis).

**Methods**:

-1. Using API generated by OBA (python) using pagination.
-2. Using API generated by OBA (python) using pagination and Reverse Proxy with caching enabled (NGINX).
+1. Using an API generated by OBA (python) with pagination.
+2. Using an API generated by OBA (python) with pagination and a Reverse Proxy with caching enabled (NGINX).
3. Sending the SPARQL query directly to the endpoint.

### Summary

-When using a Reverse Proxy with Caching, OBA performs appropriately for deployment in a production environment. We recommend using NGINX and following the [NGINX Content Caching](https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/) guide.
+When using a Reverse Proxy with Caching, OBA performs appropriately for deployment in a production environment. We **recommend using NGINX** and following the [NGINX Content Caching](https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/) guide.

The following figures show the latency by percentile distribution in a test scenario, where a client sent 60 requests per second over 600 seconds. Tests have been performed on a Fuseki server and the current Python implementation of OBA.

@@ -100,7 +103,7 @@ Bucket # %
[100ms, 200ms] 0 0.00%
```

-##### Endpoint
+##### SPARQL Endpoint

```
Requests [total, rate, throughput] 300, 5.02, 5.01

From 4cc13cb2aaac68eba5b0f894b67e95e431df388e Mon Sep 17 00:00:00 2001
From: Daniel Garijo
Date: Thu, 16 Jul 2020 08:33:06 -0700
Subject: [PATCH 3/6] Update benchmarking.md

---
 docs/benchmarking.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/benchmarking.md b/docs/benchmarking.md
index be93e11..1c365a2 100644
--- a/docs/benchmarking.md
+++ b/docs/benchmarking.md
@@ -6,6 +6,7 @@
In this page, we illustrate two performance tests of OBA:
1. The overhead introduced by framing SPARQL results into JSON; and
2. The performance of the API when retrieving results while multiple requests are received at the same time.
+The tests have been performed on the [model catalog OBA-Generated API](https://api.models.mint.isi.edu/v1.5.0/ui/#/), which uses a Fuseki tiple store as a SPARQL endpoint.

## Overhead analysis
In order to perform this test, we retrieved a series of results from a SPARQL endpoint (a Fuseki server) using regular SPARQL queries, and we compared them against an equivalent query through an OBA-generated API (GET queries, without cache enabled). The results show that OBA adds a slight overhead below 150ms for the majority of the queries with respect to the SPARQL endpoint (below 50ms); and between 150 and 200ms for 8% of the queries.
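For reference, result files such as `endpoint.json` and `api_cached_disabled_60s_1_1.json` (consumed by the `vegeta report` commands shown earlier) can be produced by recording an attack against each target and encoding the results as JSON. A minimal sketch, assuming a 1 request-per-second, 60-second run; the endpoint URLs and the SPARQL query are placeholders rather than the exact ones used against the model catalog API and its Fuseki endpoint:

```
# Sketch only: URLs and the query string are placeholders.
# Equivalent query sent directly to the SPARQL endpoint
# (the SPARQL protocol accepts GET with an encoded ?query= parameter).
echo "GET https://your-fuseki.example.org/ds/sparql?query=SELECT%20%2A%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%20100" | \
  vegeta attack -rate=1 -duration=60s | vegeta encode > endpoint.json

# The same resources retrieved through the OBA-generated API, with the cache disabled.
echo "GET https://your-oba-api.example.org/v1.5.0/models" | \
  vegeta attack -rate=1 -duration=60s | vegeta encode > api_cached_disabled_60s_1_1.json
```

Each file can then be inspected with the `vegeta report -type="hist[...]"` invocations shown above.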
@@ -103,7 +104,7 @@ Bucket # %
[100ms, 200ms] 0 0.00%
```

-##### SPARQL Endpoint
+##### SPARQL Endpoint (Fuseki)

```
Requests [total, rate, throughput] 300, 5.02, 5.01

From b7b0c85bd41fc9188d188b869071824674833750 Mon Sep 17 00:00:00 2001
From: Daniel Garijo
Date: Thu, 16 Jul 2020 08:34:12 -0700
Subject: [PATCH 4/6] Update benchmarking.md

---
 docs/benchmarking.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/benchmarking.md b/docs/benchmarking.md
index 1c365a2..8ce9ad6 100644
--- a/docs/benchmarking.md
+++ b/docs/benchmarking.md
@@ -174,7 +174,7 @@ Bucket # %
[200ms, 300ms] 1 0.17%
```

-##### Endpoint
+##### SPARQL Endpoint (Fuseki)

```

@@ -248,7 +248,7 @@ Bucket # %
[100ms, 200ms] 1 0.11%
```

-##### Endpoint
+##### SPARQL Endpoint (Fuseki)

```
Requests [total, rate, throughput] 3600, 60.02, 60.00

From 7546635912dbe515a4015898a4ca222d5d1e6e50 Mon Sep 17 00:00:00 2001
From: Daniel Garijo
Date: Thu, 16 Jul 2020 08:34:32 -0700
Subject: [PATCH 5/6] Update benchmarking.md

---
 docs/benchmarking.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/benchmarking.md b/docs/benchmarking.md
index 8ce9ad6..456cfb3 100644
--- a/docs/benchmarking.md
+++ b/docs/benchmarking.md
@@ -6,7 +6,7 @@
In this page, we illustrate two performance tests of OBA:
1. The overhead introduced by framing SPARQL results into JSON; and
2. The performance of the API when retrieving results while multiple requests are received at the same time.
-The tests have been performed on the [model catalog OBA-Generated API](https://api.models.mint.isi.edu/v1.5.0/ui/#/), which uses a Fuseki tiple store as a SPARQL endpoint.
+The tests have been performed on the [model catalog OBA-Generated API](https://api.models.mint.isi.edu/v1.5.0/ui/#/), which uses a Fuseki triple store as a SPARQL endpoint.

## Overhead analysis
In order to perform this test, we retrieved a series of results from a SPARQL endpoint (a Fuseki server) using regular SPARQL queries, and we compared them against an equivalent query through an OBA-generated API (GET queries, without cache enabled). The results show that OBA adds a slight overhead below 150ms for the majority of the queries with respect to the SPARQL endpoint (below 50ms); and between 150 and 200ms for 8% of the queries.

From 00510559f8e445299cbe28ca34146727ad9c5b2c Mon Sep 17 00:00:00 2001
From: Daniel Garijo
Date: Thu, 16 Jul 2020 08:53:41 -0700
Subject: [PATCH 6/6] Update benchmarking.md

---
 docs/benchmarking.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/benchmarking.md b/docs/benchmarking.md
index 456cfb3..521292d 100644
--- a/docs/benchmarking.md
+++ b/docs/benchmarking.md
@@ -31,7 +31,7 @@ Bucket # % Histogram
[250ms, 350ms] 0 0.00%
[350ms, +Inf] 0 0.00%
```
-
+Since we use pagination, we expect these results to be applicable to other APIs and knowledge graphs. The only case where the overhead may increase is when a resource has hundreds of properties, as the framing into JSON-LD will be delayed. This may be circumvented with a custom query, or by simplifying the API schema of the target class.

## Result retrieval performance
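As a concrete illustration of the paginated retrieval evaluated in this section, a single page of 100 resources can be requested from the generated API with a plain GET request. The sketch below uses the model catalog API linked above; the `/models` path and the `page`/`per_page` parameter names are assumptions, so check the OpenAPI specification generated for your own API:

```
# Sketch only: path and pagination parameter names are assumptions; verify them
# against the OpenAPI specification of your OBA-generated API.
curl -H "Accept: application/json" \
  "https://api.models.mint.isi.edu/v1.5.0/models?page=1&per_page=100"
```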