
Commit

Merge branch 'master' of github.com:KnowledgeCaptureAndDiscovery/OBA
mosoriob committed Jul 16, 2020
2 parents 2a65532 + 0051055 commit d50f5fe
Showing 2 changed files with 50 additions and 17 deletions.
65 changes: 49 additions & 16 deletions docs/benchmarking.md
@@ -1,29 +1,62 @@
!!! Recommendation
    We recommend using a Reverse Proxy with Caching when deploying OBA in production.

In this page, we illustrate two performance tests of OBA:
1. The overhead introduced by framing SPARQL results into JSON; and
2. The performance of the API when retrieving results when multiple requests are received at the same time.

The tests have been performed on the [model catalog OBA-Generated API](https://api.models.mint.isi.edu/v1.5.0/ui/#/), which uses a Fuseki triple store as SPARQL endpoint.

## Overhead analysis
To perform this test, we retrieved a series of results from a SPARQL endpoint (a Fuseki server) with regular SPARQL queries, and compared them against equivalent queries issued through an OBA-generated API (GET requests, cache disabled). The results show that, compared to the SPARQL endpoint (which answers most queries below 50ms), OBA adds a slight overhead below 150ms for the majority of the queries, and between 150 and 200ms for 8% of the queries.

### Summary
```
cat endpoint.json | ./../vegeta report -type="hist[0,50ms,100ms,150ms,200ms,250ms, 350ms]"
Bucket # % Histogram
[0s, 50ms] 59 98.33% #########################################################################
[50ms, 100ms] 1 1.67% #
[100ms, 150ms] 0 0.00%
[150ms, 200ms] 0 0.00%
[200ms, 250ms] 0 0.00%
[250ms, 350ms] 0 0.00%
[350ms, +Inf] 0 0.00%
cat api_cached_disabled_60s_1_1.json | ./../vegeta report -type="hist[0,50ms,100ms,150ms,200ms,250ms, 350ms]"
Bucket # % Histogram
[0s, 50ms] 0 0.00%
[50ms, 100ms] 0 0.00%
[100ms, 150ms] 0 0.00%
[150ms, 200ms] 55 91.67% ####################################################################
[200ms, 250ms] 5 8.33% ######
[250ms, 350ms] 0 0.00%
[350ms, +Inf] 0 0.00%
```
Since we use pagination, we expect these results to hold for other APIs and knowledge graphs. The only case where the overhead may increase is when a resource has hundreds of properties, as framing the results into JSON-LD will take longer. This can be mitigated with a custom query, or by simplifying the API schema of the target class.
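For readers who want to inspect raw latency samples without vegeta, the bucket counts in a report like the ones above can be recomputed directly. A minimal sketch (the sample latencies are synthetic, not taken from this benchmark):

```python
# Recompute vegeta-style histogram buckets from raw latency samples.
# Latencies and bucket edges are in seconds; the samples below are
# synthetic, for illustration only.
from bisect import bisect_right

def histogram(latencies, bucket_edges):
    """Count how many samples fall in each bucket [edge_i, edge_i+1)."""
    counts = [0] * len(bucket_edges)
    for lat in latencies:
        counts[bisect_right(bucket_edges, lat) - 1] += 1
    return counts

# Same edges as the reports above: 0, 50ms, 100ms, 150ms, 200ms, 250ms, 350ms.
edges = [0, 0.050, 0.100, 0.150, 0.200, 0.250, 0.350]
counts = histogram([0.04, 0.16, 0.17, 0.18, 0.21], edges)
print(counts)  # [1, 0, 0, 3, 1, 0, 0]
```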

## Result retrieval performance

We evaluated three ways to obtain 100 resources from a SPARQL endpoint (a Fuseki server in this analysis).

!!! TLDR
    We recommend using a Reverse Proxy with Caching in production.
**Methods**:

1. Using an API generated by OBA (python) with pagination.
2. Using an API generated by OBA (python) with pagination and Reverse Proxy with caching enabled (NGINX).
3. Sending the SPARQL query directly to the endpoint.


### Summary

When using a Reverse Proxy with Caching, OBA performs appropriately for deployment in a production environment. We **recommend to use NGINX** and follow the [NGINX Content Caching](https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/) guide.
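A minimal NGINX caching configuration sketch follows; the cache path, zone name, and upstream port are assumptions to adapt to your deployment, not values taken from this benchmark:

```nginx
# Cache responses from the OBA-generated API on disk.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=oba_cache:10m
                 max_size=1g inactive=60m;

server {
    listen 80;

    location / {
        proxy_cache oba_cache;
        proxy_cache_valid 200 10m;         # keep successful responses for 10 minutes
        proxy_pass http://127.0.0.1:8080;  # OBA-generated API (assumed port)
    }
}
```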

The following figures show the latency by percentile distribution in a test scenario, where a client sent 60 requests per second over 600 seconds. Tests were performed against a Fuseki server using the current Python implementation of OBA.

![Diagram](figures/api_cached.png)
**Figure 1**: Submitting 60 requests per second using Reverse Proxy with Caching

![Diagram](figures/endpoint.png)
**Figure 2**: Submitting 60 requests per second to the SPARQL endpoint

As shown in both figures, latencies are similar for 99.9% of the requests when a reverse proxy with caching is enabled. Without caching, performance deteriorates when the API receives more than 10 queries per second, as shown in the tables below. We therefore recommend **enabling a reverse proxy with caching** when deploying OBA.
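The percentile curves in both figures can be reproduced from raw samples with a nearest-rank computation. A minimal sketch with synthetic latencies (in milliseconds, not measured values from this test):

```python
# Nearest-rank percentiles (p50, p99, p99.9), mirroring the
# "latency by percentile distribution" plots. Samples are synthetic.
import math

def percentile(samples, p):
    """Smallest sample value such that at least p% of samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 1000 synthetic latencies: mostly fast, with a slow tail.
latencies = [40] * 984 + [180] * 14 + [900] * 2
print(percentile(latencies, 50))    # 40
print(percentile(latencies, 99))    # 180
print(percentile(latencies, 99.9))  # 900
```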

### Tests

@@ -71,7 +104,7 @@ Bucket # %
[100ms, 200ms] 0 0.00%
```

##### SPARQL Endpoint (Fuseki)

```
Requests [total, rate, throughput] 300, 5.02, 5.01
@@ -141,7 +174,7 @@ Bucket # %
[200ms, 300ms] 1 0.17%
```

##### SPARQL Endpoint (Fuseki)


```
@@ -215,7 +248,7 @@ Bucket # %
[100ms, 200ms] 1 0.11%
```

##### SPARQL Endpoint (Fuseki)

```
Requests [total, rate, throughput] 3600, 60.02, 60.00
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -15,7 +15,7 @@ nav:
- 'Authentication': 'authentication.md'
- Configuration file: 'configuration_file.md'
- Examples: examples.md
- Performance tips: benchmarking.md
theme:
name: material

