Support Production Workloads #39

fabiante · 2023-09-16T13:28:10Z

I want Persurl to support production workloads. This issue defines what that means in terms of numbers.

Numbers

PURL creation

Parameters

A rough estimate that I have used repeatedly, adjusted so that it breaks down into round numbers:

1,000 distinct users creating PURLs
1,000 PURLs created per day per user (≈ 42 / hour ≈ 0.7 / minute)
86,400 PURLs created per day per user
- breaks down into 3,600 / hour = 60 / minute = 1 / second
PURLs have a minimum lifespan of 10 years
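
A minimal sketch of the rate breakdown used in the parameters above. The function name `breakdown` is made up for illustration; the only inputs are the two per-user daily rates listed in this issue.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def breakdown(per_day: float) -> dict:
    """Convert a per-day rate into per-hour, per-minute and per-second rates."""
    return {
        "per_hour": per_day / 24,
        "per_minute": per_day / (24 * 60),
        "per_second": per_day / SECONDS_PER_DAY,
    }


# The two per-user daily rates from the Parameters list.
low = breakdown(1_000)
high = breakdown(86_400)

print(low)
print(high)
```

The 86,400 figure is convenient because it is exactly the number of seconds in a day, so it maps to one PURL per second.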

Deductions

Given these numbers ...

86,400,000 PURLs would be created every year
864,000,000 PURLs would have been created after 10 years

Side note on data types:

32-bit integer IDs can address ~4.3 billion PURLs when unsigned (~2.1 billion when signed, as with the Postgres integer type).
With 64 bits that would go up to 18,446,744,073,709,551,615 (unsigned).
UUIDs are 128 bits wide and have an address space large enough that I won't bother writing down the number. Don't quote me on that 😉
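
To put the ID sizes next to the growth estimate, here is a rough sketch. It takes the yearly figure from the Deductions at face value and assumes IDs are handed out sequentially without gaps; the helper name `years_until_exhausted` is hypothetical. Whether the signed or unsigned limit applies depends on the column type.

```python
# ID space limits for the integer widths discussed above.
INT32_SIGNED_MAX = 2**31 - 1    # 2,147,483,647
INT32_UNSIGNED_MAX = 2**32 - 1  # 4,294,967,295
INT64_UNSIGNED_MAX = 2**64 - 1  # 18,446,744,073,709,551,615
UUID_SPACE = 2**128             # number of distinct 128-bit values

# Assumption: yearly creation rate as stated in the Deductions.
PURLS_PER_YEAR = 86_400_000


def years_until_exhausted(max_id: int, per_year: int = PURLS_PER_YEAR) -> float:
    """Years until sequential IDs reach max_id at a constant creation rate."""
    return max_id / per_year


print(f"signed 32-bit:   {years_until_exhausted(INT32_SIGNED_MAX):.1f} years")
print(f"unsigned 32-bit: {years_until_exhausted(INT32_UNSIGNED_MAX):.1f} years")
print(f"UUID space:      {UUID_SPACE:,} values")
```

Under these assumptions a signed 32-bit ID lasts a little under 25 years and an unsigned one a little under 50, so either covers the 10-year minimum lifespan only if the creation rate stays at the estimated level.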

PURL resolution

In my use case I expect that PURLs will be resolved infrequently, potentially never.

Due to that expectation I don't yet have detailed numbers except these:

Only 10% of all PURLs are ever resolved. The remaining 90% are created but never resolved.
Users expect a PURL to resolve within 1 second on average. This does not include the time browsers take to load the resolved URL, since that is not under Persurl's control.
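
The two resolution expectations can be written down as a small check. This is only a sketch: the helper names and the sample latencies are hypothetical, and the 10-year total is taken at face value from the Deductions.

```python
from statistics import mean

RESOLVED_SHARE = 0.10       # share of PURLs that are ever resolved
MEAN_BUDGET_SECONDS = 1.0   # expected average resolve time


def expected_resolved(total_purls: int, share: float = RESOLVED_SHARE) -> int:
    """Number of PURLs expected to be resolved at least once."""
    return round(total_purls * share)


def meets_budget(latencies_seconds: list[float],
                 budget: float = MEAN_BUDGET_SECONDS) -> bool:
    """True if the average of the measured resolve times is within budget."""
    return mean(latencies_seconds) <= budget


# Assumption: 864,000,000 PURLs after 10 years, as stated in the Deductions.
print(expected_resolved(864_000_000))

# Made-up sample measurements, in seconds, purely for illustration.
print(meets_budget([0.2, 0.4, 1.8]))
print(meets_budget([0.9, 1.5]))
```

Because the target is an average, a single slow resolution does not fail the check as long as the mean stays at or below one second.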

Subtasks

Run purl creation load tests on SQLite
Run purl creation load tests on Postgres
Run resolution load tests on Postgres
Run mixed load tests on Postgres - How does the application handle larger amounts of concurrent writes and reads?
Consider upgrading to a 64-bit integer for the primary ID in the purls table

Summary

Generally, response times have improved dramatically with the addition of Postgres in Replace SQLite with Postgres #50.
1,000 agents each tried to create a PURL every 50 ms. In the current test, these agents created a total of 190,382 PURLs in 1 minute.
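
For context, the load test result can be turned into a throughput figure. The sketch below uses only the numbers reported above. The ceiling assumes every agent fires exactly every 50 ms without waiting for responses, and the target assumes the per-user reading of the Parameters (1,000 users at 1 PURL per second each); both are assumptions, not statements about how the test was implemented.

```python
AGENTS = 1_000
INTERVAL_MS = 50
DURATION_MS = 60_000
CREATED = 190_382  # PURLs created during the test

# Achieved creation rate across all agents.
achieved_per_second = CREATED / (DURATION_MS / 1000)

# Upper bound if every agent managed one creation per interval (assumption).
attempted_ceiling = AGENTS * (DURATION_MS // INTERVAL_MS)
share_of_ceiling = CREATED / attempted_ceiling

# Average time between successful creations for a single agent.
ms_per_purl_per_agent = DURATION_MS * AGENTS / CREATED

# Assumed target: 1,000 users creating 1 PURL per second each.
TARGET_PER_SECOND = 1_000
headroom_factor = achieved_per_second / TARGET_PER_SECOND

print(f"achieved: {achieved_per_second:.0f} PURLs/s")
print(f"share of ceiling: {share_of_ceiling:.1%}")
print(f"per agent: one PURL every {ms_per_purl_per_agent:.0f} ms")
print(f"headroom over assumed target: {headroom_factor:.1f}x")
```

That works out to roughly 3,173 PURLs per second, about three times the assumed target rate.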

As far as request/response time of PURL creation goes, I consider Persurl to be production ready.

I have not yet done any tests on resolve times, which I may add later.
Currently, the id column of the purls table is a 32-bit integer. A future version of Persurl will have to move to a 64-bit integer. To avoid downtime due to reaching this limit, the system should implement a health check based on the remaining available row count in the purls table. (Metric for remaining available database rows #62)
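
One possible shape for such a health check, as a sketch only. How #62 actually implements the metric is not described here; the function name `id_headroom`, the 20% warning threshold, and the assumption of a signed 32-bit id assigned sequentially are all hypothetical.

```python
INT32_SIGNED_MAX = 2**31 - 1  # assumed limit of the current id column


def id_headroom(current_max_id: int,
                limit: int = INT32_SIGNED_MAX,
                warn_below: float = 0.20) -> dict:
    """Report how much of the ID space is left and whether that is healthy.

    current_max_id is the highest id handed out so far, which the caller
    would read from the database.
    """
    remaining = max(limit - current_max_id, 0)
    fraction_remaining = remaining / limit
    return {
        "remaining": remaining,
        "fraction_remaining": fraction_remaining,
        "healthy": fraction_remaining >= warn_below,
    }


# Example inputs: the load test total, and a value close to the limit.
print(id_headroom(190_382))
print(id_headroom(2_000_000_000))
```

Reporting the remaining fraction alongside the boolean lets a monitoring system alert well before inserts start failing.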

fabiante added the testing and epic labels Sep 16, 2023

fabiante mentioned this issue Sep 16, 2023

Load Testing #28

Closed

2 tasks

fabiante pinned this issue Sep 16, 2023

fabiante unpinned this issue Sep 16, 2023

fabiante added this to the v1 - First production ready version milestone Sep 17, 2023

This was referenced Sep 18, 2023

Add Postgres support #48

Closed

Adjust load test params to match recent findings #56

Merged

Scalable Persistence layer #7

Closed

fabiante mentioned this issue Sep 19, 2023

Metric for remaining available database rows #62

Open

fabiante self-assigned this Sep 19, 2023
