I want Persul to support production workloads. This issue defines what that actually means in terms of numbers.
Numbers
PURL creation
Parameters
A rough estimation which I have repeatedly used, adjusted for nice fractions:
1.000 distinct users creating purls
1.000 purls created per day per user (≈ 40 / hour ≈ 0.7 / minute)
86.400 purls created per day per user
breaks down into 3.600 / hour = 60 / minute = 1 / second
purls have a minimum lifespan of 10 years
Deductions
Given these numbers ...
86.400.000 purls would be created every year
864.000.000 purls would have been created after 10 years
Side note on data types:
Unsigned 32-bit integer IDs can address ~4 billion PURLs (a signed 32-bit column, such as Postgres `integer`, tops out at 2.147.483.647).
With 64 bits that would go up to 18.446.744.073.709.551.615.
UUIDs are 128 bits wide and have an address space large enough that I won't bother writing down the number. Don't quote me on that 😉
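To put the ID sizes into perspective, here is a rough sketch of how long each ID space would last. It assumes IDs are handed out sequentially and at a constant rate; the function name and the rate parameter are made up for illustration, and the 1 PURL per second input is the figure from the parameters above.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Largest ID each integer type can hold.
ID_LIMITS = {
    "signed 32-bit": 2**31 - 1,
    "unsigned 32-bit": 2**32 - 1,
    "signed 64-bit": 2**63 - 1,
}


def years_until_exhausted(max_id: int, purls_per_second: float) -> float:
    """Years until max_id is reached, assuming a constant creation rate
    and sequentially assigned IDs (both are simplifying assumptions)."""
    if purls_per_second <= 0:
        raise ValueError("purls_per_second must be positive")
    return max_id / purls_per_second / SECONDS_PER_YEAR


for name, limit in ID_LIMITS.items():
    years = years_until_exhausted(limit, purls_per_second=1.0)
    print(f"{name}: {years:,.1f} years at 1 PURL/s")
```

At one PURL per second a signed 32-bit ID lasts roughly 68 years. Since the rate is a parameter, a higher combined rate can be plugged in the same way, for example 1000 creators each writing one PURL per second.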
PURL resolution
In my use case I expect that PURLs will be resolved infrequently, potentially never.
Due to that expectation I don't yet have detailed numbers except these:
Only 10% of all PURLs are ever resolved. The remaining 90% are created but never resolved.
Users expect a PURL to resolve within 1 second on average. This does not include the time browsers need to load the resolved URL, since that is not under Persurl's control.
Subtasks
Run purl creation load tests on SQLite
Run purl creation load tests on Postgres
Run resolution load tests on Postgres
Run mixed load tests on Postgres - How does the application handle larger amounts of concurrent writes and reads?
Consider upgrading to a 64-bit integer for the primary id in the purls table
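The creation load tests listed above could be driven by a small harness along these lines. This is only a sketch: `fake_create_purl` is a hypothetical stand-in for the real HTTP request to the PURL creation endpoint, and the concurrency and request counts are arbitrary example values, not the parameters of any actual test run.

```python
import asyncio
import time
from statistics import mean


async def fake_create_purl() -> None:
    """Hypothetical stand-in for a request that creates one PURL."""
    await asyncio.sleep(0.001)


async def creator(requests_per_creator, create, durations):
    """One simulated user: issue requests one after another and time each."""
    for _ in range(requests_per_creator):
        start = time.perf_counter()
        await create()
        durations.append(time.perf_counter() - start)


async def run_load_test(concurrency, requests_per_creator, create=fake_create_purl):
    """Run `concurrency` creators at once and report the average request time."""
    durations = []
    await asyncio.gather(
        *(creator(requests_per_creator, create, durations) for _ in range(concurrency))
    )
    return {"requests": len(durations), "avg_seconds": mean(durations)}


result = asyncio.run(run_load_test(concurrency=25, requests_per_creator=4))
print(f"{result['requests']} requests, avg {result['avg_seconds'] * 1000:.1f} ms")
```

Swapping `fake_create_purl` for a coroutine that calls the real endpoint would allow the same loop to be reused for SQLite and Postgres, so the results stay comparable.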
I ran the application (based on #42) on Railway while trying to set up a demo deployment (see #40). I ran the load tests which gave me these numbers (1 minute for each parameter set):
| Concurrent PURL creators | Average request time |
| --- | --- |
| 1-25 | 10-20ms |
| 50 | 50ms |
| 100 | 700ms |
| 200 | 800ms |
| 250 | 1.2s |
| 500 | 2.6s |
This shows that there is definitely a bottleneck, at least in part due to the RWMutex added in #38.
Right now these numbers are okay with me and they show potential for optimization. I am sure that the request times will go down once SQLite is replaced with a dedicated database (#7).
1000 agents tried to create a PURL every 50ms. In the current test these agents created a total of 190.382 purls in 1 minute.
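For comparison with the target, the measured total can be converted into a per-second rate. This is a quick sketch; the helper name is made up, and the target of 1 PURL per second is the figure from the parameters in the issue description.

```python
def throughput_per_second(created: int, duration_seconds: float) -> float:
    """Average number of PURLs created per second over the test run."""
    return created / duration_seconds


observed = throughput_per_second(created=190_382, duration_seconds=60)
target = 86_400 / (24 * 60 * 60)  # 86.400 purls per day = 1 per second

print(f"observed: {observed:.0f} purls/s")
print(f"target:   {target:.0f} purls/s")
print(f"headroom: {observed / target:.0f}x")
```

The result is an average over the whole minute, so it says nothing about how individual request times were distributed.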
As far as request/response time of PURL creation goes, I consider Persul to be production ready.
I have not yet done any tests on resolve times, which I may add later.
Currently, the id column of the purls table is a 32-bit integer. A future version of Persul will have to go for a 64-bit integer. To avoid downtime due to reaching this limit, the system should implement a health check based on the remaining available row count in the purls table. (Metric for remaining available database rows #62)
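A health check like that could look roughly like the sketch below. Everything in it is an assumption for illustration: that the id column is a signed 32-bit integer assigned sequentially, that the current highest id is read from the database elsewhere (for example with a `MAX(id)` query), and that an 80% threshold is a sensible point to start warning. None of the names come from the Persul code base.

```python
from dataclasses import dataclass

INT32_MAX = 2**31 - 1  # assumed limit of a signed 32-bit id column


@dataclass
class IdSpaceStatus:
    remaining: int        # ids still available
    used_fraction: float  # share of the id space already consumed
    healthy: bool         # False once the warning threshold is reached


def check_id_space(
    current_max_id: int,
    max_id: int = INT32_MAX,
    warn_fraction: float = 0.8,  # hypothetical threshold
) -> IdSpaceStatus:
    """Report how much of the id space is left, given the highest id in use."""
    if not 0 <= current_max_id <= max_id:
        raise ValueError("current_max_id must be between 0 and max_id")
    used = current_max_id / max_id
    return IdSpaceStatus(
        remaining=max_id - current_max_id,
        used_fraction=used,
        healthy=used < warn_fraction,
    )


status = check_id_space(current_max_id=864_000_000)
print(status)
```

With the 864.000.000 purls estimated for the first 10 years, this check would still report healthy, at about 40% of a signed 32-bit id space used.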