Improvement to ConcurrentQuery for multi tenant case #7548

Open
khanaffan opened this issue Jan 14, 2025 · 0 comments
Labels
ecdb ECDb and ECSQL related issues

khanaffan commented Jan 14, 2025

In a multi-tenant cloud use case, multiple iModels are served through a single Node.js process. Typically, Node.js is not designed for multi-threaded workflows. By default, concurrent queries use four worker threads and one monitor thread per iModel. The monitor thread runs every second, checks each running query against its assigned quota, and can interrupt a query that exceeds it. Worker threads operate whenever there are queries in the queue, using a thread-safe callback to post results back to the Node.js process.
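
To make the monitor's role concrete, here is a minimal sketch of one monitor pass. It assumes a purely time-based quota, and `RunningQuery` and `findQueriesToInterrupt` are hypothetical names for illustration, not the native ConcurrentQuery implementation.

```typescript
// Hypothetical shape of a running query; not the actual native types.
interface RunningQuery {
  id: number;
  startedAtMs: number; // when a worker thread picked the query up
  timeQuotaMs: number; // maximum run time allowed for this query (assumed time-only quota)
}

// One pass of the monitor: return the ids of queries that exceeded their quota.
function findQueriesToInterrupt(running: RunningQuery[], nowMs: number): number[] {
  return running
    .filter((q) => nowMs - q.startedAtMs > q.timeQuotaMs)
    .map((q) => q.id);
}

const runningQueries: RunningQuery[] = [
  { id: 1, startedAtMs: 0, timeQuotaMs: 5_000 },
  { id: 2, startedAtMs: 4_000, timeQuotaMs: 5_000 },
];

// At t = 6s only query 1 has run longer than its 5s quota.
const overQuota = findQueriesToInterrupt(runningQueries, 6_000);
```

A real monitor would repeat this pass roughly once per second and interrupt the matching queries on their worker threads.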

Node.js is essentially a single-threaded runtime built around an event loop. To move data between the native layer and JavaScript, work must be queued on that loop. When the event loop processes an asynchronous task, it takes one from the queue and invokes its callback, which transforms data produced on another thread in the native layer into Node.js-compatible values. These asynchronous tasks are queued in the microtask queue, and every time the event loop runs, Node.js processes all tasks in the microtask queue.

In a multi-tenant setup, let's say we have $n$ iModels open in a single Node.js process. Each iModel will have a minimum of 5 threads, resulting in $n \times 5$ threads. However, with only 4 CPUs available, the Kubernetes pod will divide the CPU time among all the threads, causing each of them to take longer to execute.
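
The oversubscription can be put into numbers with a rough sketch. It assumes the worst case where every thread is runnable and the scheduler divides CPU time evenly (in practice the monitor threads are mostly idle); the function names are made up for illustration.

```typescript
const THREADS_PER_IMODEL = 5; // 4 workers + 1 monitor, per the defaults described above

function totalThreads(iModelCount: number): number {
  return iModelCount * THREADS_PER_IMODEL;
}

// Simplified model: fraction of one CPU each thread gets if all threads are
// runnable and CPU time is split evenly. Capped at 1 (a thread cannot use more than one CPU).
function cpuSharePerThread(iModelCount: number, cpuCount: number): number {
  return Math.min(1, cpuCount / totalThreads(iModelCount));
}

const threadsFor10 = totalThreads(10); // 10 iModels * 5 threads = 50 threads
const shareFor10 = cpuSharePerThread(10, 4); // 4 CPUs / 50 threads = 0.08 of a CPU each
```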

  • The primary goal of concurrent queries is to maximize CPU time per thread.
  • If there aren't enough full CPUs available per thread, all queries will run slower and may time out more frequently.
  • Node.js will have to process a much longer microtask queue under load, making it less responsive. In the worst case, it will be stuck processing all microtasks from $n \times 4$ threads, in addition to queuing new queries.
  • This setup fails mainly because iTwin.js and concurrent queries were designed for a single backend and single iModel, not for a multi-tenant environment.
  • All synchronous calls from JavaScript to native code block the thread. More connections and higher load on the element API will make Node.js unresponsive.
  • When this monolithic server backend fails, all users of those iModels are affected. New backends may be spawned with less contention initially, but they will also become unusable under load over time.
  • We've overcomplicated the architecture by trying to create a server that manages multiple users inside Kubernetes, which is designed to do the same thing.
  • The memory required by this many threads is significant. Each connection takes up 32MB of SQLite page cache. If $n = 10$, then at maximum load about 1GB is used solely for the per-thread SQLite connections, not including other caches.
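
The memory figure in the last bullet can be checked with a back-of-the-envelope sketch. It assumes one SQLite connection per worker thread and none for the monitor thread, which is an assumption on top of the description above, not something confirmed here.

```typescript
const PAGE_CACHE_MB_PER_CONNECTION = 32; // figure quoted in the list above
const WORKER_CONNECTIONS_PER_IMODEL = 4; // assumption: one connection per default worker thread

function pageCacheMb(iModelCount: number): number {
  return iModelCount * WORKER_CONNECTIONS_PER_IMODEL * PAGE_CACHE_MB_PER_CONNECTION;
}

const cacheFor10 = pageCacheMb(10); // 10 * 4 * 32 = 1280 MB, i.e. on the order of 1GB
```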

Overall, we should move away from multi-tenant setups where possible and investigate alternatives. For example, we could scale up and down using Kubernetes, where each backend performs a single task, such as running a query, and then terminates. Even though concurrent queries emulate a server, it would be best if they used only two threads: one to compute the query and another to interrupt it for quota purposes. We should leave the scaling to Kubernetes.

That said, while we have this architecture, we can improve it. Here is how:

What needs to change

  • A single pool of threads should be used by the entire Node.js process, as having more threads than CPUs is not beneficial.
  • All connections should queue tasks to the same pool of threads, optimizing CPU usage.
  • If there are a significant number of pending tasks, we should return an error to Kubernetes to scale up, indicating that the current process is busy and has too many tasks in the queue.
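
The three points above can be sketched at the JavaScript level as follows. This is only an illustration of the queuing policy: the real pool would live in the native layer with actual threads, and `SharedQueryPool`, `PoolBusyError`, `maxConcurrent` and `maxPending` are hypothetical names, not existing iTwin.js APIs.

```typescript
// Raised when the shared queue is saturated. A host could map this to an
// HTTP 503 or a failing readiness probe so that Kubernetes scales out.
class PoolBusyError extends Error {
  constructor(pending: number) {
    super(`query pool busy: ${pending} tasks pending`);
    this.name = "PoolBusyError";
  }
}

class SharedQueryPool {
  private readonly queue: Array<() => void> = [];
  private readonly maxConcurrent: number;
  private readonly maxPending: number;
  private active = 0;
  public rejectedCount = 0;

  constructor(maxConcurrent: number, maxPending: number) {
    this.maxConcurrent = maxConcurrent; // e.g. the number of CPUs given to the pod
    this.maxPending = maxPending; // queue length at which the process reports "busy"
  }

  get activeCount(): number { return this.active; }
  get pendingCount(): number { return this.queue.length; }

  // Every connection submits to this one pool instead of owning its own threads.
  submit<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent && this.queue.length >= this.maxPending) {
      this.rejectedCount++;
      return Promise.reject(new PoolBusyError(this.queue.length));
    }
    return new Promise<T>((resolve, reject) => {
      const run = () => {
        this.active++;
        const settle = () => {
          this.active--;
          const next = this.queue.shift();
          if (next) next(); // start the next waiting task, if any
        };
        // The async wrapper turns a synchronous throw in task() into a rejection.
        (async () => task())().then(
          (value) => { settle(); resolve(value); },
          (error) => { settle(); reject(error); },
        );
      };
      if (this.active < this.maxConcurrent) run();
      else this.queue.push(run);
    });
  }
}

// Usage: 4 concurrent slots shared by all iModels, at most 2 waiting tasks.
const pool = new SharedQueryPool(4, 2);
const longQuery = () => new Promise<void>(() => {}); // stand-in for a query that is still running
for (let i = 0; i < 6; i++) void pool.submit(longQuery); // 4 start, 2 wait
pool.submit(longQuery).catch((err) => console.log(err.name)); // queue is full, so this one is rejected
```

Rejecting early, rather than letting the queue grow without bound, keeps the event loop responsive and gives the orchestrator a clear signal that this process is saturated.
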
@khanaffan khanaffan added the ecdb ECDb and ECSQL related issues label Jan 14, 2025