Request for Enhancements to External Storage Sync Functionality #6831

Open
WillieMaddox opened this issue Dec 28, 2024 · 1 comment

@WillieMaddox

Hello,

I hope this message finds you well.

I am writing to ask whether there are any plans to enhance external storage synchronization for local storage. Currently, the API call initiates a single, one-time sync of the entire project. This works for smaller projects, but for larger ones (e.g., those with more than roughly 5,000 tasks) the sync often times out before it completes.

I have implemented a workaround by extending the timeout period (see #5890), but this approach becomes increasingly impractical as projects grow. For instance, I have a project with over 60,000 tasks, and synchronizing it takes more than 20 minutes.

A potential improvement could be the introduction of multi-threaded synchronization. Additionally, providing users with more granular control over the synchronization process would be highly beneficial. For example, having the ability to sync specific tasks based on criteria could greatly enhance usability. My top two suggestions for such functionality include:

  1. Synchronizing tasks in order of priority (e.g., oldest or newest tasks first).
  2. Synchronizing tasks by a specified range of task IDs.
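To make the idea concrete, here is a rough sketch of how range-based, multi-threaded syncing might look from the caller's side. It is only an illustration under assumptions: `id_ranges` and `sync_in_batches` are hypothetical helper names, and `sync_range` stands in for a per-range sync call that does not exist in the current API (it is what this request is asking for).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def id_ranges(first_id: int, last_id: int, batch_size: int,
              newest_first: bool = False) -> List[Tuple[int, int]]:
    """Split [first_id, last_id] into inclusive (start, end) batches.

    Hypothetical helper: batches are returned oldest-first by default,
    or newest-first (highest IDs first) when newest_first is True.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ranges = [(start, min(start + batch_size - 1, last_id))
              for start in range(first_id, last_id + 1, batch_size)]
    return ranges[::-1] if newest_first else ranges


def sync_in_batches(ranges: List[Tuple[int, int]],
                    sync_range: Callable[[int, int], int],
                    max_workers: int = 4) -> List[int]:
    """Run sync_range(start, end) for each batch on a thread pool.

    sync_range is a placeholder for a per-range sync call (an assumption,
    not an existing endpoint). Results come back in the order of `ranges`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: sync_range(*r), ranges))


if __name__ == "__main__":
    # Stand-in sync function that just reports how many IDs a batch covers.
    batches = id_ranges(1, 60_000, 5_000, newest_first=True)
    synced = sync_in_batches(batches, lambda start, end: end - start + 1)
    print(len(batches), sum(synced))
```

Each batch would stay small enough to finish within the request timeout, and the ordering flag covers the oldest-first or newest-first case from the first suggestion.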

Would such enhancements be feasible?

Thank you for your time and consideration. I look forward to your thoughts.

Best regards,
Willie

@makseq
Member

makseq commented Dec 30, 2024

Hello,

  1. Regarding your question about scaling: Label Studio Enterprise is designed to handle the challenges of having a large number of tasks. It manages synchronization in the background as a separate task, running for several hours to sync a substantial number of tasks. If you require scalability, please consider switching to the enterprise edition. We do not have plans to support scalability in the community edition.

  2. As for granular control: it seems you need to update your tasks over time. Why do you require this capability? What does your workflow look like?
