
Scheduler can be restarted without losing data for Git repositories #43

dicortazar opened this issue Mar 18, 2024 · 3 comments
Labels: GrimoireLab 2.x, scalability (Tickets related to scalability topics)


dicortazar commented Mar 18, 2024

Context

  • Task goal: define the technical needs to scale the technology to 3.5K high-activity repositories. This mainly covers improvements and development in the areas of operations and scalability.
  • Scope: the initial scope covers only Git repositories during the retrieval and enrichment phases. It may also include the initial gathering process, although that may run into other difficulties, such as being banned by certain platforms.
  • Definition of done: when the first deployment of Bitergia Analytics is ready to go, downloading and/or enriching the 3.5K repositories should take no more than half a day in total.

Task Description

Ensure the completeness of the data and the resiliency of the processes: even if there are interruptions while the instance is running, the data gathering and enrichment processes have to be resilient to them. This guarantees the completeness and quality of the final datasets.

  • Definition of done: no data loss can happen, no matter how long the service is down. Data must be complete, including both the raw indexes and the enriched indexes.
  • Missing work: define the use cases that must be covered to ensure the success of this task.

Technical description

Develop a mechanism to recover from failures. Currently, when the Git backend has already fetched the latest commits and then fails, it cannot determine which commit was the last one returned before the failure in order to continue the process from there. One potential solution is to use packfiles, which contain the commits fetched in the last update in the correct order; a sketch of this idea follows.
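
As a rough illustration of the packfile idea, here is a minimal sketch in plain Python (not GrimoireLab code; the helper names, and the assumption that the packfiles written by the interrupted fetch are still under objects/pack/ of a bare clone, are illustrative only). It lists the commit objects contained in recently written packfiles and orders them so processing can resume right after the last commit that was handled successfully.

```python
# Sketch only: resume commit processing from the packfiles left by the last fetch.
# Assumes a bare/mirror clone whose interrupted `git fetch` left its packfiles
# under objects/pack/. Helper names are illustrative, not GrimoireLab APIs.

import os
import subprocess


def recent_pack_indexes(repo_dir, since_timestamp):
    """Return the .idx files of packs written at or after `since_timestamp`
    (assumed to be the packs created by the interrupted fetch)."""
    pack_dir = os.path.join(repo_dir, "objects", "pack")
    return [
        os.path.join(pack_dir, name)
        for name in os.listdir(pack_dir)
        if name.endswith(".idx")
        and os.path.getmtime(os.path.join(pack_dir, name)) >= since_timestamp
    ]


def commits_in_pack(repo_dir, idx_file):
    """List the commit hashes stored in a packfile, using `git verify-pack -v`."""
    out = subprocess.run(["git", "verify-pack", "-v", idx_file],
                         cwd=repo_dir, capture_output=True, text=True,
                         check=True).stdout
    commits = []
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "commit":
            commits.append(fields[0])
    return commits


def resume_point(repo_dir, commits, last_processed=None):
    """Order commits oldest-first by committer date and drop everything up to
    and including `last_processed`, so iteration restarts where it stopped."""
    def committer_time(sha):
        out = subprocess.run(["git", "show", "-s", "--format=%ct", sha],
                             cwd=repo_dir, capture_output=True, text=True,
                             check=True).stdout
        return int(out.strip())

    ordered = sorted(commits, key=committer_time)
    if last_processed in ordered:
        ordered = ordered[ordered.index(last_processed) + 1:]
    return ordered
```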

Additionally, a system is needed to identify all the repositories that failed in the previous execution and restart their fetch process in recovery mode, as sketched below.
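
One possible shape for that recovery pass, assuming per-repository task state is persisted somewhere queryable (modeled here with sqlite3; the `tasks` table and the `enqueue` callback are hypothetical stand-ins, not GrimoireLab interfaces): on start-up, the scheduler re-enqueues every repository whose last recorded run failed or was still running when the service stopped, flagging it for recovery mode.

```python
# Sketch only: on scheduler start-up, re-enqueue repositories whose previous
# run did not complete. The `tasks` table and the `enqueue` callback are
# hypothetical stand-ins for the scheduler's real persistence and job queue.

import sqlite3


def interrupted_repositories(conn):
    """Repositories whose last recorded state is 'failed', or 'running'
    (i.e. the scheduler died before the task finished)."""
    rows = conn.execute(
        "SELECT repo_url FROM tasks WHERE status IN ('failed', 'running')"
    ).fetchall()
    return [url for (url,) in rows]


def reschedule_in_recovery(db_path, enqueue):
    """Mark interrupted tasks as 'recovering' and enqueue them with a flag
    telling the Git backend to resume from the packfiles of the last fetch
    instead of starting from scratch."""
    conn = sqlite3.connect(db_path)
    try:
        for repo_url in interrupted_repositories(conn):
            conn.execute(
                "UPDATE tasks SET status = 'recovering' WHERE repo_url = ?",
                (repo_url,),
            )
            enqueue(repo_url, recovery_mode=True)  # hypothetical queue call
        conn.commit()
    finally:
        conn.close()
```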

Create the GrimoireLab tickets:

  • Technical Description
  • Business Description
@dicortazar dicortazar converted this from a draft issue Mar 18, 2024
@dicortazar dicortazar added the scalability Tickets related to scalability topics label Mar 18, 2024
@dicortazar dicortazar moved this to Backlog in Bitergia Analytics Mar 18, 2024
@dicortazar dicortazar moved this to Backlog in Bitergia Analytics Apr 15, 2024
@canasdiaz

In our Monday meeting, @sduenas volunteered to link the tickets in CHAOSS or bitergia-analytics where our team is doing the work.

@canasdiaz

@jjmerchante @sduenas Please discuss if what we have is enough to meet the requirement. We need to deliver software as soon as possible.

@dicortazar dicortazar moved this from In progress to Done in Bitergia Analytics Jan 13, 2025