You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Task goal: define the technical needs to scale the technology to 3.5K repositories of high activity. This includes improvements and development in the area of operations and scalability mainly.
Scope: the initial scope goes for only Git repositories at the retrieval and enrichment phase. This may include the first gathering process, although this may face certain other difficulties as for example get banned by certain platforms.
Definition of done: when the first deployment of Bitergia Analytics is ready to go, the process to download and/or enrich 3.5 repositories should take no more than half day in total.
Task Description
Assure the completeness of the data and the resiliency of the processes: when running the instance, even if there are interruptions, the data gathering and enriching process have to be resilient to cuts. This will ensure the completeness and quality of the final datasets.
Definition of done: No data losses can happen, no matters the time the service is down. Data must be complete. This includes the raw indexes and the enriched indexes.
Missing work: define the several use cases to cover to assure the success of this task.
Technical description
Develop a mechanism to recover from failures. Currently, Git fails to recover from a repository that has fetched the latest commits and is unable to determine the last commit that was returned before failing, and continue the process from there. One potential solution is to utilize packfiles, which contain the commits fetched in the last update in the correct order.
Additionally, a system is needed to identify all repositories that failed in the previous execution, and start the fetch process in recovery mode.
Context
Task Description
Assure the completeness of the data and the resiliency of the processes: when running the instance, even if there are interruptions, the data gathering and enriching process have to be resilient to cuts. This will ensure the completeness and quality of the final datasets.
Technical description
Develop a mechanism to recover from failures. Currently, Git fails to recover from a repository that has fetched the latest commits and is unable to determine the last commit that was returned before failing, and continue the process from there. One potential solution is to utilize packfiles, which contain the commits fetched in the last update in the correct order.
Additionally, a system is needed to identify all repositories that failed in the previous execution, and start the fetch process in recovery mode.
Create a
GrimoireLab tickets
The text was updated successfully, but these errors were encountered: