Spike: refactor CDP componentry out of monorepo #5525

Open
mollykarcher opened this issue Nov 12, 2024 · 6 comments

@mollykarcher
Contributor

mollykarcher commented Nov 12, 2024

What problem does your feature solve?

We'd like to expand the scope of responsibility for our external SDK developers who receive infrastructure grants to include the ability to interact with a galexie-exported data lake. As a precursor to this, we'd like to have a clear, concise example that they can model their implementations after in golang.

What would you like to see?

  • Design/spike to determine which libraries/components should be moved out into a separate repository, and which should stay

Defer to a follow-on ticket (TBD) for implementation:

  • Work to refactor said components out of the go monorepo and into their own repository
  • Documentation in the public developer docs advising developers on how to use the new repo

Considerations

  • Note that the ingest library includes a lot of code that handles the lifecycle of captive core and makes the interface to it look the same as that to the data lake. For external providers, we won't be asking them to build code that interfaces with captive-core directly
  • Consider that we eventually will want to facilitate open source contributions of other data stores (S3, R2, etc). Awareness of this may influence overall design and choice of where the interfaces live
@mollykarcher mollykarcher added this to the platform sprint 54 milestone Nov 12, 2024
@mollykarcher mollykarcher changed the title ingest: refactor CDP componentry out of monorepo refactor CDP componentry out of monorepo Nov 12, 2024
@mollykarcher
Contributor Author

Per discussion at the planning meeting on 11/12, we will explicitly scope this down to just the BufferedStorageBackend: wrapping a GCS client, downloading/buffering ledgers, and outputting an LCM object.

@sreuland sreuland changed the title refactor CDP componentry out of monorepo Spike: refactor CDP componentry out of monorepo Dec 3, 2024
@urvisavla urvisavla moved this from To Do to In Progress in Platform Scrum Dec 3, 2024
@urvisavla urvisavla self-assigned this Dec 3, 2024
@mollykarcher mollykarcher moved this from In Progress to Blocked in Platform Scrum Jan 14, 2025
@leighmcculloch
Member

> concise example that they can model their implementations after in golang

+1. The "ingest SDK" is fuzzy and not well defined for anyone outside the maintainers: from the outside it's not clear where it begins and ends, or even that it exists. The monorepo is a wonderful experience to dev in, but it hides the existence of the ingest SDK.

Many developers have "github glasses" on, including myself. The docs for the SDK need to compensate for that, which they do not at this time. Shifting the ingest SDK into its own repo would also give it an identity of its own.

@urvisavla
Contributor

As part of this spike, I created the following repos to demonstrate the feasibility of moving CDP components out of the Go monorepo:
  • stellar-galexie
  • stellar-cdp-sdk
  • Go monorepo (cdp-refactor branch)

Here’s a list of pros and cons to help us determine whether this approach is the right choice.

Pros:

  1. Easier for external developers to contribute:
    By separating CDP components into their own repo, external developers can more easily understand and contribute to the project without needing to navigate through unrelated complexity (like Captive Core). This lowers the learning curve and increases the likelihood of successful contributions.
  2. Faster development cycle:
    A smaller, standalone CDP repo leads to faster CI builds. This means faster feedback for developers, enabling them to make changes and merge them with minimal delay. This improves the overall contributor experience.
  3. Language portability:
    With CDP components isolated, it becomes easier for those who want to replicate the CDP SDK in other languages (e.g., Python) without getting bogged down by irrelevant code and dependencies.
  4. Encourage CDP usage:
    Splitting CDP into a dedicated repo signals that CDP is the preferred architecture, not Captive Core. This helps guide developers toward using simpler solutions. Also, by putting CDP in its own repo, we can give it the attention it needs.
  5. Timing:
    No applications have been built on CDP yet, so now’s the perfect time to move it. Waiting longer could make this separation more difficult as dependencies grow.

Cons:

  1. More work for the team:
    Splitting into multiple repositories will require initial work to separate the code and set up CI for each repo. Developers at SDF will need to manage and check out multiple repos, which may slow down workflow and remove the convenience of working with a single monorepo.
  2. Syncing between repos:
    The monorepo still relies on the CDP repo (Horizon uses BufferedStorageBackend for reingestion). This means we’ll need to set up processes to ensure both repos stay in sync and that updates to CDP don’t break things in the monorepo. This extra setup could be a bit of work.
  3. Versioning:
    Right now, the ingest SDK doesn’t have formal versioning, but we might need to introduce it for CDP in the future. This could add extra complexity, as we’ll need to make sure the different versions are compatible with each other.
  4. Possible circular dependencies:
    Even though we’ve shown that circular dependencies between the monorepo and the new CDP repo can be avoided, we still need to be careful about introducing them in the future.
  5. Uncertain impact on external contributions:
    While this change should make it easier for external developers to contribute, there’s no guarantee it will lead to more contributions. We can’t be sure that this effort will bring the desired results and it might end up creating more work without a big payoff.

@chowbao
Contributor

chowbao commented Jan 30, 2025

> No applications have been built on CDP yet, so now’s the perfect time to move it. Waiting longer could make this separation more difficult as dependencies grow.

stellar-etl in shambles

@urvisavla
Contributor

> > No applications have been built on CDP yet, so now’s the perfect time to move it. Waiting longer could make this separation more difficult as dependencies grow.
>
> stellar-etl in shambles

@chowbao, I meant to say no external applications but you're right stellar-etl will be impacted, so one could argue that we're already late 🤔

@chowbao
Contributor

chowbao commented Jan 30, 2025

> > > No applications have been built on CDP yet, so now’s the perfect time to move it. Waiting longer could make this separation more difficult as dependencies grow.
> >
> > stellar-etl in shambles
>
> @chowbao, I meant to say no external applications but you're right stellar-etl will be impacted, so one could argue that we're already late 🤔

Yeah no external apps. I wouldn't let stellar-etl be the deciding factor for NOT doing the split though.

Projects
Status: Blocked
6 participants