Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ Update cBioPortal Architecture to start transition to clickhouse only implementation #11368

Merged
merged 12 commits into from
Feb 6, 2025

Conversation

haynescd
Copy link
Collaborator

@haynescd haynescd commented Feb 3, 2025

closes #11360

Describe changes proposed in this pull request:

  • This PR intends to utilize Strangler Fig Pattern to refactor the current code base to utilize a clean architecture approach.

This will allow cbioportal to methodically transition to clickhouse only implementation

  • I have implemented the endpoint /api/columnar-store/studies

With this PR we have added Spring Profiles, which allows the application to toggle clickhouse functionality by using the property spring.profiles.active=clickhouse

I moved most of the legacy code into a sub-package legacy

@sheridancbio
Copy link
Contributor

Good work. The focus on standardizing and insulating and (eventually) deconvoluting the business logic will be a big help. I hope that as we continue to refactor we can move all business logic out of the web controller layer and also any business logic not needed for implementing the persistence layer out of the persistence layer as well.

I scanned the changes, but there are many of them (my browser actually has a hard time rendering the detailed changes here .. but eventually succeeds). I don't think this PR should wait for a detailed review from me. I also listed below a few concerns I have for possible discussion, but I would not object to moving this work forward now. The one caution I would emphasize is that until we bring the cbioportal-core codebase (and the msk importer codebase) into alignment with the new master branch of cbioportal, no database schema changes which affect import could be merged without leaving our main production portals in a state where data updates can not be made. I think we must avoid that.

A few concerns:

  • we had previously documented our decision to follow google java style for the codebase, but (to a large extent) failed to enforce that. I think we need some kind of mechanism which will automatically check or correct style discrepancies before PRs can be merged. Even with that, I think that repo administrators can force the merger of PRs which are not passing the branch protection rules, so maybe what we actually need is a periodic (monthly?) auto-generated PR which will apply our chosen style to the entire java codebase.
  • the cbioportal codebase is packaged into the cbioportal-core codebase where the importer resides. There is also an msk specific importer codebase which packages cbioportal-core (and thus cbioportal) as a dependency. These changes will need to be propagated through those codebases. The great majority of the changes here (after scanning through them) are package relocations, which should mainly demand package path reference changes in the other codebases. I intend to attempt to make those changes over the next few days. If the effort is more tricky than that, help might be needed to adjust the other repositories to become compatible.
  • this is reshaping the layout / packaging of the codebase. I wonder whether we should reach out beyond the core development team to gather input from the community.

I wonder whether we can leverage SonarCloud or some other tool which performs code complexity analysis to help us find highly complex / monolithic business logic classes. Just as we can use a linter to help us keep to a java code style, it would be nice if we had some evaluation tool which would help us detect code which is overly convoluted. Maybe we should set some kind of complexity limit and stick to that when merging future PRs. I very much appreciate recent comments @haynescd made about the desire to isolate and minimize responsibilities for each java class (or method). I think we should push towards breaking up existing highly complex code containing entangled responsibilities. Highlighting them with a tool would help.

Copy link

sonarqubecloud bot commented Feb 6, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
18 Security Hotspots
17 New Vulnerabilities (required ≤ 0)
D Security Rating on New Code (required ≥ A)
4 New Blocker Issues (required ≤ 0)
40 New Bugs (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@haynescd
Copy link
Collaborator Author

haynescd commented Feb 6, 2025

Appreciate the feedback @sheridancbio!

To respond to your first concern. I agree something does need to be in place that either automatically checks or formats the code for each PR. I am leaning towards the check... I don't want the PR manipulating code. I found this maven plugin than can be used to check or format. I will need some help to establish these rules and collectively discourage people from ignoring them and just using their admin abilities to bypass those... This also applies to your comment about enabling sonar cloud for static analysis!! Probably need to take this convo somewhere where we won't forget about it... And start assigning some of these tasks to people

So, about the other concerns dealing with changing anything that can break other projects that depends on cbioportal(cbio-core and others) I totally agree. I think we should be very careful changes these things especially org.cbioportal.model package ( which I think is the only thing users are depending on)

That is why the org.cbioportal.model will probably be the last thing to go. ( Future changes should be propagated to others)

With this change I am not to worried since it is only a package change. ( changing folders) We should still create an issue that captures this.. So we are prepared when we do need to make a database change.

Def. appreciate the comments I need as much help as I can get to push this project into using better practices.

@haynescd haynescd merged commit f2cbcba into master Feb 6, 2025
25 of 27 checks passed
@haynescd haynescd deleted the clean-arch-poc branch February 6, 2025 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update cBioPortal Backend to utilize Clean Archictecture Principles
3 participants