Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Chemical System #616

Open
wants to merge 62 commits into
base: protos
Choose a base branch
from

Conversation

MBartkowiakSTFC
Copy link
Collaborator

@MBartkowiakSTFC MBartkowiakSTFC commented Nov 29, 2024

Description of work
Most of the classes derived from ChemicalEntity were only used for a very specific case of a trajectory created by GROMACS with a PDB file describing the topology. Since MDANSE is moving to MDAnalysis for loading GROMACS trajectories, the selection mechanism based on fixed databases needs to be replaced with a more general solution. This is provided by rdkit and substructure matching. Since rdkit introduces its own indexing of atoms, not guaranteed to be consistent with the internal indexing of the ChemicalSystem class, changes are necessary to improve the consistency of the atom indexing.

closes #545

Fixes

  1. ChemicalEntity and all derived classes have been removed from the code.
  2. The only database remaining in the Chemistry section is the atoms database.
  3. Atom indexing is now fixed and matches the indexing in the coordinate array.
  4. New ChemicalSystem implementation stores lists of indices.
  5. ChemicalSystem does not have a "configuration" attribute anymore. Configuration is accessed only via the Trajectory class.
  6. A backup ChemicalSystem loader allows the users to load old trajectories. The new ChemicalSystem instance will be created in memory and populated with information from the old /chemical_system data structure.
  7. The trajectory stores a subset of the atom database, preserving the user-made changes to atom definitions and making trajectories portable even if they use artificial atoms.
  8. Atom selection works for trajectories independent of the molecular dynamics engine used to run the simulation. However, it is necessary to detect bonds in the system using the TrajectoryEditor job before it is possible to select structures.
  9. Most of the Cython extensions have been replaced with Python equivalents.

To test
All unit tests must pass.
A detailed comparison of the results between this and main branches is necessary due to the fundamental changes to the way the code operates.
Please run TrajectoryEditor with molecule detection on a trajectory and check if all the expected Atom Selection options are working correctly in the GUI.

Jobs that require the most attention:

  • CenterOfMassTrajectory
  • Voronoi
  • SolventAccessibleSurface
  • PairDistributionFunction
  • VanHoveSelf
  • VanHoveDistinct

@ChiCheng45
Copy link
Collaborator

I'm going through the code and program. I'll keep posting any issues I find.

I found a bonding issue in the 3D viewer, bonds are being created between the dummy atoms and non-dummy atoms.

image

@ChiCheng45
Copy link
Collaborator

ChiCheng45 commented Dec 9, 2024

I've been testing the solvent SolventAccessibleSurface job. According to the MDTraj documentation https://mdtraj.org/1.9.4/examples/solvent-accessible-surface-area.html they also use the method by Shrake and Rupley. Here I run the SolventAccessibleSurface with a trajectory from SlimMD https://www.ccpbiosim.ac.uk/slim-md/slimmd-database/proteins/cyclosporin-a-37ms.

Here are the results for the first 1000 time steps.

Protos.
image

This Branch.
image

MDTraj.
image

Concentrating on the timestep at 550 (top) and 670 (the peak, bottom) we can see the molecule geometry change.
Animation54
Animation55

At 670 the molecule is in a more open geometry which agrees with the increase in the solvent accessible surface in MDTraj and this branch. Looks like there were some issues with the implementation of the SolventAccessibleSurface on protos.

Copy link
Collaborator

@ChiCheng45 ChiCheng45 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the extension are all gone we can also delete the MDANSE/Src/MDANSE/Extensions directory.

Copy link
Collaborator

@ChiCheng45 ChiCheng45 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested the pair distribution function and the self and distinct part of the van Hove function. Here are the timing with the dftb water trajectory, I think we should try to speed up those numpy functions some more if possible.

Before
pair distribution function: 0.7s
van Hove function self: 37s
van Hove function distinct: 20s

After
pair distribution function: 4s
van Hove function self: 33s
van Hove function distinct: 1m 55s

The self part of the van Hove function looks good but I see differences for the distinct part. Looks like the intra and intermolecular parts haven't been split up.

Here is the the inter and intra PDF for the dftb water trajectory with this branch.
image

Here is the one on protos.
image

It looks like the bonding information got lost when the DFTB trajectory was converted.

@ChiCheng45
Copy link
Collaborator

I've found one issue with the forcite nylon66_rho100_500K_v300K trajectory. The trajectory converts without any issues but there are some errors when it is loaded up in the 3D viewer. Apparently, there are inf or nan values in the coordinates.

    tree = KDTree(rs)
           ^^^^^^^^^^
  File "_ckdtree.pyx", line 564, in scipy.spatial._ckdtree.cKDTree.__init__
ValueError: data must be finite, check for nan or inf values

@ChiCheng45
Copy link
Collaborator

I've found one issue when I use the LAMMPS converter with the glycyl_L_alanine_charmm trajectory from MDANSE-Examples.

Here is a the trajectory at the 0th and 12th time step on protos.
imageimage

Here is the same trajectory on this branch.
imageimage

It looks like the atom ordering might have gotten mixed up compare, for example, the nitrogen atoms.

@ChiCheng45
Copy link
Collaborator

ChiCheng45 commented Jan 6, 2025

I found that compared to Protos, the molecule information gets lost. On Protos, when I run the DFTB converter and select a job that requires molecule information, such as the angular correlation, the molecule name is detected with the name H2O1. On this branch, the molecule information appears to have gotten lost, when I select angular correlation the molecule setting doesn't get filled.

@MBartkowiakSTFC
Copy link
Collaborator Author

I found that compared to Protos, the molecule information gets lost. On Protos, when I run the DFTB converter and select a job that requires molecule information, such as the angular correlation, the molecule name is detected with the name H2O1. On this branch, the molecule information appears to have gotten lost, when I select angular correlation the molecule setting doesn't get filled.

There is a question of what the right approach should be here. If I run TrajectoryEditor with molecule detection option, I will end up with a trajectory that has both molecules and bonds defined. Some of the converters would give me grouping of atoms into molecules, but no bonds. In the MDANSE School tutorials we had the example of DL_POLY trajectories where AgI molecules are defined, even though physically there are no chemical bonds.

The question now is if we decide to trust the MD engine information, which may be provided or not, or just consistently require the molecule detection to be performed for every trajectory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] CenterOfMassesTrajectory changes elements and density
2 participants