Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CHARMM files parsing and system building #23

Merged
merged 7 commits into from
Oct 20, 2022
Merged

Conversation

mdpoleto
Copy link
Contributor

@mdpoleto mdpoleto commented Sep 1, 2022

Description

CHARMM36 systems are not being built and require box_vector assignments before creating the system with psf.CreateSystem().

Todos

  • C36 systems are correctly read and built
  • New tag for box vectors in the XML input file
  • Easier solution to read toppar files (and compatible with CHARMM-GUI inputs).

Status

  • Ready to go

PeriodicBoxVectors() is already set set for CHARMM when building the system so it does not need to be set again in line 269.
@MiaoLab20
Copy link
Owner

@dyelar @lvotapka Could you help to check if Marcelo's changes work fine and merge the request? Thanks!

@mdpoleto
Copy link
Contributor Author

mdpoleto commented Oct 6, 2022

Hi @MiaoLab20 @dyelar @lvotapka do you have any update on this issue?

@MiaoLab20
Copy link
Owner

@dyelar @lvotapka Could you help to check if Marcelo's changes work fine and merge the request? Thanks!

Copy link
Collaborator

@lvotapka lvotapka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameters attribute on line 128 of config.py should probably be kept a list due to the for loop on line 135, which assumes a loop, not a string.

Copy link
Collaborator

@lvotapka lvotapka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the loop over the parameters attribute being removed? We want the users to be able to submit multiple parameters files.

Copy link
Collaborator

@lvotapka lvotapka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar issue on line 189 of the proposed changes for parser.py: the possibility of multiple parameter files is being disallowed by making this change.

@dyelar
Copy link
Collaborator

dyelar commented Oct 12, 2022

Ideally, I think that if we are going to be setting the value of switchDistance and ewaldErrorTolerance in gamdSimulation.py (lines 165-166), we would want to be able set the default values in config.py, expose an xml value for changing it from the default, and then use that config value in those places.

I'm still looking into why when I tried running the simulations with this code and the files provided in issue #22 , the simulations blew up in stage 3.

Copy link
Collaborator

@dyelar dyelar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make this a bit easier rather than just my reference in my previous comment. I didn't replicate the changes Lane already requested, since I have no additional input to his request.

return psf



Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this code should reside in the parser.py file, since you are reading and parsing these configuration files. They would set the appropriate values within a configuration object, if they existed to override whatever the defaults are. Reading and parsing configuration files doesn't really belong within with gamdSimulation.py. It should read a series of configuration values, which would then determine how to set up simulation. It's how we have the separation of concerns setup to keep the code manageable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I moved: 1) parameter file reading and parsing and 2) box vector parsing to parser.py file. I left in gamdSimulation.py only the code necessary to apply the box vectors to the psf and build the system.

gamd/gamdSimulation.py Outdated Show resolved Hide resolved
gamd/gamdSimulation.py Show resolved Hide resolved
gamd/gamdSimulation.py Outdated Show resolved Hide resolved
gamd/gamdSimulation.py Outdated Show resolved Hide resolved
gamd/config.py Show resolved Hide resolved
gamdSimulation.simulation.context.setPositions(positions.positions)
if box_vectors is not None:
if box_vectors is not None and config.input_files.charmm is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm don't I understand why this change was made yet . The box_vectors variable should either contain the necessary parameters, or it should be None indicating it doesn't need to be set. Can you reply or add a comment why we also need to check that the charm input_files is None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHARMM systems can only be built by psf.createSystem() if box vectors are assigned beforehand to the psf object (hence the code snippet at line 156) . Thus, this update of box_vectors here is not needed for CHARMM systems and might create issues setting it twice.

gamd/parser.py Outdated
for xml_params_filename in charmm_tag:
charmm_config.parameters.append(
assign_tag(xml_params_filename, str))
# parsing list of parameter files like CHARMM-GUI does
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lvotapka note that the parameter parsing used here loops over all parameter files listed in a "toppar.str" file as defined by the user. Multiple parameter files are accepted.

I believe this approach is preferable because: 1) it is commonly used in CHARMM and is compatible with CHARMM-GUI inputs for OpenMM; and 2) CHARMM commonly used +10 parameter files and listing all of them in the XML input seems unnecessary to me.

@mdpoleto
Copy link
Contributor Author

I updated the code reflect the requests you made. Particularly, parsing of parameter files and box vectors were moved to parser.py file. I am not very well versed in XML so if something can be improved, please let me know.

@mdpoleto
Copy link
Contributor Author

@dyelar and @lvotapka , I pushed the new code reflecting the changes you requested via email. More specifically:

  1. parameter files can now be read in two formats:
            <parameters>
                    <parameter type="charmm-gui-toppar">toppar.str</parameter>
                    <parameter>toppar_c36/toppar_water_ions.str</parameter>
            </parameters>

Users can now point directly to the files they want or they can provide a .str file containing a list of toppar files (the latter approach is used by CHARMM-GUI inputs for OpenMM, so users can take advantage of that).

  1. For CHARMM systems, box vectors should be explicitly provided in order to build the system object. Users now can define their box by using:
            <box-vectors>
                    <a>2.9</a>  <!-- unit.nanometers -->
                    <b>2.9</b>  <!-- unit.nanometers -->
                    <c>2.9</c>  <!-- unit.nanometers -->
                    <alpha>90</alpha>  <!-- unit.degrees # optional-->
                    <beta>90</beta>    <!-- unit.degrees # optional-->
                    <gamma>90</gamma>  <!-- unit.degrees # optional-->
            </box-vectors>

While box lengths (a, b,c) are required, box angles are optional and their default value is 90 degress (rectangular box).

Let me know if anything else should be fixed.

@lvotapka
Copy link
Collaborator

Are we ready to accept this pull request, @dyelar ?

@codecov-commenter
Copy link

Codecov Report

Merging #23 (5ab5fa4) into main (f547b8c) will decrease coverage by 0.08%.
The diff coverage is 13.88%.

Additional details and impacted files

Copy link
Collaborator

@dyelar dyelar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found something I thought was important and another area for a quick clean up change. Let me know what you think.

gamd/parser.py Outdated
Comment on lines 231 to 244
if xml_params_filename.attrib["type"] == "charmm-gui-toppar":
# parsing list of parameter files like CHARMM-GUI does
extlist = ['rtf', 'prm', 'str']
parFiles = ()
for line in open(xml_params_filename.text, 'r'):
if '!' in line: line = line.split('!')[0]
parfile = line.strip()
if len(parfile) != 0:
ext = parfile.lower().split('.')[-1]
if not ext in extlist: continue
charmm_config.parameters.append(parfile)
else:
charmm_config.parameters.append(
assign_tag(xml_params_filename, str))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you extract this into it's own method. Something like parse_and_assign_charmm_gui_toppar_file and call it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify a bit more? I could create a function on parser.py or move this to config.py.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I was thinking just add another method. Something like the following:

def parse_and_assign_charmm_gui_toppar_file(self, charmm_config, xml_params_filename):
    extlist = ['rtf', 'prm', 'str']
    parFiles = ()
    for line in open(xml_params_filename.text, 'r'):
        if '!' in line: line = line.split('!')[0]
        parfile = line.strip()
        if len(parfile) != 0:
            ext = parfile.lower().split('.')[-1]
            if not ext in extlist: continue
            charmm_config.parameters.append(parfile)
    return charmm_config

Then on line 232, just replace the code with

charmm_config = self.parse_and_assign_charmm_gui_toppar_file(charmm_config, xml_params_filename)

The thought being to try and have the methods be smaller and limited to doing a single thing as much as possible, so that they don't get to large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I thought it was better to make that parsing function a method inside CharmmConfig class (config.py).

So on line 233 I am just calling:

charmm_config.parse_charmm_gui_toppar(xml_params_filename)

Which calls the method (config.py, line 147):

def parse_charmm_gui_toppar(self, xml_params_filename):
        extlist = ['rtf', 'prm', 'str']
        # inputfile is the toppar.str with the list of parameter files
        for line in open(xml_params_filename.text, 'r'):
            if '!' in line: line = line.split('!')[0]
            parfile = line.strip()
            if len(parfile) != 0:
                ext = parfile.lower().split('.')[-1]
                if not ext in extlist: continue
                # Appending each file listed in inputfile to the existing 
                # list of parameters files to be read
                self.parameters.append(parfile)
        return

I wanted to make sure users can use both parsing methods in the same input (directly pointing to parameter files or listing them in a toppar.str file). I think having all methods in config.py is probably cleaner.

Comment on lines 132 to 139
boxinfo = config.input_files.charmm.box_vectors
boxlx = boxinfo["a"]
boxly = boxinfo["b"]
boxlz = boxinfo["c"]
alpha = boxinfo["alpha"]
beta = boxinfo["beta"]
gamma = boxinfo["gamma"]
psf.setBox(boxlx, boxly, boxlz, alpha, beta, gamma)
Copy link
Collaborator

@dyelar dyelar Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this code makes an assumption that tags for a, b, and c exist. Based on what I read in the parser, it is quite possible for them to not exist and we are just using a default. I assume that means that setBox wouldn't be called at all?

I was thinking that the best way to handle this might be to add the following pieces of code:

in config.py in the CharmmConfig, add the following methods and variables:

self.config_box_vector_defined = False

def get_box_vectors(self):
    errorMessage = "The box vectors were only partially defined.  " \
                   "At least a, b, and c must be present."
    if not self.is_whole_box_defined():
        raise RuntimeError(errorMessage)
    return [self.box_vectors["a"], self.box_vectors["b"],
            self.box_vectors["c"],
            self.alpha, self.beta, self.gamma]


def is_whole_box_defined(self):
    lengths = ["a", "b", "c"]
    result = True
    for length in lengths:
        result = result and length in self.box_vectors
    return result


def is_config_box_vector_defined(self):
  return self.config_box_vector_defined

in parser.py, add the following once we are in the box-vector tag:

self.config_box_vector_defined = True

Then, right here, we should be able to just change it to something like the following:

if config.input_files.charmm.is_config_box_vector_defined:
    psf.setBox(*config.input_files.charmm.get_box_vectors())

This way, if they user doesn't define a box vector, we aren't making the call. Of course, if the box vector is required for psf files, then we should throw an error instead if they didn't define it.

FYI, I have no idea if I got that code exactly correct for the calling convention, but hopefully it gets the idea across. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dyelar the box lengths are required to create the system object with psf files, so they must be defined by the user. Currently, we throw an error if any of the lengths "a", "b" or "c" are not provided (parser.py, line 222). We do, however, make an assumption that the box angles are all 90 degrees if users do not provide alpha, beta or gamma. I can either raise an error or I can raise a warning on that matter. What do you prefer?

@dyelar
Copy link
Collaborator

dyelar commented Oct 19, 2022 via email

@dyelar
Copy link
Collaborator

dyelar commented Oct 19, 2022 via email

@mdpoleto
Copy link
Contributor Author

@dyelar I just pushed the fixes for your request. A brief explanation:

  1. regarding box vectors: I decided to throw and error if any of the box properties are not set. OpenMM does accept triclinc and orthorombic boxes, so assuming a default if not a good idea. Thus, users will have to define all box lengths ('a', 'b','c') and box angles ('alpha', 'beta', 'gamma'). Hence, expected entries in the XML should look like something like this:
<box-vectors>
        <a>2.9</a>  <!-- unit.nanometers -->
        <b>2.9</b>  <!-- unit.nanometers -->
        <c>2.9</c>  <!-- unit.nanometers -->
        <alpha>90</alpha>  <!-- unit.degrees -->
        <beta>90</beta>    <!-- unit.degrees -->
        <gamma>90</gamma>  <!-- unit.degrees -->
</box-vectors>
  1. regarding parameter file parsing: I just saw your last message and I left the parameter file parser in parser.py as you requested. Now, users can define their parameter files in two ways: 1) directly pointing to a parameter file; or 2) pointing to a file containing a list of parameter files to be read (like CHARMM-GUI inputs are parsed via toppar.str file). Users can use both parsing methods by using something like below:
<parameters>
        <parameter type="charmm-gui-toppar">toppar.str</parameter>
        <parameter>toppar_c36/top_all36_na.rtf</parameter>
        <parameter>toppar_c36/par_all36_na.prm</parameter>
        <parameter>toppar_c36/toppar_water_ions.str</parameter>
</parameters>

Note that in the example above, toppar.str file contains:

toppar_c36/top_all36_prot.rtf
toppar_c36/par_all36m_prot.prm

If you have any other comments or suggestions, feel free to make them.

@dyelar
Copy link
Collaborator

dyelar commented Oct 19, 2022 via email

@mdpoleto
Copy link
Contributor Author

@dyelar sorry for the delay. I agree with you. It is safer to require all the inputs and to not define any defaults. Could you merge and make the changes that you seem fit? I won't be able to make the changes in the next few days.

@dyelar dyelar merged commit 6fba16a into MiaoLab20:main Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants