Non-deterministic segmentation faults when reading LAMMPS dump files using Julia bindings #37
Hi, and thanks for the report! I was able to get it to segfault once over ~200 runs, so I could not get much more information. Could you also share your OS & Julia version? Did you try checking that all values in |
Thanks for looking into this! I'm on macOS Mojave (10.14), running Julia 1.2.0. I included a check for all of
and I still get segfaults, but not always (happened on the third time I ran it). If it's helpful, I wrote the analogous program using the C++ API and everything runs okay (ran 200+ times without a segfault):
So it seems like the issue lies specifically in the Julia bindings. I'm happy to help track down this bug but I'm not too experienced with Julia or the chemfiles codebase - if you could point me in the right direction I can try my hand at some debugging. |
Hey, that's very interesting. I can't reproduce crashes, but the second loop generates basically random values for the first frame (looks suspiciously like uninitialized memory). This seems to be a bug in Julia, because it does not happen with the current nightly (or at least it is really rare). I have no clue about Julia, so I don't know what might have changed :( Replacing the loop in the example below with
minimal example
for i in {1..10}; do ./julia-1.2.0/bin/julia test.jl; done
test.jl:
using Chemfiles
top_dir = "electrode-sim-files"
traj_fname = joinpath(top_dir, "dumptest.lammpstrj")
topo_fname = joinpath(top_dir, "dumptest.data")
traj = Trajectory(traj_fname, 'r', "LAMMPS")
topo_traj = Trajectory(topo_fname, 'r', "LAMMPS Data")
topo_frame = read(topo_traj)
selstr = "not type == \"16\" or type == \"17\""
sel = Selection(selstr)
for ix in 1:1
    frame = read_step(traj, ix - 1)
    pos = positions(frame)
    indices = evaluate(sel, frame)
    println(sum(pos[:, indices .+ 1]))
end
close(traj) |
Interesting. I upgraded to Julia 1.4.2, and I ran It seems that I can only get segfaults in the Julia REPL when running the script using I thought maybe this was due to some weird name collision issue, so I wrapped everything in a function like the example script below. No dice: the script is successful when run at the command line, but segfaults within the first 10 clean.jl
Julia version info
|
Moved the issue to the Julia bindings repo =) I've also only seen the segfault with Julia 1.0, and not with the 1.5 nightly.
This looks like an issue with finalizers not being run. I found that running the code in the REPL makes it more likely to hit this kind of issue. To test this, you can add explicit calls to We already have to copy chemfiles pointers a lot (see I'll try to write up a longer explanation tomorrow =) |
Adding a call to |
More debugging information: if I comment the
That's great! I don't think having to manually free memory is a good experience for Julia developers though, so finding the root cause of this would be nice. Unfortunately, there are multiple things interacting here, and any of them could be wrong. What seems to be wrong is the Lines 52 to 57 in 5cc928b
The function calls The issue does not seem to be that simple here: I've added a bit of logging to the allocation/deallocation functions, and it seems to be de-allocated at the right time.
That's much appreciated =) In addition to this bug, we also have easier issues that need a bit of love, in particular #26! Completely unrelated side note, but your selection is a bit strange. Did you mean to use |
Re: the selection string, I did think that was a little peculiar. Here's an example of the behavior I'm getting from the selections on my system. My total system has 7426 atoms. 396 atoms are type 16, and another 396 are type 17, and I want to craft a selection that excludes both atoms of type 17 and type 16. Some basic sanity checks first:
That all seems fine. Now I can use your recommended selection, and also use a selection that distributes the not operator using De Morgan's law. These should agree, and indeed they do.
Oddly enough though, my original selection also gets the exact same result:
I think this seems to suggest that in my original selection string, the |
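To make the two possible readings of the selection concrete, here is a small sketch in plain Julia that does not use Chemfiles at all. The `types` vector is synthetic: it only reproduces the atom counts quoted above (7426 atoms, 396 of type 16, 396 of type 17), and the placeholder type 1 for all other atoms is an assumption. It compares the count when `not` covers the whole `or` expression with the count when `not` binds only to the first comparison.

```julia
# Synthetic atom types matching the counts quoted above (not the real system):
# 396 atoms of type 16, 396 of type 17, the rest given a placeholder type 1.
types = vcat(fill(16, 396), fill(17, 396), fill(1, 7426 - 2 * 396))

is16 = types .== 16
is17 = types .== 17

# Reading 1: `not` applies to the whole `or` expression.
not_whole = .!(is16 .| is17)
# The same set, rewritten with De Morgan's law.
de_morgan = (.!is16) .& (.!is17)
# Reading 2: `not` binds only to the first comparison.
not_first = (.!is16) .| is17

println(count(not_whole))  # prints 6634
println(count(de_morgan))  # prints 6634
println(count(not_first))  # prints 7030
```

Since the two readings give different counts for these numbers, comparing the size of a selection's result against them is one way to tell which reading a selection parser applies.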
Wow, very nice investigation! Intuitively, I would want |
Running the script in valgrind (
So my best guess is that Julia before 1.3/1.4 runs the GC a bit too aggressively, and tries to remove the
Adding an extra use of the frame after the memory access makes the segfault go away and valgrind happy.
for ix in 1:length(traj)
    frame = read_step(traj, ix - 1)
    pos = positions(frame)
    indices = evaluate(sel, frame)
    println(sum(pos[:, indices .+ 1]))
    println("$(size(frame))")
end
Disabling GC during the loop body does the same:
for ix in 1:length(traj)
    Base.GC.enable(false)
    frame = read_step(traj, ix - 1)
    pos = positions(frame)
    indices = evaluate(sel, frame)
    println(sum(pos[:, indices .+ 1]))
    Base.GC.enable(true)
end
Overall, it seems to me that we cannot do much here. The issue seems to be a Julia bug, which was fixed somewhere in 1.3 or 1.4. For this reason, I would tend to mark this bug as "wontfix" and point everyone to Julia >= 1.4. Is this a possibility in your case @amlimaye? |
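For reference, Julia's `GC.@preserve` macro expresses the "keep the owner alive while its memory is being read" idea directly, without an artificial extra use of the object and without switching the GC off. The sketch below is self-contained and does not use Chemfiles: `Owner`, `borrow` and `sum_borrowed` are hypothetical names standing in for a frame, for `positions(frame)` and for the loop body. It assumes, as the discussion above suggests but does not confirm, that the positions array borrows memory owned by the frame.

```julia
# Hypothetical stand-in for a frame: owns a C buffer and frees it in a finalizer.
mutable struct Owner
    ptr::Ptr{Float64}
    len::Int
    function Owner(len::Integer)
        ptr = convert(Ptr{Float64}, Libc.malloc(len * sizeof(Float64)))
        owner = new(ptr, len)
        # Release the buffer once the GC collects the owner.
        finalizer(o -> Libc.free(o.ptr), owner)
        return owner
    end
end

# Hypothetical stand-in for positions(frame): an array borrowing the owner's memory.
borrow(o::Owner) = unsafe_wrap(Array, o.ptr, o.len; own = false)

function sum_borrowed(len)
    owner = Owner(len)
    data = borrow(owner)
    # Keep `owner` rooted while `data` is written and read, so its finalizer
    # cannot free the buffer in the middle of the access.
    total = GC.@preserve owner begin
        fill!(data, 1.0)
        sum(data)
    end
    return total
end

println(sum_borrowed(10))  # prints 10.0
```

In a script like the ones above, the analogous change would be wrapping the indexing of `pos` in `GC.@preserve frame begin ... end`. Whether that is enough for this particular crash is untested here.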
Yeah, I can definitely bump my Julia version, but the issue seems to persist for me when running with Julia 1.4.2. Here's the script:
And results:
If I add an extra use of
Which version of Julia are you using for the results you posted above? Might need to change your last statement to julia >= 1.5. |
I spoke too soon ... I do also get the segfault with Julia 1.5, so we'll have to do something =) |
Hi,
I'm using the Julia bindings to Chemfiles to read a LAMMPS dump file (file extension .lammpstrj). The topology information is stored in a LAMMPS data file (file extension .data). For some reason, when I index the positions read out from the LAMMPS dump using indices provided by evaluating a selection on the frame, I reliably but non-deterministically get segmentation faults that crash the Julia REPL. Here's a minimal working example with Julia Chemfiles 0.9.3:
Here's the stack trace I get from the segfault; it appears to be somewhere in the getindex function for the positions array:
Could you please provide some advice on what might be going wrong here? The code above does not always segfault, but in the few times I've tried it, usually segfaults within the first 2-3 times running it. I'm attaching the offending LAMMPS dump and data files here as well. Thanks in advance for any help you can provide!
files.zip