
How often to run auto mask #168

Open
CJ-Wright opened this issue Sep 19, 2017 · 2 comments

Comments

@CJ-Wright
Member

I'm moving this discussion here, as the prior issue was really about implementing a feature, not an in-depth discussion of masking.
@sbillinge: I am still torn on this masking issue. I would actually like dynamic vs. static masking to be a user-selectable preference, even for non-setup runs. Speed is one issue, but so are users getting consistent results and there not being too much "magic" in the data-analysis pipeline that they can't explain in their papers. From this perspective, I am much more comfortable with a workflow where a mask is created, stored somehow, and then re-used for a "set" of measurements. I can then reproduce results exactly by using the same mask, and deliberately change the mask if I am not happy with the results.

I am not against offering dynamic masking to users as an option, but I am still uncomfortable with having it as the default.

We could give it similar behavior to dark correction: if it can't find a "fresh" mask, it generates a dynamic one.
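The fallback behavior suggested above might look roughly like this. This is only a sketch of the control flow; `find_fresh_mask` and `generate_mask` are hypothetical callables, not names from the xpdAn codebase.

```python
def get_mask(img, find_fresh_mask, generate_mask):
    """Prefer a stored "fresh" mask; fall back to generating a
    dynamic one, mirroring how dark correction falls back when no
    recent dark frame is available."""
    mask = find_fresh_mask()
    if mask is not None:
        return mask
    return generate_mask(img)
```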

@CJ-Wright
Member Author

I don't think that this is too magical:

- We are consistent about when we produce a new mask (every run of `xrun`, or potentially every non-setup run of `xrun` if this issue is accepted).
- The mask will consistently return the same statistical results (pixels whose value is more than 3 standard deviations from the mean will be masked out). This gives us some provenance: users can ask why a given pixel was removed, and we can answer "because it was x standard deviations away from the mean" rather than just "because we masked it on a previous image".
- We always write the mask to disk with a file name corresponding to the data, allowing users to reuse the mask as needed and reproduce their exact results. Users can then take their data home and either use that mask, use a different auto mask, make a new one from scratch, or edit the existing one.
- Other software platforms have started to do this as well, although they don't write masks out; they just remove the pixels before integration (see pyFAI's sigma clipping).
- There is a paper describing how all this works, although I don't think the ideal mask-generating sample has been explored or published. I imagine that users' papers (if they choose to use the auto mask) would have a line saying: "A mask was produced using the automask procedure [citation] with a standard deviation threshold of 3". If the users then provided the data, others could set their masking to the exact same parameters and fully reproduce the mask, which is simpler than trying to recreate a user-defined mask.
- To some extent I would say that *not* using automated masking, or running with a previously produced mask, is more magical (sometimes things will work, sometimes they won't, and it won't be obvious why). For some images the previous mask will produce data with all of the "3 standard deviations or greater" pixels removed, but for others it will not, since the scattering is different.
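The statistical rule described above can be sketched in a few lines. This is a simplified global version assuming a 2-D NumPy image; the real pipeline may clip per integration ring, and `n_std` is a hypothetical parameter name, not xpdAn's actual API.

```python
import numpy as np

def auto_mask(img, n_std=3.0):
    """Return a boolean mask that is True for pixels to keep and
    False for pixels more than `n_std` standard deviations from
    the image mean (the outliers get masked out)."""
    mean = img.mean()
    std = img.std()
    return np.abs(img - mean) <= n_std * std
```

Because the rule is just two statistics plus a threshold, a paper citing the procedure and the threshold is enough for someone else to regenerate the identical mask from the raw image.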

I understand the time-based concerns with setup scans; they exist to check whether the data is reasonable before taking more "production"-quality shots.

Making the mask generation dynamic could be rather difficult to implement, since once the data-processing pipeline is running there is currently no way to change its parameters (although we may be able to put something into the experimental metadata?).

@CJ-Wright
Member Author

In discussion with @sbillinge:

- We should check how this affects the data, especially if the sample changes during a long scan.
- There may also be some middle ground where we check the statistical values of the pixels under the existing mask and decide whether the mask is still valid.
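The middle-ground check above could be sketched as follows: rather than regenerating the mask on every shot, test whether the stored mask still removes all of the outliers on the new image. This is an assumption about what "still valid" means, with hypothetical names; it is not code from the pipeline.

```python
import numpy as np

def mask_still_valid(img, mask, n_std=3.0):
    """Return True if no unmasked pixel of `img` lies more than
    `n_std` standard deviations from the mean of the unmasked
    pixels; if any does, the stored mask is stale and should be
    regenerated."""
    vals = img[mask]
    return bool(np.all(np.abs(vals - vals.mean()) <= n_std * vals.std()))
```

If the check passes, the stored mask can be reused (preserving exact reproducibility); if it fails, the pipeline would fall back to generating a fresh auto mask.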
