
Added new features to the ndcube._add_ method #794

Open: wants to merge 24 commits into base: main

PCJY
Contributor

@PCJY PCJY commented Dec 11, 2024

PR Description

ndcube/ndcube.py Outdated
Comment on lines 930 to 931
# addition
new_data = self.data + value_data
Member

The addition should be done as part of the masked array addition. You've already done this below; you just need to extract the summed data from the result as well as the mask.

ndcube/ndcube.py Outdated
Comment on lines 950 to 952
return self._new_instance(
data=new_data, uncertainty=new_uncertainty, mask=new_mask
)
Member

Instead of having a separate return here for the NDData case, I think we should build a dictionary of kwargs that we can pass to self._new_instance. So, you can create an empty kwargs dictionary at the start of the method, and add the new data, uncertainty, etc. in the relevant place, e.g.

kwargs["uncertainty"] = new_uncertainty

Then the final line of the method would become

return self._new_instance(**kwargs)
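
To illustrate the kwargs pattern suggested above, here is a minimal sketch. ToyCube is a hypothetical stand-in written for this example, not the real NDCube, and its _new_instance is assumed to fall back on the existing attributes for anything not supplied. Uncertainties are combined in quadrature purely as a placeholder for the real propagation.

```python
import math


class ToyCube:
    """Hypothetical stand-in for NDCube, for illustration only."""

    def __init__(self, data, uncertainty=None, mask=None):
        self.data = data
        self.uncertainty = uncertainty
        self.mask = mask

    def _new_instance(self, **kwargs):
        # Assumption: anything not passed in is copied from this instance.
        params = {"data": self.data, "uncertainty": self.uncertainty, "mask": self.mask}
        params.update(kwargs)
        return type(self)(**params)

    def __add__(self, value):
        kwargs = {}  # filled in by whichever branch applies
        if isinstance(value, ToyCube):
            kwargs["data"] = [a + b for a, b in zip(self.data, value.data)]
            if self.uncertainty is not None and value.uncertainty is not None:
                # Placeholder propagation: uncorrelated quadrature sum.
                kwargs["uncertainty"] = [
                    math.hypot(a, b) for a, b in zip(self.uncertainty, value.uncertainty)
                ]
        else:
            # Scalar case: only the data changes.
            kwargs["data"] = [a + value for a in self.data]
        # Single return at the end of the method, shared by every branch.
        return self._new_instance(**kwargs)
```

With this shape, each branch only records what it changes, and the single return at the end handles every case.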

Member

Let me know if this doesn't make sense

@nabobalis nabobalis added this to the 2.4.0 milestone Dec 18, 2024
ndcube/ndcube.py Outdated
  if self.uncertainty is not None and value.uncertainty is not None:
      new_uncertainty = self.uncertainty.propagate(
-         np.add, value.uncertainty, correlation=0
+         np.add, value.uncertainty, result_data = value.data, correlation=0
Member

The result_data needs to be the result of the operation. So, assuming you move the masked-array addition of the data to before the uncertainty propagation, you could do:

Suggested change
- np.add, value.uncertainty, result_data = value.data, correlation=0
+ np.add, value.uncertainty, result_data = kwargs["data"], correlation=0

ndcube/ndcube.py Outdated
Comment on lines 1061 to 1070
# combine mask
self_ma = np.ma.MaskedArray(self.data, mask=self.mask)
value_ma = np.ma.MaskedArray(value_data, mask=value.mask)

# addition
result_ma = self_ma + value_ma
new_mask = result_ma.mask

# extract new mask and new data
kwargs["mask"] = result_ma.mask
kwargs["data"] = result_ma.data
Member

As mentioned in the comment above, I think it makes sense to do this before the uncertainty propagation so you can use the kwargs["data"] value in that propagation.

ndcube/ndcube.py Outdated
kwargs["data"] = result_ma.data

# return the new NDCube instance
return self._new_instance(**kwargs)
Member

Move this line to the end of the method and use the kwargs approach when handling the other cases, e.g. Quantity. So, for example, L1082 would become:

kwargs["data"] = self.data + value.to_value(cube_unit)

@DanRyanIrish
Member

A changelog file needs to be added.

And your branch needs to be updated with the latest version of main.

@PCJY
Contributor Author

PCJY commented Jan 18, 2025

Hi @DanRyanIrish, as we have discussed in our project meetings, below are the issues we encountered and may need further discussions with others in the community:

The issue is mainly around how NumPy handles masks when adding two numpy.ma.MaskedArray objects.
We think the expected outcome for an addition should be: the sum of any value that is not masked by its individual mask.
E.g. [1] ([T]) + [2] ([F]) = [2].

However, from experimentation, it can be seen that
NumPy returns in this way:
[1] ([T]) + [2] ([F]) = [1].
[Image: Screenshot 2025-01-18 220004]

I find this confusing because even if it does combine the mask and then apply it on the result, it should be:
[1] ([T]) + [2] ([F]) = [-].

Please correct me if there is anything wrong in my understanding.

@PCJY
Contributor Author

PCJY commented Jan 18, 2025

@DanRyanIrish, Secondly, we also encountered some issues around the propagate method:
it ignores the masks of the objects that are passed in, and still takes into account the uncertainties of the masked elements when it should not.
Following your guidance and suggestions, we are currently working around this by setting the entries of the uncertainty array that should be masked to 0 before the array is passed to the propagate method.
A clearer example of the issue was implemented as shown in the code below with the screenshot of its output attached.

from ndcube import NDCube
import numpy as np
from astropy.nddata import StdDevUncertainty
from astropy.wcs import WCS

data = np.array([[1, 2], [3, 4]])  
uncertainty = StdDevUncertainty(np.array([[0.1, 0.2], [0.3, 0.4]])) 
mask = np.array([[False, True], [False, False]])  
wcs1 = WCS(naxis=2) 
wcs1.wcs.ctype = ["HPLT-TAN", "HPLN-TAN"]

cube = NDCube(data, wcs=wcs1, uncertainty=uncertainty, mask=mask)
print(cube)

def add_operation(cube1, cube2):
    """
    Example function to add two data arrays with uncertainty propagation.
    """
    result_data = cube1.data + cube2.data 
    # Propagate the uncertainties using the NDCube objects
    propagated_uncertainty = cube1.uncertainty.propagate(
        np.add, cube2, result_data=result_data, correlation = 0
    )
    return result_data, propagated_uncertainty

# adding the cube to itself
result_data, propagated_uncertainty = add_operation(cube, cube)

print("Original Data:\n", cube.data)
print("Original Uncertainty:\n", cube.uncertainty.array)
print("Result Data (after addition):\n", result_data)
print("Propagated Uncertainty:\n", propagated_uncertainty.array)

[Image: Screenshot 2025-01-18 222146]
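
The workaround described above (zeroing the uncertainty of masked entries before propagation) can be sketched with plain NumPy. This is only an illustration under stated assumptions: quadrature_sum_unmasked is a hypothetical helper, and the quadrature sum stands in for what propagate does for addition with correlation=0; it does not call astropy.

```python
import numpy as np


def quadrature_sum_unmasked(unc1, mask1, unc2, mask2):
    """Hypothetical helper: propagate std-dev uncertainties for an addition,
    treating masked entries as contributing zero uncertainty."""
    u1 = np.where(mask1, 0.0, unc1)  # zero out masked entries of the first operand
    u2 = np.where(mask2, 0.0, unc2)  # zero out masked entries of the second operand
    # Uncorrelated addition: uncertainties add in quadrature.
    return np.sqrt(u1**2 + u2**2)


# Same uncertainty and mask as in the example above, cube added to itself.
uncertainty = np.array([[0.1, 0.2], [0.3, 0.4]])
mask = np.array([[False, True], [False, False]])
propagated = quadrature_sum_unmasked(uncertainty, mask, uncertainty, mask)
print(propagated)
```

In this sketch the masked pixel ends up with zero propagated uncertainty, while the unmasked pixels scale by the square root of two.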

@DanRyanIrish
Member

DanRyanIrish commented Jan 20, 2025

Hi @PCJY. I think the first thing we need to do is decide what behaviour we want to implement in the case where at least one of the NDCube and NDData has a mask. I think we need to get some feedback from other users on this decision. I propose the following scheme (@Cadair, thoughts on this?):

Firstly, if an object has no mask, that is equivalent to all pixels being unmasked.
Secondly, for a given pixel in both objects:

  1. If both are unmasked, the resultant
    i. data value is the sum of both pixels
    ii. mask value is False
    iii. uncertainty value is the propagation of the two uncertainties. If one or other object doesn't have uncertainty, the uncertainty of that component is assumed to be 0.
  2. If it is masked in one object, but not the other, the resultant:
    i. data value is equal to the unmasked value
    ii. mask value is False
    iii. uncertainty value is the same as the unmasked pixel
  3. If both pixels are masked, this is where it gets ambiguous. I propose, in order to remain consistent with the above:
    i. The operation is not performed and the data, mask and uncertainty values remain the same as the left-hand operand, i.e. the NDCube.

Alternatives for parts of the scheme could include:

  2. If it is masked in one object, but not the other, the resultant:
    ii. mask value is True.
  3. If both pixels are masked:
    i. The operation IS performed as normal but the mask value is True.
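
As a rough sketch of how the primary scheme (cases 1 to 3, not the alternatives) might look on plain arrays, the following uses NumPy only. add_with_masks is a hypothetical helper, not part of ndcube, and it assumes uncorrelated standard-deviation uncertainties that combine in quadrature.

```python
import numpy as np


def add_with_masks(data1, mask1, unc1, data2, mask2, unc2):
    """Hypothetical helper: add two arrays following the three-case scheme."""
    data1, data2 = np.asarray(data1, dtype=float), np.asarray(data2, dtype=float)
    unc1, unc2 = np.asarray(unc1, dtype=float), np.asarray(unc2, dtype=float)
    mask1, mask2 = np.asarray(mask1, dtype=bool), np.asarray(mask2, dtype=bool)

    both_unmasked = ~mask1 & ~mask2
    only_right_unmasked = mask1 & ~mask2

    # Start from the left-hand operand. This already covers case 2 when only
    # the left pixel is unmasked, and case 3 (both masked, nothing is done).
    data = data1.copy()
    unc = unc1.copy()

    # Case 1: both unmasked, so sum the data and propagate in quadrature.
    data[both_unmasked] = data1[both_unmasked] + data2[both_unmasked]
    unc[both_unmasked] = np.hypot(unc1[both_unmasked], unc2[both_unmasked])

    # Case 2, other half: only the right pixel is unmasked, so take its values.
    data[only_right_unmasked] = data2[only_right_unmasked]
    unc[only_right_unmasked] = unc2[only_right_unmasked]

    # Only pixels masked in both operands remain masked.
    mask = mask1 & mask2
    return data, mask, unc
```

Under the alternatives, only the final mask line would change for case 2, and case 3 would sum the data while keeping the mask True.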

Once we agree on a scheme, the way forward on your uncertainty questions will become clear.

@Cadair, what are your thoughts on this scheme? I also think we should bring this up at the sunpy weekly meeting to get other thoughts.

@DanRyanIrish
Member

I find this confusing because even if it does combine the mask and then apply it on the result, it should be: [1] ([T]) + [2] ([F]) = [-].

This is where I also find numpy masked arrays counter-intuitive. However, the logic is as follows:

  • If one pixel is masked, retain the data of the left-hand operand and set the mask value to True.
    Notice that the order of the operands matters. Because you did [1] ([T]) + [2] ([F]), the result is [1] ([T]), which is displayed as [--]. I would expect that if you did the operation the other way around ([2] ([F]) + [1] ([T])), the result would be [2] ([T]).

Notice that this is not the same as the scheme I've proposed in my previous comment, in part because it's confusing, as you've found.
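
The operand-order behaviour described above can be checked with a short experiment. This is a minimal sketch; the variable names are illustrative, and the behaviour noted in the comments is what current NumPy releases do as far as I am aware, so it is worth re-running against the NumPy version in use.

```python
import numpy as np

left = np.ma.MaskedArray([1], mask=[True])    # [1] ([T])
right = np.ma.MaskedArray([2], mask=[False])  # [2] ([F])

forward = left + right    # masked operand on the left
backward = right + left   # masked operand on the right

# The printed array hides masked values, so inspect .data and .mask directly.
print(forward, forward.data, forward.mask)
print(backward, backward.data, backward.mask)
```

Looking at .data and .mask separately, rather than at the printed array, makes it clear that the underlying value at a masked position is retained from the left-hand operand rather than summed.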

@DanRyanIrish
Member

DanRyanIrish commented Jan 20, 2025

@PCJY, until we agree on a way forward with the mask, you should proceed by implementing the case in which neither object has a mask. So there is no need for masked arrays.

@Cadair
Member

Cadair commented Jan 21, 2025

I propose the following scheme

I haven't thought too much about each of these individual cases, but the fact that there is a list is enough to make me think we probably need a way for the user to choose. This is obviously not possible with the __add__ operator, so we would need to have an NDCube.add method (and presumably subtract) which accepted kwargs.

Is this also not a problem for other operators as well?

@DanRyanIrish
Member

I think this is a good idea. As well as add and subtract, I think we would also need multiply and divide methods.

As far as I can see, this ambiguity only arises when there are masks involved. So we could still implement the dunder methods as wrappers around the above methods, but have them raise an error/return NotImplemented if the non-NDCube operand has a mask, and require that users call the NDCube.add method instead.
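
A minimal sketch of that wrapper idea, assuming a hypothetical ToyCube class and a hypothetical mask_rule keyword (neither exists in ndcube), could look like this. The data are simply summed here; only the mask handling and the dunder wrapper are the point of the example.

```python
class ToyCube:
    """Hypothetical stand-in for NDCube, for illustration only."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        self.mask = None if mask is None else list(mask)

    def add(self, value, mask_rule="or"):
        """Explicit method: the caller chooses how masks are combined."""
        n = len(self.data)
        other_data = value.data if isinstance(value, ToyCube) else [value] * n
        other_mask = getattr(value, "mask", None)
        # A missing mask is treated as all pixels unmasked.
        m1 = [False] * n if self.mask is None else self.mask
        m2 = [False] * n if other_mask is None else other_mask
        if mask_rule == "or":
            mask = [a or b for a, b in zip(m1, m2)]
        elif mask_rule == "and":
            mask = [a and b for a, b in zip(m1, m2)]
        else:
            raise ValueError(f"Unknown mask_rule: {mask_rule!r}")
        data = [a + b for a, b in zip(self.data, other_data)]
        return ToyCube(data, mask)

    def __add__(self, value):
        # The operator cannot take keyword arguments, so refuse the
        # ambiguous case and point users at the explicit method.
        if getattr(value, "mask", None) is not None:
            raise TypeError("Operand has a mask; call ToyCube.add and choose a mask_rule.")
        return self.add(value)
```

Raising TypeError is used here for simplicity. Returning NotImplemented would instead let Python try the other operand's reflected method before raising.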

@Cadair
Member

Cadair commented Jan 21, 2025

I am not sure how I feel about adding methods to the API for every arithmetic operation. How about functions instead? import ndcube; ndcube.add(cube1, cube2)?

@DanRyanIrish
Member

I am not sure how I feel about adding methods to the API for every arithmetic operation. How about functions instead? import ndcube; ndcube.add(cube1, cube2)?

I'm open to considering that. It might go well in the analysis subpackage I've proposed. I can also see an argument for making it a method, as the first argument must always be the NDCube and the operations are valid for any cube. It needs more thought.

Either way, a method can be converted to a function and vice versa quite easily. So for now, I think we should proceed with an NDCube.add method until a longer-term design decision is made. We should discuss this on the sunpy call.


4 participants