Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pass through attributes when creating metrics, dimensions, identifiers #18

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

danfrankj
Copy link
Collaborator

@danfrankj danfrankj commented Aug 7, 2018

Summary

When configuring measures, I need to be able to configure additional properties, especially transforms. This change passes additional configuration up the chain.

e,g,

image

enables answering questions like ...

mensor_registry.evaluate('user', measures=['transfer/traders'], segment_by=['transfer/ds'])

Testing Done

Manual

@@ -446,10 +448,7 @@ def transforms(self):
@transforms.setter
def transforms(self, transforms):
# TODO: Check structure of transforms dict
if not transforms:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nonsubstantive, but I do like oneliners

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye... I like this better too. I think I left it uncollapsed because I was expecting to do more logic than would be elegant in a one-liner. For now, this works well.

@@ -380,7 +380,9 @@ def __init__(self, name, unit_type=None, via=None, external=False, private=False
if self.ALLOW_ALL_ATTRIBUTES or attr in self.EXTRA_ATTRIBUTES:
setattr(self, attr, value)
else:
raise KeyError("No such attribute {}.".format(attr))
raise AttributeError(
"Cannot initialize {}<{}> with attribute {}.".format(self.__class__.__name__, self.name, attr)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you quote the attribute name in single quotes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@matthewwardrop
Copy link
Owner

matthewwardrop commented Aug 7, 2018

I don't know how I feel about passing through all extra keyword arguments like this. There are some attributes which have been intentionally masked by the subclass constructors. Doing this re-opens direct access to these attributes, but does on the other hand simplify setting them. Most of these attributes are definitely not supposed to be touched.

The transforms were intentionally not made something that measure providers could set, because that can lead to unintuitive and confusing behaviour. Can you give me some examples of where it is necessary?

@danfrankj danfrankj force-pushed the df_configure_transforms branch from ef3ba16 to 8009d35 Compare August 30, 2018 20:52
@@ -197,15 +198,14 @@ class SQLMeasureProvider(MeasureProvider):

@classmethod
def _on_registered(cls, key):
for agg in ['sum', 'mean', 'sos', 'count']:
for agg in ['sum', 'mean', 'sos', 'count', '1']:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps rename to "any"?

def __init__(self, name, expr=None, default=None, desc=None, shared=False, partition=False, requires_constraint=False, provider=None):
_ProvidedFeature.__init__(self, name, expr=expr, default=default, desc=desc, shared=shared, provider=provider)
def __init__(self, name, expr=None, default=None, desc=None, shared=False, partition=False, requires_constraint=False, provider=None, **attrs):
_ProvidedFeature.__init__(self, name, expr=expr, default=default, desc=desc, shared=shared, provider=provider, **attrs)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, perhaps we should be more explicit with which attrs get sent through

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@danfrankj danfrankj force-pushed the df_configure_transforms branch from 8009d35 to 5a33132 Compare August 30, 2018 21:01
if isinstance(self.transforms, dict):
transforms.update(self.transforms.get('_default', {}))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default transforms for all unit types?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remain unconvinced, I'm afraid, that this is a good idea, as it leads to confusing (and usually wrong) behaviour :/. Suppose we chose count as the default aggregation for some measure. That might make sense the first time it is aggregated, but on subsequent aggregations (which may occur due to multiple rebase operations, for example), you would be taking the count of counts at each aggregations, which would make the resulting measure likely meaningless.

@matthewwardrop
Copy link
Owner

@danfrankj I still need to think about this a bit more deeply. It seems important to me that MeasureProviders not be able to influence the aggregation of the metric beyond aggregations that occur over the MeasureProvider in question. Anything else seems to break the natural and intuitive composition of arbitrary MeasureProviders, each ignorant of what the others provide. And yet... and yet... I can see why something like this is valuable. I'm going to be working pretty regularly on Mensor in the evening over the next couple of weeks (while at home on Dad duty), cleaning things up and implementing a few more higher-level components (such as a revised Metric and stats layer). As I do that, I'll keep this use-case in mind and see how much of a concession it actually is.

One alternative approach to that pursued here is to encourage things like this to be implemented as metrics, rather than measures. Is there a reason why this would not work well in this case?

@matthewwardrop
Copy link
Owner

@danfrankj We've spoken about this a little through other channels, but just to keep it all in one place:

I have given some more thought to the aggregation configuration issue. Are there any cases that you can think of where it makes sense to change the aggregation past the first aggregation of a given measure?

I'm thinking of adding support at the provider level for specifying what the next aggregation will be; perhaps via something like mp.provides_measure(..., next_agg='sum', next_post_agg=None). Would that deal with your use cases?

If we allow further configuration, it seems to me that there would be an explosion of complexity, not just in building the providers, but in understanding the behaviour of new providers. Measures can be aggregated over different unit types, potentially multiple times, and if a measure provider can affect downstream aggregations beyond the first aggregation, then a lot of weird (and probably incorrect) operations will occur.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants