unsupported aggregate action while running .collect() #171

Open
lastlegion opened this issue Feb 26, 2018 · 6 comments

@lastlegion

Hi, I'm not able to run .collect(). I get the following error:
unsupported aggregate action

Digging through the source code I found there is no handler for collect. There are corresponding countToAggregation, countDistinctToAggregation etc. but no collectToAggregation. Is that the issue?
https://github.com/implydata/plywood/blob/master/src/external/utils/druidAggregationBuilder.ts#L190
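To make the suspicion above concrete, here is a minimal sketch of how a builder that dispatches on the aggregate operation can produce this kind of error when one operation has no handler. This is not Plywood's source: the `handlers` map, the `expressionToAggregation` function, the `{ op, name, field }` input shape, and the returned objects are all hypothetical and only illustrate the pattern.

```javascript
// Hypothetical sketch, NOT the actual Plywood implementation.
// Each supported aggregate operation has a handler; "collect" has none.
const handlers = new Map([
  ["count", (ex) => ({ type: "count", name: ex.name })],
  ["countDistinct", (ex) => ({ type: "cardinality", name: ex.name, fields: [ex.field] })],
]);

// Look up the handler for the operation and fail loudly if it is missing.
function expressionToAggregation(ex) {
  const handler = handlers.get(ex.op);
  if (!handler) {
    throw new Error(`unsupported aggregate action ${ex.op} (as ${ex.name})`);
  }
  return handler(ex);
}
```

Under these assumptions, calling `expressionToAggregation({ op: "collect", name: "dd2" })` throws, while `count` and `countDistinct` return an aggregation object.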

@robertervin

@lastlegion Could you post your query (make it more generic if you need to)?

You should be running collect on a DATASET type object. It will hit https://github.com/implydata/plywood/blob/master/src/expressions/baseExpression.ts#L1431 and use a CollectExpression.

I may be able to help out more if I understand your exact query.

@lastlegion
Author

lastlegion commented Feb 26, 2018

I've been trying different variants of the following query:

var ex4 = ply().apply("d", $("dataset").filter($("field2").in(0,1)))
        .apply("dd2", $("d").collect("$field2"))

or

var ex4 = ply().apply("d", $("dataset").filter($("field2").in(0,1)))
        .apply("dd", $("d").select("field1", "field2"))
        .apply("dd2", $("dd").collect("$field2"))

and I get the following error:

(node:55886) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: unsupported aggregate action $__SEGMENT__:DATASET.collect($HR:NUMBER) (as __VALUE__)

I tried replacing collect with other aggregation functions like count etc. and was able to get the right results. Thanks @robertervin for your help!

@robertervin

@lastlegion You're completely correct on this. I also ran into this issue when I tried running it.

What you want to do instead is first split by field2. This will issue a groupBy query to Druid, which is much more efficient than pulling all field2 values into memory in JavaScript and then making them distinct.

So your first query would turn into

var ex4 = ply()
    .apply("d", $("dataset")
        .filter($("field2").in(0,1))
    )
    .apply("dd2", $("d")
        .split({id: "$field2"})
        .collect("$id")
    )

Perhaps @vogievetsky may be able to shed some light on why Plywood doesn't perform this by default, or if my logic on the reasoning is correct.
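As a rough illustration of what the split-then-collect workaround computes, here is a plain JavaScript sketch that runs entirely in memory on made-up rows. It does not use Plywood or Druid: `splitBy` and `collectValues` are hypothetical helpers that only mimic the idea of grouping first (one row per distinct field2 value) and then gathering the group keys.

```javascript
// Made-up sample rows; field names follow the query in this thread.
const rows = [
  { field1: "a", field2: 0 },
  { field1: "b", field2: 1 },
  { field1: "c", field2: 1 },
  { field1: "d", field2: 2 },
];

// Equivalent of .filter($("field2").in(0,1)) for this sketch.
const filtered = rows.filter((row) => [0, 1].includes(row.field2));

// Hypothetical helper: produce one output row per distinct value of `key`,
// stored under the name `as` (similar in spirit to a group-by).
function splitBy(data, key, as) {
  const seen = new Set();
  const groups = [];
  for (const row of data) {
    if (!seen.has(row[key])) {
      seen.add(row[key]);
      groups.push({ [as]: row[key] });
    }
  }
  return groups;
}

// Hypothetical helper: gather the values of `key` from every row.
function collectValues(data, key) {
  return data.map((row) => row[key]);
}

// Grouping first means only the distinct keys are gathered.
const distinctIds = collectValues(splitBy(filtered, "field2", "id"), "id");
// Gathering from the ungrouped rows keeps duplicates.
const rawValues = collectValues(filtered, "field2");
```

In the real query the grouping would happen inside Druid rather than in the client, which is the efficiency argument made above.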

@lastlegion
Author

Thanks for your help! Yes, I'm able to get the desired output by applying .split() before .collect(). If this is a bug, I'm willing to put in a pull request or help with documenting it.

@robertervin

@lastlegion You're welcome! I'm not actually a member of Imply, so I can't accept any PRs or anything. I do think the documentation is fairly good, but it could definitely be improved.

You can try submitting a PR for it in https://github.com/implydata/plywood/blob/master/docs/expressions.md and calling out @vogievetsky (who I believe is the sole owner of this codebase).

I would appreciate it if you could close this issue as well, since it's fixed.

@lastlegion
Author

Yes, I agree the documentation is really good!
I'm not sure whether this is the desired behavior in this particular case. I'll keep the issue open for @vogievetsky to comment on.


2 participants