-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add chunked_filter (#344) #346
base: master
Are you sure you want to change the base?
Add chunked_filter (#344) #346
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking good, would also be good to see an extra test or two for edge cases in test_iterutils.py
. Thanks again for this!
boltons/iterutils.py
Outdated
@@ -374,6 +374,30 @@ def chunked_iter(src, size, **kw): | |||
return | |||
|
|||
|
|||
def chunked_filter(iterable, predicate, chunk_size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see a docstring here. Looking around at other functions, I noticed that most refer to the predicate as key
. Do you mind updating this argument to be consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see various names and since it's not accessing a property of passed objects, I'd find key
in this context confusing. builtins.filter uses iterable
and function
. Actually it's filter(function, iterable) (reverse order). Since it's a variation of that, how about using these names and order?
chunked_filter(function, iterable, size)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interestingly itertools.filterfalse(predicate, iterable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think the prime reference for this should be the sorted()
function, where key
is used. min()
and max()
also use key
, I think. So, tldr. key
should be preferred here. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think the prime reference for this should be the
sorted()
function, wherekey
is used.
There's a good reason why sorted and min/max use key
and filter
& filterfalse
don't. When you compare two objects, you either compare the value itself or its part, hence the key
- attribute, only shorter. When you filter a collection, it can be an arbitrary criterium that's external to the object, so you use function
(...to be called on an item) or more specifically, predicate
.
Also, since it's an extension of builtins.filter
, what's the reason for following naming pattern of any other function?
builtins.filter allows None function, in which case an identity function is used. Here it doesn't make much sense from the use case perspective, but perhaps it would from the usability perspective and for consistency? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress! I like all the edge case handling. I agree that there's no harm in allowing key
to be None
, except that then you may have to come up with a default for size
, based on the ordering. I'm fine either way, so I'll let you make the call. :)
@@ -374,26 +374,50 @@ def chunked_iter(src, size, **kw): | |||
return | |||
|
|||
|
|||
def chunked_filter(iterable, predicate, chunk_size): | |||
"""A version of :func:`filter` which will call predicate with a chunk of the iterable. | |||
def chunked_filter(iterable, predicate, size): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, based on the docstring, I think you meant to rename predicate
here to key
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure, see previous comments
[0, 2, 4, 6, 8] | ||
|
||
In the above example the lambda function is called twice: once with values | ||
0-4 and then for 5-9. | ||
|
||
Args: | ||
iterable (Iterable): Items to filter | ||
predicate (Callable): Predicate function | ||
chunk_size (int): The maximum size of chunks that will be passed the | ||
predicate (Callable): Bulk predicate function that accepts a list of items |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicate
-> key
to match description above?
|
||
allow_list = list(allow_iter) | ||
if len(allow_list) != len(src_): | ||
raise ValueError('expected the iterable from key(src) has the same length as the passed chunk of items') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"has the same" -> "to have the same"
I think it's great you're thinking about this. For exceptions, these days, I usually recommend the following format to maximize debuggability:
raise ValueError("chunked_filter expected key func to return an iterable of length {src_len}, got {allow_list_len}")
.
Similar changes could be made to other exception messages above just by adding ", not {actually_received_type}"
No description provided.