FEAT-#6831: Implement read_parquet_glob and to_parquet_glob #6854
Signed-off-by: Anatoly Myachev <[email protected]>
Does this PR also resolve #5723?
@@ -3187,4 +3187,4 @@ def __reduce__(self):
     # Persistance support methods - END

     # Namespace for experimental functions
-    modin = CachedAccessor("modin", ExperimentalFunctions)
+    modin: ExperimentalFunctions = CachedAccessor("modin", ExperimentalFunctions)
For IDE hints to work.
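To illustrate the point about IDE hints: without an annotation, a static tool sees only the descriptor object on the class, while the annotation records the type that attribute access is expected to yield. The sketch below is only an illustration; `Accessor`, `Functions`, and `Frame` are hypothetical stand-ins, not Modin's `CachedAccessor` or `ExperimentalFunctions`.

```python
from typing import get_type_hints


class Functions:
    """Hypothetical stand-in for a namespace of experimental functions."""

    def __init__(self, frame):
        self._frame = frame

    def describe(self) -> str:
        return f"functions bound to {type(self._frame).__name__}"


class Accessor:
    """Hypothetical, simplified accessor descriptor (no caching)."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Accessed on the class itself: return the accessor class.
            return self._accessor_cls
        # Accessed on an instance: build the namespace bound to it.
        return self._accessor_cls(obj)


class Frame:
    # The annotation does not change runtime behavior; it only records
    # the type that tools should assume for `Frame().ext`.
    ext: Functions = Accessor("ext", Functions)


hints = get_type_hints(Frame)
message = Frame().ext.describe()
```

At runtime the annotation is visible only through `__annotations__`; the descriptor still decides what `Frame().ext` returns.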
Yes and no. In a broad sense, yes: several files are read in one function call. However, that issue is supposed to cover reading a list of files, not files defined by glob syntax. The difference is only at the interface level, but it is inconvenient to reuse experimental functionality in the core module.
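The interface difference mentioned here can be sketched with the standard library alone: a glob pattern is expanded to a list of paths, after which a list-based reader does the actual work. This is a minimal sketch under stated assumptions; `expand_glob`, `read_files`, and `read_files_glob` are hypothetical names, and plain text files stand in for parquet files.

```python
import glob
import os
import tempfile


def expand_glob(pattern):
    """Return the sorted list of existing paths that match ``pattern``."""
    return sorted(glob.glob(pattern))


def read_files(paths):
    """Hypothetical list-based reader: returns one text blob per path."""
    contents = []
    for path in paths:
        with open(path, "r", encoding="utf-8") as handle:
            contents.append(handle.read())
    return contents


def read_files_glob(pattern):
    """Glob-based front end that delegates to the list-based reader."""
    return read_files(expand_glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Create three small files so the pattern has something to match.
    for i in range(3):
        path = os.path.join(tmp, f"part_{i}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(f"chunk {i}")

    from_glob = read_files_glob(os.path.join(tmp, "part_*.txt"))
    from_list = read_files(
        [os.path.join(tmp, f"part_{i}.txt") for i in range(3)]
    )
```

Both calls return the same contents here; only the way the caller names the files differs.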
da71bf0 to c4009ce
Signed-off-by: Anatoly Myachev <[email protected]>
@doc(_doc_parse_func, parameters=_doc_parse_parameters_common)
def parse(fname, **kwargs):
    warnings.filterwarnings("ignore")
    num_splits = 1
Why 1?
Each file is equal to one partition.
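As a rough illustration of what `num_splits = 1` implies, the sketch below maps files to partitions: with one split per file, every file stays whole, while a larger value would cut each file into several row chunks. The helpers `split_rows` and `partition_files` are hypothetical and operate on in-memory lists of rows; they are not Modin's partitioning code.

```python
def split_rows(rows, num_splits):
    """Split ``rows`` into ``num_splits`` contiguous chunks of near-equal size."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    base, extra = divmod(len(rows), num_splits)
    chunks, start = [], 0
    for i in range(num_splits):
        # The first ``extra`` chunks each take one additional row.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


def partition_files(files, num_splits=1):
    """Turn each file's rows into ``num_splits`` partitions."""
    partitions = []
    for rows in files:
        partitions.extend(split_rows(rows, num_splits))
    return partitions


# Three "files", each represented by its list of rows.
files = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]

one_per_file = partition_files(files, num_splits=1)
two_per_file = partition_files(files, num_splits=2)
```

With `num_splits=1` the number of partitions equals the number of files, which is why many small files parallelize well while a few large ones leave room for the further optimization discussed in this thread.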
We should probably change this in a separate PR.
This is an opportunity for further optimization, so if necessary, yes. However, for now it is more important to add support for different formats.
File an issue for further optimization please.
Co-authored-by: Iaroslav Igoshev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
-    ExperimentalPandasPickleParser, ExperimentalPickleDispatcher
+    ExperimentalPandasPickleParser, ExperimentalGlobDispatcher
 )
 to_pickle_distributed = __make_write(
Should we change `read_pickle_distributed` and `to_pickle_distributed` to `read_pickle_glob` and `to_pickle_glob` (separate issue)?
I'd like it for consistency.
File an issue for that please.
Left some more comments. Otherwise, LGTM.
Co-authored-by: Iaroslav Igoshev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
LGTM, thanks!
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
read_parquet_glob #6831
docs/development/architecture.rst is up-to-date