Add some doc about transition from tidyverse? #130

Closed
etiennebacher opened this issue Mar 21, 2022 · 7 comments · Fixed by #183
Labels
Docs 📚 Improvements or additions to documentation

Comments

@etiennebacher
Member

I still use the tidyverse a lot because I know its main functions and how they work. One thing I think is missing from the docs is an article or table that shows equivalent functions between the tidyverse and easystats ecosystems. This would do two things: 1) reduce the time spent looking for the equivalent of a function we use a lot with the tidyverse, and 2) highlight which easystats functions have no equivalent in the tidyverse. I don't have anything complicated in mind, maybe simply a table like:

| tidyverse | easystats |
| --- | --- |
| pull | data_extract |
| rename | data_rename |
| replace_na | convert_na_to |
| ... | ... |

To go further, this could be accompanied by some examples showing how to convert tidyverse workflows (just a few functions chained with a pipe) into easystats workflows. What do you think?

@IndrajeetPatil
Member

This would indeed be nice, but given how rapidly the API is evolving, it would be better to wait a bit before preparing a document like this.

It would ideally look something like this: https://dplyr.tidyverse.org/articles/base.html

@IndrajeetPatil IndrajeetPatil added the Docs 📚 Improvements or additions to documentation label Mar 21, 2022
@etiennebacher
Member Author

> It would ideally look something like this: https://dplyr.tidyverse.org/articles/base.html

Yes, exactly. I didn't know about this table; it's super useful.

@strengejacke
Member

One question is whether we want to mimic most or all of the important functions. While mutate() can more or less be replaced by transform(), a summarise() equivalent (one that also works on grouped data frames) is missing, and aggregate() is by design only a poor substitute, as you can only apply one function. So, do we also want something like summarise() in datawizard, or do we promote coexistence with dplyr / tidyr?
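
To illustrate the point about base R, here is a minimal sketch using only base R and the built-in mtcars data set. It is an illustration and not code from datawizard; the column name wt_kg and the object names are made up for the example.

```r
# Base R only; mtcars ships with R.

# Rough counterpart of dplyr::mutate(): add a derived column.
# (wt is recorded in units of 1000 lbs, so this converts to kilograms.)
cars <- transform(mtcars, wt_kg = wt * 453.592)

# Rough counterpart of group_by() + summarise() with a single function.
mean_mpg <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

# A second summary statistic takes a second aggregate() call,
# and the two results then have to be merged by hand.
sd_mpg <- aggregate(mpg ~ cyl, data = cars, FUN = sd)
summary_by_cyl <- merge(mean_mpg, sd_mpg, by = "cyl",
                        suffixes = c("_mean", "_sd"))

print(summary_by_cyl)
```

With dplyr, the last three steps would typically be a single group_by() followed by summarise() with two named arguments, which is the gap described above.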

@DominiqueMakowski
Member

DominiqueMakowski commented Jun 10, 2022

I would say no. We simply don't have the manpower to develop and maintain a full alternative to the data-wrangling abilities of dplyr, especially since they basically developed a whole new architecture to support their group_by -> summarize pipeline.

Plus, I don't think we have any legitimate reason (aside from our own hubris ^^) to present a full alternative to the tidyverse. Most users will have the tidyverse installed anyway, so the dependency argument doesn't really hold for regular users. Coexistence ftw

And the scope of easystats and the tidyverse in general is arguably somewhat different. In particular, datawizard could be more explicitly framed as "data preprocessing / cleaning" (implied: before doing stats) rather than pure data wrangling.

@IndrajeetPatil
Member

I agree with Dom. Trying to mimic dplyr/tidyr will expand the scope of easystats beyond what we currently have in mind.

The existing data wrangling functions have grown organically out of our "0-external-dependency principle", and we should continue to operate the same way: adding only those data wrangling functions that are needed in the ecosystem, without being concerned about whether the suite of functionality is comparable to that provided by the tidyverse.

In the near future, I think other developers might also be interested in using datawizard to keep their dependencies to a minimum, and if they request or implement features that mimic the tidyverse, that will be a welcome addition!

But, for now, our modus operandi should be to develop only what we need for the ecosystem.

@strengejacke
Member

Ok, agreed

@DominiqueMakowski
Member

As a matter of fact, datawizard could benefit from some functionality like what janitor offers, and other features (missing value imputation, etc.).
