Introducing Data Wrangling with Polars #26

Open: koushikkhan wants to merge 31 commits into base: main

@koushikkhan koushikkhan commented Feb 8, 2025

📝 Summary

This PR introduces a course on data wrangling with Polars using marimo notebooks. It is linked to issue #18.

📋 Checklist

  • I have included package dependencies in the notebook file using --sandbox
  • If adding a course, include a README.md
  • Keep language direct and simple.

@koushikkhan koushikkhan changed the title Feat/issue#18/polars data wrangling Introducing Data Wrangling with Polars Feb 8, 2025
@Haleshot
Collaborator

Haleshot commented Feb 8, 2025

Just explored the .py file via the open-notebooks feature and it looks great!

I was going to suggest including polars in the inline dependencies (at the start of the notebook), as our README instructions recommend, but it isn't required since the library isn't actually imported in the file.
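
For reference, marimo's --sandbox mode records a notebook's dependencies as inline script metadata (PEP 723) in a comment block at the top of the file. Below is a minimal sketch of what such a header might look like, together with a small helper that lists what a notebook declares. The helper `declared_dependencies` is hypothetical (it is not part of marimo), and the Python version bound and package list are assumptions for illustration.

```python
import re

# Hypothetical notebook source: the version bound and the package list are
# assumptions, not copied from the notebook in this PR.
NOTEBOOK_SOURCE = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "marimo",
#     "polars",
# ]
# ///

import marimo
'''


def declared_dependencies(source: str) -> list[str]:
    """Return the requirement strings listed in the inline `script` block."""
    inside = False
    body = []
    for line in source.splitlines():
        if line == "# /// script":
            inside = True
        elif inside and line == "# ///":
            break
        elif inside:
            # Strip the leading "# " (or a bare "#") from each metadata line.
            body.append(line[2:] if line.startswith("# ") else line[1:])
    match = re.search(r"dependencies\s*=\s*\[(.*?)\]", "\n".join(body), re.DOTALL)
    if match is None:
        return []
    # Each dependency is a double-quoted string inside the list.
    return re.findall(r'"([^"]+)"', match.group(1))


print(declared_dependencies(NOTEBOOK_SOURCE))
```

A check like this only reads the header; it does not verify that the listed packages are actually imported in the notebook.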

Thanks for kicking off the Polars series!

@koushikkhan
Author

@Haleshot Unfortunately, I am not able to view my notebook via the open-notebooks feature. I am trying the following URL: https://marimo.app/github.com/koushikkhan/learn/blob/feat/issue%2318/polars-data-wrangling/polars/01_why_polars.py.

I am definitely missing something here.

Contributor

@akshayka akshayka left a comment

Thanks so much for the PR! This is a great start. The content is good, and makes me excited for the rest of the notebooks.

I have left some comments and suggestions. High level feedback:

  • Let's aim to be concise, and avoid repeating ourselves often.
  • Let's get to code examples sooner rather than later.
  • Notebooks are meant to be presented in edit mode, so there is no need to duplicate the code in the markdown.
  • Let's make sure the basics are explained explicitly, such as giving an example of a DataFrame early on.
  • Let's tone the language down a little bit — the style should be educational and informative.

Again, great start and thanks so much!

polars/01_why_polars.py

df_pd = pd.DataFrame(
{
"Gender": ["Male", "Female", "Male", "Female", "Male", "Female",
Contributor

nit: for consistency throughout future notebooks, let's use lowercase keys: "gender", "height_cm".

Author

Alright, I will use lowercase letters when defining column names.

@koushikkhan
Author

@akshayka I will work on the change requests.

koushikkhan and others added 18 commits February 10, 2025 09:51
Co-authored-by: Akshay Agrawal <[email protected]>
Co-authored-by: Akshay Agrawal <[email protected]>
Co-authored-by: Akshay Agrawal <[email protected]>
Co-authored-by: Akshay Agrawal <[email protected]>
Co-authored-by: Akshay Agrawal <[email protected]>
Co-authored-by: Akshay Agrawal <[email protected]>

Like Pandas and PySpark, the central data structure in Polars is **the DataFrame**, a tabular data structure consisting of named columns. For example, the next cell constructs a DataFrame that records the gender, age, and height in centimeters for a number of individuals.

<INSERT CODE CELL>
Contributor

The suggestion was to split this markdown into two cells, and insert a block of Python in between that creates a DataFrame (for example, the gender, age, and height dataframe) which is reused in subsequent cells. Perhaps

    import polars as pl

    df_pl = pl.DataFrame(
        { 
            "gender": ["Male", "Female", "Male", "Female", "Male", "Female", 
                       "Male", "Female", "Male", "Female"],
            "age": [13, 15, 17, 19, 21, 23, 25, 27, 29, 31],
            "height_cm": [150.0, 170.0, 146.5, 142.0, 155.0, 165.0, 170.8, 130.0, 132.5, 162.0]
        }
    )
    df_pl

@akshayka
Contributor

@koushikkhan would you like any help finishing this PR?

4 participants