Introducing Data Wrangling with Polars #26

koushikkhan · 2025-02-08T15:53:32Z

📝 Summary

This PR introduces a course on data wrangling with Polars using marimo notebooks. It is linked to issue #18.

📋 Checklist

I have included package dependencies in the notebook file using --sandbox
If adding a course, include a README.md
Keep language direct and simple.

Haleshot · 2025-02-08T18:03:37Z

Just explored the .py file via the open-notebooks feature and it looks great!

I was going to suggest adding polars to the inline dependencies (at the start of the notebook), as our README instructions recommend, but it isn't required because the library isn't actually imported in the file.

Thanks for kicking off the Polars series!
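For context, marimo's --sandbox mode records dependencies in a PEP 723 inline script metadata block at the top of the notebook file. The sketch below shows what such a header might look like once a notebook does import polars, and how the block can be read back with only the standard library. The package list and Python version are assumptions for illustration, not the contents of this PR's notebook.

```python
import re

# Hypothetical notebook header: the package list and Python version
# below are assumptions, not taken from 01_why_polars.py.
NOTEBOOK_SOURCE = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "marimo",
#     "polars",
# ]
# ///

import marimo
'''

# Pattern modeled on the reference regex given in PEP 723.
PEP723_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_inline_metadata(source: str) -> str:
    """Return the TOML text inside the first inline metadata block, or ''."""
    match = PEP723_BLOCK.search(source)
    if match is None:
        return ""
    lines = match.group("content").splitlines(keepends=True)
    # Drop the leading "# " (or bare "#") comment prefix from each line.
    return "".join(
        line[2:] if line.startswith("# ") else line[1:] for line in lines
    )


metadata = read_inline_metadata(NOTEBOOK_SOURCE)
print(metadata)
```

If a notebook never imports a package, as is the case here, leaving it out of the `dependencies` list keeps the sandbox environment minimal.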

koushikkhan · 2025-02-09T04:44:53Z

@Haleshot Unfortunately, I am not able to view my notebook via the open-notebooks feature. I am trying the following URL: https://marimo.app/github.com/koushikkhan/learn/blob/feat/issue%2318/polars-data-wrangling/polars/01_why_polars.py.

I am definitely missing something here.

akshayka

Thanks so much for the PR! This is a great start. The content is good, and makes me excited for the rest of the notebooks.

I have left some comments and suggestions. High level feedback:

Let's aim to be concise, and avoid repeating ourselves often.
Let's get to code examples sooner rather than later.
Notebooks are meant to be presented in edit mode, so there is no need to duplicate the code in the markdown.
Let's make sure the basics are explained explicitly, such as giving an example of a DataFrame early on.
Let's tone the language down a little bit — the style should be educational and informative.

Again, great start and thanks so much!

akshayka · 2025-02-09T20:44:15Z

polars/01_why_polars.py

+
+    df_pd = pd.DataFrame(
+        { 
+            "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", 


nit: for consistency throughout future notebooks, let's use lowercase keys: "gender", "height_cm".

koushikkhan · Feb 10, 2025

Alright, I will use lowercase column names from here on.

koushikkhan · 2025-02-10T04:19:46Z

@akshayka I will work on the change requests.

akshayka · 2025-02-10T21:53:55Z

polars/01_why_polars.py

+
+        Like Pandas and PySpark, the central data structure in Polars is **the DataFrame**, a tabular data structure consisting of named columns. For example, the next cell constructs a DataFrame that records the gender, age, and height in centimeters for a number of individuals. 
+
+        <INSERT CODE CELL>


The suggestion was to split this markdown into two cells, and insert a block of Python in between that creates a DataFrame (for example, the gender, age, and height dataframe) which is reused in subsequent cells. Perhaps:

    import polars as pl

    df_pl = pl.DataFrame(
        {
            "gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"],
            "age": [13, 15, 17, 19, 21, 23, 25, 27, 29, 31],
            "height_cm": [150.0, 170.0, 146.5, 142.0, 155.0, 165.0, 170.8, 130.0, 132.5, 162.0],
        }
    )
    df_pl

akshayka · 2025-02-13T17:24:08Z

@koushikkhan would you like any help finishing this PR?

koushik-ta and others added 6 commits February 8, 2025 14:13

bca2d8f polars dir created
cd95559 Merge branch 'marimo-team:main' into feat/issue#18/polars-data-wrangling
24c07d4 updated why_polars
1f12cee deleted layouts dir
173b025 added readme for polars
3f17228 notebook indexing updated

koushikkhan changed the title ~~Feat/issue#18/polars data wrangling~~ Introducing Data Wrangling with Polars Feb 8, 2025

3bdaf96 updated visual aspects

akshayka reviewed Feb 9, 2025

koushikkhan and others added 18 commits February 10, 2025 09:51 (each co-authored by Akshay Agrawal)

fb175fb Update polars/01_why_polars.py
7be0656 Updated section header - Intuitive syntax
912d45e updated text under intuitive syntax
41a51b5 keeping only code cell for examples
e7ecc90 updated text under introduction
1c1351f updated text under why polars
2b50c4f updated text before showing examples
070c0c7 Updated section header - Choosing Polars over Pandas
9a474f5 updated text for intro
4c8f59f keeping only code cell for examples - polars
67c85ae simplifying textual description
40bbf43 keeping only code cell for examples
1a5601d updating section header - A large collection of built-in APIs
f2c1d1b updated text under - A large collection of build-in APIs
15a2dfa updated section header - Query optimization
a7e90d2 updated section header - Scalability — handling large datasets in memory
9ae3f7e updated textual description
794f008 updated section header - Compatibility with other machine learning libraries

koushikkhan and others added 6 commits February 10, 2025 10:26 (each co-authored by Akshay Agrawal)

8bb057d updated section header - Easy to use, with room for power users
00e8b42 updated section header - Why not PySpark?
a664014 updated textual description under - Why not PySpark?
64ad03e updated reference description
e9c1403 updated textual description under introduction
91124fd updated reference header

akshayka reviewed Feb 10, 2025
