Skip to content

Latest commit

 

History

History
 
 

A023_FuncionsAgrupacio

Joins utilitzants Pandas

Per aquesta part on farem diferents exercicis, utilitzarem la taula d'alumnes-dual, ja treballada a classe anteriorment

Datasets per a practicar.

https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh/data

Exemples.

Exercicis fets a classe (2022).

Teoria

import numpy as np 
import pandas as pd
import copy
#np --> numerical panda, es una llibreria per a realitzar càlcul numèric
#les notes de dawbio amb series
student_list=["John","Mary","Lucy","Peter"]
grades_list = [7,9,8,4]
wants_dual_list = [False,True,False,True]
datos: dict[list] = {"grade": grades_list,
                   "dual": wants_dual_list}
students_frame = pd.DataFrame(
    index=student_list,
    data = datos
)
students_frame
grade dual
John 7 False
Mary 9 True
Lucy 8 False
Peter 4 True
students_grades: pd.DataFrame = copy.deepcopy(students_frame)
#al index li fiquem un nom
students_grades.index.name = "name"
students_grades
grade dual
name
John 7 False
Mary 9 True
Lucy 8 False
Peter 4 True
#inplace matxaca el mateix dataFrame
#reset_index passa el index com columna
students_grades.reset_index(inplace=True)
students_grades
name grade dual
0 John 7 False
1 Mary 9 True
2 Lucy 8 False
3 Peter 4 True
#borrem la columna
students_grades.drop(columns="dual", inplace=True)
students_grades
name grade
0 John 7
1 Mary 9
2 Lucy 8
3 Peter 4
students_grades.rename(columns={"name":"names"},inplace=True)
students_grades
names grade
0 John 7
1 Mary 9
2 Lucy 8
3 Peter 4
#Per ordenar les columnes com vols
students_grades = students_grades.loc[:, ["names","grade"]]
students_grades
names grade
0 John 7
1 Mary 9
2 Lucy 8
3 Peter 4
# SQL Like Joins
# Let's split the dataframe
students_grades: pd.DataFrame = (copy.deepcopy(students_frame)
                                    .reset_index()
                                    .rename(columns={"index":"name"})
                                    .drop(columns="dual")
                                    .loc[:,["name","grade"]]
                                    )
students_grades
name grade
0 John 7
1 Mary 9
2 Lucy 8
3 Peter 4
students_duals: pd.DataFrame = (copy.deepcopy(students_frame)
                                    .reset_index()
                                    .rename(columns={"index":"name"})
                                    .drop(columns="grade")
                                    .loc[:,["name","dual"]]
                                        )
students_duals
name dual
0 John False
1 Mary True
2 Lucy False
3 Peter True

#join with the 2 dataFrame
#how = inner fa intersecció, outer fa union. 
join: pd.DataFrame = pd.merge(students_grades,students_duals, on="name", how="outer")
join
name grade dual
0 John 7 False
1 Mary 9 True
2 Lucy 8 False
3 Peter 4 True
join.index = join.loc[:,"name"]
join.drop(columns=["name"], inplace=True)
join.index.name = ""
join
grade dual
John 7 False
Mary 9 True
Lucy 8 False
Peter 4 True

Grouping - Agrupacions

Per realitzar el que amb sql coneixem amb la comanda Group by. grouping documentació oficial

import numpy as np 
import pandas as pd
import copy
df3 = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
        "C": np.random.randn(8),
        "D": np.random.randn(8),
    }
)
df3
A B C D
0 foo one 0.072135 -1.829523
1 bar one 1.642161 -0.564049
2 foo two -1.619752 -0.505827
3 bar three 0.134393 1.396961
4 foo two -0.062482 1.082786
5 bar two 0.365225 0.687873
6 foo one 0.161747 0.055421
7 foo three -0.180678 0.533017
df3_grouped = df3.groupby("A") #group by a colum(or more)
df3_grouped.groups #mostra les columnes que compleixien per cada valor de l'agrupació

{'bar': [1, 3, 5], 'foo': [0, 2, 4, 6, 7]}

df3_sum = df3_grouped.sum()
df3_sum #only numeric's columns
C D
A
bar 2.141779 1.520785
foo -1.629029 -0.664126
df3_groups = df3_grouped.groups

foo_rows = df3_groups['foo']
bar_roows = df3_groups['bar']

df3.loc[foo_rows, "C"]

0 0.072135

2 -1.619752

4 -0.062482

6 0.161747

7 -0.180678

Name: C, dtype: float64

df3.loc[foo_rows, "C"].sum()

-1.6290291261449736

df3.loc[bar_roows,"D"].sum()

1.5207848046995553

df3.loc[bar_roows,"D"].max()

1.3969613630859894

df3_grouped.cumcount()

0 0

1 0

2 1

3 1

4 2

5 2

6 3

7 4

dtype: int64

Time Series

Consultar la documentació oficial de time series

Categoricals

Es pot marcar els diferents valors d'una de les columnes, del tipus Categories, marcant-la com categories dins la columna, no com a valors. categories documentació oficial

Exemple

import numpy as np 
import pandas as pd
import copy
#np --> numerical panda, es una llibreria per a realitzar càlcul numèric
#les notes de dawbio amb series
student_list=["John","Mary","Lucy","Peter"]
grades_list = [7,9,8,4]
wants_dual_list = [False,True,False,True]
datos: dict[list] = {"grade": grades_list,
                   "dual": wants_dual_list}
students_frame = pd.DataFrame(
    index=student_list,
    data = datos
)
students_frame
grade dual
John 7 False
Mary 9 True
Lucy 8 False
Peter 4 True
#create a new column
students_frame.loc[:,"cat_grade"] = ["Not","Exc","Not","Insuf."]
students_frame.dtypes

grade int64

dual bool

cat_grade object

dtype: object

students_frame.loc[:,"cat_grade"]= students_frame.loc[:,"cat_grade"].astype("category")
students_frame.dtypes

grade int64

dual bool

cat_grade category

dtype: object

students_frame.loc[:,"cat_grade"]

John Not

Mary Exc

Lucy Not

Peter Insuf.

Name: cat_grade, dtype: category

Categories (3, object): ['Exc', 'Insuf.', 'Not']

#reasignar totes les categories 
cat_grades: list[str] = reversed(["Exc","Not","Bien","Sufi","Insuf."])
students_frame.loc[:,"cat_grade"]= students_frame.loc[:,"cat_grade"].cat.set_categories(cat_grades)
#Order by cat_grade and grade
students_frame.sort_values(by=["cat_grade","grade"],ascending=False)
grade dual cat_grade
Mary 9 True Exc
Lucy 8 False Not
John 7 False Not
Peter 4 True Insuf.

Com crear una nova columna calculada a partir d'una de les columnes, la de les notes.

Tenint les dades dels alumnes amb una columna de notes, crear una nova columna que classifiqui aquella nota, amb les expressions Insuf, Suficient, Bé, Notable o Excel·lent.

· Click per veure diferents solucions del mateix exercici