Per aquesta part on farem diferents exercicis, utilitzarem la taula d'alumnes-dual, ja treballada a classe anteriorment
https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh/data
- Desglossar el dataframe amb dos dataframes, un amb els grades i el altre amb el dual
- Realitzar un join amb els dos dataframes anteriors
- Merge i altres funcions d'agrupament
- Categorical de les notes dels alumnes (Exc.,Not...
- Auto categorical de les notes dels alumnes (Exc.,Not...
- Fitxers CSV Tractaments
import numpy as np
import pandas as pd
import copy
#np --> numerical panda, es una llibreria per a realitzar càlcul numèric
#les notes de dawbio amb series
student_list=["John","Mary","Lucy","Peter"]
grades_list = [7,9,8,4]
wants_dual_list = [False,True,False,True]
datos: dict[list] = {"grade": grades_list,
"dual": wants_dual_list}
students_frame = pd.DataFrame(
index=student_list,
data = datos
)
students_frame
grade | dual | |
---|---|---|
John | 7 | False |
Mary | 9 | True |
Lucy | 8 | False |
Peter | 4 | True |
students_grades: pd.DataFrame = copy.deepcopy(students_frame)
#al index li fiquem un nom
students_grades.index.name = "name"
students_grades
grade | dual | |
---|---|---|
name | ||
John | 7 | False |
Mary | 9 | True |
Lucy | 8 | False |
Peter | 4 | True |
#inplace matxaca el mateix dataFrame
#reset_index passa el index com columna
students_grades.reset_index(inplace=True)
students_grades
name | grade | dual | |
---|---|---|---|
0 | John | 7 | False |
1 | Mary | 9 | True |
2 | Lucy | 8 | False |
3 | Peter | 4 | True |
#borrem la columna
students_grades.drop(columns="dual", inplace=True)
students_grades
name | grade | |
---|---|---|
0 | John | 7 |
1 | Mary | 9 |
2 | Lucy | 8 |
3 | Peter | 4 |
students_grades.rename(columns={"name":"names"},inplace=True)
students_grades
names | grade | |
---|---|---|
0 | John | 7 |
1 | Mary | 9 |
2 | Lucy | 8 |
3 | Peter | 4 |
#Per ordenar les columnes com vols
students_grades = students_grades.loc[:, ["names","grade"]]
students_grades
names | grade | |
---|---|---|
0 | John | 7 |
1 | Mary | 9 |
2 | Lucy | 8 |
3 | Peter | 4 |
# SQL Like Joins
# Let's split the dataframe
students_grades: pd.DataFrame = (copy.deepcopy(students_frame)
.reset_index()
.rename(columns={"index":"name"})
.drop(columns="dual")
.loc[:,["name","grade"]]
)
students_grades
name | grade | |
---|---|---|
0 | John | 7 |
1 | Mary | 9 |
2 | Lucy | 8 |
3 | Peter | 4 |
students_duals: pd.DataFrame = (copy.deepcopy(students_frame)
.reset_index()
.rename(columns={"index":"name"})
.drop(columns="grade")
.loc[:,["name","dual"]]
)
students_duals
name | dual | |
---|---|---|
0 | John | False |
1 | Mary | True |
2 | Lucy | False |
3 | Peter | True |
#join with the 2 dataFrame
#how = inner fa intersecció, outer fa union.
join: pd.DataFrame = pd.merge(students_grades,students_duals, on="name", how="outer")
join
name | grade | dual | |
---|---|---|---|
0 | John | 7 | False |
1 | Mary | 9 | True |
2 | Lucy | 8 | False |
3 | Peter | 4 | True |
join.index = join.loc[:,"name"]
join.drop(columns=["name"], inplace=True)
join.index.name = ""
join
grade | dual | |
---|---|---|
John | 7 | False |
Mary | 9 | True |
Lucy | 8 | False |
Peter | 4 | True |
Per realitzar el que amb sql coneixem amb la comanda Group by. grouping documentació oficial
import numpy as np
import pandas as pd
import copy
df3 = pd.DataFrame(
{
"A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
"B": ["one", "one", "two", "three", "two", "two", "one", "three"],
"C": np.random.randn(8),
"D": np.random.randn(8),
}
)
df3
A | B | C | D | |
---|---|---|---|---|
0 | foo | one | 0.072135 | -1.829523 |
1 | bar | one | 1.642161 | -0.564049 |
2 | foo | two | -1.619752 | -0.505827 |
3 | bar | three | 0.134393 | 1.396961 |
4 | foo | two | -0.062482 | 1.082786 |
5 | bar | two | 0.365225 | 0.687873 |
6 | foo | one | 0.161747 | 0.055421 |
7 | foo | three | -0.180678 | 0.533017 |
df3_grouped = df3.groupby("A") #group by a colum(or more)
df3_grouped.groups #mostra les columnes que compleixien per cada valor de l'agrupació
{'bar': [1, 3, 5], 'foo': [0, 2, 4, 6, 7]}
df3_sum = df3_grouped.sum()
df3_sum #only numeric's columns
C | D | |
---|---|---|
A | ||
bar | 2.141779 | 1.520785 |
foo | -1.629029 | -0.664126 |
df3_groups = df3_grouped.groups
foo_rows = df3_groups['foo']
bar_roows = df3_groups['bar']
df3.loc[foo_rows, "C"]
0 0.072135
2 -1.619752
4 -0.062482
6 0.161747
7 -0.180678
Name: C, dtype: float64
df3.loc[foo_rows, "C"].sum()
-1.6290291261449736
df3.loc[bar_roows,"D"].sum()
1.5207848046995553
df3.loc[bar_roows,"D"].max()
1.3969613630859894
df3_grouped.cumcount()
0 0
1 0
2 1
3 1
4 2
5 2
6 3
7 4
dtype: int64
Consultar la documentació oficial de time series
Es pot marcar els diferents valors d'una de les columnes, del tipus Categories, marcant-la com categories dins la columna, no com a valors. categories documentació oficial
Exemple
import numpy as np
import pandas as pd
import copy
#np --> numerical panda, es una llibreria per a realitzar càlcul numèric
#les notes de dawbio amb series
student_list=["John","Mary","Lucy","Peter"]
grades_list = [7,9,8,4]
wants_dual_list = [False,True,False,True]
datos: dict[list] = {"grade": grades_list,
"dual": wants_dual_list}
students_frame = pd.DataFrame(
index=student_list,
data = datos
)
students_frame
grade | dual | |
---|---|---|
John | 7 | False |
Mary | 9 | True |
Lucy | 8 | False |
Peter | 4 | True |
#create a new column
students_frame.loc[:,"cat_grade"] = ["Not","Exc","Not","Insuf."]
students_frame.dtypes
grade int64
dual bool
cat_grade object
dtype: object
students_frame.loc[:,"cat_grade"]= students_frame.loc[:,"cat_grade"].astype("category")
students_frame.dtypes
grade int64
dual bool
cat_grade category
dtype: object
students_frame.loc[:,"cat_grade"]
John Not
Mary Exc
Lucy Not
Peter Insuf.
Name: cat_grade, dtype: category
Categories (3, object): ['Exc', 'Insuf.', 'Not']
#reasignar totes les categories
cat_grades: list[str] = reversed(["Exc","Not","Bien","Sufi","Insuf."])
students_frame.loc[:,"cat_grade"]= students_frame.loc[:,"cat_grade"].cat.set_categories(cat_grades)
#Order by cat_grade and grade
students_frame.sort_values(by=["cat_grade","grade"],ascending=False)
grade | dual | cat_grade | |
---|---|---|---|
Mary | 9 | True | Exc |
Lucy | 8 | False | Not |
John | 7 | False | Not |
Peter | 4 | True | Insuf. |
Tenint les dades dels alumnes amb una columna de notes, crear una nova columna que classifiqui aquella nota, amb les expressions Insuf, Suficient, Bé, Notable o Excel·lent.