Skip to content

Latest commit

 

History

History
76 lines (38 loc) · 1.72 KB

README.md

File metadata and controls

76 lines (38 loc) · 1.72 KB

Machine Learning - Data Processing

Upload the two files to Google Collab

****Run commands ********

import pandas as pd > converts data into a table import numpy as np

pd.read_csv() - to import data

=======================================================

sslc = pd.read_csv('marks.txt', sep = ';', header= None , index_col = None)

sslc.columns = ['region', 'roll_number' , 'fl','sl','math','sci','ss','total','pass','withheld','extra']

sslc['fl'] = pd.to_numeric(sslc['fl'], errors = 'coerce')

sslc['sl'] = pd.to_numeric(sslc['fl'], errors = 'coerce')

sslc['math'] = pd.to_numeric(sslc['fl'], errors = 'coerce')

sslc['sci'] = pd.to_numeric(sslc['fl'], errors = 'coerce')

sslc['ss'] = pd.to_numeric(sslc['fl'], errors = 'coerce')

sslc['total'] = pd.to_numeric(sslc['fl'], errors = 'coerce')

=====================================================

Functions to run

.head() - shows first 5 rows

.tail() - shows last 5 rows

.size - returns no of data(cells) in the data frame

.count() - returns the number of non-null value in each column

.shape - how many rows and how many columns > tuple

.ndim - returns the number of dimentions: two , three ... etc

.info

.nunique() - how many unique values in each column

.describe() - showing statistical methods like count / mean / standard deviation / min / 25% / 50% / 75% / max .describe(include =[]) - add data type like object / int / float / all .describe(exclude =[])

.memory_usage()

.columns

sslc['column name'] - access a specific column

.iloc[0] - select row

.set_index() set a custom index

.loc[] - select row

.isnull().sum()

.unique() - returns all unique values in the selected column

.replace()

pd.to_numeric() - convert to data type float