This is the teaching material that I use for laboratory lessons on practical data science for the following courses:
- Data Analytics for Smart Agriculture (I semester, Politecnico di Milano, Milan campus)
- Data Harvesting and Data Analysis for Agriculture (II semester, Politecnico di Milano, Cremona campus)
Within each directory, you will find a theory notebook with extensive comments on the lesson topic, along with a homework directory containing exercises and their solutions. The full index of course topics is given below.
- What is programming?
- Python
- Variables and Types
- Lists
- Tuples
- Basic Operators
- Conditions
- Loops
- Functions
- Dictionaries
- Classes and Objects
- Basic String Operations
- String Formatting
- Scopes and Namespaces
- Modules and Packages
- What is NumPy?
- NumPy Arrays
- Array Operations
- Array Slicing and Indexing
- Array Reshaping
- Array Stacking and Concatenation
- Random Numbers
- Unique Items and Counts
- Adding and Removing Dimensions
- What is Pandas?
- Pandas Data Structures
- Data Import and Export
- Data Exploration
- Indexing and Selecting Data
- Assigning Data
- Adding and Deleting Columns
- Grouping
- Merging
- What is Exploratory Data Analysis (EDA)?
- Preliminary Exploration
- Descriptive Statistics
- Data Visualization
- Pandas, Seaborn or Matplotlib?
- Summary of functions
- What is Data Preparation?
- Missing values
- Figure out why the data is missing
- Dealing with missing values
- Drop missing values
- Imputation
- Imputation with scikit-learn
- Missing indicators
- Feature scaling
- Parsing dates
- Inconsistent data entry
- What is Feature Engineering?
- Handling categorical variables
- Creating features
- Principal Component Analysis
- Feature selection
- Mutual information
- What is Supervised Learning and Regression?
- What is Linear Regression?
- Why use Linear Regression?
- How to use Linear Regression?
- Linear Regression Equations
- Linear Regression with scikit-learn
- Least Squares Method
- Model Building
- Train-Validation-Test split
- Model Evaluation
- Linear Regression Assumptions
- Considerations of Multiple Linear Regression
- Overfitting
- Multicollinearity
- Polynomial Regression
- Regularization Techniques
- Model Selection
- Cross-validation
- Hypothesis Testing
- k-Nearest Neighbors Regression
- What is Classification?
- What is Logistic Regression?
- Linear Regression for Classification
- Simple Logistic Regression
- Multinomial Logistic Regression
- Model Evaluation
- Visualize Predictions and Decision Boundaries
- Polynomial Logistic Regression
- Regularization
- k-Nearest Neighbors Classification
- What are Decision Trees?
- How to build Decision Trees?
- Comparison with other models
- How do decision trees work?
- Class-imbalanced datasets
- Ensemble methods
- Bagging
- Random Forest
- Boosting
- What is Clustering?
- Distance Metrics
- Standardization for Clustering
- Agglomerative (or Hierarchical) Clustering
- Linkage Matrix
- The Dendrogram
- Linkage Methods
- K-Means Clustering
- DBSCAN Clustering
- Evaluation Metrics for Clustering
- Deciding the Number of Clusters
- Comparing Clustering Algorithms on Synthetic Data