Initial commit

atmsillinois · Mar 20, 2024 · commit 20b768a

Showing 49 changed files with 118,375 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,2 @@
+/.DS_Store
+/.ipynb_checkpoints
diff --git a/ATMS 523 HW 5.md b/ATMS 523 HW 5.md
@@ -0,0 +1,55 @@
+# ATMS 523
+
+## Module 4 Project
+
+Submit this code as a pull request back to GitHub Classroom by the date and time listed in Canvas.
+
+For this assignment, use the dataset called `radar_parameters.csv` provided in the GitHub repository in the folder `homework`.
+
+The provided data has 
+
+## Dataset Description
+
+The training data consists of polarimetric radar parameters calculated from several years of measurements by a disdrometer (an instrument that measures raindrop sizes, shapes, and rainfall rate) in Huntsville, Alabama. A model called `pytmatrix` is used to calculate the polarimetric radar parameters from the droplet observations, which makes it possible to compare what a remote sensing instrument would see with the observed rainfall.
+
+## Data columns
+
+Features (radar measurements):
+
+`Zh` - radar reflectivity factor (dBZ) - use the formula $dBZ = 10\log_{10}(Z)$
+
+`Zdr` - differential reflectivity
+
+`Ldr` - linear depolarization ratio
+
+`Kdp` - specific differential phase
+
+`Ah` - specific attenuation
+
+`Adp` - differential attenuation
+
+Target:
+
+`R` - rain rate
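
As a small illustration of the $dBZ = 10\log_{10}(Z)$ relation given for `Zh`, the sketch below converts between dBZ and the linear reflectivity factor. The function names `dbz_to_linear` and `linear_to_dbz` are hypothetical helpers and are not part of the assignment or of `pytmatrix`.

```python
import math


def dbz_to_linear(dbz: float) -> float:
    """Convert reflectivity from dBZ to the linear reflectivity factor Z."""
    return 10.0 ** (dbz / 10.0)


def linear_to_dbz(z: float) -> float:
    """Convert a positive linear reflectivity factor Z to dBZ."""
    if z <= 0:
        raise ValueError("Z must be positive to take a logarithm")
    return 10.0 * math.log10(z)


if __name__ == "__main__":
    # Round trip for an arbitrary example value.
    print(dbz_to_linear(30.0))
    print(linear_to_dbz(1000.0))
```

The inverse direction matters for any formula written in terms of linear $Z$, since the `Zh` column is stored in dBZ.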
+
+## Your assignment
+
+1. Split the data into a 70-30 split for training and testing data.
+
+2. Using the split created in (1), train a multiple linear regression model on the training dataset and validate it on the testing dataset. Compare the $R^2$ and root mean square error of the model on the training and testing sets to a baseline prediction of rain rate from the formula $Z = 200 R^{1.6}$ (with $Z$ in linear units rather than dBZ).
+
+3. Repeat 1 with polynomial regression, using a grid search over polynomial orders 0-21 with 7-fold cross-validation. Does the best polynomial model in terms of $R^2$ outperform the baseline and the linear regression model in terms of $R^2$ and root mean square error?
+
+4. Repeat 1 with a Random Forest Regressor, and perform a grid search over the following parameters:
+
+   ```python
+   {'bootstrap': [True, False],  
+   'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],  
+   'max_features': [1.0, 'sqrt'],  # 1.0 = all features; replaces 'auto', which was removed in scikit-learn 1.3
+   'min_samples_leaf': [1, 2, 4],  
+   'min_samples_split': [2, 5, 10],  
+   'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}
+   ```
+   Can the optimized Random Forest Regressor beat the baseline, the linear regression, or the best polynomial model in terms of $R^2$ and root mean square error?
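
The split, the linear regression, and the $Z = 200 R^{1.6}$ baseline from steps 1 and 2 can be sketched as follows. This is a minimal sketch, not a reference solution: it assumes scikit-learn and NumPy, uses synthetic data as a stand-in for `radar_parameters.csv`, and uses only `Zh` as a feature. The helpers `baseline_rain_rate` and `report` are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def baseline_rain_rate(zh_dbz):
    """Invert Z = 200 R^1.6 for R, given reflectivity in dBZ."""
    z_linear = 10.0 ** (np.asarray(zh_dbz, dtype=float) / 10.0)
    return (z_linear / 200.0) ** (1.0 / 1.6)


def report(y_true, y_pred):
    """Return (R^2, RMSE) for a set of predictions."""
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    return float(r2_score(y_true, y_pred)), rmse


# Synthetic stand-in for the real dataset (an assumption for illustration).
rng = np.random.default_rng(0)
R = rng.uniform(0.5, 50.0, size=500)
Zh = 10.0 * np.log10(200.0 * R ** 1.6) + rng.normal(0.0, 0.5, size=500)
X = Zh.reshape(-1, 1)

# Step 1: 70-30 split into training and testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, R, test_size=0.3, random_state=0
)

# Step 2: fit on the training set, evaluate on both sets.
model = LinearRegression().fit(X_train, y_train)
train_r2, train_rmse = report(y_train, model.predict(X_train))
lin_r2, lin_rmse = report(y_test, model.predict(X_test))

# Baseline prediction on the same test samples.
base_r2, base_rmse = report(y_test, baseline_rain_rate(X_test[:, 0]))

print(f"linear train: R2={train_r2:.3f} RMSE={train_rmse:.3f}")
print(f"linear test:  R2={lin_r2:.3f} RMSE={lin_rmse:.3f}")
print(f"baseline:     R2={base_r2:.3f} RMSE={base_rmse:.3f}")
```

With the real data, `X` would hold all six feature columns and `R` the rain rate column. The numbers printed here describe only the synthetic data.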
+
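
The grid search over polynomial order with 7-fold cross-validation described in step 3 could be set up along these lines. This is a sketch under assumptions: it uses a scikit-learn `Pipeline` with `PolynomialFeatures` and `GridSearchCV`, synthetic one-feature data, and a reduced degree range of 1 to 4 to keep it fast, whereas the assignment asks for orders 0-21. The step names `poly` and `reg` are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic cubic data (an assumption for illustration, not the radar data).
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=200)
y = x ** 3 - 2.0 * x + rng.normal(0.0, 0.3, size=200)
X = x.reshape(-1, 1)

# Polynomial expansion followed by ordinary least squares.
pipe = Pipeline(
    [
        ("poly", PolynomialFeatures()),
        ("reg", LinearRegression()),
    ]
)

# Search over the polynomial degree, scored by R^2 with 7 folds.
search = GridSearchCV(
    pipe,
    param_grid={"poly__degree": [1, 2, 3, 4]},
    cv=7,
    scoring="r2",
)
search.fit(X, y)

print("best degree:", search.best_params_["poly__degree"])
print("best mean CV R^2:", round(float(search.best_score_), 3))
```

The same `GridSearchCV` pattern applies to the Random Forest grid in step 4 by swapping the estimator and `param_grid`, though that grid is large enough that a full search may take a long time.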
+
diff --git a/BicycleWeather.csv b/BicycleWeather.csv