Merge pull request #69 from chenyangkang/development
Fully randomized grids; Lazy-loading model dictionary; version 1.1.2. #59; #64
chenyangkang authored Oct 27, 2024
2 parents 32f80a7 + 2f6dc8f commit 6c313a3
Showing 34 changed files with 4,579 additions and 1,608 deletions.
5 changes: 5 additions & 0 deletions docs/API_Documentation/utils/stemflow.utils.lazyloading.md
@@ -0,0 +1,5 @@
# stemflow.utils.lazyloading

---
::: stemflow.utils.lazyloading
---
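
The commit message lists a "lazy-loading model dictionary", and this file adds an API page for `stemflow.utils.lazyloading`. The module's code is not part of this diff. The sketch below only illustrates the general idea, assuming each trained model is pickled to its own file and unpickled the first time it is read. The class `LazyModelDict`, its `max_in_memory` argument, and the one-file-per-key layout are hypothetical and are not stemflow's actual API.

```python
import os
import pickle
import tempfile
from collections.abc import MutableMapping


class LazyModelDict(MutableMapping):
    """Hypothetical dict-like store: models live on disk and are unpickled on first access."""

    def __init__(self, directory, max_in_memory=None):
        self.directory = directory
        self.max_in_memory = max_in_memory  # None means never evict
        self._keys = set()
        self._cache = {}  # key -> model currently held in memory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Keys are assumed to be filename-safe strings.
        return os.path.join(self.directory, f"{key}.pkl")

    def __setitem__(self, key, model):
        # Write through to disk and drop any stale in-memory copy.
        with open(self._path(key), "wb") as f:
            pickle.dump(model, f)
        self._keys.add(key)
        self._cache.pop(key, None)

    def __getitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        if key not in self._cache:
            # Evict the earliest-loaded model (FIFO) if the memory budget is full.
            if (self.max_in_memory is not None and self._cache
                    and len(self._cache) >= self.max_in_memory):
                self._cache.pop(next(iter(self._cache)))
            with open(self._path(key), "rb") as f:
                self._cache[key] = pickle.load(f)
        return self._cache[key]

    def __delitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        self._keys.remove(key)
        self._cache.pop(key, None)
        os.remove(self._path(key))

    def __contains__(self, key):
        # Membership check that does not load the model from disk.
        return key in self._keys

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        models = LazyModelDict(tmp, max_in_memory=2)
        models["fold0_stixel0"] = {"coef": [0.1, 0.2]}  # stand-in for a fitted model
        print(len(models), models["fold0_stixel0"])
```

FIFO eviction keeps the sketch short. If the same models are read repeatedly, an LRU policy (for example with `collections.OrderedDict.move_to_end`) would likely fit better.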
5 changes: 3 additions & 2 deletions docs/A_brief_introduction/A_brief_introduction.md
@@ -37,7 +37,7 @@ In the first case, the classifier and regressor "talk" to each other in each sep
User can define the size of the stixels (spatial temporal grids) in terms of space and time. Larger stixel promotes generalizability but loses precision in fine resolution; Smaller stixel may have better predictability in the exact area but reduced ability of extrapolation for points outside the stixel. See section [Optimizing stixel size](https://chenyangkang.github.io/stemflow/Examples/07.Optimizing_stixel_size.html) for discussion about selecting gridding parameters and [Tips for spatiotemporal indexing](https://chenyangkang.github.io/stemflow/Tips/Tips_for_spatiotemporal_indexing.html).

## A simple demo
-In the demo, we first split the training data using temporal sliding windows with a size of 50 day of year (DOY) and step of 20 DOY (`temporal_start = 1`, `temporal_end=366`, `temporal_step=20`, `temporal_bin_interval=50`). For each temporal slice, a spatial gridding is applied, where we force the stixel to be split into smaller 1/4 pieces if the edge is larger than 25 units (measured in longitude and latitude, `grid_len_upper_threshold=25`), and stop splitting to prevent the edge length being chunked below 5 units (`grid_len_lower_threshold=5`) or containing less than 50 checklists (`points_lower_threshold=50`). Model fitting is run using 1 core (`njobs=1`).
+In the demo, we first split the training data using temporal sliding windows with a size of 50 day of year (DOY) and step of 20 DOY (`temporal_start = 1`, `temporal_end=366`, `temporal_step=20`, `temporal_bin_interval=50`). For each temporal slice, a spatial gridding is applied, where we force the stixel to be split into smaller 1/4 pieces if the edge is larger than 25 units (measured in longitude and latitude, `grid_len_upper_threshold=25`), and stop splitting to prevent the edge length being chunked below 5 units (`grid_len_lower_threshold=5`) or containing less than 50 checklists (`points_lower_threshold=50`). Model fitting is run using 1 core (`n_jobs=1`).

This process is executed 10 times (`ensemble_fold = 10`), each time with random jitter and random rotation of the gridding, generating 10 ensembles. In the prediction phase, only spatial-temporal points with more than 7 (`min_ensemble_required = 7`) ensembles usable are predicted (otherwise, set as `np.nan`).

@@ -68,7 +68,8 @@ model = AdaSTEMRegressor(
Spatio2='latitude', # spatial coordinates shown in the dataframe
Temporal1='DOY',
use_temporal_to_train=True, # In each stixel, whether 'DOY' should be a predictor
-njobs=1
+n_jobs=1,
+random_state=42
)
```
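
The demo paragraph describes two gridding steps: overlapping temporal windows, then quadtree splitting of space inside each window. Below is a minimal sketch of both. It rests on assumptions that the diff does not state. Windows are taken as half-open `[start, start + temporal_bin_interval)` intervals that begin at `temporal_start` and advance by `temporal_step`. Squares are axis-aligned, with no jitter or rotation, so the commit's "fully randomized grids" are ignored here. A square is split when its edge exceeds `grid_len_upper_threshold`. Otherwise it is split only when each quarter would still be at least `grid_len_lower_threshold` wide and would hold at least `points_lower_threshold` points. The helpers `temporal_windows` and `quadtree_split` are hypothetical names, not stemflow functions.

```python
import numpy as np


def temporal_windows(temporal_start=1, temporal_end=366, temporal_step=20, temporal_bin_interval=50):
    """Hypothetical helper: overlapping half-open DOY windows [start, start + interval).

    A new window starts every temporal_step days until temporal_end; the last
    windows may extend past temporal_end.
    """
    starts = np.arange(temporal_start, temporal_end, temporal_step)
    return [(int(s), int(s) + temporal_bin_interval) for s in starts]


def quadtree_split(x, y, x0, y0, edge, upper=25.0, lower=5.0, min_points=50):
    """Hypothetical helper: recursively quarter the square [x0, x0+edge) x [y0, y0+edge).

    Returns leaf cells as (x0, y0, edge, n_points) tuples.
    """
    inside = (x >= x0) & (x < x0 + edge) & (y >= y0) & (y < y0 + edge)
    xs, ys = x[inside], y[inside]
    half = edge / 2.0
    quarters = [(x0, y0), (x0 + half, y0), (x0, y0 + half), (x0 + half, y0 + half)]

    def count(cx, cy):
        # Number of points that would fall in the quarter starting at (cx, cy).
        return int(np.sum((xs >= cx) & (xs < cx + half) & (ys >= cy) & (ys < cy + half)))

    forced = edge > upper  # too coarse: always split
    allowed = half >= lower and all(count(cx, cy) >= min_points for cx, cy in quarters)
    if not (forced or allowed):
        return [(x0, y0, edge, len(xs))]
    leaves = []
    for cx, cy in quarters:
        leaves.extend(quadtree_split(xs, ys, cx, cy, half, upper, lower, min_points))
    return leaves


if __name__ == "__main__":
    print(temporal_windows()[:3])  # first three windows
    rng = np.random.default_rng(0)
    lon, lat = rng.uniform(-100, 0, 2000), rng.uniform(0, 100, 2000)
    leaves = quadtree_split(lon, lat, -100.0, 0.0, 100.0)
    print(len(leaves), "leaf stixels")
```

In the full workflow, `quadtree_split` would run once per window, on the subset of checklists whose DOY falls inside that window.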

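The ensemble step from the same paragraph can be sketched in the same spirit. Each fold sees the data through a differently rotated and shifted grid, and a point's final prediction is the mean over the folds that could predict it. In this minimal sketch, `rotate_and_jitter` and `aggregate_ensembles` are hypothetical helpers. The rotation is about the coordinate origin, the shift range reuses the 25-unit upper threshold, and longitude/latitude are treated as planar coordinates. The paragraph says points need "more than 7" usable ensembles while setting `min_ensemble_required = 7`. The sketch uses `>=`, so stemflow's behaviour at the boundary may differ.

```python
import numpy as np


def rotate_and_jitter(lon, lat, rng, jitter_scale=25.0):
    """Hypothetical: rotate points by a random angle about (0, 0), then shift them randomly.

    Using a different transform per ensemble fold moves every grid line, so the
    folds do not share the same stixel boundaries.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi)
    x = lon * np.cos(theta) - lat * np.sin(theta)
    y = lon * np.sin(theta) + lat * np.cos(theta)
    dx, dy = rng.uniform(0.0, jitter_scale, size=2)
    return x + dx, y + dy


def aggregate_ensembles(preds, min_ensemble_required=7):
    """Hypothetical: average predictions across ensemble folds.

    preds has shape (n_folds, n_points), with NaN where a fold made no prediction
    (e.g. the point fell in an untrained stixel). Points with fewer than
    min_ensemble_required usable folds are returned as NaN.
    """
    preds = np.asarray(preds, dtype=float)
    n_usable = np.sum(~np.isnan(preds), axis=0)
    total = np.nansum(preds, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / n_usable  # 0 / 0 gives NaN for points no fold predicted
    return np.where(n_usable >= min_ensemble_required, mean, np.nan)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x, y = rotate_and_jitter(np.array([-120.0, -80.0]), np.array([35.0, 40.0]), rng)
    preds = np.array([[1.0, np.nan], [3.0, np.nan], [5.0, 2.0]])
    # First point: mean of 1, 3 and 5; second point: NaN (only one usable fold).
    print(aggregate_ensembles(preds, min_ensemble_required=2))
```

Seeding the generator, for example with `np.random.default_rng(42)`, makes the random grids reproducible. It is an assumption that the `random_state=42` argument added in this diff serves that role inside stemflow.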
2,349 changes: 1,525 additions & 824 deletions docs/Examples/01.AdaSTEM_demo.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/Examples/02.AdaSTEM_learning_curve_analysis.ipynb
@@ -206,7 +206,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=1 \n",
+" n_jobs=1 \n",
" )\n",
"\n",
" ## fit adastem\n",
2 changes: 1 addition & 1 deletion docs/Examples/03.Binding_with_Maxent.ipynb
@@ -617,7 +617,7 @@
" grid_len_lower_threshold=5,\n",
" temporal_step=50,\n",
" temporal_bin_interval=50,\n",
-" points_lower_threshold=100, njobs=1)\n"
+" points_lower_threshold=100, n_jobs=1)\n"
]
},
{
2 changes: 1 addition & 1 deletion docs/Examples/04.SphereAdaSTEM_demo.ipynb
@@ -744,7 +744,7 @@
" points_lower_threshold=50, # Only stixels with more than 50 samples are trained\n",
" Temporal1='DOY',\n",
" use_temporal_to_train=True, # In each stixel, whether 'DOY' should be a predictor\n",
-" njobs=1\n"
+" n_jobs=1\n"
")"
]
},
6 changes: 3 additions & 3 deletions docs/Examples/05.Hurdle_in_ada_or_ada_in_hurdle.ipynb
@@ -671,7 +671,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4 \n",
+" n_jobs=4 \n",
")\n",
"\n"
]
@@ -984,7 +984,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4),\n",
+" n_jobs=4),\n",
" regressor=AdaSTEMRegressor(base_model=XGBRegressor(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1),\n",
" save_gridding_plot = True,\n",
" ensemble_fold=10, \n",
@@ -996,7 +996,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4)\n",
+" n_jobs=4)\n",
")\n",
"\n",
"\n",
2 changes: 1 addition & 1 deletion docs/Examples/06.Base_model_choices.ipynb
@@ -721,7 +721,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=1 \n",
+" n_jobs=1 \n",
" )\n",
" \n",
" start_t = time.time()\n",
4 changes: 2 additions & 2 deletions docs/Examples/07.Optimizing_stixel_size.ipynb
@@ -1165,7 +1165,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4 \n",
+" n_jobs=4 \n",
")\n",
"\n",
"# Perform gridsearch\n",
@@ -3402,7 +3402,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4 \n",
+" n_jobs=4 \n",
")\n",
"\n",
"from stemflow.model_selection import ST_train_test_split\n",