Fully randomized grids; Lazy-loading model dictionary #69

Merged
merged 31 commits into from
Oct 27, 2024
31 commits
d1b0678
update parameter options for completely random rotation angle of ense…
chenyangkang Oct 24, 2024
8826ea1
pre-commit syntax correction
chenyangkang Oct 24, 2024
93ed961
add more pytest on the completely_random_rotation; #59
chenyangkang Oct 24, 2024
9556f6e
update badge
chenyangkang Oct 24, 2024
c26e686
add lazyloading model dictionary choices; Save ensmebles of models to…
chenyangkang Oct 25, 2024
5f878da
fix tests
chenyangkang Oct 25, 2024
ec8cc91
fix test
chenyangkang Oct 25, 2024
99f0bb4
add test for lazyloading
chenyangkang Oct 26, 2024
26f3271
fix
chenyangkang Oct 26, 2024
da34854
lazy_loading/saving dir name no longer controled by random_state para…
chenyangkang Oct 26, 2024
62cdaae
fix
chenyangkang Oct 26, 2024
cf079e6
update
chenyangkang Oct 26, 2024
55e7dfe
fix
chenyangkang Oct 26, 2024
3ebe936
change njobs to n_jobs, following sklearn way
chenyangkang Oct 26, 2024
690ac08
add test for Hurdle_for_AdaSTEM
chenyangkang Oct 26, 2024
d573274
update tests to cover more
chenyangkang Oct 26, 2024
a7740f1
fix tests
chenyangkang Oct 26, 2024
9cf0b70
fix tests; fix AdaSTEM score method
chenyangkang Oct 27, 2024
045f552
lazy loading pytests
chenyangkang Oct 27, 2024
07bd202
fix
chenyangkang Oct 27, 2024
577637d
fix n_jobs
chenyangkang Oct 27, 2024
d0c4b0e
fix
chenyangkang Oct 27, 2024
d38ff40
fix
chenyangkang Oct 27, 2024
8fdba80
fix
chenyangkang Oct 27, 2024
be81716
update tests
chenyangkang Oct 27, 2024
522d384
update doc1
chenyangkang Oct 27, 2024
7f0c050
update
chenyangkang Oct 27, 2024
853ca39
fix
chenyangkang Oct 27, 2024
4f466eb
fix
chenyangkang Oct 27, 2024
df392d1
update version
chenyangkang Oct 27, 2024
2f6dc8f
update lazyloading docs
chenyangkang Oct 27, 2024
5 changes: 5 additions & 0 deletions docs/API_Documentation/utils/stemflow.utils.lazyloading.md
@@ -0,0 +1,5 @@
# stemflow.utils.lazyloading

---
::: stemflow.utils.lazyloading
---
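
The diff does not show how the lazy-loading model dictionary is implemented. As a rough illustration of the general idea only, the sketch below keeps pickled objects on disk and loads each one when it is accessed. The class name `LazyPickleDict`, its file layout, and the example key are hypothetical and are not taken from `stemflow.utils.lazyloading`.

```python
import os
import pickle
import tempfile
from collections.abc import MutableMapping


class LazyPickleDict(MutableMapping):
    """Hypothetical dict-like store: values live on disk, not in memory.

    Assigning a value pickles it to a file; reading a key unpickles it on demand.
    """

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self._paths = {}   # key -> file path
        self._counter = 0  # running counter so file names never collide

    def __setitem__(self, key, value):
        if key not in self._paths:
            self._paths[key] = os.path.join(self.directory, f"item_{self._counter}.pkl")
            self._counter += 1
        with open(self._paths[key], "wb") as f:
            pickle.dump(value, f)

    def __getitem__(self, key):
        # Raises KeyError for unknown keys, like a normal dict.
        with open(self._paths[key], "rb") as f:
            return pickle.load(f)

    def __delitem__(self, key):
        os.remove(self._paths.pop(key))

    def __iter__(self):
        return iter(self._paths)

    def __len__(self):
        return len(self._paths)


with tempfile.TemporaryDirectory() as tmp:
    models = LazyPickleDict(tmp)
    models["stixel_0"] = {"coef": [1.0, 2.0]}  # stand-in for a fitted model
    loaded = models["stixel_0"]                # read back from disk on access
```

The trade-off in a design like this is lower memory use in exchange for disk I/O on every access.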
5 changes: 3 additions & 2 deletions docs/A_brief_introduction/A_brief_introduction.md
@@ -37,7 +37,7 @@ In the first case, the classifier and regressor "talk" to each other in each sep
User can define the size of the stixels (spatial temporal grids) in terms of space and time. Larger stixel promotes generalizability but loses precision in fine resolution; Smaller stixel may have better predictability in the exact area but reduced ability of extrapolation for points outside the stixel. See section [Optimizing stixel size](https://chenyangkang.github.io/stemflow/Examples/07.Optimizing_stixel_size.html) for discussion about selecting gridding parameters and [Tips for spatiotemporal indexing](https://chenyangkang.github.io/stemflow/Tips/Tips_for_spatiotemporal_indexing.html).

## A simple demo
-In the demo, we first split the training data using temporal sliding windows with a size of 50 day of year (DOY) and step of 20 DOY (`temporal_start = 1`, `temporal_end=366`, `temporal_step=20`, `temporal_bin_interval=50`). For each temporal slice, a spatial gridding is applied, where we force the stixel to be split into smaller 1/4 pieces if the edge is larger than 25 units (measured in longitude and latitude, `grid_len_upper_threshold=25`), and stop splitting to prevent the edge length being chunked below 5 units (`grid_len_lower_threshold=5`) or containing less than 50 checklists (`points_lower_threshold=50`). Model fitting is run using 1 core (`njobs=1`).
+In the demo, we first split the training data using temporal sliding windows with a size of 50 day of year (DOY) and step of 20 DOY (`temporal_start = 1`, `temporal_end=366`, `temporal_step=20`, `temporal_bin_interval=50`). For each temporal slice, a spatial gridding is applied, where we force the stixel to be split into smaller 1/4 pieces if the edge is larger than 25 units (measured in longitude and latitude, `grid_len_upper_threshold=25`), and stop splitting to prevent the edge length being chunked below 5 units (`grid_len_lower_threshold=5`) or containing less than 50 checklists (`points_lower_threshold=50`). Model fitting is run using 1 core (`n_jobs=1`).
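
To make the temporal slicing concrete, here is a minimal sketch of how sliding windows could be enumerated from the four temporal parameters quoted above. The helper `sliding_windows` is hypothetical; how stemflow itself handles windows that run past `temporal_end`, and the random jitter applied per ensemble, are not shown in this diff and are not modeled here.

```python
def sliding_windows(start, end, step, interval):
    """Return (window_start, window_end) pairs.

    A new window begins every `step` units from `start` while the
    window start is still below `end`; each window spans `interval` units.
    """
    windows = []
    t = start
    while t < end:
        windows.append((t, t + interval))
        t += step
    return windows


# Values from the demo: temporal_start=1, temporal_end=366,
# temporal_step=20, temporal_bin_interval=50
windows = sliding_windows(start=1, end=366, step=20, interval=50)
print(windows[:3])
```

Because the step (20) is smaller than the window size (50), consecutive windows overlap, so a given day of year can fall into more than one temporal slice.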

This process is executed 10 times (`ensemble_fold = 10`), each time with random jitter and random rotation of the gridding, generating 10 ensembles. In the prediction phase, only spatial-temporal points with more than 7 (`min_ensemble_required = 7`) ensembles usable are predicted (otherwise, set as `np.nan`).
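
The ensemble threshold can be illustrated with a small NumPy sketch. It assumes that per-ensemble predictions are combined by a simple mean and that the threshold is inclusive (at least `min_ensemble_required` usable ensembles); both are assumptions for illustration, and the function `aggregate_ensembles` is hypothetical rather than part of stemflow.

```python
import numpy as np


def aggregate_ensembles(preds, min_ensemble_required):
    """Average predictions across ensembles, masking poorly covered points.

    preds: array of shape (n_ensembles, n_points), with NaN where an
    ensemble has no usable prediction for a point.
    Assumes min_ensemble_required >= 1.
    """
    preds = np.asarray(preds, dtype=float)
    usable = np.sum(~np.isnan(preds), axis=0)  # usable ensembles per point
    total = np.nansum(preds, axis=0)           # sum ignoring NaN
    mean = np.full(preds.shape[1], np.nan)     # default: not predicted
    enough = usable >= min_ensemble_required
    mean[enough] = total[enough] / usable[enough]
    return mean


# 10 ensembles, 3 points: covered by 10, 7 and 6 ensembles respectively
preds = np.full((10, 3), np.nan)
preds[:, 0] = 1.0
preds[:7, 1] = 2.0
preds[:6, 2] = 3.0

result = aggregate_ensembles(preds, min_ensemble_required=7)
print(result)
```

Under these assumptions, the third point stays `np.nan` because only 6 of the 10 ensembles cover it.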

@@ -68,7 +68,8 @@ model = AdaSTEMRegressor(
Spatio2='latitude', # spatial coordinates shown in the dataframe
Temporal1='DOY',
use_temporal_to_train=True, # In each stixel, whether 'DOY' should be a predictor
-njobs=1
+n_jobs=1,
+random_state=42
)
```

2,349 changes: 1,525 additions & 824 deletions docs/Examples/01.AdaSTEM_demo.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/Examples/02.AdaSTEM_learning_curve_analysis.ipynb
@@ -206,7 +206,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=1 \n",
+" n_jobs=1 \n",
" )\n",
"\n",
" ## fit adastem\n",
2 changes: 1 addition & 1 deletion docs/Examples/03.Binding_with_Maxent.ipynb
@@ -617,7 +617,7 @@
" grid_len_lower_threshold=5,\n",
" temporal_step=50,\n",
" temporal_bin_interval=50,\n",
-" points_lower_threshold=100, njobs=1)\n"
+" points_lower_threshold=100, n_jobs=1)\n"
]
},
{
2 changes: 1 addition & 1 deletion docs/Examples/04.SphereAdaSTEM_demo.ipynb
@@ -744,7 +744,7 @@
" points_lower_threshold=50, # Only stixels with more than 50 samples are trained\n",
" Temporal1='DOY',\n",
" use_temporal_to_train=True, # In each stixel, whether 'DOY' should be a predictor\n",
-" njobs=1\n",
+" n_jobs=1\n",
")"
]
},
6 changes: 3 additions & 3 deletions docs/Examples/05.Hurdle_in_ada_or_ada_in_hurdle.ipynb
@@ -671,7 +671,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4 \n",
+" n_jobs=4 \n",
")\n",
"\n"
]
@@ -984,7 +984,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4),\n",
+" n_jobs=4),\n",
" regressor=AdaSTEMRegressor(base_model=XGBRegressor(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1),\n",
" save_gridding_plot = True,\n",
" ensemble_fold=10, \n",
@@ -996,7 +996,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4)\n",
+" n_jobs=4)\n",
")\n",
"\n",
"\n",
2 changes: 1 addition & 1 deletion docs/Examples/06.Base_model_choices.ipynb
@@ -721,7 +721,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=1 \n",
+" n_jobs=1 \n",
" )\n",
" \n",
" start_t = time.time()\n",
4 changes: 2 additions & 2 deletions docs/Examples/07.Optimizing_stixel_size.ipynb
@@ -1165,7 +1165,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4 \n",
+" n_jobs=4 \n",
")\n",
"\n",
"# Perform gridsearch\n",
@@ -3402,7 +3402,7 @@
" Spatio2 = 'latitude', \n",
" Temporal1 = 'DOY',\n",
" use_temporal_to_train=True,\n",
-" njobs=4 \n",
+" n_jobs=4 \n",
")\n",
"\n",
"from stemflow.model_selection import ST_train_test_split\n",