Solves the problem of multiple consecutive nulls, a case where standard approaches (ffill, bfill, nearest, interpolation, etc.) don't work well.
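A minimal sketch in plain Python shows why long gaps are the hard case. It avoids pandas so it stays self-contained. `null_runs` measures every run of consecutive nulls, and a naive `forward_fill` holds the last value flat across the whole run. Both helpers and the price values are illustrative assumptions, not code or data from this repository.

```python
import math
from typing import List, Optional, Sequence, Tuple


def is_null(value: Optional[float]) -> bool:
    """Treat both None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def null_runs(values: Sequence[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for every run of consecutive nulls."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(values):
        if is_null(value):
            if start is None:
                start = i  # a new gap begins here
        elif start is not None:
            runs.append((start, i - start))  # the gap ended at index i - 1
            start = None
    if start is not None:
        runs.append((start, len(values) - start))  # gap reaches the end
    return runs


def forward_fill(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Naive ffill: repeat the last observed value; leading nulls stay null."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if not is_null(value):
            last = value
        filled.append(last)
    return filled


# Illustrative prices only, not taken from ./data.
prices = [5.10, 5.12, None, 5.15, None, None, None, None, 5.40, 5.42]
print(null_runs(prices))     # [(2, 1), (4, 4)]
print(forward_fill(prices))  # the 4-null gap is held flat at 5.15
```

The longer a run is, the further a flat fill can drift from the real series.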
- Imputes values using other time series as features in a semi-supervised approach, picking the best method for each null.
- Any number of time series can be used as features. They are clustered and selected through feature engineering before the semi-supervised step.
- tslearn (time series clustering), Facebook Prophet, XGBoost and linear regression.
- Yes: fuel pricing data from Brazilian gas stations, available in the ./data folder (rf_base.parquet and outliers_base.parquet). The data is public.
- For gaps of more than 20 consecutive nulls, the approach performed better than the standard methods (ffill, bfill, etc.).
- Execute the scripts in the order given by their filename prefixes. The schema below illustrates the input and output paths and the order of execution as a DAG.
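The per-gap idea can be sketched in plain Python under several assumptions. Pearson correlation stands in for the tslearn clustering and feature-selection step. A one-feature least-squares line stands in for the Prophet/XGBoost/linear-regression models. Gap length is the only selection criterion, using the 20-null threshold reported above. Candidate series are assumed complete and aligned on the same dates. `impute`, `abs_corr`, `fit_line` and the station names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Series = Sequence[Optional[float]]


def is_null(v: Optional[float]) -> bool:
    """Treat both None and NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def null_runs(values: Series) -> List[Tuple[int, int]]:
    """(start, length) of every run of consecutive nulls."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(values):
        if is_null(v):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(values) - start))
    return runs


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation (0.0 if either side is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if not sx or not sy:
        return 0.0
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy))


def impute(target: Series, candidates: Dict[str, Sequence[float]],
           long_gap: int = 20) -> Tuple[List[Optional[float]], str]:
    """Fill short interior gaps by interpolation, long or edge gaps by regression."""
    observed = [i for i, v in enumerate(target) if not is_null(v)]
    y = [target[i] for i in observed]
    # Feature-selection stand-in: keep the single most correlated candidate.
    name = max(candidates, key=lambda n: abs_corr([candidates[n][i] for i in observed], y))
    feature = candidates[name]
    intercept, slope = fit_line([feature[i] for i in observed], y)

    filled = list(target)
    for start, length in null_runs(target):
        end = start + length  # first observed index after the gap
        interior = start > 0 and end < len(target)
        for i in range(start, end):
            if interior and length <= long_gap:
                # short gap with both neighbours known: straight line between them
                left, right = target[start - 1], target[end]
                filled[i] = left + (right - left) * (i - start + 1) / (length + 1)
            else:
                # long or edge gap: predict from the correlated series
                filled[i] = intercept + slope * feature[i]
    return filled, name


# Hypothetical demo: station_b follows station_a linearly, station_c does not.
a = [5.0 + 0.01 * t for t in range(60)]
b = [2.0 * v + 1.0 for v in a]
c = [float(t % 7) for t in range(60)]
target: List[Optional[float]] = list(b)
for t in range(10, 35):  # 25 consecutive nulls, above the 20-null threshold
    target[t] = None
target[50] = None  # a one-value gap, handled by interpolation
filled, used = impute(target, {"station_a": a, "station_c": c})
print(used)  # station_a
```

A fuller version would fit on several selected series and compare candidate methods at each gap instead of switching on length alone.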
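If the numeric filename prefixes already encode a valid topological order of the DAG, a small runner can execute the whole pipeline. This is a minimal sketch with several assumptions. The scripts are taken to be Python files whose names start with digits, such as the hypothetical `01_load.py`, since the real filenames are not listed here. The runner is assumed to be launched from the repository root so relative paths like ./data resolve. `ordered_scripts` and `run_pipeline` are hypothetical helpers, not part of the repository.

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

PREFIX = re.compile(r"^(\d+)")


def order_key(path: Path) -> Tuple[int, str]:
    """Sort numerically on the leading digits, then by name for ties."""
    match = PREFIX.match(path.name)
    if match is None:
        raise ValueError(f"{path.name} has no numeric prefix")
    return int(match.group(1)), path.name


def ordered_scripts(folder: Path) -> List[Path]:
    """Numbered .py scripts in `folder`; unnumbered helpers are skipped."""
    scripts = [p for p in folder.glob("*.py") if PREFIX.match(p.name)]
    return sorted(scripts, key=order_key)


def run_pipeline(folder: Path) -> None:
    """Run each script in prefix order with the current interpreter."""
    for script in ordered_scripts(folder):
        print(f"running {script.name}")
        # check=True raises CalledProcessError and stops at the first failure
        subprocess.run([sys.executable, str(script)], check=True)
```

Sorting on the integer value rather than the raw string keeps `10_report.py` after `2_cluster.py`. If the prefixes are zero-padded, a plain lexical sort gives the same order.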
Next step: enhance the approach and turn it into a Python library, starting with only a trivial method selector.
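Below is one possible shape for that first trivial method selector, sketched under several assumptions. Missing values are `None`. The candidates are forward fill, backward fill and linear interpolation. The selector picks whichever candidate has the lowest mean absolute error when gaps of the requested length are hidden in fully observed stretches of the same series. `METHODS`, `select_method` and the fill helpers are hypothetical names, not an existing API.

```python
from typing import Callable, Dict, List, Optional

Series = List[Optional[float]]
Filler = Callable[[Series], Series]


def ffill(values: Series) -> Series:
    """Repeat the last observed value forward."""
    out: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def bfill(values: Series) -> Series:
    """Backward fill is a forward fill on the reversed series."""
    return ffill(values[::-1])[::-1]


def linear(values: Series) -> Series:
    """Straight line between the observed neighbours of each gap."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        lv, rv = values[left], values[right]
        for i in range(left + 1, right):
            out[i] = lv + (rv - lv) * (i - left) / (right - left)
    return out  # leading and trailing nulls are left as None


# Hypothetical registry: more complex models could be added under new names.
METHODS: Dict[str, Filler] = {"ffill": ffill, "bfill": bfill, "linear": linear}


def select_method(values: Series, gap: int,
                  methods: Dict[str, Filler] = METHODS) -> str:
    """Return the method with the lowest MAE on synthetic gaps of length `gap`."""
    errors: Dict[str, List[float]] = {name: [] for name in methods}
    for start in range(1, len(values) - gap, gap + 1):
        # Only use windows whose values and both neighbours are observed.
        if any(v is None for v in values[start - 1:start + gap + 1]):
            continue
        masked = list(values)
        for i in range(start, start + gap):
            masked[i] = None
        for name, fill in methods.items():
            filled = fill(masked)
            errors[name].extend(abs(filled[i] - values[i])
                                for i in range(start, start + gap))
    scores = {name: sum(e) / len(e) for name, e in errors.items() if e}
    if not scores:
        raise ValueError("no fully observed window of that length to test on")
    return min(scores, key=scores.__getitem__)


ramp: Series = [float(t) for t in range(30)]
print(select_method(ramp, gap=3))  # linear
```

Scoring on artificial gaps keeps the choice data-driven, and the model-based imputers could later be registered in the same dictionary.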