Pump Chunk Marking #2

fleerdayo · 2021-01-15T12:41:00Z

For each resampled (chunked) pump csv file, did you only mark 1 chunk as True?

E.g. if there is a pump at 2019-03-1 17.00 and I chunked my csv data into 5-second chunks (and only took into consideration the pump day and 1 day before and after), I only marked the chunk from 17.00.00 to 17.00.05 as True.
This leaves me with an extremely imbalanced dataset, so a RandomForestClassifier ends up predicting every chunk as False.
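
To put a number on the imbalance, here is a minimal sketch of the labeling described above, using only the standard library. The function name `label_chunks` and its parameters are made up for illustration, the timestamp is read as 1 March 2019 at 17:00, and this is not the authors' preprocessing code.

```python
from datetime import datetime, timedelta

def label_chunks(pump_time, chunk_seconds=5, days_around=1):
    """Return (chunk_start, is_pump) pairs covering the pump day
    plus `days_around` full days before and after it."""
    chunk = timedelta(seconds=chunk_seconds)
    day_start = pump_time.replace(hour=0, minute=0, second=0, microsecond=0)
    start = day_start - timedelta(days=days_around)
    end = day_start + timedelta(days=days_around + 1)

    labels = []
    t = start
    while t < end:
        # A chunk is positive only if the pump timestamp falls inside it.
        labels.append((t, t <= pump_time < t + chunk))
        t += chunk
    return labels

# Assumed reading of the example timestamp: 1 March 2019, 17:00:00.
labels = label_chunks(datetime(2019, 3, 1, 17, 0, 0))
positives = sum(is_pump for _, is_pump in labels)
print(len(labels), positives)
```

Three days of 5-second chunks give 51840 rows with a single positive, so a classifier that always answers False is already more than 99.99 % accurate on this file. Accuracy is therefore not a useful metric here; recall and precision on the positive class say more.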

What am I missing here?

------- Offtopic -----------
Also thank you guys for your effort to collect all the data. I enjoyed reading your paper too and got lots of useful information out of it. It's a welcome distraction to fiddle around with your data during all the restrictions :)


RaibekTussupbekov · 2021-05-03T12:24:56Z

Hello, @fleerdayo :)

I've been trying to reverse engineer the paper's model for the last two weeks:)

I've been able to achieve 77.907 % recall but a ridiculously low 0.185 % precision:(
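
For a sense of scale, precision is TP / (TP + FP), so the number of false positives per true positive is 1 / precision - 1. A small worked example with the precision figure above (the helper name is made up for illustration):

```python
def false_positives_per_true_positive(precision):
    """Invert precision = TP / (TP + FP) to get FP / TP."""
    return 1.0 / precision - 1.0

# 0.185 % precision expressed as a fraction.
ratio = false_positives_per_true_positive(0.00185)
print(round(ratio))  # 540
```

So roughly 540 chunks are flagged incorrectly for every chunk flagged correctly.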

I use imblearn.ensemble.BalancedRandomForestClassifier to undersample the data.
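
As far as I understand it, the balanced random forest randomly undersamples the majority class in each tree's bootstrap sample. The idea of random undersampling can be sketched in plain Python as follows; this is an illustration only, not imblearn's implementation, and it assumes the True class is the minority. The function name `undersample` is made up.

```python
import random

def undersample(X, y, seed=0):
    """Keep every positive row and an equally sized random subset of negatives.

    Assumes positives (True labels) are the minority class.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label]
    neg = [i for i, label in enumerate(y) if not label]
    keep_neg = rng.sample(neg, min(len(pos), len(neg)))
    keep = sorted(pos + keep_neg)  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: 1000 rows, 10 of them positive.
X = list(range(1000))
y = [i % 100 == 0 for i in X]
X_bal, y_bal = undersample(X, y)
print(len(X_bal), sum(y_bal))
```

Note that balancing the training data this way does not change the class ratio at prediction time, which may be part of why recall is high while precision stays low.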

I tried to cut off 30 minutes after each pump chunk because the paper says that "...Once a pump is detected we pause our classifier for 30 minutes to avoid multiple alerts for the same event..."
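
One possible reading of the quoted sentence is that the pause applies to the classifier's alerts at prediction time rather than to the training data. A minimal sketch of that reading, with a made-up function name `apply_cooldown` and made-up alert timestamps:

```python
from datetime import datetime, timedelta

def apply_cooldown(alerts, pause=timedelta(minutes=30)):
    """Keep an alert only if at least `pause` has elapsed since the last kept alert."""
    kept = []
    for t in sorted(alerts):
        if not kept or t - kept[-1] >= pause:
            kept.append(t)
    return kept

# Hypothetical alert times produced by a classifier on consecutive chunks.
alerts = [
    datetime(2019, 3, 1, 17, 0, 0),
    datetime(2019, 3, 1, 17, 0, 5),
    datetime(2019, 3, 1, 17, 10, 0),
    datetime(2019, 3, 1, 17, 31, 0),
]
kept = apply_cooldown(alerts)
print(kept)
```

Here the alerts at 17:00:05 and 17:10:00 are suppressed, and the one at 17:31:00 is kept because it falls outside the 30-minute pause. If the paper's metrics count alerts after this step, repeated positives around the same pump would collapse into one, which affects how precision is computed.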

However it does not help.

So I believe that the data should be filtered before training. The paper says that the authors picked only 104 samples out of 175.

Maybe this is the main reason for so many false positives?

Let me know if you're still interested. We could collaborate:) I see that the authors are not responding here:) Maybe they are too busy...or too rich:) Just kidding:)

Btw I'm ready to share my code and collaborate with whoever is interested including the authors:)

RazcoDev · 2021-11-25T18:40:01Z

Hey @RaibekTussupbekov, did you manage to make this work? I've also run into many issues with the dataset.
Thanks!

RaibekTussupbekov · 2021-12-01T07:47:10Z

@RazcoDev Not yet
