IMPORTANT: the hash of SWaT_Dataset_Normal_v1.csv has changed from the expected hash, meaning process_SWaT.py will no longer function correctly. #4

Open
avdalovic opened this issue May 2, 2024 · 2 comments

Comments

@avdalovic

Greetings,

I am experiencing a hash mismatch with both the normal and attack SWaT CSV files after downloading them and saving them on Ubuntu Linux. I am unsure whether the mismatch comes from changes in the data itself or from the way the files are saved as CSVs. Including or excluding the header row in the CSV files could also be affecting the hash values.

Should the CSV headers be included or excluded when saving the files to ensure the hashes match?
Is a specific method or tool recommended for saving the files as CSVs on Ubuntu to prevent hash mismatches?

The described procedure works with WADI data, so I believe the issue lies in converting the SWaT data from xlsx to CSV.

I would appreciate guidance on how to correctly handle the file saving to ensure the hashes match. Thank you for your assistance!

@clementfung
Member

Hello! Thank you for noticing this. process_SWaT.py expects the headers to be included, and it should be able to handle small things like spaces in column names. But you are right that these discrepancies are affecting the hash matching and throwing errors when the data is actually the same.
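To illustrate why a raw file hash can differ even when the data is the same, here is a minimal sketch. It assumes SHA-256 purely for illustration (the thread does not say which hash process_SWaT.py uses), and the sample row and the `Sensor1` column name are made up rather than taken from the SWaT files.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Made-up sample content; the same two rows are written four different ways.
unix_csv = b"Timestamp,Sensor1\n2015-12-22 16:00:00,2.5\n"
variants = {
    "unix (LF)": unix_csv,
    "windows (CRLF)": unix_csv.replace(b"\n", b"\r\n"),
    "utf-8 with BOM": b"\xef\xbb\xbf" + unix_csv,
    "space in header": unix_csv.replace(b",Sensor1", b", Sensor1"),
}

# Every variant holds the same values, yet each produces a different digest.
digests = {name: sha256_hex(data) for name, data in variants.items()}
for name, digest in digests.items():
    print(f"{name:16s} {digest[:16]}")
```

Any byte-level difference introduced by the export step (line endings, a byte-order mark, spacing in the header) is enough to change the digest of the whole file.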

I am also using Ubuntu. If you just downloaded the xlsx, opened it with LibreOffice, and saved it as a CSV such that the first row is the column names, the format should be correct.

I will update the hash script to be more robust and to only check the data values after processing, and I'll close this issue once that update is made. But for now, you should be fine to proceed even if the hashes don't match.
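As a rough idea of what checking only the data values could look like, here is a sketch that hashes parsed cell values instead of raw bytes. This is not the actual update to the repository: `content_digest` is a hypothetical helper, SHA-256 and UTF-8 input are assumptions, and the sample rows are made up.

```python
import csv
import hashlib
import io


def content_digest(raw: bytes) -> str:
    """Hash parsed CSV cell values instead of raw file bytes."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
    text = raw.decode("utf-8-sig")
    # newline="" leaves line endings alone so the csv module handles LF and CRLF.
    reader = csv.reader(io.StringIO(text, newline=""))
    digest = hashlib.sha256()
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        # Separator characters keep cell and row boundaries unambiguous.
        digest.update("\x1f".join(cells).encode("utf-8"))
        digest.update(b"\x1e")
    return digest.hexdigest()


# Made-up sample data, not taken from the SWaT files.
lf = b"Timestamp, Sensor1\n2015-12-22 16:00:00,2.5\n"
crlf_bom = b"\xef\xbb\xbf" + b"Timestamp,Sensor1\r\n2015-12-22 16:00:00,2.5\r\n"
changed = b"Timestamp,Sensor1\n2015-12-22 16:00:00,2.6\n"

print(content_digest(lf) == content_digest(crlf_bom))  # True: same values
print(content_digest(lf) == content_digest(changed))   # False: one value differs
```

Stripping whitespace from every cell is a design choice that suits numeric sensor data; it would hide differences in columns where leading or trailing spaces are meaningful.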

@abudeeb
abudeeb commented Nov 29, 2024

I had the same issue, although I use Windows. Make sure NOT to use the UTF-8 encoding when saving. Once I saved the file as a CSV without UTF-8 encoding, it worked just fine.
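One possible explanation, which is an assumption and not something confirmed in this thread, is the byte-order mark (BOM) that some spreadsheet applications write at the start of UTF-8 CSV exports. A small sketch for detecting and removing it from an already-saved file follows; `strip_utf8_bom` is a hypothetical helper and the sample bytes are made up.

```python
import codecs
from typing import Tuple


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return the data without a leading UTF-8 BOM, and whether one was found."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


# Made-up sample bytes standing in for the start of an exported CSV.
with_bom = codecs.BOM_UTF8 + b"Timestamp,Sensor1\n"
cleaned, had_bom = strip_utf8_bom(with_bom)
print(had_bom)   # True
print(cleaned)   # b'Timestamp,Sensor1\n'
```

To apply this to a real file, read it with `open(path, "rb")`, pass the bytes through the helper, and write the result back in binary mode.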
