the question about the run file #14

Open
1812030208 opened this issue Jul 7, 2023 · 8 comments

Comments

@1812030208

Hello! Thank you for the recently uploaded GAIA dataset! I would like to ask whether each line in the csv file in the run file corresponds to an injected exception.
I ask because I see some log lines such as "upload business logs on 2021-07-31 successfully". Is this also an exception? If so, what type of exception? Looking forward to your reply, thank you very much!

@Xander-cloudwise

Thank you for your interest in the GAIA-dataset. A run file contains two kinds of messages: WARNING level and INFO level. INFO-level messages record routine information from the different data sources. WARNING-level messages record unexpected actions in the system, including unexpected user behaviors and resource-consumption anomalies.

@1812030208
Author

Thank you for your reply! So WARNING-level messages are abnormal and INFO-level messages are normal, right? When I label the business and metric data, I only need to label them according to the WARNING-level messages in the run file, right? Looking forward to your reply, thank you very much!

@Xander-cloudwise

Yes, your understanding is correct.
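
For readers who want to apply this rule in code, here is a minimal sketch of filtering run-file messages by level. It assumes each message is a pipe-delimited string laid out like the sample line quoted in this thread (timestamp, level, two addresses, service, an ID, then free text). The names `RunRecord`, `parse_run_message`, and `is_injected_anomaly` are hypothetical, and the field meanings are inferred from that one sample rather than from any official schema.

```python
from typing import NamedTuple, Optional


class RunRecord(NamedTuple):
    """Fields of one run-file message (layout assumed from one sample line)."""
    timestamp: str
    level: str
    service: str
    text: str


def parse_run_message(message: str) -> Optional[RunRecord]:
    """Split a pipe-delimited message; return None if it has too few fields."""
    parts = [p.strip() for p in message.split("|")]
    if len(parts) < 7:
        return None
    return RunRecord(
        timestamp=parts[0],
        level=parts[1].upper(),       # normalize "warning" / "WARNING"
        service=parts[4],
        text=" | ".join(parts[6:]),   # keep any extra pipes in the free text
    )


def is_injected_anomaly(message: str) -> bool:
    """True only for WARNING-level messages, per the maintainer's answer."""
    record = parse_run_message(message)
    return record is not None and record.level == "WARNING"


# Sample line quoted in this thread, with its repeated prefix dropped.
SAMPLE = (
    "2021-07-12 18:57:42,805 | WARNING | 0.0.0.1 | 172.17.0.5 | mobservice1 | "
    "e37a99d1c689ba98 | wait for 11 seconds for follow-up operations to "
    "simulate the login failure of the QR code expired"
)
```

Calling `is_injected_anomaly(SAMPLE)` returns `True`. A message that does not match the assumed layout is treated as not anomalous, which may or may not be what you want for your data.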

@1812030208
Author

Dear official, hello! The run file provides information about fault injection, but the duration of each fault is not marked separately in this information. Do we need to extract the duration of the fault from the "message" column ourselves? For example, does the following line mean that the failure lasts 11 seconds? "2021-07-12 18:57:42,805 | WARNING | 0.0.0.1 | 172.17.0.5 | mobservice1 | e37a99d1c689ba98 | wait for 11 seconds for follow-up operations to simulate the login failure of the QR code expired".
Looking forward to your reply, thank you very much!

@Xander-cloudwise

For resource-consumption anomalies, the duration is marked in the "message" column. Usually an anomaly lasts 600 seconds.

However, the message "wait for 11 seconds" is different and needs further explanation. MicroSS supports the user login procedure of a website. When the login procedure starts, a QR code is created and shown on the screen, and it expires after 10 seconds. Mobservice simulates the user behavior of scanning the QR code to log in. Sometimes a user does not scan the QR code in time, so the login fails. To simulate this scenario, mobservice waits 11 seconds for the QR code to expire and then "scans" the expired QR code, leading to a login failure.

Messages such as "2021-07-12 18:57:42,805 | WARNING | 0.0.0.1 | 172.17.0.5 | mobservice1 | e37a99d1c689ba98 | wait for 11 seconds for follow-up operations to simulate the login failure of the QR code expired" record the login failure described above.
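
As an illustration of the distinction above, the following sketch pulls the number out of a "wait for N seconds" message and flags the QR-code case, so that the 11-second wait is not mistaken for a fault duration. The function names and the substring check are assumptions based only on the sample line in this thread. The wording of resource-consumption messages is not shown here, so this sketch does not try to parse their durations.

```python
import re
from typing import Optional

# Matches the "wait for 11 seconds" phrasing seen in the sample message.
WAIT_RE = re.compile(r"wait for (\d+) seconds")


def extract_wait_seconds(message: str) -> Optional[int]:
    """Return N from the first 'wait for N seconds', or None if absent."""
    match = WAIT_RE.search(message)
    return int(match.group(1)) if match else None


def is_qr_expiry_simulation(message: str) -> bool:
    """True if the message describes the simulated expired-QR-code login.

    Assumption: such messages contain the phrase 'QR code expired'.
    """
    return "QR code expired" in message


# Sample line quoted in this thread, with its repeated prefix dropped.
SAMPLE = (
    "2021-07-12 18:57:42,805 | WARNING | 0.0.0.1 | 172.17.0.5 | mobservice1 | "
    "e37a99d1c689ba98 | wait for 11 seconds for follow-up operations to "
    "simulate the login failure of the QR code expired"
)
```

Here `extract_wait_seconds(SAMPLE)` gives `11` and `is_qr_expiry_simulation(SAMPLE)` gives `True`, which marks the 11 seconds as the scripted wait before the failed scan.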

@1812030208
Author

Dear official, hello! After aligning the logs, KPIs, and traces that share a timestamp, can we decide whether the entire system is abnormal (the label) from the log at that timestamp alone, without looking at the run file? Looking forward to your reply!

@wangsandlmu

Dear official, I have the same question. When we use logs, metrics, and traces for anomaly detection, if the log at a given time is "error", can I disregard the "run" file and directly label all three data sources at that moment as anomalous?

@Xander-cloudwise

It depends. A problem in a single trace or a single business transaction may not reflect an overall issue in the system, and a temporary fluctuation in a KPI time series may not indicate system instability either. The records in the run file are the anomalous actions we injected into the system.
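
One way to act on this is to derive labels from the run file instead of from individual log lines. The sketch below is an assumption-heavy illustration, not the dataset's official procedure. It treats each injected anomaly as a window that starts at the record's timestamp, and it falls back to the 600 seconds mentioned earlier in this thread when no duration is supplied. The helper names and the timestamp format string are hypothetical choices that match the sample line's `2021-07-12 18:57:42,805` style.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Format matching timestamps such as "2021-07-12 18:57:42,805".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

Window = Tuple[datetime, datetime]


def parse_ts(text: str) -> datetime:
    """Parse a run-file style timestamp with comma-separated milliseconds."""
    return datetime.strptime(text.strip(), TS_FORMAT)


def build_windows(starts: Iterable[str], duration_s: int = 600) -> List[Window]:
    """Turn injection start times into [start, start + duration) windows.

    Assumption: 600 s is only the 'usual' duration; pass the real value
    when the message column states one.
    """
    windows = []
    for start_text in starts:
        start = parse_ts(start_text)
        windows.append((start, start + timedelta(seconds=duration_s)))
    return windows


def label(timestamp: str, windows: List[Window]) -> int:
    """Return 1 if the timestamp falls inside any injected window, else 0."""
    moment = parse_ts(timestamp)
    return int(any(start <= moment < end for start, end in windows))


# Hypothetical injection start time, used here only to show the mechanics.
EXAMPLE_WINDOWS = build_windows(["2021-07-12 18:57:42,805"])
```

With these windows, `label("2021-07-12 19:00:00,000", EXAMPLE_WINDOWS)` is `1` and `label("2021-07-12 19:10:00,000", EXAMPLE_WINDOWS)` is `0`, because the window closes 600 seconds after the start.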
