About dataset generate. #12

Open
dingtian123 opened this issue Feb 23, 2023 · 1 comment

@dingtian123

Hello,
I have a question about the anomaly injection method (scale_anomaly) in generate_dataset.py. Why is the larger of row*(1-r) and 0 taken? This causes the predicted value of some anomalous combinations to become 0, so they will be filtered out when running Squeeze.

@shaido987
Owner

Hello,

Thank you for your question and sorry for the late reply.

Since we are generating data with synthetic anomalies, the predict and real values are exactly the same before scale_anomaly is run.
scale_anomaly then shifts the values in either the real or the predict column by reducing them by a factor r.
r is sampled from a normal distribution parameterized by the anomaly-specific severity and deviation (there can be multiple anomalies, each with its own sampled values).

r = rng.normal(loc=severity, scale=deviation)
v = max(row * (1 - r), 0.0)

In this work (as with Squeeze), we only consider KPIs that are additive (or derived KPIs based on additive KPIs). Relevant examples are page views, error counts, traffic volume, etc. These are all non-negative and cannot take negative values.
The max(row * (1 - r), 0.0) is thus there to make sure that the real value and the predict value (again, depending on the anomaly direction) never drop below zero.
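To make the effect of the clamp concrete, here is a minimal sketch that applies the same expression to a fixed value for a few hand-picked values of r. The helper name `scale_value` and the example numbers are made up for illustration and are not taken from generate_dataset.py; r is passed in directly rather than sampled so the result is deterministic.

```python
def scale_value(value: float, r: float) -> float:
    """Scale a value down by the factor r, clamping the result at zero.

    Hypothetical helper mirroring max(row * (1 - r), 0.0) from the reply.
    """
    return max(value * (1 - r), 0.0)


# Illustrative values only: a KPI value of 100 and increasingly severe r.
for r in (0.25, 0.5, 1.0, 1.5):
    print(f"r={r}: {scale_value(100.0, r)}")
```

Any r at or above 1 ends up at 0.0 here, since the unclamped product would be zero or negative.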

So, to answer your question: the predicted value becoming 0 is a consequence of the anomaly's severity being large, which gives r>=1.
Imagine, for example, that the KPI counts error codes and that the normal state is to have none (e.g., predict=0). When the actual values are above that, we still want to be able to determine the root cause, which will fail if elements with predict=0 are removed.

That said, when running Squeeze, any row with predict=0 and a non-zero real value receives a deviation score of -2 and should not be filtered out automatically.
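As a rough check of where the -2 comes from, the sketch below assumes the deviation score is defined as 2 * (predict - real) / (predict + real), which is my understanding of the Squeeze definition and is not stated in this thread. The function name `deviation_score` and the handling of the case where both values are 0 are assumptions made for illustration, not the repository's code.

```python
def deviation_score(predict: float, real: float) -> float:
    """Deviation score under the assumed form 2 * (f - v) / (f + v).

    f is the predicted value and v the real value. Returning 0.0 when both
    are zero is an assumption made here to avoid division by zero.
    """
    total = predict + real
    if total == 0:
        return 0.0
    return 2 * (predict - real) / total


print(deviation_score(0.0, 5.0))   # predict=0, real>0
print(deviation_score(5.0, 0.0))   # predict>0, real=0
print(deviation_score(10.0, 10.0)) # no deviation
```

Under this assumed definition the score is bounded between -2 and 2 for non-negative inputs, so a row with predict=0 and a positive real value sits at the lower bound rather than being undefined.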


2 participants