feat: incidents (keephq#1388)
Signed-off-by: Tal <[email protected]>
Signed-off-by: Matvey Kukuy <[email protected]>
Signed-off-by: Vladimir Filonov <[email protected]>
Co-authored-by: GlebBerjoskin <[email protected]>
Co-authored-by: Shahar Glazner <[email protected]>
Co-authored-by: Vladimir Filonov <[email protected]>
Co-authored-by: Matvey Kukuy <[email protected]>
Co-authored-by: Matvey Kukuy <[email protected]>
6 people authored Jul 22, 2024
1 parent ce96e1f commit 2ed30fd
Showing 59 changed files with 3,625 additions and 224 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/test-pr-e2e.yml
@@ -8,6 +8,10 @@ on:
      - 'keep-ui/**'
      - 'tests/**'

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref }}
  cancel-in-progress: true

env:
  PYTHON_VERSION: 3.11
  STORAGE_MANAGER_DIRECTORY: /tmp/storage-manager
3 changes: 3 additions & 0 deletions .github/workflows/test-pr.yml
@@ -6,6 +6,9 @@ on:
  pull_request:
    paths:
      - 'keep/**'
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref }}
  cancel-in-progress: true
# MySQL server and Elasticsearch for testing
env:
  PYTHON_VERSION: 3.11
7 changes: 5 additions & 2 deletions LICENSE
@@ -1,7 +1,10 @@
MIT License

Copyright (c) 2024 Keep

Portions of this software are licensed as follows:

* All content that resides under the "ee/" directory of this repository, if that directory exists, is licensed under the license defined in "ee/LICENSE".
* Content outside of the above mentioned directories or restrictions above is available under the "MIT" license as defined below.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
2 changes: 1 addition & 1 deletion docker/Dockerfile.dev.api
@@ -23,4 +23,4 @@ ENV PATH="/venv/bin:${PATH}"
ENV VIRTUAL_ENV="/venv"


-ENTRYPOINT ["gunicorn", "keep.api.api:get_app", "--bind" , "0.0.0.0:8080" , "--workers", "1" , "-k" , "uvicorn.workers.UvicornWorker", "-c", "./keep/api/config.py", "--reload"]
+CMD ["gunicorn", "keep.api.api:get_app", "--bind" , "0.0.0.0:8080" , "--workers", "1" , "-k" , "uvicorn.workers.UvicornWorker", "-c", "./keep/api/config.py", "--reload"]
35 changes: 35 additions & 0 deletions ee/LICENSE
@@ -0,0 +1,35 @@
The Keep Enterprise Edition (EE) license (the Enterprise License)
Copyright (c) 2024-present Keep Alerting LTD

With regard to the Keep Software:

This software and associated documentation files (the "Software") may only be
used in production, if you (and any entity that you represent) have agreed to,
and are in compliance with, the Keep Subscription Terms of Service, available
(if not available, it's impossible to comply)
at https://www.keephq.dev/terms-of-service (the "The Enterprise Terms”), or other
agreement governing the use of the Software, as agreed by you and Keep,
and otherwise have a valid Keep Enterprise Edition subscription for the
correct number of user seats. Subject to the foregoing sentence, you are free to
modify this Software and publish patches to the Software. You agree that Keep
and/or its licensors (as applicable) retain all right, title and interest in and
to all such modifications and/or patches, and all such modifications and/or
patches may only be used, copied, modified, displayed, distributed, or otherwise
exploited with a valid Keep Enterprise Edition subscription for the correct
number of user seats. You agree that Keep and/or its licensors (as applicable) retain
all right, title and interest in and to all such modifications. You are not
granted any other rights beyond what is expressly stated herein. Subject to the
foregoing, it is forbidden to copy, merge, publish, distribute, sublicense,
and/or sell the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

For all third party components incorporated into the Keep Software, those
components are licensed under the original license provided by the owner of the
applicable component.
Empty file added ee/experimental/__init__.py
Empty file.
148 changes: 148 additions & 0 deletions ee/experimental/incident_utils.py
@@ -0,0 +1,148 @@
import numpy as np
import pandas as pd
import networkx as nx

from typing import List

from keep.api.models.db.alert import Alert


def mine_incidents(alerts: List[Alert], incident_sliding_window_size: int=6*24*60*60, statistic_sliding_window_size: int=60*60,
                   jaccard_threshold: float=0.0, fingerprint_threshold: int=1):
    """
    Mine incidents from alerts.
    """

    alert_dict = {
        'fingerprint': [alert.fingerprint for alert in alerts],
        'timestamp': [alert.timestamp for alert in alerts],
    }
    alert_df = pd.DataFrame(alert_dict)
    mined_incidents = shape_incidents(alert_df, 'fingerprint', incident_sliding_window_size, statistic_sliding_window_size,
                                      jaccard_threshold, fingerprint_threshold)

    return [
        {
            "incident_fingerprint": incident['incident_fingerprint'],
            "alerts": [alert for alert in alerts if alert.fingerprint in incident['alert_fingerprints']],
        }
        for incident in mined_incidents
    ]


def get_batched_alert_counts(alerts: pd.DataFrame, unique_alert_identifier: str, sliding_window_size: int) -> np.ndarray:
    """
    Get the number of alerts in a sliding window.
    """

    resampled_alert_counts = alerts.set_index('timestamp').resample(
        f'{sliding_window_size//2}s')[unique_alert_identifier].value_counts().unstack(fill_value=0)
    rolling_counts = resampled_alert_counts.rolling(
        window=f'{sliding_window_size}s', min_periods=1).sum()
    alert_counts = rolling_counts.to_numpy()

    return alert_counts


def get_batched_alert_occurrences(alerts: pd.DataFrame, unique_alert_identifier: str, sliding_window_size: int) -> np.ndarray:
    """
    Get the occurrence of alerts in a sliding window.
    """

    alert_counts = get_batched_alert_counts(
        alerts, unique_alert_identifier, sliding_window_size)
    # Binarize: 1 if the alert fired at least once in the window, else 0.
    alert_occurrences = np.where(alert_counts > 0, 1, 0)

    return alert_occurrences


def get_jaccard_scores(P_a: np.ndarray, P_aa: np.ndarray) -> np.ndarray:
    """
    Calculate the Jaccard similarity scores between alerts.
    """

    P_a_matrix = P_a[:, None] + P_a
    union_matrix = P_a_matrix - P_aa

    with np.errstate(divide='ignore', invalid='ignore'):
        jaccard_matrix = np.where(union_matrix != 0, P_aa / union_matrix, 0)

    np.fill_diagonal(jaccard_matrix, 1)

    return jaccard_matrix


def get_alert_jaccard_matrix(alerts: pd.DataFrame, unique_alert_identifier: str, sliding_window_size: int) -> np.ndarray:
    """
    Calculate the Jaccard similarity scores between alerts.
    """

    alert_occurrences = get_batched_alert_occurrences(
        alerts, unique_alert_identifier, sliding_window_size)
    alert_probabilities = np.mean(alert_occurrences, axis=0)
    joint_alert_occurrences = np.dot(alert_occurrences.T, alert_occurrences)
    pairwise_alert_probabilities = joint_alert_occurrences / \
        alert_occurrences.shape[0]

    return get_jaccard_scores(alert_probabilities, pairwise_alert_probabilities)


def build_graph_from_occurrence(occurrence_row: pd.Series, jaccard_matrix: np.ndarray, unique_alert_identifiers: List[str],
                                jaccard_threshold: float = 0.05) -> nx.Graph:
    """
    Build a weighted graph using alert occurrence matrix and Jaccard coefficients.
    """

    # occurrence_row is one row of the resampled counts table (a Series, not a DataFrame).
    present_indices = np.where(occurrence_row > 0)[0]

    G = nx.Graph()

    for idx in present_indices:
        alert_desc = unique_alert_identifiers[idx]
        G.add_node(alert_desc)

    for i in present_indices:
        for j in present_indices:
            if i != j and jaccard_matrix[i, j] >= jaccard_threshold:
                alert_i = unique_alert_identifiers[i]
                alert_j = unique_alert_identifiers[j]
                G.add_edge(alert_i, alert_j, weight=jaccard_matrix[i, j])

    return G

def shape_incidents(alerts: pd.DataFrame, unique_alert_identifier: str, incident_sliding_window_size: int, statistic_sliding_window_size: int,
                    jaccard_threshold: float = 0.2, fingerprint_threshold: int = 5) -> List[dict]:
    """
    Shape incidents from alerts.
    """

    incidents = []
    incident_number = 0

    resampled_alert_counts = alerts.set_index('timestamp').resample(
        f'{incident_sliding_window_size//2}s')[unique_alert_identifier].value_counts().unstack(fill_value=0)
    jaccard_matrix = get_alert_jaccard_matrix(
        alerts, unique_alert_identifier, statistic_sliding_window_size)

    for idx in range(resampled_alert_counts.shape[0]):
        graph = build_graph_from_occurrence(
            resampled_alert_counts.iloc[idx], jaccard_matrix, resampled_alert_counts.columns, jaccard_threshold=jaccard_threshold)
        max_component = max(nx.connected_components(graph), key=len)

        min_starts_at = resampled_alert_counts.index[idx]
        max_starts_at = min_starts_at + \
            pd.Timedelta(seconds=incident_sliding_window_size)

        local_alerts = alerts[(alerts['timestamp'] >= min_starts_at) & (
            alerts['timestamp'] <= max_starts_at)]
        local_alerts = local_alerts[local_alerts[unique_alert_identifier].isin(
            max_component)]

        if len(max_component) > fingerprint_threshold:

            incidents.append({
                'incident_fingerprint': f'Incident #{incident_number}',
                'alert_fingerprints': local_alerts[unique_alert_identifier].unique().tolist(),
            })
            # Advance the counter so each incident gets a distinct fingerprint.
            incident_number += 1

    return incidents
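
The correlation idea in `incident_utils.py` (bucket alerts into time windows, score pairs of fingerprints by Jaccard similarity, then take connected groups above a threshold) can be sketched without pandas or networkx. This is a simplified illustration, not the committed implementation: it uses fixed, non-overlapping windows and one global graph, whereas the diff uses overlapping rolling windows and builds a graph per window. The function names, the sample events, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def bucket_alerts(events: List[Tuple[str, int]], window_seconds: int) -> Dict[str, Set[int]]:
    """Map each fingerprint to the set of fixed-size window indices in which it fired."""
    windows: Dict[str, Set[int]] = {}
    for fingerprint, timestamp in events:
        windows.setdefault(fingerprint, set()).add(timestamp // window_seconds)
    return windows


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlated_groups(windows_by_alert: Dict[str, Set[int]], threshold: float) -> List[Set[str]]:
    """Connected components of the graph whose edges are pairs scoring >= threshold."""
    neighbours: Dict[str, Set[str]] = {name: set() for name in windows_by_alert}
    for left, right in combinations(windows_by_alert, 2):
        if jaccard(windows_by_alert[left], windows_by_alert[right]) >= threshold:
            neighbours[left].add(right)
            neighbours[right].add(left)

    groups: List[Set[str]] = []
    seen: Set[str] = set()
    for start in windows_by_alert:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        group: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical (fingerprint, epoch-seconds) events, bucketed into one-hour windows.
events = [
    ("db-latency", 10), ("api-5xx", 20),
    ("db-latency", 3700), ("api-5xx", 3710),
    ("db-latency", 7300), ("disk-full", 30000),
]
windows = bucket_alerts(events, window_seconds=3600)
groups = correlated_groups(windows, threshold=0.5)
largest = max(groups, key=len)  # the candidate incident, as in shape_incidents
```

Here `db-latency` and `api-5xx` share two of three windows, so they end up in the same group, while `disk-full` stays on its own. Raising the threshold above 2/3 would split the pair.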
149 changes: 149 additions & 0 deletions keep-ui/app/ai/ai.tsx
@@ -0,0 +1,149 @@
"use client";
import { Card, List, ListItem, Title, Subtitle } from "@tremor/react";
import { useAIStats } from "utils/hooks/useAIStats";
import { useSession } from "next-auth/react";
import { getApiURL } from "utils/apiUrl";
import { toast } from "react-toastify";
import { useEffect, useState, useRef, FormEvent } from "react";

export default function Ai() {
const { data: aistats, isLoading } = useAIStats();
const { data: session } = useSession();
const [text, setText] = useState("");
const [newText, setNewText] = useState("Mine incidents");
const [animate, setAnimate] = useState(false);
const onlyOnce = useRef(false);

useEffect(() => {
let index = 0;

const interval = setInterval(() => {
setText(newText.slice(0, index + 1));
index++;

if (index === newText.length) {
clearInterval(interval);
}
}, 100);

return () => {
clearInterval(interval);
};
}, [newText]);

const mineIncidents = async (e: FormEvent) => {
e.preventDefault();
setAnimate(true);
setNewText("Mining 🚀🚀🚀 ...");
const apiUrl = getApiURL();
const response = await fetch(`${apiUrl}/incidents/mine`, {
method: "POST",
headers: {
Authorization: `Bearer ${session?.accessToken}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
}),
});
if (!response.ok) {
toast.error(
"Failed to mine incidents, please contact us if this issue persists."
);
}
setAnimate(false);
setNewText("Mine incidents");
};

return (
<main className="p-4 md:p-10 mx-auto max-w-full">
<div className="flex justify-between items-center">
<div>
<Title>AI Correlation</Title>
<Subtitle>
Correlating alerts to incidents based on past alerts, incidents, and
the other data.
</Subtitle>
</div>
</div>
<Card className="mt-10 p-4 md:p-10 mx-auto">
<div>
<div className="prose-2xl">👋 You are almost there!</div>
AI Correlation is coming soon. Make sure you have enough data collected to prepare.
<div className="max-w-md mt-10 flex justify-items-start justify-start">
<List>
<ListItem>
<span>
Connect an incident source to dump incidents, or create 10
incidents manually
</span>
<span>
{aistats?.incidents_count &&
aistats?.incidents_count >= 10 ? (
<div></div>
) : (
<div></div>
)}
</span>
</ListItem>
<ListItem>
<span>Collect 100 alerts</span>
<span>
{aistats?.alerts_count && aistats?.alerts_count >= 100 ? (
<div></div>
) : (
<div></div>
)}
</span>
</ListItem>
<ListItem>
<span>Collect alerts for more than 3 days</span>
<span>
{aistats?.first_alert_datetime && new Date(aistats.first_alert_datetime) < new Date(Date.now() - 3 * 24 * 60 * 60 * 1000) ? (
<div></div>
) : (
<div></div>
)}
</span>
</ListItem>
</List>
</div>
{(aistats?.is_mining_enabled && <button
className={
(animate ? "animate-pulse" : "") +
" w-full text-white mt-10 pt-2 pb-2 pr-2 rounded-xl transition-all duration-500 bg-gradient-to-tl from-amber-800 via-amber-600 to-amber-400 bg-size-200 bg-pos-0 hover:bg-pos-100"
}
onClick={ mineIncidents }
><div className="flex flex-row p-2">
<div className="p-2">
{animate && <svg
className="animate-spin h-6 w-6 text-white"
xmlns="http://www.w3.org/2000/svg"
fill="none"
viewBox="0 0 24 24"
>
<circle
className="opacity-25"
cx="12"
cy="12"
r="10"
stroke="currentColor"
strokeWidth="4"
></circle>
<path
className="opacity-75"
fill="currentColor"
d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"
></path>
</svg>}
{!animate && <svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" strokeWidth={1.5} stroke="currentColor" className="w-6 h-6">
<path strokeLinecap="round" strokeLinejoin="round" d="M4.26 10.147a60.438 60.438 0 0 0-.491 6.347A48.62 48.62 0 0 1 12 20.904a48.62 48.62 0 0 1 8.232-4.41 60.46 60.46 0 0 0-.491-6.347m-15.482 0a50.636 50.636 0 0 0-2.658-.813A59.906 59.906 0 0 1 12 3.493a59.903 59.903 0 0 1 10.399 5.84c-.896.248-1.783.52-2.658.814m-15.482 0A50.717 50.717 0 0 1 12 13.489a50.702 50.702 0 0 1 7.74-3.342M6.75 15a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5Zm0 0v-3.675A55.378 55.378 0 0 1 12 8.443m-7.007 11.55A5.981 5.981 0 0 0 6.75 15.75v-1.5" />
</svg>}
</div>
<div className="pt-2">{text}</div>
</div>
</button>)}
</div>
</Card>
</main>
);
}
6 changes: 6 additions & 0 deletions keep-ui/app/ai/model.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
export interface AIStats {
alerts_count: number;
incidents_count: number;
first_alert_datetime?: Date;
is_mining_enabled: boolean;
}
11 changes: 11 additions & 0 deletions keep-ui/app/ai/page.tsx
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
import AI from "./ai";

export default function Page() {
return <AI />;
}

export const metadata = {
title: "Keep - AI Correlation",
description:
"Correlate Alerts and Incidents with AI to identify patterns and trends.",
};