Migrate Arch-BOM alerts/monitors #662

Closed
27 tasks done
dianakhuang opened this issue May 28, 2024 · 2 comments

@dianakhuang
Member

dianakhuang commented May 28, 2024

Acceptance Criteria:

  • go through the spreadsheet
  • migrate the alerts and alert policies to Datadog
  • update runbooks to point to Datadog

Policies:

  • prod-edx-edxapp-lms-arch-bom - Mostly done as part of "[Alerts] Create Datadog alert for edx-platform edge to learn about alerting" (#628), but will be converted to Terraform; runbook needs an update. @timmc-edx
    • create monitors (lms)
    • create monitors (lms-workers)
    • update runbook
    • review monitors and runbook
  • prod-edge-edxapp-lms-arch-bom - ditto @timmc-edx
    • create monitors (lms)
    • create monitors (lms-workers)
    • update runbook
    • review monitors and runbook
  • prod-edxapp-lms-arch-bom-backend-safety-net @dianakhuang
    • create monitors (in-progress)
    • update runbook
    • review monitors and runbook
  • platform-arch-bom-event-bus-safety-net @robrap
  • platform-arch-bom-ownership
    • create monitors
    • update runbook
    • review monitors and runbook
  • platform-arch-bom-code-owner-issues - @timmc-edx
    • create monitors
    • update runbook
    • review monitors and runbook

After runbooks have been migrated:

dianakhuang converted this from a draft issue May 28, 2024
dianakhuang moved this to Prioritized in Arch-BOM May 28, 2024
dianakhuang moved this from Prioritized to Groomed in Arch-BOM May 28, 2024
@robrap
Contributor

robrap commented Jun 4, 2024

For event bus monitoring, migrate the following:

  • Consumer error logs across multiple topics [P3]
  • Producer error logs across multiple topics [P3]

Move the following monitors to the Kafka ticket:

  • No event consumed in past day on any topic [P3]
  • No event produced in past day on any topic [P3]
  • TransactionErrors in consuming across topics [P3]

dianakhuang self-assigned this Jun 4, 2024
dianakhuang moved this from Groomed to In Progress in Arch-BOM Jun 4, 2024
@timmc-edx
Member

Still to do:

  • Migrate code-owner-issues alerts
  • Perform a review on monitors and runbooks

timmc-edx moved this from In Progress to Groomed in Arch-BOM Jun 26, 2024
timmc-edx moved this from Groomed to In Progress in Arch-BOM Jun 26, 2024
dianakhuang moved this from In Progress to In Review in Arch-BOM Jul 3, 2024
github-project-automation bot moved this from In Review to Done in Arch-BOM Jul 8, 2024
jristau1984 moved this from Done to Done - Long Term Storage in Arch-BOM Sep 30, 2024