v1.18.0
Improved logging in Queue Processor mode
`v1.18.0` introduces the `logFormatVersion` Helm chart option, which lets you opt in to more detailed logs.
The default value is `1`, which keeps logging the same way it did in prior releases (<= `v1.17.3`).
Setting the value to `2` will give you more detail about which AWS event triggered the cordon/drain. Previously, all these events were bucketed under `SQS_TERMINATE` and it was difficult to tell what was happening.
This option is also available as a command line flag, `--log-format-version`.
What does the new logging look like?
`logFormatVersion=2` modifies several Debug, Info, and Warn logs, as well as Kubernetes events emitted by NTH. These changes improve observability into what NTH is doing when responding to events via SQS. If your monitoring system looks for any of the specific strings in the tables below and you adopt the new log format version, you may need to update its configuration to use the new strings.
Changes to logs when starting up
- Remove `event_type` field from the Info log when starting a monitor; replace with `monitor_type` field, with new values. See Table 1.
- Remove `event_type` field from the Warn log when a monitor fails to start; replace with `monitor_type` field, with new values. See Table 1.
Changes to logs when processing an event
- New `monitor` field in the Info log. See Table 1.
- Potentially change value of `kind` field in the Info log, if running Queue Processor mode. See Table 2.
- Potentially change the "reason" field in the k8s event if running Queue Processor mode. See Table 3.
Changes to logs when receiving an SQS message
- Include the specific event type instead of `SQS_TERMINATE` in the Debug log if running Queue Processor mode. See Table 2.
Tables of changed values
Table 1: Monitor types
Previous | New |
---|---|
`REBALANCE_RECOMMENDATION` | `REBALANCE_RECOMMENDATION_MONITOR` |
`SCHEDULED_EVENT` | `SCHEDULED_EVENT_MONITOR` |
`SPOT_ITN` | `SPOT_ITN_MONITOR` |
`SQS_TERMINATE` | `SQS_MONITOR` |
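
If your monitoring configuration matches on the old `event_type` values, the renames in Table 1 can be applied mechanically. The following is a minimal Python sketch of that lookup. The names `MONITOR_TYPE_RENAMES` and `to_v2_monitor_type` are hypothetical helpers for illustration and are not part of NTH.

```python
# Table 1 as a lookup: previous event_type value -> new monitor_type value.
# Hypothetical helper for migrating monitoring filters; not part of NTH.
MONITOR_TYPE_RENAMES = {
    "REBALANCE_RECOMMENDATION": "REBALANCE_RECOMMENDATION_MONITOR",
    "SCHEDULED_EVENT": "SCHEDULED_EVENT_MONITOR",
    "SPOT_ITN": "SPOT_ITN_MONITOR",
    "SQS_TERMINATE": "SQS_MONITOR",
}


def to_v2_monitor_type(old_value: str) -> str:
    """Return the logFormatVersion=2 monitor_type for an old event_type value.

    Values not listed in Table 1 are returned unchanged, so unrelated
    filter strings pass through untouched.
    """
    return MONITOR_TYPE_RENAMES.get(old_value, old_value)
```

For example, `to_v2_monitor_type("SQS_TERMINATE")` returns `"SQS_MONITOR"`. Remember that the field name itself also changes from `event_type` to `monitor_type` in the startup logs.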
Table 2: Event types
Previous | New |
---|---|
`REBALANCE_RECOMMENDATION` | `REBALANCE_RECOMMENDATION` |
`SCHEDULED_EVENT` | `SCHEDULED_EVENT` |
`SPOT_ITN` | `SPOT_ITN` |
`SQS_TERMINATE` | `REBALANCE_RECOMMENDATION`, `SCHEDULED_EVENT`, `SPOT_ITN`, `STATE_CHANGE`, or `ASG_LIFECYCLE` |
Table 3: Event reasons
Previous reason | New reason |
---|---|
`RebalanceRecommendation` | `RebalanceRecommendation` |
`ScheduledEvent` | `ScheduledEvent` |
`SpotInterruption` | `SpotInterruption` |
`SQSTermination` | `RebalanceRecommendation`, `ScheduledEvent`, `SpotInterruption`, `StateChange`, or `ASGLifecycle` |
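
Tables 2 and 3 each replace one catch-all value with five specific ones, so a filter that matched only `SQS_TERMINATE` or `SQSTermination` matches nothing under `logFormatVersion=2` unless it is widened. The sketch below shows one way to widen such a filter, assuming it can be represented as a set of exact strings. The names `QUEUE_KINDS_V2`, `QUEUE_REASONS_V2`, and `expand_filter` are hypothetical and not part of NTH.

```python
# New values from Table 2 (event kinds) and Table 3 (k8s event reasons) that
# replace the single Queue Processor bucket. Hypothetical helper; not part of NTH.
QUEUE_KINDS_V2 = frozenset({
    "REBALANCE_RECOMMENDATION",
    "SCHEDULED_EVENT",
    "SPOT_ITN",
    "STATE_CHANGE",
    "ASG_LIFECYCLE",
})
QUEUE_REASONS_V2 = frozenset({
    "RebalanceRecommendation",
    "ScheduledEvent",
    "SpotInterruption",
    "StateChange",
    "ASGLifecycle",
})


def expand_filter(values):
    """Widen a set of matched strings so it still matches under format 2.

    The catch-all values are replaced by the five specific values;
    every other string is kept as-is.
    """
    expanded = set()
    for value in values:
        if value == "SQS_TERMINATE":
            expanded |= QUEUE_KINDS_V2
        elif value == "SQSTermination":
            expanded |= QUEUE_REASONS_V2
        else:
            expanded.add(value)
    return expanded
```

Widening is only a starting point. Once the specific values are available, you may prefer separate rules per event type, which is the extra detail the new format is meant to provide.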
Commits with these changes
- feat: emit pod events on drain by @trutx in #703
- chore: add annotations to events in SQS mode by @trutx in #715
- fix: show actual event kinds in Queue mode by @trutx and @cjerad in #725
Other changes
- README: Clarify distinctions between IMDS and QP modes by @snay2 in #695
- Clarify wording about using ASG tags. Fix broken docs link. by @snay2 in #721
- Remove bespoke Prometheus helm chart and use the latest public release instead by @snay2 in #723
- upgrade to Go 1.19 by @cjerad and @snay2 in #726
Full Changelog: v1.17.3...v1.18.0