
Demystifying Kubernetes CPU Limits (and Throttling)

Recently, I've been investigating high CPU utilization during routine security scans of our WordPress websites, which causes slow responses, increased errors, and other undesirable outcomes. This is typically limited to a single pod (the one the scanner randomly gets routed to) but can still be user-visible (and PagerDuty-activating 😅), so we want better monitoring for it.

Like anyone else in IT investigating something they're not sure of, I turned first to Google to see what other people are doing to monitor the CPU usage of pods in Kubernetes. This is what first led me to discover that it's actually far more useful to monitor how much the CPU is being throttled than how much it's being used.

I already knew of the kubernetes-mixin project, which provides sane default Prometheus alerting rules for monitoring Kubernetes cluster health, so I looked there first to see what rules they use to monitor CPU. Currently, the only CPU usage alert bundled in is "CPUThrottlingHigh", which calculates number_of_cpu_cycles_pod_gets_throttled / number_of_cpu_cycles_total (not actual metric names) to give you a percentage of how frequently your pod is getting its CPU throttled.

But wait, what does throttled even mean? Throttled (at least in my mind) means something along the lines of just getting slowed down, but in this case throttled means completely stopped: you cannot use any more CPU until the next CFS period (every 100ms in Kubernetes, which is also the Linux default; more on this later).
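To make the throttling percentage described above a bit more concrete, here is a minimal Python sketch of the same ratio, computed from the counters the Linux kernel exposes in a cgroup's cpu.stat file. It assumes the usual nr_periods and nr_throttled fields; the sample values and the helper names (parse_cpu_stat, throttled_ratio) are made up for illustration and are not what kubernetes-mixin itself runs.

```python
from typing import Dict

# Hypothetical sample of a cgroup cpu.stat file; the numbers are invented.
SAMPLE_CPU_STAT = """nr_periods 1200
nr_throttled 300
throttled_time 4500000000
"""


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse 'key value' lines from cpu.stat into a dict of integers."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: Dict[str, int]) -> float:
    """Fraction of CFS periods in which the cgroup hit its quota and was stopped."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        # No enforcement periods recorded, so nothing could have been throttled.
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    ratio = throttled_ratio(parse_cpu_stat(SAMPLE_CPU_STAT))
    print(f"throttled in {ratio:.0%} of CFS periods")
```

Note that this ratio counts periods, not time: a container that runs out of quota 1ms before the end of a period counts the same as one that was stopped for most of it.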
