PCA (Prometheus Certified Associate) sample exam question with answer 277

Question:
You are instrumenting an HTTP API with Prometheus metrics. You want to be able to get the percentage of requests that have resulted in a 401 or 403 status code over the last 5 minutes.
How should you accomplish this?

  A. Define a counter metric http_requests_total labeled by status_code that tracks the number of requests received for all status codes. Obtain the 5-minute percentage in PromQL as sum(increase(http_requests_total{status_code=~"40[31]"}[5m])) / sum(increase(http_requests_total[5m]))
  B. Keep track of the 5-minute percentage of 401 and 403 requests within the application. Expose a gauge metric called http_401_403_request_percent. Query it directly in PromQL
  C. Define a gauge metric http_requests labeled by status_code that tracks the number of requests received for all status codes. Obtain the 5-minute percentage in PromQL as sum(increase(http_requests{status_code=~"401|403"}[5m])) / sum(increase(http_requests[5m]))
  D. Keep track of the 5-minute percentage of 401 and 403 requests separately within the application. Expose two gauge metrics, one called http_401_request_percent and the other called http_403_request_percent. Calculate the average in PromQL as (http_401_request_percent + http_403_request_percent) / 2

Answer:
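A - is the correct answer. The counter exposes the raw request counts per status code, and PromQL does the calculation. The query returns a ratio between 0 and 1, so multiply it by 100 to display it as a percentage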
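
The arithmetic behind the query in A can be sketched in plain Python. This is only an illustration, not how Prometheus is implemented: the function name error_ratio and the sample counts are made up, and the sketch ignores the counter-reset handling and extrapolation that increase() performs. It takes two snapshots of a counter labeled by status_code, computes the per-label increase, and divides the matching increase by the total. PromQL regex matchers are fully anchored, so re.fullmatch is used to mirror status_code=~"40[31]".

```python
import re


def error_ratio(before, after, pattern="40[31]"):
    """Share of requests whose status code matches `pattern` between two snapshots.

    `before` and `after` map status_code -> cumulative request count, as a
    counter labeled by status_code would report at the start and end of a window.
    """
    # Per-label growth over the window (simplified: no counter-reset handling).
    increase = {code: after[code] - before.get(code, 0) for code in after}
    total = sum(increase.values())
    if total == 0:
        return None  # no requests in the window, so the ratio is undefined
    matched = sum(n for code, n in increase.items() if re.fullmatch(pattern, code))
    return matched / total


# Hypothetical counter values five minutes apart.
start = {"200": 1000, "401": 10, "403": 5, "500": 2}
end = {"200": 1900, "401": 40, "403": 25, "500": 12}

ratio = error_ratio(start, end)
print(f"{ratio * 100:.1f}% of requests returned 401 or 403")
```

With these made-up numbers, 50 of the 960 requests in the window returned 401 or 403, which is about 5.2%.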
B - is incorrect as it is a Prometheus naming best practice to use labels to specify different characteristics of the thing that is being measured (such as status code). Additionally, it is a best practice to expose metrics in their most "raw" form possible (in this case, a count of the number of requests received with each status code) and use PromQL to perform any required calculations (in this case, the percentage)
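C - is incorrect as a cumulative count of requests only ever goes up and should therefore be a counter, not a gauge; increase() is intended for counters. It is also a Prometheus metric naming best practice to give accumulating count metrics the suffix _total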
D - is incorrect as it is a Prometheus naming best practice to use labels to specify different characteristics of the thing that is being measured (such as status code). Additionally, this method would almost certainly lead to incorrect results: a simple average of the two percentages does not equal the percentage of requests that returned either code, particularly as 401 and 403 responses are unlikely to be evenly distributed
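
A small worked example in Python shows how the average in D goes wrong. The request counts are made up, and the sketch assumes each gauge reports its status code's share of all requests; the question does not say how the application would compute them.

```python
# Hypothetical request counts for one 5-minute window.
total_requests = 1000
count_401 = 20
count_403 = 60

# What the two gauges in D would expose, assuming each is a share of all requests.
pct_401 = 100 * count_401 / total_requests
pct_403 = 100 * count_403 / total_requests

# D's PromQL expression: the mean of the two percentages.
averaged = (pct_401 + pct_403) / 2

# The figure actually wanted: the share of requests that returned either code.
combined = 100 * (count_401 + count_403) / total_requests

print(f"average of percentages: {averaged}%")
print(f"401-or-403 percentage:  {combined}%")
```

Under these assumptions the average reports 4% while 8% of requests actually returned 401 or 403. Computing the ratio from raw counts, as in A, avoids the problem.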