Micrometer Counter backed by DB aggregate causes Prometheus rate() to exceed 100% and spikes on app restart

03:49 14 Mar 2026

We have a Jakarta app that exposes Prometheus metrics for OIDC session billing. The success rate metric occasionally spikes above 100% (observed 133%, 300%) in production.

The metric:

(
  sum by(stage)(rate(oidc_session_success_total{stage="PROD"}[15m]))
) / sum by(stage)(rate(oidc_session_total{stage="PROD"}[15m])) * 100

Root cause

The counters are not event-driven. Instead, on every Prometheus scrape, the app queries the DB for aggregate totals and tries to sync the in-memory counter by computing a delta:

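// Registered once at startup; the in-memory count starts at 0 after every restart.
final var oidcSessionSuccessTotal = Counter.builder("oidc_session_success_total")
    .tags(tags)
    .register(registry);

// Called on every scrape: add the difference between the DB aggregate and the current count.
oidcSessionSuccessTotal.increment(getTotalSuccess() - oidcSessionSuccessTotal.count());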
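To illustrate one way the delta sync above can push the ratio past 100%, here is a minimal, dependency-free sketch. It assumes (this is not confirmed for our app) that the two DB aggregates are read by separate queries with no shared snapshot, so a session written between the two reads shows up in the success counter but not yet in the total counter. The class name `DeltaSyncSkewDemo`, the plain fields standing in for the DB and the Micrometer counters, and all numbers are hypothetical. The percentage is computed as a simple ratio of increases between two scrapes, which only approximates what rate() does over a 15m window.

```java
public class DeltaSyncSkewDemo {
    // Hypothetical stand-ins for the two DB aggregates.
    static long dbTotal;
    static long dbSuccess;

    // Hypothetical stand-ins for the in-memory counters.
    static double totalCounter;
    static double successCounter;

    // Same delta sync as in the question, one aggregate at a time.
    static void syncTotal()   { totalCounter   += dbTotal   - totalCounter; }
    static void syncSuccess() { successCounter += dbSuccess - successCounter; }

    // Success percentage between two scrapes, as a ratio of counter increases.
    static double successPercent(double s0, double t0, double s1, double t1) {
        return (s1 - s0) / (t1 - t0) * 100.0;
    }

    // Runs a two-scrape scenario and returns the resulting percentage.
    static double simulate() {
        totalCounter = 0;
        successCounter = 0;

        // Scrape 1: both aggregates are read while the DB is quiet.
        dbTotal = 100;
        dbSuccess = 90;
        syncTotal();
        syncSuccess();
        double t0 = totalCounter;
        double s0 = successCounter;

        // Three sessions arrive and succeed before scrape 2 begins.
        dbTotal = 103;
        dbSuccess = 93;
        syncTotal();      // total is read as 103

        // A fourth successful session is written between the two queries.
        dbTotal = 104;
        dbSuccess = 94;
        syncSuccess();    // success is read as 94

        // Increase in success is 4, increase in total is 3.
        return successPercent(s0, t0, successCounter, totalCounter);
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this scenario the success counter grows by 4 while the total counter grows by 3, giving roughly 133%, the same order as the spikes we observed. The total counter catches up on the next scrape, so the skew shows up as a short-lived spike instead of a permanent offset.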

Switching to Gauges could fix the >100% bug, but it removes the ability to use rate() and increase() on session counts, so we lose visibility into sessions-per-second throughput.

Question: What is the correct pattern for metrics that are derived from DB aggregates — where the DB is the source of truth — but where you also need rate/throughput visibility in Prometheus?

counter metrics micrometer gauge
