Apache Camel Kafka Consumer: using the producer-only `metadataMaxAgeMs` option to force indefinite retries on failures — is this safe/intended?
15:56 18 Jun 2026

We have a requirement for our Camel Kafka consumer to keep retrying indefinitely when it hits a failure while connecting to the broker, instead of giving up. The only way we were able to get a true "retry forever" effect was by also setting metadataMaxAgeMs on the consumer endpoint — even though the Camel Kafka component docs listed metadataMaxAgeMs strictly under (producer) options, not consumer.

Currently using camel-kafka 4.x and spring-boot 3.x.

Config

We build the consumer URI dynamically and pass these (currently hard-coded, would move to config if this approach is validated):

public String getURI() {
    int reconnectBackoffMaxMs = SpringContextLookupUtil.getSystemProperty(
            "armada.kafka.consumer.auth.retry.max.backoff.ms", Integer.class);
    int reconnectBackoffMs = SpringContextLookupUtil.getSystemProperty(
            "armada.kafka.consumer.auth.retry.initial.backoff.ms", Integer.class);
    int metadataMaxAgeMs = SpringContextLookupUtil.getSystemProperty(
            "armada.kafka.consumer.auth.retry.metadata.max.age.ms", Integer.class);

    String uri = "%s?groupId=%s&autoOffsetReset=%s&autoCommitIntervalMs=%d&consumersCount=%d"
            + "&sessionTimeoutMs=%d&heartbeatIntervalMs=%d&consumerRequestTimeoutMs=%d&maxPollRecords=%d"
            + "&maxPollIntervalMs=%d&reconnectBackoffMs=%d&reconnectBackoffMaxMs=%d&metadataMaxAgeMs=%d";

    String endpointURI = String.format(uri, endpoint,
            ComponentConstants.KAFKA_CONSUMER_GROUP_ID_PREFIX + groupId,
            autoOffsetReset, autoCommitIntervalMs, consumersCount, sessionTimeout,
            heartbeatInterval, requestTimeout, maxPollRecords, maxPollInterval,
            reconnectBackoffMs, reconnectBackoffMaxMs, metadataMaxAgeMs);

    log.debug("Endpoint Uri : {}", endpointURI);
    return endpointURI;
}

We also plugged in a custom PollExceptionStrategy to drive exponential backoff manually on auth failures:

public class KafkaAuthorizationReconnectStrategy implements PollExceptionStrategy {

    @Value("${armada.kafka.consumer.auth.retry.initial.backoff.ms:1000}")
    private long reconnectBackoffMs;

    @Value("${armada.kafka.consumer.auth.retry.max.backoff.ms:30000}")
    private long reconnectBackoffMaxMs;

    @Override
    public void handle(long partitionLastOffset, Exception exception) {
        // compute exponential backoff, cap it at reconnectBackoffMaxMs,
        // log, then Thread.sleep(backoffMs) before letting Camel retry
        ...
    }

    private long computeExponentialBackoff(int attempt) {
        double raw = reconnectBackoffMs * Math.pow(2, attempt - 1);
        return Math.min((long) raw, reconnectBackoffMaxMs);
    }

    @Override
    public boolean canContinue() {
        return true; // always keep retrying
    }
}

Example flow

  1. Connection fails → wait 1s (initial backoff)
  2. Still failing → wait 2s
  3. Still failing → wait 4s
  4. …continues doubling…
  5. Capped at 30s max between retries, repeats indefinitely

This works in our testing, but only once metadataMaxAgeMs=1000 is also set on the consumer URI. Without it, the consumer would eventually stop retrying after the backoff ceiling was hit a few times.

Summary

We're only intending to control reconnectBackoffMs / reconnectBackoffMaxMs behavior for the consumer, but per the component reference table, metadataMaxAgeMs is documented only as a (producer) option — there's no (consumer) entry for it at all. Adding it anyway to the consumer endpoint URI changed our retry behavior in a way that "fixed" our problem, but we don't fully understand why, and we'd rather not ship a workaround that depends on undocumented/unsupported behavior.

Questions

  1. Is metadataMaxAgeMs actually usable/honored on a Camel Kafka consumer endpoint, or is it being silently ignored/dropped by Camel's KafkaConfiguration for consumers (in which case our improvement might be coincidental and happening for a different reason)?
  2. Is there a supported Camel/Kafka consumer property for "retry forever with capped exponential backoff" on connection/authorization failures, rather than rolling our own PollExceptionStrategy? Right now after the backoff ceiling is hit, the consumer effectively keeps retrying every reconnectBackoffMaxMs until the underlying issue is fixed — is that the intended/recommended pattern for TopicAuthorizationException-type failures, or is there a cleaner built-in option (e.g. something equivalent to pollOnError=RECONNECT combined with native reconnect.backoff.ms / reconnect.backoff.max.ms) that doesn't need a custom strategy class at all?
  3. Why does setting metadataMaxAgeMs on the consumer URI appear to change behavior at all, given its documented purpose ("the period of time in milliseconds after which we force a refresh of metadata even if we haven't seen any partition leadership changes, to proactively discover any new brokers or partitions") doesn't obviously relate to authorization-failure reconnect/backoff? Is this an unintended side effect of how Camel parses/forwards endpoint options, or is there a legitimate mechanism by which a metadata refresh interacts with our retry loop?

Any clarification on whether metadataMaxAgeMs is genuinely consumer-applicable in Camel, and whether there's a more "supported" way to achieve indefinite capped-backoff retries on consumer authorization failures, would be appreciated.

apache-camel