Skip to content
This repository was archived by the owner on Feb 1, 2026. It is now read-only.
This repository was archived by the owner on Feb 1, 2026. It is now read-only.

Data loss from Camus when there is an exception during message decoding (deserialization) #43

@kameshb

Description

I dig into the Camus code and found that, Camus catches the exception (decoder exception) thrown from decoder.decode(payload) (getWrappedRecord() method) and if skipSchemaErrors is false, it re-throws it as an IOException. However, nextKeyValue() method catches this Exception again and handled it by writing that exception and key to mapper context and continuing back to normal execution.

if (exceptionCount < getMaximumDecoderExceptionsToPrint(context)) {
              mapperContext.write(key, new ExceptionWritable(e));
              exceptionCount++;
            } else if (exceptionCount == getMaximumDecoderExceptionsToPrint(context)) {
              exceptionCount = Integer.MAX_VALUE; //Any random value
              log.info("The same exception has occured for more than " + getMaximumDecoderExceptionsToPrint(context)
                  + " records. All further exceptions will not be printed");
            }
}

This is where data loss is happening. In the nextKeyValue() method, Kafka consumer reads a message from Kafka and increments the current offset and while decoding this message, if there is any exception, current code-base silently eating up that exception and moving to next message from Kafka.

Camus job is finally in succeeded state and updating its offsets to offset file in HDFS.

I think, when skipSchemaErrors is false, Camus job should fail if there are any decoder exceptions, otherwise it violates the definition of the property etl.ignore.schema.errors and also causes data loss.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions