This repository was archived by the owner on Jan 18, 2026. It is now read-only.
Closed
… related reader modifications.
This reverts commit 2bffcc1.
nrwiersma suggested changes on Dec 30, 2025
Comment on lines -382 to +432
writer := avro.NewWriter(w, 512, avro.WithWriterConfig(cfg.EncodingConfig))
writer := avro.NewWriter(w, 512, avro.WithWriterConfig(avro.DefaultConfig))
Comment on lines -423 to +473
writer := avro.NewWriter(w, 512, avro.WithWriterConfig(cfg.EncodingConfig))
writer := avro.NewWriter(w, 512, avro.WithWriterConfig(avro.DefaultConfig))
Comment on lines +62 to +75
func newDecoderConfig(opts ...DecoderFunc) *decoderConfig {
	cfg := decoderConfig{
		DecoderConfig: avro.DefaultConfig,
		SchemaCache:   avro.DefaultSchemaCache,
		CodecOptions: codecOptions{
			DeflateCompressionLevel: flate.DefaultCompression,
		},
	}
	for _, opt := range opts {
		opt(&cfg)
	}
	return &cfg
}
Member
Not sure that this change helped anything.
Contributor
Author
Avoiding duplicate code.
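For readers unfamiliar with the pattern under discussion, the following is a minimal sketch of how a constructor like newDecoderConfig centralizes defaults so that each call site only states its overrides. The types here are local stand-ins with the avro-specific fields omitted, and WithDeflateLevel is a hypothetical option invented for this illustration; neither is taken from the PR.

```go
package main

import (
	"compress/flate"
	"fmt"
)

// Local stand-ins for the types in the diff; avro-specific fields are omitted.
type codecOptions struct {
	DeflateCompressionLevel int
}

type decoderConfig struct {
	CodecOptions codecOptions
}

// DecoderFunc mutates a config in place; the signature is inferred from opt(&cfg).
type DecoderFunc func(*decoderConfig)

// WithDeflateLevel is a hypothetical option used only for this illustration.
func WithDeflateLevel(level int) DecoderFunc {
	return func(cfg *decoderConfig) {
		cfg.CodecOptions.DeflateCompressionLevel = level
	}
}

// newDecoderConfig applies the defaults first, then each option in order.
func newDecoderConfig(opts ...DecoderFunc) *decoderConfig {
	cfg := decoderConfig{
		CodecOptions: codecOptions{DeflateCompressionLevel: flate.DefaultCompression},
	}
	for _, opt := range opts {
		opt(&cfg)
	}
	return &cfg
}

func main() {
	fmt.Println(newDecoderConfig().CodecOptions.DeflateCompressionLevel)
	fmt.Println(newDecoderConfig(WithDeflateLevel(flate.BestSpeed)).CodecOptions.DeflateCompressionLevel)
}
```

Whether this indirection is worth it is exactly what the two comments disagree on: it removes repeated default blocks, at the cost of one more function to read.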
need := min(r.tail-r.head, tokenLen-1)

// Construct boundary window: stash + beginning of new buffer
boundary := make([]byte, len(stash)+need)
Member
This is a known size, allocate once and reuse instead of constantly re-allocating.
copy(boundary, stash)
copy(boundary[len(stash):], r.buf[r.head:r.head+need])

if idx := bytes.Index(boundary, token); idx >= 0 {
Member
In this case, surely the reader has advanced too far, as the start of the token is no longer in the buffer.
Contributor
Author
This is the case where the token extends beyond the buffer, so a bigger buffer is needed.
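The "allocate once and reuse" suggestion from this thread could look roughly like the sketch below. It assumes the stash never needs more than tokenLen-1 bytes, so the boundary window is at most 2*(tokenLen-1) bytes and a single scratch slice of that capacity can be refilled on every call. boundaryFinder and its methods are hypothetical names for illustration, not code from the PR, and the returned index is relative to the window rather than to the stream.

```go
package main

import (
	"bytes"
	"fmt"
)

// boundaryFinder is a hypothetical helper that looks for a token straddling
// the end of one buffer fill (stash) and the start of the next.
type boundaryFinder struct {
	token   []byte
	scratch []byte // allocated once, refilled on every call
}

// newBoundaryFinder assumes len(token) >= 1.
func newBoundaryFinder(token []byte) *boundaryFinder {
	return &boundaryFinder{
		token:   token,
		scratch: make([]byte, 0, 2*(len(token)-1)),
	}
}

// find returns the index of the token inside the boundary window, or -1.
// The window is the last tokenLen-1 bytes of stash followed by the first
// tokenLen-1 bytes of next, so it never exceeds the scratch capacity.
func (f *boundaryFinder) find(stash, next []byte) int {
	keep := len(f.token) - 1
	if len(stash) > keep {
		stash = stash[len(stash)-keep:]
	}
	need := min(len(next), keep)
	f.scratch = append(f.scratch[:0], stash...)
	f.scratch = append(f.scratch, next[:need]...)
	return bytes.Index(f.scratch, f.token)
}

func main() {
	f := newBoundaryFinder([]byte("SYNC"))
	fmt.Println(f.find([]byte("abSY"), []byte("NCxyz")))
	fmt.Println(f.find([]byte("abc"), []byte("defg")))
}
```

Because the window is capped at 2*(tokenLen-1) bytes, the appends never grow the slice. A caller would still have to translate a window index back into a stream position, which is the concern raised above about the start of the token no longer being in the buffer.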
data: []byte{0x38, 0x36},
},
{
Goal of this PR
Expose Reader Offset and OCF Block Status for Concurrent Decoder
Description
This PR enhances the Reader and OCF Decoder by exposing internal state information that is critical for advanced processing scenarios, such as progress tracking, concurrent splitting, and debugging.
Key Changes
- Reader.InputOffset(): Adds a method to the Reader to retrieve the current input offset. This allows consumers to know exactly where in the underlying stream the reader is currently positioned.
- OCF Decoder.BlockStatus(): Introduces a BlockStatus() method (and corresponding struct) to the OCF Decoder. This provides a snapshot of the current block being processed, including:
  - Current: The index of the current record within the block.
  - Count: The total number of records in the current block.
  - Size: The size (in bytes) of the current block.
  - Offset: The input offset provided by the underlying reader.
Motivation
Currently, the avro package abstracts away the underlying stream position and block details. While this is fine for simple reading, it limits users who need to:
Use Case Example
A data processing pipeline can now use BlockStatus() to log precise progress or checkpoint processing at specific block offsets, improving reliability and observability.
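As a rough illustration of the use case, the sketch below formats a progress line from a block snapshot. The BlockStatus struct here is a local stand-in that copies the four field names from the description; the int64 field types, the progressLine and remaining helpers, and the reading of Current as "records already consumed" are all assumptions, since the PR was closed and its exact API is not shown here.

```go
package main

import "fmt"

// BlockStatus is a local stand-in mirroring the fields named in the
// description. The int64 types are an assumption.
type BlockStatus struct {
	Current int64 // index of the current record within the block
	Count   int64 // total number of records in the block
	Size    int64 // size of the block in bytes
	Offset  int64 // input offset reported by the underlying reader
}

// progressLine is a hypothetical helper that renders a snapshot for logging.
func progressLine(s BlockStatus) string {
	return fmt.Sprintf("offset=%d block_bytes=%d record=%d/%d",
		s.Offset, s.Size, s.Current, s.Count)
}

// remaining assumes Current counts the records already consumed.
func remaining(s BlockStatus) int64 {
	return s.Count - s.Current
}

func main() {
	s := BlockStatus{Current: 3, Count: 10, Size: 512, Offset: 2048}
	fmt.Println(progressLine(s))
	fmt.Println("records left in block:", remaining(s))
}
```

A pipeline could persist Offset together with Current as a checkpoint, though resuming from it would also depend on how the decoder is re-initialized, which the description does not cover.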