
Conversation

@gizas gizas commented Dec 23, 2025

  • Bug

Proposed commit message

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

How to test this PR locally

  • Follow the steps of the doc to deploy the Elastic Stack and Metricbeat locally.

    Specifically, add the following configuration for the Prometheus remote_write module:

    Prometheus Module Config
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metricbeat-daemonset-modules
      namespace: kube-system
      labels:
        k8s-app: metricbeat
    data:
      prometheus.yml: |-
        - module: prometheus
          metricsets:
            - remote_write
          host: "0.0.0.0"
          port: 9201
          period: 10s
  • Install Prometheus in your kind cluster:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-server prometheus-community/prometheus --namespace kube-system
  • Install the following Service so that your Prometheus server can send remote-write metrics to Metricbeat:

    Metricbeat Service
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: metricbeat-remote-write
      namespace: kube-system
      labels:
        k8s-app: metricbeat
    spec:
      ports:
        - port: 9201
          protocol: TCP
          targetPort: 9201
      selector:
        k8s-app: metricbeat
      sessionAffinity: None
      type: ClusterIP
  • Configure the Prometheus server with remote write:

    Prometheus Server Values
    helm upgrade --install prometheus-server prometheus-community/prometheus -f values.yml
    
    $ cat values.yml
    server:
      remoteWrite:
        - url: http://metricbeat-remote-write.kube-system:9201/write
  • Save the following to trigger-pod.yaml. Apply it to create a pod that can send crafted Snappy requests and exercise the new size limits (a Go sketch of the header trick follows this list):

    kubectl apply -f trigger-pod.yaml
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: trigger-script
      namespace: kube-system
    data:
      trigger_metricbeat_remote_write.py: |
        #!/usr/bin/env python3
        """
        Send crafted Snappy payloads to test Metricbeat remote_write size limits.
        
        This script can send a tiny payload that claims a huge decoded size,
        useful for testing the DecodedLen security check.
        """
    
        from __future__ import annotations
    
        import argparse
        import http.client
        import socket
        import sys
    
        DEFAULT_DECODED_SIZE = 500 * 1024 * 1024  # 500 MB
    
    
        def encode_uvarint(value: int) -> bytes:
            """Encode an integer as a varint (used in snappy header)."""
            out = bytearray()
            while value >= 0x80:
                out.append((value & 0x7F) | 0x80)
                value >>= 7
            out.append(value)
            return bytes(out)
    
    
        def build_minimal_payload(claimed_decoded_size: int) -> bytes:
            """
            Build a minimal snappy payload that claims a large decoded size.
            
            The payload is just the varint header + a few bytes of padding.
            This is useful for testing the DecodedLen check without sending
            megabytes of data.
            """
            header = encode_uvarint(claimed_decoded_size)
            # Add minimal padding (this will cause snappy.Decode to fail, 
            # but we're testing the DecodedLen check which happens first)
            return header + b'\x00\x00\x00\x00'
    
    
        def build_valid_payload(decoded_size: int) -> bytes:
            """
            Build a valid snappy payload of the specified decoded size.
            
            This creates actual valid snappy-compressed data.
            """
            TAG_LITERAL = 0x00
            TAG_COPY2 = 0x02
            
            frame = bytearray()
            frame.extend(encode_uvarint(decoded_size))
    
            if decoded_size < 4:
                # Small literal
                n = decoded_size - 1
                frame.append((n << 2) | TAG_LITERAL)
                frame.extend(b"\x00" * decoded_size)
                return bytes(frame)
    
            # Emit a small literal followed by copy operations
            literal_len = ((decoded_size - 4) % 64) + 4
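            # This keeps decoded_size - literal_len an exact multiple of 64,
            # so the 64-byte copy chunks emitted below cover it exactly.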
            n = literal_len - 1
            if n < 60:
                frame.append((n << 2) | TAG_LITERAL)
            else:
                frame.extend(((60 << 2) | TAG_LITERAL, n))
            frame.extend(b"\x00" * literal_len)
    
            remaining = decoded_size - literal_len
            if remaining > 0:
                copy_chunk = bytes([((64 - 1) << 2) | TAG_COPY2, 0x01, 0x00])
                repetitions = remaining // 64
                frame.extend(copy_chunk * repetitions)
            
            return bytes(frame)
    
    
        def main() -> int:
            parser = argparse.ArgumentParser(
                description=__doc__,
                formatter_class=argparse.RawDescriptionHelpFormatter
            )
            parser.add_argument("--host", default="metricbeat-remote-write.kube-system",
                                help="Target host (default: metricbeat-remote-write.kube-system)")
            parser.add_argument("--port", type=int, default=9201,
                                help="Target port (default: 9201)")
            parser.add_argument("--decoded-len", type=int, default=DEFAULT_DECODED_SIZE,
                                help="Decoded size to claim in bytes (default: 500MB)")
            parser.add_argument("--path", default="/",
                                help="HTTP path (default: /)")
            parser.add_argument("--minimal", action="store_true",
                                help="Send minimal payload (just header) to test DecodedLen check")
            parser.add_argument("--valid", action="store_true",
                                help="Send valid snappy payload (larger, tests actual decompression)")
            args = parser.parse_args()
    
            if args.minimal:
                payload = build_minimal_payload(args.decoded_len)
                mode = "minimal (header only)"
            elif args.valid:
                payload = build_valid_payload(args.decoded_len)
                mode = "valid snappy"
            else:
                # Default to minimal for testing
                payload = build_minimal_payload(args.decoded_len)
                mode = "minimal (header only)"
    
            headers = {
                "Content-Type": "application/x-protobuf",
            }
    
            print(f"[*] Mode: {mode}")
            print(f"[*] Sending {len(payload)} bytes, claiming {args.decoded_len} bytes decoded")
            print(f"[*] Target: {args.host}:{args.port}{args.path}")
            print()
    
            conn = http.client.HTTPConnection(args.host, args.port, timeout=10)
            try:
                conn.request("POST", args.path, body=payload, headers=headers)
                resp = conn.getresponse()
                data = resp.read().decode('utf-8', errors='replace')
                
                if resp.status == 413:
                    print(f"[+] SUCCESS: Got HTTP {resp.status} {resp.reason}")
                    print(f"    Response: {data.strip()}")
                elif resp.status == 400:
                    print(f"[*] Got HTTP {resp.status} {resp.reason}")
                    print(f"    Response: {data.strip()}")
                elif resp.status == 202:
                    print(f"[-] UNEXPECTED: Request was accepted (HTTP {resp.status})")
                    print(f"    The size limit may not be working!")
                else:
                    print(f"[?] HTTP {resp.status} {resp.reason}")
                    print(f"    Response: {data[:500]}")
                    
            except (ConnectionResetError, BrokenPipeError, http.client.RemoteDisconnected) as exc:
                print(f"[!] Connection dropped: {exc}")
            except socket.timeout:
                print(f"[!] Connection timed out")
            except Exception as exc:
                print(f"[!] Error: {exc}")
            finally:
                conn.close()
            
            return 0
    
    
        if __name__ == "__main__":
            sys.exit(main())
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: trigger-pod
      namespace: kube-system
      labels:
        app: trigger-pod
    spec:
      containers:
        - name: python
          image: python:3.11-slim
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: script
              mountPath: /scripts
      volumes:
        - name: script
          configMap:
            name: trigger-script
            defaultMode: 0755
      restartPolicy: Never
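
For reference (an illustrative sketch, not code from this PR), the header trick that the script's `--minimal` mode relies on can be reproduced in Go with the `github.com/golang/snappy` package the module uses: `snappy.DecodedLen` parses only the leading uvarint, so a 9-byte payload can claim a 1 GiB decoded size.

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
)

func main() {
	// Build the same minimal payload as the script's --minimal mode:
	// a uvarint header claiming 1 GiB, followed by four padding bytes.
	claimed := uint64(1 << 30)
	var payload []byte
	for claimed >= 0x80 {
		payload = append(payload, byte(claimed)|0x80)
		claimed >>= 7
	}
	payload = append(payload, byte(claimed))
	payload = append(payload, 0x00, 0x00, 0x00, 0x00)

	// DecodedLen reads only the header, so it reports the claimed size
	// without allocating anything for the actual decompression.
	n, err := snappy.DecodedLen(payload)
	fmt.Println(len(payload), n, err) // prints: 9 1073741824 <nil>
}
```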

Related issues

Screenshots

Metrics ingestion from Prometheus remote write continues to work with no problems:


No restarts of Metricbeat during our tests:

kubectl get pods -n kube-system
NAME                                                        READY   STATUS    RESTARTS   AGE
...
metricbeat-5xm98                                            1/1     Running   0          60m

Logs

kubectl exec -it trigger-pod -n kube-system -- python /scripts/trigger_metricbeat_remote_write.py --decoded-len 1073741824

[*] Mode: minimal (header only)
[*] Sending 9 bytes, claiming 1073741824 bytes decoded
[*] Target: metricbeat-remote-write.kube-system:9201/

[+] SUCCESS: Got HTTP 413 Request Entity Too Large
    Response: decoded length too large: 1073741824 bytes exceeds 10485760 max decoded bytes limit (maxDecodedBodyBytes)

{"log.level":"warn","@timestamp":"2025-12-23T12:03:36.390Z","log.logger":"prometheus.remote_write","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/prometheus/remote_write.(*MetricSet).handleFunc","file.name":"remote_write/remote_write.go","file.line":191},"message":"Decoded lenght too large: 1073741824 bytes exceeds 10485760 max decoded bytes limit (maxDecodedBodyBytes)","service.name":"metricbeat","ecs.version":"1.6.0"}

{"log.level":"warn","@timestamp":"2025-12-23T11:44:12.876Z","log.logger":"prometheus.remote_write","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/prometheus/remote_write.(*MetricSet).handleFunc","file.name":"remote_write/remote_write.go","file.line":174},"message":"Request body too large: exceeds 2097152 bytes limit","service.name":"metricbeat","ecs.version":"1.6.0”}

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
@gizas gizas requested review from a team as code owners December 23, 2025 12:39
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 23, 2025
@github-actions

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@gizas gizas added the Team:obs-ds-hosted-services Label for the Observability Hosted Services team label Dec 23, 2025
@elasticmachine

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 23, 2025
@mergify mergify bot assigned gizas Dec 23, 2025
@mergify

mergify bot commented Dec 23, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @gizas? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@github-actions

github-actions bot commented Dec 23, 2025

🔍 Preview links for changed docs

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>

@MichaelKatsoulis MichaelKatsoulis left a comment


LGTM

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
#insecure_skip_verify: true
```

## Request size limits [_request_size_limits]

@bmorelli25 bmorelli25 Dec 23, 2025


What version will these changes go into? We need to tag that version in the docs. For example:

Suggested change
## Request size limits [_request_size_limits]
## Request size limits [_request_size_limits]
```{applies_to}
stack: ga 9.4
```

Repeat in each of the doc entries in this PR. More info here.

@gizas (Author) replied:

Thanks @bmorelli25. See "fixing docs with versions".

Let me know WDYT.

@gizas gizas added backport-active-all Automated backport with mergify to all the active branches backport-active-9 Automated backport with mergify to all the active 9.[0-9]+ branches and removed backport-active-all Automated backport with mergify to all the active branches labels Dec 24, 2025
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>