
Conversation

@gizas gizas commented Dec 23, 2025

  • Bug

Proposed commit message

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

How to test this PR locally

  • Follow the steps of the doc to deploy the Elastic Stack and Metricbeat locally.

    Specifically, add the following configuration for the Prometheus remote_write module:

    Prometheus Module Config
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: metricbeat-daemonset-modules
      namespace: kube-system
      labels:
        k8s-app: metricbeat
    data:
      prometheus.yml: |-
        - module: prometheus
          metricsets:
            - remote_write
          host: "0.0.0.0"
          port: 9201
          period: 10s
  • Install Prometheus in your kind cluster:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-server prometheus-community/prometheus --namespace kube-system
  • Install the following Service so that your Prometheus server can send remote-write metrics to Metricbeat:

    Metricbeat Service
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: metricbeat-remote-write
      namespace: kube-system
      labels:
        k8s-app: metricbeat
    spec:
      ports:
        - port: 9201
          protocol: TCP
          targetPort: 9201
      selector:
        k8s-app: metricbeat
      sessionAffinity: None
      type: ClusterIP
  • Configure the Prometheus server with remote write:

    Prometheus Server Values
    helm upgrade --install prometheus-server prometheus-community/prometheus -f values.yml
    
    $ cat values.yml
    server:
      remoteWrite:
        - url: http://metricbeat-remote-write.kube-system:9201/write
  • Save the following to trigger-pod.yaml. Apply it to create a pod that can send crafted Snappy requests and exercise the new size limits (a Go sketch of the header trick follows this list):

    kubectl apply -f trigger-pod.yaml
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: trigger-script
      namespace: kube-system
    data:
      trigger_metricbeat_remote_write.py: |
        #!/usr/bin/env python3
        """
        Send crafted Snappy payloads to test Metricbeat remote_write size limits.
        
        This script can send a tiny payload that claims a huge decoded size,
        useful for testing the DecodedLen security check.
        """
    
        from __future__ import annotations
    
        import argparse
        import http.client
        import socket
        import sys
    
        DEFAULT_DECODED_SIZE = 500 * 1024 * 1024  # 500 MB
    
    
        def encode_uvarint(value: int) -> bytes:
            """Encode an integer as a varint (used in snappy header)."""
            out = bytearray()
            while value >= 0x80:
                out.append((value & 0x7F) | 0x80)
                value >>= 7
            out.append(value)
            return bytes(out)
    
    
        def build_minimal_payload(claimed_decoded_size: int) -> bytes:
            """
            Build a minimal snappy payload that claims a large decoded size.
            
            The payload is just the varint header + a few bytes of padding.
            This is useful for testing the DecodedLen check without sending
            megabytes of data.
            """
            header = encode_uvarint(claimed_decoded_size)
            # Add minimal padding (this will cause snappy.Decode to fail, 
            # but we're testing the DecodedLen check which happens first)
            return header + b'\x00\x00\x00\x00'
    
    
        def build_valid_payload(decoded_size: int) -> bytes:
            """
            Build a valid snappy payload of the specified decoded size.
            
            This creates actual valid snappy-compressed data.
            """
            TAG_LITERAL = 0x00
            TAG_COPY2 = 0x02
            
            frame = bytearray()
            frame.extend(encode_uvarint(decoded_size))
    
            if decoded_size < 4:
                # Small literal
                n = decoded_size - 1
                frame.append((n << 2) | TAG_LITERAL)
                frame.extend(b"\x00" * decoded_size)
                return bytes(frame)
    
            # Emit a small literal followed by copy operations
            literal_len = ((decoded_size - 4) % 64) + 4
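            # This keeps decoded_size - literal_len an exact multiple of 64,
            # so the 64-byte copy chunks emitted below cover it exactly.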
            n = literal_len - 1
            if n < 60:
                frame.append((n << 2) | TAG_LITERAL)
            else:
                frame.extend(((60 << 2) | TAG_LITERAL, n))
            frame.extend(b"\x00" * literal_len)
    
            remaining = decoded_size - literal_len
            if remaining > 0:
                copy_chunk = bytes([((64 - 1) << 2) | TAG_COPY2, 0x01, 0x00])
                repetitions = remaining // 64
                frame.extend(copy_chunk * repetitions)
            
            return bytes(frame)
    
    
        def main() -> int:
            parser = argparse.ArgumentParser(
                description=__doc__,
                formatter_class=argparse.RawDescriptionHelpFormatter
            )
            parser.add_argument("--host", default="metricbeat-remote-write.kube-system",
                                help="Target host (default: metricbeat-remote-write.kube-system)")
            parser.add_argument("--port", type=int, default=9201,
                                help="Target port (default: 9201)")
            parser.add_argument("--decoded-len", type=int, default=DEFAULT_DECODED_SIZE,
                                help="Decoded size to claim in bytes (default: 500MB)")
            parser.add_argument("--path", default="/",
                                help="HTTP path (default: /)")
            parser.add_argument("--minimal", action="store_true",
                                help="Send minimal payload (just header) to test DecodedLen check")
            parser.add_argument("--valid", action="store_true",
                                help="Send valid snappy payload (larger, tests actual decompression)")
            args = parser.parse_args()
    
            if args.minimal:
                payload = build_minimal_payload(args.decoded_len)
                mode = "minimal (header only)"
            elif args.valid:
                payload = build_valid_payload(args.decoded_len)
                mode = "valid snappy"
            else:
                # Default to minimal for testing
                payload = build_minimal_payload(args.decoded_len)
                mode = "minimal (header only)"
    
            headers = {
                "Content-Type": "application/x-protobuf",
            }
    
            print(f"[*] Mode: {mode}")
            print(f"[*] Sending {len(payload)} bytes, claiming {args.decoded_len} bytes decoded")
            print(f"[*] Target: {args.host}:{args.port}{args.path}")
            print()
    
            conn = http.client.HTTPConnection(args.host, args.port, timeout=10)
            try:
                conn.request("POST", args.path, body=payload, headers=headers)
                resp = conn.getresponse()
                data = resp.read().decode('utf-8', errors='replace')
                
                if resp.status == 413:
                    print(f"[+] SUCCESS: Got HTTP {resp.status} {resp.reason}")
                    print(f"    Response: {data.strip()}")
                elif resp.status == 400:
                    print(f"[*] Got HTTP {resp.status} {resp.reason}")
                    print(f"    Response: {data.strip()}")
                elif resp.status == 202:
                    print(f"[-] UNEXPECTED: Request was accepted (HTTP {resp.status})")
                    print(f"    The size limit may not be working!")
                else:
                    print(f"[?] HTTP {resp.status} {resp.reason}")
                    print(f"    Response: {data[:500]}")
                    
            except (ConnectionResetError, BrokenPipeError, http.client.RemoteDisconnected) as exc:
                print(f"[!] Connection dropped: {exc}")
            except socket.timeout:
                print(f"[!] Connection timed out")
            except Exception as exc:
                print(f"[!] Error: {exc}")
            finally:
                conn.close()
            
            return 0
    
    
        if __name__ == "__main__":
            sys.exit(main())
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: trigger-pod
      namespace: kube-system
      labels:
        app: trigger-pod
    spec:
      containers:
        - name: python
          image: python:3.11-slim
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: script
              mountPath: /scripts
      volumes:
        - name: script
          configMap:
            name: trigger-script
            defaultMode: 0755
      restartPolicy: Never
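
For reference (an illustrative sketch, not code from this PR), the header trick that the script's `--minimal` mode relies on can be reproduced in Go with the `github.com/golang/snappy` package the module uses: `snappy.DecodedLen` parses only the leading uvarint, so a 9-byte payload can claim a 1 GiB decoded size.

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
)

func main() {
	// Build the same minimal payload as the script's --minimal mode:
	// a uvarint header claiming 1 GiB, followed by four padding bytes.
	claimed := uint64(1 << 30)
	var payload []byte
	for claimed >= 0x80 {
		payload = append(payload, byte(claimed)|0x80)
		claimed >>= 7
	}
	payload = append(payload, byte(claimed))
	payload = append(payload, 0x00, 0x00, 0x00, 0x00)

	// DecodedLen reads only the header, so it reports the claimed size
	// without allocating anything for the actual decompression.
	n, err := snappy.DecodedLen(payload)
	fmt.Println(len(payload), n, err) // prints: 9 1073741824 <nil>
}
```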

Related issues

Screenshots

Metrics ingestion from Prometheus remote write continues to work with no problems:


No restarts of Metricbeat during our tests:

kubectl get pods -n kube-system
NAME                                                        READY   STATUS    RESTARTS   AGE
...
metricbeat-5xm98                                            1/1     Running   0          60m

Logs

kubectl exec -it trigger-pod -n kube-system -- python /scripts/trigger_metricbeat_remote_write.py --decoded-len 1073741824

[*] Mode: minimal (header only)
[*] Sending 9 bytes, claiming 1073741824 bytes decoded
[*] Target: metricbeat-remote-write.kube-system:9201/

[+] SUCCESS: Got HTTP 413 Request Entity Too Large
    Response: decoded length too large: 1073741824 bytes exceeds 10485760 max decoded bytes limit (maxDecodedBodyBytes)

{"log.level":"warn","@timestamp":"2025-12-23T12:03:36.390Z","log.logger":"prometheus.remote_write","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/prometheus/remote_write.(*MetricSet).handleFunc","file.name":"remote_write/remote_write.go","file.line":191},"message":"Decoded lenght too large: 1073741824 bytes exceeds 10485760 max decoded bytes limit (maxDecodedBodyBytes)","service.name":"metricbeat","ecs.version":"1.6.0"}

{"log.level":"warn","@timestamp":"2025-12-23T11:44:12.876Z","log.logger":"prometheus.remote_write","log.origin":{"function":"github.com/elastic/beats/v7/metricbeat/module/prometheus/remote_write.(*MetricSet).handleFunc","file.name":"remote_write/remote_write.go","file.line":174},"message":"Request body too large: exceeds 2097152 bytes limit","service.name":"metricbeat","ecs.version":"1.6.0”}

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
@gizas gizas requested review from a team as code owners December 23, 2025 12:39
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 23, 2025
@github-actions

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@gizas gizas added the Team:obs-ds-hosted-services Label for the Observability Hosted Services team label Dec 23, 2025
@elasticmachine

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 23, 2025
@mergify mergify bot assigned gizas Dec 23, 2025
@mergify

mergify bot commented Dec 23, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @gizas? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@github-actions

github-actions bot commented Dec 23, 2025

🔍 Preview links for changed docs

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>

@MichaelKatsoulis MichaelKatsoulis left a comment


LGTM

Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>
#insecure_skip_verify: true
```

## Request size limits [_request_size_limits]

@bmorelli25 bmorelli25 Dec 23, 2025


What version will these changes go into? We need to tag that version in the docs. For example:

Suggested change
## Request size limits [_request_size_limits]
## Request size limits [_request_size_limits]
```{applies_to}
stack: ga 9.4
```

Repeat in each of the doc entries in this PR. More info here.

@gizas (Author) replied:

Thanks @bmorelli25. See "fixing docs with versions".

Let me know WDYT.

@gizas gizas added backport-active-all Automated backport with mergify to all the active branches backport-active-9 Automated backport with mergify to all the active 9.[0-9]+ branches and removed backport-active-all Automated backport with mergify to all the active branches labels Dec 24, 2025
Signed-off-by: Andreas Gkizas <andreas.gkizas@elastic.co>