2 changes: 1 addition & 1 deletion cluster-talos/README.md
@@ -110,7 +110,7 @@ cluster-talos/
 │ ├── cloudflare-operator-system/
 │ │ ├── cloudflare-operator/ # ClusterTunnel CRD + operator
 │ │ └── cloudflare-tunnel/ # ClusterTunnel instance + default TunnelBinding
-│ ├── tanzu-system-logging/ # Fluent-Bit DaemonSet → VCF Operations for Logs (syslog rfc5424)
+│ ├── logging/ # Logging Operator: Fluent Bit DS → Fluentd STS → VCF Operations for Logs (CFAPI)
 │ ├── kasten-io/ # Kasten K10 (LDAPS to AD)
 │ ├── spegel/ # P2P containerd image cache
 │ ├── renovate/ # Dependency update bot
4 changes: 2 additions & 2 deletions cluster-talos/kubernetes/apps/media/plex-test/app/README.md
@@ -28,8 +28,8 @@ Mirrors the same mechanism the prod `plex` HR uses (see
 (`plex-log-tail`) that runs `tail -n0 -F` on
 `/config/Library/Application Support/Plex Media Server/Logs/Plex Media Server.log`
 as uid 1000 inside the `pms` container. Output flows to container
-stdout → fluent-bit DaemonSet in `tanzu-system-logging`
-→ `skw-vcflogs.boeye.net:514`.
+stdout → Logging Operator Fluent Bit DS (`logging` ns) → Fluentd
+aggregator → `skw-vcflogs.boeye.net:9543` CFAPI HTTPS.
 
 Only delta vs prod: the persistence entry hangs off controller key `pms`
 (plex-test calls the PMS container that, not `app` like prod does) and
@@ -1,10 +1,10 @@
 ---
 # Same pattern as prod plex (cluster-talos/kubernetes/apps/media/plex/app/
 # configmap-plex-log-tail.yaml) — s6-overlay longrun that tails PMS's
-# file-based log into stdout so fluent-bit (tanzu-system-logging
-# DaemonSet) ships request/decision detail to skw-vcflogs alongside
-# the regular container logs. `tail -F` follows by name across PMS's
-# internal 10 MB log rotation.
+# file-based log into stdout so the Logging Operator's Fluent Bit
+# DaemonSet (ns=logging) ships request/decision detail to skw-vcflogs
+# alongside the regular container logs. `tail -F` follows by name
+# across PMS's internal 10 MB log rotation.
 apiVersion: v1
 kind: ConfigMap
 metadata:
11 changes: 7 additions & 4 deletions cluster-talos/kubernetes/apps/media/plex/app/README.md
@@ -58,10 +58,13 @@ client) is in memory `reference_pms_html_tv_app_ac3_override.md`.
 `configmap-plex-log-tail.yaml` registers an s6-overlay v3 longrun named
 `plex-log-tail` inside the plex container. It tails the file Plex writes
 to (`/config/Library/Application Support/Plex Media Server/Logs/Plex Media Server.log`)
-into the container's stdout. The cluster's fluent-bit DaemonSet
-(`tanzu-system-logging`) picks it up along with the rest of `/var/log/
-containers/*.log` and ships everything to `skw-vcflogs.boeye.net:514`
-via syslog RFC5424.
+into the container's stdout. The Logging Operator's Fluent Bit
+DaemonSet (`logging` ns) picks it up along with the rest of
+`/var/log/containers/*.log` and forwards to the Fluentd aggregator,
+which posts to `skw-vcflogs.boeye.net:9543` via CFAPI HTTPS (the
+old syslog/RFC5424 path was replaced 2026-05-27 to escape its
+2048-byte per-message cap that clipped long Plex Web Request
+lines).
 
 The longrun layout mirrors how `scaleplex_pms_dockermod` already wires
 `scaleplex-relay` — three files mounted under `/etc/s6-overlay/s6-rc.d/`:
@@ -4,8 +4,8 @@
 # `/config/Library/Application Support/Plex Media Server/Logs/Plex Media Server.log`
 # (no exposed knob to redirect that to stdout); without this service
 # the verbose request/decision lines never reach kubectl/Freelens or
-# the tanzu-system-logging fluent-bit DaemonSet that ships container
-# stdout to skw-vcflogs.boeye.net:514.
+# the Logging Operator's Fluent Bit DaemonSet (ns=logging) that
+# tails container stdout → Fluentd aggregator → skw-vcflogs CFAPI.
 #
 # Pattern mirrors the scaleplex_pms_dockermod's `scaleplex-relay`
 # longrun (same s6-overlay v3 layout — three files per service:
@@ -1,8 +1,10 @@
+---
 apiVersion: source.toolkit.fluxcd.io/v1
 kind: HelmRepository
 metadata:
-  name: fluent
+  name: kube-logging
   namespace: flux-system
 spec:
+  type: oci
+  url: oci://ghcr.io/kube-logging/helm-charts
   interval: 1h
-  url: https://fluent.github.io/helm-charts
@@ -20,6 +20,6 @@ resources:
   - prometheus-community.yaml
   - descheduler.yaml
   - stakater.yaml
-  - fluent.yaml
+  - kube-logging.yaml
   - gpu-node-vsphere-maintenance-controller.yaml
   - authentik.yaml
@@ -13,7 +13,7 @@ resources:
   - external-dns/external-dns-cloudflare/ks.yaml
   - cloudflare-operator-system/cloudflare-operator/ks.yaml
   - cloudflare-operator-system/cloudflare-tunnel/ks.yaml
-  - tanzu-system-logging/ks.yaml
+  - logging/ks.yaml
   - kasten-io/ks.yaml
   - spegel/ks.yaml
   - renovate/ks.yaml
@@ -0,0 +1,94 @@
# `logging/` — cluster log forwarding to vcflogs via Logging Operator

Replaces the previous `tanzu-system-logging/` stack (standalone
fluent-bit DS + custom `vcflogs-cfapi-adapter` sidecar). Migrated
2026-05-27 to the [Logging Operator](https://kube-logging.dev/)
pattern so the VMware Aria CFAPI translation comes from an
off-the-shelf plugin (`fluent-plugin-vmware-loginsight`, originally
published by VMware and bundled in the operator's Fluentd image)
instead of code we own.

## Architecture

```
           /var/log/containers/*.log on every node
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│ Fluent Bit DaemonSet (operator-managed, ns=logging)          │
│   INPUT   tail        /var/log/containers/*.log              │
│   FILTER  kubernetes  enrich w/ pod/ns/labels                │
│   OUTPUT  forward     → Fluentd Service (operator-managed)   │
└─────────────────────────────┬────────────────────────────────┘
                              │ Fluentd forward protocol
                              │ (port 24240, in-cluster)
                              ▼
┌──────────────────────────────────────────────────────────────┐
│ Fluentd StatefulSet ×2 (HA, operator-managed)                │
│   @type forward                                              │
│   @type vmware_loginsight (fluent-plugin-vmware-loginsight)  │
│     → CFAPI POST {"events":[…]}                              │
└─────────────────────────────┬────────────────────────────────┘
                              │ HTTPS POST
                              ▼
                skw-vcflogs.boeye.net:9543
                /api/v1/events/ingest/k8s-talos
```
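The final hop in the diagram is plain JSON over HTTPS. The Python below is a minimal sketch of that body shape, not the plugin's implementation: `to_cfapi_event` turns one Fluentd-style record into a CFAPI event (`text`, a millisecond `timestamp`, and `fields` as name/content pairs) and `cfapi_body` wraps events in the `{"events": [...]}` envelope. Both helper names, the `_`-joined flattening of nested keys, and picking the text from the first present `log_text_keys` entry are assumptions made for illustration.

```python
import json
from typing import Any

# Assumed text keys, mirroring log_text_keys in clusteroutput-vcflogs.yaml.
LOG_TEXT_KEYS = ("log", "msg", "message")


def flatten(record: dict[str, Any], prefix: str = "", sep: str = "_") -> dict[str, Any]:
    """Flatten nested dicts into sep-joined keys (the '_' separator is an assumption)."""
    out: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, name, sep))
        else:
            out[name] = value
    return out


def to_cfapi_event(record: dict[str, Any], ts_ms: int,
                   text_keys: tuple[str, ...] = LOG_TEXT_KEYS) -> dict[str, Any]:
    """Hypothetical helper: one record -> one CFAPI event.

    The first key from text_keys present in the record becomes the event
    text; every other key becomes a {"name", "content"} field.
    """
    flat = flatten(record)
    text_key = next((k for k in text_keys if k in flat), None)
    text = str(flat[text_key]) if text_key is not None else ""
    fields = [
        {"name": k, "content": str(v)}
        for k, v in sorted(flat.items())
        if k != text_key
    ]
    return {"text": text, "timestamp": ts_ms, "fields": fields}


def cfapi_body(events: list[dict[str, Any]]) -> str:
    """Wrap events in the {"events": [...]} envelope sent to the ingest URL."""
    return json.dumps({"events": events})


if __name__ == "__main__":
    rec = {"log": "GET /library/sections", "stream": "stdout",
           "kubernetes": {"namespace_name": "media"}}
    print(cfapi_body([to_cfapi_event(rec, 1_700_000_000_000)]))
```

In the real pipeline, batching, the POST to `/api/v1/events/ingest/k8s-talos`, and retries are handled by `fluent-plugin-vmware-loginsight`; the sketch only shows what ends up on the wire.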

No 2048-byte syslog cap. No homemade adapter. The
`fluent-plugin-vmware-loginsight` gem (v1.4.2) is bundled in the
operator's `ghcr.io/kube-logging/fluentd:v1.17-5.0-full` image —
nothing to build.
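Why the cap mattered is easy to reproduce offline. This Python sketch builds a synthetic request line of roughly 3.7 KB with the identifying `X-Plex-*` parameters at the tail, then clips it to a fixed byte budget the way a per-message cap would. The 2040-byte budget mirrors the clip point quoted in `clusteroutput-vcflogs.yaml`; the line contents, the repeated filler value, and the helper names are made up for illustration.

```python
SYSLOG_BUDGET = 2040  # clip point quoted in clusteroutput-vcflogs.yaml


def clip_utf8(line: str, budget: int) -> str:
    """Truncate to at most `budget` bytes without splitting a UTF-8 sequence."""
    raw = line.encode("utf-8")[:budget]
    return raw.decode("utf-8", errors="ignore")


def synthetic_request(extra_len: int = 3600) -> str:
    # Hypothetical Plex-style request line: a long profile-extra value
    # followed by the short identifying parameters at the very end.
    profile_extra = "add-limitation(scope=videoCodec)" * (extra_len // 32)
    return (
        "Request: [10.0.0.5:51234 (WAN)] GET /video/:/transcode/universal/decision"
        f"?X-Plex-Client-Profile-Extra={profile_extra}"
        "&X-Plex-Product=Plex%20Web&X-Plex-Version=4.0&X-Plex-Token=REDACTED"
    )


if __name__ == "__main__":
    line = synthetic_request()
    clipped = clip_utf8(line, SYSLOG_BUDGET)
    print(len(line.encode()), len(clipped.encode()))
    for key in ("X-Plex-Product", "X-Plex-Version", "X-Plex-Token"):
        print(key, key in line, key in clipped)
```

Run as a script, all three identifying keys are present in the full line and absent from the clipped one, which is exactly the loss the CFAPI path avoids.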

## CRD breakdown

| File | CRD | Purpose |
|---|---|---|
| `helmrelease-operator.yaml` | `HelmRelease` | Installs the operator + CRDs |
| `logging.yaml` | `Logging` | Declares the pipeline (which Fluent Bit + Fluentd specs to render) |
| `clusteroutput-vcflogs.yaml` | `ClusterOutput` | The vmwareLogInsight destination, cluster-scoped |
| `clusterflow-all.yaml` | `ClusterFlow` | "Match everything → send to vcflogs" |

The split between `Logging` (infrastructure) and Flow/Output (routing)
is intentional in the operator's design: `Logging` is platform, Flow +
Output is policy. In a multi-tenant cluster (work scale), each team's
namespace would get its own namespace-scoped `Flow` CRs, whose
`localOutputRefs` can only point at `Output`s in that same namespace,
while ops would manage `ClusterFlow` / `ClusterOutput` for
cross-cutting destinations.
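When per-namespace routing arrives, the order of `select` / `exclude` entries in a `match` list matters. The Python below is a simplified model of those semantics as we understand the Logging Operator routing docs: entries are checked top to bottom, the first one whose criteria match decides (`select` keeps the record, `exclude` drops it), empty criteria match everything (like `select: {}`), and a record matching no entry is not routed. The `Rule` type and its namespace/label subset are hypothetical simplifications, not the operator's API.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Simplified stand-in for one `select`/`exclude` entry in a match list."""
    action: str  # "select" or "exclude"
    namespaces: list[str] = field(default_factory=list)
    labels: dict[str, str] = field(default_factory=dict)

    def matches(self, namespace: str, pod_labels: dict[str, str]) -> bool:
        # Empty criteria match everything, like `select: {}`.
        if self.namespaces and namespace not in self.namespaces:
            return False
        return all(pod_labels.get(k) == v for k, v in self.labels.items())


def routed(rules: list[Rule], namespace: str, pod_labels: dict[str, str]) -> bool:
    """First matching rule wins; no match means the record is not routed."""
    for rule in rules:
        if rule.matches(namespace, pod_labels):
            return rule.action == "select"
    return False


if __name__ == "__main__":
    # Today's clusterflow-all.yaml: a single `select: {}`.
    everything = [Rule("select")]
    # A hypothetical future shape: drop one noisy namespace, keep the rest.
    quieter = [Rule("exclude", namespaces=["kube-system"]), Rule("select")]
    print(routed(everything, "kube-system", {}))
    print(routed(quieter, "kube-system", {}))
    print(routed(quieter, "media", {"app": "plex"}))
```

If these semantics hold, an `exclude` only has an effect when it sits before the broader `select` it is meant to carve out of.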

## What's where in the cluster

- **`logging` namespace** holds the operator pod, the Fluent Bit DS and the Fluentd STS
- **Fluent Bit pods** mount `hostPath: /var/log/containers` to read CRI logs
- **Fluentd pods** each mount a `5Gi` PVC (`longhorn` StorageClass) for the file buffer that absorbs vcflogs back-pressure
- **Leader election** for the operator uses a Lease in this namespace
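The Fluent Bit pods read CRI-format files, where each line is `<timestamp> <stream> <P|F> <message>`. The Python sketch below splits one such line with a regex modeled on Fluent Bit's classic `cri` parser (keys `time`, `stream`, `logtag`, `message`; the parser the operator actually configures may name the message key `log` instead, which is why `log_text_keys` lists both) and then removes `logtag`, mimicking the `record_modifier` `remove_keys: logtag` filter in `clusterflow-all.yaml`. It does not reassemble `P` (partial) lines, and the helper names are hypothetical.

```python
import re
from typing import Optional

# CRI log line: <RFC 3339 time> <stdout|stderr> <P|F> <message>
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<message>.*)$"
)


def parse_cri(line: str) -> Optional[dict[str, str]]:
    """Split one CRI-format line into a record, or return None if it doesn't match."""
    m = CRI_LINE.match(line)
    return m.groupdict() if m else None


def remove_keys(record: dict[str, str], keys: str) -> dict[str, str]:
    """Mimic record_modifier's comma-separated `remove_keys` option."""
    drop = {k.strip() for k in keys.split(",") if k.strip()}
    return {k: v for k, v in record.items() if k not in drop}


if __name__ == "__main__":
    raw = "2026-05-27T10:15:30.123456789Z stdout F Plex Media Server started"
    rec = parse_cri(raw)
    print(rec)
    if rec is not None:
        print(remove_keys(rec, "logtag"))
```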

## Tuning knobs

| What | Where |
|---|---|
| Fluent Bit resources / tolerations | `logging.yaml` → `spec.fluentbit` |
| Fluentd replicas (HA) | `logging.yaml` → `spec.fluentd.scaling.replicas` |
| Fluentd buffer size / storage class | `logging.yaml` → `spec.fluentd.bufferStorageVolume.pvc` |
| CFAPI endpoint / TLS posture | `clusteroutput-vcflogs.yaml` → `spec.vmwareLogInsight` |
| Buffer flush cadence / retry | `clusteroutput-vcflogs.yaml` → `spec.vmwareLogInsight.buffer` |
| Per-namespace routing | replace `clusterflow-all.yaml` with multiple `Flow` / `ClusterFlow` CRs |
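The buffer knobs in `clusteroutput-vcflogs.yaml` interact in ways worth estimating before changing them. This Python sketch computes how many full-size `chunk_limit_size` chunks fit in `total_limit_size`, and an approximate retry schedule under Fluentd's exponential backoff capped at `retry_max_interval`. It assumes Fluentd's defaults of `retry_wait 1s` and a backoff base of 2, treats size suffixes as 1024-based, and ignores the randomization Fluentd applies to each wait by default.

```python
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Parse sizes like '8MB' or '1GB' using 1024-based units (assumed)."""
    t = text.strip().upper().removesuffix("B")
    suffix = t[-1] if t and t[-1] in "KMG" else ""
    number = t[:-1] if suffix else t
    return int(float(number) * UNITS[suffix])


def max_chunks(total: str, chunk: str) -> int:
    """How many full-size chunks fit under the total buffer limit."""
    return parse_size(total) // parse_size(chunk)


def retry_waits(retries: int, retry_wait: float = 1.0, base: float = 2.0,
                max_interval: float = 30.0) -> list[float]:
    """Approximate wait before each retry, without Fluentd's randomization."""
    return [min(retry_wait * base**n, max_interval) for n in range(retries)]


if __name__ == "__main__":
    print(max_chunks("1GB", "8MB"))
    waits = retry_waits(8)
    print(waits, sum(waits))
```

With the current values that is 128 full chunks, and after about five doublings the waits settle at one attempt every 30 s (plus jitter in practice) until vcflogs recovers or Fluentd's retry limits give up.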

## Reverting (if needed)

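```
# Drop `-m 1` if the PR was squash- or rebase-merged (no merge commit).
git revert -m 1 <merge-commit-of-this-PR>
git push
flux reconcile kustomization platform -n flux-system --with-source
```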

This re-creates the old `tanzu-system-logging/fluent-bit` HelmRelease.
The `vcflogs-cfapi-adapter` ghcr.io image was deleted as part of this
migration, so reverting leaves the cluster on the 2048-byte syslog
cap until that image is rebuilt from git history and republished.

## References

- [Logging Operator docs](https://kube-logging.dev/docs/)
- [`vmwareLogInsight` output reference](https://kube-logging.dev/docs/configuration/plugins/outputs/vmware_loginsight/)
- [`fluent-plugin-vmware-loginsight` upstream (archived)](https://github.com/vmware-archive/fluent-plugin-vmware-loginsight)
- VMware Aria Operations for Logs [ingest API](https://developer.broadcom.com/xapis/vrealize-log-insight-api/latest/)
- Predecessor: PR Varashi/k8s#151 (homemade vcflogs-cfapi-adapter sidecar)
@@ -0,0 +1,30 @@
---
# ClusterFlow — the routing rule. Matches *everything* from every
# namespace and sends it to the vcflogs ClusterOutput. Mirrors the
# previous fluent-bit `[OUTPUT] syslog Match kube.*` behavior:
# every container log goes to vcflogs, no per-namespace filtering.
#
# When we want per-namespace selectivity later (e.g., suppress noisy
# system logs), replace this with multiple Flow/ClusterFlow CRs and
# selectors. For now: one ClusterFlow, one ClusterOutput.
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: all-to-vcflogs
  namespace: logging
spec:
  match:
    # Match every namespace. The empty `select: {}` is the
    # operator's canonical "match all" form.
    - select: {}
  globalOutputRefs:
    - vcflogs
  # Filters run in order. The kubernetes_metadata enrichment is
  # added automatically by the operator for Fluent Bit collection,
  # so the only filter declared here is the logtag cleanup.
  filters:
    # Drop fluent-bit's per-line `logtag` (P/F) marker — it's a
    # CRI partial-line indicator, not useful to vcflogs. Keep
    # everything else.
    - record_modifier:
        remove_keys: logtag
@@ -0,0 +1,50 @@
---
# ClusterOutput — the destination. Cluster-scoped so any namespace
# can route to it via a ClusterFlow. Wraps the
# `fluent-plugin-vmware-loginsight` (v1.4.2, bundled in the operator's
# v1.17-5.0-full Fluentd image) and POSTs to vRealize Log Insight's
# CFAPI ingest endpoint.
#
# Why CFAPI (HTTPS) vs syslog (TCP/514): the syslog path cut each
# message at 2048 bytes, the minimum RFC 5424 requires receivers to
# accept. Plex Web Request: lines run
# ~3800 bytes when X-Plex-Client-Profile-Extra is present, getting
# clipped mid-token at byte 2040 — losing X-Plex-Product /
# X-Plex-Version / X-Plex-Token from the tail. CFAPI has no
# documented size cap. (See:
# https://github.com/Varashi/k8s/pull/151 for the prior homemade
# sidecar that did this translation by hand before we adopted the
# operator pattern.)
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: vcflogs
  namespace: logging
spec:
  vmwareLogInsight:
    scheme: https
    # Aria appliance ships a self-signed cert; we accept it on the
    # internal network. Flip to true once the cert gets a real CA.
    ssl_verify: false
    host: "skw-vcflogs.${SECRET_DOMAIN}"
    port: 9543
    # CFAPI's agent_id is an arbitrary tag the receiver records
    # against each event — not auth, just an identifier so vcflogs
    # can attribute the stream. Matches what the previous adapter
    # used (k8s-talos).
    agent_id: k8s-talos
    # Default `log_text_keys` (`log`, `msg`, `message`) is what
    # we want — the CRI parser populates `log`/`message` for
    # container stdout/stderr, and the kubernetes_metadata filter
    # may copy to `msg`.
    log_text_keys:
      - log
      - msg
      - message
    # Plugin retries on 5xx and drops on 4xx by default. Chunks are
    # stored on the Logging CR's fluentd.bufferStorageVolume PVC;
    # flush cadence, retry cap and size limits are set here.
    buffer:
      flush_interval: 5s
      retry_max_interval: 30s
      chunk_limit_size: 8MB
      total_limit_size: 1GB
@@ -0,0 +1,45 @@
---
# Logging Operator — installs the controller that watches Logging /
# Flow / Output CRDs and reconciles a Fluent Bit DaemonSet + Fluentd
# StatefulSet from them. The chart ships only the operator + CRDs;
# the actual logging pipeline is declared via the CRs in the
# sibling logging.yaml / clusteroutput-*.yaml / clusterflow-*.yaml.
#
# Docs: https://kube-logging.dev/docs/
# Chart: oci://ghcr.io/kube-logging/helm-charts/logging-operator
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: logging-operator
  namespace: logging
spec:
  interval: 30m
  chart:
    spec:
      chart: logging-operator
      version: 6.5.2
      sourceRef:
        kind: HelmRepository
        name: kube-logging
        namespace: flux-system
  install:
    remediation:
      retries: 3
    crds: CreateReplace
  upgrade:
    cleanupOnFail: true
    crds: CreateReplace
    remediation:
      strategy: rollback
      retries: 3
  values:
    # Keep the operator itself lean — it just watches CRDs.
    resources:
      requests: {cpu: 10m, memory: 64Mi}
      limits: {cpu: 200m, memory: 256Mi}
    # The operator's leader election uses a Lease in this ns.
    enableLeaderElection: true
    # Don't ship the bundled `logging` resource — we declare ours
    # explicitly in logging.yaml so it's GitOps-visible.
    logging:
      enabled: false
@@ -0,0 +1,12 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - helmrelease-operator.yaml
  # Order below matters at first apply: the operator's CRDs (Logging,
  # Flow, ClusterFlow, Output, ClusterOutput) must exist before the
  # CRs below can install. Flux retries on missing-CRD failures, so
  # this resolves on the second reconcile if both arrive together.
  - logging.yaml
  - clusteroutput-vcflogs.yaml
  - clusterflow-all.yaml