feat(telemetry): opt-in anonymous ops telemetry (per-command), with FC→SLS reference receiver #410
Closed
PeterGuy326 wants to merge 13 commits into
Emit one dimensions-only metric per dws invocation (error rate, latency, command distribution, version/platform health) to an operator-configured sink. Independent of the audit machinery and OFF by default.
- internal/telemetry: Event (10 coarse dimensions, no content/identity) + env-driven Forwarder (DWS_TELEMETRY_ENABLED/URL/TOKEN/TIMEOUT_MS)
- wire emitTelemetry into executeInvocation's defer, reusing the existing outcome/err_class/duration already computed for command-end logging
- docs/telemetry.md: fields, privacy boundary, SLS ingest + 4 alert rules
- tests cover enable gating, POST contract, and the privacy boundary (param content must never leak into the payload)
A minimal Flask web service to deploy as a Function Compute Web Function: verifies the bearer token, then writes each telemetry Event to an SLS Logstore via PutLogs (SLS cannot accept the raw, unsigned POST directly). Promotes the query dimensions to their own columns and keeps the full event verbatim. Includes deploy walkthrough, local smoke test, and 4 alert rules.
…te boundary
- localsink.py: stdlib-only HTTP collector to test the full dws->HTTP pipeline without SLS or Function Compute
- telemetry.md: local-test walkthrough (incl. a mini local dashboard) and a section spelling out that the SLS project / endpoint / token live in the deployer's own infra and never enter this open-source repo
app.py now auto-detects mode: with no SLS_* env (or TELEMETRY_DRYRUN=true) it logs each event to stdout and returns 204 instead of writing to SLS, and the aliyun-log SDK is imported lazily so dry-run needs no extra dependency. Lets you deploy to Function Compute and confirm the client->FC pipeline end-to-end before provisioning any SLS resource. GET / reports the active mode. README documents the deploy-then-wire-SLS flow.
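The receiver in this PR is a Python Flask app; the sketch below is a Go port of the same idea, written only to illustrate the mode detection and token check. Everything beyond what the commit message states is an assumption: the handler shape, the response bodies, the 64 KiB body cap, and the sample token. The SLS branch is left unimplemented because it would require the SLS SDK.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// detectMode returns "dry-run" when TELEMETRY_DRYRUN=true is set or when no
// SLS_* variable is present; otherwise "sls". Entries are "KEY=VALUE".
func detectMode(environ []string) string {
	hasSLS := false
	for _, kv := range environ {
		if kv == "TELEMETRY_DRYRUN=true" {
			return "dry-run"
		}
		if strings.HasPrefix(kv, "SLS_") {
			hasSLS = true
		}
	}
	if hasSLS {
		return "sls"
	}
	return "dry-run"
}

// authorized compares the bearer token in constant time.
func authorized(header, token string) bool {
	const prefix = "Bearer "
	if token == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// handler reports the mode on GET and accepts events on other methods.
func handler(mode, token string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodGet {
			fmt.Fprintf(w, "mode=%s\n", mode)
			return
		}
		if !authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		body, err := io.ReadAll(io.LimitReader(r.Body, 64<<10))
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		if mode == "dry-run" {
			log.Printf("event: %s", body) // dry-run: log and acknowledge
			w.WriteHeader(http.StatusNoContent)
			return
		}
		// SLS mode would write the event via PutLogs here (not sketched).
		http.Error(w, "sls mode not implemented in this sketch", http.StatusNotImplemented)
	})
}

func main() {
	// Exercise the handler in-process instead of binding a port.
	h := handler(detectMode(os.Environ()), "s3cret")
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(`{"command":"example"}`))
	req.Header.Set("Authorization", "Bearer s3cret")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	fmt.Println("status:", rec.Code)
}
```

Scanning for any SLS_ prefix avoids guessing the receiver's actual variable names, which the commit message does not list.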
scripts/dev/telemetry_smoke.sh builds dws, starts the zero-dep local sink, fires --mock commands and asserts the pipeline: events received with all expected dimensions, bearer token enforced (401), and the privacy boundary (a sentinel command argument must never appear in any payload). Exits non-zero on failure, so it can gate pre-push / CI.
Open-source repo convention: configmeta descriptions, docs/telemetry.md and the FC ingest README are now English (code comments were already English). No behavior change; tests still pass.
Convert all Chinese content in the telemetry surface to English so the public repo leaves no localized traces:
- docs/telemetry.md (full doc)
- docs/telemetry/fc-sls-ingest/README.md (FC->SLS receiver guide)
- internal/telemetry/telemetry.go (config item descriptions)
- internal/app/telemetry_runtime_test.go (test fixture string)
No behavior change; English-only wording.
…pt-out + disclosure
Lets a downstream "fleet" distribution ship telemetry on-by-default to its own
ingest, while the open-source build stays opt-in and off — and hardcodes no
endpoint.
- internal/telemetry/telemetry.go:
  - build-time vars defaultURL/defaultToken (empty in OSS; injected via -ldflags
    by a downstream build)
  - Enabled() posture: DWS_TELEMETRY_DISABLED hard opt-out wins; explicit
    DWS_TELEMETRY_ENABLED overrides; otherwise on only when a default endpoint is
    baked in. Env URL/token override the build defaults.
  - ShowNoticeOnce(): one-time stderr disclosure (marker ~/.dws/.telemetry_notice_shown)
  - new DWS_TELEMETRY_DISABLED env + configmeta registration
- internal/app/telemetry_runtime.go: print the disclosure once when telemetry first activates
- internal/telemetry/telemetry_test.go: cover baked-in default-on + opt-out (OSS opt-in cases unchanged)
- docs/telemetry.md: document default posture, ldflags injection, opt-out, disclosure
Verified e2e: a build with a baked-in endpoint and no env defaults on, prints the
notice once, and reports; DWS_TELEMETRY_DISABLED=true suppresses it.
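The three-step posture described for Enabled() can be sketched as a pure function. This is an illustration under assumptions, not the PR's implementation: the set of values treated as true, the function signatures, and the package path in the -ldflags example are hypothetical, and any additional requirement that a destination be configured is not modeled.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultURL is empty here, as in the open-source build. A downstream build
// could inject a value with, for example:
//
//	go build -ldflags "-X main.defaultURL=https://ingest.example/v1"
//
// (the package path and URL above are placeholders).
var defaultURL string

// truthy treats a small set of strings as "on"; the exact set is an assumption.
func truthy(v string) bool {
	switch strings.ToLower(strings.TrimSpace(v)) {
	case "1", "true", "yes", "on":
		return true
	}
	return false
}

// enabled applies the precedence from the commit message:
//  1. DWS_TELEMETRY_DISABLED is a hard opt-out and always wins.
//  2. An explicit DWS_TELEMETRY_ENABLED value decides next.
//  3. Otherwise telemetry is on only when an endpoint is baked in.
func enabled(getenv func(string) string, baked string) bool {
	if truthy(getenv("DWS_TELEMETRY_DISABLED")) {
		return false
	}
	if v := getenv("DWS_TELEMETRY_ENABLED"); v != "" {
		return truthy(v)
	}
	return baked != ""
}

// resolveURL lets the environment override the build-time default.
func resolveURL(getenv func(string) string, baked string) string {
	if v := getenv("DWS_TELEMETRY_URL"); v != "" {
		return v
	}
	return baked
}

func main() {
	fmt.Println("enabled:", enabled(os.Getenv, defaultURL))
	fmt.Println("url:", resolveURL(os.Getenv, defaultURL))
}
```

Passing getenv as a parameter keeps the posture logic testable without mutating the process environment.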
…erless Devs)
Make the FC->SLS reference receiver deployable without hand-steps:
- Dockerfile: container image (AONE / any container platform), gunicorn on :9000, app.py auto-detects dry-run vs SLS mode from env. Built + run + received a live event locally.
- s.yaml + deploy.sh: Serverless Devs spec for public Aliyun FC (s build && s deploy).
- .dockerignore: keep the image to app.py + requirements.txt.
No behavior change to the receiver; packaging only.
…ver-less monitoring
Add a zero-infra sink: when DWS_TELEMETRY_FILE is set, each event is appended as one JSON line to that local file instead of being POSTed (no receiver, no FC, no SLS). Ideal for local/per-machine stability monitoring; aggregate the file with a small script (see docs/telemetry.md). File sink takes precedence over URL and, when set, enables telemetry (with the same DWS_TELEMETRY_DISABLED opt-out).
- telemetry.go: EnvFile + resolvedFile (with ~ expansion); Enabled() counts a file sink as a destination; Forwarder.File appends JSONL in Emit.
- test: file sink enables + appends valid JSON lines + opt-out still wins.
- docs: "Local monitoring (lightest)" section + one-line aggregation.
…PPEND)
Make the zero-dep local sink usable as a tiny central collector on your own machine (e.g. "monitor on my computer" for a small team, no SLS/FC needed):
- HOST env (default 127.0.0.1; set 0.0.0.0 to accept POSTs from LAN machines)
- APPEND env (default truncate for tests; APPEND=1 keeps history across restarts)
- startup banner shows the real bind host + append mode
Token auth is strongly advised when binding 0.0.0.0. Verified: a dws pointed at the machine's LAN IP lands events in the collector file.
…deploy artifacts + gofmt), take their smoke script
Summary
Adds opt-in, anonymous, dimensions-only ops telemetry: dws can emit one JSON
metric per command invocation to a deployer-configured endpoint, for monitoring
error rate, latency, command distribution, and version/platform health.
Off by default: with DWS_TELEMETRY_ENABLED unset, nothing is emitted (zero hot-path impact). Centralized reporting is opt-in and explicitly disclosed.
What's included
- internal/telemetry/: event schema + emitter (timeout-bounded; never gates the command)
- internal/app/telemetry_runtime.go: runtime wiring into command execution
- docs/telemetry.md: full operator doc (enabling, fields, receiver contract, local testing, SLS wiring, alerts)
- docs/telemetry/fc-sls-ingest/: reference FC→SLS receiver (app.py Flask, localsink.py zero-dep local sink) with a dry-run mode to validate the pipeline before touching SLS
Privacy boundary
command/subcommand/outcome/exit_code/duration_ms/cli_version/channel/os/corp_id/trace_id.
Config (all env, all default off)
- DWS_TELEMETRY_ENABLED
- DWS_TELEMETRY_URL
- DWS_TELEMETRY_TOKEN
- DWS_TELEMETRY_TIMEOUT_MS
Notes for reviewers
main history clean (the branch's intermediate commits predate the English pass).