feat(telemetry): opt-in anonymous ops telemetry per command #411
Open
PeterGuy326 wants to merge 13 commits into
Emit one dimensions-only metric per dws invocation (error rate, latency, command distribution, version/platform health) to an operator-configured sink. Independent of the audit machinery and OFF by default.
- internal/telemetry: Event (10 coarse dimensions, no content/identity) + env-driven Forwarder (DWS_TELEMETRY_ENABLED/URL/TOKEN/TIMEOUT_MS)
- wire emitTelemetry into executeInvocation's defer, reusing the existing outcome/err_class/duration already computed for command-end logging
- docs/telemetry.md: fields, privacy boundary, SLS ingest + 4 alert rules
- tests cover enable gating, POST contract, and the privacy boundary (param content must never leak into the payload)
A minimal Flask web service to deploy as a Function Compute Web Function: verifies the bearer token, then writes each telemetry Event to an SLS Logstore via PutLogs (SLS cannot accept the raw signed-less POST directly). Promotes the query dimensions to their own columns and keeps the full event verbatim. Includes deploy walkthrough, local smoke test, and 4 alert rules.
…te boundary
- localsink.py: stdlib-only HTTP collector to test the full dws->HTTP pipeline without SLS or Function Compute
- telemetry.md: local-test walkthrough (incl. a mini local dashboard) and a section spelling out that the SLS project / endpoint / token live in the deployer's own infra and never enter this open-source repo
app.py now auto-detects mode: with no SLS_* env (or TELEMETRY_DRYRUN=true) it logs each event to stdout and returns 204 instead of writing to SLS, and the aliyun-log SDK is imported lazily so dry-run needs no extra dependency. Lets you deploy to Function Compute and confirm the client->FC pipeline end-to-end before provisioning any SLS resource. GET / reports the active mode. README documents the deploy-then-wire-SLS flow.
scripts/dev/telemetry_smoke.sh builds dws, starts the zero-dep local sink, fires --mock commands and asserts the pipeline: events received with all expected dimensions, bearer token enforced (401), and the privacy boundary (a sentinel command argument must never appear in any payload). Exits non-zero on failure, so it can gate pre-push / CI.
Open-source repo convention: configmeta descriptions, docs/telemetry.md and the FC ingest README are now English (code comments were already English). No behavior change; tests still pass.
Convert all Chinese content in the telemetry surface to English so the public repo leaves no localized traces:
- docs/telemetry.md (full doc)
- docs/telemetry/fc-sls-ingest/README.md (FC->SLS receiver guide)
- internal/telemetry/telemetry.go (config item descriptions)
- internal/app/telemetry_runtime_test.go (test fixture string)
No behavior change; English-only wording.
…pt-out + disclosure
Lets a downstream "fleet" distribution ship telemetry on-by-default to its own
ingest, while the open-source build stays opt-in and off — and hardcodes no
endpoint.
- internal/telemetry/telemetry.go:
  - build-time vars defaultURL/defaultToken (empty in OSS; injected via -ldflags
    by a downstream build)
  - Enabled() posture: DWS_TELEMETRY_DISABLED hard opt-out wins; explicit
    DWS_TELEMETRY_ENABLED overrides; otherwise on only when a default endpoint is
    baked in. Env URL/token override the build defaults.
  - ShowNoticeOnce(): one-time stderr disclosure (marker ~/.dws/.telemetry_notice_shown)
  - new DWS_TELEMETRY_DISABLED env + configmeta registration
- internal/app/telemetry_runtime.go: print the disclosure once when telemetry first activates
- internal/telemetry/telemetry_test.go: cover baked-in default-on + opt-out (OSS opt-in cases unchanged)
- docs/telemetry.md: document default posture, ldflags injection, opt-out, disclosure
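The Enabled() posture described in this commit can be read as a small decision function. The following is a minimal sketch, not the code in telemetry.go. It assumes boolean env values are parsed with strconv.ParseBool and that an explicit DWS_TELEMETRY_ENABLED=true still needs a resolved URL; neither detail is confirmed by the commit message. The getenv callback and the lowercase function names are hypothetical and exist only to make the sketch testable.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultURL stands in for the build-time variable named in the commit.
// It is empty in the open-source build; a downstream build could set it
// with the linker's -X flag.
var defaultURL = ""

// resolvedURL lets the env URL override the build default.
func resolvedURL(getenv func(string) string) string {
	if u := getenv("DWS_TELEMETRY_URL"); u != "" {
		return u
	}
	return defaultURL
}

// enabled applies the three rules in order:
//  1. DWS_TELEMETRY_DISABLED is a hard opt-out and always wins.
//  2. An explicit DWS_TELEMETRY_ENABLED overrides the default posture
//     (assumption: "on" still requires a destination URL).
//  3. Otherwise telemetry is on only when a default endpoint is baked in.
func enabled(getenv func(string) string) bool {
	if off, err := strconv.ParseBool(getenv("DWS_TELEMETRY_DISABLED")); err == nil && off {
		return false
	}
	if v := getenv("DWS_TELEMETRY_ENABLED"); v != "" {
		on, err := strconv.ParseBool(v)
		return err == nil && on && resolvedURL(getenv) != ""
	}
	return defaultURL != ""
}

func main() {
	fmt.Println("telemetry enabled:", enabled(os.Getenv))
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps each posture case checkable without mutating the process environment.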
Verified e2e: a build with a baked-in endpoint and no env defaults on, prints the
notice once, and reports; DWS_TELEMETRY_DISABLED=true suppresses it.
…erless Devs)
Make the FC->SLS reference receiver deployable without hand-steps:
- Dockerfile: container image (AONE / any container platform), gunicorn on :9000, app.py auto-detects dry-run vs SLS mode from env. Built + run + received a live event locally.
- s.yaml + deploy.sh: Serverless Devs spec for public Aliyun FC (s build && s deploy).
- .dockerignore: keep the image to app.py + requirements.txt.
No behavior change to the receiver; packaging only.
…ver-less monitoring
Add a zero-infra sink: when DWS_TELEMETRY_FILE is set, each event is appended as one JSON line to that local file instead of being POSTed, with no receiver, no FC, and no SLS. Ideal for local/per-machine stability monitoring; aggregate the file with a small script (see docs/telemetry.md). File sink takes precedence over URL and, when set, enables telemetry (with the same DWS_TELEMETRY_DISABLED opt-out).
- telemetry.go: EnvFile + resolvedFile (with ~ expansion); Enabled() counts a file sink as a destination; Forwarder.File appends JSONL in Emit.
- test: file sink enables + appends valid JSON lines + opt-out still wins.
- docs: "Local monitoring (lightest)" section + one-line aggregation.
…PPEND)
Make the zero-dep local sink usable as a tiny central collector on your own machine, e.g. "monitor on my computer" for a small team, no SLS/FC needed:
- HOST env (default 127.0.0.1; set 0.0.0.0 to accept POSTs from LAN machines)
- APPEND env (default truncate for tests; APPEND=1 keeps history across restarts)
- startup banner shows the real bind host + append mode
Token auth is strongly advised when binding 0.0.0.0. Verified: a dws pointed at the machine's LAN IP lands events in the collector file.
…deploy artifacts + gofmt), take their smoke script
What
Adds an opt-in, anonymous operational telemetry stream: one dimensions-only
metric per dws command invocation (error rate, latency, command distribution,
version/platform health). This is the ops-monitoring counterpart to the audit
trail, kept deliberately small and independent of it.
- internal/telemetry: a 12-field Event (no content, no identity) and an env-driven Forwarder.
- Wired into executeInvocation's existing defer, reusing the outcome/err_class/duration already computed for command-end logging.
- Off by default. Enable with DWS_TELEMETRY_ENABLED=true and DWS_TELEMETRY_URL. Optional: DWS_TELEMETRY_TOKEN, DWS_TELEMETRY_TIMEOUT_MS.
- Reference receiver under docs/telemetry/fc-sls-ingest/ (FC Web Function → SLS) with a dry-run mode and a zero-dependency local sink for testing without SLS.
Privacy boundary
Collects only coarse dimensions:
command / subcommand / outcome / err_class / exit_code / duration_ms / cli_version / channel / os / corp_id / trace_id.
Never object names, free text, peer ids, device fingerprints, or request bodies.
A test asserts command-argument content never leaks into the payload.
Testing
Note for reviewers / merge order
Touches internal/app/runner.go's command-end defer, the same 2-line region as the audit PR (#398). The two additions (emitAudit and emitTelemetry) are independent and compose; whichever lands second resolves a trivial 2-line merge.