
Conversation

@petercrocker
Contributor

@petercrocker petercrocker commented Jan 19, 2026

Add comprehensive documentation for the customer metrics tracking system:

  • docs/specs/customer-metrics-tracking-system.md: Feature specification
    with use cases, proposed solution, and implementation status
  • docs/plans/customer-metrics-poc-plan.md: Detailed POC implementation plan with architecture, schemas, and step-by-step guide

Also includes:

  • License generator utility for creating signed test licenses
  • Fix date extraction in telemetry API (handle YYYY-MM-DD format)
  • Add LicenseTelemetryData model rebuild for UUID forward reference

Summary by CodeRabbit

  • New Features

    • Added telemetry management endpoints to view status, list files, and export telemetry data.
    • Added license file support with validation capabilities.
    • Introduced local telemetry storage with configurable retention policies.
  • Documentation

    • Added configuration documentation for new telemetry and license settings.
    • Added customer metrics tracking system specifications and implementation roadmap.


- Add license module with models, validator, and loader for customer identity
- Add local telemetry storage for airgapped environments
- Add REST API endpoints for telemetry export, list, and status
- Update telemetry tasks to save locally and include license data
- Add configurable telemetry dev interval mode for testing
- Add telemetry configuration options to MainSettings
@coderabbitai
Contributor

coderabbitai bot commented Jan 19, 2026

Walkthrough

This pull request introduces a licensing and telemetry export system for Infrahub. It adds license management modules for validation, generation, and loading, along with local telemetry storage for airgapped environments. Three new REST API endpoints (/api/telemetry/status, /api/telemetry/list, /api/telemetry/export) enable retrieval of telemetry status, listing stored files, and exporting data with optional date filtering. Configuration fields are added for telemetry storage path, retention days, and license file paths. Workflow scheduling is updated to support dynamic telemetry intervals. Documentation and specification files describe the customer metrics tracking system architecture and implementation plan. Generated TypeScript types and OpenAPI schemas reflect the new endpoints and models.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'Feature/customer-metrics-telemetry' is a branch name pattern rather than a descriptive summary. It broadly indicates the feature area but lacks specifics about the primary change. | Revise the title to be a clear, single sentence that describes the main change, such as 'Add customer metrics telemetry system with license tracking and local storage' or 'Implement telemetry API endpoints and license management for customer metrics.' |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 92.31%, which is sufficient. The required threshold is 80.00%. |




@github-actions github-actions bot added type/documentation Improvements or additions to documentation group/backend Issue related to the backend (API Server, Git Agent) labels Jan 19, 2026
The documentation generator was creating env variable names like
INFRAHUB_INFRAHUB_* because it prepends the env_prefix to field names,
but the validation_alias already included the full INFRAHUB_ prefix.

Remove validation_alias and let the default pydantic naming work:
- INFRAHUB_TELEMETRY_DEV_INTERVAL_MINUTES (was _DEV_INTERVAL)
- INFRAHUB_LICENSE_FILE_PATH (was _FILE)

Update documentation to reflect the correct env variable names.
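
For illustration, a minimal sketch of the corrected naming, assuming pydantic-settings v2 semantics (field and prefix names taken from the commit message above):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class MainSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="INFRAHUB_")

    # Resolved from INFRAHUB_TELEMETRY_DEV_INTERVAL_MINUTES; a
    # validation_alias repeating the prefix would be documented as
    # INFRAHUB_INFRAHUB_TELEMETRY_DEV_INTERVAL_MINUTES.
    telemetry_dev_interval_minutes: int | None = None

    # Resolved from INFRAHUB_LICENSE_FILE_PATH.
    license_file_path: str | None = None
```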
Update Vale configuration and CI workflow to only lint published
documentation in docs/docs/, excluding internal docs in docs/specs/
and docs/plans/. This prevents style errors on internal planning
documents that use different conventions.
@github-actions github-actions bot added the group/ci Issue related to the CI pipeline label Jan 19, 2026
@codspeed-hq

codspeed-hq bot commented Jan 19, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing feature/customer-metrics-telemetry (d843377) with develop (0641385)

Open in CodSpeed

The telemetry cron schedule function was checking `config.SETTINGS`
directly, which is always truthy since it's a ConfiguredSettings object.
Accessing `.main` on an uninitialized config raises InitializationError.

Changed to use `config.SETTINGS.initialized` which correctly checks if
settings have been loaded before accessing the main config.

Also regenerated OpenAPI schema to include new telemetry endpoints.
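
Sketched, the corrected guard looks like this (the helper wrapper is hypothetical; `config.SETTINGS.initialized` is the flag named above):

```python
from infrahub import config


def telemetry_dev_interval() -> int | None:
    # config.SETTINGS is a ConfiguredSettings wrapper and always truthy,
    # so `if config.SETTINGS:` cannot detect uninitialized settings, and
    # touching `.main` too early raises InitializationError.
    if config.SETTINGS.initialized:
        return config.SETTINGS.main.telemetry_dev_interval_minutes
    return None
```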
Set base cron to None in ANONYMOUS_TELEMETRY_SEND definition since the
actual schedule is applied dynamically in build_workflows_definitions().
This avoids random minute generation at module import time which caused
test_workflows_sorted to fail due to different random values.

Updated test to compare workflow names only, since comparing full objects
fails when workflows have dynamically generated cron schedules.
@github-actions github-actions bot added the group/frontend Issue related to the frontend (React) label Jan 19, 2026
@petercrocker
Contributor Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Jan 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 16

🤖 Fix all issues with AI agents
In `@backend/infrahub/api/telemetry.py`:
- Around line 194-204: When converting from_date/to_date via
_date_str_to_datetime (in the block that sets from_datetime/to_datetime when not
export_all), catch ValueError raised by datetime.strptime and raise a client
error instead of letting it bubble to a 500; specifically, wrap the calls to
_date_str_to_datetime for from_date and to_date in try/except and on ValueError
raise an HTTP 422 (e.g., HTTPException(status_code=422, detail="Invalid date:
<input>")) so malformed dates return a 4xx validation response rather than a
server error.
- Around line 128-146: The telemetry endpoints (get_telemetry_status, and the
handlers for /list and /export) currently rely on Depends(get_current_user)
which permits anonymous access when allow_anonymous_access is true; update these
routes to require explicit authenticated sessions by replacing or augmenting the
dependency so it always enforces authentication (e.g., use or create a
get_authenticated_user dependency that raises if the request is anonymous and
require AccountSession), and apply that dependency to get_telemetry_status and
the corresponding list/export endpoint functions so license and deployment
fields (_get_license_info(), license, deployment_id, customer name, etc.) are
only returned for authenticated users. Ensure the dependency change is reflected
in the function signatures (AccountSession = Depends(get_authenticated_user))
and remove any code paths that leak sensitive fields to anonymous callers.
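
One possible shape for the stricter dependency described in the second item; a sketch only, since `get_authenticated_user` does not exist yet and the `authenticated` flag on AccountSession is assumed:

```python
from fastapi import Depends, HTTPException, status


async def get_authenticated_user(
    session: AccountSession = Depends(get_current_user),
) -> AccountSession:
    """Reject anonymous sessions even when allow_anonymous_access is true."""
    if not session.authenticated:  # assumed flag on AccountSession
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Authentication required",
        )
    return session
```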

In `@backend/infrahub/license/generator.py`:
- Around line 88-147: generate_license currently allows end_date earlier than
start_date which can produce invalid licenses; before creating signing_data and
signing the license (in function generate_license), add a validation check
comparing start_date and end_date (and normalize types if needed) and raise a
clear exception (e.g., ValueError with a descriptive message) when end_date <
start_date so the function fails fast instead of producing an invalid signed
license.

In `@backend/infrahub/license/loader.py`:
- Around line 9-101: The current load_license_from_file and get_current_license
perform blocking file I/O which can block the event loop when first called; fix
by offloading the blocking work to a thread or by converting to async. Option A
(recommended minimal change): keep load_license_from_file synchronous but change
get_current_license to async and call the loader via asyncio.to_thread (or
loop.run_in_executor) when populating _current_license so callers don't block.
Option B (full async): make load_license_from_file async (use asyncio.to_thread
for path.read_text/json/validation) and propagate async by making
get_current_license async and updating _get_license_info and any telemetry
endpoint callers to await the new async functions. Ensure you reference
load_license_from_file, get_current_license, _get_license_info and telemetry
endpoint call sites when updating callers.
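
A minimal sketch of Option A, reusing the loader's own names and caching behavior:

```python
import asyncio


async def get_current_license() -> LicenseStatus:
    global _current_license

    if _current_license is not None:
        return _current_license  # cached: no I/O after the first call

    license_path = getattr(config.SETTINGS.main, "license_file_path", None)
    if not license_path:
        return LicenseStatus.invalid("No license file configured")

    # The blocking read/parse/validate runs in a worker thread.
    _current_license = await asyncio.to_thread(load_license_from_file, license_path)
    return _current_license
```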

In `@backend/infrahub/license/models.py`:
- Around line 95-116: LicenseTelemetryData currently exposes PII via the
customer_name field; update the model and factory to anonymize or remove it:
replace customer_name with a deterministic pseudonymous identifier (e.g.,
customer_hash: str) and in LicenseTelemetryData.from_license compute this value
from LicenseFile.customer_name using a stable cryptographic hash (SHA-256) so
telemetry keeps a non-reversible identifier; add the required import (hashlib)
and ensure from_license uses str(license_file.license_id),
str(license_file.deployment_id), product_tier, support_tier and sets
customer_hash instead of customer_name.

In `@backend/infrahub/license/validator.py`:
- Around line 82-86: The docstring in backend/infrahub/license/validator.py
claims a "Deployment ID match" validation but there is no implementation; update
either the code or the doc: implement the check inside the license validation
flow (e.g., in validate_license or LicenseValidator.validate) by reading the
configured deployment ID, comparing it to license.deployment_id and raising the
existing InvalidLicenseError (or returning False) when they differ, or remove
the "Deployment ID match (if configured)" line from the top-level docstring to
keep docs accurate. Ensure you reference the same error/return conventions used
by the surrounding validations and reuse existing configuration accessors and
symbols (validate_license, LicenseValidator.validate, InvalidLicenseError,
license.deployment_id) when adding the check.
- Around line 95-109: When signature validation is enabled
(config.SETTINGS.main.license_skip_signature_validation is False) but no signing
key is configured (config.SETTINGS.main.license_signing_key is None), fail
closed instead of warning: in the branch where signing_key is falsy, replace the
current logger.warning/warnings.append behavior with returning
LicenseStatus.invalid with a clear message like "Missing license signing key"
(and log an error if desired); keep the existing _verify_signature_poc flow
unchanged when a signing_key exists so signature checks still call
_verify_signature_poc(license_file, signing_key) and return
LicenseStatus.invalid("License signature verification failed") on verification
failure.
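
Sketches of both validator changes, reusing names quoted elsewhere in this review; the configured-deployment-ID accessor and the boolean return of `_verify_signature_poc` are assumptions:

```python
def _check_signature(license_file: LicenseFile) -> LicenseStatus | None:
    """Return an invalid status on failure, None when the check passes."""
    if config.SETTINGS.main.license_skip_signature_validation:
        return None
    signing_key = config.SETTINGS.main.license_signing_key
    if not signing_key:
        # Fail closed rather than warn-and-continue.
        return LicenseStatus.invalid("Missing license signing key")
    if not _verify_signature_poc(license_file, signing_key):
        return LicenseStatus.invalid("License signature verification failed")
    return None


def _check_deployment_id(license_file: LicenseFile) -> LicenseStatus | None:
    configured = getattr(config.SETTINGS.main, "deployment_id", None)  # assumed setting
    if configured and str(license_file.deployment_id) != str(configured):
        return LicenseStatus.invalid("License deployment ID does not match this deployment")
    return None
```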

In `@backend/infrahub/telemetry/storage.py`:
- Around line 87-100: The current regex in _extract_date_from_filename is too
strict for deployment IDs that contain non-hex or extra dashes; replace the
regex approach with a split-based extraction: strip the ".json" suffix, split
the basename on "-", take the last three segments, join them with "-" to form a
candidate date, validate it matches YYYY-MM-DD (e.g., with a simple
\d{4}-\d{2}-\d{2} check) and return it, otherwise return None; update
_extract_date_from_filename to use this split-and-validate logic so filenames
like telemetry-unknown-YYYY-MM-DD.json and
telemetry-my-deploy-id-with-dashes-YYYY-MM-DD.json both yield the date.
- Around line 34-85: The async functions save_telemetry_locally and
cleanup_old_telemetry (and the related list_local_telemetry and
load_telemetry_file) currently call blocking filesystem APIs directly; wrap all
blocking calls (Path.mkdir, Path.exists, Path.write_text, Path.read_text,
Path.glob, Path.stat, Path.unlink and similar) in anyio.to_thread.run_sync (or
move the sequence into small sync helper functions and call them with
anyio.to_thread.run_sync) so the event loop is not blocked; update each function
to offload creation/reading/writing/listing/deleting files via
anyio.to_thread.run_sync and return/await the results accordingly (reference the
functions save_telemetry_locally, cleanup_old_telemetry, list_local_telemetry,
load_telemetry_file and wrap their filesystem operations).
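
A sketch of the split-and-validate extraction, plus one example of offloading a blocking read with anyio (the raw-text return is a simplification of the real `load_telemetry_file`):

```python
import re
from pathlib import Path

from anyio import to_thread


def _extract_date_from_filename(filename: str) -> str | None:
    stem = filename.removesuffix(".json")
    parts = stem.split("-")
    if len(parts) < 3:
        return None
    candidate = "-".join(parts[-3:])  # last three segments form YYYY-MM-DD
    return candidate if re.fullmatch(r"\d{4}-\d{2}-\d{2}", candidate) else None


async def load_telemetry_file(path: Path) -> str:
    # Blocking file I/O runs in a worker thread, keeping the event loop free.
    return await to_thread.run_sync(path.read_text)
```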

In `@backend/infrahub/telemetry/tasks.py`:
- Around line 146-147: Wrap the call to post_telemetry_data (the streaming call
using url=config.SETTINGS.main.telemetry_endpoint and payload=payload) in a
try/except so network failures in airgapped environments don't crash the flow;
catch Exception, log a warning or debug via the module logger with the exception
details and the endpoint, and allow execution to continue (graceful degradation)
so local storage steps already succeeding are not interrupted.
- Around line 59-69: The async task calls the synchronous get_current_license()
which can be converted to an async API to avoid any potential event-loop
blocking; refactor get_current_license into an async function that uses
asyncio.to_thread to perform the initial file I/O (delegating the actual
blocking read to the existing loader, e.g., a private _load_current_license or
the current synchronous implementation), preserve and consult the module-level
cache _current_license inside that async function, then update
gather_license_information to await get_current_license() and adjust any other
callers to await the new async signature; also add required asyncio import and
keep LicenseTelemetryData.from_license usage unchanged.
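
A graceful-degradation sketch for the send step (the module logger name is assumed; `post_telemetry_data` and the settings fields are as referenced above):

```python
try:
    await post_telemetry_data(
        url=config.SETTINGS.main.telemetry_endpoint,
        payload=payload,
    )
except Exception as exc:  # airgapped hosts cannot reach the endpoint
    logger.warning(
        "Telemetry push to %s failed, keeping local copy only: %s",
        config.SETTINGS.main.telemetry_endpoint,
        exc,
    )
```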

In `@backend/infrahub/workflows/catalogue.py`:
- Around line 11-26: The _get_telemetry_cron_schedule function allows invalid
minute-step values and re-generates a random daily-minute each call; validate
telemetry_dev_interval_minutes (from
config.SETTINGS.main.telemetry_dev_interval_minutes) to ensure it's an integer
within 1–59 and fall back to the default when out of range, and introduce a
cached module-level value (e.g., TELEMETRY_RANDOM_MINUTE) computed once (used by
_get_telemetry_cron_schedule and stable across repeated
build_workflows_definitions calls) so the daily "random" offset is deterministic
for the process.

In `@docs/plans/customer-metrics-poc-plan.md`:
- Around line 1088-1095: Replace the phrase "Easy container deployment" under
the "7.1 Recommended Tool: Grafana" section with a neutral alternative such as
"Containerized deployment" (or "Support for containerized deployment") to comply
with the wording guideline; update the bullet text in that section so it no
longer uses "easy/just/simple" language while preserving the meaning and other
bullets unchanged.
- Around line 1-7: Change the top-level and other headings from title case to
sentence case in the document (e.g., update "Customer Metrics Tracking System -
Proof of Concept Plan" to "Customer metrics tracking system - proof of concept
plan" and also convert "Overview" and "Architecture Overview" to "Overview" and
"Architecture overview" respectively); locate and edit the Markdown heading
lines at the top of the file and any other headings to follow sentence case
consistently.
- Around line 1369-1378: Fix the grammar in the summary sentence after the table
where it currently reads "The plan enables both streaming telemetry and
airgapped export/import workflows while tracking the two north star metrics
(Actions and Objects) for license compliance." — replace the malformed "an
enables" with "and enables" or reword to "This plan enables..." so the sentence
reads correctly and is polished; update the sentence near that table header in
the customer-metrics-poc-plan content.
- Around line 9-112: Several fenced code blocks in the PoC plan lack language
tags (notably the "PoC Architecture" ASCII diagram, the repository tree, and the
"Steps to Create Repository" shell snippet); update the markdown in
customer-metrics-poc-plan.md to add appropriate language identifiers for every
``` block—use ```text for the ASCII diagram, ```text or ```bash for the
repository tree, ```bash for the shell commands under "Steps to Create
Repository", ```yaml for YAML examples (e.g., schema/object manifests and
grafana provisioning snippets), and ```sql for migration examples—ensure every
fenced block (including examples like docker-compose.yml, .env.example,
migrations/001_initial_schema.sql, and grafana JSON/YAML snippets) has the
correct tag so syntax highlighting and linters pick them up.
🧹 Nitpick comments (10)
.vale.ini (1)

25-31: Confirm that excluding docs/specs/** and docs/plans/** from Vale is intentional.
If these files are meant to follow documentation quality rules (e.g., links, style), consider applying a lightweight style instead of disabling linting entirely. Based on learnings, docs should not ship with broken links.


backend/infrahub/license/models.py (1)

34-92: Align docstrings with the required Google-style sections.
Current docstrings omit required Args/Returns/Examples sections. As per coding guidelines, please update the module/class/method docstrings to the specified format.

✍️ Example update (one method)
 def is_expired(self) -> bool:
-    """Check if the license has expired."""
+    """Check if the license has expired.
+
+    Returns:
+        bool: True when the license end date is before today.
+    """
     return datetime.now(tz=UTC).date() > self.end_date


backend/infrahub/config.py (1)

200-203: Consider adding a minimum value constraint for retention days.

The telemetry_storage_retention_days field accepts any integer, including 0 or negative values, which could cause unexpected behavior in the cleanup logic. Consider adding ge=1 to ensure at least 1 day retention.

♻️ Suggested constraint
     telemetry_storage_retention_days: int = Field(
         default=90,
+        ge=1,
         description="Number of days to retain local telemetry files",
     )
docs/specs/customer-metrics-tracking-system.md (1)

22-22: Consider using hyphenated "air-gapped" for consistency.

The term "airgapped" appears multiple times in the document. While both spellings are used in industry, "air-gapped" (hyphenated) is more commonly used in security and compliance documentation. Consider updating for consistency.

backend/infrahub/license/validator.py (1)

22-92: Add Raises/Examples sections to docstrings.
These docstrings include Args/Returns but omit Raises/Examples sections required by the Python docstring standard used here. As per coding guidelines, add those sections (even if brief) for _get_license_signing_data, _verify_signature_poc, and validate_license.

backend/infrahub/telemetry/storage.py (2)

24-181: Add Raises/Examples sections to docstrings.
Module docstrings follow Args/Returns but omit Raises/Examples sections across multiple functions. As per coding guidelines, add those sections for consistency.


144-163: Latest file detection isn’t guaranteed to be the newest date.
files[-1] depends on lexicographic ordering of filenames, which can mis-rank if multiple deployment IDs exist. Consider selecting max by parsed date or mtime to keep latest_file accurate.
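
For instance, a date-aware selection (a sketch reusing the `_extract_date_from_filename` helper quoted elsewhere in this review):

```python
from pathlib import Path


def _latest_file(files: list[Path]) -> Path | None:
    # Pair each file with its parsed YYYY-MM-DD date; ISO dates sort
    # chronologically, so max() picks the newest regardless of deployment ID.
    dated = [(d, f) for f in files if (d := _extract_date_from_filename(f.name))]
    return max(dated)[1] if dated else None
```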

backend/infrahub/api/telemetry.py (2)

84-193: Add Raises/Examples sections to docstrings.
Helper and route docstrings omit Raises/Examples sections. As per coding guidelines, add these sections to comply with the standard.


158-169: Avoid sync filesystem calls inside async routes.
filepath.stat() and load_telemetry_file() rely on sync I/O. Consider moving stat/size formatting into the async storage layer or offloading to a thread to avoid event-loop blocking.

Also applies to: 208-216

backend/infrahub/license/generator.py (1)

32-152: Add Raises/Examples sections to docstrings.
These functions omit Raises/Examples sections. As per coding guidelines, include them for create_signing_data, sign_license, generate_license, and main.

Comment on lines +128 to +146
@router.get("/status")
async def get_telemetry_status(
_: AccountSession = Depends(get_current_user),
) -> TelemetryStatusResponse:
"""Get telemetry configuration and status.
Returns the current telemetry configuration including whether telemetry
is enabled, the storage path, and retention settings.
"""
status = await get_local_telemetry_status()

return TelemetryStatusResponse(
enabled=not config.SETTINGS.main.telemetry_optout,
storage_path=status["storage_path"],
retention_days=status["retention_days"],
files_count=status["files_count"],
latest_file=status.get("latest_file"),
license=_get_license_info(),
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the get_current_user dependency function
rg "def get_current_user" --type py -A 10

Repository: opsmill/infrahub

Length of output: 807


🏁 Script executed:

# Check where allow_anonymous_access is used
rg "allow_anonymous_access" --type py -B 2 -A 5

Repository: opsmill/infrahub

Length of output: 32022


🏁 Script executed:

# Look at the telemetry.py file context
cat -n backend/infrahub/api/telemetry.py | head -60

Repository: opsmill/infrahub

Length of output: 1975


🏁 Script executed:

# Find TelemetryStatusResponse definition
rg "class TelemetryStatusResponse" --type py -A 15

Repository: opsmill/infrahub

Length of output: 989


🏁 Script executed:

# Get the complete get_current_user function
cat -n backend/infrahub/api/dependencies.py | sed -n '1,150p'

Repository: opsmill/infrahub

Length of output: 5730


🏁 Script executed:

# Search for telemetry endpoint tests
fd "test.*telemetry" --type f backend/tests/

Repository: opsmill/infrahub

Length of output: 42


🏁 Script executed:

# Look for permission-related checks or decorators in telemetry.py
cat -n backend/infrahub/api/telemetry.py | sed -n '100,200p'

Repository: opsmill/infrahub

Length of output: 4276


🏁 Script executed:

# Search for telemetry in test files
rg "telemetry" backend/tests/ --type py -l

Repository: opsmill/infrahub

Length of output: 167


🏁 Script executed:

# Check if there are any permission checks or decorators on telemetry routes
rg "telemetry|/status|/list|/export" backend/infrahub/api/ --type py -B 2

Repository: opsmill/infrahub

Length of output: 6403


🏁 Script executed:

# Look for how license data is used elsewhere to understand sensitivity
rg "customer_name|license_id|deployment_id" backend/infrahub/ --type py | head -20

Repository: opsmill/infrahub

Length of output: 1847


🏁 Script executed:

# Look for telemetry endpoint tests or documentation
cat -n backend/tests/component/telemetry/test_task_manager.py | head -50

Repository: opsmill/infrahub

Length of output: 309


🏁 Script executed:

# Check if there's any documentation about telemetry endpoint security
fd "telemetry" backend/ --type f | grep -E "\.md|\.txt|README"

Repository: opsmill/infrahub

Length of output: 42


🏁 Script executed:

# Search for any design decisions or comments about telemetry security
rg "telemetry.*anonymous|telemetry.*auth|telemetry.*permission" backend/ --type py -B 2 -A 2

Repository: opsmill/infrahub

Length of output: 875


Telemetry endpoints expose sensitive license and deployment data without authentication.

The /status, /list, and /export endpoints depend on get_current_user, which allows GET requests without authentication when allow_anonymous_access is true (the default). This exposes sensitive information including license ID, customer name, deployment ID, product tier, and support tier to unauthenticated users. These endpoints should require explicit authentication regardless of the allow_anonymous_access setting.

🤖 Prompt for AI Agents
In `@backend/infrahub/api/telemetry.py` around lines 128 - 146, The telemetry
endpoints (get_telemetry_status, and the handlers for /list and /export)
currently rely on Depends(get_current_user) which permits anonymous access when
allow_anonymous_access is true; update these routes to require explicit
authenticated sessions by replacing or augmenting the dependency so it always
enforces authentication (e.g., use or create a get_authenticated_user dependency
that raises if the request is anonymous and require AccountSession), and apply
that dependency to get_telemetry_status and the corresponding list/export
endpoint functions so license and deployment fields (_get_license_info(),
license, deployment_id, customer name, etc.) are only returned for authenticated
users. Ensure the dependency change is reflected in the function signatures
(AccountSession = Depends(get_authenticated_user)) and remove any code paths
that leak sensitive fields to anonymous callers.

Comment on lines +194 to +204
# Convert date strings to datetime objects
from_datetime: datetime | None = None
to_datetime: datetime | None = None

# If export_all is set, ignore date filters
if not export_all:
if from_date:
from_datetime = _date_str_to_datetime(from_date)
if to_date:
to_datetime = _date_str_to_datetime(to_date)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n backend/infrahub/api/telemetry.py | head -250 | tail -100

Repository: opsmill/infrahub

Length of output: 4131


🏁 Script executed:

# Find the _date_str_to_datetime function definition
rg "_date_str_to_datetime" backend/infrahub/api/telemetry.py -A 10 -B 2

Repository: opsmill/infrahub

Length of output: 955


🏁 Script executed:

# Check imports at the top of the file
head -30 backend/infrahub/api/telemetry.py

Repository: opsmill/infrahub

Length of output: 852


🏁 Script executed:

rg "_date_str_to_datetime" backend/infrahub/api/telemetry.py

Repository: opsmill/infrahub

Length of output: 231


🏁 Script executed:

# Also verify the file location and that this is the only endpoint using this function
rg "_date_str_to_datetime" backend/infrahub/ -l

Repository: opsmill/infrahub

Length of output: 93


Invalid date inputs raise 500s instead of 4xx.

The _date_str_to_datetime() function uses datetime.strptime(), which raises ValueError for malformed dates (e.g., 2025-13-40). Without error handling, this exception bubbles up as a 500 response. Convert this to a 422 response to signal a client-side validation error.

🛠️ Suggested fix
-from fastapi import APIRouter, Depends, Query
+from fastapi import APIRouter, Depends, HTTPException, Query, status
@@
-    if not export_all:
-        if from_date:
-            from_datetime = _date_str_to_datetime(from_date)
-        if to_date:
-            to_datetime = _date_str_to_datetime(to_date)
+    if not export_all:
+        try:
+            if from_date:
+                from_datetime = _date_str_to_datetime(from_date)
+            if to_date:
+                to_datetime = _date_str_to_datetime(to_date)
+        except ValueError as exc:
+            raise HTTPException(
+                status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+                detail=str(exc),
+            ) from exc
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    # Convert date strings to datetime objects
    from_datetime: datetime | None = None
    to_datetime: datetime | None = None

    # If export_all is set, ignore date filters
    if not export_all:
        if from_date:
            from_datetime = _date_str_to_datetime(from_date)
        if to_date:
            to_datetime = _date_str_to_datetime(to_date)

    # Convert date strings to datetime objects
    from_datetime: datetime | None = None
    to_datetime: datetime | None = None

    # If export_all is set, ignore date filters
    if not export_all:
        try:
            if from_date:
                from_datetime = _date_str_to_datetime(from_date)
            if to_date:
                to_datetime = _date_str_to_datetime(to_date)
        except ValueError as exc:
            raise HTTPException(
                status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
                detail=str(exc),
            ) from exc
🤖 Prompt for AI Agents
In `@backend/infrahub/api/telemetry.py` around lines 194 - 204, When converting
from_date/to_date via _date_str_to_datetime (in the block that sets
from_datetime/to_datetime when not export_all), catch ValueError raised by
datetime.strptime and raise a client error instead of letting it bubble to a
500; specifically, wrap the calls to _date_str_to_datetime for from_date and
to_date in try/except and on ValueError raise an HTTP 422 (e.g.,
HTTPException(status_code=422, detail="Invalid date: <input>")) so malformed
dates return a 4xx validation response rather than a server error.

Comment on lines +88 to +147
def generate_license(
customer_name: str,
deployment_id: str | UUID,
product_tier: str,
support_tier: str,
start_date: date,
end_date: date,
signing_key: str,
license_id: str | UUID | None = None,
) -> dict:
"""Generate a complete signed license file.
Args:
customer_name: Customer organization name.
deployment_id: UUID of the Infrahub deployment.
product_tier: Product tier (small/medium/large).
support_tier: Support tier (basic/advanced/24x7).
start_date: License effective date.
end_date: License expiration date.
signing_key: Secret key for signing.
license_id: Optional license ID (generated if not provided).
Returns:
Complete license file as a dictionary.
"""
if license_id is None:
license_id = uuid4()
elif isinstance(license_id, str):
license_id = UUID(license_id)

if isinstance(deployment_id, str):
deployment_id = UUID(deployment_id)

issued_at = datetime.now(tz=UTC)

# Create signing data
signing_data = create_signing_data(
license_id=license_id,
customer_name=customer_name,
deployment_id=deployment_id,
product_tier=product_tier,
support_tier=support_tier,
start_date=start_date,
end_date=end_date,
issued_at=issued_at,
)

# Sign the license
signature = sign_license(signing_data, signing_key)

return {
"license_id": str(license_id),
"customer_name": customer_name,
"deployment_id": str(deployment_id),
"product_tier": product_tier,
"support_tier": support_tier,
"start_date": start_date.isoformat(),
"end_date": end_date.isoformat(),
"issued_at": issued_at.isoformat(),
"signature": signature,

⚠️ Potential issue | 🟡 Minor

Validate date ordering before signing.
generate_license allows end_date earlier than start_date, producing invalid licenses. Add a guard to prevent that.

🛠️ Suggested fix
 def generate_license(
@@
 ) -> dict:
@@
-    if license_id is None:
+    if end_date < start_date:
+        raise ValueError("end_date must be on or after start_date")
+
+    if license_id is None:
         license_id = uuid4()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def generate_license(
    customer_name: str,
    deployment_id: str | UUID,
    product_tier: str,
    support_tier: str,
    start_date: date,
    end_date: date,
    signing_key: str,
    license_id: str | UUID | None = None,
) -> dict:
    """Generate a complete signed license file.

    Args:
        customer_name: Customer organization name.
        deployment_id: UUID of the Infrahub deployment.
        product_tier: Product tier (small/medium/large).
        support_tier: Support tier (basic/advanced/24x7).
        start_date: License effective date.
        end_date: License expiration date.
        signing_key: Secret key for signing.
        license_id: Optional license ID (generated if not provided).

    Returns:
        Complete license file as a dictionary.
    """
    if license_id is None:
        license_id = uuid4()
    elif isinstance(license_id, str):
        license_id = UUID(license_id)

    if isinstance(deployment_id, str):
        deployment_id = UUID(deployment_id)

    issued_at = datetime.now(tz=UTC)

    # Create signing data
    signing_data = create_signing_data(
        license_id=license_id,
        customer_name=customer_name,
        deployment_id=deployment_id,
        product_tier=product_tier,
        support_tier=support_tier,
        start_date=start_date,
        end_date=end_date,
        issued_at=issued_at,
    )

    # Sign the license
    signature = sign_license(signing_data, signing_key)

    return {
        "license_id": str(license_id),
        "customer_name": customer_name,
        "deployment_id": str(deployment_id),
        "product_tier": product_tier,
        "support_tier": support_tier,
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat(),
        "issued_at": issued_at.isoformat(),
        "signature": signature,

def generate_license(
    customer_name: str,
    deployment_id: str | UUID,
    product_tier: str,
    support_tier: str,
    start_date: date,
    end_date: date,
    signing_key: str,
    license_id: str | UUID | None = None,
) -> dict:
    """Generate a complete signed license file.

    Args:
        customer_name: Customer organization name.
        deployment_id: UUID of the Infrahub deployment.
        product_tier: Product tier (small/medium/large).
        support_tier: Support tier (basic/advanced/24x7).
        start_date: License effective date.
        end_date: License expiration date.
        signing_key: Secret key for signing.
        license_id: Optional license ID (generated if not provided).

    Returns:
        Complete license file as a dictionary.
    """
    if end_date < start_date:
        raise ValueError("end_date must be on or after start_date")

    if license_id is None:
        license_id = uuid4()
    elif isinstance(license_id, str):
        license_id = UUID(license_id)

    if isinstance(deployment_id, str):
        deployment_id = UUID(deployment_id)

    issued_at = datetime.now(tz=UTC)

    # Create signing data
    signing_data = create_signing_data(
        license_id=license_id,
        customer_name=customer_name,
        deployment_id=deployment_id,
        product_tier=product_tier,
        support_tier=support_tier,
        start_date=start_date,
        end_date=end_date,
        issued_at=issued_at,
    )

    # Sign the license
    signature = sign_license(signing_data, signing_key)

    return {
        "license_id": str(license_id),
        "customer_name": customer_name,
        "deployment_id": str(deployment_id),
        "product_tier": product_tier,
        "support_tier": support_tier,
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat(),
        "issued_at": issued_at.isoformat(),
        "signature": signature,
🤖 Prompt for AI Agents
In `@backend/infrahub/license/generator.py` around lines 88 - 147,
generate_license currently allows end_date earlier than start_date which can
produce invalid licenses; before creating signing_data and signing the license
(in function generate_license), add a validation check comparing start_date and
end_date (and normalize types if needed) and raise a clear exception (e.g.,
ValueError with a descriptive message) when end_date < start_date so the
function fails fast instead of producing an invalid signed license.

Comment on lines +9 to +101
import json
import logging
from pathlib import Path

from pydantic import ValidationError

from infrahub import config

from .models import LicenseFile, LicenseStatus
from .validator import validate_license

logger = logging.getLogger(__name__)

# Module-level cache for the current license
_current_license: LicenseStatus | None = None


def load_license_from_file(file_path: Path | str) -> LicenseStatus:
"""Load and validate a license file from disk.
Args:
file_path: Path to the license JSON file.
Returns:
LicenseStatus with validation results and parsed license data.
"""
path = Path(file_path) if isinstance(file_path, str) else file_path

if not path.exists():
return LicenseStatus.invalid(f"License file not found: {path}")

if not path.is_file():
return LicenseStatus.invalid(f"License path is not a file: {path}")

try:
content = path.read_text(encoding="utf-8")
except OSError as e:
return LicenseStatus.invalid(f"Failed to read license file: {e}")

try:
data = json.loads(content)
except json.JSONDecodeError as e:
return LicenseStatus.invalid(f"Invalid JSON in license file: {e}")

try:
license_file = LicenseFile.model_validate(data)
except ValidationError as e:
return LicenseStatus.invalid(f"Invalid license file format: {e}")

return validate_license(license_file)


def get_current_license() -> LicenseStatus:
"""Get the current license status.
Returns the cached license if available, otherwise attempts to load
the license from the configured path.
Returns:
LicenseStatus with the current license state.
"""
global _current_license

if _current_license is not None:
return _current_license

# Check if a license file path is configured
license_path = getattr(config.SETTINGS.main, "license_file_path", None)

if not license_path:
return LicenseStatus.invalid("No license file configured")

_current_license = load_license_from_file(license_path)

if _current_license.valid and _current_license.license_data is not None:
logger.info(f"License loaded for customer: {_current_license.license_data.customer_name}")
else:
logger.warning(f"Failed to load license: {_current_license.error}")

return _current_license


def reload_license() -> LicenseStatus:
"""Force reload the license from disk.
Clears the cached license and reloads from the configured path.
Returns:
LicenseStatus with the reloaded license state.
"""
global _current_license
_current_license = None
return get_current_license()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Search for usages of the functions in the license loader
rg -n "load_license_from_file|get_current_license|reload_license" --type py

Repository: opsmill/infrahub

Length of output: 1084


🏁 Script executed:

# Check if there are any imports of these functions
rg -n "from.*license.*loader import|from .loader import" --type py

Repository: opsmill/infrahub

Length of output: 350


🏁 Script executed:

# Look at the overall structure of the license module
find . -path "*license*" -type f -name "*.py" | head -20

Repository: opsmill/infrahub

Length of output: 292


🏁 Script executed:

# Check the context of get_current_license usage in telemetry/tasks.py
sed -n '60,75p' backend/infrahub/telemetry/tasks.py

Repository: opsmill/infrahub

Length of output: 664


🏁 Script executed:

# Check the context of get_current_license usage in api/telemetry.py
sed -n '90,105p' backend/infrahub/api/telemetry.py

Repository: opsmill/infrahub

Length of output: 613


🏁 Script executed:

# Check if these calling functions are async
rg -B 3 "license_status = get_current_license()" --type py

Repository: opsmill/infrahub

Length of output: 627


🏁 Script executed:

# Check where gather_license_information is used
rg -n "gather_license_information" --type py

Repository: opsmill/infrahub

Length of output: 260


🏁 Script executed:

# Check the broader context of _get_license_info usage
rg -B 5 "_get_license_info()" --type py -A 3

Repository: opsmill/infrahub

Length of output: 1791


🏁 Script executed:

# Check if the caching significantly mitigates the I/O blocking issue
sed -n '61,90p' backend/infrahub/license/loader.py

Repository: opsmill/infrahub

Length of output: 1005


🏁 Script executed:

# Check what function _get_license_info is called from and its context
rg -B 20 "def _get_license_info" backend/infrahub/api/telemetry.py | head -30

Repository: opsmill/infrahub

Length of output: 678


🏁 Script executed:

# Check if telemetry endpoints are async
rg -B 5 "_get_license_info()" backend/infrahub/api/telemetry.py | grep -E "^[^-]*-(async def|def)" | head -5

Repository: opsmill/infrahub

Length of output: 42


🏁 Script executed:

# Verify the actual API endpoint signatures
sed -n '1,120p' backend/infrahub/api/telemetry.py | grep -A 5 "@.*\|async def\|def " | head -40

Repository: opsmill/infrahub

Length of output: 855


🏁 Script executed:

# Find the actual API endpoint definitions that call _get_license_info
rg -B 10 "def _get_license_info\|_get_license_info()" backend/infrahub/api/telemetry.py | grep -E "@|async def|def get_telemetry|def export_telemetry"

Repository: opsmill/infrahub

Length of output: 42


🏁 Script executed:

# Get the full endpoint definitions
sed -n '110,180p' backend/infrahub/api/telemetry.py

Repository: opsmill/infrahub

Length of output: 2381


🏁 Script executed:

# Check the complete export_telemetry endpoint signature
sed -n '150,200p' backend/infrahub/api/telemetry.py

Repository: opsmill/infrahub

Length of output: 1904


🏁 Script executed:

# Verify the complete file to see all usages of _get_license_info
grep -n "_get_license_info()" backend/infrahub/api/telemetry.py

Repository: opsmill/infrahub

Length of output: 191


Sync I/O in async context; consider making async for guideline compliance.

The synchronous file I/O in load_license_from_file() can block the event loop when called from async code paths (telemetry endpoints call _get_license_info() which calls get_current_license()).

However, the module-level caching means blocking only occurs on the first call; subsequent calls return the cached value with no I/O. This significantly mitigates the practical impact.

If converting to async, note that _get_license_info() is a synchronous helper called directly from async endpoints, so converting get_current_license() would require making _get_license_info() async and updating its callers in the telemetry endpoints to await.

🤖 Prompt for AI Agents
In `@backend/infrahub/license/loader.py` around lines 9 - 101, The current
load_license_from_file and get_current_license perform blocking file I/O which
can block the event loop when first called; fix by offloading the blocking work
to a thread or by converting to async. Option A (recommended minimal change):
keep load_license_from_file synchronous but change get_current_license to async
and call the loader via asyncio.to_thread (or loop.run_in_executor) when
populating _current_license so callers don't block. Option B (full async): make
load_license_from_file async (use asyncio.to_thread for
path.read_text/json/validation) and propagate async by making
get_current_license async and updating _get_license_info and any telemetry
endpoint callers to await the new async functions. Ensure you reference
load_license_from_file, get_current_license, _get_license_info and telemetry
endpoint call sites when updating callers.

Comment on lines +95 to +116
class LicenseTelemetryData(BaseModel):
    """License information included in telemetry payloads.

    A subset of license data suitable for inclusion in telemetry to identify
    the customer without exposing sensitive information.
    """

    license_id: str = Field(description="License ID as string")
    customer_name: str = Field(description="Customer organization name")
    deployment_id: str = Field(description="Deployment ID as string")
    product_tier: ProductTier = Field(description="Product tier")
    support_tier: SupportTier = Field(description="Support tier")

    @classmethod
    def from_license(cls, license_file: LicenseFile) -> LicenseTelemetryData:
        """Create telemetry data from a license file."""
        return cls(
            license_id=str(license_file.license_id),
            customer_name=license_file.customer_name,
            deployment_id=str(license_file.deployment_id),
            product_tier=license_file.product_tier,
            support_tier=license_file.support_tier,

⚠️ Potential issue | 🟠 Major

Avoid sending raw customer names in telemetry payloads.
customer_name is PII and could create compliance/privacy risk in telemetry exports. If telemetry needs a stable identifier, consider hashing or removing this field and rely on license/deployment IDs instead.

🛡️ One possible anonymization
-from datetime import UTC, date, datetime
+from datetime import UTC, date, datetime
+import hashlib
@@
-class LicenseTelemetryData(BaseModel):
+class LicenseTelemetryData(BaseModel):
@@
-    customer_name: str = Field(description="Customer organization name")
+    customer_name_hash: str = Field(description="SHA-256 hash of customer name")
@@
     def from_license(cls, license_file: LicenseFile) -> LicenseTelemetryData:
         """Create telemetry data from a license file."""
         return cls(
             license_id=str(license_file.license_id),
-            customer_name=license_file.customer_name,
+            customer_name_hash=hashlib.sha256(license_file.customer_name.encode("utf-8")).hexdigest(),
             deployment_id=str(license_file.deployment_id),
             product_tier=license_file.product_tier,
             support_tier=license_file.support_tier,
         )
🤖 Prompt for AI Agents
In `@backend/infrahub/license/models.py` around lines 95 - 116,
LicenseTelemetryData currently exposes PII via the customer_name field; update
the model and factory to anonymize or remove it: replace customer_name with a
deterministic pseudonymous identifier (e.g., customer_hash: str) and in
LicenseTelemetryData.from_license compute this value from
LicenseFile.customer_name using a stable cryptographic hash (SHA-256) so
telemetry keeps a non-reversible identifier; add the required import (hashlib)
and ensure from_license uses str(license_file.license_id),
str(license_file.deployment_id), product_tier, support_tier and sets
customer_hash instead of customer_name.

Comment on lines +11 to +26

def _get_telemetry_cron_schedule() -> str:
    """Get the cron schedule for telemetry based on configuration.

    In development mode (telemetry_dev_interval_minutes set), uses minute-based
    intervals for faster testing. Otherwise defaults to daily at 2am with a
    random minute offset to spread load across deployments.

    Returns:
        Cron schedule string (e.g., "*/5 * * * *" for 5 min or "23 2 * * *" for daily).
    """
    if config.SETTINGS.initialized and config.SETTINGS.main.telemetry_dev_interval_minutes:
        interval = config.SETTINGS.main.telemetry_dev_interval_minutes
        return f"*/{interval} * * * *"
    # Default: daily at 2am with random minute offset
    return f"{random.randint(0, 59)} 2 * * *"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's view the actual file and see the implementation
head -50 backend/infrahub/workflows/catalogue.py

Repository: opsmill/infrahub

Length of output: 1758


🏁 Script executed:

# Search for telemetry_dev_interval_minutes in the codebase to understand constraints
rg "telemetry_dev_interval_minutes" --type py -B 2 -A 2

Repository: opsmill/infrahub

Length of output: 1604


🏁 Script executed:

# Check if there's any validation for interval values
rg "telemetry" backend/infrahub/workflows/ --type py -A 5 -B 5

Repository: opsmill/infrahub

Length of output: 4768


🏁 Script executed:

# Check for Field validators on telemetry_dev_interval_minutes
rg "telemetry_dev_interval_minutes" -B 5 -A 5 backend/infrahub/config.py

Repository: opsmill/infrahub

Length of output: 664


🏁 Script executed:

# Check if there are any validators on the telemetry config settings
rg -A 10 "class.*Settings" backend/infrahub/config.py | grep -A 10 "telemetry"

Repository: opsmill/infrahub

Length of output: 42


🏁 Script executed:

# Check how many times build_workflows_definitions is called in the codebase
rg "build_workflows_definitions" --type py

Repository: opsmill/infrahub

Length of output: 419


🏁 Script executed:

# Check where build_workflows_definitions is used and how it's called
rg "build_workflows_definitions" -B 3 -A 3 --type py

Repository: opsmill/infrahub

Length of output: 1688


🏁 Script executed:

# Check if there are any field validators defined in the config class
rg "field_validator|validator" backend/infrahub/config.py

Repository: opsmill/infrahub

Length of output: 525


🏁 Script executed:

# Look for any Pydantic validation imports or constraints
rg "from pydantic" backend/infrahub/config.py

Repository: opsmill/infrahub

Length of output: 145


🏁 Script executed:

# Let's check the actual cron syntax constraints - verify that 1-59 is the valid range
# and look for any existing cron validation in the codebase
rg "cron" backend/infrahub/ --type py -i | grep -i "constraint\|valid\|range\|min\|max" | head -20

Repository: opsmill/infrahub

Length of output: 187


🏁 Script executed:

# Check if there's any test coverage for _get_telemetry_cron_schedule
rg "_get_telemetry_cron_schedule" --type py

Repository: opsmill/infrahub

Length of output: 285


Add validation for interval range and consider caching random cron offset.

telemetry_dev_interval_minutes values outside 1–59 create invalid minute step expressions (e.g., */0 or */60). Additionally, if build_workflows_definitions() is called multiple times, the random minute offset will be regenerated, changing the schedule unpredictably. Consider adding a range constraint to the config field and caching the random default to ensure deterministic behavior.

✅ Suggested fix
+_TELEMETRY_CRON_CACHE: str | None = None
+
 def _get_telemetry_cron_schedule() -> str:
     """Get the cron schedule for telemetry based on configuration.
 
     In development mode (telemetry_dev_interval_minutes set), uses minute-based
     intervals for faster testing. Otherwise defaults to daily at 2am with a
     random minute offset to spread load across deployments.
 
     Returns:
         Cron schedule string (e.g., "*/5 * * * *" for 5 min or "23 2 * * *" for daily).
     """
     if config.SETTINGS.initialized and config.SETTINGS.main.telemetry_dev_interval_minutes:
         interval = config.SETTINGS.main.telemetry_dev_interval_minutes
+        interval = max(1, min(interval, 59))
         return f"*/{interval} * * * *"
     # Default: daily at 2am with stable random minute offset
+    global _TELEMETRY_CRON_CACHE
+    if _TELEMETRY_CRON_CACHE is None:
+        _TELEMETRY_CRON_CACHE = f"{random.randint(0, 59)} 2 * * *"
+    return _TELEMETRY_CRON_CACHE
-    return f"{random.randint(0, 59)} 2 * * *"
🤖 Prompt for AI Agents
In `@backend/infrahub/workflows/catalogue.py` around lines 11 - 26, The
_get_telemetry_cron_schedule function allows invalid minute-step values and
re-generates a random daily-minute each call; validate
telemetry_dev_interval_minutes (from
config.SETTINGS.main.telemetry_dev_interval_minutes) to ensure it's an integer
within 1–59 and fall back to the default when out of range, and introduce a
cached module-level value (e.g., TELEMETRY_RANDOM_MINUTE) computed once (used by
_get_telemetry_cron_schedule and stable across repeated
build_workflows_definitions calls) so the daily "random" offset is deterministic
for the process.

Comment on lines +1 to +7
# Customer Metrics Tracking System - Proof of Concept Plan

## Overview

This document outlines the step-by-step plan for creating a Proof of Concept (PoC) for the Customer Metrics Tracking System. The PoC will demonstrate the end-to-end flow from telemetry generation in Infrahub instances through to customer dashboards.

## Architecture Overview

⚠️ Potential issue | 🟡 Minor

Use sentence case for headings.
The top-level heading uses title case; the guideline requires sentence case. As per coding guidelines, convert headings like “Customer Metrics Tracking System - Proof of Concept Plan” to sentence case.

🤖 Prompt for AI Agents
In `@docs/plans/customer-metrics-poc-plan.md` around lines 1 - 7, Change the
top-level and other headings from title case to sentence case in the document
(e.g., update "Customer Metrics Tracking System - Proof of Concept Plan" to
"Customer metrics tracking system - proof of concept plan" and also convert
"Overview" and "Architecture Overview" to "Overview" and "Architecture overview"
respectively); locate and edit the Markdown heading lines at the top of the file
and any other headings to follow sentence case consistently.

Comment on lines +9 to +112
```
                           PoC Architecture

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Test Infrahub   │     │  Cloudflare R2   │     │ Metrics Platform │
│  instance        │────▶│ (S3-compatible)  │────▶│    (new repo)    │
│                  │     │                  │     │                  │
│ - License file   │     │ - Telemetry JSON │     │ - TimescaleDB    │
│ - 5-min telemetry│     │   files          │     │   (time-series)  │
│ - Export command │     │                  │     │ - Infrahub       │
└──────────────────┘     └──────────────────┘     │   (customer      │
                                                  │   metadata)      │
┌──────────────────┐                              │ - Grafana        │
│   infrahubctl    │      (airgapped import)      │   dashboards     │
│ telemetry export │─────────────────────────────▶│                  │
└──────────────────┘                              └──────────────────┘
```

---

## Part 1: New PoC Repository Setup

### 1.1 Repository Structure

Create a new repository with the following structure:

```
infrahub-customer-metrics/
├── README.md
├── pyproject.toml               # uv-based Python project
├── docker-compose.yml           # All services
├── .env.example                 # Environment variables template
├── .infrahub.yml                # Infrahub component registry
├── schemas/                     # Infrahub schema definitions
│   └── customer_metrics.yml     # Customer and metrics schema
├── objects/                     # Sample data (Infrahub object format)
│   └── bootstrap/
│       ├── 00_customers.yml     # Sample customers
│       └── 01_licenses.yml      # Sample license records
├── scripts/                     # Automation scripts
│   ├── fetch_telemetry.py       # Fetch from R2 to TimescaleDB
│   ├── sync_to_infrahub.py      # Sync latest metrics to Infrahub
│   └── import_backfill.py       # Import airgapped exports
├── grafana/                     # Dashboard configurations
│   ├── provisioning/
│   │   ├── datasources/
│   │   │   ├── timescaledb.yml  # TimescaleDB datasource
│   │   │   └── infrahub.yml     # Infrahub GraphQL datasource
│   │   └── dashboards/
│   │       └── dashboards.yml   # Dashboard provisioning config
│   └── dashboards/
│       ├── customer-detail.json      # Per-customer dashboard
│       ├── portfolio-overview.json   # All customers overview
│       └── health-alerts.json        # At-risk customers
├── migrations/                  # TimescaleDB migrations
│   └── 001_initial_schema.sql   # Initial database schema
└── tests/                       # Test files
    └── test_telemetry_processing.py
```

### 1.2 Steps to Create Repository

```bash
# Step 1: Create and initialize repository
mkdir infrahub-customer-metrics
cd infrahub-customer-metrics
git init

# Step 2: Initialize uv project
uv init
uv add aioboto3 boto3-stubs[s3] polars pydantic httpx
uv add infrahub-sdk asyncpg psycopg2-binary
uv add --group dev pytest pytest-asyncio ruff mypy

# Step 3: Create directory structure
mkdir -p schemas objects/bootstrap scripts grafana/provisioning/{datasources,dashboards}
mkdir -p grafana/dashboards migrations tests

# Step 4: Create docker-compose.yml
# Step 5: Create .env.example with required variables
# Step 6: Create initial migration files
# Step 7: Create Infrahub schemas
# Step 8: Create sample object data
```

⚠️ Potential issue | 🟡 Minor

Add language tags to code blocks.
Several code blocks are missing language identifiers (e.g., the architecture diagram and repository tree). As per coding guidelines, add tags like text, bash, sql, yaml, or json to every block.

🤖 Prompt for AI Agents
In `@docs/plans/customer-metrics-poc-plan.md` around lines 9 - 112, Several fenced
code blocks in the PoC plan lack language tags (notably the "PoC Architecture"
ASCII diagram, the repository tree, and the "Steps to Create Repository" shell
snippet); update the markdown in customer-metrics-poc-plan.md to add appropriate
language identifiers for every ``` block—use ```text for the ASCII diagram,
```text or ```bash for the repository tree, ```bash for the shell commands under
"Steps to Create Repository", ```yaml for YAML examples (e.g., schema/object
manifests and grafana provisioning snippets), and ```sql for migration
examples—ensure every fenced block (including examples like docker-compose.yml,
.env.example, migrations/001_initial_schema.sql, and grafana JSON/YAML snippets)
has the correct tag so syntax highlighting and linters pick them up.

Comment on lines +1088 to +1095
### 7.1 Recommended Tool: Grafana

Grafana is the recommended choice because:
- Native TimescaleDB/PostgreSQL support
- GraphQL datasource plugin for Infrahub queries
- Rich visualization options
- Easy container deployment
- Provisioning via config files

⚠️ Potential issue | 🟡 Minor

Avoid “easy” in documentation.
“Easy container deployment” violates the wording guideline. Consider a neutral alternative like “containerized deployment.” As per coding guidelines, avoid “easy/just/simple.”

🤖 Prompt for AI Agents
In `@docs/plans/customer-metrics-poc-plan.md` around lines 1088 - 1095, Replace
the phrase "Easy container deployment" under the "7.1 Recommended Tool: Grafana"
section with a neutral alternative such as "Containerized deployment" (or
"Support for containerized deployment") to comply with the wording guideline;
update the bullet text in that section so it no longer uses "easy/just/simple"
language while preserving the meaning and other bullets unchanged.

Comment on lines +1369 to +1378
| **New PoC Repository** | Complete structure with docker-compose, schemas, scripts, and Grafana configs | ⏳ Pending |
| **TimescaleDB Schema** | Hypertables, continuous aggregates, and retention policies | ⏳ Pending |
| **Infrahub Schema** | Customer metadata model for the metrics platform | ⏳ Pending |
| **Infrahub Changes** | License module, local storage, dev mode, REST API | ✅ Implemented |
| **SDK Changes** | `infrahubctl telemetry export` command | ⏳ Pending |
| **Dashboard Setup** | Grafana with three dashboard types | ⏳ Pending |
| **Step-by-Step Plan** | Day-by-day implementation guide | 📋 Documented |

The plan enables both streaming telemetry and airgapped export/import workflows while tracking the two north star metrics (Actions and Objects) for license compliance.


⚠️ Potential issue | 🟡 Minor

Fix grammar: “an enables” → “and enables” / “This plan enables.”
There’s a grammatical typo in the summary sentence that should be corrected. As per coding guidelines, keep user-facing text polished.

🧰 Tools
🪛 LanguageTool

[grammar] ~1377-~1377: Ensure spelling is correct
Context: ...an enables both streaming telemetry and airgapped export/import workflows while tracking ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In `@docs/plans/customer-metrics-poc-plan.md` around lines 1369 - 1378, Fix the
grammar in the summary sentence after the table where it currently reads "The
plan enables both streaming telemetry and airgapped export/import workflows
while tracking the two north star metrics (Actions and Objects) for license
compliance." — replace the malformed "an enables" with "and enables" or reword
to "This plan enables..." so the sentence reads correctly and is polished;
update the sentence near that table header in the customer-metrics-poc-plan
content.
