Describe the feature
I would like to request that this project switch its existing logging practices to strictly structured logging. For clarity, some of the services in this project use structured logging at least some of the time, but there doesn't appear to be a standard format. Some logs are plain unstructured text, some are standard logfmt, and others use well-structured formats like JSON.
This is also in relation to an issue I opened recently on one of the smd repos: OpenCHAMI/coresmd#50
Why do you want this feature?
This feature would help me because it would greatly simplify downstream system monitoring and debugging for HPC environments leveraging OpenCHAMI. With structured logging, we can more easily query the state of the cluster as well as debug issues by tracing events across services w.r.t. specific nodes.
Alternatives you've considered
Currently, there are two semi-viable alternatives:
- Regex parsing: Fragile. Would require frequent updates as the OpenCHAMI projects are in active development.
- Everything and the kitchen sink: Attempt to parse a log using every known structured format parser. This approach requires an unreasonable amount of computation while still remaining fallible, especially in cases involving purely unstructured text and oddly placed escape characters.
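To make the second alternative concrete, here is a minimal sketch in Go of a "try every parser" fallback chain. The `parseAny` function and the `kvPattern` regex are hypothetical names written for this illustration; the regex is a deliberately simple stand-in for a real logfmt parser, not something taken from any OpenCHAMI service.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// kvPattern is a deliberately simple, hypothetical logfmt-style matcher for
// key=value and key="quoted value" pairs. It is not a complete logfmt parser.
var kvPattern = regexp.MustCompile(`(\w+)=("(?:[^"\\]|\\.)*"|\S+)`)

// parseAny tries each known format in turn and reports which one matched.
func parseAny(line string) (string, map[string]any) {
	// 1. Try a JSON object first.
	var obj map[string]any
	if err := json.Unmarshal([]byte(line), &obj); err == nil && obj != nil {
		return "json", obj
	}
	// 2. Fall back to logfmt-style pairs found anywhere in the line.
	fields := map[string]any{}
	for _, m := range kvPattern.FindAllStringSubmatch(line, -1) {
		fields[m[1]] = strings.Trim(m[2], `"`)
	}
	if len(fields) > 0 {
		return "logfmt", fields
	}
	// 3. Give up and keep the raw text.
	return "unstructured", map[string]any{"message": line}
}

func main() {
	lines := []string{
		`{"appname":"bss","xname":"x1000c0s0b0n0"}`,
		`level=info msg="boot parameters requested" xname=x1000c0s0b0n0`,
		`Failed to store last access timestamp for endpoint="bootscript" name="x1000c0s0b0n0" to postgres DB`,
	}
	for _, l := range lines {
		format, fields := parseAny(l)
		fmt.Println(format, len(fields))
	}
}
```

Note that the third sample line is reported as logfmt because it happens to contain two key=value pairs, even though most of it is free text that gets silently dropped. Every line also pays for each failed parse attempt before reaching the one that works.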
Additional context
bss appears to have multiple logging structures, all of which appear to follow the unstructured text paradigm. Often, these logs either record function calls and returns or carry purely unstructured diagnostic information (e.g., error output). Regardless, these logs contain a wealth of contextual information, and structured logging would make it simpler to answer questions such as:
- What kernel parameters did node with nodeid/nid/xname receive?
- Is node xname configured properly for cloud init?
- Which nodes have recently requested boot parameters?
- Tracing node errors from cloud-init -> bss -> smd
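As a rough illustration of how the first question could be answered once logs are structured, here is a minimal Go sketch that filters JSON log lines by xname. The record layout (`msg`, `xname`, `kernel_params`) and the sample log lines are assumptions made up for this example; bss does not currently emit records in this shape.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// bootEvent models a hypothetical structured bss log record.
// The field names are assumptions, not an existing bss format.
type bootEvent struct {
	Msg          string `json:"msg"`
	Xname        string `json:"xname"`
	KernelParams string `json:"kernel_params"`
}

// kernelParamsFor returns the kernel parameters logged for one xname,
// in the order they appear in the log stream.
func kernelParamsFor(logs, xname string) []string {
	var params []string
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var ev bootEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // skip lines that are not JSON objects
		}
		if ev.Xname == xname && ev.KernelParams != "" {
			params = append(params, ev.KernelParams)
		}
	}
	return params
}

func main() {
	// Sample data for illustration only.
	logs := `{"msg":"boot parameters served","xname":"x1000c0s0b0n0","kernel_params":"nomodeset ro ip=dhcp"}
{"msg":"boot parameters served","xname":"x1000c0s0b1n0","kernel_params":"ro ip=dhcp"}
not a JSON line`
	for _, p := range kernelParamsFor(logs, "x1000c0s0b0n0") {
		fmt.Println(p)
	}
}
```

With a consistent `xname` key in every record, the same filter works across services, which is what makes tracing a node from cloud-init through bss to smd practical.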
Some logs, like those for debugging routine calls, already provide this information in a semi-structured format:
{
"appname": "bss",
"component_id": null,
"data": {},
"facility": "user",
"host": "localhost",
"hostname": "localhost",
"message": "2026/03/02 23:46:51 DEBUG: checkParam returning \"nomodeset ro root=live:http://172.16.0.254:7070/boot-images/compute/base /rocky9.7-compute-base-rocky9 ip=dhcp overlayroot=tmpfs overlayroot_cfgdisk=disabled apparmor=0 selinux=0 console=ttyS0,115200 ip6=off cloud-init=enabled ds=nocloud-net;s=http://172.16.0.254:8081/cloud-init xname=x1000c0s0b0n0\"",
"node_id": null,
"parse_error": null,
"procid": 3274460,
"request_id": null,
"request_uri": null,
"request_user": null,
"severity": "err",
"source_ip": "127.0.0.1",
"source_type": "syslog",
"timestamp": "2026-03-02T16:46:51Z",
"trace_id": null,
"xname": "x1000c0s0b0n0"
}
{
"appname": "bss",
"component_id": null,
"data": {},
"facility": "user",
"host": "localhost",
"hostname": "localhost",
"message": "2026/03/02 23:46:51 DEBUG: checkParam(\"nomodeset ro root=live:http://172.16.0.254:7070/boot-images/compute/base/rocky9.7-compute-base-rocky9 ip=dhcp overlayroot=tmpfs overlayroot_cfgdisk=disabled apparmor=0 selinux=0 console=ttyS0,115200 ip6=off cloud-init=enabled ds=nocloud-net;s=http://172.16.0.254:8081/cloud-init xname=x1000c0s0b0n0\", \"nid=\", \"1\")",
"node_id": null,
"parse_error": null,
"procid": 3274460,
"request_id": null,
"request_uri": null,
"request_user": null,
"severity": "err",
"source_ip": "127.0.0.1",
"source_type": "syslog",
"timestamp": "2026-03-02T16:46:51Z",
"trace_id": null,
"xname": "x1000c0s0b0n0"
}
Others, despite providing valuable debugging information, cannot be parsed easily:
{
"appname": "bss",
"component_id": null,
"data": {},
"facility": "user",
"host": "localhost",
"hostname": "localhost",
"message": "2026/03/02 23:46:51 Failed to store last access timestamp for endpoint=\"bootscript\" name=\"x1000c0s0b0n0\" to postgres DB: postgres.LogEndpointAccess: Error executing query to add endpoint access {x1000c0s0b0n0 bootscript 1772495211}: pq: relation \"endpoint_access\" does not exist",
"node_id": null,
"parse_error": null,
"procid": 3274460,
"request_id": null,
"request_uri": null,
"request_user": null,
"severity": "err",
"source_ip": "127.0.0.1",
"source_type": "syslog",
"timestamp": "2026-03-02T16:46:51Z",
"trace_id": null,
"xname": null
}
There are also some non-standard field names in the logs. For instance, in the last example above, the node's xname is logged under the key name instead of xname.