
Simpler log error when nothing is found and update timezones #809

Merged
PawelPlesniak merged 3 commits into develop from emmuhamm/simple-log-error
Mar 11, 2026

Conversation

@emmuhamm
Contributor

@emmuhamm emmuhamm commented Mar 9, 2026

Description

Fixes issue #805

See above. Also added some pytests for this specific thing.

Screenshot 2026-03-09 at 14 50 46

Fixes #799

See above

Screenshot 2026-03-09 at 15 18 04
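
As a rough illustration of the time zone fix for #799, a timestamp can carry its zone label explicitly so that a reader of the table does not have to guess. This is only a sketch and not the drunc implementation: `format_with_zone` is a hypothetical helper, and the output format is borrowed from the log lines quoted later in this thread.

```python
from datetime import datetime, timedelta, timezone


def format_with_zone(moment: datetime) -> str:
    """Render a timestamp with an explicit zone label (hypothetical helper)."""
    if moment.tzinfo is None:
        # Refuse naive datetimes: their zone would have to be guessed.
        raise ValueError("expected a timezone-aware datetime")
    # Normalise to UTC so every row uses the same zone.
    return moment.astimezone(timezone.utc).strftime("%Y/%m/%d %H:%M:%S %Z")


utc_stamp = datetime(2026, 3, 11, 14, 28, 8, tzinfo=timezone.utc)
# Fixed +01:00 offset standing in for CET (an assumption for this example).
cet_stamp = datetime(2026, 3, 11, 15, 28, 8, tzinfo=timezone(timedelta(hours=1)))

print(format_with_zone(utc_stamp))  # 2026/03/11 14:28:08 UTC
print(format_with_zone(cet_stamp))  # 2026/03/11 14:28:08 UTC
```

Rejecting naive datetimes is a design choice in this sketch; it keeps a missing zone from being silently treated as local time.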

Installed daqpytools for this

Type of change

  • Bug fix

List of required branches from other repositories

None

Changelog

  • Move exception down
  • Add Pytests

Suggested manual testing checklist

Run a drunc session, and boot.

logs --name unknown
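
For readers without the screenshots, the behaviour this manual test checks can be sketched roughly as follows. Everything here is an assumption made for illustration: `LogNotFoundError`, `read_log`, `show_log`, and the `<name>.log` naming scheme are hypothetical and are not taken from the drunc code.

```python
import tempfile
from pathlib import Path


class LogNotFoundError(Exception):
    """Hypothetical exception for a process name with no log file."""


def read_log(log_dir: Path, name: str) -> str:
    """Return the log text for `name`, or raise a one-line error."""
    log_path = log_dir / f"{name}.log"  # assumed naming scheme
    if not log_path.is_file():
        raise LogNotFoundError(f"No log found for '{name}'")
    return log_path.read_text()


def show_log(log_dir: Path, name: str) -> str:
    """Shell-facing wrapper: report a short message instead of a traceback."""
    try:
        return read_log(log_dir, name)
    except LogNotFoundError as err:
        return f"ERROR: {err}"


with tempfile.TemporaryDirectory() as tmp:
    print(show_log(Path(tmp), "unknown"))  # ERROR: No log found for 'unknown'
```

A pytest for this kind of behaviour could call the wrapper with a name that has no log file and assert on the short message.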

Developer checklist

Prior to marking this as "Ready for Review"

Tests ran on: np04-srv-028 from release NFS_260305

Unit tests - some tests can't be run on the CI. This is documented. If this PR checks a feature that can't be tested with CI, this has been marked appropriately.

Integration tests - the daqsystemtest_integtest_bundle requires a lot of resources, and connections to the EHN1 infrastructure. Check the cross referenced list if you can't run these. The developer needs to run at least the .

  • Unit tests (pytest --marker) passed
    • With relevant marker
    • Without marker
  • Integration tests passed
    • Only daqsystemtest_integtest_bundle.sh -k minimal_system_quick_test.py
    • Full daqsystemtest_integtest_bundle.sh
  • Testing skipped as there are no core code changes in this PR, this only relates to documentation/CI workflows

Final checklist prior to marking this as "Ready for Review"

  • Code is clearly commented.
  • New unit tests have been added, or is documented in # ISSUE NUMBER
  • A suitable reviewer has been chosen from this list.

Reviewer checklist

  • This branch has been rebased with develop prior to testing.
  • Suggested manual tests show changes.
  • CI workflow failures are documented (if present)
  • Integration tests passed
    • Only concern yourself if failures related to drunc are in the log files
    • If non-drunc failure appears:
      • Validate failure in fresh working area
      • Contact Pawel if unsure

Once the features are validated and both the unit and integration tests pass, the PR is ready to be merged.

Prior to merging

Choose one of the following and complete all substeps
  • Changes only affect the Run Control, are in a single repository, and do not affect the end user.
    • Changes are documented in docstrings and code comments
    • Wiki has been updated if architectural or endpoint changes
  • Otherwise
    • Workflow changes demonstrated in the Change Log (if necessary)
    • Wiki has been updated (if necessary)
    • #daq-sw-librarians Slack channel notified (see below)

Once completed, the reviewer can merge the PR.

Notification message for #daq-sw-librarians Slack channel

For a single merge that changes the user workflow

The CCM WG has an isolated PR ready to merge that affects user workflows. The PR is:

_URL_

I will leave time for any comments, otherwise will merge these at the end of the work day _Insert your time zone_.

For co-ordinated merge

The CCM WG has a set of co-ordinated merges ready to merge. The PRs are:

_URL_

_URL_


I will leave time for any comments, otherwise will merge these at the end of the day.

@emmuhamm emmuhamm force-pushed the emmuhamm/simple-log-error branch from 61949e0 to 38e092a Compare March 9, 2026 14:54
@emmuhamm emmuhamm marked this pull request as ready for review March 9, 2026 14:55
@emmuhamm emmuhamm self-assigned this Mar 9, 2026
@emmuhamm emmuhamm requested a review from PawelPlesniak March 9, 2026 14:55
@emmuhamm emmuhamm changed the title Emmuhamm/simple-log-error Simpler log error when nothing is found Mar 9, 2026
@emmuhamm emmuhamm added bug Something isn't working and removed bug Something isn't working labels Mar 9, 2026
@emmuhamm emmuhamm changed the title Simpler log error when nothing is found Simpler log error when nothing is found and update timezones Mar 9, 2026
@PawelPlesniak
Collaborator

I validated that this works with today's nightly before merging develop into this branch, but after merging develop in I get the following error:

((dbt) ) pplesnia@np04-srv-029 ~/nightlyDev/NFD_DEV_260311_A9 $ drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config pawel
Traceback (most recent call last):
  File "/nfs/home/pplesnia/nightlyDev/NFD_DEV_260311_A9/.venv/bin/drunc-unified-shell", line 3, in <module>
    from drunc.apps.unified_shell import main
  File "/nfs/home/pplesnia/nightlyDev/NFD_DEV_260311_A9/.venv/lib/python3.12/site-packages/drunc/apps/unified_shell.py", line 1, in <module>
    from drunc.unified_shell.context import UnifiedShellContext
  File "/nfs/home/pplesnia/nightlyDev/NFD_DEV_260311_A9/.venv/lib/python3.12/site-packages/drunc/unified_shell/context.py", line 6, in <module>
    from drunc.utils.shell_utils import ShellContext
  File "/nfs/home/pplesnia/nightlyDev/NFD_DEV_260311_A9/.venv/lib/python3.12/site-packages/drunc/utils/__init__.py", line 1, in <module>
    from drunc.utils.utils import get_logger
  File "/nfs/home/pplesnia/nightlyDev/NFD_DEV_260311_A9/.venv/lib/python3.12/site-packages/drunc/utils/utils.py", line 17, in <module>
    from daqpytools.logging import get_daq_logger, setup_root_logger
ImportError: cannot import name 'get_daq_logger' from 'daqpytools.logging' (/nfs/home/pplesnia/nightlyDev/NFD_DEV_260311_A9/.venv/lib/python3.12/site-packages/daqpytools/logging/__init__.py)

@emmuhamm
Contributor Author

ImportError: cannot import name 'get_daq_logger' from 'daqpytools.logging'

In DUNE-DAQ/daqpytools#54 I moved the imports to somewhere more logical. Since it was merged today, it hasn't propagated to the latest nightly (I presume).

Can you try with develop on daqpytools?
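
A quick way to check whether an installed package still exposes a name at the old import path is sketched below. `exports_name` is a hypothetical helper, and the demonstration uses standard-library modules as stand-ins so that it runs without daqpytools installed.

```python
import importlib


def exports_name(module_name: str, attr: str) -> bool:
    """Report whether `module_name` can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


# Standard-library stand-ins so the sketch runs anywhere.
print(exports_name("json", "dumps"))           # True
print(exports_name("json", "get_daq_logger"))  # False
```

In the affected environment, the equivalent check would be `exports_name("daqpytools.logging", "get_daq_logger")`, using the names from the traceback above.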

@PawelPlesniak
Collaborator

Yep, you're right, your ERS work went in today, my apologies. The manual test worked. I'll run a quick integration test to finish off and then merge. Thank you!

@PawelPlesniak
Collaborator

Running the integration tests I see

+++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++ SUMMARY ++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++

Wed Mar 11 04:15:41 PM CET 2026
Log file is: /tmp/pytest-of-pplesnia/dunedaq_integtest_bundle_20260311151408.log

⮕ Running daqsystemtest/3ru_1df_multirun_test.py ⬅
======================== 6 passed ✅ in 302.19s (0:05:02) =========================
⮕ Running daqsystemtest/3ru_3df_multirun_test.py ⬅
======================== 6 passed ✅ in 342.94s (0:05:42) =========================
⮕ Running daqsystemtest/example_system_test.py ⬅
=================== 4 failed ❌, 8 passed ✅ in 720.65s (0:12:00) ====================
⮕ Running daqsystemtest/fake_data_producer_test.py ⬅
======================== 6 passed ✅ in 297.41s (0:04:57) =========================
⮕ Running daqsystemtest/long_window_readout_test.py ⬅
============================== 1 skipped 🟡 in 1.34s ==============================
⮕ Running daqsystemtest/minimal_system_quick_test.py ⬅
========================= 4 passed ✅ in 77.91s (0:01:17) =========================
⮕ Running daqsystemtest/readout_type_scan_test.py ⬅
======================== 33 passed ✅ in 879.96s (0:14:39) ========================
⮕ Running daqsystemtest/sample_ehn1_multihost_test.py ⬅
============================= 4 skipped 🟡 in 53.14s ==============================
⮕ Running daqsystemtest/small_footprint_quick_test.py ⬅
========================= 3 passed ✅ in 81.57s (0:01:21) =========================
⮕ Running daqsystemtest/tpg_state_collection_test.py ⬅
======================== 5 passed ✅ in 146.56s (0:02:26) =========================
⮕ Running daqsystemtest/tpreplay_test.py ⬅
======================== 6 passed ✅ in 182.89s (0:03:02) =========================
⮕ Running daqsystemtest/tpstream_writing_test.py ⬅
======================== 4 passed ✅ in 146.01s (0:02:26) =========================
⮕ Running daqsystemtest/trigger_bitwords_test.py ⬅
======================== 18 passed ✅ in 440.11s (0:07:20) ========================
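
When the summary is long, the failing test files can be picked out mechanically. The sketch below assumes the two-line layout shown above (a `Running <file>` line followed by a pytest result line); `failing_tests` is a hypothetical helper and not part of the integration test bundle.

```python
import re

RUNNING = re.compile(r"Running (\S+\.py)")
FAILED = re.compile(r"(\d+) failed")


def failing_tests(summary: str) -> dict[str, int]:
    """Map each test file to its failure count, skipping clean runs."""
    failures: dict[str, int] = {}
    current = None
    for line in summary.splitlines():
        if match := RUNNING.search(line):
            current = match.group(1)
        elif current and (match := FAILED.search(line)):
            failures[current] = int(match.group(1))
    return failures


sample = """\
⮕ Running daqsystemtest/3ru_1df_multirun_test.py ⬅
======================== 6 passed ✅ in 302.19s (0:05:02) =========================
⮕ Running daqsystemtest/example_system_test.py ⬅
=================== 4 failed ❌, 8 passed ✅ in 720.65s (0:12:00) ====================
"""
print(failing_tests(sample))
# {'daqsystemtest/example_system_test.py': 4}
```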

Looking in the logs the error is

[2026/03/11 14:28:08 UTC] INFO       ssh_process_manager.py:305               drunc.process_manager.SSH_SHELL_process_manager    Process 'trg-controller' (session: 'ehn1-local-1x1-config-pplesnia-Ymgo', user: 'pplesnia') process exited with exit code 1
[2026/03/11 14:28:08 UTC] INFO       ssh_process_manager.py:305               drunc.process_manager.SSH_SHELL_process_manager    Process 'hsi-fake-controller' (session: 'ehn1-local-1x1-config-pplesnia-Ymgo', user: 'pplesnia') process exited with exit code 1
[2026/03/11 14:28:18 UTC] CRITICAL   process_manager.py:246                   drunc.process_manager.SSH_SHELL_process_manager    Process trg-controller has died with a return code 0
[2026/03/11 14:28:18 UTC] CRITICAL   process_manager.py:246                   drunc.process_manager.SSH_SHELL_process_manager    Process df-controller has died with a return code 0
[2026/03/11 14:28:18 UTC] CRITICAL   process_manager.py:246                   drunc.process_manager.SSH_SHELL_process_manager    Process ru-controller has died with a return code 0
[2026/03/11 14:28:18 UTC] CRITICAL   process_manager.py:246                   drunc.process_manager.SSH_SHELL_process_manager    Process root-controller has died with a return code 0
[2026/03/11 14:28:18 UTC] CRITICAL   process_manager.py:246                   drunc.process_manager.SSH_SHELL_process_manager    Process hsi-fake-controller has died with a return code 0
  Looking for root-controller on the connectivity service... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 0:03:01
[2026/03/11 14:31:08 UTC] WARNING    process_manager_driver.py:585            drunc.process_manager_driver                       Connectivity service lookup failed: Application 'root-controller' not found.
[2026/03/11 14:31:08 UTC] ERROR      process_manager_driver.py:982            drunc.process_manager_driver                       
# Could not find 'root-controller' on the connectivity service.

# Two possibilities:

# 1. The most likely, the controller died. You can check that by looking for error like:
# Process 'root-controller' (session: 'ehn1-local-1x1-config-pplesnia-Ymgo', user: 'pplesnia') process exited with exit code 1).
# Try running ps to see if the root-controller is still running.
# You may also want to check the logs of the controller, try typing:
# logs --name root-controller --how-far 1000
# If that's not helping, you can restart this shell with --log-level debug, and look out for 'STDOUT' and 'STDERR'.

# 2. The controller did not die, but is still setting up and has not advertised itself on the connection service.
# You may be able to connect to the root-controller in a bit. Check the logs of the controller:
# logs --name root-controller --grep grpc
# And look for messages like:
# Registering root-controller to the connectivity service at grpc://xxx.xxx.xxx.xxx:xxxxx
# To find the controller address, you can look up 'root-controller_control' on http://np04-srv-017:30005 (you may need a SOCKS proxy from outside CERN), or use the address from the logs as above. Then just connect this shell to the controller with:
# connect {controller_address}:{controller_port}>
            
[2026/03/11 14:31:08 UTC] WARNING    process_manager_driver.py:595            drunc.process_manager_driver                       Falling back to static OKS configuration for address resolution.
⠋ Trying to talk to the root controller... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0:00:01 0:01:00
[2026/03/11 14:32:11 UTC] INFO       shell.py:412                             drunc.unified_shell                                Shutting down the unified_shell
[2026/03/11 14:32:11 UTC] ERROR      shell.py:451                             drunc.unified_shell                                Could not retrieve the controller status, reason: failed to connect to all addresses; last error: UNKNOWN: ipv4:10.73.136.71:30683: Failed to connect to remote host: 
connect: Connection refused (111)
[2026/03/11 14:32:11 UTC] INFO       shell_utils.py:135                       drunc.utils.ShellContext                           You will not be able to issue commands to the controller anymore.
[2026/03/11 14:32:11 UTC] INFO       shell_utils.py:137                       drunc.utils.ShellContext                           Controller driver has been deleted.
[2026/03/11 14:32:11 UTC] INFO       ssh_process_manager.py:203               drunc.process_manager.SSH_SHELL_process_manager    Terminating
... # Termination logs
---------- DRUNC Run END ----------
==========================================
EHN1 1x1 Conf-StandAloneSSH_PM-run_nanorc0
==========================================
.
----------
🚨 Problem(s) found in logfile /log/log_pplesnia_ehn1-local-1x1-config-pplesnia-Ymgo_root-controller_2026-03-11-15-28-06.txt:
[2026/03/11 14:28:07 UTC] ERROR      controller.py:11                         drunc.controller_app                               Exception thrown!

[2026/03/11 14:28:07 UTC] ERROR      controller.py:12                         drunc.controller_app                               The Kafka broker cannot be initialised using address='monkafka.cern.ch:30092' and topic='ers_stream'

AttributeError: 'Controller' object has no attribute 'opmon_publisher'

----------
🚨 Problem(s) found in logfile /log/log_pplesnia_ehn1-local-1x1-config-pplesnia-Ymgo_ru-controller_2026-03-11-15-28-06.txt:
[2026/03/11 14:28:07 UTC] ERROR      controller.py:11                         drunc.controller_app                               Exception thrown!

[2026/03/11 14:28:07 UTC] ERROR      controller.py:12                         drunc.controller_app                               The Kafka broker cannot be initialised using address='monkafka.cern.ch:30092' and topic='ers_stream'

AttributeError: 'Controller' object has no attribute 'opmon_publisher'

----------
🚨 Problem(s) found in logfile /log/log_pplesnia_ehn1-local-1x1-config-pplesnia-Ymgo_df-controller_2026-03-11-15-28-06.txt:
[2026/03/11 14:28:08 UTC] ERROR      controller.py:11                         drunc.controller_app                               Exception thrown!

[2026/03/11 14:28:08 UTC] ERROR      controller.py:12                         drunc.controller_app                               The Kafka broker cannot be initialised using address='monkafka.cern.ch:30092' and topic='ers_stream'

AttributeError: 'Controller' object has no attribute 'opmon_publisher'

----------
🚨 Problem(s) found in logfile /log/log_pplesnia_ehn1-local-1x1-config-pplesnia-Ymgo_trg-controller_2026-03-11-15-28-06.txt:
[2026/03/11 14:28:08 UTC] ERROR      controller.py:11                         drunc.controller_app                               Exception thrown!

[2026/03/11 14:28:08 UTC] ERROR      controller.py:12                         drunc.controller_app                               The Kafka broker cannot be initialised using address='monkafka.cern.ch:30092' and topic='ers_stream'

AttributeError: 'Controller' object has no attribute 'opmon_publisher'

----------
🚨 Problem(s) found in logfile /log/log_pplesnia_ehn1-local-1x1-config-pplesnia-Ymgo_hsi-fake-controller_2026-03-11-15-28-07.txt:
[2026/03/11 14:28:08 UTC] ERROR      controller.py:11                         drunc.controller_app                               Exception thrown!

[2026/03/11 14:28:08 UTC] ERROR      controller.py:12                         drunc.controller_app                               The Kafka broker cannot be initialised using address='monkafka.cern.ch:30092' and topic='ers_stream'

AttributeError: 'Controller' object has no attribute 'opmon_publisher'
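
The five problem reports above repeat the same two ERROR lines for each controller. A rough way to confirm that kind of repetition is to count distinct ERROR messages, ignoring timestamps. The sketch assumes the column layout of the log lines quoted in this thread (timestamp, level, file and line, logger name, message); `count_errors` is a hypothetical helper.

```python
import re
from collections import Counter

# "[timestamp] LEVEL  file.py:line  logger.name  message"
LOG_LINE = re.compile(r"^\[[^\]]+\]\s+(\w+)\s+\S+:\d+\s+\S+\s+(.*)$")


def count_errors(lines: list[str]) -> Counter:
    """Count distinct ERROR messages across a list of log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(1) == "ERROR":
            counts[match.group(2).strip()] += 1
    return counts


sample_lines = [
    "[2026/03/11 14:28:07 UTC] ERROR      controller.py:11      drunc.controller_app      Exception thrown!",
    "[2026/03/11 14:28:08 UTC] ERROR      controller.py:11      drunc.controller_app      Exception thrown!",
    "[2026/03/11 14:32:11 UTC] INFO       shell.py:412          drunc.unified_shell       Shutting down the unified_shell",
]
print(count_errors(sample_lines))  # Counter({'Exception thrown!': 2})
```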

These are unrelated to this PR, merging

@PawelPlesniak PawelPlesniak merged commit cfc59a9 into develop Mar 11, 2026
4 checks passed
@PawelPlesniak PawelPlesniak deleted the emmuhamm/simple-log-error branch March 11, 2026 15:21
Development

Successfully merging this pull request may close these issues.

[Bug]: If a log file does not exist, it should generate a simpler error message
[Feature]: The Run Info table does not have a time zone

2 participants