Hotfix/may20#443

Merged
astronomerdave merged 30 commits into main from hotfix/may20 on Jun 1, 2026

@astronomerdave
Contributor

pipin' hot!

these are all the fixes we had to put in last night, needed to make ngps work

@astronomerdave astronomerdave requested review from prkrtg and scizen9 May 20, 2026 23:32

@scizen9 scizen9 left a comment

Check the impact on previous PRs, but otherwise, approved.

* fix: thermald pub-sub handling with:
       - thermald handles external data with a single function
       - adds db_column_defs.h to remove confusion between
         message keys and database fields
* fix: Sequence::shutdown error incorrectly initialized, which
       would return ERROR on success.
* fix: Sequence::shutdown bound slit_shutdown to ACAM tag,
       and vice-versa
* move tcs_shutdown to the end of the shutdown sequence
  to avoid errors from daemons still talking to it
* replace hard-coded strings in slicecamd with message keys
* publish acam temp on temp topic instead of status topic
* acam shutdown shuts down camera before motion
* acam publishes final status on close
* adds message keys for TANDOR_*
* updates targetcontrol.py to use "poll" command when talking
  to tcs, to avoid excess logging
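
To illustrate the Sequence::shutdown initialization fix listed above, here is a minimal sketch rather than the project's actual code. The names `RC_OK`, `RC_ERROR`, and `run_shutdown_steps` are hypothetical. It assumes the shutdown routine accumulates a return code across several steps, so the accumulator has to start in the success state.

```cpp
#include <functional>
#include <vector>

// Hypothetical return codes; the real project's names and values may differ.
constexpr long RC_OK = 0;
constexpr long RC_ERROR = 1;

// Runs every shutdown step and reports RC_ERROR only if at least one failed.
// The accumulator starts at RC_OK: if it started at RC_ERROR, the function
// would report failure even when every step succeeded.
long run_shutdown_steps(const std::vector<std::function<long()>> &steps) {
  long error = RC_OK;
  for (const auto &step : steps) {
    if (step() != RC_OK) {
      error = RC_ERROR;  // remember the failure but keep shutting down
    }
  }
  return error;
}
```

With this shape, ordering concerns such as running the TCS step last reduce to the order of entries in `steps`.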
     meant that a thread could close someone else's fd. This
     touches a lot of files but it's a single character to pass
     by reference.
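
The start of this commit message is missing, so the following is only one plausible reading of the pass-by-reference fix. It assumes a helper that closes a descriptor and marks it invalid. The name `close_fd` is hypothetical.

```cpp
#include <unistd.h>

// Closes the descriptor and invalidates the caller's variable.
// The '&' is the single-character change: with pass-by-value only a local
// copy would be set to -1, and the caller's stale number could be closed
// again later, after the OS has handed that number to another thread.
inline void close_fd(int &fd) {
  if (fd >= 0) {
    ::close(fd);
    fd = -1;  // visible to the caller because fd is a reference
  }
}
```

Calling `close_fd` twice on the same variable is then harmless, since the second call sees -1 and does nothing.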
state tracking and recovery. Until I can implement ZMQ REQ/REP I am
switching DaemonClient to per-command connections so each command
opens, sends, and closes its own connection.
        not being routed to the tcs on subsequent restarts.
fix: increase ACAMD_MOVE_TIMEOUT (was timing out during moves)
init_log()'s return was ignored in every daemon. If it failed it
wouldn't retry until 24 hours later. This retries every 60s and
writes to stderr. Might not have been a big problem but worth fixing.
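
A rough sketch of the retry behaviour described above, assuming the initializer returns 0 on success. `init_with_retry` and its parameters are hypothetical. The interval and attempt limit are arguments here only so the sketch can be exercised quickly; the commit itself describes a 60 s interval.

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

// Calls init (a stand-in for init_log) until it returns 0 or the attempt
// limit is reached. Failures go to stderr because the log itself is not
// available yet. Returns true once initialization succeeds.
bool init_with_retry(const std::function<long()> &init,
                     std::chrono::milliseconds interval,
                     int max_attempts) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (init() == 0) {
      return true;
    }
    std::cerr << "log init failed (attempt " << attempt << "), retrying in "
              << interval.count() << " ms\n";
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(interval);
    }
  }
  return false;
}
```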
On a database fault, dothread_telemetry threw on connect, logged ERROR,
and terminated; the 1Hz telemetry_watchdog respawned it immediately,
producing a ~1Hz connect-storm and ERROR-log spam for the duration of
the outage. This adds a backoff counter that doubles on each failed
attempt (30s max) and resets on a successful connection.
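
The doubling backoff described above can be sketched as follows. This is not the code from the commit: the class name `Backoff` and the initial delay are assumptions, and only the doubling, the 30 s cap, and the reset on success come from the description.

```cpp
#include <algorithm>
#include <chrono>

// Tracks how long to wait before the next connection attempt.
class Backoff {
 public:
  Backoff(std::chrono::seconds initial, std::chrono::seconds max)
      : initial_(initial), max_(max), current_(initial) {}

  // Returns the delay to wait before the next attempt, then doubles the
  // stored delay for the attempt after that, capped at max_.
  std::chrono::seconds on_failure() {
    const std::chrono::seconds delay = current_;
    current_ = std::min(current_ * 2, max_);
    return delay;
  }

  // A successful connection resets the delay to its initial value.
  void on_success() { current_ = initial_; }

 private:
  std::chrono::seconds initial_;
  std::chrono::seconds max_;
  std::chrono::seconds current_;
};
```

Starting from an assumed 1 s, the delays for consecutive failures are 1, 2, 4, 8, 16, 30, 30, ... seconds, so a long outage produces at most one connect attempt every 30 s instead of one per second.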
fix: tcs emulator didn't recognize "Z"
update: reduce tcs logging on polling
@astronomerdave astronomerdave merged commit 3eccb3d into main Jun 1, 2026
2 checks passed