feat: introduce serial link recovery#123
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
alexjoedt
left a comment
There was a problem hiding this comment.
Hey, sorry for the late review! Really like the direction here.
I left a few comments, mostly around defaults and some small robustness things. Hope they're helpful, let me know what you think!
| // Silent period after successful connection | ||
| ConnectDelay time.Duration | ||
| // Recovery timeout if the connection is lost | ||
| LinkRecoveryTimeout time.Duration |
There was a problem hiding this comment.
I noticed that neither NewRTUClientHandler nor NewASCIIClientHandler set a default for LinkRecoveryTimeout. With a zero value, linkRecoveryDeadline is immediately in the past, so the recovery loop will never actually retry, it will always hit "link recovery timeout reached" on the first error.
The TCP transporter handles this by checking if mb.LinkRecoveryTimeout == 0 before giving up. Should we either set a sensible default here or add the same zero-check in the serial Send methods?
| // Send the request | ||
| mb.logf("modbus: send % x\n", aduRequest) | ||
| if _, err = mb.port.Write(aduRequest); err != nil { | ||
| if err == io.EOF || err == io.ErrUnexpectedEOF || err == syscall.ECONNRESET { |
There was a problem hiding this comment.
The reconnect-on-error block is repeated four times across rtuSerialTransporter.Send and asciiSerialTransporter.Send (write error + read error in each). Would it make sense to extract something like a shouldRecover(err) bool helper and/or a reconnect(ctx) error method on serialPort? That way future changes only need to happen in one place.
| // Send the request | ||
| mb.logf("modbus: send % x\n", aduRequest) | ||
| if _, err = mb.port.Write(aduRequest); err != nil { | ||
| if err == io.EOF || err == io.ErrUnexpectedEOF || err == syscall.ECONNRESET { |
There was a problem hiding this comment.
Could we use errors.Is() instead of == here? Something like errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) || errors.Is(err, syscall.ECONNRESET). That way we'd also catch wrapped errors. Same in asciiclient.go:193 and asciiclient.go:218 (see comment above - could be solved with a helper method).
| } | ||
|
|
||
| aduResponse, err = readIncrementally(aduRequest[0], aduRequest[1], mb.port, connDeadline) | ||
| mb.logf("modbus: recv % x\n", aduResponse[:]) |
There was a problem hiding this comment.
If readIncrementally returns nil for aduResponse, this line will panic since aduResponse[:] is called before the error check. Could we move this log line after the if err != nil block, or guard it with if aduResponse != nil?
| connDeadline := time.Now().Add(mb.Timeout) | ||
| linkRecoveryDeadline := time.Now().Add(mb.LinkRecoveryTimeout) | ||
|
|
||
| for { |
There was a problem hiding this comment.
Just thinking out loud: if close() + connect() keeps succeeding but the remote device stays unresponsive, this could spin pretty fast until linkRecoveryDeadline. Would it be worth adding a small backoff delay between reconnect attempts, or capping the number of retries?
| for { | ||
| if time.Now().After(deadline) { // Possible that serialport may spew data | ||
| return nil, errors.New("failed to read from serial port within deadline") | ||
| return nil, context.DeadlineExceeded |
There was a problem hiding this comment.
This changed from a custom error string to context.DeadlineExceeded. Since that sentinel usually means the context deadline was exceeded, callers might misinterpret this as a context cancellation rather than an internal serial read timeout. Maybe wrap it to keep the distinction, like fmt.Errorf("failed to read from serial port within deadline: %w", context.DeadlineExceeded)?
| @@ -0,0 +1,117 @@ | |||
| //go:build darwin || linux || freebsd || openbsd || netbsd | |||
| // +build darwin linux freebsd openbsd netbsd | |||
There was a problem hiding this comment.
The build tags include darwin but openPTY() uses syscall.TIOCSPTLCK and syscall.TIOCGPTN, which are Linux-only. This won't compile on macOS. Should we either restrict the tag to linux or use something like github.com/creack/pty for cross-platform PTY support?
| return master, slavePath, nil | ||
| } | ||
|
|
||
| func TestRTUSerialTransporter_Send_PTY(t *testing.T) { |
There was a problem hiding this comment.
Nice to see PTY-based tests!
Any plans to also add a test for the recovery path? e.g. simulating an EOF mid-communication and verifying that the reconnect logic kicks in.
Right now the tests only cover the happy path and a plain timeout.
Attempt protection against shoddy wires in RTU connection.
Fixes: #122