Task Runner needs timeout on socket operations #32

@mthuurne

We had an issue where a Task Runner had an established socket (checked with netstat -tpn) and was waiting forever for traffic on that socket to occur. However, the other side had no corresponding socket and therefore couldn't send anything. My guess is that the router had dropped the socket from its NAT tables.

After killing the socket using ss -K sport = <port>, the Task Runner resumed normal operations. So the Task Runner was still operational, just waiting forever for a reply that wouldn't come. To make the Task Runner robust against situations like this, we should put a timeout on socket operations, so the operation fails if it doesn't make progress for a long time and a new socket can be opened on the next try.

This issue is very rare: I've had three Task Runners on the same machine with the same router for over half a year and it happened only once. So we can set the timeout value relatively high, for example a minute.

Note that the Task Runner doesn't do low-level socket operations directly: it uses java.net.HttpURLConnection instead. That class inherits setConnectTimeout() and setReadTimeout() methods from java.net.URLConnection, which we can use. But those were added in Java 1.5, while the Task Runner was originally written in Java 1.4, so using them means requiring Java 1.5 or later.
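
For illustration, here is a minimal sketch of how those two timeouts could be applied. It is not the Task Runner's actual code: the class name TimeoutSketch, the helper names openWithTimeouts() and fetchStatus(), and the constant TIMEOUT_MILLIS are all hypothetical, and the one-minute value simply follows the suggestion above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class; the names here are assumptions for illustration.
final class TimeoutSketch {

    // Assumed value: one minute, as suggested in the issue text.
    static final int TIMEOUT_MILLIS = 60 * 1000;

    // Creates a connection object with both timeouts set.
    // openConnection() does not touch the network yet; the socket is only
    // opened once the connection is actually used.
    static HttpURLConnection openWithTimeouts(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        // Give up if the TCP connection cannot be established in time.
        connection.setConnectTimeout(TIMEOUT_MILLIS);
        // Give up if a read blocks this long without any data arriving.
        connection.setReadTimeout(TIMEOUT_MILLIS);
        return connection;
    }

    // Performs a request and returns the HTTP status code.
    // A timeout surfaces as java.net.SocketTimeoutException, which is a
    // subclass of IOException, so the caller can retry on a fresh connection.
    static int fetchStatus(URL url) throws IOException {
        HttpURLConnection connection = openWithTimeouts(url);
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

With a read timeout in place, a connection whose peer has silently disappeared would fail with an exception after a minute instead of blocking forever, and the next attempt would open a new socket.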

Labels: bug (Something isn't working)
