close unused socket after graceful restart #150

Open
focksor wants to merge 1 commit into lighttpd:master from focksor:close-unused-socket-after-graceful-restart-pr

focksor commented Feb 2, 2026

This PR addresses an issue where network ports (sockets) are not properly released after a graceful restart triggered by SIGUSR1.

While the commit 6c1e6e66 introduced the core logic for graceful restarts, a limitation in the current implementation prevents the process from closing the old listening sockets. This can lead to port exhaustion or conflicts when multiple restarts occur in a short period.

Changed:

  • Refactored socket management: Instead of binding srv to sockets during the graceful restart phase, the binding now occurs during network initialization. This allows the system to accurately track which sockets are active in the updated configuration.
  • Resource Leak Fix: Added logic to identify and close unused sockets immediately after server initialization, ensuring that redundant ports are released to the system.

Author

focksor commented Feb 3, 2026

@gstrauss Hoping for your review and advice :)

Member

gstrauss commented Feb 3, 2026

> This PR addresses an issue where network ports (sockets) are not properly released after a graceful restart triggered by SIGUSR1
>
> While the commit 6c1e6e66 introduced the core logic for graceful restarts, a limitation in the current implementation prevents the process from closing the old listening sockets. This can lead to port exhaustion or conflicts when multiple restarts occur in a short period.

Would you please try to describe this in more detail if you can reproduce the issue? Is the set of listening sockets changing frequently? If so, how? Can you describe the usage scenario?

Blanket question: was this PR written with the assistance of AI?

I have not had time to review closely yet, but have a few initial comments:

  • Why have you chosen to use ->srv = NULL as a flag for unused rather than a ->sidx sentinel value? Can the solution be narrowed to address whatever might be wrong with setting that sentinel? Is lighttpd missing code to clean up a changed set of listening sockets and close previously used sockets, similar to what is done in network_close(), and should such cleanup be at the end of network_init()? (Maybe I had held onto inherited sockets in case a subsequent graceful restart tried listening to those inherited sockets again?)
  • Not reviewed/tested: lighttpd should continue to work with systemd socket activation conventions.
  • server_sockets_remove_unused() could be called in server_main_setup(), some time before return 1; at the end of the function. server_main_setup() is already marked cold, so server_sockets_remove_unused() need not repeat. Also, the two line contents of server_sockets_remove_unused() could be near the end of server_main_setup() rather than as a separate function (and why did you mark it noinline?). Instead of moving to server_main_setup(), why is your new code not called at the end of network_init()?
  • There are whitespace inconsistencies. I recognize that whitespace in lighttpd code is not universally consistent. Still, please match the whitespace usage in the functions you are modifying.

Author

focksor commented Feb 3, 2026

First of all, I sincerely apologize for the numerous issues in my PR. The primary reason I submitted this PR is that I encountered a problem while using lighttpd at work. Since I was a complete newcomer to the lighttpd project before this, my considerations may be far from exhaustive. Please feel free to point out any issues, and I am happy to make further adjustments.

> Would you please try to describe this in more detail if you can reproduce the issue?

Yes! Here is a minimal reproducible example:

focksor@focksor:~/workSpace/lighttpd1.4$ cat test.conf
server.document-root = "/var/www/html"
server.port = 8080

$SERVER["socket"] == ":1234" {}
focksor@focksor:~/workSpace/lighttpd1.4$ lighttpd -f `pwd`/test.conf
focksor@focksor:~/workSpace/lighttpd1.4$ ss -tlpn4 | grep lighttpd
LISTEN 0      1024          0.0.0.0:8080       0.0.0.0:*    users:(("lighttpd",pid=487338,fd=4))       
LISTEN 0      1024          0.0.0.0:1234       0.0.0.0:*    users:(("lighttpd",pid=487338,fd=3))       
focksor@focksor:~/workSpace/lighttpd1.4$ sed -i 's/1234/2345/g' test.conf
focksor@focksor:~/workSpace/lighttpd1.4$ cat test.conf
server.document-root = "/var/www/html"
server.port = 8080

$SERVER["socket"] == ":2345" {}
focksor@focksor:~/workSpace/lighttpd1.4$ kill -SIGUSR1 `pidof lighttpd`
focksor@focksor:~/workSpace/lighttpd1.4$ ss -tlpn4 | grep lighttpd
LISTEN 0      1024          0.0.0.0:8080       0.0.0.0:*    users:(("lighttpd",pid=487338,fd=4))       
LISTEN 0      1024          0.0.0.0:1234       0.0.0.0:*    users:(("lighttpd",pid=487338,fd=3))       
LISTEN 0      1024          0.0.0.0:2345       0.0.0.0:*    users:(("lighttpd",pid=487338,fd=5))   

As you can see, after the graceful restart lighttpd is listening on both port 1234 and port 2345, even though the updated configuration file mentions only port 2345.

So lighttpd indeed does not release the old port after a graceful restart.

> Is the set of listening sockets changing frequently? If so, how? Can you describe the usage scenario?

We use lighttpd on a device that runs many services (HTTP servers and many others), with lighttpd acting as the proxy for the HTTP servers.

Let’s take Service A (an HTTP service) and Service B (a non-HTTP service that does not use a lighttpd proxy) as an example. Occasionally, a user may wish to reassign Service A to a different port and then assign Service B to the port previously used by Service A. However, because lighttpd continues to occupy the old port after a graceful restart, Service B is unable to bind to that port.

Additionally, if users disable Service A but find that its port remains open, they will reasonably be concerned about security.
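The port conflict described above can be reproduced outside lighttpd. The following is a minimal sketch, assuming a POSIX system with IPv4 loopback available; the `demo_*` names are hypothetical helpers for illustration and are not part of lighttpd. It shows that a second bind to a port fails with EADDRINUSE while an earlier listening socket is still open, and succeeds once that socket is closed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind and listen on 127.0.0.1:port (host byte order; 0 = any free port).
 * Returns the descriptor, or -1 with errno set. Hypothetical helper. */
static int demo_listen(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0
        || listen(fd, 16) < 0) {
        int saved = errno;   /* keep the bind/listen error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Return the local port (host byte order) of a bound socket, or 0 on error. */
static unsigned short demo_port_of(int fd) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) return 0;
    return ntohs(addr.sin_port);
}

/* Plays the roles of "old listener" (the socket kept after the restart)
 * and "Service B" (the second bind attempt). Returns 0 if the second bind
 * fails with EADDRINUSE while the old socket is open and succeeds after
 * the old socket is closed. */
static int demo_port_conflict(void) {
    int old_fd = demo_listen(0);
    if (old_fd < 0) return -1;
    unsigned short port = demo_port_of(old_fd);
    assert(port != 0);

    int b_fd = demo_listen(port);          /* expected to fail */
    if (b_fd >= 0 || errno != EADDRINUSE) {
        if (b_fd >= 0) close(b_fd);
        close(old_fd);
        return -2;
    }

    close(old_fd);                         /* release the old port */
    b_fd = demo_listen(port);              /* expected to succeed now */
    if (b_fd < 0) return -3;
    close(b_fd);
    return 0;
}
```

Calling `demo_port_conflict()` from a small `main()` and checking for a zero return is enough to observe the behavior; no SO_REUSEADDR or SO_REUSEPORT is set, which is an assumption of this sketch.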

> Blanket question: was this PR written with the assistance of AI?

Not really. All the analysis and coding were done by me (a human). However, since English is not my native language, I used AI tools to translate the statements in the PR Description and in this response.

> Why have you chosen to use ->srv = NULL as a flag for unused rather than ->sidx sentinel value? Can the solution be narrowed to address whatever might be wrong with setting that sentinel? Is lighttpd missing code to clean up changed set of listening sockets and closing previously used sockets, similar to what is done in network_close(), and should such cleanup be at the end of network_init(). (Maybe I had held onto inherited sockets in case a subsequent graceful restart tried listening to those inherited sockets again?)

I chose to use ->srv = NULL instead of ->sidx because I noticed that in server_sockets_restore(), the ->srv field is reset for both srv->srv_sockets and srv->srv_sockets_inherited. However, ->sidx is not reset for srv->srv_sockets_inherited. Since I wasn't certain of the reason behind this inconsistency, I opted for ->srv as the flag.

If you approve, I will make the following changes in an amended commit:

  1. Revert all of the previous changes.
  2. In server_sockets_restore(), set ->sidx = (unsigned short)~0u for both srv->srv_sockets and srv->srv_sockets_inherited.
  3. Modify network_init() in src/network.c (lines 987-993) to close and release resources for all sockets in srv->srv_sockets where sidx == (unsigned short)~0u.
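The sentinel-based cleanup proposed in these steps could look roughly like the sketch below. It is not lighttpd's code: `demo_socket`, `SIDX_UNUSED`, and `demo_close_unused()` are hypothetical stand-ins, and the sketch assumes the index field is an unsigned short, as the cast in step 2 suggests. Note that the comparison uses the same cast as the assignment; comparing an unsigned short directly against a bare `~0u` would never match after integer promotion.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sentinel: "not referenced by the reloaded configuration". */
#define SIDX_UNUSED ((unsigned short)~0u)

/* Hypothetical stand-in for a listening socket record. */
typedef struct {
    int fd;               /* listening descriptor, -1 once closed */
    unsigned short sidx;  /* config index, or SIDX_UNUSED */
} demo_socket;

/* Close every socket still carrying the sentinel and compact the array
 * in place. Returns the number of sockets kept; entries at or beyond
 * that index must no longer be used by the caller. */
static size_t demo_close_unused(demo_socket *socks, size_t used) {
    assert(socks != NULL || used == 0);
    size_t kept = 0;
    for (size_t i = 0; i < used; ++i) {
        if (socks[i].sidx == SIDX_UNUSED) {
            if (socks[i].fd >= 0) close(socks[i].fd);
            socks[i].fd = -1;
            continue;               /* drop this entry */
        }
        socks[kept++] = socks[i];   /* keep, shifting down if needed */
    }
    return kept;
}
```

In this sketch the sweep would run once, after the new configuration has assigned an index to every socket it still needs, so anything left with the sentinel is by construction unused.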

> Not reviewed/tested: lighttpd should continue to work with systemd socket activation conventions.

I will check it after the next amended commit.

> server_sockets_remove_unused() could be called in server_main_setup(), some time before return 1; at the end of the function. server_main_setup() is already marked cold, so server_sockets_remove_unused() need not repeat. Also, the two line contents of server_sockets_remove_unused() could be near the end of server_main_setup() rather than as a separate function (and why did you mark it noinline?). Instead of moving to server_main_setup(), why is your new code not called at the end of network_init()?

This code will be moved to network_init(), as I mentioned above.

> There are whitespace inconsistencies. I recognize that whitespace in lighttpd code is not universally consistent. Still, please match the whitespace usage in the functions you are modifying.

Sorry about that. My IDE changed the whitespace automatically and I did not notice. I will match the existing style in the next amended commit.

Signed-off-by: focksor <focksor@outlook.com>
focksor force-pushed the close-unused-socket-after-graceful-restart-pr branch from b77ebb9 to 263ef15 on February 4, 2026 03:35
Author

focksor commented Feb 4, 2026

Good morning @gstrauss! I updated the implementation in the latest commit (263ef15) and tested it with the following commands.

+ git log -1 --oneline
22c59dd7 (HEAD -> master, origin/master, origin/HEAD, focksor/master) [build] support lua 5.5
+ scripts/ci-build.sh meson
+ grep Ok:
Ok:                 9   
+ ./build/src/lighttpd -v
lighttpd/1.4.83 (ssl) - a light and fast webserver
+ cat test.conf
server.document-root = "/var/www/html"
server.port = 8080

$SERVER["socket"] == ":1234" {}
++ pwd
+ ./build/src/lighttpd -f /home/focksor/workSpace/lighttpd1.4/test.conf
+ ss -tlpn4
+ grep lighttpd
LISTEN 0      4096          0.0.0.0:8080       0.0.0.0:*    users:(("lighttpd",pid=552227,fd=4))       
LISTEN 0      4096          0.0.0.0:1234       0.0.0.0:*    users:(("lighttpd",pid=552227,fd=3))       
+ sed -i s/1234/2345/g test.conf
+ cat test.conf
server.document-root = "/var/www/html"
server.port = 8080

$SERVER["socket"] == ":2345" {}
++ pidof lighttpd
+ kill -SIGUSR1 552227
+ ss -tlpn4
+ grep lighttpd
LISTEN 0      4096          0.0.0.0:8080       0.0.0.0:*    users:(("lighttpd",pid=552227,fd=4))       
LISTEN 0      4096          0.0.0.0:1234       0.0.0.0:*    users:(("lighttpd",pid=552227,fd=3))       
LISTEN 0      4096          0.0.0.0:2345       0.0.0.0:*    users:(("lighttpd",pid=552227,fd=5))       
+ sed -i s/2345/3456/g test.conf
+ cat test.conf
server.document-root = "/var/www/html"
server.port = 8080

$SERVER["socket"] == ":3456" {}
++ pidof lighttpd
+ kill -SIGUSR1 552227
+ ss -tlpn4
+ grep lighttpd
LISTEN 0      4096          0.0.0.0:8080       0.0.0.0:*    users:(("lighttpd",pid=552227,fd=4))       
LISTEN 0      4096          0.0.0.0:1234       0.0.0.0:*    users:(("lighttpd",pid=552227,fd=3))       
LISTEN 0      4096          0.0.0.0:3456       0.0.0.0:*    users:(("lighttpd",pid=552227,fd=6))       
LISTEN 0      4096          0.0.0.0:2345       0.0.0.0:*    users:(("lighttpd",pid=552227,fd=5))       
++ pidof lighttpd
+ kill 552227

Hoping for your review and advice.

Member

gstrauss commented Feb 4, 2026

@focksor thank you for the additional information and for the iterative work. I will set aside some time to dig into this on Sunday.

Member

gstrauss commented May 8, 2026

Not forgotten, but sorry for the delay.

The existing behavior is intentional: listening sockets are preserved in case lighttpd has dropped privileges and might no longer be able to re-open those sockets, since one configuration change may remove them and a later one may add them back.

The behavior you expect is also reasonable, but is different, and so I will probably modify this patch to add a feature flag to enable your desired behavior, while preserving the current behavior.
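The two behaviors discussed here can be contrasted in a small sketch. Everything in it is hypothetical: the `demo_listener` type, the helper names, and in particular the `close_unused` flag, which only stands in for whatever feature flag might eventually be added. With the flag off, a preserved listener stays open and can be found again if a later configuration re-adds its address; with the flag on, listeners that the reloaded configuration no longer references are selected for closing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for a listening socket kept across a reload. */
typedef struct {
    const char *addr;  /* listen address as written in the config, e.g. ":1234" */
    int fd;            /* open descriptor, or -1 if not open */
    int in_config;     /* 1 if the reloaded config still references addr */
} demo_listener;

/* Look for an already-open listener on addr so it can be reused instead
 * of binding again (which could fail after privileges were dropped).
 * Returns its descriptor, or -1 if none is open. */
static int demo_find_preserved(const demo_listener *ls, size_t n,
                               const char *addr) {
    assert(addr != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (ls[i].fd >= 0 && strcmp(ls[i].addr, addr) == 0)
            return ls[i].fd;
    }
    return -1;
}

/* Count the listeners that would be closed after a reload.
 * close_unused == 0 models the current behavior (keep everything);
 * close_unused != 0 models the behavior requested in this PR. */
static size_t demo_count_to_close(const demo_listener *ls, size_t n,
                                  int close_unused) {
    if (!close_unused) return 0;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (ls[i].fd >= 0 && !ls[i].in_config) ++count;
    }
    return count;
}
```

The trade-off is visible in the two helpers: once the flag closes a listener, `demo_find_preserved()` can no longer return it, so a process that has dropped privileges may be unable to bring that address back.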
