Mekhanik-evgenii/fix-1346 #2456
Conversation
Force-pushed from a62787c to c9bba36
```c
__u8 pfmemalloc = skb->pfmemalloc;

WARN_ON_ONCE(skb->sk);
skb_orphan(skb);
```
Please pay attention to this place. Here we release the skb owner and decrease client->mem. The function ss_skb_init_for_xmit is called before pushing the skb to the socket write queue, so the skbs in the socket write queue are not taken into account in the client memory calculation. We release the skb owner here because, if we don't, we need a rather big kernel patch to adjust skb memory before it is passed to the socket write queue. @krizhanovsky @const-t what do you think about it?
Why don't we keep a pointer to the client accounting in skb->cb instead of playing with skb_orphan()? I'd prefer to avoid the latter, since we can get plenty of crashes in this patch or in later kernel version migrations due to breaking the kernel's logic about orphaned skbs.
Force-pushed from c9bba36 to 26e3525
Force-pushed from 40654f8 to e2de424
Force-pushed from 7b5e367 to ac06de7
```diff
 static inline int
-tfw_cli_mem_init(TfwClientMem *cli_mem)
+tfw_cli_mem_init(TfwClientMem *cli_mem, bool from_pool)
```
Maybe just passing flags would be better? But it's not important, I'm OK with it.
const-t
left a comment
LGTM, however a review from @krizhanovsky is required, especially to approve the new approach to memory limit checking.
krizhanovsky
left a comment
LGTM for now. I listed all the required to-dos in #1715 (comment) and made a small commit to adjust the coding style, so as not to pollute the comments. Let's address the comments in a new PR - it has become too hard to review this one - let's just merge it for now.
```c
		  conn->peer, skb->truesize);
}

r = frang_client_mem_limit((TfwCliConn *)conn, false);
```
@const-t is this comment still relevant? I see, for example, that ss_skb_adjust_data_len() is called from tfw_hpack_cache_decode_expand() -> tfw_http_msg_expand_data(), i.e. the current code does account HTTP responses.
```c
BUG_ON(!conn);
BUG_ON(!list_empty(&conn->list));
BUG_ON(conn->stream.msg);
BUG_ON(conn->write_queue);
```
The point was to put this comment in the code, not in the PR.
```c
if (tfw_runstate_is_reconfig())
	return;

tfw_client_free_lru();
```
It seems the content of commit a3ad176 and its message are about different things
client_mem <soft_limit> <hard_limit> - controls how much memory is used to store unanswered client requests and requests with linked responses which cannot be forwarded to a client.
Account for FRAME_HEADER_SIZE in the send window calculation when making frames. (There was an accuracy mistake in the send window calculation: we didn't take into account that each frame also carries a frame header.)
To track socket memory we should pass TfwHttpMsg * rather than TfwMsgIter * to most of the http_msg_* functions, because TfwHttpMsg has a pointer to the connection and socket.
In task #498 we decided to use the `client_mem` option to limit the amount of memory used by a client. This commit is a part of this task - now Tempesta FW uses `sk->sk_rmem_alloc` to account for the memory used by Tempesta FW for this client connection.
In task #498 we decided to use the `client_mem` option to limit the amount of memory used by a client. This commit is a part of this task and the next step of the implementation. Previously Tempesta FW used `sk->sk_rmem_alloc` to account for the memory used by Tempesta FW for a single client connection; now we account memory for the whole TfwClient, because there can be a lot of connections for one client. In all other cases we apply the limit to the TfwClient and block it if necessary.
If the administrator specifies `client_mem` and the memory used by all connections of the current client exceeds this value, Tempesta FW drops the connection and blocks the client by IP if `ip_block on;` is specified.
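The enforcement described above can be sketched as a small decision function. This is a minimal illustration, not the real frang code: the verdict enum, the function name, and the assumption that only the hard limit combined with `ip_block on;` escalates to an IP block are all hypothetical readings of the commit message.

```c
#include <assert.h>

/* Hypothetical verdicts for a client memory check. */
typedef enum {
	CLI_MEM_OK,	/* under the limit, keep going */
	CLI_MEM_DROP,	/* over the limit: drop the connection */
	CLI_MEM_BLOCK,	/* over the hard limit with ip_block on: block by IP */
} cli_mem_verdict;

/*
 * Illustrative check of a client's memory consumption against the
 * client_mem <soft_limit> <hard_limit> pair; exact semantics of the
 * two thresholds are an assumption here.
 */
static cli_mem_verdict
cli_mem_check(long used, long soft, long hard, int ip_block)
{
	if (used <= soft)
		return CLI_MEM_OK;
	if (used > hard && ip_block)
		return CLI_MEM_BLOCK;
	return CLI_MEM_DROP;
}
```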
Previously we took a reference to the connection when we adjusted memory for an skb, but this leads to several problems:
- We can't adjust memory for an skb before TLS decryption, because skbs from `tls->io_in.skb_list` are freed when the connection is released (but the connection will never be released if we increment its reference counter for these skbs).
- We have the same problem for skbs which wait for an appropriate TCP window before being pushed into the socket write queue.
Now we increment/decrement the reference counter of the TfwClient and adjust skb memory for requests before TLS decryption.
Previously we adjusted the TCP send window only for HTTP/2 connections and only while making HEADERS or DATA frames, but if we want to control client memory usage we should do it for all types of sent data. (We orphan the skb and decrease memory usage when we pass the skb to the socket write queue, so if we don't adjust the TCP send window we push a lot of skbs into the socket write queue without adjusting their memory.)
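The frame-header correction mentioned in these two commit messages amounts to the following arithmetic: each HTTP/2 frame spends 9 bytes of the send window on its header (a real HTTP/2 constant, RFC 9113), so the usable payload is less than the raw window. The sketch below is an illustrative calculation, not Tempesta FW's actual framing code; `max_payload` is an invented helper name.

```c
#include <assert.h>

#define FRAME_HEADER_SIZE 9UL	/* HTTP/2 frame header length, RFC 9113 */

/*
 * How much payload fits into a send window of snd_wnd bytes when
 * every frame of up to max_frame_sz payload bytes also carries a
 * FRAME_HEADER_SIZE header (illustrative sketch).
 */
static unsigned long
max_payload(unsigned long snd_wnd, unsigned long max_frame_sz)
{
	unsigned long per_frame = FRAME_HEADER_SIZE + max_frame_sz;
	unsigned long full = snd_wnd / per_frame;	/* full frames */
	unsigned long rest = snd_wnd % per_frame;
	unsigned long payload = full * max_frame_sz;

	/* A trailing partial frame still pays for one header. */
	if (rest > FRAME_HEADER_SIZE)
		payload += rest - FRAME_HEADER_SIZE;
	return payload;
}
```

Without the correction, a 100-byte window would be treated as 100 bytes of payload; with it, a single frame in that window carries at most 91 payload bytes.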
- Remove the `client_get_light/client_put_light` functions: after removing the lock from the `client` structure we don't need them at all.
- Account the memory usage of an skb in `skb->cb`. Usually it is equal to `skb->truesize`, but in some cases (an skb created by `pskb_copy_for_clone`, for example) it is different.
Do not use `skb->sk` and `skb->destructor` to track the memory used by an skb; use `skb->cb` for this purpose.
- Implement our own version of `skb_orphan` named `ss_skb_orphan`, which is called when an skb is freed in Tempesta FW code or just before pushing the skb into the socket write queue.
- Implement wrappers over `__kfree_skb` and `kfree_skb` which call `ss_skb_orphan` before freeing the skb.
- Check that an skb has been pushed into the socket write queue, using the newly implemented Linux kernel function `skb_tfw_is_in_socket_write_queue`, to skip adjusting the memory used by the skb when it belongs to the kernel (when `ss_skb_*` functions are called from `tls_encrypt`).
Usually we use the callbacks which are set in `skb->cb` for different purposes, so remove the callbacks which were added in the previous patches and use the callbacks saved in `skb->cb`.
- Since we use a pool for HTTP memory allocation, change the API of all `tfw_pool_*` functions to pass a `TfwClient` and account memory in this structure.
- Remove the `TfwClient` refcounter (it is not used; this could have been done in the previous commits).
- Fix unit tests to check memory accounting and clean up memory after each test, to verify that client memory is equal to zero after a test.
A big performance degradation was found after this patch. During the investigation it turned out that the problem is the use of an atomic counter for client memory accounting. Using a per-CPU array instead of the atomic counter fixes the performance issue.
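The per-CPU fix can be sketched in userspace as follows. This is a minimal illustration of the idea only: the kernel would use `this_cpu_add()` on real per-CPU data rather than an indexed array, and `cli_mem_percpu`, `NR_CPUS` as defined here, and the helper names are assumptions. The hot path touches only the local CPU's slot (no cross-CPU cache-line contention), and only the occasional limit check pays for summing all slots.

```c
#include <assert.h>

#define NR_CPUS 8	/* illustrative; the kernel knows the real count */

typedef struct {
	long cnt[NR_CPUS];	/* one counter slot per CPU */
} cli_mem_percpu;

/* Hot path: update only the current CPU's slot, no atomics needed. */
static void cli_mem_add(cli_mem_percpu *m, int cpu, long bytes)
{
	m->cnt[cpu] += bytes;
}

/* Slow path (limit check): sum all slots; the result may be slightly
 * stale, which is acceptable for a memory limit. */
static long cli_mem_read(const cli_mem_percpu *m)
{
	long sum = 0;
	int i;

	for (i = 0; i < NR_CPUS; i++)
		sum += m->cnt[i];
	return sum;
}
```

The trade-off is exactly the one the commit message describes: reads become O(number of CPUs) and approximate, but the per-packet write path stops bouncing one atomic counter between all cores.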
Previously we removed a client entry from TDB if there was no entry in `client_lru.free_list` and a new client was allocated, even if the removed client still had active connections. There is a bug in this strategy: if the removed client has hung connections, we can't close and destroy them during Tempesta FW unloading, because we close and destroy connections while iterating over the active clients (`tfw_client_for_each`).

In the new strategy we change the logic of `tdb_htrie_put_rec`: we add a pointer to the bucket to the record structure, and when we remove a record we zero this pointer. If the record's reference counter becomes zero but the bucket pointer is still not NULL (the record was not removed), we remove the record from the bucket using this pointer. For clients we just use `tfw_client_put` without removing the record: when the client's reference counter becomes zero, the client record is removed from the bucket and freed.
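The bucket back-pointer strategy can be sketched with a toy singly-linked bucket. These are not the real TDB structures: `record`, `bucket`, and the helpers below are invented, and real `tdb_htrie_put_rec` is lock-protected and far more involved. The point being illustrated is only the ownership rule: a record removed explicitly clears its bucket pointer, and a record whose refcount hits zero while still linked unlinks itself.

```c
#include <assert.h>
#include <stddef.h>

struct bucket;

struct record {
	struct record	*next;
	struct bucket	*bucket;	/* back-pointer; NULL once removed */
	int		refcnt;
};

struct bucket {
	struct record	*head;
};

/* Unlink r from b and zero the back-pointer. */
static void bucket_unlink(struct bucket *b, struct record *r)
{
	struct record **pp = &b->head;

	while (*pp && *pp != r)
		pp = &(*pp)->next;
	if (*pp)
		*pp = r->next;
	r->bucket = NULL;
}

/* Explicit removal: the record leaves the bucket but stays alive
 * while references remain. */
static void record_remove(struct record *r)
{
	if (r->bucket)
		bucket_unlink(r->bucket, r);
}

/* Drop a reference; if it was the last one and nobody removed the
 * record before, it unlinks itself. Returns 1 if the caller may free. */
static int record_put(struct record *r)
{
	if (--r->refcnt)
		return 0;
	if (r->bucket)
		bucket_unlink(r->bucket, r);
	return 1;
}
```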
We can't call `tfw_client_get/put` on each allocated or orphaned skb (or on each pool creation/destruction): under pressure, when we have a lot of CPUs, this leads to atomic contention and a bad performance degradation. To fix this problem we implement a special TfwClientMem structure with its own reference accounting (using `struct percpu_ref`!) and save a pointer to it in the client structure. We use `percpu_ref_tryget`/`percpu_ref_put` during skb allocation/deallocation (it's very cheap). When we destroy a client we schedule a work item, call `percpu_ref_kill_and_confirm` and wait until all skbs are orphaned. Also make some fixes according to the review:
- Call `tfw_client_free` for incomplete records as well.
- Implement `tfw_alloc_percpu_gfp`, the same as `alloc_percpu_gfp` but with error injection.
- Fix memory accounting during copying of skbs.
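The percpu_ref lifecycle being relied on here can be illustrated with a simplified userspace analogue. To be clear about the assumptions: `mini_ref` is not the kernel's `struct percpu_ref` (the real one counts in per-CPU variables until killed, which is exactly why it is cheap); this sketch only reproduces the observable semantics the commit depends on, with plain C11 atomics: tryget fails after kill, and the release callback fires once the last reference is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int release_calls;	/* how many times release fired */

static void on_release(void)
{
	release_calls++;	/* "all skbs orphaned" point */
}

/* Simplified analogue of percpu_ref semantics (not the kernel API). */
typedef struct {
	atomic_long	cnt;
	atomic_bool	dead;
	void		(*release)(void);
} mini_ref;

static void mini_ref_init(mini_ref *r, void (*release)(void))
{
	atomic_init(&r->cnt, 1);	/* bias ref, held until kill */
	atomic_init(&r->dead, false);
	r->release = release;
}

/* Cheap get on the hot path; refuses new refs once killed. */
static bool mini_ref_tryget(mini_ref *r)
{
	if (atomic_load(&r->dead))
		return false;
	atomic_fetch_add(&r->cnt, 1);
	return true;
}

static void mini_ref_put(mini_ref *r)
{
	if (atomic_fetch_sub(&r->cnt, 1) == 1)
		r->release();	/* last reference gone */
}

/* Analogue of percpu_ref_kill_and_confirm: stop new gets, drop the
 * bias; release runs when outstanding refs (skbs) drain. */
static void mini_ref_kill(mini_ref *r)
{
	atomic_store(&r->dead, true);
	mini_ref_put(r);
}
```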
- Use a cache for client mem allocations.
- Use a typedef for the TfwClientMem structure and pass a `TfwClientMem *` pointer instead of `void *` in all functions.
- Make the `client_mem` option reconfigurable.
- Preallocate and initialize TfwClientMem structures according to `client_cfg.lru_size`. During TfwClient structure allocation, try to take a TfwClientMem from the preallocated pool; if the pool is empty, allocate a TfwClientMem from the cache.
- Implement new fault-injection alloc functions to cover the new code.

```
1396:
  small: finished in 50.03s, 1294710.94 req/s, 1002.60MB/s
         finished in 50.03s, 1287197.04 req/s,  998.01MB/s
  large: finished in 50.03s,  103497.42 req/s,    9.90GB/s
         finished in 50.03s,  108665.42 req/s,   10.39GB/s

1396 with client_mem:
  small: finished in 50.03s, 1225390.98 req/s,  948.74MB/s
         finished in 50.03s, 1223275.66 req/s,  947.72MB/s
  large: finished in 50.08s,   78906.58 req/s,    7.55GB/s
         finished in 50.08s,   86201.98 req/s,    8.24GB/s

master:
  small: finished in 50.03s, 1294782.10 req/s, 1002.49MB/s
         finished in 50.03s, 1294782.10 req/s, 1001.33MB/s
  large: finished in 50.04s,   98625.80 req/s,    9.43GB/s
         finished in 50.04s,   97767.22 req/s,    9.35GB/s
```
We should check that the TlsMpiPool was allocated during `ttls_mpool_exit`.
- Remove the SS_* error codes and use only T_* error codes, since we already include "lib/log.h" in all files.
- Split the enum with error codes into two enums (one for common error codes and one for internal ones).
- List all error codes from least crucial to most crucial.
- Implement some useful helper functions.
Do not check client memory consumption during request parsing/HTTP/2 frame processing; check it only at the end of the softirq RX path. Under load we can process a few more requests before dropping the connection and blocking the client, but in this case we have no performance degradation:

```
1396 with memory check:
  small: finished in 50.03s, 1308389.70 req/s, 1013.29MB/s
         finished in 50.03s, 1303456.74 req/s, 1010.62MB/s
         finished in 50.03s, 1298829.02 req/s, 1007.03MB/s
  large: finished in 50.08s,  103259.90 req/s,    9.88GB/s
         finished in 50.08s,  111437.44 req/s,   10.66GB/s
         finished in 50.07s,  101983.44 req/s,    9.75GB/s
         finished in 50.07s,  111777.22 req/s,   10.69GB/s

master:
  small: finished in 50.03s, 1330912.18 req/s,    1.01GB/s
         finished in 50.03s, 1343435.90 req/s,    1.02GB/s
         finished in 50.03s, 1344150.06 req/s,    1.02GB/s
  large: finished in 50.04s,   98945.70 req/s,    9.46GB/s
         finished in 50.04s,   96055.18 req/s,    9.19GB/s
         finished in 50.04s,  102439.50 req/s,    9.80GB/s
```