Skip to content

Mekhanik evgenii/fix 1346 1#2456

Open
EvgeniiMekhanik wants to merge 23 commits intomasterfrom
MekhanikEvgenii/fix-1346-1
Open

Mekhanik evgenii/fix 1346 1#2456
EvgeniiMekhanik wants to merge 23 commits intomasterfrom
MekhanikEvgenii/fix-1346-1

Conversation

@EvgeniiMekhanik
Copy link
Copy Markdown
Contributor

No description provided.

@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-1346-1 branch from a62787c to c9bba36 Compare July 2, 2025 09:46
Comment thread fw/ss_skb.c Outdated
__u8 pfmemalloc = skb->pfmemalloc;

WARN_ON_ONCE(skb->sk);
skb_orphan(skb);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pay attention on this place. Here we release skb owner and decrease client->mem. This function ss_skb_init_for_xmit is called before push skb to the socket write queue. So all skbs in socket write queue are not taken into account for client memory calculation. We release skb owner here, because if don't do it we need a rather big kernel patch to adjust skb memory before it will be passed to socket write queue. @krizhanovsky @const-t what do you think about it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we make a pointer to a client accounting in skb->cb instead of to play with skb_orphan()? I'd prefer to avoid this since we can get plenty of crashes in this patch or in later kernel version migrations due to breaking kernel logic about orphaned skbs.

@krizhanovsky krizhanovsky mentioned this pull request Jul 7, 2025
2 tasks
@EvgeniiMekhanik EvgeniiMekhanik marked this pull request as draft July 8, 2025 09:12
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-1346-1 branch from c9bba36 to 26e3525 Compare July 8, 2025 09:12
Comment thread fw/http_limits.h Outdated
Comment thread fw/sock_clnt.c
Comment thread fw/ss_skb.h Outdated
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-1346-1 branch 16 times, most recently from 40654f8 to e2de424 Compare July 11, 2025 14:09
@EvgeniiMekhanik EvgeniiMekhanik marked this pull request as ready for review July 11, 2025 14:09
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-1346-1 branch 5 times, most recently from 7b5e367 to ac06de7 Compare July 14, 2025 21:12
Comment thread fw/client.c
Comment thread fw/client.c Outdated
Comment thread fw/client.c Outdated
Comment thread fw/client.c Outdated
Comment thread fw/client.c
Comment thread fw/pool.h Outdated
Comment thread lib/log.h Outdated
Comment thread lib/log.h Outdated
Comment thread lib/log.h Outdated
Comment thread lib/log.h Outdated
Comment thread lib/log.h Outdated
Comment thread fw/client.c Outdated
Comment thread fw/client.c Outdated
Comment thread fw/client.c Outdated

static inline int
tfw_cli_mem_init(TfwClientMem *cli_mem)
tfw_cli_mem_init(TfwClientMem *cli_mem, bool from_pool)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just pass flags would be better? But it is not important I'm ok with it

Copy link
Copy Markdown
Contributor

@const-t const-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, however review from @krizhanovsky is required, especially to approve new approach with memory limit checking.

Copy link
Copy Markdown
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for now. I listed all required to dos in #1715 (comment) and made a small commit to adjust coding style to not to pollute the comment. Let's address the comments in a new PR - it becomes to hard to review this PR - let's just merge it for now.

Comment thread fw/http.c Outdated
conn->peer, skb->truesize);
}

r = frang_client_mem_limit((TfwCliConn *)conn, false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@const-t is this comment still relevant? I see for example, that ss_skb_adjust_data_len() is called fromtfw_hpack_cache_decode_expand() -> tfw_hpack_cache_decode_expand() -> tfw_http_msg_expand_data(), i.e the current code does account HTTP responses

Comment thread fw/http_frame.c Outdated
Comment thread fw/connection.h
BUG_ON(!conn);
BUG_ON(!list_empty(&conn->list));
BUG_ON(conn->stream.msg);
BUG_ON(conn->write_queue);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point was to comment this in code, not in PR

Comment thread fw/client.c Outdated
if (tfw_runstate_is_reconfig())
return;

tfw_client_free_lru();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the content of commit a3ad176 and its message are about different things

EvgeniiMekhanik and others added 23 commits April 28, 2026 13:08
client_mem <soft_limit> <hard_limit> - controls haw many
memory is used to store unanswered client requests and
requests with linked responses which can not be forwarded
to a client.
Adjust FRAME_HEADER_SIZE during calculation of send window
during making frames. (There was a mistake with accuracy
of send window calculation, we don't take into account,
that each frame also contains frame header).
To track socket memory we should pass TfwHttpMsg * not
TfwMsgIter * to most of http_nsg_* functions, because
TfwHttpMsg has a pointer to connection and socket.
In task #498 we decide to use `client_mem` option to limit
count of memory used by client. This commit is a part of
this task - now Tempesta FW uses `sk->sk_rmem_alloc` to
adjust memory used by Tempesta FW for this client connection.
In task #498 we decide to use `client_mem` option to limit
count of memory used by client. This commit is a part of
this task and the next step of implementaion. Previosly
Tempesta FW uses `sk->sk_rmem_alloc` to adjust memory
used by Tempesta FW for this client connection, now we
adjust memory for the whole TfwClient, because the can
be a lot of connection for one client and for all other
cases we use limitation for TfwClient and block it if
necessary.
If administrator specify `client_mem` and the memory
used by all connection of current client exceeded this
value Tempesta FW drops connection and block client
by ip if `ip_block on;` is specified.
Previosuly we get connection when we adjust memory
for skb, but it leads to several problems:
- we can't adjust memory for skb before tls decryption,
because skb from `tls->io_in.skb_list` are freed
during connection released (but connection will be
never released if we increment it's reference counter
for these skbs).
- We have the same problems for skbs, which are wait
for appropriate tcp window to be pushed in socket
write queue.
Now we increment/decrement reference counter for
TfwClient and adjust skb memory for requests before
tls decryption.
Previously we adjust tcp send window only for http2
connection and only during making HEADER or DATA frames,
but if we want to control client memory usage we should
do it for all type of sending data. (We orphane skb and
decrease memory usage when we pass skb to the socket
write queue, so we we don't adjust tcp send window we
push a lot of skbs in socket write queue and don't
adjust it's memory).
- remove `client_get_light/client_put_light` functions,
  because after removing lock from `client` structure
  we don't need these functions at all.
- Adjust memory usage of skb in `skb->cb`. Usually it
  is equal to `skb->truesize, but for some cases (
  skb which was created by `pskb_copy_for_clone` for
  example it is different).
Do not use `skb->sk` and `skb->destructor` to check
memory used by skb, use `skb->cb` for this purposes.
- Implement our own version of `skb_orphan` with
name `ss_skb_orphan` which is called when skb is freed
in Tempesta FW code our just before pushing skb to
socket write queue.
- Implement wrappers over `__kfree_skb` and `kfree_skb`
  where we call `ss_skb_orphan` before free skb.
- Check that skb is pushed to socket write queue, using
  new ipmlemented function `skb_tfw_is_in_socket_write_queue`
  from linux kernel, to skip adjusting memory used be skb,
  when it belongs to kernel (when `ss_skb_*` functions
  called from `tls_encrypt`).
- Usually we use callbacks which are set in `skb->cb`
  for different purposes. So remove to callbacks, which
  was added in previous patches and use callbacks saved
  in `skb->cb`.
- Since we use pool for http memory allocation, change
  api of all `tfw_pool_*` functions to pass `TfwClient`
  and accounting memory in this structure.
- Remove `TfwClient` refcounter (it not used, can be done
  in previous commits).
- Fix unit tests to check memory accounting, cleanup memory
  after each test, to check that client memory is equal to
  zero after test.
A big performance degradation was found after this patch.
During investigation it was found that the problem is in
usage atomic counter for client mem accounting. Usage per_cpu
array instead of atomic counter fix a performance issue.
Previously we remove client entry from TDB if there
is no entry in `client_lru.free_list` and new client
is allocated, even if such removed client still have
any active connections. There is a BUG in such strategy -
if this removed client has hung connections, we can't
close and destroy them during Tempesta FW unloading,
because we close and destroy connections during iteration
through active clients (`tfw_client_for_each`).
In new strategy we change logic in `tdb_htrie_put_rec`.
We add pointer to the bucket in the record structure. When we
remove record we zeroed this pointer. If record reference counter
became equal to zero, but bucket pointer is still not NULL (record
was not removed) we remove such record from the bucket using
this pointer. For clients we just use tfw_client_put, without
record removing, when  client reference counter became equal
to zero client record will be removed from bucket and freed.
We can't call tfw_client_get/put on each allocated
or orphaned skb. (Or each pool creation/destroing).
Under pressure when we have a lot of cpus that lead
to atomic contention and bad performance degradation.
To fix this problem we implement special TfwClientMem
structure, with it's own reference accounting (using
struct percpu_ref!) and save in the client structure
point to it. We use percpu_ref_tryget/percpu_ref_put
during skb allocation/deallocation (it's very cheap).
When we destroy client we schedule work, call
`percpu_ref_kill_and_confirm` and wait until all skbs
will be orphaned.
Also make some fixes according review:
- Call `tfw_client_free` for incomplete records also.
- Implement `tfw_alloc_percpu_gfp` same as `alloc_percpu_gfp`
  but with error injection
- Fix memory accouting during copying skbs.
- Use cache for client mem allocations
- Use typedef for TfwClientMem structure and pass
  TfwClientMem * pointer instread of void * in all
  functions.
- Make client_mem option reconfigurable.
- Preallocate and initialize TfwClientMem structures
  according to client_cfg.lru_size. During TfwCLient
  structure allocation try to get TfwClientMem from
  preallocated pool. If this pool is empty alloc
  TfwClientMem from cache.
- Implement new fault injection alloc functions to
  cover new code.

1396:
small:
finished in 50.03s, 1294710.94 req/s, 1002.60MB/s
finished in 50.03s, 1287197.04 req/s, 998.01MB/s
large:
finished in 50.03s, 103497.42 req/s, 9.90GB/s
finished in 50.03s, 108665.42 req/s, 10.39GB/s
1396 with client_mem:
small:
finished in 50.03s, 1225390.98 req/s, 948.74MB/s
finished in 50.03s, 1223275.66 req/s, 947.72MB/s
large:
finished in 50.08s, 78906.58 req/s, 7.55GB/s
finished in 50.08s, 86201.98 req/s, 8.24GB/s
master:
small:
finished in 50.03s, 1294782.10 req/s, 1002.49MB/s
finished in 50.03s, 1294782.10 req/s, 1001.33MB/s
large:
finished in 50.04s, 98625.80 req/s, 9.43GB/s
finished in 50.04s, 97767.22 req/s, 9.35GB/s
We should check that TlsMpiPool was allocated during
`ttls_mpool_exit`
- Remove SS_* errocodes, use only T_* error codes, since we already
  include "lib/log.h" in all files.
- Split enum with error codes to two enums (one for
  common error codes and one for internal).
- List all error codes from least crusial to most crusial
- Implement some useful functions.
Don not check client memory consumption during parsing
requests/http2 frame processing, check it only at the
end of softirq rx path. Under load we can process little
bit more requests, before dropping connection and block
client but in this case we have no perfomance degradation:
1396 with memory check:
small:
finished in 50.03s, 1308389.70 req/s, 1013.29MB/s
finished in 50.03s, 1303456.74 req/s, 1010.62MB/s
finished in 50.03s, 1298829.02 req/s, 1007.03MB/s
large:
finished in 50.08s, 103259.90 req/s, 9.88GB/s
finished in 50.08s, 111437.44 req/s, 10.66GB/s
finished in 50.07s, 101983.44 req/s, 9.75GB/s
finished in 50.07s, 111777.22 req/s, 10.69GB/s
master:
small:
finished in 50.03s, 1330912.18 req/s, 1.01GB/s
finished in 50.03s, 1343435.90 req/s, 1.02GB/s
finished in 50.03s, 1344150.06 req/s, 1.02GB/s
large:
finished in 50.04s, 98945.70 req/s, 9.46GB/s
finished in 50.04s, 96055.18 req/s, 9.19GB/s
finished in 50.04s, 102439.50 req/s, 9.80GB/s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants