Mekhanik-evgenii/fix-1346 #2456
Conversation
Force-pushed from a62787c to c9bba36
```c
__u8 pfmemalloc = skb->pfmemalloc;

WARN_ON_ONCE(skb->sk);
skb_orphan(skb);
```
Please pay attention to this place. Here we release the skb owner and decrease client->mem. The function ss_skb_init_for_xmit is called before pushing the skb to the socket write queue, so the skbs in the socket write queue are not taken into account in the client memory calculation. We release the skb owner here because, if we don't, we need a rather big kernel patch to adjust skb memory before it is passed to the socket write queue. @krizhanovsky @const-t what do you think about it?
Why don't we keep a pointer to the client accounting in skb->cb instead of playing with skb_orphan()? I'd prefer to avoid the latter, since we can get plenty of crashes in this patch or in later kernel version migrations due to breaking the kernel's logic about orphaned skbs.
Force-pushed from c9bba36 to 26e3525
Force-pushed from 40654f8 to e2de424
Force-pushed from 7b5e367 to ac06de7
```diff
 static inline int
-tfw_cli_mem_init(TfwClientMem *cli_mem)
+tfw_cli_mem_init(TfwClientMem *cli_mem, bool from_pool)
```
Maybe just passing flags would be better? But it's not important, I'm OK with it.
const-t
left a comment
LGTM, however a review from @krizhanovsky is required, especially to approve the new approach to memory limit checking.
krizhanovsky
left a comment
LGTM for now. I listed all the required to-dos in #1715 (comment) and made a small commit to adjust the coding style, so as not to pollute the comments. Let's address the comments in a new PR - it has become too hard to review this one - let's just merge it for now.
```c
		  conn->peer, skb->truesize);
}

r = frang_client_mem_limit((TfwCliConn *)conn, false);
```
@const-t is this comment still relevant? I see, for example, that ss_skb_adjust_data_len() is called from tfw_hpack_cache_decode_expand() -> tfw_http_msg_expand_data(), i.e. the current code does account HTTP responses.
```c
BUG_ON(!conn);
BUG_ON(!list_empty(&conn->list));
BUG_ON(conn->stream.msg);
BUG_ON(conn->write_queue);
```
The point was to put this comment in the code, not in the PR.
```c
if (tfw_runstate_is_reconfig())
	return;

tfw_client_free_lru();
```
It seems the content of commit a3ad176 and its message are about different things
client_mem <soft_limit> <hard_limit> - controls how much memory is used to store unanswered client requests and requests with linked responses which cannot be forwarded to a client.
Account for FRAME_HEADER_SIZE in the send window calculation when making frames. (There was an accuracy mistake in the send window calculation: we didn't take into account that each frame also carries a frame header.)
To track socket memory we should pass TfwHttpMsg * rather than TfwMsgIter * to most of the http_msg_* functions, because TfwHttpMsg has a pointer to the connection and socket.
In task #498 we decided to use the `client_mem` option to limit the amount of memory used by a client. This commit is a part of this task - now Tempesta FW uses `sk->sk_rmem_alloc` to account for the memory used by Tempesta FW for this client connection.
In task #498 we decided to use the `client_mem` option to limit the amount of memory used by a client. This commit is a part of this task and the next step of the implementation. Previously Tempesta FW used `sk->sk_rmem_alloc` to account for the memory used by Tempesta FW for a single client connection; now we account memory for the whole TfwClient, because there can be a lot of connections for one client. In all other cases we apply the limit to the TfwClient and block it if necessary.
If the administrator specifies `client_mem` and the memory used by all connections of the current client exceeds this value, Tempesta FW drops the connection and blocks the client by IP if `ip_block on;` is specified.
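The enforcement described above can be sketched as a small decision function. This is a minimal illustration, not the real frang code: the verdict enum, the function name, and the assumption that only the hard limit combined with `ip_block on;` escalates to an IP block are all hypothetical readings of the commit message.

```c
#include <assert.h>

/* Hypothetical verdicts for a client memory check. */
typedef enum {
	CLI_MEM_OK,	/* under the limit, keep going */
	CLI_MEM_DROP,	/* over the limit: drop the connection */
	CLI_MEM_BLOCK,	/* over the hard limit with ip_block on: block by IP */
} cli_mem_verdict;

/*
 * Illustrative check of a client's memory consumption against the
 * client_mem <soft_limit> <hard_limit> pair; exact semantics of the
 * two thresholds are an assumption here.
 */
static cli_mem_verdict
cli_mem_check(long used, long soft, long hard, int ip_block)
{
	if (used <= soft)
		return CLI_MEM_OK;
	if (used > hard && ip_block)
		return CLI_MEM_BLOCK;
	return CLI_MEM_DROP;
}
```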
Previously we took a reference to the connection when we adjusted memory for an skb, but this leads to several problems:
- We can't adjust memory for an skb before TLS decryption, because skbs from `tls->io_in.skb_list` are freed when the connection is released (but the connection will never be released if we increment its reference counter for these skbs).
- We have the same problem for skbs which wait for an appropriate TCP window before being pushed into the socket write queue.
Now we increment/decrement the reference counter of the TfwClient and adjust skb memory for requests before TLS decryption.
Previously we adjusted the TCP send window only for HTTP/2 connections and only while making HEADERS or DATA frames, but if we want to control client memory usage we should do it for all types of sent data. (We orphan the skb and decrease memory usage when we pass the skb to the socket write queue, so if we don't adjust the TCP send window we push a lot of skbs into the socket write queue without adjusting their memory.)
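The frame-header correction mentioned in these two commit messages amounts to the following arithmetic: each HTTP/2 frame spends 9 bytes of the send window on its header (a real HTTP/2 constant, RFC 9113), so the usable payload is less than the raw window. The sketch below is an illustrative calculation, not Tempesta FW's actual framing code; `max_payload` is an invented helper name.

```c
#include <assert.h>

#define FRAME_HEADER_SIZE 9UL	/* HTTP/2 frame header length, RFC 9113 */

/*
 * How much payload fits into a send window of snd_wnd bytes when
 * every frame of up to max_frame_sz payload bytes also carries a
 * FRAME_HEADER_SIZE header (illustrative sketch).
 */
static unsigned long
max_payload(unsigned long snd_wnd, unsigned long max_frame_sz)
{
	unsigned long per_frame = FRAME_HEADER_SIZE + max_frame_sz;
	unsigned long full = snd_wnd / per_frame;	/* full frames */
	unsigned long rest = snd_wnd % per_frame;
	unsigned long payload = full * max_frame_sz;

	/* A trailing partial frame still pays for one header. */
	if (rest > FRAME_HEADER_SIZE)
		payload += rest - FRAME_HEADER_SIZE;
	return payload;
}
```

Without the correction, a 100-byte window would be treated as 100 bytes of payload; with it, a single frame in that window carries at most 91 payload bytes.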
- Remove the `client_get_light/client_put_light` functions: after removing the lock from the `client` structure we don't need them at all.
- Account the memory usage of an skb in `skb->cb`. Usually it is equal to `skb->truesize`, but in some cases (an skb created by `pskb_copy_for_clone`, for example) it is different.
Do not use `skb->sk` and `skb->destructor` to track the memory used by an skb; use `skb->cb` for this purpose.
- Implement our own version of `skb_orphan` named `ss_skb_orphan`, which is called when an skb is freed in Tempesta FW code or just before pushing the skb into the socket write queue.
- Implement wrappers over `__kfree_skb` and `kfree_skb` which call `ss_skb_orphan` before freeing the skb.
- Check that an skb has been pushed into the socket write queue, using the newly implemented Linux kernel function `skb_tfw_is_in_socket_write_queue`, to skip adjusting the memory used by the skb when it belongs to the kernel (when `ss_skb_*` functions are called from `tls_encrypt`).
Usually we use the callbacks which are set in `skb->cb` for different purposes, so remove the callbacks which were added in the previous patches and use the callbacks saved in `skb->cb`.
- Since we use a pool for HTTP memory allocation, change the API of all `tfw_pool_*` functions to pass a `TfwClient` and account memory in this structure.
- Remove the `TfwClient` refcounter (it is not used; this could have been done in the previous commits).
- Fix unit tests to check memory accounting and clean up memory after each test, to verify that client memory is equal to zero after a test.
A big performance degradation was found after this patch. During the investigation it turned out that the problem is the use of an atomic counter for client memory accounting. Using a per-CPU array instead of the atomic counter fixes the performance issue.
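The per-CPU fix can be sketched in userspace as follows. This is a minimal illustration of the idea only: the kernel would use `this_cpu_add()` on real per-CPU data rather than an indexed array, and `cli_mem_percpu`, `NR_CPUS` as defined here, and the helper names are assumptions. The hot path touches only the local CPU's slot (no cross-CPU cache-line contention), and only the occasional limit check pays for summing all slots.

```c
#include <assert.h>

#define NR_CPUS 8	/* illustrative; the kernel knows the real count */

typedef struct {
	long cnt[NR_CPUS];	/* one counter slot per CPU */
} cli_mem_percpu;

/* Hot path: update only the current CPU's slot, no atomics needed. */
static void cli_mem_add(cli_mem_percpu *m, int cpu, long bytes)
{
	m->cnt[cpu] += bytes;
}

/* Slow path (limit check): sum all slots; the result may be slightly
 * stale, which is acceptable for a memory limit. */
static long cli_mem_read(const cli_mem_percpu *m)
{
	long sum = 0;
	int i;

	for (i = 0; i < NR_CPUS; i++)
		sum += m->cnt[i];
	return sum;
}
```

The trade-off is exactly the one the commit message describes: reads become O(number of CPUs) and approximate, but the per-packet write path stops bouncing one atomic counter between all cores.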
Previously we removed a client entry from TDB if there was no entry in `client_lru.free_list` and a new client was allocated, even if the removed client still had active connections. There is a bug in this strategy: if the removed client has hung connections, we can't close and destroy them during Tempesta FW unloading, because we close and destroy connections while iterating over the active clients (`tfw_client_for_each`).

In the new strategy we change the logic of `tdb_htrie_put_rec`: we add a pointer to the bucket to the record structure, and when we remove a record we zero this pointer. If the record's reference counter becomes zero but the bucket pointer is still not NULL (the record was not removed), we remove the record from the bucket using this pointer. For clients we just use `tfw_client_put` without removing the record: when the client's reference counter becomes zero, the client record is removed from the bucket and freed.
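The bucket back-pointer strategy can be sketched with a toy singly-linked bucket. These are not the real TDB structures: `record`, `bucket`, and the helpers below are invented, and real `tdb_htrie_put_rec` is lock-protected and far more involved. The point being illustrated is only the ownership rule: a record removed explicitly clears its bucket pointer, and a record whose refcount hits zero while still linked unlinks itself.

```c
#include <assert.h>
#include <stddef.h>

struct bucket;

struct record {
	struct record	*next;
	struct bucket	*bucket;	/* back-pointer; NULL once removed */
	int		refcnt;
};

struct bucket {
	struct record	*head;
};

/* Unlink r from b and zero the back-pointer. */
static void bucket_unlink(struct bucket *b, struct record *r)
{
	struct record **pp = &b->head;

	while (*pp && *pp != r)
		pp = &(*pp)->next;
	if (*pp)
		*pp = r->next;
	r->bucket = NULL;
}

/* Explicit removal: the record leaves the bucket but stays alive
 * while references remain. */
static void record_remove(struct record *r)
{
	if (r->bucket)
		bucket_unlink(r->bucket, r);
}

/* Drop a reference; if it was the last one and nobody removed the
 * record before, it unlinks itself. Returns 1 if the caller may free. */
static int record_put(struct record *r)
{
	if (--r->refcnt)
		return 0;
	if (r->bucket)
		bucket_unlink(r->bucket, r);
	return 1;
}
```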
We can't call `tfw_client_get/put` on each allocated or orphaned skb (or on each pool creation/destruction): under pressure, when we have a lot of CPUs, this leads to atomic contention and a bad performance degradation. To fix this problem we implement a special TfwClientMem structure with its own reference accounting (using `struct percpu_ref`!) and save a pointer to it in the client structure. We use `percpu_ref_tryget`/`percpu_ref_put` during skb allocation/deallocation (it's very cheap). When we destroy a client we schedule a work item, call `percpu_ref_kill_and_confirm` and wait until all skbs are orphaned. Also make some fixes according to the review:
- Call `tfw_client_free` for incomplete records as well.
- Implement `tfw_alloc_percpu_gfp`, the same as `alloc_percpu_gfp` but with error injection.
- Fix memory accounting during copying of skbs.
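The percpu_ref lifecycle being relied on here can be illustrated with a simplified userspace analogue. To be clear about the assumptions: `mini_ref` is not the kernel's `struct percpu_ref` (the real one counts in per-CPU variables until killed, which is exactly why it is cheap); this sketch only reproduces the observable semantics the commit depends on, with plain C11 atomics: tryget fails after kill, and the release callback fires once the last reference is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int release_calls;	/* how many times release fired */

static void on_release(void)
{
	release_calls++;	/* "all skbs orphaned" point */
}

/* Simplified analogue of percpu_ref semantics (not the kernel API). */
typedef struct {
	atomic_long	cnt;
	atomic_bool	dead;
	void		(*release)(void);
} mini_ref;

static void mini_ref_init(mini_ref *r, void (*release)(void))
{
	atomic_init(&r->cnt, 1);	/* bias ref, held until kill */
	atomic_init(&r->dead, false);
	r->release = release;
}

/* Cheap get on the hot path; refuses new refs once killed. */
static bool mini_ref_tryget(mini_ref *r)
{
	if (atomic_load(&r->dead))
		return false;
	atomic_fetch_add(&r->cnt, 1);
	return true;
}

static void mini_ref_put(mini_ref *r)
{
	if (atomic_fetch_sub(&r->cnt, 1) == 1)
		r->release();	/* last reference gone */
}

/* Analogue of percpu_ref_kill_and_confirm: stop new gets, drop the
 * bias; release runs when outstanding refs (skbs) drain. */
static void mini_ref_kill(mini_ref *r)
{
	atomic_store(&r->dead, true);
	mini_ref_put(r);
}
```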
- Use a cache for client mem allocations.
- Use a typedef for the TfwClientMem structure and pass a `TfwClientMem *` pointer instead of `void *` in all functions.
- Make the `client_mem` option reconfigurable.
- Preallocate and initialize TfwClientMem structures according to `client_cfg.lru_size`. During TfwClient structure allocation, try to take a TfwClientMem from the preallocated pool; if the pool is empty, allocate a TfwClientMem from the cache.
- Implement new fault-injection alloc functions to cover the new code.

```
1396:
  small: finished in 50.03s, 1294710.94 req/s, 1002.60MB/s
         finished in 50.03s, 1287197.04 req/s,  998.01MB/s
  large: finished in 50.03s,  103497.42 req/s,    9.90GB/s
         finished in 50.03s,  108665.42 req/s,   10.39GB/s

1396 with client_mem:
  small: finished in 50.03s, 1225390.98 req/s,  948.74MB/s
         finished in 50.03s, 1223275.66 req/s,  947.72MB/s
  large: finished in 50.08s,   78906.58 req/s,    7.55GB/s
         finished in 50.08s,   86201.98 req/s,    8.24GB/s

master:
  small: finished in 50.03s, 1294782.10 req/s, 1002.49MB/s
         finished in 50.03s, 1294782.10 req/s, 1001.33MB/s
  large: finished in 50.04s,   98625.80 req/s,    9.43GB/s
         finished in 50.04s,   97767.22 req/s,    9.35GB/s
```
We should check that the TlsMpiPool was allocated during `ttls_mpool_exit`.
- Remove the SS_* error codes and use only T_* error codes, since we already include "lib/log.h" in all files.
- Split the enum with error codes into two enums (one for common error codes and one for internal ones).
- List all error codes from least crucial to most crucial.
- Implement some useful helper functions.
Do not check client memory consumption during request parsing/HTTP/2 frame processing; check it only at the end of the softirq RX path. Under load we can process a few more requests before dropping the connection and blocking the client, but in this case we have no performance degradation:

```
1396 with memory check:
  small: finished in 50.03s, 1308389.70 req/s, 1013.29MB/s
         finished in 50.03s, 1303456.74 req/s, 1010.62MB/s
         finished in 50.03s, 1298829.02 req/s, 1007.03MB/s
  large: finished in 50.08s,  103259.90 req/s,    9.88GB/s
         finished in 50.08s,  111437.44 req/s,   10.66GB/s
         finished in 50.07s,  101983.44 req/s,    9.75GB/s
         finished in 50.07s,  111777.22 req/s,   10.69GB/s

master:
  small: finished in 50.03s, 1330912.18 req/s,    1.01GB/s
         finished in 50.03s, 1343435.90 req/s,    1.02GB/s
         finished in 50.03s, 1344150.06 req/s,    1.02GB/s
  large: finished in 50.04s,   98945.70 req/s,    9.46GB/s
         finished in 50.04s,   96055.18 req/s,    9.19GB/s
         finished in 50.04s,  102439.50 req/s,    9.80GB/s
```