-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework HTTP message forwarning #1955
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need several small fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just couple of comments
4933987
to
97db973
Compare
76aeaa3
to
2ce9209
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request is an impressive amount of work, but I think there are still things to optimize, maybe we just need some discussions - please have a look onto my comments and let's discuss them.
I reviewed https://tempesta-tech.com/knowledge-base/Modify-HTTP-Messages/ regarding headers appending and it seems now we don't have ,
-appending and there is no actual logic in the code for appending. Again - let's discuss.
fw/cache.c
Outdated
const bool h2 = TFW_MSG_H2(req), tls = TFW_CONN_TLS(req->conn); | ||
const bool sh_frag = h2 ? false : (true & tls); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the same as
const bool h2 = TFW_MSG_H2(req), tls = TFW_CONN_TLS(req->conn); | |
const bool sh_frag = h2 ? false : (true & tls); | |
const bool sh_frag = !TFW_MSG_H2(req) && TFW_CONN_TLS(req->conn); |
?
|
||
it->hdrs_len += h_app.len; | ||
return tfw_strcat(req->pool, orig_hdr, &h_app); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now append
is used for the -ENOENT
check only, i.e. it does not affect any logic, so __h2_req_hdrs()
should not accept append
.
Moreover, now it seems nobody actually use TfwHdrModsDesc->append
, so the whole logic in http.c and http_msg.c could be simplified by removing the flag processing. tfw_http_msg_hdr_xfrm()
is also called in only one place with append = 0
.
In my understanding now the wiki page is incorrect in (@RomanBelozerov could you also please confirm with tests?): for resp_hdr_add Cache-Control "no-cache"
it seems now we get
Cache-Control: no-store
Cache-Control: no-cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checking append before calling __h2_req_hdrs()
is reasonable and I will do it. But I can't remove TfwHdrModsDesc->append
, this flag indicates type of operation: resp_hdr_add or resp_header_set. If append == 0
we use resp_header_set otherwise resp_hdr_add. append
means: Header must be appended to message. We can choose another name for the flag if it confusing.
tfw_http_msg_hdr_xfrm_str()
has a little bit of garbage code, I'll rewrite it.
const TfwStr *c, *end; | ||
unsigned int room, skb_room, n_copy, rlen, off, acc = 0; | ||
TfwMsgIter *it = &hm->iter; | ||
TfwPool* pool = hm->pool; | ||
|
||
BUG_ON(it->skb->len > SS_SKB_MAX_DATA_LEN); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the function is called a lot of times, so it should be well optimized. I think it makes sense to make if (unlikely(skb_headlen(it->skb)))
since even if we have a lot of such skbs, then only the first call hits the condition and all following calls go further. The inner if
can eliminate unliekly
- it's no sense to have 2 nested unlikely statements.
Also probably it makes sense to add if (likely(TFW_STR_PLAIN(s)))
to TFW_STR_FOR_EACH_CHUNK_INIT()
and unlikely(room == 0)
in the while
loop at the below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment will be addressed in #2136
fw/http_msg.c
Outdated
@@ -1336,32 +1287,30 @@ tfw_http_msg_expand_data(TfwMsgIter *it, struct sk_buff **skb_head, | |||
return 0; | |||
} | |||
|
|||
static int | |||
tfw_http_msg_alloc_from_pool(TfwHttpTransIter *mit, TfwPool* pool, size_t size) | |||
static char * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function returns an address in memory, not an array of chars, so I'd propose to make it void *
fw/http.c
Outdated
if (req->cleanup) { | ||
__tfw_http_msg_cleanup(req->cleanup); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (req->cleanup) { | |
__tfw_http_msg_cleanup(req->cleanup); | |
} | |
if (req->cleanup) | |
__tfw_http_msg_cleanup(req->cleanup); |
fw/http.c
Outdated
req->cleanup = tfw_pool_alloc(hm->pool, sizeof(TfwHttpMsgCleanup)); | ||
if (unlikely(!req->cleanup)) | ||
return -ENOMEM; | ||
memset(req->cleanup, 0, sizeof(TfwHttpMsgCleanup)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have pages_sz
to check whether we have any pages in the data structures, so why do we do memset()
instead of just to zeroize pages_sz
and maybe skb_head
?
if (!cache) | ||
r = tfw_http_msg_expand_from_pool((TfwHttpMsg *)resp, &hdr); | ||
else | ||
r = tfw_http_msg_expand_data(&resp->iter, skb_head, &hdr, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to add a TODO #634
comment. The function is relatively heavyweight operating with skbs and probably it makes sense to reconstruct all the headers from the cache also into TfwPool allocated area.
Please review #634 (comment) and the whole issue - maybe you have more thoughts after the pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment will be addressed in #2136
} | ||
|
||
if (tfw_http_hdr_sub(hid, hdr, h_mods)) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we make the headers substitution here? The function is just traverses the list of the headers to be substituted, but doesn't do anything useful. The same for tfw_http_adjust_resp()
. If it looks too big, then let's add a comment in the code and in #634 to do it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Headers substitution splitted into two parts: first tfw_http_hdr_sub()
where we just skip header and don't add it to message, because we iterating over headers not over list of modifications. Second tfw_h1_add_loc_hdrs()
where we adding all headers. We don't do it in tfw_http_hdr_sub()
because in this case we need add additional if condition into tfw_h1_add_loc_hdrs()
to skip headers if append == 0
- resp/req_header_set case.
Name tfw_http_hdr_sub()
looks little bit confusing, I will rename it.
AUTO_SEGS_N not changed after #1103 fix, because most of responses has large size and with this value (8) we can cover higher percentage of responses
Now we use cleanup field instead of `old_head`, cleanup also used during http1 request adjusting.
Allowed to use unknown method for headers that stored in dynamic table. Before if tempesta received such request it was dropped. Unknown method is method that processed as _TFW_HTTP_METH_UNKNOWN.
`tfw_http_add_hdr_clen()` replaced by new function `tfw_http_resp_term_add_hdr_clen()`. Now it constructs content-length header into headers table, without skb modification as it done before. The headers copies to response in the tfw_http_adjust_resp() along with other headers. `tfw_http_msg_hdr_xfrm()` - deleted, because it's un- used. Some of functions that were used only by `tfw_http_msg_hdr_xfrm()/tfw_http_msg_hdr_xfrm_str()` not deleted, because we consider them as library functions. For instance: `tfw_strcpy_comp_ext()`. `ss_skb_get_room()` could be lib function, but it's just a wrapper for `ss_skb_get_room_w_frag()`. `__hdr_is_singular()` was used during adding headers configured by administrator, but we treat adding singular RAW headers via `resp_hdr_add` as misconfiguration, there is no sense to check it for each header.
In this patch we optimize header skipping logic that applies in header modification for HTTP1 request/re- sponse and HTTP2 response including cache. Overview of current behavior. It consists of two parts: skipping and addition. During response transformation or copying we skipping headers that must be substituted or deleted using `tfw_http_hdr_skip()`, when all headers from original response are copied we add new headers (directive (resp/req)_hdr_(add/set)) and add headers that must be substituted ((resp/req)_hdr_set). What have done: 1. `tfw_hdr_mods_t::spec_hdrs` turned into bitmap. 2. `tfw_hdr_mods_t::spec_hdrs` contains used only for *_hdr_set directives, headers related to *_hdr_add must not be skipped, thus excluded from `spec_hdrs`. 2. `tfw_http_hdr_skip()` only iterates over *_hdr_set headers and doesn't check `append` flag. 3. added `tfw_hdr_mods_t::scan_off`, this is offset where we start itarating headers and comparing the name. It's the case when modification not found by special header index or hpack index. 4. changed the layout of `tfw_hdr_mods_t::hdrs`, now we storing headers in following way: .----------------------------------------. | `tfw_hdr_mods_t::hdrs` | :----------------------------------------: | Special headers *hdr_set | :----------------------------------------: | Hpack indexed raw headers *hdr_set | :----------------------------------------: | Non-indexed raw headers *hdr_set | :----------------------------------------: | All headers *hdr_add | '----------------------------------------' due this structure we can iterate only over unindexed raw headers. Note: Looks like iterating over headers modification table especially for removing is more profit approach, because if modification applied we will not iterate over this opearation again, we using it during http2 request headers modification, but it has disadvantages. For instance we must realloc header table or split skipping and adition(similar with current approach) and such approach can't be used with current cache architecture.
Just fix to commit with logic unifying, this patch adds abillity to adding headers to HTTP2 request, without this header will be substituted even if req_hdr_add directive was used.
Add a new method intended to close parsed HTTP1.1 method. `tfw_http_msg_hdr_close()` is not suitable in this case, it has a lot of logic that unnecessary for method.
fix 1: Skip trailer headers during copying regular headers to not duplicate them in headers and in trailer. fix 2: Call `mark_spec_hbh` for headers with only LF in the end of a message. fix 3: Headers `connection` and `keep-alive` marked as hop-by-hop during parsing, it's allowed to remove following conditions: ``` if (hid == TFW_HTTP_HDR_KEEP_ALIVE || hid == TFW_HTTP_HDR_CONNECTION) ``` do only simple `if (tgt->flags & TFW_STR_HBH_HDR)`. Before this patch `keep-alive` was marked as HBH, but only if `keep-alive` specified in `connection` header. RFC 9110 7.6.1 say that intermediaries SHOULD remove or replace such headers even if the not listed in Connection.
We had a bug during the suffix match in function `tfw_http_search_cookie()`, before comparing the name we store current chunk `t = *chunk`, then modify and compare the chunk, after we rostore the chunk `*chunk = t;` that leads to header corruption, because macros `TFW_STR_EQ_CSTR` may assign new address to `chunk`, therefore restoration happens to a newly assigned adress instead of original adress in `chunk`. Fixed by storing address to temporary variable and then restore `data` and `len` by this adress. `__tfw_str_eq_cstr` rewritten to inline function. Inline functions is more clear, less error prone and suitable in this case, because we don't need to define tmp vars and explicitly save values, only move `chunk` pointer forward. Generated assembly will be the same.
We must copy a mark from old skb head to new, otherwise we will lose the mark.
Call `mark_spec_hbh` only for parsed special header and do not iterate over headers table.
Validating a syntax of the `Host` header, to catch protocol violations.
- return replaced with `goto clean` in `tfw_h1_adjust_req()`
Before this patch we compared the Host header and uri's host as strings including port part. However, in case when explicitly specified default port in one of them, but not specified in the second we had error. Example: uri: tempesta-tech.com host: tempesta-tech.com:80 In this case uri and host must be treated as the same, due to 80 port is default and can be omitted. In this patch we parse uri and host in the same way, we mark host part from the uri and from Host header as TFW_STR_VALUE excluding the port (e.g ":80") part. Port we parse and save into `TfwHttpReq::uri_port` for uri port and into `TfwHttpReq::host` for host respectively. Then in `frang_http_host_check()` we compare host parts that marked as TFW_STR_VALUE (use ":" as stop symbol) and separately compare ports.
This commit adds special case for setting Host header in the request using `req_hdr_set` directive. We simply write the new Host header to the request skipping the old one during request adjusting. However we don't update original Host header in headers table, to be able to use original header. We need original header during caching response, in this case we make the key using original Host header. We do so, because we always find response in the cache using original host header.
Add this overriding it exists in original version of tfw_h1_adjust_req().
In case when request has absolute URI that looks like this: `http://tempesta-tech.com` - without trailing slash, `uri_path` will be empty, this commit handles this case.
Userinfo subcomponent is depricated by RFC 9110 4.2.4 Empty host in URI prohibited by RFC 9110 4.2.1. Example of URI with empty host: "http:///path"
Free cleanup structure in safe way, to prevent use-after-free. We free clean structure immediately after error to get free memory to create the response.
Write method string directly to access log instead of using method id and table with static names. It gives an abillity to write to log methods that not listed in `tfw_http_meth_t`.
Separate HEAD to GET from PURGE to GET logic
HTTP part of #1103.