= Name =
chunkin-nginx-module - HTTP 1.1 chunked-encoding request body support for Nginx.
This module is not distributed with the Nginx source. See the installation instructions.
This module is considered production ready.
This document describes chunkin-nginx-module v0.21 released on August 3, 2010.
<geshi lang="nginx">
chunkin on;

error_page 411 = @my_411_error;
location @my_411_error {
    chunkin_resume;
}

location /foo {
    # your fastcgi_pass/proxy_pass/set/if and
    # any other config directives go here...
}
...
</geshi>
<geshi lang="nginx">
chunkin on;

error_page 411 = @my_411_error;
location @my_411_error {
    chunkin_resume;
}

location /bar {
    chunkin_keepalive on;  # WARNING: too experimental!

    # your fastcgi_pass/proxy_pass/set/if and
    # any other config directives go here...
}
</geshi>
This module adds HTTP 1.1 chunked input support for Nginx without the need of patching the Nginx core.
Behind the scenes, it registers an access-phase handler that eagerly reads and decodes incoming request bodies when a Transfer-Encoding: chunked header triggers a 411 error page in Nginx. For requests that do not use the chunked transfer encoding, this module is a "no-op".
To enable the magic, just turn on the chunkin config option and define a custom 411 error_page using chunkin_resume, as in the examples above.
No other modification is required in your nginx.conf file, and everything should work out of the box, including the standard proxy module (except for those known issues). Note that the chunkin directive is not allowed in location blocks, while the chunkin_resume directive is only allowed in location blocks.
The core module's client_body_buffer_size, client_max_body_size, and client_body_timeout directive settings are honored. Note that the "body sizes" here always refer to the chunked-encoded body, not to the data after decoding. The chunked-encoded body is always slightly larger than the original, unencoded data because of the chunk-size lines and delimiters.
The client_body_in_file_only and client_body_in_single_buffer settings are followed partially. See Known Issues.
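As a rough sketch of how these core settings sit next to this module, the fragment below sets the three honored directives for one location. The /upload location, the sizes, and the timeout are illustrative assumptions, not recommendations. Because the limits apply to the chunked-encoded body, the example leaves some headroom above the largest decoded payload it expects.
<geshi lang="nginx">
server {
    # chunkin is only allowed at the http and server levels
    chunkin on;

    error_page 411 = @my_411_error;
    location @my_411_error {
        chunkin_resume;
    }

    location /upload {                 # hypothetical location
        client_body_buffer_size 128k;  # illustrative value
        client_max_body_size    11m;   # headroom over ~10 MB of decoded data
        client_body_timeout     30s;   # illustrative value

        # your fastcgi_pass/proxy_pass directives go here...
    }
}
</geshi>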
This module is not supposed to be merged into the Nginx core because I've used Ragel to generate the chunked encoding parser for joy :)
Nginx explicitly checks for a chunked Transfer-Encoding header and an absent Content-Length header at a very early stage, as early as the ngx_http_process_request_header function. So this module takes a rather tricky approach: it uses an output filter to intercept the 411 Length Required error page response issued by ngx_http_process_request_header, fixes things up, and finally issues an internal redirect to the current location, thus starting from those phases we all know and love, this time bypassing the horrible ngx_http_process_request_header function.
In the rewrite phase of the newly created request, this module eagerly reads in the chunked request body in a way similar to that of the standard ngx_http_read_client_request_body function, but using its own chunked parser generated by Ragel. The decoded request body is put into r->request_body->bufs and a corresponding Content-Length header is inserted into r->headers_in.
Those modules using the standard ngx_http_read_client_request_body function to read the request body will just work out of the box, because ngx_http_read_client_request_body returns immediately when it sees that r->request_body->bufs already exists.
Special efforts have been made to reduce data copying and dynamic memory allocation.
syntax: chunkin on|off
default: off
context: http, server
Enables or disables this module's hooks.
syntax: chunkin_resume
default: none
context: location
This directive must be used in your custom 411 error page location to help this module work correctly. For example:
<geshi lang="nginx">
error_page 411 = @my_error;
location @my_error {
    chunkin_resume;
}
</geshi>
For the technical reason behind the necessity of this directive, please read the nginx-devel thread "Content-Length is not ignored for chunked requests: Nginx violates RFC 2616".
This directive was first introduced in the v0.17 release.
syntax: chunkin_max_chunks_per_buf <number>
default: 512
context: http, server, location
Sets the maximum chunk count threshold for the buffer whose size is determined by the client_body_buffer_size directive.
If the average chunk size is 1 KB and your client_body_buffer_size setting is 1 megabyte, then about 1024 chunks fit into the buffer, and you should set this threshold to 1024 or 2048.
When the raw body size exceeds client_body_buffer_size or the chunk counter exceeds this chunkin_max_chunks_per_buf setting, the decoded data is temporarily buffered into disk files, and then the main buffer is cleared and the chunk counter is reset to 0 (or 1 if there's a "pending chunk").
This directive was first introduced in the v0.17 release.
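The sizing rule above can be sketched as a configuration fragment. This is a minimal example that assumes clients send chunks of roughly 1 KB on average, which is a property of the traffic and not something the configuration itself can state. The values simply mirror the arithmetic in the text (1 MB / 1 KB = 1024 chunks per buffer).
<geshi lang="nginx">
chunkin on;

error_page 411 = @my_411_error;
location @my_411_error {
    chunkin_resume;
}

# Assumption: the average chunk size is about 1 KB.
client_body_buffer_size 1m;

# 1 MB / 1 KB = 1024 chunks fit into the buffer; 2048 leaves room
# for chunks that are smaller than average.
chunkin_max_chunks_per_buf 2048;
</geshi>
If the chunks turn out to be much smaller than assumed, the chunk counter reaches the threshold before the buffer fills, so the decoded data is buffered to disk earlier than the buffer size alone would suggest.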
syntax: chunkin_keepalive on|off
default: off
context: http, server, location, if
Turns on or turns off HTTP 1.1 keep-alive and HTTP 1.1 pipelining support.
Keep-alive without pipelining should be quite stable but pipelining support is very preliminary, limited, and almost untested.
This directive was first introduced in the v0.07 release.
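A minimal sketch of scoping this directive, given that it is accepted at the http, server, location, and if levels. The /api and /legacy locations are hypothetical, and the idea that a location-level setting overrides the server-level one is an assumption based on nginx's usual configuration inheritance, not something this document states.
<geshi lang="nginx">
server {
    chunkin on;
    chunkin_keepalive on;       # server-wide setting for this sketch

    error_page 411 = @my_411_error;
    location @my_411_error {
        chunkin_resume;
    }

    location /api {             # hypothetical location
        # assumed to inherit "chunkin_keepalive on" from the server level
        # your fastcgi_pass/proxy_pass directives go here...
    }

    location /legacy {          # hypothetical location
        chunkin_keepalive off;  # no keep-alive for chunked requests here
        # your fastcgi_pass/proxy_pass directives go here...
    }
}
</geshi>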
Technical note on the HTTP 1.1 pipelining support
The basic idea is to copy the bytes left by my chunked parser in r->request_body->buf over into r->header_in so that nginx's ngx_http_set_keepalive and ngx_http_init_request functions will pick them up for the subsequent pipelined requests. When the request body is small enough to be completely preread into the r->header_in buffer, no data copy is needed here -- just setting r->header_in->pos correctly will suffice.
The only issue that remains is how to enlarge r->header_in when the data left in r->request_body->buf is too large to be held in the remaining room between r->header_in->pos and r->header_in->end. For now, this module will just give up and simply turn off r->keepalive.
I know we can always use exactly the remaining room in r->header_in as the buffer size when reading data from c->recv, but that's suboptimal when the remaining room in r->header_in happens to be very small while r->request_body->buf is quite large.
I haven't fully grokked all the details among r->header_in, c->buffer, busy/free lists and those so-called "large header buffers". Is there a clean and safe way to reallocate or extend the r->header_in buffer?
When combining this module with ngx_proxy and ngx_fastcgi, nginx sends a "Transfer-Encoding: " header, which is invalid and is not handled well by some backend web servers, for example, riak. A work-around for now is to use the ngx_headers_more module to remove the Transfer-Encoding header completely, as in
<geshi lang="nginx">
chunkin on;

error_page 411 = @my_411_error;
location @my_411_error {
    chunkin_resume;
}

location / {
    more_clear_input_headers 'Transfer-Encoding';
    proxy_pass http://riak;
}
</geshi>
Thanks hoodoos for sharing this trick :)
Grab the nginx source code from nginx.net, for example, version 0.8.41 (see nginx compatibility), and then build the source with this module:
<geshi lang="bash">
# Here we assume you would install your nginx under /opt/nginx/.
$ ./configure --prefix=/opt/nginx \
    --add-module=/path/to/chunkin-nginx-module
$ make -j2
$ make install
</geshi>
Download the latest version of the release tarball of this module from chunkin-nginx-module file list.
The chunked parser is generated by Ragel. If you want to regenerate the parser's C file, i.e., src/chunked_parser.c, use the following command from the root of the chunkin module's source tree:
<geshi lang="bash">
$ ragel -G2 src/chunked_parser.rl
</geshi>
The following source and binary rpm files are contributed by Ernest Folch, with nginx 0.8.54, ngx_chunkin v0.21 and ngx_headers_more v0.13:
The following versions of Nginx should work with this module:
- 1.0.x (last tested: 1.0.2)
- 0.8.x (last tested: 0.8.54)
- 0.7.x >= 0.7.21 (last tested: 0.7.67)
If you find that any particular version of Nginx above 0.7.21 does not work with this module, please consider reporting a bug.
Although a lot of effort has been put into testing and code tuning, there must be some serious bugs lurking somewhere in this module. So whenever you are bitten by any quirks, please don't hesitate to
- send a bug report or even patches to <[email protected]>,
- or create a ticket on the issue tracking interface provided by GitHub.
Available on github at agentzh/chunkin-nginx-module.
- applied a patch from Gong Kaihui (龚开晖) to always call post_handler in ngx_http_chunkin_read_chunked_request_body.
- fixed a bug that may read incomplete chunked body. thanks Gong Kaihui (龚开晖).
- fixed various memory issues in the implementation which may cause nginx processes to crash.
- added support for chunked PUT requests.
- now we always require "error_page 411 @resume" and no default (buggy) magic any more. thanks Gong Kaihui (龚开晖).
- we now use ragel -G2 to generate the chunked parser and we're 36% faster.
- we now eagerly read the data octets in the chunked parser and we're 43% faster.
- added support for chunk-extension to the chunked parser as per RFC 2616, but we just ignore them (if any) because we don't understand them.
- added more diagnostic information for certain error messages.
- implemented the chunkin_max_chunks_per_buf directive to allow overriding the default 512 setting.
- we now bypass nginx's discard request body bug by requiring our users to define explicit 411 error_page with chunkin_resume in the error page location. Thanks J for reporting related bugs.
- fixed r->phase_handler in our post read handler. our handler may run one more time before :P
- the chunkin handler now returns NGX_DECLINED rather than NGX_OK when our ngx_http_chunkin_read_chunked_request_body function returns NGX_OK, to avoid bypassing other access-phase handlers.
- turned off ddebug in the previous release. thanks J for reporting it.
- fixed a regression that ctx->chunks_count never incremented in earlier versions.
- now we no longer skip those operations between the (interrupted) ngx_http_process_request_header and the server rewrite phase. this fixed the security issues regarding the internal directive as well as SSL sessions.
- try to ignore CR/LF/SP/HT at the beginning of the chunked body.
- now we allow HT as padding spaces and ignore leading CRLFs.
- improved diagnostic info in the error.log messages when parsefail occurs.
- added a random valid-chunked-request generator in t/random.t.
- fixed a new connection leak issue caught by t/random.t.
- fixed a serious bug in the chunked parser grammar: there would be ambiguity when CRLF appears in the chunked data sections. Thanks J for reporting it.
- fixed gcc compilation errors on x86_64, thanks J for reporting it.
- used the latest Ragel 6.6 to generate the chunked_parser.c file in the source tree.
- marked the discarded 411 error page's output chain bufs as consumed by setting buf->pos = buf->last. (See this nginx-devel thread for more details.)
- added the chunkin_keepalive directive which can enable HTTP 1.1 keep-alive and HTTP 1.1 pipelining, and defaults to off.
- fixed the alphtype bug in the Ragel parser spec, which caused rejection of non-ascii octets in the chunked data. Thanks J for his bug report.
- added Test::Nginx::Socket to test our nginx module on the socket level. Thanks J for his bug report.
- rewrote the bufs recycling part and preread-buf-to-rb-buf transition part, also refactored the Ragel parser spec, thus eliminating lots of serious bugs.
- provided better diagnostics in the error log message for "bad chunked body" parsefails in the chunked parser. For example:
<geshi lang="text">
2009/12/02 17:35:52 [error] 32244#0: *1 bad chunked body (offset 7, near "4^M hell <-- HERE o^M 0^M ^M ", marked by " <-- HERE "). , client: 127.0.0.1, server: localhost, request: "POST /main HTTP/1.1", host: "localhost"
</geshi>
- added some code to let the chunked parser handle special 0-size chunks that are not the last chunk.
- fixed a connection leak bug regarding incorrect r->main->count reference counter handling for nginx 0.8.11+ (well, the ngx_http_read_client_request_body function in the nginx core also has this issue, I'll report it later.)
- minor optimization: we won't traverse the output chain link if the chain count is not large enough.
This module comes with a Perl-driven test suite. The test cases are declarative too, thanks to the Test::Base module in the Perl world.
To run it on your side:
<geshi lang="bash">
$ cd test
$ PATH=/path/to/your/nginx-with-chunkin-module:$PATH prove -r t
</geshi>
You need to terminate any Nginx processes before running the test suite if you have changed the Nginx server binary.
At the moment, LWP::UserAgent is used by the test scaffold for simplicity.
Because a single nginx server (by default, localhost:1984) is used across all the test scripts (.t files), it's meaningless to run the test suite in parallel by specifying -jN when invoking the prove utility.
Some parts of the test suite require the proxy and echo modules to be enabled as well when building Nginx.
- May not work with certain 3rd party modules like the upload module because it implements its own request body reading mechanism.
- "client_body_in_single_buffer on" may *not* be obeyed for short contents and fast network.
- "client_body_in_file_only on" may *not* be obeyed for short contents and fast network.
- HTTP 1.1 pipelining may not fully work yet.
- make the chunkin handler run at the end of the access phase rather than the beginning.
- add support for trailers as specified in RFC 2616.
- fix the known issues.
You are very welcome to submit patches to the author or just ask for a commit bit to the source repository on GitHub.
agentzh (章亦春) <[email protected]>
This wiki page is also maintained by the author himself, and everybody is encouraged to improve this page as well.
The basic client request body reading code is based on the ngx_http_read_client_request_body function and its utility functions in the Nginx 0.8.20 core. This part of code is copyrighted by Igor Sysoev.
Copyright (c) 2009, Taobao Inc., Alibaba Group ( http://www.taobao.com ).
Copyright (c) 2009, agentzh <[email protected]>.
This module is licensed under the terms of the BSD license.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the Taobao Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
- The original thread on the Nginx mailing list that inspires this module's development: "'Content-Length' header for POSTs".
- The original announcement thread on the Nginx mailing list: "The chunkin module: Experimental chunked input support for Nginx".
- The original blog post about this module's initial development.
- The thread discussing chunked input support on the nginx-devel mailing list: "Chunked request body and HTTP header parser".
- The echo module for Nginx module's automated testing.
- RFC 2616 - Chunked Transfer Coding.