By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. This document may not be modified, and derivative works of it may not be created.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 23, 2009.
The HyperText Transfer Protocol (HTTP) has been widely used by the World Wide Web (WWW) since 1990. This specification updates RFC 2616, defining how to parse HTTP requests and responses in a way that is compatible with user-agents (UAs) and servers at the time of writing.
(Remove this section upon publication.)
This is a work in progress and may change in part or in whole. Do not take anything in any draft version to be final. Comments are very welcome and should be sent to geoffers@gmail.com.
Known issues as of writing:
- A.
- "one thing for the security section of that draft is the need for implementations to follow the spec exactly lest they be vulnerable to content stuffing that abuses differences in parsing algorithms" - Hixie
- B.
- Most are unchanged from [RFC2616] (Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.).
1. Introduction
1.1. Notational Conventions
1.1.1. Basic ABNF Rules
1.2. Terminology
1.3. Conformance Requirements
2. Errors
2.1. Fatal Error
3. Tokenization
3.1. Shared Rules
3.2. Requests
3.3. Responses
4. Parsing
4.1. Unescaping Quoted Strings
5. Security Considerations
6. IANA Considerations
7. References
7.1. Normative References
7.2. Informative References
Appendix A. Acknowledgments
Appendix B. Further Suggestions
§ Author's Address
§ Intellectual Property and Copyright Statements
1. Introduction
Since HTTP's conception, no standard has described how it is actually parsed in the real world. [RFC2616] tried to improve this situation with Section 19.3, "Tolerant Applications", which gives advice about parsing requests and responses. However, it does not go into the specific details needed for interoperability with current (non-conformant) user-agents (UAs) and servers. Without a specification that defines those details, anyone creating a new UA must first spend large amounts of time reverse engineering behaviour that is in some cases purely bizarre. An implementer who does not already know about such behaviour is unlikely to write the test cases that would uncover its oddest forms.
This specification aims to address that problem by documenting the behaviour of UAs at the time of writing. Hopefully, over time, the real world will align itself with this specification.
1.1. Notational Conventions
This specification is defined in terms of the US-ASCII character set, as defined in [ANSI.X3-4.1986].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This specification is defined in terms of ABNF, as described in [RFC5234].
1.1.1. Basic ABNF Rules
Rules inherited from [RFC2616], converted to [RFC5234] ABNF:
LWS            = [ [ CR ] LF ] 1*( SP / HTAB )
                 ; This is changed from RFC2616, as CR is now
                 ; optional within the already optional line
                 ; break sequence (this is suggested in RFC2616's
                 ; section 19.3, "Tolerant Applications").
separators     = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\"
                 / DQUOTE / "/" / "[" / "]" / "?" / "=" / "{" / "}"
                 / SP / HTAB
token          = 1*( "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+"
                 / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT /
                 ALPHA )
comment        = "(" *( ctext / quoted-pair / comment ) ")"
ctext          = %x21-27 / %x2A-7E / %x80-FF / LWS
quoted-string  = ( DQUOTE *( qdtext / quoted-pair ) DQUOTE )
qdtext         = %x21 / %x23-5B / %x5D-7E / %x80-FF / LWS
quoted-pair    = "\" CHAR
In addition, this specification inherits all the rules from [RFC3986]; they are not repeated here because that document already gives them in ABNF.
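As a non-normative illustration of how one of the basic rules above might be checked in code, the following Python sketch tests whether a byte string matches the token rule. The names TOKEN_RE and is_token are hypothetical and are not defined by this specification.

```python
import re

# Character class mirroring the token rule:
# "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
# "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
TOKEN_RE = re.compile(rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def is_token(data: bytes) -> bool:
    """Return True if the whole of data matches the token rule (1 or more chars)."""
    return TOKEN_RE.fullmatch(data) is not None
```

Separators such as ":" or SP are not part of the character class, so a byte string containing them is rejected, as is the empty string.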
1.2. Terminology
Terminology is as in [RFC2616] Section 1.3, with the following additions:
interactive user agent
A type of user agent that directly returns the result to the same user that made the request (e.g., web browsers).
non-interactive user agent
A type of user agent that does not return the result of the request to the user that made the request (e.g., search engine spiders).
1.3. Conformance Requirements
The conformance requirements of this specification are phrased as algorithms and may be implemented in any manner, so long as the end result is equivalent (in particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant).
Implementations may impose implementation-specific limits on otherwise unconstrained inputs, e.g., to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
This specification defines two different types of parsers: "strict" parsers and "non-strict" parsers. It is RECOMMENDED that request parsers be strict parsers, and that response parsers be non-strict parsers.
2. Errors
This section describes the actions that MUST be taken when certain types of errors occur.
2.1. Fatal Error
On a fatal error, the tokenizer/parser MUST stop processing immediately. If a request is being parsed, the server MUST respond with 400 (Bad Request); if a response is being parsed, the client SHOULD report the error.
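A minimal, non-normative sketch of the server-side behaviour is given below. The exception class FatalError, the function status_for_request, and the stand-in parser always_fails are hypothetical names chosen for illustration; how a client reports the error for a response is left out.

```python
class FatalError(Exception):
    """Signals that the tokenizer/parser must stop processing immediately."""


def always_fails(raw: bytes) -> None:
    """Stand-in parser (an assumption for this sketch): rejects every input."""
    raise FatalError("request rule failed to match")


def status_for_request(raw: bytes, parse):
    """Run parse on a request; return 400 on a fatal error, else None.

    None means no fatal error occurred and normal handling can continue.
    """
    try:
        parse(raw)
    except FatalError:
        return 400  # 400 (Bad Request)
    return None
```

Raising an exception is only one way to stop processing immediately; any mechanism with the same observable result would do.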
3. Tokenization
An HTTP request/response MUST be broken up into header-fields and message-body following the request rule for requests, and the response rule for responses. If the appropriate rule fails to match, it is a fatal error (Section 2.1).
Each match of the LWS rule MUST be replaced by a single 0x20 byte (US-ASCII space). Where several matches of the LWS rule are consecutive, the whole run MUST be replaced by a single 0x20 byte.
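The LWS replacement can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative algorithm: LWS_RUN and normalize_lws are hypothetical names, and the sketch rewrites whatever bytes it is given, so choosing which part of a message to pass in is left to the caller.

```python
import re

# One LWS match is an optional line break ([CR] LF) followed by one or more
# SP / HTAB bytes. The outer "+" folds consecutive LWS matches into one run.
LWS_RUN = re.compile(rb"(?:(?:\r?\n)?[ \t]+)+")


def normalize_lws(data: bytes) -> bytes:
    """Replace every run of LWS matches with a single 0x20 byte."""
    return LWS_RUN.sub(b" ", data)
```

A line break that is not followed by SP or HTAB is not LWS, so it is left untouched and still terminates the line.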
If the parser is a strict parser, a fatal error (Section 2.1) MUST be raised in any of the following circumstances:
If the major-version is "0" or "1" (or has no match although the appropriate rule as a whole matches), the recipient of the message MUST follow this specification; otherwise, the recipient is RECOMMENDED to follow this specification.
3.1. Shared Rules
http-version      = "HTTP/" *"0" major-version "." *"0" minor-version
                    ; Note that strings in ABNF are case-insensitive
version-number    = %x31-39 *DIGIT
                    ; A version number cannot begin with a "0".
major-version     = version-number
minor-version     = version-number
header            = header-name ":" *LWS header-value *LWS
header-name       = 1*header-content-nc
header-value      = header-content
                    [ *( header-content / LWS ) header-content ]
header-content    = header-content-nc / ":"
header-content-nc = ( %x00-08 / %x0B-0C / %x0E-1F / %x21-39 / %x3B-FF )
invalid-header    = ( [ ":" *LWS ] 1*header-content-nc [ *LWS ":" ] /
                    1*":" / 1*header-content-nc 1*LWS ":" *LWS
                    header-content [ *( header-content / LWS )
                    header-content ] ) *LWS
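To illustrate the header rule, the following Python sketch splits one header line into its name and value. It is a simplified, non-normative example: HEADER_RE and split_header are hypothetical names, the line terminator is assumed to have been removed already, and any line folding inside the value is assumed to have been replaced by a space beforehand.

```python
import re

# header-name: one or more bytes other than HTAB, LF, CR, SP and ":".
# header-value: starts and ends with a non-whitespace byte; SP / HTAB may
# appear in between. Surrounding SP / HTAB is not part of the value.
HEADER_RE = re.compile(
    rb"([^\t\n\r :]+):[ \t]*([^\t\n\r ](?:[^\n\r]*[^\t\n\r ])?)[ \t]*"
)


def split_header(line: bytes):
    """Return (name, value) if line matches the header rule, else None."""
    m = HEADER_RE.fullmatch(line)
    if m is None:
        return None
    return m.group(1), m.group(2)
```

Lines for which this returns None, such as a name followed by whitespace before the colon, are the kind of input the invalid-header rule is there to absorb; the sketch does not classify them further.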
3.2. Requests
request        = simple-request / full-request
simple-request = get SP ( absolute-uri / path-absolute ) [ CR ] LF
get            = %x47.45.54
                 ; "GET" case-sensitively
full-request   = request-line *( ( header / invalid-header )
                 [ CR ] LF ) [ CR ] LF message-body
request-line   = method SP request-uri SP http-version [ CR ] LF
method         = token
request-uri    = "*" / absolute-uri / path-absolute / authority
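The shape of the request-line rule can be sketched in Python as below. This is a deliberately loose, non-normative illustration: REQUEST_LINE and parse_request_line are hypothetical names, the method is checked against the token rule, but the request-uri is not validated against the [RFC3986] rules and the version is only required to begin with "HTTP/".

```python
import re

# Same character class as the token rule, used here for the method.
TOKEN = rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# method SP request-uri SP http-version [ CR ] LF
REQUEST_LINE = re.compile(
    rb"(" + TOKEN + rb") ([^ \r\n]+) (HTTP/[^ \r\n]+)\r?\n",
    re.IGNORECASE,  # ABNF strings such as "HTTP/" are case-insensitive
)


def parse_request_line(line: bytes):
    """Return (method, request_uri, version) as raw bytes, or None."""
    m = REQUEST_LINE.fullmatch(line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)
```

Because the rule uses a single SP between its parts, a line with two spaces after the method does not match in this sketch.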
3.3. Responses
response       = status-line [ CR ] LF *( ( header / invalid-header )
                 [ CR ] LF ) [ CR ] LF message-body
status-line    = http-version ( 1*SP ( status-code ( 1*SP
                 [ reason-phrase ] / sp-garbage ) / code-garbage )
                 / sp-garbage )
status-code    = 1*DIGIT
reason-phrase  = 1*( %x00-09 / %x0B-0C / %x0E-7F )
                 ; All US-ASCII except CR and LF
sp-garbage     = [ ( %x00-09 / %x0B-0C / %x0E-1F / %x21-FF )
                 status-garbage ]
code-garbage   = [ ( %x00-09 / %x0B-0C / %x0E-2F / %x3A-FF )
                 status-garbage ]
status-garbage = *( %x00-09 / %x0B-0C / %x0E-FF )
If there is no reason-phrase, let it be equal to "OK". If there is no status-code, let it be equal to 200.
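The two defaults can be illustrated with the following Python sketch. It is a simplification rather than an implementation of the status-line rule: STATUS_LINE and status_fields are hypothetical names, the version pattern is looser than http-version, and anything after the recognised prefix (the garbage rules) is simply ignored.

```python
import re

STATUS_LINE = re.compile(
    rb"HTTP/[0-9]+\.[0-9]+"                  # loose stand-in for http-version
    rb"(?: +(?P<code>[0-9]+)"                # 1*SP status-code
    rb"(?: +(?P<reason>[\x00-\x09\x0b\x0c\x0e-\x7f]*))?)?",  # 1*SP [ reason ]
    re.IGNORECASE,
)


def status_fields(line: bytes):
    """Return (status_code, reason_phrase) with the defaults applied.

    Returns None if the line does not even start with a version.
    """
    m = STATUS_LINE.match(line)
    if m is None:
        return None
    code = int(m.group("code")) if m.group("code") else 200  # default 200
    reason = m.group("reason") or b"OK"                      # default "OK"
    return code, reason
```

For example, a status line consisting only of a version yields 200 and "OK", and a status line with a code but no reason phrase keeps the code and takes "OK" as the reason.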
4. Parsing
This section details the processing that follows tokenization.
4.1. Unescaping Quoted Strings
To unescape a quoted string (i.e., a string that follows the quoted-string specification in [RFC2616]), the following algorithm MUST be run:
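The steps of the algorithm are not reproduced in this section. Purely as a non-normative illustration, the Python sketch below shows one plausible reading, assuming the quoted-pair semantics of [RFC2616]: the surrounding DQUOTE bytes are dropped and each backslash is removed, keeping the byte that follows it. The function name unescape_quoted_string and its error handling are assumptions made for this sketch.

```python
def unescape_quoted_string(quoted: bytes) -> bytes:
    """Strip the surrounding quotes and resolve backslash escapes."""
    if len(quoted) < 2 or quoted[:1] != b'"' or quoted[-1:] != b'"':
        raise ValueError("input is not delimited by DQUOTE")
    inner = quoted[1:-1]
    out = bytearray()
    i = 0
    while i < len(inner):
        if inner[i] == 0x5C:  # backslash starts a quoted-pair
            i += 1
            if i == len(inner):
                # The closing quote was itself escaped, so this was not
                # a complete quoted-string.
                raise ValueError("dangling backslash")
        out.append(inner[i])  # keep the (possibly escaped) byte as-is
        i += 1
    return bytes(out)
```

Under these assumptions, an escaped quote inside the string becomes a literal quote and an escaped backslash becomes a single backslash.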
5. Security Considerations
(This section is just a very rough draft.)
This specification is just a parsing algorithm, and therefore any risks (excluding implementation issues such as buffer overflows) are inherited from [RFC2616].
6. IANA Considerations
This document has no actions for IANA.
7. References
7.1. Normative References
| [ANSI.X3-4.1986] | American National Standards Institute, “Coded Character Set - 7-bit American Standard Code for Information Interchange,” ANSI X3.4, 1986. |
| [RFC2119] | Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997. |
| [RFC2616] | Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” RFC 2616, June 1999. |
| [RFC3986] | Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” STD 66, RFC 3986, January 2005. |
| [RFC5234] | Crocker, D. and P. Overell, “Augmented BNF for Syntax Specifications: ABNF,” STD 68, RFC 5234, January 2008. |
7.2. Informative References
| [W3C.WD-html5-20080610] | Hyatt, D. and I. Hickson, “HTML 5,” World Wide Web Consortium WD WD-html5-20080610, June 2008. |
Appendix A. Acknowledgments
Thanks to: Ian Hickson, Philip Taylor.
Appendix B. Further Suggestions
This section is informative.
While the scope of this specification is only the parsing of HTTP requests and responses, there are several other things I am aware of that should be pointed out to anyone implementing [RFC2616]:
Author's Address
Geoffrey Sneddon
Toll Park
20 Hepburn Gardens
St Andrews, Fife KY16 9DE
GB
Phone: +44 7807 360 291
Email: geoffers@gmail.com
URI: http://gsnedders.com/
Intellectual Property and Copyright Statements
Copyright © The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an “AS IS” basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.