diff --git a/mimesniff.bs b/mimesniff.bs
index 3a0e943..c5a5c84 100644
--- a/mimesniff.bs
+++ b/mimesniff.bs
@@ -3,7 +3,7 @@ Group: WHATWG
 H1: MIME Sniffing
 Shortname: mimesniff
 Text Macro: TWITTER mimesniff
-Text Macro: LATESTRD 2023-07
+Text Macro: LATESTRD 2024-07
 Abstract: The MIME Sniffing standard defines sniffing resources.
 Translation: ja https://triple-underscore.github.io/mimesniff-ja.html
 Markup Shorthands: css off
diff --git a/review-drafts/2024-07.bs b/review-drafts/2024-07.bs
new file mode 100644
index 0000000..6c55d71
--- /dev/null
+++ b/review-drafts/2024-07.bs
@@ -0,0 +1,2970 @@
+
+Group: WHATWG
+Status: RD
+Date: 2024-07-15
+H1: MIME Sniffing
+Shortname: mimesniff
+Text Macro: TWITTER mimesniff
+Text Macro: LATESTRD 2024-07
+Abstract: The MIME Sniffing standard defines sniffing resources.
+Translation: ja https://triple-underscore.github.io/mimesniff-ja.html
+Markup Shorthands: css off
+
+
+
+{
+    "FTP": {
+        "aliasOf": "rfc0959"
+    },
+    "HTTP-SEMANTICS": {
+        "aliasOf": "rfc9110"
+    },
+    "KEYWORDS": {
+        "aliasOf": "rfc2119"
+    },
+    "MIMETYPE": {
+        "aliasOf": "rfc2046"
+    },
+    "SECCONTSNIFF": {
+        "authors": ["Adam Barth", "Juan Caballero", "Dawn Song"],
+        "date": "May 2009",
+        "href": "https://www.adambarth.com/papers/2009/barth-caballero-song.pdf",
+        "title": "Secure Content Sniffing for Web Browsers, or How to Stop Papers from Reviewing Themselves"
+    }
+}
+
+
+
+spec: HTTP-SEMANTICS; urlPrefix: https://www.rfc-editor.org/rfc/rfc9110 + type: dfn + text: media-type; url: #name-media-type + text: quoted-string; url: #name-quoted-strings + text: token; url: #name-tokens ++ +
+spec:html; + type:element; text:script + type:element-attr; text:type ++ + + +
+ The HTTP Content-Type header field is intended to indicate the
+ MIME type of an HTTP response.
+ However, many HTTP servers supply a Content-Type header field
+ value that does not match the actual contents of the response.
+ Historically, web browsers have tolerated these servers by examining the
+ content of HTTP responses in addition to the Content-Type header
+ field in order to determine the effective MIME type of the response.
+
+
+ Without a clear specification for how to "sniff" the MIME type, each user + agent has been forced to reverse-engineer the algorithms of other user agents + in order to maintain interoperability. + Inevitably, these efforts have not been entirely successful, resulting in + divergent behaviors among user agents. + In some cases, these divergent behaviors have had security implications, as a + user agent could interpret an HTTP response as a different MIME type than + the server intended. + +
+ These security issues are most severe when an "honest" server allows
+ potentially malicious users to upload their own files and then serves the
+ contents of those files with a low-privilege MIME type.
+ For example, if a server believes that the client will treat a contributed
+ file as an image (and thus treat it as benign), but a user agent believes the
+ content to be HTML (and thus privileged to execute any scripts contained
+ therein), an attacker might be able to steal the user's authentication
+ credentials and mount other cross-site scripting attacks.
+ (Malicious servers, of course, can specify an arbitrary MIME type in the
+ Content-Type header field.)
+
+
+ This document describes a content sniffing algorithm that carefully balances
+ the compatibility needs of user agents with the security constraints imposed
+ by existing web content.
+ The algorithm originated from research conducted by Adam Barth, Juan
+ Caballero, and Dawn Song, based on content sniffing algorithms present in
+ popular user agents, an extensive database of existing web content, and
+ metrics collected from implementations deployed to a sizable number of users.
+ [[SECCONTSNIFF]]
+
+
+
+
+ The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", + "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119. + For readability, these keywords will generally not appear in all uppercase + letters. + [[!KEYWORDS]] + +
+ Requirements phrased in the imperative as part of algorithms (such as "strip + any leading space characters" or "return false and abort these steps") are to + be interpreted with the meaning of the keyword used in introducing the + algorithm. + +
+ Conformance requirements phrased as algorithms or specific steps can be + implemented in any manner, so long as the end result is equivalent. + In particular, note that the algorithms defined in this specification are + intended to be easy to understand and are not intended to be performant. + + + +
+ This specification depends on the Infra Standard. [[!INFRA]] + +
An HTTP token code point is U+0021 (!), U+0023 (#), U+0024 ($), U+0025 (%), +U+0026 (&), U+0027 ('), U+002A (*), U+002B (+), U+002D (-), U+002E (.), U+005E (^), U+005F (_), +U+0060 (`), U+007C (|), U+007E (~), or an ASCII alphanumeric.
+ +This matches the value space of the token token production. [[HTTP-SEMANTICS]] + +
An HTTP quoted-string token code point is U+0009 TAB, a code point in the range +U+0020 SPACE to U+007E (~), inclusive, or a code point in the range U+0080 through +U+00FF (ÿ), inclusive. + +
This matches the effective value space of the quoted-string token +production. By definition it is a superset of the HTTP token code points. [[HTTP-SEMANTICS]] + +
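The two code point classes above can be sketched as simple predicates. This is a minimal illustration, not part of the standard; the function and constant names are hypothetical.

```python
import string

# HTTP token code points: the punctuation listed above plus ASCII alphanumerics.
HTTP_TOKEN_PUNCTUATION = "!#$%&'*+-.^_`|~"
HTTP_TOKEN_CODE_POINTS = frozenset(
    HTTP_TOKEN_PUNCTUATION + string.ascii_letters + string.digits
)


def is_http_token_code_point(ch: str) -> bool:
    """Return True if the single code point ch is an HTTP token code point."""
    return ch in HTTP_TOKEN_CODE_POINTS


def is_http_quoted_string_token_code_point(ch: str) -> bool:
    """Return True for U+0009, U+0020..U+007E, or U+0080..U+00FF."""
    cp = ord(ch)
    return cp == 0x09 or 0x20 <= cp <= 0x7E or 0x80 <= cp <= 0xFF
```

Every HTTP token code point falls in the range U+0021 to U+007E, which is why the second predicate accepts a superset of the first.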
+ A binary data byte is a byte in the range 0x00 to + 0x08 (NUL to BS), the byte 0x0B (VT), a byte in the + range 0x0E to 0x1A (SO to SUB), or a byte in the range 0x1C to + 0x1F (FS to US). + +
+ A whitespace byte (abbreviated + 0xWS) is any one of the following + bytes: 0x09 (HT), 0x0A (LF), 0x0C (FF), 0x0D (CR), + 0x20 (SP). + +
+ A tag-terminating byte (abbreviated
+ 0xTT) is any one of the following
+ bytes: 0x20 (SP), 0x3E (">").
+
+
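The three byte classes defined above translate directly into membership tests. The following is a minimal sketch with hypothetical names; `has_binary_data` is an illustrative helper, not a term from this document.

```python
# Whitespace bytes (0xWS) and tag-terminating bytes (0xTT), as listed above.
WHITESPACE_BYTES = frozenset({0x09, 0x0A, 0x0C, 0x0D, 0x20})
TAG_TERMINATING_BYTES = frozenset({0x20, 0x3E})


def is_binary_data_byte(b: int) -> bool:
    """Return True if b is in 0x00-0x08, is 0x0B, or is in 0x0E-0x1A or 0x1C-0x1F."""
    return (
        0x00 <= b <= 0x08
        or b == 0x0B
        or 0x0E <= b <= 0x1A
        or 0x1C <= b <= 0x1F
    )


def has_binary_data(data: bytes) -> bool:
    """Hypothetical helper: True if any byte of data is a binary data byte."""
    return any(is_binary_data_byte(b) for b in data)
```

Note that 0x1B (ESC) and the whitespace bytes are deliberately outside the binary data byte ranges.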
+ Equations use the mathematical operators defined in
+ [[!ENCODING]]. In addition, the bitwise NOT is
+ represented by ~.
+
+
+
A MIME type represents an +internet media type as defined by +Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. It can also be +referred to as a MIME type record. [[!MIMETYPE]] + +
Standards are encouraged to consistently use the term MIME type to avoid +confusion with the use of media type as described in Media Queries. +[[MEDIAQUERIES]] + +
A MIME type's type is a non-empty +ASCII string. + +
A MIME type's subtype is a non-empty +ASCII string. + +
A MIME type's parameters is an +ordered map whose keys are ASCII strings and values are +strings limited to HTTP quoted-string token code points. It is initially empty. + + + +
The essence of a MIME type mimeType is +mimeType's type, followed by U+002F (/), followed by +mimeType's subtype. + +
A MIME type is supported by the user agent if the user agent has the +capability to interpret a resource of that MIME type and present it to the user. + +
Ideally this would be more precise. See +w3c/preload #113. + +
To minimize a supported MIME type given a MIME type mimeType, +run these steps. They return an ASCII string. + +
If mimeType is a JavaScript MIME type, then return
+ "text/javascript".
+
+
If mimeType is a JSON MIME type, then return
+ "application/json".
+
+
If mimeType's essence is "image/svg+xml", then
+ return "image/svg+xml".
+
+
SVG is worth distinguishing from other XML MIME types.
+
+
If mimeType is an XML MIME type, then return
+ "application/xml".
+
+
If mimeType is supported by the user agent, then return + mimeType's essence. + +
Return the empty string. +
The goal of this algorithm is to allow the caller to distinguish MIME types with +different processing models, such as those for GIF and PNG, but otherwise provide as little +information as possible. +
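The minimization steps above can be sketched as follows. This is an illustration under assumptions: the `MimeType` tuple and the `is_supported` callback are hypothetical stand-ins for a MIME type record and for the user agent's notion of "supported", and the JavaScript, JSON, and XML checks follow the MIME type group definitions given elsewhere in this document.

```python
from typing import Callable, NamedTuple


class MimeType(NamedTuple):
    """Hypothetical stand-in for a MIME type record (parameters omitted)."""
    type: str
    subtype: str

    @property
    def essence(self) -> str:
        return f"{self.type}/{self.subtype}"


JAVASCRIPT_ESSENCES = frozenset({
    "application/ecmascript", "application/javascript",
    "application/x-ecmascript", "application/x-javascript",
    "text/ecmascript", "text/javascript", "text/javascript1.0",
    "text/javascript1.1", "text/javascript1.2", "text/javascript1.3",
    "text/javascript1.4", "text/javascript1.5", "text/jscript",
    "text/livescript", "text/x-ecmascript", "text/x-javascript",
})


def is_json(m: MimeType) -> bool:
    return m.subtype.endswith("+json") or m.essence in ("application/json", "text/json")


def is_xml(m: MimeType) -> bool:
    return m.subtype.endswith("+xml") or m.essence in ("text/xml", "application/xml")


def minimize(m: MimeType, is_supported: Callable[[MimeType], bool]) -> str:
    """Collapse m to the least specific string that preserves its processing model."""
    if m.essence in JAVASCRIPT_ESSENCES:
        return "text/javascript"
    if is_json(m):
        return "application/json"
    if m.essence == "image/svg+xml":
        return "image/svg+xml"  # checked before the generic XML case
    if is_xml(m):
        return "application/xml"
    if is_supported(m):
        return m.essence
    return ""
```

The order of the checks matters: SVG has to be tested before the generic XML case, otherwise it would collapse to "application/xml".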
A valid MIME type string is a string that matches the +media-type token production. In particular, a valid MIME type string may +include parameters. [[!HTTP-SEMANTICS]] + +
A valid MIME type string is intended for use by conformance checkers only.
+
+
"text/html" is a valid MIME type string.
+
+
"text/html;" is not a valid MIME type string, though
+ parse a MIME type returns the same MIME type record for it as for
+ the input "text/html".
+
A +valid MIME type string with no parameters is +a valid MIME type string that does not contain U+003B (;). + + +
To parse a MIME type, given a string input, run these steps: + +
Remove any leading and trailing HTTP whitespace from input. + +
Let position be a position variable for input, + initially pointing at the start of input. + +
Let type be the result of collecting a sequence of code points that are + not U+002F (/) from input, given position. + +
If type is the empty string or does not solely contain + HTTP token code points, then return failure. + +
If position is past the end of input, then return failure. + +
Advance position by 1. (This skips past U+002F (/).) + +
Let subtype be the result of collecting a sequence of code points that are + not U+003B (;) from input, given position. + +
Remove any trailing HTTP whitespace from subtype. + +
If subtype is the empty string or does not solely contain + HTTP token code points, then return failure. + +
Let mimeType be a new MIME type record whose type + is type, in ASCII lowercase, and subtype is + subtype, in ASCII lowercase. + +
While position is not past the end of input: + +
Advance position by 1. (This skips past U+003B (;).) + +
Collect a sequence of code points that are HTTP whitespace from + input given position. + +
This is roughly equivalent to skip ASCII whitespace, except that + HTTP whitespace is used rather than ASCII whitespace. + +
Let parameterName be the result of collecting a sequence of code points + that are not U+003B (;) or U+003D (=) from input, given position. + +
Set parameterName to parameterName, in ASCII lowercase. + +
If position is not past the end of input, then: + +
If the code point at position within input is U+003B (;), + then continue. + +
Advance position by 1. (This skips past U+003D (=).) +
If position is past the end of input, then + break. + +
Let parameterValue be null. + +
If the code point at position within input is U+0022 ("), + then: + +
Set parameterValue to the result of collecting an HTTP quoted string + from input, given position and true. + +
Collect a sequence of code points that are not U+003B (;) from input, + given position. + +
Given
+ text/html;charset="shift_jis"iso-2022-jp you end up with
+ text/html;charset=shift_jis.
+
Otherwise: + +
Set parameterValue to the result of + collecting a sequence of code points that are not U+003B (;) from input, + given position. + +
Remove any trailing HTTP whitespace from parameterValue. + +
If parameterValue is the empty string, then continue. +
If all of the following are true + +
then set mimeType's + parameters[parameterName] to parameterValue. +
Return mimeType. +
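The parsing steps above can be sketched as below. Several pieces are assumptions rather than text from this section: HTTP whitespace and the quoted-string collection follow my understanding of the Fetch Standard, and the conditions guarding the final parameter assignment (not listed here) are assumed to be that the name is a non-empty token, the value contains only HTTP quoted-string token code points, and the name is not already present. All helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple

HTTP_WHITESPACE = "\t\n\r "  # assumption: as defined by the Fetch Standard
TOKEN_CODE_POINTS = frozenset(
    "!#$%&'*+-.^_`|~0123456789"
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
)


def _is_token(s: str) -> bool:
    return s != "" and all(c in TOKEN_CODE_POINTS for c in s)


def _is_quoted_string_token(s: str) -> bool:
    return all(c == "\t" or " " <= c <= "~" or "\x80" <= c <= "\xff" for c in s)


def _ascii_lower(s: str) -> str:
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in s)


def _collect(s: str, pos: int, stop: str) -> Tuple[str, int]:
    """Collect code points until one in stop (or the end) is reached."""
    start = pos
    while pos < len(s) and s[pos] not in stop:
        pos += 1
    return s[start:pos], pos


def _collect_quoted(s: str, pos: int) -> Tuple[str, int]:
    """Assumed Fetch behaviour: s[pos] is '"'; unescape up to the closing quote."""
    pos += 1
    value = ""
    while True:
        chunk, pos = _collect(s, pos, '"\\')
        value += chunk
        if pos >= len(s):
            break
        ch = s[pos]
        pos += 1
        if ch != "\\":
            break  # closing quote
        if pos >= len(s):
            value += "\\"
            break
        value += s[pos]
        pos += 1
    return value, pos


def parse_mime_type(raw: str) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Return (type, subtype, parameters), or None on failure."""
    s = raw.strip(HTTP_WHITESPACE)
    type_, pos = _collect(s, 0, "/")
    if not _is_token(type_) or pos >= len(s):
        return None
    subtype, pos = _collect(s, pos + 1, ";")
    subtype = subtype.rstrip(HTTP_WHITESPACE)
    if not _is_token(subtype):
        return None
    params: Dict[str, str] = {}
    while pos < len(s):
        pos += 1  # skip ";"
        while pos < len(s) and s[pos] in HTTP_WHITESPACE:
            pos += 1
        name, pos = _collect(s, pos, ";=")
        name = _ascii_lower(name)
        if pos < len(s):
            if s[pos] == ";":
                continue
            pos += 1  # skip "="
        if pos >= len(s):
            break
        if s[pos] == '"':
            value, pos = _collect_quoted(s, pos)
            _, pos = _collect(s, pos, ";")  # discard anything after the quote
        else:
            value, pos = _collect(s, pos, ";")
            value = value.rstrip(HTTP_WHITESPACE)
            if value == "":
                continue
        # Assumed guard conditions; the first occurrence of a name wins.
        if _is_token(name) and _is_quoted_string_token(value) and name not in params:
            params[name] = value
    return _ascii_lower(type_), _ascii_lower(subtype), params
```

The sketch returns a plain tuple instead of a MIME type record to stay self-contained; a real implementation would likely wrap the three fields in a dedicated type.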
To parse a MIME type from bytes, given a byte sequence input, +run these steps: + +
Let string be input, isomorphic decoded. + +
Return the result of parse a MIME type with string. +
To serialize a MIME type, given a MIME type mimeType, run +these steps: + +
Let serialization be the concatenation of mimeType's + type, U+002F (/), and mimeType's subtype. + +
For each name → value of mimeType's + parameters: + +
Append U+003B (;) to serialization. + +
Append name to serialization. + +
Append U+003D (=) to serialization. + +
If value does not solely contain HTTP token code points or value + is the empty string, then: + +
Precede each occurrence of U+0022 (") or U+005C (\) in value with U+005C (\). + +
Prepend U+0022 (") to value. + +
Append U+0022 (") to value. +
Append value to serialization. +
Return serialization. +
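The serialization steps above amount to joining the type, subtype, and parameters, quoting any value that is empty or not a pure token. A minimal sketch follows; the function signature is hypothetical and takes the three fields of a MIME type record separately.

```python
from typing import Mapping

TOKEN_CODE_POINTS = frozenset(
    "!#$%&'*+-.^_`|~0123456789"
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
)


def serialize_mime_type(type_: str, subtype: str, parameters: Mapping[str, str]) -> str:
    """Serialize a MIME type, quoting parameter values where required."""
    serialization = f"{type_}/{subtype}"
    for name, value in parameters.items():
        serialization += ";" + name + "="
        if value == "" or not all(c in TOKEN_CODE_POINTS for c in value):
            # Escape backslashes first so the backslashes added for quotes
            # are not escaped a second time.
            value = value.replace("\\", "\\\\").replace('"', '\\"')
            value = '"' + value + '"'
        serialization += value
    return serialization
```

No space is emitted after the semicolon, so `text/html; charset=utf-8` parsed and then serialized becomes `text/html;charset=utf-8`.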
To serialize a MIME type to bytes, given a MIME type +mimeType, run these steps: + +
Let stringSerialization be the result of serialize a MIME type with + mimeType. + +
Return stringSerialization, isomorphic encoded. +
An image MIME type is a MIME type whose type is
+"image".
+
+
An audio or video MIME type is any MIME type whose
+type is "audio" or "video", or whose
+essence is "application/ogg".
+
+
A font MIME type is any MIME type whose type is
+"font", or whose essence is one of the following: [[RFC8081]]
+
+
application/font-cff
+ application/font-off
+ application/font-sfnt
+ application/font-ttf
+ application/font-woff
+ application/vnd.ms-fontobject
+ application/vnd.ms-opentype
A ZIP-based MIME type is any MIME type whose
+subtype ends in "+zip" or whose essence
+is one of the following:
+
+
application/zip
+An archive MIME type is any MIME type whose + +essence is one of the following: + +
application/x-rar-compressed
+ application/zip
+ application/x-gzip
An XML MIME type is any MIME type whose subtype
+ends in "+xml" or whose essence is "text/xml" or
+"application/xml". [[RFC7303]]
+
+
An HTML MIME type is any MIME type whose essence
+is "text/html".
+
+
A scriptable MIME type is an XML MIME type, HTML MIME type, or
+any MIME type whose essence is "application/pdf".
+
+
A JavaScript MIME type is any MIME type whose +essence is one of the following: + +
application/ecmascript
+ application/javascript
+ application/x-ecmascript
+ application/x-javascript
+ text/ecmascript
+ text/javascript
+ text/javascript1.0
+ text/javascript1.1
+ text/javascript1.2
+ text/javascript1.3
+ text/javascript1.4
+ text/javascript1.5
+ text/jscript
+ text/livescript
+ text/x-ecmascript
+ text/x-javascript
+A string is a JavaScript MIME type essence match if it is an +ASCII case-insensitive match for one of the JavaScript MIME type essence strings. + +
This hook is used by the <{script/type}> attribute of <{script}> elements. [[HTML]] + +
A JSON MIME type is any MIME type whose subtype
+ends in "+json" or whose essence is
+"application/json" or "text/json".
+
+
+
+ A resource is …. + +
+ For each resource it handles, the user agent must keep track of + the following associated metadata: + +
+ The user agent can choose to use outside information, such as previous + experience with a site, to determine whether to opt out of sniffing for a + particular resource. The user agent can also choose to opt + out of sniffing for all resources. However, + opting out of sniffing does not exempt the user agent from using the + MIME type sniffing algorithm. + +
+ The supplied MIME type of a resource is provided + to the user agent by an external source associated with that + resource. + The method of obtaining this information varies depending upon how the + resource is retrieved. + +
+ To determine the supplied MIME type of a resource, + user agents must use the following supplied MIME type detection + algorithm: + +
Content-Type
headers are associated with the
+ resource, execute the following steps:
+
+ Content-Type
header associated with the
+ resource.
+
+ + File extensions are not used to determine the supplied MIME + type of a resource retrieved via HTTP because they are + unreliable and easily spoofed. + +
Bytes in Hexadecimal | Bytes in ASCII
---|---
74 65 78 74 2F 70 6C 61 69 6E | text/plain
74 65 78 74 2F 70 6C 61 69 6E 3B 20 63 68 61 72 73 65 74 3D 49 53 4F 2D 38 38 35 39 2D 31 | text/plain; charset=ISO-8859-1
74 65 78 74 2F 70 6C 61 69 6E 3B 20 63 68 61 72 73 65 74 3D 69 73 6F 2D 38 38 35 39 2D 31 | text/plain; charset=iso-8859-1
74 65 78 74 2F 70 6C 61 69 6E 3B 20 63 68 61 72 73 65 74 3D 55 54 46 2D 38 | text/plain; charset=UTF-8
+ The supplied MIME type detection algorithm detects these + exact byte sequences because some older installations of + Apache contain + a + bug that causes them to supply one of these Content-Type headers + when serving files with unrecognized MIME + types. +
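Because the check described above is on exact byte sequences, it can be sketched as a set lookup on the raw header value. The names below are hypothetical, and how the result feeds into the surrounding algorithm is not shown here.

```python
# The four exact Content-Type values from the table above, as raw bytes.
APACHE_BUG_CONTENT_TYPES = frozenset({
    b"text/plain",
    b"text/plain; charset=ISO-8859-1",
    b"text/plain; charset=iso-8859-1",
    b"text/plain; charset=UTF-8",
})


def is_apache_bug_content_type(raw_value: bytes) -> bool:
    """Return True only on a byte-for-byte match with one of the four values."""
    return raw_value in APACHE_BUG_CONTENT_TYPES
```

The comparison is intentionally case-sensitive and whitespace-sensitive: `text/plain;charset=UTF-8` (no space) and `text/plain; charset=utf-8` do not match.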
+ A resource header is the byte sequence at the + beginning of a resource, as determined by + reading the resource header. + +
+ To read the resource header, perform the following steps: + +
+ If the number of bytes in buffer is + greater than or equal to 1445, the MIME type sniffing + algorithm will be deterministic for the majority of cases. + + However, certain factors (such as a slow connection) may prevent the + user agent from reading 1445 bytes in a + reasonable amount of time. + +
+ The resource header need only be determined once per + resource. + + + +
+ A byte pattern is a byte sequence used as a template + to be matched against in the pattern matching algorithm. + +
+ A pattern mask is a byte sequence used to determine + the significance of bytes being compared against a + byte pattern in the pattern matching algorithm. + +
+ In a pattern mask, 0xFF indicates the byte is + strictly significant, 0xDF indicates that the byte is + significant in an ASCII case-insensitive way, and 0x00 indicates that the + byte is not significant. + +
To determine whether a byte sequence matches a particular byte pattern, use the +following pattern matching algorithm. It is given a byte sequence +input, a byte pattern pattern, a pattern mask mask, +and a set of bytes to be ignored ignored, and returns true or false. + +
If input's length is less than pattern's + length, return false. + +
Let s be 0. + +
While s < input's length: + +
+ +Let p be 0. + +
While p < pattern's length: + +
Let maskedData be the result of applying the bitwise AND operator to + input[s] and mask[p]. + +
If maskedData is not equal to pattern[p], return false. + +
Set s to s + 1. + +
Set p to p + 1. +
Return true. +
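The pattern matching algorithm can be sketched as below. Two points are assumptions on my part: the pattern and mask are taken to have equal length, and the first loop is read as skipping leading bytes that belong to the ignored set. The extra bounds check after skipping is a defensive addition of this sketch, not a step from the text above.

```python
def pattern_matches(data: bytes, pattern: bytes, mask: bytes, ignored: bytes = b"") -> bool:
    """Return True if data, after skipping ignored leading bytes, matches pattern under mask."""
    assert len(pattern) == len(mask)  # assumption: one mask byte per pattern byte
    if len(data) < len(pattern):
        return False
    s = 0
    # Skip leading bytes that are in the ignored set.
    while s < len(data) and data[s] in ignored:
        s += 1
    # Defensive guard (sketch only): not enough bytes left to compare.
    if len(data) - s < len(pattern):
        return False
    for p in range(len(pattern)):
        if data[s + p] & mask[p] != pattern[p]:
            return False
    return True
```

For example, a mask byte of 0xDF clears bit 5, so both `h` (0x68) and `H` (0x48) compare equal to a pattern byte of 0x48.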
To determine which image MIME type byte pattern a byte sequence +input matches, if any, use the following +image type pattern matching algorithm: + +
Execute the following steps for each row row in the following table: + +
Let patternMatched be the result of the pattern matching algorithm + given input, the value in the first column of row, the value in the second + column of row, and the value in the third column of row. + +
If patternMatched is true, return the value in the fourth column of + row. +
Byte Pattern | Pattern Mask | Leading Bytes to Be Ignored | Image MIME Type | Note
---|---|---|---|---
00 00 01 00 | FF FF FF FF | None. | image/x-icon | A Windows Icon signature.
00 00 02 00 | FF FF FF FF | None. | image/x-icon | A Windows Cursor signature.
42 4D | FF FF | None. | image/bmp | The string "BM", a BMP signature.
47 49 46 38 37 61 | FF FF FF FF FF FF | None. | image/gif | The string "GIF87a", a GIF signature.
47 49 46 38 39 61 | FF FF FF FF FF FF | None. | image/gif | The string "GIF89a", a GIF signature.
52 49 46 46 00 00 00 00 57 45 42 50 56 50 | FF FF FF FF 00 00 00 00 FF FF FF FF FF FF | None. | image/webp | The string "RIFF" followed by four bytes followed by the string "WEBPVP".
89 50 4E 47 0D 0A 1A 0A | FF FF FF FF FF FF FF FF | None. | image/png | An error-checking byte followed by the string "PNG" followed by CR LF SUB LF, the PNG signature.
FF D8 FF | FF FF FF | None. | image/jpeg | The JPEG Start of Image marker followed by the indicator byte of another marker.
Return undefined. +
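The table-driven loop above can be sketched as follows. The rows are copied from the image table; since every row ignores no leading bytes, the matching step is inlined as a masked comparison. The names are hypothetical, and `None` stands in for "undefined".

```python
from typing import List, Optional, Tuple


def _row(pattern_hex: str, mask_hex: str, mime: str) -> Tuple[bytes, bytes, str]:
    return bytes.fromhex(pattern_hex), bytes.fromhex(mask_hex), mime


# (byte pattern, pattern mask, image MIME type), in table order.
IMAGE_ROWS: List[Tuple[bytes, bytes, str]] = [
    _row("00 00 01 00", "FF FF FF FF", "image/x-icon"),
    _row("00 00 02 00", "FF FF FF FF", "image/x-icon"),
    _row("42 4D", "FF FF", "image/bmp"),
    _row("47 49 46 38 37 61", "FF FF FF FF FF FF", "image/gif"),
    _row("47 49 46 38 39 61", "FF FF FF FF FF FF", "image/gif"),
    _row("52 49 46 46 00 00 00 00 57 45 42 50 56 50",
         "FF FF FF FF 00 00 00 00 FF FF FF FF FF FF", "image/webp"),
    _row("89 50 4E 47 0D 0A 1A 0A", "FF FF FF FF FF FF FF FF", "image/png"),
    _row("FF D8 FF", "FF FF FF", "image/jpeg"),
]


def match_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type of the first matching row, or None if no row matches."""
    for pattern, mask, mime in IMAGE_ROWS:
        if len(data) >= len(pattern) and all(
            data[i] & mask[i] == pattern[i] for i in range(len(pattern))
        ):
            return mime
    return None
```

The same shape would apply to the audio or video, font, and archive tables by swapping in their rows.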
To determine which audio or video MIME type byte pattern a byte sequence +input matches, if any, use the following audio or video type pattern matching +algorithm: + +
Execute the following steps for each row row in the following table: + +
Let patternMatched be the result of the pattern matching algorithm + given input, the value in the first column of row, the value in the second + column of row, and the value in the third column of row. + +
If patternMatched is true, return the value in the fourth column of + row. +
Byte Pattern | Pattern Mask | Leading Bytes to Be Ignored | Audio or Video MIME Type | Note
---|---|---|---|---
46 4F 52 4D 00 00 00 00 41 49 46 46 | FF FF FF FF 00 00 00 00 FF FF FF FF | None. | audio/aiff | The string "FORM" followed by four bytes followed by the string "AIFF", the AIFF signature.
49 44 33 | FF FF FF | None. | audio/mpeg | The string "ID3", the ID3v2-tagged MP3 signature.
4F 67 67 53 00 | FF FF FF FF FF | None. | application/ogg | The string "OggS" followed by NUL, the Ogg container signature.
4D 54 68 64 00 00 00 06 | FF FF FF FF FF FF FF FF | None. | audio/midi | The string "MThd" followed by four bytes representing the number 6 in 32 bits (big-endian), the MIDI signature.
52 49 46 46 00 00 00 00 41 56 49 20 | FF FF FF FF 00 00 00 00 FF FF FF FF | None. | video/avi | The string "RIFF" followed by four bytes followed by the string "AVI ", the AVI signature.
52 49 46 46 00 00 00 00 57 41 56 45 | FF FF FF FF 00 00 00 00 FF FF FF FF | None. | audio/wave | The string "RIFF" followed by four bytes followed by the string "WAVE", the WAVE signature.
If input matches the signature for MP4, return "video/mp4".
+
+
If input matches the signature for WebM, return
+ "video/webm".
+
+
If input matches the signature for MP3 without ID3, return
+ "audio/mpeg".
+
+
Return undefined. +
+ To determine whether a byte sequence matches the signature + for MP4, use the following steps: + +
ftyp
"), return false.
+
+ mp4
"), return true.
+
+ + This ignores the four bytes that correspond to the + version number of the "major brand". + +
mp4
"), return true.
+
+ + To determine whether a byte sequence matches the signature + for WebM, use the following steps: + +
vint
starting at sequence[iter].
+ webm
") on sequence at offset
+ iter.
+ vint
on a byte sequence
+sequence of size length, starting at index iter
+use the following steps:
++Matching a padded sequence pattern on a sequence +sequence at starting at byte offset and ending at by +end means returning true if sequence has a length greater +than end, and contains exactly, in the range [offset, +end], the bytes in pattern, in the same order, eventually +preceded by bytes with a value of 0x00, false otherwise. + + +
index + | mp3-rates + + + |
---|---|
0 + | 0 + + |
1 + | 32000 + + |
2 + | 40000 + + |
3 + | 48000 + + |
4 + | 56000 + + |
5 + | 64000 + + |
6 + | 80000 + + |
7 + | 96000 + + |
8 + | 112000 + + |
9 + | 128000 + + |
10 + | 160000 + + |
11 + | 192000 + + |
12 + | 224000 + + |
13 + | 256000 + + |
14 + | 320000 + + + |
index + | mp2.5-rates + + + |
---|---|
0 + | 0 + + |
1 + | 8000 + + |
2 + | 16000 + + |
3 + | 24000 + + |
4 + | 32000 + + |
5 + | 40000 + + |
6 + | 48000 + + |
7 + | 56000 + + |
8 + | 64000 + + |
9 + | 80000 + + |
10 + | 96000 + + |
11 + | 112000 + + |
12 + | 128000 + + |
13 + | 144000 + + |
14 + | 160000 + + + |
index + | samplerate + + + |
---|---|
0 + | 44100 + + |
1 + | 48000 + + |
2 + | 32000 + + + |
To determine which font MIME type byte pattern a byte sequence +input matches, if any, use the following font type pattern matching algorithm: + +
Execute the following steps for each row row in the following table: + +
Let patternMatched be the result of the pattern matching algorithm + given input, the value in the first column of row, the value in the second + column of row, and the value in the third column of row. + +
If patternMatched is true, return the value in the fourth column of + row. +
Byte Pattern | Pattern Mask | Leading Bytes to Be Ignored | Font MIME Type | Note
---|---|---|---|---
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4C 50 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF FF | None. | application/vnd.ms-fontobject | 34 bytes followed by the string "LP", the Embedded OpenType signature.
00 01 00 00 | FF FF FF FF | None. | font/ttf | 4 bytes representing the version number 1.0, a TrueType signature.
4F 54 54 4F | FF FF FF FF | None. | font/otf | The string "OTTO", the OpenType signature.
74 74 63 66 | FF FF FF FF | None. | font/collection | The string "ttcf", the TrueType Collection signature.
77 4F 46 46 | FF FF FF FF | None. | font/woff | The string "wOFF", the Web Open Font Format 1.0 signature.
77 4F 46 32 | FF FF FF FF | None. | font/woff2 | The string "wOF2", the Web Open Font Format 2.0 signature.
Return undefined. +
To determine which archive MIME type byte pattern a byte sequence +input matches, if any, use the following archive type pattern matching +algorithm: + +
Execute the following steps for each row row in the following table: + +
Let patternMatched be the result of the pattern matching algorithm + given input, the value in the first column of row, the value in the second + column of row, and the value in the third column of row. + +
If patternMatched is true, return the value in the fourth column of + row. +
Byte Pattern | Pattern Mask | Leading Bytes to Be Ignored | Archive MIME Type | Note
---|---|---|---|---
1F 8B 08 | FF FF FF | None. | application/x-gzip | The GZIP archive signature.
50 4B 03 04 | FF FF FF FF | None. | application/zip | The string "PK" followed by ETX EOT, the ZIP archive signature.
52 61 72 20 1A 07 00 | FF FF FF FF FF FF FF | None. | application/x-rar-compressed | The string "Rar " followed by SUB BEL NUL, the RAR archive signature.
Return undefined. +
+ To determine the computed MIME type of a resource, + user agents must use the following MIME type sniffing algorithm: + +
unknown/unknown
",
+ "application/unknown
", or "*/*
",
+ execute the rules for identifying an unknown MIME type with
+ the sniff-scriptable flag equal to the inverse of the
+ no-sniff flag and abort these steps.
+
+ text/html
",
+ execute the rules for distinguishing if a resource is a feed or HTML and
+ abort these steps.
+
+ + The sniff-scriptable flag is used by the rules for + identifying an unknown MIME type to determine whether to sniff for + scriptable MIME types. + + If the setting of the sniff-scriptable flag is not specified + when calling the rules for identifying an unknown MIME type, + the sniff-scriptable flag must default to unset. + +
To determine the computed MIME type of a resource resource with an +unknown MIME type, execute the following rules for identifying an unknown MIME +type: + +
If the sniff-scriptable flag is set, execute the following steps for each row + row in the following table: + +
Let patternMatched be the result of the pattern matching algorithm + given resource's resource header, the value in the first column of + row, the value in the second column of row, and the value in the third + column of row. + +
If patternMatched is true, return the value in the fourth column of + row. +
Byte Pattern | Pattern Mask | Leading Bytes to Be Ignored | Computed MIME Type | Note
---|---|---|---|---
3C 21 44 4F 43 54 59 50 45 20 48 54 4D 4C TT | FF FF DF DF DF DF DF DF DF FF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<!DOCTYPE HTML" followed by a tag-terminating byte.
3C 48 54 4D 4C TT | FF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<HTML" followed by a tag-terminating byte.
3C 48 45 41 44 TT | FF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<HEAD" followed by a tag-terminating byte.
3C 53 43 52 49 50 54 TT | FF DF DF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<SCRIPT" followed by a tag-terminating byte.
3C 49 46 52 41 4D 45 TT | FF DF DF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<IFRAME" followed by a tag-terminating byte.
3C 48 31 TT | FF DF FF FF | Whitespace bytes. | text/html | The case-insensitive string "<H1" followed by a tag-terminating byte.
3C 44 49 56 TT | FF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<DIV" followed by a tag-terminating byte.
3C 46 4F 4E 54 TT | FF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<FONT" followed by a tag-terminating byte.
3C 54 41 42 4C 45 TT | FF DF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<TABLE" followed by a tag-terminating byte.
3C 41 TT | FF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<A" followed by a tag-terminating byte.
3C 53 54 59 4C 45 TT | FF DF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<STYLE" followed by a tag-terminating byte.
3C 54 49 54 4C 45 TT | FF DF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<TITLE" followed by a tag-terminating byte.
3C 42 TT | FF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<B" followed by a tag-terminating byte.
3C 42 4F 44 59 TT | FF DF DF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<BODY" followed by a tag-terminating byte.
3C 42 52 TT | FF DF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<BR" followed by a tag-terminating byte.
3C 50 TT | FF DF FF | Whitespace bytes. | text/html | The case-insensitive string "<P" followed by a tag-terminating byte.
3C 21 2D 2D TT | FF FF FF FF FF | Whitespace bytes. | text/html | The string "<!--" followed by a tag-terminating byte.
3C 3F 78 6D 6C | FF FF FF FF FF | Whitespace bytes. | text/xml | The string "<?xml".
25 50 44 46 2D | FF FF FF FF FF | None. | application/pdf | The string "%PDF-", the PDF signature.
+ What about feeds? + +
Execute the following steps for each row row in the following table: + +
Let patternMatched be the result of the pattern matching algorithm + given resource's resource header, the value in the first column of + row, the value in the second column of row, and the value in the third + column of row. + +
If patternMatched is true, return the value in the fourth column of + row. +
Byte Pattern | Pattern Mask | Leading Bytes to Be Ignored | Computed MIME Type | Note
---|---|---|---|---
25 21 50 53 2D 41 64 6F 62 65 2D | FF FF FF FF FF FF FF FF FF FF FF | None. | application/postscript | The string "%!PS-Adobe-", the PostScript signature.
FE FF 00 00 | FF FF 00 00 | None. | text/plain | UTF-16BE BOM
FF FE 00 00 | FF FF 00 00 | None. | text/plain | UTF-16LE BOM
EF BB BF 00 | FF FF FF 00 | None. | text/plain | UTF-8 BOM
User agents may implicitly extend this table to support additional MIME types. + +
However, user agents should not implicitly extend this table to include additional + byte patterns for any computed MIME type already present in this table, as doing so + could introduce privilege escalation vulnerabilities. + +
User agents must not introduce any privilege escalation vulnerabilities when extending this + table. + +
Let matchedType be the result of executing the + image type pattern matching algorithm given resource's resource header. + +
If matchedType is not undefined, return matchedType. + +
Set matchedType to the result of executing the + audio or video type pattern matching algorithm given resource's + resource header. + +
If matchedType is not undefined, return matchedType. + + + +
Set matchedType to the result of executing the + archive type pattern matching algorithm given resource's + resource header. + +
If matchedType is not undefined, return matchedType. + +
If resource's resource header contains no binary data bytes,
+ return "text/plain
".
+
+
Return "application/octet-stream
".
+
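The table-driven steps above can be sketched roughly as follows. This is a minimal illustration, not the normative algorithm: the names `pattern_matches`, `sniff_unknown`, `ROWS`, and the `TT` sentinel are hypothetical, only the rows whose byte patterns appear in full in this part of the table are included, and the image, audio or video, and archive matching steps are omitted. The sets of whitespace bytes, tag-terminating bytes, and binary data bytes are assumed to follow the definitions given elsewhere in this standard.

```python
# Byte classes as defined elsewhere in the standard (assumed here).
WHITESPACE = frozenset(b"\t\n\x0c\r ")
TAG_TERMINATORS = frozenset(b" >")
BINARY_DATA_BYTES = frozenset(
    [*range(0x00, 0x09), 0x0B, *range(0x0E, 0x1B), *range(0x1C, 0x20)]
)
NONE = frozenset()
TT = "TT"  # hypothetical sentinel standing for "any tag-terminating byte"

# (byte pattern, pattern mask, leading bytes to ignore, computed MIME type)
ROWS = [
    ("3C 50 TT", "FF DF FF", WHITESPACE, "text/html"),
    ("3C 21 2D 2D TT", "FF FF FF FF FF", WHITESPACE, "text/html"),
    ("3C 3F 78 6D 6C", "FF FF FF FF FF", WHITESPACE, "text/xml"),
    ("25 50 44 46 2D", "FF FF FF FF FF", NONE, "application/pdf"),
    ("25 21 50 53 2D 41 64 6F 62 65 2D", "FF " * 11, NONE,
     "application/postscript"),
    ("FE FF 00 00", "FF FF 00 00", NONE, "text/plain"),
    ("FF FE 00 00", "FF FF 00 00", NONE, "text/plain"),
    ("EF BB BF 00", "FF FF FF 00", NONE, "text/plain"),
]


def parse(cells):
    """Turn a table cell such as '3C 50 TT' into a list of ints and TT markers."""
    return [TT if tok == TT else int(tok, 16) for tok in cells.split()]


def pattern_matches(data, pattern, mask, ignored):
    """Skip ignorable leading bytes, then compare masked bytes to the pattern."""
    s = 0
    while s < len(data) and data[s] in ignored:
        s += 1
    if len(data) - s < len(pattern):
        return False
    for offset, (want, m) in enumerate(zip(pattern, mask)):
        byte = data[s + offset]
        if want == TT:
            if byte not in TAG_TERMINATORS:
                return False
        elif (byte & m) != want:
            return False
    return True


def sniff_unknown(header: bytes) -> str:
    """Return the fourth-column value of the first matching row, else fall back."""
    for pattern, mask, ignored, mime in ROWS:
        if pattern_matches(header, parse(pattern), parse(mask), ignored):
            return mime
    # Image, audio or video, and archive pattern matching would run here.
    if not any(b in BINARY_DATA_BYTES for b in header):
        return "text/plain"
    return "application/octet-stream"
```

Under these assumptions, `sniff_unknown(b"  <p>hello")` skips the two leading spaces and matches the "<P" row, while a header with no matching row and at least one binary data byte falls through to "application/octet-stream".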
+ To determine whether a binary resource has been mislabeled as + plain text, execute the following rules for + distinguishing if a resource is text or binary: + +
text/plain
".
+
+ Abort these steps.
+
+ text/plain
".
+
+ Abort these steps.
+
+ text/plain
".
+
+ Abort these steps.
+
+ application/octet-stream
".
+
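A rough sketch of the rules for distinguishing if a resource is text or binary follows. It assumes the checks are, in order, a UTF-16 byte order mark, a UTF-8 byte order mark, and the absence of binary data bytes as defined elsewhere in this standard. The function name `sniff_text_or_binary` is hypothetical.

```python
# Binary data bytes as defined elsewhere in the standard (assumed here):
# 0x00-0x08, 0x0B, 0x0E-0x1A, and 0x1C-0x1F.
BINARY_DATA_BYTES = frozenset(
    [*range(0x00, 0x09), 0x0B, *range(0x0E, 0x1B), *range(0x1C, 0x20)]
)


def sniff_text_or_binary(header: bytes) -> str:
    """Return only "text/plain" or "application/octet-stream"."""
    # UTF-16BE or UTF-16LE byte order mark.
    if header.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "text/plain"
    # UTF-8 byte order mark.
    if header.startswith(b"\xef\xbb\xbf"):
        return "text/plain"
    # No binary data bytes anywhere in the resource header.
    if not any(b in BINARY_DATA_BYTES for b in header):
        return "text/plain"
    return "application/octet-stream"
```

Both possible results are non-scriptable types, so a sketch of this shape cannot upgrade a mislabeled resource to something executable.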
+ + It is critical that the rules for distinguishing if a resource is + text or binary never determine the computed MIME type + to be a scriptable MIME type, as this could allow a privilege + escalation attack. +
+ To determine whether a feed has been mislabeled as HTML, execute the + following rules for distinguishing if a resource is a feed or + HTML: + +
<
"), increment s by 1 and exit loop
+ L.
+
+ !--
"), increment s by 3 and enter loop
+ M:
+
+ -->
"), increment s by 3 and exit
+ loops M and L.
+
+ !
"), increment s by 1 and enter loop
+ M:
+
+ >
"), increment s by 1 and exit loops
+ M and L.
+
+ ?
"), increment s by 1 and enter loop
+ M:
+
+ ?>
"), increment s by 2 and exit loops
+ M and L.
+
+ rss
"), the computed MIME type is
+ "application/rss+xml
".
+
+ Abort these steps.
+
+ feed
"), the computed MIME type is
+ "application/atom+xml
".
+
+ Abort these steps.
+
+ rdf:RDF
"), increment s
+ by 7 and enter loop M:
+
+ http://purl.org/rss/1.0/
"), increment
+ s by 24 and enter loop N:
+
+ http://www.w3.org/1999/02/22-rdf-syntax-ns#
"),
+ the computed MIME type is
+ "application/rss+xml
".
+
+ Abort these steps.
+
+ http://www.w3.org/1999/02/22-rdf-syntax-ns#
"),
+ increment s by 43 and enter loop N:
+
+ http://purl.org/rss/1.0/
"), the computed
+ MIME type is "application/rss+xml
".
+
+ Abort these steps.
+
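The rules for distinguishing if a resource is a feed or HTML can be approximated with the sketch below. It is a simplification under stated assumptions, not the normative loop structure: `sniff_feed` and its `supplied` parameter are hypothetical names, "no decision" is modeled by returning the supplied MIME type, and the nested loops M and N for the "rdf:RDF" case are collapsed into a check that both namespace strings occur after the element name.

```python
WHITESPACE = frozenset(b"\t\n\x0c\r ")  # whitespace bytes (assumed definition)
RSS_NS = b"http://purl.org/rss/1.0/"  # 24 bytes
RDF_NS = b"http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # 43 bytes


def sniff_feed(header: bytes, supplied: str = "text/html") -> str:
    # Skip a leading UTF-8 byte order mark.
    s = 3 if header.startswith(b"\xef\xbb\xbf") else 0
    n = len(header)
    while s < n:
        # Skip whitespace; anything other than "<" ends the search.
        while s < n and header[s] in WHITESPACE:
            s += 1
        if s >= n or header[s] != 0x3C:
            return supplied
        s += 1
        if header.startswith(b"!--", s):  # comment: skip to "-->"
            end = header.find(b"-->", s + 3)
            if end < 0:
                return supplied
            s = end + 3
        elif header.startswith(b"!", s):  # markup declaration: skip to ">"
            end = header.find(b">", s + 1)
            if end < 0:
                return supplied
            s = end + 1
        elif header.startswith(b"?", s):  # processing instruction: skip to "?>"
            end = header.find(b"?>", s + 1)
            if end < 0:
                return supplied
            s = end + 2
        elif header.startswith(b"rss", s):
            return "application/rss+xml"
        elif header.startswith(b"feed", s):
            return "application/atom+xml"
        elif header.startswith(b"rdf:RDF", s):
            rest = header[s + 7:]
            if RSS_NS in rest and RDF_NS in rest:
                return "application/rss+xml"
            return supplied
        else:
            return supplied
    return supplied
```

Because neither namespace string can overlap the other, requiring both to be present after "rdf:RDF" gives the same answer as scanning for one and then the other in either order.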
+ + It might be more efficient for the user agent to implement the rules + for distinguishing if a resource is a feed or HTML in parallel with + its algorithm for detecting the character encoding of an HTML document. + + + +
+ A context is …. + +
+ In certain contexts, it is only useful to identify + resources that belong to a certain subset of + MIME types. + + In such contexts, it is appropriate to use a + context-specific sniffing algorithm in place of the MIME + type sniffing algorithm in order to determine the computed MIME + type of a resource. + +
+ A context-specific sniffing algorithm determines the + computed MIME type of a resource only if the + resource is a MIME type relevant to a particular + context. + + + +
+ Use the MIME type sniffing algorithm. + + + +
+ To determine the computed MIME type of a resource + with an image MIME type, execute the following rules for + sniffing images specifically: + +
+ To determine the computed MIME type of a resource + with an audio or video MIME type, execute the following rules + for sniffing audio and video specifically: + +
+ To determine the computed MIME type of a resource + fetched in a plugin context, execute + the following rules for sniffing in a plugin context: + +
application/octet-stream
".
+
+ + To determine the computed MIME type of a resource + fetched in a style context, execute the + following rules for sniffing in a style context: + +
+ To determine the computed MIME type of a resource + fetched in a script context, execute + the following rules for sniffing in a script context: + +
+ To determine the computed MIME type of a resource + with a font MIME type, execute the following rules for sniffing + fonts specifically: + +
+ The computed MIME type is "text/vtt
".
+
+
+
+
+ The computed MIME type is
+ "text/cache-manifest
".
+
+
+
+
+ Special thanks to Adam Barth and Ian Hickson for maintaining previous + incarnations of this document. + +
+ Thanks also to + Alfred Hönes, + Andreu Botella, + Anne van Kesteren, + Boris Zbarsky, + Darien Maillet Valentine, + David Singer, + Domenic Denicola, + Henri Sivonen, + Jean-Yves Avenard, + Jonathan Neal, + Joshua Cranmer, + Larry Masinter, + 罗泽轩, + Mariko Kosaka, + Mark Pilgrim, + Paul Adenot, + Peter Occil, + Rob Buis, + Russ Cox, + Simon Pieters, and + triple-underscore + for their invaluable contributions. + +
This standard is written by Gordon P. Hemsley +(me@gphemsley.org).