From 937deaa79cfb7f1f56d4ac7047fd2590317b16bb Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 14 May 2023 15:30:13 +0100 Subject: [PATCH 01/20] MSC4016: Streaming E2EE file transfers with random access --- .../4016-streaming-e2ee-file-transfer.md | 88 +++++++++++++++++++ 1 file changed, 88 insertions(+) create mode 100644 proposals/4016-streaming-e2ee-file-transfer.md diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md new file mode 100644 index 00000000000..71dfec85df9 --- /dev/null +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -0,0 +1,88 @@ +# MSC4016: Streaming E2EE file transfer with random access + +## Problem + +* File transfers currently take twice as long as they could, as they must first be uploaded in their entirety to the sender’s server before being downloaded via the receiver’s server. +* As a result, relative to a dedicated file-copying system (e.g. scp) they feel sluggish. For instance, you can’t incrementally view a progressive JPEG or voice or video file as it’s being uploaded for “zero latency” file transfers. +* You can’t skip within them without downloading the whole thing (if they’re streamable content, such as an .opus file) +* For instance, you can’t do realtime broadcast of voice messages via Matrix, or skip within them (other than splitting them into a series of separate file transfers). + +Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com/matrix-org/matrix-spec/issues/432) + +## Solution overview + +* Upload content in a single file made up of contiguous blocks of AES-GCM content. + * Typically constant block size (e.g. 32KB) + * Or variable block size (to allow time-based blocksize for low-latency seeking in streamable content) - e.g. one block per opus frame. Otherwise a 32KB block ends up being 8s of typical opus latency. + * This would then require a registration sequence to identify the starts of blocks boundaries when seeking randomly (potentially escaping the bitstream to avoid registration code collisions). +* Unlike today’s AES-CTR attachments, AES-GCM makes the content self-authenticating, in that it includes an authentication tag (AEAD) to hash the contents and protect against substitution attacks (i.e. where an attacker flips some bits in the encrypted payload to strategically corrupt the plaintext, and nobody notices as the content isn’t hashed). + * (The only reason Matrix currently uses AES-CTR is that native AES-GCM primitives weren’t widespread enough on Android back in 2016) +* To prevent against reordering attacks, each AES-GCM block has to include an encrypted block header which includes a sequence number, so we can be sure that when we request block N, we’re actually getting block N back. + * XXX: is there still a vulnerability here? Other approaches use Merkle trees to hash the AEADs rather than simple sequence numbers, but why? +* We then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek while downloading +* We could also use [Youtube-style](https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol) off-standard Content-Range headers on POST when uploading for resumable/incremental uploads. 
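
As a purely illustrative sketch of the shape of this (the block size, header layout and IV construction here are
placeholders, not part of the proposal), the sender side might look something like:

```typescript
// Illustrative only: encrypt a file as a sequence of AES-GCM blocks using WebCrypto,
// binding each block's sequence number into the encryption so that block N cannot
// be replayed or reordered as block M without failing to decrypt.
const BLOCK_SIZE = 32 * 1024; // hypothetical constant plaintext block size

async function encryptAsBlocks(
  key: CryptoKey,       // AES-256 key imported for "AES-GCM"
  fileIv: Uint8Array,   // random per-file IV, shared out of band in the event content
  data: Uint8Array,
): Promise<Uint8Array[]> {
  const blocks: Uint8Array[] = [];
  for (let seq = 0; seq * BLOCK_SIZE < data.length; seq++) {
    const plain = data.subarray(seq * BLOCK_SIZE, (seq + 1) * BLOCK_SIZE);

    // One hypothetical way of binding the block number: mix it into the IV and
    // also authenticate it as additional data.
    const seqBytes = new Uint8Array(4);
    new DataView(seqBytes.buffer).setUint32(0, seq);
    const iv = new Uint8Array([...fileIv, ...seqBytes]);

    const ct = await crypto.subtle.encrypt(
      { name: "AES-GCM", iv, additionalData: seqBytes, tagLength: 128 },
      key,
      plain,
    );
    blocks.push(new Uint8Array(ct));
  }
  return blocks;
}
```

Each block could then be appended to the upload as soon as it is produced, and decrypted independently on the
receiving side.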
+ +## Advantages + +* Backwards compatible with current implementations at the HTTP layer +* Relatively minor changes needed from AES-CTR to sequence-of-AES-GCM-blocks for implementations like [https://github.com/matrix-org/matrix-encrypt-attachment](https://github.com/matrix-org/matrix-encrypt-attachment) +* We automatically maintain a serverside E2EE store of the file as normal, while also getting 1:many streaming semantics +* Provides streaming transfer for any file type - not just media formats +* Leverages AES-GCM’s existing primitives and hashing rather than inventing our own hashing strategy +* We already had Range/Content-Range resumable/seekable zero-latency HTTP transfer implemented and working excellently pre-E2EE and pre-Matrix in our ‘glow’ codebase. + +## Limitations + +* Enterprisey features like content scanning and CDGs require visibility on the whole file, so would eliminate the advantages of streaming by having to buffering it up in order to scan it. +* When applied to unencrypted files, server-side content scanning (for trust & safety etc) would be unable to scan until it’s too late. +* Cancelled file uploads will still leak a partial file transfer to receivers who start to stream, which could be awkward if the sender sent something sensitive, and then can’t tell who downloaded what before they hit the cancel button +* Small bandwidth overhead for the additional AEADs and block headers - probably ~16 bytes per block. +* Incompatible with multi-bitstream streaming formats like HLS or DASH + +## Detailed proposal + +TODO, if folks think it's worth it + +## Alternatives + +* Split files into a series of separate m.file uploads which the client then has to glue back together (as the [voice broadcast feature](https://github.com/vector-im/element-meta/discussions/632) does in Element today). + * Pros: + * Works automatically with antivirus & CDGs + * Could be made to map onto HLS or DASH? (by generating an .m3u8 which contains a bunch of MXC urls? This could also potentially solve the glitching problems we’ve had, by reusing existing HLS players augmented with our E2EE support) + * Cons: + * Can be a pain to glue media uploads back together without glitching + * Is always going to be high latency (e.g. Element currently splits into ~30s chunks) given rate limits on sending file events +* Transfer files via streaming P2P file transfer via WebRTC data channels + * Pros: + * Easy to implement with Matrix’s existing WebRTC signalling + * Could use MSC3898-inspired media control to seek in the stream + * Cons: + * You don’t get a serverside copy of the data + * You expose client IPs to each other if going P2P rather than via TURN +* Do streaming voice/video messages/broadcast via WebRTC media channels instead (as hinted in [MSC3888](https://github.com/matrix-org/matrix-spec-proposals/pull/3888)) + * Pros: + * Lowest latency + * Could use media control to seek + * Automatically supports variable streams via SFU + * If the SFU does E2EE and archiving, you get that for free. + * Cons: + * Complex; you can’t just download the file via HTTP + * Requires client to have a WebRTC stack + * A suitable SFU still doesn’t exist yet +* Transfer files out of band using a protocol which already provides streaming transfers (e.g. IPFS?) + +## Security considerations + +* Variable size blocks could leak metadata for VBR audio. 
Mitigation is to use CBR if you care about leaking voice traffic patterns (constant size blocks isn’t necessarily enough, as you’d still leak the traffic patterns) +* Is encrypting a sequence number in block header (with authenticated encryption) sufficient to mitigate reordering attacks? +* Do the repeated and predictable encrypted block headers facilitate attacks? + +## Conclusion + +It’s a bit unclear whether this is actually an improvement over formalising chunk-based file transfer as voice broadcast does today. The fact that it’s incompatible with content scanners and CDGs is a bit of a turn off. It’s also a bit unclear whether voice/video broadcast would be better served via MSC3888 style behaviour. + +Therefore, I’ve written this as a high-level MSC to gather feedback, and to get the design down on paper before I forget it (I originally sketched this out a month or so ago). + +## Dependencies + +This MSC depends on [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), which is landing currently in the spec. \ No newline at end of file From e7c23955097fe1e1b75986da8a4b033b70fdf553 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 14 May 2023 15:43:03 +0100 Subject: [PATCH 02/20] tweaks --- proposals/4016-streaming-e2ee-file-transfer.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 71dfec85df9..33e3bbeac77 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -1,4 +1,4 @@ -# MSC4016: Streaming E2EE file transfer with random access +# WIP: MSC4016: Streaming E2EE file transfer with random access ## Problem @@ -37,7 +37,7 @@ Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com * When applied to unencrypted files, server-side content scanning (for trust & safety etc) would be unable to scan until it’s too late. * Cancelled file uploads will still leak a partial file transfer to receivers who start to stream, which could be awkward if the sender sent something sensitive, and then can’t tell who downloaded what before they hit the cancel button * Small bandwidth overhead for the additional AEADs and block headers - probably ~16 bytes per block. -* Incompatible with multi-bitstream streaming formats like HLS or DASH +* Out of the box it wouldn't be able to adapt streaming to network conditions (no HLS or DASH style support for multiple bitstreams) ## Detailed proposal @@ -50,9 +50,9 @@ TODO, if folks think it's worth it * Works automatically with antivirus & CDGs * Could be made to map onto HLS or DASH? (by generating an .m3u8 which contains a bunch of MXC urls? This could also potentially solve the glitching problems we’ve had, by reusing existing HLS players augmented with our E2EE support) * Cons: - * Can be a pain to glue media uploads back together without glitching * Is always going to be high latency (e.g. 
Element currently splits into ~30s chunks) given rate limits on sending file events -* Transfer files via streaming P2P file transfer via WebRTC data channels + * Can be a pain to glue media uploads back together without glitching +* Transfer files via streaming P2P file transfer via WebRTC data channels (https://github.com/matrix-org/matrix-spec/issues/189) * Pros: * Easy to implement with Matrix’s existing WebRTC signalling * Could use MSC3898-inspired media control to seek in the stream @@ -63,7 +63,9 @@ TODO, if folks think it's worth it * Pros: * Lowest latency * Could use media control to seek - * Automatically supports variable streams via SFU + * Supports multiple senders + * Works with CDGs and other enterprisey scanners which know how to scan VOIP payloads + * Could automatically support variable streams via SFU to adapt to network conditions * If the SFU does E2EE and archiving, you get that for free. * Cons: * Complex; you can’t just download the file via HTTP From bce7730fab38f81d1b2aaad4a8b15a98274ef32f Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 14 May 2023 15:46:04 +0100 Subject: [PATCH 03/20] note RAM --- proposals/4016-streaming-e2ee-file-transfer.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 33e3bbeac77..359216d762b 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -28,6 +28,7 @@ Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com * Relatively minor changes needed from AES-CTR to sequence-of-AES-GCM-blocks for implementations like [https://github.com/matrix-org/matrix-encrypt-attachment](https://github.com/matrix-org/matrix-encrypt-attachment) * We automatically maintain a serverside E2EE store of the file as normal, while also getting 1:many streaming semantics * Provides streaming transfer for any file type - not just media formats +* Minimises memory usage in Matrix clients for large file transfers. Currently all(?) implementations store the whole file in RAM in order to check hashes and then decrypt, whereas this would naturally lend itself to processing files incrementally in blocks. * Leverages AES-GCM’s existing primitives and hashing rather than inventing our own hashing strategy * We already had Range/Content-Range resumable/seekable zero-latency HTTP transfer implemented and working excellently pre-E2EE and pre-Matrix in our ‘glow’ codebase. From 6e845a70892725bb2f8771e373784b9ba3ad6cc6 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 14 May 2023 16:38:55 +0100 Subject: [PATCH 04/20] Update 4016-streaming-e2ee-file-transfer.md add use cases --- proposals/4016-streaming-e2ee-file-transfer.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 359216d762b..3649d2eeb51 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -6,6 +6,7 @@ * As a result, relative to a dedicated file-copying system (e.g. scp) they feel sluggish. For instance, you can’t incrementally view a progressive JPEG or voice or video file as it’s being uploaded for “zero latency” file transfers. 
* You can’t skip within them without downloading the whole thing (if they’re streamable content, such as an .opus file) * For instance, you can’t do realtime broadcast of voice messages via Matrix, or skip within them (other than splitting them into a series of separate file transfers). +* Another example is sharing document snapshots for real-time collaboration. If a user uploads 100MB of glTF in Third Room to edit a scene, you want all participants to get it and stream the decode with minimal latency. Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com/matrix-org/matrix-spec/issues/432) @@ -25,6 +26,7 @@ Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com ## Advantages * Backwards compatible with current implementations at the HTTP layer +* Fully backwards compatible for unencrypted transfers * Relatively minor changes needed from AES-CTR to sequence-of-AES-GCM-blocks for implementations like [https://github.com/matrix-org/matrix-encrypt-attachment](https://github.com/matrix-org/matrix-encrypt-attachment) * We automatically maintain a serverside E2EE store of the file as normal, while also getting 1:many streaming semantics * Provides streaming transfer for any file type - not just media formats From 0b135b7bf0485060cc806beb464f937940c24566 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 14 May 2023 19:15:13 +0100 Subject: [PATCH 05/20] tweaks --- proposals/4016-streaming-e2ee-file-transfer.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 3649d2eeb51..03d52c60cd2 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -6,7 +6,7 @@ * As a result, relative to a dedicated file-copying system (e.g. scp) they feel sluggish. For instance, you can’t incrementally view a progressive JPEG or voice or video file as it’s being uploaded for “zero latency” file transfers. * You can’t skip within them without downloading the whole thing (if they’re streamable content, such as an .opus file) * For instance, you can’t do realtime broadcast of voice messages via Matrix, or skip within them (other than splitting them into a series of separate file transfers). -* Another example is sharing document snapshots for real-time collaboration. If a user uploads 100MB of glTF in Third Room to edit a scene, you want all participants to get it and stream the decode with minimal latency. +* Another example is sharing document snapshots for real-time collaboration. If a user uploads 100MB of glTF in Third Room to edit a scene, you want all participants to be able to receive the data and stream-decode it with minimal latency. Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com/matrix-org/matrix-spec/issues/432) @@ -30,7 +30,7 @@ Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com * Relatively minor changes needed from AES-CTR to sequence-of-AES-GCM-blocks for implementations like [https://github.com/matrix-org/matrix-encrypt-attachment](https://github.com/matrix-org/matrix-encrypt-attachment) * We automatically maintain a serverside E2EE store of the file as normal, while also getting 1:many streaming semantics * Provides streaming transfer for any file type - not just media formats -* Minimises memory usage in Matrix clients for large file transfers. Currently all(?) 
implementations store the whole file in RAM in order to check hashes and then decrypt, whereas this would naturally lend itself to processing files incrementally in blocks. +* Minimises memory usage in Matrix clients for large file transfers. Currently all(?) client implementations store the whole file in RAM in order to check hashes and then decrypt, whereas this would naturally lend itself to processing files incrementally in blocks. * Leverages AES-GCM’s existing primitives and hashing rather than inventing our own hashing strategy * We already had Range/Content-Range resumable/seekable zero-latency HTTP transfer implemented and working excellently pre-E2EE and pre-Matrix in our ‘glow’ codebase. @@ -41,6 +41,7 @@ Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com * Cancelled file uploads will still leak a partial file transfer to receivers who start to stream, which could be awkward if the sender sent something sensitive, and then can’t tell who downloaded what before they hit the cancel button * Small bandwidth overhead for the additional AEADs and block headers - probably ~16 bytes per block. * Out of the box it wouldn't be able to adapt streaming to network conditions (no HLS or DASH style support for multiple bitstreams) +* Might not play nice with CDNs? (I haven't checked if they pass through Range headers properly) ## Detailed proposal @@ -90,4 +91,5 @@ Therefore, I’ve written this as a high-level MSC to gather feedback, and to ge ## Dependencies -This MSC depends on [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), which is landing currently in the spec. \ No newline at end of file +This MSC depends on [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), which is landing currently in the spec. +Extends [MSC3469](https://github.com/matrix-org/matrix-spec-proposals/pull/3469). \ No newline at end of file From 6dc6f94d4e5cc921669482edbb744378dcc9ba68 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 14 May 2023 19:17:18 +0100 Subject: [PATCH 06/20] spell out security consideration on partial xfers --- proposals/4016-streaming-e2ee-file-transfer.md | 1 + 1 file changed, 1 insertion(+) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 03d52c60cd2..671f8d1e007 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -82,6 +82,7 @@ TODO, if folks think it's worth it * Variable size blocks could leak metadata for VBR audio. Mitigation is to use CBR if you care about leaking voice traffic patterns (constant size blocks isn’t necessarily enough, as you’d still leak the traffic patterns) * Is encrypting a sequence number in block header (with authenticated encryption) sufficient to mitigate reordering attacks? * Do the repeated and predictable encrypted block headers facilitate attacks? +* The resulting lack of atomicity on file transfer means that accidentally uploaded files may leak partial contents to other users, even if they're cancelled. 
## Conclusion From 68d5d140f3474f3ce1204b8d504dab9798faeac1 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Mon, 15 May 2023 10:17:38 +0100 Subject: [PATCH 07/20] add torrent note from anoa --- proposals/4016-streaming-e2ee-file-transfer.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 671f8d1e007..6310f3978ea 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -33,6 +33,7 @@ Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com * Minimises memory usage in Matrix clients for large file transfers. Currently all(?) client implementations store the whole file in RAM in order to check hashes and then decrypt, whereas this would naturally lend itself to processing files incrementally in blocks. * Leverages AES-GCM’s existing primitives and hashing rather than inventing our own hashing strategy * We already had Range/Content-Range resumable/seekable zero-latency HTTP transfer implemented and working excellently pre-E2EE and pre-Matrix in our ‘glow’ codebase. +* Random access could enable future torrent-like semantics in future (i.e. servers doing parallel downloads of different chunks from different servers, with appropriate coordination) ## Limitations @@ -93,4 +94,4 @@ Therefore, I’ve written this as a high-level MSC to gather feedback, and to ge ## Dependencies This MSC depends on [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), which is landing currently in the spec. -Extends [MSC3469](https://github.com/matrix-org/matrix-spec-proposals/pull/3469). \ No newline at end of file +Extends [MSC3469](https://github.com/matrix-org/matrix-spec-proposals/pull/3469). From 65f20d0cdd1f24d65644bedb00e26c6472b6a288 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Thu, 1 Jun 2023 14:04:22 +0100 Subject: [PATCH 08/20] Update proposals/4016-streaming-e2ee-file-transfer.md Co-authored-by: opusforlife2 <53176348+opusforlife2@users.noreply.github.com> --- proposals/4016-streaming-e2ee-file-transfer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 6310f3978ea..dacc74e0ff2 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -33,7 +33,7 @@ Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com * Minimises memory usage in Matrix clients for large file transfers. Currently all(?) client implementations store the whole file in RAM in order to check hashes and then decrypt, whereas this would naturally lend itself to processing files incrementally in blocks. * Leverages AES-GCM’s existing primitives and hashing rather than inventing our own hashing strategy * We already had Range/Content-Range resumable/seekable zero-latency HTTP transfer implemented and working excellently pre-E2EE and pre-Matrix in our ‘glow’ codebase. -* Random access could enable future torrent-like semantics in future (i.e. servers doing parallel downloads of different chunks from different servers, with appropriate coordination) +* Random access could enable torrent-like semantics in future (i.e. 
servers doing parallel downloads of different chunks from different servers, with appropriate coordination) ## Limitations From dc613544de9291baef471c32cd2c77671c12bf94 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Thu, 1 Jun 2023 21:15:20 +0100 Subject: [PATCH 09/20] clarify that MSC4016 is not needed to stream decryption/encryption --- proposals/4016-streaming-e2ee-file-transfer.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index dacc74e0ff2..5fa18df56b2 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -1,4 +1,4 @@ -# WIP: MSC4016: Streaming E2EE file transfer with random access +# WIP: MSC4016: Streaming E2EE file transfer with random access and zero latency ## Problem @@ -10,6 +10,8 @@ Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com/matrix-org/matrix-spec/issues/432) +N.B. this MSC is *not* needed to do a streaming decryption or encryption of E2EE files (as opposed to streaming transfer). The current APIs let you stream a download of AES-CTR data and incrementally decrypt it without loading the whole thing into RAM, calculating the hash as you go, and then either surfacing or deleting the decrypted result at the end if the hash matches. Similarly when uploading (if combined with [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), given you won't be able to send the hash in the event contents until you've uploaded the media). + ## Solution overview * Upload content in a single file made up of contiguous blocks of AES-GCM content. From d25b7e000abf8ac7bf7e8980586df90df6795669 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sat, 30 Dec 2023 18:47:15 +0000 Subject: [PATCH 10/20] flesh out MSC4016 with a detailed proposal --- .../4016-streaming-e2ee-file-transfer.md | 103 ++++++++++++++++-- 1 file changed, 94 insertions(+), 9 deletions(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 5fa18df56b2..6aebfc9953e 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -10,9 +10,11 @@ Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com/matrix-org/matrix-spec/issues/432) -N.B. this MSC is *not* needed to do a streaming decryption or encryption of E2EE files (as opposed to streaming transfer). The current APIs let you stream a download of AES-CTR data and incrementally decrypt it without loading the whole thing into RAM, calculating the hash as you go, and then either surfacing or deleting the decrypted result at the end if the hash matches. Similarly when uploading (if combined with [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), given you won't be able to send the hash in the event contents until you've uploaded the media). +N.B. this MSC is *not* needed to do a streaming decryption or encryption of E2EE files (as opposed to streaming transfer). The current APIs let you stream a download of AES-CTR data and incrementally decrypt it without loading the whole thing into RAM, calculating the hash as you go, and then either surfacing or deleting the decrypted result at the end if the hash matches. 
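
For illustration, a streaming decrypt of a current AES-CTR attachment might look roughly like the sketch below (the
names and surrounding plumbing are hypothetical; nothing in this snippet is changed by this MSC):

```typescript
// Rough illustration: streaming decryption of a v2 AES-CTR attachment in Node.js.
// Ciphertext chunks are hashed and decrypted as they arrive; the caller spools the
// yielded plaintext somewhere it can still delete (e.g. a temp file) and only treats
// it as good once this generator completes without throwing.
import { createDecipheriv, createHash } from "node:crypto";

async function* decryptV2Attachment(
  ciphertext: AsyncIterable<Buffer>, // e.g. a streamed HTTP response body
  key: Buffer,                       // 32-byte AES key from the event's JWK
  iv: Buffer,                        // 16-byte IV from the event
  expectedSha256: Buffer,            // hash of the ciphertext, from the event
): AsyncGenerator<Buffer> {
  const decipher = createDecipheriv("aes-256-ctr", key, iv);
  const hash = createHash("sha256");

  for await (const chunk of ciphertext) {
    hash.update(chunk);                // hash the ciphertext as we go
    yield decipher.update(chunk);      // bounded memory: one chunk at a time
  }
  yield decipher.final();
  if (!hash.digest().equals(expectedSha256)) {
    throw new Error("attachment hash mismatch - delete the decrypted output");
  }
}
```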
-## Solution overview +Relatedly, v2 MXC attachments can't be stream-transferred, even if combined with [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), given you won't be able to send the hash in the event contents until you've uploaded the media. + +## Solution sketch * Upload content in a single file made up of contiguous blocks of AES-GCM content. * Typically constant block size (e.g. 32KB) @@ -20,7 +22,7 @@ N.B. this MSC is *not* needed to do a streaming decryption or encryption of E2EE * This would then require a registration sequence to identify the starts of blocks boundaries when seeking randomly (potentially escaping the bitstream to avoid registration code collisions). * Unlike today’s AES-CTR attachments, AES-GCM makes the content self-authenticating, in that it includes an authentication tag (AEAD) to hash the contents and protect against substitution attacks (i.e. where an attacker flips some bits in the encrypted payload to strategically corrupt the plaintext, and nobody notices as the content isn’t hashed). * (The only reason Matrix currently uses AES-CTR is that native AES-GCM primitives weren’t widespread enough on Android back in 2016) -* To prevent against reordering attacks, each AES-GCM block has to include an encrypted block header which includes a sequence number, so we can be sure that when we request block N, we’re actually getting block N back. +* To prevent against reordering attacks, each AES-GCM block has to include an encrypted block header which includes a sequence number, so we can be sure that when we request block N, we’re actually getting block N back - or equivalent. * XXX: is there still a vulnerability here? Other approaches use Merkle trees to hash the AEADs rather than simple sequence numbers, but why? * We then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek while downloading * We could also use [Youtube-style](https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol) off-standard Content-Range headers on POST when uploading for resumable/incremental uploads. @@ -39,7 +41,7 @@ N.B. this MSC is *not* needed to do a streaming decryption or encryption of E2EE ## Limitations -* Enterprisey features like content scanning and CDGs require visibility on the whole file, so would eliminate the advantages of streaming by having to buffering it up in order to scan it. +* Enterprisey features like content scanning and CDGs require visibility on the whole file, so would eliminate the advantages of streaming by having to buffering it up in order to scan it. (Clientside scanners would benefit from file transfer latency halving but wouldn't be able to show mid-transfer files) * When applied to unencrypted files, server-side content scanning (for trust & safety etc) would be unable to scan until it’s too late. * Cancelled file uploads will still leak a partial file transfer to receivers who start to stream, which could be awkward if the sender sent something sensitive, and then can’t tell who downloaded what before they hit the cancel button * Small bandwidth overhead for the additional AEADs and block headers - probably ~16 bytes per block. @@ -48,10 +50,84 @@ N.B. this MSC is *not* needed to do a streaming decryption or encryption of E2EE ## Detailed proposal -TODO, if folks think it's worth it +The file is uploaded asynchronously using [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246). 
+ +The encrypted file block looks like: + +```json5 +"file": { + "v": "org.matrix.msc4016.v3", + "key": { + "alg": "A256GCM", + "ext": true, + "k": "cngOuL8OH0W7lxseExjxUyBOavJlomA7N0n1a3RxSUA", + "key_ops": [ + "encrypt", + "decrypt" + ], + "kty": "oct" + }, + "iv": "HVTXIOuVEax4E+TB", // 96-bit base-64 encoded initialisation vector + "url": "mxc://example.com/raAZzpGSeMjpAYfVdTrQILBI", +}, +``` + +N.B. there is no longer a `hashes` key, as AES-GCM includes its own hashing to enforce the integrity of the file transfer. +Therefore we can authenticate the transfer by the fact we can decrypt it using its key & IV (unless an attacker who controls +the same key & IV has substituted it for another file - but the benefit of doing so is questionable). + +We split the file stream into blocks of AES-256-GCM, with the following simple framing: + + * File header with a magic number of: 0x4D, 0x58, 0x43, 0x03 ("MXC" 0x03) - just so `file` can recognise it. + * 1..N blocks, each with a header of: + * a 32-bit field: 0xFFFFFFFF (a registration code to let a parser handle random access within the file + * a 32-bit field: block sequence number (starting at zero, used to calculate the IV of the block, and to aid random access) + * a 32-bit field: the length in bytes of the encrypted data in this block. + * a 32-bit field: a CRC32 checksum of the prior data. This is used when randomly seeking as a consistency check to confirm that the registration code really did indicate the beginning of a valid frame of data. It is not used for cryptographic integrity. + * the actual AES-GCM bitstream for that block. + * the plaintext block size can be variable; 32KB is a good default for most purposes. + * Audio streams may want to use a smaller block size (e.g. 1KB blocks for a CBR 32kbps Opus stream will give 250ms of streaming latency). Audio streams should be CBR to avoid leaking audio waveform metadata via block size. + * The block is encrypted using an IV formed by concatenating the block sequence number of the `file` block with the IV from the `file` block (forming a 128-bit IV, which will be hashed down to 96-bit again within AES-GCM). This avoids IV reuse (at least until it wraps after 2^32-1 blocks, which at 32KB per block is 137TB (18 hours of 8k raw video), or at 1KB per block is 4TB (34 years of 32kbps audio)). + * Implementations MUST terminate a stream if the seqnum is exhausted, to prevent IV reuse. + * XXX: Alternatively, we could use a 64-bit seqnum, spending 8 bytes of header on seqnums feels like a waste of bandwidth just to support massive transfers. And we'd have to manually hash it with the 96-bit IV rather than use the GCM implementation. + * The block is encrypted including the 32-bit block sequence number as Additional Authenticated Data, thus stopping encrypted blocks from impersonating each other. 
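
For illustration, one way a sender could assemble a single frame under this framing (byte order and exactly which
bytes the CRC32 covers are assumptions of this sketch rather than normative; `crc32` is assumed to come from any
off-the-shelf CRC32 implementation):

```typescript
// Sketch of encoding one frame per the framing above. The 4-byte file magic
// ("MXC" 0x03) would be written once at the start of the stream, before frame 0.
declare function crc32(data: Uint8Array): number; // assumed helper from a CRC32 library

const REGISTRATION_CODE = 0xffffffff;

async function encodeFrame(
  key: CryptoKey,       // AES-256-GCM key from the `file` block
  fileIv: Uint8Array,   // 12-byte IV from the `file` block
  seq: number,          // block sequence number, starting at 0
  plain: Uint8Array,    // up to one block of plaintext (e.g. 32KB)
): Promise<Uint8Array> {
  const seqBytes = new Uint8Array(4);
  new DataView(seqBytes.buffer).setUint32(0, seq, false); // big-endian assumed

  // Per-block IV = seqnum || file IV (128 bits); the seqnum is also bound as AAD.
  const iv = new Uint8Array(16);
  iv.set(seqBytes, 0);
  iv.set(fileIv, 4);

  const encrypted = new Uint8Array(
    await crypto.subtle.encrypt(
      { name: "AES-GCM", iv, additionalData: seqBytes, tagLength: 128 },
      key,
      plain,
    ),
  );

  // 16-byte block header: registration code, seqnum, encrypted length, CRC32.
  const header = new Uint8Array(16);
  const view = new DataView(header.buffer);
  view.setUint32(0, REGISTRATION_CODE, false);
  view.setUint32(4, seq, false);
  view.setUint32(8, encrypted.byteLength, false); // includes the 16-byte GCM tag
  view.setUint32(12, crc32(header.subarray(0, 12)) >>> 0, false); // "prior data" read as the preceding header fields

  const frame = new Uint8Array(16 + encrypted.byteLength);
  frame.set(header, 0);
  frame.set(encrypted, 16);
  return frame;
}
```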
+ +Or graphically, each frame is: + +``` +protocol "Registration Code (0xFFFFFFF):32,Block sequence number:32,Encrypted block length:32,CRC32:32,AES-GCM encrypted Data:64" + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Registration Code (0xFFFFFFF) | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Block sequence number | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| Encrypted block length | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| CRC32 | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| | ++ AES-GCM encrypted Data + +| | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +``` + +The actual file upload can then be streamed in blocks to the media server using `Content-Range` headers on the `PUT` method +as https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol#Resume_Upload. + +XXX: the media API needs to advertise that it supports streamed file transfer somehow. + +We then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek and +resume while downloading. The actual download should stream as rapidly as possible from the media server, letting the +receiver view it incrementally as the upload happens, providing "zero-latency". ## Alternatives +* We could use an existing streaming encrypted framing format of some kind rather (SRTP perhaps, which would give us timestamps for easier random access for audio/video streams) - but this feels a bit strange for plain old file streams. +* Alternatively, we could descope random access entirely, given it only makes sense for AV streams, and requires timestamps to work nicely - and simply being able to stream encryption/decryption is a win in its own right. For instance, glow doesn't let you seek randomly within files which are mid transfer; only tail. * Split files into a series of separate m.file uploads which the client then has to glue back together (as the [voice broadcast feature](https://github.com/vector-im/element-meta/discussions/632) does in Element today). * Pros: * Works automatically with antivirus & CDGs @@ -65,6 +141,7 @@ TODO, if folks think it's worth it * Could use MSC3898-inspired media control to seek in the stream * Cons: * You don’t get a serverside copy of the data + * Hard for clients to implement relative to a simple HTTP download * You expose client IPs to each other if going P2P rather than via TURN * Do streaming voice/video messages/broadcast via WebRTC media channels instead (as hinted in [MSC3888](https://github.com/matrix-org/matrix-spec-proposals/pull/3888)) * Pros: @@ -79,21 +156,29 @@ TODO, if folks think it's worth it * Requires client to have a WebRTC stack * A suitable SFU still doesn’t exist yet * Transfer files out of band using a protocol which already provides streaming transfers (e.g. IPFS?) +* Could use tus.io as an almost-standard format for HTTP resumable uploads (PATCH + Upload-Offset headers) instead, although tus servers don't seem to stream. ## Security considerations * Variable size blocks could leak metadata for VBR audio. Mitigation is to use CBR if you care about leaking voice traffic patterns (constant size blocks isn’t necessarily enough, as you’d still leak the traffic patterns) * Is encrypting a sequence number in block header (with authenticated encryption) sufficient to mitigate reordering attacks? 
-* Do the repeated and predictable encrypted block headers facilitate attacks? * The resulting lack of atomicity on file transfer means that accidentally uploaded files may leak partial contents to other users, even if they're cancelled. +* Clients may well wish to scan untrusted inbound file transfers for malware etc, which means buffering the inbound transfer and scanning it before presenting it to the user. +* Removing the `hashes` entry on the EncryptedFile description means that an attacker who controls the key & IV of the original file transfer could strategically substitute the file contents. This could be desirable for CDGs wishing to switch a file for a sanitised version without breaking the Matrix event hashes. For other scenarios it could be undesirable. An alternative might be for the sender to keep sending new hashes in related matrix events as the stream uploads, but it's unclear if this is worth it. ## Conclusion -It’s a bit unclear whether this is actually an improvement over formalising chunk-based file transfer as voice broadcast does today. The fact that it’s incompatible with content scanners and CDGs is a bit of a turn off. It’s also a bit unclear whether voice/video broadcast would be better served via MSC3888 style behaviour. +For the voice broadcast use case, it's a bit unclear whether this is actually an improvement over splitting files into multiple file uploads (or [MSC3888](https://github.com/matrix-org/matrix-spec-proposals/blob/weeman1337/voice-broadcast/proposals/3888-voice-broadcast.md)). It's also unfortunate that the benefits of the MSC are reduced with content scanners and CDGs. It’s also a bit unclear whether voice/video broadcast would be better served via MSC3888 style behaviour. -Therefore, I’ve written this as a high-level MSC to gather feedback, and to get the design down on paper before I forget it (I originally sketched this out a month or so ago). +However, for halving the transfer time for large videos and files (and the magic "zero latency" of being able to see file transfers instantly start to download as they upload) it still feels like a worthwhile MSC. Switching to GCM is desirable too in terms of providing authenticated encryption and avoiding having to calculate out-of-band hashes for file transfer. Finally, implementating this MSC will force implementations to stream their file encryption/decryption and avoid the temptation to load the whole file into RAM (which doesn't scale, especially in constrained environments such as iOS Share Extensions). ## Dependencies -This MSC depends on [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), which is landing currently in the spec. +This MSC depends on [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), which has now landed in the spec. Extends [MSC3469](https://github.com/matrix-org/matrix-spec-proposals/pull/3469). 
+ +## Unstable prefixes + +| Unstable prefix | Stable prefix | +| --------------------- | ------------------- | +| org.matrix.msc4016.v3 | v3 | From 3d9d7885833e4e7aa542312a9bd2ead80b9ba327 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sat, 30 Dec 2023 18:49:44 +0000 Subject: [PATCH 11/20] line wrap and de-WIP --- .../4016-streaming-e2ee-file-transfer.md | 177 +++++++++++++----- 1 file changed, 125 insertions(+), 52 deletions(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 6aebfc9953e..d8415b08679 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -1,51 +1,86 @@ -# WIP: MSC4016: Streaming E2EE file transfer with random access and zero latency +# MSC4016: Streaming E2EE file transfer with random access and zero latency ## Problem -* File transfers currently take twice as long as they could, as they must first be uploaded in their entirety to the sender’s server before being downloaded via the receiver’s server. -* As a result, relative to a dedicated file-copying system (e.g. scp) they feel sluggish. For instance, you can’t incrementally view a progressive JPEG or voice or video file as it’s being uploaded for “zero latency” file transfers. +* File transfers currently take twice as long as they could, as they must first be uploaded in their entirety to the + sender’s server before being downloaded via the receiver’s server. +* As a result, relative to a dedicated file-copying system (e.g. scp) they feel sluggish. For instance, you can’t + incrementally view a progressive JPEG or voice or video file as it’s being uploaded for “zero latency” file + transfers. * You can’t skip within them without downloading the whole thing (if they’re streamable content, such as an .opus file) -* For instance, you can’t do realtime broadcast of voice messages via Matrix, or skip within them (other than splitting them into a series of separate file transfers). -* Another example is sharing document snapshots for real-time collaboration. If a user uploads 100MB of glTF in Third Room to edit a scene, you want all participants to be able to receive the data and stream-decode it with minimal latency. +* For instance, you can’t do realtime broadcast of voice messages via Matrix, or skip within them (other than splitting + them into a series of separate file transfers). +* Another example is sharing document snapshots for real-time collaboration. If a user uploads 100MB of glTF in Third + Room to edit a scene, you want all participants to be able to receive the data and stream-decode it with minimal + latency. Closes [https://github.com/matrix-org/matrix-spec/issues/432](https://github.com/matrix-org/matrix-spec/issues/432) -N.B. this MSC is *not* needed to do a streaming decryption or encryption of E2EE files (as opposed to streaming transfer). The current APIs let you stream a download of AES-CTR data and incrementally decrypt it without loading the whole thing into RAM, calculating the hash as you go, and then either surfacing or deleting the decrypted result at the end if the hash matches. +N.B. this MSC is *not* needed to do a streaming decryption or encryption of E2EE files (as opposed to streaming +transfer). The current APIs let you stream a download of AES-CTR data and incrementally decrypt it without loading the +whole thing into RAM, calculating the hash as you go, and then either surfacing or deleting the decrypted result at the +end if the hash matches. 
-Relatedly, v2 MXC attachments can't be stream-transferred, even if combined with [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), given you won't be able to send the hash in the event contents until you've uploaded the media. +Relatedly, v2 MXC attachments can't be stream-transferred, even if combined with [MSC2246] +(https://github.com/matrix-org/matrix-spec-proposals/pull/2246), given you won't be able to send the hash in the event +contents until you've uploaded the media. ## Solution sketch * Upload content in a single file made up of contiguous blocks of AES-GCM content. * Typically constant block size (e.g. 32KB) - * Or variable block size (to allow time-based blocksize for low-latency seeking in streamable content) - e.g. one block per opus frame. Otherwise a 32KB block ends up being 8s of typical opus latency. - * This would then require a registration sequence to identify the starts of blocks boundaries when seeking randomly (potentially escaping the bitstream to avoid registration code collisions). -* Unlike today’s AES-CTR attachments, AES-GCM makes the content self-authenticating, in that it includes an authentication tag (AEAD) to hash the contents and protect against substitution attacks (i.e. where an attacker flips some bits in the encrypted payload to strategically corrupt the plaintext, and nobody notices as the content isn’t hashed). - * (The only reason Matrix currently uses AES-CTR is that native AES-GCM primitives weren’t widespread enough on Android back in 2016) -* To prevent against reordering attacks, each AES-GCM block has to include an encrypted block header which includes a sequence number, so we can be sure that when we request block N, we’re actually getting block N back - or equivalent. - * XXX: is there still a vulnerability here? Other approaches use Merkle trees to hash the AEADs rather than simple sequence numbers, but why? -* We then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek while downloading -* We could also use [Youtube-style](https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol) off-standard Content-Range headers on POST when uploading for resumable/incremental uploads. + * Or variable block size (to allow time-based blocksize for low-latency seeking in streamable content) - e.g. one + block per opus frame. Otherwise a 32KB block ends up being 8s of typical opus latency. + * This would then require a registration sequence to identify the starts of blocks boundaries when seeking + randomly (potentially escaping the bitstream to avoid registration code collisions). +* Unlike today’s AES-CTR attachments, AES-GCM makes the content self-authenticating, in that it includes an + authentication tag (AEAD) to hash the contents and protect against substitution attacks (i.e. where an attacker flips + some bits in the encrypted payload to strategically corrupt the plaintext, and nobody notices as the content isn’t + hashed). + * (The only reason Matrix currently uses AES-CTR is that native AES-GCM primitives weren’t widespread enough on + Android back in 2016) +* To prevent against reordering attacks, each AES-GCM block has to include an encrypted block header which includes a + sequence number, so we can be sure that when we request block N, we’re actually getting block N back - or + equivalent. + * XXX: is there still a vulnerability here? Other approaches use Merkle trees to hash the AEADs rather than simple + sequence numbers, but why? 
+* We then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek while + downloading +* We could also use [Youtube-style] + (https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol) off-standard Content-Range headers + on POST when uploading for resumable/incremental uploads. ## Advantages * Backwards compatible with current implementations at the HTTP layer * Fully backwards compatible for unencrypted transfers -* Relatively minor changes needed from AES-CTR to sequence-of-AES-GCM-blocks for implementations like [https://github.com/matrix-org/matrix-encrypt-attachment](https://github.com/matrix-org/matrix-encrypt-attachment) -* We automatically maintain a serverside E2EE store of the file as normal, while also getting 1:many streaming semantics +* Relatively minor changes needed from AES-CTR to sequence-of-AES-GCM-blocks for implementations like + [https://github.com/matrix-org/matrix-encrypt-attachment](https://github.com/matrix-org/matrix-encrypt-attachment) +* We automatically maintain a serverside E2EE store of the file as normal, while also getting 1:many streaming + semantics * Provides streaming transfer for any file type - not just media formats -* Minimises memory usage in Matrix clients for large file transfers. Currently all(?) client implementations store the whole file in RAM in order to check hashes and then decrypt, whereas this would naturally lend itself to processing files incrementally in blocks. +* Minimises memory usage in Matrix clients for large file transfers. Currently all(?) client implementations store the + whole file in RAM in order to check hashes and then decrypt, whereas this would naturally lend itself to processing + files incrementally in blocks. * Leverages AES-GCM’s existing primitives and hashing rather than inventing our own hashing strategy -* We already had Range/Content-Range resumable/seekable zero-latency HTTP transfer implemented and working excellently pre-E2EE and pre-Matrix in our ‘glow’ codebase. -* Random access could enable torrent-like semantics in future (i.e. servers doing parallel downloads of different chunks from different servers, with appropriate coordination) +* We already had Range/Content-Range resumable/seekable zero-latency HTTP transfer implemented and working excellently + pre-E2EE and pre-Matrix in our ‘glow’ codebase. +* Random access could enable torrent-like semantics in future (i.e. servers doing parallel downloads of different chunks + from different servers, with appropriate coordination) ## Limitations -* Enterprisey features like content scanning and CDGs require visibility on the whole file, so would eliminate the advantages of streaming by having to buffering it up in order to scan it. (Clientside scanners would benefit from file transfer latency halving but wouldn't be able to show mid-transfer files) -* When applied to unencrypted files, server-side content scanning (for trust & safety etc) would be unable to scan until it’s too late. -* Cancelled file uploads will still leak a partial file transfer to receivers who start to stream, which could be awkward if the sender sent something sensitive, and then can’t tell who downloaded what before they hit the cancel button +* Enterprisey features like content scanning and CDGs require visibility on the whole file, so would eliminate the + advantages of streaming by having to buffering it up in order to scan it. 
(Clientside scanners would benefit from + file transfer latency halving but wouldn't be able to show mid-transfer files) +* When applied to unencrypted files, server-side content scanning (for trust & safety etc) would be unable to scan until + it’s too late. +* Cancelled file uploads will still leak a partial file transfer to receivers who start to stream, which could be + awkward if the sender sent something sensitive, and then can’t tell who downloaded what before they hit the cancel + button * Small bandwidth overhead for the additional AEADs and block headers - probably ~16 bytes per block. -* Out of the box it wouldn't be able to adapt streaming to network conditions (no HLS or DASH style support for multiple bitstreams) +* Out of the box it wouldn't be able to adapt streaming to network conditions (no HLS or DASH style support for multiple + bitstreams) * Might not play nice with CDNs? (I haven't checked if they pass through Range headers properly) ## Detailed proposal @@ -72,25 +107,37 @@ The encrypted file block looks like: }, ``` -N.B. there is no longer a `hashes` key, as AES-GCM includes its own hashing to enforce the integrity of the file transfer. -Therefore we can authenticate the transfer by the fact we can decrypt it using its key & IV (unless an attacker who controls -the same key & IV has substituted it for another file - but the benefit of doing so is questionable). +N.B. there is no longer a `hashes` key, as AES-GCM includes its own hashing to enforce the integrity of the file +transfer. Therefore we can authenticate the transfer by the fact we can decrypt it using its key & IV (unless an +attacker who controls the same key & IV has substituted it for another file - but the benefit of doing so is +questionable). We split the file stream into blocks of AES-256-GCM, with the following simple framing: * File header with a magic number of: 0x4D, 0x58, 0x43, 0x03 ("MXC" 0x03) - just so `file` can recognise it. * 1..N blocks, each with a header of: * a 32-bit field: 0xFFFFFFFF (a registration code to let a parser handle random access within the file - * a 32-bit field: block sequence number (starting at zero, used to calculate the IV of the block, and to aid random access) + * a 32-bit field: block sequence number (starting at zero, used to calculate the IV of the block, and to aid random + access) * a 32-bit field: the length in bytes of the encrypted data in this block. - * a 32-bit field: a CRC32 checksum of the prior data. This is used when randomly seeking as a consistency check to confirm that the registration code really did indicate the beginning of a valid frame of data. It is not used for cryptographic integrity. + * a 32-bit field: a CRC32 checksum of the prior data. This is used when randomly seeking as a consistency check to + confirm that the registration code really did indicate the beginning of a valid frame of data. It is not used + for cryptographic integrity. * the actual AES-GCM bitstream for that block. * the plaintext block size can be variable; 32KB is a good default for most purposes. - * Audio streams may want to use a smaller block size (e.g. 1KB blocks for a CBR 32kbps Opus stream will give 250ms of streaming latency). Audio streams should be CBR to avoid leaking audio waveform metadata via block size. - * The block is encrypted using an IV formed by concatenating the block sequence number of the `file` block with the IV from the `file` block (forming a 128-bit IV, which will be hashed down to 96-bit again within AES-GCM). 
This avoids IV reuse (at least until it wraps after 2^32-1 blocks, which at 32KB per block is 137TB (18 hours of 8k raw video), or at 1KB per block is 4TB (34 years of 32kbps audio)). + * Audio streams may want to use a smaller block size (e.g. 1KB blocks for a CBR 32kbps Opus stream will give + 250ms of streaming latency). Audio streams should be CBR to avoid leaking audio waveform metadata via block + size. + * The block is encrypted using an IV formed by concatenating the block sequence number of the `file` block with + the IV from the `file` block (forming a 128-bit IV, which will be hashed down to 96-bit again within + AES-GCM). This avoids IV reuse (at least until it wraps after 2^32-1 blocks, which at 32KB per block is + 137TB (18 hours of 8k raw video), or at 1KB per block is 4TB (34 years of 32kbps audio)). * Implementations MUST terminate a stream if the seqnum is exhausted, to prevent IV reuse. - * XXX: Alternatively, we could use a 64-bit seqnum, spending 8 bytes of header on seqnums feels like a waste of bandwidth just to support massive transfers. And we'd have to manually hash it with the 96-bit IV rather than use the GCM implementation. - * The block is encrypted including the 32-bit block sequence number as Additional Authenticated Data, thus stopping encrypted blocks from impersonating each other. + * XXX: Alternatively, we could use a 64-bit seqnum, spending 8 bytes of header on seqnums feels like a waste + of bandwidth just to support massive transfers. And we'd have to manually hash it with the 96-bit IV + rather than use the GCM implementation. + * The block is encrypted including the 32-bit block sequence number as Additional Authenticated Data, thus + stopping encrypted blocks from impersonating each other. Or graphically, each frame is: @@ -115,8 +162,8 @@ protocol "Registration Code (0xFFFFFFF):32,Block sequence number:32,Encrypted bl ``` -The actual file upload can then be streamed in blocks to the media server using `Content-Range` headers on the `PUT` method -as https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol#Resume_Upload. +The actual file upload can then be streamed in blocks to the media server using `Content-Range` headers on the `PUT` +method as https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol#Resume_Upload. XXX: the media API needs to advertise that it supports streamed file transfer somehow. @@ -126,16 +173,25 @@ receiver view it incrementally as the upload happens, providing "zero-latency". ## Alternatives -* We could use an existing streaming encrypted framing format of some kind rather (SRTP perhaps, which would give us timestamps for easier random access for audio/video streams) - but this feels a bit strange for plain old file streams. -* Alternatively, we could descope random access entirely, given it only makes sense for AV streams, and requires timestamps to work nicely - and simply being able to stream encryption/decryption is a win in its own right. For instance, glow doesn't let you seek randomly within files which are mid transfer; only tail. -* Split files into a series of separate m.file uploads which the client then has to glue back together (as the [voice broadcast feature](https://github.com/vector-im/element-meta/discussions/632) does in Element today). 
+* We could use an existing streaming encrypted framing format of some kind rather (SRTP perhaps, which would give us + timestamps for easier random access for audio/video streams) - but this feels a bit strange for plain old file + streams. +* Alternatively, we could descope random access entirely, given it only makes sense for AV streams, and requires + timestamps to work nicely - and simply being able to stream encryption/decryption is a win in its own right. For + instance, glow doesn't let you seek randomly within files which are mid transfer; only tail. +* Split files into a series of separate m.file uploads which the client then has to glue back together (as the + [voice broadcast feature](https://github.com/vector-im/element-meta/discussions/632) does in Element today). * Pros: * Works automatically with antivirus & CDGs - * Could be made to map onto HLS or DASH? (by generating an .m3u8 which contains a bunch of MXC urls? This could also potentially solve the glitching problems we’ve had, by reusing existing HLS players augmented with our E2EE support) + * Could be made to map onto HLS or DASH? (by generating an .m3u8 which contains a bunch of MXC urls? This could + also potentially solve the glitching problems we’ve had, by reusing existing HLS players augmented with our + E2EE support) * Cons: - * Is always going to be high latency (e.g. Element currently splits into ~30s chunks) given rate limits on sending file events + * Is always going to be high latency (e.g. Element currently splits into ~30s chunks) given rate limits on + sending file events * Can be a pain to glue media uploads back together without glitching -* Transfer files via streaming P2P file transfer via WebRTC data channels (https://github.com/matrix-org/matrix-spec/issues/189) +* Transfer files via streaming P2P file transfer via WebRTC data channels + (https://github.com/matrix-org/matrix-spec/issues/189) * Pros: * Easy to implement with Matrix’s existing WebRTC signalling * Could use MSC3898-inspired media control to seek in the stream @@ -143,7 +199,7 @@ receiver view it incrementally as the upload happens, providing "zero-latency". * You don’t get a serverside copy of the data * Hard for clients to implement relative to a simple HTTP download * You expose client IPs to each other if going P2P rather than via TURN -* Do streaming voice/video messages/broadcast via WebRTC media channels instead (as hinted in [MSC3888](https://github.com/matrix-org/matrix-spec-proposals/pull/3888)) +* Do streaming voice/video messages/broadcast via WebRTC media channels instead * Pros: * Lowest latency * Could use media control to seek @@ -156,26 +212,43 @@ receiver view it incrementally as the upload happens, providing "zero-latency". * Requires client to have a WebRTC stack * A suitable SFU still doesn’t exist yet * Transfer files out of band using a protocol which already provides streaming transfers (e.g. IPFS?) -* Could use tus.io as an almost-standard format for HTTP resumable uploads (PATCH + Upload-Offset headers) instead, although tus servers don't seem to stream. +* Could use tus.io as an almost-standard format for HTTP resumable uploads (PATCH + Upload-Offset headers) instead, + although tus servers don't seem to stream. ## Security considerations -* Variable size blocks could leak metadata for VBR audio. 
Mitigation is to use CBR if you care about leaking voice traffic patterns (constant size blocks isn’t necessarily enough, as you’d still leak the traffic patterns) -* Is encrypting a sequence number in block header (with authenticated encryption) sufficient to mitigate reordering attacks? -* The resulting lack of atomicity on file transfer means that accidentally uploaded files may leak partial contents to other users, even if they're cancelled. -* Clients may well wish to scan untrusted inbound file transfers for malware etc, which means buffering the inbound transfer and scanning it before presenting it to the user. -* Removing the `hashes` entry on the EncryptedFile description means that an attacker who controls the key & IV of the original file transfer could strategically substitute the file contents. This could be desirable for CDGs wishing to switch a file for a sanitised version without breaking the Matrix event hashes. For other scenarios it could be undesirable. An alternative might be for the sender to keep sending new hashes in related matrix events as the stream uploads, but it's unclear if this is worth it. +* Variable size blocks could leak metadata for VBR audio. Mitigation is to use CBR if you care about leaking voice + traffic patterns (constant size blocks isn’t necessarily enough, as you’d still leak the traffic patterns) +* Is encrypting a sequence number in block header (with authenticated encryption) sufficient to mitigate reordering + attacks? +* The resulting lack of atomicity on file transfer means that accidentally uploaded files may leak partial contents to + other users, even if they're cancelled. +* Clients may well wish to scan untrusted inbound file transfers for malware etc, which means buffering the inbound + transfer and scanning it before presenting it to the user. +* Removing the `hashes` entry on the EncryptedFile description means that an attacker who controls the key & IV of the + original file transfer could strategically substitute the file contents. This could be desirable for CDGs wishing to + switch a file for a sanitised version without breaking the Matrix event hashes. For other scenarios it could be + undesirable. An alternative might be for the sender to keep sending new hashes in related matrix events as the + stream uploads, but it's unclear if this is worth it. ## Conclusion -For the voice broadcast use case, it's a bit unclear whether this is actually an improvement over splitting files into multiple file uploads (or [MSC3888](https://github.com/matrix-org/matrix-spec-proposals/blob/weeman1337/voice-broadcast/proposals/3888-voice-broadcast.md)). It's also unfortunate that the benefits of the MSC are reduced with content scanners and CDGs. It’s also a bit unclear whether voice/video broadcast would be better served via MSC3888 style behaviour. +For the voice broadcast use case, it's a bit unclear whether this is actually an improvement over splitting files into +multiple file uploads (or [MSC3888](https://github.com/matrix-org/matrix-spec-proposals/blob/weeman1337/voice-broadcast/proposals/3888-voice-broadcast.md)). +It's also unfortunate that the benefits of the MSC are reduced with content scanners and CDGs. It’s also a bit unclear +whether voice/video broadcast would be better served via MSC3888 style behaviour. -However, for halving the transfer time for large videos and files (and the magic "zero latency" of being able to see file transfers instantly start to download as they upload) it still feels like a worthwhile MSC. 
Switching to GCM is desirable too in terms of providing authenticated encryption and avoiding having to calculate out-of-band hashes for file transfer. Finally, implementating this MSC will force implementations to stream their file encryption/decryption and avoid the temptation to load the whole file into RAM (which doesn't scale, especially in constrained environments such as iOS Share Extensions). +However, for halving the transfer time for large videos and files (and the magic "zero latency" of being able to see +file transfers instantly start to download as they upload) it still feels like a worthwhile MSC. Switching to GCM is +desirable too in terms of providing authenticated encryption and avoiding having to calculate out-of-band hashes for +file transfer. Finally, implementating this MSC will force implementations to stream their file encryption/decryption +and avoid the temptation to load the whole file into RAM (which doesn't scale, especially in constrained environments +such as iOS Share Extensions). ## Dependencies -This MSC depends on [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), which has now landed in the spec. -Extends [MSC3469](https://github.com/matrix-org/matrix-spec-proposals/pull/3469). +This MSC depends on [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246), which has now landed in +the spec. Extends [MSC3469](https://github.com/matrix-org/matrix-spec-proposals/pull/3469). ## Unstable prefixes From 97f72f79f5dbf1b80638f620f489865a69b3553c Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sat, 30 Dec 2023 19:07:39 +0000 Subject: [PATCH 12/20] some TODOs --- proposals/4016-streaming-e2ee-file-transfer.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index d8415b08679..a68f0e1d6ff 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -165,12 +165,17 @@ protocol "Registration Code (0xFFFFFFF):32,Block sequence number:32,Encrypted bl The actual file upload can then be streamed in blocks to the media server using `Content-Range` headers on the `PUT` method as https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol#Resume_Upload. -XXX: the media API needs to advertise that it supports streamed file transfer somehow. +TODO: the media API needs to advertise that it supports streamed file transfer somehow. We then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek and resume while downloading. The actual download should stream as rapidly as possible from the media server, letting the receiver view it incrementally as the upload happens, providing "zero-latency". +TODO: We need a way to mark a transfer as complete or cancelled (via a relation?). If cancelled, the sender should +delete the partial upload (but the partial contents will have already leaked to the other side, of course). + +TODO: While we're at it, let's actually let users DELETE their file transfers, at last. 
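
As a non-normative illustration of the framing described earlier (a 16-byte block header, AES-256-GCM with the 32-bit
seqnum as AAD, and a 128-bit block IV built from the seqnum plus the per-file IV), here is a minimal sender-side sketch
using WebCrypto. The constant block size, the registration code value and the exact scope of the CRC32 are assumptions,
since the MSC hasn't pinned them down yet:

```typescript
// Non-normative sketch of the per-block framing: AES-256-GCM with the 32-bit
// seqnum as AAD, block IV = seqnum || per-file IV (GCM hashes the 128-bit IV
// down internally), and a 16-byte header of
// [registration code | seqnum | encrypted length | CRC32].

const REGISTRATION_CODE = 0xffffffff; // placeholder: exact value/width still TBD
const BLOCK_SIZE = 32 * 1024;         // assumed constant plaintext block size

// Plain reflected CRC-32 (poly 0xEDB88320); a framing sanity check only,
// not a cryptographic integrity mechanism.
function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (const b of bytes) {
    crc ^= b;
    for (let i = 0; i < 8; i++) crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
  }
  return (crc ^ 0xffffffff) >>> 0;
}

async function encryptBlock(
  key: CryptoKey,        // AES-256 key from the EncryptedFile block
  fileIv: Uint8Array,    // 96-bit per-file IV from the EncryptedFile block
  seqnum: number,        // 32-bit block sequence number, starting at 0
  plaintext: Uint8Array, // at most BLOCK_SIZE bytes
): Promise<Uint8Array> {
  if (plaintext.length > BLOCK_SIZE) throw new Error("block too large");

  const seq = new Uint8Array(4);
  new DataView(seq.buffer).setUint32(0, seqnum); // big-endian is assumed here

  // 128-bit block IV = seqnum || file IV, as described in the framing above.
  const iv = new Uint8Array(16);
  iv.set(seq, 0);
  iv.set(fileIv, 4);

  // Binding the seqnum in as AAD stops blocks impersonating each other.
  const ciphertext = new Uint8Array(
    await crypto.subtle.encrypt(
      { name: "AES-GCM", iv, additionalData: seq, tagLength: 128 },
      key,
      plaintext,
    ),
  );

  const frame = new Uint8Array(16 + ciphertext.length);
  const view = new DataView(frame.buffer);
  view.setUint32(0, REGISTRATION_CODE);
  view.setUint32(4, seqnum);
  view.setUint32(8, ciphertext.length);
  frame.set(ciphertext, 16);
  // CRC computed with the CRC field itself zeroed; the exact scope of the
  // checksum is an assumption, as the MSC text is still settling on it.
  view.setUint32(12, crc32(frame));
  return frame;
}
```

A real implementation would emit each frame as soon as it is produced rather than buffering the whole file, which is
the point of the streaming design.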
+ ## Alternatives * We could use an existing streaming encrypted framing format of some kind rather (SRTP perhaps, which would give us From 16efd7f88b096a3d6804c0cc8607ce573aefcc74 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sat, 30 Dec 2023 19:10:54 +0000 Subject: [PATCH 13/20] typo --- proposals/4016-streaming-e2ee-file-transfer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index a68f0e1d6ff..44e1f2be773 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -246,7 +246,7 @@ whether voice/video broadcast would be better served via MSC3888 style behaviour However, for halving the transfer time for large videos and files (and the magic "zero latency" of being able to see file transfers instantly start to download as they upload) it still feels like a worthwhile MSC. Switching to GCM is desirable too in terms of providing authenticated encryption and avoiding having to calculate out-of-band hashes for -file transfer. Finally, implementating this MSC will force implementations to stream their file encryption/decryption +file transfer. Finally, implementing this MSC will force implementations to stream their file encryption/decryption and avoid the temptation to load the whole file into RAM (which doesn't scale, especially in constrained environments such as iOS Share Extensions). From abca46f69934dbfe7afd944a75225c891f91ac9c Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sat, 30 Dec 2023 21:23:23 +0000 Subject: [PATCH 14/20] unentangle resumable uploads from streamable transfers --- proposals/4016-streaming-e2ee-file-transfer.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 44e1f2be773..ff5f18fc470 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -162,14 +162,18 @@ protocol "Registration Code (0xFFFFFFF):32,Block sequence number:32,Encrypted bl ``` -The actual file upload can then be streamed in blocks to the media server using `Content-Range` headers on the `PUT` -method as https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol#Resume_Upload. +The actual file upload can then be streamed in the request body in the PUT (requires HTTP/2 in browsers). Similarly, the +download can be streamed in the response body. The download should stream as rapidly as possible from the media +server, letting the receiver view it incrementally as the upload happens, providing "zero-latency" - while also storing +the stream to disk. -TODO: the media API needs to advertise that it supports streamed file transfer somehow. +For resumable uploads (or to upload in blocks for HTTP clients which don't support streaming request bodies), the client +can use `Content-Range` headers as per https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol#Resume_Upload. -We then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek and -resume while downloading. The actual download should stream as rapidly as possible from the media server, letting the -receiver view it incrementally as the upload happens, providing "zero-latency". +TODO: the media API needs to advertise if it supports resumable uploads. 
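
As a sketch of how a sender might stream the encrypted frames up as they are produced (rather than buffering the
upload), assuming the MSC2246 asynchronous upload endpoint and a browser that supports streaming fetch request bodies
over HTTP/2; the exact endpoint and how the server advertises support are still open questions here:

```typescript
// Sketch: streaming the encrypted frames as a fetch() streaming request body
// (Chromium-only at the time of writing; needs HTTP/2 and the `duplex` option).
// The endpoint shown is the MSC2246 asynchronous upload endpoint; capability
// advertisement for streamed uploads is still a TODO above.

async function streamingUpload(
  baseUrl: string,
  accessToken: string,
  serverName: string,
  mediaId: string,
  encryptedFrames: ReadableStream<Uint8Array>, // output of the block encryptor
): Promise<void> {
  const res = await fetch(
    `${baseUrl}/_matrix/media/v3/upload/${serverName}/${mediaId}`,
    {
      method: "PUT",
      headers: {
        "Authorization": `Bearer ${accessToken}`,
        "Content-Type": "application/octet-stream",
      },
      body: encryptedFrames,
      duplex: "half", // required for streaming request bodies
    } as RequestInit & { duplex: "half" },
  );
  if (!res.ok) throw new Error(`upload failed: ${res.status}`);
}
```

Clients without streaming request bodies could instead fall back to uploading fixed-size chunks with `Content-Range`
headers as described above.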
+ +For resumable downloads, we then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek and +resume while downloading. TODO: We need a way to mark a transfer as complete or cancelled (via a relation?). If cancelled, the sender should delete the partial upload (but the partial contents will have already leaked to the other side, of course). From e671945a3e98886eda2dd456af68b8e22e93ebe3 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 31 Dec 2023 12:36:30 +0000 Subject: [PATCH 15/20] warn about seqnum discontinuities --- proposals/4016-streaming-e2ee-file-transfer.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index ff5f18fc470..8541d36dd51 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -133,6 +133,8 @@ We split the file stream into blocks of AES-256-GCM, with the following simple f AES-GCM). This avoids IV reuse (at least until it wraps after 2^32-1 blocks, which at 32KB per block is 137TB (18 hours of 8k raw video), or at 1KB per block is 4TB (34 years of 32kbps audio)). * Implementations MUST terminate a stream if the seqnum is exhausted, to prevent IV reuse. + * Receivers MUST terminate a stream if the seqnum does not sequentially increase (to prevent the server from + shuffling the blocks) * XXX: Alternatively, we could use a 64-bit seqnum, spending 8 bytes of header on seqnums feels like a waste of bandwidth just to support massive transfers. And we'd have to manually hash it with the 96-bit IV rather than use the GCM implementation. @@ -230,6 +232,7 @@ TODO: While we're at it, let's actually let users DELETE their file transfers, a traffic patterns (constant size blocks isn’t necessarily enough, as you’d still leak the traffic patterns) * Is encrypting a sequence number in block header (with authenticated encryption) sufficient to mitigate reordering attacks? + * When doing random access, the reader has to trust the server to serve the right blocks after a discontinuity * The resulting lack of atomicity on file transfer means that accidentally uploaded files may leak partial contents to other users, even if they're cancelled. * Clients may well wish to scan untrusted inbound file transfers for malware etc, which means buffering the inbound From 84d0ebfcee904f1a5f998d4d15e2564e15a8c653 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Sun, 31 Dec 2023 12:45:27 +0000 Subject: [PATCH 16/20] notes about thumbnailing and blurhashing --- proposals/4016-streaming-e2ee-file-transfer.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 8541d36dd51..5e0c6639f94 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -75,6 +75,11 @@ contents until you've uploaded the media. file transfer latency halving but wouldn't be able to show mid-transfer files) * When applied to unencrypted files, server-side content scanning (for trust & safety etc) would be unable to scan until it’s too late. +* For images & video, senders will still have to read (and decompress) enough of the file into RAM in order to thumbnail + it or calculate a blurhash, so the benefits of streaming in terms of RAM use on the sender are reduced. 
One could + restrict thumbnailing to the first 500MB of the transfer (or however much available RAM the client has) though, and + still stream the file itself, which would be hopefully be enough to thumbnail the first frame of a video, or most + images, while still being able to transfer arbitrary length files. * Cancelled file uploads will still leak a partial file transfer to receivers who start to stream, which could be awkward if the sender sent something sensitive, and then can’t tell who downloaded what before they hit the cancel button From 903d42afd109d2d4a354707bc92770571a8b0688 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Thu, 4 Jan 2024 11:47:33 +0000 Subject: [PATCH 17/20] add more alts & limitations --- .../4016-streaming-e2ee-file-transfer.md | 24 ++++++++++++------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 5e0c6639f94..186e6a11040 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -83,16 +83,19 @@ contents until you've uploaded the media. * Cancelled file uploads will still leak a partial file transfer to receivers who start to stream, which could be awkward if the sender sent something sensitive, and then can’t tell who downloaded what before they hit the cancel button -* Small bandwidth overhead for the additional AEADs and block headers - probably ~16 bytes per block. +* Small bandwidth overhead for the additional AEADs and block headers - ~32 bytes per block. * Out of the box it wouldn't be able to adapt streaming to network conditions (no HLS or DASH style support for multiple bitstreams) * Might not play nice with CDNs? (I haven't checked if they pass through Range headers properly) +* Recorded E2EE SFU streams (from a [MSC3898](https://github.com/matrix-org/matrix-spec-proposals/pull/3898) SFU or LiveKit SFU) + could be made available as live-streamed file transfers through this MSC. However, these streams would also have their + own S-Frame headers, whose keys would need to be added to the `EncryptedFile` block in addition to the AES-GCM layer. ## Detailed proposal The file is uploaded asynchronously using [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246). -The encrypted file block looks like: +The proposed v3 `EncryptedFile` block looks like: ```json5 "file": { @@ -114,7 +117,7 @@ The encrypted file block looks like: N.B. there is no longer a `hashes` key, as AES-GCM includes its own hashing to enforce the integrity of the file transfer. Therefore we can authenticate the transfer by the fact we can decrypt it using its key & IV (unless an -attacker who controls the same key & IV has substituted it for another file - but the benefit of doing so is +attacker who controls the same key & IV has substituted it for another file - but the benefit to them of doing so is questionable). We split the file stream into blocks of AES-256-GCM, with the following simple framing: @@ -125,9 +128,9 @@ We split the file stream into blocks of AES-256-GCM, with the following simple f * a 32-bit field: block sequence number (starting at zero, used to calculate the IV of the block, and to aid random access) * a 32-bit field: the length in bytes of the encrypted data in this block. - * a 32-bit field: a CRC32 checksum of the prior data. 
This is used when randomly seeking as a consistency check to - confirm that the registration code really did indicate the beginning of a valid frame of data. It is not used - for cryptographic integrity. + * a 32-bit field: a CRC32 checksum of the block, including headers. This is used when randomly seeking as a + consistency check to confirm that the registration code really did indicate the beginning of a valid frame of + data. It is not used for cryptographic integrity. * the actual AES-GCM bitstream for that block. * the plaintext block size can be variable; 32KB is a good default for most purposes. * Audio streams may want to use a smaller block size (e.g. 1KB blocks for a CBR 32kbps Opus stream will give @@ -179,14 +182,17 @@ can use `Content-Range` headers as per https://developers.google.com/youtube/v3/ TODO: the media API needs to advertise if it supports resumable uploads. -For resumable downloads, we then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek and -resume while downloading. +For resumable downloads, we then use normal +[HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek and resume while downloading. TODO: We need a way to mark a transfer as complete or cancelled (via a relation?). If cancelled, the sender should delete the partial upload (but the partial contents will have already leaked to the other side, of course). TODO: While we're at it, let's actually let users DELETE their file transfers, at last. +N.B. Clients which implement displaying blurhashes should progressively load the thumbnail over the top of the blurhash, +to make sure the detailed thumbnail streams in and is viewed as rapidly as possible. + ## Alternatives * We could use an existing streaming encrypted framing format of some kind rather (SRTP perhaps, which would give us @@ -230,6 +236,8 @@ TODO: While we're at it, let's actually let users DELETE their file transfers, a * Transfer files out of band using a protocol which already provides streaming transfers (e.g. IPFS?) * Could use tus.io as an almost-standard format for HTTP resumable uploads (PATCH + Upload-Offset headers) instead, although tus servers don't seem to stream. 
+* Could use ChaCha20-Poly1305 rather than AES-GCM, but no native webcrypto impl yet: https://github.com/w3c/webcrypto/issues/223 + * See also https://soatok.blog/2020/05/13/why-aes-gcm-sucks/ and https://andrea.corbellini.name/2023/03/09/authenticated-encryption/ ## Security considerations From c06cdcbe707424ca1847170adb05058de0c87db8 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Fri, 5 Jan 2024 14:02:01 +0000 Subject: [PATCH 18/20] update to use tus for resumable uploads --- .../4016-streaming-e2ee-file-transfer.md | 34 +++++++++++-------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 186e6a11040..95fb9f104a0 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -1,4 +1,4 @@ -# MSC4016: Streaming E2EE file transfer with random access and zero latency +# MSC4016: Streaming and resumable E2EE file transfer with random access ## Problem @@ -10,6 +10,7 @@ * You can’t skip within them without downloading the whole thing (if they’re streamable content, such as an .opus file) * For instance, you can’t do realtime broadcast of voice messages via Matrix, or skip within them (other than splitting them into a series of separate file transfers). +* You also can't resume uploads if they're interrupted. * Another example is sharing document snapshots for real-time collaboration. If a user uploads 100MB of glTF in Third Room to edit a scene, you want all participants to be able to receive the data and stream-decode it with minimal latency. @@ -44,11 +45,11 @@ contents until you've uploaded the media. equivalent. * XXX: is there still a vulnerability here? Other approaches use Merkle trees to hash the AEADs rather than simple sequence numbers, but why? +* We use streaming HTTP upload (https://developer.chrome.com/articles/fetch-streaming-requests/) and/or + [tus](https://tus.io/protocols/resumable-upload) resumable upload headers to incrementally send the file. This also + gives us resumable uploads. * We then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek while - downloading -* We could also use [Youtube-style] - (https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol) off-standard Content-Range headers - on POST when uploading for resumable/incremental uploads. + downloading. ## Advantages @@ -63,10 +64,12 @@ contents until you've uploaded the media. whole file in RAM in order to check hashes and then decrypt, whereas this would naturally lend itself to processing files incrementally in blocks. * Leverages AES-GCM’s existing primitives and hashing rather than inventing our own hashing strategy -* We already had Range/Content-Range resumable/seekable zero-latency HTTP transfer implemented and working excellently +* We've already implemented this once before (pre-Matrix) in our 'glow' codebase, and it worked excellently. pre-E2EE and pre-Matrix in our ‘glow’ codebase. * Random access could enable torrent-like semantics in future (i.e. servers doing parallel downloads of different chunks from different servers, with appropriate coordination) +* tus looks to be under consideration by the IETF HTTP working group, so we're hopefully picking the right protocol for + resumable uploads. ## Limitations @@ -87,9 +90,10 @@ contents until you've uploaded the media. 
* Out of the box it wouldn't be able to adapt streaming to network conditions (no HLS or DASH style support for multiple bitstreams) * Might not play nice with CDNs? (I haven't checked if they pass through Range headers properly) -* Recorded E2EE SFU streams (from a [MSC3898](https://github.com/matrix-org/matrix-spec-proposals/pull/3898) SFU or LiveKit SFU) - could be made available as live-streamed file transfers through this MSC. However, these streams would also have their - own S-Frame headers, whose keys would need to be added to the `EncryptedFile` block in addition to the AES-GCM layer. +* Recorded E2EE SFU streams (from a [MSC3898](https://github.com/matrix-org/matrix-spec-proposals/pull/3898) SFU or + LiveKit SFU) could be made available as live-streamed file transfers through this MSC. However, these streams would + also have their own S-Frame headers, whose keys would need to be added to the `EncryptedFile` block in addition to + the AES-GCM layer. ## Detailed proposal @@ -177,10 +181,8 @@ download can be streamed in the response body. The download should stream as ra server, letting the receiver view it incrementally as the upload happens, providing "zero-latency" - while also storing the stream to disk. -For resumable uploads (or to upload in blocks for HTTP clients which don't support streaming request bodies), the client -can use `Content-Range` headers as per https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol#Resume_Upload. - -TODO: the media API needs to advertise if it supports resumable uploads. +For resumable uploads (or to upload in blocks for HTTP clients which don't support streaming request bodies), we use +[tus](https://tus.io/protocols/resumable-upload) 1.0.0. For resumable downloads, we then use normal [HTTP Range](https://datatracker.ietf.org/doc/html/rfc2616#section-14.35.1) headers to seek and resume while downloading. @@ -234,10 +236,12 @@ to make sure the detailed thumbnail streams in and is viewed as rapidly as possi * Requires client to have a WebRTC stack * A suitable SFU still doesn’t exist yet * Transfer files out of band using a protocol which already provides streaming transfers (e.g. IPFS?) -* Could use tus.io as an almost-standard format for HTTP resumable uploads (PATCH + Upload-Offset headers) instead, - although tus servers don't seem to stream. * Could use ChaCha20-Poly1305 rather than AES-GCM, but no native webcrypto impl yet: https://github.com/w3c/webcrypto/issues/223 * See also https://soatok.blog/2020/05/13/why-aes-gcm-sucks/ and https://andrea.corbellini.name/2023/03/09/authenticated-encryption/ +* We could use YouTube's resumable upload API via `Content-Range` headers from + https://developers.google.com/youtube/v3/guides/using_resumable_upload_protocol, but having implemented both it and + tus, tus feels inordinately simpler and less fiddly. YouTube is likely to be well supported by proxies etc, but if + tus is ordained by the HTTP IETF WG, then it should be well supported too. 
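
As a sketch of what resuming an interrupted upload with the tus 1.0.0 core protocol might look like (only the
HEAD + PATCH resumption step is shown; how the tus upload resource maps onto the Matrix media API is not yet defined
by this MSC, so the URL and the buffering of the encrypted stream are illustrative):

```typescript
// Sketch: resuming an interrupted upload with tus 1.0.0 core headers.
// HEAD discovers how much the server already has; PATCH sends the rest.

async function resumeUpload(
  uploadUrl: string,   // assumed to be the (tus-capable) media upload URL
  accessToken: string,
  frames: Uint8Array,  // the already-encrypted framed stream (or re-derivable)
): Promise<void> {
  const common = {
    "Authorization": `Bearer ${accessToken}`,
    "Tus-Resumable": "1.0.0",
  };

  // 1. Ask the server how many bytes it has already received.
  const head = await fetch(uploadUrl, { method: "HEAD", headers: common });
  const offset = parseInt(head.headers.get("Upload-Offset") ?? "0", 10);

  // 2. Send the remainder from that offset.
  const patch = await fetch(uploadUrl, {
    method: "PATCH",
    headers: {
      ...common,
      "Upload-Offset": String(offset),
      "Content-Type": "application/offset+octet-stream",
    },
    body: frames.slice(offset),
  });
  if (patch.status !== 204) throw new Error(`resume failed: ${patch.status}`);
}
```

The explicit `Upload-Offset` round trip is arguably what makes tus less fiddly than the `Content-Range` scheme: the
client never has to guess how much the server actually received.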
## Security considerations From 87590f2e3dd330e142382a9f9494d2c9811cd1c8 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Fri, 26 Jul 2024 14:08:51 +0100 Subject: [PATCH 19/20] reference key-committing attacks --- proposals/4016-streaming-e2ee-file-transfer.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 95fb9f104a0..07a59becdf1 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -245,6 +245,10 @@ to make sure the detailed thumbnail streams in and is viewed as rapidly as possi ## Security considerations +* AES-GCM is not key-committing, so removing hashes on the event means: + * the key committing attacks are all about an adversary which constructs a ciphertext C with multiple ((IV1, K1), (IV2, K2), ...) so that C decrypts to P1, P2, ... at the same time + * given that AES GCM is specifically not key committing, we introduce this attack. + * (thanks to @dkasak for pointing this out) * Variable size blocks could leak metadata for VBR audio. Mitigation is to use CBR if you care about leaking voice traffic patterns (constant size blocks isn’t necessarily enough, as you’d still leak the traffic patterns) * Is encrypting a sequence number in block header (with authenticated encryption) sufficient to mitigate reordering From 3a5e682b70a56528bb2879e36700920e55fcad88 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Tue, 22 Oct 2024 00:43:47 +0200 Subject: [PATCH 20/20] clarify security concerns --- proposals/4016-streaming-e2ee-file-transfer.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/proposals/4016-streaming-e2ee-file-transfer.md b/proposals/4016-streaming-e2ee-file-transfer.md index 07a59becdf1..da7ee632a69 100644 --- a/proposals/4016-streaming-e2ee-file-transfer.md +++ b/proposals/4016-streaming-e2ee-file-transfer.md @@ -121,8 +121,7 @@ The proposed v3 `EncryptedFile` block looks like: N.B. there is no longer a `hashes` key, as AES-GCM includes its own hashing to enforce the integrity of the file transfer. Therefore we can authenticate the transfer by the fact we can decrypt it using its key & IV (unless an -attacker who controls the same key & IV has substituted it for another file - but the benefit to them of doing so is -questionable). +attacker who controls the same key & IV has substituted it for another file - see Security Considerations below) We split the file stream into blocks of AES-256-GCM, with the following simple framing: @@ -261,8 +260,9 @@ to make sure the detailed thumbnail streams in and is viewed as rapidly as possi * Removing the `hashes` entry on the EncryptedFile description means that an attacker who controls the key & IV of the original file transfer could strategically substitute the file contents. This could be desirable for CDGs wishing to switch a file for a sanitised version without breaking the Matrix event hashes. For other scenarios it could be - undesirable. An alternative might be for the sender to keep sending new hashes in related matrix events as the - stream uploads, but it's unclear if this is worth it. + undesirable - for instance, a malicious server could serve different file contents to other users or servers to evade + moderation. An alternative might be for the sender to keep sending new hashes in related matrix events as the + stream uploads (but it's unclear if this is worth it, relative to MSC3888) ## Conclusion