Add a feature flag to disable extra debug info #595

capatoles · 2022-10-20T09:13:20Z

The issue is in bincode_derive crate, where we are adding extra debug info that are leaking enum names in the final binary (which imply less optimization with a bigger binary size, and internal structure leak). I think we should use a proper feature flag to disable this. I'm working on a pull request to address that issue.

It seems there is only one leak in the code in this part of the code. I agree this debug line can be important, but should be optional for release build if required. A feature flag could be used to address that situation, and maybe it is useful for other cases like in #594.

This allows - More optimization (smallest final binary size) - No internal code structure leak (no enum name leak).

capatoles · 2022-10-20T10:14:03Z

The PR is ready: #596

ZoeyR · 2022-10-20T11:04:01Z

Using a feature flag to control this is a non-starter since setting the feature would infect all other uses of bincode_derive in the build tree. Additionally I would need to see concrete data proving that this is creating unacceptable binary bloat. There are many sources for extra strings in rust binaries and I doubt this is a large one.

capatoles · 2022-10-20T13:44:43Z

There are many sources for extra strings in rust binaries and I doubt this is a large one.

As mentioned in #596, you can find a minimal reproducible code below. The code in question is in the Decode part of the code, so you need to call a decode_* operation in your minimal example. For example:

fn main() {
    let bytes = bincode::encode_to_vec(Foo::Bar, bincode::config::standard()).map_err(|_| ()).unwrap();
    println!("{bytes:?}");
    let (_, _): (Foo, _) = bincode::decode_from_slice(&bytes, bincode::config::standard()).unwrap();
}

#[derive(bincode::Encode, bincode::Decode)]
enum Foo {
    Bar
}

Which gives (Foo in the result of the strings command):

$ cargo build --release
$ strip errors-and-strings
$ strings errors-and-strings | grep "Foo"
Foolibrary/alloc/src/raw_vec.rscapacity overflowlibrary/alloc/src/ffi/c_str.rs

You can also check that with the cargo-expand crate:

#![feature(prelude_import)]
#[prelude_import]
use std::prelude::rust_2021::*;

#[macro_use]
extern crate std;
fn main() {
    let bytes = bincode::encode_to_vec(Foo::Bar, bincode::config::standard())
        .map_err(|_| ())
        .unwrap();
    {
        ::std::io::_print(
            ::core::fmt::Arguments::new_v1(
                &["", "\n"],
                &[::core::fmt::ArgumentV1::new_debug(&bytes)],
            ),
        );
    };
    let (_, _): (Foo, _) = bincode::decode_from_slice(
            &bytes,
            bincode::config::standard(),
        )
        .unwrap();
}

enum Foo {
    Bar,
}

impl ::bincode::Encode for Foo {
    fn encode<__E: ::bincode::enc::Encoder>(
        &self,
        encoder: &mut __E,
    ) -> core::result::Result<(), ::bincode::error::EncodeError> {
        match self {
            Self::Bar => {
                <u32 as ::bincode::Encode>::encode(&(0u32), encoder)?;
                Ok(())
            }
        }
    }
}

impl ::bincode::Decode for Foo {
    fn decode<__D: ::bincode::de::Decoder>(
        decoder: &mut __D,
    ) -> core::result::Result<Self, ::bincode::error::DecodeError> {
        let variant_index = <u32 as ::bincode::Decode>::decode(decoder)?;
        match variant_index {
            0u32 => Ok(Self::Bar {}),
            variant => {
                Err(::bincode::error::DecodeError::UnexpectedVariant {
                    found: variant,
                    type_name: "Foo",
                    allowed: &::bincode::error::AllowedEnumVariants::Range {
                        min: 0,
                        max: 0,
                    },
                })
            }
        }
    }
}

impl<'__de> ::bincode::BorrowDecode<'__de> for Foo {
    fn borrow_decode<__D: ::bincode::de::BorrowDecoder<'__de>>(
        decoder: &mut __D,
    ) -> core::result::Result<Self, ::bincode::error::DecodeError> {
        let variant_index = <u32 as ::bincode::Decode>::decode(decoder)?;
        match variant_index {
            0u32 => Ok(Self::Bar {}),
            variant => {
                Err(::bincode::error::DecodeError::UnexpectedVariant {
                    found: variant,
                    type_name: "Foo",
                    allowed: &::bincode::error::AllowedEnumVariants::Range {
                        min: 0,
                        max: 0,
                    },
                })
            }
        }
    }
}

In the same idea, your debug file in ./target contains the string:

impl :: bincode :: Decode for Foo
{
    fn decode < __D : :: bincode :: de :: Decoder > (decoder : & mut __D) ->
    core :: result :: Result < Self, :: bincode :: error :: DecodeError >
    {
        let variant_index = < u32 as :: bincode :: Decode > :: decode(decoder)
        ? ; match variant_index
        {
            0u32 => Ok(Self :: Bar {}), variant =>
            Err(:: bincode :: error :: DecodeError :: UnexpectedVariant
            {
                found : variant, type_name : "Foo", allowed : & :: bincode ::
                error :: AllowedEnumVariants :: Range { min : 0, max : 0 }
            })
        }
    }
} impl < '__de > :: bincode :: BorrowDecode < '__de > for Foo
{
    fn borrow_decode < __D : :: bincode :: de :: BorrowDecoder < '__de > >
    (decoder : & mut __D) -> core :: result :: Result < Self, :: bincode ::
    error :: DecodeError >
    {
        let variant_index = < u32 as :: bincode :: Decode > :: decode(decoder)
        ? ; match variant_index
        {
            0u32 => Ok(Self :: Bar {}), variant =>
            Err(:: bincode :: error :: DecodeError :: UnexpectedVariant
            {
                found : variant, type_name : "Foo", allowed : & :: bincode ::
                error :: AllowedEnumVariants :: Range { min : 0, max : 0 }
            })
        }
    }
}

Using a feature flag to control this is a non-starter since setting the feature would infect all other uses of bincode_derive in the build tree.

Our objective is with a trigger (e.g. like here, a feature) we want to disable any enum name leak, and if it propagates to any crate using bincode_derive it will improve even further the optimization. Also, we speak here of a non-default feature (that just allows maximum optimization of binary size).

Additionally I would need to see concrete data proving that this is creating unacceptable binary bloat.

For us the issue is regarding optimization (binary size, which will be increased for each enum in the code with the Decode trait) and regarding leaking internal enum names (which are not ok for some domain, like the one I am working in).

ZoeyR · 2022-10-20T14:12:49Z

Again, I will need to see concrete data on how much this affects overall binary size before I would consider this to be a bloat issue. Bincode is a general purpose library and not designed for situations where you need absolutely every string possible removed from the final binary. Even on microcontrollers, which I do work on, this is not an issue.

As for the leaking of enum names being a problem. Please elaborate on how this in particular is an issue. Rust crates work by building against the source of other crates. The source would also leak the enum name.

ZoeyR · 2022-10-20T14:18:28Z

Additionally try throwing away the error from your decode unwrap statement. The optimizer should get rid of the error since it won't be printed to stdout.

capatoles · 2022-10-21T08:17:24Z

Again, I will need to see concrete data on how much this affects overall binary size before I would consider this to be a bloat issue. Bincode is a general purpose library and not designed for situations where you need absolutely every string possible removed from the final binary. Even on microcontrollers, which I do work on, this is not an issue.

Indeed it may not be a problem for most of the use cases. What make me afraid is that the added code is proportional to the number of enum that derive the Decode trait (i.e. if we have 100 enum we have 100 strings added).

As for the leaking of enum names being a problem. Please elaborate on how this in particular is an issue. Rust crates work by building against the source of other crates. The source would also leak the enum name.

For my use case, I build a proprietary binary (top of the dependency tree), which is not meant to be built against any other crate .

Additionally try throwing away the error from your decode unwrap statement. The optimizer should get rid of the error since it won't be printed to stdout.

Even without used any unwrap, the strings are still even in release mode. I can get rid of them in some cases with extra compiler optimization, but it do not work every time.

VictorKoenders · 2022-10-21T08:37:30Z

if we have 100 enum we have 100 strings added

Realistically your enum variant names shouldn't be more than 20 characters. If you have an application that has over 100 enums, how much of an impact is 2kb of your final binary size going to matter? Especially because all of your log lines will probably be a significantly larger impact on your binary size.

And again, you can get around this by simply dropping the error from any bincode invocation. Making something like

enum YourError {
    Bincode,
    // ...
}
let _ = bincode::decode_from_slice(..).map_err(|_| YourError::Bincode)?;

In combination with LTO will get rid of all information you're worried about leaking.

If you're not using logging or custom error types, I can highly recommend adding that, especially from the importance of the application that it sounds like you're making.

which is not meant to be built against any other crate .

Is bincode the only crate you're building against? There are many, many good crates out there that will make your life a lot easier. I can highly recommending using more of them.

capatoles · 2022-10-21T09:48:20Z

We tested your solution, and in a minimal environment it works, but in our more complex and main project it doesn't seem to work. Maybe LTO is failing to optimize this in a more complex project? In our project we use other crate (than bincode), but we made sure there was no information leak and maximum optimization. When not possible, we just use a feature flag to allow more optimization, since all cases of information leaks we found were for debug purpose. So, we just made a feature flag to enable maximum optimization (losing debug information), and when developing we just have everything we need. For us it will be painful to have to maintain a parallel version for a small line of code, do you think it's possible to find a solution, or there is no other option for us? Maybe a more generic flag could work, by also solving other issues like #594 (like maximum-optimization)?

VictorKoenders · 2022-10-21T09:53:54Z

#594 is not a optimization issue, but an issue about cluttering the target directory. This is fixed in bincode-org/virtue#31 with moving the files to the /target/generated/bincode/ folder.

As Zoey said, feature flags are not a good solution for this, and I don't see us merging any PR that adds a feature that disables functionality.

capatoles · 2022-10-21T13:30:12Z

#594 is not a optimization issue, but an issue about cluttering the target directory.

I was more thinking about a "release" mode where you don't have to output generated code in files (#594), and optimize at a maximum binary size and internal structure leak (this issue). But I understand your point.

As Zoey said, feature flags are not a good solution for this, and I don't see us merging any PR that adds a feature that disables functionality.

So for you, is there any solution other than have to maintain our own version of bincode for this single line change?

ZoeyR · 2022-10-21T15:08:08Z

I would argue that if the security and obfuscation of your proprietary application is this important all of your dependencies should be vendored and possibly patched. This level of binary obfuscation is not a common requirement in most applications.

capatoles · 2022-10-24T06:45:27Z

Ok, I understand. Thanks for your help

capatoles added a commit to capatoles/bincode that referenced this issue Oct 20, 2022

Add a feature to disable extra debug info (bincode-org#595)

2cf20b3

This allows - More optimization (smallest final binary size) - No internal code structure leak (no enum name leak).

capatoles mentioned this issue Oct 20, 2022

Add a feature to disable extra debug info (#595) #596

Closed

capatoles closed this as completed Oct 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a feature flag to disable extra debug info #595

Add a feature flag to disable extra debug info #595

capatoles commented Oct 20, 2022

capatoles commented Oct 20, 2022

ZoeyR commented Oct 20, 2022

capatoles commented Oct 20, 2022 •

edited

Loading

ZoeyR commented Oct 20, 2022

ZoeyR commented Oct 20, 2022

capatoles commented Oct 21, 2022

VictorKoenders commented Oct 21, 2022

capatoles commented Oct 21, 2022

VictorKoenders commented Oct 21, 2022 •

edited

Loading

capatoles commented Oct 21, 2022

ZoeyR commented Oct 21, 2022

capatoles commented Oct 24, 2022

Add a feature flag to disable extra debug info #595

Add a feature flag to disable extra debug info #595

Comments

capatoles commented Oct 20, 2022

capatoles commented Oct 20, 2022

ZoeyR commented Oct 20, 2022

capatoles commented Oct 20, 2022 • edited Loading

ZoeyR commented Oct 20, 2022

ZoeyR commented Oct 20, 2022

capatoles commented Oct 21, 2022

VictorKoenders commented Oct 21, 2022

capatoles commented Oct 21, 2022

VictorKoenders commented Oct 21, 2022 • edited Loading

capatoles commented Oct 21, 2022

ZoeyR commented Oct 21, 2022

capatoles commented Oct 24, 2022

capatoles commented Oct 20, 2022 •

edited

Loading

VictorKoenders commented Oct 21, 2022 •

edited

Loading