Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a feature flag to disable extra debug info #595

Closed
capatoles opened this issue Oct 20, 2022 · 12 comments
Closed

Add a feature flag to disable extra debug info #595

capatoles opened this issue Oct 20, 2022 · 12 comments

Comments

@capatoles
Copy link

The issue is in bincode_derive crate, where we are adding extra debug info that are leaking enum names in the final binary (which imply less optimization with a bigger binary size, and internal structure leak). I think we should use a proper feature flag to disable this. I'm working on a pull request to address that issue.

It seems there is only one leak in the code in this part of the code. I agree this debug line can be important, but should be optional for release build if required. A feature flag could be used to address that situation, and maybe it is useful for other cases like in #594.

capatoles added a commit to capatoles/bincode that referenced this issue Oct 20, 2022
This allows
- More optimization (smallest final binary size)
- No internal code structure leak (no enum name leak).
@capatoles
Copy link
Author

The PR is ready: #596

@ZoeyR
Copy link
Collaborator

ZoeyR commented Oct 20, 2022

Using a feature flag to control this is a non-starter since setting the feature would infect all other uses of bincode_derive in the build tree. Additionally I would need to see concrete data proving that this is creating unacceptable binary bloat. There are many sources for extra strings in rust binaries and I doubt this is a large one.

@capatoles
Copy link
Author

capatoles commented Oct 20, 2022

There are many sources for extra strings in rust binaries and I doubt this is a large one.

As mentioned in #596, you can find a minimal reproducible code below. The code in question is in the Decode part of the code, so you need to call a decode_* operation in your minimal example. For example:

fn main() {
    let bytes = bincode::encode_to_vec(Foo::Bar, bincode::config::standard()).map_err(|_| ()).unwrap();
    println!("{bytes:?}");
    let (_, _): (Foo, _) = bincode::decode_from_slice(&bytes, bincode::config::standard()).unwrap();
}

#[derive(bincode::Encode, bincode::Decode)]
enum Foo {
    Bar
}

Which gives (Foo in the result of the strings command):

$ cargo build --release
$ strip errors-and-strings
$ strings errors-and-strings | grep "Foo"
Foolibrary/alloc/src/raw_vec.rscapacity overflowlibrary/alloc/src/ffi/c_str.rs

You can also check that with the cargo-expand crate:

#![feature(prelude_import)]
#[prelude_import]
use std::prelude::rust_2021::*;

#[macro_use]
extern crate std;
fn main() {
    let bytes = bincode::encode_to_vec(Foo::Bar, bincode::config::standard())
        .map_err(|_| ())
        .unwrap();
    {
        ::std::io::_print(
            ::core::fmt::Arguments::new_v1(
                &["", "\n"],
                &[::core::fmt::ArgumentV1::new_debug(&bytes)],
            ),
        );
    };
    let (_, _): (Foo, _) = bincode::decode_from_slice(
            &bytes,
            bincode::config::standard(),
        )
        .unwrap();
}

enum Foo {
    Bar,
}

impl ::bincode::Encode for Foo {
    fn encode<__E: ::bincode::enc::Encoder>(
        &self,
        encoder: &mut __E,
    ) -> core::result::Result<(), ::bincode::error::EncodeError> {
        match self {
            Self::Bar => {
                <u32 as ::bincode::Encode>::encode(&(0u32), encoder)?;
                Ok(())
            }
        }
    }
}

impl ::bincode::Decode for Foo {
    fn decode<__D: ::bincode::de::Decoder>(
        decoder: &mut __D,
    ) -> core::result::Result<Self, ::bincode::error::DecodeError> {
        let variant_index = <u32 as ::bincode::Decode>::decode(decoder)?;
        match variant_index {
            0u32 => Ok(Self::Bar {}),
            variant => {
                Err(::bincode::error::DecodeError::UnexpectedVariant {
                    found: variant,
                    type_name: "Foo",
                    allowed: &::bincode::error::AllowedEnumVariants::Range {
                        min: 0,
                        max: 0,
                    },
                })
            }
        }
    }
}

impl<'__de> ::bincode::BorrowDecode<'__de> for Foo {
    fn borrow_decode<__D: ::bincode::de::BorrowDecoder<'__de>>(
        decoder: &mut __D,
    ) -> core::result::Result<Self, ::bincode::error::DecodeError> {
        let variant_index = <u32 as ::bincode::Decode>::decode(decoder)?;
        match variant_index {
            0u32 => Ok(Self::Bar {}),
            variant => {
                Err(::bincode::error::DecodeError::UnexpectedVariant {
                    found: variant,
                    type_name: "Foo",
                    allowed: &::bincode::error::AllowedEnumVariants::Range {
                        min: 0,
                        max: 0,
                    },
                })
            }
        }
    }
}

In the same idea, your debug file in ./target contains the string:

impl :: bincode :: Decode for Foo
{
    fn decode < __D : :: bincode :: de :: Decoder > (decoder : & mut __D) ->
    core :: result :: Result < Self, :: bincode :: error :: DecodeError >
    {
        let variant_index = < u32 as :: bincode :: Decode > :: decode(decoder)
        ? ; match variant_index
        {
            0u32 => Ok(Self :: Bar {}), variant =>
            Err(:: bincode :: error :: DecodeError :: UnexpectedVariant
            {
                found : variant, type_name : "Foo", allowed : & :: bincode ::
                error :: AllowedEnumVariants :: Range { min : 0, max : 0 }
            })
        }
    }
} impl < '__de > :: bincode :: BorrowDecode < '__de > for Foo
{
    fn borrow_decode < __D : :: bincode :: de :: BorrowDecoder < '__de > >
    (decoder : & mut __D) -> core :: result :: Result < Self, :: bincode ::
    error :: DecodeError >
    {
        let variant_index = < u32 as :: bincode :: Decode > :: decode(decoder)
        ? ; match variant_index
        {
            0u32 => Ok(Self :: Bar {}), variant =>
            Err(:: bincode :: error :: DecodeError :: UnexpectedVariant
            {
                found : variant, type_name : "Foo", allowed : & :: bincode ::
                error :: AllowedEnumVariants :: Range { min : 0, max : 0 }
            })
        }
    }
}

Using a feature flag to control this is a non-starter since setting the feature would infect all other uses of bincode_derive in the build tree.

Our objective is with a trigger (e.g. like here, a feature) we want to disable any enum name leak, and if it propagates to any crate using bincode_derive it will improve even further the optimization. Also, we speak here of a non-default feature (that just allows maximum optimization of binary size).

Additionally I would need to see concrete data proving that this is creating unacceptable binary bloat.

For us the issue is regarding optimization (binary size, which will be increased for each enum in the code with the Decode trait) and regarding leaking internal enum names (which are not ok for some domain, like the one I am working in).

@ZoeyR
Copy link
Collaborator

ZoeyR commented Oct 20, 2022

Again, I will need to see concrete data on how much this affects overall binary size before I would consider this to be a bloat issue. Bincode is a general purpose library and not designed for situations where you need absolutely every string possible removed from the final binary. Even on microcontrollers, which I do work on, this is not an issue.

As for the leaking of enum names being a problem. Please elaborate on how this in particular is an issue. Rust crates work by building against the source of other crates. The source would also leak the enum name.

@ZoeyR
Copy link
Collaborator

ZoeyR commented Oct 20, 2022

Additionally try throwing away the error from your decode unwrap statement. The optimizer should get rid of the error since it won't be printed to stdout.

@capatoles
Copy link
Author

Again, I will need to see concrete data on how much this affects overall binary size before I would consider this to be a bloat issue. Bincode is a general purpose library and not designed for situations where you need absolutely every string possible removed from the final binary. Even on microcontrollers, which I do work on, this is not an issue.

Indeed it may not be a problem for most of the use cases. What make me afraid is that the added code is proportional to the number of enum that derive the Decode trait (i.e. if we have 100 enum we have 100 strings added).

As for the leaking of enum names being a problem. Please elaborate on how this in particular is an issue. Rust crates work by building against the source of other crates. The source would also leak the enum name.

For my use case, I build a proprietary binary (top of the dependency tree), which is not meant to be built against any other crate .

Additionally try throwing away the error from your decode unwrap statement. The optimizer should get rid of the error since it won't be printed to stdout.

Even without used any unwrap, the strings are still even in release mode. I can get rid of them in some cases with extra compiler optimization, but it do not work every time.

@VictorKoenders
Copy link
Contributor

if we have 100 enum we have 100 strings added

Realistically your enum variant names shouldn't be more than 20 characters. If you have an application that has over 100 enums, how much of an impact is 2kb of your final binary size going to matter? Especially because all of your log lines will probably be a significantly larger impact on your binary size.

And again, you can get around this by simply dropping the error from any bincode invocation. Making something like

enum YourError {
    Bincode,
    // ...
}
let _ = bincode::decode_from_slice(..).map_err(|_| YourError::Bincode)?;

In combination with LTO will get rid of all information you're worried about leaking.

If you're not using logging or custom error types, I can highly recommend adding that, especially from the importance of the application that it sounds like you're making.

which is not meant to be built against any other crate .

Is bincode the only crate you're building against? There are many, many good crates out there that will make your life a lot easier. I can highly recommending using more of them.

@capatoles
Copy link
Author

We tested your solution, and in a minimal environment it works, but in our more complex and main project it doesn't seem to work. Maybe LTO is failing to optimize this in a more complex project? In our project we use other crate (than bincode), but we made sure there was no information leak and maximum optimization. When not possible, we just use a feature flag to allow more optimization, since all cases of information leaks we found were for debug purpose. So, we just made a feature flag to enable maximum optimization (losing debug information), and when developing we just have everything we need. For us it will be painful to have to maintain a parallel version for a small line of code, do you think it's possible to find a solution, or there is no other option for us? Maybe a more generic flag could work, by also solving other issues like #594 (like maximum-optimization)?

@VictorKoenders
Copy link
Contributor

VictorKoenders commented Oct 21, 2022

#594 is not a optimization issue, but an issue about cluttering the target directory. This is fixed in bincode-org/virtue#31 with moving the files to the /target/generated/bincode/ folder.

As Zoey said, feature flags are not a good solution for this, and I don't see us merging any PR that adds a feature that disables functionality.

@capatoles
Copy link
Author

#594 is not a optimization issue, but an issue about cluttering the target directory.

I was more thinking about a "release" mode where you don't have to output generated code in files (#594), and optimize at a maximum binary size and internal structure leak (this issue). But I understand your point.

As Zoey said, feature flags are not a good solution for this, and I don't see us merging any PR that adds a feature that disables functionality.

So for you, is there any solution other than have to maintain our own version of bincode for this single line change?

@ZoeyR
Copy link
Collaborator

ZoeyR commented Oct 21, 2022

I would argue that if the security and obfuscation of your proprietary application is this important all of your dependencies should be vendored and possibly patched. This level of binary obfuscation is not a common requirement in most applications.

@capatoles
Copy link
Author

Ok, I understand. Thanks for your help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants