Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryStream / TryGetBuffer and writing analyzers/fix providers #38093

Open
nschuessler opened this issue Jun 18, 2020 · 2 comments
Open

MemoryStream / TryGetBuffer and writing analyzers/fix providers #38093

nschuessler opened this issue Jun 18, 2020 · 2 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity
Milestone

Comments

@nschuessler
Copy link

nschuessler commented Jun 18, 2020

Background and Motivation

Mentioning @stephentoub as he seems to be on many of the MemoryStream topics.

I see that MemoryStream has a lot of discussion traffic (e.g. here), so add mine to the pile.

We often encounter egregious use of memory and track it down to use of MemoryStream.ToArray. It often piles things up unnecessarily on our LOH. In fact, double because of the copying.

We have written a diagnostic to flag all occurrences of it and try to review the usage to reduce the double allocations. Recently I learned about the TryGetBuffer api, but using it as a replacement is troublesome.

Our rule has two main criticisms/concerns:

  1. It flags instances that cannot be improved.
  2. If people try to improve it, they often don't understand the complexities of getting the underlying buffer in the MemoryStream, fail, and will be less likely to make performance improvements from our analyzers in the future.

My API proposal is we need (at the minimum):

ReadOnlySpan<byte> GetBufferSpan();

which always succeeds. If the MemoryStream.TryGetBuffer would return false, then it should do memoryStream.ToArray() and return it as an ArraySegment.

This is then a drop in replacement for memoryStream.ToArray()

The complexity of MemoryStream / API makes it hard to write analyzers and fix providers for it.

The logic to determine if MemoryStream.ToArray can use TryGetBuffer requires tracing the MemoryStream instance back through function calls to the source to find if it was created with one of the constructors that will work. This is quite complicated.

TryGetBuffer is not a drop in replacement for ToArray because it can fail, and when it does, it always will. Autogenerating error condition paths and changing variable scopes is complicated. See this "simple" example:

using System.IO;
using System.Text;
class Foo
{
    public string Bar()
    {
        using (MemoryStream memoryStream = new MemoryStream())
        {
            // Write to stream ...
            string jsonString = Encoding.UTF8.GetString(memoryStream.ToArray());
            memoryStream.Close();
            return jsonString;
        }
    }
}

This requires promoting the jsonString declaration above the memoryStream.ToArray and generating an else condition on error.

The other possibility is generating our own extension method:

ReadOnlySpan<byte>   GetBufferSpan(this MemoryStream)

That has this logic in it and force insert a new code file into every solution / project where it doesn't seem to be defined already.

Proposed API

public class MemoryStream
{
       public ArraySegment<byte> GetBufferSegment();
       public ReadOnlySpan<byte> GetBufferSpan();
}

Usage Examples

It would nice to be able to the following:

using (var memoryStream = new MemoryStream())
{
     // Write to memory Stream.

    string jsonString = System.Text.Encoding.UTF8.GetString(memoryStream.GetBufferSpan());
}

Side note: I see System.Text.Encoding takes a ReadOnlySpan as an argument to GetString and not ArraySegment. There is also a new class like Memory<T> as well.

So we need to do conversions now if we have an ArraySegment ?

Alternative Designs

It's unclear to me the need for the 'publicly' visible complication of the MemoryStream class.
i.e. If you supply a buffer and publiclyVisible is false, how do you get the used part of the buffer you passed in?

But to make this simple perhaps separate out the publiclyVisible: false functionality into a subclass or different stream implementation so analyzers can easily find MemoryStreams that can be "fixed" and ones that cannot.

public class FixedBufferMemoryStream : Stream
{
     public FixedBufferMemoryStream(byte[] buffer);
     public FixedBufferMemoryStream(ArraySegment<byte> buffer);

}

Then analyzers can quickly tell if the the stream.ToArray() can be fixed or not:


InvocationExpressionSyntax invocation = ...; // Points to memoryStream.ToArray()
INamedTypeSymbol streamType = semanticModel.Compilation.GetTypeByMetadataName("System.IO.MemoryStream");   
TypeInfo invokedType = semanticModel.GetTypeInfo(invocation.Expression);

if (invokedType.Inherits(streamType))
{
    // I can fix it
}
else
{
    // I cannot fix it
}

Risks

Splitting class functionality will of course break some portion of the API audience.

@nschuessler nschuessler added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 18, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.IO untriaged New issue has not been triaged by the area owner labels Jun 18, 2020
@carlossanlop carlossanlop added this to the Future milestone Jun 29, 2020
@GSPP
Copy link

GSPP commented Jul 3, 2020

Naming suggestion:

GetBufferSegment => GetBufferAsSegment
GetBufferSpan => GetBufferAsSpan

There is no such thing as a "BufferSegment". As indicated that these methods are differentiated by the format of the data.

@jozkee jozkee removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020
Copy link
Contributor

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

@dotnet-policy-service dotnet-policy-service bot added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity
Projects
None yet
Development

No branches or pull requests

5 participants