Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler can silently generate a bad image when there's 0xFFFF parameters in the module #94892

Open
jbevain opened this issue Jun 24, 2023 · 29 comments
Labels
area-System.Reflection.Metadata bug design-discussion Ongoing discussion about design without consensus
Milestone

Comments

@jbevain
Copy link
Contributor

jbevain commented Jun 24, 2023

Version Used:
.NET 7.0.400-preview.23225.8

Steps to Reproduce:

Started as an investigation of jbevain/cecil#913

  1. Compile this absolute unit of a compilation unit as the single file of a library project: https://gist.github.com/jbevain/9599e655adad445a6ee447add305a8bc
  2. Reference the library in a console project:
using System.Reflection;

var a = Assembly.LoadFile(typeof(P3).Assembly.Location);
foreach (var t in a.GetTypes())
{
    Console.WriteLine(t.FullName);
    foreach (var m in t.GetMethods(BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static))
    {
        Console.WriteLine(m.Name);
        foreach (var p in m.GetParameters())
        {
            Console.WriteLine(p.Name);
        }
    }
}
  1. dotnet run the console project:

Expected Behavior:
Should print:

...
F_65534
a

Actual Behavior:

...
F_65534
Unhandled exception. System.BadImageFormatException: File is corrupt. (0x8013110E)
   at System.Reflection.MetadataImport._Enum(IntPtr scope, Int32 type, Int32 parent, MetadataEnumResult& result)
   at System.Reflection.RuntimeParameterInfo.GetParameters(IRuntimeMethodInfo methodHandle, MemberInfo member, Signature sig, ParameterInfo& returnParameter, Boolean fetchReturnParameter)
   at System.Reflection.RuntimeMethodInfo.GetParameters()
   at Program.<Main>$(String[] args) in C:\tmp\ReproFFFF\exe\Program.cs:line 10

Note that the last row of the Method table has a Param of 0x80000000 which is unexpected:

Screenshot 2023-06-23 at 5 50 21 PM
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jun 24, 2023
@jbevain
Copy link
Contributor Author

jbevain commented Jun 24, 2023

Michal pointed at the similarity with his old issue: dotnet/coreclr#20865

@FSDKO
Copy link

FSDKO commented Nov 16, 2023

I found a similar error but with fields: #94890

@jbevain
Copy link
Contributor Author

jbevain commented Nov 16, 2023

@FSDKO yes, everything that is represented as a “range” in metadata (Fields, Methods, Parameters) has this issue.

@AlekseyTs AlekseyTs transferred this issue from dotnet/roslyn Nov 17, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 17, 2023
@ghost
Copy link

ghost commented Nov 17, 2023

Tagging subscribers to this area: @dotnet/area-system-reflection
See info in area-owners.md if you want to be subscribed.

Issue Details

Version Used:
.NET 7.0.400-preview.23225.8

Steps to Reproduce:

Started as an investigation of jbevain/cecil#913

  1. Compile this absolute unit of a compilation unit as the single file of a library project: https://gist.github.com/jbevain/9599e655adad445a6ee447add305a8bc
  2. Reference the library in a console project:
using System.Reflection;

var a = Assembly.LoadFile(typeof(P3).Assembly.Location);
foreach (var t in a.GetTypes())
{
    Console.WriteLine(t.FullName);
    foreach (var m in t.GetMethods(BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static))
    {
        Console.WriteLine(m.Name);
        foreach (var p in m.GetParameters())
        {
            Console.WriteLine(p.Name);
        }
    }
}
  1. dotnet run the console project:

Expected Behavior:
Should print:

...
F_65534
a

Actual Behavior:

...
F_65534
Unhandled exception. System.BadImageFormatException: File is corrupt. (0x8013110E)
   at System.Reflection.MetadataImport._Enum(IntPtr scope, Int32 type, Int32 parent, MetadataEnumResult& result)
   at System.Reflection.RuntimeParameterInfo.GetParameters(IRuntimeMethodInfo methodHandle, MemberInfo member, Signature sig, ParameterInfo& returnParameter, Boolean fetchReturnParameter)
   at System.Reflection.RuntimeMethodInfo.GetParameters()
   at Program.<Main>$(String[] args) in C:\tmp\ReproFFFF\exe\Program.cs:line 10

Note that the last row of the Method table has a Param of 0x80000000 which is unexpected:

Screenshot 2023-06-23 at 5 50 21 PM
Author: jbevain
Assignees: -
Labels:

bug, area-System.Reflection

Milestone: -

@AlekseyTs AlekseyTs added area-System.Reflection.Metadata and removed untriaged New issue has not been triaged by the area owner labels Nov 17, 2023
@ghost
Copy link

ghost commented Nov 17, 2023

Tagging subscribers to this area: @dotnet/area-system-reflection-metadata
See info in area-owners.md if you want to be subscribed.

Issue Details

Version Used:
.NET 7.0.400-preview.23225.8

Steps to Reproduce:

Started as an investigation of jbevain/cecil#913

  1. Compile this absolute unit of a compilation unit as the single file of a library project: https://gist.github.com/jbevain/9599e655adad445a6ee447add305a8bc
  2. Reference the library in a console project:
using System.Reflection;

var a = Assembly.LoadFile(typeof(P3).Assembly.Location);
foreach (var t in a.GetTypes())
{
    Console.WriteLine(t.FullName);
    foreach (var m in t.GetMethods(BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static))
    {
        Console.WriteLine(m.Name);
        foreach (var p in m.GetParameters())
        {
            Console.WriteLine(p.Name);
        }
    }
}
  1. dotnet run the console project:

Expected Behavior:
Should print:

...
F_65534
a

Actual Behavior:

...
F_65534
Unhandled exception. System.BadImageFormatException: File is corrupt. (0x8013110E)
   at System.Reflection.MetadataImport._Enum(IntPtr scope, Int32 type, Int32 parent, MetadataEnumResult& result)
   at System.Reflection.RuntimeParameterInfo.GetParameters(IRuntimeMethodInfo methodHandle, MemberInfo member, Signature sig, ParameterInfo& returnParameter, Boolean fetchReturnParameter)
   at System.Reflection.RuntimeMethodInfo.GetParameters()
   at Program.<Main>$(String[] args) in C:\tmp\ReproFFFF\exe\Program.cs:line 10

Note that the last row of the Method table has a Param of 0x80000000 which is unexpected:

Screenshot 2023-06-23 at 5 50 21 PM
Author: jbevain
Assignees: -
Labels:

bug, area-System.Reflection, area-System.Reflection.Metadata

Milestone: -

@AlekseyTs
Copy link
Contributor

Probably the same as #94890, but with parameters.

@MichalStrehovsky
Copy link
Member

@AlekseyTs There was in the past interest to work around this in the Roslyn compiler itself by generating a dummy field/method. If we do a file format change to fix this file format bug, it would very likely not be backported to desktop CLR.

Ref: dotnet/coreclr#20865 (comment)

@AlekseyTs
Copy link
Contributor

@MichalStrehovsky

There was in the past interest to work around this in the Roslyn compiler itself by generating a dummy field/method.

I do not think compilers would want to do that. In my opinion, if System.Reflection.Metadata is required to use 2bytes size for scenarios that it cannot represent correctly, then it should be responsible for adding those dummy rows to overcome the 2byte requirement. It has all the information necessary to detect the edge cases.

@AlekseyTs
Copy link
Contributor

It would also be interesting to see how native metadata writer was dealing with the situation.

@MichalStrehovsky
Copy link
Member

It would also be interesting to see how native metadata writer was dealing with the situation.

It doesn't.

The linked PR (dotnet/coreclr#20865) that didn't merge has the details. The file format spec has allowance to represent this situation but neither the writers nor the readers implement it. If we fix this bug by implementing the spec, it would be netcore only. That's why the referenced comment from Tomas calls for a separate Roslyn issue to emit a workaround method/field instead.

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Nov 20, 2023

That's why the referenced comment from Tomas calls for a separate Roslyn issue to emit a workaround method/field instead.

Implementing the workaround in System.Reflection.Metadata will have the same effect, and, I think, it will be a much better place to do that.

@MichalStrehovsky
Copy link
Member

All of our metadata writers give callers full control over emitted tokens. Emitting extra tokens behind user's back would likely just replace a bug with a new bug. The ECMA-335 metadata formats are used for things that are not .NET programs (winmd, toc files, IBC profile data, coff object files to be used with link.exe and metadata merging etc.). The metadata writer doesn't know what it's s actually emitting and how legal a dummy would be. I don't think it can emit a dummy.

Emitting a dummy set of tokens from S.R.Metadata behind the back may even break Roslyn's EnC scenarios (but I'm not authoritave on EnC).

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Nov 20, 2023

The metadata writer doesn't know what it's s actually emitting and how legal a dummy would be. I don't think it can emit a dummy.

Isn't not emitting the dummy definitely wrong, for any kind of output? I am specifically talking about scenarios when there is a need to refer past the end of the table.

Do we know of any scenario for which not emitting a dummy is desired? BTW, EnC scenario, is compiler scenario as well.

@MichalStrehovsky
Copy link
Member

MichalStrehovsky commented Nov 20, 2023

The metadata writer doesn't know what it's s actually emitting and how legal a dummy would be. I don't think it can emit a dummy.

Isn't not emitting the dummy definitely wrong, for any kind of output? I am specifically talking about scenarios when there is a need to refer past the end of the table.

The proper fix is to implement the file format spec which allows representing this situation differently. It is what the linked PR was trying to do in the native writer. There would be a similar fix for S.R.Metadata reader and writer. It would be netcore only. Roslyn would need to emit a workaround token if it still cares about desktop because desktop is very unlikely to get a fix in its reader.

For desktop CLR, such netcore-only fix wouldn't make things any worse than they are now because without the fix, we would have a record that is not readable, and with the fix, we would have a record that is not readable either. (See the PR description on how that's achieved.)

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Nov 20, 2023

It looks like we are going in circle. The questions that I am asking are not getting answered. At the moment, I am not convinced that pushing the fix to compilers is the right thing to do. And the fix in S.R.Metadata looks like a viable alternative to me. I am happy to continue discussion off-line (Teams meeting/call, etc).

@MichalStrehovsky
Copy link
Member

I know what you mean - you want to replace a bug in S.R.Metadata (emitting assemblies that are not readable) with another bug (emitting assemblies that have tokens the user didn't ask for). You are asking for this because Roslyn doesn't care about the other bug (if we rule out EnC is not affected, which I'm not sure about). S.R.Metadata does care about the other bug too. Such fix would not be acceptable for the other non-Roslyn scenarios where S.R.Metadata is used.

We could consider a flag to metadata writer called "EmitWorkaroundTokensThereMaybeBugs" to allow emitting workaround tokens. If Roslyn opts into this and runs into bugs due to that (netmodules generated like that might not be mergable with link.exe due to duplicate workaround methods, for example), it will be Roslyn bugs. I don't think it fixes much.

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Nov 20, 2023

If you are proposing going with a format change instead and compilers emitting a dummy token only for the benefit a legacy desktop metadata reader, That is probably an option. But, then, I think the S.R.Metadata should be fixed first, and then we in Compilers can decide if we care about the legacy case, which is broken today anyway.

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Nov 20, 2023

you want to replace a bug in S.R.Metadata (emitting assemblies that are not readable) with another bug (emitting assemblies that have tokens the user didn't ask for).

I do not think I agree with this interpretation. Or, if we call the workaround "another bug". I do not see why "another bug" in Compilers is better than "another bug" in S.R.Metadata.

Such fix would not be acceptable for the other non-Roslyn scenarios where S.R.Metadata is used.

I asked for a single concrete example that would support this statement. So far, none were given. Until then, I remain very skeptical that this is actually the case and assume that the statement is false.

@MichalStrehovsky
Copy link
Member

I do not see why "another bug" in Compilers is better than "another bug" in S.R.Metadata.

I don't believe the contract of the C# compiler is to generate 1:1 type/method mapping with the input C# source code or there would be no RefSafetyRulesAttribute type injected into Roslyn outputs, for example. It is however the contract of System.Reflection.Metadata because the API is a low level API operating on tokens.

I asked for a single concrete example that would support this statement. So far, none were given.

I gave concrete examples where the ECMA-335 file format is used for purposes that are not ".NET program" above ("winmd, toc files, IBC profile data, coff object files"). The file format is a convenient way to represent various things that are not "executable .NET programs" and we cannot make an assumption that adding an extra type would be harmless. We already have uses within Microsoft where this is invalid or requires special care and that doesn't count customer uses that we don't know about.

WinMD is a file format based on ECMA-335. A dummy that is valid to emit in a WinMD looks different from a dummy for a .NET program (I don't actually know a dummy would look so that it doesn't upset tools that read WinMD). S.R.Metadata writer doesn't have much awareness about whether it's emitting WinMD. WinMD is just an example where someone took a .NET file format and used it for non-.NET things.

A TOC file looks like an assembly, but it's an input to a compiler, not a runnable thing. It provides information about how an assembly got compiled into a shared native library. It contains a subset of an existing assembly that got compiled to native code. If this subset happens to hit the 0xFFFF case and the metadata writer inserts a type/method that didn't exist in the original assembly (original that had more than 0xFFFF methods and didn't need a dummy), the TOC no longer matches the input assembly. Whether the reader will be able to handle it or crash is up to chance.

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Nov 21, 2023

None of these are concrete examples. A concrete example is a specific scenario. This is what we do today and it works great. This is how it would look like with the dummy and this is how it breaks. With details where it breaks and why that part cannot be adjusted to not break.

@AlekseyTs
Copy link
Contributor

Even if we decide that the behavior of S.R.Metadata cannot be changed unconditionally, there is always an option to make it conditional and be controllable by a client. One of the purposes of this reusable component is to shield clients from internal details of column sizes, etc. I think that it is not appropriate for compilers to worry about such things and be responsible for duplicating internal logic of that reusable component. In my opinion this complexity doesn't belong in compilers.

Note, at the moment, I am not even convinced that the suggested workaround (with the dummy) is appropriate for compiler scenarios. That is another reason why I am not comfortable with implementing it in compilers. For example, when a .NET module is produced, compiler is constrained in terms of what extra types it embeds. Presence of those can cause unexpected failures when multimodule assemblies are produced, or multiple modules are merged into a single assembly.

@ericstj ericstj added this to the 9.0.0 milestone Nov 28, 2023
@ericstj ericstj added the design-discussion Ongoing discussion about design without consensus label Nov 28, 2023
@MichalStrehovsky
Copy link
Member

Another example where S.R.Metadata injecting random things into the output would likely break is the https://github.com/microsoft/win32metadata project. It's another example of a project using ECMA-335 file format to store things that are not executable .NET programs and have its own rules (like visibility/accessibility being meaningless because this format pretty much encodes a C header file).

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Nov 29, 2023

Another example where S.R.Metadata injecting random things into the output would likely break is ...

I guess what I am still having a hard time to understand is how already broken scenarios (one cannot read the metadata back) can be broken by a fix. The proposed workaround might not be considered as a "fix" for them (i.e. they remain broken, but in a different way), that I can understand.

@MichalStrehovsky
Copy link
Member

The proposed workaround might not be considered as a "fix" for them (i.e. they remain broken, but in a different way), that I can understand.

They may not be broken. The ECMA-335 file format does have a provision for this, it's just S.R.Metadata and the native reader/writer in CoreCLR wasn't implemented to support what the spec says. Win32Metadata is used by multiple other projects. E.g. one of them is a Rust repo that has a metadata reader written in Rust. Maybe that one implements the spec and would handle it. If a reader implements the spec, it would work. Spec has a provision for it (put 0 for the method list instead of num_methods+1).

We'd need to update our readers to support this because they failed to implement the spec.

@AlekseyTs
Copy link
Contributor

Spec has a provision for it (put 0 for the method list instead of num_methods+1).

Could you provide a specific quote and where the provision can be found? For example, I was able to find the following in ECMA-335 "II.22.37 TypeDef : 0x02" section. And it looks like putting 0 would be a violation of the spec.

  • FieldList (an index into the Field table; it marks the first of a contiguous run of Fields owned by this Type). The run continues to the smaller of:
    • the last row of the Field table
    • the next run of Fields, found by inspecting the FieldList of the next row in this TypeDef table
  • MethodList (an index into the MethodDef table; it marks the first of a continguous run of Methods owned by this Type). The run continues to the smaller of:
    • the last row of the MethodDef table
    • the next run of Methods, found by inspecting the MethodList of the next row in this TypeDef table

@MichalStrehovsky
Copy link
Member

"Indexes to tables begin at 1, so index 1 means the first row in any given metadata table. (An index
value of zero denotes that it does not index a row at all; that is, it behaves like a null reference.)"

@MichalStrehovsky
Copy link
Member

Combined with "18. MethodList can be null or non-null"

@AlekseyTs
Copy link
Contributor

Thank you for the quotes. Well, I am not surprised that we ended up with the current implementation. Different sections of the spec are, essentially, contradict each other. Or, perhaps, some of the sections are incomplete, while they could easily be interpreted as specifying complete set of possible values.

@steveharter
Copy link
Member

Since the issue with 0xFFFF applies to parameters, fields and methods we should just use this issue.

Other issues:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Reflection.Metadata bug design-discussion Ongoing discussion about design without consensus
Projects
None yet
Development

No branches or pull requests

7 participants