Make more Opcodes hold a FieldType #40

Closed
wants to merge 14 commits into from
35 changes: 27 additions & 8 deletions src/bytecode.rs
@@ -8,6 +8,7 @@ use crate::constant_pool::{
 use crate::constant_pool::{
     ConstantPoolEntry, ConstantPoolEntryTypes, InvokeDynamic, Loadable, MemberRef,
 };
+use crate::descriptor::{FieldType, Ty};
 use crate::{read_u1, read_u2, read_u4, ParseError};

 pub type JumpOffset = i32;
@@ -44,7 +45,7 @@ pub enum Opcode<'a> {
     Aastore,
     AconstNull,
     Aload(u16), // both wide and narrow
-    Anewarray(Cow<'a, str>),
+    Anewarray(FieldType<'a>),
     Areturn,
     Arraylength,
     Astore(u16), // both wide and narrow
@@ -55,7 +56,7 @@ pub enum Opcode<'a> {
     Breakpoint,
     Caload,
     Castore,
-    Checkcast(Cow<'a, str>),
+    Checkcast(FieldType<'a>),
     D2f,
     D2i,
     D2l,
@@ -142,7 +143,7 @@ pub enum Opcode<'a> {
     Impdep2,
     Imul,
     Ineg,
-    Instanceof(Cow<'a, str>),
+    Instanceof(FieldType<'a>),
     Invokedynamic(InvokeDynamic<'a>),
     Invokeinterface(MemberRef<'a>, u8),
     Invokespecial(MemberRef<'a>),
@@ -187,7 +188,7 @@ pub enum Opcode<'a> {
     Lxor,
     Monitorenter,
     Monitorexit,
-    Multianewarray(Cow<'a, str>, u8),
+    Multianewarray(FieldType<'a>, u8),
     New(Cow<'a, str>),
     Newarray(PrimitiveArrayType),
     Nop,
@@ -633,11 +634,29 @@ fn read_opcodes<'a>(
     };
     Opcode::Newarray(primitive_type)
 }
-0xbd => Opcode::Anewarray(read_cp_classinfo(code, &mut ix, pool)?),
+0xbd => Opcode::Anewarray({
+    let ty = read_cp_classinfo(code, &mut ix, pool)?;
+    match FieldType::parse(&ty) {
+        Ok(ty) => ty,
+        Err(_) => FieldType::Ty(Ty::Object(ty)),
Owner

This doesn't look right to me. Conceptually the classinfo field contains the name of a class, which is different from a field type. For instance, if the class happened to be called "B", parsing it as a field type would turn it into FieldType::Ty(Ty::Base(BaseType::Byte)), which is incorrect. On top of that, handling Err by assuming the value is a Ty::Object does not seem very robust. Err can be returned for any number of reasons, and it is not a good way to handle control flow. You're basically hacking around the fact that ClassInfo structures use a different syntax (e.g. java/lang/String) from the one FieldType is intended to parse (Ljava/lang/String;), but this is not a good solution.
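
To make the ambiguity described above concrete, here is a minimal sketch. The `BaseType`, `Ty`, `parse_field_descriptor`, and `classinfo_with_fallback` names are hypothetical, simplified stand-ins written for this illustration; they are not the crate's actual `FieldType` API, and the toy parser only understands a handful of descriptors.

```rust
// Hypothetical, simplified stand-ins for the crate's descriptor types.
#[derive(Debug, PartialEq)]
enum BaseType {
    Byte,
    Int,
    Long,
}

#[derive(Debug, PartialEq)]
enum Ty {
    Base(BaseType),
    Object(String),
}

// Toy field-descriptor parser (assumption: only B, I, J and L<name>; are handled).
fn parse_field_descriptor(s: &str) -> Result<Ty, String> {
    match s {
        "B" => Ok(Ty::Base(BaseType::Byte)),
        "I" => Ok(Ty::Base(BaseType::Int)),
        "J" => Ok(Ty::Base(BaseType::Long)),
        _ => s
            .strip_prefix('L')
            .and_then(|rest| rest.strip_suffix(';'))
            .filter(|name| !name.is_empty())
            .map(|name| Ty::Object(name.to_string()))
            .ok_or_else(|| format!("not a field descriptor: {s}")),
    }
}

// The pattern under review: try the descriptor parser first and treat
// any error as "this must have been a plain class name".
fn classinfo_with_fallback(name: &str) -> Ty {
    parse_field_descriptor(name).unwrap_or_else(|_| Ty::Object(name.to_string()))
}

fn main() {
    // Ends up as an Object, but only because the descriptor parse failed.
    println!("{:?}", classinfo_with_fallback("java/lang/String"));
    // A class that is literally named B is misread as the primitive byte type.
    println!("{:?}", classinfo_with_fallback("B"));
}
```

In this sketch the second call silently returns `Ty::Base(BaseType::Byte)`, so the caller cannot tell a class named `B` apart from the primitive type.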

Contributor Author

@C0D3-M4513R Aug 4, 2024

Sorry to circle back to this, but having to manually parse the Cow of anewarray, instanceof and checkcast for array depths seems a little strange too.

Like imagine casting an Object to an int[].

Owner

Yeah like I said in the comment below, I'm not opposed to putting in a structured type here instead of the Cow. I'm just saying using FieldType is wrong, and it should be a different structured type that precisely represents the possible range of values here.
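
One possible shape for such a dedicated type is sketched below. It assumes the classinfo name is either a plain internal class name (such as `java/lang/String`) or, for array classes, an array descriptor (such as `[J` or `[Ljava/lang/String;`). The names `ClassInfoName`, `ElementType`, and `parse_classinfo_name` are hypothetical and are not taken from the crate or from this PR.

```rust
// Hypothetical element type of an array class.
#[derive(Debug, PartialEq)]
enum ElementType {
    // One of B C D F I J S Z, kept as its descriptor character.
    Primitive(char),
    // Internal class name without the surrounding L...; wrapper.
    Class(String),
}

// Hypothetical structured form of a classinfo name.
#[derive(Debug, PartialEq)]
enum ClassInfoName {
    // A plain class or interface name, e.g. "java/lang/String", "B" or "L".
    Class(String),
    // An array class, e.g. "[J" or "[[Ljava/lang/String;".
    Array { dims: usize, element: ElementType },
}

fn parse_classinfo_name(name: &str) -> Result<ClassInfoName, String> {
    if name.is_empty() {
        return Err("empty class name".to_string());
    }
    // Only names starting with '[' use descriptor syntax; everything else
    // is taken verbatim as a class name, so "B" and "L" stay classes.
    if !name.starts_with('[') {
        return Ok(ClassInfoName::Class(name.to_string()));
    }
    let element = name.trim_start_matches('[');
    // '[' is one byte long, so the length difference is the dimension count.
    let dims = name.len() - element.len();
    let element = match element {
        "B" | "C" | "D" | "F" | "I" | "J" | "S" | "Z" => {
            ElementType::Primitive(element.chars().next().unwrap())
        }
        _ => match element.strip_prefix('L').and_then(|r| r.strip_suffix(';')) {
            Some(class) if !class.is_empty() => ElementType::Class(class.to_string()),
            _ => return Err(format!("malformed array descriptor: {name}")),
        },
    };
    Ok(ClassInfoName::Array { dims, element })
}

fn main() {
    for name in ["java/lang/String", "B", "[J", "[[Ljava/lang/String;"] {
        println!("{name} -> {:?}", parse_classinfo_name(name));
    }
}
```

The design choice here is that descriptor parsing is entered only after a leading `[` has been seen, so no error path is used for control flow and a one-letter class name can never collide with a primitive descriptor.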

Contributor Author

@C0D3-M4513R Aug 5, 2024

How is it wrong? Can you not checkcast from Object to [J or [B?

Owner

For instance if the class happened to be called "B", parsing it as a field type would turn it into FieldType::Ty(Ty::Base(BaseType::Byte)) which is incorrect.

Contributor Author

@C0D3-M4513R Aug 5, 2024

Like I said I'm not at all opposed to this. I just want the implementation to use types with precision.

I proposed multiple solutions now, but you find none sufficient.
Tbh I do not know what you want from me at this point.

You say you see too many issues with my solution, so is there something I'm not seeing?

Contributor Author

@C0D3-M4513R Aug 5, 2024

Also are you saying that the following sample opcode is wrong too? (also taken from #41)

21: checkcast #21 // class "[Ljava/lang/String;"

There's a number of problems with this code, starting with the fact that the class name in these instructions is not encoded with L...;.

Owner

The code works when you use strip_prefix instead of split_prefix. I added the L...; bit as an extra bit of precaution, just to be sure.

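fn main() {
    let ty = "[[[test";
    let mut ty = ty;
    let mut dims = 0;
    // Count and remove each leading '[' (one per array dimension).
    while let Some(new_ty) = ty.strip_prefix("[") {
        dims += 1;
        ty = new_ty;
    }

    // Drop a leading 'L' and a trailing ';' if either is present.
    let ty = ty.strip_prefix("L").unwrap_or(ty);
    let ty = ty.strip_suffix(";").unwrap_or(ty);

    println!("{ty}, {dims}");
}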

Try this Java code:

class L {
    public L tryCast(Object o) {
        L l = (L)o;
        return l;
    }
}

Look at what the checkcast instruction generates, the string value that you get, and what your parsing code does to it.
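
As a rough way to follow that experiment without compiling the Java class, the sketch below wraps the strip-based logic from the snippet above in a function and feeds it a few names. It assumes the checkcast operand for the class above resolves to the bare name `L`; the function name `strip_based_parse` and the extra sample names are hypothetical.

```rust
// The strip-based approach from the thread, wrapped in a function.
// Returns the remaining type name and the number of array dimensions.
fn strip_based_parse(name: &str) -> (&str, usize) {
    let mut ty = name;
    let mut dims = 0;
    // Remove one leading '[' per array dimension.
    while let Some(rest) = ty.strip_prefix('[') {
        dims += 1;
        ty = rest;
    }
    // Unconditionally drop a leading 'L' and a trailing ';' when present,
    // even if the name was never in L...; descriptor form.
    let ty = ty.strip_prefix('L').unwrap_or(ty);
    let ty = ty.strip_suffix(';').unwrap_or(ty);
    (ty, dims)
}

fn main() {
    // "L" is the assumed classinfo name for `class L`; "Lexer" is a made-up
    // class name that merely starts with the letter L.
    for name in ["L", "Lexer", "[Ljava/lang/String;"] {
        let (ty, dims) = strip_based_parse(name);
        println!("{name:?} -> ({ty:?}, {dims})");
    }
}
```

Under these assumptions the array descriptor comes out as expected, while the class `L` collapses to an empty string and `Lexer` loses its first letter.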

Contributor Author

With my new parsing code (ReferenceType) I'd expect it to correctly resolve to the Object variant of ReferenceType.

I did miss a couple of uses of FieldType, though, which should be replaced with ReferenceType.

+    }
+}),
 0xbe => Opcode::Arraylength,
 0xbf => Opcode::Athrow,
-0xc0 => Opcode::Checkcast(read_cp_classinfo(code, &mut ix, pool)?),
-0xc1 => Opcode::Instanceof(read_cp_classinfo(code, &mut ix, pool)?),
+0xc0 => Opcode::Checkcast({
+    let ty = read_cp_classinfo(code, &mut ix, pool)?;
+    match FieldType::parse(&ty) {
+        Ok(ty) => ty,
+        Err(_) => FieldType::Ty(Ty::Object(ty)),
+    }
+}),
+0xc1 => Opcode::Instanceof({
+    let ty = read_cp_classinfo(code, &mut ix, pool)?;
+    match FieldType::parse(&ty) {
+        Ok(ty) => ty,
+        Err(_) => FieldType::Ty(Ty::Object(ty)),
+    }
+}),
 0xc2 => Opcode::Monitorenter,
 0xc3 => Opcode::Monitorexit,
 0xc4 => {
@@ -663,7 +682,7 @@ fn read_opcodes<'a>(
     }
 }
 0xc5 => Opcode::Multianewarray(
-    read_cp_classinfo(code, &mut ix, pool)?,
+    FieldType::parse(&read_cp_classinfo(code, &mut ix, pool)?)?,
     read_u1(code, &mut ix)?,
 ),
 0xc6 => Opcode::Ifnull((read_u2(code, &mut ix)? as i16).into()),
Binary file added tests/parse/clazz/Test.clazz
Binary file not shown.