Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add descriptor parsing. #1

Closed
wants to merge 1 commit into from
Closed

Conversation

Dirbaio
Copy link
Contributor

@Dirbaio Dirbaio commented Oct 28, 2023

This PR adds methods to DeviceInfo to get either un-parsed descriptors (as raw bytes), or parsed descriptors (as a tree of Configuration, Interface, InterfaceAlternateSetting, Endpoint) structs.

Parsing is based off what libusb does, API is similar to rusb's.

Tested working on Linux and Windows. On windows it's only possible to get descriptors for winusb devices.

@kevinmehall
Copy link
Owner

Nice! I've been meaning to open some "next steps" issues here, with descriptors near the top of that list, but you beat me to it.

For Windows, I believe this ioctl on the parent hub device allows getting descriptors from any device, not just those with the WinUSB driver bound. We already have code to find the parent hub device and port number for a similar ioctl used to get several fields for DeviceInfo.

I also have the usb crate name (from an early library wrapping libusb), and my plan for it has been to make a common crate for USB descriptors and constants that could be shared across libraries implementing devices and hosts. So I might end up moving some of the descriptor structs there. Ideally that would also be able to construct descriptors in const to be able to put them in a static in ROM on an embedded device. I know you're quite involved in the Rust embedded ecosystem -- does that seem useful to you?

src/descriptor.rs Outdated Show resolved Hide resolved
src/descriptor.rs Outdated Show resolved Hide resolved
Comment on lines +163 to +176
/// Returns the device's unparsed descriptors.
///
/// To access parsed data, use [`configurations()`](Self::configurations).
pub fn descriptors(&self) -> Result<Vec<u8>, Error> {
crate::platform::get_descriptors(self)
}

/// Returns a collection of the device's configurations.
pub fn configurations(&self) -> Result<impl Iterator<Item = descriptor::Configuration>, Error> {
let descriptors = self.descriptors()?;
descriptor::parse_configurations(&descriptors)
.map_err(|_| Error::new(ErrorKind::Unsupported, "malformed descriptor"))
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

descriptors is probably too generic of a name. It seems like other libraries call this "get config descriptor" even though it contains other nested descriptors and not strictly just the configuration descriptor.

Eventually, I think we'll want a fn get_descriptor(descriptor_type: u8, descriptor_index: u8) -> Result<Vec<u8>> since that's ultimately what the request on the bus looks like. Then convenience methods like get_string_descriptor, get_config_descriptor that call into it, or use OS and type specific APIs to get cached data, and return more specific types. We do want to be clear what DeviceInfo methods potentially perform IO vs return cached data.

Also not sure whether it should return all configurations, or just specified or current configuration like libusb. Multi-configuration devices are rare in any case.

What if instead of separate methods for raw and parsed access, the Configuration wrapped Vec<u8> instead of doing all the parsing upfront? Then it could have an into_inner() and/or as_slice() for raw access, and methods to iterate and parse child descriptors on demand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

descriptors is probably too generic of a name. It seems like other libraries call this "get config descriptor" even though it contains other nested descriptors and not strictly just the configuration descriptor.

I mirrored the Linux API, which gives device descriptor + all config descriptors concatenated. It's 99% likely what someone enumerating devices to decide which one to open needs.

Eventually, I think we'll want a fn get_descriptor(descriptor_type: u8, descriptor_index: u8) -> Result<Vec> since that's ultimately what the request on the bus looks like. Then convenience methods like get_string_descriptor, get_config_descriptor that call into it, or use OS and type specific APIs to get cached data, and return more specific types. We do want to be clear what DeviceInfo methods potentially perform IO vs return cached data.

I feel the general get_descriptor would have to do bus IO? It'd be strange to have it cache the "main" descriptors (device, configuration) but do IO for the others. IIUC sysfs /descriptors is cached by the kernel on device plug, it doesn't do IO.

IMO if get_descriptor does IO it is another "feature", and perhaps it needs to go into Device instead of DeviceInfo. Perhaps we can remove DeviceInfo::descriptors() for now, and think about adding something later?

What if instead of separate methods for raw and parsed access, the Configuration wrapped Vec instead of doing all the parsing upfront? Then it could have an into_inner() and/or as_slice() for raw access, and methods to iterate and parse child descriptors on demand.

if parsing is on-demand, iterators have to be impl Iterator<Item = Result<...>> which I find much less ergonomic. It's nicer to parse everything upfront, so you can handle errors in 1 place only, and have all the iteration be infallible. It also allows grouping interface altesettings by number.

Also parsing on-demand is a bit harder: to go from one configuration to the next you have to iterate all descriptors one-by-one, vs the current code relies on the fact that parsing a child advances the reader, leaving it at the next child.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mirrored the Linux API, which gives device descriptor + all config descriptors concatenated. It's 99% likely what someone enumerating devices to decide which one to open needs.

The common case is going to be having a single relevant configuration, and if it has multiple, you're kind of stuck anyway since WinUSB can't change the configuration.

The Linux API is certainly the nicer one here, but one of the failings of libusb that I'm trying to avoid is designing everything around Linux and then emulating that less-elegantly on the other OSs. On the other platforms and on the USB bus itself, the request returns a single configuration descriptor and all its associated interfaces, endpoints, etc.

I feel the general get_descriptor would have to do bus IO?

As far as I can tell, getting any raw descriptor on Windows (with either WinUSB or the hub ioctl) may perform IO, without documentation making promises that particular descriptors are going to be cached. macOS is a similar story.. Could test this and see for particular descriptors in particular circumstances, but I'm not sure how much to rely on undocumented behavior. So it's Linux as the outlier that guarantees that it will cache the descriptor read at enumeration time.

Yeah, I was also questioning whether it should be on DeviceInfo or Device for that reason. In terms of use cases:

  • lsusb -v-like enumerating many devices and dumping as much info as possible -> would slightly prefer it on DeviceInfo since it's not really opening the device to use it, but could do either and isn't necessarily the case to optimize for.
  • finding a device from the list that matches something from a descriptor to decide what to open -> would prefer DeviceInfo, and would strongly prefer not to do IO. nusb already has the serial number string descriptor in DeviceInfo, which is a very common need that it's crazy that libusb requires opening the device and performing IO for. Maybe other attributes for this use case (interface class and subclass IDs?) could similarly be obtained cross-platform in DeviceInfo using APIs that guarantee no IO, short of the full descriptor.
  • with an opened device, figuring out what interfaces or endpoints to use, or reading class-specific descriptors -> Would prefer Device or motivate a way to go back from Device to DeviceInfo. Already has the device open, doesn't care too much if it requires IO, and wants as much flexibility as possible in what it can retrieve.

So that makes me lean towards putting this on Device, and drawing the line at "Device does IO, and DeviceInfo doesn't". I guess there could be a Linux-specific DeviceInfo method that reads the sysfs attribute, and I'd be fine with that one returning all the configurations.

if parsing is on-demand, iterators have to be impl Iterator<Item = Result<...>> which I find much less ergonomic. It's nicer to parse everything upfront, so you can handle errors in 1 place only, and have all the iteration be infallible. It also allows grouping interface altesettings by number.

This is a good point. However on the other hand, I've come across devices with broken descriptors, and it would be nice if the non-broken parts were accessible instead of just returning an error for the whole thing. Not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how broken? currently this code requires only device, config, iface, endpoint descriptors to be well-formed (others are only required to have a good bLength so we can skip over them). If those are broken, wouldn't the OS reject the device? In that case I'd say it's OK for nusb to fail as well. Otherwise, if the OS does accept the broken descriptors we could adjust our parser to match.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the scenarios where it would help would basically be limited to bLength being wrong for the descriptor type but correctly pointing to the next descriptor, or if the chain is broken, allowing access to descriptors prior to the problem.

Looks like libusb also bails out and returns nothing when handling most errors. So that's probably fine.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I just rearranged the Linux transfer code a little bit to allow submitting control transfers without claiming an interface, which the generic get_descriptor could be built on for descriptors that aren't cached.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Oct 29, 2023

For Windows, I believe this ioctl on the parent hub device allows getting descriptors from any device, not just those with the WinUSB driver bound. We already have code to find the parent hub device and port number for a similar ioctl used to get several fields for DeviceInfo.

yeah I saw libusb does that... seems cursed (why the parent hub??) but it's definitely the way to go if it works for non-winusb devices. Will give it a try.

I also have the usb crate name (from an early library wrapping libusb), and kevinmehall/rust-usb#19 that could be shared across libraries implementing devices and hosts. So I might end up moving some of the descriptor structs there. Ideally that would also be able to construct descriptors in const to be able to put them in a static in ROM on an embedded device. I know you're quite involved in the Rust embedded ecosystem -- does that seem useful to you?

We discussed this a bit for embassy-usb. const is not powerful enough (it can't do arrays whose length is the result of doing maths), so building descriptors at compile time would require macros. But there's still use cases for building them at runtime (like, enable/disable classes based on some stored configuration), and for that you'd need two APIs (the macro, and the runtime one) so it's much less nice. We decided to have only the runtime one for now.

Also another challenge is keeping the descriptors and the "behavior" in sync. It has a builder that creates both the endpoints/interfaces, and the descriptors for them, so as long as you use that the "behavior" of the device will always match what the descriptors say. If you pre-bake the descriptors in ROM then you have to manually take care to instantiate the right things at runtime to match them. See example.

So I'm not sure if there's much code to be shared. What you want is very different if you're building or parsing descriptors, and if you're on std you can use upfront parsing with nice easy Vec/HashMap, on no-std you don't want alloc so you have to do things more "on-demand".

@kevinmehall
Copy link
Owner

yeah I saw libusb does that... seems cursed (why the parent hub??) but it's definitely the way to go if it works for non-winusb devices. Will give it a try.

How USB devices fit into the Windows device model is certainly very cursed, but this makes a certain amount of sense given that -- USB device nodes all have different drivers, and there's not a common "USB Device" object with a single common driver to put that functionality in. So the hub is the closest thing to that.

@kevinmehall
Copy link
Owner

Good feedback on the usb crate idea. I've mostly worked with USB firmware on smaller MCUs where copying a descriptor out of ROM makes sense, but dynamically building one is nice if you can afford the code size and memory for it.

I guess the most that library could do at this point is the bytemuck / zerocopy-style mapping between structs and bytes, and leave the parsing and building of concatenated descriptors to a higher level. But that still leaves a lot for people to disagree on, like which of those libraries to use or whether to hand-write it, such that it seems like it would be hard to cover all the use cases to be truly generic.

Anyway, seems like my main motivation for doing it would just be putting the crate name to good use, which doesn't seem like a good enough reason. (I considered using the usb name for nusb, but decided if there were eventually C or Python bindings it would want a more brandable name than just usb).

@Dirbaio Dirbaio force-pushed the descriptors branch 2 times, most recently from c20c589 to f6951d6 Compare October 31, 2023 01:59
@martinling
Copy link
Contributor

Regarding the usb crate idea, you might also be interested to have a look at our usb module in Packetry, which includes bytemuck-compatible types for descriptors and an iterator implementation, as well as types for most of the other structures associated with the protocol.

It's probably not exactly what's needed for other projects in its current form, but it would be interesting to see if we could collaborate to make a usb crate that covered everyone's use cases.

@kevinmehall
Copy link
Owner

kevinmehall commented Dec 10, 2023

I've started iterating on this in #13

Main differences:

  • Made it a configurations() method on Device rather than DeviceInfo
    • Could also make it a platform-specific method on DeviceInfo for platforms (Linux) where it can be read without opening the device and with a guarantee of not performing IO.
  • On Linux, read the descriptors from the usbfs FD, rather than sysfs -- I want to be able to offer a Device::from_fd for Android or the new flatpak USB portals where you're handed a file descriptor without access to sysfs.
  • Haven't done Windows or macOS yet, but on Windows I'm going to try the hub ioctl discussed above. On Windows I'm using hub ioctl discussed above. It works for devices even not bound to winusb.
  • Instead of unsafe or bytemuck, etc, this just uses slices and from_le_bytes. Wrote a little macro to make that fairly clean.
  • Unify the raw descriptor and parsed descriptor access API. To avoid returning Result everywhere despite not parsing upfront, I'm simply skipping descriptors that are too short, and giving up when bLength is <2 or out of bounds.
  • It's all slices borrowed from the Device. Not sure if this is good vs something owned? Seems like the device is going to want to store this data anyway, and on macOS we can even borrow it from iokit without copying it.
  • Also planning to add a blocking function to make a get_descriptor request to the device, and a wrapper for converting string descriptors to String.

Any feedback and testing appreciated. While this only directly uses a little bit of your code, this PR and the resulting discussion was quite useful in designing this.

@kevinmehall
Copy link
Owner

Merged #13, but still open to feedback if you have any when trying it out.

TroyNeubauer added a commit to MerchGuardian/nusb that referenced this pull request Sep 12, 2024
tuna-f1sh pushed a commit to tuna-f1sh/nusb that referenced this pull request Sep 26, 2024
udev: add function `Udev::has_devtmpfs`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants