Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate the C definitions for static FFI exports to make the bindings easier to maintain #506

Closed
wants to merge 53 commits into from

Conversation

rdw-software
Copy link
Member

@rdw-software rdw-software commented Feb 11, 2024

Some of the definitions unfortunately can't be checked by the compiler. But overall, reduces the amount of boilerplate code:

  • The bindings module now contains both the exports and the cdefs for each embedded library
  • Types that are used by both the C++ layer and LuaJIT share a standalone header instead of duplicating the cdefs
  • They're embedded as strings (via non-portable inline assembly) and so passed to LuaJIT, which is somewhat hacky
  • Opaque pointer cdefs or copy/pasted library types that can't be redefined are also stored in a separate header
  • Ideally, these would be checked by the compiler and possibly redefined (aliased) so there are no conflicts (later)

More cleanup will be needed to remove the bindings member of each library, but this is far too large already.

It would be much easier to maintain the various bindings if the definition weren't duplicated, as keeping them in sync can be error-prone and they're overly verbose in at least some cases.

As a first step in that direction, introduce a distinction between the stored export table and the C definitions (that are currently still stored in both the C++ and Lua sources, hence the placeholder).
Unfortunately the types used by LuaJIT's FFI won't always be quite identical as the C++ layer can use the full range of types, whereas LuaJIT often has to make do with opaque pointers for library data structures or callback types.

In order to deduplicate the shared type definitions anyway, they'll have to be stored separately (and without any of the clutter that LuaJIT might be unable to process). This way, the compiler can check everything before LuaJIT even sees them. And most importantly they don't have to be kept in sync manually, which is tedious and error-prone.

Also, library types still need to be copy/pasted - which isn't ideal...
If everything went smoothly, this should be exactly the same as before - except now there's a single source of truth for the typedefs, and the compiler gets to check them first. Phew!
The benefit here isn't as great since the compiler can't check the type definitions that are used by LuaJIT only. Still, moving them to separate headers reduces the boilerplate and file size in the Lua sources and enables syntax highlighting in the IDE.

It also makes the dump output more useful for those libraries since it's not includingall the cdefs, but not sure how useful that really is.
The exports tables are static, but when appending the aliases, then indeed the pointer could become invalid - however unlikely that may be here.
@rdw-software rdw-software force-pushed the deduplicate-cdef-strings branch from 4ec8e37 to f04bae6 Compare February 11, 2024 14:23
@rdw-software rdw-software force-pushed the deduplicate-cdef-strings branch 5 times, most recently from b515855 to e7bb888 Compare February 11, 2024 16:33
Symbol names have different conventions on macOS (Mach-O).
Not quite sure why ld can't find the symbol, but it's actually not needed since the BLOB is null-terminated and will be copied into the Lua environment anyway.
Clearly, I misunderstood how the macOS format works (and testing via GitHub actions isn't practical). One underscore is enough, hence the linker errors that should now be resolved.
@rdw-software rdw-software force-pushed the deduplicate-cdef-strings branch from 40cb1f5 to a46304f Compare February 12, 2024 03:07
@rdw-software
Copy link
Member Author

rdw-software commented Feb 12, 2024

Just a quick note for future me as to why embedding the cdefs turned out to be so overly complicated:

  • The easiest hypothetical way to embed binary data is not yet implemented: https://thephd.dev/finally-embed-in-c23
  • As a workaround, I tried multiple things - all of them having serious drawbacks that I'm very unhappy about
  • First, I tried inline assembly... but it's not portable and it doesn't seem to be supported on MSYS (.incbin didn't work)
  • Maybe more research and assembly hacking could work around this issue, but it makes the code too fragile anyway
  • Next, I tried a Lua script that generates the cdefs at build time and embeds them just like all the other bytecode objects
  • That does work and is reasonably fast, but it requires maintaining the script and so there's still no single source of truth
  • Finally, I tried the "classical" approach of using xxd to generate binary dump headers and include them in the code
  • This relies on yet another external tool that should however be present, but it's slow and complicates the build process

In short, all solutions are terrible and I'm not sure which one is the least of a headache. Give me std::embed already, will you?

@rdw-software
Copy link
Member Author

rdw-software commented Feb 12, 2024

I really am not a fan of this given the increase in complexity. The drawbacks simply don't seem worth it (without std::embed):

  • Introduces another build dependency, on xxd (and vim as a package)
  • Increases build times and complicates the configuration - need to explicitly define autogenerated header dependencies
  • The glue code for all bindings needs to process the autogenerated header and convert it to the cdef string as well

Since one goal is to make sure the FFI and C++ definitions are always in sync, maybe there's a simpler way to get that benefit:

  • Move the definitions used by both LuaJIT and C++ to a separate header, as proposed in this PR
  • Normally include them in the bindings but continue to copy/paste it for LuaJIT until proper embeds are available in gcc/clang
  • Add unit tests that read the separated header files and assert they match the cdefs, which is easy to do after the build
  • Can also add assertions to detect struct size mismatches and uninitialized export table fields (NULL pointer values)
  • This unfortunately doesn't ensure the opaque pointer aliases and other copy/pasted types from the library headers match
  • However, the previous approach also doesn't take care of that problem and I don't think there's an easy way to do it

This would still necessitate code duplication, but if the test suite catches any discrepancy then it might not be a huge problem.

@rdw-software
Copy link
Member Author

rdw-software commented Feb 12, 2024

Closing for now due to the above concerns. It's too large a PR to merge as-is anyway; will split off some parts when applicable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant