[KAT-1090] Simplify how we track build dependencies #112

Open
arthurp opened this issue Mar 10, 2021 · 4 comments

arthurp commented Mar 10, 2021

Currently build dependencies are tracked in at least the following places:

  • conda_recipe/meta.yaml
  • conda_recipe/environment.yml
  • scripts/setup_ubuntu.sh
  • scripts/setup_osx.sh
  • config/conanfile.py
  • Various CMakeLists.txt
  • README.md

This is a mess and causes issues: different versions get specified in different dependency lists, so people get different build results (this happened with different versions of the C++ date parser library).

It would be great to have a single source of truth that could be maintained, and then the information could be distributed to all the appropriate parts of the system, either by accessing the single source directly or by regenerating the relevant file from the single source as needed.

This would also allow easier use of pipenv-like tools for those who like them, since there would be a single list of all the dependencies that could be fed into the tool.
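
To make the "regenerate the relevant file from the single source" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the in-memory `deps` mapping, the header text, and the function names are hypothetical, and the output is a pip-style requirements list rather than any file that exists in the repository today.

```python
# Hypothetical header marking a file as generated output.
GENERATED_HEADER = "# Generated from the single dependency list; do not edit by hand.\n"


def render_requirements(deps):
    """Render a pip-style requirements list.

    `deps` maps a package name to a version specifier, e.g. {"pip": ">20"}.
    Sorting keeps the output stable so regenerated files diff cleanly.
    """
    lines = [f"{name}{spec}" for name, spec in sorted(deps.items())]
    return GENERATED_HEADER + "\n".join(lines) + "\n"


def is_up_to_date(deps, existing_text):
    """Return True if a previously generated file still matches the source."""
    return existing_text == render_requirements(deps)
```

A check like `is_up_to_date` could run in CI so that a hand-edited or stale generated file is caught instead of silently drifting from the single source.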

arthurp commented Mar 10, 2021

Thoughts, because I must always try to solve issues immediately. The dependency format should have the following capabilities:

  • specify alternative names for a package in different systems (e.g., APT vs Conda).
  • specify versions in a general format that can be converted to any system (e.g., >1.0)
  • specify versions in system specific forms as an escape hatch
  • include comments

The format should be as simple as possible so that it can be parsed from shell scripts if needed, and easily parsed from any language without any other dependencies. This rules out general-purpose formats like JSON (complex and not supported by the C++ standard library) or YAML (complex and not supported by the Python standard library). This means that some custom line-oriented format is probably simplest. Maybe like this:

package [system], package [system], generic_package: version [system], generic_version

# pip is called "pip" in all the packaging systems we care about
pip: >20
arrow-cpp [conda], arrow [conan], apache-arrow [brew]: >=2.0<3.0
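
As a rough check that a format like this really is easy to parse, here is a minimal Python sketch. It assumes the grammar implied by the example above: names and versions are separated by the first `:`, entries are separated by commas, an optional `[system]` tag follows each entry, and `#` starts a comment line. The function names are hypothetical, and a version that itself contains a comma would need a different separator.

```python
import re

# One "value [system]" entry; the bracketed system tag is optional.
_ENTRY = re.compile(r"^(?P<value>[^\[\]]+?)\s*(?:\[(?P<system>[^\]]+)\])?$")


def _parse_entries(text):
    """Map system name (None for the generic entry) to its value."""
    entries = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        match = _ENTRY.match(part)
        if match is None:
            raise ValueError(f"cannot parse entry: {part!r}")
        entries[match.group("system")] = match.group("value").strip()
    return entries


def parse_line(line):
    """Parse one line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    names_text, sep, versions_text = line.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {line!r}")
    return {
        "names": _parse_entries(names_text),
        "versions": _parse_entries(versions_text),
    }


def resolve(dep, system):
    """Return (name, version) for a system, falling back to generic entries."""
    names, versions = dep["names"], dep["versions"]
    return (
        names.get(system, names.get(None)),
        versions.get(system, versions.get(None)),
    )
```

With the arrow line above, `resolve(dep, "conda")` picks the Conda-specific name and the generic version, while a system with no listed name gets `None` for the name, which a caller could treat as "not available here".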

aneeshdurg commented

Just for context, why do we support multiple dependency systems? Can we avoid this problem by restricting how we gather deps?

arthurp commented Mar 10, 2021

We are already building packages for Ubuntu (debs) and Conda. Most people use Conan for build dependencies, but I use Conda (because it's required for the Conda packaging, so I want to test it all the time). The result is that we need dependency lists in the correct formats and names for at least the following, as I see it: Conda, Conan, Debian/Ubuntu, RPM/CentOS, and pip (for Python stuff in non-Conda environments). In fact, most environments will MIX these kinds of dependencies. It's going to be a bit confusing, but you get the idea, and I don't think it's at all impossible to get right.

@arthurp arthurp added the good first issue Good for newcomers label Mar 10, 2021
@arthurp arthurp changed the title Simplify how we track build dependencies [KAT-1090] Simplify how we track build dependencies Aug 16, 2021
@arthurp arthurp self-assigned this Sep 9, 2021

arthurp commented Sep 9, 2021

I have decided to use JSON as the data format for the dependency information. JSON is supported by:

  • CMake (via string(JSON ...))
  • Python standard library
  • A C++ library we already use
  • Bash via jq (which we haven't used but could)
  • and it's probably not hard to find a library for any environment we are working in.

The main reason not to go with something "simpler" and custom is that it would force us to write a parser in each language, including one of the worst: CMake.

Writing JSON manually isn't great for humans, but if we format it in a specific way, it shouldn't be too bad.
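
For illustration only, here is a minimal Python sketch of what reading such a file with the standard library could look like. The schema is an assumption, not something decided in this issue: the `names`, `version`, and `version_overrides` keys, the sample entries, and the function names are all hypothetical.

```python
import json

# Hypothetical schema: neither the keys nor the layout come from this issue.
DEPENDENCIES_JSON = """
{
  "arrow": {
    "names": {"conda": "arrow-cpp", "conan": "arrow", "brew": "apache-arrow"},
    "version": ">=2.0<3.0"
  },
  "pip": {
    "version": ">20"
  }
}
"""


def load_dependencies(text):
    """Parse the dependency list using only the standard library."""
    return json.loads(text)


def packages_for(deps, system):
    """Return sorted (name, version) pairs for one packaging system.

    Falls back to the generic package name and the generic version when
    no system-specific entry is present.
    """
    result = []
    for generic_name, info in deps.items():
        name = info.get("names", {}).get(system, generic_name)
        version = info.get("version_overrides", {}).get(system, info.get("version"))
        result.append((name, version))
    return sorted(result)


def canonical_text(deps):
    """Serialize with fixed indentation and key order to keep diffs small."""
    return json.dumps(deps, indent=2, sort_keys=True) + "\n"
```

Writing the file back through something like `canonical_text` is one way to get the "format it in a specific way" property: every edit ends up in the same layout, so hand-written changes stay reviewable.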
