
pip_parse: requirements_by_platform: allow custom defined platforms #2548

Open · hariom-qure opened this issue Jan 7, 2025 · 7 comments

Labels: help wanted, P4, type: pip (pip/pypi integration)
hariom-qure commented Jan 7, 2025

Allow custom platforms to be added to pip_parse

We are trying to install PyTorch, which has a slightly different installation process.

The pip command for installing pytorch+linux+cpu:

pip3 install torch --index-url https://download.pytorch.org/whl/cpu

While for a specific GPU version:

pip3 install torch --index-url https://download.pytorch.org/whl/cu121

So we have different requirements files for the linux CPU and linux GPU combinations. In the code, I can see that rules_python picks the requirements file based on the current host platform that it's running on (code is here).
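The host-platform keying involved can be sketched roughly as follows (illustrative only: the mapping tables and the function name are my own, not rules_python's actual implementation):

```python
def platform_key(system, machine):
    """Illustrative sketch: derive a rules_python-style platform key
    such as "linux_x86_64" from values like `platform.system()` and
    `platform.machine()`. The real logic lives inside rules_python's
    repository rules and differs in detail."""
    os_name = {"Linux": "linux", "Darwin": "osx", "Windows": "windows"}[system]
    # Normalize a few common architecture aliases.
    arch = {"arm64": "aarch64", "AMD64": "x86_64"}.get(machine, machine)
    return "{}_{}".format(os_name, arch)
```

The resulting key (e.g. `linux_x86_64`) is what `requirements_by_platform` values are matched against.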

How do we pass custom platforms? Something like this

constraint_setting(name = "gpu")

constraint_value(
    name = "gpu_cu_12_1",
    constraint_setting = ":gpu",
)

constraint_value(
    name = "gpu_none",
    constraint_setting = ":gpu",
)

platform(
    name = "lol_x86_64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
        ":gpu_none",
    ],
)

platform(
    name = "linux_x86_64_cu_12_1",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
        ":gpu_cu_12_1",
    ],
)

Versions

bazel: 6.2.0
rules_python: 0.40.0

Thanks!


aignas commented Jan 8, 2025

Right now there is no way to do that, just as it is not possible to have such switches in regular Python tooling.

The example in #2449 shows how people are doing it right now.


hariom-qure commented Jan 8, 2025

The example shown does not use multiple requirements files (I want to support multiple platforms, where torch installs differently on each). requirements_by_platform handles most of the cases for me; only the extra CUDA platform is a problem.

The main strategy is to rely on requirements_by_platform to handle most of the packages, and to handle specific packages by manually creating aliases that select on platform.

First, pip_parse all of my requirements files in WORKSPACE:

# file: WORKSPACE
# the base python deps management, which handles the default cases
pip_parse(
    name = "python_deps",
    python_interpreter_target = interpreter,
    requirements_by_platform = {
      "//third_party/python/locks:requirements_lock_linux_x86_64.txt": "linux_x86_64",
    },
    requirements_lock = "//third_party/python:requirements_lock.txt",
)

# individual requirements are also parsed with their own names for `select` for specific packages
# which are not specifically handled by `requirements_by_platform`
pip_parse(
    name = "python_deps_linux_x86_64_cu_12_1",
    python_interpreter_target = interpreter,
    requirements_lock = "//third_party/python/locks:requirements_lock_linux_x86_64_cu_12_1.txt",
)

pip_parse(
    name = "python_deps_linux_x86_64",
    python_interpreter_target = interpreter,
    requirements_lock = "//third_party/python/locks:requirements_lock_linux_x86_64.txt",
)

Now I can create my own requirement macro and use it everywhere to take care of platform selection:

# file: third_party/python/defs.bzl
def requirement(pkg_name):
    # Draft/pseudocode: ":cuda_enabled" stands in for a real config_setting.
    # Normalizing special characters to underscores and lowercasing is not
    # handled here either.
    return select({
        ":cuda_enabled": "@python_deps_linux_cu_12_1_{}//:pkg".format(pkg_name),
        "//conditions:default": "@python_deps_{}//:pkg".format(pkg_name),
    })
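The normalization mentioned in the comment can be sketched separately. rules_python applies a similar lowercase-and-underscore normalization when naming the per-package repos, but these helpers are illustrative, not the actual implementation:

```python
import re

def normalize_pkg_name(name):
    # PEP 503-style normalization, with underscores instead of dashes so
    # the result can be embedded in a Bazel repository name.
    return re.sub(r"[-_.]+", "_", name).lower()

def repo_label(pkg_name, prefix="python_deps"):
    # Hypothetical helper: build the spoke-repo label used by the
    # requirement() macro above ("python_deps" is this thread's hub name).
    return "@{}_{}//:pkg".format(prefix, normalize_pkg_name(pkg_name))
```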

or I could create aliases specifically for each problematic package:

# file: third_party/python/torch/BUILD
alias(
    name = "torch",
    actual = select({
        # ":cuda_enabled" is a placeholder config_setting
        ":cuda_enabled": "@python_deps_linux_cu_12_1_torch//:pkg",
        "//conditions:default": "@python_deps_torch//:pkg",
    }),
)

Can you give me your opinion on these options? I don't know which would be better in the long run, or whether there is another, better option.

Thanks


aignas commented Jan 14, 2025

Making this work in both WORKSPACE and bzlmod may be a lot of work, so it is likely that this would be implemented only for bzlmod unless there is help from outside.

What is more, we are moving towards using the filename of the distribution to generate the config_setting values for matching the target platform, and I am not sure how that would work together with this request. Right now Python standards do not define how to treat CUDA dependencies, so we cannot provide something that just works out of the box for everyone.

I think the best approach for now is for the user to define a hub repo that accesses one branch or the other depending on the active configuration.

@aignas aignas added P4 help wanted type: pip pip/pypi integration labels Jan 14, 2025

hariom-qure commented Jan 14, 2025

OK, for now I've taken the following approach, and it's working well enough for my use case. From what I understand this is what you are describing; correct me if I'm wrong. I hope it maps easily to bzlmod. Putting it out here in case someone else needs it:

  • maintain separate lock files for each platform (including defining CUDA separately)
  • in the main pip_parse target, where we provide requirements_by_platform, rely on rules_python to pick the correct lock file depending on the platform. By default, we simply use a target like @python_deps_pydantic//:pkg
  • also pip_parse every platform lock file, creating separate targets for each of them; as an example, torch for linux + cuda 12.1 is @python_deps_linux_x86_64_cu_12_1_torch//:pkg

# WORKSPACE
load("@rules_python//python:pip.bzl", "pip_parse")

###################### Main pip parse ####################################################
# this pip-parse is the generic pip-parse whose targets are used as fallbacks
# it relies on the rules to select correct pip package depending on the host platform
##########################################################################################
pip_parse(
    name = "python_deps",
    python_interpreter_target = interpreter,
    requirements_by_platform = {
      "//third_party/python/reqs/locks:requirements_lock_linux_x86_64.txt": "linux_x86_64",
      "//third_party/python/reqs/locks:requirements_lock_macos_aarch64.txt": "osx_aarch64",
    },
    requirements_lock = "//third_party/python/reqs/locks:requirements_lock_linux_x86_64_cu_12_1.txt",
)

# Load the starlark macro which will define your dependencies.
load("@python_deps//:requirements.bzl", install_py_deps = "install_deps")

# Call it to define repos for your requirements.
install_py_deps()

######################### Linux Cuda pip #########################################################################
# Cuda dependencies require extra work from our side (which is not covered by stock pip)
# we have rules for specific pip packages which require extra work (example torch)
# these packages would use the platform specific pip targets defined here to install platform specific dependencies
###################################################################################################################
pip_parse(
    name = "python_deps_linux_x86_64_cu_12_1",
    python_interpreter_target = interpreter,
    requirements_lock = "//third_party/python/reqs/locks:requirements_lock_linux_x86_64_cu_12_1.txt",
)

# Load the starlark macro which will define your dependencies.
load("@python_deps_linux_x86_64_cu_12_1//:requirements.bzl", install_py_linux_86_64_cu_12_1_deps = "install_deps")

# Call it to define repos for your requirements.
install_py_linux_86_64_cu_12_1_deps()

  • Define the CUDA constraints:
# platforms/gpu/BUILD
constraint_setting(
    name = "gpu",
    default_constraint_value = ":none",
    visibility = ["//visibility:public"],
)

constraint_value(
    name = "cu_12_1",
    constraint_setting = ":gpu",
    visibility = ["//visibility:public"],
)

constraint_value(
    name = "none",
    constraint_setting = ":gpu",
    visibility = ["//visibility:public"],
)

  • Define our custom platforms:
# platforms/BUILD

platform(
    name = "linux_x86_64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
        "//platforms/gpu:none",
    ],
    visibility=["//visibility:public"],
)

platform(
    name = "linux_x86_64_cu_12_1",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
        "//platforms/gpu:cu_12_1",
    ],
    visibility=["//visibility:public"],
)

platform(
    name = "macos_arm_aarch64",
    constraint_values = [
        "@platforms//os:macos",
        "@platforms//cpu:aarch64",
        "//platforms/gpu:none",
    ],
    visibility = ["//visibility:public"],
)

  • Create a separate bazel target for torch and every other annoying package:
# third_party/python/lib/torch/BUILD
load("@python_deps//:requirements.bzl", "requirement")

alias(
    name = "torch_linux",
    actual = select({
        "//platforms/gpu:cu_12_1": "@python_deps_linux_x86_64_cu_12_1_torch//:pkg",
        "//conditions:default": requirement("torch"),
    }),
)

alias(
    name = "torch",
    actual = select({
        "@platforms//os:linux": ":torch_linux",
        "//conditions:default": requirement("torch"),
    }),
    visibility=["//visibility:public"],
)

The team needs to know to import torch as //third_party/python/libs/torch instead of using requirement() on it directly. I have also defined my own requirement macro, which simply returns the default pip-parsed target (the default behavior), so that I can add better error messages in case someone trips up and uses requirement() for the wrong libraries. The team uses the macro provided in our repo.

rickeylev (Collaborator) commented:

> using filename to determine config_settings; unclear how that would work with this request

I think it would translate to: special handling of pytorch to somehow map constraints to the backing repos during BUILD-file generation. IMHO, this would be a generally useful feature. We sort of have this already with pip.override for patching e.g. the library's py files, but we are still lacking a decent way to customize how the BUILD file generation is handled. Something a bit unique to this case is that it's not just patching a BUILD file: there is also the separate repo that, ideally, the pypi integration would create and manage.

Another take: how do we represent arbitrary constraints with the pypi integration? Hm, maybe stated another way: generalizing requirements_by_platform to support arbitrary constraints? (this idea sounds familiar)

Ultimately, what we want is:

  • Multiple repos for pytorch (cpu, cuda X.Y, ...)
  • The pip hub BUILD file to have a select() that routes to the appropriate pytorch repo. e.g. alias(actual=select({":cuda_12": "@pytorch_cuda_12", "default": "@pytorch_cpu"}), ...)

A requirements file supports basic stuff, e.g. os and cpu arch, via the environment markers. IIUC, it doesn't support anything custom, though (e.g. a cuda version). I suppose one solution is for our requirements parser to support something custom, e.g. env markers like cuda_version=..., which it can then translate into the select() constraints. This is prone to not playing well with existing tooling, though.

Lacking this info in requirements.txt, it has to live somewhere else. That translates to something on the pip module extension, e.g. an arg to pip.parse, an arg to pip.override, a separate tag class, or an auxiliary file with the info. Whatever this thing is, it has to carry the 2-tuple of (list of constraints, target-to-use), e.g. (//whatever:cuda_12_1, @pytorch_linux_x86_cuda_12_1//:pkg).
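As a toy illustration of carrying that 2-tuple (entirely hypothetical: neither a cuda_version marker nor these repo names exist in rules_python), the translation step might look like:

```python
# Hypothetical sketch: map a custom "cuda_version" value attached to a
# requirements entry to the (config_setting, backing repo) pair that the
# hub's select() would need. None of these names exist in rules_python.
CUDA_CONFIG_SETTINGS = {
    "12.1": "//platforms/gpu:cu_12_1",
    None: "//conditions:default",
}

def select_entry(pkg, cuda_version):
    setting = CUDA_CONFIG_SETTINGS[cuda_version]
    if cuda_version is None:
        repo = "@{}_cpu//:pkg".format(pkg)
    else:
        repo = "@{}_cuda_{}//:pkg".format(pkg, cuda_version.replace(".", "_"))
    return (setting, repo)
```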


aignas commented Jan 15, 2025

We do have a config_setting parameter in the whl_config_setting function that we could repurpose for this. pip.override could accept the config_setting for the wheels, and we could then create new config settings: our_config_setting AND given_config_setting for the overridden wheels, and our_config_setting AND NOT given_config_setting for the wheels that are not overridden. That could work from the alias-managing point of view.

I guess in this particular case the main issue is that we need to pull 2 different local versions of pytorch and map that to the same thing.

Maybe using something like:

pip.parse(
    hub_name = "pip",
    config_setting = "my_config_setting_1",
    requirements_by_platform = ...,
)

pip.parse(
    hub_name = "pip",
    requirements_by_platform = ...,
)

This could almost work if we:

  1. Parse the requirements files.
  2. Look for differences between the entries and wire the config_setting to whl_config_setting when creating the hub repo.
  3. Add a suffix for the special repos from above.

If we decided to make this the main way that requirements_by_platform works under the hood, this could be viable. What we would end up with for pytorch would be something like:

alias(
    name = "pytorch",
    actual = select({
        "my_config_setting_1": select({
            "is_python_3.13": "@pytorch_1",  # the same select would be needed even if we have the filename based config settings.
        }),
        "default": select({
            "is_python_3.13": "@pytorch_2",
        }),
    }),
)

I wrote a nested select to illustrate the point that we need to do additional selects afterwards.
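Since Bazel's select() cannot actually nest, such a sketch would in practice be flattened into AND-ed conditions (one config_setting_group per branch). A toy flattening, using the placeholder names from the snippet:

```python
def flatten_selects(nested):
    # Flatten {outer_setting: {inner_setting: target}} into
    # {(outer_setting, inner_setting): target}, i.e. one AND-ed
    # config_setting_group per final select() branch.
    flat = {}
    for outer, inner_map in nested.items():
        for inner, target in inner_map.items():
            flat[(outer, inner)] = target
    return flat

# Placeholder names from the alias sketch above.
nested = {
    "my_config_setting_1": {"is_python_3.13": "@pytorch_1"},
    "default": {"is_python_3.13": "@pytorch_2"},
}
```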

If I am correct, this could even make it possible to further clean up the config settings beyond what I already cleaned up in #2556. Right now we have target_platforms, which handles cases where different target platforms get different versions of a package.

The way to test my hypothesis would be to get rid of the target_platforms parameter to whl_config_settings and make the hub repo handle the target platforms in a different way.

All of this is only now much easier because bazelbuild/bazel#19301 got resolved. When I was writing the initial requirements_by_platform we did not have such things.


aignas commented Jan 16, 2025

I thought about it a little more and realized that what we need for the legacy WORKSPACE case is:

alias(
    name = "pkg",
    actual = "@pip_repo//:pkg",
)

For bzlmod, where we have different repos per (python_version, os, arch) triplet, we need to have:

alias(
    name = "pkg",
    actual = select({
        "//_config:is_cp39_linux_aarch64": "@pip_cp39_linux_aarch64_repo//:pkg",
        # ... 
    })
)

For bzlmod where we also derive config settings from filenames, we should do a cascading alias:

alias(
    name = "_pkg_1",
    actual = select({
        "//_config:<filename_based_config_setting>": "@filename_based_repo//:pkg",
    }),
)

alias(
    name = "_pkg_2",
    actual = select({
        "//_config:<filename_based_config_setting>": "@filename_based_repo//:pkg",
    }),
)

config_setting_group(  # e.g. bazel-skylib's selects.config_setting_group
    name = "_config_setting_group_1",
    match_all = [
        "@some_pkg//:custom_user_setting",
        "//_config:is_cp39_linux_aarch64",
    ],
)

alias(
    name = "pkg",
    actual = select({
        ":_config_setting_group_1": "_pkg_1",
        "//conditions:default": "_pkg_2", # or fail if there is no such _pkg_2 configured.
    })
)

That way we:

  • Correctly select the right wheel based on what is available. The specialization of the config settings for filename derived settings is working as intended. The users will get a reasonable error message if there is no wheel available for their target platform.
  • Correctly fail in case we do not have the right configuration based on what the user is supplying us (the some_pkg:custom_user_setting).
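The cascading resolution can be modeled in a few lines of plain Python to check the intended precedence (a toy model, not rules_python code; the setting and repo names are the placeholders from the BUILD sketch above):

```python
def resolve(active_settings, branches, default=None):
    # Toy model of the cascading alias: each entry in `branches` pairs a
    # set of required settings (a config_setting_group with match_all)
    # with the repo that branch points at. The first branch whose
    # settings are all active wins; otherwise fall back to the default,
    # or fail the way an exhausted select() would.
    for required, repo in branches:
        if required <= set(active_settings):
            return repo
    if default is None:
        raise LookupError("no matching config for %r" % sorted(active_settings))
    return default

# Placeholder names from the sketch above.
BRANCHES = [
    (frozenset({"custom_user_setting", "is_cp39_linux_aarch64"}),
     "@filename_based_repo//:pkg"),
]
```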

I think I first want to refactor our internal code to follow this solution; after that, implementing this feature should definitely be doable. No estimate of when this would be ready, but I thought I'd communicate the direction I intend to take.

If anybody wants to pick it up, feel free. :)
