Adds shared memory into extended resources #6193

@thomasjpfan thomasjpfan commented Jan 25, 2025

Tracking issue

Towards #6142

Why are the changes needed?

This PR adds shared memory as an extended resource, made available through @task(shared_memory=...). In the simple case, @task(shared_memory=True) means that memory-backed volumes are sized to the node's allocatable memory. Otherwise, you can set a value such as shared_memory="2Gi" to specify the size explicitly.
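To make the two forms of the argument concrete, here is a minimal sketch of how a `shared_memory` value could be normalized into settings carrying the three fields the IDL message describes (mount path, mount name, size limit). The function name, the dict layout, and both default values are assumptions for illustration; this is not the flytekit implementation.

```python
from typing import Optional, Union

# Hypothetical defaults: the PR description does not state the real values.
DEFAULT_MOUNT_PATH = "/dev/shm"
DEFAULT_MOUNT_NAME = "flyte-shared-memory"


def normalize_shared_memory(value: Union[bool, str, None]) -> Optional[dict]:
    """Turn a `shared_memory` task argument into a plain settings dict.

    Returns None when shared memory is not requested.
    """
    if value is None or value is False:
        return None

    settings = {"mount_path": DEFAULT_MOUNT_PATH, "mount_name": DEFAULT_MOUNT_NAME}
    if value is True:
        # No explicit size: size_limit stays unset, which the PR describes as
        # "sized to node allocatable memory".
        return settings
    if isinstance(value, str) and value:
        # An explicit size such as "2Gi" is carried through unchanged.
        settings["size_limit"] = value
        return settings

    raise ValueError(f"unsupported shared_memory value: {value!r}")
```

With these assumptions, `normalize_shared_memory(True)` yields settings without a `size_limit` key, while `normalize_shared_memory("2Gi")` carries the string through as `size_limit`.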

What changes were proposed in this pull request?

This PR adds shared_memory to the IDL, along with the implementation needed to support it.

How was this patch tested?

Unit tests were added in this PR, and the change was tested end to end with the corresponding flytekit changes:

import os
from flytekit import task, ImageSpec

image = ImageSpec(
    name="flytekit",
    apt_packages=["git"],
    registry="localhost:30000",
    commands=[
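        # Each command is a separate list item; without the comma Python would
        # concatenate the two string literals into one invalid command.
        "uv pip install git+https://github.com/thomasjpfan/flyte.git@65dda339b0088d9e568877577fa78fc88b223582#subdirectory=flytekit",
        "uv pip install git+https://github.com/thomasjpfan/flyte.git@d2c76ff330077875f7826c278f660add7f2c50a9#subdirectory=flyteidl",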
    ],
)


@task(container_image=image, shared_memory=True)
def check_shm2() -> bool:
    return os.path.exists("/dev/shm")

Then I used kubectl to make sure that the pod spec was correct.
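The PR does not show what "correct" looked like in the pod spec. As a rough sketch, assuming the shared memory volume is rendered as a standard Kubernetes emptyDir volume with medium "Memory", the check could be automated against the JSON that kubectl prints. The helper name below is hypothetical.

```python
def find_memory_backed_mounts(pod: dict) -> dict:
    """Map container name -> mount paths backed by a Memory-medium emptyDir.

    `pod` is a pod object as a dict, e.g. parsed from `kubectl get pod ... -o json`.
    """
    spec = pod.get("spec", {})

    # Collect the names of all volumes that are memory-backed emptyDirs.
    memory_volumes = {
        volume["name"]
        for volume in spec.get("volumes", [])
        if (volume.get("emptyDir") or {}).get("medium") == "Memory"
    }

    mounts = {}
    for container in spec.get("containers", []):
        paths = [
            mount["mountPath"]
            for mount in container.get("volumeMounts", [])
            if mount["name"] in memory_volumes
        ]
        if paths:
            mounts[container["name"]] = paths
    return mounts
```

One way to use it is to save the output of `kubectl get pod <pod-name> -o json` to a file, load it with `json.load`, and confirm that the task container maps to the expected mount path.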

Summary by Bito

This PR implements shared memory support in Flyte's extended resources by introducing a new SharedMemory message type in the IDL with configurable mount options. The implementation includes protobuf definitions and pod helper functionality for Kubernetes pods, utilizing getter methods for shared volume mount properties. The changes enable tasks to request and configure memory-backed volumes with either default node allocatable memory or custom size specifications across multiple language bindings including Go, JavaScript/TypeScript, and Python.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 5

thomasjpfan added the "added" label (Merged changes that add new functionality) on Jan 25, 2025

flyte-bot commented Jan 25, 2025

Code Review Agent Run #6a4c77

Actionable Suggestions - 6
  • flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper_test.go - 2
  • flyteidl/protos/flyteidl/core/tasks.proto - 1
    • Consider using resource.Quantity for size_limit · Line 73-73
  • flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper.go - 3
    • Consider validating SharedMemory fields · Line 494-495
    • Consider validating primaryContainerName before use · Line 494-497
    • Consider explicit error handling for ApplySharedMemory · Line 495-495
Additional Suggestions - 3
  • flyteidl/gen/pb_python/flyteidl/core/tasks_pb2.pyi - 1
    • Consider adding type hints for field · Line 70-71
  • flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper.go - 2
    • Consider using camelCase for variable names · Line 166-166
    • Consider renaming parameter to follow conventions · Line 140-140
Review Details
  • Files reviewed - 10 · Commit Range: d2c76ff..d2c76ff
    • flyteidl/gen/pb-es/flyteidl/core/tasks_pb.ts
    • flyteidl/gen/pb-go/flyteidl/core/tasks.pb.go
    • flyteidl/gen/pb-js/flyteidl.d.ts
    • flyteidl/gen/pb-js/flyteidl.js
    • flyteidl/gen/pb_python/flyteidl/core/tasks_pb2.py
    • flyteidl/gen/pb_python/flyteidl/core/tasks_pb2.pyi
    • flyteidl/gen/pb_rust/flyteidl.core.rs
    • flyteidl/protos/flyteidl/core/tasks.proto
    • flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper.go
    • flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper_test.go
  • Files skipped - 4
    • flyteidl/clients/go/assets/admin.swagger.json - Reason: Filter setting
    • flyteidl/gen/pb-go/gateway/flyteidl/service/admin.swagger.json - Reason: Filter setting
    • flyteidl/gen/pb-go/gateway/flyteidl/service/agent.swagger.json - Reason: Filter setting
    • flyteidl/gen/pb-go/gateway/flyteidl/service/external_plugin_service.swagger.json - Reason: Filter setting
  • Tools
    • Golangci-lint (Linter) - ✖︎ Failed
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful


codecov bot commented Jan 25, 2025

Codecov Report

Attention: Patch coverage is 32.07547% with 108 lines in your changes missing coverage. Please review.

Project coverage is 36.86%. Comparing base (5197f60) to head (e70f393).
Report is 1 commits behind head on master.

Files with missing lines | Patch % | Lines
flyteidl/gen/pb-go/flyteidl/core/tasks.pb.go | 2.83% | 103 Missing ⚠️
...ns/go/tasks/pluginmachinery/flytek8s/pod_helper.go | 90.56% | 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6193      +/-   ##
==========================================
+ Coverage   36.85%   36.86%   +0.01%     
==========================================
  Files        1318     1318              
  Lines      134543   134647     +104     
==========================================
+ Hits        49590    49644      +54     
- Misses      80632    80681      +49     
- Partials     4321     4322       +1     
Flag | Coverage | Δ
unittests-datacatalog | 51.58% <ø> | (ø)
unittests-flyteadmin | 51.94% <ø> | (+0.02%) ⬆️
unittests-flytecopilot | 30.99% <ø> | (ø)
unittests-flytectl | 62.29% <ø> | (ø)
unittests-flyteidl | 7.22% <2.83%> | (-0.01%) ⬇️
unittests-flyteplugins | 54.02% <90.56%> | (+0.10%) ⬆️
unittests-flytepropeller | 42.78% <ø> | (ø)
unittests-flytestdlib | 55.33% <ø> | (ø)


Signed-off-by: Thomas J. Fan <[email protected]>

flyte-bot commented Jan 25, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change: New Feature - Shared Memory Support in Extended Resources

Files Impacted:

tasks.proto - Defined SharedMemory message type with mount path, name and size limit fields

tasks.pb.go - Implemented SharedMemory protobuf message type and updated ExtendedResources to include shared memory support

tasks_pb.ts - Added SharedMemory class with mount path, name and size limit configuration

flyteidl.d.ts - Added TypeScript definitions for SharedMemory configuration and ExtendedResources integration

flyteidl.js - Implemented JavaScript bindings for SharedMemory and ExtendedResources functionality

tasks_pb2.py - Updated Python protobuf descriptors to include SharedMemory support

tasks_pb2.pyi - Added Python type hints for SharedMemory configuration; added shared memory field and initialization parameters

flyteidl.core.rs - Implemented SharedMemory struct and added to ExtendedResources

pod_helper.go - Added shared memory volume configuration logic for Kubernetes pods

pod_helper_test.go - Added comprehensive test coverage for shared memory functionality

Comment on lines +2552 to +2556
var quantity resource.Quantity
if test.sharedVolume.GetSizeLimit() != "" {
quantity, err = resource.ParseQuantity(test.sharedVolume.GetSizeLimit())
assert.NoError(t, err)
}

Consider initializing quantity variable before use

Consider initializing quantity to a zero value before the conditional block. Currently if GetSizeLimit() returns an empty string, quantity remains uninitialized when used in the assertion.

Suggested change
- var quantity resource.Quantity
- if test.sharedVolume.GetSizeLimit() != "" {
-     quantity, err = resource.ParseQuantity(test.sharedVolume.GetSizeLimit())
-     assert.NoError(t, err)
- }
+ var quantity resource.Quantity
+ quantity = resource.Quantity{}
+ if test.sharedVolume.GetSizeLimit() != "" {
+     quantity, err = resource.ParseQuantity(test.sharedVolume.GetSizeLimit())
+     assert.NoError(t, err)
+ }

Code Review Run #6a4c77


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

// Size limit for shared memory. If not set, then the shared memory is equal
// to the allocated memory.
// +optional
string size_limit = 3;

Consider using resource.Quantity for size_limit

Consider using a more specific type like k8s.io/apimachinery/pkg/api/resource.Quantity for size_limit instead of string to ensure proper validation of memory size values.

Suggested change
- string size_limit = 3;
+ k8s.io.apimachinery.pkg.api.resource.Quantity size_limit = 3;

Code Review Run #6a4c77


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged
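Since size_limit stays a plain string in the IDL, any validation has to happen wherever the string is consumed; the Go test in this PR uses resource.ParseQuantity for that. As an illustration of what such parsing involves, the sketch below handles a deliberately simplified subset of Kubernetes quantity strings (whole numbers with an optional binary or decimal suffix). Real quantities also allow fractions, exponents and the milli suffix, which this sketch rejects. The function name is hypothetical.

```python
import re

# Binary (power-of-two) and decimal (power-of-ten) suffix multipliers.
_MULTIPLIERS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}
_PATTERN = re.compile(r"(\d+)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?")


def parse_size_limit(text: str) -> int:
    """Convert a size string such as "2Gi" into a number of bytes.

    Simplified subset only: whole numbers with an optional suffix.
    """
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported size limit: {text!r}")
    number, suffix = match.groups()
    multiplier = _MULTIPLIERS[suffix] if suffix else 1
    return int(number) * multiplier
```

A malformed value such as "2GB" raises ValueError here, so it can be caught early instead of surfacing later when the pod is created.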

Comment on lines +494 to +495
if extendedResources.GetSharedMemory() != nil {
err = ApplySharedMemory(podSpec, primaryContainerName, extendedResources.GetSharedMemory())

Consider validating SharedMemory fields

Consider adding validation for SharedMemory fields before applying the override. The MountPath and MountName fields should be validated to ensure they contain valid values.

Suggested change
- if extendedResources.GetSharedMemory() != nil {
-     err = ApplySharedMemory(podSpec, primaryContainerName, extendedResources.GetSharedMemory())
+ if extendedResources.GetSharedMemory() != nil {
+     shm := extendedResources.GetSharedMemory()
+     if shm.MountPath == "" || shm.MountName == "" {
+         return nil, nil, fmt.Errorf("shared memory mount path and name must be specified")
+     }
+     if !strings.HasPrefix(shm.MountPath, "/") {
+         return nil, nil, fmt.Errorf("shared memory mount path must be absolute")
+     }
+     err = ApplySharedMemory(podSpec, primaryContainerName, extendedResources.GetSharedMemory())

Code Review Run #6a4c77


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Comment on lines +494 to +497
if extendedResources.GetSharedMemory() != nil {
err = ApplySharedMemory(podSpec, primaryContainerName, extendedResources.GetSharedMemory())
if err != nil {
return nil, nil, err

Consider validating primaryContainerName before use

Consider checking if primaryContainerName is empty before using it in ApplySharedMemory(). An empty container name could cause issues with shared memory configuration.

Suggested change
- if extendedResources.GetSharedMemory() != nil {
-     err = ApplySharedMemory(podSpec, primaryContainerName, extendedResources.GetSharedMemory())
-     if err != nil {
-         return nil, nil, err
+ if extendedResources.GetSharedMemory() != nil {
+     if primaryContainerName == "" {
+         return nil, nil, fmt.Errorf("primary container name cannot be empty when configuring shared memory")
+     }
+     err = ApplySharedMemory(podSpec, primaryContainerName, extendedResources.GetSharedMemory())
+     if err != nil {
+         return nil, nil, err

Code Review Run #6a4c77


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

@@ -429,6 +490,14 @@
ApplyGPUNodeSelectors(podSpec, extendedResources.GetGpuAccelerator())
}

// Shared memory volume
if extendedResources.GetSharedMemory() != nil {
err = ApplySharedMemory(podSpec, primaryContainerName, extendedResources.GetSharedMemory())

Consider explicit error handling for ApplySharedMemory

Consider checking the return value from ApplySharedMemory() before proceeding. The error handling could be more explicit.

Suggested change
- err = ApplySharedMemory(podSpec, primaryContainerName, extendedResources.GetSharedMemory())
+ if err := ApplySharedMemory(podSpec, primaryContainerName, extendedResources.GetSharedMemory()); err != nil {
+     return nil, nil, fmt.Errorf("failed to apply shared memory: %w", err)
+ }

Code Review Run #6a4c77


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged


flyte-bot commented Jan 25, 2025

Code Review Agent Run #5bb8be

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: d2c76ff..04206f8
    • flyteplugins/go/tasks/pluginmachinery/flytek8s/pod_helper_test.go
  • Files skipped - 0
  • Tools
    • Golangci-lint (Linter) - ✖︎ Failed
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful


…d_memory_extended_sources_v2

Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: Thomas J. Fan <[email protected]>

flyte-bot commented Feb 6, 2025

Code Review Agent Run Status

  • Limitations and other issues: ❌ Failure - Bito Code Review Agent didn't review this pull request automatically because it exceeded the size limit. No action is needed if you didn't intend for the agent to review it. Otherwise, you can initiate the review by typing /review in a comment below.


@wild-endeavor wild-endeavor left a comment


do we not have to mount this to all the other containers in the pod as well? maybe i'm a bit confused as to how this interacts with default pod templates.
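The thread does not record the answer to this question. To illustrate what is being asked, here is a sketch of a primary-container-only mount, modeled loosely on the ApplySharedMemory(podSpec, primaryContainerName, sharedMemory) call quoted earlier in the review. It works on plain dicts rather than Kubernetes Go types. Mounting on the primary container only, the emptyDir "Memory" volume layout, and the settings keys are all assumptions, not confirmed behavior of the merged code.

```python
import copy


def apply_shared_memory(pod_spec: dict, primary_container_name: str, shared_memory: dict) -> dict:
    """Return a copy of pod_spec with a memory-backed volume mounted on one container.

    Assumption: only the container named `primary_container_name` gets the mount.
    """
    spec = copy.deepcopy(pod_spec)
    primary = next(
        (c for c in spec.get("containers", []) if c["name"] == primary_container_name),
        None,
    )
    if primary is None:
        raise ValueError(f"container {primary_container_name!r} not found in pod spec")

    # Memory-backed emptyDir; sizeLimit is only set when a size was requested.
    volume = {"name": shared_memory["mount_name"], "emptyDir": {"medium": "Memory"}}
    if shared_memory.get("size_limit"):
        volume["emptyDir"]["sizeLimit"] = shared_memory["size_limit"]
    spec.setdefault("volumes", []).append(volume)

    primary.setdefault("volumeMounts", []).append(
        {"name": shared_memory["mount_name"], "mountPath": shared_memory["mount_path"]}
    )
    return spec
```

Under this assumption a sidecar container in the same pod would not see the mount; extending it to every container would mean appending the same volumeMounts entry in a loop over `spec["containers"]`.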


@wild-endeavor wild-endeavor left a comment


synced offline

@thomasjpfan thomasjpfan merged commit 2092e88 into flyteorg:master Feb 7, 2025
52 of 53 checks passed