Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add uses_bulkdata argument to paasta spark run instance_config #4005

Merged
merged 1 commit into from
Feb 3, 2025

Conversation

SuperMatt
Copy link
Contributor

@SuperMatt SuperMatt commented Jan 14, 2025

This makes the change to paasta spark run so that https://github.yelpcorp.com/sysgit/yelpsoa-configs/pull/52010 will work as expected.

This works by adding the uses_bulkdata key to the intsance config if the spark job has the key present and set to true.

I have added this arg to the tests so that they pass, and also created a test so that we can check all the different ways that uses_bulkdata can be set, either on paasta spark-run as an argument, or in the instance config.

See #3995 for more information about why we're doing this.

This makes the change to paasta spark run so that https://github.yelpcorp.com/sysgit/yelpsoa-configs/pull/52010 will work as expected.

This works by adding the uses_bulkdata key to the intsance config if the spark job has the key present and set to true.

I have added this arg to the tests so that they pass, and also created a test so that we can check all the different ways that uses_bulkdata can be set, either on paasta spark-run as an argument, or in the instance config.

See #3995 for more information about why we're doing this.
@SuperMatt SuperMatt force-pushed the u/mames/PERES-5194-uses-bulkdata-spark-run branch from 7c084d3 to 23fcbfa Compare January 20, 2025 15:37
Comment on lines +1442 to +1444
assert (
mock_get_instance_config.return_value.config_dict["uses_bulkdata"] == expected
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've mocked this on L1436 - so this assertion isn't doing much atm (I guess right now this test is really just checking that spark_run.paasta_spark_run(args) doesn't crash if we remove this always-passing assertion?)

imo, deleting this test is fine (the new logic is pretty trivial) or updating the test to assert that the correct mounts are in place in the final command would be where we should go from here :)

@SuperMatt
Copy link
Contributor Author

Discussing with @nemacysts, we think that we don't need a test that the volume is present because we're testing that uses_bulkdata ends up on the instance config, but then we have another test already for the volume being added when uses_bulkdata is set to true, which can be seen here. Adding a spark test for this volume being included would be redundent.

@SuperMatt SuperMatt merged commit dde1ed3 into master Feb 3, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants