🐞 Spark build started taking longer than usual #28

Open
spagnoloe-amenitiz opened this issue Sep 19, 2024 · 2 comments

spagnoloe-amenitiz commented Sep 19, 2024

Describe the bug

Hi there,

Starting today, we have seen a significant increase in the time the Spark setup action takes.
Until yesterday this step never took longer than 10 minutes; today it takes at least 30 minutes, with many runs taking more than 1 hour.

We are currently running the action with Spark version 3.3.0 on ubuntu-20.04.

- uses: vemonet/setup-spark@v1
  timeout-minutes: 10
  with:
    spark-version: '3.3.0'
    hadoop-version: '3'
    spark-url: 'https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz'

Is anyone else experiencing the same issue?

Reproduction

No response

Which version of the action are you using?

v1

With which versions of Spark is it happening?

3.3.0

Operating System and environment

ubuntu-20.04

Additional context

No response

vemonet (Owner) commented Sep 19, 2024

Hi @spagnoloe-amenitiz

Downloading from https://archive.apache.org can be quite slow (though it has the best coverage of old versions). I would recommend looking at https://spark.apache.org/downloads.html to find the officially recommended mirror.

In the code we built up on this URL: https://github.com/vemonet/setup-spark/blob/main/src/setup-spark.ts#L44 not sure if it has changed since we

That makes me think we should probably not use archive.apache.org in the README example, since it might be misleading.

spagnoloe-amenitiz (Author) commented Sep 20, 2024

Hi @vemonet,

Thanks for the quick reply. I tried removing the URL so that the action would download from https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz, but apparently this binary is not available there, so it falls back to the archive again :(
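
For reference, this is roughly how I understand the fallback order from the logs. It is only a sketch of the behaviour I observed, not the action's actual code: `candidateUrls`, `SparkRelease`, and the two base constants are names I made up for illustration, and the real logic in `setup-spark.ts` may differ.

```typescript
// Hypothetical sketch of the download order seen in the logs.
// None of these names come from the action's source.
interface SparkRelease {
  sparkVersion: string; // e.g. "3.3.0"
  hadoopVersion: string; // e.g. "3"
}

// Base URLs taken from the two links mentioned in this thread.
const CDN_BASE = "https://dlcdn.apache.org/spark";
const ARCHIVE_BASE = "https://archive.apache.org/dist/spark";

// Returns the URLs in the order they appear to be tried:
// the CDN first, then the (slower) archive as a fallback.
function candidateUrls(release: SparkRelease): string[] {
  const { sparkVersion, hadoopVersion } = release;
  const file = `spark-${sparkVersion}-bin-hadoop${hadoopVersion}.tgz`;
  const path = `spark-${sparkVersion}/${file}`;
  return [`${CDN_BASE}/${path}`, `${ARCHIVE_BASE}/${path}`];
}

console.log(candidateUrls({ sparkVersion: "3.3.0", hadoopVersion: "3" }));
```

For 3.3.0 the first URL is not available, so only the second (archive) one ends up being used, which matches the slow downloads we are seeing.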

I tried to look into what you mentioned about the mirrors, but couldn't figure out where to find this info. The archives are listed here, but I could not find any details on the recommended mirror.

Do you know where this info is available? And how can I then specify the mirror in the GitHub Actions step?

Many thanks,
