
LeastExpensive Market Strategy not using number of threads of providers #1147

Open
RoyKas opened this issue Aug 26, 2023 · 7 comments

@RoyKas

RoyKas commented Aug 26, 2023

The section of leastexpensive.py below does not use the provider's number of threads when computing the expected usage. This results in incorrect selection of providers.
Both the job duration and the consumed CPU time are set to the same 'expected_usage', which is 60 seconds by default. This is only correct when the provider uses a single thread. For a higher number of threads, the consumed CPU time should be that multiple of the job duration.
Either the job duration should be divided by the number of threads, or the consumed CPU time should be multiplied by the number of threads.
The first option will favor providers with a higher number of threads; the second will favor providers with a lower number of threads.
If both are undesirable, a third option is to adjust both using the square root of the number of threads, or some other in-between solution.

    expected_usage = []

    for resource in linear.usage_vector:
        if linear.price_for[resource] > self._max_price_for[resource]:
            self._logger.debug(
                "Rejected offer %s: price for '%s' higher than price cap %f.",
                offer.id,
                resource,
                self._max_price_for[resource],
            )
            return SCORE_REJECTED

        if linear.price_for[resource] < 0:
            self._logger.debug("Rejected offer %s: negative price for '%s'", offer.id, resource)
            return SCORE_REJECTED

        expected_usage.append(self._expected_time_secs)
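A hypothetical sketch of the three adjustment options described above (the `threads` parameter, the `mode` switch, and the function itself are illustrative, not part of the actual strategy API):

```python
import math


def build_expected_usage(expected_time_secs: float, threads: int, mode: str = "none"):
    """Return (expected_duration_secs, expected_cpu_secs) for a provider
    advertising `threads` threads, assuming the task fully loads it."""
    if mode == "divide_duration":
        # Option 1: shorter expected wall-clock time; favors more threads.
        return expected_time_secs / threads, expected_time_secs
    if mode == "multiply_cpu":
        # Option 2: more expected cpu-seconds consumed; favors fewer threads.
        return expected_time_secs, expected_time_secs * threads
    if mode == "sqrt":
        # Option 3: split the difference via the square root of the thread count.
        root = math.sqrt(threads)
        return expected_time_secs / root, expected_time_secs * root
    # Current behavior: both components get the same value.
    return expected_time_secs, expected_time_secs
```

For example, with the default 60 seconds and a 4-thread provider, option 1 yields (15, 60), option 2 yields (60, 240), and option 3 yields (30, 120).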
@RoyKas RoyKas added the bug Something isn't working label Aug 26, 2023
@krunch3r76
Contributor

krunch3r76 commented Sep 18, 2023

first and foremost, thanks for flagging this - such observations are crucial for the community to evolve and refine the project.

the essence of CPU time is that it aggregates the compute cycles used, regardless of the number of cores at play. even when multiple cores are working in parallel, their individual compute efforts, when tallied, present a consistent picture, i.e. they add up to the same total CPU time. put plainly, a fixed amount of CPU time is divided across all the cores. these references can provide more granularity:

[1] CPU time formula
[2] CPU Second definition
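as a tiny numeric sketch of this point (all values hypothetical):

```python
cores = 4
wall_time_secs = 15.0  # the job runs for 15 s of wall-clock time

# cpu time aggregates the compute done on every core:
cpu_secs = cores * wall_time_secs  # 60 cpu-seconds, all cores fully loaded

# a single-core machine doing the same work would take 60 s of wall time
# and consume the same 60 cpu-seconds
single_core_wall_time_secs = cpu_secs / 1
```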

i'm just another contributor like you, aiming to bring more clarity and efficiency to the project. your insights are invaluable, and i appreciate the time you took to articulate this.

@shadeofblue
Contributor

@RoyKas well, the default strategy is by necessity a simplification, and it does take the liberty of erring on one side or the other in the specific cases you described...

Given that it's hard to arrive at an uncomplicated solution that covers all bases - as you have observed yourself - we're providing something that tries to give a more-or-less sensible score without differentiating between single-threaded and multi-threaded tasks...

As an app developer, you're free to implement a strategy that will give a better score for the scenarios you know you will be running...

@shadeofblue shadeofblue added not a bug wontfix This will not be worked on and removed bug Something isn't working wontfix This will not be worked on labels Sep 19, 2023
@krunch3r76
Contributor

krunch3r76 commented Sep 19, 2023

expected_time_secs is fundamental to the scoring function when comparing offers, as it essentially serves as the sole scaling factor for both CPU time and duration. to understand the role of expected_time_secs, let's examine the relevant code snippets:

in the code:

    for resource in linear.usage_vector:
        if linear.price_for[resource] > self._max_price_for[resource]:
            self._logger.debug(
                "Rejected offer %s: price for '%s' higher than price cap %f.",
                offer.id,
                resource,
                self._max_price_for[resource],
            )
            return SCORE_REJECTED

        if linear.price_for[resource] < 0:
            self._logger.debug("Rejected offer %s: negative price for '%s'", offer.id, resource)
            return SCORE_REJECTED

        expected_usage.append(self._expected_time_secs)

    # The higher the expected price value, the lower the score.
    # The score is always lower than SCORE_TRUSTED and is always higher than 0.
    score = SCORE_TRUSTED * 1.0 / (linear.calculate_cost(expected_usage) + 1.01)

expected_usage is populated with the value of expected_time_secs for each of the linear_coeffs usage counters - duration and cpu, in whichever order they appear. this list is then passed to the calculate_cost method of the ComLinear class:

    def calculate_cost(self, usage: List):
        usage = usage + [1.0]  # append the "usage" of the fixed component
        return sum([c * usage[i] for (i, c) in enumerate(self.linear_coeffs)])

here, each of the first two coefficients from linear_coeffs is multiplied by its corresponding value in expected_usage - by default 60 seconds for the cpu counter and 60 seconds for the duration counter - while the fixed start coefficient is weighted 1 to 1. the results are summed.
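as a small worked sketch of this arithmetic (the coefficient values and the SCORE_TRUSTED value below are illustrative assumptions, not real offer data):

```python
from typing import List

SCORE_TRUSTED = 100.0  # assumed value, for illustration only


def calculate_cost(linear_coeffs: List[float], usage: List[float]) -> float:
    # same arithmetic as ComLinear.calculate_cost: dot product of the
    # coefficients with the usage vector plus the fixed "start" component
    usage = usage + [1.0]
    return sum(c * usage[i] for i, c in enumerate(linear_coeffs))


# hypothetical offer: 0.001 GLM/cpu-second, 0.0005 GLM/duration-second,
# 0.01 GLM fixed start fee; expected_time_secs = 60 for both counters
coeffs = [0.001, 0.0005, 0.01]
expected_usage = [60.0, 60.0]

cost = calculate_cost(coeffs, expected_usage)  # 0.06 + 0.03 + 0.01 = 0.1
score = SCORE_TRUSTED / (cost + 1.01)          # 100 / 1.11 ≈ 90.09
```

a cheaper offer shrinks the denominator and so raises the score, which stays below SCORE_TRUSTED and above 0.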

after tracing the logic, i see your point: with more threads the wall-clock duration would be relatively less than the cpu time, so the expected duration should weigh less than the expected cpu time in the score. however, this strategy does not distinguish the two. as @shadeofblue said, a custom strategy, modeled after this one, would need to be developed.

and even when scoring single-core providers, clock speeds may differ. multithreaded comparisons might then need to factor in clock speed as well - would 8 threads at 3GHz finish any faster than 4 threads at 6GHz, for example? . . .

thanks for reading along, great discussion!

@RoyKas
Author

RoyKas commented Sep 23, 2023

The point I'm trying to make is that the usage vector is populated with _expected_time_secs both for duration and for cpu.seconds, which implies a single-core/single-threaded processor. For processors with more cores/threads, the pricing for duration and for cpu-seconds will be weighted differently; e.g. cpu.seconds will be approximately an integer multiple of the duration, assuming the task fully loads the processor.
I see that a solution will not be straightforward, but the code should make it clear to requestors that this default market strategy uses this simplification.

@krunch3r76
Contributor

krunch3r76 commented Sep 24, 2023

good point. the docstring could make it explicit that the parameter expected_time_secs is a scaling factor for both env/hr and cpu/hr. perhaps if someone, like you or myself, were to add that to the docstring and submit a pull request, the issue could be closed as satisfactorily resolved once merged - after comments and any subsequent revisions?
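a hypothetical sketch of such a docstring note (the class name reflects the strategy under discussion; the wording is only a suggestion):

```python
class LeastExpensiveLinearPayuMS:
    """Score offers by their expected linear cost, preferring the cheapest.

    Note: `expected_time_secs` is used as the expected usage value for *both*
    the duration counter and the cpu-seconds counter. This is a deliberate
    simplification that is only exact for a single-threaded provider: on a
    multi-threaded provider under full load, cpu-seconds accumulate faster
    than wall-clock duration, so the score does not account for thread count.
    Requestors who need that distinction should implement a custom strategy
    modeled on this one.
    """
```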

@RoyKas
Author

RoyKas commented Oct 15, 2023

I haven't done any such code changes or pull requests before - would you be willing to take this on?

@krunch3r76
Contributor

> I haven't done any such code changes or pull requests before - would you be willing to take this on?

good to know that would resolve the issue! i will see what i can come up with in the near future! 😊
