Implement probability-based aim SR calculation for osu!standard to better assess length and miss penalty #30934

Draft · wants to merge 21 commits into base: `pp-dev`
Conversation

@Natelytle (Contributor) commented Dec 1, 2024

(This is not ready to be merged until further changes come about to the performance points system. I am PRing these changes as a draft to work out the code concepts before using it as part of a bigger system.)

osu!standard Star Rating and Miss Penalty Proposal

Colloquially known as "aim prob"

Initially developed by joseph-ireland and further polished by me, this is a proposal for a new system for assessing star rating, length, and the miss penalty for the aim skill. This PR is rather large, so I have split everything up to the best of my ability to ease review; the description will focus not on the implementation itself, but rather on an overview of the underlying concept.

With that in mind, we can begin.

Live star rating

How it works

The way star rating works in the current performance points system is rather arbitrary. The map is split into 400ms sections, known as strain sections. Each section's difficulty value is the highest note difficulty within it. After the strain sections have their difficulties calculated, they are sorted in descending order, the highest-difficulty strains are reduced, and the list is re-sorted. The strains are then summed with geometric weighting: each strain is multiplied by 0.9^i, with i being its index in the sorted list. The result of the sum is your star rating.
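As a rough sketch of the scheme just described (illustrative only: the real implementation's strain computation and top-strain reduction step are more involved, and the reduction step is omitted here):

```python
def live_star_rating(strains, decay=0.9):
    """Toy sketch of the current strain-sum approach.

    strains: per-400ms-section difficulty values (each being the
    highest note difficulty within that section).
    """
    # Sort sections hardest-first, then sum with geometric weights:
    # the i-th hardest strain contributes decay**i of its value.
    ordered = sorted(strains, reverse=True)
    return sum(s * decay**i for i, s in enumerate(ordered))
```

Because the weights form a geometric series, the sum is bounded by the hardest strain times 1 / (1 − 0.9) = 10, which is the length-invariance problem discussed below.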

Problems

One of the issues with this system is that it is nearly invariant with regard to the length of a map. The limit of the geometric sum is the difficulty of the highest strain multiplied by 10, so a map of any length is capped to the same value if the highest strain remains the same difficulty. Additional strains also become negligible very quickly: the hardest 10 seconds of a map contribute 96% of the total star rating.

Another issue with this system is that not every note contributes to difficulty, because each strain section takes the highest note difficulty within itself. Because of this, offset changes or gradual speed increases mean different notes correspond to different strain sections, and can cause unwanted variance in SR, including a star rating increase as you decrease speed.

Length bonus

In order to account for the first issue with star rating, a map's performance points value, calculated directly from star rating, is multiplied by a formula depending on the object count. However, since the performance calculator cannot see the difficulties of the notes in a map, it does so with no regard for whether the note actually contributes significant difficulty to achieving a full combo.

```csharp
// Longer maps are worth more
double lengthBonus = 0.95 + 0.4 * Math.Min(1.0, totalHits / 2000.0) +
                     (totalHits > 2000 ? Math.Log10(totalHits / 2000.0) * 0.5 : 0.0);

aimValue *= lengthBonus;
```

Because of this, a subsection of farm maps has come to rely on filler sections: long stretches of low difficulty that pad the note count and inflate the length bonus, buffing a relatively short section of high difficulty. Historic examples include Sunglow, Tsukinami, and the Foreground Eclipse Songs Compilation.

The live miss penalty

How it works

The current miss penalty estimates the number of difficult notes in the map: each note's difficulty is divided by the star rating of the map, and each note's weight under an arbitrary function is added to a total. This total is then used to scale how harsh the miss penalty is.

Problems

The miss penalty used in the current system is a vast improvement over its predecessor; however, it comes with faults of its own.

Since the miss penalty is arbitrary the whole way through, there is no guarantee that, for a fixed miss count, the performance awarded for a score never decreases as you hit more notes. The miss penalty was tweaked and balanced with preventing this in mind, but it remains possible in niche scenarios.

A bigger issue: for a completely consistent map, doubling the length of the map and doubling the number of misses should theoretically yield at least the same value as the shorter map. Upon testing, however, this doubling has been found to decrease the total value of a score.

A subjective issue I hold with the live miss penalty is that it is, at its core, subject to balancing. There is no ground truth as to how much a miss should be worth in live, so scores with misses may be worth too much or too little, and it is another source of variance within the performance points system.

The solution to star rating

In order to solve the problems with live star rating, we need to address two issues: each note must contribute to star rating, and star rating must continue to increase as more difficulty is added. First, we need to re-contextualize what star rating is. My idea of (aim) star rating is that it should represent the skill level required to achieve a full combo on a map. As length increases, the skill level required to achieve a full combo should increase in turn.

In practice

In order to find skill level from the note difficulties, you need some metric to measure. To do this, we find the expected aim error of the player on each note using the difficulty of the note and a variable for the skill level of the player. This aim error is referred to as deviation, and it is simply difficulty / skill.

With the assumption that this aim error is normally distributed, we can get the probability of hitting a note using the error function: erf(skill / difficulty) is our resulting formula for the probability of hitting any individual note (the probability approaches 1 as skill grows relative to difficulty). But how do we go from individual probabilities to a skill level?

To do this, we need a total probability of achieving a full combo on the map. Multiplying all of the note probabilities together gives exactly that, for a given skill level. We then pick a target probability at which we consider a player able to achieve a full combo, which in this case is 2%.

We then use a root finder to iterate through skill levels until we find one where the product of every note's probability is 2%, and that is our skill level, which then becomes our star rating.
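A minimal sketch of this search, using erf(skill / difficulty) as the per-note hit probability (any normalizing constants assumed absorbed into difficulty) and plain bisection in place of whatever root finder the real implementation uses; all names here are illustrative:

```python
import math

FC_PROBABILITY = 0.02  # target: 2% chance of a full combo

def fc_probability(skill, difficulties):
    # Product of per-note hit probabilities at this skill level.
    p = 1.0
    for d in difficulties:
        p *= math.erf(skill / d) if d > 0 else 1.0
    return p

def skill_level(difficulties):
    # Bisect for the skill where P(full combo) == FC_PROBABILITY.
    lo, hi = 0.0, 1.0
    while fc_probability(hi, difficulties) < FC_PROBABILITY:
        hi *= 2  # grow the bracket until P(FC) exceeds the target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if fc_probability(mid, difficulties) < FC_PROBABILITY:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since every factor is strictly increasing in skill, the product is too, so the bisection always converges to a unique skill level.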

How does this solve the problems with star rating?

With every note contributing to the total probability according to its difficulty, the issues with strain sections shifting around based on the offset or speed of the map are now gone. As well, every added note decreases the total probability of achieving a full combo, even if only slightly. This means that length is accounted for in star rating, and the length bonus in the performance calculator can be removed.

This replaces the top strain section reduction as well, since a few high difficulty notes can have a low probability of being hit due to a lower skill level and still be above the 2% threshold for a full combo. Because of this concept, if you increase the probability you are looking for, you will reward maps with short spikes in difficulty, and if you decrease it you devalue them relative to longer, more consistent maps. This is the one part of this change that necessitates balance.

The new miss penalty

This concept, rather elegantly, can be extended beyond full combos to any miss count. To do this, we use the Poisson binomial distribution, an extension of the binomial distribution that supports unequal success probabilities. Computing an exact Poisson binomial is incredibly slow, however, so we use an approximation based on the normal distribution.

We can input each note's probability at a certain skill level, and it gives us the probability of achieving a certain number of successes (a.k.a. hits) or better. This lets us go from skill to miss counts as well, since a miss count is just the number of notes minus the number of successful hits.
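Under the normal approximation (a sketch with a standard continuity correction; the exact form used in the real implementation may differ), the probability of at most k misses follows from the mean and variance of the per-note hit probabilities:

```python
import math

def prob_at_most_k_misses(hit_probs, k):
    """Normal approximation to the Poisson binomial.

    Returns P(misses <= k) given independent per-note hit
    probabilities.
    """
    n = len(hit_probs)
    mean_hits = sum(hit_probs)
    var_hits = sum(p * (1 - p) for p in hit_probs)
    if var_hits == 0:
        return 1.0 if n - mean_hits <= k else 0.0
    # misses <= k  is the same event as  hits >= n - k.
    # Continuity-corrected z-score for P(hits >= n - k):
    z = (n - k - 0.5 - mean_hits) / math.sqrt(var_hits)
    # Standard normal survival function via erfc.
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Pairing this with the root finder from the previous section lets us solve for the skill at which a given miss count has a 2% probability, just as we did for full combos.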

Since we cannot get the note difficulties in the performance calculator, which we need to compute the skill for a certain miss count, we have to compute them beforehand in the difficulty calculator. This adds a bit of time to the star rating calculation, however the calculation still takes a fairly short amount of time.

How does this solve the problems with the live miss penalty?

Because this miss penalty is based on probabilities, and adding notes always decreases the probability of a full combo at a given skill level, adding more notes also increases the number of misses you have a 2% chance of achieving or better. Skill therefore needs to be higher to get the same miss count. What this means is that, for the same miss count, adding more notes always increases performance points, no exceptions.

As well, when you double a map, the Poisson Binomial outputs double the misses for the same skill level. This means that doubling the length of a map and doubling the number of misses returns at least the same skill level.

Finally, since the only formula required in the performance calculator is the one that relates the number of misses to the corresponding skill level we computed for it, no balancing needs to be done for the miss penalty, which is a subjective win in my eyes.

Optimizations

Binning difficulties

Since we have to iterate through every note multiple times in order to get the probabilities at each skill level, we would like to minimize the number of notes we have to iterate through. To do this, on sufficiently long maps we create 32 bins with equally spaced difficulties. We split each note between the 2 closest bins based on its difficulty, adding a portion to each bin's count such that the portions sum to 1. We can then compute the probability for each binned difficulty and raise it to the power of the number of notes that fall into that bin, which is equivalent to multiplying that probability in once per note.
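A sketch of the binning step (32 bins, each note split linearly between its two nearest bins; function and variable names are illustrative). Splitting linearly means both the total note count and the total difficulty mass are preserved exactly:

```python
def bin_difficulties(difficulties, n_bins=32):
    """Return (centres, counts): equally spaced bin centres and the
    fractional note count assigned to each bin."""
    max_d = max(difficulties)
    step = max_d / (n_bins - 1)
    # Bin centres run from 0 up to the hardest note's difficulty.
    centres = [step * i for i in range(n_bins)]
    counts = [0.0] * n_bins
    for d in difficulties:
        pos = d / step              # fractional bin index
        lo = min(int(pos), n_bins - 2)
        frac = pos - lo
        counts[lo] += 1 - frac      # portion to the lower bin
        counts[lo + 1] += frac      # portion to the upper bin
    return centres, counts
```

The full-combo probability then becomes a product of `p(centre) ** count` over 32 bins rather than a product over every note in the map.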

Reducing miss penalty attributes

To save on attributes for the miss penalties, we pick a few skill levels that give a good range of misses, with more importance on higher skill levels, and compute the miss counts for them. We then fit a polynomial to these miss counts, with a hard constraint that the polynomial must give a 0% penalty at 0 misses and a 100% penalty at the maximum possible number of misses. We can pass the coefficients of the polynomial through to the performance calculator, and simply solve for the point of the polynomial between 0% penalty and 100% penalty that lines up with our miss count.

Testing shows this gives fairly good accuracy as a cubic polynomial, but another coefficient can easily be added for further accuracy if necessary.
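One way this fit-and-invert step could look (a sketch under assumptions: penalty is the polynomial's input on [0, 1], the endpoint constraints are enforced by construction, and bisection stands in for the actual solve; the implementation's sampled skill levels and constraint handling may differ):

```python
def fit_miss_curve(penalties, miss_counts, max_misses):
    """Least-squares fit of misses(x) = M x^3 + a (x - x^3) + b (x^2 - x^3),
    a cubic that guarantees misses(0) = 0 and misses(1) = max_misses."""
    M = max_misses
    suu = suv = svv = sur = svr = 0.0
    for x, m in zip(penalties, miss_counts):
        u = x - x**3
        v = x**2 - x**3
        r = m - M * x**3
        suu += u * u
        suv += u * v
        svv += v * v
        sur += u * r
        svr += v * r
    det = suu * svv - suv * suv  # 2x2 normal equations
    a = (sur * svv - svr * suv) / det
    b = (svr * suu - sur * suv) / det
    return a, b

def penalty_for_misses(misses, a, b, max_misses):
    """Invert the fitted (monotone) curve by bisection on x in [0, 1]."""
    M = max_misses
    f = lambda x: M * x**3 + a * (x - x**3) + b * (x**2 - x**3)
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < misses:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Only `a`, `b`, and the maximum miss count need to be carried as difficulty attributes; the performance calculator just runs the inversion.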

@cihe13375 commented:
I only have a very limited idea about SR/pp calculation, so forgive me if I have made some obvious mistakes.

What comes to mind after reading the OP is something like this:

*(image)*

Maybe the chance for me to hit a horizontal jump is only 10%, but I can throw away all the 1s and only hit the 2s; as a result the algorithm would consider erf(skill / difficulty) to be ~50% instead of 10%, giving a higher estimate of skill than what I actually have.

I think the more general issue is that the difficulty of a note depends on whether the previous 1~2 (or more) notes were hit, so theoretically it would be required to recalculate note difficulties based on the placement of misses (and maybe estimate the placement of misses based on the difficulties, in an iterative manner). Though maybe it can be approximately fixed by, let's say, doubling the number of misses when calculating pp?

@Natelytle (Contributor, author) replied:
The example you provided is a problem with any miss penalty system, really: unless you assume that the player attempts to aim every note, it can theoretically be easier to skip notes.

Accounting for this is quite hard, impossible even, since it would require computing the skill level required to FC every single combination of hit and missed notes to see if a lower average miss count is achievable, which for a map with only 100 notes would mean 2^100 possibilities.

*(image)*

However, this isn't really a big deal, since intentionally skipping notes for a PP advantage is very rarely done in actual plays. The exact same problem exists in the current PP system too, so it's not a new problem either.

@smoogipoo smoogipoo changed the base branch from master to pp-dev December 18, 2024 14:09
```
# Conflicts:
#	osu.Game.Rulesets.Osu/Difficulty/OsuDifficultyAttributes.cs
#	osu.Game.Rulesets.Osu/Difficulty/OsuPerformanceCalculator.cs
#	osu.Game.Rulesets.Osu/Difficulty/Skills/Aim.cs
#	osu.Game/Rulesets/Difficulty/Utils/DifficultyCalculationUtils_ErrorFunction.cs
```