Decimal('0.0000005').quantize(Decimal('1.1111111')) -> Decimal('5E-7') # should be unchanged #128373

Closed
michael-db opened this issue Dec 31, 2024 · 16 comments
Labels
docs Documentation in the Doc dir

Comments

@michael-db

michael-db commented Dec 31, 2024

Bug report

Bug description:

import decimal
Decimal = decimal.Decimal
Decimal('0.000005').quantize(Decimal('1.111111')) # Decimal('0.000005')  correct
Decimal('0.0000005').quantize(Decimal('1.1111111')) # Decimal('5E-7') wrong

According to the documentation, https://docs.python.org/3/library/decimal.html#decimal.Decimal.quantize
quantize() should "Return a value equal to the first operand after rounding and having the exponent of the second operand."
Clearly, it is not respecting the exponent of the second operand, even if that exponent is given explicitly:

Decimal('0.0000005').quantize(Decimal('1.1111111e0')) # Decimal('5E-7') wrong

CPython versions tested on:

3.13

Operating systems tested on:

Linux

@michael-db michael-db added the type-bug An unexpected behavior, bug, or error label Dec 31, 2024
@michael-db
Author

More directly, the immediate problem is

Decimal('0.0000005')  # Decimal('5E-7')

But there is already a normalize() function for that if I wanted it

Decimal('0.0000005').normalize()

So, in this case of leading zeroes, normalize() has been rendered superfluous, while quantize() has been rendered useless.

If this limitation is not to be corrected then at a minimum it needs to be documented.

@skirpichev
Member

skirpichev commented Dec 31, 2024

I suspect you misunderstand which "exponent" is referenced in the docs here, assuming it's just zero in both cases. It's not. You can use the as_tuple() method to check the exponent value. With that in mind, the results seem correct to me:

>>> Decimal('0.0000005').quantize(Decimal('1.1111111'))
Decimal('5E-7')
>>> _.as_tuple()[-1] == Decimal('1.1111111').as_tuple()[-1]
True
>>> Decimal('0.000005' ).quantize(Decimal('1.111111' ))
Decimal('0.000005')
>>> _.as_tuple()[-1] == Decimal('1.111111' ).as_tuple()[-1]
True

So, I don't think there is an issue.

Edit: To me it seems clear from the documentation which exponent is referenced (and it's not the one shown in the repr output or provided by you in the string form of the Decimal constructor). So I doubt there is a documentation issue either.

But there is already a normalize() function for that

What do you mean by "that"? The normalize() function is to get the canonical representation for a Decimal.
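As a minimal sketch of what normalize() does (the sample values are chosen here for illustration and are not from the thread, apart from 0.0000005): it strips trailing zeros from the coefficient and raises the exponent to compensate. A value with no trailing zeros, such as 0.0000005, is returned with the same digits and exponent.

```python
from decimal import Decimal

# Illustrative inputs: two with trailing zeros, one without.
samples = ("100", "0.50", "0.0000005")

# Map each input string to (tuple before normalize, tuple after normalize).
results = {}
for text in samples:
    d = Decimal(text)
    n = d.normalize()
    results[text] = (d.as_tuple(), n.as_tuple())
    print(f"{text!r}: {d!r} -> {n!r}, exponent {d.as_tuple().exponent} -> {n.as_tuple().exponent}")

# '100':       Decimal('100')  -> Decimal('1E+2'), exponent 0 -> 2
# '0.50':      Decimal('0.50') -> Decimal('0.5'),  exponent -2 -> -1
# '0.0000005': Decimal('5E-7') -> Decimal('5E-7'), exponent -7 -> -7
```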

@skirpichev skirpichev added the pending The issue will be closed if no feedback is provided label Dec 31, 2024
@michael-db
Author

"No issue"? If the exponent referred to is the internal exponent, not that appearing in the representation, then the documentation at least needs to make that clear. The example doesn't, and I am not the first person to find this quite misleading https://stackoverflow.com/questions/61704496/ Documentation is supposed to be for those unfamiliar with a module, not those who already know.

Like the person asking that question, I wanted control over the representation. The normalize() and quantize() methods appear to provide that control within the decimal module, and would be more useful if they did.

@skirpichev
Member

If the exponent referred to is the internal exponent

It's not "internal"; this value is available via the public interface. You can also construct a Decimal object from a tuple.

then the documentation at least needs to make that clear

Just don't mix up the exponent value of a Decimal object with its string representation. The current documentation seems clear to me in this respect.

You can take a look at the _pydecimal.Decimal.__str__ method to see how the exponent value is used to form the string representation. Fixed-point or scientific notation is chosen based on the exponent value. But if a number is formatted in fixed-point notation, that doesn't mean it has a zero exponent.
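The notation rule can be sketched roughly as follows. This is a simplified restatement of the to-scientific-string rule from the decimal arithmetic specification, not the actual _pydecimal code: fixed-point notation is used only when the exponent is <= 0 and the adjusted exponent is >= -6. The helper name uses_scientific is hypothetical, and the sketch covers finite values only.

```python
from decimal import Decimal

def uses_scientific(d: Decimal) -> bool:
    """Predict whether str(d) contains an 'E' (hypothetical helper, finite values only)."""
    _sign, digits, exponent = d.as_tuple()
    adjusted = exponent + len(digits) - 1  # the same value d.adjusted() returns
    # Fixed-point is chosen only when exponent <= 0 and adjusted >= -6.
    return not (exponent <= 0 and adjusted >= -6)

for text in ("0.000005", "0.0000005", "1.1111111", "1E+2", "100"):
    d = Decimal(text)
    print(f"{text:>10}  exponent={d.as_tuple().exponent:>3}  "
          f"str={str(d)!r}  scientific={uses_scientific(d)}")
```

Under this rule 0.000005 (exponent -6, adjusted exponent -6) stays in fixed-point form, while 0.0000005 (exponent -7, adjusted exponent -7) crosses the threshold and prints as 5E-7, even though 1.1111111 has the same exponent of -7.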

I wanted control over the representation.

It's not clear what you mean. Probably you want to control the string representation. In that case you should look at string formatting (the str.format() method and f-strings).
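For instance, a short sketch of controlling the output with format specifiers, using the value from the original report (the variable names are illustrative only). The 'f' presentation type forces fixed-point notation without changing the Decimal itself:

```python
from decimal import Decimal

d = Decimal("0.0000005")

as_repr = repr(d)          # "Decimal('5E-7')"
fixed = format(d, "f")     # "0.0000005"     (fixed-point, digits taken from the exponent)
padded = f"{d:.10f}"       # "0.0000005000"  (fixed-point, 10 decimal places)

print(as_repr, fixed, padded)
```

Formatting only affects the resulting string; d still has the coefficient 5 and the exponent -7.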

@michael-db
Author

Thank you for your reply, but I must say it again: Documentation is supposed to be for those unfamiliar with a module, not those who already know.

An example making clear that the exponent referred to was the internal one, together with a link to the as_tuple() method for what is meant by "exponent" would solve the problem as far as the documentation is concerned.

That would be more productive than arguing about words like "internal" (what you call it is entirely irrelevant, the fact is that it is a distinct exponent from the representative string) and disregarding the evidence that those unfamiliar with the module have found the documentation misleading on this point.

@skirpichev
Member

the exponent referred to was the internal one

Again, it's not "internal". It's the exponent value, referred to in many other places across the docs. You mistakenly interpreted "exponent" here as "the exponent in scientific notation, when a Decimal is represented this way in string form". Could you please explain how you came to this interpretation from the docs?

There is a link to the decimal arithmetic specification, where all terms are defined. If you are unfamiliar with floating-point arithmetic (not with the module!), it's a good source to start with. Or just read the Wikipedia article: https://en.wikipedia.org/wiki/Floating-point_arithmetic

together with a link to the as_tuple() method for what is meant by "exponent" would solve the problem

Unfortunately, this "problem" spans many other methods. I don't think we should clutter the documentation with examples that should be obvious to a careful reader.

IMO, the Sphinx docs for the decimal module already have examples showing the difference between the string representation and the notion of exponent. For instance, from the constructor docs:

If value is a tuple, it should have three components, a sign (0 for positive or 1 for negative), a tuple of digits, and an integer exponent. For example, Decimal((0, (1, 4, 1, 4), -3)) returns Decimal('1.414').
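A small sketch extending the quoted constructor example to the values from this report (the variable names are illustrative). Building the numbers from (sign, digits, exponent) tuples makes the exponent explicit, whatever notation the repr ends up using:

```python
from decimal import Decimal

a = Decimal((0, (1, 4, 1, 4), -3))              # 1414 * 10**-3
b = Decimal((0, (5,), -7))                      # 5 * 10**-7
c = Decimal((0, (1, 1, 1, 1, 1, 1, 1, 1), -7))  # 11111111 * 10**-7

print(repr(a))  # Decimal('1.414')
print(repr(b))  # Decimal('5E-7')
print(repr(c))  # Decimal('1.1111111')

# b and c share the exponent -7 although only b is shown with an 'E'.
print(b.as_tuple().exponent == c.as_tuple().exponent)  # True
```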

But let's keep this open for a while; maybe someone else will see how the docs should be improved.

@michael-db
Author

If you like Wikipedia, try reading this page about a certain cognitive bias in evidence here: https://en.wikipedia.org/wiki/Curse_of_knowledge It explains why, as a general principle, you should value feedback from reviewers instead of blaming them for being careless.

On a "careful reading", an error in the documentation for to_eng_string() would be obvious, yet you have not spotted it. Perhaps that example could lead you to have a little more empathy for readers. It is also one of several places where the word "exponent" is used to mean exactly what people expect it to mean, i.e., whatever comes after the 'E'.

Clutter: This is a poor excuse. With a little imagination, no clutter is required. If you really need to limit yourself to a single example, you could choose one that avoids misunderstanding. If you need to use the word "exponent" to refer to two different things, then you could add a one-word qualifier to avoid misinterpretation. If you have a peculiar objection to the word "internal" then make up another one and define it.

Arguing with someone whose default response is to put all their effort into denying that there is a problem and making excuses for not fixing it is not a good use of my time. I won't be commenting further.

@skirpichev
Member

If you like Wikipedia

Not at all. But it's enough to provide you with some basic information about floating-point arithmetic.

an error in the documentation for to_eng_string() would be obvious

If you found an error in docs - feel free to open a bug.

It is also one of several places where the word "exponent" is used to mean exactly what people expect it to mean, i.e., whatever comes after the 'E'.

I'm not sure you are right about people's expectations. People who read something as basic as https://en.wikipedia.org/wiki/Floating-point_arithmetic will assume a very different meaning for exponent in the context of floating-point arithmetic.

Are the other places also related to the string representation of Decimals?

If you need to use the word "exponent" to refer to two different things, then you could add a one-word qualifier to avoid misinterpretation.

Can you propose a concrete suggestion?

@skirpichev skirpichev added docs Documentation in the Doc dir and removed type-bug An unexpected behavior, bug, or error labels Dec 31, 2024
@terryjreedy
Member

@rhettinger

@rhettinger
Contributor

ISTM that the docs appropriately mirror the wording in the specification for quantize.

Likewise, the intro section of the decimal docs does a reasonably good job of explaining the concept that Decimal('1.1111111') means 11111111 × 10⁻⁷ with -7 being the exponent, not as an internal concept but as the core concept behind the entire decimal arithmetic specification. The intro section also includes two typical examples of how to use quantize().

Further along, the detailed docs for quantize give a useful and clear example demonstrating that the meaning of Decimal('1.000') is 1000 × 10⁻³ with -3 as the exponent.
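To illustrate the point, here is a minimal sketch showing that only the exponent of the second operand matters to quantize(); its digits are ignored. The variable names are illustrative, and the last line repeats the example from the quantize() docs:

```python
from decimal import Decimal

x = Decimal("0.0000005")

# Three second operands with different digits but the same exponent, -7.
by_string = x.quantize(Decimal("1.1111111"))
by_power = x.quantize(Decimal("1E-7"))
by_tuple = x.quantize(Decimal((0, (1,), -7)))

print(by_string, by_power, by_tuple)  # 5E-7 5E-7 5E-7
print(by_string.as_tuple().exponent)  # -7

# A coarser exponent rounds the value: 1.000 means 1000 * 10**-3.
coarse = x.quantize(Decimal("1.000"))
print(repr(coarse))                   # Decimal('0.000')

rounded = Decimal("1.41421356").quantize(Decimal("1.000"))
print(repr(rounded))                  # Decimal('1.414')
```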

I recommend closing this as not a bug. The current docs seem reasonable to me as well. Wording tweaks would just take us farther from the specification but not actually help a new reader with strong preconceptions and who may be starting in the middle of the module docs.

@skirpichev
Member

behind the entire decimal arithmetic specification

Not just decimal, but floating-point.

I recommend closing this as not a bug.

Yep, I'll do this.

One point is that we could add to the glossary a term like "floating-point" / "floating-point arithmetic" with an introduction to the basic concepts (exponent, significand, etc.) and then use it in some places across the docs. But I'm not sure if it's worth it.

@skirpichev skirpichev closed this as not planned Jan 3, 2025
@skirpichev skirpichev removed the pending The issue will be closed if no feedback is provided label Jan 3, 2025
@mdickinson
Member

@skirpichev

Not just decimal, but floating-point.

That would be nice, but the reality is unfortunately messier.

The IEEE 754 standard uses two different conventions, switching between them as convenient. The value of the exponent for any given floating-point number depends on whether the significand is being interpreted as an integer or whether it's interpreted as having one bit (or digit) before the binary (or decimal) point and the remaining bits (or digits) following the point. Somewhat frustratingly, the standard just uses the word "exponent" in both cases without any qualifying adjective, but it disambiguates via the notation used: the text uses 'e' to denote an exponent for the digit+fraction significand interpretation, and 'q' for the exponent when the significand is interpreted as an integer. So for example for the IEEE 754 binary64 format that CPython now expects, the maximum "e" exponent is 1023, while the maximum "q" exponent is 971.

By way of example, here's sys.float_info.max represented using the two different conventions - first with an "e" exponent of 1023, then with a "q" exponent of 971:

>>> import sys
>>> float.fromhex("1.fffffffffffff") * 2**1023
1.7976931348623157e+308
>>> 0x1f_ffff_ffff_ffff * 2.**971
1.7976931348623157e+308
>>> sys.float_info.max
1.7976931348623157e+308

Here's the definition of "exponent" from section 2.1 of the IEEE 754 standard:

The component of a finite floating-point representation that signifies the integer power to which
the radix is raised in determining the value of that floating-point representation. The exponent e is used
when the significand is regarded as an integer digit and fraction field, and the exponent q is used when the
significand is regarded as an integer [...]

To add to the potential for confusion, the C standards (see e.g., §5.2.4.2.2 of the C17 standard) use a different convention again, interpreting all the bits (or digits) of the significand as lying to the right of the binary (or decimal) point. With that convention, the maximum exponent for IEEE 754 binary64 is 1024 rather than 1023. That's why sys.float_info.max_exp is 1024, rather than 1023 or 971, and why the exponent returned by math.frexp for any given finite nonzero float x is one larger than the exponent that would be returned by an implementation of IEEE 754's logB operation.
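The three conventions can be compared side by side with a short sketch. It assumes an IEEE 754 binary64 float (the format the comment above says CPython expects); the names c_exp, e_exp and q_exp are illustrative labels, not standard terminology:

```python
import math
import sys

x = sys.float_info.max

# C convention: x == m * 2**c_exp with 0.5 <= m < 1.
m, c_exp = math.frexp(x)

# IEEE 754 "e" exponent: one bit before the binary point.
e_exp = c_exp - 1

# IEEE 754 "q" exponent: significand read as a 53-bit integer (binary64 assumed).
q_exp = c_exp - sys.float_info.mant_dig
int_significand = int(m * 2**sys.float_info.mant_dig)

print(c_exp, e_exp, q_exp)  # 1024 1023 971

# All three readings reconstruct the same float.
print(math.ldexp(m, c_exp) == x)               # True
print((2 * m) * 2.0**e_exp == x)               # True
print(int_significand * 2.0**q_exp == x)       # True
```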

It would be nice if there were standard language for describing the two (three, including the C standard) different meanings of exponent, but I'm not aware of anything widespread or standardised.

@mdickinson
Member

It would be nice if there were standard language for describing the two [...]

FWIW, for the specific case of the (rather niche) standard that Python's decimal module is based on, the wording "adjusted exponent" is used for what I've been calling the "e" exponent above (and the Decimal.adjusted() method gives direct access to that value), while "exponent" refers to the "q" exponent. But that use of "adjusted exponent" doesn't seem to be standard elsewhere.
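A brief sketch of the two values for the numbers in the original report (the variable names are illustrative). The exponent comes from as_tuple(), the adjusted exponent from Decimal.adjusted():

```python
from decimal import Decimal

# Map each input string to (exponent, adjusted exponent).
exponents = {}
for text in ("1.1111111", "0.000005", "0.0000005"):
    d = Decimal(text)
    exponents[text] = (d.as_tuple().exponent, d.adjusted())
    print(f"{text:>10}  exponent={exponents[text][0]:>3}  adjusted={exponents[text][1]:>3}")

#  1.1111111  exponent= -7  adjusted=  0
#   0.000005  exponent= -6  adjusted= -6
#  0.0000005  exponent= -7  adjusted= -7
```

The adjusted exponent of 1.1111111 is 0, which matches the reading in the original report, while quantize() works with the exponent, -7.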

@skirpichev
Member

That would be nice, but the reality is unfortunately messier.
The IEEE 754 standard uses two different conventions, switching between them as convenient.

It's true, but it's something that could probably be mentioned briefly, with a link to a more complete source (like the referenced Wikipedia article, which does mention this).

To add to the potential for confusion, the C standards

And the MPFR.

Maybe confusion is less likely for floats, and instead we should just inline the relevant part of the specification, or link to it, in the decimal intro section. E.g., right at the beginning:

Decimal “is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.” – excerpt from the decimal arithmetic specification.

BTW, this is not the first bug report I've seen where people are trying to do strange things with Decimals when they actually want some formatted string output. The docs lack any examples (here too). Maybe it's worth adding something to the "quick-start" section? (Fixed-point/scientific, precision, effect of the context's rounding mode.)

@ericvsmith
Member

BTW, this is not the first bug report I've seen where people are trying to do strange things with Decimals when they actually want some formatted string output. The docs lack any examples (here too). Maybe it's worth adding something to the "quick-start" section? (Fixed-point/scientific, precision, effect of the context's rounding mode.)

Yes, I think this is a good idea.

@skirpichev
Member

FYI: #128698 adds a reference to the spec and some formatting examples.
