Safe number conversion? #1317

Closed
sbernard31 opened this issue Sep 30, 2022 · 4 comments
Labels
discussion Discussion about anything

Comments

sbernard31 commented Sep 30, 2022

I will use this issue as a kind of bookmark to store all information about number handling issues/questions in Leshan.

In a general way, our encoders/decoders have to deal with different kinds of number encoding.

By default, we should make sure those conversions are done without any information loss.
We began to do that in the NumberUtil class, but it is not yet fully done.
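
For illustration, a lossless narrowing check could look roughly like the following minimal sketch (plain Java, hypothetical helper name, not the actual NumberUtil API):

```java
import java.math.BigInteger;

public class SafeNumberConversion {

    /** Sketch: convert an arbitrary Number to long, refusing any lossy conversion. */
    public static long toLongExact(Number value) {
        if (value instanceof Long || value instanceof Integer
                || value instanceof Short || value instanceof Byte) {
            return value.longValue();
        }
        if (value instanceof BigInteger) {
            // longValueExact() throws ArithmeticException on overflow
            return ((BigInteger) value).longValueExact();
        }
        if (value instanceof Double || value instanceof Float) {
            double d = value.doubleValue();
            // a double holds an exact long value only if it is integral and within [-2^63, 2^63)
            if (Double.isNaN(d) || d < -0x1p63 || d >= 0x1p63 || d != Math.floor(d)) {
                throw new ArithmeticException(value + " cannot be converted to long without loss");
            }
            return (long) d;
        }
        throw new IllegalArgumentException("Unsupported number type: " + value.getClass());
    }

    public static void main(String[] args) {
        System.out.println(toLongExact(42.0d)); // 42
        System.out.println(toLongExact(0.5d));  // throws ArithmeticException
    }
}
```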

Maximize JSON/CBOR interoperability?
Should we limit the range and precision of numbers to IEEE 754 binary64 (double precision) to maximize interoperability by default?

This would affect the old JSON / SenML-JSON / SenML-CBOR encodings:

It seems this is a real-world question in the JSON world: FasterXML/jackson-databind#911

Some links about how to handle this kind of conversion safely:

A non-exhaustive list of Leshan issues related to that:

sbernard31 commented Feb 8, 2024

Another safe number conversion question, with number attributes: #1583

sbernard31 commented Jan 22, 2025

This issue aims to centralize very different problems related to number conversion.

In this comment, I talk only about: maximizing JSON/SenML-JSON interoperability by following the RFCs.

Should we limit the range and precision of numbers to IEEE 754 binary64 (double precision) to maximize interoperability by default? (See #916 for more details.)

(SenML-CBOR is probably not concerned by this.)

The two relevant points from the RFCs are:

In the interest of avoiding unnecessary verbosity and speeding up processing, the mantissa SHOULD be less than 19 characters long, and the exponent SHOULD be less than 5 characters long.

This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

Note that a double uses 52 bits for the mantissa, which translates to about 15-17 decimal digits of precision.
And a double can safely store integers in the range [-(2^53)+1, (2^53)-1].
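
For illustration, a small standalone Java example showing where this limit bites (not Leshan code):

```java
public class DoublePrecisionDemo {
    public static void main(String[] args) {
        long safe = (1L << 53) - 1;   // 9007199254740991, the largest "safe" integer
        long unsafe = (1L << 53) + 1; // 9007199254740993, not representable as a double

        System.out.println((long) (double) safe);   // 9007199254740991 (round-trips exactly)
        System.out.println((long) (double) unsafe); // 9007199254740992 (silently rounded)

        // A simple way to detect the loss: round-trip through double and compare.
        System.out.println((long) (double) unsafe == unsafe); // false
    }
}
```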

If we want to strictly follow the RFC recommendations, I understand that we should avoid sending numbers with too big a mantissa or exponent, so as to ensure they fit in a double, and probably also not accept such numbers.

The obvious benefit of that would be increased interoperability.
Why could not doing it cause interoperability issues?

  • We can easily imagine that several JSON libraries handle numbers as doubles.
  • We can imagine that some LWM2M implementations don't really consider this question and delegate it to the underlying JSON library (and so handle numbers as doubles too).
  • We can imagine that some JSON libraries just follow the RFC recommendation.

The major drawbacks of following those recommendations:

  • We would largely lose the benefit of having different kinds of numbers (float, integer, unsigned_integer) for different purposes.
  • We would lose information/precision and limit the range of integer and unsigned_integer.
  • This would happen silently, so users would not even be aware of it.

Of course, this would only happen for very large numbers (> 2^53 or < -2^53), so maybe only for a few use cases.
But when it happens, if the user is not aware of it, this loss of precision can lead to serious issues. 🤷

So I see 3 possible modes:

  1. we convert numbers to double automatically (an easy way to limit range and precision), with silent precision loss.
  2. we convert numbers to double automatically, but raise an exception if the conversion cannot be done without loss (see the sketch after this list).
  3. we just encode numbers without precision loss (this is more or less the currently implemented mode).
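
To make mode 2 a bit more concrete, here is a minimal sketch of what a "convert or fail" helper could look like (hypothetical names, not an existing Leshan API):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class LimitToDouble {

    /** Mode 2 sketch: return the value as a double, or throw if a double cannot represent it exactly. */
    public static double toDoubleExact(Number value) {
        if (value instanceof Double || value instanceof Float) {
            return value.doubleValue(); // float -> double widening is always exact
        }
        if (value instanceof BigInteger || value instanceof BigDecimal
                || value instanceof Long || value instanceof Integer
                || value instanceof Short || value instanceof Byte) {
            BigDecimal exact = new BigDecimal(value.toString());
            double d = exact.doubleValue();
            // new BigDecimal(d) is the exact decimal expansion of the double,
            // so any rounding during the conversion is detected here
            if (Double.isInfinite(d) || new BigDecimal(d).compareTo(exact) != 0) {
                throw new ArithmeticException(value + " cannot be encoded as a double without precision loss");
            }
            return d;
        }
        throw new IllegalArgumentException("Unsupported number type: " + value.getClass());
    }

    public static void main(String[] args) {
        System.out.println(toDoubleExact(1L << 53));       // 9.007199254740992E15
        System.out.println(toDoubleExact((1L << 53) + 1)); // throws ArithmeticException
    }
}
```

(Comparing through BigDecimal rather than a simple cast-and-compare round trip avoids false positives for values like Long.MAX_VALUE, where the cast saturates back to the original value.)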

Which of these modes should we implement?
Which one should be the default?

My personal opinion: any of these options seems OK to me:

  • we only implement 3.
  • we implement 2 and 3, with 2 as the default mode.
  • we implement 1, 2 and 3, with 2 as the default mode.

Any opinion?

@sbernard31

(Not directly linked to the comment above: #1692 aims to detect Number-to-double precision loss during conversion.)

@sbernard31

After some discussion with @jvermillard:

It is not so obvious that modes 1 and 2 are needed.

Mode 3 is already implemented, plus we have some number conversions which try to raise on suspicious conversions.
Let's see if users report conversion issues later, and reconsider the question if needed.
