
How to select appropriate number of bits for precision? #9

Open
danieleades opened this issue Mar 17, 2022 · 4 comments

Comments

@danieleades
Contributor

How do you select an appropriate number of bits to use for precision?

Related Stack Overflow question: https://stackoverflow.com/questions/71509576/what-is-the-optimum-precision-to-use-in-an-arithmetic-encoder

@danieleades
Contributor Author

From my own reading and experimentation, I'm finding that:

  • the precision bits must be at least 2 greater than the number of bits needed to represent the probability of each symbol
  • using more precision bits results in a better compression ratio
    • the gains are very minimal and saturate quickly
  • the total number of bits is constrained by the width of the type used (say u32 or u64)

I haven't investigated the performance implications.
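To make that concrete, here's a minimal sketch of how those two constraints combine to bound the precision for a given word width. This is a hypothetical helper, not this crate's API:

```rust
/// Hypothetical helper: given the word width (e.g. u32::BITS) and the
/// number of bits used for frequency counts, return the largest usable
/// precision, panicking if the constraints can't be satisfied.
fn max_precision(word_bits: u32, freq_bits: u32) -> u32 {
    // Products like `range * frequency` must fit in the word, so
    // precision + freq_bits can't exceed the word width.
    let precision = word_bits - freq_bits;
    // The precision must also exceed the frequency precision by at
    // least 2 bits, or the coder's range can underflow.
    assert!(
        precision >= freq_bits + 2,
        "word type too narrow for this many frequency bits"
    );
    precision
}

fn main() {
    // The classic 32-bit setup: 14-bit frequency counts leave 18 bits
    // of precision (WNC87 use 16 of them for the code value).
    assert_eq!(max_precision(u32::BITS, 14), 18);
}
```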

@cgbur
Owner

cgbur commented Mar 17, 2022

Check out the “Constraints on the Implementation” section of the paper referenced in the README. It discusses choosing values and the limitations of precision.

https://web.stanford.edu/class/ee398a/handouts/papers/WittenACM87ArithmCoding.pdf

@danieleades
Contributor Author

Thank you for the reference! Unfortunately that article doesn't add much that's new:

  • the precision bits must be at least 2 greater than the number of bits needed to represent the probability of each symbol
  • more bits is better
  • there are some 'special' numbers which lend themselves to faster computation (e.g. 16)

It still doesn't say anything about the rate of trade-off between precision and other factors. For example, why not use u64 instead of u32? Or even u128? In what regimes do the trade-offs matter?

For my own implementation I've chosen to set the precision automatically (to the maximum value), and to panic if there aren't enough bits to guarantee no overflow/underflow. I'm also considering making the integer representation generic, so you could effectively choose u32 or u64.
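A rough sketch of what the generic version could look like (hypothetical names, built on the num-traits crate's PrimInt trait rather than this crate's actual types):

```rust
use num_traits::PrimInt;

/// Hypothetical coder state, generic over its word type so callers can
/// choose u32, u64, etc.
struct Encoder<W: PrimInt> {
    low: W,
    high: W,
    precision: u32,
}

impl<W: PrimInt> Encoder<W> {
    /// Picks the maximum precision for `W` and panics if `freq_bits`
    /// leaves too little headroom to guarantee no overflow/underflow.
    fn new(freq_bits: u32) -> Self {
        let word_bits = (core::mem::size_of::<W>() * 8) as u32;
        let precision = word_bits - freq_bits;
        assert!(
            precision >= freq_bits + 2,
            "not enough bits for this word type"
        );
        Encoder {
            low: W::zero(),
            high: W::max_value(),
            precision,
        }
    }
}

// let enc32: Encoder<u32> = Encoder::new(14); // ok: 18 >= 16
// let enc16: Encoder<u16> = Encoder::new(14); // panics: 2 < 16
```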

I'm also interested in experimenting to see whether this can be checked at compile time using const generics... (hopefully without polluting the API too badly)

see https://github.com/danieleades/arithmetic-coding/pull/7/files#diff-9a5ef4cd6ac7a66abd842b474ee88447516ed3bd218748eb6f387a0895d14fcb
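For reference, this is roughly the shape a const-generics check could take. It's an illustrative sketch with made-up parameter names, not the code in the linked PR:

```rust
/// Hypothetical encoder whose precision and frequency bits are const
/// parameters, so the safety constraints can be checked at compile time.
pub struct Encoder<const PRECISION: u32, const FREQ_BITS: u32>;

impl<const PRECISION: u32, const FREQ_BITS: u32> Encoder<PRECISION, FREQ_BITS> {
    // Evaluated during monomorphization: compilation fails for any
    // parameter pair that can't guarantee no overflow/underflow
    // (assuming a u64 word here).
    const VALID: () = assert!(
        PRECISION >= FREQ_BITS + 2 && PRECISION + FREQ_BITS <= u64::BITS,
        "PRECISION must be at least FREQ_BITS + 2 and fit within the word"
    );

    pub fn new() -> Self {
        // Referencing the const forces the check to run at compile time.
        let () = Self::VALID;
        Self
    }
}

// Encoder::<18, 14>::new() compiles; Encoder::<10, 14>::new() does not.
```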

@cgbur
Owner

cgbur commented Jan 28, 2023

Hey, sorry I went so absent on this project and haven't been keeping up. I just want to drop a link to https://github.com/Cyan4973/FiniteStateEntropy in case you are still interested in lossless probability symbol coders. It's very, very fast, and the ANS version compresses as well as arithmetic coding but is much faster. AFAIK it lacks per-symbol updating of the symbol table during encoding?
