Compress multiple run-time uniforms into one register. #72

Open
nomaddo opened this issue Mar 28, 2018 · 5 comments
Assignees
Labels
enhancement optimization related to an optimization step

Comments

@nomaddo
Collaborator

nomaddo commented Mar 28, 2018

The number of registers in VC4 is very limited.
One technique to work around this is to compress constants read from uniforms into a single register.

For example, the instruction mov r0, uniform fills every element of the vector register r0 with the same value.
If each element of r0 held a different value instead, we could save registers.
To read a single value back out, we use vector rotation with broadcast.

What we need to do:

  • (Decide which constants should be packed into a register)
  • Load the constants with specific set-flags
  • Load a constant from a register (which holds a different value in each element) using vector rotation (see p. 20, "Horizontal Vector Rotation", in the VideoCore IV Architecture Reference Guide).
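
The read-back step above can be sketched in plain Python. This is only a model of the idea, not VC4 code: the helper names (`broadcast`, `pack`, `rotate`, `unpack`) are hypothetical, the 16-element register width is taken from the VC4 QPU design, and the rotation direction is an assumption of the model.

```python
NUM_ELEMENTS = 16  # assumed width of one QPU vector register


def broadcast(value):
    """Model of `mov r0, uniform`: every element receives the same value."""
    return [value] * NUM_ELEMENTS


def pack(values):
    """Place one value per element; unused elements are filled with zero."""
    if len(values) > NUM_ELEMENTS:
        raise ValueError("at most 16 values fit into one register")
    return list(values) + [0] * (NUM_ELEMENTS - len(values))


def rotate(register, amount):
    """Rotate so that element i moves to element (i + amount) % 16."""
    return [register[(i - amount) % NUM_ELEMENTS] for i in range(NUM_ELEMENTS)]


def unpack(register, index):
    """Rotate packed element `index` into element 0, then broadcast it."""
    rotated = rotate(register, (NUM_ELEMENTS - index) % NUM_ELEMENTS)
    return broadcast(rotated[0])


packed = pack([10, 20, 30])   # three uniforms share one register
restored = unpack(packed, 1)  # all 16 elements now hold the second uniform
```

In this model, three broadcast registers are replaced by one packed register, at the price of a rotate plus a broadcast on every read.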
@doe300
Owner

doe300 commented Mar 29, 2018

Although this can definitely save registers, I don't think this is the direction we should go in.
The problem is that loading a constant takes a single instruction, while rotating a register takes at least 2 to 3.

Also, are you talking about compressing constants (loaded via loadi) or parameters (read from uniform)?

@nomaddo
Collaborator Author

nomaddo commented Mar 30, 2018

Sorry for the confusing explanation. I am talking about constants read from uniforms.
Yes, as you said, reading a value back from a compressed register is more expensive than usual.
This optimization should not be applied to constants loaded by ldi or mov.

@doe300 doe300 changed the title from "Compress multiple constants into one register." to "Compress multiple run-time uniforms into one register." Mar 30, 2018
@doe300 doe300 added the optimization related to an optimization step label Apr 7, 2018
@doe300 doe300 self-assigned this Apr 13, 2018
@nomaddo
Collaborator Author

nomaddo commented Apr 13, 2018

I think we can reduce the number of instructions needed to store values from uniforms.
Two instructions are enough to read a value from a uniform and pack it into the register.

https://github.com/nineties/py-videocore/blob/master/examples/sgemm_1thread.py#L37

This uses "load imm per-elmt signed" (page 33 in VC4C reference manual).

@doe300
Owner

doe300 commented Apr 13, 2018

I have never used the other variants of loadi, since I do not know exactly what they do.
Do I understand correctly that the lower half of the loaded value is a bit-mask selecting which element to write?

Then:

loadelem.setf out, 1 << index

would be equivalent to

xor.set out, index, elem_num
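
The claimed equivalence can be checked with a small Python model. It is a sketch under assumptions: the function names are hypothetical, and the per-element load is modelled as "element i receives bit i of the mask", which follows the reading of the instruction given in this comment. Note that in this model the two forms mark the same element with opposite polarity: the mask result is non-zero in the selected element, while the xor result is zero there, so the condition used on the following instruction would differ.

```python
NUM_ELEMENTS = 16  # assumed width of one QPU vector register


def select_by_mask(index):
    """Element i is selected when bit i of the mask (1 << index) is set."""
    mask = 1 << index
    return [((mask >> i) & 1) != 0 for i in range(NUM_ELEMENTS)]


def select_by_xor(index):
    """Element i is selected when (index xor element_number) is zero."""
    return [(index ^ i) == 0 for i in range(NUM_ELEMENTS)]


# Both forms pick exactly the same single element for every index.
same_selection = all(
    select_by_mask(k) == select_by_xor(k) for k in range(NUM_ELEMENTS)
)
```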

@nomaddo
Collaborator Author

nomaddo commented Apr 16, 2018

If all of the per-element MS (sign) bits are zero, you are right.
Come to think of it, if you only need to set the flag for a single element, we can just use isub, as in isub.set -, element_number, 2 followed by or.jzs out, unif, unif.
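
The two-instruction packing sequence can be modelled in Python as follows. This is a hedged sketch rather than generated VC4 code: the function names are hypothetical, and it assumes that the `.jzs` suffix makes the write conditional on the zero flag being set, so only the element whose number matches the subtracted constant is overwritten.

```python
NUM_ELEMENTS = 16  # assumed width of one QPU vector register


def zero_flags(target):
    """Model of `isub.set -, element_number, target`: per-element zero flag."""
    return [(i - target) == 0 for i in range(NUM_ELEMENTS)]


def conditional_write(register, flags, value):
    """Model of a zero-flag-conditional write: only flagged elements change."""
    return [value if flag else old for old, flag in zip(register, flags)]


def pack_uniforms(uniforms):
    """Pack each uniform into its own element, two modelled steps per value."""
    register = [0] * NUM_ELEMENTS
    for slot, value in enumerate(uniforms):
        flags = zero_flags(slot)                               # isub.set
        register = conditional_write(register, flags, value)  # or.jzs
    return register


packed = pack_uniforms([7, 8, 9])
```

Each uniform costs one flag-setting step and one conditional write in this model, which matches the "two instructions" estimate given earlier in the thread.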
