Multi dense layer #1905
Conversation
Greetings from your nice fit 🤖! Check the report carefully, and please buy me a ☕, or better, a GPU 😉!
Please update the reference fit for the fitbot report with NNBOT-1c97e2a73-2024-02-16
tbh, since we have regression tests for "exactly the same fit", we might want to keep the fitbot reference fixed between tags (unless big changes happen) so as to show the cumulative change... We should discuss this point in AMS.
Putting it that way, there are two use-cases for the fitbot:
Given that to assess point one we will most likely end up looking at a global fit anyway, using the latest published version as a reference, I think the fitbot provides more value for point 2. We could of course also ask the bot for two reports (assuming that doesn't break the GitHub Actions time constraints)... We can indeed discuss in AMS and leave it for now.
tbh, this is key. There was a time when we were hitting the time constraint (which is 6 hours, I think?) and the bot was the maximum we could get away with under that time. Now it takes about one hour, so we can safely add more reports.
I don't have further comments in addition to those that have been raised above. It looks great to me!
Great, thanks for the reviews everyone, so we leave the fitbot as-is for now and I can merge this? Btw, about the redo-regressions workflow, it's not working perfectly: it creates a new commit but doesn't trigger the other tests again, so if it's the final commit the PR cannot be merged. (Here I put the label on while the tests were just starting, and they do continue, but you have to go to the previous commit to see them.) Not sure what the best solution is, but the simplest, which I did here, is just to make another (trivial) commit.
I think it is fine to force merge the PR provided the previous checks all passed other than the regression. By the time the regression label is used the PRs should be well tested. |
The bot probably doesn't have the right privileges, similar to people who are not members of the NNPDF GitHub organisation. A solution would probably be to create a token with those privileges and allow the GitHub action to use that when pushing.
No, I think GitHub Actions cannot trigger more actions by design.
Yes, because it doesn't have the permissions. I think something like this fine-grained-personal-access-token might solve it. That may make things riskier if we're not careful.
I think we can simply add a
Maybe it has changed or I'm misremembering, but I think at some point it was simply not possible because it could easily cause an infinite recursion. |
I see, perhaps you're right, I didn't look that far into it. |
Main idea: MultiDense
As discussed already in several places, the point of this PR is to merge the multiple replicas in the tightest way possible, which is at the level of the `Dense` layer, here implemented as a `MultiDense` layer.
The essence is this line and the lines around it. We extend the layer's weights from shape `(in_units, out_units)` to shape `(replicas, in_units, out_units)`, or `(r, f, g)` for short.
The initial input at the first layer does not have a replica axis; its shape is `(batch, n_gridpoints, features)`. In this case the linked lines become `einsum("bnf, rfg -> brng")`.
Every layer thereafter will have a replica axis. This simply adds an `"r"` to the first term in the einsum, to give `einsum("brnf, rfg -> brng")`. This applies the weights of replica i to the ith component of the input, that is, to the previous layer's output corresponding to the same replica i. So it acts identically to the previous case, just more optimized.
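A minimal sketch of the two contractions, assuming TensorFlow; all sizes below are made up for illustration, not the PR's defaults:

```python
import tensorflow as tf

# Illustrative sizes: r replicas, batch b, n grid points,
# f input units, g output units (assumptions for this example).
r, b, n, f, g = 100, 1, 50, 8, 25
w1 = tf.random.normal((r, f, g))  # first layer weights, (r, f, g)
w2 = tf.random.normal((r, g, g))  # next layer weights, (r, g, g)

# First layer: the input has no replica axis yet.
x0 = tf.random.normal((b, n, f))                 # (b, n, f)
h = tf.einsum("bnf, rfg -> brng", x0, w1)        # -> (b, r, n, g)

# Later layers: the input carries the replica axis, and replica i's
# weights act only on replica i's slice of the input.
out = tf.einsum("brnf, rfg -> brng", h, w2)      # -> (b, r, n, g)
```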
Weight initialization: MultiInitializer
After all the refactorings before this, it is quite simple to initialize the weights in the same manner as is done now. A list of seeds is given, one per replica, along with an initializer which has a seed of its own, to which the per-replica seeds are added (so we can differentiate the different layers). A custom `MultiInitializer` class takes care of resetting the initializer to a given replica's seed, creating that replica's weights, and stacking everything into a single weight tensor.
Note that many initializers' statistics depend on the shape of the input, so just using a single initializer out of the box will not only give different results because it is seeded differently, it will actually be statistically different.
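A minimal sketch of that idea, assuming a Keras-style initializer; the names `initializer_class`, `base_seed` and `replica_seeds` are illustrative, not necessarily the PR's exact API:

```python
import tensorflow as tf

class MultiInitializer(tf.keras.initializers.Initializer):
    """Sketch: build each replica's weights with its own seed, then
    stack them along a leading replica axis into one tensor."""

    def __init__(self, initializer_class, base_seed, replica_seeds):
        # base_seed differentiates layers; replica_seeds differentiate replicas
        self.initializer_class = initializer_class
        self.seeds = [base_seed + s for s in replica_seeds]

    def __call__(self, shape, dtype=None):
        # `shape` is the single-replica shape, e.g. (in_units, out_units),
        # so shape-dependent statistics (fan-in/fan-out) match those of a
        # per-replica Dense layer exactly.
        per_replica = [
            self.initializer_class(seed=seed)(shape, dtype=dtype)
            for seed in self.seeds
        ]
        return tf.stack(per_replica, axis=0)  # (replicas, in_units, out_units)

# Usage sketch:
init = MultiInitializer(tf.keras.initializers.GlorotUniform,
                        base_seed=7, replica_seeds=[0, 1, 2])
weights = init((8, 25))  # -> shape (3, 8, 25)
```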
Dropout
Naively applying dropout to multi-replica outputs will not consistently mask an equal fraction of each replica.
A simple and sufficient solution is to define dropout without the replica axis, and just broadcast to the replica dimension.
This is actually sort of supported already: you can subclass the `Dropout` layer and override the method `_get_noise_shape`, putting a `None` where you want it to broadcast.
Note that while this would turn off the same components in every replica, there is no meaning or relation to the order of the weights, so that should be completely fine.
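For illustration (the update below explains this ended up being unnecessary), the same broadcast can be obtained with the public `noise_shape` argument, where in Keras a `1` broadcasts an axis and `None` keeps the input's own dimension; the shapes here are assumed:

```python
import tensorflow as tf

# Input assumed to be (batch, replicas, n_gridpoints, features);
# a 1 on the replica axis shares one dropout mask across all replicas.
dropout = tf.keras.layers.Dropout(rate=0.2, noise_shape=(None, 1, None, None))

x = tf.random.normal((1, 100, 50, 25))
y = dropout(x, training=True)  # identical components zeroed in every replica
```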
Update: Actually, this is not necessary at all. I thought dropout always sets a fixed fraction to zero, but it actually works independently per element, so it is completely fine to use the standard dropout.
Integration
I'm not sure what the best way of integrating this into the existing framework is. What I've done now is to create an additional `layer_type`, `"multi_dense"`, that will have to be specified in the runcard to enable this. Previous behaviour with both `layer_type="dense"` and `layer_type="dense_per_flavour"` should be unaffected; the overhead to keep it like that is manageable.
The upside of course is that if later changes become too complicated with this layer, you can always go back to the standard one.
The downside though is that it creates yet another code path, and everything will have to be tested separately.
Alternatively it could just replace the current `"dense"` layer type entirely; not sure if there is a nice middle ground.
Update: After discussing briefly with Roy, we agreed it's not necessary to keep the old dense layer. Later I saw that it actually kind of is, as it is used under the hood in `"dense_per_flavour"` as well. So I have renamed that to `"single_dense"`, and the new layer here to just `"dense"`.
Tests
I have two unit tests: one shows that weight initialization is identical to standard dense layers. The second shows that the output on a test input is the same, up to what I think are round-off errors.
Currently the CI is passing almost completely, with the only exception of a single test in Python 3.11, a regression test, where one of the elements has a relative difference of 0.015, which is bigger than the tolerance of 0.002.
I assume this is just an accumulation of round-off differences; I have no idea what else it could be.
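A sketch of the second check, assuming TensorFlow; the sizes and seeds are illustrative, not the PR's actual test code:

```python
import numpy as np
import tensorflow as tf

x = tf.random.normal((1, 50, 8))  # (batch, n_gridpoints, features)

# One standard Dense layer per replica, each seeded separately.
outputs, kernels = [], []
for seed in (0, 1, 2):
    layer = tf.keras.layers.Dense(
        25, use_bias=False,
        kernel_initializer=tf.keras.initializers.GlorotUniform(seed=seed),
    )
    outputs.append(layer(x))      # (b, n, g) for this replica
    kernels.append(layer.kernel)  # (f, g)

# The einsum-based multi-replica contraction over stacked weights.
weights = tf.stack(kernels, axis=0)                # (r, f, g)
multi = tf.einsum("bnf, rfg -> brng", x, weights)  # (b, r, n, g)

# Both paths should agree up to round-off.
np.testing.assert_allclose(
    multi.numpy(), tf.stack(outputs, axis=1).numpy(), rtol=1e-5
)
```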
Comparison with new baseline: https://vp.nnpdf.science/FGwCRitnQhmqBYBjeQWS7Q==/
Timings
I have done some timing tests on the main runcard (`NNPDF40_nnlo_as_01180_1000`), on Snellius, with 100 replicas on the GPU or 1 replica on the CPU. For completeness I'm also comparing to an earlier PR which made a big difference to performance, and the state of master just before that was merged. I still need to run the current master on the GPU.
Status:
I need to do full fit comparisons; apart from that, it's ready for review.