Fk refactor #1936
I guess that if you run 1 replica on a GPU it would be faster, right?
(Otherwise it makes no sense that running 1 replica would be slower than running 100.)
n3fit/src/n3fit/layers/DY.py (outdated)
```python
# the masked convolution removes the batch dimension
ret = op.transpose(self.operation(results))
return op.batchit(ret)
self.compute_observable = compute_observable
```
Same comment as before: having these functions as module-level functions would be great.
I would also expect a speed-up and memory reduction since they will be shared by different observables... You might want to put a @tf.function decorator on them.
... unless the speed-up is coming from a memory trade-off by having these functions be observable-specific? But I would hope not...
If that were the case you can still compile them when you attach them to the given layer by doing
`self.compute_observable = tf.function(compute_observable)`
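For concreteness, a minimal sketch of that suggestion, assuming a module-level contraction shared by the observable layers; the names, shapes, the toy Keras layer and the einsum string are illustrative, not the actual n3fit code:

```python
import tensorflow as tf


@tf.function
def compute_dy_observable_many_replicas(pdf, masked_fk):
    # pdf: (batch, replicas, x, flavour); masked_fk: (ndata, x, f, y, g)
    return tf.einsum("brxf,nxfyg,bryg->brn", pdf, masked_fk, pdf)


class ToyDY(tf.keras.layers.Layer):
    def build(self, input_shape):
        # attach the shared, module-level (and already compiled) function
        self.compute_observable = compute_dy_observable_many_replicas
        super().build(input_shape)

    def call(self, inputs):
        pdf, masked_fk = inputs
        return self.compute_observable(pdf, masked_fk)
```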
I think having a decorator or not shouldn't matter, as they're already used in a call which should get compiled, but I can do a test.
I don't think the speedup has to do with them being observable-specific, at least not intentionally.
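A quick illustration of that point (toy names, not the n3fit layers): a plain Python helper called from inside a tf.function-compiled call is traced into the same graph, so it does not strictly need its own decorator.

```python
import tensorflow as tf


def helper(x):
    # no @tf.function here
    return tf.reduce_sum(x * x)


class ToyLayer(tf.keras.layers.Layer):
    @tf.function
    def call(self, x):
        # helper is traced as part of this compiled graph anyway
        return helper(x)


print(ToyLayer()(tf.constant([1.0, 2.0, 3.0])))  # tf.Tensor(14.0, shape=(), dtype=float32)
```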
n3fit/src/n3fit/layers/observable.py (outdated)
```
@@ -42,36 +43,104 @@ def __init__(self, fktable_data, fktable_arr, operation_name, nfl=14, **kwargs):
        super(MetaLayer, self).__init__(**kwargs)

        self.nfl = nfl
        self.num_replicas = None  # set in build
        self.compute_observable = None  # set in build
```
Add a comment about compute_observable being a function with signature (pdf, masked_pdf) that needs to be overwritten by any children of this class.
Actually, maybe it makes sense to make compute_observable an abstract method, and then in DIS.py and DY.py the choice of which (outside) function to use becomes:

```python
def compute_observable(self, pdf, fk):
    if self._one_replica:
        return _compute_dis_observable_one_replica(pdf, fk)
    return _compute_dis_observable_many_replica(pdf, fk)
```
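A hedged, self-contained sketch of how that abstract-method pattern could look; the class and helper names below are illustrative stand-ins, not the actual n3fit hierarchy:

```python
from abc import ABC, abstractmethod

import tensorflow as tf


def _compute_dis_observable_many_replica(pdf, fk):
    # pdf: (batch, replicas, x, flavour); fk: (ndata, x, flavour)
    return tf.einsum("nxf,brxf->brn", fk, pdf)


def _compute_dis_observable_one_replica(pdf, fk):
    # specialised 1-replica path: drop the batch/replica dims, contract, add them back
    res = tf.tensordot(fk, pdf[0, 0], axes=[(1, 2), (0, 1)])  # -> (ndata,)
    return res[None, None, :]


class ObservableBase(ABC):
    @abstractmethod
    def compute_observable(self, pdf, fk):
        """Signature (pdf, fk); must be overwritten by the DIS/DY children."""


class DISLike(ObservableBase):
    def __init__(self, one_replica):
        self._one_replica = one_replica

    def compute_observable(self, pdf, fk):
        if self._one_replica:
            return _compute_dis_observable_one_replica(pdf, fk)
        return _compute_dis_observable_many_replica(pdf, fk)
```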
I added the comment; I wanted to avoid these if statements in the call. It probably doesn't matter, but I thought it looked cleaner. I can make it an abstract method if you prefer.
Thanks for the comments, I addressed them all. I will do some timings after these changes, with vs without a decorator on the compute_observables, as well as 1 replica on the GPU.
I have no idea why, but the 1 replica case became significantly faster: 92s (vs 118 before, both on CPU). 1 replica on the GPU takes 65 seconds, indeed faster than 100 of course, but the scaling is great.
It might be able to compile the functions differently now? Maybe playing with the ...
Indeed, a factor of 100 in exchange for a factor of 1.5!!!!
It looks good, thank you!
I have just one question... why is masking the PDF instead of the fktable faster in the case of one replica? Could this also have been a fluke of the CPU profiling?
It might be worthwhile to test again, because part of the complexity here comes from that. If that limitation is lifted this would be super clear and elegant!!
n3fit/src/n3fit/layers/DIS.py (outdated)
```python
    Same operations as above but a specialized implementation that is more efficient for 1 replica,
    masking the PDF rather than the fk table.
    """
    # TODO: check if that is actually true
```
Since you are already testing the timings, and we are already free from the yoke of 4.0.9 (which was supposed to be 7... so only two versions out, not bad)... why don't you remove the conditional and try to use the "multireplica" version in all of them?
Good point, I just tested using the many-replica version in all the observables. On the GPU it's actually faster, 53 seconds. On the CPU it's way slower, 330 seconds.
There are two factors here: one is einsum vs tensordot, the other is the order of contractions for DIS.
Masking the PDF has the downside that it cannot be precomputed, but the upside that it reduces the number of flavours. Masking the fk table only has to be done once, but it enlarges it (perhaps masking is not the best word in this case; it's expanding it into all flavours with zeroes).
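As a self-contained illustration of that trade-off (the shapes, the flavour basis and the DIS-like contraction below are purely illustrative, not the actual n3fit code), both strategies give the same observable:

```python
import numpy as np
import tensorflow as tf

nfl, nbasis, nx, ndata = 14, 9, 30, 20
basis = np.sort(np.random.choice(nfl, nbasis, replace=False))  # flavours the fktable needs
fk = tf.random.uniform((ndata, nx, nbasis))                    # DIS-like fktable
pdf = tf.random.uniform((1, 1, nx, nfl))                       # (batch, replica, x, flavour)

# Strategy 1: mask the PDF on every call -> smaller contraction (the 1-replica path)
small_pdf = tf.gather(pdf, basis, axis=3)
obs1 = tf.einsum("nxf,brxf->brn", fk, small_pdf)

# Strategy 2: pad the fktable once to all 14 flavours with zeroes -> bigger, but precomputed
scatter = np.zeros((nbasis, nfl), dtype=np.float32)
scatter[np.arange(nbasis), basis] = 1.0
padded_fk = tf.einsum("nxf,fg->nxg", fk, tf.constant(scatter))
obs2 = tf.einsum("nxg,brxg->brn", padded_fk, pdf)

np.testing.assert_allclose(obs1.numpy(), obs2.numpy(), rtol=1e-4)
```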
I've also tested the approach of masking the fk table but using tensordot, for 1 replica on the CPU:
```python
def compute_dy_observable_one_replica(pdf, masked_fk):
    pdf = pdf[0][0]  # yg
    fk_pdf = op.tensor_product(masked_fk, pdf, axes=[(3, 4), (0, 1)])  # nxfyg, yg -> nxf
    observable = op.tensor_product(fk_pdf, pdf, axes=[(1, 2), (0, 1)])  # nxf, xf -> n
    return op.batchit(op.batchit(observable))  # brn
```
This takes 177 seconds on the CPU, and again 53 on the GPU.
I don't fully understand these timings, but unfortunately we'll have to keep the branching, unless you're prepared to completely do away with CPU runs, which I don't think is the case.
Edit, for clarity: 1-replica timings in seconds (CPU \ GPU):

| | einsum | tensordot |
|---|---|---|
| mask pdf | - | 92 \ 65 |
| mask fk | 330 \ 53 | 177 \ 53 |
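For reference, a hedged sketch of an einsum formulation of the snippet above, written with plain TensorFlow in place of n3fit's op wrapper (illustrative only, not the code that produced the timings in the table):

```python
import tensorflow as tf


def compute_dy_observable_one_replica_einsum(pdf, padded_fk):
    # pdf: (batch=1, replica=1, x, flavour); padded_fk: (ndata, x, f, y, g)
    pdf = pdf[0][0]                                    # (x, f), also used as (y, g)
    observable = tf.einsum("nxfyg,xf,yg->n", padded_fk, pdf, pdf)
    return observable[None, None, :]                   # add back batch/replica dims -> (1, 1, ndata)
```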
> Masking the fk table only has to be done once, but it enlarges it (perhaps masking is not the best word in this case, it's expanding it into all flavours with zeroes).

Ah!! Ok, I was misled by the masking word. From this everything else makes sense.
Sadly we cannot avoid CPU runs, because that will still be what most people use.
Could you add basically this comment as a comment at the top of the DIS.py module, for instance, together with the version of TensorFlow that you used?
We can then revisit it in the future. The important thing is that in the 1-replica case using einsum adds a factor of 2 (and the multireplica version needs einsum). And since we already have the branching, we might as well mask the PDF for an extra factor of two.
Thank you for these checks.
Added something, is that what you meant? Perhaps I should change masked_fk to something like padded_fk?
Also did a quick check in multireplica: using the observable code as it is now (so tensordot and masking the pdf), but using einsum in multidense, it comes out at 115 seconds, so also slower.
I've rewritten it as ...
Even better. IMHO mask was ok once the clarification was added.
Oh, I thought this was tracking master. Is it fine for you if I rebase this one on master, and then the fix for 2.16 on this one?
Yep, that's fine. I rebased on master a while ago, but not recently.
Commits: "Add timing comment", "additional comment" (Co-authored-by: Juan M. Cruz-Martinez <[email protected]>)
Greetings from your nice fit 🤖!
Check the report carefully, and please buy me a ☕, or better, a GPU 😉!
The idea
It's a relatively small change, only affecting the observable layers: it changes a bit the order in which indices are contracted, and switches from a boolean mask to a float mask.
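A hedged sketch of what the boolean-mask to float-mask change amounts to (illustrative shapes, not the actual layer code): selecting flavours with a boolean mask is a gather-like op, while a 0/1 float mask turns the same selection into a matmul/einsum that can be folded into the following contraction.

```python
import tensorflow as tf

nx, nfl, nbasis = 30, 14, 9
pdf = tf.random.uniform((nx, nfl))
bool_mask = tf.constant([i < nbasis for i in range(nfl)])  # keep the first nbasis flavours

# boolean mask: gather-like op, output shape (nx, nbasis)
masked_bool = tf.boolean_mask(pdf, bool_mask, axis=1)

# float mask: a (nfl, nbasis) 0/1 matrix, so the same selection becomes a matmul
float_mask = tf.constant(
    [[1.0 if (j == i and i < nbasis) else 0.0 for j in range(nbasis)] for i in range(nfl)]
)
masked_float = pdf @ float_mask                            # also (nx, nbasis)

tf.debugging.assert_near(masked_bool, masked_float)
```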
Performance
Timings for 1000 epochs of the main runcard (NNPDF40_nnlo_as_01180_1000), on Snellius, with 100 replicas on the GPU or 1 replica on the CPU. In brackets, the GPU memory used.
Profile
The validation step will be addressed in #1855 and the gaps in #1802.