
[Feature] ISS-60: Implement Self Extend #431

Open · wants to merge 11 commits into dev from feature/ISS-60/implement-self-extend
Conversation

@jonpsy commented Oct 18, 2024

#60

  • Single Query case (MHA and GQA)
  • Batch query case (MHA and GQA)
  • Eval strategy defined
  • Test suite written
  • Main dev done

google-cla bot commented Oct 18, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@jonpsy (Author) commented Oct 18, 2024

@jan-wassenberg would need your CR here, it's in the alpha stage right now. Let's go back and forth on this. Thanks!

@jonpsy force-pushed the feature/ISS-60/implement-self-extend branch from 4bfe885 to 5a2a7ee on October 18, 2024 08:12
@jan-wassenberg (Member)

Ooh nice :) Please note that we can only take pull requests on the dev branch. That code has just changed to replace template arguments with a runtime argument. Would you mind updating/rebasing your code to that?

@jonpsy changed the base branch from main to dev on October 19, 2024 04:45
@jonpsy force-pushed the feature/ISS-60/implement-self-extend branch from 7bb4e0b to 8cf3966 on October 19, 2024 06:12
@jonpsy (Author) commented Oct 19, 2024

My bad, let me do the needful! Thanks for the pointer though.

@jonpsy (Author) commented Oct 19, 2024

Haha, I took so long to understand how the main branch worked, and now I have to re-do it on this new base branch.

@jonpsy (Author) commented Oct 19, 2024

Note to self: I was able to compile the gemma dev branch by commenting out the
tls.stats.Notify(stats); line in compress-inl.h, for clang version arm64-apple-darwin23.4.0.

I had to do this because the compiler strictly rejects passing non-trivial arguments to a variadic method. Maybe I could have disabled that check in clang with -Wno-non-pod-varargs, but it didn't work for me.

^^ This issue should be resolved now
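For anyone hitting the same error: the underlying rule is that C-style variadic functions cannot portably receive non-trivial class types such as std::string, and clang treats this as a hard error on some targets (the -Wnon-pod-varargs diagnostic). A minimal illustration of the pattern and the usual fix, unrelated to the gemma.cpp code itself:

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// A small vsnprintf-based formatter. Any std::string argument must be
// passed as .c_str(): a non-trivial class type in a C vararg position is
// only conditionally supported, which is what clang rejects.
std::string Format(const char* fmt, ...) {
  char buf[256];
  va_list args;
  va_start(args, fmt);
  vsnprintf(buf, sizeof(buf), fmt, args);
  va_end(args);
  return std::string(buf);
}

// Format("%s", std::string("x"))          // ill-formed: non-POD vararg
// Format("%s", std::string("x").c_str())  // OK: passes const char*
```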

@jan-wassenberg (Member) left a comment

Sorry about the code change. We are moving toward an "all in one file" model. Thanks for rebasing!

gemma/gemma-inl.h (outdated; resolved)
gemma/configs.h (outdated)
@@ -127,6 +127,10 @@ struct LayerConfig {
   size_t conv1d_width = 0;
   bool ff_biases = false;
   bool softmax_attn_output_biases = false;
+  bool self_extend = false;
+  size_t ngb_size = 0;
@jan-wassenberg (Member):

Is this n-gram block? Maybe expand it to block_size for more clarity? We could also move these three new fields into their own section (just a newline before them) with a // Self-extension comment.

@jonpsy (Author):

@jan-wassenberg Sorry, I didn't understand. I did it here because LayerConfig gets accessed during the attention mechanism.

@jan-wassenberg (Member):

Sorry to be unclear; I was suggesting you consider renaming this to ngram_block_size.
And it would be good to add a newline plus "// Self-extension" comment for visual separation from the other fields in this struct.

@jonpsy (Author):

Oh, ngb is short for neighbour; I see the point of confusion now.

@jan-wassenberg (Member):

Ah :) Generally it's good to write out words.

@jonpsy (Author):

Understood, I'll make the change.

@jan-wassenberg (Member) left a comment

Sorry to reply late :)

gemma/gemma-inl.h (outdated; resolved)
@jonpsy (Author) commented Nov 1, 2024

Nit: I noticed when I ran clang-format on gemma-inl.h that some already-committed lines were re-formatted. For now, I kept the committed lines as they were and only formatted my own code.

I suppose the team has some internal flow to run clang-format on the entire repo before merging that I'm not aware of, so I'll leave that decision to you.

@jan-wassenberg (Member)

No worries; either the result of clang-format or leaving it unchanged is fine.
Our IDEs indeed auto-format on saving.

@jonpsy (Author) commented Nov 1, 2024

Hm, looks like a static_cast is failing on gcc/Ubuntu.

@jan-wassenberg (Member)

Hm, the build error I'm currently seeing seems to be due to an extra/unnecessary & in line 309, "const hwy::Divisor&". We want to construct one instance, not a reference.

@jonpsy (Author) commented Nov 5, 2024

EDIT (IGNORE BELOW): It took longer than expected, but here's a tech doc; it will be easier to collaborate there. I pasted the same comment there; let me know if I got your email wrong?

@jan-wassenberg Let me highlight an issue here:

Summary: I want to mutate ModelConfig in runtime

Background:
Currently in run.cc we take the input model and match it against the predefined models (kModelFlags) defined in common.cc. There is also a comment about loading ModelConfig from the model itself in the future. This will make it highly static.

Say I want to modify Gemma 2B to run self-extend with some group size: I should be able to configure this at runtime, without hard-coding anything in the config file. That's the entire point of the paper (being able to increase the context window at inference time).
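As background, the remapping the paper describes changes no weights and needs only two scalars, which is why it is natural as a runtime knob. A self-contained sketch of the idea, with illustrative names rather than this PR's identifiers:

```cpp
#include <cstddef>

// Relative position used for the attention score between a query at
// q_pos and a key at k_pos (assumes k_pos <= q_pos), following the
// Self-Extend idea (Jin et al., 2024): keys within neighbor_size tokens
// keep their exact relative position; more distant keys fall back to a
// grouped (floored) position, shifted so the grouped region continues
// where the neighbor window ends.
size_t SelfExtendRelPos(size_t q_pos, size_t k_pos,
                        size_t neighbor_size, size_t group_size) {
  const size_t distance = q_pos - k_pos;
  if (distance <= neighbor_size) {
    return distance;  // normal attention inside the neighbor window
  }
  // Grouped attention: relative position of the floored group indices,
  // shifted by (neighbor_size - neighbor_size / group_size) for continuity.
  const size_t grouped = q_pos / group_size - k_pos / group_size;
  return grouped + neighbor_size - neighbor_size / group_size;
}
```

Because the largest remapped position grows roughly as q_pos / group_size, positions beyond the trained context window stay within the range the model saw during training.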

My proposal:

Approach 1: Modification of ModelConfig on runtime

A basic approach would be to allow modification of ModelConfig via RuntimeConfig, consuming only the necessary parameters from it. Then we could define:

class ModelConfig {
 public:
  int MutateModelConfig(const RuntimeConfig& runtime_config) {
    // do some validation

    // consume self-extend params, and modify the model config accordingly
    this->layer_configs = runtime_config.layer_configs;
    return 0;
  }
};

Pros:

  • No changes in Gemma side, and the definition of "ModelConfig" remains intact (i.e. defines model behaviour).
  • Allows leeway for future runtime changes to ModelConfig behaviour, which I don't think is far-fetched

Cons:

  • The work-done-versus-reward ratio seems weak

Approach 2: Create member variables for Gemma

  • Basically, allow Gemma to store specific values from runtime_config in member variables, i.e. self_extend_, ngb_size_, grp_size_

Pros:

  • Simple to implement
  • Keeps the sanctity of "runtime" behaviour

Cons:

  • Additional member variables (Should we create a class to hold generic behaviour altering variables like these?)

@jan-wassenberg (Member)

Hm, I understand. FYI, ModelConfig has some larger changes coming up in order to allow serializing it to disk.
Let's therefore minimize the number of changes to ModelConfig itself.
How about in ModelWeightsStorage and Gemma we add a MutableModelConfig() accessor function that returns a non-const reference that we can modify?

@jonpsy (Author) commented Nov 10, 2024

Hi @jan-wassenberg, thanks for the prompt reply. I suppose you mean we do something similar to how the config is accessed currently?

// gemma.h
ModelConfig& GetMutableConfig() { return model_.MutableConfig(); }
// weights.h
ModelConfig& MutableConfig() { return config_; }

In LayerConfig, define it as

// configs.h
class LayerConfig {

  /**
   * Self-extend
   * Jin, Hongye, et al. "Llm maybe longlm: Self-extend llm context window without tuning." arXiv preprint arXiv:2401.01325 (2024).
   */
  bool self_extend = false;
  // Self-extend neighbor size
  size_t se_neighbor_size = std::numeric_limits<size_t>::max();
  // Self-extend group window size
  size_t se_group_size = 1;

In our app we should allow storing the self-extend params, something like this:

struct LoaderArgs {
  // Self-extend
  Tristate self_extend;
  size_t se_group_size;
  size_t se_neighbor_size;
}

Finally, in run.cc we can access the mutable config and fill in the parameters from LoaderArgs:

// run.cc

void ApplySelfExtendIfGiven(Gemma& model, const LoaderArgs& loader) {
  ModelConfig& config = model.GetMutableConfig();
  if (loader.self_extend != Tristate::kTrue) {
    return;
  }

  // Modify the layer configs in-place
  auto& layer_configs = config.layer_configs;
  std::transform(layer_configs.begin(), layer_configs.end(),
                 layer_configs.begin(), [&loader]() {
    layer_config.self_extend = loader.self_extend;
    layer_config.se_group_size = loader.se_group_size;
    layer_config.se_neighbor_size = loader.se_neighbor_size;
  });
}

void Run(LoaderArgs& loader, InferenceArgs& inference, AppArgs& app) {
  // post CreateGemma
  Gemma model = CreateGemma(loader, pools);
  ApplySelfExtendIfGiven(model, loader);

⚠️ There's a minor issue here: LayerWeightsPtrs holds a const ref to LayerConfig, so it will hold on to the previous version. I don't see it being used currently, though.

@jan-wassenberg (Member)

Nice, this looks good to me, except that the lambda's layer_config argument seems to have been omitted?

LayerWeightsPtrs.layer_config is used by Reshape in weights.h.
I believe this is fine: the LayerConfig are stored in a vector owned by ModelConfig, and we could modify them there. Any existing const-reference to them can be thought of as a pointer, so they will see any updates to the underlying storage, made via your new Mutable() accessor function. Does that make sense?
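The aliasing point above can be shown with a toy example (illustrative types, not the actual gemma.cpp structs): a const reference into a vector aliases the underlying storage, so updates made through a non-const accessor are visible through it, as long as the vector itself is never resized or reallocated afterwards.

```cpp
#include <vector>

// Toy stand-ins for the real configs, just to demonstrate the aliasing.
struct LayerConfig {
  bool self_extend = false;
};

struct ModelConfig {
  std::vector<LayerConfig> layer_configs;
  // Non-const accessor in the spirit of the proposed Mutable() function.
  std::vector<LayerConfig>& MutableLayerConfigs() { return layer_configs; }
};

bool ConstRefSeesUpdate() {
  ModelConfig config;
  config.layer_configs.resize(1);
  // A const reference such as a weights struct might hold.
  const LayerConfig& view = config.layer_configs[0];
  // Runtime mutation through the non-const accessor.
  config.MutableLayerConfigs()[0].self_extend = true;
  return view.self_extend;  // the reference observes the update
}
```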

@jonpsy (Author) commented Nov 15, 2024

Woah, that's an interesting insight :) And yes, I see the use of layer_config as well. Let me spin this up real quick!

@jonpsy (Author) commented Nov 19, 2024

Okay! Done with the changes, and it's working with and without the runtime config 🍾

Tried it on a sample, and it completely fails 😄. I strongly suspect it has to do with the pos value I'm modifying.

Moving on: unfortunately, I'm unable to run LongLM because I'm on a Mac and it has some issues with the flash_attn module. It would be great if I could compare against it.

@jonpsy (Author) commented Nov 19, 2024

Example:
Input prompt:

Here are a major wars have a global warming temperatures could lead to the environment, the environment, the global warming temperatures could lead to the threat, the likelihood of a lack of a lack of a lack of a threat from the lack of a lack of the lack of the of the of a lack of war, the of a threat, the lack of the lack of the global conflicts could lead to the lack of the presence of a lack of a lack of an increase in order and that a threat, the lack of a lack of the global instability and that a global instability and that a threat, the of past conflicts are the of the of war could lead to war could be high likelihood of instability and that a threat, the of war could lead to be high.

Output:

Here are the exact details of a global governance, the exact details of the likelihood of a global governance, the likelihood of a comprehensive and the potential for example of the lack of a country could lead to the lack of a nation's, the lack of a global warming temperatures could lead to the threat.

**Here are a major wars have a global warming temperatures could lead to the environment, the environment, the global warming temperatures could lead to the threat, the likelihood of a lack of a lack of a lack of a threat from the lack of a lack of the lack of the of the of a lack of war, the of a threat, the lack of the lack of the global conflicts could lead to the lack of the presence of a lack of a lack of an increase in order and that a threat, the lack of a lack of the global instability and that a global instability and that a threat, the of past conflicts are the of the of war could lead to war could be high likelihood of instability and that a threat, the of war could lead to be high.

self_extend: true
se_group_size: 2
se_neighbor_size: 4

@jan-wassenberg (Member)

Nice, your code looks good to me!
Hm, how should we understand the example? What are the input and output prompts?
It does look like we're losing coherence :/ This is also more likely with smaller models; is it the 2B?
