
Trying this out on llama3-8b: what range of layers do I use? #48

Closed
7hacker opened this issue Sep 6, 2024 · 3 comments


7hacker commented Sep 6, 2024

Hi, thanks for making this library - pretty cool!

I'm trying to adapt the llama3-70b notebook to llama3-8b, and I hit an out-of-range error when setting the range of layers to use:

```python
wrapped_model = model
model = ControlModel(wrapped_model, list(range(20, 60)))
```

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-7cb59d37bf58> in <cell line: 2>()
      1 wrapped_model = model
----> 2 model = ControlModel(wrapped_model, list(range(20, 60)))

2 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/container.py in _get_abs_string_index(self, idx)
    285         idx = operator.index(idx)
    286         if not (-len(self) <= idx < len(self)):
--> 287             raise IndexError(f'index {idx} is out of range')
    288         if idx < 0:
    289             idx += len(self)

IndexError: index 32 is out of range
```

llama3-8b has only 32 layers in total:

```python
num_layers = len(model.model.layers)  # Llama-style Hugging Face models keep their decoder layers here
print(f"Number of layers: {num_layers}")
```
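The traceback shows where the error comes from: `torch.nn.ModuleList` only accepts indices with `-len(self) <= idx < len(self)`, so every id in `range(20, 60)` from 32 upward is invalid on a 32-layer model. As a minimal sketch, the hypothetical helper below (not part of repeng) runs that same bounds check before you build the `ControlModel`, so you can see which ids would fail.

```python
def out_of_range_layer_ids(layer_ids: list[int], num_layers: int) -> list[int]:
    """Hypothetical helper: return the ids that would fail the ModuleList
    bounds check -num_layers <= idx < num_layers (negative ids count from the end)."""
    return [i for i in layer_ids if not (-num_layers <= i < num_layers)]


# The 70B notebook's range checked against a 32-layer model.
bad = out_of_range_layer_ids(list(range(20, 60)), num_layers=32)
print(bad[0], len(bad))  # 32 28
```

The first offending id, 32, is exactly the one reported in the `IndexError` above.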

So I have two questions:

  • How do I know which range to use for llama3-8b?
  • Why were these particular ranges chosen? Are they all of the hidden layers?

(I'm new to programming with PyTorch and Transformers.)


ujisati commented Dec 8, 2024

I'm no expert, but I think the idea is that you can experiment with whatever layers you want.
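One rough starting point for such experiments is to keep the 70B notebook's range at the same relative depth. This sketch assumes Llama 3 70B has 80 decoder layers (so `range(20, 60)` covers its middle half) and Llama 3 8B has 32; the helper name is hypothetical and this is a heuristic, not a repeng API.

```python
def rescale_layer_range(start: int, stop: int, src_layers: int, dst_layers: int) -> list[int]:
    """Hypothetical helper: map range(start, stop) on a src_layers-deep model
    to the same relative depth on a dst_layers-deep model."""
    new_start = round(start * dst_layers / src_layers)
    new_stop = round(stop * dst_layers / src_layers)
    return list(range(new_start, new_stop))


# Assumed depths: 80 layers for Llama 3 70B, 32 layers for Llama 3 8B.
print(rescale_layer_range(20, 60, src_layers=80, dst_layers=32))  # layers 8 through 23
```

Since `stop` never exceeds `src_layers`, the rescaled ids always stay below `dst_layers`, which avoids the `IndexError`.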

@thiswillbeyourgithub

By the way, I made a PR that adds more intuitive ways to specify the layer IDs:

#55
https://github.com/vgel/repeng/pull/55/files

Also, be careful about which template you use.

vgel (owner) commented Dec 14, 2024

I generally use the middle ~2/3 of the layers. You just want to avoid the very early layers, which handle the inputs, and the very late layers, which handle the outputs. Roughly, you want to target the central "hump" in this graph, layers 5-25 (from notebooks/sae.ipynb):

[image: graph from notebooks/sae.ipynb]

As long as you stick to that general principle, I haven't noticed big differences from the exact layer range targeted. @thiswillbeyourgithub's PR is also a good place to look until it gets merged.
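The "middle ~2/3 of the layers" rule of thumb can be turned into a layer list with a few lines. This is a minimal sketch under the assumption that dropping about one sixth of the layers at each end is a fair reading of "middle ~2/3"; the helper name is hypothetical. For 32 layers it gives layers 5 through 26, slightly wider than the 5-25 read off the graph.

```python
def middle_two_thirds(num_layers: int) -> list[int]:
    """Hypothetical helper: skip roughly one sixth of the layers at each end
    (the input-side and output-side layers) and keep the rest."""
    skip = num_layers // 6
    return list(range(skip, num_layers - skip))


print(middle_two_thirds(32))  # layers 5 through 26
```

In the notebook this list would be passed the same way as the original range, e.g. `ControlModel(wrapped_model, middle_two_thirds(len(wrapped_model.model.layers)))`.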

vgel closed this as completed Dec 14, 2024

4 participants