Trying this out on llama3-8b, what range of layers do I use? #48

7hacker · 2024-09-06T21:52:22Z

Hi, thanks for making this library - pretty cool!

I'm trying to adapt the llama3-70b notebook to llama3-8b and hit an "index out of range" error when setting the layer range to use.

wrapped_model = model
model = ControlModel(wrapped_model, list(range(20, 60)))

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
[<ipython-input-7-7cb59d37bf58>](https://localhost:8080/#) in <cell line: 2>()
      1 wrapped_model = model
----> 2 model = ControlModel(wrapped_model, list(range(20, 60)))

2 frames
[/usr/local/lib/python3.10/dist-packages/torch/nn/modules/container.py](https://localhost:8080/#) in _get_abs_string_index(self, idx)
    285         idx = operator.index(idx)
    286         if not (-len(self) <= idx < len(self)):
--> 287             raise IndexError(f'index {idx} is out of range')
    288         if idx < 0:
    289             idx += len(self)

IndexError: index 32 is out of range

llama3-8b has 32 layers in total (indices 0-31), so range(20, 60) runs past the end:

num_layers = len(model.model.layers)  # Example for a specific model architecture
print(f"Number of layers: {num_layers}")

So I have two questions:

How do I know which layer range to use for llama3-8b?
Why were these ranges chosen specifically? Are they all the hidden layers?

(I'm new to programming with PyTorch and transformers.)

ujisati · 2024-12-08T22:53:23Z

I'm no expert but I think the idea is you can experiment with whatever layers you want.

thiswillbeyourgithub · 2024-12-09T06:16:01Z

Btw I made a PR that added more intuitive ways to specify the layer ids

#55
https://github.com/vgel/repeng/pull/55/files

Also be careful with the template you use.

vgel · 2024-12-14T06:44:59Z

I generally use the middle ~2/3rds of layers--you just want to avoid the very early layers, which handle the inputs, and the very late layers, which handle the outputs. You roughly want to target the center "hump" in this graph, layers 5-25 (from notebooks/sae.ipynb):

If you stick to that general principle, I haven't noticed big differences in the exact layer range targeted. @thiswillbeyourgithub's PR is also a good place to look until that gets merged.
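
For anyone adapting a notebook to a model with a different depth, here is a minimal sketch of the "middle ~2/3rds" rule of thumb as plain Python. The helper name middle_layer_range and its fraction parameter are hypothetical and not part of repeng; only ControlModel(model, layer_ids) and len(model.model.layers) come from this thread.

```python
def middle_layer_range(num_layers: int, fraction: float = 2 / 3) -> list[int]:
    """Return the indices of the central `fraction` of layers.

    Hypothetical helper (not part of repeng): it skips an equal number of
    layers at the start and at the end, so the result always stays inside
    0 .. num_layers - 1 and cannot trigger the IndexError from the question.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Number of layers to skip on each side, rounded down.
    skip = int(num_layers * (1 - fraction) / 2)
    return list(range(skip, num_layers - skip))


# 32 layers (llama3-8b): skips 5 layers on each side.
layer_ids = middle_layer_range(32)
print(layer_ids[0], layer_ids[-1], len(layer_ids))

# Assumed usage, based on the calls shown earlier in this thread:
# num_layers = len(wrapped_model.model.layers)
# model = ControlModel(wrapped_model, middle_layer_range(num_layers))
```

For a 32-layer model this gives layers 5 through 26, close to the 5-25 "hump" mentioned above. Since the exact range does not seem to matter much, treat the result as a starting point for experiments rather than a tuned value.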

vgel closed this as completed Dec 14, 2024
