Merge repeng to NousResearch/llama.cpp/master #1
Many thanks to Nous Research, whose support and collaboration made this work possible!
This PR introduces a new activations hacking technique, control vectors (also known as steering vectors, concept vectors, representation engineering, etc.). Control vectors are an easy-to-train (~60s on a 4090 for a 7B parameter model) way to modify the behavior of an LLM without finetuning or inference-time prompting, using a synthetic dataset of prompt pairs and PCA to generate a set of per-layer vectors that are added to the model activations.
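For intuition, the training step described above can be sketched in a few lines of numpy. This is a simplified illustration, not repeng's actual implementation: it assumes you have already run the model on the contrastive prompt pairs and collected per-layer activations, and it uses plain SVD for the PCA step.

```python
import numpy as np

def train_control_vector(pos_acts, neg_acts):
    """Toy per-layer control-vector training via PCA.

    pos_acts / neg_acts: lists (one entry per layer) of (n_pairs, hidden_dim)
    activation matrices collected from contrastive prompt pairs.
    Returns one unit-norm direction vector per layer.
    """
    directions = []
    for pos, neg in zip(pos_acts, neg_acts):
        diffs = pos - neg                   # contrastive activation differences
        diffs = diffs - diffs.mean(axis=0)  # center before PCA
        # First right-singular vector == first principal component
        _, _, vt = np.linalg.svd(diffs, full_matrices=False)
        direction = vt[0]
        # Sign-correct so the vector points toward the "positive" behavior
        if np.dot(direction, (pos - neg).mean(axis=0)) < 0:
            direction = -direction
        directions.append(direction)
    return directions

# Tiny synthetic example: 2 layers, 8 prompt pairs, hidden dim 4.
# The "positive" prompts shift activations along true_dir by varying amounts.
rng = np.random.default_rng(0)
true_dir = np.array([1.0, 0.0, 0.0, 0.0])
scales = rng.uniform(1.0, 3.0, size=(8, 1))
neg = [rng.normal(size=(8, 4)) for _ in range(2)]
pos = [n + scales * true_dir for n in neg]
vecs = train_control_vector(pos, neg)
```

In this synthetic setup the recovered per-layer directions line up with `true_dir`; on a real model the same idea is applied to hidden states captured from the paired prompts.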
They've been described in a few recent papers, such as Representation Engineering: A Top-Down Approach to AI Transparency. I also have a blog post that covers them in a more grounded way, with a library for easily creating them and examples of their use: https://vgel.me/posts/representation-engineering/
An example from the blog post of a laziness/diligence vector being trained and applied to mistral-7b-instruct-0.1.
This PR adds the ability to use control vectors, in GGUF format, with Llama-architecture models in llama.cpp. (Support for other architectures hasn't been implemented yet.) Currently, these control vectors can only be exported from repeng, but the format is simple, so my hope is that it can become a common export format for other libraries that generate representation engineering vectors with different techniques.
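Conceptually, applying a loaded control vector is cheap: the (optionally scaled) per-layer direction is simply added to that layer's activations, and vector arithmetic (e.g. combining two vectors at different strengths) is just a weighted sum. A minimal numpy sketch of both ideas follows; the function names are illustrative, not llama.cpp's actual API.

```python
import numpy as np

def apply_control_vector(hidden, direction, strength=1.0):
    """Add a scaled control-vector direction to one layer's activations.

    hidden:    (n_tokens, hidden_dim) activations for a single layer
    direction: (hidden_dim,) control-vector direction for that layer
    """
    return hidden + strength * direction

# Combining vectors is elementwise arithmetic: e.g. a strength-1 "happy"
# vector plus a strength -2 "honest" vector (i.e. strength-2 "dishonest").
happy = np.array([1.0, 0.0, 0.0, 0.0])
honest = np.array([0.0, 1.0, 0.0, 0.0])
combined = 1.0 * happy + (-2.0) * honest

hidden = np.zeros((3, 4))  # stand-in for real activations
steered = apply_control_vector(hidden, combined)
```

The real implementation does this addition inside the transformer's forward pass, once per layer that has a stored direction.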
CLI / Usage
Along with changes to llama.cpp / llama.h to support loading control vectors, doing arithmetic on control vectors, and applying a control vector to (or removing one from) a `llama_context *`, this PR also adds arguments to the common CLI.

As an example usage, this command loads a Q4_K_M mistral-7b-instruct-0.1 and applies a pretrained happiness vector with a (default) strength of 1, plus a pretrained honesty vector with a strength of -2 (producing a strength-2 dishonesty vector), for a combined effect of a somewhat happy / very dishonest model. Note that the prompt doesn't mention a persona at all; the behavior comes purely from the control vectors.

If you'd like to test this PR but don't have a machine that can run `repeng`, I've uploaded those pretrained vectors to my website: happy.gguf honest.gguf. Please let me know if there are any other vectors you'd be interested in testing, and I can upload those as well. These vectors are trained on mistral-7b-instruct-0.1, but have also been tested on mistral-7b-0.1 (base), and may also work on other Mistral finetunes or merges (testing appreciated).

This is my first llama.cpp PR (and my first C++ PR to any project), so any feedback on code style or implementation strategy is appreciated!
repeng
, I've uploaded those pretrained vectors to my website: happy.gguf honest.gguf. Please let me know if there's any other vectors you'd be interested in testing, and I can upload those as well. These vectors are trained on mistral-7b-instruct-0.1, but have also been tested on or mistral-7b-0.1 (base), and may also work on other Mistral finetunes or merges (testing appreciated).This is my first llama.cpp PR (and my first C++ PR to any project), so any feedback on code style or implementation strategy is appreciated!