Guided Generation

This repository contains code and LaTeX source from work on Guided Generation.

Figure: Greedy next-token prediction balancing coherence and truth

Building on the work in "Discovering Latent Knowledge" by Collin Burns, Haotian Ye, Dan Klein, and Jacob Steinhardt, we attempt to generate model completions guided by a linear probe in activation space.

See this thread in the EleutherAI Discord for further discussion.
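
For readers unfamiliar with the probe mentioned above, here is a minimal sketch of the unsupervised objective from Burns et al.: a linear probe is scored on pairs of activations for a statement and its negation, and is penalized for being inconsistent or uninformative. The function names (`probe`, `ccs_loss`) and the plain-list activation format are assumptions made for illustration and are not taken from `ccs.py`. Activation normalization and the optimizer are omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def probe(h, w, b):
    """Linear probe: probability that the statement behind activation h is true."""
    return sigmoid(sum(hi * wi for hi, wi in zip(h, w)) + b)

def ccs_loss(pos_acts, neg_acts, w, b):
    """Mean consistency + confidence loss over contrast pairs.

    pos_acts[i] / neg_acts[i] are activations for a statement and its negation
    (hypothetical input format, chosen for this sketch).
    """
    total = 0.0
    for h_pos, h_neg in zip(pos_acts, neg_acts):
        p_pos = probe(h_pos, w, b)
        p_neg = probe(h_neg, w, b)
        # Consistency: p(x+) should equal 1 - p(x-).
        consistency = (p_pos - (1.0 - p_neg)) ** 2
        # Confidence: discourage the degenerate answer p(x+) = p(x-) = 0.5.
        confidence = min(p_pos, p_neg) ** 2
        total += consistency + confidence
    return total / len(pos_acts)
```

A probe with zero weights and bias outputs 0.5 everywhere, which is perfectly consistent but uninformative; the confidence term is what gives it a nonzero loss of 0.25.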

Abstract

In this work, we show that linear classifiers for truth learned from contrast-consistent search (CCS) in a language model's activation space transfer well to open-ended question answering. On TruthfulQA, a benchmark designed adversarially to uncover instances where models mimic human falsehoods, this method outperforms raw model probabilities at every model size tested. We then introduce Guided Generation, a novel method for generating text from a language model in line with latent knowledge uncovered by a classifier on model activations. We examine three approaches: simple rejection sampling, greedy search on merged objectives, and beam search on the same.
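
As a rough illustration of the second approach, greedy search on a merged objective, the sketch below picks the next token by adding the language model's log-probability to a weighted log of the probe's truth score. The exact objective, the weight `alpha`, the `top_k` cutoff, and the function names are assumptions for this example; the notebooks may combine the two signals differently. It also assumes the hidden state that would result from each candidate token has already been computed, which in practice costs one forward pass per candidate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def probe_truth(hidden, w, b):
    """Truth probability from a linear probe on one hidden state."""
    return sigmoid(sum(h * wi for h, wi in zip(hidden, w)) + b)

def guided_greedy_step(log_probs, candidate_hiddens, w, b, alpha=1.0, top_k=5):
    """Return the token id maximizing log p(token) + alpha * log truth(token).

    log_probs[t]: model log-probability of token t (coherence term).
    candidate_hiddens[t]: hidden state after appending token t (assumed given).
    """
    # Only score the top_k most likely tokens to keep the step cheap.
    ranked = sorted(range(len(log_probs)), key=lambda t: log_probs[t], reverse=True)[:top_k]

    def merged(tok):
        truth = probe_truth(candidate_hiddens[tok], w, b)
        return log_probs[tok] + alpha * math.log(truth + 1e-12)

    return max(ranked, key=merged)

# Toy example: token 0 is most likely, but the probe rates it as false.
log_probs = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
hiddens = [[-3.0, 0.0], [3.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
w, b = [1.0, 0.0], 0.0
print(guided_greedy_step(log_probs, hiddens, w, b, alpha=1.0))  # 1
print(guided_greedy_step(log_probs, hiddens, w, b, alpha=0.0))  # 0
```

With `alpha=0.0` the step reduces to ordinary greedy decoding; raising `alpha` trades coherence for agreement with the probe. Beam search on the same objective would keep several partial completions per step instead of one.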

Code

guided-generation.ipynb is a Jupyter notebook containing several guided generation experiments.


ohxh/GuidedGeneration
