HyPost: Effective General and Domain-Adaptable ASR Post-Processing using LLMs

Submission to ICASSP 2025

Usage

pip install -r requirements.txt

Generating the HyPost Dataset

cd generate_data/whisper
pip install -e .

python generate_hypost_dataset.py --asr_wav /path/to/wav --asr_txt /path/to/text --hp_json /path/to/hp.json --use_prompt

asr_wav: list of utterance IDs and audio paths, one per line, e.g. "utt_id_1 /path/to/1.wav";
asr_txt: list of utterance IDs and transcripts, one per line, e.g. "utt_id_1 i have a dream";
hp_json: generated JSON file containing, for each utterance, 5 hypotheses (input), the unnormalized ground-truth transcript (output), the ground-truth transcript after normalization (normalized_output), and the WER computed on the normalized transcript (wer);
use_prompt: when set, a prompt is used so that the generated hypotheses include hesitations.
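The sketch below shows how the input lists and one hp_json record described above relate to each other. It is a minimal illustration and not the code in generate_hypost_dataset.py. The helper names (parse_kv_lines, simple_normalize, word_error_rate, make_record) are hypothetical. The JSON key names are assumed to match the names in parentheses above. simple_normalize is a crude stand-in for the real text normalizer. The WER is assumed to be computed on the first (top-ranked) hypothesis.

```python
import json
import re


def parse_kv_lines(lines):
    """Parse 'utt_id value' lines (asr_wav / asr_txt format) into a dict."""
    entries = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        entries[parts[0]] = parts[1] if len(parts) > 1 else ""
    return entries


def simple_normalize(text):
    """Hypothetical stand-in normalizer: lowercase and drop punctuation."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def make_record(hypotheses, transcript, normalize):
    """Build one hp_json-style record (assumed key names, top-1 WER)."""
    normalized = normalize(transcript)
    return {
        "input": hypotheses,
        "output": transcript,
        "normalized_output": normalized,
        "wer": word_error_rate(normalized, normalize(hypotheses[0])),
    }


if __name__ == "__main__":
    # In practice: with open("/path/to/wav") as f: wavs = parse_kv_lines(f)
    txts = parse_kv_lines(["utt_id_1 I have a dream.\n"])
    hyps = ["I have dream", "I had a dream"]  # made-up hypotheses
    print(json.dumps(make_record(hyps, txts["utt_id_1"], simple_normalize), indent=2))
```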

LoRA Training

python finetune.py \
    --base_model 'meta-llama/Llama-2-7b-hf' \
    --data_path './data/hypost.json' \
    --output_dir './hypost' \
    --lora_target_modules='["down_proj","gate_proj","up_proj"]' \
    --learning_rate 1e-4 \
    --micro_batch_size=64 \
    --batch_size=256 \
    --lora_r=16 \
    --lora_alpha=16 \
    --prompt_template_name 'HyPost-LoRA' # change for other tasks in finetune/templates
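The LoRA flags above imply a few derived quantities, which the sketch below computes. Several assumptions apply here:

- finetune.py, like alpaca-lora, accumulates gradients over batch_size // micro_batch_size micro-batches on a single GPU.
- LoRA updates are scaled by lora_alpha / lora_r, as in standard LoRA.
- Llama-2-7B has hidden size 4096, MLP size 11008 and 32 decoder layers.
- num_examples is a hypothetical dataset size.

The function names are illustrative only.

```python
import math

# Assumed Llama-2-7B shapes (hidden size, MLP size, number of decoder layers).
LLAMA2_7B = {"hidden_size": 4096, "intermediate_size": 11008, "num_layers": 32}


def lora_param_count(cfg, r, target_modules):
    """Trainable LoRA parameters: r * (in + out) per adapted linear layer."""
    h, m = cfg["hidden_size"], cfg["intermediate_size"]
    # (in_features, out_features) of the MLP projections in each decoder layer
    shapes = {"gate_proj": (h, m), "up_proj": (h, m), "down_proj": (m, h)}
    per_layer = sum(r * (shapes[n][0] + shapes[n][1]) for n in target_modules)
    return per_layer * cfg["num_layers"]


def training_plan(batch_size, micro_batch_size, lora_r, lora_alpha, num_examples):
    """Derived training quantities under the assumptions stated above."""
    if batch_size % micro_batch_size:
        raise ValueError("batch_size should be a multiple of micro_batch_size")
    return {
        "gradient_accumulation_steps": batch_size // micro_batch_size,
        "optimizer_steps_per_epoch": math.ceil(num_examples / batch_size),
        "lora_scaling": lora_alpha / lora_r,
    }


if __name__ == "__main__":
    modules = ["down_proj", "gate_proj", "up_proj"]
    print(lora_param_count(LLAMA2_7B, 16, modules))
    print(training_plan(256, 64, 16, 16, num_examples=10_000))
```

With the flags above, this gives 4 accumulation steps per optimizer step, a LoRA scaling factor of 1.0, and about 23.2M trainable parameters (23,199,744) across the three MLP projections.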

Adapted from https://github.com/Hypotheses-Paradise/Hypo2Trans and https://github.com/tloen/alpaca-lora
