"The CAPP (Contrastive Audio and Pose Pretraining) model should be available in a few weeks." #21

Open
johndpope opened this issue Oct 30, 2024 · 0 comments

Comments

@johndpope
Owner

Screenshot 2024-10-30 at 11 11 40 PM

Joint Optimization:
  • The primary diffusion loss drives high-quality generation
  • The CAPP loss drives better audio-pose alignment
  • A weighted combination controls the relative importance of the two losses
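
As a rough illustration of the weighted combination above, the two terms could be summed with a single scalar weight. This is a minimal sketch, not the repository's code: the function name `joint_loss`, the parameter `capp_weight`, and its default of 0.1 are all assumptions made for the example.

```python
def joint_loss(diffusion_loss: float, capp_loss: float, capp_weight: float = 0.1) -> float:
    """Combine the two training losses into one scalar objective.

    capp_weight is a hypothetical hyperparameter: 0.0 disables the
    alignment term, larger values give it more influence.
    """
    if capp_weight < 0:
        raise ValueError("capp_weight must be non-negative")
    # The diffusion loss stays the primary term; the CAPP loss is scaled.
    return diffusion_loss + capp_weight * capp_loss
```

Setting `capp_weight=0.0` recovers the "Without CAPP" case, which makes the weight a convenient switch for ablations.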

Training Insights:

Without CAPP:

  • Only optimizes for motion prediction
  • No explicit audio-pose alignment objective

With CAPP:

  • Direct feedback on alignment quality
  • Better learning of natural head movements
  • Improved synchronization with speech patterns
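
The issue does not say how the CAPP loss is computed. Since the name stands for Contrastive Audio and Pose Pretraining, one plausible reading is a CLIP-style symmetric contrastive loss over paired audio and pose embeddings. The sketch below assumes exactly that; the function names, the temperature of 0.07, and the plain-list embedding format are illustrative choices, not the model's actual interface.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for a stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_alignment_loss(audio_emb, pose_emb, temperature=0.07):
    """Symmetric contrastive loss; audio_emb[i] is paired with pose_emb[i]."""
    logits = [[cosine(a, p) / temperature for p in pose_emb] for a in audio_emb]
    transposed = [list(col) for col in zip(*logits)]
    # Average the audio-to-pose and pose-to-audio directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))
```

Under this assumption, matched audio-pose pairs give a loss near zero and mismatched pairs give a large one, which is the kind of direct alignment feedback described above.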

Validation Benefits:

  • The CAPP score provides a quantitative alignment metric
  • It helps identify the best checkpoints
  • It gives a better criterion for model selection
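
Checkpoint selection by CAPP score could look like the sketch below. It assumes that a higher score means better audio-pose alignment and that scores have already been computed per checkpoint on a validation set; the function name and the checkpoint file names in the usage example are hypothetical.

```python
def select_best_checkpoint(scores: dict[str, float]) -> str:
    """Return the checkpoint name with the highest validation CAPP score.

    Assumes higher is better; flip to min() if the metric is a distance.
    """
    if not scores:
        raise ValueError("no checkpoint scores provided")
    return max(scores, key=scores.get)

# Hypothetical usage with made-up names and scores:
best = select_best_checkpoint({"step_1000.pt": 0.41, "step_2000.pt": 0.57})
```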
