This guide covers all configuration options and command line arguments for the video analyzer tool, along with practical examples for different use cases.
```bash
# Basic usage (Ollama client by default)
video-analyzer path/to/video.mp4

# Using an OpenAI-compatible API
video-analyzer path/to/video.mp4 --client openai_api --api-key your-key --api-url https://openrouter.ai/api/v1
```
| Argument | Description | Default | Example |
|---|---|---|---|
| `video_path` | Path to the input video file | (Required) | video.mp4 |
| `--config` | Path to configuration directory | config/ | --config /path/to/config/ |
| `--output` | Output directory for analysis results | output/ | --output ./results/ |
| `--client` | Client to use (ollama or openai_api) | ollama | --client openai_api |
| `--ollama-url` | URL for the Ollama service | http://localhost:11434 | --ollama-url http://localhost:11434 |
| `--api-key` | API key for OpenAI-compatible service | None | --api-key sk-xxx... |
| `--api-url` | API URL for OpenAI-compatible API | None | --api-url https://openrouter.ai/api/v1 |
| `--model` | Name of the vision model to use | llama3.2-vision | --model gpt-4-vision-preview |
| `--duration` | Duration in seconds to process | None (full video) | --duration 60 |
| `--keep-frames` | Keep extracted frames after analysis | False | --keep-frames |
| `--whisper-model` | Whisper model size or model path | medium | --whisper-model large |
| `--start-stage` | Stage to start processing from (1-3) | 1 | --start-stage 2 |
| `--max-frames` | Maximum number of frames to process. When specified, frames are sampled evenly across the video duration rather than just taking the first N frames. | sys.maxsize | --max-frames 100 |
| `--log-level` | Set logging level | INFO | --log-level DEBUG |
| `--prompt` | Question to ask about the video | "" | --prompt "What activities are shown?" |
| `--language` | Set language for transcription | None (auto-detect) | --language en |
| `--device` | Select device for Whisper model | cpu | --device cuda |
The `--start-stage` argument allows you to begin processing from a specific stage:

1. Frame and Audio Processing
2. Frame Analysis
3. Video Reconstruction
The tool uses a cascading configuration system with the following priority:

1. Command line arguments (highest priority)
2. User config (`config/config.json`)
3. Default config (`config/default_config.json`)
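As an illustration of this precedence (a minimal sketch, not the tool's actual loading code), later sources are merged on top of earlier ones so that more specific settings win. The `deep_merge` and `load_config` helpers below are hypothetical names for this guide:

```python
import json
from pathlib import Path

def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with values from override taking precedence."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def load_config(config_dir: str, cli_overrides: dict) -> dict:
    """Apply the default -> user -> command-line precedence described above."""
    config_path = Path(config_dir)
    config = json.loads((config_path / "default_config.json").read_text())
    user_path = config_path / "config.json"
    if user_path.exists():
        config = deep_merge(config, json.loads(user_path.read_text()))
    # Command line arguments win over both config files.
    return deep_merge(config, cli_overrides)
```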
The default configuration (`config/default_config.json`) looks like this:

```json
{
  "clients": {
    "default": "ollama",
    "ollama": {
      "url": "http://localhost:11434",
      "model": "llama3.2-vision"
    },
    "openai_api": {
      "api_key": "",
      "api_url": "https://openrouter.ai/api/v1",
      "model": "meta-llama/llama-3.2-11b-vision-instruct:free"
    }
  },
  "prompt_dir": "",
  "output_dir": "output",
  "frames": {
    "per_minute": 10,
    "analysis_threshold": 10.0,
    "min_difference": 5.0,
    "max_count": 30
  },
  "response_length": {
    "frame": 256,
    "reconstruction": 512,
    "narrative": 1024
  },
  "audio": {
    "sample_rate": 16000,
    "channels": 1,
    "quality_threshold": 0.5,
    "chunk_length": 30,
    "language_confidence_threshold": 0.5,
    "language": null
  },
  "keep_frames": false,
  "prompt": ""
}
```
- `clients.default`: Default LLM client (ollama/openai_api)
- `clients.ollama.url`: Ollama service URL
- `clients.ollama.model`: Vision model for Ollama
- `clients.openai_api.api_key`: API key for OpenAI-compatible services
- `clients.openai_api.api_url`: API endpoint URL
- `clients.openai_api.model`: Vision model for API service
- `frames.per_minute`: Target frames to extract per minute
- `frames.analysis_threshold`: Threshold for key frame detection
- `frames.min_difference`: Minimum difference between frames
- `frames.max_count`: Maximum frames to extract
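A minimal sketch of how `per_minute` and `max_count` could interact (assumed behavior for illustration; the tool's real extraction logic may differ):

```python
def target_frame_count(duration_seconds: float,
                       per_minute: int = 10,
                       max_count: int = 30) -> int:
    """Derive a frame budget from the video length, capped at max_count."""
    frames_from_rate = int(duration_seconds / 60 * per_minute)
    return max(1, min(frames_from_rate, max_count))

# A 5-minute video at 10 frames/minute would request 50 frames,
# capped to the default max_count of 30.
print(target_frame_count(300))  # 30
```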
- `response_length.frame`: Max length for frame analysis
- `response_length.reconstruction`: Max length for video reconstruction
- `response_length.narrative`: Max length for enhanced narrative
- `audio.sample_rate`: Audio sample rate in Hz
- `audio.channels`: Number of audio channels
- `audio.quality_threshold`: Minimum quality for transcription
- `audio.chunk_length`: Audio chunk processing length
- `audio.language_confidence_threshold`: Language detection confidence
- `audio.language`: Force specific language (null for auto-detect)
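The defaults above correspond to 16 kHz mono audio, the format Whisper works with natively. As a standalone illustration (not the tool's internal pipeline), the equivalent extraction with ffmpeg from Python might look like:

```python
import subprocess

def extract_audio(video_path: str, wav_path: str,
                  sample_rate: int = 16000, channels: int = 1) -> None:
    """Extract mono 16 kHz WAV audio from a video using ffmpeg (illustrative only)."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", video_path,          # input video
            "-vn",                     # drop the video stream
            "-ar", str(sample_rate),   # audio.sample_rate
            "-ac", str(channels),      # audio.channels
            wav_path,
        ],
        check=True,
    )

extract_audio("video.mp4", "audio.wav")
```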
- `prompt_dir`: Custom prompt directory path
- `output_dir`: Analysis output directory
- `keep_frames`: Retain extracted frames
- `prompt`: Custom analysis prompt
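Because of the cascading precedence described above, a user config (`config/config.json`) only needs the keys being overridden. For example, a hypothetical override that switches the default client and keeps extracted frames might look like:

```json
{
  "clients": {
    "default": "openai_api",
    "openai_api": {
      "api_key": "your-key",
      "api_url": "https://openrouter.ai/api/v1"
    }
  },
  "keep_frames": true
}
```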
```bash
# Basic analysis with default settings
video-analyzer video.mp4
```

```bash
# Analysis with an OpenAI-compatible API and a custom prompt
video-analyzer video.mp4 \
  --client openai_api \
  --api-key your-key \
  --api-url https://openrouter.ai/api/v1 \
  --model meta-llama/llama-3.2-11b-vision-instruct:free \
  --whisper-model large \
  --prompt "What activities are happening in this video?"
```

```bash
# Resume from frame analysis, limit the frame count, and keep extracted frames
video-analyzer video.mp4 \
  --start-stage 2 \
  --max-frames 50 \
  --keep-frames
```

```bash
# Sample only 5 frames spread across the whole video
video-analyzer video.mp4 \
  --max-frames 5 \
  --keep-frames
```
This will extract frames evenly spaced across the video duration. For example, in a 5-minute video, it would sample approximately one frame per minute rather than taking the first 5 frames.
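A minimal sketch of that even-sampling idea (illustrative only, not the tool's exact algorithm):

```python
def evenly_spaced_indices(total_frames: int, max_frames: int) -> list[int]:
    """Pick up to max_frames frame indices spread evenly across the video."""
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]

# A 5-minute video at 30 fps has 9000 frames, so --max-frames 5 picks
# frames roughly 1800 apart, i.e. about one per minute.
print(evenly_spaced_indices(9000, 5))  # [0, 1800, 3600, 5400, 7200]
```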
```bash
# Transcribe in Spanish with the large Whisper model
video-analyzer video.mp4 \
  --language es \
  --whisper-model large
```

```bash
# Run Whisper on a CUDA GPU
video-analyzer video.mp4 \
  --device cuda \
  --whisper-model large
```
```bash
# Full example: custom config, output directory, API client, and debug logging
video-analyzer video.mp4 \
  --config custom_config.json \
  --output ./analysis_results \
  --client openai_api \
  --api-key your-key \
  --api-url https://openrouter.ai/api/v1 \
  --model meta-llama/llama-3.2-11b-vision-instruct:free \
  --duration 120 \
  --whisper-model large \
  --keep-frames \
  --log-level DEBUG \
  --prompt "Focus on the interactions between people"
```

```bash
# Local analysis with Ollama and GPU-accelerated transcription
video-analyzer video.mp4 \
  --client ollama \
  --ollama-url http://localhost:11434 \
  --model llama3.2-vision \
  --max-frames 30 \
  --whisper-model medium \
  --device cuda \
  --language en
```

```bash
# Resume from frame analysis with a custom output directory and prompt
video-analyzer video.mp4 \
  --start-stage 2 \
  --output ./custom_output \
  --keep-frames \
  --max-frames 50 \
  --prompt "Describe the main events"
```

```bash
# Use a local Whisper model checkpoint on a CUDA GPU
video-analyzer video.mp4 \
  --whisper-model /path/to/whisper/model \
  --device cuda \
  --start-stage 1
```
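For reference, `--whisper-model` accepts either a model size or a local checkpoint path, and `--device` selects where Whisper runs. A standalone Python equivalent, assuming the openai-whisper package (the tool's actual transcription backend may differ):

```python
import whisper

# Either a model size ("medium", "large", ...) or a path to a downloaded checkpoint.
model = whisper.load_model("large", device="cuda")
# model = whisper.load_model("/path/to/whisper/model", device="cuda")

# Transcribe previously extracted audio, forcing English as in --language en.
result = model.transcribe("audio.wav", language="en")
print(result["text"])
```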