You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe
A dashboard will provide the necessary monitoring and performance metrics, enabling efficient management and optimization of our system.
Describe the solution you'd like
Below are the key features I envision for a dashboard:
Resource Monitoring
CPU: Real-time monitoring of CPU utilization across the distributed system nodes.
Memory: Tracking and visualization of memory usage for each node.
GPU: Monitoring GPU utilization, allowing us to identify bottlenecks or optimize resource allocation.
VRAM: Real-time monitoring of VRAM utilization for GPU-based inference.
Performance Monitoring:
Model-Specific Metrics: For each deployed model, capture and display relevant metrics such as generate task queue length, number of tokens generated per second, and any other model-specific performance indicators.
Throughput and Latency: Measure the overall throughput and latency of the system, enabling us to identify any performance issues and assess system efficiency.
Describe alternatives you've considered
Additional context
The text was updated successfully, but these errors were encountered:
Is your feature request related to a problem? Please describe
A dashboard will provide the necessary monitoring and performance metrics, enabling efficient management and optimization of our system.
Describe the solution you'd like
Below are the key features I envision for a dashboard:
Describe alternatives you've considered
Additional context
The text was updated successfully, but these errors were encountered: