forked from mlcommons/cm4mlperf-inference
Results from self hosted Github actions - NVIDIARTX4090
1 parent bc23f98, commit dd3bb80
Showing 35 changed files with 12,253 additions and 0 deletions.
3 changes: 3 additions & 0 deletions
...rements/ce59bba944a6-nvidia_original-gpu-tensorrt-vdefault-scc24-base/README.md
@@ -0,0 +1,3 @@
| Model | Scenario | Accuracy | Throughput | Latency (in ms) |
|---------------------|------------|------------|--------------|-------------------|
| stable-diffusion-xl | offline | () | 1.318 | - |
95 changes: 95 additions & 0 deletions
...original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/README.md
@@ -0,0 +1,95 @@
*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*

## Host platform

* OS version: Linux-6.8.0-49-generic-x86_64-with-glibc2.29
* CPU version: x86_64
* Python version: 3.8.10 (default, Nov 7 2024, 13:10:47) [GCC 9.4.0]
* MLCommons CM version: 3.5.2

## CM Run Command

See [CM installation guide](https://docs.mlcommons.org/inference/install/).

```bash
pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@mlperf-automations --checkout=48ea6b46a7606d1c5d74909e94d5599dbe7ff9e1

cm run script \
	--tags=app,mlperf,inference,generic,_nvidia,_sdxl,_tensorrt,_test,_r4.1-dev_default,_float16,_offline \
	--quiet=true \
	--env.CM_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST=yes \
	--env.CM_QUIET=yes \
	--env.CM_MLPERF_IMPLEMENTATION=nvidia \
	--env.CM_MLPERF_MODEL=sdxl \
	--env.CM_MLPERF_RUN_STYLE=test \
	--env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False \
	--env.CM_DOCKER_PRIVILEGED_MODE=True \
	--env.CM_MLPERF_BACKEND=tensorrt \
	--env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter \
	--env.CM_MLPERF_CLEAN_ALL=True \
	--env.CM_MLPERF_DEVICE= \
	--env.CM_MLPERF_USE_DOCKER=True \
	--env.CM_MLPERF_MODEL_PRECISION=float16 \
	--env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/scc_gh_action_results \
	--env.CM_MLPERF_LOADGEN_SCENARIO=Offline \
	--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/cm-mount/home/arjun/scc_gh_action_submissions \
	--env.CM_MLPERF_INFERENCE_VERSION=5.0-dev \
	--env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1-dev_default \
	--env.CM_MLPERF_SUBMISSION_DIVISION=open \
	--env.CM_RUN_MLPERF_SUBMISSION_PREPROCESSOR=False \
	--env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=short \
	--env.CM_MLPERF_SUT_NAME_RUN_CONFIG_SUFFIX4=scc24-base \
	--env.CM_DOCKER_IMAGE_NAME=scc24-nvidia \
	--env.CM_MLPERF_INFERENCE_MIN_QUERY_COUNT=50 \
	--env.CM_MLPERF_LOADGEN_ALL_MODES=yes \
	--env.CM_MLPERF_INFERENCE_SOURCE_VERSION=5.0.4 \
	--env.CM_MLPERF_LAST_RELEASE=v5.0 \
	--env.CM_TMP_PIP_VERSION_STRING= \
	--env.CM_MODEL=sdxl \
	--env.CM_MLPERF_LOADGEN_COMPLIANCE=no \
	--env.CM_MLPERF_CLEAN_SUBMISSION_DIR=yes \
	--env.CM_RERUN=yes \
	--env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= \
	--env.CM_MLPERF_LOADGEN_MODE=performance \
	--env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline \
	--env.CM_MLPERF_LOADGEN_MODES,=performance,accuracy \
	--env.CM_OUTPUT_FOLDER_NAME=test_results \
	--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=no \
	--env.CM_DOCKER_DETACHED_MODE=yes \
	--add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \
	--add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \
	--add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \
	--add_deps_recursive.submission-checker.tags=_short-run \
	--add_deps_recursive.coco2014-preprocessed.tags=_size.50,_with-sample-ids \
	--add_deps_recursive.coco2014-dataset.tags=_size.50,_with-sample-ids \
	--add_deps_recursive.nvidia-preprocess-data.extra_cache_tags=scc24-base \
	--v=False \
	--print_env=False \
	--print_deps=False \
	--dump_version_info=True
```
*Note: to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts),
reload mlcommons@mlperf-automations without the `--checkout` option and clean the CM cache as follows:*

```bash
cm rm repo mlcommons@mlperf-automations
cm pull repo mlcommons@mlperf-automations
cm rm cache -f
```

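The run command above passes most of its configuration as `--env.KEY=VALUE` flags. As a reading aid, here is a minimal Python sketch that collects such flags into a dictionary. The helper name `parse_cm_env_flags` and the shortened sample command are illustrative, not part of CM. Treating a key that ends in a comma (as in `CM_MLPERF_LOADGEN_MODES,=performance,accuracy`) as a list value is an assumption about CM's command-line convention.

```python
import shlex


def parse_cm_env_flags(command: str) -> dict:
    """Collect --env.KEY=VALUE flags from a `cm run script` command line."""
    env = {}
    # Join backslash-continued lines, then split the way a POSIX shell would.
    tokens = shlex.split(command.replace("\\\n", " "))
    for token in tokens:
        if not token.startswith("--env."):
            continue
        key, _, value = token[len("--env."):].partition("=")
        if key.endswith(","):
            # Assumption: a trailing comma on the key marks a list value.
            env[key[:-1]] = value.split(",") if value else []
        else:
            env[key] = value
    return env


# Shortened excerpt of the command above, for illustration only.
sample = """cm run script \\
    --tags=app,mlperf,inference \\
    --env.CM_MLPERF_MODEL=sdxl \\
    --env.CM_MLPERF_DEVICE= \\
    --env.CM_MLPERF_LOADGEN_MODES,=performance,accuracy \\
    --quiet=true"""

env = parse_cm_env_flags(sample)
for name, value in env.items():
    print(f"{name} = {value!r}")
```

Flags with an empty right-hand side, such as `--env.CM_MLPERF_DEVICE=`, come out as empty strings, which makes unset values easy to spot when comparing two runs.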
## Results

Platform: ce59bba944a6-nvidia_original-gpu-tensorrt-vdefault-scc24-base

Model Precision: int8

### Accuracy Results

### Performance Results
`Samples per second`: `1.31816`
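
To put the reported throughput in more familiar terms, the short Python sketch below converts it to seconds per sample and to the time one pass over 5000 samples would take at that rate. The 5000-sample figure is the dataset size seen in the accuracy log of this commit. Applying the Offline performance rate to it is only a rough estimate, not a measured result.

```python
samples_per_second = 1.31816  # reported Offline throughput
total_samples = 5000          # dataset size from the accuracy run (used here as an estimate)

seconds_per_sample = 1.0 / samples_per_second
full_pass_seconds = total_samples / samples_per_second

print(f"{seconds_per_sample:.3f} s per sample")
print(f"{full_pass_seconds / 60:.1f} min for {total_samples} samples")
```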
73 changes: 73 additions & 0 deletions
...riginal-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/accuracy_console.out
@@ -0,0 +1,73 @@
[2024-12-31 12:27:42,667 main.py:229 INFO] Detected system ID: KnownSystem.ce59bba944a6
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
[2024-12-31 12:27:43,980 generate_conf_files.py:107 INFO] Generated measurements/ entries for ce59bba944a6_TRT/stable-diffusion-xl/Offline
[2024-12-31 12:27:43,981 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/cm-mount/home/arjun/scc_gh_action_results/test_results/ce59bba944a6-nvidia_original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=2 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/7f314a33540f461d/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/c2856974d8384964a67e4134073fccab.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan,./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
[2024-12-31 12:27:43,981 __init__.py:53 INFO] Overriding Environment
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
[2024-12-31 12:27:45,905 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
[2024-12-31 12:27:46,032 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
[2024-12-31 12:27:46,662 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
[2024-12-31 12:27:48,004 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
[2024-12-31 12:27:49,345 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
[2024-12-31 12:27:49,468 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
[2024-12-31 12:27:50,101 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
[2024-12-31 12:27:51,439 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
[2024-12-31 12:27:52,600 harness.py:207 INFO] Start Warm Up!
[2024-12-31 12:28:04,349 harness.py:209 INFO] Warm Up Done!
[2024-12-31 12:28:04,349 harness.py:211 INFO] Start Test!
[2024-12-31 13:29:39,902 backend.py:801 INFO] [Server] Received 5000 total samples
[2024-12-31 13:29:39,903 backend.py:809 INFO] [Device 0] Reported 2496 samples
[2024-12-31 13:29:39,903 backend.py:809 INFO] [Device 1] Reported 2504 samples
[2024-12-31 13:29:39,903 harness.py:214 INFO] Test Done!
[2024-12-31 13:29:39,903 harness.py:216 INFO] Destroying SUT...
[2024-12-31 13:29:39,903 harness.py:219 INFO] Destroying QSL...
benchmark : Benchmark.SDXL
buffer_manager_thread_count : 0
data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data
gpu_batch_size : 2
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /home/cmuser/CM/repos/local/cache/7c0c2e4c9cc3421e/repo/closed/NVIDIA/build/logs/2024.12.31-12.27.41
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/7f314a33540f461d/inference/mlperf.conf
model_path : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/models/SDXL/
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.334532, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197334532000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='ce59bba944a6')
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
test_mode : AccuracyOnly
use_graphs : False
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/c2856974d8384964a67e4134073fccab.conf
system_id : ce59bba944a6
config_name : ce59bba944a6_stable-diffusion-xl_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
[I] Loading bytes from ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
[W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[I] Loading bytes from ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/ce59bba944a6/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
[2024-12-31 13:29:40,393 run_harness.py:166 INFO] Result: Accuracy run detected.

======================== Result summaries: ========================

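
The accuracy log above records when the test started ("Start Test!") and when the server reported all samples. As a rough sanity check, the Python sketch below derives the elapsed time, the per-device split, and the implied sample rate from those log lines. This is an AccuracyOnly run, so the rate is not an official MLPerf metric. The variable names are illustrative.

```python
from datetime import datetime

# Timestamp layout used by the harness log lines, e.g. "2024-12-31 12:28:04,349".
FMT = "%Y-%m-%d %H:%M:%S,%f"

start = datetime.strptime("2024-12-31 12:28:04,349", FMT)  # "Start Test!"
end = datetime.strptime("2024-12-31 13:29:39,902", FMT)    # "Received 5000 total samples"

# Samples reported by each GPU in the log.
per_device = {0: 2496, 1: 2504}

total = sum(per_device.values())
elapsed = (end - start).total_seconds()
rate = total / elapsed

print(f"total samples: {total}")
print(f"elapsed: {elapsed:.3f} s ({elapsed / 60:.1f} min)")
print(f"implied rate: {rate:.3f} samples/s")
for device, count in per_device.items():
    print(f"device {device}: {count / total:.1%} of samples")
```

The implied rate is close to the `1.31816` samples per second reported for the performance run, which suggests the two runs behaved consistently on this system.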
7 changes: 7 additions & 0 deletions
...e-diffusion-xl/offline/ce59bba944a6-nvidia_original-gpu-tensorrt-vdefault-scc24-base.json
@@ -0,0 +1,7 @@
{
  "starting_weights_filename": "https://github.com/mlcommons/cm4mlops/blob/main/script/get-ml-model-stable-diffusion/_cm.json#L174",
  "retraining": "no",
  "input_data_types": "int32",
  "weight_data_types": "int8",
  "weight_transformations": "quantization, affine fusion"
}
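
A file like this is easy to check mechanically before it goes into a submission tree. The Python sketch below parses the JSON shown above and reports any missing field. The `EXPECTED_KEYS` set is simply the five keys present in this file, and treating them as required is an assumption, not a statement of the submission checker's actual rules. The helper name `missing_keys` is hypothetical.

```python
import json

# Contents of the measurements JSON shown above.
measurements_text = """
{
  "starting_weights_filename": "https://github.com/mlcommons/cm4mlops/blob/main/script/get-ml-model-stable-diffusion/_cm.json#L174",
  "retraining": "no",
  "input_data_types": "int32",
  "weight_data_types": "int8",
  "weight_transformations": "quantization, affine fusion"
}
"""

# Assumption: the keys present in this file are the ones a checker would expect.
EXPECTED_KEYS = {
    "starting_weights_filename",
    "retraining",
    "input_data_types",
    "weight_data_types",
    "weight_transformations",
}


def missing_keys(doc: dict) -> list:
    """Return the expected keys that are absent from doc, sorted by name."""
    return sorted(EXPECTED_KEYS - set(doc))


doc = json.loads(measurements_text)
print("missing keys:", missing_keys(doc))
print("weight data type:", doc["weight_data_types"])
```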