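Hi,
Can you please include the descriptions of the VM core and memory buckets in the AzurePublicDatasetV2 dataset? It only requires adding two URLs to AzurePublicDatasetLinksV2.txt.
I am aware that the exact number of VM cores is not given, as discussed in issue #5, and that VMs are placed in one of six buckets based on their cores or memory. However, the descriptions of these buckets seem to be "missing", even though they were meant to be released.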
I say "missing" (in quotes) because, even though the description files are not listed in AzurePublicDatasetLinksV2.txt, they are available for download from Azure Blob Storage. More precisely, schema.csv mentions that the descriptions of the CPU buckets are available in vm_virtual_core_bucket_definition.csv, which has two fields: bucket and definition. I blindly constructed a path for this file by appending the file name vm_virtual_core_bucket_definition.csv to the parent path, and I was able to download it through the constructed path vm_virtual_core_bucket_definition.csv.
The vm_virtual_core_bucket_definition.csv file describes six buckets. These descriptions match the bucket labels in the "VM Cores Distribution" plot in the Jupyter notebook referenced in the main readme. This match suggests that the file available through Azure Blob Storage is the correct one.
The same applies to the description of the memory buckets: schema.csv mentions vm_memory_bucket_definition.csv, which is not listed in AzurePublicDatasetLinksV2.txt but is available for download from Azure Blob Storage, here: vm_memory_bucket_definition.csv.
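For anyone repeating this guesswork, the path construction can be sketched roughly as follows. This is only an illustration: `KNOWN_LINK` is a hypothetical placeholder (not the real dataset location), `candidate_urls` is a helper name made up here, and how many path segments to strip to reach the right parent path is an assumption to verify against an actual link from AzurePublicDatasetLinksV2.txt.

```python
from urllib.parse import urljoin

# Hypothetical placeholder: substitute a real link copied from
# AzurePublicDatasetLinksV2.txt. This URL is NOT the actual dataset location.
KNOWN_LINK = "https://example.invalid/container/trace_data/vmtable/vmtable.csv.gz"

# File names as mentioned in schema.csv.
BUCKET_FILES = [
    "vm_virtual_core_bucket_definition.csv",
    "vm_memory_bucket_definition.csv",
]


def candidate_urls(known_link, parent_levels=1):
    """Build guessed URLs by dropping the last `parent_levels` path
    segments of a known link and appending each bucket file name."""
    base = known_link.rsplit("/", parent_levels)[0] + "/"
    return [urljoin(base, name) for name in BUCKET_FILES]


for url in candidate_urls(KNOWN_LINK):
    print(url)
```

The sketch assumes the known link carries no query string; a link with a token or other parameters after `?` would need those handled separately.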
So, it would be great to update AzurePublicDatasetLinksV2.txt to include the URLs of both files, vm_virtual_core_bucket_definition.csv and vm_memory_bucket_definition.csv, to spare others the same guesswork.
Let me know if you accept pull requests. I'd be happy to add these two URLs to AzurePublicDatasetLinksV2.txt myself and perhaps add a short description of the buckets to the main readme.
Also, is it accurate to say that
the core range in bucket 6 is >24 and <=30, and
the memory range in bucket 6 is >64 and <=70?
I noticed these lines in the Jupyter notebook, which suggest that these ranges are correct:
#Transform vmcorecount '>24' bucket to 30 and '>64' to 70
max_value_vmcorecountbucket = 30
max_value_vmmemorybucket = 70
trace_dataframe = trace_dataframe.replace({'vmcorecountbucket':'>24'},max_value_vmcorecountbucket)
trace_dataframe = trace_dataframe.replace({'vmmemorybucket':'>64'},max_value_vmmemorybucket)
Or is this transformation just a cosmetic change so that the column in the Jupyter table has an int datatype? Having more precise bucket bounds would be helpful.
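To make the question concrete, here is a minimal plain-Python sketch (not the notebook's pandas code) of what such a replacement amounts to. The values 30 and 70 are copied from the notebook lines above; the sample labels other than '>24' and the helper name `bucket_to_int` are made up for illustration. The sketch only shows that an open-ended label gets swapped for a numeric stand-in. It does not establish that 30 or 70 is a real upper bound.

```python
# Stand-in values copied from the notebook; whether they are true upper
# bounds of bucket 6 is exactly the open question.
OPEN_ENDED_PLACEHOLDERS = {">24": 30, ">64": 70}


def bucket_to_int(label):
    """Convert a bucket label such as '4' or '>24' to an int.

    Open-ended labels are mapped to their placeholder value; every other
    label is assumed to be a plain integer string.
    """
    text = str(label).strip()
    if text in OPEN_ENDED_PLACEHOLDERS:
        return OPEN_ENDED_PLACEHOLDERS[text]
    return int(text)


core_buckets = ["2", "8", ">24"]  # made-up sample labels
print([bucket_to_int(b) for b in core_buckets])
```

If the placeholder is purely cosmetic, any value larger than 24 (or 64) would serve equally well for sorting and plotting, which is why the actual bounds are worth documenting.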
Finally, is there an external document that describes AzurePublicDatasetV2, like the SOSP 2017 paper that describes AzurePublicDatasetV1? If one exists, it would be useful to reference it in the readme.
Thanks in advance for clarifications!