Generating benchmark summary JSON data for indexing #122
In principle, I don't think we want to have any identifiers in key names. So the "
From @atheurer: OK, but all of this hierarchy/format will now need to be pushed into each of the benchmarks, as there is no really good alternative for generating these summaries. What I mean, generally, is that it is not an easy change to make, because the way we currently store this [benchmark result] data does not match this hierarchy. For example [for throughput], we have multiple Gb_sec tags with different things embedded (like the hostname) and a relatively "shallow" hierarchy in our hashes, because we made use of GenData to generate benchmark graphs and a per-series average value for summaries. That shallow hierarchy is $hash{htmlpage}{graph-name}{series}, which originated from our tool data. Taking a shallow data hierarchy and expanding it to avoid identifiers in key names where possible becomes tricky (well, maybe messy is a better way to describe it), especially if you want a single script to generate this summary for all benchmarks without a huge case statement. So it is just better in the long run to adopt a very similar JSON hierarchy from the very beginning for the core data: each benchmark iteration sample. Below is an example of a new benchmark iteration sample format (1 of probably 3-5 sample runs), with (for now) just two of the metric classes included, latency and throughput:
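The actual example JSON from this comment was not captured in this copy of the thread. As a purely hypothetical sketch of the "no identifiers in key names" rule being described, identifiers such as hostnames and ports would live in values rather than keys; all field names below are assumptions, not the project's actual format:

```python
import json

# Hypothetical benchmark iteration sample: metric classes (throughput,
# latency) are keys, but hostnames/ports appear only as values.
sample = {
    "description": {"benchmark": "uperf", "iteration": 1, "sample": 1},
    "throughput": {
        "Gb_sec": [
            # identifiers live in values, not in key names
            {"hostname": "host1.example.com", "port": 20010, "value": 9.4},
            {"hostname": "host2.example.com", "port": 20010, "value": 9.1},
        ]
    },
    "latency": {
        "usec": [
            {"hostname": "host1.example.com", "port": 20010, "value": 18.2},
        ]
    },
}

print(json.dumps(sample, indent=2))
```

Because no key embeds a hostname, a single generic script can walk every benchmark's samples without a per-benchmark case statement.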
I believe the format above respects the "no identifier in key" rule. This format is now quite similar to the "summary format" we are discussing, but it is -not- the summary. Note that it has a "data-samples" hash, but that contains time-based samples from the benchmark itself. Now, if you have 5 examples of the data above (from 5 different runs/samples), generating that summary JSON is much easier, because the summary hierarchy is inferred from the sample hierarchy.
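The point about inferring the summary from the sample hierarchy can be sketched as follows; this is a hypothetical illustration (the path and key names are assumptions), showing how N samples with identical structure reduce to a mean/stddev summary:

```python
import statistics

# Five hypothetical iteration samples sharing the same hierarchy.
samples = [
    {"throughput": {"Gb_sec": {"value": v}}} for v in (9.1, 9.4, 9.0, 9.3, 9.2)
]

def summarize(samples, *path):
    """Walk the same key path in every sample and aggregate the values."""
    vals = []
    for s in samples:
        node = s
        for key in path:
            node = node[key]
        vals.append(node["value"])
    return {"mean": statistics.mean(vals), "stddev": statistics.stdev(vals)}

summary = summarize(samples, "throughput", "Gb_sec")
print(summary["mean"])  # 9.2
```

Because the summary's shape mirrors the samples' shape, the same walk works for any metric class without benchmark-specific code.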
@atheurer, I think I understand what you mean about using GenData to emit results data (or really anything else besides tool data), but it expects the data in the form described above, htmlpage.graph-name.series. The new format you show above looks good in terms of keys, except I'd break the hostname apart from the port, list them separately, and try to get an FQDN for that name. Then, when we index that file, we'll apply the URL directory hierarchy in which it lives as additional metadata. So as JSON the above would look like:
However, if we were to create a JSON document with the above and put it into Elasticsearch, we'd have no good way to associate it with a particular pbench run. So at the least we'd be adding the pbench run ID. But to make this more useful, I think we need to add a bunch of metadata about this particular run, its sample #, etc. So somehow we need to get that data associated with the above, at least by indexing time. Does that make sense?
Yes, the run data will be added; it's just not there yet. As for separating hostname and port, I'd like to have a "URL", or some sort of locator value which is always unique, which is why I used hostname:port. We can add hostname as another element, but I'd like to keep at least one element which fully describes the service. Below is an updated JSON for a benchmark iteration sample. It does not yet include run properties, but it has resource and latency data, plus description elements.
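The compromise being discussed, keeping one fully qualifying locator alongside separated hostname/port fields, might look like this hypothetical sketch (the field names here are assumptions, not the agreed format):

```python
def make_endpoint(hostname, port):
    """Build a hypothetical endpoint record: separated fields for
    indexing/queries, plus one unique locator describing the service."""
    return {
        "hostname": hostname,             # FQDN, queryable on its own
        "port": port,                     # numeric, usable in range queries
        "locator": f"{hostname}:{port}",  # single element that fully
                                          # identifies the service
    }

ep = make_endpoint("client1.example.com", 20010)
print(ep["locator"])  # client1.example.com:20010
```

This keeps both camps happy: analytics can filter on hostname or port independently, while the locator stays unique per service.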
On Mon, Dec 14, 2015 at 2:17 PM, Andrew Theurer wrote:
Looks good to me. -peter
@portante, I was doing a little research for indexing and timestamps, and came across this: Should we follow that method for our data? For example:
On 03/23/2016 11:26 AM, Andrew Theurer wrote:
I don't think you should quote the date value when it is in milliseconds. I'm pretty sure my JS code would blow up when trying to convert that.
Karl Rister
Good catch, thanks.
@atheurer, that would work. I have been using the ASCII version, readable by humans,
Here's the most recent version of the JSON. This was generated by running
@atheurer I took a look at summary-result.json, and it would be very helpful for me when doing analytics on pbench data if the JSON file included hardware and basic OS configuration information, such as SSD used, filesystem type (xfs, ext4), LVM or no LVM, and OS and version. @portante says I can get most of this info from the sosreport, but it would be much more convenient if it were in the JSON data.
@dfeddema I think all of that will be there. However, the example above is only the benchmark part. @ndokos is working on joining that with all of the other data collected, including much of the information you want. The example above I consider just a leaf of a much bigger tree of data. Hopefully @ndokos can explain this better than I. |
@k-rister in moving to JSON data for benchmark results, we are losing the "old" hash format that gen_data() would use for graphing. Unfortunately, this is the hash format you have been using in GenData for your graphing method. I'd like not to lose the benchmark graphs feature, but I'd rather not generate two kinds of data formats for this. I suppose I could write a conversion function to take the JSON and produce the hash we use for graphing, but I'm just wondering if there's a better solution.
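The conversion function mentioned above could, hypothetically, flatten the nested result back into the htmlpage → graph-name → series shape that GenData expects. This is a sketch under assumed structures on both sides, not the project's actual code:

```python
def to_gendata_hash(result, htmlpage="uperf"):
    """Flatten a nested result (identifiers in values) into the shallow
    htmlpage -> graph-name -> series hash used for graphing.
    Both input and output layouts here are assumptions."""
    flat = {htmlpage: {}}
    for metric_class, units in result.items():      # e.g. "throughput"
        for unit, entries in units.items():         # e.g. "Gb_sec"
            graph = f"{metric_class}-{unit}"
            series = {}
            for e in entries:
                # series name reintroduces the identifier for the graph
                series[f'{e["hostname"]}:{e["port"]}'] = e["value"]
            flat[htmlpage][graph] = series
    return flat

result = {"throughput": {"Gb_sec": [
    {"hostname": "host1", "port": 20010, "value": 9.4}]}}
print(to_gendata_hash(result))
```

Keeping the JSON as the single source of truth and deriving the graphing hash on demand avoids maintaining two emitters in every benchmark.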
@ndokos to take another look |
From Andrew Theurer:
Guys, if you have a chance, take a look at the attached files[1]:
These are generated with a new benchmark-summary script, which all benchmarks will eventually use. Currently I have uperf using this in a git branch of mine.
What's new here are the summary-result.* files, including the JSON format. The HTML format now also uses an HTML table.
I think the JSON format should work for Elasticsearch. It was based on our conversation way back, with some minor tweaks.
[1]archive.zip