-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First implementation of the behavioral_analytics track. #395
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I have left a few comments.
# Constants used in event randomization | ||
num_sessions = 1000000 | ||
num_paths = 50000 | ||
num_query_params = 1000 | ||
num_queries = 10000 | ||
num_docs = 50000 | ||
num_index = 3 | ||
num_queries = 20000 | ||
num_search_apps = 2 | ||
num_result = 10 | ||
search_ratio = 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python constants are written in ALL_CAPS per PEP 8
random_paths = list(map(lambda _: random_identifier(), range(1, num_paths))) | ||
random_query_params = list(map(lambda _: random_identifier(), range(1, num_query_params))) | ||
random_titles = list(map(lambda _: random_identifier(), range(1, num_paths))) | ||
random_docs = list(map(lambda _: random_identifier(), range(1, num_docs))) | ||
random_indices = list(map(lambda i: "index-%sd" % (i), range(1, num_index))) | ||
random_search_applications = list(map(lambda i: "index-%sd" % (i), range(1, num_search_apps))) | ||
random_queries = list(map(lambda _: random_identifier(), range(1, num_queries))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use range(1, num_paths)
? This will give you 4999 paths, not 5000. And I also like functional programming, but this would be much easier to read with list comprehensions:
random_paths = [random_identifier() for _ in range(NUM_PATHS)]
|
||
# Function used to generate random events parts. | ||
def random_identifier(): | ||
return str(uuid.uuid4()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on making the randomness reproducible by using a random seed and submitting the randomness to the uuid4 function? (It's not because we can that we should, I'm just wondering.)
for _ in iterate(num_sessions): | ||
session_id = random_identifier() | ||
user_id = random_identifier() | ||
for i in range(1, random.randint(1, 50)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 50 could also be a constant.
import operator | ||
import re | ||
|
||
from esrally.track.params import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please avoid a star import here?
from esrally.track.params import * | ||
|
||
|
||
class EventBatchDataReader: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment mentioning that most of this was copy/pasted from Rally?
"documents": [ | ||
{ | ||
"source-file": "events.json.bz2", | ||
"document-count": 24514512 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add uncompressed-bytes too?
No description provided.