First implementation of the behavioral_analytics track. #395

Open: afoucret wants to merge 3 commits into base: master


afoucret
Contributor

No description provided.

afoucret requested a review from pquentin on April 17, 2023 12:39
afoucret marked this pull request as ready for review on April 18, 2023 12:03
Member

pquentin left a comment

This looks great! I have left a few comments.

Comment on lines +9 to +19
# Constants used in event randomization
num_sessions = 1000000
num_paths = 50000
num_query_params = 1000
num_queries = 10000
num_docs = 50000
num_index = 3
num_queries = 20000
num_search_apps = 2
num_result = 10
search_ratio = 30
Member

Python constants are written in ALL_CAPS, per PEP 8.
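A minimal sketch of the hunk above with PEP 8 naming. Note that the original assigns `num_queries` twice (10000, then 20000), so only 20000 ever takes effect; the sketch keeps that value once, but which of the two was intended is an assumption the author should confirm.

```python
# Constants used in event randomization, renamed to ALL_CAPS per PEP 8.
NUM_SESSIONS = 1_000_000
NUM_PATHS = 50_000
NUM_QUERY_PARAMS = 1_000
NUM_QUERIES = 20_000  # assumption: the later of the two original assignments is the intended one
NUM_DOCS = 50_000
NUM_INDEX = 3
NUM_SEARCH_APPS = 2
NUM_RESULT = 10
SEARCH_RATIO = 30
```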

Comment on lines +27 to +33
random_paths = list(map(lambda _: random_identifier(), range(1, num_paths)))
random_query_params = list(map(lambda _: random_identifier(), range(1, num_query_params)))
random_titles = list(map(lambda _: random_identifier(), range(1, num_paths)))
random_docs = list(map(lambda _: random_identifier(), range(1, num_docs)))
random_indices = list(map(lambda i: "index-%sd" % (i), range(1, num_index)))
random_search_applications = list(map(lambda i: "index-%sd" % (i), range(1, num_search_apps)))
random_queries = list(map(lambda _: random_identifier(), range(1, num_queries)))
Member

Why use range(1, num_paths)? This will give you 49999 paths, not 50000. I also like functional programming, but this would be much easier to read with list comprehensions:

random_paths = [random_identifier() for _ in range(NUM_PATHS)]
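Applied to the whole hunk, the suggestion could look like the sketch below. Two assumptions go beyond the review comment: `"index-%sd" % (i)` renders as `index-1d`, which looks like a typo for `index-1`; and `random_search_applications` reuses the `index-` prefix, so the `search-app-` prefix here is a hypothetical placeholder. Where 1-based names are wanted, `range(1, N + 1)` keeps the list at exactly N entries.

```python
import uuid

# Counts taken from the constants hunk (renamed to ALL_CAPS).
NUM_PATHS = 50_000
NUM_QUERY_PARAMS = 1_000
NUM_DOCS = 50_000
NUM_INDEX = 3
NUM_SEARCH_APPS = 2
NUM_QUERIES = 20_000


def random_identifier():
    return str(uuid.uuid4())


# range(N) yields exactly N items, unlike range(1, N), which yields N - 1.
random_paths = [random_identifier() for _ in range(NUM_PATHS)]
random_query_params = [random_identifier() for _ in range(NUM_QUERY_PARAMS)]
random_titles = [random_identifier() for _ in range(NUM_PATHS)]
random_docs = [random_identifier() for _ in range(NUM_DOCS)]
random_queries = [random_identifier() for _ in range(NUM_QUERIES)]

# 1-based names: range(1, N + 1) still yields N items.
random_indices = [f"index-{i}" for i in range(1, NUM_INDEX + 1)]
# Hypothetical prefix; the original reused "index-" here.
random_search_applications = [f"search-app-{i}" for i in range(1, NUM_SEARCH_APPS + 1)]
```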


# Function used to generate random event parts.
def random_identifier():
    return str(uuid.uuid4())
Member

What are your thoughts on making the randomness reproducible by seeding a random number generator and building the UUIDs from its output? uuid4() itself cannot be seeded. (Just because we can doesn't mean we should; I'm just wondering.)

https://stackoverflow.com/questions/41186818/how-to-generate-a-random-uuid-which-is-reproducible-with-a-seed-in-python
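`uuid.uuid4()` draws its bits from `os.urandom()` and takes no seed, so the approach in the linked answer builds the UUID from a seeded generator instead. A sketch, assuming a dedicated `random.Random` instance is acceptable; the factory name `make_identifier_factory` is hypothetical:

```python
import random
import uuid


def make_identifier_factory(seed):
    """Return a random_identifier() whose sequence of outputs is fixed by `seed`."""
    rng = random.Random(seed)

    def random_identifier():
        # 128 seeded random bits; version=4 overwrites the version and variant
        # bits so the result is still a well-formed version-4 UUID.
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))

    return random_identifier
```

For the whole generated file to be reproducible, the other random choices (for example the `random.randint` call in the session loop) would need to draw from the same seeded generator.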

for _ in iterate(num_sessions):
    session_id = random_identifier()
    user_id = random_identifier()
    for i in range(1, random.randint(1, 50)):
Member

This 50 could also be a constant.
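Note also that `range(1, random.randint(1, 50))` runs between 0 and 49 times, so some sessions get no events at all. A sketch with the bound as a constant, assuming every session should have 1 to 50 events; the name `MAX_EVENTS_PER_SESSION` and the helper `session_event_count` are hypothetical:

```python
import random

# Hypothetical name for the literal 50 in the session loop.
MAX_EVENTS_PER_SESSION = 50


def session_event_count(rng):
    """Events to generate for one session: 1..MAX_EVENTS_PER_SESSION, inclusive."""
    # random.randint includes both endpoints.
    return rng.randint(1, MAX_EVENTS_PER_SESSION)
```

The inner loop then becomes `for i in range(session_event_count(rng)):`, which iterates exactly that many times.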

import operator
import re

from esrally.track.params import *
Member

Can you please avoid a star import here?

from esrally.track.params import *


class EventBatchDataReader:
Member

Can you please add a comment mentioning that most of this was copy/pasted from Rally?

"documents": [
{
"source-file": "events.json.bz2",
"document-count": 24514512
Member

Can you please add uncompressed-bytes too?
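In a Rally track, `uncompressed-bytes` is the size of the decompressed corpus file. A sketch for computing it, and for re-checking `document-count`, by streaming `events.json.bz2`; it assumes the corpus holds one JSON document per line, as Rally's bulk corpora do, and the helper name `corpus_stats` is hypothetical:

```python
import bz2


def corpus_stats(source):
    """Return (uncompressed_bytes, line_count) for a bz2-compressed NDJSON corpus.

    `source` may be a file name or a binary file object, as accepted by bz2.open().
    """
    uncompressed_bytes = 0
    line_count = 0
    with bz2.open(source, "rb") as f:
        for line in f:  # lines keep their trailing b"\n", so lengths sum to the file size
            uncompressed_bytes += len(line)
            line_count += 1
    return uncompressed_bytes, line_count
```

`corpus_stats("events.json.bz2")` then yields both values for the `documents` entry; the line count should match the existing `document-count` of 24514512 if the file has no blank lines.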

Labels: None yet
Projects: None yet

4 participants