-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL Repeat Record Report & Case Data Report #29029
Conversation
def task_operate_on_payloads( | ||
record_ids: List[str], | ||
domain: str, | ||
action, # type: Literal['resend', 'cancel', 'requeue'] # 3.8+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F821 undefined name 'Literal'
@@ -234,24 +241,34 @@ def task_generate_ids_and_operate_on_payloads( | |||
payload_id: Optional[str], | |||
repeater_id: Optional[str], | |||
domain: str, | |||
action: str = '', | |||
action, # type: Literal['resend', 'cancel', 'requeue'] # 3.8+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F821 undefined name 'Literal'
def operate_on_payloads( | ||
repeat_record_ids: List[str], | ||
domain: str, | ||
action, # type: Literal['resend', 'cancel', 'requeue'] # 3.8+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F821 undefined name 'Literal'
def _schedule_task_with_flag( | ||
request: HttpRequest, | ||
domain: str, | ||
action, # type: Literal['resend', 'cancel', 'requeue'] # 3.8+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F821 undefined name 'Literal'
def _schedule_task_without_flag( | ||
request: HttpRequest, | ||
domain: str, | ||
action, # type: Literal['resend', 'cancel', 'requeue'] # 3.8+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F821 undefined name 'Literal'
ping @dannyroberts |
eed78ed
to
e3dc880
Compare
e3dc880
to
f9a9302
Compare
queryset = queryset.filter(repeater_stub__repeater_id=repeater_id) | ||
if state: | ||
queryset = queryset.filter(state=state) | ||
return queryset.count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it important for this count to be exact? If not, would it be good enough to estimate it using the query planner (much faster)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queryset = queryset.filter(repeater_stub__repeater_id=repeater_id) | ||
if state: | ||
queryset = queryset.filter(state=state) | ||
return (queryset.order_by('-registered_at')[skip:skip + limit] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offset is inefficient for large offsets and can result in records being skipped in some cases. Would it work to use paginate_query
or possibly the technique used in iter_large
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share your concern.
Thanks for the links. I also looked at Ethan's recent PR along the same lines. But all of them seem to me to be appropriate for iterating rows or models, but not for paginating a report.
I think we have two options here:
- Use Elasticsearch instead of Postgres for the Repeat Record Report.
- Hope that users don't care about what happens more than a couple of pages into the report.
I don't really love either of those options. Are my concerns valid? Are there other options I'm not considering?
@dannyroberts @gherceg do you have opinions on whether the Repeat Records Report should be backed by Elasticsearch or Postgres? And if Postgres, how we should paginate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, doesn't couch (which currently backs the report) have the same pagination issue? And doesn't elasticsearch? Maybe I'm missing some special magic that Couch and/or Elasticsearch do, but my guess is that they also just internally do the query for skip + limit
items and return the last limit
of them.
At some point it also becomes more than is feasible for a person to actually click through. So having more granular filters and an upper limit (10k? 100k?) to the number of items we're willing to show in the results through pagination seems like the best longer term option to me
Co-authored-by: Daniel Miller <[email protected]>
caa375c
to
94a1b0a
Compare
corehq/motech/repeaters/models.py
Outdated
@@ -1185,7 +1185,8 @@ def failure_reason(self): | |||
def last_message(self): | |||
# Uses `list(queryset)[-1]` instead of `queryset.last()` to use | |||
# prefetched attempts, if available. | |||
return list(self.attempts)[-1].message | |||
attempts = list(self.attempts) | |||
return attempts[-1].message if attempts else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could this return an empty string rather than None
in the case of no attempts? That would make the return type always be the same (requiring less if x is None
checks elsewhere) without breaking process(message) if message else ignore
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, and those are good points.
I leaned towards None
because attempts[-1].message
could be None
anyway:
message = models.TextField(null=True, blank=True)
I could return a string always:
def last_message(self):
attempts = list(self.attempts)
if attempts
return attempts[-1].message or ''
return ''
Or I could warn future devs to take None
into consideration -> 😉 :
def last_message(self) -> Optional[str]:
attempts = list(self.attempts)
return attempts[-1].message if attempts else None
I don't really love the idea of changing the model. Semantically None
means that message has not been assigned a value. I think that's useful to know, and maybe last_message()
should convey that. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think up to you. I'd personally go with a NOT NULL
field on the model and make it always return a string since that tends to make many things easier and less error prone in my experience. Is there ever a time when it's important to know that a message has not been assigned (value is None
) vs there is no message (value is empty string)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there ever a time when it's important to know that a message has not been assigned
An excellent question. e686887
corehq/motech/repeaters/models.py
Outdated
@@ -1185,7 +1185,8 @@ def failure_reason(self): | |||
def last_message(self): | |||
# Uses `list(queryset)[-1]` instead of `queryset.last()` to use | |||
# prefetched attempts, if available. | |||
return list(self.attempts)[-1].message | |||
attempts = list(self.attempts) | |||
return attempts[-1].message if attempts else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think up to you. I'd personally go with a NOT NULL
field on the model and make it always return a string since that tends to make many things easier and less error prone in my experience. Is there ever a time when it's important to know that a message has not been assigned (value is None
) vs there is no message (value is empty string)?
corehq/apps/data_interfaces/tasks.py
Outdated
else: | ||
return [r['id'] for r in iter_repeat_records_by_repeater( | ||
domain, repeater_id)] | ||
records = iter_repeat_records_by_repeater(domain, repeater_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this query use include_docs: true
(couch view option)? Would the performance improvement of doing the query with include_docs: false
be worth it? Same applies to above query with payload_id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 9c6c06f
Summary
This PR is the next in a series to implement [CEP] Migrate RepeatRecord to SQL.
Previous PR: process_repeater_stub() task
no-obligation fyi: @orangejenny @millerdev @snopoke
This PR creates a repeat record report for SQL repeat records. HQ determines which report to use based on whether the domain has SQL repeat records. (See the "Test are_repeat_records_migrated" commit.)
Two important things to note from the CEP:
In an upcoming PR, SQLRepeatRecords are created by soft-migrating the RepeatRecord instances that are about to be forwarded.
It will take up to 7 days to soft-migrate all active Couch RepeatRecord instances. Old instances that have either succeeded or been cancelled will not be soft-migrated. We can choose whether to migrate them or not.
An upcoming PR includes a management command for migrating repeat records to SQL.
🐟 🐡 🐠
Safety Assurance
Automated test coverage
PR includes tests that cover changes. (Please comment if you spot changes that are not covered by automated tests and you think ought to be.)
QA Plan
The code that makes this PR reachable will be staged and QAed. We have two more PRs to go before we get to the one that makes these changes reachable.
Safety story
This code is not reachable yet.
Rollback instructions