SQL Repeat Record Report & Case Data Report #29029

kaapstorm · 2021-01-24T19:50:11Z

Summary

This PR is the next in a series to implement [CEP] Migrate RepeatRecord to SQL.

Previous PR: process_repeater_stub() task

no-obligation fyi: @orangejenny @millerdev @snopoke

This PR creates a repeat record report for SQL repeat records. HQ determines which report to use based on whether the domain has SQL repeat records. (See the "Test are_repeat_records_migrated" commit.)

Two important things to note from the CEP:

In an upcoming PR, SQLRepeatRecords are created by soft-migrating the RepeatRecord instances that are about to be forwarded.
- Only domains actively forwarding data will have SQLRepeatRecord instances, and will see the SQL repeat record report.
- Domains whose repeaters are paused, or who are not generating payloads for them, will only have Couch RepeatRecord instances, and will see the current repeat record report.
- Domains with a mix of paused and unpaused repeaters will see the SQL repeat record report. In this scenario, the repeat records of paused repeaters will not appear in the repeat record report unless/until they are migrated.
It will take up to 7 days to soft-migrate all active Couch RepeatRecord instances. Old instances that have either succeeded or been cancelled will not be soft-migrated. We can choose whether to migrate them or not.

An upcoming PR includes a management command for migrating repeat records to SQL.

🐟 🐡 🐠

Safety Assurance

Risk label is set correctly
The set of people pinged as reviewers is appropriate for the level of risk of the change
If QA is part of the safety story, the "Awaiting QA" label is used
I am certain that this PR will not introduce a regression for the reasons below

Automated test coverage

PR includes tests that cover changes. (Please comment if you spot changes that are not covered by automated tests and you think ought to be.)

QA Plan

The code that makes this PR reachable will be staged and QAed. We have two more PRs to go before we get to the one that makes these changes reachable.

Safety story

This code is not reachable yet.

Rollback instructions

This PR can be reverted after deploy with no further considerations

stickler-ci · 2021-01-24T19:50:23Z

corehq/apps/data_interfaces/tasks.py

+def task_operate_on_payloads(
+    record_ids: List[str],
+    domain: str,
+    action,  # type: Literal['resend', 'cancel', 'requeue']  # 3.8+


F821 undefined name 'Literal'

stickler-ci · 2021-01-24T19:50:23Z

corehq/apps/data_interfaces/tasks.py

@@ -234,24 +241,34 @@ def task_generate_ids_and_operate_on_payloads(
    payload_id: Optional[str],
    repeater_id: Optional[str],
    domain: str,
-    action: str = '',
+    action,  # type: Literal['resend', 'cancel', 'requeue']  # 3.8+


F821 undefined name 'Literal'

stickler-ci · 2021-01-24T19:50:23Z

corehq/apps/data_interfaces/utils.py

+def operate_on_payloads(
+    repeat_record_ids: List[str],
+    domain: str,
+    action,  # type: Literal['resend', 'cancel', 'requeue']  # 3.8+


F821 undefined name 'Literal'

corehq/apps/data_interfaces/utils.py

stickler-ci · 2021-01-24T19:50:24Z

corehq/motech/repeaters/views/repeat_records.py

+def _schedule_task_with_flag(
+    request: HttpRequest,
+    domain: str,
+    action,  # type: Literal['resend', 'cancel', 'requeue']  # 3.8+


F821 undefined name 'Literal'

stickler-ci · 2021-01-24T19:50:24Z

corehq/motech/repeaters/views/repeat_records.py

+def _schedule_task_without_flag(
+    request: HttpRequest,
+    domain: str,
+    action,  # type: Literal['resend', 'cancel', 'requeue']  # 3.8+


F821 undefined name 'Literal'

kaapstorm · 2021-01-27T07:35:15Z

ping @dannyroberts

corehq/motech/repeaters/tests/test_models.py

millerdev · 2021-01-28T20:52:28Z

corehq/motech/repeaters/dbaccessors.py

+        queryset = queryset.filter(repeater_stub__repeater_id=repeater_id)
+    if state:
+        queryset = queryset.filter(state=state)
+    return queryset.count()


Is it important for this count to be exact? If not, would it be good enough to estimate it using the query planner (much faster)?

millerdev · 2021-01-28T20:56:12Z

corehq/motech/repeaters/dbaccessors.py

+        queryset = queryset.filter(repeater_stub__repeater_id=repeater_id)
+    if state:
+        queryset = queryset.filter(state=state)
+    return (queryset.order_by('-registered_at')[skip:skip + limit]


Offset is inefficient for large offsets and can result in records being skipped in some cases. Would it work to use paginate_query or possibly the technique used in iter_large?

I share your concern.

Thanks for the links. I also looked at Ethan's recent PR along the same lines. But all of them seem to me to be appropriate for iterating rows or models, but not for paginating a report.

I think we have two options here:

Use Elasticsearch instead of Postgres for the Repeat Record Report.

Hope that users don't care about what happens more than a couple of pages into the report.

I don't really love either of those options. Are my concerns valid? Are there other options I'm not considering?

@dannyroberts @gherceg do you have opinions on whether the Repeat Records Report should be backed by Elasticsearch or Postgres? And if Postgres, how we should paginate it?

I wonder, doesn't couch (which currently backs the report) have the same pagination issue? And doesn't elasticsearch? Maybe I'm missing some special magic that Couch and/or Elasticsearch do, but my guess is that they also just internally do the query for skip + limit items and return the last limit of them.

At some point it also becomes more than is feasible for a person to actually click through. So having more granular filters and an upper limit (10k? 100k?) to the number of items we're willing to show in the results through pagination seems like the best longer term option to me

corehq/motech/repeaters/models.py

corehq/apps/data_interfaces/tasks.py

corehq/apps/data_interfaces/utils.py

Co-authored-by: Daniel Miller <[email protected]>

corehq/apps/data_interfaces/utils.py

millerdev · 2021-01-29T16:52:30Z

corehq/motech/repeaters/models.py

@@ -1185,7 +1185,8 @@ def failure_reason(self):
    def last_message(self):
        # Uses `list(queryset)[-1]` instead of `queryset.last()` to use
        # prefetched attempts, if available.
-        return list(self.attempts)[-1].message
+        attempts = list(self.attempts)
+        return attempts[-1].message if attempts else None


Nit: could this return an empty string rather than None in the case of no attempts? That would make the return type always be the same (requiring less if x is None checks elsewhere) without breaking process(message) if message else ignore.

I thought about this, and those are good points.

I leaned towards None because attempts[-1].message could be None anyway:

message = models.TextField(null=True, blank=True)

I could return a string always:

def last_message(self): attempts = list(self.attempts) if attempts return attempts[-1].message or '' return ''

Or I could warn future devs to take None into consideration -> 😉 :

def last_message(self) -> Optional[str]: attempts = list(self.attempts) return attempts[-1].message if attempts else None

I don't really love the idea of changing the model. Semantically None means that message has not been assigned a value. I think that's useful to know, and maybe last_message() should convey that. What do you think?

I think up to you. I'd personally go with a NOT NULL field on the model and make it always return a string since that tends to make many things easier and less error prone in my experience. Is there ever a time when it's important to know that a message has not been assigned (value is None) vs there is no message (value is empty string)?

Is there ever a time when it's important to know that a message has not been assigned

An excellent question. e686887

corehq/apps/data_interfaces/utils.py

millerdev · 2021-02-01T19:14:55Z

corehq/motech/repeaters/models.py

@@ -1185,7 +1185,8 @@ def failure_reason(self):
    def last_message(self):
        # Uses `list(queryset)[-1]` instead of `queryset.last()` to use
        # prefetched attempts, if available.
-        return list(self.attempts)[-1].message
+        attempts = list(self.attempts)
+        return attempts[-1].message if attempts else None


I think up to you. I'd personally go with a NOT NULL field on the model and make it always return a string since that tends to make many things easier and less error prone in my experience. Is there ever a time when it's important to know that a message has not been assigned (value is None) vs there is no message (value is empty string)?

millerdev · 2021-02-01T19:50:29Z

corehq/apps/data_interfaces/tasks.py

        else:
-            return [r['id'] for r in iter_repeat_records_by_repeater(
-                domain, repeater_id)]
+            records = iter_repeat_records_by_repeater(domain, repeater_id)


Does this query use include_docs: true (couch view option)? Would the performance improvement of doing the query with include_docs: false be worth it? Same applies to above query with payload_id.

👍 9c6c06f

kaapstorm added the product/invisible Change has no end-user visible impact label Jan 24, 2021

kaapstorm requested a review from proteusvacuum January 24, 2021 19:50

stickler-ci reviewed Jan 24, 2021

View reviewed changes

SQLRepeatRecordReport

b506f4f

kaapstorm force-pushed the nh/rep/report branch from eed78ed to e3dc880 Compare January 28, 2021 18:14

stickler-ci reviewed Jan 28, 2021

View reviewed changes

corehq/motech/repeaters/tests/test_models.py Outdated Show resolved Hide resolved

kaapstorm added 6 commits January 28, 2021 20:17

⚖ Test are_repeat_records_migrated

42cb272

♻ Use quickcache to avoid N+1 query problem

a4629c1

⚖ Test num queries for SQLRepeatRecordReport

be898b7

Repeat record report tasks support SQL

087e7e3

⚖ Update tests for Repeat Record Report tasks

56e3a17

Duck type for ReferCaseRepeater

f9a9302

kaapstorm force-pushed the nh/rep/report branch from e3dc880 to f9a9302 Compare January 28, 2021 19:55

Base automatically changed from nh/rep/process_repeater_stub to master January 28, 2021 19:57

kaapstorm marked this pull request as ready for review January 28, 2021 20:00

kaapstorm requested a review from orangejenny as a code owner January 28, 2021 20:00

Merge branch 'master' into nh/rep/report

2142e55

millerdev approved these changes Jan 28, 2021

View reviewed changes

Use enumerate(..., start=1)

cca491a

Co-authored-by: Daniel Miller <[email protected]>

stickler-ci reviewed Jan 29, 2021

View reviewed changes

corehq/apps/data_interfaces/utils.py Outdated Show resolved Hide resolved

stickler-ci reviewed Jan 29, 2021

View reviewed changes

corehq/apps/data_interfaces/utils.py Outdated Show resolved Hide resolved

Return None instead of raising IndexError

94a1b0a

kaapstorm force-pushed the nh/rep/report branch from caa375c to 94a1b0a Compare January 29, 2021 14:36

stickler-ci reviewed Jan 29, 2021

View reviewed changes

corehq/apps/data_interfaces/utils.py Outdated Show resolved Hide resolved

millerdev reviewed Jan 29, 2021

View reviewed changes

Add test. Fix IDs.

923453d

stickler-ci reviewed Jan 29, 2021

View reviewed changes

corehq/apps/data_interfaces/utils.py Outdated Show resolved Hide resolved

kaapstorm added 2 commits January 29, 2021 20:29

Drop bad type hint

89ba6db

Update tests

365c159

millerdev approved these changes Feb 1, 2021

View reviewed changes

kaapstorm added 6 commits April 3, 2021 19:16

Merge branch 'master' into nh/rep/report

e6b7c9a

Merge branch 'master' into nh/rep/report

57be9cf

Don't include docs if we don't need them

9c6c06f

We don't need None

e686887

Drop fragile test

550e93d

Estimate count

db0a110

dimagimon added the reindex/migration Reindex or migration will be required during or before deploy label Apr 10, 2021

millerdev approved these changes Apr 12, 2021

View reviewed changes

kaapstorm added 2 commits April 14, 2021 13:04

Update test with renamed functions

027dc4f

Fix report link in Data Forwarding page

171cbe7

kaapstorm merged commit 49953a1 into master Apr 14, 2021

kaapstorm deleted the nh/rep/report branch April 14, 2021 13:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQL Repeat Record Report & Case Data Report #29029

SQL Repeat Record Report & Case Data Report #29029

kaapstorm commented Jan 24, 2021

stickler-ci Jan 24, 2021

stickler-ci Jan 24, 2021

stickler-ci Jan 24, 2021

stickler-ci Jan 24, 2021

stickler-ci Jan 24, 2021

kaapstorm commented Jan 27, 2021

millerdev Jan 28, 2021

kaapstorm Apr 10, 2021

millerdev Jan 28, 2021

kaapstorm Apr 10, 2021

dannyroberts Apr 12, 2021

millerdev Jan 29, 2021

kaapstorm Jan 29, 2021 •

edited

Loading

millerdev Feb 1, 2021

kaapstorm Apr 10, 2021

millerdev Feb 1, 2021

millerdev Feb 1, 2021

kaapstorm Apr 10, 2021

SQL Repeat Record Report & Case Data Report #29029

SQL Repeat Record Report & Case Data Report #29029

Conversation

kaapstorm commented Jan 24, 2021

Summary

Safety Assurance

Automated test coverage

QA Plan

Safety story

Rollback instructions

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaapstorm commented Jan 27, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaapstorm Jan 29, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaapstorm Jan 29, 2021 •

edited

Loading