Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first stab at a data backfill for StateFileArchivedIntake #5305

Draft
wants to merge 11 commits into
base: FYST-1466-prior-year-access-mvp
Choose a base branch
from

Conversation

jnf
Copy link
Contributor

@jnf jnf commented Dec 31, 2024

Link to pivotal/JIRA issue

FYST-1504

Is PM acceptance required? (delete one)

No - merge after code review approval

What was done?

  • Adds a rake task (and corresponding service object) to identify and create records for intakes/submissions from TY23 that are 'archiveable.'
  • Defining 'archiveable' as an intake + accepted efile submission for TY23.
    • Not concerned w/ efile submissions in any other state, as, for the purposes of the larger project, we're creating this archived records for the explicit purpose of providing PDFs to clients who successfully submitted a return.
  • The rake task is batched at 10. That's super safe mode and we can tune it if we want. I chose such a low number as 'archiving' requires attaching a PDF blob to the archive record. It's still unclear to me whether this duplicates the actual file in S3 or just adds an additional reference. Need to figure out which it is and ensure that the later task of removing old intakes from the production table doesn't accidentally break this association.

How to test?

  • The service object is testable in a rails console. Creating an instance for a state exposes the core methods of #query_archiveable, #current_batch, and #archive_batch.
  • The rake task can be run locally, but you'll need some TY23 returns with an 'accepted' efile submission. jey has a database dump you can use with NY and AZ returns in this state.

@jnf jnf added the wip denotes a work in progress that isn't ready for formal review label Dec 31, 2024
Copy link

Heroku app: https://gyr-review-app-5305-3f92ceae63ac.herokuapp.com/
View logs: heroku logs --app gyr-review-app-5305 (optionally add --tail)

@@ -0,0 +1,74 @@
module StateFile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at a high level, i think it would be helpful to write tests for this file.

next
end
Rails.logger.info("Archived #{archived_ids.count} #{data_source} intakes: [#{archived_ids.join(', ')}]")
@current_batch = nil # reset the batch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the @current_batch set to nil for this method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i think it is to exit the while loop in the rake task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting it to nil to represent that the batch has been processed; those intakes are no longer eligible for archiving because they've been archived :)

next
end
Rails.logger.info("Archived #{archived_ids.count} #{data_source} intakes: [#{archived_ids.join(', ')}]")
@current_batch = nil # reset the batch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i think it is to exit the while loop in the rake task.

archive = StateFileArchivedIntake.new(record.without('source_intake_id'))
archive.submission_pdf.attach(data_source.find(record['source_intake_id']).submission_pdf.blob)
archive.save!
archived_ids << record['source_intake_id']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 i am assuming that we don't want to archive and delete in the same rake task.

maybe it would be helpful to have a method that after the archive_batch is run, we verify that the archivable intakes now have an archivable record.

i don't if that makes sense or is easy.

Copy link
Contributor

@anisharamnani anisharamnani Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also how were you thinking of recording the intake IDs for deleting? or were you thinking of running the query again and then deleting them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not deleting in this task. collecting the ids for logging purposes only. we'll do the actual table truncation in a different effort.

@anisharamnani
Copy link
Contributor

also, the annotate error should be resolved if you merge in FYST-1466-prior-year-access-mvp 🎉

#{tax_year} AS tax_year, '#{state_code}' AS state_code,
email_address, hashed_ssn, id AS source_intake_id
FROM
state_file_az_intakes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 just realized this query is just for AZ. since this is a WIP PR, i'm sure you were gonna change this, but just pointing out for other reviewers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, fixed in the next commit.

@jnf jnf force-pushed the jnf/FYST-1466-2 branch from 6143bf3 to a941eb4 Compare January 6, 2025 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wip denotes a work in progress that isn't ready for formal review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants