
Stage 3 LAERS and FAERS Transforms

wolfderby edited this page Sep 1, 2022 · 26 revisions

Important

  • The domains are demo, drug, outc, indi, reac, rpsr, and ther. In this wiki, "domain" means one of those.
  • Domain data is downloaded and staged locally by stage_2's ./s3_data_download.sh according to the timeframe values in your .config.
  • Data is always loaded from the staged file `${BASE_FILE_DIR}/"domain"/"domain".txt` (e.g. /repos/parent/dir/demo/demo.txt for the demographic data source).
  • Transforms run in parallel, so all enabled domains load at once.
  • Domain tables are truncated automatically, so starting a run on impulse and then stopping it will drop a lot of data!
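
Because the tables are truncated before loading, it can help to confirm that every staged file is in place before starting a run. The following is a minimal sketch, not part of the repository: the function name check_staged_domains is hypothetical, and it assumes the staged layout described above with the base directory passed as the first argument.

```shell
# Hypothetical helper: verify that each staged domain file exists and is non-empty.
# Assumes the layout <base_dir>/<domain>/<domain>.txt described above.
check_staged_domains() {
  base_dir="$1"
  missing=0
  for domain in demo drug outc indi reac rpsr ther; do
    file="${base_dir}/${domain}/${domain}.txt"
    if [ -s "$file" ]; then
      # -s is true only when the file exists and has a size greater than zero
      printf 'ok      %s (%s lines)\n' "$file" "$(wc -l < "$file")"
    else
      printf 'MISSING %s\n' "$file"
      missing=1
    fi
  done
  # Non-zero exit status when at least one domain file is absent or empty
  return "$missing"
}
```

A typical call would be check_staged_domains "${BASE_FILE_DIR}", run from a shell where BASE_FILE_DIR has the same value as in your .config.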

All hops enabled: image

Troubleshooting

  • Each domain.txt (demo.txt, drug.txt, etc.) is built by stage_2's s3_data_download.sh
  • Issues previously encountered with domain.txt files
    • Headers (the first lines of the files in the s3 bucket) do not match
    • Missing column(s) of data for the first few quarters
    • sed replaced both the CR and the LF of the CRLF non-printing line-break characters, creating alternating rows of data in the source. This should be fixed with tr -d '\015' <${domain}.txt >${domain}.tmp && mv ${domain}.tmp ${domain}.txt (redirecting the output straight back into ${domain}.txt would empty the file before tr reads it)
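
The CR cleanup above can be wrapped so that the staged file is only replaced when tr succeeds. This is a hedged sketch rather than the project's own script: strip_cr and count_cr_lines are hypothetical names, and the only behavior assumed is that the file is plain text with optional CRLF line endings.

```shell
# Hypothetical helper: count the lines that still contain a carriage return.
count_cr_lines() {
  cr=$(printf '\r')
  # grep -c exits non-zero when the count is 0, so neutralize that status
  grep -c "$cr" "$1" || true
}

# Hypothetical helper: remove every CR (octal 015) from a file in place.
strip_cr() {
  file="$1"
  # Write to a temporary file first; redirecting tr's output onto its own
  # input would truncate the file before it is read.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if tr -d '\015' < "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, count_cr_lines demo.txt reports how many lines are affected, and strip_cr demo.txt followed by a second count should report 0.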

Suggested steps to fix a domain.txt file

  • Isolate the problem by viewing the file
    • head and tail domain.txt
      • cd to ${BASE_FILE_DIR}/"domain" (ie /repos/parent/dir/demo)
      • run head demo.txt and then tail demo.txt
    • Open in vscode
    • Attempt to open in LibreOffice Calc (this often fails because of the file size, and Calc will drop the last rows)
  • To view the headers (first lines) of the s3 source files, run troubleshooting_scripts/s3_data_download_header_finder.sh
  • To edit the header (first line) of a staged domain.txt
    • open the .txt file in vscode
    • ...or run sed -i '0,/\$yr\$qtr/{s/\$yr\$qtr/\$qtr\$yr/}' indi.txt to swap only the first occurrence (the 0,/regexp/ address requires GNU sed)
  • To create a file you can open in LibreOffice Calc, keep only the first 15,000 lines: sed '15001,$ d' domain.txt > domain_with_15000_lines.txt
  • Once fixed, attempt a reload with the other domain hops disabled
    • example only demo domain enabled: image

Also see the s3_data_download.sh variants for rough drafts of snippets that rebuild the staged source file(s).
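
When a file is too large to inspect by eye, counting fields per line is one way to spot mismatched headers or missing columns. The sketch below is an illustration only: both function names are hypothetical, and it assumes the staged files use '$' as the field delimiter, as the header edit with sed above suggests.

```shell
# Hypothetical helper: summarize how many lines have each field count.
# Assumes '$' is the field delimiter; [$] keeps awk from reading it as a regex anchor.
field_count_report() {
  awk -F'[$]' '{ counts[NF]++ }
       END { for (n in counts) printf "%d fields: %d lines\n", n, counts[n] }' "$1" | sort -n
}

# Hypothetical helper: list line numbers whose field count differs from the header's.
mismatched_lines() {
  awk -F'[$]' 'NR == 1 { want = NF; next }
       NF != want { print NR ": " NF " fields" }' "$1"
}
```

A clean file yields a single row from field_count_report and no output from mismatched_lines. More than one row points to quarters with missing columns or to broken line endings.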

Stage 4 Wiki