forked from ltscomputingllc/faersdbstats
Stage 3 LAERS and FAERS Transforms
wolfderby edited this page Sep 1, 2022 · 26 revisions
- The domains are "demo", "drug", "outc", "indi", "reac", "rpsr", and "ther"; in this wiki, "domain" means one of those.
- Domain data is downloaded and staged locally by stage_2's ./s3_data_download.sh, according to the timeframe values in your .config
- Data is always loaded from the staged `${BASE_FILE_DIR}/"domain"/"domain".txt` (e.g. /repos/parent/dir/demo/demo.txt for the demographic data source)
- Parallel processing on transforms means all domains load at once
- Domain tables are truncated automatically, so starting a run and then stopping it partway through will drop a lot of data!
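Because a run truncates the domain tables, a quick pre-flight check that every staged file is present and non-empty can save a reload. The Bash function below is a minimal sketch, not part of the repo: `check_staged_files` is a hypothetical name, and it assumes `BASE_FILE_DIR` holds the same directory stage_2 downloads into (or that the directory is passed as the first argument).

```bash
# Hypothetical helper (not part of faersdbstats): report each staged domain file
# with its line count; return non-zero if any file is missing or empty.
# Assumes BASE_FILE_DIR points at the stage_2 download directory.
check_staged_files() {
  local base="${1:-$BASE_FILE_DIR}"
  local status=0 domain file
  [ -n "$base" ] || { echo "BASE_FILE_DIR is not set" >&2; return 2; }
  for domain in demo drug outc indi reac rpsr ther; do
    file="${base}/${domain}/${domain}.txt"
    if [ -s "$file" ]; then
      # wc pads its output on some platforms, so strip the spaces.
      printf '%s\t%s lines\n' "$domain" "$(wc -l < "$file" | tr -d ' ')"
    else
      printf '%s\tMISSING OR EMPTY: %s\n' "$domain" "$file" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_staged_files || echo "fix staging before running stage 3"`.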
All hops enabled:
- Each domain.txt (e.g. demo.txt, drug.txt, etc.) is built by stage_2's s3_data_download.sh
- Issues previously encountered with domain.txt:
- Headers (the first lines of the files in the S3 bucket) do not match
- Column(s) of data missing for the first few quarters
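- sed replaced both the CR and the LF of the non-printing CRLF line break, creating alternating rows in the data source. This should be fixed by deleting the carriage returns (octal \015) through a temporary file, because redirecting output straight back onto the input file truncates it before tr can read it:
tr -d '\015' <"${domain}.txt" >"${domain}.tmp" && mv "${domain}.tmp" "${domain}.txt"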
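The CR fix can be wrapped so it is safe to run on every staged file and also reports how many lines carried a carriage return beforehand. A minimal Bash sketch; `strip_cr` is a hypothetical helper name, not something the repo provides.

```bash
# Hypothetical helper (not part of faersdbstats): delete every carriage return
# (octal 015, "\r") from a staged domain file in place.
strip_cr() {
  local file="$1" tmp cr_lines rc
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  # grep -c prints 0 (and exits 1) when no line matches, hence "|| true".
  cr_lines=$(grep -c $'\r' "$file" || true)
  echo "${file}: ${cr_lines} line(s) containing CR"
  tmp=$(mktemp) || return 1
  # Write to a temp file first: redirecting onto the input itself would
  # truncate it before tr could read it.
  if ! tr -d '\015' < "$file" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Copy back with cat (rather than mv) so the original keeps its permissions.
  cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

To apply it to all domains: `for d in demo drug outc indi reac rpsr ther; do strip_cr "${BASE_FILE_DIR}/${d}/${d}.txt"; done`.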
- Isolate the problem by viewing the file:
- head and tail the domain.txt: cd to ${BASE_FILE_DIR}/"domain" (e.g. /repos/parent/dir/demo) and run
head demo.txt
tail demo.txt
- Open it in VS Code
- Attempt to open it in LibreOffice Calc (this often fails because of the file size, and Calc will drop the last rows)
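Alongside head and tail, counting the fields on each line tends to pinpoint missing or shifted columns in files too large for an editor. A sketch assuming the files are `$`-delimited (as the `$yr$qtr` header example on this page suggests); `field_count_summary` and `mismatched_rows` are hypothetical helper names.

```bash
# Hypothetical helper: print how many lines have each field count.
# A consistent file shows one field count; extra rows point at missing columns.
field_count_summary() {
  # [$] is a bracket expression, so awk splits on a literal dollar sign.
  awk -F'[$]' '{ print NF }' "$1" | sort -n | uniq -c
}

# Hypothetical helper: list up to 20 rows whose field count differs from
# the header's (line 1).
mismatched_rows() {
  awk -F'[$]' 'NR == 1 { n = NF; next }
    NF != n { print NR ": " NF " fields (header has " n ")" }' "$1" | head -n 20
}
```

For example, `field_count_summary demo.txt` followed by `mismatched_rows demo.txt` to jump straight to the offending line numbers.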
- To view the headers (first lines) of the S3 source files, run troubleshooting_scripts/s3_data_download_header_finder.sh
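When two headers do not match, listing their column names one per line makes the difference easy to spot. A minimal Bash sketch (process substitution requires Bash), assuming `$`-delimited headers; `compare_headers` is a hypothetical helper, and its two arguments can be any two staged or downloaded files, such as an early and a late quarter.

```bash
# Hypothetical helper: diff the headers (first lines) of two files column by column.
# Carriage returns are stripped so CRLF and LF files compare cleanly.
# Exit status follows diff: 0 = identical headers, 1 = they differ.
compare_headers() {
  diff <(head -n 1 "$1" | tr -d '\r' | tr '$' '\n') \
       <(head -n 1 "$2" | tr -d '\r' | tr '$' '\n')
}
```

To ignore case differences, add `| tr '[:upper:]' '[:lower:]'` to each of the two pipelines.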
- To edit the header (first line) of a staged domain.txt:
- open the .txt file in VS Code
- ...or swap the first occurrence in place with GNU sed (the 0,/regex/ address ends at the first matching line):
sed -i '0,/\$yr\$qtr/{s/\$yr\$qtr/\$qtr\$yr/}' indi.txt
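The `0,/regex/` address edits the first matching line wherever it is, so if the header does not contain the pattern, the first matching data row is changed instead. A slightly safer variant restricts the substitution to line 1 and avoids `sed -i`, whose syntax differs between GNU and BSD/macOS sed. A minimal sketch; `swap_header_text` is a hypothetical helper name.

```bash
# Hypothetical helper: replace the first occurrence of OLD with NEW on line 1 only.
# OLD is a sed regex, so escape '$' as '\$'; NEW is literal text apart from
# '&', '\' and '/'.
# Writes through a temp file instead of sed -i for GNU/BSD portability.
swap_header_text() {
  local old="$1" new="$2" file="$3" tmp
  tmp=$(mktemp) || return 1
  if ! sed "1s/${old}/${new}/" "$file" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # cat keeps the original file's permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `swap_header_text '\$yr\$qtr' '$qtr$yr' indi.txt`, then check the result with `head -n 1 indi.txt`.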
- To create a file small enough to open in LibreOffice Calc (keeps the first 15000 lines):
sed '15001,$ d' domain.txt > domain_first_15000_lines.txt
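The `sed` command above keeps the first 15000 lines (`head -n 15000` is equivalent). When the suspect rows are near the end, which are the rows Calc tends to drop, a file holding the header plus the last lines can be more useful. A sketch with a hypothetical `header_and_tail` helper:

```bash
# Hypothetical helper: write the header (line 1) plus the last N data lines of
# FILE to OUT, so the end of a large domain file can be opened in Calc.
header_and_tail() {
  local file="$1" n="$2" out="$3"
  # tail -n +2 skips the header, so it is never written twice on short files.
  { head -n 1 "$file"; tail -n +2 "$file" | tail -n "$n"; } > "$out"
}
```

For example, `header_and_tail demo.txt 15000 demo_last_15000_lines.txt`.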
- Once fixed, attempt a reload with the other domains' hops disabled
- Example with only the demo domain enabled:
- Also see the s3_data_download.sh variants for rough drafts of snippets that rebuild the staged source file(s)