Need to decide which weather stations to keep and which to drop.
Start with a bounding box (in future, create a buffered BC polygon that we can query).
For each weather station, grab the lat/long and determine whether it is in our area of interest.
If so, proceed to processing.
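The bounding-box check above could be sketched roughly as follows. The `BoundingBox` type, the `stations_in_area` helper, the station dictionary keys, and the BC extent values are all illustrative assumptions, not part of the existing script; the extent is only a rough rectangle around British Columbia and would be replaced by the buffered polygon later.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Axis-aligned lat/long rectangle (hypothetical helper type)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Rough, illustrative extent for British Columbia (assumption, not an
# authoritative boundary).
BC_BBOX = BoundingBox(min_lat=48.0, max_lat=60.0, min_lon=-139.5, max_lon=-114.0)


def stations_in_area(stations, bbox=BC_BBOX):
    """Keep only stations whose (lat, lon) falls inside the bounding box.

    `stations` is assumed to be an iterable of dicts with "lat" and "lon" keys.
    """
    return [s for s in stations if bbox.contains(s["lat"], s["lon"])]
```

Stations that pass this filter would then move on to the processing step.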
Processing
Use the climate_obs spreadsheet to get the station list.
Pull down the station data for the current hour (note: hours in the file names are in UTC).
Extract the following properties from the individual XML files:
pcpn_amt_pst1hr
avg_air_temp_pst1hr
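A minimal sketch of the extraction step, assuming each observation in the XML is stored as an `<element name="..." value="..."/>` entry (possibly inside a namespace). The `extract_properties` function is hypothetical, and the assumed layout should be checked against a real station file before relying on it.

```python
import xml.etree.ElementTree as ET

# Property names taken from the list above.
WANTED = ("pcpn_amt_pst1hr", "avg_air_temp_pst1hr")


def extract_properties(xml_text, wanted=WANTED):
    """Return {property name: float or None} for the requested properties.

    Assumes observations appear as <element name="..." value="..."/> nodes.
    A property that is absent or non-numeric is returned as None.
    """
    root = ET.fromstring(xml_text)
    values = {name: None for name in wanted}
    for node in root.iter():
        # Drop any "{namespace}" prefix so the check works with or without one.
        if node.tag.rsplit("}", 1)[-1] != "element":
            continue
        name = node.get("name")
        if name in values:
            try:
                values[name] = float(node.get("value"))
            except (TypeError, ValueError):
                values[name] = None  # missing or non-numeric value
    return values
```

Returning `None` for an absent property makes gaps explicit, so a station that does not report a given variable is easy to spot downstream.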
If a new day is detected, create a new file; otherwise pull the existing file from object store, update it, and re-push it (make sure we are not creating new versions).
Create two input files: one for temperature and one for precipitation.
PC.csv
TA.csv
format of the files / columns:
date
climate stations (listed along the x axis like the PC.csv ASP data)
actual data (either precipitation or temperature, depending on which file is being created)
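The wide layout described above (a date column followed by one column per station) could be maintained with something like the sketch below. It uses only the standard library; the `upsert_observation` name, the timestamp format, and the station identifiers in the usage example are assumptions for illustration.

```python
import csv
import io


def upsert_observation(csv_text, date, values):
    """Return CSV text with the row for `date` added or updated.

    csv_text: existing file content ("" when starting a new file)
    date:     row key, assumed to be a sortable string such as "2023-11-01 00:00"
    values:   mapping of station id -> observation for that date
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    stations = [c for c in (reader.fieldnames or []) if c != "date"]
    rows = {row["date"]: row for row in reader}

    # Add a column for any station not seen before.
    stations += [s for s in values if s not in stations]

    # Merge into the existing row for this date, or start a new one.
    rows.setdefault(date, {"date": date}).update(values)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["date", *stations],
                            restval="", lineterminator="\n")
    writer.writeheader()
    for key in sorted(rows):
        writer.writerow(rows[key])
    return out.getvalue()
```

For example, calling `upsert_observation("", "2023-11-01 00:00", {"S1": 0.2})` starts a new file, and passing the returned text back in with the next hour's values appends a row. Cells for stations with no value at a given time are left empty.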
The script would run hourly, when the data is available.
It would pull the data down, update the file, and re-post it (make sure we are not creating a new version in object storage when the file is updated).
Need to set up a sync process to ensure that the data in object store also exists on the on-prem server.
on-prem file path: Z:\MPOML\HOURLY (sewer)
object store path: RFC_DATA/ECC_HOURLY/
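One way to decide what the sync process has to copy is to compare checksums from an object store listing with the files already on the on-prem share. This is only a sketch: `files_to_sync` is a hypothetical helper, and it assumes the listing can be reduced to a mapping of file name to SHA-256 hex digest, which may not match what the object store actually reports.

```python
import hashlib
from pathlib import Path


def files_to_sync(remote_checksums, local_dir):
    """Return names that are missing locally or whose content differs.

    remote_checksums: mapping of file name -> SHA-256 hex digest (assumed to
                      be derived from the object store listing)
    local_dir:        on-prem directory that should mirror the object store
    """
    local_dir = Path(local_dir)
    stale = []
    for name, remote_digest in sorted(remote_checksums.items()):
        local_file = local_dir / name
        if not local_file.is_file():
            stale.append(name)  # not present on-prem yet
            continue
        local_digest = hashlib.sha256(local_file.read_bytes()).hexdigest()
        if local_digest != remote_digest:
            stale.append(name)  # present, but out of date
    return stale
```

Comparing content rather than file names matters here because PC.csv and TA.csv keep the same name while being updated every hour.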
Secondary:
Listen to the message queue for the specific data we want and trigger the GitHub Action.
Script is mostly complete. Hourly XML files for stations in the station list are being downloaded, processed and saved to a dataframe, which is then saved as a parquet file in object store. Daily temperature and precipitation are generated and saved to object store. The 'air_temp' variable is used instead of 'avg_air_temp_pst1hr', as the latter was missing for many stations.
To do:
Sync TA.csv/PC.csv files to the on-prem server
Review and clean up the code, and look for additional efficiencies to cut down run time
Consider adding additional variables or stations to download that could come in useful in the future (e.g. implement the weather station bounding box strategy described above)
Create a script that will pull the following information on an hourly basis.
Source of data:
https://hpfx.collab.science.gc.ca/20231101/WXO-DD/observations/swob-ml/20231101/
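The source URL embeds the date twice, so the daily directory can be derived from a timestamp. The sketch below assumes the pattern seen in the URL above holds for other days; `swob_day_url` is a hypothetical helper. The timestamp is converted to UTC first so the directory date lines up with the UTC hours used in the file names.

```python
from datetime import datetime, timezone

BASE_URL = "https://hpfx.collab.science.gc.ca"


def swob_day_url(when: datetime) -> str:
    """Build the daily swob-ml directory URL for a timezone-aware datetime.

    The URL layout is inferred from the single example above (assumption).
    """
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    day = when.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{BASE_URL}/{day}/WXO-DD/observations/swob-ml/{day}/"
```

An hourly run could call `swob_day_url(datetime.now(timezone.utc))` to find the directory for the current UTC day.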
Data Acquisition