update readmes and documentation

two-n · Sep 10, 2020 · ab371f5
1 parent aae5fe1
commit ab371f5
Showing 7 changed files with 145 additions and 93 deletions.
diff --git a/.gitignore b/.gitignore
@@ -6,8 +6,12 @@
 *_checkpoints
 
 # Turnstile data
-data/
-data/Turnstiles/Turnstile_Usage_Data__2020.csv
+/*
+!/data
+/data/*
+# include the data/README file
+!/data/README.md
+
 
 # QGIS Files
 *.qgz

diff --git a/README.md b/README.md
@@ -1,109 +1,48 @@
-# Data Sources
+# Visualizing MTA Ridership Changes During COVID-19
 
-### American Communities Census:
+## Repository Structure
 
-pulled [here](https://data.census.gov/cedsci/table?d=ACS%205-Year%20Estimates%20Data%20Profiles&table=DP03&tid=ACSDP5Y2018.DP03&g=0400000US36&hidePreview=true&moe=false) selecting all census tracts in the 5 buroughs.
+This repository holds all aspects of the [MTA ridership project](http://projects.two-n.com/mta/).
 
-### Stations CSV (with Lat/Long)
+- `/data` - contains partial datasets that were used for research and analysis in this project. Note: due to file sizes, not all were committed to git, but the sources for all datasets (past and present) can be found in the [data folder README](./data/README.md).
 
-http://web.mta.info/developers/data/nyct/subway/Stations.csv
+- `/processingScripts` - contains Python scripts used to pull and restructure the data for the visualization. It also holds some older Jupyter notebooks that were used for data exploration, but the final scripts that produced the data in the application are all `.py` files.
 
-### 2010 Census Tract Shapefile
+  The primary scripts used were:
+  - [`mtaSwipes.py`](./processingScripts/mtaSwipes.py): scrapes MetroCard swipe data from the [MTA developers page](http://web.mta.info/developers/fare.html) and compiles it into a single dataset.
+  - [`acsByNTA.py`](./processingScripts/acsByNTA.py): filters [pre-downloaded census data](https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-acs-nta.page) for relevant metrics. Spatially joins [MTA station locations](http://web.mta.info/developers/data/nyct/subway/Stations.csv) to their surrounding Neighborhood Tabulation Areas. Saves `stations_with_ntas.csv` and `acs_nta.geojson` into the `data/output` folder.
 
-https://data.cityofnewyork.us/City-Government/2010-Census-Tracts/fxpq-c8ku
+- `/app` - contains all the front-end application code.
 
-### MTA Turnstile Data 2020
+## App Structure
 
-https://data.ny.gov/Transportation/Turnstile-Usage-Data-2020/py8k-a8wg
+This app was built with TypeScript, leveraging Webpack for code bundling.
+In order to run, first navigate into the `/app` folder (`cd app`). From there you will have the following scripts available:
 
-### Station ID Linking
+#### `yarn start`
+starts a hot-reloading webpack development server
 
-crowdsourced from here:
-https://groups.google.com/d/msg/mtadeveloperresources/VYReLOiV5Jg/QDbrYlG_AgAJ
+#### `yarn build`
+creates a production-level build, saved to the `app/dist` folder
 
-partially hand-linked in this doc:
-https://docs.google.com/spreadsheets/d/1yLIF85YHxMLt-aUuPjY3Cn0TlWEUXQz7E4xm8du-LZE/edit#gid=0
+#### `yarn deploy`
+deploys `dist` folder to `project.two-n.com`
 
-### Lines GIS Data
+#### `yarn archie`
+starts up an [aml-gdoc-server](https://github.com/Quartz/aml-gdoc-server) to pull structured JSON from the [google doc](https://docs.google.com/document/d/1Dc9L6cVkBEpUPbp2vSby0Mpx40MzCeKqe_4cX5I11oE/edit?usp=sharing) that contains the application's narrative content.
 
-https://data.cityofnewyork.us/Transportation/Subway-Lines/3qz8-muuu
+To save an updated version, navigate to [`http://127.0.0.1:6006/1Dc9L6cVkBEpUPbp2vSby0Mpx40MzCeKqe_4cX5I11oE`](http://127.0.0.1:6006/1Dc9L6cVkBEpUPbp2vSby0Mpx40MzCeKqe_4cX5I11oE) and save the resulting `.json` to `app/public/content/narrativeCopy.json`.
 
-### Hospitals
 
-Pulled from [nyc PLUTO](https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page) dataset. Filtered out hospitals and clinics using building class code of `I*`, informed by the [data dictionary](https://www1.nyc.gov/assets/planning/download/pdf/data-maps/open-data/pluto_datadictionary.pdf?v=20v3).
+## Primary Datasources
 
-### MTA Schedule GTFS Data
+- Ridership data comes from [MTA MetroCard Swipe Data](http://web.mta.info/developers/fare.html).
+- American Community Survey (ACS) by Neighborhood Tabulation Area (NTA) data comes from [NYC Planning Aggregates](https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-acs-nta.page).
+- MTA station locations come from [this MTA dataset](http://web.mta.info/developers/data/nyct/subway/Stations.csv).
+- Narrative copy is maintained and edited in [this google doc](https://docs.google.com/document/d/1Dc9L6cVkBEpUPbp2vSby0Mpx40MzCeKqe_4cX5I11oE/edit?usp=sharing).
 
-http://web.mta.info/developers/developer-data-terms.html#data --> 'GTFS'-->'New York City Transit Subway' (Updated April 29, 2020)
+## Additional Tools/Resources
 
-```sh
-# filter for hospitals and clinics based on building code
-# remember to re-project it to the standard projection
-ogr2ogr -where "BldgClass LIKE 'I%'" -t_srs WGS84 ../output/hospitals.geojson  ./MapPLUTO.shp MapPLUTO
-```
-
-### NYC NTA(Neighborhood Tabulation Areas)
-https://data.cityofnewyork.us/City-Government/Neighborhood-Tabulation-Areas-NTA-/cpf4-rkhq
-
-`geo2topo output/nta.json > output/nta_topo.json`
-
-### MTA Fare Data - MetroSwipes Data
-http://web.mta.info/developers/fare.html
-
-### ACS by NTA
-American Communities Survey rolled up to the Neighborhood Tabulation Areas
-https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-acs-nta.page
-
-ran `processingScripts/acsByNTA.py` to create a cleaned up geojson and then `geo2topo data/output/acs_nta.geojson > data/output/acs_nta_topo.json` to create a topojson.
-
-Alternatively, to create a topojson with all of the required geographic data, you can run:
-`geo2topo data/output/acs_nta.geojson data/output/mapOutline/mapOutline.geojson data/output/subway-lines.geojson > mapData_topo.json ` and each file creates an `object` with the same name as the input file's original name.
-
-### App Content
-Using (ArchieML)[http://archieml.org/#resources] via Quartz's (aml-gdoc-server)[https://github.com/Quartz/aml-gdoc-server] to pull unstructured data from [this google doc](https://docs.google.com/document/d/1Dc9L6cVkBEpUPbp2vSby0Mpx40MzCeKqe_4cX5I11oE/edit) into a json format.
-To pull a new version of the data, run:
-
-`aml-gdoc-server` (it may prompt you for your google API credentials — see (documentation)[https://github.com/Quartz/aml-gdoc-server]).
-
-That will open a server at `http://127.0.0.1:6006/`. To get a JSON formatted dataset, just go to `http://127.0.0.1:6006/GOOGLE_DOC_ID` and save the resulting file.
-
-
-### NYC Outline
-Downloaded borough outlines from [NYC Open Data](https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm), then filtered out SI in the command line with the following command:
-
-```sh
-jq '{type: .type , features: [ .features[]| select( .properties.boro_code != "5") ] }' data/output/borough-boundaries.geojson > data/output/mapOutline.geojson
-```
-
-Downloaded [New Jersey County Boundaries](https://njogis-newjersey.opendata.arcgis.com/datasets/5f45e1ece6e14ef5866974a7b57d3b95_1?geometry=-74.891%2C40.521%2C-73.572%2C40.886) and [NY Civil Boundaries](http://gis.ny.gov/gisdata/inventories/details.cfm?DSID=927) (both shapefiles) and loaded them into folder called `mapOutlines`. Then from there:
-
-```sh
-# merge into single file
-ogrmerge.py -o output/mapOutline -overwrite_ds mapOutlines/County_Boundaries_of_NJ-shp/County_Boundaries_of_NJ.shp mapOutlines/NYS_Civil_Boundaries_SHP/Counties_Shoreline.shp -single
-
-# re-project
-ogr2ogr output/mapOutline/reproj.shp -t_srs "WGS84" output/mapOutline/merged.shp
-
-# clip to bounding box
-ogr2ogr output/mapOutline/clipped.shp  output/mapOutline/reproj.shp -spat -74.178 40.5320 -73.7309 40.946
-```
-
-## Process
-
-## Notes
-
-### Shapefile to GeoJSON using [GDAL](https://gdal.org/download.html)
-
-Just to be safe I first created a virtual environment for the gis dependencies
-
-#### `conda create gis`
-
-for [reference](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html): can list them all using `conda env list`
-
-#### `conda activate gis`
-
-#### `conda install gdal`
-
-use [Ogr2ogr](https://gdal.org/programs/ogr2ogr.html) to convert from `shapefile` to `geojson`
-
-### `ogr2ogr output/censusTracts.geojson [pathTo.shp]`
+- [ArchieML](http://archieml.org/) to turn unstructured Google Doc data into structured JSON. Leveraged [aml-gdoc-server](https://github.com/Quartz/aml-gdoc-server) to connect directly to Google Drive.
+- [Scrollama](https://github.com/russellgoldenberg/scrollama) for scroll triggers.
+- [GDAL](https://gdal.org/) for geographic transformations.
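
Note on the spatial join described for `acsByNTA.py`: the diff does not show how that script assigns stations to Neighborhood Tabulation Areas, so the following is only a dependency-free sketch of the general idea (a ray-casting point-in-polygon test). The function names, the square "NTA" outlines, and the station coordinates are all made up for illustration; the real script may well use a GIS library and handles real multi-part geometries.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Ring = List[Point]            # polygon outline as an ordered list of vertices


def point_in_ring(point: Point, ring: Ring) -> bool:
    """Ray-casting test: count the ring edges crossed by a ray heading east."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_nta(station: Point, ntas: Dict[str, Ring]) -> Optional[str]:
    """Return the code of the first NTA whose outline contains the station."""
    for code, ring in ntas.items():
        if point_in_ring(station, ring):
            return code
    return None


# Hypothetical square "NTAs" and stations, for illustration only.
ntas = {
    "NTA_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "NTA_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
stations = {
    "station_1": (0.5, 0.5),
    "station_2": (1.5, 0.25),
    "station_3": (5.0, 5.0),
}
joined = {name: assign_nta(pt, ntas) for name, pt in stations.items()}
```

A station that falls outside every outline maps to `None`, which is one simple way to surface stations that need manual review.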
diff --git a/data/README.md b/data/README.md
@@ -0,0 +1,109 @@
+# Data Sources
+
+### American Community Survey (ACS):
+
+Pulled from [data.census.gov](https://data.census.gov/cedsci/table?d=ACS%205-Year%20Estimates%20Data%20Profiles&table=DP03&tid=ACSDP5Y2018.DP03&g=0400000US36&hidePreview=true&moe=false), selecting all census tracts in the five boroughs.
+
+### Stations CSV (with Lat/Long)
+
+http://web.mta.info/developers/data/nyct/subway/Stations.csv
+
+### 2010 Census Tract Shapefile
+
+https://data.cityofnewyork.us/City-Government/2010-Census-Tracts/fxpq-c8ku
+
+### MTA Turnstile Data 2020
+
+https://data.ny.gov/Transportation/Turnstile-Usage-Data-2020/py8k-a8wg
+
+### Station ID Linking
+
+crowdsourced from here:
+https://groups.google.com/d/msg/mtadeveloperresources/VYReLOiV5Jg/QDbrYlG_AgAJ
+
+partially hand-linked in this doc:
+https://docs.google.com/spreadsheets/d/1yLIF85YHxMLt-aUuPjY3Cn0TlWEUXQz7E4xm8du-LZE/edit#gid=0
+
+### Lines GIS Data
+
+https://data.cityofnewyork.us/Transportation/Subway-Lines/3qz8-muuu
+
+### Hospitals
+
+Pulled from the [NYC PLUTO](https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page) dataset. Filtered for hospitals and clinics using building class codes matching `I*`, informed by the [data dictionary](https://www1.nyc.gov/assets/planning/download/pdf/data-maps/open-data/pluto_datadictionary.pdf?v=20v3).
+
+```sh
+# filter for hospitals and clinics based on building class code
+# remember to re-project it to the standard projection
+ogr2ogr -where "BldgClass LIKE 'I%'" -t_srs WGS84 ../output/hospitals.geojson  ./MapPLUTO.shp MapPLUTO
+```
+
+### MTA Schedule GTFS Data
+
+http://web.mta.info/developers/developer-data-terms.html#data --> 'GTFS'-->'New York City Transit Subway' (Updated April 29, 2020)
+
+### NYC NTA(Neighborhood Tabulation Areas)
+https://data.cityofnewyork.us/City-Government/Neighborhood-Tabulation-Areas-NTA-/cpf4-rkhq
+
+`geo2topo output/nta.json > output/nta_topo.json`
+
+### MTA Fare Data - MetroSwipes Data
+http://web.mta.info/developers/fare.html
+
+### ACS by NTA
+American Community Survey rolled up to the Neighborhood Tabulation Areas
+https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-acs-nta.page
+
+Ran `processingScripts/acsByNTA.py` to create a cleaned-up GeoJSON, then `geo2topo data/output/acs_nta.geojson > data/output/acs_nta_topo.json` to create a TopoJSON.
+
+Alternatively, to create a TopoJSON with all of the required geographic data, you can run:
+`geo2topo data/output/acs_nta.geojson data/output/mapOutline/mapOutline.geojson data/output/subway-lines.geojson > mapData_topo.json`. Each input file becomes an `object` named after that file.
+
+### App Content
+Using [ArchieML](http://archieml.org/#resources) via Quartz's [aml-gdoc-server](https://github.com/Quartz/aml-gdoc-server) to pull unstructured data from [this google doc](https://docs.google.com/document/d/1Dc9L6cVkBEpUPbp2vSby0Mpx40MzCeKqe_4cX5I11oE/edit) into a JSON format.
+To pull a new version of the data, run:
+
+`aml-gdoc-server` (it may prompt you for your Google API credentials; see the [documentation](https://github.com/Quartz/aml-gdoc-server)).
+
+That will open a server at `http://127.0.0.1:6006/`. To get a JSON formatted dataset, just go to `http://127.0.0.1:6006/GOOGLE_DOC_ID` and save the resulting file.
+
+
+### NYC Outline
+Downloaded borough outlines from [NYC Open Data](https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm), then filtered out Staten Island (`boro_code` of `5`) on the command line with the following command:
+
+```sh
+jq '{type: .type , features: [ .features[]| select( .properties.boro_code != "5") ] }' data/output/borough-boundaries.geojson > data/output/mapOutline.geojson
+```
+
+Downloaded [New Jersey County Boundaries](https://njogis-newjersey.opendata.arcgis.com/datasets/5f45e1ece6e14ef5866974a7b57d3b95_1?geometry=-74.891%2C40.521%2C-73.572%2C40.886) and [NY Civil Boundaries](http://gis.ny.gov/gisdata/inventories/details.cfm?DSID=927) (both shapefiles) and loaded them into a folder called `mapOutlines`. Then, from there:
+
+```sh
+# merge into single file
+ogrmerge.py -o output/mapOutline -overwrite_ds mapOutlines/County_Boundaries_of_NJ-shp/County_Boundaries_of_NJ.shp mapOutlines/NYS_Civil_Boundaries_SHP/Counties_Shoreline.shp -single
+
+# re-project
+ogr2ogr output/mapOutline/reproj.shp -t_srs "WGS84" output/mapOutline/merged.shp
+
+# clip to bounding box
+ogr2ogr output/mapOutline/clipped.shp  output/mapOutline/reproj.shp -spat -74.178 40.5320 -73.7309 40.946
+```
+
+## Process
+
+## Notes
+
+### Shapefile to GeoJSON using [GDAL](https://gdal.org/download.html)
+
+Just to be safe I first created a virtual environment for the gis dependencies
+
+#### `conda create --name gis`
+
+For [reference](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html): you can list all environments using `conda env list`.
+
+#### `conda activate gis`
+
+#### `conda install gdal`
+
+Use [ogr2ogr](https://gdal.org/programs/ogr2ogr.html) to convert from `shapefile` to `geojson`:
+
+#### `ogr2ogr output/censusTracts.geojson [pathTo.shp]`
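
Note on the `jq` borough filter in the "NYC Outline" section: for anyone without `jq` installed, the same filtering step can be sketched in Python with only the standard library. This is an illustrative equivalent, not part of the commit; the function name `drop_borough` and the inline sample features (including the `boro_name` values) are assumptions, and a real run would load `borough-boundaries.geojson` with `json.load` instead.

```python
import json


def drop_borough(collection: dict, boro_code: str = "5") -> dict:
    """Return a copy of the collection without features in the given borough.

    Mirrors the jq filter: keep features whose properties.boro_code differs
    from the given code, and carry over only the top-level "type" key.
    """
    kept = [
        feature
        for feature in collection["features"]
        if feature["properties"].get("boro_code") != boro_code
    ]
    return {"type": collection["type"], "features": kept}


# Hypothetical stand-in for the downloaded borough boundaries file.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"boro_code": "1", "boro_name": "Manhattan"}, "geometry": None},
        {"type": "Feature", "properties": {"boro_code": "5", "boro_name": "Staten Island"}, "geometry": None},
        {"type": "Feature", "properties": {"boro_code": "3", "boro_name": "Brooklyn"}, "geometry": None},
    ],
}

filtered = drop_borough(sample)
print(json.dumps(filtered, indent=2))
```

As in the `jq` command, `boro_code` is compared as a string, so a file that stores the code as a number would need the comparison adjusted.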
diff --git a/...ssingScripts/analyse_mta_timeseries.ipynb → ...gScripts/OLD_analyse_mta_timeseries.ipynb b/...ssingScripts/analyse_mta_timeseries.ipynb → ...gScripts/OLD_analyse_mta_timeseries.ipynb
diff --git a/...cripts/generateLineStationSequences.ipynb → ...ts/OLD_generateLineStationSequences.ipynb b/...cripts/generateLineStationSequences.ipynb → ...ts/OLD_generateLineStationSequences.ipynb
diff --git a/processingScripts/processHourlyMTAData.ipynb → ...ingScripts/OLD_processHourlyMTAData.ipynb b/processingScripts/processHourlyMTAData.ipynb → ...ingScripts/OLD_processHourlyMTAData.ipynb
diff --git a/...ssingScripts/stationsToCensusTracts.ipynb → ...gScripts/OLD_stationsToCensusTracts.ipynb b/...ssingScripts/stationsToCensusTracts.ipynb → ...gScripts/OLD_stationsToCensusTracts.ipynb