
Add jurisdiction layer to helper #1651

Merged: 8 commits merged into main on Jan 15, 2025

Conversation

johnclary
Member

@johnclary johnclary commented Jan 6, 2025

Associated issues

This PR brings the jurisdiction layer into our normal geo layer maintenance process. It replaces the old atd_jurisdictions layer that was created through an ad-hoc process that merged all features by jurisdiction label. Going forward, the table will resemble the authoritative CTM layer, where there may be multiple rows (aka features) for a single jurisdiction type.

Also, whereas the old DB table had rows for the various ETJs and surrounding jurisdictions, I elected to exclude these since we don't currently make any use of them.

Before

Screenshot 2025-01-06 at 4 34 51 PM

After

Screenshot 2025-01-06 at 4 34 30 PM

Testing

  1. Start your local setup
  2. Apply the DB changes: hasura migrate apply and hasura metadata apply
  3. Run the helper script to load the jurisdiction layer from AGOL:
 node --env-file=local.env src/load_agol_layer.js --layer jurisdictions
  4. Start the ole VZE. Move some crashes into and out of the AFP jurisdiction. Moving a crash out of AFP will cause the alert banner to appear above the crash map.

  5. The VZ team pointed out that crash 20271128 should not be assigned to AFP. You can test that by moving it 1 pixel and observing that in_austin_full_purpose changes to false.

  6. I have prepared these SQL commands to backfill the various AFP flags that we have for crashes, EMS, and AFD records. If you would please review and run each of those and confirm that they look good 🙏


Ship list

  • Check migrations for any conflicts with latest migrations in main branch
  • Confirm Hasura role permissions for necessary access
  • Code reviewed
  • Product manager approved

select 1
from geo.atd_jurisdictions
where JURISDICTION_LABEL = 'AUSTIN FULL PURPOSE' AND ST_Intersects(geometry, new.position)
);
Member Author

here's the updated code. we're filtering on the jurisdiction label rather than the arbitrary ID sequence

Member

👋 goodbye magic number - thank you!

jurisdictions: {
service_name: "BOUNDARIES_jurisdictions_planning",
layer_id: 0,
query_params: { ...DEFAULT_ESRI_QUERY_PARAMS, where: "JURISDICTION_LABEL = 'AUSTIN FULL PURPOSE'" },
Member Author

@johnclary johnclary Jan 6, 2025

we are filtering out all features that are not AFP. there are thousands of features in this layer, and a single jurisdiction label can have multiple rows. E.g. there are currently two AUSTIN FULL PURPOSE features which, when combined, represent the entire AFP.
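To illustrate the point about labels and rows, here is a hedged sketch (not the actual loader code; the feature shapes are invented for illustration) of how a filter on JURISDICTION_LABEL behaves when one label maps to multiple features:

```javascript
// Hypothetical feature list standing in for an AGOL query result.
// A single jurisdiction label can map to multiple features (rows).
const features = [
  { attributes: { JURISDICTION_LABEL: "AUSTIN FULL PURPOSE" } },
  { attributes: { JURISDICTION_LABEL: "AUSTIN FULL PURPOSE" } },
  { attributes: { JURISDICTION_LABEL: "AUSTIN ETJ" } },
];

// Keep only AFP features, as the where clause in the loader config does.
const afpFeatures = features.filter(
  (f) => f.attributes.JURISDICTION_LABEL === "AUSTIN FULL PURPOSE"
);

console.log(afpFeatures.length); // 2 — one label, two rows
```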

Contributor

thanks for pointing this out because I was curious about the 2 rows in the DB. Makes sense! 🙏

Member Author

before this change, our geospatial docs were a lie.

The atd_jurisdiction layer was not an "ArcGIS Online authoritative layer owned by CTM GIS". It was a custom layer. Now it is true.

Member

I just want to point out that as the author of those docs, you have all the authority one could have to call out that they were in error, but for all other readers, I want to push back on calling these a lie. That feels a little too strong and doesn't capture the nuance here.

While this work brings them fully into compliance as written in the README, they most certainly were previously derived from the CTM data and used in good faith. This data is much better in its provenance, but they were far from fabricated or intentionally misleading previously.

Member Author

thank you, Frank—"lie" was a bit harsh but i was a bit surprised to find that I hadn't correctly documented the provenance of that layer. glad it is square now 😊

Member

@frankhereford frankhereford left a comment

Just to confirm -- you do want this in main and not the collector branch? That sounds right to me, but I wanted to double-check that you don't actually want to rebase this onto the collector branch instead.

The testing worked out great, and I think this is ship-shape for the big moving parts. I do have a pair of observations weighty enough that I'd like to request changes, but I am happy to switch this to a ✅ after some consideration of them.

Of my observations, I'm most interested in not using the deprecated atd_ acronym, and in letting the query engine short-circuit that ST_Intersects() where possible via the && operator.

When testing the SQL backfill commands, I get these counts of updated rows, in order of the queries presented:

Updated Rows	29
Updated Rows	86
Updated Rows	9
Updated Rows	18
Updated Rows	29
Updated Rows	36

@@ -0,0 +1,143 @@
alter table geo.atd_jurisdictions add column jurisdictions_id integer;
Member

Is there any way at all we can drop the atd_ out of this table name? coa_ is appropriate here, or my true preference: just jurisdictions.

atd is no more, and while tpw would be marginally better, these jurisdictions belong to the larger city. A similar thing happened with the new VZ S3 bucket name, and it'd be great to avoid this usage here because these names get harder to change the longer they remain in place.

Member Author

thank you for nudging me on this! i was a bit worried about blowing up the scope of this, but you're totally right: might as well get this cleaned up once and for all now. i checked with Milad and Xavier and they are ok with this change. i'll push it up!


new.in_austin_full_purpose = EXISTS (
select 1
from geo.atd_jurisdictions
where JURISDICTION_LABEL = 'AUSTIN FULL PURPOSE' AND ST_Intersects(geometry, new.position)
);
Member

I think you can get some meaningful speed-up on this where clause by doing:

where
    JURISDICTION_LABEL = 'AUSTIN FULL PURPOSE'
    AND geometry && new.position
    AND ST_Intersects(geometry, new.position)

&& is the "has intersecting bounding boxes" operator, and it's a zillion times faster to see if a point and a box overlap than a point and an arbitrary polygon. By doing this blazing fast operation first, you can skip checking ST_Intersects() on all the pairs of geometries that don't have overlapping bounding boxes, knowing that they couldn't intersect no matter what.

Also PG's GIST indexing on geometry is aimed at exactly this -- it stores the bounding boxes of geometries as its index data, so && can get really lightning fast.
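A hedged sketch of what that index looks like (the index name is my invention, and the table name is as it appears at this point in the PR, before the atd_ rename):

```sql
-- Hypothetical DDL: a GIST index stores geometry bounding boxes, so the
-- && operator (and the implicit index filter inside ST_Intersects) can
-- use it instead of scanning every polygon.
CREATE INDEX IF NOT EXISTS atd_jurisdictions_geometry_idx
    ON geo.atd_jurisdictions USING GIST (geometry);
```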

Member Author

@johnclary johnclary Jan 13, 2025

Thanks @frankhereford for suggesting this. I had it in my mind that the GIST index would ensure that a bbox is used to speed up the spatial operations automatically, and that does seem to track from the docs.

Most of the commonly used functions in PostGIS (ST_Contains, ST_Intersects, ST_DWithin, etc) include an index filter automatically. But some functions (e.g., ST_Relate) do not include an index filter.

but i've also been reading that it's a best practice to be explicit about the use of bboxes—so there should be no harm in adding the && for good measure. that seems reasonable! weirdly, though, when i test my update commands with and without the bbox directly, the operation with && is consistently slower.

here's one example. curious if you're seeing this.

With bbox

-- Execution Time: 1474.878 ms
explain analyze UPDATE crashes
SET
    in_austin_full_purpose = FALSE,
    updated_by = 'dts_automation'
WHERE
    crashes.in_austin_full_purpose IS TRUE
    AND crashes.position IS NOT NULL
    AND NOT EXISTS (
        SELECT 1
        FROM
            geo.atd_jurisdictions
        WHERE
            atd_jurisdictions.geometry && crashes.position
            AND atd_jurisdictions.jurisdiction_label = 'AUSTIN FULL PURPOSE'
            AND ST_INTERSECTS(atd_jurisdictions.geometry, crashes.position)
    );

Without bbox

-- Execution Time: 1069.798 ms
explain analyze UPDATE crashes
SET
    in_austin_full_purpose = FALSE,
    updated_by = 'dts_automation'
WHERE
    crashes.in_austin_full_purpose IS TRUE
    AND crashes.position IS NOT NULL
    AND NOT EXISTS (
        SELECT 1
        FROM
            geo.atd_jurisdictions
        WHERE
            atd_jurisdictions.jurisdiction_label = 'AUSTIN FULL PURPOSE'
            AND ST_INTERSECTS(atd_jurisdictions.geometry, crashes.position)
    );

weirdly, the explain analyze output for both of these statements doesn't mention the index scan at all. i also wonder if my local machine's resources are influencing the query planner and whether this may look completely different on staging. i'll test that here momentarily...

Member Author

i ran those two statements on staging and the results are consistent: removing the && check results in better performance by nearly 50% 😵
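One hedged way to dig further (the point coordinates and SRID here are illustrative, not from the real data) is to ask the planner to show its work on the inner lookup directly:

```sql
-- Hypothetical diagnostic: BUFFERS output shows whether the GIST index on
-- geo.atd_jurisdictions.geometry is actually scanned for a single point.
EXPLAIN (ANALYZE, BUFFERS)
SELECT 1
FROM geo.atd_jurisdictions
WHERE jurisdiction_label = 'AUSTIN FULL PURPOSE'
  AND ST_Intersects(geometry, ST_SetSRID(ST_MakePoint(-97.74, 30.27), 4326));
```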

Member

@frankhereford frankhereford Jan 13, 2025

John - What a twist! Running those two queries on my machine comes up with a less dramatic difference in timing (765 ms vs 515 ms), but the extra clause I'm suggesting really does seem to be what's slowing it down. I'm fully confused about why it's behaving this way, but the numbers don't lie -- thank you for testing this out and providing such a clear-cut set of data. 🙏

Member Author

really truly a twist! i'd love to spend some more time understanding what's going on here. i'd love to re-test this scenario against a table with more rows, like atd_txdot_locations....

Member

@chiaberry chiaberry left a comment

I got the same updated rows counts as Frank did when running the backfill commands, ✅

I really appreciate the comments in the migration, it helps me understand what is going on.

Contributor

@mateoclarke mateoclarke left a comment

HALP! I'm stuck on Step 3 in the testing steps. What directory should that script be run from? And why are we using a local.env file when all our other .env files follow the pattern of .env.name_of_environment instead of flipped around as your copy/paste implies?

Perhaps we need to add to the testing instructions that a .env.local (or, if you insist, local.env) should be created from /toolbox/load_agol_layer/env_template instead of assuming it's there. But this is all just a guess on my part. I've tried a few variations after creating an env file from the template, and this is the type of error I see:

(venv) (base) ➜  load_agol_layer git:(john/19441-jurisdiction-layer) ✗ node --env-file=local.env src/load_agol_layer.js --layer jurisdictions
node: bad option: --env-file=local.env

I could also use some guidance on what AGOL credentials should be added to that .env file. For now, I'm just guessing AGOL VZ Publisher is the correct one to use.

UPDATE:
With @chiaberry's help, I was able to figure out how to get the load_agol_layer.js script running with the node script in the testing instructions. Here are the extra things I had to do that were hard for me to know without more explicit instructions.

  1. Make a local.env file from the env_template file. (Still confused why it's named this way.)
  2. Use the "AGOL Scripts Publisher" credentials in 1pw.
  3. Change to the directory two levels up from the script: cd toolbox/load_agol_layer
  4. Run nvm use in order to be able to use the --env-file option
  5. Run npm install
  6. Run node --env-file=local.env src/load_agol_layer.js --layer jurisdictions

After the fact, I realized there is a README.md that could've helped me understand what was going wrong. my bad

Contributor

@mateoclarke mateoclarke left a comment

This worked as described in the testing steps (once I asked for help/learned how to read 🫠)

Contributor

@mddilley mddilley left a comment

I ran through the test steps and see the in and out of AFP calculating as expected when using the editor. I also think nixing the atd_ in the table would be nice like Frank mentioned (probably because everything else is so nice and tidy in here ✨).

Cool to see yet another AGOL layer covered with this tooling! 😎 🙌

Member

@frankhereford frankhereford left a comment

Thank you John for picking off that atd_ prefix and also for digging into my misbelief that the bounding box operator would speed things up! Can't wait to see this go live!

Contributor

@roseeichelmann roseeichelmann left a comment

this all looks great john! i reviewed these update statements and want to make sure you also update the afd_incidents records which are now AFP as well, if that's necessary, cuz i didn't see that command there. i agree on dropping atd_; i don't see that update pushed up yet but i think that's a great change to clean things up! 🧹 ✨

@johnclary
Member Author

folks, i requested review because the table name change impacted quite a lot of code, including dropping a dormant function from the pre-LDM days 🙏

Member

@frankhereford frankhereford left a comment

John - this looks good! I have run the DB up through the migrations and metadata presented in this PR as well as gone over the code. This is on top of my previous review, so I am feeling really great about this.

Thank you

Thanks for taking on the entirely out-of-scope request to rename that table and drop the deprecated department name. That was no small thing, and I am really grateful to you. 🙏

@@ -12,4 +12,3 @@
retries: 1
use_prepared_statements: true
tables: "!include default/tables/tables.yaml"
functions: "!include default/functions/functions.yaml"
Member

Interesting to see that this removes our very last function altogether. I'll be curious how long we stay in that function-less state; if it's a long time, that's pretty great.

Contributor

@roseeichelmann roseeichelmann left a comment

retested, it looks great!! 🚢

Contributor

@mddilley mddilley left a comment

I tested this with the latest updates, and I remembered to run the SQL statements this time around too. Everything looks and works great! 🙌

@johnclary
Member Author

headed to main— thank you!

@johnclary johnclary merged commit 97103e0 into main Jan 15, 2025
4 checks passed
@johnclary johnclary deleted the john/19441-jurisdiction-layer branch January 16, 2025 15:55