Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PWA: Investigate API requests & caching #207

Closed
ryelle opened this issue Aug 8, 2019 · 5 comments · Fixed by #319
Closed

PWA: Investigate API requests & caching #207

ryelle opened this issue Aug 8, 2019 · 5 comments · Fixed by #319
Labels
[Component] PWA Service workers, manifest

Comments

@ryelle
Copy link
Contributor

ryelle commented Aug 8, 2019

We're currently pulling ~615k of data every 60 seconds, which will use a ton of cellular data. That should be paired down to only the fields that actually get used.

This depends on the API response, but we are attempting to call 3 API endpoints every 60s. Right now this is mitigated by the service worker, so it's only passing through to the server every 15min.

Probably better to have day-of-event.php register a caching route specifically for this endpoint, so that the dependencies are clear and explicit. that may make some of service-worker-caching.php redundant for this request, but that's ok b/c better to make the code self-documenting. add an inline comment w/ that route registration though, noting that it complements things done in service-worker-caching.php

source

API requests themselves can be slow, too:

this is REALLY slow, takes almost 10 seconds on reasonably fast connection need to pair down to only fields being used, maybe other optimizations

source

@ryelle ryelle added [Component] Theme Templates Theme-agnostic templates like Day of Event, Offline [Component] PWA Service workers, manifest labels Aug 8, 2019
@westonruter
Copy link
Member

The service worker gating requests to 15 minute intervals can be eliminated in favor of implementing conditional requests. If the REST API response includes a Last-Modified header (or ETag) then requests from the page can include a If-Modified-Since (or If-None-Match) request header, allowing the REST API to respond with a 304 Not Modified response, which would have no body, thus saving bandwidth. It would also allow content updates to appear faster for the user.

Refs:

@ryelle
Copy link
Contributor Author

ryelle commented Oct 3, 2019

I like the idea of If-Modified-Since, but if I'm reading correctly it would still need to hit the server to get the modified time, so we'll probably want to have some extra server caching in place too.

A good ETag could be the max post_modified date of whatever is being queried.

@westonruter is there a difference between using If-Modified-Since vs an ETag? I see you've recommended both/either, but TBH I'm not sure if there's a reason to choose one over the other.

@ryelle ryelle changed the title Day of Event Template: Investigate API requests & caching PWA: Investigate API requests & caching Oct 3, 2019
@ryelle ryelle removed the [Component] Theme Templates Theme-agnostic templates like Day of Event, Offline label Oct 3, 2019
@westonruter
Copy link
Member

That's a good question. It seems that ETag conditions override Last-Modified conditions. Per docs on If-Modified-Since:

When used in combination with If-None-Match, it is ignored, unless the server doesn't support If-None-Match.

See also relevant discussions on https://core.trac.wordpress.org/ticket/47676 namely it is easier to ensure generate a reliable ETag generally versus Last-Modified, since with Last-Modified it is not feasible to track all of the possible last modified times for all the data that may be added to a response. An ETag is easy since it can just be hashed. But then generating ETag is more expensive from a server standpoint, since the full output needs to hashed and that needs to be used after generating the full response to then decide whether or not to just respond 304 Not Modified whereas with just Last-Modified it would be more feasible to check up-front before generating the response. If the REST API output is being cached anyway (e.g. via Varnish) then this is largely irrelevant, and using ETag seems like a safer option.

@ryelle
Copy link
Contributor Author

ryelle commented Oct 3, 2019

Thanks for explaining that, I think I have a better idea how this all works now :)

We only have 3 places that would support being cached by the service worker– tagregator, the live schedule block (session endpoints), and the manifest itself. Of those, tagregator & the live schedule should update as frequently as we set in their code, restricting those to only 15min breaks expectations somewhat. Those endpoints should be cached by the server, so that they're also cached for other clients who request them (browser caching only caches for repeat visits, not distinct users).

So… I still wonder if it makes sense to remove the API caching– in #243 we're only making the API request every 5 minutes, and can add server-level caching to make sure these requests don't overload the server. For the sites that don't have the PWA yet, we'll need to make sure those don't cause problems too.

We can improve the API to make conditional requests, but that would be a separate task from getting the PWA to a "launch ready" state.

@iandunn @coreymckrill what are your thoughts?

@iandunn
Copy link
Member

iandunn commented Oct 12, 2019

We're currently pulling ~615k of data every 60 seconds, which will use a ton of cellular data. That should be paired down to only the fields that actually get used.

Related: https://make.wordpress.org/core/2019/10/10/filtering-nested-rest-response-_fields-in-wp-5-3/

So… I still wonder if it makes sense to remove the API caching– in #243 we're only making the API request every 5 minutes, and can add server-level caching to make sure these requests don't overload the server.

WFM, just make sure to add the server-layer caching first.

ryelle added a commit that referenced this issue Jan 13, 2020
We only use the embed data to get track information, but we're already fetching that in another API request. We can save some time & bandwidth by omitting the `_embed` request and injecting the data from the tracks request. This potentially cuts the TTFB by 80%, and data downloaded by half.

See #207
ryelle added a commit that referenced this issue Jan 22, 2020
…TFB (#319)

Currently, the sessions endpoint used to fetch sessions for the Live Schedule needs to request the _embed version of the endpoint, which causes a much slower response (around 10s on 2019.us.wordcamp.org). We needed to do this to get the speaker names and category labels, but the rest of the data it outputs is unnecessary.

This PR pre-computes speakers and category names, and adds them to the API, so the request can be made for the default endpoint instead. It also restricts the data requested for the live schedule to only the fields needed for display, which drops the size of content downloaded significantly.

Fixes #207 by improving API size/response time, API responses are still currently being cached by the SW in the rest-api bucket.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Component] PWA Service workers, manifest
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants