From 7ec4aa8b7bb217105a621db581ddbf67cff266f0 Mon Sep 17 00:00:00 2001
From: Jem <0x0xjem@gmail.com>
Date: Thu, 3 Oct 2024 12:21:37 +0400
Subject: [PATCH] Update README with recent changes

---
 README.md | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index aa17158..a6dd125 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,10 @@
 
 ## Purpose
 
-This deploys a Google Cloud Function that periodically fetches records from subgraphs and stores them in Google Cloud Storage
+This deploys a Google Cloud Function that periodically fetches records from subgraphs and exposes them in BigQuery.
 
 ## Potential Uses
 
-- Expose records in BigQuery through the external tables functionality
 - Post-processing of records (e.g. to generate daily balances)
 
 ## Architecture
@@ -14,7 +13,7 @@
 This project has a few components:
 
 - Google Cloud Storage (GCS) bucket to store records (in JSONL files)
-  - Files are stored in the following location: `//dt=/records.jsonl`
+  - Files are stored in the following location: `////dt=/records.jsonl`
   - Files are stored in the `JSONL` (newline-delimited) format in order to make it easy to ingest into BigQuery
   - The parent directory of the `records.jsonl` file contains `dt=` followed by the date, so that [Hive partitioning](https://cloud.google.com/bigquery/docs/hive-partitioned-queries-gcs#supported_data_layouts) is supported without further transformation.
 - Google Cloud Function (GCF) to fetch records from the subgraph
@@ -34,19 +33,11 @@ This project is designed to cache any Graph Protocol subgraph that is specified,
 
 GCP monitoring does not have a direct integration with Discord, and it seemed like overkill to write a GCP -> Discord webhook integration.
 
-Instead, a [scenario](https://us1.make.com/126792/scenarios/446857/edit) is defined in Make (formerly Integromat) that does the following:
-
-- Watches an email address supplied by the custom mailhook
-- Send a HTTP post request to the Discord webhook
-
-This results in a small message being sent into the alerts channel.
-
 ## Caveats
 
-- The [token-holder-balances](https://github.com/OlympusDAO/token-holder-balances) project has details of the resources in this project hard-coded into its configuration. Those values will need to be manually updated if the resource ids change (which is rare).
-- If a new subgraph version is deployed that changes historical data (such as a new token being indexed, or a different calcultion being used), this function will not (yet) detect those changes. To force re-fetching of the transactions, delete the `token-holders-transactions` directory in the GCS bucket. Upon the next schedule, records will be fetched automatically.
+- Data is stored as a string by default, to avoid data type issues. When querying in BigQuery, you will need to cast values to the appropriate type.
 - If specifying a BigQuery type override for a date in ISO string format (e.g. `2022-10-11T10:05.001Z`), the type to specify is `TIMESTAMP` (not `DATE`).
-- After changing any of the contents of the `subgraph/*.json` files, run `yarn codegen`.
+- After changing any of the contents of the `subgraphs/*.json` files, run `yarn codegen`.
 - BigQuery does not support replacing the schema of an existing BigQuery table. If the type overrides are changed, you will need to delete the BigQuery table manually, run `pulumi refresh`, and then run `pulumi up`.
 
 ## Subgraph Configuration
@@ -56,13 +47,11 @@ For each subgraph, there is a corresponding JSON file in the `subgraphs/` direct
 - generate a query to fetch all fields in the subgraph
 - generate a schema that is used in conjunction with Pulumi to create a BigQuery table
 
-See the [TokenHolderTransaction](subgraphs/token-holder-transactions.json) file as an example.
+See the [Cooler_Loans_Loan.json](subgraphs/Cooler_Loans_Loan.json) file as an example.
 
 To cache a new subgraph, perform the following:
 
 - Add a new definition file in `subgraphs/`, adhering to the `SubgraphConfig` type
-  - If a number is large or has a large number of decimal places, it should have a type override to be `BIGNUMERIC`
-  - See the [data types](https://cloud.google.com/bigquery/docs/schemas#standard_sql_data_types)
 - Run `yarn codegen` to generate the GraphQL & BigQuery schemas, and the Typescript typings
 - Run `pulumi up` on the appropriate stack to update the resources
 - Once this is done, the function can be manually triggered by running `yarn run execute `
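
The Hive-partitioned JSONL layout described under Architecture is what lets BigQuery read the bucket directly. Below is a minimal sketch of writing records in that layout with `@google-cloud/storage`; the bucket name, path prefix, and record shape are illustrative assumptions, not values from this project.

```typescript
// Sketch: serialize records as newline-delimited JSON (JSONL) under a
// dt=<date> directory so BigQuery's Hive partitioning picks up the date.
// "my-subgraph-cache" and "some/prefix" are hypothetical placeholders.
import { Storage } from "@google-cloud/storage";

const storage = new Storage();

async function writeRecords(
  records: Record<string, string>[],
  date: string // e.g. "2024-10-03"
): Promise<void> {
  // One JSON object per line: trivially splittable, and BigQuery
  // ingests it without further transformation.
  const jsonl = records.map((record) => JSON.stringify(record)).join("\n");

  await storage
    .bucket("my-subgraph-cache") // hypothetical bucket name
    .file(`some/prefix/dt=${date}/records.jsonl`)
    .save(jsonl);
}

writeRecords([{ id: "1", value: "100.5" }], "2024-10-03").catch(console.error);
```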
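Because every column is stored as a string, queries have to cast at read time, and ISO-date strings cast to `TIMESTAMP` rather than `DATE` (per the Caveats). A hedged example using the `@google-cloud/bigquery` client; the dataset, table, and column names are hypothetical placeholders.

```typescript
// Sketch: casting string columns when querying the cached records.
// Dataset/table/column names are hypothetical placeholders.
import { BigQuery } from "@google-cloud/bigquery";

const bigquery = new BigQuery();

async function main(): Promise<void> {
  const [rows] = await bigquery.query({
    query: `
      SELECT
        SAFE_CAST(value AS BIGNUMERIC) AS value,       -- large/high-precision numbers
        SAFE_CAST(created AS TIMESTAMP) AS created_at  -- ISO strings: TIMESTAMP, not DATE
      FROM \`my_dataset.my_subgraph_records\`
      WHERE dt = "2024-10-03"  -- Hive partition column derived from the dt= directories
    `,
  });
  console.log(rows);
}

main().catch(console.error);
```

`SAFE_CAST` returns `NULL` instead of raising an error when a value does not parse, which is usually preferable over raw string data.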
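The `SubgraphConfig` shape itself is not reproduced in this patch, so the following is only a guess at what a `subgraphs/*.json` definition covers, written as a TypeScript literal so it can carry comments. Every field name here is hypothetical; the real shape is the `SubgraphConfig` type in the repository.

```typescript
// Purely illustrative: field names are invented to show the moving parts
// the README describes (a subgraph to fetch, plus BigQuery type overrides).
// Check the SubgraphConfig type in this repo for the actual schema.
const exampleDefinition = {
  // Subgraph endpoint to fetch records from (hypothetical URL)
  url: "https://api.thegraph.com/subgraphs/name/example/example",
  // Per-field BigQuery type overrides; everything else defaults to a string
  typeOverrides: {
    collateral: "BIGNUMERIC", // large or high-precision numbers
    createdTimestamp: "TIMESTAMP", // ISO-8601 date strings (not DATE)
  },
};

export default exampleDefinition;
```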