Build Section Snippets #585

Open · wants to merge 1 commit into `master`
3 changes: 2 additions & 1 deletion .vscode/settings.json
@@ -1,3 +1,4 @@
{
-  "compile-hero.disable-compile-files-on-did-save-code": false
+  "compile-hero.disable-compile-files-on-did-save-code": false,
+  "editor.formatOnSave": true
}
134 changes: 9 additions & 125 deletions docs/build/manifest/polygon.md

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions docs/build/manifest/snippets/bypass-blocks.md
@@ -0,0 +1,13 @@
## Bypass Blocks

Bypass Blocks allows you to skip the stated blocks. This is useful when there are erroneous blocks in the chain, or when a chain skips a block after an outage or a hard fork. It accepts either a `range` or a single `integer` entry in the array.

When declaring a `range`, use a string in the format `"start-end"`. Both start and end are inclusive, e.g. a range of `"100-102"` will skip blocks `100`, `101`, and `102`.

```ts
{
  network: {
    bypassBlocks: [1, 2, 3, "105-200", 290],
  },
}
```
19 changes: 19 additions & 0 deletions docs/build/manifest/snippets/intro.md
@@ -0,0 +1,19 @@
<!-- #region part1 -->

The Manifest `project.ts` file can be seen as the entry point of your project. It defines most of the details of how SubQuery will index and transform the chain data, clearly indicating where we are indexing data from and which on-chain events we are subscribing to.

The Manifest can be in either TypeScript, YAML, or JSON format.

With the number of new features we are adding to SubQuery, and the slight differences between each chain that mostly occur in the manifest, the project manifest is now written in TypeScript by default. This means that you get a fully typed project manifest, with documentation and examples provided directly in your code editor.

Below is a standard example of a basic `project.ts`.

<!-- #endregion part1 -->
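For illustration only, a minimal EVM-flavoured `project.ts` might look like the sketch below. The real example is chain-specific and generated by `subql init`; the chain ID, endpoint, start block, and handler names here are placeholder assumptions.

```ts
import {
  EthereumProject,
  EthereumDatasourceKind,
  EthereumHandlerKind,
} from "@subql/types-ethereum";

const project: EthereumProject = {
  specVersion: "1.0.0",
  name: "polygon-starter", // placeholder project name
  version: "0.0.1",
  runner: {
    node: { name: "@subql/node-ethereum", version: ">=3.0.0" },
    query: { name: "@subql/query", version: "*" },
  },
  schema: { file: "./schema.graphql" },
  network: {
    chainId: "137", // e.g. Polygon mainnet
    endpoint: ["https://polygon.api.onfinality.io/public"],
  },
  dataSources: [
    {
      kind: EthereumDatasourceKind.Runtime,
      startBlock: 3678215, // placeholder start height
      mapping: {
        file: "./dist/index.js",
        handlers: [
          {
            kind: EthereumHandlerKind.Event,
            handler: "handleTransfer", // placeholder mapping function
            filter: { topics: ["Transfer(address, address, uint256)"] },
          },
        ],
      },
    },
  ],
};

export default project;
```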

<!-- #region part2 -->

Below is a standard example of the legacy YAML version (`project.yaml`).

:::details Legacy YAML Manifest

<!-- #endregion part2 -->
119 changes: 119 additions & 0 deletions docs/build/manifest/snippets/overview.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
<!-- #region part1 -->

## Overview

### Top Level Spec

| Field | Type | Description |
| --------------- | ------------------------------------------ | --------------------------------------------------- |
| **specVersion** | String | The spec version of the manifest file |
| **name** | String | Name of your project |
| **version** | String | Version of your project |
| **description** | String | Description of your project |
| **runner** | [Runner Spec](#runner-spec) | Runner specs info |
| **repository** | String | Git repository address of your project |
| **schema** | [Schema Spec](#schema-spec) | The location of your GraphQL schema file |
| **network** | [Network Spec](#network-spec) | Detail of the network to be indexed |
| **dataSources** | [DataSource Spec](#datasource-spec) | The datasources for your project |
| **templates** | [Templates Spec](../dynamicdatasources.md) | Allows creating new datasources from these templates |

### Schema Spec

| Field | Type | Description |
| -------- | ------ | ---------------------------------------- |
| **file** | String | The location of your GraphQL schema file |

### Network Spec

If you start your project by using the `subql init` command, you'll generally receive a starter project with the correct network settings. If you are changing the target chain of an existing project, you'll need to edit the [Network Spec](#network-spec) section of this manifest.

<!-- #endregion part1 -->

<!-- #region part2 -->

Additionally, you will need to update the `endpoint`. This defines the (HTTP or WSS) endpoint of the blockchain to be indexed - **this must be a full archive node**. This property can be a string or an array of strings (e.g. `endpoint: ['rpc1.endpoint.com', 'rpc2.endpoint.com']`). We suggest providing an array of endpoints (see the sketch after this list), as it has the following benefits:

- Increased speed - When enabled with [worker threads](../../run_publish/references.md#w---workers), RPC calls are distributed and parallelised among RPC providers. Historically, RPC latency is often the limiting factor with SubQuery.
- Increased reliability - If an endpoint goes offline, SubQuery will automatically switch to other RPC providers to continue indexing without interruption.
- Reduced load on RPC providers - Indexing is a computationally expensive process for RPC providers; by distributing requests among them, you lower the chance that your project will be rate limited.
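As a sketch (the endpoint URLs below are illustrative placeholders), a multi-endpoint network section in `project.ts` might look like this:

```ts
{
  network: {
    chainId: "137", // e.g. Polygon mainnet
    // multiple RPC providers; requests are distributed across them
    endpoint: [
      "https://polygon.api.onfinality.io/public",
      "https://polygon-rpc.com",
    ],
  },
}
```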

<!-- #endregion part2 -->

<!-- #region part3 -->

| Field | Type | Description |
| ---------------- | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **chainId** | String | A network identifier for the blockchain |
| **endpoint** | String | Defines the endpoint of the blockchain to be indexed - **This must be a full archive node**. |
| **port** | Number | Optional port number on the `endpoint` to connect to |
| **dictionary** | String | It is suggested to provide the HTTP endpoint of a full chain dictionary to speed up processing - read [how a SubQuery Dictionary works](../../academy/tutorials_examples/dictionary.md). |
| **bypassBlocks** | Array | Bypasses the stated block numbers; the values can be a `range` (e.g. `"10-50"`) or an `integer`, see [Bypass Blocks](#bypass-blocks) |

### Runner Spec

| Field | Type | Description |
| --------- | --------------------------------------- | ------------------------------------------ |
| **node** | [Runner node spec](#runner-node-spec) | Describes the node service used for indexing |
| **query** | [Runner query spec](#runner-query-spec) | Describes the query service |

### Runner Node Spec

| Field | Type | Description |
| ----------- | ------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **name** | String | `@subql/node-ethereum` _We use the Ethereum node package for Polygon since it is compatible with the Ethereum framework_ |
| **version** | String | Version of the indexer node service; it must follow the [SEMVER](https://semver.org/) rules or be `latest`. You can also find available versions in the SubQuery SDK [releases](https://github.com/subquery/subql/releases) |
| **options** | [Runner Node Options](#runner-node-options) | Runner specific options for how to run your project. These will have an impact on the data your project produces. CLI flags can be used to override these. |

### Runner Query Spec

| Field | Type | Description |
| ----------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **name** | String | `@subql/query` |
| **version** | String | Version of the query service; available versions can be found [here](https://github.com/subquery/subql/blob/main/packages/query/CHANGELOG.md). It must also follow the SEMVER rules or be `latest` |

### Runner Node Options

| Field | v1.0.0 (default) | Description |
| --------------------- | ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **historical** | Boolean (true) | Historical indexing allows you to query the state at a specific block height, e.g. a user's balance in the past |
| **unfinalizedBlocks** | Boolean (false) | If enabled, unfinalized blocks will be indexed; when a fork is detected, the project will be reindexed from the fork. Requires historical |
| **unsafe** | Boolean (false) | Removes all sandbox restrictions and allows access to all inbuilt node packages, as well as the ability to make network requests. WARNING: this can make your project non-deterministic |
| **skipTransactions** | Boolean (false) | If your project contains only event handlers and you don't access any block data other than the block header, you can speed up your project. Handlers should be updated to use `LightEthereumLog` instead of `EthereumLog` to ensure you are not accessing data that is unavailable |
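Putting the above together, a runner section with explicit node options might be sketched as follows (the version strings are placeholders; the option values shown are the defaults from the table above):

```ts
{
  runner: {
    node: {
      name: "@subql/node-ethereum",
      version: ">=3.0.0", // placeholder; must follow SEMVER or be "latest"
      options: {
        historical: true, // query state at any indexed block height
        unfinalizedBlocks: false, // index unconfirmed blocks (requires historical)
        unsafe: false, // keep sandbox restrictions in place
        skipTransactions: false, // fetch full blocks, not just headers
      },
    },
    query: {
      name: "@subql/query",
      version: "*",
    },
  },
}
```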

### Datasource Spec

Defines the data that will be filtered and extracted, as well as the location of the mapping function handler that applies the data transformation.

| Field | Type | Description |
| -------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **kind** | string | [ethereum/Runtime](#data-sources-and-mapping) _We use the Ethereum runtime for Polygon since it is compatible with the Ethereum framework_ |
| **startBlock** | Integer | This changes your indexing start block for this datasource, set this as high as possible to skip initial blocks with no relevant data |
| **endBlock** | Integer | This sets an end block for processing on the datasource. After this block is processed, this datasource will no longer index your data. <br><br>Useful when your contracts change at a certain block height, or when you want to insert data at genesis. For example, setting both `startBlock` and `endBlock` to 320 means this datasource only operates on block 320 |
| **mapping** | Mapping Spec | |
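For illustration, a datasource pinned to a single block (per the `startBlock`/`endBlock` example above, and assuming the EVM datasource kinds) might be sketched as:

```ts
{
  dataSources: [
    {
      kind: EthereumDatasourceKind.Runtime,
      startBlock: 320,
      endBlock: 320, // block 320 is processed, then this datasource stops
      mapping: {
        file: "./dist/index.js",
        handlers: [
          // handlers and filters as described in the next section
        ],
      },
    },
  ],
}
```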

<!-- #endregion part3 -->

<!-- #region part4 -->

### Mapping Handlers and Filters

The following table explains filters supported by different handlers.

**Your SubQuery project will be much more efficient when you only use `TransactionHandler` or `LogHandler` handlers with appropriate mapping filters (e.g. NOT a `BlockHandler`).**
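As an illustrative sketch (the handler and mapping-function names are placeholders, assuming the EVM SDK), filtered transaction and log handlers look like this:

```ts
handlers: [
  {
    kind: EthereumHandlerKind.Call, // transaction handler
    handler: "handleApproval", // placeholder mapping function
    filter: {
      function: "approve(address spender, uint256 amount)",
    },
  },
  {
    kind: EthereumHandlerKind.Event, // log handler
    handler: "handleTransfer", // placeholder mapping function
    filter: {
      topics: ["Transfer(address, address, uint256)"],
    },
  },
],
```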

<!-- #endregion part4 -->

<!-- #region part5 -->

Default runtime mapping filters are an extremely useful feature to decide what block, event, or extrinsic will trigger a mapping handler.

Only incoming data that satisfies the filter conditions will be processed by the mapping functions. Mapping filters are optional but are highly recommended as they significantly reduce the amount of data processed by your SubQuery project and will improve indexing performance.

The `modulo` filter allows handling every N blocks, which is useful if you want to group or calculate data at a set interval. The following example shows how to use this filter.

```yml
filter:
  modulo: 50 # Index every 50 blocks: 0, 50, 100, 150....
```

<!-- #endregion part5 -->
7 changes: 7 additions & 0 deletions docs/build/manifest/snippets/real-time-indexing.md
@@ -0,0 +1,7 @@
## Real-time indexing (Block Confirmations)

As indexers are an additional layer in your data processing pipeline, they can introduce a massive delay between when an on-chain event occurs and when the data is processed and able to be queried from the indexer.

SubQuery solves this problem by providing real-time indexing of unconfirmed data directly from the RPC endpoint. SubQuery indexes the most probable data before it is confirmed and provides it to the app. In the unlikely event that the data isn't confirmed and a reorg occurs, SubQuery will automatically roll back and correct its mistakes quickly and efficiently - resulting in an insanely quick user experience for your customers.

To control this feature, adjust the [--block-confirmations](../../run_publish/references.md#block-confirmations) flag to fine-tune your project, and ensure that [historical indexing](../../run_publish/references.md#disable-historical) is enabled (it is by default).
65 changes: 65 additions & 0 deletions docs/run_publish/query/other_tools/metabase.md
@@ -88,3 +88,68 @@
In addition to the fundamental query visualisations, Metabase offers a variety of other features:
4. **Parameterised Queries:** to make your queries interactive and dynamic by incorporating parameters.

5. **Data Alerts and Scheduled Reports:** to stay informed about critical data changes with Metabase's data alerts and scheduled reports.

## Tips

Below is a collection of tips to improve the accuracy of your data analysis.

### Dealing with Historical Data

When historical indexing is enabled, modifications to an entity with a given ID create a new row instead of updating the existing one, with each row carrying the block range for which it was valid. Consequently, performing aggregations on a table with multiple rows for the same ID can produce inaccurate data. To address this, the data must be filtered so that each ID corresponds to a single entity. The following examples explore this in more detail:

#### Capturing the Latest Entity State

To illustrate this concept further, let's extend the earlier example with a new entity named `Account`, linked to the `Swap` entity through a `sender` field. The schema is as follows:

```graphql
...

type Swap @entity {
  id: ID!
  sender: Account!
  ...
}

type Account @entity {
  id: ID!
  swapsAmount: BigInt!
}
```

Additionally, a `swapsAmount` field is added to calculate the sum of swaps for each account. With historical data enabled, each time a new swap is recorded, a new row is appended indicating the sum of swaps for the specified account within a given `_block_range`. Examine the database snapshot below, and observe multiple rows stored with different `_block_range` values for a single senderId:

![](/assets/img/run_publish/metabase/blockRangesForSameId.png)

Now, consider the scenario where one wants to join the `Swap` and `Account` tables for further analysis. A straightforward join would result in a many-to-many relationship due to the multitude of rows for each sender. Thus, the rows need to be reduced to a single row for each unique ID (`AccountId` in our case) before performing the join.

If the goal is to select a row where the account reflects the latest state (i.e., `_block_range` end block is null), employing an SQL Window function becomes necessary, as demonstrated in the following example:

```sql
SELECT *
FROM
(SELECT *,
RANK() OVER (PARTITION BY "app"."accounts"."id"
ORDER BY "app"."accounts"."_block_range" DESC) AS "ranked_from_latest"
FROM "app"."accounts") AS "ranked_list"
WHERE "ranked_list"."ranked_from_latest" = 1
ORDER BY "ranked_list"."swaps_amount" DESC
```

If the aforementioned query is executed, the resulting output will be as follows:

![](/assets/img/run_publish/metabase/rankedSQLOutput.png)

It is evident that all rows exhibit their latest state (the `_block_range` end block is null - indicating the absence of a second value).

This query can be saved as a [Metabase model](https://www.metabase.com/docs/latest/data-modeling/models) to facilitate caching, allowing for direct and more efficient querying. To convert a question into a table, you can follow the instructions in the UI, as depicted below.

![](/assets/img/run_publish/metabase/convertingToModel.png)

To enable model caching, navigate to the admin settings: "Settings" -> "Caching" -> "Models" and activate the feature as shown in the following image.

![](/assets/img/run_publish/metabase/cachingSetting.png)

Once enabled, you can directly query this precalculated model. Returning to the previously mentioned use-case, you can now join the tables and ensure the uniqueness of addresses, thus ensuring accurate results in subsequent aggregations. The query can be formulated as follows:

![](/assets/img/run_publish/metabase/joiningPrecalculatedModel.png)

Under the hood, because the rows were already sorted and reduced to their latest state when the model was built, that result is cached and there is no need to re-sort. Each new swap is then incorporated as the tables are joined on the fly, and with a single row per AccountID, we can confidently ensure the accuracy of subsequent aggregations.