From 6cd29c57e0ade6a1584450b363ae1fb95e3ed1e6 Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Sat, 11 Jan 2025 09:40:50 +0000
Subject: [PATCH 01/28] Re-order docs

---
apps/hashdotai/guide/05_workers/00_index.mdx | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/apps/hashdotai/guide/05_workers/00_index.mdx b/apps/hashdotai/guide/05_workers/00_index.mdx
index 209935bd047..a0f597d2c0a 100644
--- a/apps/hashdotai/guide/05_workers/00_index.mdx
+++ b/apps/hashdotai/guide/05_workers/00_index.mdx
@@ -14,16 +14,16 @@ sidebarIcon: https://app.hash.ai/icons/docs/ai-overview.svg

To view or manage your workers, click on the "Workers" tab in the left-hand sidebar of the HASH app. From here you'll be able to view their activity, provide them with new goals, or chat with them to get answers directly.

-# Answering questions
-
-Workers can also help you understand topics with more expertise and in more detail than ordinary AI chatbots, by combining their own knowledge and access to the World Wide Web, with information from HASH (including data in your own personal [web](/guide/webs)). Answers are supplemented with references to the entity, webpage, or other context used in their production, so you can have more confidence in the underlying reasoning.
-
# Supporting goals

Workers can help solve goals on your behalf. Currently, workers support research- and analysis-related tasks — for example, adding new entities to your web, or enriching existing entities within it — by autonomously researching topics and analyzing the collected information.

When conducting research, workers are able to search for and interpret information already in your web, as well as public entities from other [HASH webs](/guide/webs), and the outside World Wide Web. As always, private information within both personal and shared webs is not accessible to anybody else (including other people's workers).

+# Answering questions
+
+Workers can also help you understand topics with more expertise and in more detail than ordinary AI chatbots, by combining their own knowledge and access to the World Wide Web, with information from HASH (including data in your own personal [web](/guide/webs)). Answers are supplemented with references to the entity, webpage, or other context used in their production, so you can have more confidence in the underlying reasoning.
+
# Managing flows

[Flows](/guide/flows) are pre-defined series of steps that workers can execute. If something goes wrong, depending on how the flow has been set up and defined, the worker executing it has the ability to decide how to proceed.

From 763cbd0faadd42d4864677cfb37f28454000ac7a Mon Sep 17 00:00:00 2001
From: vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Mon, 10 Feb 2025 17:43:07 +0100
Subject: [PATCH 02/28] Update copy

---
apps/hashdotai/guide/01_introduction/00_index.mdx | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/apps/hashdotai/guide/01_introduction/00_index.mdx b/apps/hashdotai/guide/01_introduction/00_index.mdx
index 7862b562f46..5411378668a 100644
--- a/apps/hashdotai/guide/01_introduction/00_index.mdx
+++ b/apps/hashdotai/guide/01_introduction/00_index.mdx
@@ -12,14 +12,17 @@ sidebarIcon: https://app.hash.ai/icons/docs/introduction-whatishash.svg

HASH is a new kind of database. Operating in the background, HASH continuously extracts and integrates data from the outside world.
This information is assembled into a private knowledge graph, called a "web", and can be used in a number of different ways. -Traditionally, knowledge graphs have been hard to bootstrap and maintain, while AI "research" tools (although fast) lack the ability to assure information's quality and provenance. HASH, in contrast, is suitable for real-world use in domains where data accuracy, integrity and trustworthiness are paramount (for example: in error-sensitive applications, when making potentially high-cost/high-risk decisions, and when operating within regulated industries). - # How can webs be used? +New AI tools, including large language models such as ChatGPT, work best with "semantic" data. This sort of information is typically stored in a "knowledge graph", which is what we call a "[web](/guide/webs)" in HASH. Prior to HASH, knowledge graphs were hard to bootstrap and maintain. HASH solves this problem, quickly and easily building webs containing information whose quality and provenance are assured. + +This makes HASH suitable for real-world use **extracting data** and **powering AI models** in domains where data accuracy, integrity and trustworthiness are paramount (for example: in error-sensitive applications, when making potentially high-cost/high-risk decisions, and when operating within regulated industries). + Once information is in your web, HASH lets you: - view and explore data visually as a graph, or through tables - construct dashboards and custom views of data, helping you analyze aggregate information +- query data with natural language, and chat with your web to get answers to questions - see how information changes over time, viewing the "history" of entities - [identify](/guide/webs/history) where information in a web came from, viewing the original source and/or claimants - [export](/guide/integrations) information in a variety of formats, and sync it with other apps or databases From 60ea7ccb70cfa5a143ad6fd4270fd0cc8a76e61d Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Wed, 12 Feb 2025 08:13:15 +0000 Subject: [PATCH 03/28] Update README.md --- infra/terraform/README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/infra/terraform/README.md b/infra/terraform/README.md index 9d8d6cd931a..c6b138344e2 100644 --- a/infra/terraform/README.md +++ b/infra/terraform/README.md @@ -4,10 +4,10 @@ This folder contains Terraform modules to deploy a HASH instance on AWS. The ent ## Getting started -1. Install the [terraform CLI](https://learn.hashicorp.com/tutorials/terraform/install-cli) +1. Install the [Terraform CLI](https://learn.hashicorp.com/tutorials/terraform/install-cli) 1. Install Docker 1. Install [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and configure it to use [your credentials](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html) -1. Initialize the terraform modules by executing the following command in [`./hash/`](./hash/): `terraform init` +1. Initialize the Terraform modules by executing the following command in [`./hash/`](./hash/): `terraform init` After initializing, you'll be put into the `default` workspace which isn't allowed for the plan. 
You can create new workspace names by creating/selecting new workspaces: @@ -19,7 +19,7 @@ $ terraform workspace select prod prod ``` -By default, the selected region is `us-east-1` and can be configured by editing the TF variables used for applying the TF plan, e.g. the one in [`./hash/prod-usea1.tfvars`](./hash/prod-usea1.tfvars). +By default, the selected region is `us-east-1` and can be configured by editing the Terraform variables used for applying the Terraform plan (e.g. the one in [`./hash/prod-usea1.tfvars`](./hash/prod-usea1.tfvars)). # Naming convention @@ -135,7 +135,7 @@ $ terraform apply --var-file prod-usea1.tfvars ## 2. Migrate databases -Once the terraform infrastructure is deployed, you should have an RDS Postgres database accessible from the bastion host with `graph` and `kratos` users/dbs. These need to be migrated locally in preparation for starting the services. +Once the Terraform infrastructure is deployed, you should have an RDS Postgres database accessible from the bastion host with `graph` and `kratos` users/dbs. These need to be migrated locally in preparation for starting the services. Before migrating, you must start an SSH tunnel through the bastion host to access the database. This can be done by executing the following command from the [`./hash/`](./hash/) folder: @@ -233,7 +233,7 @@ $ docker push 000000000000.dkr.ecr.us-east-1.amazonaws.com/h-hash-prod-usea1-kra **Building temporal services**: -All temporal services requires the `TEMPORAL_VERSION` build argument to be set to the version of temporal to use. The current version can be found in [`.env`](../../.env) in the repository root and should be set to the same value as the `HASH_TEMPORAL_VERSION`. The image should be tagged with the same version as the `TEMPORAL_VERSION` build argument. +All Temporal services requires the `TEMPORAL_VERSION` build argument to be set to the version of Temporal to use. The current version can be found in [`.env`](../../.env) in the repository root and should be set to the same value as the `HASH_TEMPORAL_VERSION`. The image should be tagged with the same version as the `TEMPORAL_VERSION` build argument. ```console $ DOCKER_BUILDKIT=1 docker build ./apps/hash-external-services/temporal/ -f ./apps/hash-external-services/temporal/migrate.Dockerfile --build-arg TEMPORAL_VERSION=$HASH_TEMPORAL_VERSION -t 000000000000.dkr.ecr.us-east-1.amazonaws.com/h-temporal-prod-usea1-migrate:$HASH_TEMPORAL_VERSION @@ -244,7 +244,7 @@ $ docker push 000000000000.dkr.ecr.us-east-1.amazonaws.com/h-temporal-prod-usea1 .. ``` -To build and push the temporal workers you may use these commands: +To build and push the Temporal workers you may use these commands: ```console $ # AI Typescript worker From d12f213c1a2e9c8cad06e9744e5cc1be6529447c Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Wed, 12 Feb 2025 08:37:24 +0000 Subject: [PATCH 04/28] Update README.md Removes mention of blocks --- apps/hash/README.md | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/apps/hash/README.md b/apps/hash/README.md index 169ca5c7a97..1e29f48d0ca 100644 --- a/apps/hash/README.md +++ b/apps/hash/README.md @@ -221,24 +221,6 @@ Transactional emails templates are located in the following locations: To use `AwsSesEmailTransporter` instead, set `export HASH_EMAIL_TRANSPORTER=aws_ses` in your terminal before running the app. Note that you will need valid AWS credentials for this email transporter to work. 
-## Integration with the Block Protocol - -HASH is built around the open [Block Protocol](https://blockprotocol.org) ([@blockprotocol/blockprotocol](https://github.com/blockprotocol/blockprotocol) on GitHub). - -### Using blocks - -Blocks published to the [Þ Hub](https://blockprotocol.org/hub) can be run within HASH via the 'insert block' (aka. 'slash') menu. - -While running the app in development mode, you can also test local blocks out in HASH by going to any page, clicking on the menu next to an empty block, and pasting in the URL to your block's distribution folder (i.e. the one containing `block-metadata.json`, `block-schema.json`, and the block's code). If you need a way of serving your folder, try [`serve`](https://github.com/vercel/serve). - -### HASH blocks - -The code pertaining to HASH-developed blocks can be found in the [`/blocks` directory](https://github.com/hashintel/hash/tree/main/blocks) in the root of this monorepo. - -### Creating new blocks - -See the [Developing Blocks](https://blockprotocol.org/docs/developing-blocks) page in the [Þ Docs](https://blockprotocol.org/docs) for instructions on developing and publishing your own blocks. - ## Development [//]: # "TODO: Pointers to where to update/modify code" From cf4f913b7b6117ba44d59501cfdfe8f020f3a563 Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Wed, 12 Feb 2025 08:45:26 +0000 Subject: [PATCH 05/28] Update README.md --- blocks/README.md | 38 ++++++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/blocks/README.md b/blocks/README.md index 93b4a43aadf..fcd73b68045 100644 --- a/blocks/README.md +++ b/blocks/README.md @@ -29,9 +29,13 @@ # Blocks -This directory contains the source code for all HASH-developed public [Block Protocol](https://blockprotocol.org/) blocks. +HASH is built around the open [Block Protocol](https://blockprotocol.org) ([@blockprotocol/blockprotocol](https://github.com/blockprotocol/blockprotocol) on GitHub). The current version of HASH is based upon an adapted version of the [Block Protocol Graph Module](https://blockprotocol.org/spec/graph) which will be formalized at a later date. -You can live preview most of these on the [`@hash`](https://blockprotocol.org/@hash/blocks) page in the [Þ Hub](https://blockprotocol.org/hub), and direct links are provided below. +Planned features such as [pages](https://hash.ai/guide/pages) and [apps](https://hash.ai/guide/apps) more directly utilize the [blocks](https://hash.ai/guide/pages/blocks) found in this directory, which contains the source code for all public HASH-developed [Block Protocol](https://blockprotocol.org/) blocks. + +## HASH Blocks + +You can preview most HASH blocks on the [`@hash`](https://blockprotocol.org/@hash/blocks) page in the [Þ Hub](https://blockprotocol.org/hub), and direct links are provided below. **Please note:** this table/directory contains HASH-published blocks only, and does not contain the full extent of available Þ blocks. 
@@ -60,17 +64,13 @@ You can live preview most of these on the [`@hash`](https://blockprotocol.org/@h | [`timer`] | 0.3 | **Maintained** | [@hash/blocks/timer](https://blockprotocol.org/@hash/blocks/timer) | | | [`video`] | 0.3 | **Maintained** | [@hash/blocks/video](https://blockprotocol.org/@hash/blocks/video) | | -## Creating a block +## Using blocks -Run the following command to create a new block: +**In the HASH app (production):** Blocks published to the [Þ Hub](https://blockprotocol.org/hub) can be run within HASH via the 'insert block' (aka. 'slash') menu. -```sh -yarn create-block block-name -``` - -## Running these blocks +**In the HASH app (development):** While running the HASH app in development mode, in addition to inserting blocks published to the Þ Hub, you can also test locally-developed blocks out by going to any page, clicking on the menu next to an empty block, and pasting in the URL to your block's distribution folder (i.e. the one containing `block-metadata.json`, `block-schema.json`, and the block's code). If you need a way of serving your folder, try [`serve`](https://github.com/vercel/serve). -If you want to work on, build or serve a single block, run: +**From the command line:** If you want to work on, build or serve a single block, run: ```sh yarn workspace @blocks/block-name dev @@ -80,13 +80,23 @@ yarn workspace @blocks/block-name build yarn workspace @blocks/block-name serve ``` +**From other applications:** Blocks published to the [Þ Hub](https://blockprotocol.org/hub) can be used within any embedding application that integrates with the Block Protocol. + +## Creating blocks + +See the [Developing Blocks](https://blockprotocol.org/docs/developing-blocks) page in the [Þ Docs](https://blockprotocol.org/docs) for instructions on developing and publishing your own blocks. + +Run the following command to create a new block: + +```sh +yarn create-block block-name +``` + ## Publishing blocks -Blocks are currently published via manually-triggered GitHub actions: +The HASH-developed blocks in this repository are currently published via manually-triggered GitHub actions: - Publish blocks to preview (choose a branch) - Publish blocks to production -## Using these blocks - -As a user, you can access the published versions of these blocks via any embedding application that integrates with the Þ Hub. +To publish your own block, in another [Þ Hub](https://blockprotocol.org/hub) namespace (and separate from this repository), see the "[Publishing Blocks](https://blockprotocol.org/docs/blocks/develop#publish)" guide in the Þ Docs. 
From f6a054252d3d64a09f2c4732ffc7136901a12c37 Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Wed, 12 Feb 2025 08:50:38 +0000 Subject: [PATCH 06/28] Update README.md --- libs/README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/libs/README.md b/libs/README.md index ff24a7e3fd7..12923ed2a5b 100644 --- a/libs/README.md +++ b/libs/README.md @@ -26,10 +26,11 @@ Contains the source code for software development libraries which HASH has publi | Directory | Language(s) | Publication URL | Docs URL | Description | | ------------------------- | ----------- | ------------------------------------------------------------ | ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | | [antsi] | Rust | [Crates.io](https://crates.io/crates/antsi) | [Docs.rs](https://docs.rs/antsi/latest/antsi/) | Supports coloring Select Graphic Rendition (as defined in ISO 6429) with no external dependencies | +| [chonky] | Rust | [Crates.io](https://crates.io/crates/chonky) | [Docs.rs](https://docs.rs/chonky/latest/chonky/) | Assists in the segmentation, chunking and embedding of information contained within arbitrary files | | [deer] | Rust | [Crates.io](https://crates.io/crates/deer) | [Docs.rs](https://docs.rs/deer/latest/deer/) | **Experimental** backend-agnostic deserialization framework, featuring meaningful error messages and context and fail-slow behavior by default | -| [error-stack] | Rust | [Crates.io](https://crates.io/crates/error-stack) | [Docs.rs](https://docs.rs/error-stack/latest/error_stack/) | Context-aware error-handling library that supports arbitrary attached user data | -| [sarif] | Rust | [Crates.io](https://crates.io/crates/sarif) | [Docs.rs](https://docs.rs/sarif/latest/sarif/) | Representation of the SARIF specification in Rust | -| [@hashintel/type-editor] | TypeScript | [npm](https://www.npmjs.com/package/@hashintel/type-editor) | To be written | UI for editing entity types defined according to the [Block Protocol's Type System](https://blockprotocol.org/docs/working-with-types) | +| [error-stack] | Rust | [Crates.io](https://crates.io/crates/error-stack) | [Docs.rs](https://docs.rs/error-stack/latest/error_stack/) | Context-aware error-handling library that supports arbitrary attached user data | +| [sarif] | Rust | [Crates.io](https://crates.io/crates/sarif) | [Docs.rs](https://docs.rs/sarif/latest/sarif/) | Representation of the SARIF specification in Rust | +| [@hashintel/type-editor] | TypeScript | [npm](https://www.npmjs.com/package/@hashintel/type-editor) | To be written | UI for editing entity types defined according to the [Block Protocol's Type System](https://blockprotocol.org/docs/working-with-types) | | [@hashintel/query-editor] | TypeScript | [npm](https://www.npmjs.com/package/@hashintel/query-editor) | To be written | UI for editing queries (a specific entity type used heavily inside of [HASH]) | ## Internal Libraries From 6537739c7c00434bc1719f4795b5c91881459580 Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Wed, 12 Feb 2025 08:50:56 +0000 Subject: [PATCH 07/28] Update README.md --- libs/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/libs/README.md b/libs/README.md index 12923ed2a5b..e72054b669f 100644 --- a/libs/README.md +++ b/libs/README.md @@ -5,6 +5,7 @@ [github_star]: 
https://github.com/hashintel/hash/tree/main/libs# [hash]: https://github.com/hashintel/hash/tree/main/apps/hash [antsi]: antsi +[chonky]: chonky [deer]: deer [error-stack]: error-stack [sarif]: sarif From ae312263a1871b5ae0280cce2437d547135204ec Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Wed, 12 Feb 2025 10:20:08 +0000 Subject: [PATCH 08/28] Update README.md --- blocks/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/blocks/README.md b/blocks/README.md index fcd73b68045..7aad4f08570 100644 --- a/blocks/README.md +++ b/blocks/README.md @@ -37,8 +37,6 @@ Planned features such as [pages](https://hash.ai/guide/pages) and [apps](https:/ You can preview most HASH blocks on the [`@hash`](https://blockprotocol.org/@hash/blocks) page in the [Þ Hub](https://blockprotocol.org/hub), and direct links are provided below. -**Please note:** this table/directory contains HASH-published blocks only, and does not contain the full extent of available Þ blocks. - | Directory | Spec Target | Status | Þ Hub URL | Description | | ---------------- | ----------- | -------------- | -------------------------------------------------------------------------------- | ----------- | | [`address`] | 0.3 | **Maintained** | [@hash/blocks/address](https://blockprotocol.org/@hash/blocks/address) | | @@ -64,6 +62,8 @@ You can preview most HASH blocks on the [`@hash`](https://blockprotocol.org/@has | [`timer`] | 0.3 | **Maintained** | [@hash/blocks/timer](https://blockprotocol.org/@hash/blocks/timer) | | | [`video`] | 0.3 | **Maintained** | [@hash/blocks/video](https://blockprotocol.org/@hash/blocks/video) | | +**Please note:** this table/directory contains HASH-developed blocks which are (or were) published to the [Þ Hub](https://blockprotocol.org/hub) under the official `@hash` namespace. This reflects neither the full extent of available Þ blocks, nor even those originally developed by HASH. A number of other publicly-accessible blocks can be found in the `@hashdeps` GitHub org, including the [Calculation Table](https://github.com/hashdeps/calculation-table-block), [Drawing](https://github.com/hashdeps/tldraw-block), and [Pull/Merge Request Overview](https://github.com/hashdeps/github-pr-overview) blocks. + ## Using blocks **In the HASH app (production):** Blocks published to the [Þ Hub](https://blockprotocol.org/hub) can be run within HASH via the 'insert block' (aka. 'slash') menu. From 5aac23788cf0db5d1e927f991ef06f1f4f30c53d Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Thu, 20 Feb 2025 11:11:56 +0000 Subject: [PATCH 09/28] Update knowledge-graphs.mdx --- apps/hashdotai/glossary/knowledge-graphs.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/apps/hashdotai/glossary/knowledge-graphs.mdx b/apps/hashdotai/glossary/knowledge-graphs.mdx index db7c5b39002..4071c66f3f6 100644 --- a/apps/hashdotai/glossary/knowledge-graphs.mdx +++ b/apps/hashdotai/glossary/knowledge-graphs.mdx @@ -5,7 +5,7 @@ slug: knowledge-graphs tags: ["Data Science", "Graphs"] --- -A knowledge graph is a collection of linked concepts. It uses a graph structure to store semantically linked entities (e.g. objects, concepts, events). +A knowledge graph is a collection of linked concepts. It uses a [graph structure](/glossary/graphs) to store semantically linked entities (e.g. objects, concepts, events). 
Knowledge graphs help contextualize data - instead of treading a datum as a single, isolated fact, they store information on its relationship to other pieces of data. For instance, in a knowledge graph containing information on cars, each car could have a connection to its manufacturer; it's then easy for AI Applications to infer which cars are related. From 867f5ea1900c72f7c3918916fca4c765b9af0cd4 Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Thu, 20 Feb 2025 12:28:26 +0000 Subject: [PATCH 10/28] Delete `apps/hashdotai/glossary/url_map.json` --- apps/hashdotai/glossary/url_map.json | 367 --------------------------- 1 file changed, 367 deletions(-) delete mode 100644 apps/hashdotai/glossary/url_map.json diff --git a/apps/hashdotai/glossary/url_map.json b/apps/hashdotai/glossary/url_map.json deleted file mode 100644 index 6d550113fd3..00000000000 --- a/apps/hashdotai/glossary/url_map.json +++ /dev/null @@ -1,367 +0,0 @@ -[ - { - "title": "Actor Model", - "description": "There are two main approaches to building agent-based simulations: object-oriented programming and the actor-based model.", - "slug": "actor-model", - "tags": ["Simulation Modeling", "Software Engineering"] - }, - { - "title": "Agent-Based Modeling", - "description": "ABMs simulate entities in virtual environments, or digital twins, in order to help better understand both entities and their environments.", - "slug": "agent-based-modeling", - "tags": ["Simulation Modeling"] - }, - { - "title": "Applicant Tracking System", - "description": "Applicant tracking systems help employers manage recruitment and hiring.", - "slug": "ats", - "tags": ["Business Software"] - }, - { - "title": "Autocorrelation", - "description": "Autocorrelation is a measure of the degree of similarity between any time series and a lagged or offset version of itself over successive time intervals.", - "slug": "autocorrelation", - "tags": ["Data Science"] - }, - { - "title": "Block Protocol", - "description": "The open Block Protocol standardizes the means by which blocks and the applications that embed them communicate.", - "slug": "block-protocol", - "tags": ["Software Engineering", "Standards"] - }, - { - "title": "Business Intelligence", - "description": "Business Intelligence allows companies to make data-driven decisions.", - "slug": "business-intelligence", - "tags": ["Business Intelligence"] - }, - { - "title": "Business Process Modeling", - "description": "Business Process Modeling (BPM) helps organizations catalog, understand and improve their processes.", - "slug": "business-process-modeling", - "tags": ["Business Intelligence", "Simulation Modeling"] - }, - { - "title": "Content Management System", - "description": "Content management systems allow you to build and manage websites.", - "slug": "cms", - "tags": ["Business Software"] - }, - { - "title": "Customer Relationship Management System", - "description": "Customer relationship management systems track and coordinate interactions between a company and its customers.", - "slug": "crm", - "tags": ["Business Software"] - }, - { - "title": "Data Drift", - "description": "Data Drift is the phenomenon where changes to data degrade model performance.", - "slug": "data-drift", - "tags": ["Data Science"] - }, - { - "title": "Data Mesh", - "description": "Data meshes are decentralized database solutions.", - "slug": "data-mesh", - "tags": ["Data Science"] - }, - { - "title": "Data Mining", - "description": "Data Mining is a process applied to find 
unknown patterns, correlations, and anomalies in data. Through mining, meaningful insights can be extracted from data.", - "slug": "data-mining", - "tags": ["Data Science", "Machine Learning"] - }, - { - "title": "Data Pipelines", - "description": "Data pipelines are processes that result in the production of data products, including datasets and models.", - "slug": "data-pipelines", - "tags": ["Data Science"] - }, - { - "title": "Data Types", - "description": "Data types describe a space of possible values through the specification of constraints", - "slug": "data-types", - "tags": ["Graphs", "Standards"] - }, - { - "title": "Datasets", - "description": "Datasets are collections of numbers or words, generally centered around a single topic or subject.", - "slug": "datasets", - "tags": ["Business Intelligence", "Data Science"] - }, - { - "title": "Deep Reinforcement Learning", - "description": "DRL is a subset of Machine Learning in which agents are allowed to solve tasks on their own, and thus discover new solutions independent of human intuition.", - "slug": "deep-reinforcement-learning", - "tags": ["Machine Learning", "Simulation Modeling"] - }, - { - "title": "Diffing", - "description": "Diffs are used to track changes between different versions or forks of a project, providing an overview regarding files changed, and the nature of those changes.", - "slug": "diff", - "tags": ["Software Engineering"] - }, - { - "title": "Digital Twin", - "description": "Digital twins are a detailed simulated analogue to a real-world system", - "slug": "digital-twin", - "tags": ["Business Intelligence", "Simulation Modeling"] - }, - { - "title": "Directed Acyclic Graphs", - "description": "If you don’t know your DAGs from your dogs, you can finally get some clarity and sleep easily tonight. 
Learn what makes a Directed Acyclic Graph a DAG.", - "slug": "dag", - "tags": ["Data Science", "Graphs", "Software Engineering"] - }, - { - "title": "Discrete Event Simulation", - "description": "DES is a modeling approach that focuses on the occurrence of events in a simulation, separately and instantaneously, rather than on any chronological-scale.", - "slug": "discrete-event-modeling", - "tags": ["Simulation Modeling"] - }, - { - "title": "Document Management System", - "description": "Document management systems allow you to store, manage and track documents (both physical and digital).", - "slug": "dms", - "tags": ["Business Software"] - }, - { - "title": "Ego Networks", - "description": "Ego networks are a framework for local analysis of larger graphs.", - "slug": "ego-networks", - "tags": ["Data Science", "Graphs", "Simulation Modeling"] - }, - { - "title": "Enterprise Resource Planning", - "description": "Enterprise resource planning uses an integrated software system to manage a business' daily tasks.", - "slug": "erp", - "tags": ["Business Software"] - }, - { - "title": "Entities", - "description": "Entities are individual ‘things’ with a distinct, independent existence.", - "slug": "entities", - "tags": ["Graphs", "Simulation Modeling"] - }, - { - "title": "Entity Types", - "description": "Entity types represent commonly recurring classes of entities, and describe their properties.", - "slug": "entity-types", - "tags": ["Graphs", "Standards"] - }, - { - "title": "Fast Healthcare Interoperability Resources", - "description": "An electronic healthcare standard for data interoperability.", - "slug": "fhir", - "tags": ["Standards"] - }, - { - "title": "Forking", - "description": "Forking something means to create a copy of it, allowing individual developers or teams to work on their own versions of it, in safe isolation.", - "slug": "fork", - "tags": ["Software Engineering"] - }, - { - "title": "Graph Databases", - "description": "Graph Databases are a type of database that emphasizes the relationships between data.", - "slug": "graph-databases", - "tags": ["Graphs", "Software Engineering"] - }, - { - "title": "Graph Representation Learning", - "description": "Graph representation learning is a more tailored way of applying machine learning algorithms to graphs and networks.", - "slug": "graph-representation-learning", - "tags": ["Graphs", "Machine Learning"] - }, - { - "title": "Graphs", - "description": "A graph is a collection of entities which may be connected to other entities by links.", - "slug": "graphs", - "tags": ["Graphs", "Machine Learning", "Software Engineering"] - }, - { - "title": "Integrations", - "description": "Integrations allow information from different systems to be brought together, and actions coordinated across them.", - "slug": "integrations", - "tags": [ - "Business Intelligence", - "Business Software", - "Graphs", - "Standards" - ] - }, - { - "title": "Knowledge Graph Machine Learning", - "description": "Knowledge graphs are information-dense inputs to machine learning algorithms, and can capture more human-readable outputs of algorithms.", - "slug": "knowledge-graph-machine-learning", - "tags": ["Graphs", "Machine Learning"] - }, - { - "title": "Knowledge Graphs", - "description": "Knowledge Graphs contextualize data and power insight generation.", - "slug": "knowledge-graphs", - "tags": ["Data Science", "Graphs"] - }, - { - "title": "Links", - "description": "Links between different entities represent the relationships and connections between them.", - 
"slug": "links", - "tags": ["Graphs"] - }, - { - "title": "Properties", - "description": "Properties store individual pieces of information about entities. All property fields on an entity are inferred from its entity type(s).", - "slug": "properties", - "tags": ["Data Science", "Graphs"] - }, - { - "title": "Machine Learning", - "description": "Machine Learning is a subfield of Artificial Intelligence where parameters of an algorithm are updated from data inputs or by interacting with an environment.", - "slug": "machine-learning", - "tags": ["Machine Learning"] - }, - { - "title": "Merging", - "description": "Merging is the process of reconciling two projects together. In HASH merging projects is handled by submitting, reviewing and approving “merge requests”.", - "slug": "merge", - "tags": ["Software Engineering"] - }, - { - "title": "Metadata", - "description": "Metadata is data about data. It’s quite simple, really. Learn more about how it’s used within.", - "slug": "metadata", - "tags": ["Data Science", "Standards"] - }, - { - "title": "Model Drift", - "description": "Models tend to become less accurate over time.", - "slug": "model-drift", - "tags": ["Data Science", "Simulation Modeling"] - }, - { - "title": "Model Licensing", - "description": "There are lots of ways to license simulation models. Here we outline some key considerations and things to be aware of.", - "slug": "model-licensing", - "tags": ["Simulation Modeling"] - }, - { - "title": "Model Sharing", - "description": "There are lots of ways to share simulation models: blackbox, greybox, closed, open, transparent, and output-only. Here we explain what these terms all mean.", - "slug": "model-sharing", - "tags": ["Simulation Modeling"] - }, - { - "title": "Multi-Agent Systems", - "description": "Multi-Agent Systems represent real-world systems as collections of intelligent agents.", - "slug": "multi-agent-systems", - "tags": ["Simulation Modeling", "Software Engineering"] - }, - { - "title": "Artificial Neural Networks", - "description": "Artificial Neural Networks are computer models inspired by animal brains. They consist of collections of nodes, arranged in layers, which transfer signals.", - "slug": "neural-nets", - "tags": ["Machine Learning"] - }, - { - "title": "Optimization Methods", - "description": "The key to finding the best solution to any problem.", - "slug": "optimization-methods", - "tags": ["Data Science", "Simulation Modeling"] - }, - { - "title": "Parameters", - "description": "Parameters control specific parts of a system's behavior.", - "slug": "parameters", - "tags": ["Data Science", "Simulation Modeling"] - }, - { - "title": "Process Mining", - "description": "Process mining is an application of data mining with the purpose of mapping an organization’s processes. 
It is used to optimize operations, and identify weaknesses.", - "slug": "process-mining", - "tags": ["Data Science", "Machine Learning", "Simulation Modeling"] - }, - { - "title": "Project Management Software", - "description": "Project management software is used to manage teams completing complex projects.", - "slug": "project-management-software", - "tags": ["Business Software"] - }, - { - "title": "Robotic Process Automation", - "description": "Robotic process automation uses software to perform repeatable business tasks.", - "slug": "rpa", - "tags": ["Business Software"] - }, - { - "title": "Robustness", - "description": "Robustness is a measure of a model's accuracy when presented with novel data.", - "slug": "robustness", - "tags": ["Data Science", "Machine Learning", "Simulation Modeling"] - }, - { - "title": "Schemas", - "description": "Schemas are descriptions of things: agents in simulations, and the actions they take. They help make simulations interoperable, and data more easily understood.", - "slug": "schemas", - "tags": ["Data Science", "Simulation Modeling", "Software Engineering"] - }, - { - "title": "Scraping", - "description": "Web scraping is the process of automatically extracting data from websites efficiently and reliably.", - "slug": "scraping", - "tags": ["Data Science", "Graphs", "Software Engineering"] - }, - { - "title": "Simulation Modeling", - "description": "Simulation Models seek to demonstrate what happens to environments and agents within them, over time, under varying conditions.", - "slug": "simulation", - "tags": ["Simulation Modeling"] - }, - { - "title": "Single Synthetic Environment", - "description": "Single synthetic environments allow you to build, run, and analyze data-driven models and simulations.", - "slug": "single-synthetic-environment", - "tags": ["Business Intelligence", "Simulation Modeling"] - }, - { - "title": "Stochasticity", - "description": "Stochasticity is a measure of randomness. The state of a stochastic system can be modeled but not precisely predicted.", - "slug": "stochasticity", - "tags": ["Data Science", "Simulation Modeling"] - }, - { - "title": "Synthetic Data Generation", - "description": "Generating data that mimics real data for use in machine learning.", - "slug": "synthetic-data-generation", - "tags": ["Data Science", "Machine Learning", "Simulation Modeling"] - }, - { - "title": "System Dynamics", - "description": "System Dynamics models represent a system as a set of stocks and the rates of flows between them.", - "slug": "system-dynamics", - "tags": ["Simulation Modeling"] - }, - { - "title": "Time Series Data", - "description": "Time series data is data that has been indexed, listed, or graphed in time order. For example, the daily closing value of the NASDAQ, the price of a cryptocurrency per second, or a single step in a simulation run.", - "slug": "time-series", - "tags": ["Data Science"] - }, - { - "title": "Types", - "description": "Types describe the shape that information is expected to take, through rules and constraints associated with it.", - "slug": "types", - "tags": ["Graphs", "Software Engineering", "Standards"] - }, - { - "title": "Discrete vs Continuous Time", - "description": "In continuous time, variables may have specific values for only infinitesimally short amounts of time. 
In discrete time, values are measured once per time interval.",
    "slug": "time",
    "tags": ["Data Science", "Simulation Modeling"]
  },
  {
    "title": "Values",
    "description": "Values are the information contained within a single property on a specific instance of an entity.",
    "slug": "values",
    "tags": ["Data Science", "Graphs"]
  }
]

From 745e7a02738c5f382202d3dd1e82f03376278420 Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Thu, 20 Feb 2025 15:12:30 +0000
Subject: [PATCH 11/28] Update knowledge-graphs.mdx

---
apps/hashdotai/glossary/knowledge-graphs.mdx | 146 ++++++++++++++++++-
1 file changed, 142 insertions(+), 4 deletions(-)

diff --git a/apps/hashdotai/glossary/knowledge-graphs.mdx b/apps/hashdotai/glossary/knowledge-graphs.mdx
index 4071c66f3f6..91c42f82e1d 100644
--- a/apps/hashdotai/glossary/knowledge-graphs.mdx
+++ b/apps/hashdotai/glossary/knowledge-graphs.mdx
@@ -5,10 +5,148 @@ slug: knowledge-graphs
tags: ["Data Science", "Graphs"]
---

# Introduction

Knowledge graphs (KGs) have emerged as powerful tools for organizing and connecting data in a way that mirrors real-world relationships. Knowledge graphs are networks of real-world things (i.e. “entities”) and the relationships between them. Entities might be objects, events, situations, or concepts. Knowledge graphs are typically stored and visualized as a collection of **nodes** (entities) and **edges** (relationships), often in the form of subject–predicate–object “triples” (e.g. `<Alice> <worksFor> <Acme>`), which together form a web of linked knowledge. Unlike traditional databases that silo information, knowledge graphs embed data in context, revealing how pieces of information relate to each other to provide a more meaningful, connected view.

Knowledge graphs help contextualize data - instead of treating a datum as a single, isolated fact, they store information on its relationship to other pieces of data. For instance, in a knowledge graph containing information on cars, each car could have a connection to its manufacturer; it's then easy for AI applications to infer which cars are related. Knowledge graphs have driven advances in applied machine learning.

Perhaps the most notable knowledge graph in use today is the Google Knowledge Graph (a large, general-purpose knowledge base), which has powered Google search results since at least 2012: the snippets of information shown alongside search results are sourced from it.
Today, KGs are used across many domains to organize data from multiple sources, add semantic context, and make it accessible to both humans and AI systems.

Common applications of KGs include not only search engines and question-answering systems (going beyond keyword matching to understand a user’s actual intent), but also recommendation engines for products or content, chatbots and digital assistants, fraud detection in finance, and many more. In all cases, the knowledge graph serves as a **connected information layer** that helps “connect the dots” between disparate data, enabling deeper insights and smarter decision support.

In this article, we will explain the **different types of knowledge graphs** that exist and how they are categorized. We will explore various industry-agnostic examples of their commercial applications – including **healthcare** and **supply chain management** – highlighting the benefits each type can offer. We will also discuss use cases, advantages, and common challenges or limitations, and how knowledge graphs contribute to business value, better decision-making, and operational efficiency.

# Types of Knowledge Graphs

Most knowledge graphs are stored in [graph databases](/glossary/graph-databases), which allow them to be efficiently searched and queried.

Knowledge graphs can be classified in several ways, depending on their **data model, scope, and design**. They differ by how they are implemented (the underlying graph technology), the breadth of their content (general-purpose vs domain-specific), their openness (public vs private), and even by temporal aspects (static vs dynamic). Below we outline some key types of knowledge graphs and their characteristics:

## Semantic Knowledge Graphs (RDF-Based)

One major category of KGs uses Semantic Web standards, primarily the Resource Description Framework (RDF). An RDF-based knowledge graph represents knowledge as triples and usually conforms to ontologies (formal schemas) that define the types of entities and relationships. RDF is a universal framework for describing metadata and knowledge, endorsed by the W3C, allowing data from different sources to be encoded and linked in a common, machine-interpretable way. These *semantic knowledge graphs* often leverage technologies like OWL (Web Ontology Language) for rich semantics and SPARQL for querying.

### Use cases and benefits

Semantic KGs shine in scenarios requiring interoperability and standardization. Because they use shared vocabularies and URI identifiers for entities, they can integrate data across organizational or web boundaries. A classic example is **open knowledge graphs** on the web: projects like **DBpedia** and **Wikidata** aggregate structured knowledge from Wikipedia and other sources, providing a public semantic knowledge graph that many applications can reuse. Google’s own Knowledge Graph was initially built on such sources (like DBpedia, Freebase, Wikidata, etc.) to provide a foundation of general world knowledge. In the **healthcare domain**, semantic KGs are popular for integrating biomedical ontologies and datasets – for instance, linking gene databases, drug databases, and disease ontologies into a unified network of biomedical knowledge. The use of formal semantics enables reasoning: new facts can be inferred from the ontology and data (e.g., if *A* is a subtype of *B* and *B* is related to *C*, a reasoner can infer *A* is related to *C*). This can support advanced applications like clinical decision support or drug discovery, by uncovering indirect connections.
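To make the triple-and-inference idea concrete, here is a minimal sketch (not a description of any particular production system) using the open-source Python `rdflib` library; every URI and class name in it is invented:

```python
# Minimal RDF knowledge-graph sketch. Assumes `pip install rdflib`;
# all URIs below are hypothetical.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("https://example.org/")
g = Graph()

# Subject–predicate–object triples encoding a small class hierarchy
g.add((EX.Sedan, RDFS.subClassOf, EX.Car))
g.add((EX.Car, RDFS.subClassOf, EX.Vehicle))
g.add((EX.ModelS, RDF.type, EX.Sedan))

# A SPARQL property path (`+`) walks subClassOf transitively, so the
# indirect fact "a Sedan is a kind of Vehicle" is derived at query time.
query = "SELECT ?ancestor WHERE { ex:Sedan rdfs:subClassOf+ ?ancestor }"
for row in g.query(query, initNs={"ex": EX, "rdfs": RDFS}):
    print(row.ancestor)  # prints both ex:Car and ex:Vehicle
```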
The **contextual richness** of semantic KGs adds depth to AI applications; for example, in question-answering, a semantic KG helps a system understand the meaning of a query and retrieve exact answers using the graph’s relationships rather than just keywords.

### Challenges

Semantic graphs and ontologies can be complex to build and require expertise. Tools and standards (RDF/OWL) have a learning curve, and reasoning over a very large RDF graph can become computationally intensive. Ensuring different data sources align to a common ontology (resolving naming differences, for example) is itself a challenge. However, when successfully implemented, semantic KGs offer a highly expressive and interoperable knowledge structure that can be a long-term asset for knowledge management. [HASH](https://hash.ai/) is a multi-tenant platform that enables different organizations and individuals to capture and express information in the terms they care about, while maintaining interoperability with other knowledge graphs – eliminating the need, traditional in RDF-based systems, to agree on common definitions of entities and types.

## Property Graphs (Labeled Property Graphs)

Another broad category is the *labeled property graph (LPG)* model, which is the basis of many graph databases (such as Neo4j, TigerGraph, Amazon Neptune in LPG mode, etc.). In a property graph, nodes and edges can have **labels** and **properties** (key–value pairs) associated with them. This is a more schema-optional approach: rather than requiring a global ontology, property graphs allow each node/edge to carry its own descriptive attributes. For example, a node representing a **Person** might have properties like `name="Alice"` and `age="23"`, while an edge representing **purchased** might have the property `date="2025-02-28"`. Property graphs are very flexible and can represent complex networks with heterogeneous data.

### Use cases and benefits

Property graph knowledge graphs are common in **enterprise** settings where performance and agility are priorities. They are well-suited to operational applications like fraud detection, recommendation, or supply chain analysis, where you need to traverse and query complex relationships quickly. Graph databases implementing LPG can perform graph traversals and pattern matching efficiently at scale, which may be hard to do with relational databases in such domains. Many organizations build internal knowledge graphs using property graph models to achieve a unified view of their business data (customers, products, transactions, etc.) without having to predefine a rigid global schema. For instance, a **supply chain knowledge graph** can use a property graph to model suppliers, factories, shipments, parts, and so on, with various attributes for each, enabling fast queries like “find alternative suppliers for component X in region Y” or “which products would be affected if supplier Z has a delay?”. (A detailed supply chain example is linked later in this article.) Property graphs also lend themselves to integration with graph analytics and algorithms (community detection, shortest paths, centrality, etc.), supporting advanced **network analysis** on enterprise data. The benefit of this type is often *pragmatic flexibility*: teams can start populating a graph and iteratively add properties or new types of nodes as needed, aligning with agile development.
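As a toy illustration of the labeled property graph model (deliberately simplified, using the generic open-source `networkx` library rather than an actual graph database such as Neo4j; all node names and property values are invented), nodes and edges simply carry key–value properties:

```python
# Toy labeled property graph. Assumes `pip install networkx`.
import networkx as nx

g = nx.MultiDiGraph()  # directed graph allowing multiple labeled edges
g.add_node("alice", label="Person", name="Alice", age=23)
g.add_node("w-42", label="Product", sku="W-42")
g.add_edge("alice", "w-42", key="purchased", date="2025-02-28")

# Traverse outgoing edges: everything Alice purchased, with edge properties
for _, product, rel, props in g.out_edges("alice", keys=True, data=True):
    if rel == "purchased":
        print(g.nodes[product]["sku"], props["date"])  # W-42 2025-02-28
```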
### Challenges

Unlike semantic KGs, property graphs don’t inherently enforce a common vocabulary across systems – which means integration still requires careful data alignment and governance. Without an ontology, there’s less automated reasoning (though one can manually encode business rules). Another consideration is interoperability: RDF is standardized, while property graph formats are less so (though efforts like LPG-to-RDF mappings exist). Still, property graphs avoid some of the complexity of semantic technologies and often yield performance benefits, making them a popular choice for commercial knowledge graph platforms. [HASH](https://hash.ai/) provides a hybrid approach that combines the interoperability of semantic knowledge graphs with the expressiveness of labeled property graphs.

## Other Classifications of Knowledge Graphs

Beyond the implementation model, knowledge graphs can be distinguished by their scope and dynamics:

### Enterprise vs. Public Knowledge Graphs

An *enterprise knowledge graph* is built for internal use within an organization, capturing domain-specific knowledge and often integrating proprietary data (e.g., a bank’s KG of customers, accounts, and transactions, or a manufacturer’s KG of parts and suppliers). These are typically private and focus on a specific business domain or a *360-degree view* of the enterprise’s own data. In contrast, *public knowledge graphs* (like Google’s, Wikidata, etc.) are open or broad-scope, containing general knowledge about the world or a specific vertical, and are often accessible via public query endpoints or APIs. Public KGs can be leveraged by companies to enrich their own data – for example, an e-commerce site might link its product catalog to Wikidata entities to get additional attributes about products or related items. Enterprise KGs, on the other hand, directly contribute to internal decision-making and operations. Both types share similar technologies but differ in usage and accessibility. Many companies end up using a hybrid approach: internal KGs augmented with public data for context.

### Domain-Specific vs. Cross-Domain

Some knowledge graphs are **domain-specific**, containing deep knowledge about a particular field, while others (like general search engine knowledge graphs or common-sense knowledge bases) are **cross-domain**, linking entities across many areas of knowledge. For example:

- A **healthcare knowledge graph** might include medical ontologies, patient data, research literature, and drug databases, all interlinked with domain-specific relationships (e.g., *Disease* –*treated_by*– *Drug*).
- A **supply chain knowledge graph** focuses on logistics entities and relationships (suppliers, shipments, routes, inventories, etc.).

Domain-specific KGs use terminology and schemas tailored to that field, which can make them extremely powerful for domain experts. Cross-domain graphs provide breadth (useful for broad applications like general question-answering), whereas domain-specific graphs provide depth in a narrower area. In practice, many enterprise KGs start domain-specific (to solve a specific business problem) and later expand.

### Static vs. Dynamic (Temporal) Knowledge Graphs

A *static* knowledge graph is essentially a snapshot of knowledge at a given time – well-suited for relatively unchanging information, such as an encyclopedia. However, in many business scenarios knowledge is constantly evolving.

*Dynamic* knowledge graphs update in real-time or near-real-time as new data comes in. For example, in cybersecurity or social media analytics, new events (like an alert or a post) might be ingested into the graph continuously.

Dynamic *temporal* knowledge graphs explicitly track the time dimension for relationships, meaning the graph can represent how facts are true in certain time periods and not others (for instance, a relationship `works_at` might have timestamps for the start and end of employment). Modeling time in a KG allows **temporal queries** and analysis of trends, as in the sketch at the end of this section.

Dynamic *event-based* knowledge graphs focus on capturing events and their participants as first-class nodes in the graph – useful in domains like intelligence analysis or IoT.

Maintaining a dynamic KG can be challenging, but for applications like finance (tracking evolving ownership or transaction networks) or supply chain (tracking shipments over time) it is often business-critical, providing decision-makers with a view of the latest connected information as most recently understood. As such, many enterprise knowledge graphs are now moving towards dynamic systems, such as [HASH](https://hash.ai/).
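As a minimal sketch of the temporal idea above (plain Python with invented entities; real temporal graph stores index these intervals at scale), each relationship can carry a validity interval, enabling point-in-time queries:

```python
# Hypothetical temporal-edge representation; not tied to any real system.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class TemporalEdge:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still true today"

edges = [
    TemporalEdge("alice", "works_at", "acme", date(2021, 3, 1), date(2023, 6, 30)),
    TemporalEdge("alice", "works_at", "globex", date(2023, 7, 1)),
]

def employers_on(person: str, day: date) -> list[str]:
    """Temporal query: which `works_at` edges were valid on `day`?"""
    return [
        e.obj
        for e in edges
        if e.subject == person
        and e.predicate == "works_at"
        and e.valid_from <= day
        and (e.valid_to is None or day <= e.valid_to)
    ]

print(employers_on("alice", date(2022, 1, 1)))  # ['acme']
print(employers_on("alice", date(2024, 1, 1)))  # ['globex']
```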
### Multi-modal Knowledge Graphs

While many KGs primarily store structured data (entities and relationships with attributes), some incorporate multiple data modalities. A *multi-modal knowledge graph* might link text documents, images, or other media to the entities they mention. For example, a news knowledge graph could have nodes for people and events and link to news articles (text) or photographs (images) that pertain to those nodes. In healthcare, a multi-modal KG might connect a medical image (like an X-ray) to the patient, the diagnosis, and the report text. The benefit is a richer knowledge repository that not only stores abstract facts but also ties in the source content. This is increasingly relevant as organizations seek to combine unstructured data (like documents) with structured knowledge. However, it raises complexity in storage and retrieval – often requiring specialized indexing (for text) or embeddings (for images). Still, the graph serves as the glue connecting all data types. HASH natively supports data of any arbitrary type, and allows users to infinitely extend it with [custom data types](/guide/types/data-types#custom-data-types), including files of any kind.

# Use Cases for Knowledge Graphs

Knowledge graphs deliver significant benefits in a variety of fields and applications – from life sciences to finance to manufacturing. By providing a flexible, connected data layer, they help break down data silos, surface hidden relationships, and enable more informed decisions. In this section, we explore several industry-agnostic use cases and examples of how different types of knowledge graphs are being used commercially, the benefits they offer, and the business value they contribute, demonstrating how KGs can improve decision-making and operational efficiency in practice.

## Healthcare and Life Sciences

In **healthcare**, the volume and variety of data are enormous – electronic health records, lab results, medical imaging, genomics, drug databases, research publications, clinical guidelines, etc. These datasets often exist in isolation, making it difficult to get a comprehensive view of a patient or to discover insights across studies. Knowledge graphs can integrate and harmonize this fragmented data into a single connected structure.
[Learn about knowledge graphs in healthcare and life sciences >](/tutorials/knowledge-graphs-in-healthcare-and-life-sciences) + +## Supply Chain Management + +Modern supply chains are highly complex networks involving suppliers, manufacturers, logistics providers, distributors, and retailers. Managing a supply chain means dealing with vast amounts of data: parts and products, bills of materials, shipment records, inventory levels, contracts, locations, and more. A small disruption at one supplier can cascade through a network and affect the final product delivery. Knowledge graphs have emerged as a game-changing solution for supply chain visibility and intelligence, because they naturally model the network structure of supply relationships and can flexibly accommodate new data sources. [Learn about knowledge graphs in supply chain management and logistics >](/tutorials/knowledge-graphs-in-supply-chain-management) + +## Finance and Banking + +The finance industry was one of the early adopters of graph techniques for tasks like fraud detection and risk management. Financial data naturally forms networks: think of banks with customers, accounts, transactions, devices, merchants, etc., all interlinked. Fraudulent activities often hide in those connections (for example, a ring of bank accounts funneling money between them, or a set of credit card transactions linked by common device or location that indicates identity theft). Knowledge graphs offer a way to model and analyze these complex relationships, going beyond what traditional transaction monitoring systems can do. [Learn about knowledge graphs in finance >](/tutorials/knowledge-graphs-in-finance) + +## Retail and E-Commerce + +Retailers and e-commerce platforms deal with diverse data about products, customers, and their interactions. Knowledge graphs are helping these companies better organize their knowledge about products and better serve their customers through recommendations and improved search. [Learn about knowledge graphs in retail and e-commerce >](/tutorials/knowledge-graphs-in-retail-and-ecommerce) + +## Enterprise Knowledge Management and Search + +Outside of specific verticals, one of the broadest applications of knowledge graphs is in **enterprise knowledge management** – helping organizations make sense of their own data across departments. In many companies, information is scattered in various databases, documents, spreadsheets, and applications (commonly referred to as data silos). Knowledge graphs can integrate these diverse sources into a single connected knowledge hub that reflects the business’s information landscape. [Learn about knowledge graphs in enterprise knowledge management >](/tutorials/knowledge-graphs-in-enterprise-knowledge-management) + +# Benefits of Knowledge Graphs in Summary + +From the various commercial examples linked above, we can distill some key benefits of knowledge graphs that apply across industries and use cases: + +- **Unified data integration:** Knowledge graphs excel at integrating data from disparate sources into a single, connected model. They provide a flexible schema that can evolve, making it easier to combine siloed data than traditional rigid databases. This means organizations can break down silos and have all their information contextually linked, ready to be queried. The immediate benefit is eliminating the time spent manually reconciling data from different systems – the KG does that for you in its structure. 
For example, linking customer records across sales, support, and billing databases yields a unified customer view.
+- **Enhanced search and discovery:** By adding a semantic layer, KGs improve search results and enable discovery of relationships. Instead of isolated keyword hits, users get a network of related information. Platforms like [HASH](https://hash.ai/) “vectorize” information, producing “embeddings” as data enters and is modified; this enables semantic search, allowing the system to understand the intent behind users’ queries and fetch relevant answers more precisely. It also allows exploratory analysis – users can traverse the graph (“explore neighbors”) to uncover information they might not have known to query explicitly. This leads to insights that wouldn’t surface in a siloed environment.
+- **Contextual awareness for AI and analytics:** Many AI algorithms treat data as isolated feature vectors, but knowledge graphs add **context** to data points. Entities in a KG carry their connections and attributes, which can be used to enrich machine learning models (often improving accuracy with additional features). The graph structure also supports reasoning and inferencing, which can derive new implicit knowledge (aiding analytics with more complete data). Moreover, as discussed, KGs can supply trusted knowledge to AI systems (like LLMs) to improve their factual accuracy. All this results in smarter systems that require less data to train (since the knowledge fills in gaps that would otherwise have to be learned from data).
+- **Holistic 360° views:** Whether it’s of a customer, a product, a patient, or a supplier, knowledge graphs inherently provide a *360-degree view* by connecting all relevant information around that entity. This holistic perspective is crucial for decision-making. For example, a 360° customer view helps a company personalize service; a 360° supplier view helps a manufacturer ensure supply continuity. The benefit is that decisions or analyses based on the KG consider the full context, leading to better outcomes than partial information would allow.
+- **Speed and flexibility in querying relationships:** Many business questions boil down to graph problems (e.g., finding paths, neighborhoods, subgraphs). KGs allow such questions to be answered quickly and flexibly. Graph databases can retrieve complex relationship-based answers in milliseconds, even on large datasets. This enables real-time analytics and interactive data exploration, which were previously impractical. The agility of adding new node or edge types without major rework means the knowledge graph keeps up with changing business needs.
+- **Improved decision-making and insights:** Ultimately, the above technical benefits translate to improved decision-making. With knowledge graphs, organizations report faster access to insights and an ability to make connections in data that were previously missed. This can lead to innovations (finding new opportunities or optimizations) and evidence-based decisions backed by the comprehensive knowledge in the graph. For instance, seeing a hidden correlation between two processes in an enterprise KG might prompt a business process change that improves efficiency. In short, KGs turn data into an asset for strategic analysis.
+- **Human-machine collaboration:** Knowledge graphs serve as a bridge between human knowledge and machine data.
They can be visualized and navigated in a way that is intuitive to domain experts (a graph of concepts that experts recognize), making it easier for humans to trust and interact with the system. At the same time, they are structured enough for machines to parse. This duality means KGs can facilitate better collaboration: analysts can pose high-level questions, the system (machine) computes using the KG, and the results can be interpreted by humans with the help of the graph’s context. This loop accelerates tasks like investigations, root cause analyses, and brainstorming, by keeping the human informed and in control while the machine does the heavy lifting in the background.
+
+Of course, it’s important to acknowledge that these benefits are realized when the knowledge graph is well-designed and maintained. Without proper utilities to automate (or aid humans in) their maintenance, KGs can become messy webs, hard to use, and out-of-date. HASH’s versioned type system, two-way data synchronization, and intuitive user interface for domain experts help overcome this historical burden associated with knowledge graphs.
+
+# Challenges and Limitations
+
+While knowledge graphs offer significant advantages, building and using them is not without challenges. Organizations considering a knowledge graph initiative should be aware of potential limitations:
+
+- **Data integration and quality:** Integrating heterogeneous data into a graph is a major undertaking. Data may be incomplete, inconsistent, or error-prone. Ensuring high data quality in the KG is crucial – this involves entity resolution (figuring out when two records refer to the same entity), cleaning errors, and aligning schemas/ontologies. If not carefully managed, a knowledge graph can inherit the “garbage in, garbage out” problem. In fact, *quality assessment* and *cleansing* of knowledge graphs is an active area of research. Automated tools can help, but human oversight is often needed to validate that the graph’s knowledge is correct. Incomplete data is also a challenge: the graph might not have all facts, and one must be careful in analysis not to assume that the absence of a link means the absence of a relationship in reality – it might just mean the data wasn’t captured. Techniques like knowledge graph completion (predicting missing links) exist, but again require caution and validation.
+- **Scalability:** As knowledge graphs grow in size and detail, **scalability** becomes an issue. Storing and querying billions of triples or property graph edges can tax systems if not designed properly. Graph databases have made great strides in scaling horizontally and using efficient indexes, but very large graphs might still face performance issues for complex queries. Additionally, maintaining real-time updates in a huge graph (e.g., one that ingests a firehose of streaming data) is non-trivial. Organizations need to ensure they have the right infrastructure and possibly distributed graph processing capabilities for their scale. Without it, they could end up with slow query responses or an inability to handle peak loads. That said, many modern graph solutions are handling enterprise-scale knowledge graphs, and cloud-based graph services can elastically scale resources. It just requires planning and sometimes significant investment.
+- **Lack of awareness and expertise:** On the people side, one common challenge is that business stakeholders often don’t initially understand what a knowledge graph is or what value it brings.
Unlike more established technologies, KGs may require some evangelizing within an organization. This lack of awareness can hinder getting executive buy-in or cross-departmental support. Moreover, building knowledge graphs requires a skill set that blends data engineering, semantic modeling, and sometimes graph theory – skills that are not widespread. Companies also often find it hard to hire or train engineers and data scientists with knowledge graph experience. There’s also the ambiguity of multiple technology stacks (RDF vs LPG, various databases and tools), which can confuse newcomers. Without clear standards, each team might approach building a KG differently, making it difficult to consolidate efforts. All this can slow down adoption.
+- **Implementation effort and cost:** Constructing a robust knowledge graph is typically a significant project. It may require months (or more) to gather and prepare data, define ontologies or schema, and iterate on building the graph. Developing rich KGs has historically demanded significant time and manpower, with upfront costs deterring adoption, especially if returns on investment aren’t obvious early on. Stakeholders might ask: is it worth it? In some cases (like PrimeKG for drug repurposing), the payoff was finding new therapeutic insights that justified the effort. But in other cases, the ROI might be indirect or long-term. Therefore, organizations must plan for a long-term commitment – the value of a KG often accrues over time as it gets enriched and as more use cases start leveraging it. One strategy to mitigate this is to start with a smaller knowledge graph pilot for a specific high-impact use case (say, fraud detection or a recommendation system), show success, and then expand. Thankfully, the upfront costs of creating knowledge graphs have recently been massively reduced by HASH, which utilizes AI to create a semantic web of knowledge exceptionally quickly, with minimal human effort or time required.
+- **Maintenance and governance:** A knowledge graph is not a one-and-done dataset; it’s a living knowledge base. This raises the challenge of ongoing **maintenance**: as the world changes or as the business evolves, the KG must be updated. New entities appear, and relationships change or become obsolete. Without continuous updates, the KG can quickly become stale and less useful. Governance is needed to manage contributions to the graph (especially if crowd-sourced internally) and to decide how to handle conflicting information. For example, if two data sources give different values for a property (say, two different birthdates for a customer due to entry error), the governance process defines how the conflict is resolved in the graph. Additionally, versioning of the graph’s knowledge might be required for auditing (knowing what the graph contained at a past date). All this adds overhead. Organizations should treat a knowledge graph as a strategic asset that requires stewardship – typically assigning data curators or a knowledge architect to oversee it. HASH’s in-built utilities both automate much of this graph maintenance and make it easy for domain experts to dive in (inspecting the integrity of data, updating types, and modifying mappings as required) through an easy-to-use visual interface.
+- **Privacy and security:** Because knowledge graphs by design connect *everything*, they can raise **privacy concerns** when the data involves personal or sensitive information.
A KG might inadvertently reveal insights that violate privacy if not properly controlled (for example, linking data in a way that re-identifies an anonymized individual). Ensuring compliance with privacy laws (like GDPR) is critical – this could involve not including certain data in the KG, anonymizing nodes, or implementing strict access controls so only authorized queries can see certain parts of the graph. Unauthorized access to the knowledge graph could be more damaging than access to any single source, because the KG reveals the connections and full picture. Therefore, security measures like encryption, authentication, and audit logging are important when deploying enterprise knowledge graphs. HASH addresses these concerns with granular in-built privacy controls and a secure-by-design architecture, providing cell-level security.
+- **Initial skepticism and adoption hurdles:** Tying to the awareness point, there might be internal skepticism – “Is this just hype?”, “We already have a data warehouse, why do we need a knowledge graph?”. Convincing stakeholders often requires demonstrating concrete use cases where the graph clearly outperforms existing solutions. The relative newness of the technology can make conservative IT departments hesitant. Additionally, integrating a knowledge graph solution into existing infrastructure (to feed it data and to consume its output in applications) can face resistance simply because it’s a change. Overcoming these hurdles involves clear communication of benefits, training, and often running the KG in parallel with legacy systems until trust is earned. While this traditionally required significant upfront investment, HASH’s AI entity/type inference technology shrinks the time and cost of developing proof-of-concept knowledge graphs to \~hours, allowing solutions to be bootstrapped, tested and demonstrated quickly and affordably.
+
+Despite these challenges, the trajectory in many organizations is that once a knowledge graph starts proving its worth (even in a limited scope), adoption grows. Best practices, as highlighted by practitioners, include starting with a well-scoped project, focusing on data where relationships matter a lot, engaging both business and technical experts in designing the KG (so it fits the actual needs), and iterative development – building the graph incrementally and showing value at each stage. Also, using hybrid teams that include domain experts helps ensure the knowledge graph’s structure truly represents the reality of the business domain. HASH’s visual user interfaces provide an easy means for users to both input and make use of data, making it ideally suited for constructing and managing organizations’ knowledge graphs.
+
+# Conclusion
+
+## Making data useful
+
+Knowledge graphs help organizations leverage data for commercial value. By organizing disparate information into a network of knowledge, they allow businesses to **unlock connections** in their data that were previously hidden or hard to exploit. Across industries – from healthcare discovering new treatment insights, to supply chain managers gaining real-time visibility, to financial institutions catching fraud, to retailers delighting customers with spot-on recommendations – knowledge graphs have demonstrated tangible benefits. They enable faster insights and better-informed decisions by ensuring all relevant information is connected and readily accessible.
In operational terms, this means greater efficiency: teams spend less time gathering data and more time acting on it, processes become smarter and more automated, and systems can proactively support users with context-aware intelligence. + +## Useful across industries and functions + +The commercial applications we explored also show how **industry-agnostic** the impact of KGs can be. Any scenario where understanding relationships between data points is key can likely benefit from a knowledge graph. They contribute to business value by enhancing everything from strategic planning (through improved analytics and forecasting) to day-to-day operations (through quicker search and issue resolution). For example, a supply chain knowledge graph not only helps avoid disruptions (ensuring continuity and revenue), but also can optimize operations to be more cost-effective. A customer knowledge graph can increase sales and loyalty through personalization. In knowledge-driven fields, a KG becomes a backbone for innovation, enabling new services (like advanced question-answering systems in customer support or clinical decision support tools in medicine). + +## Accreting value over time + +It is important to approach knowledge graphs with the understanding that their value typically grows over time. Early on, they might solve a specific problem, but as they accumulate more knowledge and usage, they can become a kind of **knowledge platform** for the organization, supporting numerous applications (search, BI, AI, etc.) simultaneously. This compounding value can far exceed the initial expectations – effectively turning data into an **enterprise-wide asset for intelligence** rather than a byproduct of operations. “Tech” companies like Google, Amazon, Microsoft, and Facebook have shown this at web-scale, and now many others are following suit at enterprise scale, often noting that knowledge graphs quickly go from “nice-to-have” to **must-have** for competitive advantage in the era of AI. + +## Agility through common human- and machine-readability + +Knowledge graphs bring the **richness of human-like understanding** to our data systems – treating data not just as rows in tables but as interlinked knowledge. Businesses that successfully implement knowledge graphs can expect improved decision-making, more efficient operations, and the ability to discover insights that fuel innovation. There are challenges to overcome in building and maintaining a KG, but with careful planning and the right expertise, the payoff is a **smarter, more agile organization** that can leverage its collective knowledge to the fullest. In a world increasingly driven by data and AI, knowledge graphs provide the contextual glue that can turn information overload into actionable intelligence, making them a cornerstone of modern enterprise strategy. 
From 0de5f72c0248dd189086d67169a37baa65685619 Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Thu, 20 Feb 2025 15:16:36 +0000
Subject: [PATCH 12/28] Create knowledge-graphs-in-healthcare-and-life-sciences.mdx

---
 ...graphs-in-healthcare-and-life-sciences.mdx | 32 +++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 apps/hashdotai/tutorials/knowledge-graphs-in-healthcare-and-life-sciences.mdx

diff --git a/apps/hashdotai/tutorials/knowledge-graphs-in-healthcare-and-life-sciences.mdx b/apps/hashdotai/tutorials/knowledge-graphs-in-healthcare-and-life-sciences.mdx
new file mode 100644
index 00000000000..ec452259199
--- /dev/null
+++ b/apps/hashdotai/tutorials/knowledge-graphs-in-healthcare-and-life-sciences.mdx
@@ -0,0 +1,32 @@
+---
+title: Knowledge Graphs in Healthcare & Life Sciences
+description: The value of connected knowledge across healthcare and life sciences
+slug: knowledge-graphs-in-healthcare-and-life-sciences
+tags: ["Use Case"]
+---
+
+In **healthcare**, the volume and variety of data are enormous – electronic health records, lab results, medical imaging, genomics, drug databases, research publications, clinical guidelines, etc. These datasets often exist in isolation, making it difficult to get a comprehensive view of a patient or to discover insights across studies. Knowledge graphs (KGs) can integrate and harmonize this fragmented data into a single connected structure.
+
+# Applications
+
+## Patient Care & Clinical Decision Support
+
+For example, imagine a hospital that collects data on patient records, diagnoses, treatments, medications, and lab results. Using a knowledge graph, the hospital can create a network that links all these pieces. It can connect a patient’s medical **history** with their current **symptoms**, recent **lab test** results, and prescribed **treatments**, all as interconnected nodes and relationships. With this unified view, the hospital’s systems (or caregivers) can infer valuable insights: identifying patterns in patient conditions and treatments to improve diagnostic accuracy, recommending personalized treatment plans tailored to an individual’s specific combination of conditions, and analyzing the effectiveness of different medications based on patient outcomes. In short, a healthcare knowledge graph enables a true **360-degree view of patient care**, where every relevant piece of information is connected. This leads to better clinical decision support – for instance, alerting doctors to potential drug interactions by seeing that two medications a patient is on are linked in the graph as having an adverse interaction.
+
+## Medical Research & Development
+
+Beyond direct patient care, knowledge graphs are proving invaluable in medical research and drug discovery. Researchers build specialized biomedical KGs that connect diseases to genes, proteins, drugs, phenotypes, and more. By mining these connections with AI algorithms, they can generate hypotheses for drug repurposing or identify new targets for therapy. A notable example is **PrimeKG**, a comprehensive disease-centric biomedical knowledge graph assembled from many primary data sources (including disease–gene associations, drug–target databases, and medical ontologies).
PrimeKG enabled sophisticated analytics that would have been impossible otherwise – for instance, in one study it helped researchers identify 11 existing drugs that [could potentially be repurposed](https://www.mayoclinicplatform.org/2023/12/21/knowledge-graphs-can-move-healthcare-into-the-future/#:~:text=Since%20developing%20KGs%20of%20this,drugs%20that%20could%20be%20repurposed) to treat other conditions, out of a set of 40 recently FDA-approved drugs. This kind of outcome highlights how connecting disparate biomedical knowledge into a graph can directly fuel innovation and discovery.
+
+## Predictive Modeling & Explainability
+
+Another emerging application is using KGs to tap into the latent knowledge in **electronic health records (EHRs)**. EHRs contain a wealth of patient information, but much of it is unstructured text (doctors’ notes) and hard to analyze at scale. By building a knowledge graph from EHR data – extracting key entities like conditions, procedures, and medications, and linking them – hospitals can start to detect patterns that help in predictive modeling. For instance, a KG built from EHRs can help predict patient risks by recognizing a constellation of past events (medical history) that collectively increase the risk of a certain outcome. This can feed into machine learning models for risk scoring or clinical decision support, making those models more **explainable** because the graph provides a reasoning trail (“Patient X is at high risk because they have condition A, which is related to factors B and C in the knowledge graph, which in past data led to outcome Z”).
+
+# About Healthcare KGs
+
+## Technical implementation
+
+It’s worth noting that many healthcare KGs rely on **semantic graph** approaches (RDF/OWL) because of the need for standardized terminologies (like [SNOMED CT](https://digital.nhs.uk/services/terminology-and-classifications/snomed-ct) for clinical terms, or various disease ontologies). This allows integration of data from research and clinic using common identifiers. For example, linking a clinical diagnosis to a research knowledge base requires that both use the same ontology for diseases. Semantic interoperability is crucial in healthcare, and KGs provide the framework to achieve it. Mayo Clinic, for instance, is actively exploring knowledge graphs to unify genomics, proteomics, clinical data and more, aiming to enable personalized medicine; they emphasize that understanding connections between diseases, genes, drugs, and phenotypes could open doors for research on disease mechanisms, drug repurposing, and combination therapies.
+
+## Business value
+
+Both the commercial and clinical benefits of knowledge graphs in healthcare are significant. They can improve patient outcomes by ensuring decisions are informed by *all* relevant data (no more data hidden in separate silos). They can drive efficiencies by automating data integration tasks and reducing manual search – a doctor or a researcher can query the knowledge graph and quickly retrieve connected information that would have taken hours of cross-referencing otherwise. They also enable advanced analytics (like finding similar patient profiles for research or trials). Ultimately, this leads to better care, faster research breakthroughs, and potentially cost savings by catching issues (like adverse drug interactions or unnecessary tests) early through the connected knowledge.
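+
+To make the querying side of this concrete, below is a minimal sketch of a drug-interaction check expressed in Cypher-style graph syntax. The labels and relationship types (`Patient`, `Medication`, `TAKES`, `INTERACTS_WITH`) are illustrative assumptions, not a prescribed healthcare schema:
+
+```
+// For a given patient, find any pair of medications they take
+// that are linked by a known adverse interaction.
+MATCH (p:Patient { id: "patient-123" })-[:TAKES]->(m1:Medication),
+      (p)-[:TAKES]->(m2:Medication),
+      (m1)-[:INTERACTS_WITH]-(m2)
+WHERE m1.name < m2.name   // avoid returning each pair twice
+RETURN m1.name AS drugA, m2.name AS drugB;
+```
+
+A single traversal like this stands in for what would otherwise be several joins across prescription and drug-interaction tables.
+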
+The main challenges for healthcare KGs are ensuring data privacy and security (patient data is sensitive) and managing the significant effort required to curate and maintain high-quality medical knowledge graphs. However, as the healthcare sector embraces AI, knowledge graphs are increasingly seen as a key to providing the *context* and *domain knowledge* that purely data-driven algorithms lack, thus moving healthcare into a more intelligent, knowledge-driven future.

From 0f03118fcd68fa3b112d138bdf6b74c14f455367 Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Thu, 20 Feb 2025 15:18:27 +0000
Subject: [PATCH 13/28] Create knowledge-graphs-in-supply-chain-management.mdx

---
 ...edge-graphs-in-supply-chain-management.mdx | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 apps/hashdotai/tutorials/knowledge-graphs-in-supply-chain-management.mdx

diff --git a/apps/hashdotai/tutorials/knowledge-graphs-in-supply-chain-management.mdx b/apps/hashdotai/tutorials/knowledge-graphs-in-supply-chain-management.mdx
new file mode 100644
index 00000000000..99e6b10d2a1
--- /dev/null
+++ b/apps/hashdotai/tutorials/knowledge-graphs-in-supply-chain-management.mdx
@@ -0,0 +1,26 @@
+---
+title: Knowledge Graphs in Supply Chain Management
+description: The value of connected knowledge across SCM and logistics
+slug: knowledge-graphs-in-supply-chain-management
+tags: ["Use Case"]
+---
+
+Modern supply chains are highly complex networks involving suppliers, manufacturers, logistics providers, distributors, and retailers. Managing a supply chain means dealing with vast amounts of data: parts and products, bills of materials, shipment records, inventory levels, contracts, locations, and more. A small disruption at one supplier can cascade through a network and affect the final product delivery. Knowledge graphs have emerged as a game-changing solution for supply chain visibility and intelligence, because they naturally model the network structure of supply relationships and can flexibly accommodate new data sources.
+
+# Applications
+
+A supply chain knowledge graph typically represents entities like **Suppliers**, **Manufacturers**, **Facilities**, **Products/Parts**, **Shipments**, **Orders**, etc., and the relationships between them (e.g., *Supplier A supplies Part X*, *Part X is used in Product Y*, *Order Z consists of Products Y and Q*, *Warehouse W stores Product Y*, and so on).
+
+By consolidating data from many systems (procurement, ERP, logistics tracking, etc.) into a graph, organizations get a **unified view** of the supply network. This helps answer complex questions. For instance: “Which of our products will be affected if Supplier A in region B cannot deliver Part X next month?” – the knowledge graph would let you traverse from that supplier node through the parts, to all products and orders that depend on it, something very cumbersome to do with siloed spreadsheets or relational databases.
+
+By streaming data in real-time from different software packages, data platforms, and ERP systems, HASH can help process large volumes of supplier data, transforming disconnected information into a knowledge graph that helps connect the dots and bridge gaps between users’ business vocabularies and their data.
These KGs are interlinked sets of facts about suppliers, materials, and other entities in a format that is **both human- and machine-understandable**, meaning both non-technical supply chain managers and data scientists can navigate and query the same knowledge network to get insights. This allows users to visualize supplier interdependencies and run sophisticated queries (like multi-tier supplier risk analysis) quickly, speeding up supplier discovery processes and saving time. In an environment where delays or bottlenecks can cost millions, shrinking processes from weeks or months to days or hours provides a huge competitive advantage.
+
+Knowledge graphs around supply chains can support **real-time decision-making** as well. Because graph databases are able to handle complex queries with many hops efficiently, supply chain managers (and AI agents) can ask questions on the fly as situations evolve. For example: finding alternate sources for a component when a disruption occurs, identifying single points of failure in the network (nodes with no redundancy), or analyzing the impact of a geo-political event on suppliers in a certain region. Traditional relational databases struggle with these kinds of queries across many tables and relationships (in technical terms, because `JOIN`s across multiple tables become a performance bottleneck), whereas a graph structure allows relationships to be traversed directly. The flexibility of the graph model also means that as the supply chain changes (new suppliers, new product lines, etc.), the data model can adapt without redesigning the entire schema – you simply add new nodes/edges or new entity/property/data types as needed.
+
+# Benefits
+
+For supply chain and logistics, the value of knowledge graphs lies in **visibility and resilience**. They provide a digital twin of the supply network, where everything is connected and traceable. This helps companies proactively manage risks (spotting potential issues up the chain), optimize inventory (by understanding the network flow better), and respond faster to changes (since querying the graph can reveal quick solutions, like alternate routes or suppliers). The improved efficiency (time saved in analysis) directly translates to cost savings and better service levels (meeting production deadlines, avoiding stockouts). Furthermore, a supply chain KG can enhance collaboration: different teams (procurement, manufacturing, sales) can all reference the same knowledge graph to ensure they have consistent information about the state of the supply chain.
+
+One challenge in this domain is data integration – supply chain data comes from many formats and systems, so building the KG requires careful ETL (extract/transform/load) and entity resolution (e.g., recognizing that “ACME Inc.” in one system is the same as “Acme Corporation” in another). [HASH](https://hash.ai/) handles this through its **multi-tenant type system**, high-quality catalog of **pre-built integrations** for many popular services, and a dedicated implementation team who seek to understand organizations’ business language (their “ontologies”), data, and needs, helping them accurately map and sync information with their graph.
+
+Once up and running, the knowledge graph ([HASH web](/guide/webs)) becomes a living asset that continuously ingests updates (shipment status changes, new suppliers, etc.), serving as a **source of truth** for how everything in the supply chain is connected.
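+
+As a rough sketch of the multi-hop dependency questions such a graph answers quickly – written in Cypher-style syntax, with labels like `Supplier`, `Part`, and `Product` assumed purely for illustration rather than drawn from any particular schema:
+
+```
+// Find every product that depends, directly or through up to four
+// intermediate assemblies, on parts from a given supplier.
+MATCH (s:Supplier { name: "Supplier A" })-[:SUPPLIES]->(part:Part)
+MATCH (part)-[:USED_IN*1..5]->(product:Product)
+RETURN DISTINCT product.name AS affectedProduct;
+```
+
+The variable-length `USED_IN*1..5` traversal is exactly the kind of query that multi-table `JOIN`s struggle to express, let alone execute efficiently.
+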
+The agility gained helps businesses avoid or mitigate disruptions that would have been much harder to detect without a connected knowledge perspective: making the difference between “delivery, delay, or worse” in times of crisis.

From ce177a9391583ad3b068b80f3900ceb96c188c1d Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Thu, 20 Feb 2025 15:22:02 +0000
Subject: [PATCH 14/28] Create knowledge-graphs-in-finance.mdx

---
 .../tutorials/knowledge-graphs-in-finance.mdx | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 apps/hashdotai/tutorials/knowledge-graphs-in-finance.mdx

diff --git a/apps/hashdotai/tutorials/knowledge-graphs-in-finance.mdx b/apps/hashdotai/tutorials/knowledge-graphs-in-finance.mdx
new file mode 100644
index 00000000000..a324e6e6a3a
--- /dev/null
+++ b/apps/hashdotai/tutorials/knowledge-graphs-in-finance.mdx
@@ -0,0 +1,35 @@
+---
+title: Knowledge Graphs in Finance
+description: The value of connected knowledge across finance
+slug: knowledge-graphs-in-finance
+tags: ["Use Case"]
+---
+
+# Knowledge Graphs in Finance
+
+The finance industry was one of the early adopters of graph techniques for tasks like fraud detection and risk management. Financial data naturally forms networks: think of banks with customers, accounts, transactions, devices, merchants, etc., all interlinked. Fraudulent activities often hide in those connections (for example, a ring of bank accounts funneling money between them, or a set of credit card transactions linked by a common device or location that indicates identity theft). Knowledge graphs offer a way to model and analyze these complex relationships, going beyond what traditional transaction monitoring systems can do.
+
+# Applications
+
+## Fraud detection and compliance
+
+Banks and financial institutions use knowledge graphs to monitor the flow of money across their customer base and to detect anomalies that might indicate fraud or money laundering. By linking entities like individuals, companies, accounts, transfers, and external data (e.g. watchlists of bad actors), a graph can reveal suspicious patterns – such as a set of accounts connected by shared phone numbers or addresses forming an unexpected cluster, or circular money movements that resemble laundering schemes. A relational database might flag individual suspicious transactions, but a knowledge graph can reveal the **network pattern** of transactions. This has made KGs a valuable tool in **AML (Anti-Money Laundering)** and **KYC (Know Your Customer)** initiatives. For example, a bank could use a KG to ensure it has a full picture of a customer’s relationships: if several customers share a business entity or if funds are moving between them in complex ways, the graph brings that to light, helping compliance officers investigate potential shell companies or fraud rings.
+
+One notable advantage is speed and efficiency in identifying complex fraud patterns that would otherwise remain hidden. Graph analysis techniques like link analysis and community detection can uncover structures (like rings, chains, hubs) in the transaction network that signal fraud.
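+
+For instance, a circular flow of funds – a classic laundering motif – can be expressed directly as a path pattern. The sketch below uses Cypher-style syntax and assumes a simple `(:Account)-[:TRANSFERRED_TO]->(:Account)` model, rather than any particular bank's schema:
+
+```
+// Find short cycles of money movement: funds leave account A and
+// return to it within two to four transfers.
+MATCH path = (a:Account)-[:TRANSFERRED_TO*2..4]->(a)
+RETURN path
+LIMIT 25;
+```
+
+A rules engine scanning rows of individual transactions has no natural way to ask this question; in a graph it is a one-line pattern.
+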
As *Communications of the ACM* [reports](https://cacm.acm.org/blogcacm/leveraging-graph-databases-for-fraud-detection-in-financial-systems/#:~:text=Graph%20databases%20reveal%20patterns%20and,fraud%20faster%20and%20more%20efficiently), graph databases “reveal patterns and relationships that would otherwise be hidden, allowing financial institutions to detect fraud faster and more efficiently”. In one scenario, an insurance company might use a knowledge graph to detect fraudulent claims by linking claimants, accidents, vehicles, repair shops, and adjusters – exposing if the same phone number or address is being used across many claims, or if a ring of people are coordinating fake accidents. Traditional fraud rules might not catch that if each claim alone looks ordinary, but the knowledge graph can connect the dots. + +## Risk analysis and knowledge management + +Beyond fraud, financial firms use knowledge graphs for broader knowledge management and decision support. An **enterprise knowledge graph** at a bank might connect internal data (like org structure, projects, documents, client data) to support an internal search or expertise location system. In investment banking or asset management, a KG can integrate market data, company profiles, news sentiment, and relationships (e.g., company A is a supplier for company B, or executive X sits on the board of Y and Z) – which can be invaluable for risk assessment or identifying indirect exposures. For example, if a geopolitical event happens, a knowledge graph might quickly show a bank which of its portfolio companies are related to that region or to affected commodities through a chain of relationships, thus informing strategy. + +## Customer service and personalization + +Banks also have started to use knowledge graphs to get a 360-view of customers (similar to customer 360 in other industries). By linking accounts, transaction history, customer interactions, and possibly social media or demographic data, banks can personalize services or detect if a customer might need a certain product. While this is akin to a recommendation use case, it’s worth noting that the graph can help maintain consistency in how a customer is treated across departments by having all their data linked. + +# Benefits + +For finance, the business case for KGs often comes down to **risk reduction and increased compliance efficiency** (catching bad actors sooner, avoiding regulatory fines) and **improved customer insights** (leading to better service or cross-sell opportunities). + +* **Fraud and AML** investigators can work more effectively using graph visualizations of suspicious networks, rather than poring over tables of transactions. J.P. Morgan, HSBC, and other major banks have publicly discussed their use of knowledge graphs/graph databases for these purposes, noting significant improvements in detecting complex fraud patterns. +* Another benefit is in **regulatory reporting** – banks need to aggregate data for regulations like Basel III or GDPR, and a knowledge graph can help trace the lineage of data and how different entities are related when assembling reports (e.g., finding all data related to a particular customer across systems can be done via the graph). + +The challenges here include the **scale** of the data (millions of nodes/edges representing accounts and transactions require robust graph processing capabilities) and data privacy (financial data is sensitive, so the KG must have proper access controls). 
Also, integrating a graph solution into legacy bank IT systems can be non-trivial. Nonetheless, the ability of KGs to **fuse structured and unstructured data** (like combining transaction logs with text reports or alerts) and to provide explainable link analysis makes them increasingly indispensable in financial cybersecurity and intelligence operations.

From 92bc2af1828bc1e07383dd6acd06a8775aab1180 Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Thu, 20 Feb 2025 15:26:35 +0000
Subject: [PATCH 15/28] Create knowledge-graphs-in-retail-and-ecommerce.mdx

---
 ...owledge-graphs-in-retail-and-ecommerce.mdx | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 apps/hashdotai/tutorials/knowledge-graphs-in-retail-and-ecommerce.mdx

diff --git a/apps/hashdotai/tutorials/knowledge-graphs-in-retail-and-ecommerce.mdx b/apps/hashdotai/tutorials/knowledge-graphs-in-retail-and-ecommerce.mdx
new file mode 100644
index 00000000000..6148d1ed525
--- /dev/null
+++ b/apps/hashdotai/tutorials/knowledge-graphs-in-retail-and-ecommerce.mdx
@@ -0,0 +1,36 @@
+---
+title: Knowledge Graphs in Retail & E-Commerce
+description: The value of connected knowledge when selling
+slug: knowledge-graphs-in-retail-and-ecommerce
+tags: ["Use Case"]
+---
+
+# Knowledge Graphs in Retail & eCommerce
+
+Retailers and e-commerce platforms deal with diverse data about products, customers, and their interactions. Knowledge graphs are helping these companies better organize their knowledge about products and better serve their customers through recommendations and improved search.
+
+# Applications
+
+## Product recommendation and personalization
+
+Companies like Amazon, eBay, and Netflix use knowledge graph concepts to enhance their recommendation engines. Netflix, for instance, famously utilizes knowledge graphs to underpin its ‘spookily accurate’ content recommendation system, linking movies, shows, actors, genres, user profiles, and viewing history. By creating a web of connections – e.g., a user is linked to the shows they watched, those shows are linked to genres and actors, which link to other shows – Netflix can go beyond simple collaborative filtering. The knowledge graph allows Netflix to find, say, that a user likes *“crime drama series with a strong female lead”* even if that’s not an explicit category, because the graph connects those concepts through various nodes (perhaps the user liked **Mindhunter** and **Broadchurch**, which share certain thematic tags or crew, so the graph can suggest another show with overlapping elements). These associations help anticipate what the customer might like to watch next. The result is a more accurate and *explainable* recommendation: the system could even surface *why* it recommended something (e.g., “because you watched X” and there’s a path in the graph from X to the recommended item).
+
+In e-commerce, a knowledge graph of products can capture relationships like *“bought together”*, *“is variant of”*, *“compatible with”*, or *“replacement for”*. For example, an online electronics retailer might have a graph linking a camera to its compatible lenses, accessories, and successor models. If a shopper looks at that camera, the system can easily find all related items and either recommend them or ensure search results include those connections. This improves cross-selling and upselling opportunities.
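+
+A sketch of how such a lookup might be phrased – again in Cypher-style syntax, with the relationship types (`COMPATIBLE_WITH`, `BOUGHT_TOGETHER`) and the example SKU assumed for illustration:
+
+```
+// Given the camera a shopper is viewing, surface accessories recorded
+// as compatible with it, ranked by how often they are co-purchased.
+MATCH (c:Product { sku: "CAM-100" })<-[:COMPATIBLE_WITH]-(accessory:Product)
+OPTIONAL MATCH (c)-[b:BOUGHT_TOGETHER]-(accessory)
+RETURN accessory.name AS suggestion,
+       coalesce(b.count, 0) AS timesCoPurchased
+ORDER BY timesCoPurchased DESC
+LIMIT 10;
+```
+
+Because the relationships are stored explicitly, the same data can serve recommendations, search filtering, and catalog maintenance without separate lookup tables.
+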
eBay has been building a knowledge graph that maps relationships between products and the way users interact with them. This can help with search relevance (understanding that a search for “Apple phone 2019 model” should match an iPhone 11, for instance, which is knowledge encoded in the graph as an alias or attribute of that product). + +## Customer insights and marketing + +Retailers also use knowledge graphs to better understand customer behavior. By linking customer profiles with their purchase history, browsing history, customer service interactions, and even social media feedback, a retailer can query the graph to answer questions like “what other products are my VIP customers likely to buy?” or “which segments of customers are related to this new product line’s target market?”. One benefit is in marketing: a knowledge graph can reveal non-obvious connections that inform more targeted campaigns. For instance, if a fashion retailer’s KG shows that people who buy a certain sneaker brand also tend to buy a specific style of jacket, marketing can use that insight to bundle promotions or tailor advertisements. By understanding customer behavior better through a KG, marketers can create more powerful, targeted campaigns and even identify product design improvements from feedback relationships. + +## Search and discovery + +Many e-commerce sites implement **semantic search** backed by a knowledge graph. Instead of purely keyword-based search, the graph helps interpret the intent. If a user searches for “summer cocktail dress under $100”, a knowledge graph-enhanced search can understand “cocktail dress” is a category and “summer” implies certain styles or materials, and then filter by price, returning a much more precise set of results. The KG might encode that *Cocktail Dress* is a type of *Dress*, which has attributes like seasonality and price. This is similar to how Google’s semantic search works, where the KG helps understand entities and relationships behind the query. On a retailer’s site, semantic search can significantly enhance user experience by allowing more natural queries and by showing rich information (for example, typing a product name might show a quick info panel of the product specs – powered by the KG – before the user even clicks anything). + +## Inventory and supply optimization + +Retail and supply chain overlap; a retailer might also use a knowledge graph for managing store inventories, connecting products to warehouses to shipments to stores. This is more on the operational side, but it’s worth noting as a use case – some large retailers treat their operations as a graph problem to optimize routing and restocking. + +# Benefits + +For retail and e-commerce, knowledge graphs drive **revenue growth** and **customer satisfaction**. Better recommendations and search lead to more sales and happier customers who find what they need. Graph-based personalization often outperforms simpler methods because it can leverage a wider context. Also, a knowledge graph provides a **unified knowledge base** of products and customer relationships that can be reused across applications (recommendation, search, analytics), reducing duplication and ensuring consistency in information (e.g., if a product is discontinued, updating the KG ensures all downstream apps “know” about it). 
Retailers also benefit from the agility – as product lines change or new trends emerge (say a new category like “smart home devices”), adding that concept to the graph and linking products to it can be faster than reworking a rigid database schema. + +A challenge in retail KGs is data volume (for a large catalog and user base, the graph gets huge) and keeping it updated in real-time with new user interactions. But modern graph databases and streaming ingestion techniques have been scaling to meet these needs. Another challenge is making sure the graph is cleaned and normalized – product data especially can be messy (different naming conventions, etc.), so building a high-quality product knowledge graph might involve significant data cleansing and consolidation effort. Many retailers find the investment worth it, as the KG becomes a cornerstone for AI-driven features (recommendations, chatbots, etc.) that are essential in today’s competitive marketplace. From 03e16ea18c66f996ec608662f12a66f545d1c925 Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Thu, 20 Feb 2025 15:27:25 +0000 Subject: [PATCH 16/28] Update knowledge-graphs-in-finance.mdx --- apps/hashdotai/tutorials/knowledge-graphs-in-finance.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/apps/hashdotai/tutorials/knowledge-graphs-in-finance.mdx b/apps/hashdotai/tutorials/knowledge-graphs-in-finance.mdx index a324e6e6a3a..d15f613db83 100644 --- a/apps/hashdotai/tutorials/knowledge-graphs-in-finance.mdx +++ b/apps/hashdotai/tutorials/knowledge-graphs-in-finance.mdx @@ -5,8 +5,6 @@ slug: knowledge-graphs-in-finance tags: ["Use Case"] --- -# Knowledge Graphs in Finance - The finance industry was one of the early adopters of graph techniques for tasks like fraud detection and risk management. Financial data naturally forms networks: think of banks with customers, accounts, transactions, devices, merchants, etc., all interlinked. Fraudulent activities often hide in those connections (for example, a ring of bank accounts funneling money between them, or a set of credit card transactions linked by common device or location that indicates identity theft). Knowledge graphs offer a way to model and analyze these complex relationships, going beyond what traditional transaction monitoring systems can do. # Applications From 24a291a952178fdd4b5242be7849cd709dfa1d44 Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Thu, 20 Feb 2025 15:27:34 +0000 Subject: [PATCH 17/28] Update knowledge-graphs-in-retail-and-ecommerce.mdx --- .../tutorials/knowledge-graphs-in-retail-and-ecommerce.mdx | 2 -- 1 file changed, 2 deletions(-) diff --git a/apps/hashdotai/tutorials/knowledge-graphs-in-retail-and-ecommerce.mdx b/apps/hashdotai/tutorials/knowledge-graphs-in-retail-and-ecommerce.mdx index 6148d1ed525..50da5b71677 100644 --- a/apps/hashdotai/tutorials/knowledge-graphs-in-retail-and-ecommerce.mdx +++ b/apps/hashdotai/tutorials/knowledge-graphs-in-retail-and-ecommerce.mdx @@ -5,8 +5,6 @@ slug: knowledge-graphs-in-retail-and-ecommerce tags: ["Use Case"] --- -# Knowledge Graphs in Retail & eCommerce - Retailers and e-commerce platforms deal with diverse data about products, customers, and their interactions. Knowledge graphs are helping these companies better organize their knowledge about products and better serve their customers through recommendations and improved search. 
# Applications From 5379b9fb78f70f770aec7c5afef5b0725a00c4ef Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Thu, 20 Feb 2025 15:34:56 +0000 Subject: [PATCH 18/28] Create knowledge-graphs-in-enterprise-knowledge-management.mdx --- ...phs-in-enterprise-knowledge-management.mdx | 34 +++++++++++++++++++ 1 file changed, 34 insertions(+) create mode 100644 apps/hashdotai/tutorials/knowledge-graphs-in-enterprise-knowledge-management.mdx diff --git a/apps/hashdotai/tutorials/knowledge-graphs-in-enterprise-knowledge-management.mdx b/apps/hashdotai/tutorials/knowledge-graphs-in-enterprise-knowledge-management.mdx new file mode 100644 index 00000000000..de3961f4cc4 --- /dev/null +++ b/apps/hashdotai/tutorials/knowledge-graphs-in-enterprise-knowledge-management.mdx @@ -0,0 +1,34 @@ +--- +title: Knowledge Graphs in Enterprise Knowledge Management +description: The value of connected knowledge across enterprise knowledge management +slug: knowledge-graphs-in-enterprise-knowledge-management +tags: ["Use Case"] +--- + +Outside of specific verticals, one of the broadest applications of knowledge graphs is in **enterprise knowledge management** – helping organizations make sense of their own data across departments. In many companies, information is scattered in various databases, documents, spreadsheets, and applications (commonly referred to as data silos). Knowledge graphs can integrate these diverse sources into a single connected knowledge hub that reflects the business’s information landscape. + +# Applications + +## 360-degree visibility + +A common example is creating a **360-degree view of key business entities**, such as a _Customer 360_ knowledge graph. This kind of KG would link a customer to all the interactions and data points related to them: contracts, support tickets, sales calls, purchases, website visits, feedback, etc. By doing so, any department (sales, support, marketing) can query the graph and get a holistic picture of the customer, enabling better service and personalized offerings. In practice, this means executives or analysts can ask complex questions and get answers that span multiple business units’ data. Knowledge graphs such as HASH can also be used to provide context to large language models and AI agents, providing access to information from many disparate sources in an efficiently queryable and trustable manner. This supports complex question-asking and answering, for example “Show me how a delay in our R\&D project X could impact our top 5 customers’ deliverables” – a query that touches project data, product data, and customer data all at once, which a well-constructed enterprise KG could answer by following the links. + +## Knowledge discovery + +**Enterprise search** is another killer use case. Traditional enterprise search often just indexes documents by keywords. With a knowledge graph, search can be enhanced to understand context. Let’s say an employee searches for “ABC Project architecture” on the intranet. A KG-backed search could understand that *ABC Project* is an entity (perhaps a project code) which is related to certain products and teams, and *architecture* might refer to design documents. Instead of just keyword matching, the search system can use the KG to find the specific architectural documents linked to that project entity, and also return related items (like the lead architect’s name, or similar projects). In effect, the KG adds a semantic layer that makes search results more accurate and navigable. 
+Companies like Microsoft and Google have incorporated knowledge graph concepts to deliver “intelligent” search and discovery within their products.
+
+## Data integration
+
+One big benefit of enterprise KGs is their ability to **flexibly integrate data and act as a source of truth** (where desired). As noted earlier, knowledge graphs provide a framework that is more adaptable than a classical relational warehouse for combining data. They allow merging data even when schemas differ, by focusing on the relationships. Data integration projects that might take months to carefully design a unified schema can sometimes be accelerated by instead funneling data into a graph and then refining the relationships and node definitions on the fly. This isn’t to say schema and curation aren’t needed (they are, to maintain quality), but the graph approach can tolerate partial knowledge and evolve. Organizations have used KGs to merge everything from technical metadata to business glossaries to service logs – enabling cross-domain queries that were never possible before. The Alan Turing Institute explains that KGs are commonly used as **bridges between humans and systems**, because they can generate human-readable explanations or provide a layer of abstraction over raw data that makes it easier for people to understand. For example, an enterprise KG might map cryptic database field names to real-world concepts, so that a non-technical user could query “total sales in Europe Q1” and the system, via the KG, knows which internal data sources and fields to pull together to answer that.
+
+## AI and analytics enablement
+
+Another aspect is that enterprise KGs set the stage for **advanced analytics and AI**. By having a unified, connected data layer, machine learning models can more easily leverage all relevant features. Some organizations embed their knowledge graph into ML by using graph embeddings or by doing feature extraction from the KG (for instance, adding a feature like “is this customer connected to an escalation?” when predicting churn). Moreover, KGs can improve **explainability** of AI. If an AI system makes a recommendation or decision, having a knowledge graph that traces how data points are connected can help explain the rationale (a critical factor in fields like finance or healthcare for compliance). We’re also seeing the rise of **retrieval augmented generation (RAG)**, where large language models (LLMs) use a knowledge graph as a source of factual grounding: the LLM generates a query to the KG, retrieves factual triples, and then formulates a natural language answer for the user. This approach helps avoid “hallucinations” by AI because the answer is anchored in the verified knowledge from the graph. Businesses implementing chatbots or question-answering systems internally can use this method to ensure the chatbot provides accurate, up-to-date answers from the enterprise’s knowledge graph rather than an unverifiable model memory.
+
+# Benefits
+
+The business value of enterprise knowledge graphs is broadly in **improving decision-making and operational efficiency** across the board. By making data more accessible and understandable, they reduce the time employees spend searching for information and piecing together context. One source notes that a knowledge graph essentially acts as a company’s collective memory, enabling strategic decisions by “connecting all data into an artificial brain” and thus helping improve the decision-making skills of business owners.
Concretely, this might mean faster time to insights (analysts can run complex queries in minutes that used to take days of joining spreadsheets), better collaboration (everyone references the same knowledge base), and the ability to **discover new insights** (seeing relationships that were not obvious before, like how two different product lines might be serving the same customer and should be coordinated).
+
+Operational efficiency can also improve: consider IT operations – some companies build KGs for infrastructure, linking servers, applications, alerts, and incidents. This helps in root cause analysis of outages because all relationships in the IT stack are mapped (this is sometimes called an **IT knowledge graph** or dependency graph). When something fails, the graph can quickly show what other systems depend on it, which team is responsible, etc., shortening downtime.
+
+The challenges here revolve around **data governance** (ensuring the KG stays accurate and updated as the business changes) and scaling to handle enterprise-wide data. There is also the organizational challenge of getting different departments to contribute to and use a shared knowledge graph. Success often requires executive support and a clear value proposition for all stakeholders. HASH solves these problems by providing automated two-way data synchronization, AI-inference of knowledge graph entities, and simple visual interfaces for domain experts and business users. This puts graph maintenance on autopilot and allows direct access to [organizations’ webs](/guide/webs) by non-technical, non-data scientist end-users.

From 15663ffdc325e7695913661054fa2fd97c637240 Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Thu, 20 Feb 2025 23:07:01 +0000
Subject: [PATCH 19/28] Create WIP_event-driven-knowledge-graphs.mdx

---
 .../glossary/WIP_event-driven-knowledge-graphs.mdx | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 apps/hashdotai/glossary/WIP_event-driven-knowledge-graphs.mdx

diff --git a/apps/hashdotai/glossary/WIP_event-driven-knowledge-graphs.mdx b/apps/hashdotai/glossary/WIP_event-driven-knowledge-graphs.mdx
new file mode 100644
index 00000000000..17c00d6a9bb
--- /dev/null
+++ b/apps/hashdotai/glossary/WIP_event-driven-knowledge-graphs.mdx
@@ -0,0 +1,10 @@
+---
+title: Event-Driven Knowledge Graphs
+description: "Event-driven knowledge graphs are commonly used to power decision support and simulation tools."
+slug: event-driven-knowledge-graphs
+tags: ["Graphs"]
+---
+
+**Event-driven knowledge graphs** sit at the intersection of [discrete event models](/glossary/discrete-event-modeling), which help simulate processes and the state of real-world systems, and [knowledge graphs](/glossary/knowledge-graphs), which help represent objects, events, situations, or concepts (often using [graph databases](/glossary/graph-databases)).
+
+By introducing an event-driven approach (a.k.a. “event sourcing via streaming platforms”), platforms like HASH can extract and link data from multiple data silos in near real time. In practice, an event-sourcing pipeline streams key data changes (events) from disparate systems, deduplicates and unifies them, and updates a knowledge graph accordingly – resulting in an up-to-date, event-driven knowledge graph. This differs from traditional knowledge graphs, which are typically entity-centric and updated in batches or via periodic processes.
Traditional KGs capture mostly static facts and relationships, whereas an event-driven KG continually incorporates dynamic, temporal information (events) as first-class data. As such, event-driven graphs are always evolving to reflect the latest state of the business. From b38b18a7f3b12ec6e2768a2e837d645cb38e6be8 Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Fri, 21 Feb 2025 13:40:25 +0000 Subject: [PATCH 20/28] Create WIP_labeled-property-graphs.mdx --- .../glossary/WIP_labeled-property-graphs.mdx | 32 +++++++++++++++++++ 1 file changed, 32 insertions(+) create mode 100644 apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx diff --git a/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx b/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx new file mode 100644 index 00000000000..6d6588f70fc --- /dev/null +++ b/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx @@ -0,0 +1,32 @@ +--- +title: Labeled Property Graphs +description: "Labeled Property Graphs consist of nodes (also called vertices) and relationships (edges), each of which can hold associated data." +slug: labeled-property-graphs +tags: ["Graphs"] +--- +# **Data Modeling Principles** + +## **Structure of Labeled Property Graphs** + +A **labeled property graph (LPG)** such as a [HASH web](/guide/webs) consists of *nodes* (also called vertices) and *relationships* (edges), each of which can hold associated data. Nodes typically represent entities (e.g. a Person or Product) and can be assigned one or more **labels** to categorize their type. Relationships connect nodes and have a **type** (label) that describes the nature of the connection (e.g. `FRIENDS_WITH` or `PURCHASED`). Both nodes and relationships can have any number of **properties**, which are key–value pairs storing additional attributes (for example, a Person node might have properties like `name:"Alice"` and `age:30`). Relationships are usually **directed**, meaning they have a start and end node, though in some use cases direction can be ignored or traversed in both ways as needed. + +This model is best understood with a simple example. Consider two Person nodes and a friendship between them: + +``` +CREATE (p:Person { name: "Alice", age: 30 }); +CREATE (q:Person { name: "Bob", age: 32 }); +CREATE (p)-[:FRIENDS_WITH]->(q); +``` + +Here we created two nodes labeled **Person** with some properties, and a directed relationship of type **FRIENDS\_WITH** from Alice to Bob. In an LPG-based database (such as HASH), this data is stored as a graph structure – you can later query it by traversing from `Alice` to find all `FRIENDS_WITH` connections, for example. The key elements to note are: nodes have labels and properties, relationships have types and can also carry properties (e.g. one could add a property `since:2020` on the `FRIENDS_WITH` relationship to indicate when the friendship started). This enriched graph structure makes the LPG extremely expressive for modeling complex domains. + +## **LPG vs. Other Graph Models (e.g. RDF)** + +Labeled property graphs are one of the two major graph data modeling paradigms in wide use today, the other being the **RDF** (Resource Description Framework) triple model. While both represent data as networks of connected entities, there are fundamental differences in how data is structured and annotated: + +- **Node properties _vs._ Triples**: In an LPG, a node can have attributes stored directly as properties (as in the Alice example above). 
In RDF, by contrast, there is *no concept of an attribute on a node* – every piece of information is expressed as a separate triple (subject–predicate–object). For example, to represent a person’s birthdate in RDF, one would create a triple like `(BarackObama) -[birthDate]-> ("1961")`, essentially treating the date "1961" as an object node or literal connected via a predicate. In an LPG, that same fact could simply be a property `birthDate: 1961` on the Barack Obama node, with no extra edge needed. This means RDF tends to produce many more small connecting elements, whereas LPG can store richer information per node/edge object (more analogous to an object in OOP with fields). +- **Global _vs._ Local identification**: RDF uses globally unique identifiers (URIs) for each entity and relationship type, aiming for web-scale data integration. Every predicate (relationship type) and often nodes are defined by URIs that can link across datasets. LPG systems typically use application-local identifiers (like string names for relationship types and labels) and do not inherently link across databases. This makes property graphs simpler to work with in a closed-world context, whereas RDF is built for interoperability at the cost of some verbosity. [HASH](https://hash.ai/) is a next-generation platform that combines the interoperability and mutual intelligibility of RDF with the expressiveness and customizability of LPG. +- **Atomic unit of data**: The atomic unit in RDF is the triple. Even a single entity with multiple attributes is essentially a collection of triples sharing the same subject. LPGs do not have a single fixed atomic structure; a node with properties is a self-contained data structure, and an edge with its properties is another. This means an LPG can be thought of as a collection of nodes and edges (each a small record with key-values), rather than a collection of triples. +- **Schema and semantics**: RDF is tightly connected to the Semantic Web and has a rich standard stack for defining ontologies and schemas (RDF Schema, OWL) that let you formally specify classes, relationships, and even logical inference rules. An RDF graph can be “self-describing” to a degree, as the meaning of relationships and nodes can be defined through shared vocabularies/ontologies. Property graphs, on the other hand, do not enforce any specific global schema or ontology layer; the **interpretation of the labels and properties is left to the consumer** or defined at the application level. This gives LPGs more flexibility (you can add any property to any node without prior schema setup), but it also means that understanding the data’s meaning relies on external documentation or conventions rather than inherent semantics. As a hybrid of RDF and LPG-based approaches, HASH relies upon a [type system](/guide/types) to describe labeled property graphs and ensure interoperability. + +In summary, LPGs emphasize a pragmatic, object-like approach to graph data modeling: nodes and relationships as entities with properties, suitable for straightforward querying and mutation in graph databases. RDF emphasizes a web-standard, triple-based approach with powerful integration and reasoning capabilities. While RDF is common in open knowledge graphs and linked data scenarios, LPGs are frequently found in graph databases for operational or analytic applications. And now, with HASH, the two models can complement one another. 
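+
+To ground the “node properties vs. triples” contrast above, here is the same fact sketched in both styles. This is an illustrative comparison only: the RDF lines use Turtle-like shorthand, and the LPG statement follows the same pseudo-syntax as the earlier `Person` example.
+
+```
+// RDF: the fact is a standalone triple (subject–predicate–object)
+// <BarackObama> <birthDate> "1961" .
+
+// LPG: the same fact is simply a property on the node
+CREATE (p:Person { name: "Barack Obama", birthDate: 1961 });
+```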
From 1d392e81af689cd23e192ef71a9a3c9a67fc795d Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Wed, 26 Feb 2025 15:54:29 +0000
Subject: [PATCH 21/28] temp

---
 apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx | 1 +
 1 file changed, 1 insertion(+)

diff --git a/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx b/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx
index 6d6588f70fc..ad746ea5087 100644
--- a/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx
+++ b/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx
@@ -4,6 +4,7 @@ description: "Labeled Property Graphs consist of nodes (also called vertices) an
 slug: labeled-property-graphs
 tags: ["Graphs"]
 ---
+
 # **Data Modeling Principles**

 ## **Structure of Labeled Property Graphs**
From c72624db6f218e29e2a73932af131c2b1dec530b Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Tue, 4 Mar 2025 10:58:27 +0000
Subject: [PATCH 22/28] Create `scalars.mdx`

---
 apps/hashdotai/glossary/scalars.mdx | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100644 apps/hashdotai/glossary/scalars.mdx

diff --git a/apps/hashdotai/glossary/scalars.mdx b/apps/hashdotai/glossary/scalars.mdx
new file mode 100644
index 00000000000..f7339b7128f
--- /dev/null
+++ b/apps/hashdotai/glossary/scalars.mdx
@@ -0,0 +1,14 @@
+---
+title: Scalars
+description: "Scalar values are numerical values which indicate the magnitude of something - for example, the mass of an object or the distance between two points."
+slug: scalars
+tags: ["Data Science"]
+---
+
+**Scalars** are [variables](/glossary/variables) which hold an individual [value](/glossary/values).
+
+**Numerical scalar values** are numerical values which indicate the magnitude of something -- for example, the mass of an object (e.g. `6 kilograms`), or the distance between two points (`5 miles`). Numbers which are scalars are usually integers, fixed-point numbers, or floats.
+
+Scalars stand in contrast to [vectors](/glossary/vectors), which are things such as velocity (the _speed_ of something AND the direction it is headed in - e.g. `5 meters per second northeast`), which combine multiple values in an [array](/glossary/arrays).
+
+_Ordinary users of HASH don't need to know what scalars are, and this information is provided as a reference for advanced [type modelers](/guide/types) only._
From 2a60430da141e49dae89bd4d278de74436ddd3ad Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Tue, 4 Mar 2025 12:16:09 +0000
Subject: [PATCH 23/28] Create vectors.mdx

---
 apps/hashdotai/glossary/vectors.mdx | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
 create mode 100644 apps/hashdotai/glossary/vectors.mdx

diff --git a/apps/hashdotai/glossary/vectors.mdx b/apps/hashdotai/glossary/vectors.mdx
new file mode 100644
index 00000000000..6f5225a9596
--- /dev/null
+++ b/apps/hashdotai/glossary/vectors.mdx
@@ -0,0 +1,16 @@
+---
+title: Vectors
+description: "Vector values can't be expressed as a single number, but instead combine multiple pieces of information - e.g. velocity (which refers to the speed something is moving at, as well as the direction it is moving in)."
+slug: vectors
+tags: ["Data Science"]
+---
+
+**Vectors** are values which cannot be expressed as a single number (a [scalar](/glossary/scalars)).
+
+For example, _velocity_ is a vector which measures both the magnitude (speed) at which an object is moving, as well as its direction, and is written in a form similar to `5 meters per second northeast`. _Displacement_ and _force_ are other common vectors in physics which have both magnitudes and directions.
+
+These stand in contrast to measures such as _speed_ by itself, a scalar, which can simply be written as `5 meters per second` (as it contains only the magnitude of a thing, and no directional information).
+
+However, vectors don't have to combine magnitude and direction specifically: they can combine any two or more kinds of information. Common sorts of vectors include tuples (such as a color expressed as an RGB value) and arrays.
+
+_Ordinary users of HASH don't need to know what vectors are, and this information is provided as a reference for advanced [type modelers](/guide/types) only._
From 85964017ff779e883b16a18bd7f9d30d7dee1ef6 Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Tue, 4 Mar 2025 12:18:37 +0000
Subject: [PATCH 24/28] Update scalars.mdx

---
 apps/hashdotai/glossary/scalars.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/apps/hashdotai/glossary/scalars.mdx b/apps/hashdotai/glossary/scalars.mdx
index f7339b7128f..73737079dc5 100644
--- a/apps/hashdotai/glossary/scalars.mdx
+++ b/apps/hashdotai/glossary/scalars.mdx
@@ -5,7 +5,7 @@ slug: scalars
 tags: ["Data Science"]
 ---
 
-**Scalars** are [variables](/glossary/variables) which hold an individual [value](/glossary/values).
+**Scalars** are [variables](/glossary/variables) which hold individual [values](/glossary/values).
From fff10ac492daad42d4864677cfb37f28454000ac7a Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Tue, 4 Mar 2025 12:26:17 +0000
Subject: [PATCH 25/28] Create variables.mdx

---
 apps/hashdotai/glossary/variables.mdx | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 apps/hashdotai/glossary/variables.mdx

diff --git a/apps/hashdotai/glossary/variables.mdx b/apps/hashdotai/glossary/variables.mdx
new file mode 100644
index 00000000000..eaa4c40cd21
--- /dev/null
+++ b/apps/hashdotai/glossary/variables.mdx
@@ -0,0 +1,10 @@
+---
+title: Variables
+description: "Variables are abstract things which can contain information, and which have a symbolic name by which they can be referred to."
+slug: variables
+tags: ["Data Science"]
+---
+
+In both programming and mathematics, **variables** are abstract things which can contain information, and which have a symbolic name by which they can be referred to.
+
+In HASH, any [property](/glossary/properties) or [link](/glossary/links) may be used as a variable in an equation, or by an application querying a [web](/guide/webs) via the HASH API.
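+
+For example (an illustrative sketch only, using hypothetical property names rather than a specific HASH feature), an entity's `Profit` property might be calculated from two of its other properties, each acting as a variable:
+
+```
+Profit = Revenue - Costs
+```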
From 8d6726125b2b9444e24dc76faae0e8e1ac25c383 Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Tue, 4 Mar 2025 13:14:37 +0000 Subject: [PATCH 26/28] Create arrays.mdx --- apps/hashdotai/glossary/arrays.mdx | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 apps/hashdotai/glossary/arrays.mdx diff --git a/apps/hashdotai/glossary/arrays.mdx b/apps/hashdotai/glossary/arrays.mdx new file mode 100644 index 00000000000..92c57ee1497 --- /dev/null +++ b/apps/hashdotai/glossary/arrays.mdx @@ -0,0 +1,12 @@ +--- +title: Arrays +description: "Arrays are collections of variables or values, each of which can be identified by at least one key (e.g. their order in an array)." +slug: arrays +tags: ["Data Science"] +--- + +In programming, **arrays** are collections of [variables](/glossary/variables) or [values](/glossary/values), each of which can be identified within the array by at least one "key" (typically its position in the array). Arrays themselves can in turn also be variables or values. + +Some data types are arrays. For example, `Color` when expressed as an `RGB` value, contains three numbers which refer to the relative amount of red, green, and blue light that make up a color (each on a scale of 0 to 255). For example, `0,0,0` is white, `255,0,0` is red, `0,0,255` is blue, and `255,255,255` is black. + +In HASH, arrays are found in the context of [property types](/guide/types/property-types). Property types describe the acceptable value(s) that a property can have. They are either expressed as [data types](/guide/types/data-types), property objects (other property types, nested within the parent), or arrays (which can contain data-typed values, property objects, or further nested arrays). From be0d284bdd8f80eeb2f2aa5e96318a2a03451de0 Mon Sep 17 00:00:00 2001 From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com> Date: Tue, 4 Mar 2025 14:09:09 +0000 Subject: [PATCH 27/28] Update WIP_labeled-property-graphs.mdx --- .../glossary/WIP_labeled-property-graphs.mdx | 46 +++++++++++++++---- 1 file changed, 36 insertions(+), 10 deletions(-) diff --git a/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx b/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx index ad746ea5087..29ecdd091d8 100644 --- a/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx +++ b/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx @@ -9,25 +9,51 @@ tags: ["Graphs"] ## **Structure of Labeled Property Graphs** -A **labeled property graph (LPG)** such as a [HASH web](/guide/webs) consists of *nodes* (also called vertices) and *relationships* (edges), each of which can hold associated data. Nodes typically represent entities (e.g. a Person or Product) and can be assigned one or more **labels** to categorize their type. Relationships connect nodes and have a **type** (label) that describes the nature of the connection (e.g. `FRIENDS_WITH` or `PURCHASED`). Both nodes and relationships can have any number of **properties**, which are key–value pairs storing additional attributes (for example, a Person node might have properties like `name:"Alice"` and `age:30`). Relationships are usually **directed**, meaning they have a start and end node, though in some use cases direction can be ignored or traversed in both ways as needed. 
+A **labeled property graph (LPG)** such as a [HASH web](/guide/webs) consists of *entities* (also called nodes or vertices) and *links* (relationships or edges), each of which can have a series of attributes associated with it which contain relevant information. Entities typically represent things like a `Person` or `Product`, and can be assigned one or more [entity types](/guide/types/entity-types) to help categorize them. Links connect entities and are also characterized by [link types](/guide/types/link-types) which describe the nature of the connection (e.g. `Friends With` or `Purchased`). Both entities and links can have any number of **attributes**, which are key–value pairs storing additional information (for example, a `Person` entity might have properties like `name:"Alice"` and `age:30`). Links are usually **directed**, meaning they start and end at specific entities, though in some use cases direction can be ignored or traversed in both ways as needed.
+
+This model is best understood with a simple example.
For example, to represent a person’s birthdate in RDF, one would create a triple like `(BarackObama) -[birthDate]-> ("1961")`, essentially treating the date "1961" as an object node or literal connected via a predicate. In an LPG, that same fact could simply be a property `birthDate: 1961` on the Barack Obama node, with no extra edge needed. This means RDF tends to produce many more small connecting elements, whereas LPG can store richer information per node/edge object (more analogous to an object in OOP with fields). -- **Global _vs._ Local identification**: RDF uses globally unique identifiers (URIs) for each entity and relationship type, aiming for web-scale data integration. Every predicate (relationship type) and often nodes are defined by URIs that can link across datasets. LPG systems typically use application-local identifiers (like string names for relationship types and labels) and do not inherently link across databases. This makes property graphs simpler to work with in a closed-world context, whereas RDF is built for interoperability at the cost of some verbosity. [HASH](https://hash.ai/) is a next-generation platform that combines the interoperability and mutual intelligibility of RDF with the expressiveness and customizability of LPG. -- **Atomic unit of data**: The atomic unit in RDF is the triple. Even a single entity with multiple attributes is essentially a collection of triples sharing the same subject. LPGs do not have a single fixed atomic structure; a node with properties is a self-contained data structure, and an edge with its properties is another. This means an LPG can be thought of as a collection of nodes and edges (each a small record with key-values), rather than a collection of triples. -- **Schema and semantics**: RDF is tightly connected to the Semantic Web and has a rich standard stack for defining ontologies and schemas (RDF Schema, OWL) that let you formally specify classes, relationships, and even logical inference rules. An RDF graph can be “self-describing” to a degree, as the meaning of relationships and nodes can be defined through shared vocabularies/ontologies. Property graphs, on the other hand, do not enforce any specific global schema or ontology layer; the **interpretation of the labels and properties is left to the consumer** or defined at the application level. This gives LPGs more flexibility (you can add any property to any node without prior schema setup), but it also means that understanding the data’s meaning relies on external documentation or conventions rather than inherent semantics. As a hybrid of RDF and LPG-based approaches, HASH relies upon a [type system](/guide/types) to describe labeled property graphs and ensure interoperability. - -In summary, LPGs emphasize a pragmatic, object-like approach to graph data modeling: nodes and relationships as entities with properties, suitable for straightforward querying and mutation in graph databases. RDF emphasizes a web-standard, triple-based approach with powerful integration and reasoning capabilities. While RDF is common in open knowledge graphs and linked data scenarios, LPGs are frequently found in graph databases for operational or analytic applications. And now, with HASH, the two models can complement one another. +- **Node properties _vs._ Triples**: In an LPG, an entity can have attributes stored directly as properties (as in the Alice example above). 
In RDF, by contrast, there is *no concept of an attribute on an entity* – every piece of information is expressed as a separate triple (subject–predicate–object). For example, to represent a person’s birthdate in RDF, one would create a triple like `(BarackObama) -[birthDate]-> ("1961")`, essentially treating the date "1961" as an object node or literal connected via a predicate. In an LPG, that same fact could simply be a property `birthDate: 1961` on the Barack Obama entity, with no extra link needed. This means RDF tends to produce many more small connecting elements, whereas an LPG can store richer information per entity/link (more analogous to an object in object-oriented programming with fields).
+- **Global _vs._ Local identification**: RDF uses globally unique identifiers (URIs) for each entity and link type, aiming for web-scale data integration. Every predicate (link type), and often every entity, is defined by a URI that can link across datasets. LPG systems, meanwhile, typically use application-local identifiers (like string names for link types and entity types, which they call "labels") and do not inherently link across databases (see the sketch after this list). This makes property graphs simpler to work with in a closed-world context, whereas RDF is built for interoperability at the cost of some verbosity. [HASH](https://hash.ai/) is a next-generation platform that combines the interoperability and mutual intelligibility of RDF with the expressiveness and customizability of LPG, with all entities, links, entity types, link types and associated resources having fixed URIs which can be relied upon internally (or even published to the world wide web) as desired.
+- **Atomic unit of data**: The atomic unit in RDF is the triple. Even a single entity with multiple attributes is essentially a collection of triples sharing the same subject. LPGs do not have a single fixed atomic structure; an entity with properties is a self-contained data structure, and a link with its properties is another. This means an LPG can be thought of as a collection of entities and links (each a small record with key-values), rather than a collection of triples.
+- **Schema and semantics**: RDF is tightly connected to the Semantic Web and has a rich standard stack for defining ontologies and schemas (RDF Schema, OWL) that let you formally specify classes, relationships, and even logical inference rules. An RDF graph can be “self-describing” to a degree, as the meaning of relationships and nodes can be defined through shared vocabularies/ontologies. Property graphs, on the other hand, do not enforce any specific global schema or ontology layer; the **interpretation of the labels and properties is left to the consumer** or defined at the application level. This gives LPGs more flexibility (you can add any property to any node without prior schema setup), but it also means that understanding the data’s meaning relies on external documentation or conventions rather than inherent semantics. As a hybrid of RDF and LPG-based approaches, HASH relies upon a [type system](/guide/types) to describe labeled property graphs and ensure interoperability. These types can be kept private or publicly shared, and users can fork, extend, re-use and crosswalk between standardized definitions of entities created by anyone else. This makes HASH well-suited to collaboration within and across companies, while HASH's UI abstracts away complexity and ensures type creation and editing remain simple and easy.
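+
+The “global _vs._ local identification” difference above can be sketched as follows. This is an illustrative comparison only: the RDF line uses example URIs in Turtle-like shorthand (`foaf:knows` comes from the public FOAF vocabulary), while the LPG line reuses this article's pseudo-syntax:
+
+```
+// RDF: the relationship type is a globally unique URI, shared across datasets
+// <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
+
+// LPG: the link type is a local name, meaningful only within this database
+CREATE (p)-[:Friends With]->(q);
+```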
+
+In summary:
+
+- **LPGs** emphasize a pragmatic, object-like approach to graph data modeling: entities and links have attributes which make them suitable for straightforward querying and mutation in graph databases.
+- **RDF** emphasizes a web-standard, triple-based approach with powerful integration and reasoning capabilities.
+- While RDF is common in open knowledge graphs and linked data scenarios, LPGs are frequently found in graph databases for operational or analytic applications.
+- **HASH** extends the "LPG" model, replacing simple text "labels" with formally-defined types, while supporting the common RDF paradigm of stable, referenceable URIs. As such, HASH combines the benefits of both LPG and RDF approaches into a single new approach.
+
+## Best Practices for Graph Data Modeling
+
+Designing a graph data model requires careful thought to fully leverage the power of the LPG model while keeping the graph efficient and comprehensible. Here are some core data modeling principles and best practices:
+
+1. **Identify nodes and relationships from entities**: Start by identifying the main kinds of entities you store information about, to map to [entity types](/guide/types/entity-types) (your node labels), and the important relationships between them, to map to [link types](/guide/types/link-types).
+ * If you have an Entity Relationship (ER) diagram or an object model, it can often be translated: entities/node labels map to *entity types*, and relationships/edges/associations map to *link types*. For example, in a retail scenario you might have nodes labeled `Customer`, `Product`, and `Order`, with relationships like `(:Customer)-[:PLACED]->(:Order)` and `(:Order)-[:CONTAINS]->(:Product)`.
+ * In HASH, connecting to existing data sources via [integrations](/integrations) automatically populates your web with properly typed entities, eliminating any need for time-consuming transformation or manual mapping of data to entities.
+2. **Use properties for simple attributes**: For attributes that don’t naturally need to be separate nodes, use properties. In a relational database, you might normalize certain data into separate tables, but in a graph it’s often unnecessary unless you plan to traverse or query that attribute as a relationship. For instance, an `email` or `age` of a person can be a property on the Person node. On the other hand, something like an `Address` might be a node of its own if you want to connect people living at the same address or perform graph queries on the network of locations. A key difference from RDF here is that in an LPG such as HASH you don’t need to create intermediate nodes for every value. Properties on entities and links in HASH help keep the graph compact and performant.
+3. **Use links when appropriate**: Oftentimes data can be modeled either as a property or as a link (a relationship between two entities). A good general rule of thumb: if the data item is primarily an attribute of *one* entity (and not something you'd traverse or connect to from other entities), a property is appropriate. If the data item represents a connection or entity in its own right that *other things* may relate to, create a separate entity and link to it. For example, if modeling a person’s employer: if you only care to store the employer’s name, you could use an `Employer Name` property type.
But if you want to connect the `Person` to another entity, `Company` (which in turn might have its own properties or connect to other companies), create a `Works For` link type and model `(:Person)-[:WORKS_FOR]->(:Company)` instead. As an LPG, HASH supports both approaches, and you should pick whichever makes querying for data most natural and avoids duplicating data across your [web](/guide/webs).
+4. **Avoid superfluous nodes or relationships**: Every entity (node) and link (relationship) in your web should represent something meaningful. If you find entities of a given type that have only one link and no properties, ask if they’re actually necessary, or if they could just be properties on another entity or link. Unnecessary indirection can slow down queries, and make information harder to understand at a glance. Similarly, avoid introducing link types that duplicate what could be captured via properties or existing links. In general, you want to ensure information is only represented once in your graph (eliminating any need to keep separate copies in sync), and that your ontology is as simple as possible (making it easier to understand and to check for consistency), while still representing all the distinctions you care about.
+5. **Leverage entity types properly**: In many LPGs, entity types (labels on nodes) and link types (relationship types) can be indexed or used to efficiently select subgraphs. In HASH, all entities are typed, and this is handled automatically.
+ * Whether you’re using HASH or another LPG, make sure that entities are assigned the correct entity types – e.g. that an individual entity is assigned both the `Employee` and `Customer` entity types (labels) if they happen to fall into both categories.
+ * When creating entity types, avoid becoming too fine-grained. For example, having an entity type per country of citizenship (both `US Person` and `UK Person`) may be overkill, if a `Country` property on a `Person` entity would suffice. However, if you want to associate unique attributes (properties or links) with people in your graph, depending on their place of residence (e.g. `SSN` in the US, and `National Insurance Number` in the UK), such granularity could be appropriate.
+6. **Minimize redundant data**: One sign of a suboptimal graph model is a large number of duplicate entities scattered across a web. Instead of duplicating entities to indicate multiple roles, add multiple entity types (labels) to a single entity, storing its values in one place and linking to it from elsewhere if required, rather than duplicating information in different places (which would then need to be kept in sync, lest the copies drift apart and cause confusion). Graphs by nature can represent many-to-many connections without duplication. If you notice identical subgraphs repeated, you may need to refactor your model. In graph design, it's generally better to increase the number of relationships rather than duplicate nodes. This means linking data points with new relationships so they can be shared or traversed, rather than copying data into separate parts of the graph.
+7. **Watch for modeling anti-patterns**: Three common issues can signal a need to adjust your data model:
+ 1. *Sparse or tree-like graph structure*: If your web has very few links (like a shallow tree), you aren’t leveraging graph traversal much.
Webs, graph databases, and linked property graphs show their strength when data is highly connected; a purely hierarchical or isolated data model might perform just as well in a relational system.
+ 2. *Data duplication*: As mentioned above, repeated entities/links usually indicate the data model could be more normalized within the graph.
+ 3. *Overly complex queries*: If you find yourself writing very convoluted graph queries to get simple answers, the model might be forcing workarounds. The ideal is that queries align with how you naturally think of the problem. Complex, multi-step queries might mean some important link is missing in the data model, or that data is embedded in properties when it should be connected via edges. Revisit the model to see if a different arrangement of entities/links would answer that query more directly (for example, adding a shortcut link for a frequently needed connection).
+
+Following these principles helps maintain a graph model that is both expressive and performant. A well-designed LPG will make it easier to formulate queries, ensure the database can traverse efficiently, and reduce the chances of anomalies (like contradictory data) by storing each fact in an appropriate place.
From ec274ff2bbe62f6d83193863d29a2913c583a2f7 Mon Sep 17 00:00:00 2001
From: Dei Vilkinsons <6226576+vilkinsons@users.noreply.github.com>
Date: Tue, 4 Mar 2025 14:22:10 +0000
Subject: [PATCH 28/28] Update WIP_labeled-property-graphs.mdx

---
 .../glossary/WIP_labeled-property-graphs.mdx | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx b/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx
index 29ecdd091d8..5aa42895d47 100644
--- a/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx
+++ b/apps/hashdotai/glossary/WIP_labeled-property-graphs.mdx
@@ -9,9 +9,15 @@ tags: ["Graphs"]

 ## **Structure of Labeled Property Graphs**

-A **labeled property graph (LPG)** such as a [HASH web](/guide/webs) consists of *entities* (also called nodes or vertices) and *links* (relationships or edges), each of which can have a series of attributes associated with it which contain relevant information. Entities typically represent things like a `Person` or `Product`, and can be assigned one or more [entity types](/guide/types/entity-types) to help categorize them. Links connect entities and are also characterized by [link types](/guide/types/link-types) which describe the nature of the connection (e.g. `Friends With` or `Purchased`). Both entities and links can have any number of **attributes**, which are key–value pairs storing additional information (for example, a `Person` entity might have properties like `name:"Alice"` and `age:30`). Links are usually **directed**, meaning they start and end at specific entities, though in some use cases direction can be ignored or traversed in both ways as needed.
+A **labeled property graph (LPG)** consists of *entities* (also called nodes or vertices) and *links* (relationships or edges), each of which can have a series of attributes associated with it which contain relevant information. In an LPG, entities have simple textual labels like `Person` or `Product`, and so do links – for example, `Friends With` or `Purchased`. These labels help categorize entities and links.
+
+In more advanced "LPG inspired" systems like HASH, instead of using simple textual labels, entities and links are assigned one or more [entity types](/guide/types/entity-types) or [link types](/guide/types/link-types) (as appropriate).
+
+In LPGs, both entities and links can have any number of _properties_, which are key–value pairs storing additional information (for example, a `Person` entity might have properties like `name:"Alice"` and `age:30`). In HASH, both entities and links can contain any number of properties, or other _links_ (allowing links to point to other links, as required).
+
+In LPGs and HASH, links are usually **directed**, meaning they start and end at specific entities, though in some use cases direction can be ignored or traversed in both ways as needed.
+
+The LPG model is best understood with a simple example.
Consider two `Person` entities and a friendship between them:
+
+```
+CREATE (p:Person { name: "Alice", age: 30 });
+CREATE (q:Person { name: "Bob", age: 32 });
+CREATE (p)-[:Friends With]->(q);
+```
+
+Here we created two entities with the **Person** label (or entity type, if using HASH), which allows certain properties to be associated with them. We've also created a link with the label (or link type) **Friends With** that points from Alice to Bob. In LPGs and HASH, this data is stored as a graph structure – you can later query it by traversing from `Alice` to find all `Friends With` connections, for example. This works very similarly in both LPGs and HASH, except that in LPGs we are querying by "label" and in HASH by "type". Both the entities and the links between them have labels/types. In HASH, these types indicate what information can be associated with an entity (the expected attributes: properties and links). For example, one could add a property `Since` with a corresponding value of `2020` on the `Friends With` link to indicate when a friendship started. This enriched graph structure makes LPGs and typed alternatives like HASH extremely expressive for modeling complex domains.
+
+## **LPG vs. Other Graph Models (e.g. RDF)**
+
+Labeled property graphs are one of the two major graph data modeling paradigms in wide use today, the other being the **RDF** (Resource Description Framework) triple model. While both represent data as networks of connected entities, there are fundamental differences in how data is structured and annotated:
+
+- **Node properties _vs._ Triples**: In LPGs/HASH, an entity can have attributes stored directly as properties (as in the Alice example above). In RDF, by contrast, there is *no concept of an attribute on an entity* – every piece of information is expressed as a separate triple (subject–predicate–object).
For example, to represent a person’s birthdate in RDF, one would create a triple like `(BarackObama) -[birthDate]-> ("1961")`, essentially treating the date "1961" as an object node or literal connected via a predicate. In an LPG, that same fact could simply be a property `birthDate: 1961` on the Barack Obama entity, with no extra link needed. This means RDF tends to produce many more small connecting elements, whereas LPGs can store richer information per entity/link (more analogous to an object in object-oriented programming with fields).
+- **Global _vs._ Local identification**: RDF uses globally unique identifiers (URIs) for each entity and link type, aiming for web-scale data integration. Every predicate (link type), and often every entity, is defined by a URI that can link across datasets. LPG systems, meanwhile, typically use application-local identifiers (like string names for link types and entity types, which they call "labels") and do not inherently link across databases. This makes property graphs simpler to work with in a closed-world context, whereas RDF is built for interoperability at the cost of some verbosity. [HASH](https://hash.ai/) is a next-generation platform that combines the interoperability and mutual intelligibility of RDF with the expressiveness and customizability of LPG, with all entities, links, entity types, link types and associated resources having fixed URIs which can be relied upon internally (or even published to the world wide web) as desired.
+- **Atomic unit of data**: The atomic unit in RDF is the triple. Even a single entity with multiple attributes is essentially a collection of triples sharing the same subject. LPGs do not have a single fixed atomic structure; an entity with properties is a self-contained data structure, and a link with its properties is another. This means an LPG can be thought of as a collection of entities and links (each a small record with key-values), rather than a collection of triples.
+- **Schema and semantics**: RDF is tightly connected to the Semantic Web and has a rich standard stack for defining ontologies and schemas (RDF Schema, OWL) that let you formally specify classes, relationships, and even logical inference rules. An RDF graph can be “self-describing” to a degree, as the meaning of relationships and nodes can be defined through shared vocabularies/ontologies. Property graphs, on the other hand, do not enforce any specific global schema or ontology layer; the **interpretation of the labels and properties is left to the consumer** or defined at the application level.
This gives LPGs more flexibility (you can add any property to any node without prior schema setup), but it also means that understanding the data’s meaning relies on external documentation or conventions rather than inherent semantics. As a hybrid of RDF and LPG-based approaches, HASH relies upon a [type system](/guide/types) to describe labeled property graphs and ensure interoperability. These types can be kept private or publicly shared, and users can fork, extend, re-use and crosswalk between standardized definitions of entities created by anyone else. This makes HASH well-suited to collaboration within and across companies, while HASH's UI abstracts away complexity and ensures type creation and editing remain simple and easy.