From 99c3e1857ec482ad21b595ba2e18b6ed5700ccdf Mon Sep 17 00:00:00 2001
From: Andy Jackson
Date: Thu, 9 Nov 2023 13:38:07 +0000
Subject: [PATCH] Note on scopes etc.

---
 ingest/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ingest/README.md b/ingest/README.md
index 3e8461c..079a409 100644
--- a/ingest/README.md
+++ b/ingest/README.md
@@ -95,7 +95,7 @@ Similarly to Kafka, use the supplied scripts (or varient of them) to launch the
 
 A few differnet things need to be set up when running a crawler:
 
-- Check scope surts and exclusions. These are on shared files with the host, and may need updating based on data from W3ACT/curators.
+- Check scope surts and exclusions. These are on shared files with the host, and may need updating based on data from W3ACT/curators. FC manages scope and seeds via Kafka, but exclusions are manual. DC needs explicit scope and exclusion configuration.
 - Update the Geo-IP DB for DC: https://github.com/ukwa/ukwa-services/issues/123
 
 Note that setting up seeds, scope and exclusions for the domain crawl is particularly involved, and is documented at _TBA IS ON GITLAB_