Barbara/publishing pipeline #703

Merged
merged 5 commits into from
Jan 23, 2024
4 changes: 3 additions & 1 deletion .gitignore
@@ -20,4 +20,6 @@ samconfig.toml
!/.github
startup.sh

__pycache__
__pycache__
.venv/
env/
17 changes: 17 additions & 0 deletions README.md
@@ -205,6 +205,23 @@ If you would like to mount your own codebase to the content_harvester container
export MOUNT_CODEBASE=<path to rikolti, for example: /Users/awieliczka/Projects/rikolti>
```

In order to run the indexer code, make sure the following variables are set:

```
export RIKOLTI_ES_ENDPOINT= # ask for endpoint url
export RIKOLTI_HOME=/usr/local/airflow/dags/rikolti
export INDEX_RETENTION=1
```

Also make sure to set your temporary AWS credentials and the region so that the mwaa-local-runner container can authenticate when talking to the OpenSearch API:

```
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_SESSION_TOKEN=
export AWS_REGION=us-west-2
```

Finally, from inside the aws-mwaa-local-runner repo, run `./mwaa-local-env build-image` to build the docker image, and `./mwaa-local-env start` to start the mwaa local environment.

For more information on `mwaa-local-env`, look for instructions in the [ucldc/aws-mwaa-local-runner:README](https://github.com/ucldc/aws-mwaa-local-runner/#readme) to build the docker image, run the container, and do local development.
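Before starting the local environment, a small pre-flight check can confirm that the variables listed above are actually set. This is a sketch, not part of Rikolti: the helper `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical, and the tuple simply mirrors the variables named in this README section.

```python
import os
from typing import Mapping

# Hypothetical list mirroring the variables named in this README section;
# Rikolti itself does not define this tuple.
REQUIRED_VARS = (
    "RIKOLTI_ES_ENDPOINT",
    "RIKOLTI_HOME",
    "INDEX_RETENTION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_REGION",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no argument checks the current process environment; passing a plain dict makes the check easy to exercise without touching real credentials.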
2 changes: 1 addition & 1 deletion dags/index_to_prod_dag.py
@@ -11,7 +11,7 @@
schedule=None,
start_date=datetime(2023, 1, 1),
catchup=False,
params={'collection_id': Param(None, description="Collection ID to index")},
params={'collection_id': Param(None, description="Collection ID to move to prod")},
tags=["rikolti"],
)
def index_to_prod_dag():
1 change: 1 addition & 0 deletions dags/requirements.txt
@@ -1,5 +1,6 @@
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.10.txt"
boto3
opensearch-py
requests
sickle
python-dotenv
10 changes: 7 additions & 3 deletions env.example
@@ -32,7 +32,11 @@ export CONTENT_ROOT=file:///usr/local/airflow/rikolti_content

# indexer
export RIKOLTI_ES_ENDPOINT= # ask for endpoint url
export RIKOLTI_ES_PASS= # ask for password

export RIKOLTI_HOME=/usr/local/airflow/dags/rikolti
export INDEX_RETENTION=1 # number of unaliased indices to retain during cleanup
export INDEX_RETENTION=1 # number of unaliased indices to retain during cleanup

# indexer when run locally via aws-mwaa-local-runner
# export AWS_ACCESS_KEY_ID=
# export AWS_SECRET_ACCESS_KEY=
# export AWS_SESSION_TOKEN=
# export AWS_REGION=us-west-2
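`INDEX_RETENTION` is described only as the number of unaliased indices to retain during cleanup. One plausible reading of that setting is sketched below: given one collection's indices, never touch the aliased ones, keep the newest `retention` unaliased ones, and mark the rest for deletion. This is a hedged illustration, not Rikolti's cleanup code; the function name is hypothetical, and it assumes index names end in a sortable timestamp so that lexical order matches creation order.

```python
def indices_to_delete(
    indices: list[str], aliased: set[str], retention: int
) -> list[str]:
    """Pick unaliased indices to delete, keeping the `retention` newest.

    Assumes (hypothetically) that index names sort chronologically, e.g. a
    date suffix such as "-20240123". Aliased indices are never returned.
    """
    if retention < 0:
        raise ValueError("retention must be non-negative")
    unaliased = sorted(name for name in indices if name not in aliased)
    # Everything older than the `retention` newest unaliased indices goes.
    cutoff = len(unaliased) - retention
    return unaliased[:cutoff] if cutoff > 0 else []
```

Because environment variables are strings, the setting would need converting before use, for example `int(os.environ["INDEX_RETENTION"])`.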
2 changes: 2 additions & 0 deletions record_indexer/README.md
@@ -22,6 +22,8 @@ Add the current stage index for a collection to the `rikolti-prd` alias:
python -m record_indexer.move_index_to_prod <collection_id>
```

Note that the index creation code ensures that only one stage index exists for a collection at a time. This means we can simply supply the collection ID as input, and the process will move the current stage index to production.
barbarahui marked this conversation as resolved.




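Since only one stage index exists for a collection at a time, promoting it amounts to adding that index to the `rikolti-prd` alias and removing the collection's previous production index from it. OpenSearch's `_aliases` endpoint accepts a list of `add` and `remove` actions applied in one atomic request, so the swap fits in a single body. The sketch below only builds that body; it is not the code in `record_indexer.move_index_to_prod`, the function name is hypothetical, and the index names in any example are made up.

```python
def build_promote_body(
    stage_index: str, old_prod_indices: list[str], alias: str = "rikolti-prd"
) -> dict:
    """Build an OpenSearch `_aliases` body that moves a collection into `alias`.

    `old_prod_indices` are this collection's indices currently in the alias.
    They are removed in the same request as the add, so the collection is
    never served twice or dropped from the alias in between.
    """
    actions = [
        {"remove": {"index": name, "alias": alias}}
        for name in old_prod_indices
        if name != stage_index
    ]
    actions.append({"add": {"index": stage_index, "alias": alias}})
    return {"actions": actions}
```

The resulting dict would be sent as the JSON body of a `POST /_aliases` request.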
2 changes: 1 addition & 1 deletion record_indexer/add_page_to_index.py
@@ -43,7 +43,7 @@ def build_bulk_request_body(records: list, index: str):
# https://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/
body = ""
for record in records:
doc_id = record.get("calisphere-id")
doc_id = record.get("id")

action = {"create": {"_index": index, "_id": doc_id}}

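The hunk above shows only part of `build_bulk_request_body`. For context, here is a self-contained sketch of what such a function might look like, assuming each record is a JSON-serializable dict; the real implementation may differ, for example in how it serializes records or handles a missing `id`. The `_bulk` API expects newline-delimited JSON: an action line, then the document source, with a trailing newline.

```python
import json


def build_bulk_request_body(records: list, index: str) -> str:
    """Build a newline-delimited JSON body for the OpenSearch `_bulk` API.

    Each record yields an action line followed by a source line. The
    "create" action fails for a document whose `_id` already exists in
    the index rather than overwriting it.
    """
    body = ""
    for record in records:
        doc_id = record.get("id")
        action = {"create": {"_index": index, "_id": doc_id}}
        body += f"{json.dumps(action)}\n{json.dumps(record)}\n"
    # Every line, including the last, ends in "\n", as the bulk API requires.
    return body
```

Using `create` rather than `index` means a re-run against a populated index reports per-document conflicts instead of silently replacing documents.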
1 change: 1 addition & 0 deletions record_indexer/requirements.txt
@@ -1,4 +1,5 @@
boto3
opensearch-py
python-dotenv
requests
requests-aws4auth
8 changes: 7 additions & 1 deletion record_indexer/settings.py
@@ -1,11 +1,17 @@
import os

from boto3 import Session
from dotenv import load_dotenv
from opensearchpy import AWSV4SignerAuth

load_dotenv()

def get_auth():
credentials = Session().get_credentials()
return AWSV4SignerAuth(credentials, os.environ.get("AWS_REGION"))

ENDPOINT = os.environ.get("RIKOLTI_ES_ENDPOINT")
AUTH = ("rikolti", os.environ.get("RIKOLTI_ES_PASS"))
AUTH = get_auth()

RIKOLTI_HOME = os.environ.get("RIKOLTI_HOME", "/usr/local/airflow/dags/rikolti")
RECORD_INDEX_CONFIG = os.sep.join(
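`get_auth()` replaces the basic-auth password with AWS Signature Version 4 signing: `AWSV4SignerAuth` signs each request using the credentials boto3 resolves (including the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` environment variables) and the region from `AWS_REGION`. To show why the region matters, here is a stdlib-only sketch of the SigV4 signing-key derivation described in the AWS documentation. It illustrates the scheme and is not opensearch-py's code; the function names are hypothetical, and the default service name `"es"` reflects the signer's usual default for OpenSearch Service domains.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 key derivation."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(
    secret_key: str, date_stamp: str, region: str, service: str = "es"
) -> bytes:
    """Derive a SigV4 signing key scoped to a date, region, and service.

    `date_stamp` is the request date as YYYYMMDD. Each HMAC step narrows the
    scope, so signatures made with a key for one region or service will not
    validate for another.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

If `AWS_REGION` is unset, `os.environ.get("AWS_REGION")` returns `None`, leaving the signer with no region to scope the key to; exporting `AWS_REGION=us-west-2`, as in `env.example`, avoids that.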
1 change: 1 addition & 0 deletions requirements_dev.txt
@@ -1,5 +1,6 @@
-r ./metadata_mapper/requirements.txt
-r ./metadata_fetcher/requirements.txt
-r ./record_indexer/requirements.txt
ipython
ruff
isort