Docsite search: Set update mechanism #173

hanna-paasivirta · 2025-02-17T13:49:32Z

Once a basic docsite RAG service is done, design a mechanism to update the embeddings at appropriate intervals. Ideally, these would update only when the docs are updated, and only the parts that are updated would be regenerated, but this may be impractical.

josephjclark · 2025-02-17T16:20:30Z

Yes - I think the solution here is that we do something like POST services/embeddings/docs, with credential/database data in the payload, which triggers the server to run the docs loader and update the database.

We can then trigger this from a github action whenever docs are merged to main.
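A minimal sketch of the script a GitHub Action could run to make that call. The path comes from the comment above; the base URL, the payload keys, and the function names are placeholders, since the payload shape has not been settled here.

```python
import json
import urllib.request


def build_update_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build the POST request that asks the server to rebuild the docs embeddings."""
    # Path taken from the proposal above; the payload shape is an assumption.
    url = base_url.rstrip("/") + "/services/embeddings/docs"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_update(base_url: str, payload: dict, timeout: float = 600.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_update_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In a workflow that runs on pushes to main, the credentials would come from repository secrets and be passed in as the payload rather than hard-coded.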

I still want to think about what happens while the update is running. Do we drop the prod vector DB to update it? How long does the update take? How many RAG requests will fail because the database is updating?
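One possible answer, not something decided in this thread: build the new embeddings into a separate index and only switch reads over once the build has finished, so RAG requests keep hitting the old index during the update. The sketch below uses a toy in-memory `VectorStore` class and a `rebuild_and_swap` helper, both hypothetical, purely to show the ordering of steps.

```python
from typing import Callable, Dict, List, Optional


class VectorStore:
    """Toy stand-in for a vector database with named indexes and one live pointer."""

    def __init__(self) -> None:
        self.indexes: Dict[str, List[str]] = {}
        self.live: Optional[str] = None

    def query(self) -> List[str]:
        # Reads always go through the live pointer.
        return list(self.indexes[self.live]) if self.live else []


def rebuild_and_swap(
    store: VectorStore, new_name: str, load_docs: Callable[[], List[str]]
) -> None:
    """Build a fresh index next to the live one, then switch reads over to it."""
    rows = load_docs()  # if this raises, the live index is left untouched
    store.indexes[new_name] = rows
    previous = store.live
    store.live = new_name  # the swap: from here on, queries see the new index
    if previous is not None and previous != new_name:
        del store.indexes[previous]  # drop the old index only after the swap
```

With this ordering, a failed or slow docs load never takes search offline; the cost is holding two copies of the index for the duration of the build.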

hanna-paasivirta · 2025-03-06T16:29:37Z

More ideas from Joe here: #176 (comment)

I'm not sure how to incrementally update parts of an online table, though, because it's chunked roughly by character length, and different types of changes might get complicated. It's a small dataset, so regenerating everything should be OK.
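If incremental updates are revisited later, one way around the chunk-boundary problem is to track changes per document rather than per chunk: an edit shifts every later chunk in the same file, but it leaves other files alone. The sketch below is illustrative only. The function names, the fixed-size chunker, and the idea of storing one hash per file path are assumptions, not how the current loader works.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's full text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunk_by_length(text: str, size: int) -> List[str]:
    """Naive fixed-size chunking by character count."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def plan_update(
    docs: Dict[str, str], stored_hashes: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (paths to re-embed, paths whose chunks should be deleted)."""
    # A document is re-embedded if it is new or its text changed.
    changed = sorted(
        path for path, text in docs.items()
        if stored_hashes.get(path) != content_hash(text)
    )
    # A document that no longer exists has its chunks removed.
    removed = sorted(path for path in stored_hashes if path not in docs)
    return changed, removed
```

For each changed path, all of its existing chunks would be deleted and regenerated with `chunk_by_length`, so chunk offsets never need to be reconciled.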

hanna-paasivirta self-assigned this Feb 17, 2025

hanna-paasivirta mentioned this issue Mar 6, 2025

Docsite rag #176

Open

7 tasks
