v2.2.0 #162

clusterzx · 2025-01-16T09:46:01Z

clusterzx
Jan 16, 2025
Maintainer

What's Changed

Fix chat in firefox. submit form is deprecated by @Nikorag in Fix chat in firefox. submit form is deprecated #151
Addressed: https://github.com/OLLAMA Memory Usage Spike During Manual AI Analysis #157#issuecomment-2593746955
Addressed: https://github.com/Ollama model support #142
Regarding the Ollama Service and Context size, I implemented a dynamic way of using num_ctx parameter. Now there is a "maximum" context size of 100.000 Tokens. Thats huge and most systems won't handle that. The solution comes in handy and checks how much tokens are needed for the prompt (based on Q4_0 Quantization Size), it adds a buffer for the response of 1024 tokens (thats more enough in my eyes for the JSON response) and transfers the new calculated num_ctx to the Ollama API call.
You now have the ability to view the full prompt + response log in /app/logs/prompt.txt

New Contributors

@Nikorag made their first contribution in Fix chat in firefox. submit form is deprecated #151

Full Changelog: v2.1.9...v2.2.0

This discussion was created from the release v2.2.0.

mamema · 2025-01-16T18:40:38Z

mamema
Jan 16, 2025

i have this without using the "use existing correspondent and tag" setting to yes

time=2025-01-16T18:34:47.868Z level=WARN source=runner.go:129 msg="truncating input prompt" limit=14514 prompt=19579 keep=5 new=14514

my (modified) prompt doesn't much differ from the example one. Is the tag prompt limit new?

4 replies

clusterzx Jan 16, 2025
Maintainer Author

Then the document is too big I gues??

mamema Jan 16, 2025

hmm, look at the numbers, isn't this quite small?

clusterzx Jan 16, 2025
Maintainer Author

I don't know what your Ollama instance is doing there. I can only guess 😅

mamema Jan 18, 2025

openai is telling me this (asking her for help with ollama problem in question)

Break the Input into Chunks
If you need to process large documents, split them into smaller chunks and process each chunk separately.

Example Python code for splitting a large document:
`def split_into_chunks(text, max_tokens):
words = text.split()
chunks = []
while words:
chunk = " ".join(words[:max_tokens])
chunks.append(chunk)
words = words[max_tokens:]
return chunks

Example usage

document = "..." # Your long document here
chunks = split_into_chunks(document, max_tokens=500) # Adjust max_tokens as needed

mamema · 2025-01-16T20:58:51Z

mamema
Jan 16, 2025

what should she doing? :-) are there special parameters i should consider?

Nothing special here:
ollama: image: ollama/ollama:latest volumes: - ollama:/root/.ollama ports: - 11434:11434 restart: unless-stopped volumes: ollama: {}

4 replies

clusterzx Jan 16, 2025
Maintainer Author

I really don't know 😅 that's why I said I could only guess.

That's a part that I can not influence. Is this happening with only that document? And how long are your documents?

mamema Jan 18, 2025

thats difficult to say, as this error comes from ollama where i do not see which document currently is in the works

mamema Jan 18, 2025

i've investigated it further all documrnts which are at the end not tagged are larger documents, eg 25 pages, 23 pagres and 198 pages.
I can understand that such large documents are may give problems.
But there is an easy fix
Pseudo Code:
"If document is larger than x pages feed only y pages into AI"

x is the value which needs to be defined as failsafe... duno 10 pages?
y 2-3 pages

..because it is not needed to analyze 25, 23, or even 198 pages. In every document i came across the interesting information is on the first 2-3 pages.
This would solve the "too big to analyze" problem AND nevertheless gives good analyzing results.....

mamema Jan 19, 2025

As i guess, the documents are not fed page wise to the AI instead converted to clear text, i've asked openai how many chars fit on one page with 11 point font size. He says 4000-4500 chars. So this could be a good value to start from.

reneil1337 · 2025-02-04T11:25:27Z

reneil1337
Feb 4, 2025

The issue with this is that it reloads the ollama model into VRAM for every document as the ctx parameter is changing with each api request. I'd love to be able to set a fixed num_ctx in the setup to prevent this from happening. It makes paperless-ai unusable for me as all the other applications that are using my ollama instance are losing the connection to the model.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v2.2.0 #162

{{title}}

Replies: 3 comments 8 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

v2.2.0 #162

clusterzx Jan 16, 2025 Maintainer

What's Changed

New Contributors

Replies: 3 comments · 8 replies

clusterzx Jan 16, 2025 Maintainer Author

clusterzx Jan 16, 2025 Maintainer Author

Example usage

clusterzx Jan 16, 2025 Maintainer Author

clusterzx
Jan 16, 2025
Maintainer

Replies: 3 comments 8 replies

clusterzx Jan 16, 2025
Maintainer Author

clusterzx Jan 16, 2025
Maintainer Author

clusterzx Jan 16, 2025
Maintainer Author