add langchain document support #56
Comments
Great!
Tiktoken is our current method for calculating tokens since (unfortunately) semantic processing is OpenAI-centric at the moment. I wouldn't worry about lines. They're used internally to assemble nodes, and once a node is created they're no longer needed. Feel free to ask anything else!
@Filimoa enjoy this simple class that is compatible.
Usage:
Feel free to add to the code base.
How do I extract tables and images from a PDF?
Description
Love the project.
We need to add a LangChain Document interface, which I am more than happy to do, but I have a few questions:
What is the embedding field for? Will that be filled eventually with an openai embedding vector?
What are tokens, how are they calculated, and based on which model? Are you using tiktoken?
Within each node you have something called Lines. Is that basically the text, split into detected lines?
Cheers.
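To make the request concrete, here is a minimal sketch of what such an adapter might look like. It is not the project's code: `ParsedNode` and its fields (`text`, `tokens`, `embedding`) are hypothetical names based only on the fields mentioned in the questions above, and `Document` is a local stand-in that mirrors the two fields of LangChain's `Document` (`page_content` and `metadata`) so the example runs without LangChain installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Local stand-in mirroring LangChain's Document (page_content, metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ParsedNode:
    """Hypothetical node shape; these field names are assumptions."""
    text: str
    tokens: int
    embedding: Optional[List[float]] = None


def node_to_document(node: ParsedNode, source: str = "") -> Document:
    """Copy the node text into page_content and the rest into metadata."""
    metadata: Dict[str, Any] = {"tokens": node.tokens}
    if source:
        metadata["source"] = source
    if node.embedding is not None:
        # Only carried over when an embedding has actually been computed.
        metadata["embedding"] = node.embedding
    return Document(page_content=node.text, metadata=metadata)


def nodes_to_documents(nodes: List[ParsedNode], source: str = "") -> List[Document]:
    """Convert a list of nodes, preserving their order."""
    return [node_to_document(node, source) for node in nodes]
```

Lines are deliberately left out of the conversion, since only the assembled node text would be needed downstream. With LangChain installed, the stand-in class could presumably be swapped for the real `Document` import without changing the conversion functions.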