This project provides a user-friendly graphical interface for tokenizing the Markdown files in a selected directory, calculating character counts, and estimating the cost of using specific OpenAI models based on the token count. It is designed with Logseq's pages and journals folders in mind, but it can tokenize any folder of Markdown files.
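The cost estimate itself reduces to simple arithmetic: token count times a per-token price. A minimal sketch, using a placeholder default rate (the actual model and pricing are not specified here and should be checked against OpenAI's current price list):

```python
def estimate_cost(token_count: int, price_per_1k_tokens: float = 0.0001) -> float:
    """Estimate the cost of processing `token_count` tokens.

    The default price is a placeholder, not an authoritative rate;
    check OpenAI's current pricing before relying on it.
    """
    return token_count / 1000 * price_per_1k_tokens


# A 25,000-token set of pages at this placeholder rate:
print(f"${estimate_cost(25_000):.4f}")  # → $0.0025
```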
- Python
- PySide6 for GUI
- tiktoken for tokenization
- Clone the repository:

  ```shell
  git clone https://github.com/yourusername/logseq-tokenizer.git
  ```

- Navigate to the cloned directory:

  ```shell
  cd logseq-tokenizer
  ```

- Install the required Python packages:

  ```shell
  pip install -r requirements.txt
  ```

- Run the application:

  ```shell
  python main.py
  ```
- Click 'Select Folder to Tokenize' to choose the directory containing your Markdown files.
- Enter the desired name for the output CSV file.
- Click 'Start' to begin the tokenization process.
- Open the generated CSV file to review the results.
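The workflow above can be sketched as a small batch job. This is an illustrative outline, not the app's actual code; `count_tokens` here is a naive whitespace split standing in for tiktoken's encoder:

```python
import csv
from pathlib import Path


def count_tokens(text: str) -> int:
    # Placeholder tokenizer: the real app uses tiktoken, e.g.
    # len(tiktoken.get_encoding("cl100k_base").encode(text)).
    return len(text.split())


def tokenize_folder(folder: str, out_csv: str) -> None:
    """Write one CSV row per Markdown file: name, character count, token count."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "characters", "tokens"])
        for md_file in sorted(Path(folder).rglob("*.md")):
            text = md_file.read_text(encoding="utf-8")
            writer.writerow([md_file.name, len(text), count_tokens(text)])
```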
- GUI for easy interaction
- Tokenization of Markdown files
- Calculation of character count
- Estimation of cost for using OpenAI models
- Output results to a CSV file
- Support for additional file formats
- Integration with more OpenAI models/use cases beyond text embeddings
- Enhanced data visualization in the GUI
- Pre-processing content to remove stopwords before encoding
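As an illustration of the last item, stopword filtering could run before encoding to shrink token counts. A minimal sketch with a small hand-rolled stopword set (a real implementation might use NLTK's list and handle punctuation):

```python
# Tiny illustrative stopword set; not exhaustive.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}


def strip_stopwords(text: str) -> str:
    """Drop common stopwords before tokenization.

    Case-insensitive match on whitespace-split words; punctuation
    handling is deliberately omitted to keep the sketch short.
    """
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


print(strip_stopwords("the cost of a token"))  # → "cost token"
```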