- Static library linking on mobile (fixes iOS signing) (PR: #289)
- Fix support for extras (flash attention, IQ quants) (PR: #292)
- iOS deployment (PR: #267)
- Improve building process (PR: #282)
- Add structured output / function calling sample (PR: #281)
- Update LlamaLib to v1.2.0 (llama.cpp b4218) (PR: #283)
- Clear temp build directory before building (PR: #278)
- Remove support for extras (flash attention, IQ quants) (PR: #284)
- Remove support for LLM base prompt (PR: #285)
- Implement Retrieval Augmented Generation (RAG) in LLMUnity (PR: #246)
- Fix build conflict and endless import of resources (PR: #266)
- Add Phi-3.5 and Llama 3.2 models (PR: #255)
- Speed up LLMCharacter warmup (PR: #257)
- Fix handling of incomplete requests (PR: #251)
- Fix Unity locking of DLLs during cross-platform build (PR: #252)
- Allow spaces in LoRA paths (PR: #254)
- Set the default context size to 8192 and allow adjusting it with a UI slider (PR: #258)
- LlamaLib v1.1.12: SSL certificate & API key for the server, support for more AMD GPUs (PR: #241)
- Server security with API key and SSL (PR: #238); see the remote-client sketch after this list
- Show server command for easier deployment (PR: #239)
- Fix crash with multiple LLMs on Windows (PR: #242)
- Exclude the system prompt when saving chat history (PR: #240)
- Allow setting the LLMCharacter slot (PR: #231)
- Fix adding grammar from StreamingAssets (PR: #229)
- Fix library setup restart when interrupted (PR: #232)
- Remove unnecessary Android linking in IL2CPP builds (PR: #233)
- Fix model name showing the full path when loading a model (PR: #224)
- Fix parallel prompts (PR: #226)
- Implement embedding and LoRA adapter functionality (PR: #210)
- Read context length and warn if it is very large (PR: #211)
- Add setup to use extra features: flash attention and IQ quants (PR: #216)
- Allow HTTP request retries for remote server (PR: #217)
- Allow setting LoRA weights at startup, add a unit test (PR: #219)
- Allow relative StreamingAssets paths for models (PR: #221)
- Fix setting the template for remote setup (PR: #208)
- Fix crash when stopping the scene before LLM creation (PR: #214)
- Documentation: point to the GGUF format for LoRA (PR: #215)
- Resolve build directory creation
- Android deployment (PR: #194)
- Allow downloading models on startup with resumable downloads (PR: #196)
- LLM model manager (PR: #196)
- Add Llama 3 8B and Qwen2 0.5B models (PR: #198)
- Always start the LLM asynchronously (PR: #199)
- Add contributing guidelines (PR: #201)
- Add LLM selector in Inspector mode (PR: #182)
- Allow saving chat history at a custom path (PR: #179)
- Use asynchronous startup by default (PR: #186)
- Assign the LLM, if not set, according to the scene and hierarchy (PR: #187)
- Allow setting the log level (PR: #189)
- Allow adding callback functions for error messages (PR: #190); see the chat-callback sketch after this list
- Allow setting an LLM base prompt for all LLMCharacter objects (PR: #192)
- Set higher priority for the macOS build with Accelerate than for the build without it (PR: #180)
- Fix duplicate BOS warning
- Fix bugs in chat completion (PR: #176)
- Call DontDestroyOnLoad on root to remove warning (PR: #174)
- Implement backend with DLLs (PR: #163)
- Separate LLM from LLMClient functionality (PR: #163)
- Add sample with RAG and LLM integration (PR: #170)
- Disable GPU compilation when running on CPU (PR: #159)
- Switch to llamafile v0.8.6 (PR: #155)
- Add phi-3 support (PR: #156)
- Add Llama 3 and Vicuna chat templates (PR: #145)
- Use the context size of the model by default for longer history (PR: #147)
- Add documentation (PR: #135)
- Add server security against interception from external llamafile servers (PR: #132)
- Adapt server security for macOS (PR: #137)
- Add sample to demonstrate the async functionality (PR: #136)
- Add to chat history only if the response is not null (PR: #123)
- Allow SetTemplate function in Runtime (PR: #129)
- Use llamafile v0.6.2 (PR: #111)
- Pure text completion functionality (PR: #115)
- Allow change of roles after starting the interaction (PR: #120)
- Use Debug.LogError instead of Exception for more verbosity (PR: #113)
- Trim chat responses (PR: #118)
- Fall back to CPU on macOS with an unsupported GPU (PR: #119)
- Remove duplicate EditorGUI.EndChangeCheck() (PR: #110)
- Provide access to LLMUnity version (PR: #117)
- Rename to "LLM for Unity" (PR: #121)
- Fix async server 2 (PR: #108)
- Use namespaces in all classes (PR: #104)
- Await separately in StartServer (PR: #107)
- Kill server after Unity crash (PR: #101)
- Persist chat template on remote servers (PR: #103)
- LLM server unit tests (PR: #90)
- Implement chat templates (PR: #92)
- Stop chat functionality (PR: #95)
- Keep only the llamafile binary (PR: #97)
- Fix remote server functionality (PR: #96)
- Fix Mac issue requiring llamafile to be run manually the first time (PR: #98)
- Async startup support (PR: #89)
- Refactoring and small enhancements (PR: #80)
- Fix Mac command spaces (PR: #71)
- Expose new llama.cpp arguments (PR: #60)
- Allow changing the prompt (PR: #64)
- Add variable sliders (PR: #65)
- Add option to show expert options (PR: #66)
- Improve package loading (PR: #67)
- Fail if port is already in use (PR: #62)
- Run server without mmap on mmap crash (PR: #63)
- Fix download function (PR: #51)
- Add explanation of how settings impact generation to the README (PR: #49)
- Fix slashes in Windows paths (PR: #42)
- Fix chmod when deploying from Windows (PR: #43)
- Code auto-formatting (PR: #26)
- Setup auto-formatting precommit (PR: #31)
- Start server on Awake instead of OnEnable (PR: #28)
- AMD support, switch to llamafile 0.6 (PR: #33)
- Release workflows (PR: #35)
- Support Unity 2021 LTS (PR: #32)
- Fix macOS command (PR: #34)
- Release fixes and readme (PR: #36)
- Fix running commands for projects with spaces in the path
- Closes #8
- Closes #9
- Fix sample scenes for different screen resolutions
- Closes #10
- Allow parallel prompts
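
For orientation, here is a minimal sketch of the callback-based chat flow that several entries above touch (streaming replies, completion callbacks, and the error-message callbacks from PR #190). This is an illustration under assumptions, not the definitive API: the `LLMUnity` namespace and the `Chat(message, onReply, onComplete)` shape follow the package's typical usage, and the commented `SetErrorCallback` hook is a hypothetical name.

```csharp
using UnityEngine;
using LLMUnity;

// Minimal sketch, not the definitive API: the namespace, the Chat signature and
// the error hook below are assumptions based on the package's typical usage.
public class ChatExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;

    async void Start()
    {
        // Hypothetical error hook corresponding to "callback functions for
        // error messages" (PR #190); the method name is an assumption.
        // llmCharacter.SetErrorCallback(msg => Debug.LogError(msg));

        // Callback-based chat: partial replies stream into HandleReply and
        // ReplyCompleted fires once generation finishes.
        await llmCharacter.Chat("Hello!", HandleReply, ReplyCompleted);
    }

    void HandleReply(string reply) { Debug.Log(reply); }
    void ReplyCompleted() { Debug.Log("Reply completed"); }
}
```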
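Similarly, a minimal sketch of connecting an `LLMCharacter` to a remote server secured with an API key (the server security from PR #238, packaged in LlamaLib v1.1.12 via PR #241). The field names `remote`, `host`, `port`, and `APIKey`, as well as the port value and server address, are assumptions for illustration; the actual property names and the server-side SSL options should be taken from the documentation.

```csharp
using UnityEngine;
using LLMUnity;

// Minimal sketch assuming field names on LLMCharacter; they are placeholders
// for whatever the package actually exposes for remote connections.
public class RemoteChatExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;

    void Awake()
    {
        llmCharacter.remote = true;                      // assumed toggle: use a remote server instead of a local LLM
        llmCharacter.host = "my-llm-server.example.com"; // hypothetical server address
        llmCharacter.port = 13333;                       // assumed port, must match the server
        llmCharacter.APIKey = "my-secret-key";           // assumed field for the API key set on the server
    }
}
```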