-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add doc for dedicated local storage in ilab cli #104
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Oleg S <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally happy with the direction here.
I'd like to hear a little more around the proposed unique refs -- how would you generate them? where would they be stored? How would they be mapped to a file (or files) on disk?
Are you intentionally leaving it vague and just "TBD based on whatever seems easy to implement" later? or do you have anything more concrete in mind?
There's also some linter complaints, but I'm sure you saw those already.
|
||
For tools to be useful to the user, they generally need to work on and know about data pertinent to the user. For example, podman and docker track image blobs, kubectl tracks cluster credentials, etc. | ||
|
||
The InstructLab CLI manages a number of settings today such as the teacher model, taxonomy location, inference model, etc. using a local file known as `config.yaml`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CLI itself doesn't manage any of this. It's user config. It generates an initial file for you, but never updates it from there. I only bring this up because user config isn't the same as data that is never touched directly by a user. (config vs data)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point, I would say it makes a stronger point for the proposed structure of the ~/.local/share
directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be so @RobotSail but as @russellb mentioned the CLI doesn't manage this.
docs/cli/ilab-local-config.md
Outdated
|
||
We propose the following: | ||
|
||
1. `ilab` will have a dedicated config directory seated at `~/.config/ilab` on Linux/MacOS systems, and the `%APPDATA%` equivalent if we decide to support Windows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably this works on windows with WSL? I know people have been using it on WSL, already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's WSL then we should still assume it's a linux system. The only WSL-specific bits are more on the CUDA drivers side. And even then, that works fine currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking more of "raw windows", but you're right - WSL is a great alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RobotSail Are qwe going to support native Windows? if not, we should be explicit that it is WSL.
docs/cli/ilab-local-config.md
Outdated
|
||
The steps outlined to implement this are: | ||
|
||
1. Implement logic for `~/.config/ilab` and `~/.local/share/ilab` directories and expect the config to live and be read from there unless another config is manually specified. For simplicity, if the is read in from `~/.config/ilab` then we assume the relative paths should be realtive to the data directory `~/.local/share/ilab`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A way this can work is have a data_dir that can be changed via a CLI flag or config file option, but always default to what you have specified here -- very similar to what you describe as a default location for the config dir unless otherwise specified via a flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree with that.
I was thinking that we could just use a simple UUID or SHA2 digest. For the initial version we could have the refs just be the relative paths in the |
Signed-off-by: Oleg S <[email protected]>
|
||
We propose the following: | ||
|
||
1. `ilab` will have a dedicated config directory seated at `~/.config/instructlab` on Linux/MacOS systems, and the `%APPDATA%` equivalent if Windows support is added in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is your intent to support XDG?:
https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html#introduction
Not arguing that you should necessarily but if you intended to I would change the text to be seated at $XDG_CONFIG_HOME/instructlab (.local/share/instructlab) (And XDG_DATA_HOME later on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's what I was thinking of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly recommend to use https://pypi.org/project/platformdirs/ . It has the correct defaults for all important platforms.
I also like to point out that the use of XDG directories does not work well with systemd services. @mairin just mentioned that she wants systemd service support for InstructLab. XDG directories are problematic, because they are designed for human end-users with a home directory. systemd --system
need an entirely different directory structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tiran Thanks for pointing this out! I had initially suggested XDG thinking of the laptop-targeted experience. But if we want a systemd service we'll need to use system directories. Ideally we'd use XDG for the laptop-target user experience and system-wide directories for the big mama GPU hw experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ to this end, @tiran suggests I would store the paths in config.yaml and have a option to either use XDG or a common base directory for all paths.
docs/cli/ilab-local-config.md
Outdated
The output would look something like | ||
|
||
|
||
## Evolving configs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a distinct proposal, was there a reason you included it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is definitely distinct, but will be something we'll need as a feature in the future.
We propose the following: | ||
|
||
1. `ilab` will have a dedicated config directory seated at `~/.config/instructlab` on Linux/MacOS systems, and the `%APPDATA%` equivalent if Windows support is added in the future. | ||
2. `ilab` will have a dedicated program files directory seated at `~/.local/share/instructlab` which will store model checkpoints, downloaded models, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To carry through on the Windows spec I would say %LOCALAPPDATA% here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification!
|
||
Based on these steps, #1 would be the trickiest because we assume that the paths referenced in the config file are relative to the calling process. | ||
|
||
One approach here could be removing these paths from the default and forcing the user to specify them relatively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having trouble understanding what you mean on this sentence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@n1hility My wording here is bad but the idea is simple:
- When the config read is the default one
~/.config/ilab/config.yaml
then parse paths relative to~/.local/share/ilab
. - When that's NOT the config, we would parse relative paths with respect to the calling process. E.g. if
ilab
is invoked in/foo
and theconfig.model_path
references a file inbar/granite_7b.gguf
then it would resolve to/foo/bar/granite_7b.gguf
. - All absolute paths will be parsed as-is
|
||
One approach here could be removing these paths from the default and forcing the user to specify them relatively. | ||
|
||
Another approach would be implementing the list commands at the same time as #1 and setting it up so that the relative paths passed to `ilab` are relative ***to the `~/.local/share/instructlab` directory*** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any compatibility with existing configs? For example it looks in the current dir and if there is a config.yml expects abs paths, or is this an intentional major change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the experience be like without #3 (implementing refs) in your list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would list tell you were a file was in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a breaking change then but we'd need to make it. We can maintain compatibility by allowing users to provide a --config
flag which should read in the config as we do today. Not sure how much harder this would be versus what we're doing today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I really like this proposal, but could use some clarifications on a few points made.
Signed-off-by: Oleg S <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall this is a good proposal and direction. Thanks for pushing this @RobotSail.
Some additional points:
- As raised by @n1hility, I think supporting the XDG Base Directory Specification is a good idea. Do you mind adding this to the proposal? You could then mention support for the environment variables as per the XDG Base Directory Specification:
$XDG_CONFIG_HOME
and$XDG_DATA_HOME
. - It was mentioned about capability to change the data and config paths. This could be implemented by providing environment variables like
$INSTRUCTLAB_PATH_CONFIG
and$INSTRUCTLAB_PATH_CONFIG
|
||
The steps outlined to implement this are: | ||
|
||
1. Implement logic for the `~/.config/instructlab` and `~/.local/share/instructlab` directories, and expect the config to live and be read from there unless another config is manually specified. For simplicity, if the config we read in is located at `~/.config/instructlab` - the default location - then we'll assume relative paths should be resolved with respect to the data directory `~/.local/share/instructlab`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be that relative paths are resolved with respect to the config directory?
@RobotSail Would you be willing to update this doc based on feedback and current state? Would be good to get it merged given the state of the code has already moved past the state of this doc. |
@danmcp Yes we can update this one to reflect what currently exists in ilab |
old review, code moved on independent of this doc
Thank you! |
@RobotSail are you still planning on updating this? |
This document outlines an approach for config and program data management that simplifies
the user flow and creates a streamlined experience.
Signed-off-by: Oleg S [email protected]