Consider incorporating a tutorial which walks the user through using the example data #11

Open
wincowgerDEV opened this issue Mar 1, 2022 · 12 comments

@wincowgerDEV

I have gotten to step 2 in the usage guidelines. At this point the guidelines become all about calibrating configuration files, but the user is not yet familiar with the basics of how any of these applications work or what the workflow will be like. As a non-expert in machine learning, I still find it challenging to see how everything fits together. I would suggest that, at this point, the guidelines recommend walking through the example usage on the main page, https://github.com/U-Alberta/ADaPT-ML. However, when I started to walk through it beginning with Step 2 (create a gold dataset using Label Studio), where the first script is, running the code did not make my Label Studio project reflect what is presented in the example usage. This may require some reorganization of the example files and configuration files so that the user can rapidly get the example working.

@nulberry
Collaborator

nulberry commented Mar 3, 2022

I've added two things to the documentation in a6ca9df:

  • a note about following the example use case after reviewing system requirements
  • specific (but brief) instructions on how the example project was set up using the Label Studio UI. Following the UI is the easiest, most user-friendly way to init the project IMO.

@wincowgerDEV
Author

wincowgerDEV commented Mar 3, 2022 via email

@wincowgerDEV
Author

I like that you now have the note about following the example use case. When I go to the example use case and start with Step 1, though, I can't find a clear path for following along with the tutorial. For example, where is the data stored that I am supposed to import into CrateDB? How do I get it into CrateDB? I tried to skip ahead to Step 2 but ran into similar questions. I ran the command "docker exec label-studio-dev python ./ls/sample_tasks.py example_data txt 30 example --filename example_tasks.json" and read that the data should be in $LS_TASKS_PATH, but I do not see that populated in my computer's directories. Perhaps it is in the Docker container? If so, how would one access it to load it into Label Studio?

@wincowgerDEV
Author

BTW thanks for adding in all this newbie stuff for newbies like me, I am learning a lot and could likely see myself using this tool in the future.

@wincowgerDEV
Author

I think I may have found the example data from Step 2. Is this right? If so, you might want to change $LS_TASKS_PATH to example_data\ls\tasks or, if $LS_TASKS_PATH is meant to be a command, specify where it is supposed to be run.
image

@nulberry
Collaborator

nulberry commented Mar 8, 2022

In c07ec84, I've added a note in the Example Use Case section for users who, in addition to reading over the steps, would like to follow along on their machine like you are doing. It clarifies that the file paths specified by environment variables like $LS_TASKS_PATH are in the .env file, so that they can reference where things are happening.
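
As a rough illustration of how one might check where a variable such as $LS_TASKS_PATH points, here is a minimal Python sketch. It assumes the .env file at the repo root uses plain KEY=VALUE lines; the helper name read_env_file is hypothetical and is not part of ADaPT-ML.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Assumption: no multi-line values or variable interpolation; blank
    lines and lines starting with '#' are skipped.
    """
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


if __name__ == "__main__":
    env_path = Path(".env")  # assumed to sit at the repo root
    if env_path.exists():
        env = read_env_file(env_path)
        print("LS_TASKS_PATH =", env.get("LS_TASKS_PATH", "<not set>"))
```

Run from the repo root, this would print the host-side path that the variable refers to, if the variable is defined in the file.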

The example data stored in CrateDB is in ./crate relative to the repo root. I found writing Step 1 for the Example Use Case a bit tricky because pretty much everything in this step has already happened behind the scenes. ADaPT-ML provides CrateDB, but there are no scripts or programs that take a user's data and create a table in CrateDB for them. I did not make this a part of ADaPT-ML because I was unsure of how to handle the many possibilities for data types, file formats, etc., with one program. All of this to say, do you think that it would be good for me to mention somehow that there isn't a way to "follow along" with Step 1 besides accessing the CrateDB UI and looking at the example data table?

There's also the option for me to describe the way I loaded the data into CrateDB as a suggestion within the usage guidelines, just so that users have something to work off of when they are deciding how they want to featurize and load their data into CrateDB. Right now, this is what I have under Step 6 of the usage guidelines:

Then it's ready! Import your data into a table in CrateDB and refer to the Example Usage for an example of how to manipulate the data so that it's ready for ADaPT-ML. How you load the data, featurize it, and sample from it to create your unlabeled training data is up to you -- ADaPT-ML does not perform these tasks. However, there may be an opportunity for certain sampling methods to become a part of the system; see Contributing.
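
To give a sense of what importing rows into a CrateDB table could look like, here is a minimal sketch that uses CrateDB's HTTP endpoint with only the Python standard library. It is not the script used for the example data. The table name, column names, sample rows, and the address http://localhost:4200/_sql are all assumptions for illustration, and the target table is assumed to exist already.

```python
import json
import urllib.request


def build_bulk_insert(table, columns, rows):
    """Build the JSON body for a parameterized bulk INSERT statement."""
    placeholders = ", ".join("?" for _ in columns)
    column_list = ", ".join(columns)
    stmt = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    return {"stmt": stmt, "bulk_args": [list(row) for row in rows]}


def send_statement(payload, url="http://localhost:4200/_sql"):
    """POST the payload to CrateDB's HTTP endpoint (URL is an assumption)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical table and rows; replace with your own schema and data.
    payload = build_bulk_insert(
        "my_example_table",
        ["id", "txt"],
        [(1, "first document"), (2, "second document")],
    )
    print(json.dumps(payload, indent=2))
    # send_statement(payload)  # uncomment once CrateDB is running
```

Any featurization of the text would still need to happen before the rows are inserted, since ADaPT-ML does not perform that step.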

And yes that's right, $LS_TASKS_PATH corresponds to ./example_data/ls/tasks relative to the root of the repo. All files created within the Docker containers are on shared volumes with the host machine so that nothing gets lost if the containers are stopped. It took me a while to wrap my head around Docker 😆, but it certainly saves a lot of time and headaches down the road for large systems like ADaPT-ML!

@wincowgerDEV
Author

wincowgerDEV commented Mar 9, 2022 via email

@nulberry
Collaborator

Thank you for your suggestions. To give a general idea of how users can import their data, I have added the Python script I used to import the example data, with some inline comments to give a bit of detail on LF and ML featurization. It is example_data/example_data_import.py. It is not intended to be run as that would require setting up a virtual environment with extra dependencies and it is outside the current scope of ADaPT-ML, but I linked to it in the parts of the docs you've quoted so it can be read over as users are following along. I didn't edit the gitignore properly on the first go, so you will find these changes in 34cc974 and 02d87a2. Thanks again for all of your help in improving my understanding of users' mental models.

@wincowgerDEV
Author

wincowgerDEV commented Mar 11, 2022 via email

@nulberry
Collaborator

I can't think of anything specific that needs attention. I am still trying to get the proper Docker daemon running on the GitHub-hosted Windows runner, but the runner seems very unlike any setup that the average user would have. I think this work can carry on separately, as it's more an issue with customizing the default Docker installation on the runner than an issue with the system's functionality.

@wincowgerDEV
Author

wincowgerDEV commented Mar 14, 2022 via email

@nulberry
Collaborator

Thank you so much, that means a lot! I am grateful for all of your contributions to this project 😄
