Merge pull request IBM#945 from touma-I/documentation-1.0
Cleanup documentation for 1.0.0
touma-I authored Jan 21, 2025
2 parents c8096b1 + 2f5c691 commit 87844a0
Showing 7 changed files with 556 additions and 264 deletions.
31 changes: 2 additions & 29 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ Data modalities supported _today_: Code and Natural Language.

### Fastest way to experience Data Prep Kit

With no setup necessary, let's use a Google Colab-friendly notebook to try Data Prep Kit. This is a simple transform to extract content from PDF files: [examples/notebooks/Run_your_first_transform_colab.ipynb](examples/notebooks/Run_your_first_transform_colab.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IBM/data-prep-kit/blob/dev/examples/notebooks/Run_your_first_transform_colab.ipynb). ([Here](doc/google-colab.md) are some tips for running Data Prep Kit transforms on Google Colab. For this simple example, these tips are either already taken care of or are not needed.) The same notebook can be downloaded and run on the local machine, without cloning the repo or any other setup. For additional guidance on setting up JupyterLab, see the Appendix section below.
With no setup necessary, let's use a Google Colab-friendly notebook to try Data Prep Kit. This is a simple transform to extract content from PDF files: [examples/notebooks/Run_your_first_transform_colab.ipynb](examples/notebooks/Run_your_first_transform_colab.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IBM/data-prep-kit/blob/dev/examples/notebooks/Run_your_first_transform_colab.ipynb). ([Here](doc/google-colab.md) are some tips for running Data Prep Kit transforms on Google Colab. For this simple example, these tips are either already taken care of or are not needed.) The same notebook can be downloaded and run on the local machine, without cloning the repo or any other setup. For additional guidance on setting up JupyterLab, see the [quick start guide](doc/quick-start/quick-start.md#jupyter).

### Install data prep kit from PyPi

Expand All @@ -71,7 +71,7 @@ When installing select transforms, users can specify the name of the transform i
```bash
pip install 'data-prep-toolkit-transforms[pdf2parquet]'
```
For guidance on creating the virtual environment for installing Data Prep Kit, refer to the Appendix section below.
For additional guidance on creating the virtual environment for installing Data Prep Kit, see the [quick start guide](doc/quick-start/quick-start.md#conda).
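After installing an extra, a quick way to confirm that the optional dependency resolved is to probe for it with `importlib` before running a pipeline. This is a minimal sketch; the module names below are illustrative placeholders, not the actual names the extras install — check the installed package for the real importable names:

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if the named module is importable in this environment."""
    return importlib.util.find_spec(name) is not None

# Illustrative names only; substitute the modules your installed extras provide.
for mod in ("json", "definitely_not_installed_xyz"):
    print(mod, "->", "present" if has_module(mod) else "missing")
```

This fails fast with a clear message instead of surfacing an `ImportError` deep inside a transform run.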

### Run your first data prep pipeline

Expand Down Expand Up @@ -173,33 +173,6 @@ When you finish working with the cluster, and want to clean up or destroy it. Se

You can run transforms via docker image or using virtual environments. This [document](doc/quick-start/run-transform-venv.md) shows how to run a transform using virtual environment. You can follow this [document](doc/quick-start/run-transform-image.md) to run using docker image.

## Appendix
### Create a Virtual Environment

To run on a local machine, follow these steps to quickly set up Data Prep Kit in a Python virtual environment.

```bash
conda create -n data-prep-kit -y python=3.11
conda activate data-prep-kit
python --version
```

Verify that the Python version is 3.11.

If you are using a Linux system, install gcc using the commands below; it is required to compile and install [fasttext](https://fasttext.cc/), which is currently used by some of the transforms.

```bash
conda install gcc_linux-64
conda install gxx_linux-64
```

### Setting up JupyterLab for local experimentation with transform notebooks

```bash
pip install jupyterlab ipykernel ipywidgets
python -m ipykernel install --user --name=data-prep-kit --display-name "dataprepkit"
```

## Citations <a name="citations"></a>

If you use Data Prep Kit in your research, please cite our paper:
Expand Down
18 changes: 17 additions & 1 deletion doc/mac.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,23 @@ machine with an Intel CPU.

### Memory Considerations

To verify that running transforms through KFP does not leak memory, and to get an idea of the required Podman VM memory size configuration, a few tests were devised and run, as summarized [here](memory.md).
To verify that running transforms through KFP does not leak memory, and to get an idea of the required Podman VM memory size configuration, a few tests were devised and run, as summarized below:

#### Memory and Endurance Considerations

A test was devised with a set of 1483 files on a Mac with 32GB of memory and 4 CPU cores, using the traceback library to check for memory leaks.
Ten iterations were run and memory usage was observed; it peaked at around 4GB, with no obvious signs of a memory leak.
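A leak check of this shape can be sketched with Python's standard `tracemalloc` module. This is an editorial sketch, not the original test harness: `run_transform_iteration` is a hypothetical stand-in for one pass of the actual NOOP transform over the file set.

```python
import tracemalloc

def run_transform_iteration(files):
    # Hypothetical stand-in for one transform pass over the file set;
    # the real test ran the NOOP transform over the 1483 files.
    return [len(name) for name in files]

def peak_memory_per_iteration(files, iterations=10):
    """Record tracemalloc's peak allocation (bytes) for each iteration."""
    peaks = []
    tracemalloc.start()
    for _ in range(iterations):
        run_transform_iteration(files)
        _, peak = tracemalloc.get_traced_memory()
        peaks.append(peak)
        tracemalloc.reset_peak()  # measure each iteration independently
    tracemalloc.stop()
    return peaks

peaks = peak_memory_per_iteration([f"file-{i}.parquet" for i in range(1483)])
```

If the per-iteration peak grows steadily instead of staying roughly flat, allocations are accumulating across iterations, which is the signature of a leak.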

Another set of tests was run with the same 1483 files on a Podman VM under different memory configurations; the results are shown below.
Around 4GB of available memory appears to be needed to process all 1483 files successfully.

| CPU Cores | Total Memory | Memory Used by Ray | Transform | Files Processed Successfully |
|-----------|--------------|--------------------|-----------|------------------------------|
| 4         | 8GB          | 4.2GB              | NOOP      | 1483                         |
| 4         | 6GB          | 3GB                | NOOP      | 910                          |
| 4         | 4GB          | 2GB                | NOOP      | 504                          |
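As a back-of-the-envelope check (an editorial sketch, not part of the original tests), the table is roughly consistent with files processed scaling linearly with the memory available to Ray. Interpolating between the two extreme configurations predicts the middle row fairly well:

```python
# (GB used by Ray -> files processed) from the table above
points = {4.2: 1483, 3.0: 910, 2.0: 504}

# Fit a line through the two extreme configurations.
files_per_gb = (points[4.2] - points[2.0]) / (4.2 - 2.0)  # ~445 files per GB
predicted_at_3gb = points[2.0] + files_per_gb * (3.0 - 2.0)
print(round(predicted_at_3gb))  # ~949, close to the observed 910
```

Under that linearity assumption, a bit over 4GB of Ray memory is needed to cover the full set, matching the successful 8GB-VM configuration.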



> **Note**: the *current* release does not support building cross-platform images, so please do not build images on Apple silicon.
13 changes: 0 additions & 13 deletions doc/memory.md

This file was deleted.

