commit 00883b6 (0 parents)
Showing 20 changed files with 3,775 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# Ignore Python bytecode | ||
__pycache__/ | ||
*.pyc | ||
*.pyo | ||
|
||
# Ignore Jupyter Notebook checkpoints | ||
.ipynb_checkpoints/ | ||
.ipynb_checkpoints/Qwen_sentiment_analysis.ipynb | ||
|
||
# Ignore virtual environment directories | ||
venv/ | ||
env/ | ||
*.egg-info/ | ||
|
||
# Ignore environment configuration files | ||
*.env | ||
*.ini | ||
*.yaml | ||
|
||
# Ignore local configuration files | ||
*.local | ||
|
||
# Ignore log files | ||
*.log | ||
|
||
# Ignore system files | ||
.DS_Store | ||
Thumbs.db | ||
|
||
# Ignore Jupyter Notebook output files | ||
*.nbconvert.ipynb | ||
|
||
# Ignore other IDE or editor-specific files | ||
.idea/ | ||
.vscode/ | ||
*.sublime-project | ||
*.sublime-workspace |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,138 @@ | ||
# Task: Sentiment Analysis with IMDB Dataset | ||
|
||
## Description: | ||
|
||
### Task Objectives | ||
|
||
1. Design and refine prompts to achieve maximum accuracy in sentiment analysis | ||
2. Implement local inference for SLMs using Python | ||
3. Compare the performance and efficiency of two SLMs of different sizes | ||
4. Analyze results and draw meaningful insights | ||
5. Document the process and findings effectively | ||
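As a rough illustration of objectives 1 and 2, the sketch below builds a zero-shot prompt and maps a model's free-text reply to a label. The prompt wording and the names `build_prompt` and `parse_label` are hypothetical and not taken from the notebook; the inference call itself is deliberately left out.

```python
import re

LABELS = ("positive", "negative")
# Matches either label as a whole word.
LABEL_PATTERN = re.compile(r"\b(" + "|".join(LABELS) + r")\b")


def build_prompt(review: str) -> str:
    """Build a zero-shot classification prompt (wording is illustrative)."""
    return (
        "Classify the sentiment of the following movie review.\n"
        "Answer with exactly one word: positive or negative.\n\n"
        f"Review: {review.strip()}\n"
        "Sentiment:"
    )


def parse_label(reply: str) -> str:
    """Return the first label mentioned in the reply, or 'unknown'."""
    match = LABEL_PATTERN.search(reply.lower())
    return match.group(1) if match else "unknown"
```

Mapping unparseable replies to `unknown` rather than guessing keeps formatting failures visible when accuracy is computed, which matters most for the smaller model.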
|
||
|
||
### Data | ||
|
||
The IMDB movie review dataset is loaded from [Hugging Face](https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews). It contains about 50k samples: 40k in the `train` split and 10k in the `test` split. | ||
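The project works on a sampled subset of this data (`sampled_dataset`). A minimal sketch of class-balanced sampling follows, assuming rows are `(text, label)` pairs; the function name, the seed, and the toy data are illustrative and may differ from what `data_loader.py` actually does.

```python
import random
from collections import defaultdict


def balanced_sample(rows, n_per_label, seed=42):
    """Pick n_per_label rows for each label; rows are (text, label) pairs."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) < n_per_label:
            raise ValueError(f"only {len(group)} rows for label {label!r}")
        sample.extend(rng.sample(group, n_per_label))
    rng.shuffle(sample)  # avoid leaving the rows grouped by label
    return sample


# Toy data standing in for the real reviews.
toy_rows = [(f"review {i}", "positive" if i % 2 else "negative") for i in range(20)]
subset = balanced_sample(toy_rows, n_per_label=3)
```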
|
||
### Models | ||
|
||
1. [bartowski/Qwen2.5-1.5B-Instruct-GGUF](https://huggingface.co/bartowski/Qwen2.5-1.5B-Instruct-GGUF/blob/main/Qwen2.5-1.5B-Instruct-Q5_K_M.gguf): A lightweight 1.5B-parameter instruction-tuned model with enhanced capabilities in coding, mathematics, and handling structured data. | ||
|
||
|
||
2. [bartowski/Qwen2.5-0.5B-Instruct-GGUF](https://huggingface.co/bartowski/Qwen2.5-0.5B-Instruct-GGUF/blob/main/Qwen2.5-0.5B-Instruct-Q5_K_M.gguf): A 500M-parameter model, roughly one third the size of its 1.5B counterpart, yet still capable on many tasks. | ||
|
||
|
||
### Project Structure | ||
|
||
The project is organized as follows: | ||
|
||
sentiment_analysis/ | ||
├── config.py | ||
├── data_loader.py | ||
├── data/ | ||
│   └── sampled_dataset.csv | ||
├── predictions/ | ||
│   └── (text files with all predictions from model inference) | ||
├── plots/ | ||
│   └── (plots for each prompting step) | ||
├── Qwen_sentiment_analysis.ipynb | ||
├── requirements.txt | ||
└── README.md | ||
|
||
1. config.py: Contains configurations such as directory paths and dataset locations. | ||
|
||
2. data_loader.py: Contains code for dataset sampling. | ||
|
||
3. data/: Directory containing the dataset sampled and used in the project. | ||
|
||
4. predictions/: Contains 8 text files with the model predictions for each step. | ||
|
||
5. plots/: Contains 4 plots, one per prompting step, showing each model's accuracy averaged over 3 runs at each temperature (0, 1, 2). | ||
|
||
6. Qwen_sentiment_analysis.ipynb: The Jupyter notebook containing the analysis and model-building code. | ||
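The averaging behind the plots can be sketched as follows. This is an illustration under assumptions: predictions are taken to be lists of label strings, the function names are hypothetical, and the values are made up to show the call shape.

```python
from collections import defaultdict
from statistics import mean


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels differ in length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def mean_accuracy_by_temperature(runs, gold):
    """runs: iterable of (temperature, predictions); returns {temperature: mean accuracy}."""
    scores = defaultdict(list)
    for temperature, predictions in runs:
        scores[temperature].append(accuracy(predictions, gold))
    return {t: mean(s) for t, s in sorted(scores.items())}


# Made-up labels and predictions, only to show the call shape.
gold = ["positive", "negative", "positive", "negative"]
runs = [
    (0, ["positive", "negative", "positive", "negative"]),
    (0, ["positive", "negative", "positive", "positive"]),
    (1, ["negative", "negative", "positive", "positive"]),
]
averages = mean_accuracy_by_temperature(runs, gold)
```

Averaging over several runs per temperature smooths out sampling noise, which grows as the temperature rises.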
|
||
|
||
### Installation | ||
|
||
To run this project, follow the steps below: | ||
|
||
1. Clone the Project: Download the project as a zip file or clone it. | ||
|
||
2. Set Up the Environment: Install the required Python packages using pip: | ||
|
||
```bash | ||
pip install -r requirements.txt | ||
``` | ||
|
||
### Usage | ||
|
||
To run this project in a local Jupyter Notebook, follow these steps: | ||
|
||
1. **Clone the repository**: | ||
```bash | ||
git clone https://github.com/zarakokolagar/sentiment_analysis_LLM | ||
cd sentiment_analysis_LLM | ||
``` | ||
|
||
### Setting Up a Virtual Environment and Running the Project | ||
|
||
Follow these steps to set up a virtual environment and run the Jupyter Notebook locally: | ||
|
||
#### 1. Install `virtualenv` (if not already installed) | ||
|
||
If you don’t have `virtualenv` installed, you can install it using: | ||
|
||
```bash | ||
pip install virtualenv | ||
``` | ||
|
||
#### 2. Create a virtual environment | ||
Create a virtual environment by running the following command: | ||
```bash | ||
virtualenv venv | ||
``` | ||
#### 3. Activate the virtual environment | ||
On macOS/Linux: | ||
|
||
```bash | ||
source venv/bin/activate | ||
``` | ||
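The activation step above covers macOS/Linux only. On Windows, the activation script lives under `Scripts\` instead of `bin/`. The small helper below (a hypothetical name, written only for illustration) returns the matching command for either layout.

```python
from pathlib import PurePosixPath, PureWindowsPath


def activation_command(venv_dir: str = "venv", windows: bool = False) -> str:
    """Return the shell command that activates a virtual environment."""
    if windows:
        # On Windows the scripts live in Scripts\ rather than bin/.
        return str(PureWindowsPath(venv_dir) / "Scripts" / "activate")
    return f"source {PurePosixPath(venv_dir) / 'bin' / 'activate'}"
```

For example, `activation_command(windows=True)` yields `venv\Scripts\activate`, which can be run from a Windows command prompt.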
#### 4. Install the required dependencies | ||
With the virtual environment activated, install the dependencies listed in the `requirements.txt` file: | ||
|
||
```bash | ||
pip install -r requirements.txt | ||
``` | ||
|
||
#### 5. Run Jupyter Notebook | ||
If you don’t have Jupyter Notebook installed, you can install it with: | ||
```bash | ||
pip install notebook | ||
``` | ||
Then, start Jupyter Notebook by running: | ||
|
||
```bash | ||
jupyter notebook | ||
``` | ||
|
||
#### 6. Deactivate the virtual environment | ||
Once you're done working, deactivate the virtual environment by running: | ||
```bash | ||
deactivate | ||
``` | ||
**Note:** | ||
Alternatively, you can run the notebook in [Google Colab](https://colab.research.google.com/drive/1UWiUKyRz0HgGtB_frLV6Kl0MMb7J25Kg#scrollTo=ihomXxB4Svsk&uniqifier=1). | ||
When run from Colab, the notebook already includes the installation of all required libraries and the data handling. | ||
## License | ||
Not required |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
import os | ||
|
||
# Define the base directory for the project | ||
BASE_DIR = os.path.dirname(os.path.abspath(__file__)) | ||
|
||
# Define the data directory | ||
DATA_DIR = os.path.join(BASE_DIR, 'data') | ||
|
||
# Define the predictions directory | ||
PRED_DIR = os.path.join(BASE_DIR, 'predictions') | ||
|
||
# Define the plots directory for predictions and create it if it is missing | ||
PLOT_DIR = os.path.join(BASE_DIR, 'plots') | ||
os.makedirs(PLOT_DIR, exist_ok=True) | ||
|
||
# Define the path to the sampled dataset | ||
SAMPLED_DATA_PATH = os.path.join(DATA_DIR, 'sampled_dataset.tsv') |
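Only `PLOT_DIR` is created when this module is imported, so code that writes predictions may need to create `PRED_DIR` itself. The sketch below shows one way to do that, using a temporary directory as a stand-in for `PRED_DIR`; the helper name and the file naming scheme are assumptions, not the project's actual convention.

```python
import os
import tempfile


def prediction_path(pred_dir: str, model: str, step: int) -> str:
    """Build a predictions file path, creating the directory if needed."""
    os.makedirs(pred_dir, exist_ok=True)
    return os.path.join(pred_dir, f"{model}_step{step}.txt")


# Stand-in for config.PRED_DIR so the sketch runs anywhere.
base_dir = tempfile.mkdtemp()
pred_dir = os.path.join(base_dir, "predictions")
path = prediction_path(pred_dir, model="qwen2.5-0.5b", step=1)
```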