
Commit

initial commit
zahrakolagar committed Oct 10, 2024
0 parents commit 00883b6
Showing 20 changed files with 3,775 additions and 0 deletions.
37 changes: 37 additions & 0 deletions .gitignore
@@ -0,0 +1,37 @@
# Ignore Python bytecode
__pycache__/
*.pyc
*.pyo

# Ignore Jupyter Notebook checkpoints
.ipynb_checkpoints/
.ipynb_checkpoints/Qwen_sentiment_analysis.ipynb

# Ignore virtual environment directories
venv/
env/
*.egg-info/

# Ignore environment configuration files
*.env
*.ini
*.yaml

# Ignore local configuration files
*.local

# Ignore log files
*.log

# Ignore system files
.DS_Store
Thumbs.db

# Ignore Jupyter Notebook output files
*.nbconvert.ipynb

# Ignore other IDE or editor-specific files
.idea/
.vscode/
*.sublime-project
*.sublime-workspace
1,502 changes: 1,502 additions & 0 deletions Qwen_sentiment_analysis.ipynb

Large diffs are not rendered by default.

138 changes: 138 additions & 0 deletions README.md
@@ -0,0 +1,138 @@
# Task: Sentiment Analysis with IMDB Dataset

## Description

### Task Objectives

1. Design and refine prompts to achieve maximum accuracy in sentiment analysis
2. Implement local inference for SLMs using Python
3. Compare the performance and efficiency of two SLMs of different sizes
4. Analyze results and draw meaningful insights
5. Document the process and findings effectively


### Data

The IMDB movie review dataset is accessed through [Huggingface](https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews). It contains around 50k samples: 40k in the 'train' split and 10k in the 'test' split.
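
For example, the data can be pulled directly from the Hub with the `datasets` library (a minimal sketch; assumes the library is installed):

```python
# A minimal sketch (assumed dependency: the Hugging Face `datasets` library).
from datasets import load_dataset

ds = load_dataset("ajaykarthick/imdb-movie-reviews")
print(ds)              # shows the 'train' and 'test' splits and their sizes
print(ds["train"][0])  # one sample record
```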

### Models

1. [bartowski/Qwen2.5-1.5B-Instruct-GGUF](https://huggingface.co/bartowski/Qwen2.5-1.5B-Instruct-GGUF/blob/main/Qwen2.5-1.5B-Instruct-Q5_K_M.gguf): A lightweight 1.5B-parameter instruction-tuned model with enhanced capabilities in coding, mathematics, and handling structured data.


2. [bartowski/Qwen2.5-0.5B-Instruct-GGUF](https://huggingface.co/bartowski/Qwen2.5-0.5B-Instruct-GGUF/blob/main/Qwen2.5-0.5B-Instruct-Q5_K_M.gguf): A model with 500M parameters, three times smaller than its 1.5B counterpart, yet still highly capable for many tasks.
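
Both models ship as quantized GGUF files, which can be run locally with a runtime such as `llama-cpp-python` (an assumption; the notebook's actual inference code is not shown in this diff). A minimal sketch of classifying one review, with an illustrative prompt:

```python
# A minimal sketch (assumed runtime: llama-cpp-python; model file name and prompt are illustrative).
from llama_cpp import Llama

# Load the quantized GGUF model from a local path.
llm = Llama(
    model_path="Qwen2.5-1.5B-Instruct-Q5_K_M.gguf",
    n_ctx=2048,
    verbose=False,
)

review = "One of the finest films I have seen in years."
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Classify the sentiment of the movie review as 'positive' or 'negative'. Answer with one word."},
        {"role": "user", "content": review},
    ],
    temperature=0.0,
    max_tokens=4,
)
print(response["choices"][0]["message"]["content"].strip())
```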


### Project Structure

The project is organized as follows:

```
sentiment_analysis/
├── config.py
├── data_loader.py
├── data/
│   └── sampled_dataset.csv
├── predictions/   # text files of all predictions from model inference
├── plots/         # plots for each prompting step
├── Qwen_sentiment_analysis.ipynb
├── requirements.txt
└── README.md
```

1. config.py: Contains configurations such as directory paths and dataset locations.

2. data_loader.py: Contains code for dataset sampling (see the sketch after this list).

3. data/: Directory containing the dataset sampled and used in the project.

4. predictions/: Contains 8 text files of model predictions, one per prompting step for each model.

5. plots/: Contains 4 plots, one per prompting step, showing each model's average accuracy across 3 runs at each temperature (0, 1, 2).

6. Qwen_sentiment_analysis.ipynb: The Jupyter notebook containing the analysis and model-building code.
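
Since `data_loader.py` itself is not shown in this diff, here is a hypothetical sketch of what the sampling step might look like (the function name, sample size, and seed are illustrative):

```python
# A hypothetical sketch of the sampling step; the real data_loader.py is not
# shown in this diff, and the function name, sample size, and seed are illustrative.
import pandas as pd
from datasets import load_dataset

from config import SAMPLED_DATA_PATH


def sample_dataset(n: int = 500, seed: int = 42) -> pd.DataFrame:
    """Draw a random sample of n reviews from the IMDB train split and save it."""
    ds = load_dataset("ajaykarthick/imdb-movie-reviews", split="train")
    sample = ds.to_pandas().sample(n=n, random_state=seed)
    sample.to_csv(SAMPLED_DATA_PATH, index=False)
    return sample
```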


### Installation

To run this project, follow the steps below:

1. Clone the Project: Download the project as a zip file or clone it.

2. Set Up the Environment: Install the required Python packages using pip:

```bash
pip install -r requirements.txt
```

### Usage

### Setting Up a Virtual Environment and Running the Project

To run this project in a local Jupyter Notebook, follow these steps:

#### 1. Clone the repository

```bash
git clone https://github.com/zarakokolagar/sentiment_analysis_LLM
cd sentiment_analysis_LLM
```

#### 2. Install `virtualenv` (if not already installed)

If you don’t have `virtualenv` installed, you can install it using:

```bash
pip install virtualenv
```

#### 3. Create a virtual environment
Create a virtual environment by running the following command:
```bash
virtualenv venv
```
#### 4. Activate the virtual environment
On macOS/Linux:

```bash
source venv/bin/activate
```
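
On Windows (not covered in the original steps; assuming the default `virtualenv` layout):

```bash
venv\Scripts\activate
```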
#### 5. Install the required dependencies
With the virtual environment activated, install the dependencies listed in the `requirements.txt` file:

```bash
pip install -r requirements.txt
```

#### 6. Run Jupyter Notebook
If you don’t have Jupyter Notebook installed, you can install it with:
```bash
pip install notebook
```
Then, start Jupyter Notebook by running:

```bash
jupyter notebook
```

#### 7. Deactivate the virtual environment
Once you're done working, deactivate the virtual environment by running:
```bash
deactivate
```
**Note:**
Alternatively, you can run the notebook in [Google Colab](https://colab.research.google.com/drive/1UWiUKyRz0HgGtB_frLV6Kl0MMb7J25Kg#scrollTo=ihomXxB4Svsk&uniqifier=1).
If you run the code from Google Colab, all required libraries and data handling are included in the Colab notebook.
## License
Not required
16 changes: 16 additions & 0 deletions config.py
@@ -0,0 +1,16 @@
import os

# Define the base directory for the project
BASE_DIR = os.path.dirname(os.path.abspath(__file__))

# Define the data directory
DATA_DIR = os.path.join(BASE_DIR, 'data')

# Define the predictions directory
PRED_DIR = os.path.join(BASE_DIR, 'predictions')

# Define the plots directory for prediction plots
PLOT_DIR = os.path.join(BASE_DIR, 'plots')
os.makedirs(PLOT_DIR, exist_ok=True)

SAMPLED_DATA_PATH = os.path.join(DATA_DIR, 'sampled_dataset.csv')
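
A hypothetical snippet showing how these paths might be consumed elsewhere in the project (the notebook's actual usage is not part of this diff, and the file name under `predictions/` is illustrative):

```python
# Illustrative only: the file name under predictions/ is hypothetical.
import os

import pandas as pd

from config import PRED_DIR, SAMPLED_DATA_PATH

df = pd.read_csv(SAMPLED_DATA_PATH)  # the sampled reviews
step1_file = os.path.join(PRED_DIR, "step1_predictions.txt")
print(df.shape, step1_file)
```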
