- Design and refine prompts to achieve maximum accuracy in sentiment analysis
- Implement local inference for SLMs using Python
- Compare the performance and efficiency of two SLMs of different sizes
- Analyze results and draw meaningful insights
The IMDB movie review dataset is accessed through Hugging Face. It contains around 50k samples: 40k in the 'train' split and 10k in the 'test' split.
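A minimal sketch of loading it with the `datasets` library (assuming the standard `imdb` identifier on the Hub; the notebook's own loading code may differ):

```python
# Sketch: load the IMDB reviews from Hugging Face with the `datasets` library.
from datasets import load_dataset

dataset = load_dataset("imdb")      # DatasetDict with the available splits
print(dataset)                      # inspect split names and sizes

example = dataset["train"][0]
print(example["text"][:200])        # first 200 characters of one review
print(example["label"])             # 0 = negative, 1 = positive
```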
The two models compared are:
- bartowski/Qwen2.5-1.5B-Instruct-GGUF: A lightweight 1.5B-parameter instruction-tuned model with enhanced capabilities in coding, mathematics, and handling structured data.
- bartowski/Qwen2.5-0.5B-Instruct-GGUF: A model with 500M parameters, a third the size of its 1.5B counterpart, yet still highly capable for many tasks.
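GGUF checkpoints such as these are typically run locally with llama-cpp-python. The sketch below is an assumption about the setup, not the repository's actual inference code; the quantization filename pattern and generation settings are illustrative:

```python
# Hedged sketch of local GGUF inference with llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/Qwen2.5-0.5B-Instruct-GGUF",
    filename="*Q4_K_M.gguf",   # assumed quantization; pick any GGUF file in the repo
    n_ctx=2048,                # context window large enough for a full review
    verbose=False,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "Classify the sentiment of the review as 'positive' or 'negative'. Answer with one word."},
        {"role": "user",
         "content": "This movie was a complete waste of time."},
    ],
    temperature=0,   # deterministic output suits classification
    max_tokens=4,
)
print(response["choices"][0]["message"]["content"])
```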
The project is organized as follows:
sentiment_analysis/
├── config.py
├── data_loader.py
├── data/
│   └── sampled_dataset.csv
├── predictions/
├── plots/
├── Qwen_sentiment_analysis.ipynb
├── requirements.txt
└── README.md
- config.py: Contains configurations such as directory paths and dataset locations.
- data_loader.py: Contains the dataset sampling code (a sketch of the balanced-sampling idea follows this list).
- data/: Directory containing the sampled dataset used in the project.
- predictions/: Contains 8 text files of model predictions, one per prompting step for each model.
- plots/: Contains 4 plots, one per prompting step, showing each model's accuracy averaged over 3 runs at each temperature (0, 1, 2).
- Qwen_sentiment_analysis.ipynb: The Jupyter notebook containing the analysis and model-building code.
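The actual implementation lives in data_loader.py; below is a hedged sketch of a balanced sampling strategy consistent with the 1000-sample, 500-per-class setup described later (the column names, seed, and output path are assumptions):

```python
# Hedged sketch of balanced sampling in the spirit of data_loader.py;
# the repository's actual logic may differ.
from datasets import load_dataset

# Load the train split and convert to a pandas DataFrame.
df = load_dataset("imdb", split="train").to_pandas()

# Draw 500 reviews per class, then shuffle with a fixed seed for reproducibility.
sampled = (
    df.groupby("label")
      .sample(n=500, random_state=42)    # 500 positive + 500 negative
      .sample(frac=1, random_state=42)   # shuffle the combined 1000 rows
      .reset_index(drop=True)
)
sampled.to_csv("data/sampled_dataset.csv", index=False)
```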
To run this project in a local Jupyter Notebook, follow these steps:
- Clone the repository:
git clone https://github.com/zarakokolagar/sentiment_analysis_LLM
cd sentiment_analysis_LLM
- Install virtualenv if you don't already have it:
pip install virtualenv
- Create a virtual environment:
virtualenv venv
- Activate the virtual environment. On macOS/Linux:
source venv/bin/activate
On Windows:
venv\Scripts\activate
- With the virtual environment activated, install the dependencies listed in requirements.txt:
pip install -r requirements.txt
- If you don't have Jupyter Notebook installed, install it with:
pip install notebook
- Start Jupyter Notebook:
jupyter notebook
- Once you're done working, deactivate the virtual environment:
deactivate
Note: Alternatively, you can run the notebook on Google Colab. If you run the code from Colab, all required libraries and data handling are included in the Colab notebook.
- Sample Size: Due to computational constraints, I was only able to test the models on 1000 samples (500 per class). While this provides an initial indication of model performance, predictions may fluctuate on larger datasets. I addressed this to some extent by using a more diverse sampling strategy to ensure broader coverage of examples.
- Execution Environment: Model inference was conducted on Google Colab, linked from the README.md in my GitHub repository; the code is available in both the README and the Jupyter notebook. Because Colab has personal-account restrictions and occasional connection issues, I saved the prediction results after each run to ensure stability. Plotting is also based on these saved predictions, since a dropped connection could wipe cell outputs. Please refer to the plots directory for the visualized results (a sketch of recomputing accuracy from the saved prediction files appears after this list).
- Parameter Tuning: I focused exclusively on experimenting with different temperature settings (see the sweep sketch after this list) and did not explore the influence of parameters like top-p and top-k. This decision was made for several reasons:
a. Simplicity and Clarity: By controlling only the temperature, I ensured that the results are easier to interpret. Introducing multiple varying parameters could make it harder to attribute performance changes to any one factor.
b. Task Relevance: The temperature parameter directly impacts the randomness and creativity of model outputs, which is closely related to sentiment prediction tasks. In contrast, top-p and top-k are more suited for tasks requiring diverse content generation.
c. Resource Efficiency: Given the focus on smaller language models, reducing the parameter space helped minimize the computational load and experimentation time.
d. Comparability: Keeping the scope to temperature adjustments allowed for a clearer comparison between models of different sizes without adding complexity from other variables.
- Prompting Approach: Since the step 4 prompting approach yielded stable and satisfactory results for both models, I opted to stop further prompt experimentation. However, additional prompting techniques could be explored to ensure robustness in real-world applications: cross-validation across different data domains, error analysis of edge cases, and A/B testing with alternative prompt designs would be necessary to refine the model's reliability for production-level sentiment analysis. (A generic prompt template is sketched after this list for illustration.)
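Because plotting works off the saved prediction files, accuracy can be recomputed from them at any time. A minimal sketch, assuming one gold/predicted pair per line (the real files in predictions/ may use a different layout):

```python
# Hedged sketch: recompute accuracy from saved prediction files.
from pathlib import Path

def accuracy_from_file(path: Path) -> float:
    """Assumes one 'gold<TAB>predicted' pair per line; the real files may differ."""
    pairs = [line.split("\t") for line in path.read_text().splitlines() if line.strip()]
    return sum(gold.strip() == pred.strip() for gold, pred in pairs) / len(pairs)

# Print accuracy for every saved prediction file.
for path in sorted(Path("predictions").glob("*.txt")):
    print(f"{path.name}: {accuracy_from_file(path):.3f}")
```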
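The temperature sweep itself is straightforward to express. Below is a hedged sketch of averaging accuracy over 3 runs per temperature; `classify`, `reviews`, and `gold_labels` are hypothetical stand-ins, not names from the notebook:

```python
# Hedged sketch of the temperature sweep: 3 runs per temperature, mean accuracy.
import random
import statistics

def classify(review: str, temperature: float) -> str:
    """Hypothetical stand-in: replace with the llama-cpp-python call sketched earlier."""
    return random.choice(["positive", "negative"])

reviews = ["Great film!", "Terrible pacing."]   # placeholder for the 1000-review sample
gold_labels = ["positive", "negative"]

temperatures = [0, 1, 2]
runs_per_temperature = 3

for temp in temperatures:
    accuracies = []
    for _ in range(runs_per_temperature):
        predictions = [classify(r, temperature=temp) for r in reviews]
        accuracies.append(sum(p == g for p, g in zip(predictions, gold_labels)) / len(predictions))
    print(f"temperature={temp}: mean accuracy over {runs_per_temperature} runs = "
          f"{statistics.mean(accuracies):.3f}")
```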
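For readers who want a concrete starting point, here is a generic instruction-style sentiment prompt. This is purely illustrative; the actual step-by-step prompts are defined in Qwen_sentiment_analysis.ipynb and may differ substantially:

```python
# Illustrative only: not the step-4 prompt from the notebook.
PROMPT_TEMPLATE = (
    "You are a sentiment classifier for movie reviews.\n"
    "Classify the review below as exactly one of: positive, negative.\n"
    "Respond with the label only.\n\n"
    "Review: {review}\n"
    "Label:"
)

print(PROMPT_TEMPLATE.format(review="A stunning film with a hollow ending."))
```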