Stochastic Gradient Descent (SGD)-based Issue Report Classifier

This repository contains our code and documentation for participation in the NLBSE'23 Tool Competition.

Data Set

We used issue report data from real open-source projects, made available by Kallis et al. (2023) (https://doi.org/10.1007/978-3-031-21388-5_34) for the NLBSE'23 Tool Competition.

Training Data: 1,275,881 issue reports

Testing Data: 142,320 issue reports

Steps to run the code

Step-1: Get data

Training Data: https://tickettagger.blob.core.windows.net/datasets/nlbse23-issue-classification-train.csv.tar.gz

Testing Data: https://tickettagger.blob.core.windows.net/datasets/nlbse23-issue-classification-test.csv.tar.gz
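
The download step can also be scripted. The following is a minimal sketch, not part of the repository: the helper names download and extract and the data/ target directory are assumptions made for illustration, and only the two URLs come from this README.

```python
import tarfile
import urllib.request
from pathlib import Path

# URLs as listed in this README.
BASE_URL = "https://tickettagger.blob.core.windows.net/datasets/"
ARCHIVES = [
    "nlbse23-issue-classification-train.csv.tar.gz",
    "nlbse23-issue-classification-test.csv.tar.gz",
]


def download(url, dest_dir="data"):
    """Download url into dest_dir (hypothetical helper); skip files that already exist."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


def extract(archive_path, dest_dir="data"):
    """Unpack a .tar.gz archive into dest_dir and return the member names."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(str(dest_dir))
    return names


# Usage (needs network access and enough disk space for both archives):
# for name in ARCHIVES:
#     print(extract(download(BASE_URL + name)))
```

The usage lines are left commented out so that importing or running the snippet does not start a large download by accident.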

Step-2: Install

Install scikit-learn and gensim. On Windows, install them with the following commands: pip install scikit-learn and pip install gensim. (The package is imported as sklearn, but its name on PyPI is scikit-learn.)

Step-3: Download

git clone https://github.com/laiqujan/sgd-based-issue-classification.git

cd sgd-based-issue-classification

Step-4: Run

Open SGD-based Issue Classification.ipynb in Jupyter Notebook, execute all cells, and check the results.

Classifier

We implemented an SGDClassifier with the following parameters: SGDClassifier(loss='hinge', penalty='l2', alpha=0.000001, random_state=42, max_iter=20, tol=0.001). Additional hyperparameters can be tried; see the scikit-learn SGDClassifier documentation for the full list.
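
To show how these parameters fit into a training run, here is a minimal sketch on a handful of made-up issue titles. The toy texts, the label names, and the use of make_pipeline with a default TfidfVectorizer are assumptions for illustration; the notebook's actual data loading and feature settings may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Made-up toy issues and labels, for illustration only
# (not taken from the competition data).
train_texts = [
    "App crashes with a null pointer exception on startup",
    "Segmentation fault when saving a file",
    "Please add dark mode support",
    "Add an option to export reports as PDF",
    "How do I configure the proxy settings?",
    "What is the recommended way to run the tests?",
]
train_labels = ["bug", "bug", "feature", "feature", "question", "question"]

# Same hyperparameters as listed above: hinge loss (a linear SVM objective)
# with a small L2 penalty, trained for at most 20 passes over the data.
classifier = SGDClassifier(
    loss="hinge", penalty="l2", alpha=0.000001,
    random_state=42, max_iter=20, tol=0.001,
)

# Chain TF-IDF features and the classifier so raw text can be passed in.
model = make_pipeline(TfidfVectorizer(), classifier)
model.fit(train_texts, train_labels)

new_issues = ["Crash when opening the settings page", "Add support for CSV export"]
predictions = model.predict(new_issues)
print(list(zip(new_issues, predictions)))
```

With so few training examples the predictions are not meaningful; the point is only the shape of the fit and predict calls. scikit-learn may emit a ConvergenceWarning if the stopping tolerance is not reached within max_iter passes.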

Pre-processing

We followed standard preprocessing steps: data cleaning followed by vectorization. Data cleaning is done mainly with Gensim (see the preprocess(text) function in the notebook), after which we applied TfidfVectorizer.
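
The cleaning-then-vectorization flow can be sketched as follows. Note that this is not the notebook's preprocess(text) function: the version below is a hypothetical stand-in that uses only Python's re module instead of Gensim, and the two sample documents are made up.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer


def preprocess(text):
    """Hypothetical stand-in for the notebook's Gensim-based cleaning."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML-like tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits and punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


# Made-up sample documents, for illustration only.
docs = [
    "The app <b>crashes</b> on startup!!!",
    "Add dark-mode support, see https://example.com/issue/42",
]

# Passing the function as `preprocessor` makes TfidfVectorizer clean each
# document before tokenizing it.
vectorizer = TfidfVectorizer(preprocessor=preprocess)
features = vectorizer.fit_transform(docs)

print(features.shape)                  # (number of documents, vocabulary size)
print(sorted(vectorizer.vocabulary_))  # terms that survived cleaning
```

The resulting sparse matrix is what a linear classifier such as SGDClassifier takes as input. Supplying a custom preprocessor replaces TfidfVectorizer's built-in lowercasing, which is why the function lowercases the text itself.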
