This project analyzes the Kaggle dataset titled "Heart Attack Analysis & Prediction Dataset." It starts with an initial exploration of the data and then moves on to classification. The goal is to predict, from the features listed in the dataframe, whether an individual is likely to have a heart attack.
To run this project, you'll need the following Python libraries:
- Pandas
- NumPy
- Matplotlib
- Seaborn
- scikit-learn
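
As a quick sanity check before running the notebooks, the snippet below is a minimal sketch that reports which of the listed libraries are not importable in the current environment. The helper name `missing_modules` is hypothetical and not part of this project; the only mapping worth noting is that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the libraries listed above.
# Note: the scikit-learn package is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment.

    Hypothetical helper for illustration; it only checks that each module
    can be located, it does not import it or check its version.
    """
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Anything reported as missing can then be installed with your usual package manager.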
The dataset contains the following columns:
- Age: Age of the patient
- Sex: Sex of the patient
- cp: Chest pain type (Values: 1 - typical angina, 2 - atypical angina, 3 - non-anginal pain, 4 - asymptomatic)
- trtbps: Resting blood pressure (in mm Hg)
- chol: Cholesterol in mg/dL fetched via BMI sensor
- fbs: Fasting blood sugar > 120 mg/dL (1 for true, 0 for false)
- restecg: Resting electrocardiographic results (Values: 0 - normal, 1 - having ST-T wave abnormality, 2 - showing probable or definite left ventricular hypertrophy)
- thalach: Maximum heart rate achieved
- exng: Exercise-induced angina (1 for yes, 0 for no)
- oldpeak: ST depression induced by exercise relative to rest
- slp: The slope of the peak exercise ST segment (0 - upsloping, 1 - flat, 2 - downsloping)
- caa: Number of major vessels (0-3)
- thall: Thalassemia
- target: Heart attack prediction (0 for lower chance of heart attack, 1 for higher chance of heart attack)
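
Several of these columns store integer codes rather than readable labels. The sketch below shows one way to translate the codes back into the descriptions given above, which can make exploratory tables and plot legends easier to read. The names `CODE_LABELS` and `decode_record`, and the example record, are hypothetical and used only for illustration; the code-to-label mappings are copied from the column list, and you should confirm that the codes in your copy of the CSV match them before relying on the output.

```python
# Code-to-label mappings copied from the column descriptions above.
# Assumption: the values in the CSV follow this documented coding.
CODE_LABELS = {
    "cp": {
        1: "typical angina",
        2: "atypical angina",
        3: "non-anginal pain",
        4: "asymptomatic",
    },
    "restecg": {
        0: "normal",
        1: "ST-T wave abnormality",
        2: "probable or definite left ventricular hypertrophy",
    },
    "fbs": {0: "<= 120 mg/dL", 1: "> 120 mg/dL"},
    "exng": {0: "no", 1: "yes"},
}


def decode_record(record):
    """Return a copy of `record` with known coded values replaced by labels.

    Columns without a mapping, and codes that are not in the mapping,
    are passed through unchanged.
    """
    decoded = {}
    for column, value in record.items():
        labels = CODE_LABELS.get(column, {})
        decoded[column] = labels.get(value, value)
    return decoded


# Hypothetical patient record, not taken from the dataset.
example = {"cp": 2, "fbs": 0, "restecg": 1, "exng": 1, "thalach": 150}
print(decode_record(example))
```

With a pandas dataframe, the same dictionaries can be applied per column, for example `df["cp"].map(CODE_LABELS["cp"])`.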
- Objectives
- Setup
- Data Description
- Tools and Functions
- Exploratory Data Analysis and Feature Engineering
- Data Preprocessing
- Model Evaluation
- Hyper Parameter Tuning
- Final Model
- Conclusions