This repository contains a portfolio of data science projects I completed for academic, self-learning, and hobby purposes, presented as Jupyter notebooks.
- Email: [email protected]
- LinkedIn: linkedin.com/in/luis-okech
Financial Inclusion: A model to predict which individuals are most likely to have or use a bank account. The models and solutions developed can provide an indication of the state of financial inclusion in Kenya, Rwanda, Tanzania and Uganda, while providing insights into some of the key factors driving individuals’ financial security.
Which Debts are Worth the Bank's Effort: Uses statistical analysis to develop a model or strategy that helps the bank prioritize debt recovery efforts by predicting the likelihood of successful recovery and the potential recovery value. The model helps identify which debts are worth pursuing based on factors such as the expected vs. actual recovery amount, the applied recovery strategy, and demographic variables (age, sex).
Car Price Prediction: The goal is to model car prices using the available independent variables. Management can use the model to understand how prices vary with those variables and adjust car design, business strategy, and so on to meet target price levels.
Urban Air Pollution Challenge: A model to predict air quality in cities around the world using satellite data. The goal is to predict the daily PM2.5 particulate matter concentration (a common measure of air quality that normally requires ground-based sensors) for each city. The data covers the last three months, spanning hundreds of cities across the globe.
Tools: scikit-learn, Pandas, Seaborn, Matplotlib
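The debt-prioritization idea described above (likelihood of recovery combined with potential recovery value) can be illustrated with a minimal sketch. It assumes the recovery probability and recovery amount have already been predicted by some fitted model; the `Debt` class, its field names, the threshold, and the sample figures are hypothetical and are not taken from the notebook.

```python
from dataclasses import dataclass


@dataclass
class Debt:
    debt_id: str                 # hypothetical identifier
    recovery_probability: float  # assumed model output, in [0, 1]
    potential_recovery: float    # assumed predicted recovery amount


def expected_recovery(debt: Debt) -> float:
    """Expected value of pursuing a debt: probability times amount."""
    return debt.recovery_probability * debt.potential_recovery


def prioritize(debts, min_expected=0.0):
    """Keep debts whose expected recovery meets the threshold, best first."""
    worth_pursuing = [d for d in debts if expected_recovery(d) >= min_expected]
    return sorted(worth_pursuing, key=expected_recovery, reverse=True)


# Illustrative figures only.
debts = [
    Debt("A", 0.9, 1000.0),   # expected recovery 900
    Debt("B", 0.2, 10000.0),  # expected recovery 2000
    Debt("C", 0.5, 500.0),    # expected recovery 250
]

ranked = prioritize(debts, min_expected=300.0)
print([d.debt_id for d in ranked])  # ['B', 'A']
```

Ranking by expected value is only one possible strategy; a real analysis would also need to account for the cost of each recovery strategy.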
Airline Tweets Sentiment Analysis: Creating a model to classify tweets into categories such as positive, negative, and neutral to understand how customers feel about their experiences with different airlines.
Covid-19 tweet classification: Creating a model for two-way polarity (positive, negative) classification of Covid-19 tweets.
chatgpt-ml-challenge-swahili-news-classification: Create a multi-class classification model to classify news content according to six specific categories. The model can be used by Swahili online platforms to automatically group news by category and help readers find the specific news they want to read. In addition, the model will contribute to a body of work ensuring that Swahili is represented in apps and other online products in the future.
Tools: NLTK, scikit-learn
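Tweet classification projects like the ones above usually begin by normalizing the raw text before any features are extracted. The sketch below shows one plausible cleaning step using only the standard library; the regular expressions, the function name `tokenize_tweet`, and the sample tweet are assumptions for illustration, not the preprocessing actually used in the notebooks.

```python
import re

# Assumed cleaning rules: drop URLs and @mentions, keep lowercase word tokens.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize_tweet(text):
    """Lowercase a tweet, strip URLs and mentions, and return word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return TOKEN_RE.findall(text)


# Made-up example tweet.
tweet = "@AirlineX Flight delayed AGAIN... not happy! https://example.com/x"
print(tokenize_tweet(tweet))  # ['flight', 'delayed', 'again', 'not', 'happy']
```

The resulting tokens could then be passed to a vectorizer and classifier of choice.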
Sea Turtle Face Detection: Create a machine-learning model that takes in an image of a sea turtle and outputs the position of a bounding box around the turtle's facial scale pattern. This will be used to develop a tool that can crop a given image to show only the important facial region, reducing the chances of an accidental match down the line.
Spot the mask: Create an image classification machine learning model to accurately predict the likelihood that an image contains a person wearing a face mask. The machine learning solution will help policymakers, law enforcement, hospitals, and even commercial businesses ensure that masks are being worn appropriately in public. These solutions can help in the battle to reduce community transmission of COVID-19.
Road Segment Identification: Create an image classification machine learning model to identify whether an image contains a road segment or not. Dry river beds, railway tracks and power lines could look like roads. This will allow government officials to focus on areas they might need to send an official to confirm the road placement, and add it to the government’s maps and road networks.
Crop Damage Classification: Create a machine-learning algorithm to classify crops into categories: Good growth (G), Drought (DR), Nutrient Deficient (ND), Weed (WD), and Other (including pest, disease or wind damage). Images can then be fed into the model to indicate what type of damage a crop has experienced and whether it needs to be evaluated for an insurance payout.
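For the Sea Turtle Face Detection project above, the cropping step that follows bounding-box prediction can be sketched as below. This is an illustration under stated assumptions: the image is represented as a plain list of pixel rows, the box is given as `(x, y, width, height)` in pixels, and the function name `crop_to_box` is hypothetical. The notebook may use a different box format or an image library.

```python
def crop_to_box(image, box):
    """Crop a 2D image (list of rows) to a bounding box.

    box is (x, y, width, height) in pixels; this format is an assumption.
    The box is clamped to the image bounds before cropping.
    """
    x, y, w, h = box
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return [row[x0:x1] for row in image[y0:y1]]


# Toy 4x5 "image" whose pixel value encodes its row and column.
image = [[10 * r + c for c in range(5)] for r in range(4)]
print(crop_to_box(image, (1, 1, 3, 2)))  # [[11, 12, 13], [21, 22, 23]]
```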