Portfolio

Welcome to my data science portfolio, which showcases a collection of my projects in Python and R covering machine learning, data analysis, and visualization. In this portfolio, I have worked with various datasets, from predicting forest fires and heart disease to building a spam filter and analyzing New York City school data. My goal is to share my passion for data science and show how it can be used to solve real-world problems. Each project demonstrates a different aspect of data science, and I hope they inspire you to explore the possibilities of this field.

Handwritten Digits Classifier

Over the last decade, deep neural networks have reached state-of-the-art performance on image classification tasks. This project explores why image classification is a hard task and where traditional machine learning models fall short.


Predict Forest Fires

Applying a standard linear regression model to predict the extent of fire damage to a forest. The data comes from the Forest Fires dataset in the UCI Machine Learning Repository.


Predicting Employee Productivity

Satisfying the huge global demand for garment products is mostly dependent on the production and delivery performance of the employees in the garment manufacturing companies.


Classifying Heart Disease

Applying a logistic regression model to a real-life dataset, the Heart Disease Data Set from the UCI Machine Learning Repository, to predict heart disease and show how machine learning can help solve problems that have a real impact on people’s lives.


Crowdedness in the Gym

Creating a model that can predict how many people will be at the gym on a particular day and at a particular time. That way, I will be able to enjoy my exercise routine without waiting.


Predicting Insurance Costs

This dataset contains information on individual medical insurance bills, each associated with demographic and personal characteristics of the person who received it.


Credit Card Customer Segmentation

Given a dataset containing information about the company’s clients, the task is to segment them into different groups so that a different business strategy can be applied to each type of customer.


Predicting Heart Disease

Build a K-Nearest Neighbors classifier to predict the likelihood of a patient developing heart disease in the future. Identifying risk factors early is essential to prevent the onset of cardiovascular diseases (CVDs) and reduce premature deaths.


Winning Jeopardy

Jeopardy is a popular TV show in the US where participants answer questions to win money. I work with a dataset of Jeopardy questions to find patterns in the questions that could help a contestant win.


Spam Filter

Build a spam filter for SMS messages using the multinomial Naive Bayes algorithm. Our goal is to write a program that classifies new messages with an accuracy greater than 80%.


Best Markets to Advertise

By leveraging relevant data sources and analytical tools, we aim to provide actionable insights to inform the company’s advertising decision-making process.


Exchange Rates

This project focuses on explanatory data visualization and on using information design principles (familiarity and maximizing the data-ink ratio) to create better graphs for an audience.


Python

Machine Learning

Building A Handwritten Digits Classifier Deep Learning Zhiwen Shi, 2023-04-24

Optimizing Model Prediction: Predict Forest Fires Zhiwen Shi, 2023-04-21

Tree Models: Predicting Employee Productivity Zhiwen Shi, 2023-04-15

Logistic Regression Modeling: Classifying Heart Disease Zhiwen Shi, 2023-04-10

Stochastic Gradient Descent: Crowdedness in the Gym Zhiwen Shi, 2023-04-02

Linear Regression Modeling: Predicting Insurance Costs Zhiwen Shi, 2023-03-28

K-Means Algorithm: Credit Card Customer Segmentation Zhiwen Shi, 2023-03-22

K-Nearest Neighbors Classifier: Predicting Heart Disease Zhiwen Shi, 2023-03-14

Hypothesis Testing: Jeopardy Questions Zhiwen Shi, 2023-03-07

Building a Spam Filter with Naive Bayes Zhiwen Shi, 2023-03-02

Public Data Sources

Finding the Best Two Markets to Advertise In Zhiwen Shi, 2023-02-27

Storytelling Data Visualization on Exchange Rates Zhiwen Shi, 2023-02-22

Popular Data Science Questions Zhiwen Shi, 2023-02-13

Is Fandango Still Inflating Ratings? Zhiwen Shi, 2023-02-05

Analyzing New York City School Data Zhiwen Shi, 2023-01-28

Star Wars Survey Zhiwen Shi, 2023-01-23

Answering Business Questions Using SQL Zhiwen Shi, 2023-01-15

Analyzing CIA Factbook Data Using SQL Zhiwen Shi, 2023-01-03

Clean and Analyze Employee Exit Surveys Zhiwen Shi, 2022-12-12

Indicators of Heavy Traffic on I-94 Zhiwen Shi, 2022-12-05

Mobile App for Lottery Addiction Zhiwen Shi, 2022-11-30

eBay Car Sales Data Exploration Zhiwen Shi, 2022-11-25

Hacker News Zhiwen Shi, 2022-11-20

Prison Break Zhiwen Shi, 2022-11-15

Profitable App Profiles Zhiwen Shi, 2022-11-11

R

NYC Property Sales Zhiwen Shi, 2022-11-04

Predicting Car Prices Zhiwen Shi, 2022-10-27

Jeopardy Zhiwen Shi, 2022-10-23

BestMarket Zhiwen Shi, 2022-10-21

Spam Zhiwen Shi, 2022-10-22

Lottery Zhiwen Shi, 2022-10-22

Fandango Movie Ratings Zhiwen Shi, 2022-10-20

Movie Ratings Zhiwen Shi, 2022-10-19

NY Solar Resource Zhiwen Shi, 2022-10-18

Chinook Zhiwen Shi, 2022-10-15

NYC schools Zhiwen Shi, 2022-10-13

Forestfires Zhiwen Shi, 2022-10-12

Book reviews Zhiwen Shi, 2022-10-11

Book sales data Zhiwen Shi, 2022-10-11

COVID19 Zhiwen Shi, 2022-10-10

