Hi! I’m Chris, a data analyst and former math teacher. I specialize in data visualization, spatial analysis, and predictive modeling to drive decisions that make a positive impact on society. With a passion for solving complex problems, I turn data into actionable insights that empower organizations and communities.
Python | R | Tableau | Power BI | SQL | Excel | InDesign | Git
Data Analysis 2+ years
Data Visualization 2+ years
Project Management 5+ years
Data Systems & QA 1+ years
Reporting 2+ years
Spatial Analysis 2+ years
Predictive Modeling 2+ years
PROGRAM / DATA ANALYST - Michaels Energy
DATA/POLICY ANALYST & PROJECT MANAGER - Cook County Office of Research, Operations & Innovation
LEAD STEM INSTRUCTOR & DATA REPORTING SPECIALIST - TEACH, LLC.
ARTIFICIAL INTELLIGENCE/MACHINE LEARNING INSTRUCTOR - Kode with Klossy
HIGH SCHOOL MATH TEACHER - School District of Philadelphia

Data Science Certificate, Flatiron School Bootcamp

M.S. Education, University of Pennsylvania

B.S. Mathematics, University of Pittsburgh

OVERVIEW: This project analyzes Chicago’s crash data to identify factors influencing severe crashes, aiming to support Vision Zero initiatives and provide actionable insights for policymakers.
DATA: Over 900k records on crash incidents, 1.98 million on individuals, and 1.84 million on vehicles.
GOALS: Develop an inferential model to predict and identify factors leading to serious crashes, addressing challenges like class imbalance.
MODELING STEPS:
1. Data Cleaning: Handled missing values, duplicates, outliers, and ensured proper formatting.
2. Data Splitting: Prepared training, validation, and test sets to prevent data leakage.
3. Baseline Models: Built logistic regression and decision tree models for initial comparisons.
4. Addressing Class Imbalance: Applied SMOTE, class weighting, and their combinations to address imbalances.
5. Hyperparameter Tuning: Used GridSearchCV to find optimal parameters, optimizing for PR AUC.
6. Model Selection: Chose the decision tree for its interpretability and strong performance.
7. Model Evaluation: Focused on evaluating feature importances from the decision tree to gain insights into key predictors.
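Steps 4 and 5 rest on two pieces of arithmetic that are easy to check by hand. The sketch below is illustrative only and is not the project's code. It assumes the common "balanced" class-weight heuristic (n_samples / (n_classes * class_count)) and treats PR AUC as average precision. The labels, scores, and function names are made up for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def average_precision(y_true, y_score):
    """Mean of the precision values at each rank where a positive appears."""
    # Rank records from highest to lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical data: 1 = severe crash (rare class), 0 = not severe.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.25, 0.1, 0.9, 0.4]

weights = balanced_class_weights(y_true)
ap = average_precision(y_true, y_score)

print(weights)       # the rare class gets the larger weight
print(round(ap, 3))  # 0.833
```

With 8 negatives and 2 positives, the rare class is weighted 2.5 against 0.625 for the majority class, so each severe crash counts four times as much during training. Average precision rewards ranking the rare positives near the top, which is why it is often preferred over accuracy for imbalanced targets.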
- INSIGHT #1: Overrepresented factors in severe crashes include airbag deployment, light condition, and males.

- INSIGHT #2: The top contributing factors identified in severe crashes are airbag deployment, males, and seasonality. The feature importance output from the decision tree model highlights these variables as having the greatest influence on crash severity.

RECOMMENDATIONS:
1. Male Injury Trends:
- Conduct further research to analyze trends in male driver involvement in serious crashes.
- Develop targeted safety campaigns focused on behaviors like speeding and aggressive driving.
2. Vehicle Size and Speed Regulations:
- Implement city-specific tiered fees for heavier vehicles, using revenue to fund pedestrian and cyclist safety measures.
3. Annual Airbag Inspections:
- Promote policies requiring annual airbag inspections to ensure functionality and compliance with safety standards.
The full project can be viewed on my GitHub page: here
Completed as part of the UChicago Applied Data Fellowship (Cook County Office of Research, Operations & Innovation)
OVERVIEW:
This project involved developing a district-level dataset and reporting framework for Cook County by integrating multiple geographic and demographic data sources. The goal was to transform fragmented, ZIP-level and census-based data into consistent, accurate district-level metrics that could support countywide reporting and program evaluation.
CONTEXT & USE CASE:
Cook County departments often rely on geographic and demographic data to inform planning, reporting, and resource allocation. However, available data is frequently structured at different geographic levels (e.g., ZIP codes, census tracts), making it difficult to generate consistent district-level insights. This project addressed that gap by creating a standardized approach to aggregating and validating data across sources, enabling more reliable reporting and supporting data-informed decision-making.
DATA:
SOURCE: Cook County Offices Under the President, Cook County Health Department, and Cook County Forest Preserves
EXTERNAL SOURCES: U.S. Census Bureau (American Community Survey via API)
SCOPE: 20+ departments, 100+ performance and demographic metrics, covering all 17 Cook County districts
STRUCTURE: Data originated across multiple geographic levels (e.g., ZIP code, census-based metrics) and required transformation into consistent district-level datasets
METHODOLOGY:
Collected data from multiple sources using Python, including API-based retrieval from the Census
Cleaned and standardized datasets to ensure consistent variable definitions and geographic identifiers
Designed a transformation approach to convert ZIP-level data into district-level metrics using population-weighted aggregations
Performed data validation checks to identify inconsistencies across sources and ensure alignment between geographic units
Structured final datasets to support downstream reporting and visualization
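The population-weighted aggregation step can be sketched as follows. This is a minimal illustration and not the project's code: it assumes a crosswalk listing how many residents of each ZIP live in each district, and every ZIP label, district name, and value is hypothetical. It uses plain Python to stay self-contained; the same logic maps onto a pandas merge followed by a groupby.

```python
from collections import defaultdict

# Hypothetical ZIP-level metric (for example, a percentage).
zip_metric = {"ZIP_A": 10.0, "ZIP_B": 20.0, "ZIP_C": 40.0}

# Hypothetical crosswalk rows:
# (zip, district, residents of that ZIP who live in that district).
crosswalk = [
    ("ZIP_A", "District 1", 3000),
    ("ZIP_B", "District 1", 1000),
    ("ZIP_B", "District 2", 1000),
    ("ZIP_C", "District 2", 3000),
]


def population_weighted_mean(zip_metric, crosswalk):
    """Aggregate a ZIP-level rate to districts, weighting by overlapping population."""
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for zip_code, district, pop in crosswalk:
        if zip_code not in zip_metric:
            continue  # skip ZIPs with no reported value
        weighted_sum[district] += zip_metric[zip_code] * pop
        population[district] += pop
    return {d: weighted_sum[d] / population[d] for d in population if population[d] > 0}


district_metric = population_weighted_mean(zip_metric, crosswalk)
print(district_metric)  # {'District 1': 12.5, 'District 2': 35.0}
```

A simple unweighted average of the same ZIP values would give 15.0 and 30.0, which shows how a ZIP split across two districts can shift a reported metric when population is ignored.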
INSIGHTS:
Demonstrated that geographic misalignment (ZIP vs. district boundaries) can significantly impact reported metrics if not properly accounted for
Highlighted the importance of population-weighted aggregation when transforming data across geographic levels
Produced a consistent and reusable dataset that enabled clearer, more reliable district-level comparisons
KEY SKILLS:
Python (pandas, API data collection)
Data cleaning and validation
Data integration (multi-source datasets)
Geographic data processing
Aggregation and transformation methods
Click here to view all 17 District Overview Reports.
OVERVIEW:
Conducted a geospatial analysis using ArcGIS to identify the five most dangerous school zones in Pasadena, CA for pedestrian and bicycle accidents. The analysis supports data-driven policy interventions, such as implementing school slow zones to improve student safety. Findings were compiled in an ArcGIS app. This project was developed based on an Esri tutorial and adapted to emphasize policy-relevant insights.
Note: Due to licensing restrictions with my organization's ArcGIS account, the interactive map and app are not publicly shareable.
DATA:
Source: Traffic collision data from the City of Pasadena
Scope: Over 26k reported traffic collisions across a 16-year span
Filtered Dataset: 2,441 incidents involving pedestrians and cyclists
GOALS:
Identify school zones with the highest rates of pedestrian and cyclist injuries
Support targeted safety improvements using spatial data
Provide city planners and school officials with a tool for prioritization
STEPS:
DATA FILTERING: Focused on incidents involving pedestrians and cyclists only. Cleaned and visualized 2,441 incidents.
GEOSPATIAL ANALYSIS: Applied point clustering, heat maps, and hot spot analysis to visualize accident concentrations.
CONTEXTUAL LAYERS: Added Pasadena Unified School District (PUSD) locations and generated half-mile walkable zones around each school.
PRIORITIZATION: Used spatial joins and summary statistics to count accidents per school zone. Filtered to highlight the top 5 most dangerous zones based on total number of incidents.
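The prioritization step amounts to counting incidents that fall inside each school zone and ranking the zones. The sketch below is a rough stand-in for that spatial join and is not the ArcGIS workflow used in the project. It assumes each zone is a straight-line half-mile radius around the school (a walkable zone built from the street network would differ), and the school names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def count_incidents_per_zone(schools, incidents, radius_miles=0.5):
    """Count incidents within radius_miles of each school (zones may overlap)."""
    counts = {name: 0 for name in schools}
    for lat, lon in incidents:
        for name, (s_lat, s_lon) in schools.items():
            if haversine_miles(lat, lon, s_lat, s_lon) <= radius_miles:
                counts[name] += 1
    return counts


# Hypothetical school locations and incident points.
schools = {"School A": (34.150, -118.140), "School B": (34.170, -118.110)}
incidents = [
    (34.151, -118.141),  # close to School A
    (34.149, -118.139),  # close to School A
    (34.171, -118.111),  # close to School B
    (34.200, -118.200),  # outside both zones
]

counts = count_incidents_per_zone(schools, incidents)
top_zones = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(top_zones)  # [('School A', 2), ('School B', 1)]
```

Because zones can overlap, a single incident may be counted for more than one school, so per-zone totals should be read as counts within each zone and not as a partition of all incidents.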
KEY INSIGHTS:
Disproportionate Risk: Just five school zones account for 757 pedestrian and cyclist collisions (over 30% of all such incidents citywide) despite covering a small geographic area. These findings underscore the need for targeted, place-based interventions.
High-Risk Schools Identified: McKinley School (205 incidents), Madison Elementary (196), Rose City High (151), Washington Elementary School (112), Longfellow Elementary (93)
Geographic Clustering: All five zones are located in central/northwest Pasadena, enabling localized, concentrated policy efforts.
POLICY RECOMMENDATIONS:
Implement School Slow Zones in the five highest-risk areas, including traffic calming measures, speed limit enforcement, and enhanced signage.
Prioritize Infrastructure Upgrades such as protected bike lanes, speed cushions, and intersection daylighting.
Deploy crossing guards and create highly visible crosswalks near the entrances of the five identified schools.
Use this Model as a Framework: The spatial analysis approach can be replicated for other risk-prone facilities like parks or transit hubs.
OVERVIEW:
Explored the relationship between youth gun deaths and high school graduation rates across Chicago ZIP codes using publicly available data (2018–2021). Conducted a hypothesis test and regression analysis and created data visualizations, revealing a statistically significant inverse relationship between graduation rates and juvenile gun deaths.
DATA:
Mortality Data: Juvenile (under 18) gun-related deaths in Chicago (2018–2021).
Education Data: 4-year high school graduation rates by ZIP code (2018–2021).
GOALS:
Explore the relationship between youth gun deaths and high school graduation rates by:
1. Conducting a hypothesis test to determine the statistical significance of the relationship.
2. Performing regression analysis to quantify the strength and direction of this relationship.
3. Visualizing the data using maps and scatterplots to illustrate the relationship both geographically and quantitatively.
KEY INSIGHTS:
The relationship between graduation rates and juvenile gun deaths is inverse: as graduation rates increase, juvenile gun deaths decrease.
The relationship is statistically significant at the 0.05 level (p = 0.0216).
For every 1 percentage point increase in the graduation rate, there is an associated decrease of 0.349 juvenile gun deaths per 100K youth.
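To show how a slope like the one above is read, here is a minimal least-squares sketch. It is not the project's analysis: the five data points are hypothetical and chosen only to make the arithmetic easy to follow, so the resulting slope differs from the reported 0.349.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical ZIP-level values, not the project's data.
grad_rate = [70.0, 75.0, 80.0, 85.0, 90.0]        # graduation rate, percent
deaths_per_100k = [12.0, 10.5, 8.0, 7.5, 5.0]     # juvenile gun deaths per 100K youth

slope, intercept = fit_line(grad_rate, deaths_per_100k)
print(round(slope, 3))      # -0.34
print(round(intercept, 1))  # 35.8
```

A negative slope means higher graduation rates go with fewer deaths per 100K in this toy data. As with any regression on observational data, the slope describes an association across ZIP codes and does not by itself establish cause.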
The full project can be viewed on my GitHub page: here
Completed as part of my role as Data Reporting Specialist for TEACH, LLC.

OVERVIEW:
Developed a 5-page data report analyzing the effectiveness of the ACT Prep program over 8 years, summarizing key trends, improvements in scores, and national comparisons.
DATA:
Analyzed data from 8 cohorts, serving 156 students with an average enrollment of 19.5 students per cohort.
INSIGHTS:
93% of students achieved growth
On average, students’ national percentiles increased by 17 points, rising from the 22nd to the 39th percentile during the ACT Prep class


The full report can be viewed: here
OVERVIEW:
Developed an interactive Tableau dashboard to explore the relationship between wealth and trees in New York City.
DATA:
Income data across NYC’s 5 boroughs.
Tree census data from over 680k trees
The full dashboard can be viewed on my Tableau Public page: here