Hi, I'm

Keval Rakholiya

Data & AI Engineer. I build machine learning models, GenAI pipelines, and analytics systems that turn messy data into real business decisions.
00 — Intro

About Me

Data professional with cross-functional expertise across data science, analytics, business intelligence, and data engineering. Currently working in marketing analytics at Genesis Motor America, delivering insights across customer behavior, campaign performance, and revenue optimization. AWS Certified Data Engineer – Associate, with hands-on experience designing cloud-based data pipelines and analytics architectures. Brings a hybrid mindset combining data science modeling, business analysis, and engineering execution to solve complex business problems and deliver measurable impact.

01 — Career

Work Experience

Where I've worked and the impact I've made.

Present
Genesis Motor America

Senior Associate, Marketing Analytics

📍 Fountain Valley, CA

Nov 2025 – Present · Full-Time

Leading cross-functional analytics initiatives across the full marketing stack — integrating Adobe Analytics, GA4, CRM, and vehicle sales data into a unified measurement framework. Engineered scalable SQL pipelines and Python ETL workflows to consolidate multi-source marketing data, enabling consistent KPI tracking across owned, earned, and paid channels. Built propensity and attribution models using XGBoost and scikit-learn to predict customer purchase intent and allocate budget with greater precision. Designed A/B testing frameworks with statistical significance validation to evaluate creative and media strategies, directly informing multi-million dollar campaign decisions. Delivered executive-facing Power BI and Tableau dashboards surfacing real-time revenue trends, customer funnel metrics, and audience segmentation insights to VP-level stakeholders.
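
As a rough illustration of what "statistical significance validation" can look like for an A/B test on conversion rates, here is a minimal two-proportion z-test using only the Python standard library. It is a sketch under stated assumptions, not the framework described above: the function name, the sample counts, and the choice of a pooled two-sided z-test are all illustrative.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_* are conversion counts and n_* are sample sizes (hypothetical inputs).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented example: variant A converts 200 of 10,000, variant B 260 of 10,000.
z_stat, p_val = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z_stat:.2f}, p = {p_val:.4f}")
```

A result would typically be called significant when the p-value falls below a threshold chosen before the test starts, commonly 0.05.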

Key Impact

Built unified KPI dashboard used by VP-level stakeholders
Reduced manual reporting time by ~60%
Improved attribution model accuracy by 35%
Identified $2M+ in marketing spend inefficiencies

Tools & Tech

Adobe Analytics
GA4
SQL
Python
Power BI
Tableau
Traffic Management Inc.

Data Analyst

📍 Long Beach, CA

Dec 2023 – Oct 2025 · Full-Time

Owned end-to-end reporting infrastructure for operations and field performance metrics across 50+ internal stakeholders. Architected Power BI dashboards connected to SQL Server data warehouses — replacing fragmented Excel workflows and cutting report generation time from days to minutes. Wrote advanced SQL queries and stored procedures to clean, transform, and aggregate data from multiple source systems using SSIS ETL pipelines. Applied time-series analysis and predictive modeling in Python to surface trends in incident frequency, resource utilization, and project cycle times, enabling proactive risk management. Built a fully automated reporting suite using Python and Azure-scheduled jobs, eliminating 20+ hours of weekly manual effort and improving data freshness from weekly to daily.

Key Impact

Automated 12 reporting workflows, saving 20+ hrs/week
Power BI dashboards adopted by 50+ internal stakeholders
Reduced incident response time by 25% via predictive alerts
70% faster report generation through SQL optimization

Tools & Tech

Power BI
SQL Server
Python
Excel
SSIS
Azure
California State University, Long Beach

Research Assistant — Health Science

📍 Long Beach, CA

Mar 2022 – Dec 2023 · Part-Time

Supported health science faculty research by designing reproducible data pipelines in Python (pandas, NumPy) to standardize and clean clinical and survey datasets across multiple concurrent studies. Applied inferential statistics — including multivariate regression, ANOVA, chi-square tests, and survival analysis — to extract clinically meaningful patterns from 10,000+ patient records with 99% data accuracy. Built statistical models in R and SPSS to analyze population health trends, behavioral risk factors, and intervention outcomes. Created publication-quality visualizations using matplotlib and ggplot2 to communicate findings to both technical collaborators and non-technical faculty audiences. Contributed to peer-reviewed publication pipelines across health behavior and chronic disease domains, delivering analysis-ready datasets and statistical summaries that supported 3 published papers.

Key Impact

Processed 10,000+ patient records with 99% accuracy
Reduced data cleaning time by 40% via automated pipelines
Supported 3 peer-reviewed research publications
Built reusable datasets adopted by 5+ research teams

Tools & Tech

Python
R
SPSS
Excel
Statistical Analysis
Professional Career Start
03 — Stack

Skills

My full tech stack across data, AI, and engineering.

Data Science & Analytics

Python
Pandas
NumPy
Scikit-learn
Jupyter
Matplotlib
Seaborn
R
SciPy
SPSS

Machine Learning & AI

Python
TensorFlow
PyTorch
XGBoost
LightGBM
MLflow
AWS SageMaker
Hugging Face
SHAP
Feature Engineering

Generative AI & LLM

Python
LangChain
OpenAI API
RAG Pipelines
Prompt Engineering
Pinecone
ChromaDB
Vector Embeddings
Fine-Tuning

BI & Analytics Tools

Power BI
Tableau
Google Analytics
Adobe Analytics
Excel / VBA
SQL Server
Looker
DAX
SSIS

Cloud & Data Engineering

AWS
Azure
Snowflake
Apache Spark
dbt
Apache Airflow
BigQuery
Docker
PostgreSQL
Redshift
04 — Portfolio

Projects

End-to-end data science and ML projects — predictive analytics, forecasting, and BI dashboards.

01

People Intelligence Engine

A workforce analytics solution that combines interactive dashboards and machine learning to identify attrition patterns and predict employee turnover risk. It helps HR teams take proactive actions by highlighting high-risk employees and key drivers of attrition.

People Analytics & Workforce Intelligence
Machine Learning
Data Analysis & Feature Engineering (Python)
Business Intelligence & KPI Development
Tableau & Excel
02

Stock Price Forecasting with Deep Learning Models

A hybrid time-series forecasting model that uses LSTM to capture market trends and ARIMA to correct prediction errors. The combined approach improves accuracy and stability compared to using a single model for stock price prediction.

Time Series Forecasting
Model Evaluation (MSE, RMSE, MAE)
Data Visualization
SageMaker
Sequence Modeling
Python
03

Web Traffic Analysis

Built a scalable website analytics dashboard in Power BI using SQL to deliver actionable insights on user engagement, traffic sources, and performance trends for data-driven decision-making.

Power BI
Python
PostgreSQL
Data Modeling
ETL Processes
Machine Learning
KPI Development
04

Vehicle Sales Analysis & BI Dashboard

Performed exploratory and statistical analysis on vehicle sales data to uncover market trends, pricing behavior, and customer segmentation, laying the foundation for predictive modeling and demand forecasting in the automotive industry.

SQL Server
Python
Data Cleaning & Feature Engineering
Business & Market Analysis
Power BI
05 — Builds

I Build at Hackathons

I have attended 7+ hackathons across California — from Stanford to San Diego — building data-driven solutions in 24–48 hours alongside some of the most driven people in tech.

LA Hacks 2025

📍 Los Angeles, California

Benchmarked six machine learning classifiers — XGBoost, Random Forest, SVM, Logistic Regression, KNN, and a shallow Neural Network — on a large gaming behavior dataset to predict player churn and session drop-off. Engineered features from raw event logs including session frequency, in-game purchase history, level progression rate, and social interactions. Handled severe class imbalance using SMOTE oversampling and class-weighted loss functions. Evaluated all models on AUC-ROC, precision-recall curves, and F1-score across 5-fold stratified cross-validation. XGBoost outperformed all baselines by 11% on AUC and was selected as the final model. Wrapped findings into an interactive Streamlit report with feature importance charts and player risk segmentation by cohort.
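
To make the class-weighting idea concrete, the sketch below computes inverse-frequency class weights with the common "balanced" heuristic, n_samples / (n_classes * class_count), which is the same formula scikit-learn documents for `class_weight="balanced"`. The function name and the 90/10 label split are invented for illustration and are not taken from the hackathon dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Uses n_samples / (n_classes * count), so rare classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Invented example: 90 retained players (0) and 10 churned players (1).
toy_labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(toy_labels)
print(weights)
```

With these weights, each misclassified minority-class example contributes nine times as much to the loss as a majority-class one, which offsets the 9:1 imbalance.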

Hacktech 2025

📍 Pasadena, California

Built a heart disease risk prediction system using clinical tabular data from the UCI Heart Disease dataset. Performed comprehensive EDA including correlation heatmaps, distribution plots, and chi-square feature selection tests. Identified chest pain type, max heart rate, ST depression, and number of major vessels as the top predictive signals. Trained and compared Logistic Regression, Decision Tree, and Random Forest models across multiple hyperparameter configurations. The final stacked ensemble achieved 89% accuracy and 0.91 AUC on the holdout test set. Applied SHAP values to generate model explainability reports, making predictions interpretable for a non-technical medical audience. Packaged the final output as a physician-facing risk summary PDF with per-patient risk scores and contributing factor breakdowns.

HackDavis 2025

📍 Davis, California

Built a generative AI assistant for community health workers using LangChain, OpenAI embeddings, and a retrieval-augmented generation (RAG) pipeline grounded in CDC datasets, California Department of Public Health reports, and county-level health surveys. Designed a vector store using ChromaDB to index and retrieve relevant health statistics based on semantic similarity. Users could ask plain-English questions about disease prevalence, food insecurity rates, vaccination coverage, and healthcare access — and receive accurate, cited, data-backed summaries. Implemented prompt chaining to handle multi-turn conversations and context retention. Focused on accessibility for non-technical public health staff in underserved regions, with a clean Streamlit front-end requiring no technical knowledge to operate. Won recognition in the Social Good track.
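
The retrieval step of a RAG pipeline can be sketched without any external services: documents and the query are represented as vectors, and the closest documents by cosine similarity are passed to the language model as context. The example below is a toy stand-in, not the project's code. The three-dimensional vectors and document IDs are made up, whereas the project described above used OpenAI embeddings stored in ChromaDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the IDs of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical index: hand-written vectors standing in for real embeddings.
toy_index = {
    "vaccination_coverage": [0.9, 0.1, 0.0],
    "food_insecurity": [0.1, 0.9, 0.1],
    "healthcare_access": [0.2, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # stands in for an embedded user question
results = top_k(toy_query, toy_index, k=2)
print(results)
```

In a full pipeline, the text of the retrieved documents would be inserted into the prompt so that the generated answer can cite its sources.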

TreeHacks 2025

📍 Stanford, California

Developed a real-time exercise activity recognition system at the intersection of computer vision and machine learning. Used MediaPipe Pose to extract 33 skeletal landmarks per video frame and OpenCV for live webcam frame capture and preprocessing. Engineered angular joint features from landmark coordinates — including elbow, knee, hip, and shoulder angles — to represent motion patterns rather than raw positions. Trained a lightweight LSTM network on a labeled video dataset of 8 common exercise types, achieving 93% classification accuracy on held-out clips. Added a repetition counter using angle-threshold state machines per exercise type. Demoed a live webcam interface that classifies exercise type and counts reps in real time, presented at the Stanford Healthcare Innovation track showcase to strong audience reception.
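
The two ideas above, joint angles computed from landmark coordinates and an angle-threshold state machine for counting reps, can be sketched in plain Python. This is an illustrative sketch under assumptions, not the hackathon code: the points are hand-written 2D coordinates rather than MediaPipe output, and the 160 and 50 degree thresholds are hypothetical values loosely modeled on a bicep curl.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    raw = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    angle = abs(raw)
    # Fold reflex angles back into the 0-180 degree range.
    return 360.0 - angle if angle > 180.0 else angle


class RepCounter:
    """Counts one rep each time the angle goes from extended to flexed."""

    def __init__(self, extended_angle=160.0, flexed_angle=50.0):
        self.extended_angle = extended_angle  # hypothetical threshold
        self.flexed_angle = flexed_angle      # hypothetical threshold
        self.stage = None                     # unknown until first extension
        self.reps = 0

    def update(self, angle):
        if angle >= self.extended_angle:
            self.stage = "extended"
        elif angle <= self.flexed_angle and self.stage == "extended":
            self.stage = "flexed"
            self.reps += 1
        return self.reps


# Invented per-frame elbow angles covering two full curls.
counter = RepCounter()
for frame_angle in [170, 120, 40, 45, 120, 170, 100, 30, 170]:
    counter.update(frame_angle)
print(counter.reps)
```

Requiring a return to the extended state before the next rep is counted keeps jitter around the flexed threshold from registering as extra reps.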

SoCal Tech Week 2024

📍 Los Angeles, California

Built a civic data intelligence platform to surface community concerns in underserved Los Angeles neighborhoods using social media and public records. Scraped thousands of posts from Reddit (r/LosAngeles, neighborhood subreddits) and Twitter/X using their APIs, targeting discussions around housing affordability, rent increases, transit access, and public safety. Preprocessed and cleaned text data with spaCy — removing noise, normalizing slang, and extracting named entities for neighborhood tagging. Applied VADER sentiment analysis to score posts at both neighborhood and topic level, then aggregated trends over time. Visualized findings in an interactive Streamlit dashboard featuring choropleth heatmaps by zip code, time-series sentiment trends, and keyword frequency breakdowns. Placed in the top 5 teams in the Social Impact track out of 80+ submissions.

UC Berkeley AI Hackathon 2024

📍 Berkeley, California

Designed and built an end-to-end automotive sales intelligence platform during a 24-hour AI-focused hackathon at UC Berkeley. Architected SSIS ETL pipelines to ingest raw dealership data from CSV exports, CRM APIs, and inventory management systems into a centralized SQL Server staging area. Applied star schema dimensional modeling with fact tables for sales transactions and dimension tables for vehicles, dealerships, regions, and time. Built interactive Power BI dashboards tracking monthly revenue by region, sales velocity by model, customer lifetime value, and inventory turnover rate. Incorporated an AI-assisted anomaly detection module using Python's scikit-learn to flag unusual pricing and discount patterns. The solution reduced manual reporting time by an estimated 70% and surfaced pricing inefficiencies across three vehicle segments, earning the Best Data Engineering award at the event.
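
A star schema of the kind described above can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for SQL Server. Everything in the example is hypothetical: the table and column names, the two dimensions shown (the project also had dealership and time dimensions), and the three sample rows, which are invented purely so the query returns something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_vehicle (vehicle_id INTEGER PRIMARY KEY, model TEXT);
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);

    -- The fact table holds one row per sale plus keys into each dimension.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        vehicle_id INTEGER REFERENCES dim_vehicle(vehicle_id),
        region_id INTEGER REFERENCES dim_region(region_id),
        sale_month TEXT,
        sale_amount REAL
    );

    INSERT INTO dim_vehicle VALUES (1, 'Sedan A'), (2, 'SUV B');
    INSERT INTO dim_region VALUES (1, 'West'), (2, 'East');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, '2024-01', 30000.0),
        (2, 2, 1, '2024-01', 45000.0),
        (3, 1, 2, '2024-01', 31000.0);
    """
)

# Monthly revenue by region: join the fact table to one dimension and aggregate.
revenue_by_region = conn.execute(
    """
    SELECT r.region_name, f.sale_month, SUM(f.sale_amount)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    GROUP BY r.region_name, f.sale_month
    ORDER BY r.region_name
    """
).fetchall()
print(revenue_by_region)
```

Keeping measures in the fact table and labels in the dimensions is what lets a BI tool slice the same revenue figure by region, model, or month without reshaping the data.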

DataHacks 2024

📍 San Diego, California

Predicted telecom customer churn by following the full CRISP-DM data science lifecycle — from business understanding through model deployment planning. Explored a real-world telecom dataset covering contract type, monthly charges, tenure, service bundle usage, and support call frequency. Performed targeted EDA to surface churn patterns by segment and applied label encoding and scaling for categorical and numeric features. Trained and compared Decision Tree, Naive Bayes, and KNN classifiers across stratified train/test splits, running grid search hyperparameter optimization for each. Surfaced that contract type and customer tenure were the dominant churn drivers, followed by whether the customer had tech support enabled. Achieved a final model F1-score of 0.84 on the holdout set. Delivered both a technical Jupyter Notebook and a non-technical retention strategy memo with targeted recommendations for reducing churn in high-risk customer segments.
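
For readers less familiar with the metric, the F1-score is the harmonic mean of precision and recall on the positive (churn) class. The sketch below computes it from scratch. The function name and the six toy predictions are invented for illustration and are unrelated to the 0.84 result reported above.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented labels: 1 = churned, 0 = retained.
toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 1]
toy_f1 = f1_score_binary(toy_true, toy_pred)
print(toy_f1)
```

F1 is often preferred over accuracy for churn problems because churners are usually the minority class, and a model that predicts "no churn" for everyone can still score high accuracy.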

Open to opportunities

Let's Connect

I'm actively looking for data science, analytics, and AI engineering roles. Whether you have an opportunity, a question, or just want to say hi, my inbox is open.

GitHub
LinkedIn
CV