How I Automated 20+ Hours of Weekly Reporting With Python
When I joined Traffic Management Inc. as a Data Analyst, the reporting process looked like this: every Monday morning, someone opened six different Excel files, copied data across three tabs, ran pivot tables, formatted numbers, pasted into a PowerPoint, and emailed it to 50 stakeholders. It took about four hours. It happened every week. It was done by a person.
Within six months, I had that process down to a scheduled Python job that ran overnight and delivered a formatted report to everyone's inbox before they arrived at work.
Here's exactly how I did it.
Step 1: Audit before you automate
The worst thing you can do is automate a broken process. Before writing a line of code, I spent two weeks just watching and documenting:
- Where does each data point come from?
- What transformations happen between source and output?
- Which steps are actually business logic vs. formatting busywork?
- What breaks most often, and why?
This audit revealed that roughly 40% of the four-hour process was pure formatting — column widths, number rounding, header colors. Another 30% was copy-pasting numbers that were already in a database. Only about 30% was actual judgment: interpreting anomalies, flagging exceptions, writing executive commentary.
Rule one of automation: only automate the repeatable parts. Leave the judgment to humans.
Step 2: Centralize the data source first
The reports were pulling from six Excel files that different teams maintained manually. Before automating delivery, I had to fix the source.
I built a SQL Server staging table that consolidated all six sources using SSIS ETL jobs. Each source system fed into the staging table on its own schedule. The report job only ever read from one place.
This step alone — before any automation — cut manual work in half. Teams no longer had to reconcile inconsistencies between files. One source of truth, versioned, auditable.
Step 3: Build the transformation layer in Python
With a clean source, I wrote the transformation logic in Python using pandas:
```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine(CONNECTION_STRING)

df = pd.read_sql(
    "SELECT * FROM staging.weekly_ops WHERE report_date = %(date)s",
    engine,
    params={"date": report_date},
)

# Derived metrics
df["incident_rate"] = df["incidents"] / df["total_hours"] * 1000
df["on_time_pct"] = df["on_time_completions"] / df["total_completions"]

# Segment-level aggregations
summary = df.groupby("region").agg({...}).reset_index()
```
I made every metric definition explicit and code-reviewable. No more "I think this number is calculated as..." — the formula was in source control.
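One way to keep those definitions reviewable is to pull each metric into a named, unit-testable function, so a pull request shows exactly what changed. This is a minimal sketch using the same column names as the snippet above; the sample numbers are made up for illustration:

```python
import pandas as pd

def incident_rate(incidents, total_hours):
    """Incidents per 1,000 worked hours -- the reviewable definition."""
    return incidents / total_hours * 1000

def on_time_pct(on_time_completions, total_completions):
    """Share of completions delivered on time."""
    return on_time_completions / total_completions

# Tiny illustrative frame (made-up numbers, not production data)
df = pd.DataFrame({
    "incidents": [4, 10],
    "total_hours": [2000, 5000],
    "on_time_completions": [45, 90],
    "total_completions": [50, 100],
})
df["incident_rate"] = incident_rate(df["incidents"], df["total_hours"])
df["on_time_pct"] = on_time_pct(df["on_time_completions"], df["total_completions"])
```

The payoff is that each formula can be asserted in a test suite instead of living in someone's head.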
Step 4: Generate the output programmatically
The output format was a styled Excel workbook (stakeholders lived in Excel). I used openpyxl to generate it:
```python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
ws = wb.active

# Write headers with formatting
for col, header in enumerate(COLUMNS, start=1):
    cell = ws.cell(row=1, column=col, value=header)
    cell.font = Font(bold=True, color="FFFFFF")
    cell.fill = PatternFill("solid", fgColor="1F3864")

# Write data rows
for row_idx, row in summary.iterrows():
    for col_idx, value in enumerate(row, start=1):
        ws.cell(row=row_idx + 2, column=col_idx, value=value)
```
The formatted output was pixel-for-pixel identical to the manually built version. Stakeholders noticed nothing changed — except it was in their inbox at 6am every Monday instead of 10am.
Step 5: Schedule and monitor
I deployed the script as a scheduled job in Azure. Every Sunday at 11pm, it:
- Validates the staging data is fresh (fails loudly if not)
- Runs the transformations
- Generates the Excel output
- Emails it via SMTP to the distribution list
- Logs completion status to a monitoring table
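The first and last steps above can be sketched as two small, testable helpers. The function names and record fields here are hypothetical, not the production schema:

```python
import datetime

def validate_freshness(latest_loaded, report_date):
    """Fail loudly when the staging table was not refreshed for this report."""
    if latest_loaded is None or latest_loaded < report_date:
        raise RuntimeError(
            f"Staging data stale: latest load {latest_loaded}, expected {report_date}"
        )

def run_log_record(status, row_count, started, finished):
    """Build the row appended to the monitoring table after each run (fields assumed)."""
    return {
        "run_finished": finished.isoformat(),
        "status": status,
        "row_count": row_count,
        "duration_s": (finished - started).total_seconds(),
    }
```

In a setup like this, the job would call `validate_freshness` with `MAX(report_date)` from staging before any transformation runs, and append the returned record to the monitoring table after each run, which is what makes the run history queryable later.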
The monitoring table was key. It gave me a 90-day history of every run: duration, row counts, any warnings. When something went wrong, I could see exactly when it started and what changed.
What the numbers looked like
- Before: ~4 hours manual work per report × 5 reports per week = ~20 hours/week
- After: 0 hours manual work. The job runs in 4 minutes.
- Stakeholder complaints about late reports: dropped to zero.
- Data errors caught before delivery: 3 in the first month (the quality checks worked).
The lesson no one tells you
The hardest part wasn't the Python. It was convincing people to trust it.
The first three weeks after launch, the analyst who had been building the reports manually kept double-checking the output against her process. That was the right instinct — and it caught two edge cases I hadn't handled. I'm glad she did it.
Automation doesn't replace trust. It has to earn trust, the same way a new analyst would. Build in the quality checks, make the logic transparent, and be willing to explain every number. Once stakeholders trust the system, the time savings compound indefinitely.
Twenty hours a week, freed up for actual analysis. That's the real return on automation.