How I Automated 20+ Hours of Weekly Reporting With Python
When I joined Traffic Management Inc. as a Data Analyst, the reporting process looked like this: every Monday morning, someone opened six different Excel files, copied data across three tabs, ran pivot tables, formatted numbers, pasted into a PowerPoint, and emailed it to 50 stakeholders. It took about four hours. It happened every week. It was done by a person.
Within six months, I had that process down to a scheduled Python job that ran overnight and delivered a formatted report to everyone's inbox before they arrived at work.
Here's exactly how I did it.
Step 1: Audit before you automate
The worst thing you can do is automate a broken process. Before writing a line of code, I spent two weeks just watching and documenting:
- Where does each data point come from?
- What transformations happen between source and output?
- Which steps are actually business logic vs. formatting busywork?
- What breaks most often, and why?
This audit revealed that roughly 40% of the four-hour process was pure formatting — column widths, number rounding, header colors. Another 30% was copy-pasting numbers that were already in a database. Only about 30% was actual judgment: interpreting anomalies, flagging exceptions, writing executive commentary.
Rule one of automation: only automate the repeatable parts. Leave the judgment to humans.
Step 2: Centralize the data source first
The reports were pulling from six Excel files that different teams maintained manually. Before automating delivery, I had to fix the source.
I built a SQL Server staging table that consolidated all six sources using SSIS ETL jobs. Each source system fed into the staging table on its own schedule. The report job only ever read from one place.
This step alone — before any automation — cut manual work in half. Teams no longer had to reconcile inconsistencies between files. One source of truth, versioned, auditable.
Step 3: Build the transformation layer in Python
With a clean source, I wrote the transformation logic in Python using pandas:
```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine(CONNECTION_STRING)

df = pd.read_sql(
    "SELECT * FROM staging.weekly_ops WHERE report_date = %(date)s",
    engine,
    params={"date": report_date},
)

# Derived metrics
df["incident_rate"] = df["incidents"] / df["total_hours"] * 1000
df["on_time_pct"] = df["on_time_completions"] / df["total_completions"]

# Segment-level aggregations
summary = df.groupby("region").agg({...}).reset_index()
```
I made every metric definition explicit and code-reviewable. No more "I think this number is calculated as..." — the formula was in source control.
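One way to keep those definitions reviewable is to pull each metric into a named, unit-testable function, so a pull request shows exactly what changed. This is a minimal sketch using the same column names as the snippet above; the sample numbers are made up for illustration:

```python
import pandas as pd

def incident_rate(incidents, total_hours):
    """Incidents per 1,000 worked hours -- the reviewable definition."""
    return incidents / total_hours * 1000

def on_time_pct(on_time_completions, total_completions):
    """Share of completions delivered on time."""
    return on_time_completions / total_completions

# Tiny illustrative frame (made-up numbers, not production data)
df = pd.DataFrame({
    "incidents": [4, 10],
    "total_hours": [2000, 5000],
    "on_time_completions": [45, 90],
    "total_completions": [50, 100],
})
df["incident_rate"] = incident_rate(df["incidents"], df["total_hours"])
df["on_time_pct"] = on_time_pct(df["on_time_completions"], df["total_completions"])
```

The payoff is that each formula can be asserted in a test suite instead of living in someone's head.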
Step 4: Generate the output programmatically
The output format was a styled Excel workbook (stakeholders lived in Excel). I used openpyxl to generate it:
```python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
ws = wb.active

# Write headers with formatting
for col, header in enumerate(COLUMNS, start=1):
    cell = ws.cell(row=1, column=col, value=header)
    cell.font = Font(bold=True, color="FFFFFF")
    cell.fill = PatternFill("solid", fgColor="1F3864")

# Write data rows
for row_idx, row in summary.iterrows():
    for col_idx, value in enumerate(row, start=1):
        ws.cell(row=row_idx + 2, column=col_idx, value=value)
```
The formatted output was pixel-for-pixel identical to the manually built version. Stakeholders noticed nothing changed — except it was in their inbox at 6am every Monday instead of 10am.
Step 5: Schedule and monitor
I deployed the script as a scheduled job in Azure. Every Sunday at 11pm, it:
- Validates the staging data is fresh (fails loudly if not)
- Runs the transformations
- Generates the Excel output
- Emails it via SMTP to the distribution list
- Logs completion status to a monitoring table
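The first and last steps above can be sketched as two small, testable helpers. The function names and record fields here are hypothetical, not the production schema:

```python
import datetime

def validate_freshness(latest_loaded, report_date):
    """Fail loudly when the staging table was not refreshed for this report."""
    if latest_loaded is None or latest_loaded < report_date:
        raise RuntimeError(
            f"Staging data stale: latest load {latest_loaded}, expected {report_date}"
        )

def run_log_record(status, row_count, started, finished):
    """Build the row appended to the monitoring table after each run (fields assumed)."""
    return {
        "run_finished": finished.isoformat(),
        "status": status,
        "row_count": row_count,
        "duration_s": (finished - started).total_seconds(),
    }
```

In a setup like this, the job would call `validate_freshness` with `MAX(report_date)` from staging before any transformation runs, and append the returned record to the monitoring table after each run, which is what makes the run history queryable later.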
The monitoring table was key. It gave me a 90-day history of every run: duration, row counts, any warnings. When something went wrong, I could see exactly when it started and what changed.
What the numbers looked like
- Before: ~4 hours manual work per report × 5 reports per week = ~20 hours/week
- After: 0 hours manual work. The job runs in 4 minutes.
- Stakeholder complaints about late reports: dropped to zero.
- Data errors caught before delivery: 3 in the first month (the quality checks worked).
The lesson no one tells you
The hardest part wasn't the Python. It was convincing people to trust it.
The first three weeks after launch, the analyst who had been building the reports manually kept double-checking the output against her process. That was the right instinct — and it caught two edge cases I hadn't handled. I'm glad she did it.
Automation doesn't replace trust. It has to earn trust, the same way a new analyst would. Build in the quality checks, make the logic transparent, and be willing to explain every number. Once stakeholders trust the system, the time savings compound indefinitely.
Twenty hours a week, freed up for actual analysis. That's the real return on automation.