Data Analytics for People Who Hate Vague Questions
"Can you look into why sales are down?"
That is a complete sentence. It is not a complete question. It has no time period, no geography, no definition of "down" relative to what baseline, no clarity on whether we mean units or revenue, no indication of whether the stakeholder suspects a cause or is genuinely open-minded.
And it is approximately the question I've received at the start of 40% of the analyses I've run.
Analytics starts with translating vague business anxiety into a question specific enough to answer. That translation is often harder than the analysis itself.
Decompose before you query
When I get a vague question, I resist opening a SQL editor for at least 15 minutes. Instead, I write out the decomposition on paper:
Original: "Why are sales down?"
Decomposed:
- Down vs. what? (Prior month? Prior year same period? Budget?)
- Which products/categories/regions?
- Revenue or units or both?
- Is it a volume problem (fewer transactions) or a value problem (lower average order size)? (This one is quick to check in code; see the sketch below.)
- When did it start? Gradual decline or sudden drop?
Each of these sub-questions has a different answer — and sometimes a different root cause. Answering the wrong version of the question wastes everyone's time.
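The volume-versus-value split in particular takes minutes to check. Here's a minimal sketch in pandas, assuming a hypothetical orders table; the column names and figures are mine for illustration, not from any real schema:

```python
import pandas as pd

# Hypothetical orders data; columns and values are illustrative only.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-02-03", "2024-02-17",
                                  "2024-03-05", "2024-03-21"]),
    "revenue": [1200.0, 900.0, 700.0, 650.0],
})

monthly = (
    orders
    .assign(month=orders["order_date"].dt.to_period("M"))
    .groupby("month")
    .agg(transactions=("revenue", "size"),  # volume: how many orders
         revenue=("revenue", "sum"))        # value: how much money
)
monthly["avg_order_value"] = monthly["revenue"] / monthly["transactions"]

# Month-over-month change separates a volume problem from a value problem.
print(monthly.pct_change())
```

If transactions fall while average order value holds, you chase acquisition and traffic; if it's the reverse, you chase pricing, discounting, or product mix.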
Structure your analysis around hypotheses
Good analytics isn't just "let me look at the data and see what I find." That produces unfocused exploration that's hard to act on.
I structure almost every analysis around explicit hypotheses, listed before looking at data:
Question: Why did website traffic drop 22% in September?
Hypotheses:
- Algorithm change penalized our content (SEO issue)
- Paid search budget was cut or performance degraded
- A competitor launched and absorbed our traffic
- Technical issue reduced crawlability or load speed
- Seasonal pattern — September is historically lower
I then rank these by prior probability (which is most likely given what we know?) and testability (what data can confirm or rule this out quickly?). I test the most likely, most testable hypothesis first.
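In practice that ranking can be as lightweight as a sorted list. A sketch with made-up numbers; the priors and testability scores below are illustrative, not a scoring methodology:

```python
# Hypothetical hypotheses for the traffic-drop question, with rough prior
# probabilities and testability scores (0-1). All numbers are invented.
hypotheses = [
    {"name": "SEO algorithm change",   "prior": 0.35, "testability": 0.9},
    {"name": "Paid search budget cut", "prior": 0.25, "testability": 1.0},
    {"name": "New competitor",         "prior": 0.10, "testability": 0.3},
    {"name": "Technical/crawl issue",  "prior": 0.15, "testability": 0.8},
    {"name": "Seasonality",            "prior": 0.15, "testability": 1.0},
]

# Test the most likely, most quickly testable hypotheses first.
for h in sorted(hypotheses, key=lambda h: h["prior"] * h["testability"],
                reverse=True):
    print(f'{h["name"]}: score {h["prior"] * h["testability"]:.2f}')
```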
This approach means I can often deliver a "here's what it's not" conclusion quickly, which is almost as useful as "here's what it is."
Distinguish correlation from causation in conversation, not just in methodology
Every analytics practitioner knows that correlation doesn't imply causation. The harder skill is communicating this to stakeholders without making them feel like their question was wrong.
When I find a strong correlation, I present it like this:
"We see a strong relationship between email open rate and same-week purchase rate (r = 0.73). This could mean emails drive purchases, or that customers who are already inclined to buy are also more likely to open emails, or both. To determine causation, we'd need an A/B test where we randomly suppress emails for a control group. Would that be worth setting up?"
That framing does three things: it gives them the finding, it's honest about what it means, and it proposes a path to the stronger answer without making the current finding useless.
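The "both metrics share a hidden driver" scenario is easy to demonstrate on synthetic data. Here's a sketch where an unobserved "intent" variable drives both open rate and purchase rate, so they correlate strongly even though neither causes the other; every number below is fabricated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A hidden confounder ("intent to buy") drives both observed metrics.
intent = rng.normal(size=500)
open_rate = intent + rng.normal(scale=0.7, size=500)
purchase_rate = intent + rng.normal(scale=0.7, size=500)

# Strong correlation, zero causation in either direction.
r, p = stats.pearsonr(open_rate, purchase_rate)
print(f"r = {r:.2f}, p = {p:.1e}")
```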
Build exploratory and confirmatory analyses separately
Exploratory analysis is for you. You're looking for patterns, anomalies, hypotheses. It's rough, iterative, and shouldn't be shown to stakeholders directly.
Confirmatory analysis is for them. You have a specific question, you run the test, you report the result. This is clean, reproducible, and explicitly tests a pre-stated hypothesis.
The mistake I see most often is presenting exploratory findings as if they're confirmatory. "I looked through the data and noticed that customers who buy Product A are twice as likely to churn" sounds like a finding. But if you found it by scanning 50 correlations and reporting the largest one, you've almost certainly found noise: at p < 0.05, fifty tests can be expected to produce two or three "significant" results by chance alone. Confirming on a holdout set changes the answer completely.
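Here's a sketch of why the split matters, on pure noise: scan 50 candidate correlations in an exploration half, then re-test the "winner" on a held-out half. Everything below is synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 50 candidate features and an outcome, all pure noise by construction.
X = rng.normal(size=(400, 50))
y = rng.normal(size=400)

# Split BEFORE exploring: first half to explore, second half held out.
X_explore, X_holdout = X[:200], X[200:]
y_explore, y_holdout = y[:200], y[200:]

# Exploratory pass: scan all 50 correlations and keep the largest.
corrs = [stats.pearsonr(X_explore[:, j], y_explore)[0] for j in range(50)]
best = int(np.argmax(np.abs(corrs)))
print(f"exploration: feature {best}, r = {corrs[best]:+.2f}")

# Confirmatory pass: the same feature, on data it has never seen.
r, p = stats.pearsonr(X_holdout[:, best], y_holdout)
print(f"holdout:     feature {best}, r = {r:+.2f}, p = {p:.2f}")
```

The exploration pass reliably turns up a correlation that looks respectable; the holdout pass collapses it toward zero, which is exactly how noise behaves.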
The five metrics that actually matter
In my experience, most business questions eventually reduce to one of five core metrics:
| Question | Core Metric |
|---|---|
| Are we growing? | Revenue / Users over time |
| Are we retaining? | Cohort retention rate |
| Are we profitable? | Margin by segment |
| Are we efficient? | CAC, LTV, payback period |
| Are we performing vs. plan? | Actual vs. budget variance |
Most of the analytical complexity comes from getting clean, consistent definitions of these metrics, not from the analysis once you have them. I spend far more time on metric definitions, and far less on the analysis itself, than most people expect.
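Cohort retention is a good example of where that definitional work lives: what counts as "active", and does a cohort start at signup or at first purchase? Once that's settled, the calculation itself is short. A sketch assuming a hypothetical activity log with one row per user per active month (names and data are invented):

```python
import pandas as pd

# Hypothetical activity log: one row per user per month they were active.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "month": pd.PeriodIndex(["2024-01", "2024-02", "2024-03", "2024-01",
                             "2024-03", "2024-02", "2024-03", "2024-04",
                             "2024-02"], freq="M"),
})

# Cohort = each user's first active month; age = months since that month.
first = events.groupby("user_id")["month"].transform("min")
events = events.assign(cohort=first,
                       age=(events["month"] - first).map(lambda d: d.n))

# Users active per (cohort, age); age 0 is the cohort's full size.
counts = events.groupby(["cohort", "age"])["user_id"].nunique().unstack("age")
retention = counts.div(counts[0], axis=0)  # share of each cohort still active
print(retention)
```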
Document your methodology, not just your findings
Every analysis I deliver includes a methodology note — even if it's just a paragraph:
"Analysis covers March 1–31, 2024. Revenue figures pulled from the DW_SALES table, excluding returns processed after April 15. Comparison period is March 2023, same exclusion criteria. Statistical significance tested using a two-sample t-test at p < 0.05."
This might seem pedantic. It isn't. Three months later, when a stakeholder asks why the Q1 board deck showed a different number, this note is the difference between a 5-minute explanation and a two-day investigation.
Methodology documentation is free insurance.
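It can also be executable insurance. Here's a sketch that encodes the same filters and test as that note; the DW_SALES stand-in, column names, and data below are all assumptions for illustration:

```python
from datetime import date
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the DW_SALES table; columns are assumed, not real.
rng = np.random.default_rng(7)
n = 300
sales = pd.DataFrame({
    "sale_date": pd.to_datetime(rng.choice(
        pd.date_range("2023-03-01", "2024-03-31"), size=n)),
    "revenue": rng.gamma(2.0, 50.0, size=n),
    "return_processed": pd.NaT,  # no returns in this toy dataset
})

def march_revenue(year: int, sales: pd.DataFrame) -> pd.Series:
    """March revenue for `year`, excluding returns processed after Apr 15."""
    in_march = sales["sale_date"].between(f"{year}-03-01", f"{year}-03-31")
    late_return = sales["return_processed"] > pd.Timestamp(f"{year}-04-15")
    return sales.loc[in_march & ~late_return, "revenue"]

# The comparison and the test, exactly as the methodology note states them.
t, p = stats.ttest_ind(march_revenue(2024, sales), march_revenue(2023, sales))
print(f"t = {t:.2f}, p = {p:.4f} (threshold: p < 0.05)")
```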
Vague questions are a feature, not a bug
I've made peace with vague questions. They mean the stakeholder trusts me enough to think out loud in front of me. The decomposition conversation, done well, is actually where a lot of the value gets created, before any data is touched.
The analyst who asks the best clarifying questions is usually the one who delivers the most useful analysis.