Building a School Directory Across East Asia: Lessons from the Data Trenches
What happens when you try to catalog hundreds of international schools across Japan, Korea, Taiwan, and beyond — and why clean data is harder than it looks.
Data tips, ETL war stories, and occasional rants about messy spreadsheets.
A practical guide to CRM deduplication — why it's harder than your vendor claims, and the strategies that actually work.
How to append firmographics, emails, and context to your records using free and low-cost sources — before you even think about paying a vendor.
No toy examples. These are the real utility scripts that live in my data toolkit — for cleaning, validating, transforming, and shipping records.
A systematic playbook for walking into an unfamiliar database, figuring out what you're dealing with, and making a plan — without breaking anything.
A field guide for diagnosing and recovering from pipeline failures — and the monitoring habits that mean you wake up before the stakeholders do.
Not SQL 101. These are the window functions, CTEs, and query patterns that show up in real data work — with examples you can actually steal.
A dashboard is only as good as the data behind it. Here's the quick audit I run before I let any chart inform a real decision.
The first two weeks on a new data engagement set the tone for everything that follows. Here's how I run mine — and the mistakes I made early on that taught me to do it this way.
Cron gets a bad reputation it doesn't always deserve. Here's an honest breakdown of when cron is the right tool, when Airflow earns its complexity, and what to use in between.
Bad data kills margins quietly. Here's what messy product data actually costs Amazon sellers — and how to fix it before it gets worse.
Documentation that nobody reads is worse than no documentation — it creates false confidence. Here's what to write, where to put it, and how to keep it alive.
Introducing myself, what this blog is about, and why I genuinely get excited about messy databases.