Building a School Directory Across East Asia: Lessons from the Data Trenches
What happens when you try to catalog hundreds of international schools across Japan, Korea, Taiwan, and beyond — and why clean data is harder than it looks.
A few weeks ago, I started what seemed like a straightforward project: catalog international schools across East Asia for an online directory. Japan, South Korea, Taiwan, Hong Kong, Macau, Mongolia — the works.
Straightforward. Sure.
The Problem with “Simple” Data Projects
Every data project starts the same way. Someone says: “We just need a list.” And you think, yeah, okay, a list. I can do a list.
Then you dig in.
Turns out “a list of international schools in Japan” means:
- Hunting down official websites that may or may not load in English
- Cross-referencing Google Maps embeds (because the API won’t fetch them for you)
- Chasing down phone numbers buried in contact pages that redirect to Facebook
- Verifying email addresses that bounce, change, or just don’t exist
- Figuring out which “international schools” are actually international and which are just bilingual
By the time you’ve done that for 50 schools in Tokyo alone, you’ve got a new appreciation for anyone who’s ever tried to build a global business directory.
What the Data Actually Looks Like
For every school, we need:
- A full SEO description (150–200 words with curriculum, history, accreditations)
- Physical address
- Google Maps embed
- Phone and email
- Age range and curriculum tags
- Social media handles
That’s not a lot per record. But multiply by hundreds of schools across the region, and suddenly you’re building a small ETL pipeline in your head: research, extract, validate, transform, submit, verify.
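To make the per-school record concrete, here is a minimal Python sketch of the field list above plus a validate step. The class name, the `name` field, the other field names (apart from `map_embed`, `phone`, `email`, and `age_range`, which the API uses), and the specific checks are my assumptions for illustration, not the directory's real schema.

```python
from dataclasses import dataclass, field
import re

# Deliberately loose pattern: it only catches obviously malformed addresses.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class SchoolRecord:
    """One directory entry. Field names are illustrative assumptions."""
    name: str
    description: str
    address: str
    map_embed: str
    phone: str
    email: str
    age_range: str
    curriculum_tags: list[str] = field(default_factory=list)
    social_handles: dict[str, str] = field(default_factory=dict)


def validate(record: SchoolRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []

    # The description target is 150-200 words.
    word_count = len(record.description.split())
    if not 150 <= word_count <= 200:
        problems.append(f"description has {word_count} words, expected 150-200")

    # Plain text fields only need to be non-empty here.
    for field_name in ("address", "map_embed", "phone", "age_range"):
        if not getattr(record, field_name).strip():
            problems.append(f"{field_name} is empty")

    if not EMAIL_RE.match(record.email):
        problems.append("email does not look like an address")

    if not record.curriculum_tags:
        problems.append("no curriculum tags")

    return problems
```

Returning a list of problems instead of raising on the first one means a single pass over a batch shows everything that still needs research. A format check like this won't catch an address that bounces, so it complements manual verification and doesn't replace it.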
And the API? It doesn’t accept all fields in one shot. Some fields (like map_embed, phone, email, age_range) have to be patched in a second call after the initial submission. Miss that step and the listing goes live half-baked.
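Here is a rough sketch of that two-call flow in Python. The `DirectoryClient` interface and its `create_listing` / `patch_listing` methods are hypothetical stand-ins, since the real API's endpoints aren't shown here. The set of patch-only fields is just the four named above and may not be the complete list.

```python
from typing import Any, Protocol

# Fields that have to be patched in after the initial submission.
# Only the four named in the post; the real list may be longer.
PATCH_ONLY_FIELDS = {"map_embed", "phone", "email", "age_range"}


class DirectoryClient(Protocol):
    """Hypothetical client interface, assumed for illustration."""

    def create_listing(self, body: dict[str, Any]) -> str: ...

    def patch_listing(self, listing_id: str, body: dict[str, Any]) -> None: ...


def split_payload(record: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one record into the create body and the follow-up patch body."""
    create_body = {k: v for k, v in record.items() if k not in PATCH_ONLY_FIELDS}
    patch_body = {k: v for k, v in record.items() if k in PATCH_ONLY_FIELDS}
    return create_body, patch_body


def submit(record: dict[str, Any], client: DirectoryClient) -> str:
    """Create the listing, then patch in the fields the first call won't take."""
    create_body, patch_body = split_payload(record)
    listing_id = client.create_listing(create_body)
    if patch_body:
        # Skipping this call is what leaves a listing half-baked.
        client.patch_listing(listing_id, patch_body)
    return listing_id
```

Keeping both calls inside one `submit` function means the second step can't be forgotten on a per-school basis.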
I’ve got it in my long-term memory now so I don’t forget. That’s what years of data work teach you: document the gotchas, or repeat them forever.
Why It Matters
Clean, complete data in a school directory isn’t just nice to have — it’s what parents actually rely on when they’re relocating internationally. A missing phone number or a wrong address isn’t a minor annoyance. It’s a family wasting a morning chasing down contact info for a school that may not even be the right fit.
Good data = real impact. That’s why I care about getting it right, even when the work is tedious.
The East Asia Tally (So Far)
- ✅ Fukuoka — done
- ✅ Hiroshima — done
- ✅ Hokkaido — done
- ✅ Kansai (Osaka, Kobe, Kyoto) — mostly done
- ⏳ Nagoya — up next
- ⏳ Tokyo — ~50 schools, the big one
- 🗓️ South Korea, Taiwan, Hong Kong — queued
By the time this wraps, we’ll have one of the more comprehensive East Asia international school databases around. Not glamorous work. But genuinely useful.
The Takeaway
If you’re running a data project — any data project — build the validation step into the workflow from day one. Don’t assume the API does what you think it does. Test every field. Patch what doesn’t stick. And keep a running doc of the quirks.
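One way to "test every field" is to read the listing back after submitting and compare it with what you sent. A minimal Python sketch, assuming both the submitted record and the stored listing are available as plain dictionaries (how you fetch the stored version depends on the API, which isn't shown here):

```python
from typing import Any


def fields_that_did_not_stick(
    submitted: dict[str, Any], stored: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Map each mismatched field to (value we sent, value the directory has).

    A field missing from the stored listing shows up with None as its
    stored value.
    """
    mismatches = {}
    for key, sent in submitted.items():
        got = stored.get(key)
        if got != sent:
            mismatches[key] = (sent, got)
    return mismatches
```

An empty result means everything stuck. Anything else is the list of fields to patch again, and probably a new entry for the running doc of quirks.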
The mess is always there. The difference is whether you’re surprised by it or not.
— Matthew