Matthew Kassel
Tags: data, etl, opinion

Building a School Directory Across East Asia: Lessons from the Data Trenches

What happens when you try to catalog hundreds of international schools across Japan, Korea, Taiwan, and beyond — and why clean data is harder than it looks.

A few weeks ago, I started what seemed like a straightforward project: catalog international schools across East Asia for an online directory. Japan, South Korea, Taiwan, Hong Kong, Macau, Mongolia — the works.

Straightforward. Sure.

The Problem with “Simple” Data Projects

Every data project starts the same way. Someone says: “We just need a list.” And you think, yeah, okay, a list. I can do a list.

Then you dig in.

Turns out “a list of international schools in Japan” means:

  • Hunting down official websites that may or may not load in English
  • Cross-referencing Google Maps embeds (because the API won’t fetch them for you)
  • Chasing down phone numbers buried in contact pages that redirect to Facebook
  • Verifying email addresses that bounce, change, or just don’t exist
  • Figuring out which “international school” is actually international vs. which is just bilingual
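Some of that chasing can be narrowed down in code. Below is a minimal Python sketch of two sanity checks. The first is a loose email syntax test. The second is a phone normalizer that converts local numbers to international format using each place's calling code. The function names are hypothetical. Treating a leading 0 as a domestic trunk prefix is a simplification. No syntax check can tell you whether an address actually bounces; that still takes a real test message.

```python
import re
from typing import Optional

# International calling codes for the places named in this post.
CALLING_CODES = {
    "JP": "81",   # Japan
    "KR": "82",   # South Korea
    "TW": "886",  # Taiwan
    "HK": "852",  # Hong Kong
    "MO": "853",  # Macau
    "MN": "976",  # Mongolia
}

# Deliberately loose: something@something.tld with no spaces.
# It says nothing about whether the mailbox exists.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def looks_like_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value.strip()))


def normalize_phone(raw: str, region: str) -> Optional[str]:
    """Return '+<code><digits>', or None if the number can't be normalized.

    Simplification (an assumption, not a numbering-plan rule): any single
    leading 0 on a local number is treated as a domestic trunk prefix.
    """
    code = CALLING_CODES.get(region)
    if code is None:
        return None
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses
    if raw.strip().startswith("+"):
        # Already international: accept only if it carries the expected code.
        return "+" + digits if digits.startswith(code) else None
    if digits.startswith("0"):
        digits = digits[1:]
    if len(digits) < 7:
        return None  # too short to be a complete number
    return "+" + code + digits
```

For example, a Tokyo number written as `03-1234-5678` comes out as `+81312345678`. That gives you one consistent format to compare against whatever the school's contact page, Google Maps, and Facebook each claim.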

By the time you’ve done that for 50 schools in Tokyo alone, you’ve got a new appreciation for anyone who’s ever tried to build a global business directory.

What the Data Actually Looks Like

For every school, we need:

  • A full SEO description (150–200 words covering curriculum, history, and accreditations)
  • Physical address
  • Google Maps embed
  • Phone and email
  • Age range and curriculum tags
  • Social media handles
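That field list maps naturally onto a record type with a completeness check. This is a sketch under stated assumptions. The class and field names are hypothetical; only map_embed, phone, email, and age_range appear by name later in this post. "Complete" here simply means non-empty, plus the 150–200 word range for the description.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class SchoolRecord:
    # Hypothetical field names mirroring the list above.
    name: str
    description: str = ""
    address: str = ""
    map_embed: str = ""
    phone: str = ""
    email: str = ""
    age_range: str = ""
    curriculum_tags: List[str] = field(default_factory=list)
    social_links: List[str] = field(default_factory=list)


def problems(record: SchoolRecord) -> List[str]:
    """List everything that would make this listing go live incomplete."""
    # Any empty string or empty list counts as a missing field.
    issues = [f"missing: {f.name}" for f in fields(record) if not getattr(record, f.name)]
    words = len(record.description.split())
    if record.description and not 150 <= words <= 200:
        issues.append(f"description is {words} words (want 150-200)")
    return issues
```

An empty list from `problems()` is the gate for submission. Anything else goes back into the research queue instead of onto the site.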

That’s not a lot per record. But multiply it by hundreds of schools across seven countries and territories, and suddenly you’re building a small ETL pipeline in your head: research, extract, validate, transform, submit, verify.
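That research, extract, validate, transform, submit, verify loop can be written as an ordered list of stages. The runner stops at the first failure and reports which stage it hit. A minimal Python sketch, assuming records are plain dicts. The stage functions are hypothetical examples. Research, submit, and verify are left out because they involve a person or a network call.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Stage = Callable[[Record], Record]


class StageError(Exception):
    """Raised by a stage when a record can't move forward."""


def run_pipeline(record: Record, stages: List[Tuple[str, Stage]]) -> Tuple[Record, Optional[str]]:
    """Apply stages in order; return (record, None) or (partial record, error)."""
    for name, stage in stages:
        try:
            record = stage(record)
        except StageError as exc:
            return record, f"{name}: {exc}"
    return record, None


def extract(rec: Record) -> Record:
    # Trim stray whitespace picked up when copying from school websites.
    return {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}


def validate(rec: Record) -> Record:
    for key in ("name", "address", "phone", "email"):
        if not rec.get(key):
            raise StageError(f"missing {key}")
    return rec


def transform(rec: Record) -> Record:
    # Collapse variants like "IB" and " ib " into one lowercase tag.
    tags = rec.get("curriculum_tags", [])
    return {**rec, "curriculum_tags": sorted({t.strip().lower() for t in tags})}


STAGES = [("extract", extract), ("validate", validate), ("transform", transform)]
```

Reporting the stage name with the error tells you whether a record failed because the source was messy or because your own rules rejected it.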

And the API? It doesn’t accept all fields in one shot. Some fields (like map_embed, phone, email, age_range) have to be patched in a second call after the initial submission. Miss that step and the listing goes live half-baked.
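One way to make the second call impossible to forget is to split every record into an initial payload and a patch payload before anything is sent. The directory's real API isn't shown in this post. `FakeDirectoryAPI` below is a hypothetical in-memory stand-in that mimics the behavior described: `create()` silently drops the patch-only fields. Swap in your actual client calls.

```python
import copy
from typing import Any, Dict

Record = Dict[str, Any]

# Fields this post says must be sent in a second (patch) call.
PATCH_ONLY_FIELDS = {"map_embed", "phone", "email", "age_range"}


class FakeDirectoryAPI:
    """Hypothetical stand-in: create() ignores patch-only fields."""

    def __init__(self) -> None:
        self.listings: Dict[int, Record] = {}
        self._next_id = 1

    def create(self, payload: Record) -> int:
        listing_id = self._next_id
        self._next_id += 1
        self.listings[listing_id] = {
            k: v for k, v in payload.items() if k not in PATCH_ONLY_FIELDS
        }
        return listing_id

    def patch(self, listing_id: int, payload: Record) -> None:
        self.listings[listing_id].update(payload)

    def get(self, listing_id: int) -> Record:
        return copy.deepcopy(self.listings[listing_id])


def submit_school(api: FakeDirectoryAPI, record: Record) -> int:
    """Create the listing, then immediately patch the fields create() drops."""
    initial = {k: v for k, v in record.items() if k not in PATCH_ONLY_FIELDS}
    second = {k: v for k, v in record.items() if k in PATCH_ONLY_FIELDS}
    listing_id = api.create(initial)
    if second:
        api.patch(listing_id, second)
    return listing_id
```

Keeping the patch-only field names in one set means that when the API turns out to drop another field, the fix is a one-line change rather than a new habit to remember.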

I’ve got it in my long-term memory now so I don’t forget. That’s what years of data work teaches you — document the gotchas, or repeat them forever.

Why It Matters

Clean, complete data in a school directory isn’t just nice to have — it’s what parents actually rely on when they’re relocating internationally. A missing phone number or a wrong address isn’t a minor annoyance. It’s a family wasting a morning chasing down contact info for a school that may not even be the right fit.

Good data = real impact. That’s why I care about getting it right, even when the work is tedious.

The East Asia Tally (So Far)

  • Fukuoka — done
  • Hiroshima — done
  • Hokkaido — done
  • Kansai (Osaka, Kobe, Kyoto) — mostly done
  • Nagoya — up next
  • Tokyo — ~50 schools, the big one
  • 🗓️ South Korea, Taiwan, Hong Kong — queued

By the time this wraps, we’ll have one of the more comprehensive East Asia international school databases around. Not glamorous work. But genuinely useful.

The Takeaway

If you’re running a data project — any data project — build the validation step into the workflow from day one. Don’t assume the API does what you think it does. Test every field. Patch what doesn’t stick. And keep a running doc of the quirks.
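"Test every field. Patch what doesn't stick." can be turned into a read-back loop. Fetch the listing after submitting, diff it against what you meant to send, and patch only the difference. A minimal sketch, assuming you have fetch and patch functions for your API. Both are hypothetical here and are passed in as parameters. The loop gives up after a couple of rounds and returns whatever still didn't stick, so you can log it as a new quirk.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def missing_or_wrong(intended: Record, stored: Record) -> Record:
    """Fields whose stored value is absent or differs from what was sent."""
    return {k: v for k, v in intended.items() if stored.get(k) != v}


def verify_and_patch(
    listing_id: int,
    intended: Record,
    fetch: Callable[[int], Record],
    patch: Callable[[int, Record], None],
    max_rounds: int = 2,
) -> Record:
    """Read back, patch the difference, repeat. Empty result means success."""
    diff = missing_or_wrong(intended, fetch(listing_id))
    for _ in range(max_rounds):
        if not diff:
            break
        patch(listing_id, diff)
        diff = missing_or_wrong(intended, fetch(listing_id))
    return diff
```

A non-empty return value is the "running doc of the quirks" in machine-readable form: the fields the API accepted without complaint but never actually stored.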

The mess is always there. The difference is whether you’re surprised by it or not.

— Matthew