The Art of Data Enrichment (Without Breaking the Bank)
How to append firmographics, emails, and context to your records using free and low-cost sources — before you even think about paying a vendor.
Data enrichment is one of those things that sounds expensive because vendors want it to sound expensive.
“Pay us $40k/year and we’ll append 200 firmographic fields to every record.” Sure, if your budget has that kind of room. But most teams I work with don’t — and they don’t need to. There’s a surprising amount of value you can extract from free and cheap sources before you even open a vendor conversation.
Let me walk you through how I actually approach enrichment projects.
What Is Data Enrichment, Exactly?
If you’ve got a list of leads — say, names and company names — enrichment is the process of appending additional context to those records:
- Company size, industry, revenue band (firmographics)
- LinkedIn profiles, job titles
- Phone numbers, email patterns
- Technology stack (what tools does this company use?)
- Social presence, founding year, HQ location
The goal: turn a sparse record into one your sales or marketing team can actually act on.
Layer 0: Clean Before You Enrich
This is the step everyone skips and regrets.
Enrichment applied to dirty data creates enriched dirty data. If your company name field has “Microsft Corp” in 300 rows, no API is going to reliably append the right firmographics. Garbage in, garbage out — but now with expensive garbage attached.
Before touching any enrichment source:
- Normalize company names (strip legal suffixes, fix typos, standardize case)
- Deduplicate records (see my previous post on CRM deduplication)
- Validate email domains — dead domains mean dead records
- Flag records where core fields are missing or obviously wrong
An hour of normalization up front saves days of bad enrichment downstream.
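Here is a minimal sketch of what that normalization pass can look like. The suffix list, the function names, and the 0.85 similarity cutoff are my own illustrative assumptions, not a standard; tune them against your data before trusting the output.

```python
import difflib
import re

# Hypothetical suffix list for illustration; extend it for your own data.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "llc", "ltd", "limited", "gmbh", "plc"}


def normalize_company_name(raw: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    # Replace anything that is not a word character or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", raw.lower())
    tokens = cleaned.split()
    # Remove trailing suffix tokens such as "inc" or "llc", keeping at least one token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def fix_typo(normalized: str, known_names: list[str], cutoff: float = 0.85) -> str:
    """Snap a normalized name to the closest known name, if one is similar enough."""
    matches = difflib.get_close_matches(normalized, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else normalized


def find_duplicate_groups(names: list[str]) -> dict[str, list[str]]:
    """Group raw names by normalized key; return only keys with more than one member."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(normalize_company_name(name), []).append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

With this, "Acme, Inc." and "ACME Corp" collapse to the same key, and "microsft" snaps to "microsoft" when that name is in your known list. Fuzzy matching can also merge names that really are different companies, so review the matches on a sample before applying them in bulk.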
Layer 1: Free Sources (Do These First)
Clearbit’s Free Enrichment (Logo + Basic Data)
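Clearbit’s company lookup API takes a domain and returns industry, employee range, and HQ country for millions of companies (its separate autocomplete endpoint covers name, domain, and logo). Rate limits are real, but for batches under 10k records it’s usually fine. Confirm what the free tier currently includes before you plan a batch around it.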
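```python
import time

import requests


def enrich_from_clearbit(domain: str, api_key: str) -> dict:
    """Look up one domain; return {} on any non-200 response."""
    url = "https://company.clearbit.com/v2/companies/find"
    headers = {"Authorization": f"Bearer {api_key}"}
    resp = requests.get(url, params={"domain": domain}, headers=headers, timeout=10)
    if resp.status_code != 200:
        return {}
    data = resp.json()
    # Nested objects can come back as null, so fall back to {} before reading them.
    return {
        "industry": (data.get("category") or {}).get("industry"),
        "employees": (data.get("metrics") or {}).get("employeesRange"),
        "country": (data.get("geo") or {}).get("country"),
        # Accept tech entries as plain strings or as {"tag": ...} objects.
        "tech": [t["tag"] if isinstance(t, dict) else t for t in data.get("tech") or []],
    }


# Example batch run (placeholder inputs; swap in your own list and key)
API_KEY = "your-clearbit-key"
domains = ["example.com"]
results = {}
for domain in domains:
    results[domain] = enrich_from_clearbit(domain, API_KEY)
    time.sleep(0.1)  # Be polite to the API
```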
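A fixed sleep is fine until the API starts refusing requests. One way to handle that is a small retry helper with exponential backoff. This is a sketch under my own assumptions: `call_with_backoff` is a hypothetical helper, and it expects you to pass in a function that returns `None` when the API says to slow down (for example on an HTTP 429) and a dict otherwise.

```python
import time
from typing import Callable, Optional


def call_with_backoff(
    fetch: Callable[[], Optional[dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() until it returns a result, doubling the wait after each miss."""
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        # Wait 1x, 2x, 4x, ... the base delay, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None
```

The `sleep` parameter is injectable so you can test the retry logic without actually waiting.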
Hunter.io for Email Patterns
Hunter’s free tier gives you 25 searches/month, but its real value is the pattern field in the Domain Search response, which tells you how a company formats its emails (e.g., {first}.{last}@company.com). Once you have the pattern, you can construct emails from name + domain without burning credits per record.
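```python
import requests

def get_email_pattern(domain: str, hunter_key: str) -> str:
    """Return the domain's email pattern, or "" when it is missing or null."""
    url = "https://api.hunter.io/v2/domain-search"
    params = {"domain": domain, "api_key": hunter_key}
    resp = requests.get(url, params=params, timeout=10).json()
    return (resp.get("data") or {}).get("pattern") or ""
```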
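Turning a pattern into an address is plain string substitution. The sketch below assumes the pattern covers only the local part and uses the placeholders `{first}`, `{last}`, `{f}`, and `{l}`; `build_email` is a hypothetical helper of mine, not part of Hunter’s API.

```python
def build_email(pattern: str, first: str, last: str, domain: str) -> str:
    """Fill a pattern such as "{first}.{last}" and append the domain."""
    first, last = first.strip().lower(), last.strip().lower()
    if not pattern or not first or not last:
        return ""  # Not enough information to guess an address
    local = (
        pattern.replace("{first}", first)
        .replace("{last}", last)
        .replace("{f}", first[0])  # first initial
        .replace("{l}", last[0])   # last initial
    )
    return f"{local}@{domain.lower()}"
```

Treat the result as a guess. Run constructed addresses through a verifier before they go anywhere near a send list.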
Wikipedia / Wikidata for Public Companies
For large, well-known companies, Wikidata is an underrated enrichment source: founding year, HQ city, ticker symbol, CEO name. It’s free, structured, and surprisingly complete for Fortune 500-type companies.
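```python
import requests


def wikidata_lookup(company_name: str) -> dict:
    """Return the ID and label of the top Wikidata search hit, or {} if none."""
    url = "https://www.wikidata.org/w/api.php"
    params = {
        "action": "wbsearchentities",
        "search": company_name,
        "language": "en",
        "type": "item",
        "format": "json",
    }
    resp = requests.get(url, params=params, timeout=10).json()
    results = resp.get("search", [])
    if results:
        # The top hit is not guaranteed to be a company; check the label before trusting it.
        return {"wikidata_id": results[0]["id"], "label": results[0].get("label")}
    return {}
```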
It won’t help with SMBs, but for enterprise accounts it’s solid signal.
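The search call only gets you an entity ID. To pull an actual field such as founding year, you fetch the entity’s claims and read the relevant property. Here is a rough sketch for P571 (inception) using the `wbgetentities` action. I’ve used the standard library’s `urllib` to keep it dependency-free; the function names and the User-Agent string are placeholders of mine.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def fetch_entity(wikidata_id: str) -> dict:
    """Fetch the claims for one entity via the wbgetentities action."""
    query = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": wikidata_id,
        "props": "claims",
        "format": "json",
    })
    request = urllib.request.Request(
        f"https://www.wikidata.org/w/api.php?{query}",
        headers={"User-Agent": "enrichment-sketch/0.1 (example)"},  # placeholder UA
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)["entities"][wikidata_id]


def founding_year(entity: dict) -> Optional[int]:
    """Read the year from the first usable P571 (inception) claim, if any."""
    for claim in entity.get("claims", {}).get("P571", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        timestamp = value.get("time", "")  # e.g. "+1975-04-04T00:00:00Z"
        if len(timestamp) >= 5 and timestamp[1:5].isdigit():
            return int(timestamp[1:5])
    return None
```

The same approach should work for other properties such as headquarters location (P159) or ticker symbol (P249), though their value shapes differ from the time value handled here.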
Layer 2: Cheap Paid Sources
Once you’ve exhausted free enrichment, here’s where I spend small budgets before considering big-ticket vendors:
FullContact — affordable firmographic and social enrichment, pay-as-you-go model, works well for B2C lists where you have personal email addresses.
Apollo.io — great for B2B. Their paid tiers are reasonable and their export quality is high. For building prospect lists or appending job titles, it’s hard to beat at the price point.
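Google Maps API: if you need physical addresses, local business phone numbers, or business categories for SMBs, the Maps Places API is cheap and accurate. At $0.017 per request, a 10k batch costs ~$170 per call. Phone numbers and websites come from Place Details rather than Find Place, so the function below makes two calls per record; check current pricing before sizing a batch.

```python
import requests

PLACES_URL = "https://maps.googleapis.com/maps/api/place"


def enrich_from_maps(company_name: str, city: str, maps_key: str) -> dict:
    """Find the best-matching place, then fetch its address, phone, and website."""
    # Step 1: Find Place resolves the text query to a place_id.
    find_params = {
        "input": f"{company_name} {city}",
        "inputtype": "textquery",
        "fields": "place_id",
        "key": maps_key,
    }
    found = requests.get(f"{PLACES_URL}/findplacefromtext/json", params=find_params, timeout=10).json()
    candidates = found.get("candidates", [])
    if not candidates:
        return {}
    # Step 2: Place Details returns contact fields such as phone and website.
    detail_params = {
        "place_id": candidates[0]["place_id"],
        "fields": "name,formatted_address,formatted_phone_number,website",
        "key": maps_key,
    }
    details = requests.get(f"{PLACES_URL}/details/json", params=detail_params, timeout=10).json()
    return details.get("result", {})
```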
Layer 3: Know When to Call the Vendor
Okay, sometimes you do need the big guns. When does it make sense?
- Volume + freshness requirements. If you need 500k records enriched and kept fresh quarterly, a ZoomInfo or Bombora contract probably pays for itself.
- Intent data. There’s no DIY version of B2B intent signals. If your sales team needs to know who’s researching competitors right now, you need a vendor.
- Industry-specific databases. Healthcare NPI data, real estate ownership records, legal entity lookups — there are purpose-built databases for these that aren’t replicable with free tools.
The key question: does the enrichment increase pipeline value enough to justify the cost? Run the math. If a $20k enrichment subscription helps close one extra $50k deal per year, it’s worth it. If it just makes your records look prettier, it isn’t.
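If you want that math in a form you can rerun with your own numbers, a back-of-the-envelope version looks like this. It’s a simplification that treats deal value as the full benefit and ignores margin, attribution, and staff time; the function names are mine.

```python
def enrichment_roi(annual_cost: float, extra_deals: float, avg_deal_value: float) -> float:
    """Return net gain per dollar spent: (added revenue - cost) / cost."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    added_revenue = extra_deals * avg_deal_value
    return (added_revenue - annual_cost) / annual_cost


def break_even_deals(annual_cost: float, avg_deal_value: float) -> float:
    """Return how many extra deals per year are needed to cover the subscription."""
    if avg_deal_value <= 0:
        raise ValueError("avg_deal_value must be positive")
    return annual_cost / avg_deal_value


# The example from the text: a $20k subscription and one extra $50k deal per year.
print(enrichment_roi(20_000, 1, 50_000))   # 1.5
print(break_even_deals(20_000, 50_000))    # 0.4
```

An ROI of 1.5 means $1.50 of net gain for every dollar spent. A break-even of 0.4 means the subscription pays for itself if it helps close one such deal every two and a half years.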
The Enrichment Stack I Actually Use
For most mid-market B2B clients, this is my go-to setup:
| Need | Source | Cost |
|---|---|---|
| Industry + employee count | Clearbit free tier | Free |
| Email pattern | Hunter.io free tier | Free |
| Job titles + LinkedIn | Apollo.io starter | ~$49/mo |
| Address + phone (SMB) | Google Maps API | Pay-per-use |
| Tech stack | BuiltWith or Wappalyzer | Free–$30/mo |
Total monthly spend: under $100 for most projects. Not $40k.
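To combine these sources, I run them in order of cost and let each one fill only the fields that are still empty, so a paid source never overwrites something a free one already answered. A minimal sketch, assuming each source is wrapped as a function that takes the record and returns a dict of fields (`enrich_in_layers` and the `Enricher` alias are illustrative names of mine):

```python
from typing import Callable, Iterable

Enricher = Callable[[dict], dict]

EMPTY_VALUES = (None, "", [])


def enrich_in_layers(record: dict, enrichers: Iterable[Enricher]) -> dict:
    """Run enrichers in order; each one only fills fields that are still empty."""
    enriched = dict(record)  # Copy so the caller's record is not mutated
    for enricher in enrichers:
        for field, value in enricher(enriched).items():
            if value not in EMPTY_VALUES and enriched.get(field) in EMPTY_VALUES:
                enriched[field] = value
    return enriched


# Usage with stand-in enrichers (real ones would call the APIs above)
free_layer = lambda rec: {"industry": "Software", "employees": None}
paid_layer = lambda rec: {"industry": "SaaS", "employees": "51-250"}
print(enrich_in_layers({"domain": "example.com"}, [free_layer, paid_layer]))
```

Here the free layer’s industry value wins, and the paid layer only contributes the employee range the free layer could not supply.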
The Real Takeaway
Enrichment isn’t magic. It’s structured, methodical work — clean first, enrich in layers, measure what actually improves your conversion rates, and only escalate to expensive vendors when the math makes sense.
The vendors want you to think you can’t do this without them. You can. Start scrappy, measure impact, and scale up only what proves its worth.
That’s the data guy approach: don’t spend a dollar when fifty cents does the job.
— Matthew