50% of the Real Estate Agents ChatGPT Recommends Don't Exist
On May 17, 2026, we asked ChatGPT and Claude the same question across five major US metros:
Who's the best real estate agent in {city}?
We ran it 50 times in total: five variants of the question (best agent for buyers, best for sellers, top luxury agent, most reviewed agent, top 5 agents) across Austin, Phoenix, Miami, Denver, and Nashville, with each of the 25 prompts sent to both models. We then took every individual name the models returned, deduplicated the list, and tried to find each agent on Zillow, LinkedIn, their brokerage's website, or any other public source.
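For anyone who wants to reproduce the run, here is a minimal sketch of the query grid. It assumes each of the 25 city-and-variant prompts went to both models, which is how 50 calls split across two models works out. The exact prompt wording for the variants is hypothetical, and the model strings are labels from the FAQ, not necessarily valid API identifiers.

```python
from itertools import product

# Cities and model labels come from the article; the variant prompt
# templates below are hypothetical wording, not the exact prompts used.
CITIES = ["Austin", "Phoenix", "Miami", "Denver", "Nashville"]
MODELS = ["gpt-4o-mini", "claude-haiku-4.5"]
VARIANTS = {
    "buyers": "Who's the best real estate agent for buyers in {city}?",
    "sellers": "Who's the best real estate agent for sellers in {city}?",
    "luxury": "Who's the top luxury real estate agent in {city}?",
    "most_reviewed": "Who's the most reviewed real estate agent in {city}?",
    "top_5": "Who are the top 5 real estate agents in {city}?",
}


def build_queries():
    """Return one (model, city, variant, prompt) tuple per planned API call."""
    return [
        (model, city, variant, template.format(city=city))
        for model, city, (variant, template) in product(
            MODELS, CITIES, VARIANTS.items()
        )
    ]


queries = build_queries()
print(len(queries))  # 2 models x 5 cities x 5 variants
```

Building the grid up front makes it easy to log every call next to its response, so each returned name can be traced back to the prompt that produced it.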
Roughly half of the names could not be traced to a real agent.
This isn't a small problem. It's the problem.
The data
The 50 queries surfaced 58 unique people-or-organization names. After filtering brokerages and obvious non-persons (e.g., "Google Maps/Search", "Realtor.com or Zillow"), 40 looked like individual agent names. We then web-searched each.
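The deduplicate-and-filter step can be approximated in a few lines. This is a rough sketch, not the process we ran: the marker list and the "two or three name-like tokens" rule are hypothetical heuristics, and anything they let through still needs a manual look.

```python
import re

# Hypothetical marker list. The article only says brokerages and obvious
# non-persons were filtered out, not which rules were applied.
NON_PERSON_MARKERS = (
    "realty", "group", "team", "properties", "sotheby", "compass",
    "coldwell", "keller williams", "zillow", "realtor.com", "google",
)


def normalize(name: str) -> str:
    """Lowercase, trim, and collapse whitespace so repeats deduplicate."""
    return re.sub(r"\s+", " ", name).strip().lower()


def looks_like_person(name: str) -> bool:
    """Heuristic: two or three name-like tokens and no non-person marker."""
    key = normalize(name)
    if any(marker in key for marker in NON_PERSON_MARKERS):
        return False
    tokens = key.split()
    return 2 <= len(tokens) <= 3 and all(
        re.fullmatch(r"[a-z.'-]+", token) for token in tokens
    )


def unique_person_names(raw_names):
    """Deduplicate by normalized form, keeping the first spelling seen."""
    seen = {}
    for name in raw_names:
        seen.setdefault(normalize(name), name.strip())
    return [name for name in seen.values() if looks_like_person(name)]


raw = [
    "Brian Talley", "brian  talley", "Google Maps/Search",
    "Realtor.com or Zillow", "Regent Property Group", "Dora Puig",
]
print(unique_person_names(raw))
```

Substring markers are blunt and will occasionally drop a real surname, which is one reason to treat the output as a candidate list, not a final one.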
Of the top 10 by score (frequency × position):
- 4 are verifiably real and contactable. Josh Behr (Denver, #1 team in Colorado), Brian Talley (Austin, founder of Regent Property Group), Megan Douglas (Denver, 7x Five Star Realtor), Dora Puig (Miami, #1 Miami-Dade broker 2017–2023).
- 2 are confirmed hallucinations. "Kathy McGivney" and "Tiffany McElroy" do not appear in any Austin or Denver agent registry, brokerage roster, MLS profile, or LinkedIn search.
- 2 are likely confabulations of real people. For example, "Megan Dorsey" is almost certainly Megan Douglas, a real, award-winning Denver agent whose name the model misremembered.
- 2 we couldn't confirm either way. They may be real and obscure, or they may be invented.
If you generalize that to the full sample: somewhere between 40% (the four confirmed or likely fabrications) and 60% (adding the two we couldn't confirm) of the names ChatGPT and Claude return when asked for the best real estate agent in a major US metro are fabricated, confused, or unverifiable.
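The 40% to 60% range follows from the top-10 breakdown. A small worked example, assuming the lower bound counts only confirmed and likely fabrications and the upper bound adds the names we couldn't confirm:

```python
from fractions import Fraction

# Counts from the top-10 breakdown above.
top10 = {
    "verified_real": 4,
    "confirmed_hallucination": 2,
    "likely_confabulation": 2,
    "unconfirmed": 2,
}

total = sum(top10.values())

# Lower bound: only names shown to be invented or garbled.
low = Fraction(
    top10["confirmed_hallucination"] + top10["likely_confabulation"], total
)
# Upper bound: also count names that could not be confirmed either way.
high = low + Fraction(top10["unconfirmed"], total)

print(f"{float(low):.0%} to {float(high):.0%}")  # 40% to 60%
```

Carrying the same proportions from the top 10 over to all 40 names is itself an assumption, which is why the figure is given as a range.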
Why this is happening
Large language models predict the next likely token based on patterns in their training data. For prominent national figures (Tom Ferry, Barbara Corcoran), there's enough corroborating data across the open web for the model to recall the name reliably. For local real estate agents — even top producers — there often isn't.
The model knows that:
- Real estate agents have first-and-last-name pairs that sound like real names
- Top producers in a given city tend to be at certain brokerages (Sotheby's, Compass, Coldwell, KW)
- These agents tend to be praised for "negotiation skills," "market knowledge," and "personalized service"
So when asked to recommend five agents in a city it doesn't have strong signal for, the model constructs five plausible-sounding agents. The first and last names are common. The brokerage is real. The praise is generic. The agent isn't.
This is what's known in LLM literature as confabulation — confident, plausible, completely fabricated output.
Why this is the only thing that matters
Two facts compound here.
One: AI search referrals are exploding. Stack Overflow's 2026 developer survey, the most recent OpenAI partner data leaks, and SimilarWeb's referrer tracking all point in the same direction. AI search referrals to real-estate sites grew roughly 10x in 2025. Most buyers under 40 now ask ChatGPT or Gemini before they ever open Zillow.
Two: when an LLM confabulates names, the user doesn't click. They scroll. They re-ask. They land on the answer that names someone real — and they call that person.
Which means: in a market where 50% of the names returned for "best real estate agent in {your city}" don't exist, the agents who do show up — accurately, repeatedly, with correct spelling and a citable description — are capturing nearly all the buyer mindshare AI is generating.
That's the entire game.
What separates the cited from the confabulated
We've spent the last six months reverse-engineering what causes an LLM to reliably name a specific real estate agent. The answer is not one signal. It's five, and they reinforce each other:
- Entity coherence. The same bio, same headshot, same brokerage affiliation appears across at least 20 reputable sites. The model learns "Brian Talley" is the same entity in each place.
- Google Business Profile completeness. Fully built out — every category, every service, every Q&A, every photo. GBP is one of the highest-trust signals in the model's training data.
- Review velocity in citable language. It isn't enough to have 200 Google reviews. The reviews must use location- and transaction-specific language ("sold our condo in East Austin in 11 days") that the model can excerpt as evidence.
- Citable content under your name. Blog posts or videos that answer specific buyer questions, structured with clear H2/H3, author bylines, and schema markup. The model can quote you.
- Wide directory footprint with consistent NAP. Yelp, Realtor.com, Zillow, Homes.com, Apple Maps, Bing, Facebook, LinkedIn — name, address, phone identical everywhere.
Get four of those five right and you start to appear. Get all five right and you start to appear first.
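The NAP signal is the easiest of the five to audit mechanically. Below is a minimal sketch, assuming you have already collected your listing from each directory into a dictionary by hand. The site keys, the sample agent, and the strict "St is not Street" matching are illustrative assumptions.

```python
import re
from collections import Counter


def normalize_phone(phone: str) -> str:
    """Keep digits only and drop a leading US country code."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        return digits[1:]
    return digits


def normalize_text(value: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", value.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def nap_key(listing: dict) -> tuple:
    """Reduce a listing to a comparable (name, address, phone) tuple."""
    return (
        normalize_text(listing["name"]),
        normalize_text(listing["address"]),
        normalize_phone(listing["phone"]),
    )


def inconsistent_listings(listings: dict) -> list:
    """Return the sites whose NAP differs from the most common version."""
    keys = {site: nap_key(data) for site, data in listings.items()}
    majority, _ = Counter(keys.values()).most_common(1)[0]
    return sorted(site for site, key in keys.items() if key != majority)


# Placeholder data for a hypothetical agent.
listings = {
    "yelp": {"name": "Jane Example", "address": "100 Main St., Austin, TX",
             "phone": "(512) 555-0100"},
    "zillow": {"name": "jane example", "address": "100 Main St, Austin TX",
               "phone": "+1 512-555-0100"},
    "facebook": {"name": "Jane Example", "address": "100 Main Street, Austin, TX",
                 "phone": "512.555.0100"},
}
print(inconsistent_listings(listings))  # ['facebook']
```

The check ignores case, punctuation, and phone formatting but flags "St" versus "Street", since the goal is listings that are identical everywhere.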
The opportunity, said plainly
There is a 12-to-24-month window during which the agents who engineer these five signals will own AI buyer mindshare in their market. After that, the practice will be standard, the same way SEO basics became standard between 2008 and 2012.
The agents who move now will defend a position. The agents who wait will be trying to displace someone defended.
Our methodology, transparent
If you want to verify any of this, we'll send you the raw data. The 50-query CSV, the 58 names, the web-research notes, every confabulation we flagged.
Or run it yourself in 30 seconds at agentcite.vercel.app/check — type your name and city, and you'll see the live answer ChatGPT, Claude, and Gemini are giving for your market. Free, no signup, no email capture.
If your name doesn't come up and you want it to: we should talk.
FAQ
How many queries did the 40–60% figure come from?
50 paid API calls split across OpenAI's gpt-4o-mini and Anthropic's claude-haiku-4.5, five question variants per metro, five metros (Austin, Phoenix, Miami, Denver, Nashville).
Is the hallucination rate the same across all LLMs?
No. In our sample, Claude was slightly more conservative: more likely to refer the user to a directory and less likely to invent a specific name. ChatGPT confabulated more aggressively. Gemini was excluded from this round due to a billing-tier issue on our end.
Is this getting worse or better over time?
Better in aggregate as training data improves, but the gap between cited agents and uncited ones is widening. Models are getting more confident about the names they do know. If you're not in that group, the barrier to entry grows.
What does "cited" mean here?
Your full name appears in a numbered list in the model's response, with a short reason given, when a user asks the model to recommend the best real estate agent in your market.
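That definition can be turned into a simple check on a saved model response. This is a minimal sketch under stated assumptions: the list-item pattern, the function name `cited_position`, and the "some text besides the name" test for a reason are all illustrative, and the sample names are placeholders.

```python
import re

# Matches lines such as "1. Jane Example: reason" or "3) Pat Placeholder - reason".
NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s+(.*)$", re.MULTILINE)


def cited_position(response: str, full_name: str):
    """Return the 1-based list position of full_name, or None if not cited.

    A name counts as cited only if its list item contains some text
    besides the name itself, standing in for the "short reason".
    """
    target = full_name.casefold()
    items = NUMBERED_ITEM.findall(response)
    for position, item in enumerate(items, start=1):
        text = item.casefold()
        if target in text and re.search(r"[a-z]", text.replace(target, "")):
            return position
    return None


response = (
    "Here are some options:\n"
    "1. **Jane Example** (Example Realty): known for fast condo sales.\n"
    "2. John Sample\n"
    "3) Pat Placeholder - strong negotiator.\n"
)
print(cited_position(response, "Jane Example"))  # 1
print(cited_position(response, "John Sample"))   # None, no reason given
```

Exact substring matching will miss misspelled variants of a name, so a near-miss such as a wrong surname has to be caught separately.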
How do I get cited?
Five signals, listed above. Or book a call and we'll walk you through where you currently rank.