This study looks at a clear problem: AI assistants sometimes send users to URLs that fail. Ahrefs checked 16 million unique URLs tied to popular assistant platforms and compared the outcomes to traditional search. The finding is stark: these tools send users to 404 pages at about 2.87x the rate of Google Search.
For marketers and SEO teams in India, this trend matters now. Early analytics show branded discovery routes can shift as newer search experiences surface content. A small share of traffic that breaks on landing can erode trust quickly.
This piece is a data-led report, not a panic alert. You will get the study method, per-platform results, why such errors occur, and practical steps to detect and fix problems operationally.
Key Takeaways
- Across 16 million URLs, assistants sent users to 404 pages at about 2.87x the rate of Google Search.
- “Hallucinated links” here means credible-looking suggestions that return errors when clicked.
- Indian marketers should monitor small traffic shifts, because they can grow into bigger issues.
- The report covers methodology, results by platform, and mitigation steps.
- Action now: set up tracking and quick-remediation workflows to protect brand journeys.
Why AI-generated broken links are becoming a measurable web traffic problem
Analytics can now surface a quiet but costly problem: generated URLs leading to dead pages. The 16 million–URL study found that assistants send visitors to 404 pages 2.87x more often than Google Search.
These are generated URLs that look plausible but lead to 404 pages. That dead-end experience erodes confidence in both the referrer and the brand.
Because many visits are mid-journey research, a broken landing interrupts discovery and lowers conversion chances. Repeated failures make a website feel neglected or outdated.
Key headline finding from the 16M-URL analysis
The data is clear: across clicked URLs, the average 404 rate was 0.43% for assistant referrals, versus a 0.15% baseline for Google Search referrals measured across 629M unique URLs.
- Generated referrals appear as distinct referrers, so teams can quantify how often sessions hit 404 pages.
- For high-growth sectors in India (SaaS, fintech, education), even small bumps in website traffic from these sources create visible impact.
The next section will explain how the URLs were analyzed and how the 404 rate was measured across clicks and citations.
How the 16 million URLs were analyzed across ChatGPT, Gemini, Copilot, Perplexity, Claude, and Mistral
The study used two complementary datasets to capture both user-facing failures and raw citations.
Dataset one: web analytics of clicked URLs
This view used referrer data in web analytics to collect clicked URLs where the session source was an assistant. That ties the problem to real sessions rather than hypothetical mentions.
How likely 404 pages were identified
At scale, pages were flagged as likely 404s when the HTML title contained “404” or “not found.” This pragmatic rule made it possible to scan millions of unique URLs quickly.
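The title rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual tooling: the function name `looks_like_404` and the helper class are hypothetical, and case-insensitive matching is an assumption.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_404(html: str) -> bool:
    """Return True when the page title contains "404" or "not found".

    Assumption: matching is case-insensitive. The source only names the
    two strings, not how they were compared.
    """
    parser = _TitleParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    title = parser.title.lower()
    return "404" in title or "not found" in title
```

A heuristic like this trades accuracy for speed. It would also flag a live article titled "Best 404 page designs", and it would miss an error page whose title says only "Oops".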
Dataset two: Brand Radar of cited URLs
Brand Radar extracted cited URLs from model outputs, independent of clicks. This shows how often URLs are mentioned, including many that never earn a visit.

Validating HTTP status with a crawler database
For URLs present in the crawler database (about 65% coverage), the study pulled the most recent HTTP status to compute 404 rates. Coverage matters because URLs the crawler has never seen are left out of the rate, and URLs that never existed are likely to be among them.
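To make the coverage caveat concrete, here is a minimal sketch of how a cited-URL 404 rate could be computed when only part of the list appears in a crawler database. The function name, the dictionary shape, and the sample URLs are all hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable


def cited_404_stats(cited_urls: Iterable[str], crawler_status: Dict[str, int]) -> dict:
    """Compute coverage and 404 rate for cited URLs.

    crawler_status maps URL -> most recent HTTP status (hypothetical shape).
    URLs missing from the crawler are excluded from the rate, which is why
    low coverage can hide URLs that never existed.
    """
    urls = set(cited_urls)
    covered = [u for u in urls if u in crawler_status]
    not_found = [u for u in covered if crawler_status[u] == 404]
    return {
        "coverage": len(covered) / len(urls) if urls else 0.0,
        "rate_404": len(not_found) / len(covered) if covered else 0.0,
        "uncovered": len(urls) - len(covered),
    }


# Illustrative data only.
stats = cited_404_stats(
    ["https://example.com/a", "https://example.com/b",
     "https://example.com/c", "https://example.com/d"],
    {"https://example.com/a": 200, "https://example.com/b": 404},
)
print(stats)
```

In this toy input, half of the cited URLs are unknown to the crawler, so the reported rate describes only the covered half.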
“Combining session-based clicked urls with a crawler-backed brand radar delivers a more complete, pragmatic picture of broken referrals.”
Known limitations: title-based detection can undercount 404s when error templates omit those strings, and can overcount when a live page's title happens to contain them. Click data misses unclicked citations, and crawler gaps can hide URLs that never existed. Conversely, some 404s reflect legitimate pages that were later removed, so not every 404 is a fabricated URL.
With methodology established, the next section compares 404 rates by source and benchmarks them against Google baselines.
What the data shows about AI assistants’ hallucinated links and 404 rates vs Google Search
The data presents a clear contrast between referral sources and real-world landing outcomes.
404 rates by source (clicked and cited URLs):
| Source | Clicked 404 rate | Cited 404 rate | Context |
|---|---|---|---|
| ChatGPT | 1.01% | 2.38% | Highest clicked and cited 404 rate |
| Claude | 0.58% | — | Mid-range clicked rate |
| Copilot | 0.34% | 0.54% | Lower cited rate than ChatGPT |
| Perplexity | 0.31% | 0.87% | Cited rate close to the Google SERP baseline |
| Gemini | 0.21% | 0.86% | Cited rate also close to the SERP baseline |
| Mistral | 0.12% | — | Lowest clicked rate; smaller volumes |
Benchmarking makes the gap clear. The Google referrer baseline is a 0.15% 404 rate across 629M URLs. Across assistants, the average clicked 404 rate is 0.43%, or 2.87x the Google baseline.
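The headline ratio can be reproduced from the clicked rates in the table. The sketch below assumes an unweighted mean of the six published, already-rounded rates. The source does not state whether the study weighted by traffic, so treat this as a sanity check rather than the study's method.

```python
# Clicked 404 rates (%) as published in the table above.
clicked_404_rates = {
    "ChatGPT": 1.01,
    "Claude": 0.58,
    "Copilot": 0.34,
    "Perplexity": 0.31,
    "Gemini": 0.21,
    "Mistral": 0.12,
}
google_baseline = 0.15  # % 404 rate for Google referrals

# Assumption: simple unweighted mean across assistants.
average = sum(clicked_404_rates.values()) / len(clicked_404_rates)

# Dividing the rounded average reproduces the published multiple.
ratio_from_rounded = round(average, 2) / google_baseline
# Dividing the unrounded average gives a slightly lower figure.
ratio_unrounded = average / google_baseline

print(f"average clicked 404 rate: {average:.2f}%")
print(f"ratio from rounded average: {ratio_from_rounded:.2f}x")
print(f"ratio from unrounded average: {ratio_unrounded:.2f}x")
```

The published 2.87x matches 0.43 / 0.15. Using the unrounded mean of the table values gives a marginally smaller multiple, which is a rounding effect and does not contradict the finding.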
Why this matters: ChatGPT’s cited URLs return 404 at 2.38%, far above the 0.84% SERP baseline for top Google results. Perplexity (0.87%) and Gemini (0.86%) track close to that baseline, which suggests their source index leans on Google data rather than inventing URLs. Over time, URLs that do not exist reduce trust and cost conversion opportunities.
Why AI assistants hallucinate links in the first place
Behind each broken URL there is usually either an expired page or a plausible pattern guess. Both create credible-looking referrals that fail when clicked.

Once-valid URLs that expired
Models often surface pages that existed during training. When those pages have since been deleted or moved without redirects, the address now returns a 404.
Deletions, CMS migrations, and missing 301s all produce this category. For Indian teams, this means older campaign pages can suddenly harm discovery if left unredirected.
Pattern-based hallucinations
Generative systems also guess addresses using a site’s common structure. These pattern-based hallucinated URLs mirror real paths and feel trustworthy.
The more consistent your permalink and blog patterns are, the easier it is for models to invent plausible but nonexistent pages.
Real-world examples and amplification
Ryan Law documented practical cases on Ahrefs: plausible paths such as /blog/internal-links/ and /blog/newsletter/ attracted visits yet return 404.
“Plausible-sounding blog paths that never existed were pulling clicks and producing dead pages.”
When generated content is published with these fabricated URLs, crawlers may discover and index them. That creates a feedback loop: the web copies the error and it spreads.
| Cause | How it looks | Fix |
|---|---|---|
| Expired but once-valid | Previously live blog or product page now missing | Restore or 301 to closest live page |
| Pattern-based guess | Looks like /blog/topic-name/ but never existed | Create content, 301 to category, or serve helpful 404 |
| Published hallucination | Third-party content lists a fake URL | Request removal, add canonical, or set redirects |
SEO implication: even wrong urls often carry correct intent. A targeted redirect or a useful 404 can reclaim value and protect brand journeys.
Impact on SEO and website operations for teams in India
For Indian marketing teams, a tiny share of misrouted traffic can create outsized operational work. Although these referrals make up roughly 0.25% of website traffic versus about 39.35% from Google Search, they often carry high intent.
Where these broken referrals appear
Look for sudden spikes in 404 pages with nonstandard referrers. They show up in landing page reports and content journeys where users expect a specific guide or pricing page.
How to find AI-referred 404 pages in GA4
Use Explorations, pick “Session source,” and apply this regex filter: .*gpt.*|.*chatgpt.*|.*openai.*|.*perplexity.*|.*claude.*|.*gemini.*|.*copilot.*|.*mistral.*|.*bard.*
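The same pattern can be applied outside GA4, for example to an exported list of session sources. The sketch below reuses the regex verbatim. The function name and sample sources are hypothetical, and case-insensitive matching is an assumption added for convenience.

```python
import re

# Regex copied from the GA4 filter above. The leading and trailing ".*"
# suggest it is meant to match the whole value, so fullmatch is used here.
AI_SOURCE_PATTERN = re.compile(
    r".*gpt.*|.*chatgpt.*|.*openai.*|.*perplexity.*|.*claude.*"
    r"|.*gemini.*|.*copilot.*|.*mistral.*|.*bard.*",
    re.IGNORECASE,  # assumption: ignore case differences in exported data
)


def is_ai_source(session_source: str) -> bool:
    """Return True when a session source looks like an AI assistant."""
    return AI_SOURCE_PATTERN.fullmatch(session_source) is not None


# Hypothetical session sources from an analytics export.
sources = ["chatgpt.com", "google", "perplexity.ai", "copilot.microsoft.com", "bing"]
ai_sources = [s for s in sources if is_ai_source(s)]
print(ai_sources)
```

Note that `.*gpt.*` already covers `chatgpt`, so the second alternative is redundant but harmless. Broad substrings such as `bard` or `gemini` can also match unrelated referrers, so spot-check the matches before trusting the counts.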
Audit URLs at scale with Google Sheets
Export landing URLs, then add a custom Apps Script function (for example, one you define and call as =GetHttpStatus(A2)) to pull each page’s HTTP status. Filter results to 404 and combine them with visit counts (example: >10 visits/month).
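For teams that prefer a script to a spreadsheet, the same audit can be sketched in Python with the standard library. Everything here is illustrative: `get_http_status` and `find_broken_landings` are hypothetical names, the visit counts are assumed to come from your own analytics export, and the 10-visit threshold simply mirrors the example above.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple


def get_http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a URL, or 0 if the request fails.

    Uses a HEAD request to avoid downloading bodies. Some servers reject
    HEAD, so a GET fallback may be needed in practice.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-audit-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
    except OSError:
        return 0  # DNS failure, timeout, refused connection, etc.


def find_broken_landings(
    visits_by_url: Dict[str, int],
    min_visits: int = 10,
    fetch_status: Callable[[str], int] = get_http_status,
) -> List[Tuple[str, int]]:
    """List URLs with more than min_visits that return 404, busiest first."""
    broken = [
        (url, visits)
        for url, visits in visits_by_url.items()
        if visits > min_visits and fetch_status(url) == 404
    ]
    return sorted(broken, key=lambda item: item[1], reverse=True)
```

Passing the status fetcher as a parameter keeps the filtering logic testable without network access. In a real run, add rate limiting so the audit does not hammer your own site.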
What to fix first and a quick mitigation playbook
- Prioritize broken URLs that have meaningful traffic and business intent (pricing, docs, product pages).
- Use 301 redirects when a close topical match exists.
- When no good match exists, serve a high-converting 404 page with resource links and CTAs.
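One rough way to shortlist redirect targets for the playbook above is to compare each broken path with your live paths by string similarity. This is a sketch under loud assumptions: the live paths below are invented for illustration, `suggest_redirect` is a hypothetical helper, and path similarity is only a crude proxy for topical match, so a person should review every suggestion before a 301 ships.

```python
from difflib import get_close_matches
from typing import List, Optional


def suggest_redirect(
    broken_path: str, live_paths: List[str], cutoff: float = 0.6
) -> Optional[str]:
    """Return the most similar live path, or None if nothing is close.

    cutoff is difflib's similarity threshold between 0 and 1. The default
    of 0.6 is an arbitrary starting point, not a recommended value.
    """
    matches = get_close_matches(broken_path, live_paths, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical live paths on a site.
live_paths = ["/blog/internal-links-for-seo/", "/blog/newsletter-signup/", "/pricing/"]

print(suggest_redirect("/blog/internal-links/", live_paths))
print(suggest_redirect("/careers/", live_paths))
```

When the helper returns None, the playbook's third option applies: serve a useful 404 page instead of forcing a redirect to a loosely related page.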
Measure changes over time. Track recurrence, update redirects, and let data guide where to spend engineering time for the best SEO impact.
Conclusion
The study delivers a clear, actionable message: generated referrals produce 404s at 2.87x the Google baseline, with ChatGPT showing the highest observed rates (1.01% clicked, 2.38% cited).
Why this happens is simple: web churn leaves expired pages, and pattern-based guesses create plausible but nonexistent addresses. Both paths send real users to dead pages.
Recommended posture: measure first, then fix. Use analytics and HTTP audits to find high-intent 404s. Prioritize traffic and business impact before creating redirects.
Action checklist: isolate non-search referrers, surface 404 landing pages, redirect or restore high-value paths, and improve 404 UX to recover trust. Teams that adopt this repeatable process will protect brand journeys as discovery channels evolve.

