How to Run an Internal Linking Audit and Fix Orphan Pages

Step-by-step internal linking audit: crawl your site, map the link graph, find orphan pages, fix weak links, and prioritize what actually moves rankings.

Sudharsan Ananth

Founder & CTO

May 14, 2026 | 18 min read

Internal linking is the SEO lever most content teams never fully control. They add links as they write, follow their instincts, and move on. The result is a link graph that has grown organically but not strategically: pillar pages that are underlinked, supporting posts that are invisible to crawlers, and orphan pages that Google has no reliable way to discover or credit.

An internal linking audit is how you take back control. It maps what your site has actually built versus what you intended, surfaces the gaps doing the most damage, and produces a prioritized fix list you can act on immediately.

I have run these audits on content estates of varying sizes, and the pattern I see repeatedly is this: the sites that struggle with organic growth despite publishing quality content often have a structural problem, not a content problem. Fix the structure and the content starts working.

This guide covers the full process, including a copy-pasteable audit checklist you can use immediately.

PageRank: Google confirms PageRank remains part of its core ranking systems, making internal link equity distribution a live ranking input (Google Search Central).
5%: Share of organic visits driven by orphan pages in one enterprise case study, despite making up the majority of pages on the site (Botify).
0: The 'magical ideal number' of internal links per page, per Google. The right number is whatever is genuinely useful to readers.

What Is an Internal Linking Audit?

An internal linking audit is a structured review of every link that connects pages within your own domain. The goal is to answer four questions: Which pages are not being linked to at all? Which pages are receiving far fewer links than their importance warrants? Which pages are over-linked in ways that dilute focus? And is the anchor text being used descriptive enough to pass meaning to crawlers and readers?

The audit is distinct from a full content audit, which evaluates content quality and performance page by page. The internal linking audit focuses on the connective tissue: how pages point to each other, how link equity flows through the site, and how easily Googlebot can discover and navigate the full estate.

Google has been explicit about why this matters. According to Google Search Central, "Google must constantly search for new pages and add them to its list of known pages... The main method is following links from pages that we already know about." If your pages are not linked from pages Google already knows, they may not get crawled. Google also confirms that PageRank remains part of its core ranking systems, which means internal links are still one of the mechanisms by which authority flows from high-equity pages to the pages that need it.

Step 1: Crawl Your Site and Export the Link Data

The audit starts with a complete crawl. Use Screaming Frog, Sitebulb, or Ahrefs Site Audit. Set the crawler to start from the homepage and follow all internal links. Let it run to completion.

What you need from the crawl:

  • Every crawlable URL on the domain
  • For each URL: the list of pages linking to it (inlinks) and the list of pages it links out to (outlinks)
  • HTTP status codes (you want to catch broken internal links and redirect chains at this stage)
  • Anchor text for every internal link

Export this to a spreadsheet. The raw crawl output is your working data set for the entire audit.

One important note: also pull your indexed URLs from Google Search Console and cross-reference them against the crawl. Any URL that appears in Search Console but was not found by your crawler has a discoverability problem worth investigating separately. The reverse is also a flag: a URL your crawler found, and that you intend to have indexed, but that Google has not indexed.
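The cross-reference can be done in a spreadsheet, but it is also a simple set comparison. Here is a minimal sketch, assuming you have both URL lists as plain Python lists. The `normalize` helper and the example URLs are hypothetical; real exports may need stricter normalization (protocol, parameters, casing).

```python
def normalize(url: str) -> str:
    """Trim whitespace and a trailing slash so near-identical URLs compare equal."""
    url = url.strip()
    return url[:-1] if url.endswith("/") and len(url) > 1 else url


def cross_reference(crawled, gsc):
    """Return the URLs that appear in only one of the two sources."""
    crawled_set = {normalize(u) for u in crawled}
    gsc_set = {normalize(u) for u in gsc}
    return {
        # Known to Google but unreachable by following internal links
        "in_gsc_not_crawled": sorted(gsc_set - crawled_set),
        # Reachable by the crawler but absent from the Search Console export
        "crawled_not_in_gsc": sorted(crawled_set - gsc_set),
    }


# Hypothetical example data
crawled_urls = [
    "https://example.com/",
    "https://example.com/blog/a",
    "https://example.com/blog/b/",
]
gsc_urls = [
    "https://example.com",
    "https://example.com/blog/a",
    "https://example.com/old-landing",
]

report = cross_reference(crawled_urls, gsc_urls)
for bucket, urls in report.items():
    print(bucket, urls)
```

Both buckets are only candidates for review. A URL in the second bucket may be deliberately noindexed, so check intent before treating it as a problem.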

Step 2: Identify Orphan Pages

With the crawl data in hand, the first thing I do is sort every URL by inlink count, lowest first. This surfaces orphan pages immediately.

An orphan page is any page that no other page on your site links to. Googlebot discovers pages primarily by following links. If no page links to a URL, the only way Google finds it is through your XML sitemap (which helps but is not a substitute for links) or an external backlink. Research on one enterprise site found that orphan pages drove only 5% of organic visits despite making up the majority of the site's pages, while pages embedded in the site's crawlable structure drove 95%. That is a single case study from one unnamed site, not an industry average, but it illustrates the exposure clearly: pages that are structurally isolated tend to be organically invisible.

For each orphan page, ask:

  • Is this page supposed to be indexed? (If not, check that it carries a noindex tag.)
  • Is this page live and intended to rank? If yes, why is nothing linking to it?
  • Does this page belong to a topic cluster that has other posts? If yes, those sibling posts should be linking to it.

Beyond true orphans, also flag pages with very few inlinks relative to their importance. A page you have identified as a pillar or money page that has only one or two internal links is structurally underserved regardless of how good the content is.
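If you prefer to work from the raw link export rather than the crawler's own inlink column, the sort can be sketched as follows. This assumes the link export is a list of (source, destination) pairs and that `all_urls` is the combined URL list from the crawl, Search Console, and the sitemap. That union is an assumption on my part: a crawl that only follows links cannot reach a true orphan by itself. The function name and paths are hypothetical.

```python
from collections import Counter


def inlink_counts(all_urls, links):
    """Count unique internal inlinks per URL from (source, destination) pairs."""
    counts = Counter({url: 0 for url in all_urls})
    seen = set()
    for source, dest in links:
        if source == dest or (source, dest) in seen:
            continue  # ignore self-links and repeated links from the same page
        seen.add((source, dest))
        if dest in counts:
            counts[dest] += 1
    return counts


# Hypothetical example data
all_urls = ["/", "/pillar", "/post-a", "/post-b", "/forgotten"]
links = [
    ("/", "/pillar"),
    ("/", "/post-a"),
    ("/post-a", "/pillar"),
    ("/post-b", "/pillar"),
    ("/post-a", "/post-b"),
    ("/pillar", "/"),
]

counts = inlink_counts(all_urls, links)
ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]))  # lowest first
orphans = [url for url, n in ranked if n == 0]
few_inlinks = [url for url, n in ranked if 1 <= n <= 3]

print("orphans:", orphans)
print("1-3 inlinks:", few_inlinks)
```

The 1-3 band mirrors the near-orphan range used in the checklist. Whether a page in that band is actually underlinked depends on how important the page is.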

[Figure: Why orphan pages underperform: the link discovery chain. Source: Google Search Central on crawl methodology; Botify orphan pages case study. Values are illustrative of the structural hierarchy, not a measured dataset.]

Step 3: Find Underlinked Priority Pages

Orphan pages are the extreme end of the underlinked spectrum. But pages with one or two inlinks are often just as exposed, especially when they are targeting competitive queries.

Pull a list of your most important pages by business value: pillar pages, product-adjacent content, high-conversion landing pages. For each one, count how many internal links point to it. If a page you consider central to your topic cluster is receiving three or four inlinks from your entire site, that is a structural gap.

The way I frame this in practice is as a "link budget" problem. High-value pages should be drawing link equity from a wide surface area of the estate. If the links pointing to them are clustered in a few places (say, only from the blog's landing page and one related post), you are leaving equity on the table that other pages in your estate could be passing.

The fix is to find every existing post that is topically related to the underlinked page and check whether it already links to it. If not, add a contextual link where it genuinely serves the reader. This is not a link-stuffing exercise. Only link where the connection is natural and the anchor text is descriptive.

Step 4: Check for Over-Linked Pages

The inverse problem is less common but worth catching. Pages that contain dozens or hundreds of internal links diffuse the link equity they pass to each destination. There is no hard limit. Google has stated explicitly that "there's no magical ideal number of links a given page should contain." But the same documentation adds the practical note: "if you think it's too much, then it probably is."

In practice, I flag any page sending more than 80 to 100 internal links, not because crossing a number triggers a penalty, but because that volume usually signals a structural issue: an uncurated resource page, an auto-generated archive with full link lists, or a page that has accumulated links over years without being trimmed. These pages are not usually ranking priorities, but they can be diluting the equity sent to pages that do matter.
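A minimal sketch of that flag, assuming the same (source, destination) link export. The default threshold of 100 reflects the working range above, not a Google limit, and the function name and example paths are hypothetical.

```python
from collections import defaultdict


def high_outlink_pages(links, threshold=100):
    """Return (page, unique_outlink_count) for pages above the review threshold."""
    outlinks = defaultdict(set)
    for source, dest in links:
        if source != dest:
            outlinks[source].add(dest)  # a set counts each destination once
    flagged = [
        (source, len(dests))
        for source, dests in outlinks.items()
        if len(dests) > threshold
    ]
    # Heaviest linkers first, ties broken alphabetically
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Hypothetical example: an archive page that lists 120 posts
links = [("/archive", f"/post-{i}") for i in range(120)]
links.append(("/post-1", "/pillar"))

for page, count in high_outlink_pages(links):
    print(page, count)
```

The output is a review list, not a fix list. Each flagged page still needs a human decision about whether its links are curated.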

Step 5: Audit Anchor Text Quality

Once you have the link graph mapped, audit the anchor text. This is often where the most straightforward wins live.

Pull the anchor text for every internal link pointing to your important pages. Look for:

  • Generic anchors: "click here," "read more," "this post," "learn more." These pass no topical signal to crawlers.
  • Over-exact match anchors: The same exact keyword phrase used on every link to a page. While internal exact match is generally safer than external, a completely uniform anchor profile looks unnatural.
  • Mismatched anchors: Anchor text that does not describe the destination page's topic. This can confuse Googlebot's understanding of what the linked page is about.

Google's guidance on anchor text is direct: "Good anchor text is descriptive, reasonably concise, and relevant to the page that it's on and to the page it links to." The same documentation flags keyword stuffing in anchor text as a spam policy violation. The target is natural, descriptive variation.

In practice, this means most of your links to a pillar page on "topic clusters" should use anchors like "topic cluster strategy," "building topic clusters," "how topic clusters work," and so on, not twenty instances of "topic clusters and pillar pages" verbatim, and definitely not "here." For a deeper look at anchor text variation and a full internal linking strategy, see the internal linking for SEO guide, which covers the strategic layer whose gaps this audit surfaces.
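The anchor checks above can be sketched as a small profile per destination page. The generic anchor list comes from the examples in this guide, and the 20% generic limit matches the checklist. The 90% cutoff for a "near-identical" profile and the minimum of five links are my own illustrative assumptions, as are the function names.

```python
from collections import Counter

GENERIC_ANCHORS = {"click here", "read more", "here", "this post", "learn more"}


def anchor_profile(anchors):
    """Summarize the anchor texts of all inlinks pointing at one page."""
    cleaned = [a.strip().lower() for a in anchors if a.strip()]
    if not cleaned:
        return {"total": 0, "generic_share": 0.0,
                "top_anchor": None, "top_anchor_share": 0.0}
    counts = Counter(cleaned)
    generic = sum(n for anchor, n in counts.items() if anchor in GENERIC_ANCHORS)
    top_anchor, top_count = counts.most_common(1)[0]
    return {
        "total": len(cleaned),
        "generic_share": generic / len(cleaned),
        "top_anchor": top_anchor,
        "top_anchor_share": top_count / len(cleaned),
    }


def anchor_flags(profile, generic_limit=0.20, uniform_limit=0.90):
    """Turn a profile into review flags; both limits are assumptions to tune."""
    flags = []
    if profile["generic_share"] > generic_limit:
        flags.append("too many generic anchors")
    if profile["total"] >= 5 and profile["top_anchor_share"] >= uniform_limit:
        flags.append("near-identical anchor profile")
    return flags


# Hypothetical example data
varied = anchor_profile([
    "topic cluster strategy", "building topic clusters",
    "how topic clusters work", "Read more", "here",
])
uniform = anchor_profile(["topic clusters and pillar pages"] * 20)

print(anchor_flags(varied))
print(anchor_flags(uniform))
```

Mismatched anchors are not covered here. Judging whether an anchor describes its destination still needs a human read.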

Step 6: Verify Link Support for Pillar and Money Pages

The final structural check is strategic: do your highest-value pages have the internal link weight their importance warrants?

For a content estate built on topic clusters and pillar pages, the pillar pages should be the most internally linked pages on the site. Every spoke post in the cluster should link back to the pillar at least once, with descriptive anchor text. If your audited data shows a spoke post that does not link to its pillar, that is a gap to close immediately.

Similarly, if you have content that feeds a conversion path (bottom-of-funnel posts, pages adjacent to a product or service page), check that the surrounding content is routing readers toward it. A well-written BOFU page that sits behind one link from a single top-of-funnel post is not well-supported.

This step requires you to have your topic cluster map available alongside the crawl data. If you have not mapped your clusters, the topic clusters guide covers how to build that structure. You cannot audit the link support for clusters you have not defined.
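Once the cluster map exists, the spoke-to-pillar check is mechanical. A minimal sketch, assuming the map is a dictionary from each pillar URL to its spoke URLs and the crawl links are (source, destination) pairs. Both structures and all names are hypothetical.

```python
def spokes_missing_pillar_link(clusters, links):
    """For each pillar, list the spoke posts that never link back to it."""
    link_set = set(links)
    missing = {}
    for pillar, spokes in clusters.items():
        gaps = [spoke for spoke in spokes if (spoke, pillar) not in link_set]
        if gaps:
            missing[pillar] = gaps
    return missing


# Hypothetical cluster map and link export
clusters = {
    "/topic-clusters": ["/pillar-pages", "/content-hubs", "/cluster-mapping"],
}
links = [
    ("/pillar-pages", "/topic-clusters"),
    ("/content-hubs", "/pillar-pages"),     # links to a sibling, not the pillar
    ("/cluster-mapping", "/topic-clusters"),
]

missing = spokes_missing_pillar_link(clusters, links)
print(missing)
```

This only confirms that a link exists. It does not check where on the page the link sits or what anchor text it uses.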

How to Fix Orphan Pages

Orphan pages are the highest-priority fix from most internal linking audits. There are two valid approaches, and the right one depends on the page's quality and potential.

If the page has ranking potential: Find the most topically related pages already on your site and add contextual links to the orphan from each of them. The anchor text should describe what the orphaned page covers, not use a generic label. Also check your XML sitemap to ensure the page is listed there. Sitemap inclusion helps but does not replace link-based discovery.

If the page has no clear home in the estate: Ask whether it belongs. If the page covers a topic that has no related content and no clear cluster it fits into, it may be a content gap analysis problem: you have a page on a topic your estate does not support, which makes it structurally isolated by design. In that case, the right fix may be to plan cluster content that gives the orphan page a context, or to consolidate the orphan into a stronger related page.

What not to do: Do not rely on links from a sitemap page or a generic "related posts" widget alone to solve orphan status. These links carry weak signals and do not build the contextual relationship that makes internal linking useful for rankings. The link needs to appear in the body of a page whose topic is genuinely related.

How Often Should You Run an Internal Linking Audit?

Run a full internal linking audit alongside your annual content audit. The two audits are complementary: the content audit tells you which pages are worth investing in, and the internal linking audit tells you whether those pages are structurally supported.

Run a lighter check whenever you publish a significant batch of new content. New posts create new orphan risk (if you publish without linking to and from existing content) and new opportunities (new posts may be the natural place to add links to existing underlinked pages).

Also run a targeted check after any site migration, URL restructure, or CMS change. These operations routinely break internal link paths or orphan previously well-linked pages.

Mistakes to Avoid

Fixing orphans with widget links only. Adding a page to a "you might also like" sidebar widget is better than leaving it fully orphaned, but it is not a substitute for contextual body links. Widgets often carry lower equity and do not provide topical context.

Ignoring HTTP status codes during the crawl. Broken internal links (404s) and long redirect chains waste crawl budget and lose link equity at each hop. Google's crawl budget guidance notes that redirect chains have "a negative effect on crawling." Fix broken links before spending time on link distribution.
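Most crawlers report redirect chains directly, but if you only have a flat export, the hops can be traced with a short helper. This sketch assumes a dictionary mapping each redirecting URL to its immediate target. The function names, the `max_hops` guard, and the example paths are hypothetical.

```python
def redirect_chain(url, redirects, max_hops=10):
    """Follow redirects from url; returns the full chain including the start."""
    chain = [url]
    while chain[-1] in redirects and len(chain) - 1 < max_hops:
        nxt = redirects[chain[-1]]
        looped = nxt in chain
        chain.append(nxt)
        if looped:
            break  # redirect loop: stop instead of cycling forever
    return chain


def links_to_repoint(links, redirects):
    """List internal links whose destination redirects, with the final URL and hop count."""
    fixes = []
    for source, dest in links:
        if dest in redirects:
            chain = redirect_chain(dest, redirects)
            fixes.append((source, dest, chain[-1], len(chain) - 1))
    return fixes


# Hypothetical example data
redirects = {"/old-guide": "/guide-2023", "/guide-2023": "/guide"}
links = [("/blog/a", "/old-guide"), ("/blog/b", "/guide")]

for source, old, final, hops in links_to_repoint(links, redirects):
    print(f"{source}: repoint {old} -> {final} ({hops} hops)")
```

For a loop, the last element of the chain is the repeated URL rather than a valid destination, so treat those rows as errors to fix at the redirect level.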

Treating click depth as a rigid rule. Semrush and other tools commonly flag pages more than three clicks from the homepage as an issue. This is a useful heuristic, not a Google rule. Google has not published a click-depth threshold in its documentation. Still, deeply buried pages are harder to crawl frequently, so minimizing unnecessary depth is sensible, especially for large sites.
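Click depth is just the shortest path from the homepage in the link graph, which a breadth-first search computes. A minimal sketch, assuming (source, destination) link pairs and "/" as the homepage. The three-click cutoff is the tool heuristic mentioned above, not a Google rule, and the names are hypothetical.

```python
from collections import defaultdict, deque


def click_depths(links, start="/"):
    """Breadth-first search from the homepage; returns {url: minimum clicks}."""
    graph = defaultdict(list)
    for source, dest in links:
        graph[source].append(dest)
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in depths:  # first visit is the shortest path in BFS
                depths[nxt] = depths[page] + 1
                queue.append(nxt)
    return depths


# Hypothetical example data
links = [
    ("/", "/blog"),
    ("/", "/pricing"),
    ("/blog", "/blog/a"),
    ("/blog/a", "/blog/b"),
    ("/blog/b", "/blog/c"),
]

depths = click_depths(links)
deep_pages = sorted(url for url, depth in depths.items() if depth > 3)
print(deep_pages)
```

Pages that never appear in `depths` are unreachable from the homepage, which makes this a second way to spot orphans if you pass in a full URL list for comparison.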

Adding links mechanically without editorial judgment. The purpose of an internal link is to help a reader find something more relevant or useful. Links that serve that purpose also serve SEO. Links stuffed in because "we need more links to the pillar page" serve neither. Every link you add during the fix phase should be one a reader would actually follow.

Running the audit and not tracking the changes. Log every link addition or fix with the date, source URL, destination URL, and anchor text used. You will need this log to understand what changed if rankings shift, and to avoid duplicating work in the next audit cycle.
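A spreadsheet is enough for this log, but if your fixes are scripted, appending to a CSV keeps the record consistent. A minimal sketch using the fields listed above plus the "reason" column from the checklist. The function name, file name, and example values are hypothetical.

```python
import csv
from datetime import date
from pathlib import Path

FIELDS = ["date", "source_url", "destination_url", "anchor_text", "reason"]


def log_change(path, source_url, destination_url, anchor_text, reason, when=None):
    """Append one link change to a CSV log, writing the header on first use."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (when or date.today()).isoformat(),
            "source_url": source_url,
            "destination_url": destination_url,
            "anchor_text": anchor_text,
            "reason": reason,
        })
```

A call might look like `log_change("link_audit_log.csv", "/blog/a", "/topic-clusters", "topic cluster strategy", "spoke missing pillar link")`. Because the file is opened in append mode, repeated audit cycles extend the same log.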

The Internal Linking Audit Checklist

Copy this into a spreadsheet or task manager and work through it in order.

=== INTERNAL LINKING AUDIT CHECKLIST ===

PHASE 1: CRAWL AND DATA COLLECTION
[ ] Run a full site crawl from homepage (Screaming Frog / Sitebulb / Ahrefs)
[ ] Export all crawlable URLs with inlink count, outlink count, and HTTP status
[ ] Export full internal link list: source URL, destination URL, anchor text
[ ] Export indexed URLs from Google Search Console
[ ] Cross-reference crawl vs. GSC: flag URLs in GSC not found by crawl and vice versa
[ ] Note any 4xx errors or redirect chains in the internal link set

PHASE 2: ORPHAN PAGE IDENTIFICATION
[ ] Sort all URLs by inlink count ascending
[ ] Flag all URLs with 0 internal inlinks (true orphans)
[ ] Flag all URLs with 1-3 internal inlinks (near-orphans)
[ ] For each orphaned URL: determine whether it is indexable and intended to rank
[ ] Check sitemap: confirm indexable orphan pages are listed in the XML sitemap

PHASE 3: PILLAR AND MONEY PAGE LINK AUDIT
[ ] List your top 10-15 priority pages (pillars, BOFU content, high-conversion pages)
[ ] Count inlinks to each priority page from crawl data
[ ] Flag any priority page with fewer than 8-10 inlinks as underlinked
[ ] Confirm each cluster spoke post links back to its pillar page at least once

PHASE 4: LINK DISTRIBUTION AUDIT
[ ] Sort all URLs by outlink count descending
[ ] Flag any page sending more than 80-100 internal links for review
[ ] Check these high-outlink pages: are they resource/archive pages? Are links curated?

PHASE 5: ANCHOR TEXT AUDIT
[ ] For each priority page, export all anchor text from inlinks
[ ] Count generic anchors ("click here", "read more", "here", "this post")
[ ] Flag any page where more than 20% of inlinks use generic anchors
[ ] Flag any page where all or nearly all inlinks use identical exact-match anchors
[ ] Flag any inlink where anchor text does not describe the destination page's topic

PHASE 6: FIX PRIORITIZATION
[ ] Priority 1: Fix broken internal links (4xx destination URLs)
[ ] Priority 2: Add body links to true orphan pages that have ranking potential
[ ] Priority 3: Add links to underlinked pillar and money pages from topically related posts
[ ] Priority 4: Replace generic anchor text with descriptive anchor text
[ ] Priority 5: Trim or reorganize pages with excessive outlinks

PHASE 7: TRACKING
[ ] Log every change: date, source URL, destination URL, anchor text, reason
[ ] Set a 60-day review reminder to check GSC coverage and ranking changes for affected URLs
[ ] Schedule next full audit (annually or post-migration)

=== END CHECKLIST ===

FAQ

What is an internal linking audit?

An internal linking audit is a structured review of how pages on your site link to each other. It identifies orphan pages (pages with no inlinks), underlinked priority pages, pages sending excessive links, and anchor text problems. The goal is to ensure link equity flows to the pages that matter most and that Googlebot can discover your full content estate.

What is an orphan page in SEO?

An orphan page is a page on your site that no other page on your site links to. Because Google primarily discovers pages by following links, orphan pages are at risk of being crawled infrequently or not at all, limiting their ability to rank regardless of content quality.

How do I find orphan pages?

Run a full site crawl with a tool like Screaming Frog or Ahrefs Site Audit, then export the inlink count for every URL. Any URL with an inlink count of zero is an orphan. Cross-reference with Google Search Console to see whether those orphan pages are indexed. If they are indexed but orphaned, they may be surviving on sitemap discovery or external links, but they are not receiving internal link equity.

How many internal links should a page have?

Google has stated there is no ideal number of internal links per page. The practical guidance from Google is: if you think the number is too many, it probably is. Focus on linking where it is genuinely useful to a reader navigating the page, not on hitting a numerical target.

How is an internal linking audit different from a content audit?

A content audit evaluates page-level quality, traffic performance, and strategic fit, and assigns actions like keep, update, consolidate, or remove. An internal linking audit evaluates how pages connect to each other: link counts, link equity distribution, anchor text quality, and structural gaps. Both audits are complementary. Run them together and you get both a quality map and a structural map of the estate.

How often should I run an internal linking audit?

Once a year as part of a full content audit cycle, plus a lighter check whenever you publish a significant batch of new content or make structural changes to the site (migrations, URL restructures, CMS changes). New content creates new orphan risk and new interlinking opportunities that should not wait twelve months to be addressed.


The link graph is the skeleton of your content estate. You can publish excellent content indefinitely and still watch it underperform if the structure connecting it is broken. The internal linking audit is not glamorous work, but it is one of the clearest leverage points in SEO precisely because most teams run it rarely or never.

I find that the biggest shifts from this kind of audit often come not from any single link addition, but from the pattern that emerges when you look at the whole graph: clusters that are internally coherent perform well, while isolated pages perform poorly regardless of content quality. The structure is the signal. Fix it once, maintain it on a schedule, and the estate compounds instead of just accumulating.

If you have not yet done a content gap analysis, running one after the linking audit is a natural next step: once you know which pages are structurally well-supported and which are not, you can identify where new content would close topical gaps and give orphan pages the cluster context they are missing.

Written by

Sudharsan Ananth

Founder & CTO

Founder & CTO at Sparkable. He writes about pragmatic engineering, applied AI, and building content systems that actually ship — not just features.
