How to Identify All External Links on a Website Using XPath

Requests often come in from the people you work with for which there doesn't appear to be a tool available for quickly gathering the information. One such request came in today: how to quickly identify all external links on a website.

This is quite a common problem. With content management systems making it easy to add content (and links) across hundreds or thousands of pages on a website, how do you reliably find all of the external links?

Fortunately, I have figured out a nice way using a bit of XPath, the SEO Tools plugin for Excel, and Xenu.

If you are looking for a quick answer, this is the XPath required to identify external links on a single web page:

 

//a[not(contains(@href, 'www.michaelcropper.co.uk'))]/@href

 

So what does this actually mean? 

  • //a : Get me any links where …
  • [not(contains( : … it is not the case that …
  • @href : … the link's HREF attribute …
  • 'www.michaelcropper.co.uk'))] : … contains this website address, and …
  • /@href : … return the HREF attribute for each of those links
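If you want to see exactly which links that expression keeps, here is a minimal Python sketch of the same filter. It is not XPath: it uses the standard library HTML parser and a plain substring test, which is what contains() does. The names HrefCollector and hrefs_not_containing and the sample HTML are my own, made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def hrefs_not_containing(html, site):
    """Mirror //a[not(contains(@href, site))]/@href with a substring test."""
    collector = HrefCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if site not in href]


# Hypothetical sample page: one internal, one external, one relative link.
sample = (
    '<a href="http://www.michaelcropper.co.uk/about">About</a>'
    '<a href="http://example.com/post">Elsewhere</a>'
    '<a href="/contact">Contact</a>'
)
print(hrefs_not_containing(sample, "www.michaelcropper.co.uk"))
# prints ['http://example.com/post', '/contact']
```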

 

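Make sense? Good. One caveat: this is a plain text match, so relative links (such as /contact), mailto: links, and links to the site without the www prefix will also be returned. Let's look at actually using this XPath in a useful way.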

 

SEO Tools Plugin

The interesting thing is that XPathOnUrl in SEO Tools doesn't bring back every matching HREF attribute. Instead, it pulls back only the first URL it finds on the page, which may be good enough for this purpose. With the URL you want to test in cell A1, the function is as follows:

 

=XPathOnUrl(A1, "//a[not(contains(@href, 'www.michaelcropper.co.uk'))]/@href")

 

In the example above I was testing against the URL http://www.michaelcropper.co.uk/2012/06/googles-business-plan-steal-content-and-screw-publishers-1081.html because that page contains a link to an external website. The next step is to scale this up to a whole set of URLs on a website.

 

Xenu

Now that you know how to check whether a specific URL contains an external link, the next step is to do this for all URLs on the site you want to check.

Simply install Xenu, run it against the website, and export the list of all website URLs into an Excel file.

You will now have a long list of all URLs on the website, against which you can run the same XPathOnUrl function to identify every page that contains at least one external link.
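If you would rather not do this step in a spreadsheet, the same idea can be sketched in Python. This is only a rough sketch under a few assumptions: the Xenu export has already been reduced to a plain list of page URLs, the pages decode acceptably as UTF-8, and the function names are my own. Unlike the XPath above, it resolves relative links and compares hostnames, so internal relative links and mailto: links are not counted, and it keeps every external link per page rather than only the first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url, own_host):
    """Return absolute http(s) links whose hostname differs from own_host."""
    collector = HrefCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links first
        parts = urlparse(absolute)
        # Exact hostname match: the non-www variant counts as external here.
        if parts.scheme in ("http", "https") and parts.hostname != own_host:
            found.append(absolute)
    return found


def fetch_html(url):
    """Download a page and decode it leniently (assumption: UTF-8 is fine)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def pages_with_external_links(page_urls, own_host, fetch=fetch_html):
    """Map each page that has external links to the list of those links."""
    report = {}
    for page_url in page_urls:
        links = external_links(fetch(page_url), page_url, own_host)
        if links:
            report[page_url] = links
    return report
```

Calling pages_with_external_links(urls, "www.michaelcropper.co.uk") with the exported list of URLs returns a dictionary keyed by page. The fetch parameter is there so you can swap in your own downloader, or a fake one when testing.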

 

Summary

This is likely only one solution to the problem, and it doesn't give you a definitive list of every single external link on every page of the website, but it does tell you which pages on a website contain a link to another website.

Simple, but effective. :-) 

 

 
