Last week, Google, Yahoo and Live all announced their support for a new link tag, placed in the head of a page, called the “canonical” tag. It lets you tell the search engines the preferred URL to index for a page.
It is designed to remove duplicate content issues from your web pages by letting you specify the correct version of the page. It works much like a 301 redirect as far as search engines are concerned, but it is a “strong hint” rather than an order, and visitors are not redirected.
A good example might be if you use information from the query string to sort data on a page. Your website might then generate several URLs for the same listing, one for each sort order.
Search engines would regard each of these as individual pages, even though they are essentially the same page. The information is just in a different order.
Allowing URLs like this could be detrimental to your SEO efforts, because people could link to different versions of the same page and your PageRank would be split between them.
You could use the tag like this to remove this problem. There is a more technical explanation of how to use the tag over at the SEOmoz blog.
<link rel="canonical" href="http://example.com" />
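To make the sorting example concrete, here is a minimal sketch of how a site might work out the canonical URL before printing the tag. It is only an illustration: the parameter names `sort` and `order` and the function names are my own assumptions, not anything specified by the search engines.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only change presentation, not content.
PRESENTATION_PARAMS = {"sort", "order"}


def canonical_url(url):
    """Return the URL with presentation-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    # Drop the fragment as well; it never reaches the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="%s" />' % href


print(canonical_link_tag("http://example.com/products?cat=fish&sort=price"))
```

Every sorted variant of the page would then print the same tag, pointing at the unsorted URL.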
The examples given by Google underline something very important about this tag: you should strive never to need it.
Here are some situations that you should never need to use the tag to resolve:
In order to prevent duplicate content, a lot of websites redirect the non-www version of a URL to the www version. For example, accessing http://example.com will redirect you to http://www.example.com.
Lesson: use a 301 redirect
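In practice this redirect usually lives in your web server configuration, but the logic is simple enough to sketch. The function name below is hypothetical, and the sketch assumes the site is served only from the www host, so it would not suit a site with other subdomains.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect(url):
    """Return (301, target) if the host lacks the www prefix, else None."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        return None  # already on the preferred host, serve the page as normal
    target = urlunsplit(
        (parts.scheme, "www." + host, parts.path, parts.query, parts.fragment)
    )
    return 301, target


print(www_redirect("http://example.com"))
print(www_redirect("http://www.example.com"))
```

The path and query string are carried over unchanged, so deep links keep working after the redirect.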
The links generated by the CMS or online store in Google’s examples are the result of sloppy coding.
While accessing http://example.com/prod.php?p=fish and http://example.com/prod.php?p=fish&cat=animals will result in the same page, these two links should NEVER both be generated by a CMS. While using the canonical tag as a backup might be OK, you should never rely on it.
Lesson: Fix your coding errors first.
Search results are full of URLs with tracking IDs, such as the campaign parameters added by the Google Analytics URL Builder.
These tracking parameters should be logged and stripped from the URL, and the user then redirected to the clean URL, preferably via a 301 redirect.
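As a rough sketch of that log, strip and redirect approach: the code below treats any query parameter starting with `utm_` (the prefix Google Analytics uses for campaign parameters) as tracking data. The function names and the list used as a log are placeholders of my own; a real site would write to its own analytics store.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Google Analytics campaign parameters all share this prefix.
TRACKING_PREFIX = "utm_"


def strip_tracking(url):
    """Return (clean_url, tracking), where tracking holds the removed parameters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    tracking = {k: v for k, v in pairs if k.startswith(TRACKING_PREFIX)}
    kept = [(k, v) for k, v in pairs if not k.startswith(TRACKING_PREFIX)]
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return clean, tracking


def handle(url, log):
    """Log tracking data and return (301, clean_url), or None if nothing to strip."""
    clean, tracking = strip_tracking(url)
    if not tracking:
        return None  # nothing to strip, serve the page as normal
    log.append(tracking)  # stand-in for a real logging call
    return 301, clean


visits = []
print(handle("http://www.example.com/page?utm_source=newsletter&id=7", visits))
```

Returning None for clean URLs matters here: without that check the redirect would loop.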
We are all familiar with session IDs in URLs, usually a long random token tacked onto the query string.
These should never be used on publicly accessible URLs. If you use PHP, always choose the default cookie option for recording session identifiers. Using identifiers in the URL is less reliable and less secure.
Google also claims that it is able to detect session identifiers in URLs, so in theory this shouldn’t be a problem.
Lesson: Never use session identifiers in your publicly accessible URLs. Never.
As an exception, one thing this tag will be great for is combating the number of printable versions of documents I come across via search engines. There is very rarely a link from the printable version back to the full version of the page. If all these versions of pages were eradicated from the search results, I would be very happy. People should really be tagging these as noindex, but given that they still feature in the search results, I guess this isn’t happening.
As usual, most of these ideas are just my personal thoughts. While the new tag shouldn’t be necessary on your site, it is another useful tool to have handy.
But before you use it, think about what is happening on your website in the first place to make you need to use it.