The big three search engines have recently introduced the concept of canonical links in an attempt to solve the problem of duplicate data in their search results. The basic problem is that multiple links to the same content can vary in style, thereby creating duplicate instances in the results pages of a search. The variation in style could be really simple like these:
All of these links produce the same content, yet a search engine would treat them as different URLs and would therefore return duplicate results. Consider a more complex example:
Again, all of these links essentially lead to the same content. The variations introduced by the query string parameters only alter the way the content is displayed, not the content itself. We therefore want to avoid having all four examples show up in Google, if possible.
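To make the idea concrete, here is a minimal Python sketch of how display-only variations collapse to a single URL. The parameter names (`sort`, `view`, `pagesize`) and the variant URLs are invented for illustration and are not the examples this post originally listed; only the base address is the one used later in the post.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that only change presentation, not content.
DISPLAY_ONLY_PARAMS = {"sort", "view", "pagesize"}


def canonical_url(url: str) -> str:
    """Return the URL with display-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in DISPLAY_ONLY_PARAMS
    ]
    # Host names are case-insensitive, so lower-case the host; drop any fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


# Illustrative variants (assumed, not taken from the original post).
variants = [
    "http://www.fishing.ca/shop/rods.aspx",
    "http://www.fishing.ca/shop/rods.aspx?sort=price",
    "http://www.fishing.ca/shop/rods.aspx?view=grid",
    "http://www.fishing.ca/shop/rods.aspx?sort=name&view=list",
]

# All four variants reduce to the same address.
unique = {canonical_url(u) for u in variants}
```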
There are two ways to achieve this:
The first is to always use absolute URLs in links: a consistent approach to linking reduces duplicates on the site and reduces the variety of links found by the search engine's crawler.
The second is to use a canonical link. This is added to the head element of a page and contains a URL that is considered to be the 'true' path to the content. In the e-commerce style example above, the link element would look like this:
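As a rough sketch of that first approach, relative hrefs can be resolved against the address of the page they appear on before they are written into the markup. The helper name `absolute_href` and the linked pages (`reels.aspx`, `about.aspx`) are assumptions made for the example.

```python
from urllib.parse import urljoin

# Address of the page that contains the links (used here as an example).
PAGE_URL = "http://www.fishing.ca/shop/rods.aspx"


def absolute_href(href: str, page_url: str = PAGE_URL) -> str:
    """Resolve a possibly relative href into an absolute URL."""
    return urljoin(page_url, href)


# Different relative spellings resolve to one consistent absolute form.
sibling = absolute_href("reels.aspx")
parent = absolute_href("../about.aspx")
rooted = absolute_href("/shop/rods.aspx")
```

Emitting the resolved form everywhere means the crawler only ever sees one spelling of each link.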
<link rel="canonical" href="http://www.fishing.ca/shop/rods.aspx" />
This means that however a crawler reached the page, it would know that the content is the same as that of the page at http://www.fishing.ca/shop/rods.aspx and would therefore not index it separately.
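The following Python sketch illustrates that behaviour from the crawler's point of view. It is a simplified model, not how any particular search engine works: the class `CanonicalFinder` and the function `index_key` are hypothetical names, and real crawlers apply their own rules when deciding whether to honour the link.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


def index_key(fetched_url: str, html: str) -> str:
    """Return the URL the page should be indexed under (simplified)."""
    finder = CanonicalFinder()
    finder.feed(html)
    # Fall back to the fetched URL when the page declares no canonical link.
    return finder.canonical or fetched_url


page = (
    '<html><head>'
    '<link rel="canonical" href="http://www.fishing.ca/shop/rods.aspx" />'
    '</head><body>Rods</body></html>'
)

# Reached through a query-string variant (assumed parameter), indexed once.
key = index_key("http://www.fishing.ca/shop/rods.aspx?sort=price", page)
```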
There are a number of caveats:
In summary, using canonical links will help us build better search-engine-optimized sites, but we shouldn't stop doing the basics, such as using proper absolute URLs and building good site maps. There's still more research to do on all of this!