Are you paying attention to duplicate content? Contrary to popular rumor, Google's John Mueller has clearly stated that a site is not penalized for having duplicate content. However, duplicate content leads to cannibalization: the evaluation is split across the duplicated pages, which makes it hard for any of them to rank high. This behavior is well known and widely believed to be true. Even Google's official documentation refers to this cannibalization issue.
In many cases we end up with duplicate content, especially when we add new content on a regular basis. There are a few ways to deal with this issue; this post explains the canonical tag in detail.
Cannibalization caused by duplicate content
Duplicate content confuses Google's search engine, so it cannot evaluate those pages properly and gives them a lower rating. This means those pages won't rank as high as expected. The SEO industry calls this the cannibalization issue.
Lazy webmasters may not pay attention to duplicate content and get lower rankings as a result. But in many cases we have duplicate content that we can't simply delete or exclude from the index. The pages look duplicated, but there are good reasons to keep them on the site. Google has no way of knowing this, and the problem annoyed webmasters for a long time.
Webmasters wanted a definite way to tell Google which page is the genuine one to evaluate when there are duplicated pages. To meet this requirement, search engine developers cooperated to introduce the canonical tag, which is aimed squarely at this issue.
Canonical tag
Introduction of canonical tag
The canonical tag is not a new feature. It was introduced in February 2009.
We now support a format that allows you to publicly specify your preferred version of a URL. If your site has identical or vastly similar content that's accessible through multiple URLs, this format provides you with more control over the URL returned in search results. It also helps to make sure that properties such as link popularity are consolidated to your preferred version.
Quoted from: Specify your canonical
So the cannibalization issue has a long history, long enough to make search engine developers such as Google cooperate and come up with the canonical tag.
Role of canonical tag
The canonical tag is used to normalize URLs. Google's official document "Help Google choose the right canonical URL for your duplicate pages" is recommended for a better understanding. When there are similar or duplicated pages, a webmaster can tell the search engine which page is the canonical one and so avoid the cannibalization issue. The search engine treats the canonical tag as a hint when evaluating pages.
For example, suppose 4 pages about the same topic have been written and updated over a few months. Updating one page instead of creating a new one is one way to handle this, but it is common to publish a new page and keep the old ones readable, which often seems better. The old pages are valuable as archives and should stay accessible, but we want Google to evaluate only the latest page.
- More than 90% of the content of the 4 pages is the same and the difference is less than 10%. Google will see those pages as duplicate content and we face the cannibalization issue.
- The latest page is the one to be evaluated and ranked in Google.
- The 3 old pages still exist and are linked from the latest page, but we want Google to ignore them.
The canonical tag makes this possible.
Example usage of canonical tag
Continuing from the previous section: all 3 old pages are canonical-ed to the latest page, using the canonical tag to specify that the latest URL is the one to evaluate. Readers can still read the 3 old pages normally, and the search engine knows that the genuine one is the latest page.
One thing to note: naming the 4 pages A, B, C, D from oldest to newest, you should not do this.
- Set A's canonical to B.
- Set B's canonical to C.
- Set C's canonical to D.
This creates a canonical chain, which should be avoided so that search engines can process the pages reliably. Instead, we do this.
- Set A's canonical to D.
- Set B's canonical to D.
- Set C's canonical to D.
And when we add a new latest duplicated page E, we do this.
- Set A's canonical to E.
- Set B's canonical to E.
- Set C's canonical to E.
- Set D's canonical to E.
But this simply increases the maintenance cost. So I fix the URL of the latest page to X and canonical all old pages to X. When I add a newer page, I change the URL of the previous latest page so that the new one can take over X.
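The chain problem described above can be checked mechanically. The following is a minimal sketch, assuming the canonical relationships are available as a plain dictionary; the function name `resolve_canonicals` and the page names are placeholders for illustration, not part of any SEO tool.

```python
def resolve_canonicals(canonical_of):
    """Return a mapping in which every page points directly at its final target.

    canonical_of maps a page to the page named in its canonical tag.
    A page that is missing from the mapping is treated as its own canonical.
    """
    resolved = {}
    for page in canonical_of:
        seen = {page}
        target = canonical_of[page]
        # Follow the chain until we reach a page with no further canonical.
        while target in canonical_of and canonical_of[target] != target:
            if target in seen:
                raise ValueError(f"canonical loop involving {target!r}")
            seen.add(target)
            target = canonical_of[target]
        resolved[page] = target
    return resolved


# The chained setup from the text: A -> B -> C -> D.
chained = {"A": "B", "B": "C", "C": "D"}
flat = resolve_canonicals(chained)
print(flat)  # {'A': 'D', 'B': 'D', 'C': 'D'}
```

The result is the flat layout recommended above, where every old page names the latest page directly. The loop check is an extra safeguard for the case where two pages name each other by mistake.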
Canonical tag is hint for search engine
Google has clearly stated that the canonical tag is just a hint for the search engine and is not like a page redirect. Google's search engine treats the canonical tag as a hint and respects it, but it may not always work as expected.
I have actually seen an old page ranked alongside the latest page even though I used the canonical tag correctly. Still, using the canonical tag is the best available practice for avoiding the cannibalization issue, and we should keep doing it.
Important note
- Both absolute and relative paths can be used for the canonical URL, but Google's John Mueller recommends using absolute paths.
- Google ignores all canonical tags if there is more than one in the <head> section. Pay attention if your setup lets you insert several canonical tags into one page.
- You should use a normalized URL as the canonical one. The same rule that applies to internal links applies to the canonical tag.
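The first two notes can be verified with a small script. This is only a sketch using Python's standard library; the names `CanonicalCollector` and `check_canonical` are made up for this example, and for simplicity it scans the whole document rather than only the <head> section.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here by default.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.hrefs.append(attr.get("href") or "")


def check_canonical(html_text):
    """Return a list of problems found; an empty list means none were found."""
    collector = CanonicalCollector()
    collector.feed(html_text)
    collector.close()
    problems = []
    if len(collector.hrefs) > 1:
        problems.append(
            f"{len(collector.hrefs)} canonical tags found; expected at most one"
        )
    for href in collector.hrefs:
        parts = urlsplit(href)
        # An absolute URL has both a scheme (https) and a host name.
        if not (parts.scheme and parts.netloc):
            problems.append(f"canonical URL is not absolute: {href!r}")
    return problems


page = '<head><link rel="canonical" href="/summary-latest/"></head>'
for problem in check_canonical(page):
    print(problem)
```

A relative canonical URL is reported here as a problem only because of the recommendation above; relative paths are still valid.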
Specify canonical URL in WordPress
It depends on the theme you use. In my case the edit screen has a field for this feature: I simply enter the URL and check the option that selects canonical instead of redirection. (Sorry, it can't be displayed in English.)
That's it. The following line is then inserted into the <head> section of the page.
<link rel="canonical" href="https://secrets2mysuccess.net/summary-latest/" />
Simple enough.
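If your theme has no such field and you generate pages yourself, the same line can be produced with a few lines of code. This is a rough sketch, not what WordPress or any theme actually runs; `canonical_link_tag` is a hypothetical helper name.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_link_tag(url):
    """Build a <link rel="canonical"> element for an absolute URL."""
    parts = urlsplit(url)
    if not (parts.scheme and parts.netloc):
        raise ValueError(f"expected an absolute URL, got {url!r}")
    # escape() with quote=True keeps the value safe inside a quoted attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


print(canonical_link_tag("https://secrets2mysuccess.net/summary-latest/"))
```

The helper rejects relative paths on purpose, following the recommendation to use absolute URLs.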
Visual example of canonical-ed pages
Let me show you a visual example of canonical-ed pages. There is a top page, "Investor's guide", and 4 "Financial Report" pages. Q4 is currently the latest one, and the Q1 to Q4 pages are regarded as duplicates.
- The latest page is in the center and is linked from the top page.
- The 3 old pages are canonical-ed to the latest page. A line ending with a circle indicates canonicalization.
- The latest page has a link to the Q3 page.
Let me move the focus to the Q3 page.
The Q3 page has a link to the Q2 page.
Let me move the focus to the Q2 page.
The Q2 page has a link to the Q1 page. All pages are linked nicely for both search engines and readers.
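The structure walked through above can also be written down as data and checked. The sketch below models the links and canonicals exactly as described in this example; the dictionary layout and the function name `reachable_from` are my own assumptions and have nothing to do with how Link Map Viewer stores its data.

```python
from collections import deque


def reachable_from(start, links):
    """Return the set of pages reachable from start by following internal links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Internal links: top page -> Q4 -> Q3 -> Q2 -> Q1.
links = {
    "Investor's guide": ["Q4"],
    "Q4": ["Q3"],
    "Q3": ["Q2"],
    "Q2": ["Q1"],
}
# Canonicals: every old report names the latest one (Q4).
canonical_of = {"Q1": "Q4", "Q2": "Q4", "Q3": "Q4"}

pages = reachable_from("Investor's guide", links)
unreachable = set(canonical_of) - pages
print(sorted(unreachable))  # []
```

An empty list means every canonical-ed page can still be reached by a reader starting from the top page.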
Manage internal link with canonical-ed page
The Windows application "Link Map Viewer" can visualize internal link structure in a realistic way, and it supports canonical-ed pages.
This video demonstrates how to visually recognize a canonical tag problem, fix it in WordPress, and then refresh the display to confirm that it has been fixed.