Google Duplicate Content Guidelines: What You Need To Know


Last month Google released another webmaster video about duplicate content. This time, Matt Cutts discussed how Google treats quotes or block quotes copied from another website or blog. It’s an interesting video for me because I quote and reference other blogs and sites in most of my blog posts. So I decided to write a post consolidating Google’s view on several aspects of duplicate content.

What Is Duplicate Content?

Duplicate content usually refers to identical or very similar content within a single domain or across multiple domains. It can be an exact copy or a lightly reworded version (also known as spinning), and it is categorized as malicious or non-malicious. Malicious duplicate content is otherwise known as web spam, as it refers to content written to manipulate search engines. Non-malicious duplicate content generally refers to variations of the same web page, for example printer-friendly and mobile-friendly pages hosted on another domain, subdomain or subdirectory.

The Google Panda update was probably Google’s first move to penalize content farms, low-quality web pages and websites that duplicated content on a massive scale to manipulate search engine rankings.

What You Should Do To Avoid Duplicate Content Penalty

Non-malicious duplicate content is not really an issue, but it’s better to avoid it if you can. There are several ways to avoid duplicate content within a single domain name.

1. Use 301 Redirects & Set Your Preferred Domain Name

If your web pages can be accessed both with and without WWW in the URL, search engines treat those as separate URLs. For example, http://www.minterest.org/web-directory/ and http://minterest.org/web-directory/ are two separate pages according to search engines unless you use a 301 redirect from the non-preferred URL to the preferred one. My preferred domain is http://www.minterest.org/, so I redirect all non-WWW URLs to the WWW version.
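To make the mapping concrete, here is a minimal Python sketch of the rule such a 301 redirect would apply. It is not the actual server configuration of this site; the function name preferred_url and the constant PREFERRED_HOST are made up for illustration, and the redirect itself would normally be configured on the web server.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the preferred host is the WWW form named in the article.
PREFERRED_HOST = "www.minterest.org"


def preferred_url(url: str) -> str:
    """Return the URL that a 301 redirect should send this request to."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Only rewrite the bare (non-WWW) form of the preferred host;
    # every other URL is returned unchanged.
    if "www." + host == PREFERRED_HOST:
        parts = parts._replace(netloc=PREFERRED_HOST)
    return urlunsplit(parts)


print(preferred_url("http://minterest.org/web-directory/"))
```

A URL that already uses the preferred host, or that belongs to a different site, comes back untouched, so the rule never redirects a page to itself.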

Once you set up a 301 redirect, make sure that you choose your preferred domain in Google Webmaster Tools so that Google shows your preferred URLs in search engine results pages (SERPs).

Go to Google Webmaster Tools > Configuration > Settings > Preferred domain


2. Always Use Your Preferred URLs

Now that you have set a preferred domain with a 301 redirect, make sure that you always use consistent URLs for all links, whether for internal linking or for your link building campaigns. For example, if your URLs contain a trailing slash, then include it whenever you write your URLs.
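If you want to audit your own links for this, a rough Python sketch could look like the following. The function name inconsistent_links is hypothetical, and the rule that a last path segment containing a dot is a file (and so needs no trailing slash) is a simplifying assumption of mine.

```python
from urllib.parse import urlsplit


def inconsistent_links(links, preferred_host="www.minterest.org"):
    """Return same-site links that skip the preferred host or the trailing slash."""
    # The bare form of the preferred host, e.g. "minterest.org".
    bare_host = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    flagged = []
    for link in links:
        parts = urlsplit(link)
        host = parts.netloc.lower()
        # Links to other sites are not our concern here.
        if host not in (preferred_host, bare_host):
            continue
        path = parts.path or "/"
        last_segment = path.rsplit("/", 1)[-1]
        # Assumption: a dot in the last segment means a file such as page.html.
        looks_like_file = "." in last_segment
        missing_slash = not path.endswith("/") and not looks_like_file
        if host != preferred_host or missing_slash:
            flagged.append(link)
    return flagged


links = [
    "http://www.minterest.org/web-directory/",
    "http://www.minterest.org/web-directory",
    "http://minterest.org/web-directory/",
]
print(inconsistent_links(links))
```

Here the second link is flagged for the missing trailing slash and the third for using the non-WWW host.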

3. Use Meta Tags Effectively

If your website contains many empty or under-construction pages, use the noindex meta tag to keep search engines from indexing those pages.

4. Noindex Metatags – noindex, nofollow and noindex, follow

<meta name="robots" content="noindex, nofollow" />

or

<meta name="robots" content="noindex, follow" />
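To double-check which robots directives a page actually carries, something like this Python sketch can help. It uses only the standard library html.parser module; the class RobotsMetaParser and the helper robots_directives are names I made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow" into individual lowercase directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<head><meta name="robots" content="noindex, follow" /></head>'
print(sorted(robots_directives(page)))
```

Note that the tag has to be written with straight quotes. If a page is pasted from a word processor with curly quotes, the parsed attribute values will not match and the check will report no directives.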

5. Canonicalization Of URLs

Canonical Link Element

Canonicalization is the best way to manage the duplicate content within your own website.

What is a canonical page?

“A canonical page is the preferred version of a set of pages with highly similar content.”

Let’s say I offer a service via a sales page that targets the U.S., U.K. and India, and I drive targeted traffic using search engine marketing. Since these countries use different currencies, I create 3 landing pages with exactly the same content; the only difference is the currency in which my services are priced.

If I allow search engines to index those pages, they treat them as duplicate content. So I can either add a noindex, nofollow meta tag to the UK and India pages or make use of canonicalization.

Assume that my primary market is U.S. and I want search engines to show my sales page targeting U.S. customers on SERPs. What I do in this case is specify a canonical link for each version of my sales page.

Let’s say my preferred sales page is:

http://www.minterest.org/services/internet-marketing-usa.html

and I created two copies of the sales page targeting the UK and India at:

http://www.minterest.org/services/internet-marketing-uk.html

and

http://www.minterest.org/services/internet-marketing-india.html

Now, to specify the canonical link to search engines I add the link tag

<link rel="canonical" href="http://www.minterest.org/services/internet-marketing-usa.html" />

to the <head> section of the duplicate sales pages which tells the search engines that all those duplicate sales pages refer to the canonical page at:

http://www.minterest.org/services/internet-marketing-usa.html.
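A quick way to confirm that each duplicate page really points at the preferred page is to read the canonical link back out of its HTML. The sketch below is only an illustration using the standard library html.parser module; CanonicalParser and canonical_url are hypothetical names, and the sample markup stands in for the UK sales page.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_url(html: str):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


PREFERRED = "http://www.minterest.org/services/internet-marketing-usa.html"

# Stand-in for the <head> of the UK copy of the sales page.
uk_page = (
    "<html><head>"
    '<link rel="canonical" href="' + PREFERRED + '" />'
    "</head><body>Prices in GBP</body></html>"
)

print(canonical_url(uk_page) == PREFERRED)
```

Running the same check against every page built from a shared template is a cheap safeguard against a canonical link that accidentally points somewhere you did not intend.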

That said, if it’s just a few sales pages, there’s no need to block them, as search engines are smart enough to figure it out.

Duplicate Content – Google SEO Guidelines

Coming back to the point, the purpose of this blog post was to highlight Google’s view on the block quotations we bloggers use. Block quotes are used when we take an excerpt from another website and want to distinguish it from our own content.

If I quote another source, will I be penalized for duplicate content?

So, what is the best way to quote from another website without getting penalized for so-called duplicate content?

Google’s recommendation is that if we’re just quoting someone or highlighting something which is said by some other blogger then what we need to do is put that in a block quote with a link back to the original source.
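As a rough illustration of that recommendation, the Python sketch below builds the markup for a block quote followed by a link back to the source. The helper name block_quote, the example URL and the choice to put the source link in a paragraph after the quote are my own assumptions; Google’s guidance only asks for a block quote and a link to the original.

```python
from html import escape


def block_quote(text: str, source_url: str, source_name: str) -> str:
    """Return HTML for a quoted excerpt plus a link back to the original source."""
    quoted = escape(text)       # keep stray < > & in the excerpt from breaking the page
    url = escape(source_url)    # also escapes quotes, so it is safe inside an attribute
    name = escape(source_name)
    return (
        f'<blockquote cite="{url}">\n'
        f"  <p>{quoted}</p>\n"
        f"</blockquote>\n"
        f'<p>Source: <a href="{url}">{name}</a></p>'
    )


print(block_quote(
    "A canonical page is the preferred version of a set of pages with highly similar content.",
    "http://example.com/original-post",
    "Example Blog",
))
```

The excerpt stays clearly marked as someone else’s words, and your own commentary goes outside the blockquote element.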

Now, if you simply copy an entire article from another website or multiple websites and mark it as a block quote without adding your own insights or views then it can raise a red flag.

That said, if you quote excerpts from multiple articles as block quotes, add your own comments or views, and link back to the original source, then it is absolutely fine.

Ask Yourself: What Is Compelling About Your Website?

I know tons of tech blogs out there, and all they do is rewrite the same story that appears on Mashable, TechCrunch or some other site in their own words. In that case, what is compelling about that website? Nothing!

If I report the same news story as someone else, is that duplicate content?

Google understands that “You can’t make up news!” but its recommendation for news websites is that you write your own version of the original story rather than rewriting what’s written on some other site. That means adding value to your version with your own insights.

And that exactly is the reason why I DON’T publish news posts.

Image Credit: Free Digital Photos

  • http://loyfly.com Kulwant Singh

    Nice post Mahesh .
    Google is become strictier day by day . Very helpful to new bloggers . Thanks to wordpress too did lot of stuff itself .

  • http://iphone-5jailbreak.org/ iphone 5 jailbreak

    Google is now a days concentrating on Quality content
    Thanks for this important info
    really nice article.

  • http://seekdefo.com/how-to-earn-200-plus-for-each-of-your-guest-posts/ George

    That means quoting is ok. I frequently quote others to make an impact

  • Robb

    Advising people to use the canonical ‘suggestion’ when the meta robots ‘directive’ is available is so very wrong. At best it is redundant, while at worst it can deindex most of a site or allow content to be tagged as duplicate and placed in the supplemental index (or not indexed at all) having the knockon effect of the domain flagged for spam.

    • http://www.minterest.com Mahesh Mohan

      When I tried canonical for some landing pages… there was no issue… Google simply ignored them and used the canonical URL… And when I tried to visit the cached page it was showing the canonical… Anyways it was a small site… so maybe it’s a better option for e-Commerce sites like Matt is suggesting.

      • Robb

        Yes, that’s the gotcha. It can work as expected numerous times, then suddenly a redirect, an internal or external link gets pointed to a page, and suddenly the canonical gets thrown out the window. For me, the risk of ending up like verticaldescents.com, where a global template having a canonical pointing at the homepage was responsible for erasing 90% of their pages, and rankings/impressions/traffic from Google, meant that one head tag cost a company a truckload of money. A dozen digital agency staff, many very experienced, overlooked it, because it didn’t fail any test cases for pre-launch. Anyway – good article, good site – keep up the great work!

  • http://www.minterest.com Mahesh Mohan

    Thanks for that! :)

    That’s an interesting case study… but maybe they used canonical tag the wrong way.. also we don’t know why their SEO agency did that… This blog post is pending an update so I will add more info about canonical based on your feedback and experience.