The Duplicate Content Penalty

November 19th, 2008

Working at a web hosting company, I frequently get asked questions about search engine optimization. I’m definitely no expert in that arena; in fact, I’ve offered only the simplest of advice and never gone into detail for a customer who was looking for SEO. What I usually say is to make sure the content is original and doesn’t look like spam, then use a handful of relevant keywords and a title that describes the page instead of advertising for it.

That has worked surprisingly well for me and for a few of the customers I advised to try it, but it’s vague. It doesn’t satisfy the people who really need to get into the nuts and bolts of how what they write affects where they appear on the search results pages. I need to learn more so that I can give better answers when people ask, though I still feel I’m dabbling in an art that’s more magic than science.

One of the things I’ve been asked about, specifically, is the so-called ‘duplicate content penalty’ from Google: the idea that multiple pages on your domain that display the same results get penalized. I went in search of an answer and quickly found one. It does not happen.

According to Susan Moskwa of Google in a blog entry:

Let’s put this to bed once and for all, folks: There’s no such thing as a “duplicate content penalty.” At least, not in the way most people mean when they say that.

So, all of those CGI applications, e-commerce programs, and on and on, ARE OK. When Google’s crawlers see different URLs for the same page, for example HTTP GET strings with long series of parameters that can appear in different orders, they combine the results of the duplicate pages and assign the search engine’s ‘value’ to the combined page, represented by one of the URLs. So /store/catalog.asp?cat=turban&color=black and /store/catalog.asp?color=black&cat=turban return the same results, and get grouped together and represented by one of the URLs, which gets chosen nondeterministically by the crawler.
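To see why those two URLs count as the same page, here is a minimal Python sketch that sorts the query parameters so both orderings collapse to one form. This is only an illustration of the equivalence; it is not how Google groups pages, and the `normalize_url` helper is a hypothetical name of my own.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def normalize_url(url):
    """Return the URL with its query parameters sorted by name (hypothetical helper)."""
    parts = urlsplit(url)
    # Parse the query string into (name, value) pairs and sort them,
    # so parameter order no longer matters.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


a = normalize_url("/store/catalog.asp?cat=turban&color=black")
b = normalize_url("/store/catalog.asp?color=black&cat=turban")
print(a == b)
```

If your own application links to its pages with the parameters always in the same order, there are fewer duplicate URLs for a crawler to sort out in the first place.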

It’s not penalized! But then again, you can’t choose the URL that represents your content that way either. Not so good. What do you do if you want or need to do that? The answer is to use a Sitemap (note the capital ‘S’).
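For anyone who has not seen one, a Sitemap is an XML file listing the URLs you want the search engine to know about. Below is a rough Python sketch that builds a one-entry Sitemap using the sitemaps.org 0.9 namespace. The `build_sitemap` function and the www.example.com hostname are my own placeholders, so treat it as a starting point rather than a finished tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a minimal Sitemap document from a list of URLs (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes '&' as '&amp;' when serializing the text.
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


# www.example.com is a placeholder hostname; list the form of the URL you prefer.
sitemap_xml = build_sitemap(
    ["http://www.example.com/store/catalog.asp?cat=turban&color=black"]
)
print(sitemap_xml)
```

The point is simply that you list the one form of each URL you want shown, rather than leaving the choice to the crawler.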

So, if that’s not the duplicate content that does get penalized, what is?

Spam sites and theft, basically. Sites that scrape content from another site without adding meaningful original content, or sites that don’t differ from one another in any significant way. A next-door neighbor at one of the houses I used to live in loved to do multi-level marketing schemes on the web. He frequently got new ‘cookie-cutter’ web sites from the companies he signed up with, and tried to advertise them (spamvertise, actually, but that’s another story). This is why that never worked. His site was just like the other 99 sites above and below it, and the web just doesn’t need another copy of that page.

So for all intents and purposes, unless you’re scraping and republishing content that’s not original, you shouldn’t have to worry about the duplicate content penalty.
