Duplicate Content

I've been following the discussion about Google and mirrored information for some time. It is "common knowledge" that Google penalizes page rank when it determines that content is duplicated somewhere else. In fact, I've read many experts stating that there should be no duplicate domain names and no duplicate content anywhere.

On the face of it, the arguments appear sound. Google obviously has several billion pages in its database and could, it appears, easily determine whether content is duplicated. It also seems, again on the face of it, reasonable to check for duplicate content, since duplication is the "mark of a spammer" and unnecessary on a web where hyperlinking is available. At least, this is the common wisdom.

However, sometimes what seems reasonable and possible is not: not by a long shot.

Let's begin with the technical side of things. Suppose domain x and domain y have exactly the same content. How on earth would Google figure that out? Say Google has 3 billion pages in its database. Comparing every page to every other page would be an enormous task: on the order of 4.5 quintillion comparisons.
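
To get a feel for the scale, here is a back-of-the-envelope calculation (a sketch that assumes brute-force page-versus-page comparison, with no hashing or fingerprinting shortcuts):

```python
# Rough estimate of the work in naively comparing every page to every
# other page in a 3-billion-page index. Purely illustrative.

pages = 3_000_000_000

# Number of unique unordered pairs: n * (n - 1) / 2
comparisons = pages * (pages - 1) // 2

print(f"{comparisons:.3e}")  # -> 4.500e+18
```

Even at a billion comparisons per second, working through 4.5 x 10^18 pairs would take well over a century, which is why the naive approach is a non-starter.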

Now, if site x's "page1" linked to site y, which also had a "page1", then Google could spot the duplication simply by following the link. Conceivably, it could check for that case.

Not only is the task enormous, but the benefit is so tiny as to be insignificant. Duplicate content does not in any way, shape, or form imply spamming. In fact, a duplicate site will generally lower the page rank of BOTH sites: instead of 100 links pointing at one site, there will presumably be 50 links to one and 50 to the other, which tends (all things being equal) to lower the ranking of both. So Google gains nothing from this incredible expenditure of resources.
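
The dilution effect can be shown with a toy model. This is NOT Google's actual PageRank formula, just a "score proportional to inbound links" sketch to illustrate the point:

```python
# Toy illustration of link dilution between two mirrored sites.
# Assumes a simplistic model where a site's score is just its share
# of the 100 available inbound links -- not a real ranking algorithm.

def simple_score(inbound_links, total_links=100):
    return inbound_links / total_links

single_site = simple_score(100)  # all 100 links point at one site
mirror_a = simple_score(50)      # links split between two mirrors
mirror_b = simple_score(50)

print(single_site, mirror_a, mirror_b)  # 1.0 0.5 0.5
```

Under any link-counting scheme, splitting the same inbound links across two copies leaves each copy weaker than the single site would have been, so the duplicator punishes himself without Google lifting a finger.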

There are several reasons for duplicate content which have nothing to do with spamming. Sometimes the content is actually duplicated, and sometimes it's just that there are several different domains (at least the www and non-www versions) for the same website.

Mirroring a site for load balancing - This is very common. The purpose is to split up the traffic between two copies of the site.

Mirroring for region - Sometimes a site is mirrored simply to reduce load on the internet backbone itself. You might put an identical copy of a site in Europe, for example, to cut traffic across the Atlantic and make the site faster for European visitors.
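
The load-balancing idea above can be sketched as a simple round-robin rotation over identical copies of a site (the hostnames here are made up for illustration):

```python
# Minimal sketch of round-robin load balancing across two mirrors --
# the idea behind serving one site from several identical copies.
# Hostnames are hypothetical.
from itertools import cycle

mirrors = cycle(["eu.example.com", "us.example.com"])

# Each incoming request is sent to the next mirror in rotation.
requests = [next(mirrors) for _ in range(4)]
print(requests)
# -> ['eu.example.com', 'us.example.com', 'eu.example.com', 'us.example.com']
```

In practice this rotation is often done at the DNS level, so from a crawler's point of view both mirrors serve identical pages under different names.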
