What Writers and Publishers Should Know about Duplicate Content
Have you ever done this? Let’s call it Scenario A:
- Get an idea for an article
- Create a draft on your personal blog to weed out formatting kinks or to pitch your story around
- Email your pitch, bio, and link to your article to potential publishers
- Your pitch is accepted and a copy of your piece is published on a famous blog — party time
Are you freaking out yet? Are you thinking, “Oh no, it’s duplicate content because he has that draft copy hanging around”?
I actively contribute to four web publications. I also write articles on my blog and now write on Medium. I have 20 years of experience as a software engineer from my past life. In the early 2000s, I used to read Stanford University white papers on web crawlers and bots for fun, the way normal people do crossword puzzles. By the way, I implemented some of those white papers when I was a developer at Oracle Corporation.
Back to Scenario A. I admit, I practice this occasionally. Sometimes it makes a lot of sense depending on the situation. At any rate, for the third time in as many years, I was told, “You really must remove your copy of the article because Google will consider it as duplicate content”. And for the third time I was baffled. Why do people think this?
Ok, suppose Googlebot happened to wander into my hidden or password-protected realms, which it isn’t supposed to do. So what? Do search engines slap us on the wrist that easily if they find one dupe (pun intended)?
To understand if there is an issue, we should ask ourselves:
- What is wrong with duplicate content?
- Did I check the source — where did I learn that Google will penalize an article if there is a hidden/private copy?
- Did I test this theory or do I have any irrefutable evidence that Scenario A is bad?
Let’s break down these three one at a time.
1. What is wrong with duplicate content?
Is duplicate content a bad thing? It depends. Think about it. Should Googlebot be upset if it happened to find our hidden copy? I can see Googlebot being pissed if it found hundreds or thousands of copies, for two reasons:
- Mainly because Googlebot might be confused. It might not know which copy is the original, i.e. the canonical one. Hint: play it safe by linking valid copies back to the original, for example with a canonical tag.
- Googlebot may also suspect foul play if it sees content replicated en masse, all at once.
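If you want to check whether a copy actually points back to its original, here is a minimal sketch in Python. It only uses the standard library and looks for a canonical link tag in a page's HTML. The page text and the famous-blog.example URL are made up for illustration, and the helper names (CanonicalFinder, find_canonical) are my own, not part of any Google tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only until one canonical is found.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical draft page on a personal blog, pointing at the published copy.
draft_page = """
<html><head>
<title>My draft</title>
<link rel="canonical" href="https://famous-blog.example/my-article">
</head><body><p>Draft text.</p></body></html>
"""

print(find_canonical(draft_page))
```

If the function returns None for one of your copies, that copy is not telling crawlers which version is the original.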
Here is another angle. What about syndication? What about retweets, reblogs, reposts, or FB shares? These copies aren’t hidden or password-protected articles living in the bowels of the web. They are the opposite of Scenario A. This kind of stuff is content screaming to be discovered and seen. Duplicate content can be a very good thing.
When does duplicate content hurt our search rankings? Here’s an excerpt from Google support:
“In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved.”
Here is the backlink to the original article: https://support.google.com/webmasters/answer/66359?hl=en
Andy Crestodina, the Strategic Director of Orbit Media, writes about a real-life case where Googlebot’s trip-wires were set off:
“The day a new website went live, a very lazy PR firm copied the home page text and pasted it into a press release. They put it out on the wire services, immediately creating hundreds of versions of the home page content all over the web. Alarms went off at Google and the domain was manually blacklisted by a cranky Googler.”
Here is the backlink to the original article: https://blog.kissmetrics.com/myths-about-duplicate-content/
2. Did I check the source — where did I learn that Google will penalize an article if there is a hidden/private copy?
Here is where I need your help. If you think that having a hidden draft article is a duplicate content problem, please share why. Add the source of this reasoning in a comment below.
3. Did I test this theory or do I have any irrefutable evidence that Scenario A is bad?
I’m a fan of the scientific method, so I tested this. In my tests, the published version always comes out on top. In fact, I have never seen my hidden articles in a search result. Try it yourself and share your results.
If this topic is important to you, I recommend reading the two articles I back-linked above.