For the last four-plus years now, we’ve heard a lot about Penguin. Initially announced in April 2012, we were told that this algorithm update, designed to combat web spam, would impact three percent of queries.
More recently, we’ve witnessed frustration on the part of penalized website owners at having to wait over a year for an update, after Google specifically noted one was coming “soon” in October of 2015.
In all the years of discussion around Penguin, however, I don’t believe any update has been more fraught with confusing statements and misinformation than Penguin 4.0, the most recent update. The biggest culprit here is Google itself, which has not been consistent in its messaging.
And this is the subject of this article: the peeling away of some of the recent misstated or just misunderstood aspects of this update, and more importantly, what it means for website owners and their SEOs.
So, let’s begin.
What is Penguin?
Note: We’re going to keep this section short and sweet — if you want something more in-depth, you should begin by reading Danny Sullivan’s article on the initial release of Penguin, “Google Launches ‘Penguin Update’ Targeting Webspam In Search Results.” You can also browse Search Engine Land’s Penguin Update section for all the articles written here on the topic.
The Penguin algorithm update was first announced on April 24, 2012, and the official explanation was that the algorithm targeted web spam in general. However, since the biggest losses were incurred by those engaged in manipulative link schemes, the algorithm itself was viewed as being designed to punish sites with bad link profiles.
I’ll leave it at that, with the assumption that I shouldn’t bore you with additional details on what the algorithm was designed to do. Let’s move now to the confusion.
Where’s the confusion?
Until Penguin 4.0 rolled out on September 23, 2016, there really wasn’t a lot of confusion around the algorithm. The entire SEO community — and even many outside it — knew that the Penguin update demoted sites with bad links, and it wasn’t until it was next updated that an affected site could expect some semblance of recovery.
The path was clear: a site would get hit with a penalty, the website owner would send out requests to have offending links removed, those that couldn’t be removed would be added to a disavow list and submitted, and then one would simply wait.
However, things got more complicated with this most recent update — not because the algorithm itself got any more difficult to understand, but rather because the folks at Google did.
In essence, there were only a couple of major changes with this update:
- Penguin now runs in real time. Webmasters impacted by Penguin will no longer have to wait for the next update to see the results of their improvement efforts — now, changes will be evident much more quickly, generally not long after a page is recrawled and reindexed.
- Penguin 4.0 is “more granular,” meaning that it can now impact individual pages or sections of a site in addition to entire domains; previously, it would act as a site-wide penalty, impacting rankings for an entire site.
It would seem that there isn’t a lot of room for confusion here on first glance. However, when the folks at Google started adding details and giving advice, that ended up causing a bit of confusion. So let’s look at those to get a better understanding of what we’re expected to do.
Rumor had it, based on statements by Google’s Gary Illyes, that a disavow file is no longer necessary to deal with Penguin-related ranking issues.
This is due to a change in how Penguin 4.0 deals with bad links: they now devalue the links themselves rather than demoting the site they’re linking to.
Now, that seems pretty clear. If you read Illyes’ statements in the article linked above, there are a few takeaways:
- Spam is devalued, rather than sites being demoted.
- There’s less need to use a disavow file for Penguin-related ranking penalties.
- Using the disavow file for Penguin-related issues can help Google help you, but it is more specifically useful for sites under manual review.
Here’s the problem, though — just a day earlier, the following tweets had been exchanged:
So now we have a “yes, you should use it for Penguin” and a “no, you don’t need it for Penguin.” But wait, it gets more fun. On October 4, 2016, Google Webmaster Trends Analyst John Mueller stated the following in an Office Hours Hangout:
[I]f these are problematic links that are affected [by Penguin], and you use a disavow file, then that’s a good way for us to pick that up and recognize, “Okay, this link is something you don’t want to have associated with this site.” So when we recrawl that page that is linking to you, we can drop that link from our link graph.
With regards to devaluing these low quality links instead of punishing you, in general we try to figure out what kind of spammy tactics are happening here and we say, “Well, we’ll just try to ignore this part with regards to your website.”
So… clear as mud?
The disavow takeaway
The takeaway here is that the more things change, the more they stay the same. There is no change. If you’ve used unethical link-building strategies in the past and are considering submitting a disavow file — good, you should do that. If you haven’t used such strategies, then you shouldn’t need to; if Google finds bad links to your site, they’ll simply devalue them.
Of course, it was once also claimed that negative SEO doesn’t work, meaning a disavow wasn’t necessary for bad links you didn’t build. This was obviously not the case, and negative SEO did work (and may well still), so you should be continuing to monitor your links for bad ones and adding them to your disavow file periodically. After all, if bad links couldn’t negatively impact your site, there would be no need for a disavow at all.
And so, the more things change, the more they stay the same. Keep doing what you’ve been doing.
The source site?
In a recent podcast over on Marketing Land, Gary Illyes explains that under Penguin, it’s not the target site of the link that matters, it’s the source. This doesn’t just include links themselves, but other signals a page sends to indicate that it’s likely spam.
So, what we just were informed is that the value of a link comes from the site/page it’s on and not where it’s pointing. In other words, when you’re judging your inbound links, be sure to look at the source page and domain of those links.
The more things change, the more they stay the same.
Your links are labeled
In the same podcast on Penguin, it came to light that Google places links on a page into categories, including things like:
- Penguin-impacted; and
It was suggested that there are other categories, but they weren’t named. So, what really does this mean?
It means what we all pretty well knew was going on for about a decade. We now have a term to use to describe it (“labels”) rather than simply understanding that a page is divided into sections, and the sections that are the most visible and more likely to be engaged with hold the highest value (with regard to both content and links).
Additionally, we already knew that links that were disavowed were flagged as such.
There is one new side
The only really new piece of information here is that either Google has replaced a previous link weighting system (which was based on something like visibility) with a labeling system, or they have added to it. Essentially, it appears that where previously, content as a whole may have been categorized and links included in that categorization, now a link is given one or possibly multiple labels.
So, this is a new system and a new piece of information, which brings us to…
The link labeling takeway
Knowing whether the link is being labeled or simply judged by its position on the page — and whether it’s been disavowed or not — isn’t particularly actionable. It’s academically interesting, to be sure, and I’m certain it took Google engineers many days or months to get it figured out (maybe that’s what they’ve been working on since last October). But from an SEO’s perspective, we have to ask ourselves, ”What really changed?”
Nothing. You will still be working to develop highly visible links, placed contextually where possible and on related sites. If this strays far from what you were doing, you likely weren’t doing your link building correctly to begin with. I repeat: the more things change, the more they stay the same.
But not Penguin penalties, right? Or… ?
It turns out that Penguin penalties are treated very differently in 4.0 from the way they were previously. In a discussion with Google’s Gary Illyes, he revealed that there is no sandbox for a site penalized by Penguin.
To put that in context, here’s a fuller glimpse of the conversation:
So essentially, if you get hit with a Penguin penalty, there is no trust delay in recovery — once you fix the problem and your site is recrawled, you’d bounce back.
That said, there’s something ominous about Illyes’ final tweet above. So Penguin does not require or impose a sandbox or trust-based delay… but that’s not to say there aren’t other functions in Google’s algorithm that do.
So, what are we to conclude? Avoid penalties — and while not Penguin-related, there may or may not be delays in recovering from one. Sound familiar? That’s because (surely you can say it with me by now)…
The more things change, the more they stay the same
While this was a major update with a couple of significant changes, what it ultimately means is that our SEO process hasn’t really changed at all. Our links will get picked up faster (both the good and the bad), and penalties will likely be doled out and rolled back much more reliably; however, the links we need to build and how they’re being weighted remain pretty much the same (if not identical). The use of the disavow file is unchanged, and you should still (in my opinion) watch for negative SEO.
The biggest variable here comes in their statement that Penguin is not impacted by machine learning:
I have no doubt that this is currently true. However, now that Penguin is part of the core algorithm — and as machine learning takes on a greater role in how search engines rank pages — it’s likely that it will eventually begin to control some aspects of what are traditionally Penguin algorithms.
But when that day comes, the machines will be looking for relevancy and maximized user experience and link quality signals. So the more you continue to stay focused on what you should be doing… the more it’ll stay the same.