We have new data which shows that the 25th October spike wasn’t part of the gradual Penguin 3 roll-out. It appears the update had a lot to do with on-page factors and is (among other things) related to how Google treats thin content pages and soft 404s.
The page has always been treated as thin content and flagged in Google Webmaster Tools as having a non-informative title tag:
After the 25th October update, we no longer see anything in this category, which now says: “Non-informative title tags: 0”
In our Crawl errors, however, a new page just popped up. You guessed it: it’s untitled.html:
Google’s algorithm update seems to have reclassified certain types of thin content pages, now treated as soft 404s:
We checked a number of other web properties to make sure this wasn’t a coincidental date match. There were several instances of an increase in soft 404 pages, all on or around the 25th October:
After inspecting each case of newly flagged soft 404s, we found several instances of correctly classified pages which returned 200 instead of 404. There were also a number of interesting borderline cases, most with a detection date correlating to the recent update.
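For anyone who wants to pre-screen their own pages, here is a minimal sketch of the kind of check involved. It is only a heuristic and not how Google classifies soft 404s. The function name, the 50-word threshold and most of the trigger phrases are assumptions made for illustration; only the first phrase comes from one of the examples in this post.

```python
import re

# Hypothetical "trigger" phrases. The first is taken from Example 1 in this
# post; the others are assumptions added for illustration.
TRIGGER_PHRASES = (
    "we don't have any topics",
    "no results found",
    "nothing found",
    "page not found",
)


def strip_tags(html: str) -> str:
    """Crudely remove markup and collapse whitespace (not a real HTML parser)."""
    text = re.sub(r"<(script|style)\b.*?</\1>", " ", html, flags=re.I | re.S)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def looks_like_soft_404(status_code: int, html: str, min_words: int = 50) -> bool:
    """Flag pages that answer 200 but read like an error or empty page."""
    if status_code != 200:
        # A real 404/410 is a hard error, not a soft 404.
        return False
    # Normalise curly apostrophes so phrase matching is not tripped up by them.
    text = strip_tags(html).lower().replace("\u2019", "'")
    if any(phrase in text for phrase in TRIGGER_PHRASES):
        return True
    # Assumed threshold: very little visible text counts as ultra thin.
    return len(text.split()) < min_words
```

Feed it the status code and HTML from whatever crawler you already use. Anything it flags is a candidate for manual review, nothing more.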
Example 1: Thin Content Page
A thin content page with the following in the content section:
Customer Happiness Page (h1)
http://linktoahappinesspage.com/page (a href)
– We don’t have any topics in this community at this point. (p)
[icon] Promo line for the survey tool (p)
This page was first flagged on the 28 October 2014. The URL is noindexed.
Example 2: Low Value Page
Another unusual soft 404 we noticed is just an HTML sitemap:
Note: This page was first detected on the 8 August 2014 so it’s unlikely it’s related to the recent update.
Example 3: Tag Pages
In this example I can share full details as the site in question is an old pet project. Essentially we’re talking about classic tag pages containing the “trigger” type of statement in the content area:
Here’s the full list of newly detected URLs:
Example 4: Zero Result Pages
This one is a case of a zero-result page, more specifically an indexable WordPress author page with zero posts. The page was first detected on the 25th October alongside a couple of tag pages similar to our previous case with analogik.com.
URL format: website.com/blog/author/username/ and website.com/blog/tag/
We found the same sort of notification for another domain (URL format: /blog/search/keyword/) with the following in its content section:
Note: The soft 404 was first detected on the 27th October 2014.
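If you export your own list of soft 404 URLs, a rough way to see which page types dominate is to bucket them by URL format. The sketch below uses only the three formats mentioned in this example. The labels and function names are made up for illustration, and the patterns will need adjusting for sites that use different permalink structures.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# URL formats taken from the examples in this post; labels are illustrative.
PAGE_TYPES = [
    ("blog author", re.compile(r"/blog/author/[^/]+/?$")),
    ("tag page", re.compile(r"/blog/tag/")),
    ("internal search", re.compile(r"/blog/search/")),
]


def page_type(url: str) -> str:
    """Return a rough page-type label based on the URL path alone."""
    path = urlparse(url).path
    for label, pattern in PAGE_TYPES:
        if pattern.search(path):
            return label
    return "other"


def summarize(urls):
    """Count how many URLs fall into each page type."""
    return Counter(page_type(u) for u in urls)
```

Running `summarize()` over the exported URLs gives a quick count per bucket, which makes it easier to tell whether a spike comes from one template or from many unrelated pages.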
Summary of Findings
Data examined so far suggests a change in treatment of certain types of thin content pages (some of which were noindexed). Page types treated as soft 404s include:
- Tag Pages
- Zero Search Result Pages
- Blog Author Pages
- Internal Website Search
- Ultra Thin Content
- Uninformative Documents
- Blank Pages
So far the consensus is that this was a Panda update. Glenn Gabe wrote about it here and tweeted:
— Glenn Gabe (@glenngabe) November 5, 2014
Others agree, including Aleyda Solis: