Early morning. Bleary-eyed we stumble from one car into another to start our journey. Takeaway coffees clutched in hands and forgotten mobile alarm clocks reminding us that we would normally just be waking up. It’s worth it though. Our destination? Brighton. The seaside town that is fast becoming synonymous with SEO in the UK. To the largest search marketing conference in the UK. To BrightonSEO.
Twice yearly this pilgrimage is made, and we’re not alone: 3,000 other search marketing professionals converge on the seafront every April and September to hear talks ranging from the fundamentals of search marketing to details of cutting-edge experiments and the future of SEO.
There’s something very energising about being amongst thousands of other specialists in your field. Although picking the right talks to attend can be a struggle and there are some misses alongside the hits, I always find I come away with a new theory or idea I want to test. This time is no different.
My focus this time (well, always) is technical SEO. The fundamentals of how search engines crawl, index and display websites are, in my opinion, the most important element of SEO. As such, I was particularly excited to sit in on the Technical SEO stream and the Crawl and Indexation stream. Neither disappointed.
There was a lot of information given, some ground-breaking… some less so. I want to give you an overview of what I consider to be the most important points raised in these two particular sessions. I’ll keep it light, promise.
First up, Dawn Anderson’s talk on “Generational Cruft in SEO”. The premise: there is never a new website when it has history. Sites evolve with every new design and CMS, rather than being replaced.
Essentially, the key point I took from this talk is that every URL you create has an impact on your site for years to come. Most SEOs and website owners are guilty of paying little attention to the URLs they create beyond the use of keywords and potentially their length. It’s a general feeling that if you change your mind about a URL, or decide a page is no longer needed, it’s a simple case of redirecting it or letting it fall away to a 404 “not found” page and it can be forgotten. However, as Dawn carefully pointed out, this simply isn’t the case.
With response codes, the 4XX codes signal a client-side error: the request could not be fulfilled as it was made. In the case of a 404 code, it means the page could not be found on the server. Essentially, either the page has been removed or moved, or the URL has been typed in incorrectly. Google has stated in the past that it will often crawl a 404 page more than once to check that it is definitely dead and gone, and not just the result of an error. It can therefore still eat into your precious (limited) crawl budget. Too many 404s on a large site, such as an e-commerce shop, could result in pages you do want crawled and indexed not being reached before the crawl budget runs out.
The better idea is to redirect your 404 pages or to mark them as 410 “gone”. The 410 status tells the search engines that this page is never coming back.
Google’s webmaster liaison John Mueller said, “We do treat 410s slightly differently than 404s. In particular, we’ll sometimes want to confirm a 404 before removing a URL from the index, and we tend to do that faster with a 410 HTTP result code… if you want to speed up the removal (and don’t want to use a noindex meta tag or the urgent URL removal tools), then a 410 might have a small time-advantage over a 404.”
Dawn raised the point that the issue with 404s is they are the default, so the search engines can’t be sure the pages are intentionally gone. With a 410, however, a webmaster has had to make a decision and take a definitive action, which gives the search engines a better idea that the page really is gone. Googlebot is known to keep trying to recrawl 404 pages years after they first returned that code. Over time that’s a lot of resources wasted crawling pages you no longer care about. Therefore, if crawl budget is an issue for your site, it is worth considering the 410 as an alternative to the 404.
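To make the distinction concrete, here is a minimal sketch in Python of how a site might decide which status code to send back. The paths and the three lookup tables are made up for illustration (nothing here comes from Dawn’s slides), and a real site would do this in its web server or CMS config rather than in a standalone function. The point is simply that a 410 only happens when someone has deliberately added the URL to a “retired” list, while a 404 is what you get by default.

```python
from http import HTTPStatus
from typing import Optional, Tuple

# Hypothetical URL inventory, purely for illustration.
LIVE_PAGES = {"/", "/products/blue-widget"}
REDIRECTS = {"/products/old-widget": "/products/blue-widget"}  # page has moved
RETIRED = {"/summer-sale-2015"}  # deliberately removed, never coming back


def respond(path: str) -> Tuple[HTTPStatus, Optional[str]]:
    """Return (status, redirect target or None) for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK, None
    if path in REDIRECTS:
        # 301: the content lives on at a new URL.
        return HTTPStatus.MOVED_PERMANENTLY, REDIRECTS[path]
    if path in RETIRED:
        # 410: someone made an explicit decision that this page is gone.
        return HTTPStatus.GONE, None
    # 404: the default for anything we know nothing about (typos included).
    return HTTPStatus.NOT_FOUND, None


for path in ("/products/old-widget", "/summer-sale-2015", "/prodcuts/typo"):
    status, target = respond(path)
    print(path, int(status), status.phrase, target or "")
```

The ordering is a design choice on my part: redirects are checked before the retired list, so a page that has a sensible new home gets a 301 rather than a 410.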
Not sure if crawl budget is an issue for your site? Server logs are your best bet. They can show you how your website is being crawled by Googlebot and others, and with that you can determine if there are any parts of your site that are not getting reached. This was another good point raised in the Technical SEO stream.
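If you have never looked at a server log, here is a rough sketch of the kind of check I mean, again in Python. It assumes your server writes the common “combined” access log format, and the sample lines, IP addresses and paths are invented for the example. It counts Googlebot requests by status code and by top-level section of the site, which is usually enough to spot crawl budget being spent on 404s or a section that never gets visited.

```python
import re
from collections import Counter

# Matches the request, status and user agent in a "combined" format log line.
LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests whose user agent mentions Googlebot."""
    by_status, by_section = Counter(), Counter()
    for line in lines:
        match = LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        by_status[match.group("status")] += 1
        # Drop any query string, then keep the first path segment,
        # e.g. "/products/blue-widget?x=1" -> "/products".
        path = match.group("path").split("?", 1)[0]
        by_section["/" + path.lstrip("/").split("/", 1)[0]] += 1
    return by_status, by_section


# Invented sample lines; the IPs are from documentation-only ranges.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LINES = [
    f'192.0.2.1 - - [12/Apr/2018:06:25:24 +0000] "GET /products/blue-widget HTTP/1.1" 200 5123 "-" "{BOT}"',
    f'192.0.2.1 - - [12/Apr/2018:06:25:31 +0000] "GET /summer-sale-2015 HTTP/1.1" 404 312 "-" "{BOT}"',
    '203.0.113.9 - - [12/Apr/2018:06:26:02 +0000] "GET /products/blue-widget HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    f'192.0.2.1 - - [13/Apr/2018:04:10:45 +0000] "GET /summer-sale-2015 HTTP/1.1" 404 312 "-" "{BOT}"',
]

statuses, sections = googlebot_hits(SAMPLE_LINES)
print(dict(statuses))
print(dict(sections))
```

One caveat: anyone can put “Googlebot” in a user agent string, so a match on the name alone is only a first pass. Google recommends a reverse DNS lookup on the requesting IP if you need to be sure the visitor really is Googlebot.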
The next session I want to talk about is Cindy Krum’s discussion about Mobile First Indexing in the Crawl and Indexation stream. Cindy started the talk with the insightful observation that “mobile first indexing” doesn’t mean “mobile first design”. Just because your site looks super fancy on a mobile device does not mean you are ready for the mobile first index.
My key takeaway from this talk is that the search giants like Google and Bing (I’m being polite, I just mean Google) are reading and indexing data not just from websites but from a whole host of connected devices. The buzz term “internet of things” has been floating around for a while now, but it’s only in recent years that the full scale of it has become apparent. So many of the devices we use on a daily basis are internet connected, and many of those sync up to our Apple ID or Gmail account. From there, the search engines can build up a huge data set of user behaviour. Think smart fridges, the new generation of “learning” thermostats and your Fitbit. These all have the potential to provide indexable data. And here’s the kicker… none of them have URLs.
Yup. That blew my mind a little.
As an SEO I’m used to viewing information online as being fed to the search engines through the websites we create, or at best, the feeds we provide them. That content is digested, weighed, rated and ranked. From there, whenever a user searches for something on Google they are presented with a list of these URLs in order of relevance. But with the advent of these connected devices, there’s a possibility that I might search for “how hot is my house” and be returned a personalised result – the reading from my home thermostat. What’s the URL for that? There isn’t one!
Cindy gave great examples of reams of information we’re already being presented with in the SERPs that isn’t tied to a URL. Search “cow sound” on Google. The top result is a knowledge graph result containing a playable sound. There’s no URL for that sound. It’s just… there. This might seem inconsequential, but actually it’s a reminder to us SEOs that mobile first indexing should more appropriately be called “cloud first indexing”, as a lot of the search results we’ll increasingly see won’t have a URL associated with them at all; they’ll be pulled down from the “cloud”.
I’m used to making sure websites are easily crawled and indexed, but they are not the only source of information relevant to my clients’ audiences. I concluded from this talk that I can no longer just consider how websites provide data to be indexed, but how companies provide it. In order for my clients to rank high in the search results, it might be that I need to start working out where else data might be pulled from in response to their audience’s search queries.
Yet again BrightonSEO has provided me with a lot to think about. But first I’m going to Google what a zebra sounds like.