Server-side (technical) SEO is not really about the content of your website’s pages — it deals more with the coding of your website on both the front and back end.
If you want really good rankings, focusing solely on content won’t automatically push your site to the top of the pile. The technical side can make an enormous difference in your overall SEO strategy, so ignore it at your peril.
Google Is Getting Smarter
The early days of technical SEO weren’t all that ‘technical’. You see, Google robots were kind of dumb up until circa 2010. Most SEO consultants, including myself, believe that the old, dumb Google used a pretty basic program that executed a script on each page it visited, jumping from one link to the next in order to build its searchable index.
Things have changed in the last few years, though. A cleverer version of Googlebot is now on the scene, and to call this software a simple link crawler doesn’t do justice to the intelligence of it. Here is some key info about the current Googlebot.
Googlebot – looks dumb but is now clever.
- Googlebot is a headless browser that can ‘view’ the same output you’d see on Chrome, Firefox, or any other browser when visiting a website.
- The current version of the software is intelligent enough to understand the DOM (Document Object Model). The DOM is essentially a connector between web pages and programming languages, which makes it kind of important. Hitting the DOM gives Google the ability to index dynamic content.
- The pesky Googlebot can now accept cookies, meaning that you can no longer easily tell the difference between Googlebot traffic and human traffic. The fact that Google is now a cookie monster is not so bad for website owners, but it’s a pain in the ass for SEOs.
Damn you, Google cookie monster.
The latest version of Googlebot has gone all greedy on us. This new gluttonous version of the software demands rapid access to all your website’s files. Here are some ways to avoid upsetting Greedybot and getting penalized in the process:
- Avoid using Google Webmaster Tools’ crawl rate limiter or limiting Googlebot’s crawl rate with your robots.txt file. Setting crawl rate limits tells Google that your site isn’t ready for large volumes of traffic. You need to give the impression that your site can swim with the big fish if you want a good ranking. Let verified search bots run wild on your site (and be prepared for them to eat up as much as 30% of your bandwidth)!
Greedybot wants to gobble up all of your site’s files.
Introducing Crawl Budget
Crawl budget is the number of pages Google (or any other search engine) will crawl on your site during a given time period, and it is a huge influencing factor in your technical SEO success. With crawl budget, the old adage ‘the more the merrier’ applies.
I’m going to present a simple equation to help you understand what crawl budget means. Regardless of how good or bad you were at math in school, the mathematics of technical SEO is a lot more intuitive than solving equations with 3 unknowns. So without further ado:
SEO Math 101: More pages crawled = More pages indexed
And what does having more indexed pages mean? More traffic!
It’s all well and good knowing that more pages crawled is great news for your site. But how do you increase your site’s crawl budget? In general, there are four ways to optimize your website so Googlebot will crawl it for longer:
- Get more links. More incoming links pointing at your site means Google is more likely to take an interest in it and invest more time crawling your pages. It’s like when guys become popular at school and suddenly women start swarming all over them.
- Add more content. If you add a lot of content and do it frequently, or just make endless small changes, Googlebot will stop by for a visit more often. We all have that aunt whose house we only visit when we know she’s baking some chocolate cake. Googlebot likes fresh content in much the same way as humans enjoy chocolate cake.
Remember to improve your low-quality content, because getting poor content crawled is a colossal waste of your crawl budget. You can also block your crap pages, which we’ll get to later. Either way, do something – don’t just leave poor content on your site in its current sad state.
- Trust. As with any mutually beneficial relationship, trust is an important part of getting Googlebot to crawl more of your site. The problem is that Google treats relatively new websites with the disdain and suspicion of your new girlfriend’s father when you pull up outside her home in a leather jacket with your car window half-rolled down.
Googlebot is like your girlfriend’s dad – it doesn’t trust you at the start.
Googlebot is programmed with a very cynical attitude and automatically assumes your site is a cesspit of spam that is out to screw it over. Trust builds up over time – Google will eventually crawl more of your content. And your girlfriend’s father will probably trust you when he realizes you’re not a douche.
- Think laterally. Often when Googlebot doesn’t trust you enough yet to allocate a large crawl budget to your site, the software will just look at the opening snippets of your web pages. This means that you need to think about things laterally – make sure that you’ve got consistent high-quality content across all site pages, not just one page with great content.
A Quick SEO Health Check
Crawl stats are useful for giving a quick indication of your website’s current SEO health. It’s like taking your site in for a check-up, without having to fork out a shitload of cash to a doctor who doesn’t even know what’s wrong with you.
You should be looking at answers to questions like:
- How many pages are indexed?
- What are the crawling trends like?
- What URL parameters are being crawled, and what errors are showing up?
Positive crawl statistics are generally indicative of good SEO on a website. It’s kind of like when you see a dog with a wet nose – you can be reasonably sure that the animal is healthy.
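If you’d rather not wait for Google’s dashboards, you can pull similar numbers straight out of your own server logs. Here’s a minimal Python sketch that tallies HTTP status codes for requests claiming to be Googlebot – the sample log lines and IP addresses below are made up purely for illustration:

```python
import re
from collections import Counter

# Hypothetical combined-format access log lines for illustration only.
LOG_LINES = [
    '66.249.66.1 - - [10/Jan/2016:12:00:01 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jan/2016:12:00:05 +0000] "GET /old-page HTTP/1.1" 404 320 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Jan/2016:12:00:09 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

# Matches the status code that follows the quoted request line.
STATUS_RE = re.compile(r'" (\d{3}) ')

def googlebot_status_counts(lines):
    """Tally HTTP status codes for requests identifying as Googlebot."""
    counts = Counter()
    for line in lines:
        if "Googlebot" in line:
            match = STATUS_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts

print(googlebot_status_counts(LOG_LINES))  # → Counter({'200': 1, '404': 1})
```

One caveat: anyone can claim to be Googlebot in a user-agent string, so treat this as a rough wet-nose check rather than verified bot traffic.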
Golden Rule of Technical SEO
Most SEOs understand this rule, but developers have trouble with it. The golden rule of technical SEO is:
**drumroll please** One file path per unique content piece.
Why? Because even though a server treats slight file path discrepancies as arising from the same page, Googlebot doesn’t. The latest software is finicky like that – even a difference in lowercase and uppercase letters between two file paths of the same page results in them being treated separately by Googlebot.
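To see why this rule matters, here’s a short Python sketch (the URLs are hypothetical) showing how three file path variations of the same page look like three different URLs to a crawler, and how a simple normalization step collapses them back into one:

```python
from urllib.parse import urlsplit, urlunsplit

# Three ways a server might serve the exact same page – but to a
# crawler, these are three distinct URLs.
variants = [
    "http://example.com/Blog/My-Post",
    "http://example.com/blog/my-post",
    "http://example.com/blog/my-post/",
]

def normalize(url):
    """Illustrative canonicalizer: lowercase host and path, strip trailing slash."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    path = path.lower().rstrip("/") or "/"
    return urlunsplit((scheme, netloc.lower(), path, query, fragment))

print(len(set(variants)))                     # 3 distinct URLs as crawled
print(len({normalize(u) for u in variants}))  # 1 after normalization
```

In practice you enforce the one-path rule with server-side redirects or canonical tags rather than in application code, but the idea is the same: every variant should resolve to a single canonical file path.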
Blocking Content – Occam’s Razor
Remember before when I said you can’t leave low-quality pages just sitting there on your site? If you can’t be bothered to improve crap pages, you need to block them. Blocking content can be a pain in the ass, but it’s better than wasting your crawl budget.
By low-quality pages, I’m referring to pages that are either:
a) filled with crap content
b) receiving next to no traffic
c) plagued by high exit rates
d) all of the above
When trying to solve the dilemma of blocking content, I think it’s a good idea to borrow a principle from a certain William of Ockham, that famous English theologian.
Occam’s Razor: The simplest solution is most often the right one.
Applying this to blocking content: Deleting low-quality content is the best way to ‘block’ it. In other words, take out the garbage – don’t leave it lying around if you don’t need it on your site.
Other Blocking Methods
- Many people utilize robots.txt to block search bots from viewing certain content on their pages. This text file is placed on your web server, and it contains instructions that tell web crawlers what files they can access on your site.
- The instructions contained in robots.txt are not set in stone. For example, a page blocked with the Disallow command can still end up in Google’s index (without being crawled) if enough incoming links convince Google the content is relevant.
- A better way to exclude pages from search rankings is to use the noindex meta tag in each page’s HTML code. Place the following tag in the <head> of any pages you don’t want to be indexed: <meta name="robots" content="noindex">
- Desperation to block content can lead to mistakes – you should never attempt to block a page using both a robots.txt Disallow command and the meta robots tag. Googlebot gives preference to what it sees in robots.txt (this is the first thing it looks at on your site): if a page is disallowed there, the bot never crawls it, so your noindex tag will never be noticed. If you need to block content from the index, always choose meta robots tags over the robots.txt file.
- If you want to block content in documents such as PDFs and Word docs, you can use the extremely useful X-Robots-Tag in the HTTP header. Standard meta robots directives don’t work with document files because you can’t add meta tags to these file types. X-Robots-Tag is like that good guy who buys you a drink at the bar when you’ve run out of cash. He’s helpful and you might want to become friends with him.
Think of the X-Robots-Tag as a potentially helpful new friend.
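If you want to see exactly how a crawler reads your robots.txt rules, Python’s standard library ships a parser for them. Here’s a minimal sketch (the paths are hypothetical) – note that a Disallowed page is never fetched at all, which is precisely why a noindex tag on such a page goes unseen:

```python
import urllib.robotparser

# A hypothetical robots.txt blocking the low-quality sections of a site.
rules = """
User-agent: *
Disallow: /low-quality/
Disallow: /tmp/
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# A blocked path: the crawler may not fetch it (so it can't see any meta tags there).
print(rp.can_fetch("Googlebot", "http://example.com/low-quality/page.html"))  # False

# An allowed path: crawled normally.
print(rp.can_fetch("Googlebot", "http://example.com/blog/good-post.html"))    # True
```

This is the same logic well-behaved bots apply before requesting a page, which makes it a handy way to sanity-check your Disallow rules before deploying them.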
Redirecting is an intrinsic part both of website maintenance and technical SEO. A redirect is a method used to send users and search bots to a different URL from the one they requested. Much like a signpost, they point people in the right direction.
- If you use a 301 redirect, you’re informing the search engine spider that it should permanently remove the old URL from its index, replacing it with the newer version. A 301 redirect passes pretty much all of the link juice from the old URL to the new one.
- Think about what would happen if you didn’t utilize a 301 redirect. In that case, both the user and the search bot would be served a 404 error code. Sending repeated 404 error codes is a good way for Googlebot to conclude that your page doesn’t exist anymore, thus making it useless for SEO purposes.
- It’s a little-known fact (outside of SEO circles) that you should also set up a 301 redirect between the non-www (http://example.com) and www (http://www.example.com) versions of your domain because they are treated as two separate websites. The same can be said for all other canonical differences, including http and https.
- Like a bunny rabbit that loses a bit of energy with each hop, a redirect leaks a little link juice at every step in a chain. While one redirect will pass most of the link juice, a chain of five or six won’t. The crux of this is that you should try to catch as many redirects as possible in a single rule, sending each old URL straight to its final destination.
- Bear in mind that Googlebot hates broken links because it treats them as inefficiencies. You should always be on the lookout for 404 errors to keep Mr. Googlebot pleased. You can download WordPress plugins that analyze your site for 404 errors and help remove them with 301 redirects.
Don’t ignore 301 redirects. They are vital components of good technical SEO strategy and can help you lower your bounce rate.
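The ‘single rule’ advice above can be sketched in code. This Python example (the redirect mapping is entirely hypothetical) flattens a chain of 301s so that every old URL points straight at its final destination in one hop:

```python
# Hypothetical redirect rules: old path -> new path. As written, a visitor
# to /old-post would bounce through three separate 301s.
redirects = {
    "/old-post": "/2014/old-post",
    "/2014/old-post": "/blog/old-post",
    "/blog/old-post": "/blog/updated-post",
}

def final_target(path, rules):
    """Follow the redirect chain until a path with no further redirect."""
    seen = set()
    while path in rules:
        if path in seen:  # guard against redirect loops
            raise ValueError("redirect loop at " + path)
        seen.add(path)
        path = rules[path]
    return path

# Flatten every chain: each old URL now 301s directly to the final page.
flattened = {old: final_target(old, redirects) for old in redirects}
print(flattened["/old-post"])  # /blog/updated-post – one hop instead of three
```

You’d then write the flattened mapping into your server’s redirect rules (.htaccess, nginx config, or a plugin), so each old URL passes its link juice in a single hop.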
My best three tips for URL structures are:
- Be consistent with your structure. There is no right way to structure your file paths, but making them consistent across your site is always better.
- Always think in terms of the clickability of your link. Is it convoluted-looking or relatively readable? If it’s the latter, you’re fine. If it’s the former, people won’t engage with your page. And search engines don’t like that.
- Have a clearly defined file path hierarchy. Make it easier for search engines to crawl your site by clearly defining a hierarchy of links, with main categories and sub-categories.
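To make the hierarchy tip concrete, here’s what a clearly defined file path structure might look like (example.com and the paths are purely illustrative):

```
example.com/blog/                              (main category)
example.com/blog/technical-seo/                (sub-category)
example.com/blog/technical-seo/crawl-budget    (individual post)
```

Each level of the path tells both users and crawlers exactly where they are in the site, which makes the whole structure easier to crawl.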
Your Own Personal Crawling Tool
It’s always nice to have tools that make a job easier, especially when it comes to something like technical SEO. A crawling tool essentially does the same job as Googlebot, except it will flag your bad SEO and help you fix it instead of penalizing your site for it.
A favorite of mine is Microsoft’s IIS SEO Toolkit, which you can download for free.
This diagnostic SEO tool is exceptionally fast, and it can help you make your website more search engine friendly by providing loads of SEO recommendations. The tool performs a plethora of other functions and is intuitive to use. So, knock yourself out and have fun with it.
Advice on Instant Gratification
We live in an odd world, where a man can view more naked women at the click of a button than his ancestors ever did in an entire lifetime. People can get social approval within one minute by posting a Facebook status, without ever leaving their homes. This has nurtured an unhealthy mindset of instant gratification.
My advice for owners of new websites is that you shouldn’t expect instant results with technical SEO. It is very tricky to get many pages indexed on your site quickly. If you foster your patient side and listen to the advice in this article, you’ll succeed at this stuff. If you expect to be ranked number one on Google by next week, you’re deluded.