How Do Search Engines Index Websites?

[Image: drone shot of a farmer loading grain from a harvester into a receiving tractor]

Have you ever wondered how your business website or blog actually gets indexed by Google or Bing so it can show up for a searcher? Come on, you know you’ve thought of it at least once…

The short answer: they do this through two primary methods (overly simplified here):

  1. They crawl and harvest the sitemap on your website, which lists only the page and asset URLs you want to appear on the search engine results page (SERP): pages, blog posts, images, blog categories and tags, and other files. This is the preferred method (a minimal sitemap example is sketched just after this list).
  2. Or…they actually visit your website with a coded web crawler (often called a bot, a spider, or [surprise] a spiderbot – who names this stuff?). These crawlers navigate your website much as a human would, following every link in your visible navigation and moving from page to page through the internal links that connect them.
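
For context, a sitemap is just an XML file that follows the sitemaps.org protocol. A minimal one might look like the sketch below; the example.com URLs and dates are placeholders, not a recommendation for your site.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want search engines to index -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/how-to-choose-a-grain-cart/</loc>
    <lastmod>2024-01-10</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/services/</loc>
    <lastmod>2023-12-02</lastmod>
  </url>
</urlset>
```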

This is Where Things Either Work or Fail

In the first scenario (sitemap crawl), if your website doesn’t have a sitemap, bots will default to scenario two. If your website does have a sitemap (congrats), pull it up and read it…right now, go ahead – do it. Do you see any pages on it that would not answer a lead or prospect’s question, or that do not generate leads or sales? Login, policy, conversion, and shipping pages will not drive sales for your business, so you need to optimize your sitemap so that it contains only the pages that either inform a visitor or create leads and sales. This is referred to as sitemap pruning (a small pruning sketch follows below). Everything else probably still belongs on your site – just not on your sitemap.
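
To make “pruning” concrete, here is a minimal sketch in Python that filters a list of URLs against a few exclusion patterns before writing the survivors into a sitemap. The URLs and patterns are hypothetical placeholders; swap in whatever your own login, policy, cart, or shipping pages actually look like, and note that most CMSs and SEO plugins let you do the same thing with a checkbox instead of code.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical list of URLs pulled from your CMS or a site export.
all_urls = [
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/blog/how-to-choose-a-grain-cart/",
    "https://www.example.com/login/",
    "https://www.example.com/privacy-policy/",
    "https://www.example.com/cart/checkout/",
    "https://www.example.com/shipping-info/",
]

# Patterns for pages that belong on the site but not in the sitemap.
EXCLUDE_PATTERNS = [r"/login/", r"/privacy-policy/", r"/cart/", r"/shipping-info/"]

def keep(url: str) -> bool:
    """Return True if the URL should stay in the pruned sitemap."""
    return not any(re.search(pattern, url) for pattern in EXCLUDE_PATTERNS)

pruned = [url for url in all_urls if keep(url)]

# Write a minimal sitemaps.org-style XML file with the surviving URLs.
with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in pruned:
        f.write(f"  <url>\n    <loc>{escape(url)}</loc>\n  </url>\n")
    f.write("</urlset>\n")

print(f"Kept {len(pruned)} of {len(all_urls)} URLs")
```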

The second scenario (site crawl) is probably the worst option for search engines – and for your business. Unless every page of your website is linked from another page or from your navigation, you will only be partially indexed, and the missed pages will not show up on search engines at all. Add to this the same issue as before: if login, shipping, conversion, and policy pages sit on your site without markup (such as a noindex robots meta tag) telling search engines to skip them, they can end up in the results, and you’re certain to be limiting your branding, market share, and revenue potential.
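
To see why unlinked pages get missed, here is a toy breadth-first crawler sketch in Python, standard library only. It is a rough illustration of the idea rather than how Googlebot or Bingbot actually behave, and example.com is a placeholder: the crawler can only discover pages that some other crawled page links to, so an “orphaned” page with no inbound links never enters the queue and never gets indexed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url: str, max_pages: int = 50) -> set[str]:
    """Breadth-first crawl: only pages reachable by links ever get visited."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
        except OSError:
            continue  # skip pages that fail to load
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Stay on the same site and avoid revisiting pages.
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen

if __name__ == "__main__":
    # example.com is a placeholder; any page on a real site with no
    # inbound links would simply never appear in this set.
    discovered = crawl("https://www.example.com/")
    print(f"Discovered {len(discovered)} linked pages")
```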

So what to do?

  • Be certain to have proper sitemaps (page, post, and image sitemaps are the most common)
  • Reference all of these sitemaps in your robots.txt file, if you are able (see the example just after this list)
  • Prune your sitemap of any pages that are not informational or lead/sale generating
  • Add your sitemaps to both Google Search Console and Bing Webmaster Tools, so they can find all your awesome website pages, blog posts, and images!
  • And lastly, get in the habit of thinking about your sitemap as you add or change content
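
For reference, pointing search engines at your sitemaps from robots.txt takes one Sitemap: line per file. The snippet below is a sketch with placeholder example.com URLs and hypothetical sitemap filenames; your CMS or SEO plugin may name them differently.

```text
# robots.txt at https://www.example.com/robots.txt
User-agent: *
Allow: /

# One Sitemap line per sitemap file (placeholder filenames)
Sitemap: https://www.example.com/sitemap-pages.xml
Sitemap: https://www.example.com/sitemap-posts.xml
Sitemap: https://www.example.com/sitemap-images.xml
```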

#sitemaps #sitemapoptimization #marketingshorts #2minutemarketing #2minuteseo