Fluctuations in Pages Crawled

Frequently Asked Questions

  • Can I submit a sitemap to your crawler? No, we do not currently support submitting a sitemap to our crawler. If your sitemap is linked on your site (not just referenced in your robots.txt file), our crawler may attempt to crawl it. Please note, however, that a sitemap is only a guide for a crawler; it does not guarantee that the crawler will be able to access all of the pages listed.

What's Covered?

In this guide you’ll learn more about what can cause fluctuations in the number of pages crawled in your Site Crawl and how to investigate them in Moz Pro.

Quick Links

  • Overview of How We Crawl and What Causes Fluctuations In Pages Crawled
  • How to Monitor Crawl Fluctuations
  • Broken or Lost Internal Links
  • Meta Tags Banning Rogerbot
  • Robots.txt File Banning Rogerbot
  • 4xx or 5xx Errors Limiting the Crawl

Overview of How We Crawl and What Causes Fluctuations In Pages Crawled

Our Site Crawl bot, Rogerbot, finds pages by crawling all of the HTML links on the homepage of your site. It then crawls each of those pages and follows their HTML links in turn, and so on. Rogerbot continues in this way until it has crawled all of the pages it can find for the site, subdomain, or subfolder you entered when you created your Campaign.
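To picture the process, you can think of it as a breadth-first walk of your site's internal HTML links. The sketch below is a simplified, hypothetical model of that idea, not Rogerbot's actual code; the starting URL, the scope check, and the libraries used (requests and BeautifulSoup) are assumptions for illustration only.

    # Simplified sketch of a link-following crawl: a breadth-first walk
    # over internal HTML links. Illustrative only -- not Rogerbot itself.
    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    def crawl(start_url, scope):
        seen = {start_url}
        queue = deque([start_url])
        while queue:
            url = queue.popleft()
            response = requests.get(url, timeout=10)
            if response.status_code >= 400:
                # A 4xx/5xx page can't be parsed for further links
                continue
            soup = BeautifulSoup(response.text, "html.parser")
            for link in soup.find_all("a", href=True):
                absolute = urljoin(url, link["href"])
                # Only follow links within the site/subdomain entered for the Campaign
                if urlparse(absolute).netloc == scope and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return seen

    pages = crawl("https://example.com/", scope="example.com")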

Usually, if a page is linked to from the homepage, it should end up getting crawled. If it doesn't, that may be a sign that the page isn't as accessible as it could be to search engines.

Here are some things that can affect our ability to crawl your site:

  • Broken or lost internal links
  • If your site is built primarily with JavaScript, and especially if your links are generated by JavaScript, we won't be able to parse those links (see the example after this list)
  • Meta tags or robots.txt directives telling Rogerbot not to crawl certain areas of the site
  • Lots of 5xx or 4xx errors in your crawl results
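As a quick illustration of the JavaScript point above, a plain HTML anchor can be parsed and followed, while a link that only exists once a script runs may not be found. This is a generic, hypothetical snippet rather than anything from your site:

    <!-- Crawlable: a standard HTML link our crawler can parse -->
    <a href="/products/blue-widget">Blue Widget</a>

    <!-- Not crawlable: the destination only exists inside JavaScript -->
    <span onclick="window.location='/products/blue-widget'">Blue Widget</span>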

Below we’ll talk about how to investigate a few of these issues using Moz Pro.

How to Monitor Crawl Fluctuations

If you're seeing the number of pages crawled fluctuate, it can take some investigation to find the cause.

To get started, it can help to identify which pages are being included in or excluded from your crawl report by exporting your weekly Site Crawl data to CSV. To do so, head to Site Crawl > All Crawled Pages > Export CSV (located on the right-hand side).

When examining your reports, take note of any pages you’re expecting to see included which aren’t. Additionally, make note of any pages with unusual URLs, extra-long URLs, or ones you’re not expecting to be included in your crawl report.

After investigating, hold onto these reports so you can use them to compare future crawls and investigate issues if necessary.
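If you’d like to compare two exported crawls programmatically, a short script along these lines can show which URLs were added or dropped between crawls. It assumes your export contains a URL column and uses placeholder file names, so adjust both to match your CSVs:

    # Minimal sketch: compare two exported Site Crawl CSVs to find URLs
    # that were added or dropped between crawls. The column and file
    # names are assumptions -- adjust them to match your exports.
    import csv

    def urls_from_csv(path, column="URL"):
        with open(path, newline="", encoding="utf-8") as f:
            return {row[column] for row in csv.DictReader(f)}

    previous = urls_from_csv("crawl_previous.csv")
    current = urls_from_csv("crawl_current.csv")

    print("Dropped since last crawl:", sorted(previous - current))
    print("New in this crawl:", sorted(current - previous))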

Below you’ll find common causes of fluctuations in pages crawled and how to investigate them.

Broken or Lost Internal Links

If you’re seeing a drop in the number of pages crawled for your site, or you’re not seeing as many pages crawled as you’re expecting, it is a good idea to check in on your broken and/or lost internal links.

Within the Site Crawl section of your Campaign, you can find links that lead to 4xx errors in the Critical Crawler Issues tab.

Head to your Critical Crawler Issues section and then to 4xx errors.

If an internal link leads to a 4xx error, our crawler won’t be able to move past that 4xx to find more links and pages.
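If you want to spot-check a few internal link targets yourself, a quick script like the sketch below will print the status code each URL returns. The URLs here are placeholders, and the requests library is an assumption for the example:

    # Spot-check the HTTP status codes returned by a few URLs.
    # The URLs are placeholders -- swap in your own pages.
    import requests

    urls = [
        "https://example.com/",
        "https://example.com/old-page",
    ]

    for url in urls:
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
        note = "OK" if status < 400 else "check this link"
        print(f"{status}  {url}  ({note})")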

Meta Tags Banning Rogerbot

Within the Site Crawl section of your Campaign, you can find pages that are marked as nofollow in the Crawler Warnings tab. You can also monitor your noindex pages in the same Crawler Warnings section.

If a page on your site is marked as nofollow, this tells our crawler not to follow or crawl any links on, or beyond, that page. For example, if you have a page with 10 new pages linked on it, but the page is marked as nofollow in its meta tag or X-Robots-Tag header, those 10 new pages will not be crawled and therefore will not be added to your Site Crawl data.
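For reference, a nofollow directive usually appears in one of two places: as a robots meta tag in the page’s HTML, or as an X-Robots-Tag HTTP response header. These are generic examples rather than anything specific to your site:

    <!-- Robots meta tag in the page's <head>: links on this page won't be followed -->
    <meta name="robots" content="nofollow">

and the equivalent HTTP response header:

    X-Robots-Tag: nofollow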

Robots.txt File Banning Rogerbot

If there are pages you’re expecting to be in the crawl which aren’t, it’s recommended that you check your robots.txt file to make sure that our crawler isn’t being blocked from accessing those pages.

If subfolders of your site block crawlers through a wildcard directive or a user-agent-specific directive for Rogerbot, our crawler will not be able to access and crawl pages within that subfolder, or any pages beyond it.
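For example, directives like these in a robots.txt file would keep crawlers out of a subfolder; the folder name below is a placeholder:

    # Wildcard directive: blocks all crawlers, including Rogerbot, from the subfolder
    User-agent: *
    Disallow: /private-folder/

    # User-agent specific directive: blocks only Rogerbot
    User-agent: rogerbot
    Disallow: /private-folder/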

4xx or 5xx Errors Limiting the Crawl

Within the Site Crawl section of your Campaign, you can find pages that returned a 5xx or 4xx error to our crawler in the Critical Crawler Issues tab.

5xx and 4xx errors returned in your Site Crawl can be a sign that something is amiss with your site or server. Additionally, if our crawler encounters one of these errors, it’s not able to crawl any further along that path. This means that if a page which normally links out to other pages returns an error to our crawler, our crawler will not find any links or pages beyond that error.

