The arrival of the index coverage report is an exciting development for SEOs and developers alike. Being able to see which pages are indexed and which are not is powerful, especially for larger websites where it’s hard to determine manually. But many of the index statuses may be new to you (they certainly were to me) and it’s hard to make sense of what all this information means.
So in this post, I’ll provide:
- An overview of the report for both sitemap pages and non-sitemap pages
- A breakdown of each index status and the implications for your website
- And how to use the tools Google provides to assist with analysis
Before I dive into the particulars of the report, it’s important to understand the concept of indexation and how your website should approach it.
What is Indexation?
- A page being indexed simply means that it is visible in search results.
- If you want a page to be indexed and updated in Google results regularly, it should be a part of your sitemap.
- However, not all pages should be indexed (for a variety of reasons outside the scope of this post) and there are various ways webmasters and SEOs try to stop Google from displaying them in search results.
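Since the sitemap comes up repeatedly below, here is what a minimal one looks like, following the sitemaps.org schema (the domain and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page/</loc>
    <lastmod>2018-01-15</lastmod>
  </url>
</urlset>
```

Every URL listed in this file is one you are explicitly asking Google to crawl and index.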
What makes the index coverage report so powerful is that it shows whether a page has been indexed and why or why not. And if a page has not been indexed, it cannot generate search traffic.
In order to see the results, simply click Try the new Search Console in the top left corner of your Search Console tools. Then click on the index coverage report:
This is the screen you are shown. Before we begin, let’s briefly go over the four broad categories and how they relate to indexation:
- Error: any page in this status category is part of your sitemap and is not being indexed because of errors such as 404s, crawl anomalies and more (see next section).
- Valid with Warnings: any page in this category is indexed, but Google believes there might be a reason it shouldn’t be. This is normally caused by Google indexing URLs that are blocked in a robots.txt file (see the section on non-sitemap pages).
- Valid: this means that a page has successfully been indexed by Google (whether or not you wanted it to be).
- Excluded: this category covers a wide range of reasons why Google has not indexed a page (having many excluded pages is not necessarily a problem, but you need to verify no important pages are among them).
Within the index coverage report, you are given two options: viewing the index coverage report for pages in your sitemap and for pages not in your sitemap. Both are important but present different index coverage statuses. As a result, I have broken down this analysis into sitemap pages and non-sitemap pages. When evaluating your sitemap index coverage report, remember that these are all pages your site has selected to be indexed by Google.
Common Sitemap Issues in the Index Coverage Report
First, a brief note on how to access your sitemap’s index coverage report. Click on Sitemaps (on the left-hand side, under the index coverage report). When you arrive at the page, you’ll see a section called Submitted Sitemaps. Click on the icon resembling a graph on the far right side of this section to view your sitemap’s index coverage report.
Here are some of the common crawl issues and categories visible in your index coverage report, what they mean in terms of your sitemap’s performance and how to fix any issues:
Submitted and Indexed
Nothing to do here. This is a sign that your sitemap is working properly. Indeed, a perfectly optimized sitemap will have all pages in the submitted and indexed category.
The remaining statuses fall in either the excluded or error section of your report. Because these pages are in your sitemap, you will want to resolve them ASAP.
Submitted URL has crawl issue
A URL can return a crawl issue for many reasons. According to Google, it means the page is returning a 4xx or 5xx level error, but Google does not know the exact error it encountered. Some common causes include a deleted page, a problem with your redirects, or your website being down at the time Google visited. You can investigate the cause of this issue by:
- Visiting the page yourself to see what’s going on.
- Running the URL through Screaming Frog and checking what the response code is.
- Using some of the tools in the index coverage report that I describe at the end of this post.
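If you have exported a list of affected URLs and their response codes (from Screaming Frog, for instance), a small helper like the sketch below can sort them into buckets for triage. The function name and categories are my own, not anything Google defines:

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to its likely index coverage implication."""
    if 200 <= code < 300:
        return "ok"            # reachable; look for soft-404 or noindex issues instead
    if 300 <= code < 400:
        return "redirect"      # update the sitemap to point at the final URL
    if 400 <= code < 500:
        return "client error"  # 404s etc.: redirect with a 301 or remove from sitemap
    if 500 <= code < 600:
        return "server error"  # the site may have been down when Googlebot visited
    return "unknown"

# Hypothetical export: (URL, response code) pairs
for url, code in [("/old-page", 404), ("/sale", 301), ("/", 200)]:
    print(url, code, classify_status(code))
```

Grouping by bucket first saves you from investigating each crawl issue one URL at a time.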
Submitted URL is a 404
This problem is much like it sounds. A URL in your sitemap is a 404. Before doing anything, export this list and verify that it’s a 404 in Screaming Frog. Whatever pages return a 404 error in Screaming Frog should be redirected with a 301. You should also generate a new sitemap to remove these 404 pages.
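How you add the 301 depends on your server. On an Apache server, for example, a redirect can be added to the .htaccess file like this (the paths here are hypothetical):

```apache
# 301-redirect a deleted page to its closest replacement
Redirect 301 /old-deleted-page/ /replacement-page/
```

On other stacks (nginx, a CMS plugin, a CDN rule) the syntax differs, but the goal is the same: the old URL should answer with a 301 pointing at a live page.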
Submitted URL is a Soft 404
This is similar to the error above, except that the page is returning a soft 404 (meaning your website returns a 200 success code but displays a “not found” page or no meaningful content). Before acting, run these URLs through Screaming Frog. Whatever pages return a 404 in Screaming Frog should have a 301 added and be removed from the sitemap. This verification step is especially important with soft 404s, as Google is not always 100% accurate.
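The defining trait of a soft 404 (a 200 status code paired with “not found” content) can be sketched as a simple heuristic check. The phrase list and function name below are illustrative assumptions, not Google’s actual detection logic:

```python
# Phrases that suggest a page is really an error page despite returning 200
NOT_FOUND_PHRASES = ("page not found", "404", "no longer available")

def looks_like_soft_404(status_code: int, body: str) -> bool:
    """Flag pages that return a 200 success code but whose content says the page is gone."""
    if status_code != 200:
        return False  # a real 4xx/5xx is not a *soft* 404
    text = body.lower()
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)

print(looks_like_soft_404(200, "<h1>Page Not Found</h1>"))  # True
print(looks_like_soft_404(200, "<h1>Welcome</h1>"))         # False
```

Screaming Frog’s verdict should still be the deciding factor; a heuristic like this just helps you spot likely soft 404s in a large export.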
Submitted URL Not Accepted as Canonical
This error will only appear on more complex websites that have implemented canonicals. Since it’s a bit outside the scope of this blog post, I’ll simply say that canonicals are often set up on websites that have duplicate content/pages. Canonicals are commonly used on e-commerce websites or websites that target different countries. They help Google understand the relationship between pages with similar content.
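For reference, a canonical is a single link tag in the page’s head. The URL below is a placeholder; it points at whichever version of the page you want treated as the original:

```html
<!-- On a duplicate or parameter variant, point at the preferred version -->
<link rel="canonical" href="https://example.com/products/blue-widget/" />
```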
If you are seeing this error, it means Google is ignoring how you have set up your canonicals and believes another page is in fact the original. This means that this webpage has not been indexed. There are a couple of solutions here:
- Replace the URL in your sitemap with the canonical one. Use the info: operator with the URL generating this error to discover which page Google has chosen as the canonical.
- If you want the page generating this status indexed, then the solution would be to add noindex nofollow tags to the current canonical page in order to keep it out of Google’s index. Ideally, the page in your sitemap will then become the canonical.
- If neither of these options is appealing, the solution is to investigate how your canonicals are implemented at a sitewide level and revamp them (a process that is outside the scope of this post and varies depending on the website structure).
Submitted URL Blocked by Robots.txt
As described, the report will show this error when a webpage is blocked by robots.txt and therefore is not being indexed. As this URL is likely blocked by your robots.txt file by design, the best solution is to remove it from your sitemap.
Crawled, Currently Not Indexed
This is generally caused by a page that is a duplicate or that Google deems not relevant enough for indexing. Like the error above, remove it from your sitemap.
Common Issues in the Index Coverage Report Not Related to the Sitemap
Many of these issues are resolved in the same way as in the section above. But it’s important to go through them, as the statuses have different titles and require a slightly different approach.
Indexed, not submitted in sitemap
This section and the one below it (Indexed, though blocked by robots.txt) are two of the most important for non-sitemap pages. Why? Because both help you discover indexed pages that you don’t necessarily want indexed. As shown below, they both appear in the valid section.
For Indexed, not submitted in sitemap, most of these pages don’t need any action. Just because a page isn’t in your sitemap doesn’t mean you don’t want it indexed. However, if there are pages here that you don’t want indexed, remove them, because these pages can reduce your sitewide click-through rate and negatively impact rankings. Simply add a noindex, nofollow meta tag to remove them.
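The tag in question is a single line in the page’s head:

```html
<!-- Tells crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow" />
```

Once Google recrawls the page and sees the tag, the page will drop out of the index over time.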
Indexed, though blocked by robots.txt
What makes this section different and more critical than the one above is that you know why these pages have been indexed, and it’s likely you don’t want them there: pages disallowed in robots.txt were meant to be kept out of search results. Unfortunately, Google sometimes indexes such pages anyway. The solution is to add noindex, nofollow tags to these pages (note that Google needs to be able to crawl a page to see the tag, so you may also need to unblock it in robots.txt).
The rest of these pages are in the excluded section shown above. Remember, this list of errors relates just to my own website (as well as a client’s) and is probably not entirely comprehensive.
Google chose different canonical than user
This error comes from a specific set of conditions. It means that you have a self-referential canonical on a page, but Google believes another page is the original and decides to ignore your canonical. This is not ideal, as any page with a self-referring canonical is meant to appear in search results.
There are a lot of reasons why you might see this error, and it depends a lot on how the website was set up. In my case, it was because the website was set up so that every page generated a self-referring canonical. This rule, coupled with the problem that the website was serving dynamically generated URLs as if they were static, meant that many pages had about 8 different duplicate versions, all with self-referring canonical tags.
Alternate page with proper canonical tag
This status is likely nothing to worry about. It means that a URL on your site is not being indexed because Google has decided a similar webpage with a canonical should be indexed instead. It’s not necessarily considered a duplicate page as described below, but it’s still similar enough to the canonical that Google believes it should not be indexed.
Duplicate page without canonical tag
This status appears for pages that are duplicates of another page and carry no canonical tag. It isn’t anything to worry about, as there are many reasons why your site might have a duplicate page, and Google is smart enough to detect it as such. But it’s best to add a canonical tag pointing to the page Google considers the original (you can figure this out using the View as Search Result tool described later in this post).
Crawl Anomaly
The cause and solution are identical to those described in the section on sitemaps (discover the cause of the crawl anomaly using Screaming Frog). The only difference is that it’s less critical to resolve because the URL is not in your sitemap.
Crawled – Not currently indexed
If a page has been crawled but not indexed, it’s returning a 200 status code, but Google has decided not to index it, whether because its content has since been removed or for some other reason. It’s less critical to resolve than crawl anomalies.
Not Found (404)
Simply resolve using a 301 redirect, like you would in any other instance. The cause and solution are the same as in the sitemap section (use Screaming Frog to verify, then redirect). The only difference, again, is that this URL is not in the sitemap and is therefore less urgently in need of fixing.
Blocked by Robots.txt
No action required in this category. If a page blocked by robots.txt is excluded from search, that’s because a webmaster set it up that way.
Excluded by NoIndex Tag
This is by design and nothing to worry about. If a page has a noindex tag, it’s been added for a reason and is meant not to appear in search results.
Page with Redirect
Again, this is by design. There’s nothing wrong with a redirected page and therefore nothing to worry about.
In closing, always check every status in the excluded section in case you see a page you want indexed.
4 Tools to Test URLs in the Index Coverage Report
Even though two URLs are in the same category, the problems that put them there might be different. Fortunately, Google has provided four testing options within Search Console that allow you to further explore how your page is categorized and why.
Test Robots.txt Blocking
As the title describes, this test is for finding out whether your webpage is being blocked by a robots.txt file. This can be useful for investigating why a webpage has been indexed when you have blocked it in robots.txt. While the index coverage status (all the categories mentioned above) should already give you some idea as to the cause of the problem, further testing is always recommended.
Google will send you to this screen, which provides an overview of your robots.txt file and whether or not the page is allowed for Googlebot. It also allows you to test with different Google agents (e.g. Googlebot-News). Please note that while your robots.txt file may allow or block the crawler from accessing a certain page, Google can still index a blocked page anyway, for instance when other pages link to it. The most surefire way to prevent a page from being indexed is to add noindex, nofollow tags to the page.
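You can also sanity-check robots.txt rules offline. Python’s standard library ships a parser that applies the same Allow/Disallow matching, so you can test a rule set before deploying it (the rules and URLs below are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Would Googlebot be allowed to crawl these URLs?
print(rp.can_fetch("Googlebot", "https://example.com/private/secret.html"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post.html"))       # True
```

Keep in mind this only tells you whether crawling is blocked; as noted above, a blocked page can still end up indexed.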
Fetch as Google
Fetching as Google is another option to test how your webpage is categorized and to resubmit if you have made any changes. Fetching as Google can be useful for any of the following:
- Determining if a page is a 404
- Determining if a page has been redirected i.e. 301
- Seeing if Google is blocked from crawling your page
- Seeing how Google renders your page (through the fetch and render option)
While this tool can be useful for a variety of different tests, for the index coverage report it’s most useful for testing 301s, 404s and whether a page is being blocked by your robots.txt file or noindex, nofollow meta tags.
View as Search Result
This is my personal favorite testing tool for the index coverage report, mostly because it is so useful for testing how canonicals are set up. So for example, when you are working on resolving the Google chose different canonical than user status, this tool shows you which page Google has chosen as the canonical, a critical insight for finding the solution.
This tool can also be useful for subsequent testing if you are trying to prevent Google from indexing a page. As it takes time for Google to remove a page from search results, this tool is useful for verifying if it is still visible.
Note that Google uses the info: operator for this tool, so you can skip the testing tool entirely and simply put info: before any URL you want to test.
Submit for Indexing
Finally, there is the Submit for Indexing tool. It allows you to easily submit your URL to Google for indexing. This feature is mostly useful if you have made an update to a page and want to test its impact with the other tools.
In conclusion, the big winners from the launch of the index coverage report are SEOs and webmasters working on large sites, where it’s simply impossible to discover this information manually. However, every SEO should still check out this new tool and how it works. Even if it doesn’t offer any useful insight at the moment, that might change with time.