Google Search Console’s ‘Excluded Pages’

Don't let your content go unnoticed. Learn what the Excluded section of the Google Search Console Index Coverage report means and how to fix the issues behind it.


Google Search Console allows you to see your website from Google's perspective.


You'll find data about your website's search performance, as well as details on page experience, security issues, crawling, and indexation.


The Google Search Console Index Coverage report's Excluded section contains information about the indexing status of your website's pages.


Learn why some of your website's pages appear in Google Search Console's Excluded report – and how to fix it.

 

What Is The Index Coverage Report?

The Google Search Console Coverage report provides extensive information on the status of your website's web pages in Google's index.

 

Your web pages can fall into one of four categories:

 

Error: Pages that Google could not index. Review this report, because Google believes you may want these pages indexed.

Valid with warnings: Pages that Google has indexed, but with some issues you should look into.

Valid: Pages that Google has indexed.

Excluded: Pages that Google has not included in the index.

 

What Are Excluded Pages?

Pages in the Error and Excluded buckets are not indexed by Google.

 

The following is the key distinction between the two:

 

  1. Google believes that pages in Error should be indexed but can't index them because of an error you should investigate. For example, non-indexable pages submitted via an XML sitemap fall under Error.
  2. Google believes that pages in the Excluded bucket should be excluded, which is exactly what you intended. For example, non-indexable pages that aren't submitted to Google will appear in the Excluded report.

 

However, Google doesn't always get it right, and pages that should be indexed sometimes end up in Excluded.

 

Fortunately, Google Search Console explains why pages are assigned to each bucket.

 

That's why it's a good idea to review the pages in each of the four buckets regularly.

 

Let's take a look at the bucket labelled Excluded.

 

14 Possible Reasons For Excluded Pages

There are 14 reasons why your web pages have been placed in the Excluded category. Let's look at each one in more detail.

 

1. Excluded by “noindex” tag

The "noindex" tag is applied to these URLs.

 

Because you don't mention these pages in the XML sitemap, Google believes you truly intend to exclude them from indexation.

 

These might be login pages, user pages, or search result pages, for example.


Solution:

Examine these URLs to ensure that you wish to keep them out of Google's index.


Check to see if the "noindex" tag is still active on those URLs.
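If you want to check a batch of URLs at once, a small Python script can fetch each page and report whether a "noindex" directive is still present in the robots meta tag or the X-Robots-Tag header. This is only a rough sketch – it assumes the third-party requests package, and the example URLs are placeholders for your own list:

    # Rough sketch: flag URLs that still carry a "noindex" directive.
    # Assumes the third-party "requests" package; the URLs are placeholders.
    import re
    import requests

    urls = [
        "https://example.com/login/",
        "https://example.com/search?q=test",
    ]

    for url in urls:
        response = requests.get(url, timeout=10)
        # The directive can arrive as an HTTP header...
        header_noindex = "noindex" in response.headers.get("X-Robots-Tag", "").lower()
        # ...or as a robots meta tag in the HTML (crude, order-dependent pattern).
        meta_noindex = bool(re.search(r'<meta[^>]*robots[^>]*noindex', response.text, re.I))
        print(url, "noindex:", header_noindex or meta_noindex)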

 

2. Crawled – Currently Not Indexed

These pages have been crawled by Google but have not yet been indexed.

 

According to Google's documentation, a URL in this bucket "may or may not be indexed in the future; no need to resubmit this URL for crawling."

 

Many SEO professionals have found that if many normal, indexable pages are listed under Crawled – currently not indexed, the site may have significant quality issues.

 

This might indicate that Google has crawled these pages and determined that they are not valuable enough to index.


Solution: 

Evaluate the quality and E-A-T of your website.

 
3. Discovered – Currently Not Indexed

A page under Discovered – currently not indexed "was found by Google, but not crawled yet," according to Google's documentation.

 

Google did not crawl the page in order to avoid overloading your server. A large number of pages in this bucket might indicate that your site is running out of crawl budget.


Solution:

Examine your server's health.
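One rough way to sanity-check server health is to time the responses for a sample of URLs and watch for slow or non-200 answers, since a struggling server is exactly what makes Google postpone crawling. A minimal sketch, assuming the requests package and placeholder URLs:

    # Rough sketch: sample response times and status codes for a few URLs.
    import time
    import requests

    sample_urls = [
        "https://example.com/",
        "https://example.com/category/widgets/",
    ]

    for url in sample_urls:
        start = time.time()
        response = requests.get(url, timeout=30)
        elapsed = time.time() - start
        print(f"{url} -> {response.status_code} in {elapsed:.2f}s")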

 

4. Not Found (404)

When Google requested these pages, they returned the response code 404 (Not Found).

 

These aren't URLs that were submitted to Google (e.g., in an XML sitemap), but rather pages that Google found (e.g., via another website that linked to an old page that had been removed a long time ago).


Solution:  

Examine these pages and determine whether a 301 redirect to a functional page is necessary.
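If you export the affected URLs from the report, a short script can confirm which ones still return 404 so you can decide where a 301 redirect to a working page makes sense. A sketch, assuming the requests package and a hypothetical urls_404.csv export with one URL per row:

    # Rough sketch: re-check exported 404 URLs before deciding on redirects.
    # "urls_404.csv" is a hypothetical export with one URL in the first column.
    import csv
    import requests

    with open("urls_404.csv", newline="") as f:
        for row in csv.reader(f):
            url = row[0]
            # HEAD is enough to read the status code without downloading the body.
            status = requests.head(url, allow_redirects=False, timeout=10).status_code
            if status == 404:
                print(f"Still 404 - consider a 301 redirect: {url}")
            else:
                print(f"Now returns {status}: {url}")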

 

5. Soft 404

In most cases, a soft 404 is an error page that returns the status code 200 (OK).

 

It can also be a thin page with little to no information that employs terms like "sorry," "error," "not found," and so on.

 
Solution: 

Make sure error pages return a 404 status code.


Add unique content to thin pages to help Google recognise the URL as a standalone page.
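To find likely soft 404s in bulk, you can look for pages that answer with a 200 status but contain error wording or very little text. A rough sketch, assuming the requests package; the URL list, phrases, and word-count threshold are placeholders you would tune:

    # Rough sketch: flag pages that return 200 but look like error pages.
    import re
    import requests

    candidate_urls = ["https://example.com/old-product/"]
    error_phrases = ["not found", "sorry", "no longer available"]

    for url in candidate_urls:
        response = requests.get(url, timeout=10)
        text = re.sub(r"<[^>]+>", " ", response.text).lower()  # crude HTML strip
        looks_like_error = any(phrase in text for phrase in error_phrases)
        very_thin = len(text.split()) < 150  # arbitrary thin-content threshold
        if response.status_code == 200 and (looks_like_error or very_thin):
            print(f"Possible soft 404: {url}")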

 

6. Page With Redirect

All of the redirected pages that Google found on your site are added to this Excluded bucket.

 

Solution:

Review the redirected pages to make sure the redirects were set up intentionally.


Some WordPress plugins automatically create redirects when you change a URL, so check these on a regular basis.
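To see at a glance where each redirected URL actually ends up, you can follow the redirect and print the full chain of hops. A minimal sketch with the requests package and a placeholder URL:

    # Rough sketch: print each URL's redirect chain so unintended or
    # multi-hop redirects stand out.
    import requests

    redirected_urls = ["http://example.com/old-page/"]

    for url in redirected_urls:
        response = requests.get(url, allow_redirects=True, timeout=10)
        hops = [f"{r.status_code} {r.url}" for r in response.history]
        hops.append(f"{response.status_code} {response.url}")
        print(" -> ".join(hops))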

 

7. Duplicate Without User-Selected Canonical

These URLs are duplicates of other URLs on your website, according to Google, and should not be indexed.

 

You didn't specify a canonical tag for these URLs, so Google chose one based on other factors.

 

Solution: 

Inspect these URLs to see which canonical URL Google has chosen for these pages.
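The URL Inspection tool shows which canonical Google picked; to compare that with what the page itself declares (or confirm it declares none), you can pull the rel="canonical" link out of the HTML. A rough sketch with the requests package and a placeholder URL; the pattern is crude and assumes double quotes with rel before href:

    # Rough sketch: read the rel="canonical" link a page declares, if any.
    import re
    import requests

    url = "https://example.com/product?color=blue"
    html = requests.get(url, timeout=10).text

    # Crude pattern: assumes double quotes and rel before href.
    match = re.search(r'<link[^>]*rel="canonical"[^>]*href="([^"]+)"', html, re.I)
    print("Declared canonical:", match.group(1) if match else "none")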

 

8. Duplicate, Google Chose Different Canonical Than User

Even though you declared a canonical URL for the page, Google chose a different URL as the canonical. As a result, the Google-selected canonical is indexed, while the one you chose is not.

 

Solution:

Examine the URL to see which canonical Google chose.


Examine the signals that led Google to pick a different canonical URL (e.g., external links).

 

9. Duplicate, Submitted URL Not Selected As Canonical

The difference between this status and the previous one is that here you submitted a URL to Google for indexation without declaring its canonical, and Google believes a different URL would make a better canonical.

 

As a result, the Google-selected canonical is indexed rather than the submitted URL.

 

Solution:

Examine the URL to see which canonical Google has chosen.

 

10. Alternate Page With Proper Canonical Tag

These are simply duplicates of pages whose canonical URLs Google recognises.

 

The canonical tags on these pages point to the correct canonical URL.

 

Solution:

In the vast majority of situations, no action is necessary.

 

11. Blocked By Robots.txt

These are the pages that have been blocked by robots.txt.

 

When looking at this bucket, bear in mind that if Google finds links to these pages on other websites, it can still index them (and display them in a degraded way, for example without a description).


Solution:

Use the robots.txt tester to confirm that these pages are blocked.


If you want to remove pages from the index, add a "noindex" tag and remove them from robots.txt so that Googlebot can crawl them and see the tag.
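Alongside the robots.txt tester, Python's standard library can tell you whether Googlebot is allowed to fetch a given URL under your live robots.txt. A minimal sketch with placeholder URLs:

    # Rough sketch: check whether Googlebot may fetch a URL per robots.txt,
    # using only the Python standard library.
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser("https://example.com/robots.txt")
    parser.read()

    url = "https://example.com/private/page/"
    print(url, "allowed for Googlebot:", parser.can_fetch("Googlebot", url))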

 

12. Blocked By Page Removal Tool

The pages whose removal has been requested by the Removals tool are listed in this report.

 

Keep in mind that this tool only removes pages from search results for a limited time (90 days), not from the index.

 

Solution:

Check whether the pages submitted via the Removals tool should only be removed temporarily, or whether they should also get a 'noindex' tag to keep them out of the index.

 

13. Blocked Due To Unauthorized Request (401)

Googlebot could not access these URLs because they require authorisation (401 status code).

 

You don't need to do anything unless these pages should be accessible without authorisation.

 

Google is simply informing you of what it has found.

 

Solution:

Check whether these pages really should require authorisation.

 

14. Blocked Due To Access Forbidden (403)

The most common cause of this status code is a server issue.

 

When the credentials given are incorrect, a 403 error is returned, and access to the page is denied.

 

According to Google's documentation:

"Because Googlebot never supplies credentials, your server is improperly delivering this error. This mistake should be rectified, or robots.txt or noindex should be used to block the page."

 

What Can You Learn From Excluded pages?

Sudden, large increases in the number of Excluded pages in a single bucket might signal serious site issues.

 

Here are three surges that might signal serious issues with your website:

 

  • A large increase in Not Found (404) pages might suggest a failed migration in which URLs were changed but no redirects to the new addresses were set up. This may happen if, for example, an inexperienced person changes the blog's slug structure, causing all blog post URLs to change.


  • A significant increase in Discovered – currently not indexed or Crawled – currently not indexed pages might signal that your website has been hacked. Check the example pages to see whether they are your own pages or whether they were created as a result of a hack (e.g., pages with Chinese characters).


  • A large increase in Excluded by 'noindex' tag pages might also signal a failed launch or migration. This commonly happens when a new site goes live with "noindex" tags carried over from the staging site.

 

Conclusion

The Excluded area of the GSC Coverage report may teach you a lot about your website and how Googlebot interacts with it.

 

Make checking Google Search Console a daily routine, whether you're a novice SEO or have a few years of experience.

 

This will help you detect a variety of technical SEO issues before they become major problems.

