When using Google Search Console, you may receive an email stating that "a new 'Index coverage' issue has been detected." What does that email mean? Some of you may have arrived at this article with exactly that question.
What is an Index Coverage Report?
Google crawls web pages found by its crawlers, indexes them, and displays them in search results. Storing page information in Google's database is called indexing, and the Index Coverage report lets you see how many pages on your site were indexed, and why the rest were not (or could not be) indexed.
The Index Coverage report is one of the functions in Search Console and can be found under "Coverage" in the tool's left-hand menu.
An index coverage error means that Google is not indexing the page correctly: unless you clear the error, the page will not be indexed and will not appear in search results. From an SEO perspective, a non-indexed page is effectively the same as no page at all; no matter what query a user searches for, it will never be shown as a result, so errors should be fixed.
Index coverage statuses and how to handle them
The statuses displayed are mainly the following four:
- Error
- Valid (with warning)
- Valid
- Excluded
Of these, two in particular call for improvement: "Error" and "Excluded".
If multiple "Error" items appear, there may be a problem with your website. As for "Excluded", note that pages are sometimes excluded from the index unintentionally, and that a huge number of excluded pages can hinder how the crawler moves around your site.
"Valid (with warning)" is said not to be a big problem even if left alone, but ideally it should be addressed as well.
Google also provides a help page on the report, so please refer to it as well.
First, the contents of the "Error" status you should fix, and how to handle them
Pages shown under "Error" are not indexed.
Since this is the status that most needs fixing, this section explains each error and how to deal with it.
Redirect error
This error can occur when a URL involves too many redirects; Googlebot is said to stop following a chain after about five hops.
Such a sequence of repeated redirects is called a redirect chain and is undesirable for site operation.
In worse cases, a redirect loop can occur, where the chain circles back on itself and never reaches a final destination.
Check that the redirects you set up for SSL migration (http to https) or a site move are correct, and consolidate multiple redirects into a single hop where possible.
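To see how a redirect chain or loop plays out, here is a minimal sketch in Python. The URLs and the `follow_redirects` helper are hypothetical: the redirect map simply stands in for what your server configuration actually does.

```python
def follow_redirects(redirects, url, max_hops=5):
    """Follow a URL through a redirect map and report the chain.

    redirects: dict mapping each URL to its redirect target.
    Returns (final_url, chain), or raises ValueError on a loop or
    when the chain exceeds max_hops (Googlebot is commonly said to
    give up after about five hops).
    """
    chain = [url]
    seen = {url}
    while url in redirects:
        url = redirects[url]
        if url in seen:
            raise ValueError("redirect loop: " + " -> ".join(chain + [url]))
        chain.append(url)
        seen.add(url)
        if len(chain) - 1 > max_hops:
            raise ValueError("too many redirects: " + " -> ".join(chain))
    return url, chain

# A chain (http -> https -> new URL) that should be collapsed
# into a single hop on the server side:
final, chain = follow_redirects(
    {"http://example.com/": "https://example.com/",
     "https://example.com/": "https://example.com/new/"},
    "http://example.com/")
# final == "https://example.com/new/", reached in 2 hops
```

Running a map of your live redirects through a checker like this makes chains and loops visible before Googlebot encounters them.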
Server error (5xx)
A status code in the range 500 to 511 indicates a server error.
Since 5xx status codes are returned when something goes wrong on the server side, check the state of your server.
Here's a quick summary of the most common server errors.
500 Internal Server Error
"500 Internal Server Error" is the status code returned when something is wrong on the server and the request cannot be fulfilled. The code alone does not tell you where the problem lies, but IIS 7.0 (web server software for Windows) and later versions can display more detailed sub-codes such as "500.19" (the configuration data for the page is invalid). In any case, a 500 error should be dealt with as soon as possible, for example by contacting your hosting company.
502 Bad Gateway
"502 Bad Gateway" is the status code returned when something goes wrong on a server acting as a gateway.
A gateway is a device that connects networks by converting between communication protocols; a smartphone, for example, reaches the Internet via a gateway.
The cause may be a configuration error on the server side or a change in the version of PHP (a programming language often used in web server development), but this is one of the harder errors to handle because the details are difficult to pin down.
503 Service Unavailable
"503 Service Unavailable" is the status code displayed when traffic is concentrated on the requested server or the server is temporarily unavailable due to maintenance.
Most shared hosting plans limit data transfer and the number of simultaneous connections to prevent excessive load; when the limit is exceeded, a 503 error is returned.
For example, it is not uncommon for an influencer with hundreds of thousands of Twitter followers to share a URL, causing a sudden spike in traffic that results in 503 errors and throttling.
Submitted URL not found (404)
This is displayed when a submitted URL returns a 404 error (404 Not Found).
Check whether the 404 is intentional; if it is, remove the URL from your sitemap.
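If intentional 404s are listed in the sitemap, the cleanup can be scripted. Below is a minimal Python sketch using only the standard library; the `drop_urls` helper and the URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def drop_urls(sitemap_xml, dead_urls):
    """Remove <url> entries whose <loc> is in dead_urls and
    return the cleaned sitemap as a string."""
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(sitemap_xml)
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.find(f"{{{NS}}}loc")
        if loc is not None and loc.text.strip() in dead_urls:
            root.remove(url)
    return ET.tostring(root, encoding="unicode")

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/deleted-page/</loc></url>
</urlset>"""
cleaned = drop_urls(sitemap, {"https://example.com/deleted-page/"})
```

After the call, `cleaned` no longer contains the deliberately removed page, and the result can be written back to sitemap.xml.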
Submitted URL seems to be a Soft 404
This error is displayed when a URL submitted via the sitemap is judged to be a 404 page.
A soft 404 is a page that returns a 200 (OK) status code but that Google has determined looks like a 404 error page.
Such pages are judged to have no content and are not indexed.
As a workaround, return a real 404 status code if the content no longer exists.
If, on the other hand, you created the page intending it to be indexed and still get this error, the page may lack substantive content of its own.
As explained in the "Excluded" section, you will need to add content and improve originality so that the page gets indexed.
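The key point is that the server should return a genuine 404 status rather than a 200 with a "not found" body. As a rough illustration, here is a tiny WSGI-style handler in Python; the `PAGES` store is a hypothetical stand-in for a real CMS lookup.

```python
# Hypothetical page store; in a real CMS this would be a database lookup.
PAGES = {"/": "Welcome", "/about/": "About us"}

def app(environ, start_response):
    """Minimal WSGI app: return a real 404 status (not a 200
    'not found' page) when the content does not exist."""
    body = PAGES.get(environ["PATH_INFO"])
    if body is None:
        # Returning 404 here is what prevents a soft 404.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"404 Not Found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode()]
```

Serving a friendly "page not found" design is fine, as long as the HTTP status accompanying it is 404.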
Submitted URL marked 'noindex'
This is displayed when a page submitted in the sitemap, or a page already indexed by Google, carries a noindex tag.
Check whether a noindex tag is present, and remove it if it appears on a page where it was not intended.
On a site that has been run for a long time, a noindex tag added in the past is easy to forget, so this is an error worth keeping an eye on.
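Checking pages by hand is error-prone, so a small script can help. The sketch below uses only Python's standard library to flag a page carrying a robots noindex meta tag; the class and function names are my own, not a standard API.

```python
from html.parser import HTMLParser

class NoindexFinder(HTMLParser):
    """Detect <meta name="robots"> (or "googlebot") tags whose
    content includes 'noindex'."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)  # attribute names are lowercased by HTMLParser
        if a.get("name", "").lower() in ("robots", "googlebot") \
                and "noindex" in a.get("content", "").lower():
            self.noindex = True

def has_noindex(html):
    finder = NoindexFinder()
    finder.feed(html)
    return finder.noindex

print(has_noindex('<head><meta name="robots" content="noindex"></head>'))  # True
```

Running this over the pages listed in your sitemap quickly surfaces any forgotten noindex tags.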
Submitted URL blocked by robots.txt
This error occurs when a submitted URL is blocked by robots.txt.
Check that the contents of robots.txt are what you intended, and fix either the sitemap or robots.txt to resolve the conflict.
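Python's standard library ships a robots.txt parser, which makes it easy to test whether a given URL is blocked before deciding which side to fix. The rules and URLs below are examples only.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A URL under /private/ is blocked; one under /blog/ is not.
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post.html"))     # True
```

Checking every sitemap URL with `can_fetch` is a quick way to find the conflicting entries this error refers to.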
Contents of the inconspicuous but important "Excluded" status, and countermeasures
Pages under "Excluded" are not indexed for reasons other than the errors described above.
Examples include pages that carry an explicit noindex tag and pages that are duplicates of a canonical page that is already indexed.
If the number is small compared with the number of "Valid" pages, there is little need to act.
However, if the number of "Excluded" pages is huge, it is worth taking measures early, because exhausting the crawl budget can hurt how the crawler moves around your site.
(The crawl budget is the maximum number of pages on a single website that the crawler will visit.)
Excluded by the noindex tag
These pages carry a noindex tag. If the tag was added intentionally, there is no problem and no action is required.
If, however, the noindex tag was added unintentionally, remove it immediately and request crawling.
Blocked by robots.txt
These pages are blocked by robots.txt. If you are blocking them intentionally, there is no problem and no action is required.
Note, however, that a page can still end up indexed even while blocked.
If you do not want a page indexed, use a noindex tag instead.
Discovered – currently not indexed
This applies to pages the crawler has discovered via the sitemap (sitemap.xml) or internal links but has not yet indexed.
Normally a URL is indexed shortly after being discovered; if the status does not change after a while, there may be a problem with the quality of the content.
Just in case, check the index status of the URL with the URL Inspection tool.
(Enter the URL (one starting with https://) in the text box at the top of Google Search Console and press Enter to display the inspection result.)
Crawled – currently not indexed
These are pages that were crawled after the URL was discovered but were not indexed.
This is thought to happen because Google judged the page not to be very important, for reasons such as thin content, duplicate pages, or being submitted in the sitemap (sitemap.xml) while not being linked from anywhere within the site.
The page has a redirect
This corresponds to the page for which redirect is set. No special action is required, but if the redirect chain is long, the number of URLs to be crawled will increase by the number of redirect steps, which can be a factor in wasting the crawl budget.
As mentioned above, it is desirable to set to redirect to the target URL in one step as much as possible.
Not found (404)
If the page was deleted intentionally, no action is required; if the page has moved to another URL, check the redirect settings.
Duplicate without user-selected canonical
No rel="canonical" tag is set on this page, and Google has determined that it does not need to be indexed.
If you want this page indexed, check your canonical settings.
You can use the URL Inspection tool in Search Console to see which page Google recognizes as the canonical.
Duplicate, submitted URL not selected as canonical
These are pages included in the sitemap (sitemap.xml) that Google has judged to be duplicate content, choosing a different URL as the canonical page.
Instead of this page, the page Google considers canonical is indexed.
Check that the canonical settings on this page are correct and that the URL in your sitemap (sitemap.xml) is the right one.
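To audit which URL a page actually declares as canonical, you can extract the rel="canonical" link with the standard library. The helper below is a sketch for illustration, not a full HTML audit tool.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel", "").lower() == "canonical" \
                and self.canonical is None:
            self.canonical = a.get("href")

def canonical_url(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

print(canonical_url('<link rel="canonical" href="https://example.com/page/">'))
# https://example.com/page/
```

Comparing the extracted canonical against the URL in your sitemap makes mismatches between the two easy to spot.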
Duplicate, Google chose a different canonical than the user
This page is not indexed because, although you designated a canonical page with the rel="canonical" tag, Google determined that a different page is more appropriate as the canonical.
If you want your designated page to be indexed as the canonical, you need to improve its content quality and originality so that Google judges it to be the more important page.
Contents of "Valid (with warning)", which you should address if possible
Pages under "Valid (with warning)" are indexed, but have issues you should be aware of as a site operator.
The main message here is "Indexed, though blocked by robots.txt".
This indicates that the URL is blocked from crawling by robots.txt but was indexed anyway because a link to it was posted on another page (call it page A).
Often, page A is on an external site, so in many cases you cannot control whether the link is placed or removed.
Such pages appear in search results with a snippet that reads "No information is available for this page."
This happens because Googlebot cannot crawl a page blocked by robots.txt: it generates the title from peripheral information such as the anchor text of inbound links and indexes the page without a description.
Depending on what pages are listed, workarounds include the following:
- Leave it as it is
- If crawling is acceptable, remove the block in robots.txt
- If you do not want the page indexed, remove the robots.txt block and add a noindex tag
- If you want to prevent accidental browsing, restrict access with Basic authentication or by IP address
Contents of "Valid", which basically needs no action
"Valid" indicates that the page is indexed.
Submitted and indexed
These are URLs submitted via the sitemap and registered in the index.
Ideally, every URL in your sitemap is indexed.
To check the index status of the URLs listed in a sitemap file, select the relevant sitemap file under "Sitemaps" in the left menu and open "See Index Coverage".
Indexed, not submitted in sitemap
These pages are indexed even though they are not listed in the sitemap.
Since they are shown as valid, many people may not pay much attention, but ideally this count would be zero, with every page falling under "Submitted and indexed".
Click the row to display the list of URLs and check what kind of page each one is.
If a listed page is important, we recommend adding it to your sitemap.
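Adding pages to the sitemap can also be automated. A minimal generator using Python's standard library might look like the following; the URLs and the `build_sitemap` helper are placeholders for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Build a minimal sitemap.xml string from a list of URLs."""
    ET.register_namespace("", NS)  # emit the standard default namespace
    urlset = ET.Element(f"{{{NS}}}urlset")
    for u in urls:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = u
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/important-page/",
])
```

Regenerating the sitemap from the site's actual page list, rather than editing it by hand, keeps "indexed but not submitted" pages from accumulating.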
On the other hand, a page you did not want indexed is sometimes indexed unintentionally. Such pages are in most cases duplicate pages or thin pages, and having many of them can put pressure on your site's crawling and indexing.
Possible countermeasures include:
- Reviewing canonical and noindex settings
- Controlling crawling with robots.txt or nofollow
- Adjusting the site so that such pages are not generated in the first place
Don't forget to "Validate Fix" after resolving the error
"Valid" and "Excluded" items cannot be validated, but once you have fixed an "Error" or "Valid (with warning)" item, you can submit a validation request to Google from Search Console.
Note that unless you fix all of the target URLs flagged with the same error (six URLs in the capture above), validation will not start, or will fail even if it does start.
Within a few days you will receive a "Passed" or "Failed" result by email or in Search Console.
If validation fails, review the pages again, checking every applicable URL and whether another error has occurred.
This article explained what index coverage is and how to resolve errors and exclusions when they occur.
Google Search Console gives you information about Google Search and about the state of your own site, and a great deal of insight for improving your site is hidden in that information.
If you are a web manager wondering what kinds of errors occur on your site and how you should respond, please use the statuses and countermeasures introduced here as a reference and work through them one by one.
Some of you may have been worried by a sudden warning email.
I hope this article has been of some help.
If you have any questions after reading this article, feel free to ask me on Twitter (@kaznak_com).