"Submitted URL seems to be a Soft 404": Causes and Remedies

If you use Google Search Console, you may receive an email saying "A new Index coverage issue has been detected."

In this article, I will explain what causes the "Submitted URL seems to be a Soft 404" error and how to fix it.

Read more about index coverage here.

What is a soft 404 error?

Google's official help page explains soft 404 errors as follows.

What is a soft 404?
A soft 404 is a URL that returns a page telling the user that the page does not exist, while also returning a 200-level (success) status code. In some cases, it may also be a page with little or no content (for example, a sparsely populated or blank page).

Why soft 404s are bad
Avoid returning a success code instead of 404 or 410 (not found) or 301 (moved). A success code tells search engines that a real page exists at that URL. As a result, the page may appear in search results, and search engines will keep trying to crawl the non-existent URL instead of spending time crawling your real pages.

https://developers.google.com/search/docs/advanced/crawling/soft-404-errors?hl=ja

Simply put, it is a URL that has no actual content but claims that it does.

That may not make much sense yet, so let me explain in a little more detail. When a page is requested, the web server returns a "status code" that tells robots (such as search engine crawlers) what state the requested URL is in. Typical examples look like this:

  • 200 -> No problem! The page is displayed normally!
  • 301 -> This URL has moved to another URL!
  • 404 -> The URL cannot be found! It's an error!
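Each status code also has a standard reason phrase defined by HTTP. If you want to look these up from code, Python's standard library exposes them through the `http.HTTPStatus` enum. The snippet below is a small illustration only and is not part of Search Console or any Google tool. It also includes 410, which Google's help text treats as "not found" alongside 404.

```python
from http import HTTPStatus

# Status codes mentioned in this article, plus 410 (Gone).
for code in (200, 301, 404, 410):
    status = HTTPStatus(code)
    # .phrase holds the standard reason phrase, e.g. "Not Found" for 404
    print(code, status.phrase)
```

Running it prints `200 OK`, `301 Moved Permanently`, `404 Not Found` and `410 Gone`, one per line.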

The difference between a soft 404 and a normal 404 error is that a soft 404 returns status code 200 (normal display) even though the page does not really exist.
It's a bit of a contradiction.
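To make the distinction concrete, here is a minimal sketch of how you might flag a suspicious response yourself. The phrase list and the 200-character threshold are hypothetical values chosen for illustration. This is a rough heuristic and not Google's actual detection logic.

```python
# Hypothetical phrases; adjust them to the wording your own error page uses.
NOT_FOUND_PHRASES = ("page not found", "does not exist", "no longer available")
MIN_TEXT_LENGTH = 200  # hypothetical threshold for a "nearly empty" page


def classify_response(status: int, body_text: str) -> str:
    """Roughly label a response as a hard 404, a possible soft 404, or OK.

    Illustrative heuristic only; search engines use their own signals.
    """
    if status in (404, 410):
        return "hard 404"  # the server correctly says the page is missing
    if 200 <= status < 300:
        text = body_text.lower()
        if any(phrase in text for phrase in NOT_FOUND_PHRASES):
            return "possible soft 404"  # success code, but a "not found" message
        if len(text.strip()) < MIN_TEXT_LENGTH:
            return "possible soft 404"  # success code, but almost no content
        return "ok"
    return "other"  # redirects, server errors, etc.


print(classify_response(200, "<h1>Page not found</h1>"))  # possible soft 404
print(classify_response(404, "<h1>Page not found</h1>"))  # hard 404
```

The same "not found" message gets a different label depending only on the status code. That difference is exactly what separates a soft 404 from a normal one.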

By the way, status codes are also explained in detail here, so please read it if you want to learn more.

Causes of "Submitted URL seems to be a Soft 404"

This contradiction confuses robots such as search engine crawlers, and that is what produces soft 404s.

It happens when the search engine can crawl a URL but cannot tell whether the URL really exists, for example when the URL is valid but its content cannot be displayed.

A soft 404 can also occur when the site slows down because of a temporary server load or similar issues, so a short-lived soft 404 is not a problem.
If it persists, however, the page may not be indexed, so it is best to quickly identify the cause on the affected page and fix it if possible.

Here are some concrete examples.

Misconfigured 404 page

On sites that generate pages dynamically, such as sites built on a CMS, a misconfiguration can cause soft 404 errors.
Even when the site displays a page saying "This page does not exist," a bug or a missing setting in the program may cause it to return status code 200 (normal display). In other words, the URL is not returning the status code it should.
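The sketch below reproduces this misconfiguration in miniature using Python's WSGI interface. The routing table and the app names are hypothetical stand-ins for a real CMS. Both apps show the same "does not exist" message, but only the fixed one changes the status line to match it.

```python
# Hypothetical routing table standing in for pages generated by a CMS.
PAGES = {"/": "<h1>Home</h1>", "/about": "<h1>About us</h1>"}
NOT_FOUND_HTML = "<h1>This page does not exist</h1>"
HEADERS = [("Content-Type", "text/html; charset=utf-8")]


def buggy_app(environ, start_response):
    """Shows the 'not found' message but always sends 200 (a soft 404)."""
    path = environ.get("PATH_INFO", "/")
    body = PAGES.get(path, NOT_FOUND_HTML)
    start_response("200 OK", HEADERS)  # bug: status never changes
    return [body.encode("utf-8")]


def fixed_app(environ, start_response):
    """Sends 404 Not Found whenever the path has no page."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, body = "200 OK", PAGES[path]
    else:
        status, body = "404 Not Found", NOT_FOUND_HTML
    start_response(status, HEADERS)
    return [body.encode("utf-8")]


def status_for(app, path):
    """Call a WSGI app directly and return the status line it sends."""
    captured = []
    app({"PATH_INFO": path}, lambda status, headers: captured.append(status))
    return captured[0]


print(status_for(buggy_app, "/missing"))  # 200 OK
print(status_for(fixed_app, "/missing"))  # 404 Not Found
```

The fix is not a different error page but a different status code sent with it. This is the same idea as the .htaccess and PHP settings described later in this article.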

Resources that cannot be loaded

Another cause of soft 404 errors is resources that cannot be loaded.
For example:

  • The page references too many resources to load
  • The server is too slow for loading to complete
  • A resource is too large to finish loading

The Search Console help page mentions this as well.

For example, suppose the images in an article are very large. When Google's crawler crawls the page, the content does exist, but the images take so long to load that loading never completes. As a result, the page may be treated as if it had no content.
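A little arithmetic shows why large images matter. Google does not publish the bandwidth or timeouts its crawler uses, so the 10 Mbps figure and the page makeup below are purely hypothetical. The point is only how quickly transfer time grows with file size.

```python
def download_seconds(size_bytes: int, bandwidth_mbps: float) -> float:
    """Time to transfer size_bytes at bandwidth_mbps (megabits per second).

    Ignores latency, server processing time, and parallel downloads.
    """
    bits = size_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000)


# Hypothetical page: one 5 MB photo plus 20 smaller resources of 100 kB each.
resources = [5_000_000] + [100_000] * 20
total = sum(resources)
print(f"{total / 1_000_000:.1f} MB total")             # 7.0 MB total
print(f"{download_seconds(total, 10):.1f} s at 10 Mbps")  # 5.6 s at 10 Mbps
```

In this example, the single 5 MB photo accounts for 4 of the 5.6 seconds. Resizing that one image would therefore do more than trimming all the small files combined.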

How to improve a soft 404 error

Now, let's look at how to fix soft 404 errors.
The fix is to address the causes described above, so two things are needed:

  • Make sure the correct status code is returned
  • Make sure all resources can be loaded normally

I will explain each.

Make sure the server settings return the correct status code

This involves server-side configuration, so it may be a little difficult. However, if your web server runs Apache and .htaccess is available, you can fix it as follows.

If you specify the location of your 404 page's HTML file in .htaccess, the server will display that page and return a 404 status code whenever the requested URL is not found.

Add the following line to .htaccess, and upload it to the server together with the HTML file for the 404 page that it points to.

.htaccess
ErrorDocument 404 /404.html

Now, when someone accesses a non-existent URL, the server displays 404.html (the 404 error page) and returns HTTP status code 404 (Not Found). This tells Google that the page does not exist and eliminates soft 404 errors caused by this setting.
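To confirm that the setting works, request a URL that should not exist and check the raw status code. Below is a minimal sketch using Python's standard `http.client`. Unlike a browser or `urllib`, it does not follow redirects, so a misconfiguration that redirects to the error page shows up as 301 or 302 rather than as the error page's own 200. The helper name `raw_status` and the example URL are hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def raw_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        return conn.getresponse().status  # the first response only, no redirects
    finally:
        conn.close()


# Usage (hypothetical URL): a correctly configured site should print 404.
# print(raw_status("https://example.com/this-page-does-not-exist"))
```

If this prints 200 for a URL that clearly does not exist, you are probably looking at a soft 404.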

One caveat: if you specify a full URL (starting with http://) as shown below, the server will redirect to it instead of returning a 404.

Incorrect example
ErrorDocument 404 http://***.com/404error.html

To display a custom 404 error page with the .htaccess ErrorDocument directive, specify a local path on your own server, such as /404.html, rather than a full URL.
If you write a full URL as in the incorrect example, Apache sends a redirect, and the 404 status code is never returned.
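If you manage many sites, you can script a quick check for this mistake. Below is a minimal sketch with a hypothetical helper, `check_error_documents`. It flags ErrorDocument lines whose target starts with a URL scheme such as http://. It does not understand the full Apache syntax (for example, `<If>` blocks or line continuations), so treat its output as a hint only.

```python
import re

# Matches lines such as "ErrorDocument 404 /404.html" (Apache directives are case-insensitive).
ERROR_DOCUMENT = re.compile(r"^\s*ErrorDocument\s+(\d{3})\s+(\S+)", re.IGNORECASE)
# A target with a scheme like "http://" or "https://" is a full URL.
FULL_URL = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def check_error_documents(htaccess_text: str) -> list[str]:
    """Warn about ErrorDocument lines that point to a full URL.

    Apache answers a full-URL ErrorDocument with a redirect, so the
    client never receives the original error status code.
    """
    warnings = []
    for lineno, line in enumerate(htaccess_text.splitlines(), start=1):
        match = ERROR_DOCUMENT.match(line)
        if not match:
            continue
        code, target = match.groups()
        if FULL_URL.match(target):
            warnings.append(f"line {lineno}: ErrorDocument {code} redirects to {target}")
    return warnings


sample = "ErrorDocument 404 /404.html\nErrorDocument 404 http://example.com/404error.html\n"
for warning in check_error_documents(sample):
    print(warning)  # line 2: ErrorDocument 404 redirects to http://example.com/404error.html
```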

Make sure PHP settings return the correct status code

For dynamic sites built with PHP, you can add code to the 404 page so that PHP itself returns the correct 404 status.

In PHP, you can use the header function to return status code 404.
Put the following line at the very beginning of the 404 page file. The header must be sent before any HTML is output.

PHP to add to 404 page HTML
<?php header("HTTP/1.1 404 Not Found"); // must run before any HTML output ?>

For details on writing this in PHP, please refer to this site.

Setting these status codes requires changing system and server settings. If that seems difficult, please consult your server management company or web production company.

Of course, we can also help with this.

Optimize slow-loading resources

  • Check the image files on the affected page, and resize any large ones to an appropriate size.
  • Check for resources on the affected page that take a long time to load or fail to load, and fix them.
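The image check in the list above can be partly automated. Below is a minimal sketch that scans a local copy of your site's files and lists images above a size budget, largest first. The 500 KB budget and the directory name are hypothetical, so tune them for your site. Google crawls the live page, so re-check the page in Search Console after resizing.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
MAX_BYTES = 500 * 1024  # hypothetical per-image budget


def find_large_images(root: str, max_bytes: int = MAX_BYTES) -> list[tuple[str, int]]:
    """Return (path, size) for image files under root larger than max_bytes, biggest first."""
    large = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            size = path.stat().st_size
            if size > max_bytes:
                large.append((str(path), size))
    return sorted(large, key=lambda item: item[1], reverse=True)


# Usage (hypothetical directory):
# for path, size in find_large_images("public_html/images"):
#     print(f"{size / 1024:.0f} KB  {path}")
```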

Summary

I have explained soft 404 errors.
Finally, let's summarize the important points again.

  • A soft 404 is a page with no real content that still returns a success status code such as 200
  • If it persists, the page may not be indexed, so it is best to identify the cause early
  • You can check for soft 404 errors in the Search Console

In general, soft 404s are often short-lived and not very serious, so they tend to get low priority. However, they sometimes result from unexpected errors, and if they affect an important page, the impact can be severe. It depends on the case, but try to get an accurate picture of the situation early.

Please feel free to contact us if you have any questions.

Receiving a warning email can be worrying, but we hope you find this article useful.
If you have any questions after reading it, feel free to ask on Twitter (@kaznak_com) or elsewhere.

See you.

Kazuhiro Nakamura
Representative of Cocorograph Inc. 13 years of SEO experience, with countermeasures implemented for more than 970 sites. We provide SUO, an upward-compatible extension of SEO that optimizes not only for search engines but also for search users. Developer of Sachiko Report, an original SEO/SUO reporting tool. Author of the book "The latest common sense of SEO taught by professionals in the field".