[SEO basics] The history, mechanisms, and algorithms of search engines

As in many fields, history matters in SEO. SEO has clear dos and don'ts, and knowing the history behind them helps you understand why certain practices are encouraged and others are penalized.
Knowing the history of SEO and understanding how search engines work is essential groundwork. Start by learning the history and basic behavior of search engines so you can apply that understanding to your SEO work.

  • Robot-type search engines are the mainstream
  • Links are the paths robot-type search engines follow
  • Pages that are not crawled and indexed will not appear in search results
  • The more links a page receives, the higher its evaluation
  • Search engine spam is weeded out by the algorithm

Understand the present of search by knowing the history of search engines

Throughout the evolution of search engines, there has always been a tug-of-war between search engine operators, who want to build a convenient search engine, and site owners, who want to attract as many visitors as possible.
Some SEO tactics exploit the mechanics of search engines, and search engines have consistently taken strict measures against them.

The tactics you should not use are the abusive ones, known as "SEO spam" or "black hat SEO".
If you practice black hat SEO, search engines will come to dislike your site, and in the worst case it may stop being displayed in the search results at all.

Also, if you understand the direction search engines are aiming for, you can anticipate their future evolution and build websites that are ready for it.

Understanding the history of search engines is an effective way to grasp all of this, and it can be called the first step in search engine optimization.

Types of search engines

There are four main types of search engines.

  • Robot-type search engine
  • Directory-type search engine
  • Meta search engine
  • Decentralized search engine

The most prominent type is the robot-type search engine; typical examples are Google and Bing.

next,"Directory search engine"is.
Well-known are "Yahoo! Directory" and "DMOZ", which were operated until 2014.

Meta search engines and decentralized search engines are types you hear about less often.

Nowadays "search engine" may be almost synonymous with Google, but in fact many search engines have existed over the years.
I will run through them briefly; if you don't need this, feel free to skip to the next section.

Archie
The first search engine in history, born in 1990. It connected to FTP servers, retrieved lists of files, and made them searchable with the UNIX grep command.
The World Wide Web Virtual Library
Created in 1991 at CERN by Tim Berners-Lee, the creator of HTML. It was the first content index of the World Wide Web.
Veronica / Jughead
Veronica, born in November 1992, was a search system for the Gopher protocol developed by Steven Foster and Fred Barrie at the University of Nevada, Reno. Jughead was also a Gopher search system; it differed from Veronica in searching one server at a time.
W3Catalog
Born in 1993 and developed by Oscar Nierstrasz at the University of Geneva. W3Catalog mirrored curated page listings and reformatted them so the content could be searched through a Perl-based interface.
Aliweb
ALIWEB stands for Archie-Like Indexing for the Web and was announced in 1993. Users who registered a page could also register keywords and descriptions for it.
WebCrawler
Created by Brian Pinkerton in 1994, WebCrawler was the first full-text search engine and is the oldest search engine still in existence. It now operates as a meta search engine that lets you query multiple search services at once.
Infoseek
Infoseek was a popular internet search engine founded by Steve Kirsch in 1994. It was also one of the first search engines to sell advertising on a CPM basis.
Lycos
Lycos is a web search engine and web portal founded in 1994 whose offerings also include email, web hosting, social networking, and entertainment websites.
Altavista
AltaVista was a web search engine founded in 1995 and one of the most popular early search engines. Its two distinguishing features were a fast multi-threaded crawler (Scooter) and efficient back-end search.
Excite
Excite is a web portal launched in 1995 that offers a variety of content, including news and weather, web-based email, instant messaging, stock quotes, and customizable user homepages.
Daum
Daum is a well-known web portal in South Korea, offering many internet services such as free email, messaging, forums, shopping, news, and webtoons.
Yahoo!
A directory-based search engine created in 1994 by Stanford University's Jerry Yang and David Filo. In the late 2000s it lost ground to Google and Facebook, and the competitiveness of its core businesses, such as search and its portal site, declined.
Looksmart
A Java-based search engine founded in 1996; more than 85,000 sites were registered in its early days. By 1999 it was receiving more than 10 million visits.
Hotbot / Inktomi
HotBot was a search engine launched in 1996, powered by Inktomi's technology. It reportedly gained popularity because it could search on entire entered words or phrases, and its crawler could cover 10 million web pages a week.
Yandex
Launched in Russia in 1997, it is now the fifth-largest search engine in the world behind Google.
Ask
A search service that has existed since the dawn of the web; it ranks fourth in desktop search among the five major search companies in the United States.
Google
A search engine that Larry Page and Sergey Brin began developing in 1996 (the company was founded in 1998), built around an algorithm called PageRank. It now indexes billions of web pages around the world and holds a market share of over 60%.
MSN
MSN is a web portal provided by Microsoft, released on August 24, 1995, the same day as Windows 95. At the time, it served as the default home page for Internet Explorer.
DMOZ
DMOZ, also known as the Open Directory Project, was a non-profit web directory launched in 1998. Its distinguishing feature was that anyone could take part in editing the directory, and its data could be redistributed free of charge.
AllTheWeb
A search engine that appeared in 1999 and has since closed. It had advanced search features and was said to hold some advantages over Google.
Vivisimo
Developed in 2000 by three computer science researchers, Vivisimo was a search engine specializing in federated search and document clustering.
Exalead
A search engine founded in 2000, famous for offering voice search, truncation search, and proximity search in addition to conventional features.
Baidu
The largest search engine in China, operated by Baidu, Inc. Its market share is the third largest in the world after Google and Yahoo!, and about 70% of Chinese Internet users use it.
Info.com
Info.com is a meta search engine that returns results from search engines and directories such as Google and Yahoo!. Launched in 2004, it had an estimated 13.5 million monthly visitors in 2016.
Overture
Originally operated under the name "GoTo.com", Overture was a US-born paid search engine for search-linked advertising. Because its crawler made regular rounds to collect information, it drew considerable attention as a quick-to-respond search engine.
SearchMe
Founded in 2005 by Randy Adams and John Holland, SearchMe was a visual search engine that presented results as page snapshots. It attracted attention but closed in 2009.
Snap
A search engine developed in 2004. Unlike conventional search engines, it improved search results based on user feedback, letting users control the display order of results.
ChaCha
A search engine launched in the United States in 2006. It was a rare type of search engine in which real human guides helped users find information via chat.
Sproose
A consumer search engine launched by Bob Pack in August 2007. Users could influence the ranking of search results by voting on websites and removing bad or spam results, which was claimed to produce higher-quality results than purely algorithmic engines.
Wikiseek
A Wikipedia-specific search engine, launched in 2007, that indexed Wikipedia articles and the pages linked from English Wikipedia pages.
Picollator
An image search engine launched in 2006. It could search the Internet for websites containing images or photos similar to the one the user submitted.
Powerset
A natural-language search engine developed in 2006 that aimed to find precise answers to users' questions.
Viewzi
A visual search engine launched in 2008. It could display a wide variety of result formats, including not just images but albums and MP3 data.
Cuil
Launched in 2008, Cuil claimed to have indexed 120 billion web pages, but the service ended in 2010.

Phew.

There were a lot of search engines, weren't there? (laughs)

Various search engines have been born and disappeared. Now let's look at the four types of search engines I mentioned at the beginning.

Robot-type search engine

Unlike directory-type search engines (described below), robot-type search engines do not rely on human review for registration.
A program called a crawler automatically patrols websites around the world and registers them with the search engine.
As a result, robot-type search engines index an overwhelmingly larger number of sites than directory-type search engines.

Directory-type search engine

With a directory-type search engine, a human reviews and registers each website, so the quality of the listed websites is high.
On the other hand, websites that have not applied for registration do not appear in the search results, so the information users can find is limited.

In the past, directory-type search engines, which searched across websites registered under categories (directories), were the norm.
To register a website with a directory-type search engine, you had to apply to the search engine and pass its review.

Meta search engine

A meta search engine sends a query to multiple other search services at once and displays combined results from major engines such as Google and Yahoo! Search, as well as blog, video, and audio search. A minimal sketch of the merging idea appears below.
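
To make this concrete, here is a minimal sketch of the fan-out-and-merge logic at the heart of a meta search engine. The two "engines" are stand-in functions that return canned results rather than calls to real search APIs, and reciprocal-rank fusion is just one common way to merge ranked lists:

```python
# Toy meta search: fan one query out to several engines, merge the ranked lists.
# engine_a / engine_b are stand-ins; a real meta search engine would query the
# APIs of Google, Yahoo!, and so on.

def engine_a(query):
    return ["https://example.com/a1", "https://example.com/shared", "https://example.com/a3"]

def engine_b(query):
    return ["https://example.com/shared", "https://example.com/b2"]

def meta_search(query, engines):
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query)):
            # Reciprocal-rank fusion: a URL ranked well by several engines
            # accumulates a higher combined score.
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

print(meta_search("seo", [engine_a, engine_b]))
# The shared URL rises to the top because two engines both returned it.
```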

Decentralized search engine

Finally, a decentralized search engine is a search system that distributes the index of web content across a large number of peers via P2P communication and shares each peer's index throughout the P2P network. A toy sketch of the routing idea appears below.
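
As a toy illustration of that idea, the sketch below partitions an inverted index across peers by hashing each keyword, DHT-style. The peer count and routing rule are invented for illustration; real P2P search engines such as YaCy are far more elaborate:

```python
# Toy decentralized index: each peer holds a shard of the inverted index,
# and a keyword is routed to its peer by hashing (DHT-style routing).

import hashlib

NUM_PEERS = 4
peers = [{} for _ in range(NUM_PEERS)]  # each dict maps keyword -> set of URLs

def peer_for(keyword):
    # Hash the keyword to pick the peer responsible for it.
    digest = hashlib.sha1(keyword.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PEERS

def publish(keyword, url):
    peers[peer_for(keyword)].setdefault(keyword, set()).add(url)

def lookup(keyword):
    # Any participant can answer a query by asking the responsible peer.
    return peers[peer_for(keyword)].get(keyword, set())

publish("seo", "https://example.com/seo-basics")
publish("seo", "https://example.com/history")
print(lookup("seo"))
```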

The most prominent type of search engine

Nowadays, robot-type search engines, led by Google, are the norm, and few site owners bother to register their websites with directory-type search engines anymore.
For SEO, then, what matters is understanding the mechanisms and behavior of robot-type search engines.
So how does a robot-type search engine discover and register a website?

The flow from crawl to search results: how the algorithms behave

Crawl and index

As mentioned above, in a robot-type search engine a program called a crawler automatically patrols websites around the world and registers them with the search engine.
This patrol is called a "crawl", and the registration of a crawled website in the search engine's database is called "indexing".

Search results pages show indexed websites; the flip side is that a site that is not indexed will never appear in the search results.

This also means that a website that is never crawled cannot appear on search results pages at all, so keeping track of how crawlers reach your website is an important point.

Importance of links

Even a robot-type search engine that crawls and indexes automatically cannot crawl URLs it does not know about.
For that reason, search engines use the links posted on known websites as their way of finding new URLs.

By following links as it crawls, a search engine discovers and indexes previously unknown websites around the world one by one.
In other words, even a robot-type search engine cannot find the URL of a website that has just been launched and is not yet linked from any other site.

Therefore, a website owner (webmaster) should secure external links at the same time as launching the website.
If you simply want to encourage indexing, you can also submit your URL directly to the search engine or submit a sitemap.
Doing so helps promote crawling and indexing. A minimal sketch of the link-following loop appears below.
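
Here is that link-following loop as a minimal Python sketch. The seed URL is a placeholder, and real crawlers add robots.txt handling, rate limiting, and large-scale deduplication, none of which is shown here:

```python
# Minimal link-following discovery: fetch each known page, extract its links,
# and queue newly discovered URLs. Pages nobody links to are never found.

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, limit=10):
    queue, seen = list(seeds), set()
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable pages are simply skipped
        parser = LinkExtractor()
        parser.feed(html)
        # Every link found here is a URL the crawler did not know in advance.
        queue.extend(urljoin(url, href) for href in parser.links)
    return seen

print(crawl(["https://example.com/"]))  # placeholder seed URL
```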

About Google's algorithms: understanding how search engines work

Links and PageRank

The role of links in SEO is not limited to getting websites crawled.
In fact, search engines also use links as an indicator of a website's popularity and importance.
In other words, a website that is linked from many other websites can be regarded as a popular website.

The most famous mechanism for measuring popularity and importance through links is Google's patented PageRank.
Using PageRank, Google turns links into ranking indicators for every website and web page in the world.
We'll cover PageRank in more detail below, but you can think of links as a popularity poll that Google and other search engines hold for your website.

For example, suppose a user posts a link on a blog or social media to introduce a website's content.
Consider the psychology of that user for a moment.
Why did they go to the trouble of introducing someone else's website by posting a link?
In most cases, some positive impression of the content, such as "useful", "delicious", "interesting", or "relatable", led to the act of introducing it.

In general, when people have a positive impression of something, they want to share it and have others relate to it, and that extends to the act of recommendation.
Search engines adopted this human behavior as one of their signals for rating websites.
So the more external links a website receives, the more popularity votes it is considered to have, and the higher its PageRank.
The more links you earn and the higher your PageRank, the easier it is for your website to appear at the top of search results pages, which makes links a very important point in SEO. A minimal sketch of the PageRank idea follows.
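
Here is the promised sketch of the PageRank idea, in its textbook power-iteration form: a page's score is the probability that a "random surfer" who keeps clicking links ends up on it. Google's production ranking uses many more signals; this shows only the core link-voting mechanism:

```python
# Textbook PageRank by power iteration over a tiny link graph.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # A dangling page spreads its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share  # each link passes on a share
        rank = new_rank
    return rank

# C is linked by both A and B, so it ends up with the highest rank:
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```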

Websites as seen by search engines

I explained that search engines judge the popularity and importance of websites by external links, but of course that alone does not determine a website's whole rating.
Search engines analyze not only external factors such as links but also the content of the websites themselves, and use both as evaluation criteria.

They try to understand, as accurately as possible, what is written on each website and web page and how it is written, and to reflect that in the evaluation.
For SEO, then, it is important to understand how search engines recognize and evaluate websites.

The premise to keep in mind here is that search engines recognize websites differently from the way humans do.

When people look at a website, they see visual content laid out in a web browser such as Internet Explorer, Safari, or Google Chrome. Search engines, however, do not read the content through a web browser.
What do they read instead? They read HTML (HyperText Markup Language), the markup language used to build websites.
Search engines are designed to read, understand, and evaluate the content defined in that HTML.

Importance of HTML grammar and rules

Websites are written in languages such as HTML, which forms the foundation of the site; CSS (Cascading Style Sheets), which controls its appearance; and JavaScript, which enables dynamic behavior.
By reading HTML, CSS, and JavaScript directly, search engines can understand how a website is structured and what it says.
The important point here is that HTML, CSS, and JavaScript each have grammar and rules.

Even if two websites look identical to a human in a web browser, if one follows the grammar and rules and the other does not, the latter may end up being evaluated lower than the former.
Why? Because HTML's job is to specify what role each string of text plays, and search engines read those roles to understand the document structure of the content.

For example, the title tag marks the string that is literally the title of the page, and an h tag conveys that its string is a heading.
A solid grasp of each language's grammar and rules, and a correctly written website, lead to a site that search engines can evaluate properly. The sketch below shows the idea of reading structure straight from markup.
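
The sketch below makes the difference concrete: instead of rendering anything, it reads raw HTML and pulls out the title and heading strings, the structural roles described above. The sample HTML is made up for illustration:

```python
# Reading document structure straight from markup, the way a crawler does.

from html.parser import HTMLParser

class StructureReader(HTMLParser):
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.current = None   # tag we are currently inside
        self.found = []       # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADINGS:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.found.append((self.current, data.strip()))

html = """<html><head><title>SEO basics</title></head>
<body><h1>Search engine history</h1><p>Body text...</p></body></html>"""

reader = StructureReader()
reader.feed(html)
print(reader.found)  # [('title', 'SEO basics'), ('h1', 'Search engine history')]
```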

Search engine spam: black hat SEO and white hat SEO

This is actually the point I most want to convey this time.

As explained so far, search engines evaluate websites through various mechanisms, and webmasters soon appeared who tried to force their way to high rankings by exploiting those mechanisms.
This, as mentioned earlier, is called "black hat SEO". By contrast, SEO that follows the search engines' philosophy and works by creating high-quality content is called "white hat SEO".

"What are SEO measures? As briefly introduced in the article "Basics and Mechanisms of SEO Measures for Super Beginners", webmasters who perform black hat SEO have tried to win the top search by various methods using the mechanism of the search engine.
If you haven't read this article, you'll find the basics of SEO, so be sure to read it.

[2022 latest version] What are SEO measures? Basic knowledge for SEO beginners | Cocorograph Inc.
This article is for SEO beginners, such as corporate web managers and individual bloggers, who want to know where to start studying SEO.

For example, exploiting the mechanism that receiving many links pushes a site up the rankings, webmasters appeared who manufactured external links to their own sites themselves.
Search engines responded to sites with many unnatural links by nullifying the effect of those links and penalizing malicious cases.

Likewise, because pages containing many occurrences of the search keyword were displayed at the top of the results, SEO spam such as needlessly stuffing keywords and preparing hidden text invisible to the human eye became widespread.
Keyword frequency mattered a great deal at the time, but search engines responded by detecting and penalizing hidden-text tricks, and updated their algorithms to understand context and judge relevance from co-occurring words rather than raw keyword counts. The toy example below shows how crude a raw frequency signal is.
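
The toy example below shows how crude a raw keyword-frequency signal is, and therefore how easy it was to inflate. The density threshold is a number made up for illustration, not anything a real engine uses:

```python
# Keyword "density" is trivial to compute -- and trivial to game, which is
# exactly what keyword stuffing did.

def keyword_density(text, keyword):
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

natural = "our guide explains how search engines crawl and index pages"
stuffed = "seo seo best seo cheap seo seo tips seo seo services seo"

for sample in (natural, stuffed):
    density = keyword_density(sample, "seo")
    flag = "suspicious" if density > 0.3 else "ok"  # hypothetical threshold
    print(f"{density:.2f} {flag}: {sample[:40]}...")
```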

Black hat SEO has been progressively weeded out by search engines. Among the algorithm updates made to counter black hat SEO and to improve the search experience, here are some that had a major impact.

RankBrain

RankBrain is an AI-based algorithm in which artificial intelligence uses machine learning to determine the relevance between search queries and content. It was introduced around 2015.

As for how this artificial intelligence affects search results: for example, content that is useful for a query can be displayed in the results even when it does not contain the query's keywords, and the system infers what content a user is looking for, improving itself day by day so that users see the best possible results. A conceptual sketch of this keyword-free matching follows.
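
As a purely conceptual sketch: the snippet below matches a query to content by comparing vectors rather than keywords, which is the general flavor of machine-learning relevance systems like RankBrain. The vectors here are hand-made toys; a real system learns them from enormous amounts of data:

```python
# Matching by vector similarity instead of shared keywords.

import math

# Hypothetical 3-dimensional "meaning" vectors (learned in a real system).
vectors = {
    "how to rank higher": [0.9, 0.1, 0.2],
    "improving search positions": [0.85, 0.15, 0.25],  # no shared keywords
    "chocolate cake recipe": [0.05, 0.9, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = vectors["how to rank higher"]
for text, vec in vectors.items():
    print(f"{cosine(query, vec):.2f}  {text}")
# "improving search positions" scores high despite sharing no keywords
# with the query -- the similarity lives in the vectors.
```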

Update history

From here, I will walk through the history of the updates.

First, the famous ones, which you may well have heard of.

Penguin Update

The Penguin Update, first applied in April 2012, lowers the search rankings of web pages that spam or significantly violate the webmaster guidelines.
For example, it is an algorithm update that demotes low-quality sites that merely rework or reprocess other sites' content, links that are meaningless to search users, and link networks that exist only to pass PageRank.

Panda Update

The Panda Update, introduced in Japan in July 2012, is an algorithm update that makes it difficult for low-quality content to appear at the top of search results.
For example, it demotes sites with little originality or expertise, sites dominated by advertising, thin content, copied sites, and the like.

Both the Penguin Update and the Panda Update aim to keep low-quality sites out of sight by demoting them.
Search users would be dissatisfied with the results if poor-quality sites sat at the top of a search.
To prevent that, Google fights search engine spam and black hat SEO constantly.

Update history list

September 2000: Launch of Google search service
February 2003: Boston Update
April 2003: Cassandra Update
May 2003: Dominic Update
June 2003: Esmeralda Update
July 2003: Fritz Update
November 2003: Florida Update
January 2004: Austin Update
February 2004: Brandy Update
January 2005: Nofollow Update
June 2005: Started Personalized Search
October 2005: Introduced Google Local
December 2005: Big Daddy Update
May 2007: Introduced Universal Search
February 2009: Canonical Tag Update
December 2009: Personalized Search Update
June 2010: Caffeine Update
February 2011: Panda Update
November 2011: Freshness Update
April 2012: Penguin Update
April 2012: Penguin Update 1.0 implemented
May 2012: Penguin Update 1.1 implemented
October 2012: Penguin Update 1.2 implemented
May 2013: Penguin Update 2.0 implemented
October 2013: Penguin Update 2.1 implemented
October 2014: Penguin Update 3.0 implemented
September 2016: Penguin Update 4.0 implemented
May 2012: Introduced Knowledge Graph
August 2012: Pirate Update
September 2012: Exact Match Domain Update
June 2013: Payday Loan Update
September 2013: Hummingbird Update
August 2014: Announced to use HTTPS for ranking signals
December 2014: Venice Update
March 2015: Doorway Update
April 2015: Mobile Friendly Update
May 2016: Rich Cards introduced
September 2016: AMP officially introduced
February 2017: Announced algorithm improvement unique to Japanese
March 2017: Fred Update
April 2017: Owl Update
December 2017: Medical Health Update
March 2018: Core Algorithm Update
July 2018: Speed Update
August 2018: Core algorithm update
March 2019: Core Algorithm Update
June 2019: The June 2019 Core Update
September 2019: September 2019 Core Update
October 2019: BERT Update
November 2019: November 2019 Local Search Update
January 2020: January 2020 Core Update
May 2020: The May 2020 Core Update
May 2020: Announced Core Web Vitals
August 2020: Index system problem detected
December 2020: The December 2020 Core Update
June 2021: The June 2021 Core Update

Summary

I have covered this fairly broadly; every topic here could be dug into in much more detail, but those details will be easier to absorb once you have a little more background knowledge.

That said, I think you can now see that search engines improve every day, through many different mechanisms, in order to deliver the content search users want. Even if you game those mechanisms and a technical trick succeeds in winning a top ranking, content that search users do not actually want will always be knocked off the top in the end.
What is important in SEO is to understand firmly the needs of users who search for the keywords you want to rank for, and to provide the content they require.

It may not be easy, but I believe repeated effort will help your website grow in ways that go beyond SEO.

Kazuhiro Nakamura
Representative of Cocorograph Inc. 13 years in SEO, with more than 970 sites optimized. We provide SUO, an upward-compatible extension of SEO that optimizes not only for search engines but also for search users. Developer of Sachiko Report, an original SEO/SUO reporting tool. Author of "The latest common sense of SEO taught by professionals in the field".