How the Yandex bot sees the page. How to upgrade to the new version of Search Console

Migration guide for users of the old version

We are developing a new version of Search Console that will eventually replace the old service. In this guide, we will cover the main differences between the old and new versions.

General changes

In the new version of Search Console, we have implemented the following improvements:

  • You can view search traffic data for 16 months instead of the previous three.
  • Search Console now provides detailed information about specific pages. This information includes canonical URLs, indexing status, degree of mobile optimization, and more.
  • The new version includes tools that allow you to track the crawling of your web pages, fix related errors, and submit requests for re-indexing.
  • The updated service offers both completely new tools and reports, as well as improved old ones. All of them are described below.
  • The service can be used on mobile devices.

Comparison of tools and reports

We are constantly working on modernizing various Search Console tools and reports, and you can already use many of them in the updated version of this service. Below, the new report and tool options are compared with the old ones. The list will be updated.

Old report — analogue in the new version of Search Console — comparison

  • Search Analytics. The new report provides data for 16 months instead of three, and it has become more convenient to work with.
  • Rich Cards → rich result status reports. The new reports provide detailed information that helps you troubleshoot errors, and they make it easy to submit recrawl requests.
  • Links to your site + Internal links → Links. We have merged the two old reports into one new report and improved the accuracy of link counting.
  • Index Status → Index Coverage report. The new report contains all the data from the old one, plus detailed information about each URL's status in the Google index.
  • Sitemaps report → Sitemaps report. The data in the report is the same, but we have improved its design. The old report supported testing a sitemap without submitting it; the new one does not.
  • Accelerated Mobile Pages (AMP) → AMP status report. The new report adds new error types for which you can view details, and it lets you submit a recrawl request after fixing them.
  • Manual Actions → Manual Actions. The new version of the report shows the history of manual actions, including submitted review requests and review results.
  • Fetch as Google → URL Inspection tool. The URL Inspection tool shows information about the indexed version of a URL and the version available online, and lets you submit a crawl request. It adds information about canonical URLs, noindex and nocrawl blocks, and whether the URL is in the Google index.
  • Mobile Usability → Mobile Usability. The data in the report is the same, but it has become more convenient to work with. We have also added the ability to request a recrawl of a page after its mobile usability issues are fixed.
  • Crawl Errors report → Index Coverage report and URL Inspection tool. See the detailed comparison below.

Site-level crawl errors are shown in the new Index Coverage report. To find errors at the page level, use the new URL Inspection tool. The new reports help you prioritize issues and group pages with similar problems so you can identify common causes.

The old report showed all errors from the last three months, including irrelevant, temporary, and minor ones. The new report highlights the issues from the past month that matter to Google: you will only see problems that could cause a page to be removed from the index or prevent it from being indexed.

Issues are shown according to their priority. For example, a 404 is flagged as an error only if you requested indexing of that page through a sitemap or by other means.

With these changes, you can focus on the issues that affect your site's standing in the Google index, rather than working through a list of every error Googlebot has ever found on your site.

In the new Index Coverage report, the old error types have been converted as follows or are no longer shown:

URL errors (desktop)

Old error type — analogue in the new version

  • Server error. In the Index Coverage report, all server errors are flagged as Server error (5xx).
  • Soft 404. The Index Coverage report shows one of the following, depending on whether you submitted the URL for indexing:
      • Error: The submitted URL returns a soft 404.
      • Excluded: Soft 404.
  • Access denied. The Index Coverage report shows one of the following, depending on whether you submitted the URL for indexing:
      • Error: The submitted URL returns a 401 (Unauthorized) error.
      • Excluded: The page was not indexed due to a 401 (Unauthorized) error.
  • Not found. The Index Coverage report shows one of the following, depending on whether you submitted the URL for indexing:
      • Error: The submitted URL was not found (404).
      • Excluded: Not found (404).
  • Other. The Index Coverage report lists this as a crawl error.

URL errors (smartphone)

Smartphone errors are not currently shown, but we hope to include them in the future.

Site errors

The new version of Search Console does not show site-level errors.

  • Security Issues report → Security Issues report. The new report retains much of the old report's functionality and adds a history of security issues on the site.
  • Structured Data → Rich Results Test and rich result status reports. To check individual URLs, use the Rich Results Test or the URL Inspection tool. Site-wide information is available in the rich result status reports for your site. Not all rich result data types are covered yet, but the number of reports is constantly growing.
  • HTML Improvements. There is no similar report in the new version. To create informative page titles and descriptions, follow our guidelines.
  • Blocked Resources → URL Inspection tool. There is no way to view blocked resources for the whole site, but the URL Inspection tool shows the blocked resources for each individual page.
  • Android apps. As of March 2019, Search Console will no longer support Android apps.
  • Property sets. As of March 2019, Search Console will no longer support property sets.

You do not need to enter the same information twice: data and requests in one version of Search Console automatically appear in the other. For example, if you submitted a revalidation request or a sitemap in the old Search Console, you do not need to submit it again in the new one.

New ways to do familiar tasks

In the new version of Search Console, some of the previous operations are performed differently. The main changes are listed below.

Features not currently supported

The features listed below are not yet implemented in the new version of Search Console. To use them, return to the previous interface.

  • Crawl stats (pages crawled per day, their download time, kilobytes downloaded per day).
  • robots.txt Tester.
  • Managing URL parameters in Google Search.
  • Data Highlighter.
  • Reading and managing messages.
  • The Change of Address tool.
  • Setting the preferred domain.
  • Associating a Search Console property with a Google Analytics property.
  • Disavowing links.
  • Removing outdated content from the index.


Good day, readers. I get a lot of questions from webmasters, site owners, and bloggers about the errors and messages that appear in Yandex.Webmaster. Many of these messages look scary.

But not all of these messages are critical for a site, and in the coming articles I will try to cover the questions webmasters may have as fully as possible. This article covers the following sections:

  1. Diagnostics - Site Diagnostics
  2. Indexing - Pages in Search

I wrote about what Yandex.Webmaster is and why it is needed a few years ago. If you are not familiar with this tool, please read that article first.

Site Diagnostics

Possible problems

1. Host directive not set in robots.txt file

This notice from Yandex is notable because the Host directive is not a standardized directive: only the Yandex search engine supports it. It is needed when Yandex determines the site's main mirror incorrectly.

As a rule, Yandex determines a site's mirror automatically based on the URLs generated by the CMS itself and on the external links that lead to the site. You do not have to specify the main mirror in the robots.txt file. The main method is a 301 redirect, which is either configured automatically by the CMS or added manually to the .htaccess file.

Note that you only need to add the directive to the robots.txt file when Yandex determines the site's main mirror incorrectly and you cannot influence it in any other way.

The CMSs I have worked with lately (WordPress, Joomla, MODX) redirect the www address to the non-www version by default if the site address is specified without the prefix in the system settings. I'm sure all modern CMSs can do this. Even my favorite Blogger correctly redirects the address of a blog hosted on its own domain.
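If you want to check which mirror your site actually redirects to, you can request both address variants and look at the response codes. This is a small illustrative sketch, not part of the original article; example.com is a placeholder domain.

```python
# Sketch: check whether the www and non-www variants 301-redirect to one main mirror.
# "example.com" is a placeholder domain; substitute your own.
import requests

for url in ("http://example.com/", "http://www.example.com/"):
    # allow_redirects=False shows the redirect itself instead of following it
    response = requests.get(url, allow_redirects=False, timeout=10)
    target = response.headers.get("Location", "(no redirect)")
    print(url, "->", response.status_code, target)
```

A 301 response pointing to a single address variant means the main mirror is already configured.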

2. Missing meta tags

The problem is not critical and you don't need to be afraid of it, but if possible it is better to fix it than to ignore it. If your CMS does not create meta tags by default, look for a plugin, add-on, or extension (whatever it is called in your CMS) that lets you set a page description manually, or that generates the description automatically from the first words of the article.

3. No sitemap files used by the robot

Of course, it is better to fix this error. But note that the problem can occur both when the sitemap.xml file exists and when it really is missing. If you have the file but Yandex does not see it, just go to the Indexing → Sitemap Files section and add the file to Yandex.Webmaster manually. If you don't have such a file at all, look for a solution appropriate to your CMS.

The sitemap.xml file is located at http://your-domain.ru/sitemap.xml
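A quick way to verify that the file is actually reachable, and to see how many URLs it lists, is to fetch it directly. This is a rough sketch with a placeholder domain, not part of the original article:

```python
# Sketch: confirm the sitemap exists and count the URLs it contains.
# "example.com" is a placeholder domain; substitute your own.
import requests
import xml.etree.ElementTree as ET

response = requests.get("http://example.com/sitemap.xml", timeout=10)
response.raise_for_status()  # a 404 here means the file really is missing

root = ET.fromstring(response.content)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}  # standard sitemap namespace
urls = root.findall(".//sm:loc", ns)
print(len(urls), "URLs listed in the sitemap")
```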

4. Robots.txt file not found

Nevertheless, this file should exist, and if you are able to add it, it is better to do so. Also pay attention to the point about the Host directive above.

The robots.txt file is located at http://your-domain.ru/robots.txt
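You can also confirm from a short script that the file is reachable and that key pages are not accidentally blocked. A minimal sketch, again with a placeholder domain:

```python
# Sketch: fetch robots.txt and ask whether the Yandex robot may crawl a page.
# "example.com" and the page path are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://example.com/robots.txt")
rp.read()  # downloads and parses the file

print(rp.can_fetch("Yandex", "http://example.com/some-page/"))
```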

With that, the fountain of errors on the Site Diagnostics tab has run dry for me.

Indexing

Pages in search

Let's start from this point. This will make it easier to structure the information.

Select the "All pages" filter.
Scroll down to "Download spreadsheet" at the bottom right of the page, choose XLS, and open the file in Excel.


We get a list of pages that are in the search results, i.e. Yandex knows about them, ranks them, and shows them to users.
Look at how many rows there are in the table. I got 289 pages.
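If you prefer, the same count can be done in code instead of Excel. A sketch, assuming the downloaded spreadsheet was saved as pages_in_search.xlsx (the file name is an assumption):

```python
# Sketch: open the spreadsheet exported from Yandex.Webmaster and count the rows.
# The file name is an assumption; use whatever name the downloaded file has.
import pandas as pd

df = pd.read_excel("pages_in_search.xlsx")
print("Pages in search:", len(df))
print(df.head())  # peek at the first few rows and column names
```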

How do you know how many there should be? Each site is unique, and only you know how many pages you have published. I'll use my WordPress blog as an example.
The blog at the time of writing has:

  • Posts - 228
  • Pages - 17
  • Categories - 4
  • Tags - 41
  • + the main page of the site

In total, we have 290 pages that should be in the index. Compared to the data in the table, the difference is only 1 page. You can safely consider this a very good indicator. But it's too early to rejoice. It happens that everything coincides mathematically, but when you start to analyze, inconsistencies appear.

There are two ways to find that one page that is not in the search. Let's consider both.

Method one. In the same table that I downloaded, I split the search into several stages. First, I selected the category pages; I have only 4 categories. To speed up the work, use text filters in Excel.

Then I did the same for tags and excluded the static pages from the search, so that only articles remained in the table. And here, no matter how many articles there are, you will have to look through each one to find the one that is not in the index.

Note that each CMS has its own structure, and each webmaster has their own SEO settings, canonical tags, and robots.txt file.

Again, using WordPress as an example, pay attention to which sections of your site are open for indexing and which are closed. There may be monthly and yearly archive pages, author pages, and pagination pages. I have all of these sections closed via the robots meta tag settings. It may be different for you, so count everything that is not blocked from indexing.

Taking Blogger as an example, blog owners only need to count published posts, pages, and the home page. All other archive and tag pages are closed to indexing by the settings.

Method two. We return to Webmaster, select "Excluded pages" in the filter.

Now we have a list of pages that are excluded from the search. The list can be large, much larger than with the pages included in the search. There is no need to be afraid that something is wrong with the site.

When writing this article I tried to work directly in the Webmaster interface but did not get the functionality I wanted; perhaps this is temporary. So, as before, I will work with the spreadsheet data; you can download the table at the bottom of the page.

Again, using my WordPress blog as an example, I will look at typical reasons for an exception.

In the resulting table we are primarily interested in column D, "httpCode". If you do not know what HTTP server response codes are, read about them on Wikipedia first. It will make it easier to understand what follows.

Let's start with code 200. If you can reach a page on the Internet without authorization, that page returns status 200. Such pages can still be excluded from the search for the following reasons:

  1. Blocked by the robots meta tag
  2. Blocked from indexing in the robots.txt file
  3. Non-canonical: a canonical meta tag points to a different URL

You, as the site owner, need to know which pages have which settings. Therefore, sorting out the list of excluded pages should not be difficult.

Set up filters and select 200 in column D.

Now we are interested in column E, "status"; sort by it.
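The same filtering can be done with pandas instead of Excel filters. A sketch, assuming the export keeps the "httpCode" and "status" columns described above (the file name is an assumption):

```python
# Sketch: filter the excluded pages that the server returns with code 200
# and count how often each exclusion status occurs.
import pandas as pd

df = pd.read_excel("excluded_pages.xlsx")      # file name is an assumption

ok_pages = df[df["httpCode"] == 200]           # column D: pages the server serves normally
print(ok_pages["status"].value_counts())       # column E: why Yandex excluded each of them
```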

The BAD_QUALITY status means "low quality". It is the most annoying status of all. Let's break it down.

In my table there were only 8 URLs with the low-quality status. I numbered them in the column on the right.

URLs 1, 5, and 7 are feed pages; 2, 3, 4, 5, and 8 are service pages in the site's wp-json directory. None of these pages are HTML documents, and in principle they should not be on this list.

So go through your list of pages carefully and highlight only the HTML pages.

META_NO_INDEX status. Pagination pages and the author's page are excluded from the index because of the robots meta tag settings.

But there is one page in this list that should not be there. I highlighted its URL in blue.

NOT_CANONICAL status. The name speaks for itself: a non-canonical page. On any page of the site you can add a canonical link tag that specifies the canonical URL.
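To see why a particular page ends up with the META_NO_INDEX or NOT_CANONICAL status, you can fetch it and look at these two tags directly. A minimal sketch; the URL is a placeholder and BeautifulSoup (bs4) is assumed to be installed:

```python
# Sketch: print the robots meta tag and the canonical link of a page.
# The URL is a placeholder; substitute the page you are checking.
import requests
from bs4 import BeautifulSoup

url = "http://example.com/some-page/"
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

robots_meta = soup.find("meta", attrs={"name": "robots"})
canonical = next(
    (link for link in soup.find_all("link") if "canonical" in (link.get("rel") or [])),
    None,
)

print("robots meta:", robots_meta.get("content") if robots_meta else "not set")
print("canonical:  ", canonical.get("href") if canonical else "not set")
```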


Your website promotion should include page optimization to get the attention of search spiders. Before you start creating a search engine friendly website, you need to know how bots see your site.

Search engine "spiders" are not really spiders but small programs that are sent to analyze your site once they learn the URL of one of your pages. Search engines can also reach your site through links to it left on other Internet resources.

As soon as the robot gets to your website, it will immediately start indexing pages by reading the contents of the BODY tag. It also fully reads all HTML tags and links to other sites.

Then, search engines copy the content of the site to the main database for subsequent indexing. This whole process can take up to three months.

Search engine optimization is not such an easy thing. You must create a spider-friendly website. Bots pay no attention to flashy web design; they only want information. If you looked at a website through the eyes of a search robot, it would look rather primitive.

It is even more interesting to look through the eyes of a spider at the sites of competitors. Competitors not only in your field, but simply popular resources that may not need any search engine optimization. In general, it is very interesting to see how different sites look through the eyes of robots.

Text only

Search robots see your site much the way text browsers do. They love text and ignore the information contained in pictures. A spider can learn about a picture only if you remember to add an ALT attribute with a description. This is deeply frustrating for web designers who create complex sites with beautiful pictures and very little text content.

In fact, search engines love any text at all, but they can only read HTML code. If a page is full of forms, JavaScript, or anything else that prevents a search engine from reading the HTML code, the spider will simply ignore it.
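You can approximate this "text only" view yourself by stripping out the markup, scripts, and styles and keeping just the text. A rough sketch with a placeholder URL; requests and BeautifulSoup are assumed to be installed:

```python
# Sketch: show a page roughly the way a text-only crawler would read it.
# The URL is a placeholder; substitute the page you want to inspect.
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("http://example.com/", timeout=10).text, "html.parser")

for tag in soup(["script", "style"]):   # spiders index text, not scripts or styles
    tag.decompose()

print(soup.get_text(separator="\n", strip=True))

# Images contribute only their ALT text, and only if the author added it.
for img in soup.find_all("img"):
    print("image:", img.get("alt") or "(no ALT text)")
```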

What search bots want to see

When a search engine crawls your page, it looks for a number of important things. After archiving your site, the search robot will begin to rank it in accordance with its algorithm.

Search engines guard their algorithms closely and change them often so that spammers cannot adapt to them. It is very difficult to design a website that ranks highly in every search engine, but you can gain an advantage by including the following elements in all your web pages:

  • Keywords
  • META tags
  • Titles
  • Links
  • Emphasized (highlighted) text

Read like a search engine

After you have built a site, you still have to develop it and promote it in search engines. But looking at the site only in a browser is not the best approach: it is hard to evaluate your own work with an open mind.

It's much better to look at your creation through the eyes of a search simulator. In this case, you will get much more information about the pages and how the spider sees them.

We have created what is, in our humble opinion, a decent search engine simulator. With it you can see a web page the way a search spider sees it. It also shows the number of keywords you entered, local and outbound links, and so on.
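The idea behind such a simulator is simple enough to sketch: pull the page text, count a keyword, and split the links into local and outbound. This toy version is only an illustration (not the simulator mentioned above); the URL and keyword are placeholders:

```python
# Sketch: a toy "spider view" that counts a keyword and classifies links.
# The page URL and keyword are placeholders.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

page_url = "http://example.com/"
keyword = "seo"

soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")
text = soup.get_text(" ", strip=True).lower()
print(f"'{keyword}' occurs {text.count(keyword)} times")

host = urlparse(page_url).netloc
local, outbound = [], []
for a in soup.find_all("a", href=True):
    target = urljoin(page_url, a["href"])          # resolve relative links
    (local if urlparse(target).netloc == host else outbound).append(target)

print("local links:", len(local), "| outbound links:", len(outbound))
```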

Webmaster Tools lets you understand how your page looks to Googlebot. Server headers and HTML code help identify errors and the consequences of a hack, but they can sometimes be difficult to interpret, and webmasters are usually on edge when they have to deal with such issues. To help in situations like this, we have improved this feature so that it can render the page using the same algorithm Googlebot uses.

How the crawled page is displayed
When processing a page, Googlebot finds and imports all the related files from external sources. These are typically images, style sheets, JavaScript elements, and other files embedded with CSS or JavaScript. The system uses them to display the page the way Googlebot sees it.
The feature is available in the "Crawl" section of your Webmaster Tools account. Please note that processing and rendering a page can take quite a long time. Once it completes, hover your mouse over the row containing the desired URL to view the result.



Handling resources blocked in the robots.txt file
When processing the code, Googlebot respects the instructions specified in the robots.txt file. If they prohibit access to certain elements, the system will not use such materials for preview. This will also happen if the server does not respond or returns an error. Relevant data can be found in the Crawl Errors section of your Webmaster Tools account. In addition, a complete list of such failures will be displayed after the preview image of the page has been generated.
We recommend making sure that Googlebot has access to all of the embedded resources used on your site or in your layout. This makes the fetch-and-render view easier to use, allows the bot to detect and properly index your site's content, and helps you understand how your pages are being crawled. Some code snippets, such as social media buttons, analytics scripts, and fonts, usually do not affect how a page looks, so they do not need to be crawled. Read more about how Google analyzes web content in the previous article.
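To check this by hand, you can run the URLs of your embedded resources through the robots.txt rules and see what Googlebot is allowed to fetch. A small sketch; the domain and resource paths are placeholders:

```python
# Sketch: ask robots.txt whether Googlebot may fetch the page's embedded resources.
# The domain and resource URLs are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://example.com/robots.txt")
rp.read()

resources = [
    "http://example.com/assets/style.css",
    "http://example.com/assets/app.js",
    "http://example.com/images/logo.png",
]
for resource in resources:
    verdict = "allowed" if rp.can_fetch("Googlebot", resource) else "BLOCKED"
    print(verdict, resource)
```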
We hope that this new feature will help you solve problems with your site's layout and discover resources that Google cannot crawl for one reason or another. If you have questions, please ask in the Webmaster Community on Google Plus or on the help forum.