
Boost Your SEO! A Guide to Optimizing robots.txt: Improving Site Performance Through Crawler Control

Published: 2025.01.08 Updated: 2026.03.12

Crawler control plays an important role in both SEO and website performance. Search-engine crawlers move through a website and collect information so they can retrieve the data needed to show pages in search results. By controlling crawler behavior appropriately, you can improve SEO results and site performance.

The central tool for this is robots.txt. This article explains robots.txt in depth, from the basics to practical use, points of caution, and advanced techniques, so that you can become genuinely proficient with it.


Chapter 1: The basics of robots.txt


What is robots.txt? How crawler control works

Robots.txt is a plain-text file placed in the root directory of a website. It tells crawlers which parts of the site they may crawl and which parts they should not crawl.

When a crawler accesses a website, it usually reads robots.txt first and then crawls the site according to those instructions. Robots.txt is a request to crawlers, not a forceful block, but major search engines do respect it. However, because malicious crawlers and some other bots may ignore robots.txt, you should never rely on it alone to protect confidential information.
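As a rough illustration of how a cooperative crawler consults these rules before fetching a page, the sketch below uses Python's standard urllib.robotparser module. The robots.txt content and the crawler name ExampleBot are assumptions made up for this example, and the rules are parsed from a string so that the code runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a made-up crawler name used only for illustration.
    print(is_allowed("ExampleBot", "https://example.com/private/report.html"))
    print(is_allowed("ExampleBot", "https://example.com/blog/"))
```

Nothing in this check is enforced by the server: a crawler that skips it can still request /private/, which is why robots.txt cannot stand in for access control.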

Where to place robots.txt, file format, and character set

Robots.txt must be placed in the root directory of the website, such as https://example.com/robots.txt.

It will not work if you place it in a subdirectory. The file name also has to be lowercase robots.txt.
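Because the file always lives at the root of the host, its location can be derived from any page URL. The following is a minimal sketch of that derivation; the function name robots_txt_url is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://example.com/blog/post?id=1"))
    print(robots_txt_url("https://shop.example.com/items/42"))
```

Note that the second call yields a different file: each host name, including each subdomain, has its own robots.txt.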

The file format must be plain text, and UTF-8 encoding is strongly recommended. If you use another encoding, crawlers may fail to interpret the file correctly.

Basic syntax: User-agent, Disallow, Allow, and rule details

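Robots.txt is written with directives such as User-agent, Disallow, and Allow, one per line. The paths you specify are case-sensitive, while the directive names themselves are treated as case-insensitive by Google and by the Robots Exclusion Protocol standard (RFC 9309).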

  • User-agent:

    Specifies which crawler the rules apply to. You can name a specific crawler or use * to target all crawlers. By writing multiple User-agent groups, you can set different rules for each crawler. Examples: User-agent: Googlebot (Google's crawler), User-agent: Bingbot (Bing's crawler), User-agent: * (all crawlers).

  • Disallow:

    Specifies a path that must not be crawled, written as a relative path beginning with a slash (/). An empty Disallow line means that everything is allowed. Examples: Disallow: /private/ (blocks crawling of everything under /private/), Disallow: (allows everything).

  • Allow:

    Specifies a path that may be crawled. Use it to permit part of an area that has been blocked with Disallow. When an Allow rule and a Disallow rule both match a URL, Google and other crawlers that follow RFC 9309 apply the more specific (longer) rule, and Allow wins a tie. Example: Disallow: /private/ together with Allow: /private/public.html (everything under /private/ is blocked, but /private/public.html is allowed).
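The interaction between Allow and Disallow can be sketched as a small function. This is a simplified model of the "most specific rule wins, Allow wins ties" behavior described in RFC 9309, assuming plain path prefixes without wildcards; is_path_allowed and the rule tuples are hypothetical names for this illustration, not a real crawler's implementation.

```python
def is_path_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide whether path may be crawled.

    rules is a list of (kind, prefix) pairs where kind is "allow" or
    "disallow". Only plain prefixes are handled here, no wildcards.
    """
    best_len = -1
    allowed = True  # a path that no rule matches is allowed
    for kind, prefix in rules:
        if not prefix:
            continue  # an empty Disallow matches nothing
        if path.startswith(prefix):
            is_allow = kind.lower() == "allow"
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and is_allow
            if longer or tie_for_allow:
                best_len = len(prefix)
                allowed = is_allow
    return allowed


RULES = [("disallow", "/private/"), ("allow", "/private/public.html")]

if __name__ == "__main__":
    for p in ("/private/public.html", "/private/secret.html", "/blog/"):
        print(p, is_path_allowed(p, RULES))
```

With these rules, /private/public.html is allowed because the Allow prefix is longer than the Disallow prefix, while every other path under /private/ stays blocked.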

How to use wildcards (*) and ($): flexible path matching and advanced usage

* matches any sequence of characters. Disallow: /*.pdf blocks any URL that contains .pdf after the leading slash, which covers every PDF file (write /*.pdf$ if you want to match only URLs that end in .pdf). Disallow: /images/*.jpg$ blocks only JPG files under the /images/ directory.

$ matches the end of the URL. Disallow: /blog/$ blocks access to the /blog/ directory itself while still allowing addresses such as /blog/article1/.
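One way to reason about these two wildcards is to translate a robots.txt pattern into a regular expression. The sketch below does that under the assumption that * means "any sequence of characters" and a trailing $ anchors the end of the URL; pattern_to_regex and matches are hypothetical helper names, and real crawlers may differ in details such as percent-encoding.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if the robots.txt pattern matches the URL path."""
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(matches("/*.pdf", "/docs/guide.pdf"))
    print(matches("/*.pdf$", "/docs/guide.pdf?download=1"))
    print(matches("/blog/$", "/blog/article1/"))
```

Without the trailing $, a pattern behaves as a prefix match, so /*.pdf also matches /docs/guide.pdf?download=1, whereas /*.pdf$ does not.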

Setting Crawl-delay: reducing server load and its effect on Googlebot

With the Crawl-delay directive, you can specify the interval between crawler requests in seconds. This can help when server load is high, but Googlebot does not officially support Crawl-delay. Google previously recommended crawl-rate settings in Search Console, but now handles this automatically, so it usually does not require much attention.

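Google's announcement explains that, because its automatic handling of crawl rate has improved, and in keeping with the principle of making things simpler for users, it is ending support for the crawl rate limiter tool in Search Console.

Source: Google's announcement on the planned deprecation of the crawl rate limiter tool in Search Console

Crawl-delay may still have an effect on other crawlers.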
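For crawlers that do honor Crawl-delay, reading the value might look like the following sketch, which uses the crawl_delay method of Python's standard urllib.robotparser. The robots.txt content, the fallback of one second, and the helper name delay_for are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Crawl-delay: 10

User-agent: *
Allow: /
"""


def delay_for(user_agent: str, robots_txt: str = ROBOTS_TXT,
              default: float = 1.0) -> float:
    """Return the Crawl-delay in seconds for user_agent, or a default."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no delay is set
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    print(delay_for("AhrefsBot"))
    print(delay_for("ExampleBot"))  # made-up crawler name
```

A polite crawler would sleep for the returned number of seconds between requests; whether a given bot actually does so is up to that bot.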

Specifying Sitemap: guiding crawlers and handling multiple sitemaps

You can use the Sitemap directive to specify the URL of your sitemap, for example Sitemap: https://example.com/sitemap.xml, which points crawlers to it. You can also specify multiple sitemaps by writing one Sitemap line for each, such as Sitemap: https://example.com/sitemap.xml and Sitemap: https://example.com/sitemap_images.xml.
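To confirm that every Sitemap line is picked up, you could read them back with the site_maps method of Python's standard urllib.robotparser (available from Python 3.8). This is a small sketch; the robots.txt content is inlined as an assumption and list_sitemaps is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap_images.xml
"""


def list_sitemaps(robots_txt: str = ROBOTS_TXT) -> list[str]:
    """Return all sitemap URLs declared in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    for url in list_sitemaps():
        print(url)
```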


Chapter 2: Practical robots.txt examples


Protecting login-required pages: Disallow: /member/

Content that requires login, such as members-only pages, should generally be excluded from search-engine indexing.

Using robots.txt keeps crawlers away from these pages and reduces wasted crawling. For example, if members-only content is placed under the /member/ directory, writing Disallow: /member/ blocks access to every file and subdirectory under that location.

However, robots.txt is only a request to crawlers, so malicious crawlers may ignore it.

Truly sensitive information must be protected with server-side authentication rather than robots.txt. Robots.txt should be treated as a supporting method for limiting crawler access and saving server resources. In many cases, it is appropriate to allow access to the login page itself so that crawlers can understand that authentication is required.

Controlling parameterized URLs: Disallow: /*?page=*

Parameterized URLs can sometimes make the same content accessible under multiple URLs, which may be treated as duplicate content. For example, if you use a ?page= parameter for pagination, you may end up with pages like example.com/blog?page=1 and example.com/blog?page=2 that have different URLs but almost the same content.

By writing Disallow: /*?page=*, you can block access to every URL that includes the page= parameter. However, this can remove all paginated content from search engines and may hurt SEO.

A better approach is to use a canonical tag and indicate the canonical URL. If every paginated page points to the first page, such as example.com/blog, with a canonical tag, you can avoid duplicate-content issues and communicate the correct page to search engines.

Using robots.txt to control pagination should be treated as a last resort when implementing canonical tags is not possible.
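As a sketch of the canonical-tag approach described above, the following Python helper strips the pagination parameter from a URL and renders the corresponding link element. It assumes the parameter is named page, as in the example URLs, and the function names are hypothetical; how the tag is actually emitted depends on your CMS or template system.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str, drop_params: tuple[str, ...] = ("page",)) -> str:
    """Remove pagination parameters so every page maps to one URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> element for the page at url."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://example.com/blog?page=2"))
```

Other query parameters are preserved, so https://example.com/blog?page=2&tag=seo maps to https://example.com/blog?tag=seo rather than losing the filter.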

Controlling a specific crawler: User-agent: YandexBot Disallow: /

The User-agent directive lets you set different rules for each crawler. If you write User-agent: YandexBot and then Disallow: /, as in the heading above, only YandexBot will be blocked from the entire site. Other crawlers will follow rules set under other User-agent sections, or the rules under User-agent: *.

Typical cases where you may want to control a specific crawler include the following.

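  • A specific crawler is placing excessive load on the server

  • A specific crawler is not following the instructions in robots.txt and is causing problems

  • You want to hide content aimed at a particular region from the crawlers of search engines that are not used in that region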

In these and similar cases, the User-agent directive is useful. The names of major search-engine crawlers can be confirmed in each search engine’s official documentation.
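The per-crawler behavior can be checked with a short script. This sketch again uses Python's standard urllib.robotparser with an inlined, assumed robots.txt; can_crawl is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block YandexBot entirely, allow everyone else.
ROBOTS_TXT = """\
User-agent: YandexBot
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/"
    for bot in ("YandexBot", "Googlebot", "Bingbot"):
        print(bot, can_crawl(bot, page))
```

Only the group whose User-agent matches the crawler applies to it; crawlers without a dedicated group fall back to the User-agent: * group.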

Chapter 3: Cautions and common mistakes in robots.txt


Robots.txt is a powerful tool, but incorrect settings can have serious consequences for a website. This chapter explains common mistakes and points of caution so that you can use robots.txt safely and effectively.

3.1 SEO damage from robots.txt mistakes: falling out of search

The most serious mistake in robots.txt is accidentally blocking important pages from crawling.

If you disallow product pages or service pages, for example, those pages may fall out of the search index and disappear from search results. That directly reduces website traffic and can severely harm SEO.

Whenever you change robots.txt, always use the robots.txt testing tool in Google Search Console to confirm that only the intended pages are blocked. After the change, continue monitoring rankings and traffic regularly so you can catch any unintended effects.
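In addition to testing in Search Console, a simple pre-deployment check can catch accidental blocks. The sketch below compares a draft robots.txt against a list of URLs that must stay crawlable. The draft rules, the URL list, and the helper name blocked_urls are assumptions for illustration. Python's standard parser applies plain prefix rules in file order and does not interpret the * and $ wildcards, so its verdicts can differ from Google's for more complex files.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, must_crawl: list[str],
                 user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs from must_crawl that the draft rules would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in must_crawl if not parser.can_fetch(user_agent, url)]


# Hypothetical draft containing an overly broad rule for /products.
DRAFT = """\
User-agent: *
Disallow: /products
Disallow: /private/
"""

IMPORTANT = [
    "https://example.com/",
    "https://example.com/products/widget",
    "https://example.com/services/",
]

if __name__ == "__main__":
    for url in blocked_urls(DRAFT, IMPORTANT):
        print("Would be blocked:", url)
```

Running a check like this before each upload turns the "important pages stay crawlable" requirement into something that fails loudly instead of being noticed weeks later in traffic reports.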

3.2 The mistake of using Allow for pages you meant to block

The Allow directive should be used only when you want to permit part of a location that has been blocked with Disallow. For example, if you want to block the entire /private/ directory but allow only /private/public.html, you would write both Disallow: /private/ and Allow: /private/public.html.

Using Allow alone for an area that has not been disallowed has no effect. Crawlers generally assume every page is accessible unless it has been explicitly blocked with Disallow.

3.3 Case sensitivity: pay close attention

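URL paths are case-sensitive, so Disallow: /images/ does not match a URL under /Images/. Directive names such as User-agent, Disallow, and Allow are handled differently: RFC 9309 and Google treat them as case-insensitive, so disallow: /images/ is read the same way as Disallow: /images/, although not every third-party crawler is guaranteed to be equally lenient.

When writing robots.txt, keep the conventional capitalization for directives, match the exact case of your URL paths, and check carefully for typographical errors.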

3.4 Differences in crawler behavior: dealing with malicious crawlers

Robots.txt works with good-faith crawlers such as Googlebot and Bingbot, but malicious crawlers may ignore it completely. That means robots.txt alone cannot protect sensitive information.

Information that is truly confidential must be protected with server-side authentication or access restrictions. You need to understand that robots.txt is only a tool for controlling cooperative crawlers and is not sufficient as a security measure.

3.5 Robots.txt alone cannot provide security

As noted above, robots.txt is insufficient as a security measure. Anyone can read the contents of a robots.txt file, so malicious users may use it as a clue for finding restricted areas.

Real security requires a layered approach that combines multiple methods, including password protection, access control lists, and firewalls, not robots.txt alone.

3.6 Unexpected behavior from overusing wildcards

Wildcards (* and $) are a convenient way to specify paths flexibly, but overusing them can block pages you never intended to block. For example, Disallow: /*image* would block not only the /images/ directory but also a URL such as /article/my-image.jpg.

When using wildcards, check the full scope of their effect carefully and make sure you are not blocking pages unintentionally.
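One way to check the scope of a wildcard rule is to run it against a list of real paths from your site before deploying it. The sketch below assumes that * matches any sequence of characters and that a trailing $ anchors the end of the URL; the sample paths and the helper names are hypothetical.

```python
import re


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt pattern with * and $ matches path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = "^" + body + ("$" if anchored else "")
    return re.match(regex, path) is not None


def affected_paths(pattern: str, paths: list[str]) -> list[str]:
    """List every path that the given Disallow pattern would block."""
    return [p for p in paths if robots_pattern_matches(pattern, p)]


# Hypothetical sample of site paths, for example taken from a sitemap.
SITE_PATHS = ["/images/logo.png", "/article/my-image.jpg", "/blog/post1/"]

if __name__ == "__main__":
    print(affected_paths("/*image*", SITE_PATHS))
    print(affected_paths("/images/", SITE_PATHS))
```

Comparing the two results shows the difference in scope: the wildcard rule also catches /article/my-image.jpg, while the plain /images/ prefix blocks only the directory.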

3.7 robots.txt caching: delays before changes are reflected

Search engines cache robots.txt, so changes are not always reflected immediately. Even if you check with a testing tool right after editing it, the result may still be based on the previous version.

In Google Search Console, you can request that robots.txt be fetched again through the robots.txt tester. This can shorten the delay before the cache updates and your changes are reflected.

By following these cautions and configuring robots.txt properly, you can improve SEO and avoid unnecessary risk.

Chapter 4: robots.txt creation tools and verification methods


This chapter explains how to create, test, and revise robots.txt efficiently. By following these steps, you can prevent unintended mistakes and maximize website performance.

4.1 Using robots.txt creation tools

You can write robots.txt manually, but online tools let you do it faster and with fewer mistakes. These tools generate a robots.txt file automatically once you input the necessary directives, which helps reduce syntax errors and rule mistakes.

Representative tools include the following.

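  • Google Search Console's robots.txt tester:

    A tool built into Search Console that lets you create, edit, and test robots.txt. If you already use Search Console, it is probably the most convenient option.

  • SEO checker tools:

    Some SEO tools include a robots.txt generator. Because they can be used alongside other SEO features, they are convenient when you are optimizing a website as a whole.

  • Other online robots.txt generators:

    Searching the web for "robots.txt generator" turns up a variety of free tools. These are well suited to creating a simple robots.txt.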

Which tool is best depends on your needs and the size of the website.

4.2 Testing robots.txt in Google Search Console

Once you create robots.txt, you must test it to verify that crawlers interpret it correctly. Google Search Console provides a robots.txt testing tool that can show whether a specific URL is crawlable and whether there are mistakes in the file.

The testing process is as follows.

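  1. Open Google Search Console and select the property for the target website.

  2. Select "robots.txt Tester" from the left-hand menu.

  3. Enter the URL you want to test and click the "Test" button.

  4. On the results screen, check whether the URL can be crawled and which directive applies to it.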

Whenever you change robots.txt, use this tool and confirm that the file works exactly as intended.

4.3 Reviewing and fixing robots.txt

Because robots.txt is placed in the root directory of a website, you can open it directly in a browser, review its contents, and revise it if necessary. For example, accessing https://example.com/robots.txt will display the file.

When making corrections, open robots.txt in a text editor, make the necessary changes, and upload it to the server. Because search engines need to refresh their cache, it may take a little time before the changes are reflected.

The robots.txt tester in Google Search Console lets you edit and test at the same time, making it easier to iterate on corrections and verification.

By following these steps, you can keep robots.txt in an optimal state and improve both SEO and site performance.

Chapter 5: Crawler control beyond robots.txt

Differences from the meta robots tag and how to use each

The meta robots tag controls indexing and link following on an individual page basis. When used together with robots.txt, it enables finer control. Noindex instructs search engines not to index a page, and nofollow instructs them not to follow links. Keep in mind that a crawler has to fetch a page to see its meta robots tag: if the page is also blocked from crawling in robots.txt, the noindex is never read. To remove an already indexed page from search results, leave it crawlable and add noindex.

Using it together with noindex and nofollow

You can specify multiple directives separated by commas, such as noindex,follow.
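To audit which directives a page actually declares, you could extract the meta robots tag with Python's standard html.parser, as in this sketch. The class and function names are hypothetical, and the sample markup is an assumption for illustration.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated; whitespace and case are ignored.
        for item in content.split(","):
            if item.strip():
                self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set[str]:
    """Return the set of meta robots directives declared in html."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(sorted(robots_directives(page)))
```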

Control through the X-Robots-Tag HTTP header

By using X-Robots-Tag in the HTTP response header, you can apply directives such as noindex to non-HTML files such as PDFs and images as well. This requires server-side configuration.
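In practice this header is often set in the web server configuration. As an application-level illustration, the sketch below is a small WSGI middleware in Python that adds X-Robots-Tag to responses for PDF paths. The middleware name, the demo application, the suffix list, and the header value are all assumptions for this example.

```python
def add_x_robots_tag(app, suffixes=(".pdf",), value="noindex, nofollow"):
    """Wrap a WSGI app so matching paths get an X-Robots-Tag header."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            # Append the header only for paths ending in a listed suffix.
            if path.lower().endswith(tuple(suffixes)):
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def demo_app(environ, start_response):
    """Hypothetical stand-in for a real application serving files."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"demo"]


if __name__ == "__main__":
    seen = {}

    def record(status, headers, exc_info=None):
        seen["headers"] = headers

    add_x_robots_tag(demo_app)({"PATH_INFO": "/files/report.pdf"}, record)
    print(seen["headers"])
```

As with the meta robots tag, crawlers only see this header if they are allowed to request the file, so the path should not also be blocked in robots.txt.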

Summary

Robots.txt is an indispensable tool for both SEO and website performance.

When you understand the points covered in this article and configure robots.txt properly, you can draw out the full potential of your website. It is important to stay current and keep optimizing robots.txt over time.

Appendix: robots.txt examples, including advanced ones

  • Allow only specific file types for a specific crawler:

    User-agent: Googlebot-Image
    Allow: /images/*.jpg
    Allow: /images/*.png
    Disallow: /

    User-agent: *
    Disallow: /images/

  • Delay access by a specific crawler:

    User-agent: AhrefsBot
    Crawl-delay: 10

    User-agent: *
    Allow: /

Use these advanced patterns to optimize your website and move it toward success.