Robots.txt Generator

The role of robots.txt

A robots.txt file lives at the root of your domain and tells well-behaved crawlers which paths they are allowed to fetch. It is the oldest and simplest tool in the SEO toolbox, dating back to 1994. Despite its age, robots.txt is still the primary way to keep crawlers out of admin pages, draft URLs, search-result pages, and other low-value content that would otherwise dilute your site’s index.

Crucially, robots.txt is not a security mechanism. Bad actors will ignore it. Use it for crawl control, and keep sensitive content behind authentication.

Modern AI crawler controls

In recent years, a wave of crawlers has appeared that collect web content to train large language models. The robots.txt generator includes presets for the most common ones so you can allow or disallow them with a single toggle: GPTBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), Google-Extended (controls use of your content for Gemini, formerly Bard, without affecting Google Search), PerplexityBot (Perplexity), CCBot (Common Crawl), Bytespider (ByteDance), and Applebot-Extended (Apple Intelligence).
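As a rough illustration of what such a toggle could produce, the sketch below turns a list of selected crawler tokens into robots.txt groups. The `AI_CRAWLER_PRESETS` table and the `render_ai_blocks` function are hypothetical names invented for this example; they are not the generator's actual code.

```python
# Hypothetical preset table: crawler tokens named on this page, mapped to their operators.
AI_CRAWLER_PRESETS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "anthropic-ai": "Anthropic",
    "Google-Extended": "Google",
    "PerplexityBot": "Perplexity",
    "CCBot": "Common Crawl",
    "Bytespider": "ByteDance",
    "Applebot-Extended": "Apple",
}


def render_ai_blocks(blocked):
    """Return one 'disallow everything' group per selected crawler token."""
    unknown = [token for token in blocked if token not in AI_CRAWLER_PRESETS]
    if unknown:
        raise ValueError(f"no preset for: {', '.join(unknown)}")
    if not blocked:
        return ""
    groups = [f"User-agent: {token}\nDisallow: /" for token in blocked]
    # Groups are separated by a blank line, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


print(render_ai_blocks(["GPTBot", "CCBot"]))
```

Each selected token gets its own group, so blocking one AI crawler leaves the rules for search crawlers untouched.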

Best practices

Always start with a global User-agent: * block as a baseline.
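Use rules such as Disallow: /admin/ to keep crawlers out of low-value directories. The file is publicly readable, so do not rely on it to hide sensitive paths.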
Add a Sitemap: directive pointing to your sitemap.xml.
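Validate the file before deploying, then confirm it in the robots.txt report in Google Search Console (Google has retired its standalone robots.txt Tester).
Remember that Disallow does not de-index already-indexed pages. Use a noindex meta tag for that, and leave the page crawlable so the crawler can see the tag.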
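The checklist above can be sketched in a few lines of Python. `build_robots_txt` is a hypothetical helper written for this example, the `/search` path and the sitemap URL are placeholder assumptions, and the local check uses the standard library's `urllib.robotparser`, which applies simple prefix matching and may not agree with Google's parser on every edge case.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallow_paths, sitemap_url):
    """Assemble a baseline file: one global group plus a Sitemap line."""
    lines = ["User-agent: *"]
    # An empty Disallow value means nothing is blocked.
    lines += [f"Disallow: {path}" for path in disallow_paths] or ["Disallow:"]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Placeholder paths and sitemap URL, chosen only for illustration.
robots_txt = build_robots_txt(
    ["/admin/", "/search"], "https://yourdomain.com/sitemap.xml"
)
print(robots_txt)

# Rough local check before deploying: parse the text and query a few URLs.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "https://yourdomain.com/admin/users"))  # False
print(parser.can_fetch("*", "https://yourdomain.com/blog/post"))    # True
```

A local check like this catches typos and misplaced rules early; it does not replace confirming the deployed file in a search engine's own tooling.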

Frequently asked questions

Where do I place robots.txt?

At the root of your domain, at exactly https://yourdomain.com/robots.txt. Crawlers ignore robots.txt files placed in subdirectories, and each subdomain needs its own file.
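To make the location rule concrete, here is a minimal sketch that derives the robots.txt URL a crawler would request for any page. The function name `robots_txt_url` is an assumption made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt location for the host that serves page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Keep scheme and host (including any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yourdomain.com/blog/post?id=7"))
# https://yourdomain.com/robots.txt
```

The path of the page never matters: only the scheme, host, and port decide which robots.txt file applies.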

How do I block GPTBot from training on my site?

Add User-agent: GPTBot followed by Disallow: / to your robots.txt. The generator includes a one-click preset for GPTBot and other AI crawlers.

What is Google-Extended?

A separate user-agent token that controls whether Google can use your content to train its Gemini models (formerly Bard), independently of Google Search crawling. Block Google-Extended to opt out of generative AI training while still being indexed.

Which AI crawlers can I control?

GPTBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), Google-Extended (Google AI), PerplexityBot (Perplexity), CCBot (Common Crawl), Bytespider (ByteDance), and Applebot-Extended (Apple Intelligence).

Does robots.txt block bad bots?

No. Robots.txt is a voluntary protocol. Well-behaved crawlers respect it; malicious bots ignore it. For security, use authentication, rate limiting, or a web application firewall.

How do I allow specific bots and disallow others?

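Add a Disallow: / under User-agent: * as a global block, then add a User-agent: Googlebot section with Allow: / for each crawler you want to let in. A crawler follows the group that names it and falls back to the * group only when no group does, so Googlebot uses its own rules while every other bot stays blocked. The generator handles this pattern automatically.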
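The allowlist pattern can be checked locally with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: `is_allowed` is a hypothetical helper, `ExampleBot` is a made-up crawler name, and the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Global block plus one allowlisted crawler, as described above.
ALLOWLIST_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://yourdomain.com/pricing"  # placeholder URL
print(is_allowed(ALLOWLIST_ROBOTS_TXT, "Googlebot", url))   # True
print(is_allowed(ALLOWLIST_ROBOTS_TXT, "ExampleBot", url))  # False
```

Googlebot matches its own group and is allowed, while the unnamed crawler falls back to the global group and is refused.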
