
Robots.txt & Sitemap Validator

Validate your robots.txt and sitemap.xml files: detect syntax errors and malformed directives, and simulate how Googlebot would treat your URLs.


💡 Tip: Your robots.txt should live at https://yoursite.com/robots.txt

💡 Disallow: / blocks the entire site; only use it on staging environments.


The Robots.txt Validator instantly checks your robots.txt file for syntax errors, conflicting directives, duplicate user-agents, and missing recommendations. The bonus Bot Access Simulator lets you test whether Googlebot, Bingbot, GPTBot, or any crawler would be allowed to access a specific URL, using real robots.txt matching logic (longest-match wins).

How to Validate Your robots.txt

  1. Open your robots.txt: Navigate to yoursite.com/robots.txt in a browser, select all, and copy the text.
  2. Paste and validate: Paste into the input box. Issues appear instantly, color-coded by severity: red (errors), yellow (warnings), blue (informational).
  3. Simulate bot access: Enter a URL like /admin/settings and select a bot to instantly see if it would be blocked or allowed.
  4. Validate your sitemap: Switch to the sitemap.xml tab, paste your file, and check for missing fields, duplicate URLs, or invalid date formats.
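The first two steps amount to reading the pasted file into per-agent rule groups. A minimal Python sketch of that parsing, assuming a hypothetical `parse_robots` helper (this is an illustration, not the tool's actual code):

```python
def parse_robots(text: str):
    """Parse robots.txt into ({user_agent: [(directive, pattern), ...]}, [sitemap_urls])."""
    groups: dict[str, list[tuple[str, str]]] = {}
    sitemaps: list[str] = []
    current: list[str] = []   # user-agents of the group currently being read
    reading_agents = False    # True while consecutive User-agent lines accumulate
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not reading_agents:           # a rule line ended the previous group
                current = []
            reading_agents = True
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        else:
            reading_agents = False
            if field in ("allow", "disallow"):
                for agent in current:        # rules apply to every agent in the group
                    groups[agent].append((field, value))
            elif field == "sitemap":
                sitemaps.append(value)
    return groups, sitemaps
```

Grouping matters: consecutive User-agent lines share the rules that follow, which is why the sketch tracks whether it is still reading agent names.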

Frequently Asked Questions (FAQs)

What's the difference between robots.txt and noindex?

robots.txt controls whether a bot crawls a URL. A noindex meta tag controls whether an already-crawled URL gets indexed. If you block a URL in robots.txt, Google can't see the noindex tag inside it, so use noindex for pages you want removed from the index but still crawlable.

How does robots.txt matching work?

Google uses the longest matching prefix rule: if an Allow and Disallow directive both match a URL, whichever has the longer pattern wins. Ties go to Allow. This validator simulates that exact logic.
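That rule can be sketched in a few lines of Python. This is a simplified model of Google's matcher (supporting `*` wildcards and `$` end anchors), not the validator's actual implementation:

```python
import re

def _to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern: '*' matches any chars, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching pattern wins; ties go to Allow; an unmatched path is allowed."""
    best_len, best_allow = -1, True
    for directive, pattern in rules:
        if pattern and _to_regex(pattern).match(path):
            length, allows = len(pattern), directive == "allow"
            if length > best_len or (length == best_len and allows):
                best_len, best_allow = length, allows
    return best_allow
```

For example, with `Disallow: /admin/` and `Allow: /admin/public/`, the path /admin/public/doc is allowed because the Allow pattern is longer, while /admin/settings is blocked.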

What makes a valid sitemap?

A valid sitemap must be well-formed XML with a <urlset> root, and every <loc> must be an absolute URL. The <lastmod> should use W3C datetime format (YYYY-MM-DD), and <priority> must be between 0.0 and 1.0.
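Those checks can be reproduced with Python's standard library. A simplified sketch (the `validate_sitemap` helper is hypothetical, not the tool's code) covering the well-formedness, root element, absolute `<loc>`, date-format, and priority-range checks:

```python
import xml.etree.ElementTree as ET
from datetime import date
from urllib.parse import urlparse

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def validate_sitemap(xml_text: str) -> list[str]:
    """Return a list of problems found; an empty list means these checks all passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != NS + "urlset":
        return ["root element is not <urlset>"]
    issues, seen = [], set()
    for url in root.findall(NS + "url"):
        loc = (url.findtext(NS + "loc") or "").strip()
        if not urlparse(loc).scheme:          # absolute URLs carry a scheme
            issues.append(f"<loc> is not an absolute URL: {loc!r}")
        if loc in seen:
            issues.append(f"duplicate URL: {loc}")
        seen.add(loc)
        lastmod = url.findtext(NS + "lastmod")
        if lastmod:
            try:
                date.fromisoformat(lastmod[:10])  # accepts YYYY-MM-DD or a full datetime
            except ValueError:
                issues.append(f"invalid <lastmod> date: {lastmod}")
        priority = url.findtext(NS + "priority")
        if priority is not None:
            try:
                in_range = 0.0 <= float(priority) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                issues.append(f"<priority> must be between 0.0 and 1.0: {priority}")
    return issues
```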

Can I block AI scrapers with robots.txt?

Yes and no. Well-behaved AI crawlers such as GPTBot and CCBot respect robots.txt, so you can add user-agent blocks for them. Poorly behaved scrapers ignore it entirely, though, so robots.txt is not a security measure.
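A minimal fragment that asks those two crawlers to stay off the whole site (these are the user-agent tokens those crawlers have published; check each operator's current documentation before relying on them):

```
# Opt out of cooperative AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```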