Validate robots.txt and sitemap.xml. Detect syntax errors, bad directives, and simulate Googlebot access.
Check if a bot can crawl a specific URL based on the rules above
Paste your robots.txt content to validate it instantly.
💡 Tip: Your robots.txt should live at https://yoursite.com/robots.txt
💡 Disallow: / blocks the entire site, so only use it on staging environments.
The Robots.txt Validator instantly checks your robots.txt file for syntax errors, conflicting directives, duplicate user-agents, and missing recommendations. The bonus Bot Access Simulator lets you test whether Googlebot, Bingbot, GPTBot, or any crawler would be allowed to access a specific URL, using real robots.txt matching logic (longest-match wins).
Open yoursite.com/robots.txt in a browser, select all, and copy the text.
Enter a path such as /admin/settings and select a bot to instantly see whether it would be blocked or allowed.
robots.txt controls whether a bot crawls a URL. A noindex meta tag controls whether an already-crawled URL gets indexed. If you block a URL in robots.txt, Google can't see the noindex tag inside it, so use noindex for pages you want un-indexed but still crawlable.
Google uses the longest matching prefix rule: if an Allow and Disallow directive both match a URL, whichever has the longer pattern wins. Ties go to Allow. This validator simulates that exact logic.
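The longest-match rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the validator's actual source: it assumes rules are literal path prefixes (no * or $ wildcards) and that each rule is a hypothetical (directive, pattern) pair.

```python
def is_allowed(rules, path):
    """Longest-match simulation: rules is a list of ("allow" | "disallow", pattern)
    pairs for one user-agent group; path is the URL path being tested."""
    best_len = -1
    allowed = True  # no matching rule means the URL is crawlable
    for directive, pattern in rules:
        if path.startswith(pattern) and len(pattern) >= best_len:
            # A strictly longer pattern always wins; on a length tie, Allow wins.
            if len(pattern) > best_len or directive == "allow":
                allowed = (directive == "allow")
                best_len = len(pattern)
    return allowed

rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]
is_allowed(rules, "/admin/public/page")  # True: Allow pattern is longer
is_allowed(rules, "/admin/settings")     # False: only Disallow matches
```

Note how /admin/public/page is crawlable even though /admin/ is disallowed: the 14-character Allow pattern beats the 7-character Disallow pattern.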
A valid sitemap must be well-formed XML with a <urlset> root, and every <loc> must be an absolute URL. The <lastmod> should use W3C datetime format (YYYY-MM-DD), and <priority> must be between 0.0 and 1.0.
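The sitemap checks listed above can be approximated with the standard library. A minimal sketch, assuming the standard sitemaps.org namespace and checking only well-formedness, root element, absolute <loc>, <lastmod> date shape, and <priority> range (the hypothetical validate_sitemap name and error messages are illustrative):

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def validate_sitemap(xml_text):
    """Return a list of problems; an empty list means all checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        return [f"not well-formed XML: {e}"]
    errors = []
    if root.tag != NS + "urlset":
        errors.append("root element is not <urlset>")
    for url in root.iter(NS + "url"):
        loc = url.findtext(NS + "loc", "")
        if not loc.startswith(("http://", "https://")):
            errors.append(f"<loc> is not an absolute URL: {loc!r}")
        lastmod = url.findtext(NS + "lastmod")
        if lastmod and not re.match(r"^\d{4}-\d{2}-\d{2}", lastmod):
            errors.append(f"<lastmod> is not W3C datetime: {lastmod!r}")
        priority = url.findtext(NS + "priority")
        if priority and not 0.0 <= float(priority) <= 1.0:
            errors.append(f"<priority> out of range 0.0-1.0: {priority}")
    return errors
```

A real validator would also enforce the 50,000-URL and 50 MB limits and accept full W3C datetimes with time zones; this sketch stops at the checks named above.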
Yes, and no. Well-behaved AI crawlers like GPTBot and CCBot respect robots.txt. You can add specific user-agent blocks for them. However, poorly-behaved scrapers ignore it entirely, so robots.txt is not a security measure.
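For example, a robots.txt that blocks those two AI crawlers while leaving the rest of the site open to other bots might look like this (the blocked paths are illustrative):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```

Each crawler obeys only the most specific user-agent group that matches it, so GPTBot and CCBot see a full Disallow while everyone else falls through to the wildcard group.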