Vol. 1 · Ed. 2026
CyberGlossary
Entry № 942

robots.txt

What is robots.txt?

robots.txt: A text file served at the site root that tells well-behaved web crawlers which paths they may or may not fetch, formalized in IETF RFC 9309.


robots.txt is a plain-text file placed at the root of a website (for example https://example.com/robots.txt) that uses the Robots Exclusion Protocol standardized as IETF RFC 9309. It contains User-agent and Disallow/Allow directives that compliant crawlers consult before fetching URLs, and it can advertise the sitemap location. robots.txt is an advisory mechanism, not an access control: malicious bots routinely ignore it, and listing sensitive paths there often acts as a roadmap for attackers. Defenders should pair robots.txt with proper authentication, authorization, rate limiting, and noindex tags, and avoid using it to hide secret URLs.
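The directives described above can be exercised with Python's standard-library urllib.robotparser module. This is a minimal sketch, not a reference implementation: the file contents, the crawler name ExampleBot, and the helper build_parser are assumptions made for illustration. The Allow line is placed before the broader Disallow line so the result does not depend on whether a parser uses first-match or longest-match ordering.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed crawler name; it falls under the "*" group.
for url in (
    "https://example.com/admin/users",
    "https://example.com/admin/help/faq",
    "https://example.com/blog/",
):
    print(url, parser.can_fetch("ExampleBot", url))

# site_maps() is available in Python 3.8 and later.
print(parser.site_maps())
```

In this sketch only /admin/users is refused. The check runs entirely inside the crawler, which is why a client that skips it faces no technical barrier.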

Examples

  1. A Disallow: /admin/ entry that compliant crawlers respect but that attackers can read as a hint to probe the admin path.

  2. A Sitemap: https://example.com/sitemap.xml line that helps search engines discover and index public content.
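Because anyone can read the file, a site owner may want to review what its Disallow lines reveal. The sketch below lists the disallowed paths and flags those containing obviously sensitive words. The helper names disclosed_paths and flag_sensitive and the keyword list are hypothetical, chosen for illustration. A real review would use criteria specific to the site.

```python
# Hypothetical keyword list; adjust to what counts as sensitive for a given site.
SENSITIVE_HINTS = ("admin", "backup", "private", "internal")


def disclosed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path, exactly as an outside reader sees it."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def flag_sensitive(paths: list[str]) -> list[str]:
    """Keep only the paths that contain one of the hint words."""
    return [p for p in paths if any(h in p.lower() for h in SENSITIVE_HINTS)]


sample = """\
User-agent: *
Disallow: /admin/   # staff only
Disallow: /search
Disallow:
Sitemap: https://example.com/sitemap.xml
"""

print(disclosed_paths(sample))
print(flag_sensitive(disclosed_paths(sample)))
```

Any path this flags should be protected by authentication and authorization whether or not it stays in the file.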

Frequently asked questions

What is robots.txt?

A text file served at the site root that tells well-behaved web crawlers which paths they may or may not fetch, formalized in IETF RFC 9309. It belongs to the Application Security category of cybersecurity.

How does robots.txt work?

A compliant crawler requests /robots.txt from the site root (for example https://example.com/robots.txt) before fetching other URLs, finds the group of rules whose User-agent line matches its own name (or the * wildcard group), and checks each URL path against that group's Allow and Disallow rules. Under IETF RFC 9309 the longest matching rule applies, and a path that matches no rule may be fetched. The file can also advertise the sitemap location. Compliance is voluntary: robots.txt is an advisory mechanism, not an access control, so malicious bots routinely ignore it.
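When Allow and Disallow rules overlap, RFC 9309 says the most specific (longest) matching rule is used, and that Allow should be preferred when two matching rules are equivalent. The sketch below is a simplified illustration of that rule for a single User-agent group. It deliberately ignores the * and $ wildcards and percent-encoding normalization. The function name is_allowed and the rule representation are assumptions, not part of any library.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` may be fetched.

    `rules` holds ("allow" | "disallow", path_prefix) pairs for one group.
    Simplified: plain prefix matching only, no wildcard support.
    """
    best_len = -1
    allowed = True  # a path that matches no rule may be fetched
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules are skipped
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_goes_to_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


rules = [("disallow", "/admin/"), ("allow", "/admin/help/")]
print(is_allowed(rules, "/admin/help/faq"))
print(is_allowed(rules, "/admin/users"))
print(is_allowed(rules, "/blog/"))
```

Rule order does not matter in this scheme, unlike parsers that stop at the first matching line.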

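How do you defend against the risks of robots.txt?

robots.txt is not an attack, so the goal is to avoid relying on it for protection. Do not list secret or sensitive URLs in it, since the file is public and can act as a roadmap for attackers. Protect those paths with authentication, authorization, and rate limiting, and use noindex tags to keep pages that crawlers can reach out of search results.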

What are other names for robots.txt?

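robots.txt is also referred to as the Robots Exclusion Protocol (REP), the protocol the file implements. RFC 9309 is the IETF document that standardizes that protocol rather than a name for the file itself.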

Related terms