Robots.txt
A text file that instructs search engine crawlers which pages they can or cannot access.
Understanding Robots.txt
Robots.txt is a plain-text file placed at the root of your website (example.com/robots.txt) that tells search engine crawlers which areas of your site they may or may not visit. It uses simple directives such as User-agent, Disallow, Allow, and Sitemap to guide crawlers. While robots.txt can prevent crawling, it does not prevent indexing: a page blocked by robots.txt can still appear in search results if other pages link to it. To keep a page out of search results, use the noindex meta tag instead, and note that a crawler can only see a noindex directive on pages robots.txt allows it to crawl. Common uses include blocking admin areas, internal search result pages, and duplicate content from being crawled.
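A minimal robots.txt illustrating these directives might look like the sketch below (the specific paths are hypothetical examples, not recommendations for any particular site):

```
# Rules in this group apply to all crawlers
User-agent: *
# Block a hypothetical admin area and internal search results
Disallow: /admin/
Disallow: /search
# Re-allow a specific subpath inside the blocked directory
Allow: /admin/public/
# Point crawlers at the XML sitemap
Sitemap: https://example.com/sitemap.xml
```

Rules are grouped under a User-agent line, so you can give different crawlers different instructions; the file only takes effect when served from the site root.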
Keep learning
Canonical URL
An HTML element that tells search engines which URL is the preferred version of a page.
Crawl Budget
The number of pages a search engine will crawl on your site within a given timeframe.
Sitemap
An XML file that lists all important pages on your website to help search engines discover and crawl them.
Indexing
The process by which search engines store and organize web pages in their database for retrieval in search results.