Robots.txt file created by default
This guide explains the robots.txt file that is created by default for Web hosting when this file is absent.
Preamble
The robots.txt file acts as a guide for search engine crawlers. It is placed at the root of a website and contains instructions indicating which directories or pages the crawlers may explore and which ones they should ignore. However, it is important to note that crawlers may choose to ignore these directives, making robots.txt a voluntary convention rather than an enforced rule.
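For illustration, a typical robots.txt might look like the following sketch (the paths shown are hypothetical examples, not defaults of the hosting):

```
User-agent: *
Disallow: /admin/
Allow: /
```

Here all crawlers are asked to skip the /admin/ directory while the rest of the site remains open to exploration.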
File Content
If the robots.txt file is absent from an Infomaniak site, a robots.txt file is automatically generated with the following directives:
User-agent: *
Crawl-delay: 10
These directives instruct all robots to space their requests by 10 seconds, which avoids unnecessary load on the server.
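The effect of these directives can be checked with Python's standard library. The sketch below parses the two lines with urllib.robotparser and shows that they leave every path crawlable while requesting a 10-second delay:

```python
from urllib.robotparser import RobotFileParser

# Parse the default directives generated by the hosting
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
])

# No path is disallowed; only a 10-second spacing between requests is asked for
print(rp.crawl_delay("*"))         # 10
print(rp.can_fetch("*", "/page"))  # True
```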
Bypassing the Default robots.txt
It is possible to bypass this default robots.txt by following these steps:
- Create an empty file named "robots.txt" at the root of the site (it serves only as a placeholder so that the default rules no longer apply)
- Manage the redirection of the Uniform Resource Identifier (URI) "robots.txt" to the file of your choice using a .htaccess file
Example
RewriteEngine On
RewriteCond %{REQUEST_URI} /robots\.txt$
RewriteRule (.+) index.php?p=$1 [QSA,L]
This example redirects the URI "robots.txt" to "index.php", which is what would happen if the default rule did not exist. It is recommended to place these instructions at the beginning of the .htaccess file.
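Alternatively, instead of routing the request through index.php, the rewrite can point directly at a static file of your choice. A minimal sketch, where the filename custom-robots.txt is a hypothetical example:

```
RewriteEngine On
# Serve custom-robots.txt (hypothetical name) whenever /robots.txt is requested
RewriteRule ^robots\.txt$ custom-robots.txt [L]
```

This keeps the rest of the site's rewrite rules untouched, since the [L] flag stops processing once the rule matches.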