Manage the default created robots.txt file

This guide explains the robots.txt file that is created by default on web hostings where the file is missing.

 

Preamble

  • The robots.txt file acts as a guide for search engine crawlers
  • It is placed at the root of a website and contains instructions telling these robots which directories or pages they may explore and which they should ignore
  • Note, however, that robots may choose to ignore these directives: robots.txt is a voluntary convention rather than an enforced rule
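As an illustration, a minimal robots.txt might look like the sketch below; the paths shown are hypothetical examples, not part of any default file:

```
User-agent: *
Disallow: /admin/
Allow: /public/
```

Each `User-agent` line starts a group of rules, and the `Disallow`/`Allow` lines list the paths that the matching robots should skip or may visit.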

 

File Content

If the robots.txt file is missing from an Infomaniak site, a file of the same name is automatically generated with the following directives:

User-agent: *
Crawl-delay: 10

These directives ask robots to wait 10 seconds between requests, which prevents the servers from being overloaded unnecessarily (note that some crawlers, such as Googlebot, ignore the Crawl-delay directive).

 

Bypassing the Default robots.txt

You can bypass the default robots.txt by following these steps:

  1. Create an empty robots.txt file at the root of the site (it only acts as a placeholder so that the default file and its rules are not applied).
  2. Redirect the robots.txt URI (Uniform Resource Identifier) to the file of your choice using a .htaccess file.

Example

<IfModule mod_rewrite.c>
    # Enable the URL rewriting engine
    RewriteEngine On
    # Only apply when the request targets robots.txt
    RewriteCond %{REQUEST_URI} /robots.txt$
    # Serve index.php instead, keeping any query string
    RewriteRule ^robots\.txt$ index.php [QSA,L]
</IfModule>

Explanations

  • The Apache mod_rewrite module, when enabled, allows URLs to be rewritten.
  • The condition RewriteCond %{REQUEST_URI} /robots.txt$ checks whether the request targets the robots.txt file.
  • The rule RewriteRule ^robots\.txt$ index.php [QSA,L] internally rewrites requests for robots.txt to index.php; the [QSA] flag appends the original query string, and [L] stops further rewrite processing.

It is recommended to place these instructions at the beginning of the .htaccess file.
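If you prefer to serve a static custom file rather than index.php, the same mechanism can point to any file of your choice; the filename robots-custom.txt below is a hypothetical example, not a predefined name:

```
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Serve the custom file (hypothetical name) in place of robots.txt
    RewriteRule ^robots\.txt$ robots-custom.txt [L]
</IfModule>
```

Robots requesting /robots.txt then receive the content of robots-custom.txt, while the empty placeholder robots.txt remains on disk.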

