In this tutorial, we will study what robots.txt is, what it is used for, how to create it, and how to test it for validity. So let us start.
What is robots.txt?
The robots.txt file tells search engine crawlers which pages or files they can request from the website and which they cannot. It is mainly used to avoid overloading our website with requests. Do not confuse it with a mechanism for keeping web pages out of Google; for that purpose, we can use noindex directives or password-protect the web page.
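A noindex directive is typically placed in the page's HTML head. A minimal example (any page carrying this tag asks search engines not to index it):

```html
<meta name="robots" content="noindex">
</meta>
```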
What is robots.txt used for?
To manage traffic on the web page
We can use robots.txt to manage crawling traffic if we feel that our web server will be overwhelmed by requests from Google's crawlers. We can also use it to avoid crawling unimportant or similar pages on our website.
We should not use robots.txt to hide a web page from Google search results, because if other pages point to our page with descriptive text, the page could still be indexed without ever being visited.
To manage traffic and hide media files from Google
We can use robots.txt to manage traffic and to prevent image, video, and audio files from appearing in Google search results. However, this will not prevent users or other pages from linking to those media files on the web server.
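For example, a rule like the following keeps Google's image crawler out of a directory (the `/images/` path here is a hypothetical placeholder; `Googlebot-Image` is the real user agent Google uses for image crawling):

```
User-agent: Googlebot-Image
Disallow: /images/
```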
To manage traffic and hide resource files from Google
We can use robots.txt to block resource files such as unimportant image, script, or style files. Be aware that if the crawler cannot load these resources, it may be unable to understand the page, which will affect the analysis of any page that depends on them.
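A sketch of such a rule, assuming a hypothetical `/assets/decorative/` directory of resources the pages do not depend on:

```
User-agent: *
# Block only resources that are not needed to render or understand the page.
Disallow: /assets/decorative/
```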
How to create robots.txt file?
A new robots.txt file can be created with any plain text editor of choice. If we already have a robots.txt file, we should make sure to delete the existing text inside the file first.
- Set the user agent. Start the file by setting the user agent; writing an asterisk after the user-agent term matches all web robots.
- Next, type “Disallow:”. Do not type anything after it.
- Because there is nothing after Disallow, web robots are permitted to crawl our entire website.
- Optionally, we can also link our XML sitemap from this file. It is completely your choice.
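Putting the steps above together, a minimal permissive robots.txt with an optional sitemap line might look like this (`example.com` is a placeholder domain):

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```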
How to test robots.txt file?
- Start by opening the tester tool for the site, and scroll through the robots.txt code to locate the highlighted syntax warnings and logic errors. The number of syntax warnings and logic errors is shown immediately below the editor.
- Type the URL of a page on your site into the text box at the bottom of the page.
- Select the user agent you want to simulate from the dropdown list to the right of the text box.
- Click the TEST button to test access.
- Check whether the TEST button now reads ACCEPTED or BLOCKED to find out if the URL you entered is blocked from Google's web crawlers.
- Edit the file on the page and retest as required. Note that changes made on the page are not saved to your site! See the next step.
- Copy the changes into the robots.txt file on your website. The tool cannot change the actual file on the site; it only tests against the copy hosted in the tool.
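Apart from Google's tester, robots.txt rules can also be checked locally. A minimal sketch using Python's standard-library `urllib.robotparser` (the rules and URLs below are hypothetical examples, not your real file):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; parse() accepts a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) reports whether the policy allows the fetch.
print(parser.can_fetch("*", "https://example.com/public/page.html"))   # → True
print(parser.can_fetch("*", "https://example.com/private/page.html"))  # → False
```

In practice you would point the parser at your live file with `set_url("https://yoursite.com/robots.txt")` followed by `read()` instead of supplying the lines by hand.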