The benefits of robots.txt
1. Almost all search engine spiders follow the crawling rules given in robots.txt. The first thing a spider requests when it enters a website is the site's robots.txt, provided the file exists. On a site with no robots.txt, the spider is redirected to the 404 error page. Related research suggests that if the site uses a custom 404 error page, the spider may treat that page as robots.txt, even though it is not a plain text file. This causes the spider considerable trouble when indexing and affects how the search engine includes the site's pages.
2. robots.txt can prevent unnecessary crawlers from occupying the server's valuable bandwidth. Email retrievers, for example, are of no use to most websites, and image strippers are of little use to most non-graphics sites, yet they consume a lot of bandwidth.
3. robots.txt can stop search engines from crawling and indexing non-public pages, such as a site's back-end and administration programs. In fact, on sites that generate temporary pages during operation, search engines will even index those temporary files if robots.txt is not configured.
4. Configuring robots.txt is even more important for sites that are rich in content and have many pages, because such sites often come under tremendous pressure from search engine spiders: a flood of spider requests, left uncontrolled, can even affect normal access to the site.
5. If the site contains duplicate content, using robots.txt to keep some of those pages from being indexed and included by search engines can spare the site a duplicate-content penalty and keep its ranking from being affected.
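The points above can be illustrated with a small sketch. It uses Python's standard urllib.robotparser module to check how a rule-following spider would read a sample robots.txt. The file contents, the bot names (ExampleEmailHarvester, ExampleSpider), the host www.example.com, and the paths (/admin/, /tmp/, /print/) are all hypothetical and chosen only for illustration. They stand for an unwanted crawler, a back-end area, temporary files, and duplicate printer-friendly pages respectively.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the bot name and every path are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleEmailHarvester
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /print/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "http://www.example.com"  # placeholder host

# The unwanted crawler is shut out of the whole site (point 2).
harvester_home = parser.can_fetch("ExampleEmailHarvester", base + "/")
# An ordinary spider may crawl public pages ...
spider_home = parser.can_fetch("ExampleSpider", base + "/")
# ... but not the back end (point 3) or the duplicate copies (point 5).
spider_admin = parser.can_fetch("ExampleSpider", base + "/admin/login")
spider_print = parser.can_fetch("ExampleSpider", base + "/print/article-1")

print(harvester_home, spider_home, spider_admin, spider_print)
```

Note that these rules only bind spiders that choose to honor robots.txt. The file is a request, not an access control.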
The risks posed by robots.txt and their solutions
1. Everything has advantages and disadvantages, and robots.txt also carries a certain risk: it reveals the site's directory structure and the location of private data to attackers. If the web server's security measures are properly configured, this is not a serious problem, but it still makes a malicious attack easier.
For example, if you have a website such as www.>