Robot Exclusion
A robots.txt file provides a protocol that helps all search engines navigate a Web site. If proprietary content or privacy is a concern, we suggest you identify the folders on your Web site that should be excluded from searching. Using a robots.txt file, these folders can then be made off-limits. The following discussion about robots will be updated frequently.

The Ultraseek robot respects the robots.txt file. Starting at the root URL, the spider proceeds through the site by following links from this root. The robots.txt file will also help other search engines traverse your Web site while keeping them out of areas you do not want visited. To facilitate this, many Web robots offer facilities that let Web site administrators and content providers limit robot activities. This exclusion can be achieved through two mechanisms: the Robots Exclusion Protocol and the Robots META tag.

The Robots Exclusion Protocol

A Web site administrator can indicate which parts of the site should not be visited by a robot by providing a specially formatted file on the site at http://.../robots.txt. The robots.txt file must reside in the root directory of the Web site.
User-agent: *
In this example, three directories are excluded. The User-agent line specifies which robots the rules that follow apply to. In this case the * means the rules apply to all robots. You
need a separate "Disallow" line for every URL prefix you want to
exclude; you cannot say "Disallow: /cgi-bin/ /tmp/".
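
To see how a robot might apply these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the directory names, the robot name "ExampleBot", and the host www.example.com are all assumptions made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: one Disallow line per excluded prefix.
# The directory names are illustrative assumptions only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
"""

# Parse the rules from text instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) robot may fetch the given path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/cgi-bin/search", "/tmp/cache.html"):
        print(path, "allowed" if is_allowed(path) else "excluded")
```

Because the rules are listed under User-agent: *, they apply to any robot name passed in. A real robot would typically load the file from the site root (for example with the parser's set_url and read methods) rather than from a string.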
The Robots META tag

A Web author can indicate whether a page may be indexed or analyzed for links by using a special HTML META tag. The tag looks like the examples below and is placed with the other metatags in the <HEAD> area of the Web page.

Within the Robots META tag are directives separated by commas. The INDEX directive tells an indexing robot to index the page. The FOLLOW directive tells a robot to follow links on the page. Both INDEX and FOLLOW are defaults. The values ALL and NONE set all directives on or off: ALL=INDEX,FOLLOW and NONE=NOINDEX,NOFOLLOW. Here are some examples:

<meta name="robots" content="index,follow">
<meta name="robots" content="noindex,follow">
<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">

Unfortunately, this metatag has a few drawbacks: few robots adhere to the standard, and not many people know about or use the Robots metatag. In addition, there is no way to exclude an individual robot. This may change soon.

For more information on robots, visit The Web Robots Pages.
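
The Robots META tag directives described above (INDEX and FOLLOW as defaults, with ALL and NONE as shorthand) can be sketched in Python with the standard html.parser module. This is only an illustration of how a robot might interpret the tag; the class name RobotsMetaParser and the function robots_directives are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content attribute of a <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            self.content = attr_map.get("content", "")


def robots_directives(html):
    """Return the index/follow decision for a page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    index, follow = True, True  # INDEX and FOLLOW are the defaults
    for token in (parser.content or "").split(","):
        token = token.strip().upper()
        if token == "ALL":
            index, follow = True, True
        elif token == "NONE":
            index, follow = False, False
        elif token in ("INDEX", "NOINDEX"):
            index = token == "INDEX"
        elif token in ("FOLLOW", "NOFOLLOW"):
            follow = token == "FOLLOW"
    return {"index": index, "follow": follow}


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex,follow"></head>'
    print(robots_directives(page))
```

A page with no Robots META tag falls through to the defaults, so it is treated as both indexable and followable.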