#
# robots.txt 2010.04.29
# Googlebot is very severely mentally challenged.
# Add disallow directives for directories that are not even there,
# and haven't been for over 5 weeks now.
# This is merely to try to get around having my request to delete the
# non-existent directories from the search database being denied.
#
# robots.txt 2010.04.16
# Add specific directives for exabot, including a crawl delay.
# Reduce the slurp (Yahoo) crawl delay (which it doesn't seem to obey anyhow).
# Disallow googlebot-image.
#
# robots.txt 2010.04.13
# disallow taptubot, the mobile device crawler
#
# robots.txt 2010.04.01
# Yet another attempt to get web crawlers not to index old versions of index.html files.
# All old versions are called index_0???.html.
#
# robots.txt 2010.03.19
# Archives have been moved to a separate directory. Add disallow directive.
#
# robots.txt 2010.02.10
# The Yandex web crawler behaves in a very strange manner. Block it.
# Ask Robots not to copy PDF files.
#
# robots.txt 2009.12.07
# Fix some syntax based on feedback from http://tool.motoricerca.info/robots-checker.phtml
#
# robots.txt 2009.12.04
# There are still issues with googlebot. I don't want old versions of index.html
# type pages indexed, but I do want the photoshop elements generated pages indexed.
# Try some new directives.
#
# robots.txt 2009.09.09
# Googlebot is not ignoring the rebuilt directory and is obtaining .MOV videos.
# Add some more googlebot specific directives.
#
# robots.txt 2009.07.27
# Googlebot directives are case sensitive. Add .JPG to .jpg ignore directives.
# Googlebot is not ignoring old index pages as global directive indicates to. Try a googlebot
# specific directive.
#
# robots.txt 2009.04.12
# Some robots, for example googlebot, obey global directives as well as googlebot specific directives.
# Other robots, for example slurp (Yahoo) and msnbot, only obey their specific directives.
# The robots.txt standard is rather weak, incomplete, and generally annoying.
# Add tons of the same specific directives to each robot area.
# Try to change no index Christmas pages to include a wildcard.
#
# robots.txt 2008.12.03
# Block the Cuil (twiceler) robot entirely.
#
# robots.txt 2008.11.23
# The majestic robot comes in bursts at a high rate. Just block it.
# The Cuil robot comes too much. Try to slow it down.
#
# robots.txt 2008.07.03
# Now msnbot has started to grab images. Try to stop it.
# Googlebot is grabbing PNG files. Try to stop it.
#
# robots.txt 2007.11.20
# Try to disallow the panscient.com web crawler.
#
# robots.txt 2007.08.23
# Still search engine pages do not agree with contents of robots.txt file.
# Add specific disallow for ~doug/rebuilt.
# - put global user agent lines after specific ones.
# - next will be to repeat global lines in each specific agent area.
#
# robots.txt 2007.05.03
# Now Googlebot has started to grab images. Try to stop it.
# For whatever reason, google is mainly showing my re-built directory. It
# never seems to go back to the higher level page that now has meta tags
# telling it not to index those pages. Put in a global disallow.
# Add some other global disallows, that I got behind on.
#
# robots.txt 2007.03.13
# stupid yahoo slurp comes all the time now. It supports a non-standard delay command.
# so add the command. The web site doesn't state the units of measure.
#
# robots.txt 2007.02.11
# yahoo, slurp seems to now obey the non-standard ignore this type of file wildcard usage
# try it.
#
# robots.txt 2006.12.29
# Delete instructions for directories that don't exist anymore
#
# robots.txt 2004:12:21
# Try to eliminate yahoo.com grabbing images.
# Can only think of global deny.
# Can not find Yahoo name, try one shown below.
#
# robots.txt 2004:11:16
# Try to eliminate alexa.com grabbing images.
# InkTomi comes too often, can them entirely.
#
# robots.txt 2004:07:16
# Try to eliminate picsearch.com grabbing images.
#
# robots.txt 2004:07:09
# Try to eliminate altavista grabbing images.
#
# robots.txt for www.smythies.com 2003:12:21
#
User-agent: panscient.com
Disallow: /

User-agent: vscooter
Disallow: /

User-agent: psbot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: twiceler
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: taptubot
Disallow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: Slurp
Crawl-delay: 3600
Disallow: /*.jpg
Disallow: /*.png
Disallow: /*.PDF
Disallow: /*.pdf
Disallow: /disclaimer.html
Disallow: /security.html
Disallow: /poweredby.html
Disallow: /about_smythies.html
Disallow: /unused_link.html
Disallow: /old_pages.html
Disallow: /index_0*
Disallow: /*index_0*$
Disallow: /digital_camera/
Disallow: /lab/
Disallow: /xmas_*
Disallow: /~doug/archives/

User-agent: Googlebot
Disallow: /*.jpg$
Disallow: /*.JPG$
Disallow: /*.png$
Disallow: /*.PDF$
Disallow: /*.pdf$
Disallow: /index_0*$
Disallow: /*index_0*$
Disallow: /xmas_*
Disallow: /~doug/archives/
Disallow: /~doug/2010.01.23/
Disallow: /~doug/2007.11.20/
Disallow: /~doug/2004.06.26/
Disallow: /digital_camera/
Disallow: /old_pages.html
Disallow: /unused_link.html
Disallow: /disclaimer.html
Disallow: /security.html
Disallow: /about_smythies.html
Disallow: /poweredby.html
Disallow: /*.MOV
Disallow: /*.mov
Disallow: /*.AVI
Disallow: /*.avi

User-agent: Exabot
Crawl-delay: 3600
Disallow: /*.jpg$
Disallow: /*.JPG$
Disallow: /*.png$
Disallow: /*.PDF$
Disallow: /*.pdf$
Disallow: /index_0*$
Disallow: /*index_0*$
Disallow: /xmas_*
Disallow: /~doug/archives/
Disallow: /digital_camera/
Disallow: /old_pages.html
Disallow: /unused_link.html
Disallow: /disclaimer.html
Disallow: /security.html
Disallow: /about_smythies.html
Disallow: /poweredby.html
Disallow: /*.MOV$
Disallow: /*.mov$
Disallow: /*.AVI$
Disallow: /*.avi$

User-agent: msnbot
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.PDF$
Disallow: /*.pdf$
Disallow: /disclaimer.html
Disallow: /security.html
Disallow: /poweredby.html
Disallow: /about_smythies.html
Disallow: /unused_link.html
Disallow: /old_pages.html
Disallow: /index_0*
Disallow: /*index_0*$
Disallow: /digital_camera/
Disallow: /lab/
Disallow: /xmas_*
Disallow: /~doug/archives/

User-agent: *
Disallow: /*.jpg
Disallow: /*.png
Disallow: /*.PDF
Disallow: /*.pdf
Disallow: /disclaimer.html
Disallow: /security.html
Disallow: /poweredby.html
Disallow: /about_smythies.html
Disallow: /unused_link.html
Disallow: /old_pages.html
Disallow: /index_0*
Disallow: /*index_0*$
Disallow: /digital_camera/
Disallow: /lab/
Disallow: /xmas_*
Disallow: /~doug/archives/