01-Nov-2005
Bad habits emerge...
See my comments on Google from 25 Oct 2005. I sent them a message, and this was their reply:
As you may know, Google finds sites and FTPs for our index when our robots crawl the web, following hyperlinks. To restrict crawling of your ftp server, please disallow anonymous access to the server. This will prevent our robots from crawling it.
I know they do, and therefore I have a robots.txt on each of the websites to keep them from crawling some paths. I think they obey the rules in robots.txt, because the paths I blocked do not appear in their index. I do not know exactly how their bots work, but from what I see in the firewall and web server logs, the bot connects to port 80, reads robots.txt and follows the allowed links.
But for anonymous FTP sites, I think they use a different approach: they just connect to port 21 and crawl down the directory tree. Removing anonymous FTP just to keep the bot off the anonymous FTP site is unacceptable. I told them:
I consider this unacceptable. I can (I hope) block unwanted crawling of the website by specifying the acceptable paths in robots.txt. The very same method should be used when crawling anonymous FTP sites. I may need anonymous FTP, but I do not want it crawled.
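To illustrate why the same method could carry over to FTP, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a hypothetical robots.txt with made-up paths (`/private/`) and made-up hosts (`www.example.com`, `ftp.example.com`); none of these come from my actual setup. The point is only that robots.txt rules match on the path part of a URL, so nothing in the rule format itself ties them to HTTP. This says nothing about how Google's crawler actually treats FTP servers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if the rules allow `agent` to fetch `url`.

    RobotFileParser only compares the path part of the URL against the
    rules, so the same check works for http:// and ftp:// URLs alike.
    """
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.example.com/public/index.html",
        "http://www.example.com/private/notes.html",
        "ftp://ftp.example.com/pub/readme.txt",
        "ftp://ftp.example.com/private/archive.zip",
    ):
        verdict = "allowed" if may_crawl(rules, "Googlebot", url) else "blocked"
        print(url, "->", verdict)
```

With these sample rules, both `/private/` URLs are blocked and the other two are allowed, whether the scheme is http or ftp. A crawler would of course have to agree on where to fetch the rules from on an FTP server; that part is an assumption here, not an existing convention I know of.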
I will keep a keen eye on Google's access to the site.
Besides that, it's not the access itself that bothered me. It's that they seem to have tried to access devices (and failed). So they simply missed the point.