SYSMGR

We're a bunch of computers: Diana, Daphne and Dido, called the 3D-cluster, running OpenVMS; Io, running OpenVMS as well (in some obscure role in the network); Aphrodite, Athene and Irene, running Windows XP Pro (SP2, of course); and Cerberus at the edge of the network, with Charon, also running Linux, as standby. SYSMGR takes care of us.

Tuesday, November 1

01-Nov-2005

Bad habits emerge...
Read my comments on Google on 25 Oct 2005. I sent them a message and this was their reply:

As you may know, Google finds sites and FTPs for our index when our robots crawl the web, following hyperlinks. To restrict crawling of your ftp server, please disallow anonymous access to the server. This will prevent our robots from crawling it.

I know they do, and therefore I have a robots.txt on each of the webs to keep them from crawling some paths. I think they obey the rules in robots.txt, because the paths I blocked do not occur in their indexes. I do not know how their bots work, but from what I see in the logs of the firewall and the webserver, the bot accesses port 80, reads robots.txt and follows the allowed links.
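To illustrate what I assume a well-behaved bot does with that file, here is a minimal sketch in Python using the standard urllib.robotparser module. The rules and paths are made up for the example; they are not the ones on my webs, and I do not claim this is how Google's bot is actually built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the blocked paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /logs/
"""


def allowed(path, agent="Googlebot"):
    """Return True if the rules above permit `agent` to fetch `path`."""
    parser = RobotFileParser()
    # Parse the rules from memory instead of fetching them over port 80.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/private/notes.html"):
        print(path, "allowed" if allowed(path) else "blocked")
```

A polite crawler would run a check like this before following each link, which would explain why the blocked paths never show up in the index.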

But for anonymous FTP sites, I think they use a different approach: just access port 21 and crawl down. Removing anonymous FTP just to keep the bot off the anonymous site is unacceptable. I told them:

I consider this unacceptable. I can block (I hope) unwanted crawling of the website by specifying acceptable paths in robots.txt. The very same method should be used when crawling anonymous FTP sites. I may need anonymous FTP but do not want them crawled.

I will keep a keen eye on Google's access to the site.
Besides that: it's not the access itself that bothered me. It's the devices they tried to access (and that failed). So they just missed the point.
