16-Mar-2006
Searching documents
For a project, I'm looking for a way to easily search free text in an application's documents (plain ASCII, created on VMS) using a web interface. I found htdig via the Apache web pages at http://www.pdv-systeme.de/users/martinv/htdig/, and it proved very easy to set up and worked well. I do have some remarks: a few wishes are left, but these would require code changes in htsearch.
You can bet on it: once it was demonstrated, the request came to include Microsoft Word and PDF files as well. I got the utilities for that from the same site, installed them, copied some Word documents and PDF files onto the VMS box and ran the analysis job again.
Result: I could locate the Word documents using the search page! The PDF files, however, proved to be another matter. Digging through the logfile and the file contents in ASCII, I found that some are in PDF 1.0 format, and these are fine (great: the ones that worked were created on VMS using txt2pdf.exe by Craig Berry). But the ones I had copied over (all PDF 1.3, created by a Windows-based Java application) are not.
My first impression was that the conversion program was to blame, so I downloaded the latest version of the program used for the conversion (pdftotext.exe) from http://frank.harvard.edu/~coldwell/vms/xpdf.html (pdftotext is part of the xpdf package). But the problem lies elsewhere: the logfiles show that the file is actually retrieved, but for some reason it is not properly handled by htdig, the program that does the retrieval.
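To check which PDF version a file claims without paging through it in ASCII, it is enough to read its header: a PDF file starts with a line such as "%PDF-1.3". The sketch below is a minimal Python illustration under that assumption, not the tool I actually used on VMS. The function names are hypothetical, and scanning the first 1024 bytes rather than only the very start of the file is my own choice, to tolerate a few stray leading bytes.

```python
import re
import sys

# A PDF header looks like "%PDF-1.3"; capture major and minor version.
_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_version(data: bytes):
    """Return (major, minor) from a PDF header in the first 1024 bytes, or None."""
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pdf_version_of_file(path):
    """Read only the start of the file (opened in binary mode) and parse its header."""
    with open(path, "rb") as handle:
        return pdf_version(handle.read(1024))


if __name__ == "__main__":
    # Print the version claimed by each file named on the command line.
    for name in sys.argv[1:]:
        print(name, pdf_version_of_file(name))
```

Given the findings above, a file produced by txt2pdf would be expected to report (1, 0), and the ones from the Java application (1, 3).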
Time to retrieve the (VMS) files and dig into it.
MySQL database
The database has been created! A few adaptations were made to the configuration file (the database is stored in a location other than the default), so I had to change the ownership of that location. Once that was done, it was a piece of cake!
The next task is to determine what software will be used for the new look and feel of the web(s).
Security
A funny thing I found in operator.log:
%%%%%%%%%%% OPCOM 16-MAR-2006 21:13:03.24 %%%%%%%%%%%
Message from user TCPIP$SMTP on DIANA
TCPIP-W-SMTP_NOSPAMRLY, relay to <new_openrelay_test@internl.net>from client IP address 217.149.193.37 is suspected SPAM
This is my ISP testing the relay to see whether it's an open one (not allowed) or not. Of course, it isn't.
They used to test regularly, but that had been suspended for quite some time.
I found some other things as well, and I wonder whether they are something to worry about:
%%%%%%%%%%% OPCOM 16-MAR-2006 21:16:55.42 %%%%%%%%%%%
Message from user TCPIP TELNET on DIANA
TELNET Logout Request from Remote Host: athene.intra.grootersnet.nl Port: 1143
%%%%%%%%%%% OPCOM 16-MAR-2006 21:17:12.19 %%%%%%%%%%%
Message from user INTERnet on DIANA
TELNET Login from Host: CERBEROS Port: 1192
...
%%%%%%%%%%% OPCOM 16-MAR-2006 22:22:02.97 %%%%%%%%%%%
Message from user TCPIP TELNET on DIANA
TELNET Logout Request from Remote Host: CERBEROS Port: 1192
I know I had opened a TELNET session on Athene via my access point, ran into a problem, and so logged out and in again. It seems that somehow this was attributed to CERBEROS???
CERBEROS is the LINKSYS router and that has no TELNET software...
Mistake
I have two disk shelves connected to the HSZ50, but one of them doesn't contain production drives, so to save on power costs I unplugged it.
Then I made a mistake: I hit the reset button on the HSZ50. Diana, however, just signalled that it had lost its connection to the system disk, and carried on:
%%%%%%%%%%% OPCOM 16-MAR-2006 22:41:57.66 %%%%%%%%%%%
Device $116$DKA100: (DIANA PKB) is offline.
Mount verification is in progress.
%%%%%%%%%%% OPCOM 16-MAR-2006 22:41:57.68 %%%%%%%%%%%
Mount verification has completed for device $116$DKA100: (DIANA PKB)
This message occurred a number of times, but that was about it. No crash, no corruption...