SYSMGR

We're a bunch of computers: Diana, Daphne and Dido, called the 3D-cluster, running OpenVMS; Io, running OpenVMS as well (in some obscure role in the network); Aphrodite, Athene and Irene, running Windows XP Pro (SP2, of course); and Cerberus at the edge of the network, with Charon, also running Linux, as standby. SYSMGR takes care of us.

Monday, November 7

07-Nov-2005

A weekend job:
Hardware setup
Made a number of power cords; it's cheaper to make them than to buy ready-made ones - I only needed to buy the plugs, since I already had the wire. Three for now, where I need five (and one spare, in case Charon is needed again), because the machine-side plugs were hard to get.
3D cluster in view...
Dido has now been set up - that is, the basics only: no network (DECNet/TCPIP), no queues... But all licences have been loaded and Dido is part of the cluster. It's a 144Mb box (2x64 + 4 x 8Mb), so if my spare memory fits, it might even grow to 256. I still have to look at Daphne's memory configuration; it might be similar, or smaller - in either case, an upgrade to more memory might be possible. Since the SIMMs were originally meant for Diana, it might work. Then I would have a real 3D cluster (Diana, Daphne and Dido, all 256Mb - and all 8.2).
..but needs preparation
Removed the graphics card and installed a DE500B NIC instead (the machine has only 2 PCI slots: one for the Differential SCSI adaptor, and one for the second network card). The on-board 10Mb NIC will be used as a "system bus" to the other cluster members, the other will be set to 100Mb Full Duplex for normal communication. The same will be done on Daphne, and the opposite on Diana (where an additional DE500x will be set to 10Mb Full or Half Duplex, as are the on-board ones on the Alphastations, since those do not support 100Mb FD, as Diana's built-in one does).
I got the HSZ50 and BA356 reference manuals via ITRC, so the next thing to do - after I have made a number of extra power cords - is configuring the controller and the disks behind it.
A test of Cerberus
It seems there is no trouble accessing the company's secured POP channel via Cerberus, since the site could be accessed from Diana - and I got the mail on Io. Wait and see; it might have come in some other way.
Access to any external resource seems faster once a name is resolved. The issue might be that Cerberus cannot be used as a forwarder; perhaps I need to do some extra setup on Cerberus, or have Diana (or another machine) do all DNS - including external. No real problem, I think, once all VMS machines have been updated to 8.2.
Access to Io for mail (the test of Communigate!) is slower and needs a re-load, but that could be caused by attempts to contact Charon first. So Io's configuration needs to be checked.
Telnet access (to Diana, for now) remains blocked, for some reason. Tried that over the weekend, but I couldn't find a clue in Cerberus' log as to whether it was blocked there. This is undesirable, so it needs to be solved. Trying to find out why it is not accepted.
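One rough way to put numbers on the "slow until the name is resolved" impression is to time the same lookup twice. The sketch below is only an illustration: it assumes a machine with Python available, uses whatever resolver that machine is configured with, and the helper names `time_lookup` and `compare_first_and_repeat` are made up for this note. A faster second lookup only hints at a cache somewhere along the path; it does not prove where the delay sits.

```python
import socket
import time


def time_lookup(name, resolve=socket.gethostbyname):
    """Return (address, seconds) for one name lookup.

    'resolve' defaults to the system resolver; it is a parameter so that
    another lookup function can be swapped in for comparison.
    """
    start = time.perf_counter()
    address = resolve(name)
    return address, time.perf_counter() - start


def compare_first_and_repeat(name, resolve=socket.gethostbyname):
    """Time the same lookup twice and return (first_seconds, repeat_seconds)."""
    _, first = time_lookup(name, resolve)
    _, repeat = time_lookup(name, resolve)
    return first, repeat
```

Running `compare_first_and_repeat` against a few external names, once with Cerberus as the configured DNS server and once with another server, would give a simple basis for comparison.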
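A quick probe can at least show how the connection fails. The sketch below is an assumption-laden illustration, not a diagnosis tool: the function name `probe_tcp` is invented here, port 23 is the standard telnet port, and the interpretation is only a rule of thumb (a refusal usually means something answered and rejected the connection, a timeout usually means packets were dropped silently somewhere on the way).

```python
import socket


def probe_tcp(host, port=23, timeout=5.0):
    """Try a TCP connect and return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        # create_connection raises on failure; on success the socket is
        # closed again as soon as the 'with' block ends.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running the same probe from inside the network and from outside, through Cerberus, would show whether the result differs on either side of the firewall.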
But for the rest, all seems Ok. Not just from Diana, Io or Aphrodite, but from Hera as well. The kids will like that!
(Did I do something else? Of course I did. Saturday I cleaned the computer room and workplace, and Sunday I made a trip of 15 miles out by bike and 15 miles back on foot.)
Update: PROBLEMS
During the day the internet connection broke: no web site was accessible. My first thought was "power failure", but it turned out that Diana was running smoothly. Examining Cerberus showed that the external address was zeroed, but the DNS references at the ISP were still correct. Power-cycled the router, and the address was restored.

Io seems to have trouble as well, with disks:

%%%%%%%%%%% OPCOM 7-NOV-2005 13:19:27.01 %%%%%%%%%%%
(from node IO at 7-NOV-2005 12:30:23.72)
Device $2$DKA0: (IO) is offline.
Mount verification is in progress.

%%%%%%%%%%% OPCOM 7-NOV-2005 13:43:20.55 %%%%%%%%%%%
Logfile time stamp

%%%%%%%%%%% OPCOM 7-NOV-2005 14:19:27.01 %%%%%%%%%%%
(from node IO at 7-NOV-2005 13:30:23.73)
Device $2$DKA100: (IO) is offline.
Mount verification is in progress.

%%%%%%%%%%% OPCOM 7-NOV-2005 14:43:20.65 %%%%%%%%%%%
Logfile time stamp

%%%%%%%%%%% OPCOM 7-NOV-2005 15:19:28.05 %%%%%%%%%%%
(from node IO at 7-NOV-2005 14:30:24.79)
Mount verification has aborted for device $2$DKA100: (IO)

and the last has happened a number of times. Now it looks Ok - but DKA100 could be broken, and that's the one that holds Communigate.
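To see how often each disk went offline or had its mount verification aborted, the OPCOM output can be tallied per device. The sketch below is a minimal illustration that assumes the messages look exactly like the lines quoted above; the function name `summarize_opcom` is invented for this note, and other OPCOM message layouts are simply ignored.

```python
import re
from collections import Counter

# Patterns are based only on the message lines quoted in this post.
OFFLINE = re.compile(r"^Device (\S+): \((\w+)\) is offline\.$")
ABORTED = re.compile(
    r"^Mount verification has aborted for device (\S+): \((\w+)\)$"
)


def summarize_opcom(text):
    """Count 'offline' and 'aborted' events per device in an OPCOM excerpt.

    Returns a Counter keyed by (device, event), for example
    ("$2$DKA100", "aborted").
    """
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        match = OFFLINE.match(line)
        if match:
            counts[(match.group(1), "offline")] += 1
            continue
        match = ABORTED.match(line)
        if match:
            counts[(match.group(1), "aborted")] += 1
    return counts
```

Fed with the full operator log, this would show whether the aborts are confined to DKA100 or whether DKA0 is affected too.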

Not sure what happened in either case. Cerberus' log doesn't show anything - and there is no time to investigate Io at the moment, it's too late. Will see to that tomorrow.
