ProLiant Cluster Series S Model 100 User Guide
Third Edition (July 1998)
Part Number 340704-003
Compaq Computer Corporation
Notice
The information in this publication is subject to change without notice.
COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR
EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR
CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR
USE OF THIS MATERIAL. THIS INFORMATION IS PROVIDED "AS IS" AND COMPAQ
COMPUTER CORPORATION DISCLAIMS ANY WARRANTIES, EXPRESS, IMPLIED OR
STATUTORY AND EXPRESSLY DISCLAIMS THE IMPLIED WARRANTIES OF
MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, GOOD TITLE AND AGAINST
INFRINGEMENT.
This publication contains information protected by copyright. No part of this publication may be
photocopied or reproduced in any form without prior written consent from Compaq Computer
Corporation.
© 1998 Compaq Computer Corporation.
All rights reserved. Printed in the U.S.A.
The software described in this guide is furnished under a license agreement or nondisclosure agreement.
The software may be used or copied only in accordance with the terms of the agreement.
Compaq, Deskpro, Fastart, Compaq Insight Manager, Systempro, Systempro/LT, ProLiant, ROMPaq,
QVision, SmartStart, NetFlex, QuickFind, PaqFax, and ProSignia are registered with the United States
Patent and Trademark Office.
Netelligent, Systempro/XL, SoftPaq, QuickBlank, and QuickLock are trademarks and/or service marks of
Compaq Computer Corporation.
Microsoft, MS-DOS, Windows, and Windows NT are registered trademarks of Microsoft Corporation.
Other product names mentioned herein may be trademarks and/or registered trademarks of their
respective companies.
Contents
About This Guide
  Audience
  Scope
  Additional Resources
  Text Conventions
  Symbols in Text
  Getting Help
  Compaq Website
  Telephone Numbers
Part I - Introduction to Compaq ProLiant Clusters
Chapter 1
Clustering Overview
  Clusters Defined
  Causes of Computer Downtime
  Software Failures
  Planned Service
  Hardware Failures
  Environmental Causes
  Cost of Computer Downtime
  Productivity Loss
  Cost of Servicing a Failed System
  Lost Transactions
  Customer and End User Dissatisfaction
  Availability Concepts
  What Is High Availability?
  What Is Scalability?
  Summary
Writer: Caroline Juszczak Project: Compaq ProLiant Cluster Series S Model 100 Comments: 340704-003
File Name: A-FRNT.DOC Last Saved On: 7/1/98 3:21 PM
COMPAQ CONFIDENTIAL - NEED TO KNOW REQUIRED
Chapter 2
Architecture of the Compaq ProLiant Cluster/S100
  Compaq ProLiant Servers
  Clustering Shared Storage
  Compaq ProLiant Storage Systems
  Compaq SMART-2 Array Controllers
  Compaq Recovery Server Option
  On-Line Storage Controller Recovery Option
  SCSI Disks
  Shared Storage and Microsoft Cluster Server
  Cluster Interconnect
  Recovery Server Interconnect versus Compaq ProLiant Cluster/S100 Cluster Interconnect
  Interconnect Adapters
  Private vs. Public Interconnect
  Connecting the Interconnect Adapters
  Increasing Availability of Intra-Cluster Communication
  Interconnect Bandwidth
  Local Area Network
  Software Components
  Microsoft Software
  Compaq Software
  Application Software
Chapter 3
Microsoft Cluster Server and Compaq ProLiant Cluster/S100
  High Availability Storage
  Compaq Recovery Server Option
  Terminology
  Microsoft Cluster Server
  Cluster Group Concepts
  Cluster Failover/Failback Concepts
  Failover
  Failback
  Clustering Applications and Services Concepts
  Cluster-Aware Applications
  Non-Cluster Aware Applications
  Cluster Aware Databases
Chapter 4
Designing Your Compaq ProLiant Cluster/S100
  Planning Considerations
  Cluster Configurations
  Cluster Groups
  Configuring Applications and Services
  Reducing Single Points of Failure
Part II - Clustering Planning and Installation
Chapter 5
Capacity and Failover/Failback Planning
  Node Capacity
  Shared Storage Capacity
  Networking Capacity
  Network Considerations
  Network Configuration
  Migrating Network Clients
  Failover/Failback Planning
  Performance After Failover
  Cluster Server Thresholds and Periods
  Failover of Directly Connected Devices
  Manual vs. Automatic Failback
  Failover and Failback Policies
Chapter 6
Setting Up Your Compaq ProLiant Cluster/S100
  Installation Overview
  Installing the Hardware
  Verifying the Correct Level of Firmware
  Setting Up the Nodes
  Server Interconnect Card
  Setting up the Storage System
  Installing the Software
  Prerequisites
  Software Installation Procedure
  Uninstalling the Compaq ProLiant Cluster/S100
  Verifying the Cluster Installation
  Verifying Creation of the Cluster
  Verifying Node Failover
  Verifying Network Client Failover
  Setting up Cluster Groups and Cluster Resources
Part III - Cluster Management
Chapter 7
Managing Your Compaq ProLiant Cluster/S100
  Cluster Management Concepts
  Managing a Cluster Without Interrupting Cluster Services
  Managing a Cluster in a Degraded Condition
  Managing Network Clients Connected to a Cluster
  Remotely Managing a Cluster
  Cluster Events
  Uses of Microsoft Cluster Administrator
  Compaq Extensions to Cluster Administrator
  Modifying Physical Cluster Resources
  Removing Shared Storage
  Adding A Shared Storage System
  SMART Array Expansion
  Physically Replacing a Cluster Node
  Backing Up Your Cluster
  Server-Based Backup
  LAN-Based Backup
  Failure During Backup
  Managing Cluster Performance
Chapter 8
Troubleshooting Your Compaq ProLiant Cluster/S100
  Troubleshooting Installation Problems
  You Receive the Error "RPC Server is Unavailable"
  Cluster Administrator Does Not Appear in the Start Menu
  Node Performance is Sluggish and the Node Fails
  Cluster Server Installation Will Not Complete on the First Node
  The Compaq ProLiant Cluster/S100 Resource(s) Cannot Be Brought Online
  Clients Do Not See the Cluster
  Troubleshooting Node-to-Node Problems
  The Resources Failed Over and the Nodes do not See Each Other
  The Second Node Cannot Join the Cluster
  Troubleshooting Shared Storage Problems
  A Cluster Node is Not Available for This Operation
  Troubleshooting Client-to-Cluster Connectivity Problems
  Clients Do Not See the Cluster
  Clients Do Not See Virtual Servers
  Clients Cannot Access Any Resources on a Cluster Node
  Clients Cannot Access Cluster Resources
  Clients Cannot Access a Group That Has Failed Over
  Troubleshooting Cluster Group and Cluster Resource Problems
  Troubleshooting Other Potential Problems
  An Application Starts but Cannot Be Closed
  A Resource Hangs When Taken Offline
  An IP Address Added to a Cluster Group Fails
  A Resource Fails Over but Does Not Fail Back
Glossary
Appendix A
Cluster Configuration Worksheets
  Overview
  Cluster Group Definition Worksheet
  Shared Storage Capacity Worksheet
  Group Failover/Failback Policy Worksheet
  Preinstallation Worksheet
Index
About This Guide
This User Guide provides information about the planning, installation,
configuration, and implementation of Compaq ProLiant Clusters.
Audience
This guide contains information that may be used by network administrators,
installation technicians, systems integrators, and other technical personnel in
the enterprise environment for the purpose of cluster planning, installation,
implementation, and maintenance.
IMPORTANT: This User Guide contains installation, configuration, and
maintenance information that can be valuable for a variety of users. If you are
installing the ProLiant Cluster but will not be administering the cluster on a daily
basis, please make this guide available to the person(s) who will be responsible for
the clustered servers when you have completed the installation.
Scope
Because Windows NT-based clusters are relatively new, this guide offers
significant background information about clusters as well as basic concepts
associated with designing clusters. This guide assists you in attaining the
following objectives:
• Understanding basic concepts of clustering technology
• Recognizing and utilizing the high availability features of Compaq
  ProLiant clusters
• Planning and designing your ProLiant Cluster configuration to meet
  your business needs
• Installing and configuring your ProLiant Cluster hardware and software
• Using Compaq Insight Manager and Microsoft Cluster Server to manage
  your ProLiant Cluster
Additional Resources
For additional information, refer to documentation related to specific hardware
and software components of your ProLiant Cluster, including, but not limited
to, the following:
• Documentation related to the ProLiant servers you are clustering (for
  example, manuals, posters, Performance and Tuning guides)
• Microsoft NT 4.0/Enterprise Edition Administrator's Guide
• TechNotes and other documents available from the Compaq website
  (http://www.compaq.com)
Text Conventions
This document uses the following conventions to distinguish elements of text:
Keys                Keys appear in boldface. A plus sign (+) between two
                    keys indicates that they should be pressed
                    simultaneously.
USER INPUT          User input appears in a different typeface and in
                    uppercase.
FILENAMES           File names appear in uppercase italics.
Menu Options,       These appear in initial capital letters.
Command Names,
Dialog Box Names
COMMANDS,           These always appear in uppercase.
DIRECTORY NAMES,
and DRIVE NAMES
Type                When you are instructed to type information, type the
                    information without pressing the Enter key.
Enter               When you are instructed to enter information, type the
                    information and then press the Enter key.
Symbols in Text
These symbols may be found in the text of this guide. They have the
following meanings.
WARNING: Indicates that failure to follow directions in the warning could
result in bodily harm or loss of life.
CAUTION: Indicates that failure to follow directions could result in damage
to equipment or loss of information.
IMPORTANT: Presents clarifying information or specific instructions.
NOTE: Presents commentary, sidelights, or interesting points of information.
Getting Help
If you have a problem and have exhausted the information in this guide, you
can get further information and other help in the following locations.
Compaq Website
The Compaq website has information on this product as well as the latest
drivers and Flash ROM images. You can access the Compaq website by
logging on to the Internet at http://www.compaq.com.
Telephone Numbers
For the name of your nearest Compaq Authorized Reseller:
In the United States, call 1-800-345-1518
In Canada, call 1-800-263-5868
For Compaq Technical Support:
In the United States and Canada, call 1-800-386-2172
Elsewhere, call one of the numbers listed in the following table.
Compaq Worldwide Technical Support Telephone Numbers
Location                   Voice                 FAX
APD                        65-7503030            65-7504909
Argentina                  54-1 313 3100         54-1 313 3100 Ext 21
Australia                  61-2-9911-1955        61-2-9911-1900
Austria                    0222-87816-16         0222-87816-82
Bahrain                    973-210-214
Belgium                    (02) 716-96-96        (02) 725-22-13
Brazil                     55 11 5505-3600       55 11 5505-3922
  Ext 4336
Canada                     1-800-386-2172
Caribbean                  1-800-345-1518
Central America            281-378-2206
Chile                      562-274-3007
China                      86-10-834-6721        86-10-834-6713
Colombia                   571-345-0266          571-312-0157
Czech Republic             42-2-232-8772         42-2-232-8773
Denmark                    45-90-4545            45-90-4595
Ecuador                    593-2504540
Europe/Middle East/Africa  (49) 089-9933-2891
Finland                    9800-206-720          90-6155-9899
                           (+358-800-1-206720)   (+358-0-61559899)
France                     (33 1) 41-33-4455     (33 1) 41-33-4263
Germany                    0180-5-212111         089-9933-3399
Hong Kong                  852-90116633          852-28671734
Hungary                    36-1-201-8776         36-1-201-9696
India                      (91-80) 559-6023
Italy                      392-57-90300          392-575-00686
Japan                      0120-101589           +81 3-5402-5959
Korea                      82-2-523-3575         82-2-3471-0321
Malaysia                   (603) 718-1636
Mexico                     (525) 229-7910        (525) 229-7988
Netherlands                06-91681616           06-8991116
New Zealand                649-307-3969
Norway                     22-072-020            22-072-021
Poland                     48-2-630-3535         48-2-630-3553
Portugal                   351-1-4120132         351-1-4120654
Singapore                  65-7503030            65-7504909
South Africa               +27-11-728-6999       +27-11-728-3335
Spain                      341-640-1302          341-640-0124
Sweden                     (46) 8 703 5240       (46) 8 703 5222
Switzerland                411 838 410/2222      01-837-0969
Taiwan                     (886) 2-3761170       (886) 2-7322660
Thailand                   62-2-679-6222         62-2-679-6220
United Kingdom             44-81-332-3888        44-81-332-3409
United States              1-800-386-2172        1-800-345-1518
Venezuela                  (582) 953.69.44       (582) 952.86.70
PART I
Introduction to
Compaq ProLiant Clusters
Chapter 1
Clustering Overview
The computer industry has been using a wide range of solutions to counteract
the effects of computer system downtime for several years. Previously, these
solutions have been difficult to set up and expensive to maintain. Historically,
only mission-critical applications, such as those controlling stock exchange
trading floors and aerospace mission controls, were important enough to justify
expensive, proprietary clustering solutions.
As businesses have increased their reliance on computer systems in day-to-day
operations, the amount of acceptable downtime has decreased. Today, another
class of applications exists. They are business-critical applications: those that
are key to business success but not significant enough to justify the high price
of a proprietary clustering solution. More applications are becoming business-
critical; their failure causes lost revenue, decreased productivity, and,
potentially, customer dissatisfaction.
Due to the increasing demand to keep business-critical applications available,
clustering technology is entering mainstream, industry-standard computing.
These new clustering solutions use industry-standard hardware and software,
thereby providing key clustering features at a lower price than proprietary
clustering systems. They also give you the opportunity to increase the
usefulness and life span of the software applications you use.
Before examining the features and benefits of Compaq ProLiant Clusters, it is
helpful to understand the concepts and terminology of these traditional cluster
systems. Concepts and terminology addressed in this chapter include:
• Clusters
• Causes and costs of computer downtime
• Availability
Clusters Defined
Clustering is an integration of software and hardware technologies that enables
a set of loosely coupled servers and storage to present a single image to clients
and to operate as a single system. As a cluster, the group of servers offers a
level of availability and scalability that far exceeds the level obtained if each
cluster node operated as a standalone server. To end-users, this integration
translates into increased performance and data availability.
Figure 1-1. Diagram of a Cluster (labels: Cluster, Shared Storage, Node1, Node2, Interconnect, LAN, Clients)
Causes of Computer Downtime
Computer downtime is the period of time that a computer system cannot meet
the requests of its users. Computer downtime is inversely related to availability,
which characterizes the amount of time a computer system can meet the needs
of its users. So what causes a lack of data and application availability, and
thereby fosters the need for clustering? The following are the leading causes of
downtime:
• Software failures
• Planned service
• Hardware failures
• Environmental causes
Although the majority of failures occur in hardware, the majority of downtime
is due to software failures.
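The inverse relationship between downtime and availability described at the start of this section can be made concrete with a short calculation. The following Python sketch is illustrative only: the function names and the 8,760-hour year are assumptions made for the example, not values or tools taken from this guide.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return the fraction of total time the system could meet user requests."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total time must be positive")
    return uptime_hours / total


def downtime_per_year(avail: float) -> float:
    """Return the hours of downtime per year implied by an availability fraction."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # As yearly downtime grows, availability falls, and vice versa.
    for downtime in (1.0, 10.0, 100.0):
        a = availability(HOURS_PER_YEAR - downtime, downtime)
        print(f"{downtime:6.1f} h down per year -> {a:.4%} available")
```

Running the script prints one line per downtime figure, which makes it easy to see how quickly a few hours of downtime erode availability.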
Software Failures
The most prominent software failure that affects ongoing operation is a hang
condition brought on by a processing error in application software or in the
operating system. Because clustering provides a mechanism to automatically
fail over processes when a discernible software failure occurs, the overall
system operation can continue with minimal or no interruption.
Planned Service
All computer systems require downtime for service. A typical service event
might include the upgrade of a hardware component or replacement of an old or
broken hardware component. Service events are also used to install new
software, upgrade existing software, patch software with vendor-supplied fixes,
or even to modify application or operating system settings.
In a cluster, a single server (cluster node) can be taken offline, while another
server (the partner cluster node) takes on the workload of the offline server.
This configuration allows planned service to occur with minimal or no
interruption to the client's use of cluster-aware applications and data.
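The planned-service scenario above, in which one node is taken offline and its partner takes on the workload, can be pictured with a toy model. This Python sketch is not how Microsoft Cluster Server is implemented. The class names, method names, and workload labels are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node and the workloads it runs."""
    name: str
    online: bool = True
    workloads: list = field(default_factory=list)


class TwoNodeCluster:
    """Toy two-node cluster: each node has exactly one partner."""

    def __init__(self, a: Node, b: Node):
        self.nodes = {a.name: a, b.name: b}

    def partner_of(self, name: str) -> Node:
        (other,) = [n for n in self.nodes.values() if n.name != name]
        return other

    def take_offline(self, name: str) -> None:
        """Move all workloads to the partner, then mark the node offline."""
        node = self.nodes[name]
        partner = self.partner_of(name)
        if not partner.online:
            raise RuntimeError("partner node is offline; no node can take the workload")
        partner.workloads.extend(node.workloads)
        node.workloads.clear()
        node.online = False

    def bring_online(self, name: str) -> None:
        """Mark a serviced node as available again (workloads are not moved back)."""
        self.nodes[name].online = True


if __name__ == "__main__":
    cluster = TwoNodeCluster(
        Node("Node1", workloads=["file share"]),
        Node("Node2", workloads=["database"]),
    )
    cluster.take_offline("Node1")  # planned service on Node1
    print(cluster.nodes["Node2"].workloads)
```

In this model the serviced node returns with no workloads. Moving them back is a separate step, which the guide treats under failback.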
Hardware Failures
The main causes of hardware problems are:
- Disk drive failures
- Computer system power supply failures
- Cooling fan failures
- Memory and bus errors
- Adapter and controller card errors
The figure below shows that disk drive failures and power outages cause the
vast majority of hardware downtime.
Figure 1-2. Causes of Hardware Downtime (pie chart: Disk Drives 55%, Power
Outages 28%, I/C Cards 8%, Fans 5%, Memory 4%)
Clustering provides a mechanism to detect and analyze hardware errors. If it is
determined that the error will result in computer system downtime, the cluster
fails over cluster-aware applications from one server to another, allowing the
overall system operation to continue with little or no interruption.
Technologies other than clustering can also alleviate the effects of a hardware
failure. For example, use of Redundant Array of Inexpensive Disks (RAID)
levels 1, 4, or 5 minimizes the impact drive failures have on the system. Use of
Uninterruptible Power Supplies (UPSs) gives system administrators time to
cleanly shut down the system or to find an alternate power supply when power
outages occur. Use of redundant network interface controllers (NICs) allows
sustainable network traffic even when one of the controllers experiences a
failure.
Still, clustering addresses more than just hardware failures and therefore is an
important addition to any business-critical computer system. Clustering
provides maximum protection against operating system, application, and
hardware failures.
Environmental Causes
Some computer downtime is a result of environmental causes. Examples of
environmental causes are:
- Excessive humidity
- Water damage
- Extreme temperatures (high or low)
- Physical damage
- Power interruptions originating from outside power lines
- Rodent attack
- Dust/airborne particles
- Vandalism
- Natural disasters
If the computer system is contained in a "computer room" environment, the
likelihood of encountering an environmental problem is minimal. Still, if one
does occur and is detectable by the cluster management software, proper
actions can be taken by the cluster to ensure continued operation.
Cost of Computer Downtime
Now that the causes of downtime are known, the next questions are: "How is
my computer system affected by these causes?" and "What is the cost of
downtime?"
Several factors must be included in the cost of downtime formula:
- Productivity loss
- Cost of servicing the failed system
- Lost transactions
- Customer or end user dissatisfaction
Each factor is weighted differently, depending on the critical nature of each as
it pertains to your business and to your specific application systems. For
example, downtime during peak hours of a point-of-sale operation would have
a much greater impact on customer satisfaction issues than downtime in an end-
of-day email server backup operation. To understand the true cost of computer
downtime in your business environment, you must examine each of the
following factors as they apply to your business-critical applications.
Productivity Loss
To calculate costs associated with the loss of productivity during system
downtime:
1. Determine the average hourly rate of the employees using the system.
2. Multiply the average hourly rate by the number of employees who are
unable to perform their work.
3. Multiply again by the number of hours the system is down.
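The three steps above amount to a single multiplication. The following is a minimal sketch in Python; the function name and the sample figures are hypothetical and serve only to illustrate the arithmetic.

```python
def productivity_loss(avg_hourly_rate: float, idle_employees: int, hours_down: float) -> float:
    """Lost productivity = average hourly rate x idle employees x hours of downtime."""
    return avg_hourly_rate * idle_employees * hours_down


# Hypothetical example: 40 employees averaging $25/hour, idle for a 3-hour outage.
loss = productivity_loss(avg_hourly_rate=25.0, idle_employees=40, hours_down=3.0)
print(f"Estimated productivity loss: ${loss:,.2f}")
```

Substitute the rates and headcount for your own business to obtain a usable estimate.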
Cost of Servicing a Failed System
Service technicians and system administrators are usually required to repair a
failed system.
1. Determine, on average, how much it costs per hour to have these people
repair the system.
2. Multiply the average hourly rate by the number of technicians and
system administrators working to solve the problem.
3. Multiply again by the number of hours the system is down.
Lost Transactions
While the system is up, it is performing transactions. These may be payroll
calculations for the HR department, sales transactions at a video rental store, or
ATM requests from bank customers. When the system is down, no transactions
are being performed. To calculate the cost of lost transactions:
1. Determine what business transactions are performed by this computer
system.
2. Apply an estimate of lost revenue per hour to these transactions.
3. Multiply the estimate of lost revenue by the number of hours the system
is down.
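The quantifiable factors described so far (productivity loss, cost of servicing, and lost transactions) can be combined into one rough estimate. The sketch below assumes a simple unweighted sum. The guide notes that each factor is weighted differently for each business, so treat the class name, the field names, and the sample figures as hypothetical placeholders to adapt.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    hours_down: float
    employee_hourly_rate: float   # average rate of employees unable to work
    idle_employees: int
    service_hourly_rate: float    # average rate of technicians/administrators
    service_staff: int
    lost_revenue_per_hour: float  # estimate for the transactions this system performs

    def productivity_loss(self) -> float:
        return self.employee_hourly_rate * self.idle_employees * self.hours_down

    def service_cost(self) -> float:
        return self.service_hourly_rate * self.service_staff * self.hours_down

    def lost_transactions(self) -> float:
        return self.lost_revenue_per_hour * self.hours_down

    def total(self) -> float:
        # Assumption: an unweighted sum; apply your own weights as needed.
        return self.productivity_loss() + self.service_cost() + self.lost_transactions()


# Hypothetical 3-hour outage.
outage = Outage(hours_down=3.0, employee_hourly_rate=25.0, idle_employees=40,
                service_hourly_rate=60.0, service_staff=2, lost_revenue_per_hour=1500.0)
print(f"Estimated cost of downtime: ${outage.total():,.2f}")
```

Customer and end user dissatisfaction is deliberately left out of the sum because it is difficult to express as a dollar amount.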
Customer and End User Dissatisfaction
Computer downtime causes varying levels of customer and end user
dissatisfaction. While dissatisfaction can be difficult to express in specific
dollar amounts, it is important to understand its effects on the financial aspects
of your business. If it is unreasonable in your business environment to assign a
specific cost to customer and end user dissatisfaction as a result of computer
downtime, at least be aware that these "hidden costs" exist.
Availability Concepts
The previous section discussed the causes and costs of downtime and the
reasoning for increased availability. The next section discusses clustering
concepts that minimize the effects of downtime.
What Is High Availability?
Simply defined, availability is the measure of how well a computer system can
continuously deliver services to clients. This measure is dependent upon the
system's ability to prevent and recover from failures or "faults."
There are different classes of availability defined by a system's critical
application requirements:
- Applications that require 100% uptime, where a failing component or
  subsystem never interrupts the system's operation. For example, the
  applications used on the stock exchange trading floor or mission control
  in the aerospace industry are termed mission-critical.
- Applications that can tolerate minimal interruption. For example,
  Electronic Fund Transfers (EFT) in the banking industry are termed
  business-critical applications. The vast majority of applications fall into
  this business-critical category.
"High Availability" and "Fault Tolerant" are commonly used terms that
describe these different classes of availability. Correct use of these terms is
shown in the table below. Table 1-1 identifies the differences between the two
in percentage of uptime.
Table 1-1
Availability Definitions
% Uptime  Downtime        Class                          Example
99.0      3.5 days/year   Conventional                   A standalone Compaq ProLiant Server
99.9      8.5 hours/year  High Availability              Compaq ProLiant Cluster/S100
99.999    5 minutes/year  Fault Tolerant (also known as  Himalaya by Tandem, A Compaq Company
                          Non-Stop or Continuous
                          Availability)
NOTE: A distinction must be made between the availability of a standalone server
versus that of a cluster. Availability of a standalone server includes only the
availability of the server itself, not the operating system, applications, or network
connections. Availability of a cluster includes not only server hardware availability,
but also availability of the operating system, the client/server applications, and to
some extent the network between the cluster and the client machines.
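The downtime column in Table 1-1 follows directly from the uptime percentage. As a rough sketch, assuming a 365-day year of round-the-clock operation (the constant and function names below are illustrative only), the conversion can be computed as follows:

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year of 24-hour operation


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per year a system is unavailable at the given uptime percentage."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


for pct in (99.0, 99.9, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime: {hours:.2f} hours/year ({hours * 60:.1f} minutes/year)")
```

Under these assumptions the results are about 3.65 days, 8.76 hours, and 5.3 minutes per year respectively, so the figures in the table are best read as rounded values.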
High-availability and fault-tolerant systems also differ in design: a
high-availability system includes many fault-tolerant features, whereas every
component of a true fault-tolerant system is fault tolerant. A
mission-critical application, which requires virtually 100% application system
uptime, will require a true fault-tolerant system. A business-critical application,
which requires less than 100% uptime, can reside on a high-availability
clustered system.
Since all of a fault-tolerant system's components, including the operating
system and application software, are redundant and must be kept running
99.999% of the time, fault-tolerant systems have traditionally been proprietary
and expensive solutions. They offer greater availability than high availability
systems, at a much higher cost.
What Is Scalability?
Scalability, as mentioned earlier in this guide, is one benefit of clustering.
Clustering for scalability means increasing performance beyond that of a single
computer node by adding more nodes to the cluster. Performance scalability
across cluster nodes is difficult to achieve and requires not only scalable
hardware, but also scalable software (for example, a parallel database).
Summary
In general, clusters can provide both high availability and scalability for
business-critical applications. In today's market, the vast majority of clusters
are employed to take advantage of the increase in availability.
The use of clusters to reduce computer system downtime has a direct impact on
a company's revenue and MIS department costs. The size of the impact
depends on your calculated costs of downtime. In most cases, the cost of
installing and maintaining a cluster will likely be more than offset by the
reduction in downtime costs. As businesses' reliance on computer systems
intensifies, the cost of downtime will increase.
The remainder of this guide covers the Compaq ProLiant Cluster Series S
Model 100.
Chapter 2
Architecture of the Compaq
ProLiant Cluster/S100
Compaq ProLiant Cluster platforms are composed of a number of different
industry-standard, industry-leading Compaq hardware products. This chapter
discusses how each of these products plays a role in bringing a complete
clustering solution to your computing environment. These products are:
- Compaq ProLiant Servers
- Compaq ProLiant Storage Systems
- Compaq SMART-2 Array Controllers
- Compaq Recovery Server Option
- Compaq Network/Interconnect Adapters
Additionally, this chapter describes the Compaq and Microsoft software
required to run a Compaq ProLiant Cluster/S100.
Writer: Caroline Juszczak Project: Compaq ProLiant Cluster S Series Model 100 User Guide Comments: 340704-003
File Name: C-CH02.DOC Last Saved On: 7/1/98 3:21 PM
Compaq ProLiant Servers
A primary component of any cluster is a server, or cluster node. The initial
release of Microsoft Cluster Server (MSCS) supports a two-node cluster, where
each node is a server. Throughout the development of MSCS, Compaq has been
a partner with Microsoft to ensure that Compaq ProLiant servers meet
clustering requirements. Compaq has logged thousands of hours testing
ProLiant Clusters, and the Compaq ProLiant Cluster/S100 has successfully
passed Microsoft's Cluster Server Certification test suite. This rigorous suite of
tests ensures that the cluster works as a whole, not just as a collection of
individual components.
The following are Compaq ProLiant servers that have passed both Microsoft's
Cluster Server Certification and Compaq's stringent cluster testing.
- Compaq ProLiant 850R
- Compaq ProLiant 1500
- Compaq ProLiant 1600
- Compaq ProLiant 2500
- Compaq ProLiant 3000
- Compaq ProLiant 4500
- Compaq ProLiant 5000
- Compaq ProLiant 5500
- Compaq ProLiant 6500
NOTE: Check the Compaq website at http://www.compaq.com to obtain the most
up-to-date list of cluster-certified servers.
In addition to the increased application and data availability enabled by
clustering, Compaq ProLiant Servers include many reliability features that
provide a solid foundation for effective clustered server solutions. (See Chapter
3, Table 3-2 for more details.)
Clustering Shared Storage
Microsoft Cluster Server is based on a cluster architecture known as Shared
Storage Clustering, where clustered servers share access to a common set of
hard drives. Microsoft Cluster Server requires all clustered (shared) data to be
stored in an external storage system.
Throughout this guide you will see references to the Compaq ProLiant
Cluster/S100 Storage System. This term refers collectively to all components
that make up the Compaq ProLiant Cluster/S100 storage system. The Compaq
ProLiant Cluster/S100 implementation of shared storage relies on the
following components: a Compaq ProLiant Storage System, a Compaq SMART-2
Array Controller in each cluster node, and the switching mechanism of the
Compaq Recovery Server Option installed in each storage system.
Figure 2-1. Compaq ProLiant Cluster/S100 Shared Storage Diagram with ProLiant
Storage System (labels: Node1 and Node2, each with a SMART-2 controller; one
ProLiant Storage System with positions numbered 1 through 7)
A distinction must be drawn between the shared storage used by the Compaq
ProLiant Cluster/S100 and the shared storage defined in Microsoft Cluster Server
documentation. Cluster Server relies on an ability to change the access path to
physical storage units to obtain the shared characteristic required for clustering.
In some products, the physical hardware enables Cluster Server to share storage
at the physical disk level. The access to each physical disk can be shared among
the cluster nodes. This sharing allows one cluster node to access data on one
physical disk while the other cluster node accesses data on a different disk in
the same storage system.
Compaq ProLiant Cluster/S100 enables Cluster Server to share storage at the
physical storage system level, not at a disk level. All disks attached to a
SMART-2 Controller operate as a group. This sharing allows one cluster node
to access all disks in a single storage system; the other cluster node does not
have access to any of the disks until Cluster Server performs a failover of this
storage resource. The possible effect this may have on the definition of
your cluster groups is discussed in Chapter 4.
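The difference between disk-level and storage-system-level sharing can be pictured with a toy model. The following Python sketch is purely illustrative: the class and method names are hypothetical and do not correspond to any Cluster Server or Compaq interface. It shows only that all disks in a storage system follow a single owning node and move together on failover.

```python
class SharedStorageSystem:
    """Toy model: every disk in the storage system follows one owning node."""

    def __init__(self, disks, owner):
        self.disks = list(disks)
        self.owner = owner

    def can_access(self, node, disk):
        # A node reaches a disk only if it currently owns the whole storage system.
        return disk in self.disks and node == self.owner

    def fail_over(self, new_owner):
        # All disks move as a group; there is no per-disk ownership.
        self.owner = new_owner


storage = SharedStorageSystem(disks=["disk1", "disk2", "disk3"], owner="Node1")
print(storage.can_access("Node1", "disk2"))  # Node1 owns every disk
print(storage.can_access("Node2", "disk3"))  # Node2 has no access yet
storage.fail_over("Node2")
print(storage.can_access("Node2", "disk3"))  # after failover, Node2 owns every disk
```

Because ownership is tracked per storage system rather than per disk, resources that must be able to fail over independently need to reside on separate storage systems in this model.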
Compaq ProLiant Storage Systems
The ProLiant Storage System houses the SCSI disk drives. In a Compaq
ProLiant Cluster/S100 configuration, you are required to have at least one
Compaq ProLiant Storage System to create the cluster's shared storage. Refer
to the section "Setting Up the Storage System" in Chapter 6 for information on
configuring multiple external storage systems.
The following Compaq ProLiant Storage Systems are supported by the Compaq
ProLiant Cluster/S100:
Table 2-1
Supported ProLiant Storage Systems
Part Number  Part Number      Chassis Type       SCSI Interface    Required Recovery Server
(U.S.)       (International)                     Type              Option Part Number
197100       197150           Tower              Fast-SCSI-2       213817
163750       163755           Rack-Mountable     Fast-SCSI-2       213817
189600       189640           Tower              Fast-Wide SCSI-2  213817
189900       189905           Rack-Mountable     Fast-Wide SCSI-2  213817
272900       272904           Tower/F1           Fast-Wide SCSI-2  272829
272800       272804           Rack-Mountable/F1  Fast-Wide SCSI-2  272829
304110       304114           Tower/U1           Fast-Wide SCSI-2  304117
304100       304104           Rack-Mountable/U1  Fast-Wide SCSI-2  304117
IMPORTANT: The original Compaq ProLiant Storage Systems (U.S. Part Number
146700 and International Part Number 146750) are not supported by the Recovery
Server Option because these systems do not have a knock-out panel in
which to install the SCSI connector bracket assembly. Also, neither the tower nor
the rack-mount ProLiant Storage System /F2 or /U2 models are supported by the
Recovery Server Option, and therefore they are not supported by the Compaq
ProLiant Cluster/S100.
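When planning an order, the pairing in Table 2-1 can be treated as a simple lookup from storage system part number to the required Recovery Server Option part number. The sketch below transcribes the table into a Python dictionary. The function name is hypothetical, and returning `None` for unlisted part numbers is an assumption of this example.

```python
# Storage system part number -> required Recovery Server Option part number,
# transcribed from Table 2-1 (U.S. and International part numbers).
REQUIRED_RECOVERY_SERVER_OPTION = {
    "197100": "213817", "197150": "213817",  # Tower, Fast-SCSI-2
    "163750": "213817", "163755": "213817",  # Rack-Mountable, Fast-SCSI-2
    "189600": "213817", "189640": "213817",  # Tower, Fast-Wide SCSI-2
    "189900": "213817", "189905": "213817",  # Rack-Mountable, Fast-Wide SCSI-2
    "272900": "272829", "272904": "272829",  # Tower/F1
    "272800": "272829", "272804": "272829",  # Rack-Mountable/F1
    "304110": "304117", "304114": "304117",  # Tower/U1
    "304100": "304117", "304104": "304117",  # Rack-Mountable/U1
}


def required_recovery_server_option(storage_part_number: str):
    """Return the required option part number, or None if the system is not listed."""
    return REQUIRED_RECOVERY_SERVER_OPTION.get(storage_part_number)


print(required_recovery_server_option("272900"))  # a Tower/F1 system
print(required_recovery_server_option("146700"))  # original model, not supported
```

A result of `None` means the storage system does not appear in Table 2-1 and should not be used as cluster shared storage.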
For detailed information, refer to the Compaq User Guide for the ProLiant
Storage System you are employing in your Compaq ProLiant Cluster/S100.
Compaq SMART-2 Array Controllers
The SMART-2 Array Controllers are the interface between the cluster node and
the ProLiant Storage System. At least two SMART-2 Array Controllers, one for
each cluster node, are required in your Compaq ProLiant Cluster/S100. A
standard SCSI cable runs from each controller to a port in the
ProLiant Storage System.
In contrast to the SCSI cabling discussed in the Microsoft Cluster
Administrator's Guide, Compaq ProLiant Cluster/S100 uses standard SCSI
cables. Y-Cables and/or TriLink connectors are not required or supported.
The Compaq ProLiant Cluster/S100 requires that any two SMART-2 Array
Controllers sharing a ProLiant Storage System must be of the same model. For
example, a SMART-2SL in Node1 cannot be paired with a SMART-2DH in
Node2; it must be paired with a SMART-2SL in Node2.
Figure 2-2. Sample Compaq ProLiant Cluster/S100 Storage Diagram (labels: Node1
and Node2, each with a SMART-2DH controller; one ProLiant Storage System with
positions numbered 1 through 7)
SMART-2 Array Controllers contain the RAID technology used to protect the
data on your clustered disk drives. Each SMART-2 Array Controller supports
RAID 1 and 5 fault tolerant options. Some support the RAID 4 fault tolerant
option. RAID levels for the two SMART-2 Array Controllers sharing a
ProLiant Storage System must be configured identically.
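The two pairing rules (same controller model, identical RAID configuration) lend themselves to a quick pre-installation check. The following Python sketch is a hypothetical planning aid, not a Compaq utility. It simplifies each controller to a single RAID level, and all names in it are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controller:
    node: str
    model: str       # for example "SMART-2SL" or "SMART-2DH"
    raid_level: int  # simplification: one RAID level per controller


def pairing_problems(a: Controller, b: Controller) -> list:
    """List the pairing rules violated by two controllers sharing a storage system."""
    problems = []
    if a.model != b.model:
        problems.append(f"model mismatch: {a.node} has {a.model}, {b.node} has {b.model}")
    if a.raid_level != b.raid_level:
        problems.append(
            f"RAID mismatch: {a.node} uses RAID {a.raid_level}, {b.node} uses RAID {b.raid_level}"
        )
    return problems


node1 = Controller(node="Node1", model="SMART-2SL", raid_level=5)
node2 = Controller(node="Node2", model="SMART-2DH", raid_level=5)
for problem in pairing_problems(node1, node2):
    print(problem)
```

An empty list indicates that the pair satisfies both rules as modeled here. It does not confirm that the controller model is supported by your servers or that it offers the chosen RAID level.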
Much more information about SMART-2 Array Controllers is available in the
SMART-2 Controller Reference Guide for each controller model. If you
have not already read the reference guide for your controller model, it is
recommended that you do so to familiarize yourself with all the features and
benefits of SMART-2.
The SMART-2 Array Controllers supported by the Compaq ProLiant
Cluster/S100 are listed below. Be sure to consult the QuickSpecs for your
ProLiant Servers to ensure you select a SMART-2 Array Controller that is
supported by your cluster node.
- Compaq SMART-2/E
- Compaq SMART-2/P
- Compaq SMART-2DH
- Compaq SMART-2SL
For additional information, refer to the Compaq SMART-2 Installation Guide
and the SMART-2 Reference Guide for the SMART-2 Array Controllers you are
employing.
Compaq Recovery Server Option
The Recovery Server Option provides the mechanism to switch access to the
ProLiant Storage System from one cluster node to another. The primary
component needed to achieve this mechanism is a Recovery Server Switch that
must be placed in the ProLiant Storage System.
Several other components are used and are explained in detail in the Recovery
Server Option User Guide. You will need to be very familiar with the hardware
aspects of the Recovery Server Option. Because this guide does not detail the
installation of each component, Chapter 6 of this User Guide refers you to the
Recovery Server Option User Guide. Although you will need to follow the
board installation procedures in the Recovery Server Option User Guide, do
NOT follow any of its cabling procedures or software installation procedures.
Cluster Server is used as the cluster management software, so none of the
Recovery Server software and none of the Recovery Server design rules apply
to the Compaq ProLiant Cluster/S100.
IMPORTANT: Because you are using Microsoft Cluster Server, you should not
follow the software installation steps shown in the Recovery Server Option User
Guide.
The figure below depicts one of the supported ProLiant Storage Systems after
installation of a Recovery Server Option.
Figure 2-3. Fast-Wide SCSI-2 ProLiant Storage System after installation of
Recovery Server Option (labels: callouts 1 through 4; two SMART-2 Controllers,
each with Port 1 and Port 2)
On-Line Storage Controller Recovery Option
Several Compaq products use the Recovery Server Switch mentioned above.
Along with the Compaq ProLiant Cluster/S100, the Compaq On-Line Storage
Controller Recovery Option utilizes the switch.
The Compaq On-Line Storage Controller Recovery Option uses the switch as a
means to merge two SMART-2 Array Controllers into a redundant controller
pair. In such a pair, one controller is active, and the other remains in standby
mode. Should a problem occur with the active controller, the array controller
device driver switches traffic to the standby controller without loss of data or
interruption of service.
This Compaq product is noted here because it is technically infeasible to
configure a Recovery Server Switch for use in both the ProLiant Cluster/S100
and the On-Line Storage Controller Recovery Option. By configuring a ProLiant
Storage System as cluster shared storage, you exclude that storage system
from being used in conjunction with the On-Line Storage Controller Recovery
Option.