Compaq ProLiant Cluster Series S Model 100
Third Edition (July 1998)
Part Number 340704-003

Compaq Computer Corporation

Notice

The information in this publication is subject to change without notice.

COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. THIS INFORMATION IS PROVIDED "AS IS" AND COMPAQ COMPUTER CORPORATION DISCLAIMS ANY WARRANTIES, EXPRESS, IMPLIED OR STATUTORY AND EXPRESSLY DISCLAIMS THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, GOOD TITLE AND AGAINST INFRINGEMENT.

This publication contains information protected by copyright. No part of this publication may be photocopied or reproduced in any form without prior written consent from Compaq Computer Corporation.

© 1998 Compaq Computer Corporation. All rights reserved. Printed in the U.S.A.

The software described in this guide is furnished under a license agreement or nondisclosure agreement. The software may be used or copied only in accordance with the terms of the agreement.

Compaq, Deskpro, Fastart, Compaq Insight Manager, Systempro, Systempro/LT, ProLiant, ROMPaq, QVision, SmartStart, NetFlex, QuickFind, PaqFax, and ProSignia are registered in the United States Patent and Trademark Office.

Netelligent, Systempro/XL, SoftPaq, QuickBlank, and QuickLock are trademarks and/or service marks of Compaq Computer Corporation.

Microsoft, MS-DOS, Windows, and Windows NT are registered trademarks of Microsoft Corporation.

Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

Contents

About This Guide
  Audience
  Scope
  Additional Resources
  Text Conventions
  Symbols in Text
  Getting Help
    Compaq Website
    Telephone Numbers

Part I - Introduction to Compaq ProLiant Clusters

Chapter 1 Clustering Overview
  Clusters Defined
  Causes of Computer Downtime
    Software Failures
    Planned Service
    Hardware Failures
    Environmental Causes
  Cost of Computer Downtime
    Productivity Loss
    Cost of Servicing a Failed System
    Lost Transactions
    Customer and End User Dissatisfaction
  Availability Concepts
    What Is High Availability?
    What Is Scalability?
  Summary

Writer: Caroline Juszczak Project: Compaq ProLiant Cluster Series S Model 100 Comments: 340704-003 File Name: A-FRNT.DOC Last Saved On: 7/1/98 3:21 PM COMPAQ CONFIDENTIAL - NEED TO KNOW REQUIRED

Chapter 2 Architecture of the Compaq ProLiant Cluster/S100
  Compaq ProLiant Servers
  Clustering Shared Storage
    Compaq ProLiant Storage Systems
    Compaq SMART-2 Array Controllers
    Compaq Recovery Server Option
    On-Line Storage Controller Recovery Option
    SCSI Disks
    Shared Storage and Microsoft Cluster Server
  Cluster Interconnect
    Recovery Server Interconnect versus Compaq ProLiant Cluster/S100 Cluster Interconnect
    Interconnect Adapters
    Private vs. Public Interconnect
    Connecting the Interconnect Adapters
    Increasing Availability of Intra-Cluster Communication
    Interconnect Bandwidth
  Local Area Network
  Software Components
    Microsoft Software
    Compaq Software
    Application Software

Chapter 3 Microsoft Cluster Server and Compaq ProLiant Cluster/S100
  High Availability Storage
  Compaq Recovery Server Option
  Terminology
  Microsoft Cluster Server
    Cluster Group Concepts
  Cluster Failover/Failback Concepts
    Failover
    Failback
  Clustering Applications and Services Concepts
    Cluster-Aware Applications
    Non-Cluster Aware Applications
    Cluster Aware Databases

Chapter 4 Designing Your Compaq ProLiant Cluster/S100
  Planning Considerations
    Cluster Configurations
    Cluster Groups
    Configuring Applications and Services
    Reducing Single Points of Failure

Part II - Clustering Planning and Installation

Chapter 5 Capacity and Failover/Failback Planning
  Node Capacity
  Shared Storage Capacity
  Networking Capacity
  Network Considerations
    Network Configuration
    Migrating Network Clients
  Failover/Failback Planning
    Performance After Failover
    Cluster Server Thresholds and Periods
    Failover of Directly Connected Devices
    Manual vs. Automatic Failback
    Failover and Failback Policies

Chapter 6 Setting Up Your Compaq ProLiant Cluster/S100
  Installation Overview
  Installing the Hardware
    Verifying the Correct Level of Firmware
    Setting Up the Nodes
    Server Interconnect Card
    Setting up the Storage System
  Installing the Software
    Prerequisites
    Software Installation Procedure
    Uninstalling the Compaq ProLiant Cluster/S100
  Verifying the Cluster Installation
    Verifying Creation of the Cluster
    Verifying Node Failover
    Verifying Network Client Failover
  Setting up Cluster Groups and Cluster Resources

Part III - Cluster Management

Chapter 7 Managing Your Compaq ProLiant Cluster/S100
  Cluster Management Concepts
    Managing a Cluster Without Interrupting Cluster Services
    Managing a Cluster in a Degraded Condition
    Managing Network Clients Connected to a Cluster
    Remotely Managing a Cluster
    Cluster Events
  Uses of Microsoft Cluster Administrator
    Compaq Extensions to Cluster Administrator
  Modifying Physical Cluster Resources
    Removing Shared Storage
    Adding A Shared Storage System
    SMART Array Expansion
    Physically Replacing a Cluster Node
  Backing Up Your Cluster
    Server-Based Backup
    LAN-Based Backup
    Failure During Backup
  Managing Cluster Performance

Chapter 8 Troubleshooting Your Compaq ProLiant Cluster/S100
  Troubleshooting Installation Problems
    You Receive the Error "RPC Server is Unavailable"
    Cluster Administrator Does Not Appear in the Start Menu
    Node Performance is Sluggish and the Node Fails
    Cluster Server Installation Will Not Complete on the First Node
    The Compaq ProLiant Cluster/S100 Resource(s) Cannot Be Brought Online
    Clients Do Not See the Cluster
  Troubleshooting Node-to-Node Problems
    The Resources Failed Over and the Nodes do not See Each Other
    The Second Node Cannot Join the Cluster
  Troubleshooting Shared Storage Problems
    A Cluster Node is Not Available for This Operation
  Troubleshooting Client-to-Cluster Connectivity Problems
    Clients Do Not See the Cluster
    Clients Do Not See Virtual Servers
    Clients Cannot Access Any Resources on a Cluster Node
    Clients Cannot Access Cluster Resources
    Clients Cannot Access a Group That Has Failed Over
  Troubleshooting Cluster Group and Cluster Resource Problems
  Troubleshooting Other Potential Problems
    An Application Starts but Cannot Be Closed
    A Resource Hangs When Taken Offline
    An IP Address Added to a Cluster Group Fails
    A Resource Fails Over but Does Not Fail Back

Glossary

Appendix A Cluster Configuration Worksheets
  Overview
  Cluster Group Definition Worksheet
  Shared Storage Capacity Worksheet
  Group Failover/Failback Policy Worksheet
  Preinstallation Worksheet

Index

About This Guide

This User Guide provides information about the planning, installation, configuration, and implementation of Compaq ProLiant Clusters.

Audience

This guide is intended for network administrators, installation technicians, systems integrators, and other technical personnel who plan, install, implement, and maintain clusters in the enterprise environment.

IMPORTANT: This User Guide contains installation, configuration, and maintenance information that can be valuable for a variety of users.
If you are installing the ProLiant Cluster but will not be administering it on a daily basis, please make this guide available to the person(s) responsible for the clustered servers after you complete the installation.

Scope

Because Windows NT-based clusters are relatively new, this guide offers significant background information about clusters as well as basic concepts associated with designing clusters. This guide helps you attain the following objectives:

• Understanding basic concepts of clustering technology
• Recognizing and utilizing the high availability features of Compaq ProLiant clusters
• Planning and designing your ProLiant Cluster configuration to meet your business needs
• Installing and configuring your ProLiant Cluster hardware and software
• Using Compaq Insight Manager and Microsoft Cluster Server to manage your ProLiant Cluster

Additional Resources

For additional information, refer to the documentation for the specific hardware and software components of your ProLiant Cluster, including, but not limited to, the following:

• Documentation related to the ProLiant servers you are clustering (for example, manuals, posters, and Performance and Tuning guides)
• Microsoft Windows NT 4.0/Enterprise Edition Administrator's Guide
• TechNotes and other documents available from the Compaq website (http://www.compaq.com)

Text Conventions

This document uses the following conventions to distinguish elements of text:

Keys
    Keys appear in boldface. A plus sign (+) between two keys indicates that they should be pressed simultaneously.
USER INPUT
    User input appears in a different typeface and in uppercase.
FILENAMES
    File names appear in uppercase italics.
Menu Options, Command Names, Dialog Box Names
    These appear in initial capital letters.
COMMANDS, DIRECTORY NAMES, and DRIVE NAMES
    These always appear in uppercase.
Type
    When you are instructed to type information, type the information without pressing the Enter key.
Enter
    When you are instructed to enter information, type the information and then press the Enter key.

Symbols in Text

The following symbols may appear in the text of this guide:

WARNING: Indicates that failure to follow directions in the warning could result in bodily harm or loss of life.

CAUTION: Indicates that failure to follow directions could result in damage to equipment or loss of information.

IMPORTANT: Presents clarifying information or specific instructions.

NOTE: Presents commentary, sidelights, or interesting points of information.

Getting Help

If you have a problem and have exhausted the information in this guide, you can get further information and other help from the following sources.

Compaq Website

The Compaq website has information on this product as well as the latest drivers and Flash ROM images. You can access the Compaq website at http://www.compaq.com.

Telephone Numbers

For the name of your nearest Compaq Authorized Reseller:
    In the United States, call 1-800-345-1518.
    In Canada, call 1-800-263-5868.

For Compaq Technical Support:
    In the United States and Canada, call 1-800-386-2172.
    Elsewhere, call one of the numbers listed in the following table.
Compaq Worldwide Technical Support Telephone Numbers

Location                   Voice                             FAX
APD                        65-7503030                        65-7504909
Argentina                  54-1 313 3100                     54-1 313 3100 Ext 21
Australia                  61-2-9911-1955                    61-2-9911-1900
Austria                    0222-87816-16                     0222-87816-82
Bahrain                    973-210-214
Belgium                    (02) 716-96-96                    (02) 725-22-13
Brazil                     55 11 5505-3600                   55 11 5505-3922 Ext 4336
Canada                     1-800-386-2172
Caribbean                  1-800-345-1518
Central America            281-378-2206
Chile                      562-274-3007
China                      86-10-834-6721                    86-10-834-6713
Colombia                   571-345-0266                      571-312-0157
Czech Republic             42-2-232-8772                     42-2-232-8773
Denmark                    45-90-4545                        45-90-4595
Ecuador                    593-2504540
Europe/Middle East/Africa  (49) 089-9933-2891
Finland                    9800-206-720 (+358-800-1-206720)  90-6155-9899 (+358-0-61559899)
France                     (33 1) 41-33-4455                 (33 1) 41-33-4263
Germany                    0180-5-212111                     089-9933-3399
Hong Kong                  852-90116633                      852-28671734
Hungary                    36-1-201-8776                     36-1-201-9696
India                      (91-80) 559-6023
Italy                      392-57-90300                      392-575-00686
Japan                      0120-101589                       +81 3-5402-5959
Korea                      82-2-523-3575                     82-2-3471-0321
Malaysia                   (603) 718-1636
Mexico                     (525) 229-7910                    (525) 229-7988
Netherlands                06-91681616                       06-8991116
New Zealand                649-307-3969
Norway                     22-072-020                        22-072-021
Poland                     48-2-630-3535                     48-2-630-3553
Portugal                   351-1-4120132                     351-1-4120654
Singapore                  65-7503030                        65-7504909
South Africa               +27-11-728-6999                   +27-11-728-3335
Spain                      341-640-1302                      341-640-0124
Sweden                     (46) 8 703 5240                   (46) 8 703 5222
Switzerland                411 838 410/2222                  01-837-0969
Taiwan                     (886) 2-3761170                   (886) 2-7322660
Thailand                   62-2-679-6222                     62-2-679-6220
United Kingdom             44-81-332-3888                    44-81-332-3409
United States              1-800-386-2172                    1-800-345-1518
Venezuela                  (582) 953.69.44                   (582) 952.86.70

PART I
Introduction to Compaq ProLiant Clusters

Chapter 1
Clustering Overview

For several years, the computer industry has used a wide range of solutions to counteract the effects of computer system downtime. Previously, these solutions were difficult to set up and expensive to maintain. Historically, only mission-critical applications, such as those controlling stock exchange trading floors and aerospace mission controls, were important enough to justify expensive, proprietary clustering solutions.

As businesses have increased their reliance on computer systems in day-to-day operations, the amount of acceptable downtime has decreased. Today, another class of applications exists: business-critical applications, which are key to business success but not significant enough to justify the high price of a proprietary clustering solution. More applications are becoming business-critical; their failure causes lost revenue, decreased productivity, and, potentially, customer dissatisfaction.

Because of the increasing demand to keep business-critical applications available, clustering technology is entering mainstream, industry-standard computing. These new clustering solutions use industry-standard hardware and software, providing key clustering features at a lower price than proprietary clustering systems. They also give you the opportunity to increase the usefulness and life span of the software applications you use.

Before examining the features and benefits of Compaq ProLiant Clusters, it is helpful to understand the general concepts and terminology of clustering.
Concepts and terminology addressed in this chapter include:

• Clusters
• Causes and costs of computer downtime
• Availability

Clusters Defined

Clustering is an integration of software and hardware technologies that enables a set of loosely coupled servers and storage to present a single image to clients and to operate as a single system. As a cluster, the group of servers offers a level of availability and scalability that far exceeds the level each cluster node could provide operating as a standalone server. To end users, this integration translates into increased performance and data availability.

Figure 1-1. Diagram of a Cluster (diagram labels: Cluster, Node1, Node2, Shared Storage, Interconnect, LAN, Clients)

Causes of Computer Downtime

Computer downtime is the period of time during which a computer system cannot meet the requests of its users. Downtime is inversely related to availability, which characterizes the amount of time a computer system can meet the needs of its users.

So what causes a lack of data and application availability, and thereby creates the need for clustering? The leading causes of downtime are:

• Software failures
• Planned service
• Hardware failures
• Environmental causes

Although the majority of failures occur in hardware, the majority of downtime is due to software failures.

Software Failures

The most prominent software failure affecting ongoing operation is a hang condition brought on by a processing error in application software or in the operating system.
Because clustering provides a mechanism to fail over processes automatically when a discernible software failure occurs, overall system operation can continue with minimal or no interruption.

Planned Service

All computer systems require downtime for service. A typical service event might include upgrading a hardware component or replacing an old or broken one. Service events are also used to install new software, upgrade existing software, patch software with vendor-supplied fixes, or modify application or operating system settings.

In a cluster, a single server (cluster node) can be taken offline while another server (the partner cluster node) takes on the workload of the offline server. This configuration allows planned service to occur with minimal or no interruption to clients' use of cluster-aware applications and data.

Hardware Failures

The main causes of hardware problems are:

• Disk drive failures
• Computer system power supply failures
• Cooling fan failures
• Memory and bus errors
• Adapter and controller card errors

As Figure 1-2 shows, disk drive failures and power outages cause the vast majority of hardware downtime.

Figure 1-2. Causes of Hardware Downtime (pie chart; segments: Disk Drives, Power Outages, Fans, Memory, I/C Cards; values shown: 55%, 28%, 8%, 5%, 4%)

Clustering provides a mechanism to detect and analyze hardware errors. If the cluster determines that an error will result in computer system downtime, it fails over cluster-aware applications from one server to another, allowing overall system operation to continue with little or no interruption.
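The failover behavior described above, where a partner node takes over the workload of a node that fails or is taken offline for service, can be illustrated with a small, hypothetical Python model. It assumes a two-node cluster in which each cluster group (an application and its resources) is owned by exactly one node. The names ToyCluster, add_group, and take_offline are illustrative only and are not part of Microsoft Cluster Server or any Compaq software. The sketch also leaves out failure detection, resource restart, and failback.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical cluster model for illustration (not an MSCS or Compaq API)."""

    nodes: dict[str, bool] = field(default_factory=dict)   # node name -> online?
    owners: dict[str, str] = field(default_factory=dict)   # group name -> owning node

    def add_node(self, name: str) -> None:
        self.nodes[name] = True

    def add_group(self, group: str, owner: str) -> None:
        self.owners[group] = owner

    def take_offline(self, name: str) -> list[str]:
        """Simulate a failure or planned service on one node.

        Every group owned by that node is failed over to the first
        surviving partner node. Returns the names of the groups that moved.
        """
        self.nodes[name] = False
        survivors = [n for n, online in self.nodes.items() if online]
        if not survivors:
            raise RuntimeError("no online node can take over the workload")
        partner = survivors[0]
        moved = [g for g, owner in self.owners.items() if owner == name]
        for group in moved:
            self.owners[group] = partner
        return moved


# Usage: Node1 is taken offline for planned service; its group moves to Node2.
cluster = ToyCluster()
cluster.add_node("Node1")
cluster.add_node("Node2")
cluster.add_group("Database", "Node1")
cluster.add_group("FileShare", "Node2")
moved = cluster.take_offline("Node1")
print(moved, cluster.owners)
# prints: ['Database'] {'Database': 'Node2', 'FileShare': 'Node2'}
```

Picking the first surviving node is sufficient for a two-node cluster. A cluster with configurable failover policies would instead consult a per-group list of preferred owner nodes.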
Technologies other than clustering can also alleviate the effects of a hardware failure. For example, using Redundant Array of Inexpensive Disks (RAID) level 1, 4, or 5 minimizes the impact drive failures have on the system. Uninterruptible Power Supplies (UPSs) give system administrators time to shut down the system cleanly or to find an alternate power source when power outages occur. Redundant network interface controllers (NICs) keep network traffic flowing even when one of the controllers fails. Still, clustering addresses more than just hardware failures and is therefore an important addition to any business-critical computer system: it protects against operating system and application failures as well as hardware failures.

Environmental Causes
Some computer downtime results from environmental causes, such as:
• Excessive humidity
• Water damage
• Extreme temperatures (high or low)
• Physical damage
• Power interruptions originating from outside power lines
• Rodent attack
• Dust and airborne particles
• Vandalism
• Natural disasters
If the computer system is housed in a "computer room" environment, the likelihood of encountering an environmental problem is minimal. Still, if one does occur and is detectable by the cluster management software, the cluster can take appropriate action to ensure continued operation.
Cost of Computer Downtime
Now that the causes of downtime are known, the next questions are: "How is my computer system affected by these causes?" and "What is the cost of downtime?" Several factors must be included in the cost-of-downtime formula:
• Productivity loss
• Cost of servicing the failed system
• Lost transactions
• Customer or end-user dissatisfaction
Each factor is weighted differently, depending on how critical it is to your business and to your specific application systems. For example, downtime during peak hours of a point-of-sale operation would have a much greater impact on customer satisfaction than downtime during an end-of-day email server backup operation. To understand the true cost of computer downtime in your business environment, examine each of the following factors as it applies to your business-critical applications.

Productivity Loss
To calculate costs associated with the loss of productivity during system downtime:
1. Determine the average hourly rate of the employees using the system.
2. Multiply the average hourly rate by the number of employees who are unable to perform their work.
3. Multiply again by the number of hours the system is down.

Cost of Servicing a Failed System
Service technicians and system administrators are usually required to repair a failed system. To calculate the cost of servicing:
1. Determine, on average, how much it costs per hour to have these people repair the system.
2. Multiply the average hourly rate by the number of technicians and system administrators working to solve the problem.
3. Multiply again by the number of hours the system is down.

Lost Transactions
While the system is up, it is performing transactions. These may be payroll calculations for the HR department, sales transactions at a video rental store, or ATM requests from bank customers. When the system is down, no transactions are being performed. To calculate the cost of lost transactions:
1. Determine what business transactions are performed by this computer system.
2. Apply an estimate of lost revenue per hour to these transactions.
3. Multiply the estimate of lost revenue by the number of hours the system is down.

Customer and End User Dissatisfaction
Computer downtime causes varying levels of customer and end-user dissatisfaction. Although dissatisfaction can be difficult to express in specific dollar amounts, it is important to understand its effects on the financial aspects of your business. If it is unreasonable in your business environment to assign a specific cost to customer and end-user dissatisfaction resulting from computer downtime, at least be aware that these "hidden costs" exist.

Availability Concepts
The previous section discussed the causes and costs of downtime and the reasons for increasing availability. This section discusses clustering concepts that minimize the effects of downtime.

What Is High Availability?
Simply defined, availability is the measure of how well a computer system can continuously deliver services to clients. This measure depends on the system's ability to prevent and recover from failures, or "faults."
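The cost factors described under Cost of Computer Downtime reduce to simple arithmetic on the number of hours the system is down. The following is a minimal Python sketch of the productivity-loss, servicing, and lost-transaction steps combined for one outage. The class and field names and the example figures are hypothetical, not taken from this guide, and customer dissatisfaction is left out because it is hard to express in dollars.

```python
from dataclasses import dataclass


@dataclass
class OutageInputs:
    """Hypothetical inputs for one outage; rates are in currency units per hour."""
    hours_down: float
    employee_hourly_rate: float   # average hourly rate of employees using the system
    employees_idle: int           # employees unable to perform their work
    service_hourly_rate: float    # average hourly cost of a technician/administrator
    service_staff: int            # technicians and administrators working the problem
    lost_revenue_per_hour: float  # estimated revenue lost from unperformed transactions


def productivity_loss(o: OutageInputs) -> float:
    # hourly rate x idle employees x hours down
    return o.employee_hourly_rate * o.employees_idle * o.hours_down


def servicing_cost(o: OutageInputs) -> float:
    # hourly rate x service staff x hours down
    return o.service_hourly_rate * o.service_staff * o.hours_down


def lost_transactions(o: OutageInputs) -> float:
    # estimated lost revenue per hour x hours down
    return o.lost_revenue_per_hour * o.hours_down


def quantifiable_cost(o: OutageInputs) -> float:
    # Customer and end-user dissatisfaction is a "hidden cost" and is not included.
    return productivity_loss(o) + servicing_cost(o) + lost_transactions(o)


# Illustrative figures only: a 4-hour outage idling 50 employees.
outage = OutageInputs(hours_down=4, employee_hourly_rate=25.0, employees_idle=50,
                      service_hourly_rate=60.0, service_staff=2,
                      lost_revenue_per_hour=1000.0)
print(quantifiable_cost(outage))  # 9480.0
```

Any weighting of the individual factors for your business would be applied on top of these raw figures.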
There are different classes of availability, defined by a system's critical application requirements:
• Applications that require 100% uptime, where a failing component or subsystem never interrupts the system's operation, are termed mission-critical. Examples include the applications used on a stock exchange trading floor or in aerospace mission control.
• Applications that can tolerate minimal interruption are termed business-critical. For example, Electronic Funds Transfer (EFT) in the banking industry is a business-critical application. The vast majority of applications fall into this business-critical category.

"High Availability" and "Fault Tolerant" are commonly used terms that describe these different classes of availability. Table 1-1 shows the correct use of these terms and identifies the differences between them in percentage of uptime.

Table 1-1
Availability Definitions
| % Uptime | Downtime | Class | Example |
| 99.0 | 3.5 days/year | Conventional | A standalone Compaq ProLiant Server |
| 99.9 | 8.5 hours/year | High Availability | Compaq ProLiant Cluster/S100 |
| 99.999 | 5 minutes/year | Fault Tolerant (also known as Non-Stop or Continuous Availability) | Himalaya by Tandem, A Compaq Company |

NOTE: A distinction must be made between the availability of a standalone server and that of a cluster. Availability of a standalone server includes only the availability of the server itself, not of the operating system, applications, or network connections. Availability of a cluster includes not only server hardware availability but also the availability of the operating system, the client/server applications, and, to some extent, the network between the cluster and the client machines.
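To see how the uptime percentages in Table 1-1 translate into downtime, the following minimal Python sketch converts an uptime percentage into expected downtime per year, assuming a 365-day year. The function names are illustrative, and the class thresholds in availability_class are an assumption: Table 1-1 gives example points, not ranges.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into expected hours of downtime per year."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


def availability_class(uptime_percent: float) -> str:
    """Map uptime to a Table 1-1 class name (thresholds are an assumption)."""
    if uptime_percent >= 99.999:
        return "Fault Tolerant"
    if uptime_percent >= 99.9:
        return "High Availability"
    return "Conventional"


for pct in (99.0, 99.9, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime: {hours:.2f} hours/year "
          f"({hours / 24:.2f} days, {hours * 60:.1f} minutes) -> {availability_class(pct)}")
```

Under these assumptions, 99.0% uptime works out to about 87.6 hours (3.65 days) per year, 99.9% to about 8.76 hours, and 99.999% to about 5.26 minutes; the downtime figures in Table 1-1 are approximations of these values. The resulting hours can serve as the "hours the system is down" in the cost calculations under Cost of Computer Downtime.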
To distinguish between a high-availability and a fault-tolerant system in terms of design: a high-availability system includes many fault-tolerant features, whereas every component of a true fault-tolerant system is fault tolerant. A mission-critical application, which requires virtually 100% application system uptime, requires a true fault-tolerant system. A business-critical application, which requires less than 100% uptime, can reside on a high-availability clustered system. Because all of a fault-tolerant system's components, including the operating system and application software, are redundant and must be kept running 99.999% of the time, fault-tolerant systems have traditionally been proprietary and expensive solutions. They offer greater availability than high-availability systems, at a much higher cost.

What Is Scalability?
Scalability, as mentioned earlier in this guide, is one benefit of clustering. Clustering for scalability means increasing performance beyond that of a single computer node by adding more nodes to the cluster. Performance scalability across cluster nodes is difficult to achieve and requires not only scalable hardware but also scalable software (for example, a parallel database).

Summary
In general, clusters can provide both high availability and scalability for business-critical applications. In today's market, the vast majority of clusters are employed to take advantage of the increase in availability. Using clusters to reduce computer system downtime has a direct impact on a company's revenue and MIS department costs. The size of the impact depends on your calculated costs of downtime.
In most cases, the cost of installing and maintaining a cluster is likely to be more than offset by the reduction in downtime costs. As businesses' reliance on computer systems intensifies, the cost of downtime will increase. The remainder of this guide covers the Compaq ProLiant Cluster Series S Model 100.

Chapter 2
Architecture of the Compaq ProLiant Cluster/S100
Compaq ProLiant Cluster platforms are composed of a number of different industry-standard, industry-leading Compaq hardware products. This chapter discusses the role each of these products plays in bringing a complete clustering solution to your computing environment. These products are:
• Compaq ProLiant Servers
• Compaq ProLiant Storage Systems
• Compaq SMART-2 Array Controllers
• Compaq Recovery Server Option
• Compaq Network/Interconnect Adapters
Additionally, this chapter describes the Compaq and Microsoft software required to run a Compaq ProLiant Cluster/S100.
Compaq ProLiant Cluster S Series Model 100 User Guide Writer: Caroline Juszczak Project: Compaq ProLiant Cluster S Series Model 100 User Guide Comments: 340704-003 File Name: C-CH02.DOC Last Saved On: 7/1/98 3:21 PM COMPAQ CONFIDENTIAL - NEED TO KNOW REQUIRED 2-2 Architecture of the Compaq ProLiant Cluster/S100

Compaq ProLiant Servers
A primary component of any cluster is a server, or cluster node. The initial release of Microsoft Cluster Server (MSCS) supports a two-node cluster, where each node is a server. Throughout the development of MSCS, Compaq has partnered with Microsoft to ensure that Compaq ProLiant servers meet clustering requirements. Compaq has logged thousands of hours testing ProLiant Clusters, and the Compaq ProLiant Cluster/S100 has successfully passed Microsoft's Cluster Server Certification test suite.
This rigorous suite of tests ensures that the cluster works as a whole, not just as a collection of individual components. The following Compaq ProLiant servers have passed both Microsoft's Cluster Server Certification and Compaq's stringent cluster testing:
• Compaq ProLiant 850R
• Compaq ProLiant 1500
• Compaq ProLiant 1600
• Compaq ProLiant 2500
• Compaq ProLiant 3000
• Compaq ProLiant 4500
• Compaq ProLiant 5000
• Compaq ProLiant 5500
• Compaq ProLiant 6500
NOTE: Check the Compaq website at http://www.compaq.com to obtain the most up-to-date list of cluster-certified servers.

In addition to the increased application and data availability enabled by clustering, Compaq ProLiant Servers include many reliability features that provide a solid foundation for effective clustered server solutions. (See Chapter 3, Table 3-2 for more details.)

Clustering Shared Storage
Microsoft Cluster Server is based on a cluster architecture known as Shared Storage Clustering, in which clustered servers share access to a common set of hard drives. Microsoft Cluster Server requires all clustered (shared) data to be stored in an external storage system.

Throughout this guide you will see references to the Compaq ProLiant Cluster/S100 Storage System. This term refers collectively to all components that make up the Compaq ProLiant Cluster/S100 storage system. The Compaq ProLiant Cluster/S100 implementation of shared storage relies on the following components: a Compaq ProLiant Storage System, a Compaq SMART-2 Array Controller in each cluster node, and the switching mechanism of the Compaq Recovery Server Option installed in each storage system.

Figure 2-1. Compaq ProLiant Cluster/S100 Shared Storage Diagram with ProLiant Storage System (labels: Node1 and Node2, each with a SMART-2 controller, connected to a ProLiant Storage System with numbered positions 1 through 7)

A distinction needs to be made between the shared storage used by the Compaq ProLiant Cluster/S100 and the shared storage defined in Microsoft Cluster Server documentation. Cluster Server relies on the ability to change the access path to physical storage units to obtain the shared characteristic required for clustering. In some products, the physical hardware enables Cluster Server to share storage at the physical disk level: access to each physical disk can be shared among the cluster nodes. This sharing allows one cluster node to access data on one physical disk while the other cluster node accesses data on a different disk in the same storage system.

The Compaq ProLiant Cluster/S100 enables Cluster Server to share storage at the physical storage system level, not at the disk level. All disks attached to a SMART-2 Controller operate as a group. One cluster node accesses all disks in a single storage system; the other cluster node has no access to any of those disks until Cluster Server performs a failover of this storage resource. The possible effect this has on the definition of your cluster groups is discussed in Chapter 4.

Compaq ProLiant Storage Systems
The ProLiant Storage System houses the SCSI disk drives. A Compaq ProLiant Cluster/S100 configuration requires at least one Compaq ProLiant Storage System to create the cluster's shared storage. Refer to the section "Setting Up the Storage System" in Chapter 6 for information on configuring multiple external storage systems.
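The difference between disk-level and storage-system-level sharing can be illustrated with a toy model. The following minimal Python sketch assumes hypothetical class, method, and disk names (it is not an MSCS or Compaq API): ownership is tracked per storage system, so a failover always moves every disk in that system to the other node at once.

```python
from dataclasses import dataclass, field


@dataclass
class StorageSystem:
    """Toy model of one ProLiant Storage System in a Cluster/S100."""
    name: str
    disks: list[str]  # disk names; they always move together
    owner: str        # cluster node that currently has access


@dataclass
class SharedStorage:
    systems: dict[str, StorageSystem] = field(default_factory=dict)

    def add(self, system: StorageSystem) -> None:
        self.systems[system.name] = system

    def disks_accessible_from(self, node: str) -> list[str]:
        # A node sees every disk in the storage systems it owns, and none of the others.
        return [d for s in self.systems.values() if s.owner == node for d in s.disks]

    def fail_over(self, system_name: str, to_node: str) -> None:
        # Ownership moves for the whole storage system, never for a single disk.
        self.systems[system_name].owner = to_node


storage = SharedStorage()
storage.add(StorageSystem("storage1", ["disk1", "disk2", "disk3"], owner="Node1"))
storage.fail_over("storage1", "Node2")
print(storage.disks_accessible_from("Node2"))  # ['disk1', 'disk2', 'disk3']
print(storage.disks_accessible_from("Node1"))  # []
```

One implication of this model is that two workloads whose data sits on the same storage system cannot be served by different nodes at the same time, which is why storage-system-level sharing can influence how you define cluster groups.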
The following Compaq ProLiant Storage Systems are supported by the Compaq ProLiant Cluster/S100:

Table 2-1
Supported ProLiant Storage Systems
| Part Number | Chassis Type | SCSI Interface Type | Required Recovery Server Option Part Number |
| 197100 (U.S.), 197150 (International) | Tower | Fast-SCSI-2 | 213817 |
| 163750 (U.S.), 163755 (International) | Rack-Mountable | Fast-SCSI-2 | 213817 |
| 189600 (U.S.), 189640 (International) | Tower | Fast-Wide SCSI-2 | 213817 |
| 189900 (U.S.), 189905 (International) | Rack-Mountable | Fast-Wide SCSI-2 | 213817 |
| 272900 (U.S.), 272904 (International) | Tower/F1 | Fast-Wide SCSI-2 | 272829 |
| 272800 (U.S.), 272804 (International) | Rack-Mountable/F1 | Fast-Wide SCSI-2 | 272829 |
| 304110 (U.S.), 304114 (International) | Tower/U1 | Fast-Wide SCSI-2 | 304117 |
| 304100 (U.S.), 304104 (International) | Rack-Mountable/U1 | Fast-Wide SCSI-2 | 304117 |

IMPORTANT: The original Compaq ProLiant Storage Systems (U.S. Part Number 146700 and International Part Number 146750) are not supported by the Recovery Server Option because these systems do not have a knock-out panel in which to install the SCSI connector bracket assembly. Also, neither the tower nor the rack-mount ProLiant Storage System /F2 or /U2 models are supported by the Recovery Server Option; therefore, they are not supported by the Compaq ProLiant Cluster/S100.

For detailed information, refer to the Compaq user guide for the ProLiant Storage System you are using in your Compaq ProLiant Cluster/S100.
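Table 2-1 can also be treated as a lookup table when planning a configuration. The following minimal Python sketch encodes the part numbers from the table and the unsupported original models named in the IMPORTANT note; the function name is hypothetical, and because the /F2 and /U2 part numbers are not listed in this guide, they simply fall through to "not listed".

```python
# Table 2-1: storage system part number -> required Recovery Server Option part number.
REQUIRED_RSO = {
    "197100": "213817", "197150": "213817",  # Tower, Fast-SCSI-2
    "163750": "213817", "163755": "213817",  # Rack-Mountable, Fast-SCSI-2
    "189600": "213817", "189640": "213817",  # Tower, Fast-Wide SCSI-2
    "189900": "213817", "189905": "213817",  # Rack-Mountable, Fast-Wide SCSI-2
    "272900": "272829", "272904": "272829",  # Tower/F1
    "272800": "272829", "272804": "272829",  # Rack-Mountable/F1
    "304110": "304117", "304114": "304117",  # Tower/U1
    "304100": "304117", "304104": "304117",  # Rack-Mountable/U1
}

# Original ProLiant Storage Systems with no knock-out panel for the connector bracket.
NOT_SUPPORTED = {"146700", "146750"}


def required_recovery_server_option(storage_part_number: str) -> str:
    """Return the Recovery Server Option part number for a supported storage system."""
    if storage_part_number in NOT_SUPPORTED:
        raise ValueError(f"{storage_part_number}: original ProLiant Storage System, "
                         "not supported by the Recovery Server Option")
    try:
        return REQUIRED_RSO[storage_part_number]
    except KeyError:
        raise ValueError(f"{storage_part_number}: not listed in Table 2-1") from None


print(required_recovery_server_option("272904"))  # 272829
```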
Compaq SMART-2 Array Controllers
The SMART-2 Array Controllers are the interface between the cluster node and the ProLiant Storage System. Your Compaq ProLiant Cluster/S100 requires at least two SMART-2 Array Controllers, one for each cluster node. A standard SCSI cable runs from each controller to a port in the ProLiant Storage System. In contrast to the SCSI cabling discussed in the Microsoft Cluster Administrator's Guide, the Compaq ProLiant Cluster/S100 uses standard SCSI cables; Y-cables and TriLink connectors are neither required nor supported.

The Compaq ProLiant Cluster/S100 requires that any two SMART-2 Array Controllers sharing a ProLiant Storage System be the same model. For example, a SMART-2SL in Node1 cannot be paired with a SMART-2DH in Node2; it must be paired with a SMART-2SL in Node2.

Figure 2-2. Sample Compaq ProLiant Cluster/S100 Storage Diagram (labels: Node1 and Node2, each with a SMART-2DH controller, connected to a ProLiant Storage System with numbered positions 1 through 7)

SMART-2 Array Controllers contain the RAID technology used to protect the data on your clustered disk drives. Each SMART-2 Array Controller supports the RAID 1 and RAID 5 fault-tolerant options, and some models also support RAID 4. RAID levels for the two SMART-2 Array Controllers sharing a ProLiant Storage System must be configured identically.
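The pairing rules above (one controller per node, same model on both sides, identical RAID configuration) lend themselves to a pre-installation check. The following minimal Python sketch uses the four supported models named in this chapter; the class and function names, and the representation of a RAID configuration as one level per logical drive, are assumptions for illustration only.

```python
from dataclasses import dataclass

SUPPORTED_MODELS = {"SMART-2/E", "SMART-2/P", "SMART-2DH", "SMART-2SL"}


@dataclass(frozen=True)
class ArrayController:
    node: str
    model: str
    raid_levels: tuple[int, ...]  # assumed: RAID level of each logical drive, in order


def check_shared_pair(a: ArrayController, b: ArrayController) -> list[str]:
    """Return rule violations for two controllers sharing one ProLiant Storage System."""
    problems = []
    for c in (a, b):
        if c.model not in SUPPORTED_MODELS:
            problems.append(f"{c.node}: {c.model} is not a supported SMART-2 model")
    if a.node == b.node:
        problems.append("both controllers are in the same node; one per cluster node is required")
    if a.model != b.model:
        problems.append(f"model mismatch: {a.model} in {a.node} vs {b.model} in {b.node}")
    if a.raid_levels != b.raid_levels:
        problems.append("RAID configuration differs between the two controllers")
    return problems


node1 = ArrayController("Node1", "SMART-2SL", (5,))
node2 = ArrayController("Node2", "SMART-2DH", (5,))
print(check_shared_pair(node1, node2))
# ['model mismatch: SMART-2SL in Node1 vs SMART-2DH in Node2']
```

An empty list means the pair satisfies these particular rules; it does not replace checking the QuickSpecs for controller support in each server.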
The SMART-2 Controller Reference Guide for each controller model contains much more information about SMART-2 Array Controllers. If you have not already read the reference guide for your controller model, it is recommended that you do so to familiarize yourself with all the features and benefits of SMART-2.

The SMART-2 Array Controllers supported by the Compaq ProLiant Cluster/S100 are listed below. Be sure to consult the QuickSpecs for your ProLiant Servers to ensure that you select a SMART-2 Array Controller that your cluster node supports.
• Compaq SMART-2/E
• Compaq SMART-2/P
• Compaq SMART-2DH
• Compaq SMART-2SL
For additional information, refer to the Compaq SMART-2 Installation Guide and the SMART-2 Reference Guide for the SMART-2 Array Controllers you are using.

Compaq Recovery Server Option
The Recovery Server Option provides the mechanism to switch access to the ProLiant Storage System from one cluster node to another. The primary component of this mechanism is a Recovery Server Switch, which must be placed in the ProLiant Storage System. Several other components are used; they are explained in detail in the Recovery Server Option User Guide.

You need to be very familiar with the hardware aspects of the Recovery Server Option. Because this guide does not detail the installation of each component, Chapter 6 of this guide refers you to the Recovery Server Option User Guide. Although you need to follow the board installation procedures in the Recovery Server Option User Guide, do NOT follow any of its cabling procedures or software installation procedures.
Cluster Server is used as the cluster management software, so none of the Recovery Server software and none of the Recovery Server design rules apply to the Compaq ProLiant Cluster/S100.

IMPORTANT: Because you are using Microsoft Cluster Server, do not follow the software installation steps shown in the Recovery Server Option User Guide.

The figure below depicts one of the supported ProLiant Storage Systems after installation of a Recovery Server Option.

Figure 2-3. Fast-Wide SCSI-2 ProLiant Storage System after installation of Recovery Server Option (callouts 1 through 4; labels: SMART-2 Controller Port 1 and Port 2, shown for each of the two controller connections)

On-Line Storage Controller Recovery Option
Several Compaq products use the Recovery Server Switch mentioned above. Along with the Compaq ProLiant Cluster/S100, the Compaq On-Line Storage Controller Recovery Option uses the switch, as a means to merge two SMART-2 Array Controllers into a redundant controller pair. In such a pair, one controller is active and the other remains in standby mode. Should a problem occur with the active controller, the array controller device driver switches traffic to the standby controller without loss of data or interruption of service.

This product is noted here because a Recovery Server Switch cannot be configured for use in both the ProLiant Cluster/S100 and the On-Line Storage Controller Recovery Option at the same time. By configuring a ProLiant Storage System as clustered shared storage, you exclude that storage system from being used with the On-Line Storage Controller Recovery Option.
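Because each storage system's Recovery Server Switch can serve only one of these two products, a configuration plan can be checked for conflicting assignments. The following minimal Python sketch is a planning aid under stated assumptions: the role names, the function name, and the storage system names are all hypothetical, and nothing here interacts with the actual switch or its software.

```python
from enum import Enum


class SwitchRole(Enum):
    """Hypothetical roles for the Recovery Server Switch in one ProLiant Storage System."""
    CLUSTER_SHARED_STORAGE = "ProLiant Cluster/S100 shared storage"
    CONTROLLER_REDUNDANCY = "On-Line Storage Controller Recovery Option"


def assign_roles(requests: list[tuple[str, SwitchRole]]) -> dict[str, SwitchRole]:
    """Give each storage system exactly one switch role; reject conflicting requests."""
    assigned: dict[str, SwitchRole] = {}
    for storage, role in requests:
        current = assigned.get(storage)
        if current is not None and current is not role:
            raise ValueError(f"{storage}: already used for {current.value}; "
                             f"cannot also be used for {role.value}")
        assigned[storage] = role
    return assigned


plan = assign_roles([("storage1", SwitchRole.CLUSTER_SHARED_STORAGE),
                     ("storage2", SwitchRole.CONTROLLER_REDUNDANCY)])
try:
    assign_roles([("storage1", SwitchRole.CLUSTER_SHARED_STORAGE),
                  ("storage1", SwitchRole.CONTROLLER_REDUNDANCY)])
except ValueError as err:
    print(err)
```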