Second Edition (January 2000)
Part Number 128360-002
Compaq Computer Corporation

Compaq Confidential Need to Know Required
Writer: Susana Weinstein Salazar
Project: Compaq ProLiant ML350 Troubleshooting Guide
Comments:
Part Number: 128360-002
File Name: a-frnt.doc
Last Saved On: 12/7/99 10:33 AM

Notice

The information in this publication is subject to change without notice.

COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. THIS INFORMATION IS PROVIDED "AS IS" AND COMPAQ COMPUTER CORPORATION DISCLAIMS ANY WARRANTIES, EXPRESS, IMPLIED OR STATUTORY AND EXPRESSLY DISCLAIMS THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, GOOD TITLE AND AGAINST INFRINGEMENT.

This publication contains information protected by copyright. No part of this publication may be photocopied or reproduced in any form without prior written consent from Compaq Computer Corporation.

© 2000 Compaq Computer Corporation. All rights reserved. Printed in the U.S.A.

The software described in this guide is furnished under a license agreement or nondisclosure agreement. The software may be used or copied only in accordance with the terms of the agreement.

Compaq, Compaq Insight Manager, ProLiant, ROMPaq, and SmartStart are registered with the United States Patent and Trademark Office.

Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation.

Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
Compaq ProLiant ML350 Troubleshooting Guide

Contents

About This Guide
    Text Conventions
    Symbols in Text
    Symbols on Equipment
    Rack Stability
    Warning Information
    Additional Resources
        Telephone Numbers
        Other Information Resources

Chapter 1
Server Startup Errors
    Minimum Hardware Configuration
    When the Server Will Not Start
    Diagnosis Steps

Chapter 2
Server Operation Errors
    Operating System Support
    Server Installation Problems

Chapter 3
Status Indicators
    System Status LED Indicators and Power On/Standby Switch
    Hot-Plug Hard Drive LED Indicators
        Hot-Plug Drive Replacement Guidelines
        Unsafe Hot-Plug Drive Replacement Precautions
        Predictive Failure Alert
    Network Controller Status LED Indicators

Chapter 4
Switch Settings
    System Configuration Switch (SW1) Settings
        Clearing and Resetting System Configuration Settings
        Enabling ROMPaq Disaster Recovery Mode
    Server Feature Board Switch (SW3) Settings

Appendix A
Array Diagnostic Utility
    Array Diagnostic Utility (ADU)
    Starting the Array Diagnostic Utility (ADU)
    Array Diagnostic Utility (ADU) Error Messages

Appendix B
POST Error Messages

Index

List of Figures
    Figure 3-1. Power On/Standby switch, and system status LED indicators
    Figure 3-2. Hot-plug hard drive LED indicators
    Figure 3-3. Network controller status LED indicators
    Figure 4-1. System configuration switch (SW1) settings
    Figure 4-2. Server Feature Board switch (SW3) settings

List of Tables
    Organization
    ProLiant ML350 Resources
    Table 1-1 Minimum Hardware Configuration
    Table 1-2 Diagnosis Steps
    Table 1-3 The Front Panel Power On/Standby LED Is Not On
    Table 1-4 Server Does Not Have Video
    Table 1-5 Audible Power-On Self-Test Error Messages
    Table 2-1 Supported Operating Systems
    Table 2-2 Installation Problems
    Table 3-1 System Status LED Indicators
    Table 3-2 Hot-Plug Hard Drive LED Indicator Status Combinations
    Table 3-3 Network Controller Status LED Indicators
    Table 4-1 System Configuration Switch (SW1) Settings
    Table 4-2 Server Feature Board Switch (SW3) Settings
    Table A-1 Array Diagnostic Utility (ADU) Error Messages
    Table B-1 POST Error Messages

About This Guide

This troubleshooting guide provides specific information to troubleshoot your ProLiant ML350 server. Use this guide to find details about server startup problems, switch settings, status indicators, Power-On Self-Test messages, Array Diagnostic Utility messages, and more. For information about general troubleshooting techniques, diagnostic tools, and preventative maintenance, refer to the Compaq Servers Troubleshooting Guide, also included in your user documentation.

WARNING: To reduce the risk of personal injury or damage to the equipment, refer to the user documentation supplied with the server and observe the appropriate safety precautions.

Organization

What will I find? / Where do I find it?

Compliance with the minimum hardware configuration for your ProLiant ML350 server is required to power up the server and run an operating system. This is often a good place to start diagnosis after you have changed your hardware configuration.
    Chapter 1

You are provided with step-by-step instructions on what to try and where to go for help, for the most common problems encountered during the initial Power-On Self-Test. This test must be completed each time you power up your server, before the server can load the operating system and start running software applications.
    Chapter 1

It is possible to encounter errors when using an operating system that is not compatible with your server. Check Table 2-1, Supported Operating Systems, for the name and version of each operating system supported for your server.
    Chapter 2

Once your server has passed the Power-On Self-Test, you may still encounter errors, such as an inability to load your operating system. You are provided with instructions on what to try, and where to go for help, when you encounter errors after completion of the Power-On Self-Test.
    Chapter 2

There are several status LEDs located on the front and back of your server. These LEDs can communicate the current status of varying aspects of your server's components and operations, thus aiding you in diagnosing your problem. You are provided with an illustration of the location of each LED on your server, as well as an explanation of uses and possible statuses.
    Chapter 3

There are 2 switch banks in your server. These may need to be changed from time to time, and can cause problems if they are not set correctly. You are provided with a list of all switches, a description of what each setting means, and an illustration of where each may be found inside your server.
    Chapter 4

The Array Diagnostic Utility collects information on all the array controllers in the system, and generates a list of detected problems in the form of error messages. Each message is listed alphabetically, with a corresponding explanation and list of corrective measures to take.
    Appendix A

The Power-On Self-Test error messages are generated each time the server is powered up. Each message is listed numerically, with a corresponding explanation and list of corrective measures to take.
    Appendix B

For further troubleshooting information, both general and specific to ProLiant ML350 servers, see "Additional Resources," later in this section.

Text Conventions

This document uses the following conventions to distinguish elements of text:

Keys
    Keys appear in boldface. A plus sign (+) between two keys indicates that they should be pressed simultaneously.

USER INPUT
    User input appears in a different typeface and in uppercase.

FILENAMES
    File names appear in uppercase italics.

Menu Options, Command Names, Dialog Box Names
    These elements appear in initial capital letters.

COMMANDS, DIRECTORY NAMES, and DRIVE NAMES
    These elements appear in uppercase.

Type
    When you are instructed to type information, type the information without pressing the Enter key.

Enter
    When you are instructed to enter information, type the information and then press the Enter key.

Symbols in Text

These symbols may be found in the text of this guide. They have the following meanings.

WARNING: Text set off in this manner indicates that failure to follow directions in the warning could result in bodily harm or loss of life.

CAUTION: Text set off in this manner indicates that failure to follow directions could result in damage to equipment or loss of information.

IMPORTANT: Text set off in this manner presents clarifying information or specific instructions.

NOTE: Text set off in this manner presents commentary, sidelights, or interesting points of information.

Symbols on Equipment

These icons may be located on equipment in areas where hazardous conditions may exist.

Any surface or area of the equipment marked with these symbols indicates the presence of electrical shock hazards. Enclosed area contains no operator serviceable parts.
WARNING: To reduce the risk of injury from electrical shock hazards, do not open this enclosure.

Any RJ-45 receptacle marked with these symbols indicates a Network Interface Connection.
WARNING: To reduce the risk of electrical shock, fire, or damage to the equipment, do not plug telephone or telecommunications connectors into this receptacle.

Any surface or area of the equipment marked with these symbols indicates the presence of a hot surface or hot component. If this surface is contacted, the potential for injury exists.
WARNING: To reduce the risk of injury from a hot component, allow the surface to cool before touching.

Power Supplies or Systems marked with these symbols indicate the equipment is supplied by multiple sources of power.
WARNING: To reduce the risk of injury from electrical shock, remove all power cords to completely disconnect power from the system.

XX kg / XX lb
WARNING: Any product or assembly marked with these symbols indicates that the component exceeds the recommended weight for one individual to handle safely.
WARNING: To reduce the risk of personal injury or damage to the equipment, observe local occupational health and safety requirements and guidelines for manual material handling.

Rack Stability

WARNING: To reduce the risk of personal injury or damage to the equipment, be sure that:
• The leveling jacks are extended to the floor.
• The full weight of the rack rests on the leveling jacks.
• The stabilizing feet are attached to the rack if it is a single rack installation.
• The racks are coupled together in multiple rack installations.
• Only one component is extended at a time. A rack may become unstable if more than one component is extended for any reason.

Warning Information

For a complete list of warnings associated with your server, refer to the user documentation provided with your server.

WARNING: To reduce the risk of personal injury from hot surfaces, allow the internal system components to cool before touching.

25-30 kg / 55-65 lb
WARNING: This product is very heavy. To reduce the risk of personal injury or damage to the equipment:
• Remove all pluggable power supplies and modules to reduce the weight of the product before lifting it.
• Observe local occupational health and safety requirements and guidelines for manual material handling.
• Get help to lift and stabilize the product during installation or removal, especially when the product is not fastened to the rails.
• When installing the product in or removing the product from the rack, the product will be unstable when not fastened to the rails.

WARNING: To reduce the risk of electric shock or damage to the equipment:
• Do not disable the power cord grounding plug. The grounding plug is an important safety feature.
• Plug the power cord into a grounded (earthed) electrical outlet that is easily accessible at all times.
• Install the power supply before connecting the power cord to the power supply.
• Unplug the power cord before removing the power supply from the server.
• If the system has multiple power supplies, disconnect power from the system by unplugging all power cords from the power supplies.

Additional Resources

If you have a problem and have exhausted the information in this guide, you can obtain further information and help.

Telephone Numbers

In the United States and Canada, call the Compaq Technical Support Center at 1-800-OK-COMPAQ (1-800-652-6672), where a technical support specialist will help you diagnose the problem. For continuous quality improvement, calls may be recorded or monitored.

For the latest worldwide telephone numbers, refer to the Compaq website:
http://www.compaq.com

For the name of your nearest Compaq authorized reseller:
• In the United States, call 1-800-345-1518
• In Canada, call 1-800-263-5868

NOTE: For additional resources outside the United States and Canada, contact your Compaq authorized service provider.

Other Information Resources

Refer to the following additional information for help.

ProLiant ML350 Resources

Resource / What it is

Compaq ProLiant ML350 Setup and Installation Guide
    This book provides step-by-step instructions on the setup and installation of your server.

Compaq Servers Troubleshooting Guide
    This is a resource for obtaining troubleshooting information that is beyond the scope of this document, including general hardware and software troubleshooting information for all Compaq ProLiant servers.

Compaq ProLiant ML350 Maintenance and Service Guide
    This resource provides a complete list of all replacement parts available, along with step-by-step instructions on installation and replacement. Find this book on the Compaq website:
    http://www.compaq.com/support/servers/

Chapter 1
Server Startup Errors

This chapter provides step-by-step instructions on what to try and where to go for help, for the most common problems encountered during the initial Power-On Self-Test. Remember, this test must be completed each time you power up your server, before the server can load the operating system and start running software applications.

Minimum Hardware Configuration

Before beginning, make sure your server meets the requirements for minimum hardware configuration. During the troubleshooting process, it may be necessary to reduce your system to its minimum configuration, replacing each option one at a time to determine the cause of failure.

Table 1-1
Minimum Hardware Configuration

Component / Minimum Specification

Processors
    One processor with a corresponding Processor Power Module (PPM) must be installed in processor slot 1, and a processor terminator card in processor slot 2.

Fan
    A fan must be installed and connected to the system board.

Memory
    At least one slot must be populated with a 64-MB (ECC 133-MHz registered) or higher SDRAM DIMM module.

Server Feature Board
    The Server Feature Board is installed in PCI slot 1. The Server Management Information Cable (SMIC) must be connected.

When the Server Will Not Start

Follow the sequence of steps below when the server will not start:

1. Verify the computer and monitor are plugged into a working outlet.
2. Make sure your server meets the minimum hardware requirements. See the section "Minimum Hardware Configuration" in this chapter.
3. Make sure your power source is working properly. Refer to the "Power Source" section in Chapter 2 of the Compaq Servers Troubleshooting Guide.
4. If the system does not complete the Power-On Self-Test (POST) or start loading an operating system, refer to "General Loose Connections" in Chapter 2 of the Compaq Servers Troubleshooting Guide.

   NOTE: If the server is power cycling, verify the system is not rebooting due to an Automatic Server Recovery boot caused by another problem. Check Compaq Insight Manager for notification of this event. Refer to chapters 4 and 6 in the Compaq Servers Troubleshooting Guide for more information.

5. Continue with the following section to decide your next course of action.
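
For readers who keep a hardware inventory in a script, the Table 1-1 requirements can be written as a simple checklist. The following is only a sketch under stated assumptions: the `ServerConfig` record, its field names, and the `missing_minimum_items` function are hypothetical and are not part of any Compaq utility. It models only the single-processor minimum described in Table 1-1, and it checks DIMM size only, not the ECC, speed, or registered-memory requirement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServerConfig:
    """Hypothetical inventory record; field names are illustrative only."""
    processor_in_slot_1: bool = False
    ppm_installed: bool = False
    terminator_in_slot_2: bool = False
    fan_connected: bool = False
    dimm_sizes_mb: List[int] = field(default_factory=list)
    feature_board_in_pci_slot_1: bool = False
    smic_connected: bool = False


def missing_minimum_items(cfg: ServerConfig) -> List[str]:
    """Return the Table 1-1 requirements that cfg does not meet."""
    problems = []
    if not (cfg.processor_in_slot_1 and cfg.ppm_installed):
        problems.append("Processor with PPM in processor slot 1")
    if not cfg.terminator_in_slot_2:
        problems.append("Processor terminator card in processor slot 2")
    if not cfg.fan_connected:
        problems.append("Fan installed and connected to the system board")
    # Table 1-1 asks for at least one DIMM of 64 MB or larger.
    if not any(size >= 64 for size in cfg.dimm_sizes_mb):
        problems.append("At least one 64-MB or larger SDRAM DIMM")
    if not cfg.feature_board_in_pci_slot_1:
        problems.append("Server Feature Board in PCI slot 1")
    if not cfg.smic_connected:
        problems.append("Server Management Information Cable (SMIC) connected")
    return problems


if __name__ == "__main__":
    # Example: everything present except memory.
    cfg = ServerConfig(processor_in_slot_1=True, ppm_installed=True,
                       terminator_in_slot_2=True, fan_connected=True,
                       dimm_sizes_mb=[], feature_board_in_pci_slot_1=True,
                       smic_connected=True)
    for item in missing_minimum_items(cfg):
        print("Missing:", item)
```

An empty result means the recorded configuration satisfies the checklist; it does not replace the physical inspection described in the steps above.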

Diagnosis Steps

If your server does not power up, or powers up but does not complete the Power-On Self-Test (POST), answer the questions in this table to determine appropriate actions for the symptoms observed. Based on the answers you give, you will be directed to the appropriate table in this section for further instructions. The following table outlines possible reasons for the problem, options available to assist in diagnosis, possible solutions available, and references to other sources of information.

Table 1-2
Diagnosis Steps

Question / The Next Step

Question 1: Is the front panel Power On/Standby LED on?
    If no, go to Table 1-3.
    If yes, continue to Question 2.

Question 2: Do you have video?
    If no, go to Table 1-4.
    If yes, video is available for diagnosis. Determine next action by observing Power-On Self-Test (POST) progress and error messages. See Appendix B for a complete description of each POST error message.

NOTE: If your server attempts to load the operating system, continue to Chapter 2 of this guide.

Table 1-3
The Front Panel Power On/Standby LED Is Not On

See Chapter 3 for a complete description of system status LEDs.

Possible Reasons
• There is no AC power connection.
• Bezel door is not closed.
• Power switch is not completely depressed.
• Power switch connector cable is not connected properly to the motherboard.
• Processor or Processor Power Module (PPM) has failed, or is not seated properly.

The Next Step
1. Be sure the bezel doors are fully closed.
2. Depress the power switch.
3. Check the power cables. Make sure they are fully connected.
4. Check power source. Refer to "Power Problems" in Chapter 2 of the Compaq Servers Troubleshooting Guide for further options.
5. Try reconnecting the power switch cable to the motherboard. Refer to the Compaq ProLiant ML350 Setup and Installation Guide for complete instructions.
6. Reseat all expansion boards, processors, memory modules and PPMs, and verify all cables are securely connected. Refer to "General Hardware Problems" in Chapter 2 of the Compaq Servers Troubleshooting Guide for tips on proper procedures.
7. If these steps do not correct the problem, the most likely cause lies in the power supply subsystem. Contact your Compaq authorized service provider for further technical support.

Table 1-4
Server Does Not Have Video

Possible Reasons
• Video may not be connected properly.
• Switches may not be set correctly on the Server Feature Board (SW3).
• If an optional video card was installed, check the monitor cable.
• The monitor may be connected to the wrong video connector.

The Next Step
1. Verify the monitor has power and the monitor cable is securely connected. If more than one video adapter is installed, make sure the monitor is connected to the correct video card.
2. Verify the monitor is functional by attaching it to a known working server.
3. Verify the switch settings on the Server Feature Board are correctly set. See "Server Feature Board Switch (SW3) Settings" in Chapter 4.
4. Reseat all cards, processors, DIMMs and PPMs, and verify all cable connections.
5. Are there any audible indicators, such as a series of beeps? A series of beeps indicates the presence of a Power-On Self-Test (POST) error message. Table 1-5 lists the error messages most likely encountered upon reaching this stage of the diagnosis process. See Appendix B for a complete listing of possible POST error messages.
6. Refer to "Video Problems" in Chapter 2 of the Compaq Servers Troubleshooting Guide.
7. If these steps do not correct the problem, the most likely diagnosis lies in the system ROM or video subsystem. Contact your Compaq authorized service provider for further technical support.

Table 1-5
Audible Power-On Self-Test Error Messages

Audible Alarm / Possible Source of Problem / Recommended Action

1 long, 2 short
    Possible source of problem: Missing or disabled video adapter
    Recommended action:
    1. Check Server Feature Board switch settings. See Chapter 4 for correct settings.
    2. Reseat Server Feature Board.

2 long
    Possible source of problem: Missing or failed system memory
    Recommended action:
    1. If no memory modules are present, install at least one memory module, to conform to minimum hardware configuration specifications.
    2. Reseat all installed memory modules.
    3. If the system contains more than one memory module, remove one module, and then restart the server. Repeat as needed to isolate the failed memory module.

1 long, 3 short
    Possible source of problem: Corrupted system ROM
    Recommended action: Refer to Chapter 5 of the Compaq Servers Troubleshooting Guide for detailed instructions on the process used for ROMPaq disaster recovery.
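
The two questions in Table 1-2 and the beep patterns in Table 1-5 amount to a small decision procedure, which the sketch below writes out. It is an illustration only: the function names `next_step` and `beep_source` and the tuple encoding of beep patterns are assumptions made here, not part of any Compaq tool. Beep patterns not listed in Table 1-5 are deliberately left unmapped.

```python
from typing import Dict, Tuple

# (long beeps, short beeps) -> possible source of problem, from Table 1-5.
BEEP_CODES: Dict[Tuple[int, int], str] = {
    (1, 2): "Missing or disabled video adapter",
    (2, 0): "Missing or failed system memory",
    (1, 3): "Corrupted system ROM",
}


def next_step(power_led_on: bool, has_video: bool) -> str:
    """Follow Table 1-2: return where to look next in this guide."""
    if not power_led_on:
        return "Table 1-3"   # Question 1 answered "no"
    if not has_video:
        return "Table 1-4"   # Question 2 answered "no"
    # Video is available: observe POST progress and error messages.
    return "Appendix B"


def beep_source(long_beeps: int, short_beeps: int) -> str:
    """Look up an audible POST alarm in the Table 1-5 subset."""
    return BEEP_CODES.get(
        (long_beeps, short_beeps),
        "Not listed in Table 1-5; see Appendix B",
    )


if __name__ == "__main__":
    print(next_step(power_led_on=True, has_video=False))
    print(beep_source(1, 3))
```

Keeping the beep patterns in a dictionary keyed by counts makes it easy to see at a glance which alarms this chapter covers and which must be looked up in Appendix B.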

Chapter 2
Server Operation Errors

This chapter outlines the process of troubleshooting your server when problems occur after the initial power-up sequence and during server operation.

NOTE: If the server is power cycling, verify the system is not restarting due to an Automatic Server Recovery power-up, caused by another problem. Check Compaq Insight Manager for notification of this event. Refer to chapters 4 and 6 in the Compaq Servers Troubleshooting Guide for more information.

Operating System Support

In order to operate properly, your server must have a supported operating system. The following table lists the operating systems and version numbers supported by this server.

Table 2-1
Supported Operating Systems

Operating System                                   Version(s) Supported
Novell NetWare                                     5, 4.2, 3.2
Novell NetWare for Small Business                  4.2
Novell NetWare Small Business Suite                5
Microsoft Windows NT Server                        4.0
Microsoft Windows NT Server Terminal Edition       4.01
Windows 2000 Server                                When available
Windows BackOffice Small Business Server           4.5
SCO UNIXWare                                       7.1
SCO UNIX OpenServer                                5.0.5

For updated information on supported operating systems for your specific platform, refer to the Compaq website:
http://www.compaq.com

Server Installation Problems

Table 2-2
Installation Problems

Problem / Possible Cause / Possible Solution

Problem: System cannot load SmartStart.

    Possible cause: Wrong version of SmartStart.
    Possible solution: Check the SmartStart Release and the SmartStart user documentation.

    Possible cause: CD is not set as a bootable device.
    Possible solution:
    1. Press F10 to run the BIOS Setup Utility.
    2. Set defaults and exit the utility.
    3. Re-run this utility to select the primary operating system.
    Refer to Chapter 5 in the Compaq ProLiant ML350 Setup and Installation Guide for complete instructions on the use of this utility.

    Possible cause: IDE cable is not connected to CD-ROM.
    Possible solution: Check the cable between the system board and CD-ROM to ensure proper connection.

    Possible cause: Diskette in disk drive is preventing the system from loading.
    Possible solution: Remove the diskette from the diskette drive.

Problem: SmartStart fails during installation.

    Possible cause: Operating system has not been selected.
    Possible solution: Press F10 to run the BIOS Setup Utility, and select the primary operating system. Refer to Chapter 5 in the Compaq ProLiant ML350 Setup and Installation Guide for complete instructions.

    Possible cause: Error occurs during installation.
    Possible solution: * Follow the error information provided. If it is necessary to reinstall, run the Compaq System Erase Utility. Refer to Chapter 4 of the Compaq Servers Troubleshooting Guide.

    * CAUTION: The Compaq System Erase Utility will cause loss of all configuration information, as well as loss of existing data on all connected hard drives. Please read "Running the Compaq System Erase Utility" and the associated warning in Chapter 4 of the Compaq Servers Troubleshooting Guide, prior to performing this operation.

Problem: Server cannot load operating system.

    Possible cause: Required operating system step was missed.
    Possible solution: Follow these steps:
    1. Note at which phase the operating system failed.
    2. Remove any loaded operating system components.
    3. Refer to your operating system documentation.
    4. Reinitiate installation procedures.

    Possible cause: Installation problem occurred.
    Possible solution: Refer to your operating system documentation and to the SmartStart Release Notes.

    Possible cause: Primary hard disk controller installation is incorrect.
    Possible solution: Press F10 to run the BIOS Setup Utility and correct this problem. Refer to Chapter 5 in the Compaq ProLiant ML350 Setup and Installation Guide for complete instructions.

    Possible cause: Hard disk controller order is incorrect.
    Possible solution: Press F10 to run the BIOS Setup Utility and correct this problem. Refer to Chapter 5 in the Compaq ProLiant ML350 Setup and Installation Guide for complete instructions.

    Possible cause: Problem encountered with hardware newly added to the system.
    Possible solution: Refer to the documentation provided with the hardware. Remove the new hardware.
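
Before working through the installation problems above, it can help to confirm that the operating system is one Table 2-1 lists. The sketch below shows one way to script that check. It is an assumption-laden illustration: the `SUPPORTED_SUBSET` name and the `is_supported` function are hypothetical, and the data is deliberately limited to a subset of Table 2-1 rows. For any operating system outside this subset the function returns `None` rather than guessing, and Table 2-1 or the Compaq website remains the authority.

```python
from typing import Dict, Optional, Set

# Subset of Table 2-1 (operating system -> supported versions).
# Rows not copied here must be checked against the table itself.
SUPPORTED_SUBSET: Dict[str, Set[str]] = {
    "Novell NetWare": {"5", "4.2", "3.2"},
    "Novell NetWare for Small Business": {"4.2"},
    "SCO UNIXWare": {"7.1"},
    "SCO UNIX OpenServer": {"5.0.5"},
}


def is_supported(name: str, version: str) -> Optional[bool]:
    """Return True/False for systems in the subset, None if not covered."""
    versions = SUPPORTED_SUBSET.get(name)
    if versions is None:
        return None  # not in this subset; consult Table 2-1
    return version in versions


if __name__ == "__main__":
    print(is_supported("Novell NetWare", "4.2"))
    print(is_supported("Novell NetWare", "4.11"))
    print(is_supported("Windows 2000 Server", "5.0"))
```

Returning `None` for uncovered systems keeps "not in this sketch" distinct from "not supported", which matters because the sketch does not reproduce the whole table.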
Chapter 3
Status Indicators

Status LEDs on the front and back of your server report the current state of its components and operations, which helps you diagnose problems. The following LEDs, provided for use with the ProLiant ML350, are explained in this chapter:

- System status LEDs (on the front panel of the server)
    - Power On/Standby status
    - Hard drive status (for all installed hot-plug and non-hot-plug drives)
- Hot-plug hard drive LEDs (located on the front of the physical drive)
    - Drive activity
    - Power/online status
    - Fault status
- Network controller status LEDs (on the rear of the server)
    - Network link status
    - Network activity status
    - Connection speed

System Status LED Indicators and Power On/Standby Switch

The system status LEDs and the Power On/Standby switch are located on the front panel of each server. The Power On/Standby switch is the button you press to:

- Turn the server on (provide AC power).
- Place the server in or out of standby mode.
- Remove power from the server.

The System Status LEDs show:

- Power On/Standby status
- Hard drive status (for all installed hot-plug and non-hot-plug drives)

See Figure 3-1 and Table 3-1 for an explanation of each possible LED status.
Figure 3-1. Power On/Standby switch, and system status LED indicators

Table 3-1
System Status LED Indicators

Indicator                  Status           Means
Power On/Standby status    Green            System on, AC power OK. Do not remove power from system.
                           Flashing         System in standby mode. AC power OK. Do not remove power from system.
                           Off              System off, no AC power.
Hard drive status          On or Flashing   Drive being accessed.
                           Off              Drive not being accessed.

Hot-Plug Hard Drive LED Indicators

If your server does not include the hot-plug hard drive option, view drive activity and status by way of the system status LEDs, discussed earlier in this chapter.

The hot-plug hard drive LEDs, located on each physical hot-plug drive, are visible on the front of the server or external storage unit. They provide activity, online, and fault status for each corresponding drive when configured as a part of an array and attached to a powered-on controller. Their behavior may vary, depending on the status of other drives in the array.

This section provides the following information about hard drive LEDs:

- An illustration detailing the location of each LED
- A table of the possible LED configurations and what each combination means
- Details about hot-plug drive rapid error recovery, and guidelines for utilizing Compaq Insight Manager's predictive failure alert
- Guidelines for hot-plug drive replacement

For additional information on troubleshooting hard drive problems, refer to "Hard Drive Problems" and "SCSI Device Problems" in Chapter 2 of the Compaq Servers Troubleshooting Guide.

Use the following illustration in conjunction with Table 3-2 to analyze the current status for hot-plug hard drives attached to a Compaq Smart Array controller:

Figure 3-2.
Hot-plug hard drive LED indicators

Table 3-2
Hot-Plug Hard Drive LED Indicator Status Combinations

IMPORTANT: It is recommended that you familiarize yourself with the guidelines following this table prior to performing a drive replacement.

Activity: On    Online: Off    Fault: Off
  Do not remove the drive. Removing a drive during this process will cause data loss.
  The drive is being accessed and is not configured as part of an array.

Activity: On    Online: Flashing    Fault: Off
  Do not remove the drive. Removing a drive during this process will cause data loss.
  The drive is rebuilding or undergoing capacity expansion.

Activity: Flashing    Online: Flashing    Fault: Flashing
  Do not remove the drive. Removing a drive during this process will cause data loss.
  The drive is part of an array being selected by the Array Configuration Utility.
  -Or-
  The Options ROMPaq is upgrading the drive.

Activity: Off    Online: Off    Fault: Off
  OK to replace the drive online if a predictive failure alert is received (see the following section for details) and the drive is attached to an array controller.
  The drive is not configured as part of an array.
  -Or-
  If this drive is part of an array, then a powered-on controller is not accessing the drive.
  -Or-
  The drive is configured as an online spare.

Activity: Off    Online: Off    Fault: On
  OK to replace the drive online.
  The drive has failed, and has been placed off-line.

Activity: Off    Online: On    Fault: Off
  OK to replace the drive online if a predictive failure alert is received (see the following section for details), provided that the array is configured for fault tolerance and all other drives in the array are online.
  The drive is online and configured as part of an array.

Activity: On or Flashing    Online: On    Fault: Off
  OK to replace the drive online if a predictive failure alert is received (see the following section for details), provided that the array is configured for fault tolerance and all other drives in the array are online.
  The drive is online and being accessed.

Hot-Plug Drive Replacement Guidelines

You should be able to hot-plug a drive during normal activity. Be aware, however, that hot-plugging a disk drive will affect system performance and fault tolerance.

NOTE: Depending upon your configuration, both a drive failure and the subsequent rebuild process will cause storage subsystem performance degradation. For example, the replacement of a single drive on an array with 50 logical drives will have less impact to performance than if the array has 3 logical drives.

When a disk drive is hot-plugged, the system remains functionally operational, but the disk subsystem may no longer be fault tolerant. Fault tolerance is lost until the removed drive is replaced and the rebuild operation is completed (this will take several hours, even if the system is not busy while the rebuild is in progress). If another drive in the array incurs an error while fault tolerance is unavailable, the resulting data error can cause a fatal system error. If another drive fails during this period, the entire contents of the array will be lost.

IMPORTANT: It is recommended that disk drive replacement be performed during low activity periods whenever possible.
In addition, ensure you have a complete backup of the array containing the drive being replaced. This applies even if drive replacement is being made during server downtime.

Unsafe Hot-Plug Drive Replacement Precautions

Be aware of the following Compaq guidelines cautioning unsafe hot-plug replacement.

- Do not remove a degraded drive (one that has been marked for predictive failure) if any other member of the array is off-line (the online LED is off). No other drive in the array can be hot-plugged without data loss.
  The possible exception might be the utilization of RAID 0+1 as a fault tolerant form. In this case, drives are mirrored in pairs; more than one drive can fail and be replaced as long as the drive(s) to which they are mirrored are online. Refer to the Smart Array controller user guide for information on fault tolerance options.
- Do not remove a degraded drive if any member of an array is missing (previously removed and not yet replaced).
- Do not remove a degraded drive if any member of an array is being rebuilt unless the drive being rebuilt has been configured as an online spare. The drive's online LED will be flashing, indicating that a replaced drive is being rebuilt from data stored on the other drives.
  NOTE: An online spare will not activate and start rebuilding after a predictive failure alert, because the degraded drive is still online. The online spare activates only after a drive in the array has failed.
- Do not replace multiple degraded drives at the same time (for example, when the system is off), since the fault tolerance may be compromised. When a drive is replaced, the controller uses data from the other drives in the array to reconstruct data on the replacement drive. If more than one drive is removed, a complete data set is not available to reconstruct data on the replacement drive(s), and permanent data loss could occur.

CAUTION: Do not turn off an attached disk drive enclosure when the server containing the Smart Array controller is powered up. Also, do not turn on the server before turning on the disk enclosure. If these ordering rules are not followed, the Smart Array controller may mark the drives in this enclosure as "failed," which could result in permanent data loss.

Predictive Failure Alert

The predictive failure alert is a powerful problem-prevention tool that warns you when the system has determined a drive failure is imminent. This alert allows you to proactively schedule downtime for maintenance and not interrupt critical business operations. In addition, with hot-pluggable drives attached to Compaq Smart Array controllers, you can remove and replace one or several drives within a server while the system is online, which minimizes the interruption of the network, server downtime, and data loss. Refer to your Compaq Insight Manager and Compaq Management Agents documentation (found on the Compaq Management CD) for instructions on implementing this function.

CAUTION: Not following these guidelines could result in data loss.

CAUTION: It is recommended that some level of fault tolerance be utilized in your RAID configuration. Refer to your Smart Array controller user guide for information on fault tolerance options.

IMPORTANT: You must use Compaq Insight Manager and a Compaq Smart Array controller to manage the drive array on your server if you wish to implement Predictive Failure Alert.
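The precautions listed under "Unsafe Hot-Plug Drive Replacement Precautions" can be read as a checklist applied to every other drive in the array before a degraded drive is pulled. The following is a minimal sketch of that checklist in Python, not a Compaq tool. The `Drive` record, its field names, and the `reasons_not_to_remove` function are hypothetical, and the LED states must be entered by hand from what you see on the drives. The sketch assumes that drives flagged as online spares are skipped entirely, and it does not model the RAID 0+1 exception.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Drive:
    """Hypothetical record of one drive as observed on the front panel."""
    bay: int
    online_led: str = "on"          # "on", "flashing", or "off"
    present: bool = True            # False if removed and not yet replaced
    is_online_spare: bool = False   # assumption: spares never block removal


def reasons_not_to_remove(degraded_bay: int, drives: List[Drive]) -> List[str]:
    """Return the precautions that forbid removing the degraded drive.

    An empty list means none of the modeled precautions applies.
    """
    reasons = []
    for drive in drives:
        # Skip the degraded drive itself and any online spare.
        if drive.bay == degraded_bay or drive.is_online_spare:
            continue
        if not drive.present:
            reasons.append(f"bay {drive.bay}: drive is missing")
        elif drive.online_led == "off":
            reasons.append(f"bay {drive.bay}: drive is off-line")
        elif drive.online_led == "flashing":
            reasons.append(f"bay {drive.bay}: drive is rebuilding")
    return reasons


# Example: bay 0 is degraded while bay 2 is still rebuilding.
array = [Drive(0), Drive(1), Drive(2, online_led="flashing")]
for reason in reasons_not_to_remove(0, array):
    print(reason)
```

Returning the list of reasons, rather than a single yes or no, keeps each precaution visible to the person making the decision. An empty list only means that none of the modeled precautions applies; the backup and error-counter checks described in this chapter still apply.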
Predictive Failure Replacement Guidelines

To minimize server downtime and data loss, use these guidelines when Compaq Insight Manager issues a predictive failure alert. The alert indicates that a drive is degraded and should be replaced.

- Make sure all physical drives in the affected array are present and have the online LEDs illuminated before removing the degraded hot-plug drive. If any online LEDs are flashing (indicating a rebuild) or are not illuminated, the degraded drive should not be removed. For step-by-step instructions on hot-plugging your hard drive, refer to the Compaq ProLiant ML350 Setup and Installation Guide.
- If you are upgrading to larger drives in the array, follow the previously stated rules and ensure each drive has completed its rebuild before adding the next new drive to the array.
- You must follow the Compaq cabling guidelines when configuring your array to implement the best possible cabling solution for your server. Refer to the Compaq ProLiant ML350 Setup and Installation Guide for step-by-step instructions.
- Check for cabling configurations that are not supported. Signal integrity errors may be injected into the SCSI bus when an active drive is hot-plugged.
- Make sure fault tolerance is not currently being used to recover from errors to other drives in the array, such as media errors or signal integrity errors. Loss of fault tolerance following a drive replacement may result in problems.

CAUTION: In extreme cases, when the number of errors is greater than the firmware error recovery is able to sustain, hot-plugging an online drive may cause some unrecoverable errors to be reported to the operating system or may cause a complete failure of the array.
Refer to your operating system documentation for more information on implications, as well as possible recovery options.

IMPORTANT: Before replacing a degraded drive, use Compaq Insight Manager to examine the error counters recorded for each physical drive in the array and to verify that such errors are not presently occurring. Refer to the Compaq Insight Manager documentation on the Compaq Management CD.

Network Controller Status LED Indicators

The network controller status LEDs are located on the back of the server. They provide the following information:

- Whether the server is linked to the network
- Whether there is current network activity
- The speed at which the network is being accessed

Table 3-3 provides an explanation of each LED status.

Figure 3-3. Network controller status LED indicators

Table 3-3
Network Controller Status LED Indicators

Indicator    Status            Description
Link         Off               No network link
             On                Linked to network
Activity     Off               No network activity
             On or flashing    Network activity
Speed        Off               10 Mb/sec (10BASE-T Ethernet)
             On                100 Mb/sec (100BASE-TX Ethernet)

Chapter 4
Switch Settings

The ProLiant ML350 server contains two switch banks. Their settings may need to be changed from time to time, and incorrect settings can cause problems. This chapter explains the use of each nonreserved switch.

The system configuration switch (SW1) is located on the system board. You may use it to:

- Enable or disable power-on password protection.
- Clear all system configuration information from CMOS and nonvolatile RAM (NVRAM).
- Specify a tower or rack configuration for server management software.
- Enable or disable ROMPaq disaster recovery mode.

The server feature board switch (SW3) allows you to enable or disable embedded video.

System Configuration Switch (SW1) Settings

Figure 4-1. System configuration switch (SW1) settings

Table 4-1
System Configuration Switch (SW1) Settings

Position 1 (default: Off)
  Function: Clear/setup password
  Description: Used to disable password
  Settings: Off = Password is enabled.
            On = Password is disabled.

Position 2 (default: Off)
  Function: Clear CMOS and NVRAM
  Description: Used to clear system configuration settings
  Settings: Off = Normal
            On = When server is powered on, all system configuration information will be erased.

Position 3 (default: Off)
  Function: Tower or rack configuration
  Description: Used to specify a tower or rack configuration
  Settings: Off = Tower configuration
            On = Rack configuration

Position 4 (default: Off)
  Function: ROMPaq disaster recovery enable
  Description: Used to enable ROMPaq disaster recovery mode when system ROM is corrupted
  Settings: Off = Normal server operations mode
            On = ROMPaq disaster recovery mode

Clearing and Resetting System Configuration Settings

When SW1, position 2 is set to the on position, the system is prepared to erase all system configuration settings from both CMOS and NVRAM. You should then:

1. Power up the server. All configuration settings are then erased. At the completion of this process, all system operations will halt.
2. Power down the server.
3. Reset the position 2 switch to the default off position.
4. Power up the server.
5. Press F10 to run the BIOS Setup Utility and reset all system configuration settings.

NOTE: For complete instructions on how to use the BIOS Setup Utility, refer to Chapter 5 of the Compaq ProLiant ML350 Setup and Installation Guide.

Enabling ROMPaq Disaster Recovery Mode

A corrupted system ROM requires that you recreate the ROM BIOS by a process called ROM flash. This can be accomplished only when the system is in disaster recovery mode. Set SW1 position 4 to on to enable ROMPaq disaster recovery mode.

IMPORTANT: Prior to performing this operation, refer to Chapter 5 of the Compaq Servers Troubleshooting Guide for complete instructions on disaster recovery.
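When recording or reviewing a server's configuration, it can help to translate the four SW1 positions into the meanings given in Table 4-1. The sketch below does that in Python as an illustration only. The names `SW1_SETTINGS`, `describe_sw1`, and `non_default_positions` are hypothetical, and the code cannot read the physical switch: the positions must be entered by hand as observed on the system board (True for on, False for off).

```python
from typing import Dict, List, Sequence

# Meanings transcribed from Table 4-1: position -> (meaning when off, meaning when on).
SW1_SETTINGS = {
    1: ("Power-on password is enabled.",
        "Power-on password is disabled."),
    2: ("Normal.",
        "All system configuration information is erased at power-on."),
    3: ("Tower configuration.",
        "Rack configuration."),
    4: ("Normal server operations mode.",
        "ROMPaq disaster recovery mode."),
}


def describe_sw1(positions: Sequence[bool]) -> Dict[int, str]:
    """Map each SW1 position (1 to 4) to its meaning for the given state."""
    if len(positions) != len(SW1_SETTINGS):
        raise ValueError("SW1 has exactly four positions")
    return {
        number: SW1_SETTINGS[number][1 if is_on else 0]
        for number, is_on in enumerate(positions, start=1)
    }


def non_default_positions(positions: Sequence[bool]) -> List[int]:
    """List the positions that differ from the default (off) setting."""
    return [number for number, is_on in enumerate(positions, start=1) if is_on]


# Example: only position 2 is on, as when preparing to clear CMOS and NVRAM.
observed = [False, True, False, False]
for number, meaning in describe_sw1(observed).items():
    print(f"Position {number}: {meaning}")
print("Non-default positions:", non_default_positions(observed))
```

Listing the non-default positions separately is a quick way to notice a switch that was left on after a clearing or recovery procedure, such as position 2 not being returned to off.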
Server Feature Board Switch (SW3) Settings

Figure 4-2. Server Feature Board switch (SW3) settings

Table 4-2
Server Feature Board Switch (SW3) Settings

Position 1 (default: Off)
  Function: Embedded video enable
  Description: Used to disable the onboard video controller when an optional video adapter is installed.
  Settings: Off = Embedded video is enabled.
            On = Embedded video is disabled.

Positions 2 through 8 (default: Off)
  Function: Reserved

Note: Positions 2 through 8 are reserved for Compaq authorized service providers only. Please do not change these switches from the specified default settings.

Appendix A
Array Diagnostic Utility

Array Diagnostic Utility (ADU)

The Array Diagnostic Utility (ADU) is a Windows-based tool designed to run on all Compaq servers that support Compaq array controllers and are running SmartStart 4.10 or later. The two main functions of ADU are to collect all possible information about the array controllers in the system and generate a list of detected problems. The error messages and codes listed include all codes generated by Compaq products. Your system generates only codes applicable to your configuration and options.

ADU works by issuing multiple commands to the array controllers to determine if a problem exists. This data can then be saved to a file. In severe situations, this file can be sent to Compaq for analysis. In most cases, ADU will provide enough information to initiate problem resolution immediately.

NOTE: ADU does not write to the drives or destroy data. It does not change or remove configuration information.

Starting the Array Diagnostic Utility (ADU)

1. Insert the SmartStart CD into the CD-ROM drive.
2. Restart the system from the SmartStart CD.
3. Select Array Diagnostic Utility (ADU) on the System Utilities Menu.
4. A "Please Wait" panel is displayed, indicating that ADU is identifying the system parameters.
5. ADU gathers information from all the array controllers in the system. The time it takes to gather this information depends upon the extent of your array configuration.
   CAUTION: Do not cycle the power during this process. ADU must perform low-level operations that, if interrupted, could cause the controller to revert to a previous level of firmware (if the firmware was soft-upgraded).
6. When the information gathering process is complete, ADU will display either the main screen or a panel indicating problems detected.
7. To generate an ADU report, select File, then Save Data from the command menu.

Array Diagnostic Utility (ADU) Error Messages

Table A-1
Array Diagnostic Utility (ADU) Error Messages

Message: Accelerator board not detected
  Description: Array controller did not detect a configured array accelerator board.
  Recommended action: Install array accelerator board on array controller. If an array accelerator board is installed, check for proper seating on the array controller board.

Message: Accelerator error log
  Recommended action: If there are many parity errors, you may need to replace the array accelerator board.
  Description: List of the last 32 parity errors on transfers to or from memory on the array accelerator board. Displays starting memory address, transfer count, and operation (read and