

Option 2 - Upgrade the SSD Firmware Using the SMU (No Reload Required)
Precheck Before You Upgrade the SSD Firmware

- Verify if the bootflash is already in the failed state (read-only state). If it is, power-cycle/reload the switch in order to recover the bootflash.
- Check to see if the Temperature_Celsius attribute is 128 with the smartctl -a /dev/sda | egrep 'Temperature_Celsius|ID#' command. If it is 128, then power-cycle/reload the switch before you proceed with the SSD Firmware upgrade options.

Note: An upgrade of the SSD Firmware of the switch with a RAW_VALUE of 128 might result in unexpected behavior after a firmware upgrade (for example, an unexpected reload or read-only drive).

Configure bash if it is not enabled, then run bash and enter the smartctl command:

bash-4.2# smartctl -a /dev/sda | egrep 'Temperature_Celsius|ID#'
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 128 (0 65 0 10 255)

Any RAW_VALUE other than 128 for Temperature_Celsius is valid.

For Nexus 9500, enter the rlogin command from the Active supervisor in order to log in to the Standby supervisor.

If slot 27 is the Standby supervisor, enter this command:
bash-4.2# rlogin

If slot 28 is the Standby supervisor, enter this command:
bash-4.2# rlogin

The Nxos.CSCvx21260-n9k_ALL-1.0.1-.lib32_n9000.rpm bundle (Note: 1.0.1) automatically performs this Temperature_Celsius attribute precheck. If the Temperature_Celsius attribute is read as 128, the bundle aborts and recommends that the user reload the switch.

The issue has been fixed in these NX-OS versions:

When the switch is upgraded (disruptive/non-disruptive) or reloaded using the fixed NX-OS version, the SSD firmware version will be automatically upgraded. See the Cisco Nexus 9000 Series NX-OS Software Upgrade and Downgrade Guide, Release 9.3(x) for more information.
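The Temperature_Celsius precheck described above can be sketched in Python. This is only an illustration, not the logic of the Cisco SMU bundle: the function names are hypothetical, and the sample text is the smartctl output quoted in this section. It assumes the output of smartctl -a /dev/sda has already been captured as a string.

```python
# Sample copied from the smartctl output quoted in the text.
SAMPLE = (
    "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE\n"
    "194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 128 (0 65 0 10 255)\n"
)


def temperature_raw_value(smartctl_output):
    """Return the leading integer of RAW_VALUE for Temperature_Celsius, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Columns: ID#, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw value (the raw value is the tenth column).
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            return int(fields[9])
    return None


def needs_reload_before_upgrade(smartctl_output):
    """True when RAW_VALUE is 128, the case where a reload should come first."""
    raw = temperature_raw_value(smartctl_output)
    if raw is None:
        raise ValueError("Temperature_Celsius attribute not found")
    return raw == 128


if __name__ == "__main__":
    print(needs_reload_before_upgrade(SAMPLE))
```

Raising an error when the attribute is absent, rather than returning False, keeps a missing reading from being mistaken for a valid one.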

If the system is already impacted, the SSD firmware upgrade will permanently resolve this defect.

Note: A Return Material Authorization (RMA) is not recommended as the upgrade process will resolve the issue.

There are three options to mitigate this issue. For all options, it is strongly recommended to upgrade the firmware in a Maintenance Window.
The kernel also logs that the bootflash filesystem was remounted read-only:

%$ VDC-1 %$ %KERN-2-SYSTEM_MSG: EXT4-fs (sda3): Remounting filesystem read-only - kernel

Further, the logs also indicate a bootflash diagnostic test failure:

%$ VDC-1 %$ %DEVICE_TEST-2-COMPACT_FLASH_FAIL: Module 1 has failed test BootFlash 5 times on device BootFlash due to error Failure
%$ VDC-1 %$ %DIAGCLIENT-2-EEM_ACTION_HM_SHUTDOWN: Test has been disabled as a part of default EEM action

The switch might continue to work, but there will be an error when you try to save the configuration or write to any file on bootflash.

Power-cycle the system in order to temporarily recover from this problem. However, this failure will reappear after 1008 hours of operation.

In order to prevent this issue and disruption to the network and operations, Cisco recommends that you upgrade the firmware of the SSD proactively before the uptime reaches 28,224 hours. See the How to Identify Affected Products section and follow the firmware upgrade procedure accordingly.
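The two thresholds quoted above (28,224 hours before the first failure, 1008 hours between later failures) lend themselves to a quick arithmetic check. The sketch below is a minimal illustration with hypothetical names. It assumes the accumulated power-on hours are already known and does not show how to read them from the drive.

```python
FIRST_FAILURE_POH = 28_224   # accumulated power-on hours before the first event
REPEAT_INTERVAL_POH = 1_008  # hours of operation after each power-cycle


def hours_until_first_failure(power_on_hours):
    """Hours left before the first unresponsive event; 0 once the threshold has passed."""
    return max(0, FIRST_FAILURE_POH - power_on_hours)


def as_years(hours):
    """Convert hours to years, using 365-day years."""
    return hours / (24 * 365)


def as_weeks(hours):
    """Convert hours to weeks."""
    return hours / (24 * 7)


if __name__ == "__main__":
    print(round(as_years(FIRST_FAILURE_POH), 1))  # about 3.2 years
    print(as_weeks(REPEAT_INTERVAL_POH))          # 6.0 weeks
    print(hours_until_first_failure(20_000))      # 8224 hours remaining
```

A drive that has already passed the threshold reports zero hours remaining, which matches the recommendation to upgrade the firmware before the uptime reaches 28,224 hours.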

Nexus 9000/3000 NXOS : M500IT Bootflash in readonly mode

Due to a flaw in the Solid State Drive (SSD) firmware, the SSD will no longer respond after approximately 3.2 years of cumulative operation. After the first unresponsive event is experienced, every subsequent power-cycle of the system will allow the drive to operate for another 1008 hours (approximately six weeks) before it will no longer respond again.

Background

After approximately 3.2 years (28,224 accumulated Power On Hours (POH)), a memory buffer overrun condition occurs which triggers the firmware event in the SSD. This causes the drive to become unresponsive until the drive is power-cycled. No data loss will occur when the memory buffer overrun firmware event occurs. A power-cycle restores normal operation of the drive. The drive continues to operate normally for approximately six weeks (1008 additional accumulated POH), at which time the drive will become unresponsive again.

The bootflash on Nexus 9000/3000 switches will no longer respond, which causes failure of operations such as configuration changes/saves, read/write operations, and so on. It might also cause an unexpected reload.

In addition, these log messages are displayed and indicate that the bootflash is in read-only mode:

%$ VDC-1 %$ %KERN-2-SYSTEM_MSG: EXT4-fs error (device sda3) in ext4_write_begin:1358: Journal has aborted - kernel
%$ VDC-1 %$ %KERN-2-SYSTEM_MSG: EXT4-fs error (device sda3): ext4_journal_check_start:61: Detected aborted journal - kernel
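A saved copy of the switch log can be scanned for the read-only signatures quoted in this document. The following is a rough sketch under stated assumptions: the helper name is hypothetical, the signature strings are taken from the messages quoted here, and real log lines may carry timestamps or hostnames that this simple substring match ignores.

```python
# Substrings taken from the kernel messages quoted in this document.
READ_ONLY_SIGNATURES = (
    "Detected aborted journal",
    "Journal has aborted",
    "Remounting filesystem read-only",
)


def read_only_events(log_text):
    """Return the EXT4 log lines that match one of the read-only signatures."""
    return [
        line
        for line in log_text.splitlines()
        if "EXT4-fs" in line and any(sig in line for sig in READ_ONLY_SIGNATURES)
    ]


if __name__ == "__main__":
    sample_log = (
        "%KERN-2-SYSTEM_MSG: EXT4-fs error (device sda3) in "
        "ext4_write_begin:1358: Journal has aborted - kernel\n"
        "%KERN-2-SYSTEM_MSG: EXT4-fs error (device sda3): "
        "ext4_journal_check_start:61: Detected aborted journal - kernel\n"
        "an unrelated log line\n"
    )
    for line in read_only_events(sample_log):
        print(line)
```

Matching on substrings rather than full lines keeps the check tolerant of whatever prefix the logging system adds.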
