Thermal Monitoring for Advanced Data Protection
New Feature Helps Prevent Drive Failure
Caused By Overheating




OVERVIEW/EXECUTIVE SUMMARY
In its commitment to provide the highest quality, most reliable hard drives in the industry,
Western Digital has added an important new feature to its advanced data protection
package: Thermal Monitoring. As a new property of the S.M.A.R.T. (Self-Monitoring,
Analysis, and Reporting Technology) interface, thermal monitoring specifically alerts the
host to potential damage from the drive operating at too high a temperature. It provides
information to the host whenever certain thresholds have been met for various conditions.
S.M.A.R.T. is the industry standard reliability tool that monitors and controls hard drive
operation to minimize the probability of drive failure.

The new thermal monitoring feature has been implemented in the WD Enterprise
WDE18300 and WDE9180 Ultra2 SCSI hard drives. This new technology focuses on
system-level protection and monitors the temperature of the drive, providing pertinent
information to the host and modifying drive behavior as needed to protect the drive from
damage.


BACKGROUND
Data-intensive applications place very high demands on data storage devices. These devices
are responsible for keeping critical data safe and providing fast, reliable access to that data.
Excessive heat, especially over long periods of time, is harmful to hard drives: it reduces
drive reliability and could lead to permanent data loss.

The S.M.A.R.T. reliability monitor assists users in preventing possible system downtime by
warning of an impending risk of data loss. Through S.M.A.R.T., the drive can
communicate its predicted reliability status to the user, thereby providing protection against
system downtime and possible loss of productivity and data. Western Digital uses advanced
diagnostics to monitor the hard drive's internal operations and provide early warnings of
potential problems. The new thermal monitoring feature expands S.M.A.R.T.'s capabilities
for preventing hard drive failure by monitoring the drive's temperature and alerting the host
when the temperature goes beyond the recommended level. It also monitors and predicts
the hard drive's performance and reliability. S.M.A.R.T. notifications warn of impending
drive failure so the user can take corrective action before data is lost or operations are
affected.

Thermal Monitoring Functions
In a hard drive, both the electronic and mechanical components such as actuator bearings,
spindle motor and voice coil motor can be affected by excessive temperatures. See Figure 1.
There are many conditions that could contribute to a temperature increase, such as:
• a clogged cooling fan
• a failed room air conditioner
• a cooling system that is overextended by too many drives




Figure 1. Three Hard Drive Mechanical Components

Running the drive for extended periods of time at too high a temperature can damage it and
lead to permanent data loss. A thermal sensor can detect the environmental conditions that
affect drive reliability, including ambient temperature, rate of cooling airflow, voltage,
vibration, and duty cycle.

To carry out the new feature's thermal monitoring capability, a dedicated thermal sensor,
mounted on the printed circuit board assembly (PCBA), automatically polls, records, and
analyzes drive temperature at set intervals as dictated by the drive firmware. Using this data,
the hard drive can:
• report the current temperature of the drive
• log drive temperatures over time
• maintain a record of the highest observed temperature
• provide S.M.A.R.T. notifications on reaching a customer-specified temperature
• provide S.M.A.R.T. notifications upon reaching a drive-threatening temperature
• optimize drive operations to keep drive temperatures under acceptable levels
• spin down, if enabled to do so, upon reaching a drive-threatening temperature
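
The functions listed above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the drive firmware: the class name ThermalMonitor, its field names, the event strings, and the 70 degree drive-threatening value are all hypothetical. Only the 60 degree customer default comes from this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThermalMonitor:
    """Illustrative model of the per-poll bookkeeping described in the text."""
    # 60 C is the default customer threshold named in the paper; None means disabled.
    customer_threshold_c: Optional[float] = 60.0
    # Hypothetical drive-threatening limit; the paper does not state a value.
    critical_threshold_c: float = 70.0
    spin_down_enabled: bool = False
    history: List[float] = field(default_factory=list)
    max_seen_c: Optional[float] = None
    spun_down: bool = False

    def poll(self, temp_c: float) -> List[str]:
        """Record one sensor reading and return the notifications it triggers."""
        self.history.append(temp_c)                    # log temperatures over time
        if self.max_seen_c is None or temp_c > self.max_seen_c:
            self.max_seen_c = temp_c                   # highest observed temperature
        events: List[str] = []
        if self.customer_threshold_c is not None and temp_c >= self.customer_threshold_c:
            events.append("customer-threshold")
        if temp_c >= self.critical_threshold_c:
            events.append("drive-threatening")
            if self.spin_down_enabled:
                self.spun_down = True                  # spin down only if enabled
                events.append("spin-down")
        return events


monitor = ThermalMonitor(spin_down_enabled=True)
for reading in (45.0, 62.0, 71.0):
    print(reading, monitor.poll(reading))
```

The sketch treats "reaching" a threshold as greater than or equal to it, which is an assumption. It also omits the step of optimizing drive operations to reduce temperature, since the paper does not describe how that is done.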


How Thermal Monitoring Works
Thermal monitoring capabilities can be enabled and controlled using various Mode Page
parameters, i.e., parameters organized into regular structures referred to as pages. The
Enable Warning Additional Sense Code (EWASC) bit in the Informational Exceptions
Control Page (page 1Ch) controls whether or not any S.M.A.R.T. notifications will be
generated due to thermal monitoring events. When this bit is set to 1, all thermal
monitoring S.M.A.R.T. notifications will be generated, with the possible exception of the
customer threshold, which can be individually disabled if desired.
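
The gating described above can be sketched as a check on the raw mode page bytes. The bit position used here (EWASC as bit 4 of byte 2, page code in the low six bits of byte 0) follows the general SCSI layout of the Informational Exceptions Control page and is assumed to apply to these drives. The function names, the event strings, and the customer_threshold_disabled flag are hypothetical.

```python
IEC_PAGE_CODE = 0x1C   # Informational Exceptions Control page
EWASC_MASK = 0x10      # bit 4 of byte 2 (assumed layout, see lead-in)


def ewasc_enabled(page: bytes) -> bool:
    """Return True if the EWASC bit is set in a raw page 1Ch image."""
    if len(page) < 3 or (page[0] & 0x3F) != IEC_PAGE_CODE:
        raise ValueError("not an Informational Exceptions Control page")
    return bool(page[2] & EWASC_MASK)


def should_notify(page: bytes, event: str, customer_threshold_disabled: bool) -> bool:
    """Decide whether a thermal event produces a S.M.A.R.T. notification."""
    if not ewasc_enabled(page):
        return False                       # EWASC = 0 suppresses all thermal warnings
    if event == "customer-threshold" and customer_threshold_disabled:
        return False                       # customer threshold individually disabled
    return True


# Example page image: page code 1Ch, page length 0Ah, EWASC set, rest zero.
page = bytes([0x1C, 0x0A, 0x10]) + bytes(9)
print(should_notify(page, "customer-threshold", customer_threshold_disabled=True))
print(should_notify(page, "drive-threatening", customer_threshold_disabled=True))
```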


The first thermal threshold, the customer threshold, is entirely user-defined and
programmable in Mode Page 0 (default value 60°C). A disable bit allows the customer to
disable this specific threshold in cases where other thermal monitoring notifications are
desired but no customer threshold is set. When this threshold is crossed, the drive returns a
01/0B/01 error code (Warning