Tuesday, August 14, 2007

harddisk monitoring using smartctl

The smartmontools package contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology System (SMART) built into most modern ATA and SCSI hard disks. In many cases, these utilities will provide advance warning of disk degradation and failure.

The smartctl man page says:
smartctl is a command line utility designed to perform SMART tasks such as printing the SMART self-test and error logs, enabling and disabling SMART automatic testing, and initiating device self-tests. smartctl also provides support for polling TapeAlert messages from SCSI tape drives and changers.

Most BIOSes shipped with computers nowadays include a S.M.A.R.T. feature. It can be toggled by entering the BIOS setup before the OS boots. If you find it disabled in the BIOS, you can enable it from there or with the smartctl tool.

Here are quick ways to install smartmontools and use smartctl on your hard drive under Fedora.

FEDORA 7 INSTALLATION:
~~~~~~~~~~~~~~~~~~~~~~
# yum -y install smartmontools


USAGE AND PARAMETERS:
~~~~~~~~~~~~~~~~~~~~~~

How to check and start the smartd daemon service?

# service smartd status
# service smartd start

smartd uses /etc/smartd.conf as its configuration file.

In this file, there is a line that says

DEVICESCAN

The above line makes the smartd daemon service scan for and monitor all attached ATA and SCSI devices when it starts. To avoid scanning and monitoring every attached harddisk automatically, comment out the DEVICESCAN line in smartd.conf and list each harddisk explicitly, one per line, with its arguments. You can also have notifications sent by email, like so

#DEVICESCAN
/dev/sda -S on -o on -a -I 194 -m email@domain.com

/dev/sda is the device to be processed and monitored.
-S on enables automatic Attribute autosave.
-o on enables automatic off-line testing.
-a instructs smartd to monitor all SMART features of the disk.
-I 194 means to ignore changes in Attribute #194, because disk temperatures change often.
-m is followed by the e-mail address to which warning messages are sent.
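
After editing smartd.conf it can help to confirm which lines are still active, for example that DEVICESCAN really is commented out. Below is a minimal sketch; the function name smartd_active_lines is made up for illustration, and the path simply defaults to the /etc/smartd.conf mentioned above.

```shell
# Print only the active (non-comment, non-blank) lines of a smartd config file.
# smartd_active_lines is a hypothetical helper name, not part of smartmontools.
smartd_active_lines() {
    conf="${1:-/etc/smartd.conf}"
    # Drop lines that are empty or whose first non-space character is '#'.
    grep -Ev '^[[:space:]]*(#|$)' "$conf"
}

# Usage: smartd_active_lines
#        smartd_active_lines /path/to/another/smartd.conf
```

With the example configuration above, only the /dev/sda line should be printed.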

smartd logs to /var/log/messages any errors and warnings it detects while checking the harddisk.
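
To pull just the smartd entries out of that log, something like the following sketch could be used. The helper name smartd_log is hypothetical, and it assumes the usual syslog tag of the form "smartd[PID]:" on each line, which may differ depending on your syslog setup.

```shell
# Show the most recent smartd entries from a syslog file.
# smartd_log is a hypothetical helper; it assumes entries are tagged "smartd[PID]:".
smartd_log() {
    logfile="${1:-/var/log/messages}"   # log file to search
    count="${2:-20}"                    # how many recent entries to show
    grep 'smartd\[' "$logfile" | tail -n "$count"
}

# Usage (as root): smartd_log
#                  smartd_log /var/log/messages 50
```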

If you wish to have the smartd daemon service start automatically across reboots, enable it like so

# chkconfig --levels 35 smartd on


How to know your harddisk device name?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# fdisk -l | grep Disk | head -1

returns
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Disk /dev/sda: 40.0 GB, 40019582464 bytes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
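
The head -1 above shows only the first disk. If the machine has several, a small filter like this sketch could list every device name instead. The function name list_disks is made up for illustration, and it assumes fdisk prints "Disk /dev/...:" lines in the same layout as the sample output above.

```shell
# Read "fdisk -l" output on stdin and print one device name per line.
# list_disks is a hypothetical helper name.
list_disks() {
    # Match lines such as "Disk /dev/sda: 40.0 GB, ..." and strip the trailing colon.
    awk '/^Disk \/dev\// { sub(/:$/, "", $2); print $2 }'
}

# Usage (as root): fdisk -l | list_disks
```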


Now, how to know if your non-SATA harddisk supports SMART?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# smartctl -i /dev/sda
# smartctl -A /dev/sda

Now, how to know if your SATA harddisk supports SMART?

# smartctl -i -d ata /dev/sda

and for Array SCSI disks

# smartctl -i -d cciss,0 /dev/cciss/c0d0
# smartctl -i -d cciss,1 /dev/cciss/c0d0

How to get vendor harddisk specific attributes?
How to get the harddisk temperature ?

# smartctl -A -d ata /dev/sda

On IDE/SATA harddisks, the line with ID 194 holds the harddisk temperature.
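
To print just the temperature, the attribute table can be filtered as in this sketch. The name disk_temp is hypothetical, and the code assumes the usual ATA attribute table layout where the raw value sits in the tenth column; not every drive reports attribute 194.

```shell
# Read "smartctl -A" output on stdin and print the raw value of attribute 194.
# disk_temp is a hypothetical helper; it assumes RAW_VALUE is the 10th column.
disk_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Usage (as root): smartctl -A -d ata /dev/sda | disk_temp
```

Nothing is printed if the drive has no attribute 194.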

For Array SCSI

# smartctl -A -d cciss,0 /dev/cciss/c0d0
# smartctl -A -d cciss,1 /dev/cciss/c0d0

Depending on the drive, the above commands also tell you the number of minutes left until the next internal SMART test.

The -i commands shown earlier display more harddisk information, such as the harddisk model, serial number, firmware version, ATA version and more details.

***Note: I have read somewhere that /dev/hda is gradually being phased out as a harddisk device name, so an IDE disk may show up as /dev/sda instead.


How to enable SMART on a harddisk?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# smartctl -s on /dev/sda

And for SATA

# smartctl -s on -d ata /dev/sda

and for Compaq Array SCSI drives

# smartctl -s on -d cciss,0 /dev/cciss/c0d0
# smartctl -s on -d cciss,1 /dev/cciss/c0d0


How to check for SMART health status after enabling it?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# smartctl -H /dev/sda

and for SCSI Array disks

# smartctl -H -d cciss,0 /dev/cciss/c0d0
# smartctl -H -d cciss,1 /dev/cciss/c0d0

The above shows PASSED or OK if your harddisk is in good condition. If it does not, back up all your data and settings immediately, before your harddisk fails!
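
For use in a script or cron job, the PASSED/OK check could be wrapped as in this sketch. The function name smart_health_ok is made up, and it assumes the health line reads "test result: PASSED" on ATA disks or "SMART Health Status: OK" on SCSI disks.

```shell
# Read "smartctl -H" output on stdin; succeed only if the disk reports healthy.
# smart_health_ok is a hypothetical helper; the matched strings are assumed
# from typical ATA ("PASSED") and SCSI ("OK") output.
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Usage (as root):
#   smartctl -H /dev/sda | smart_health_ok || echo "Back up /dev/sda now!"
```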


You can also check the SMART error log, if there is one, like so:

# smartctl -l error /dev/sda
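
If you only want the number of logged errors, the output can be reduced as in this sketch. The name ata_error_count is hypothetical, and it assumes the ATA-style output where an empty log says "No Errors Logged" and a non-empty one starts with "ATA Error Count: N"; SCSI disks print a different layout.

```shell
# Read "smartctl -l error" output on stdin and print the number of logged errors.
# ata_error_count is a hypothetical helper; the matched strings are assumed
# from typical ATA output and will not match SCSI error counter logs.
ata_error_count() {
    awk '/No Errors Logged/   { print 0;  exit }
         /^ATA Error Count:/  { print $4; exit }'
}

# Usage (as root): smartctl -l error /dev/sda | ata_error_count
```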

Offline testing can also be done using smartctl. The tests run in the background on the drive, so normal harddisk activity can continue.
Start by issuing the following command:

# smartctl -c /dev/sda

The above command shows, among the drive's capabilities, how long the short and extended test routines would take. You can choose among several harddisk tests using the sample arguments below.

# smartctl -t offline /dev/sda
# smartctl -t short /dev/sda
# smartctl -t long /dev/sda
# smartctl -t conveyance /dev/sda


And finally, after waiting several minutes for the above test to finish, you can check the self-test log for the result, and the error log again for any harddisk errors found, like so

# smartctl -l selftest /dev/sda
# smartctl -l error /dev/sda


Smartctl note:
If the user issues a SMART command that is (apparently) not implemented by the device, smartctl will print a warning message but issue the command anyway (see the -T, --tolerance option below). This should not cause problems: on most devices, unimplemented SMART commands issued to a drive are ignored and/or return an error.

For more info,

# man smartctl

Further reading: http://www.linuxjournal.com/article.php?sid=6983
