Rockstable Wiki:

smart

About

smartd is a daemon that monitors the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into most ATA/SATA and SCSI/SAS hard drives and solid-state drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests. This version of smartd is compatible with ACS-3, ACS-2, ATA8-ACS, ATA/ATAPI-7 and earlier standards.

Tools

smartctl

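smartctl is the command-line tool for inspecting disks. The commands below show the built-in help, scan for devices, and print all SMART information for one device.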

    smartctl -h
    smartctl --scan
    smartctl -a /dev/sda
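
Besides its text output, smartctl reports its findings through the exit status, which is a bit mask (see the RETURN VALUES section of the smartctl man page). The following is a minimal sketch that decodes that mask. The function name decode_smartctl_status is made up for this page, and the descriptions are shortened paraphrases of the man page wording.

```shell
# Decode the bit mask that smartctl returns as its exit status.
# Hypothetical helper, not part of smartmontools.
decode_smartctl_status() {
    status=$1
    bit=0
    # One description per bit, in order from bit 0 to bit 7.
    for meaning in \
        "command line did not parse" \
        "device open failed or device is in a low-power mode" \
        "a SMART or other ATA command failed" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        if [ $(( status & (1 << bit) )) -ne 0 ]; then
            printf 'bit %d: %s\n' "$bit" "$meaning"
        fi
        bit=$(( bit + 1 ))
    done
}

# Usage (needs a real disk and root privileges):
#   sudo smartctl -a /dev/sda >/dev/null
#   decode_smartctl_status $?
```

An exit status of 0 prints nothing, which means smartctl found no problem it reports through these bits.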

One-shot query of all devices

    # Print the full SMART report of every scanned device, paged through less
    for DEVICE in $(smartctl --scan | cut -d ' ' -f 1); do
            printf "\n== DEVICE: '%s' ==\n\n" "$DEVICE"
            sudo smartctl -a "$DEVICE"
    done | less
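
The loop above keeps only the device name from each scan line. The scan output also carries a device type (for example "/dev/sda -d scsi # /dev/sda, SCSI device"), which some controllers need in order to be addressed correctly. The sketch below passes that type along. The function names scan_pairs and check_all are hypothetical, and the sketch assumes every scan line has the form "DEVICE -d TYPE # comment".

```shell
# Turn `smartctl --scan` lines into "DEVICE TYPE" pairs.
# Assumed input format: /dev/sda -d scsi # /dev/sda, SCSI device
scan_pairs() {
    awk '$2 == "-d" { print $1, $3 }'
}

# Print the health status of every scanned device, using the scanned type.
check_all() {
    smartctl --scan | scan_pairs | while read -r dev type; do
        printf "\n== DEVICE: '%s' (type %s) ==\n\n" "$dev" "$type"
        sudo smartctl -H -d "$type" "$dev"
    done
}

# Usage (needs smartmontools and root privileges):
#   check_all | less
```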

GSmartControl

GSmartControl Homepage

Hard disk drive and SSD health inspection tool

GSmartControl is a graphical user interface for smartctl (from smartmontools package), which is a tool for querying and controlling SMART (Self-Monitoring, Analysis, and Reporting Technology) data on modern hard disk and solid-state drives. It allows you to inspect the drive's SMART data to determine its health, as well as run various tests on it.

    sudo gsmartcontrol

Notes

Installation

    aptitude install smartmontools mailutils

Test mail

The defaults are probably fine, but you should verify that mail delivery actually works.

If you are using /etc/hosts to set the FQDN, make sure the hostname is set correctly (with no trailing dots).
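
A rough sanity check for the name the system reports could look like the sketch below. The function name fqdn_ok is made up for this page, and the check is only syntactic (at least one inner dot, no leading or trailing dot, no empty label). It does not prove that mail for that name can be delivered.

```shell
# Return 0 if the argument looks like a fully qualified host name.
# Hypothetical helper; purely a syntax check.
fqdn_ok() {
    case $1 in
        ''|*.|.*|*..*) return 1 ;;  # empty, trailing or leading dot, or empty label
        *.*)           return 0 ;;  # at least one dot inside: looks qualified
        *)             return 1 ;;  # bare host name without a domain
    esac
}

# Usage:
#   fqdn_ok "$(hostname --fqdn)" || echo "FQDN looks wrong" >&2
```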

Make sure your aliases are set up correctly.
/etc/aliases

    root: root@rockstable.it

Rebuild the alias database

    newaliases

Run a one-time check that also sends a test mail to root

    smartd -c - -q onecheck \
            <<< 'DEVICESCAN -d removable -n standby -m root -M test'

Configure

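The default config scans for all devices (DEVICESCAN), keeps running if a removable device is absent at startup (-d removable), skips checks while a disk is in standby (-n standby), and addresses warnings to root (-m root), handing them to /usr/share/smartmontools/smartd-runner instead of calling the mail command directly (-M exec).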

/etc/smartd.conf

    # Sample configuration file for smartd.  See man smartd.conf.

    # Home page is: http://www.smartmontools.org

    # smartd will re-read the configuration file if it receives a HUP
    # signal

    # The file gives a list of devices to monitor using smartd, with one
    # device per line. Text after a hash (#) is ignored, and you may use
    # spaces and tabs for white space. You may use '\' to continue lines.

    # You can usually identify which hard disks are on your system by
    # looking in /proc/ide and in /proc/scsi.

    # The word DEVICESCAN will cause any remaining lines in this
    # configuration file to be ignored: it tells smartd to scan for all
    # ATA and SCSI devices.  DEVICESCAN may be followed by any of the
    # Directives listed below, which will be applied to all devices that
    # are found.  Most users should comment out DEVICESCAN and explicitly
    # list the devices that they wish to monitor.
    DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner

    # Alternative setting to ignore temperature and power-on hours reports
    # in syslog.
    #DEVICESCAN -I 194 -I 231 -I 9

    # Alternative setting to report more useful raw temperature in syslog.
    #DEVICESCAN -R 194 -R 231 -I 9

    # Alternative setting to report raw temperature changes >= 5 Celsius
    # and min/max temperatures.
    #DEVICESCAN -I 194 -I 231 -I 9 -W 5

    # First ATA/SATA or SCSI/SAS disk.  Monitor all attributes, enable
    # automatic online data collection, automatic Attribute autosave, and
    # start a short self-test every day between 2-3am, and a long self test
    # Saturdays between 3-4am.
    #/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03)

    # Monitor SMART status, ATA Error Log, Self-test log, and track
    # changes in all attributes except for attribute 194
    #/dev/sdb -H -l error -l selftest -t -I 194

    # Monitor all attributes except normalized Temperature (usually 194),
    # but track Temperature changes >= 4 Celsius, report Temperatures
    # >= 45 Celsius and changes in Raw value of Reallocated_Sector_Ct (5).
    # Send mail on SMART failures or when Temperature is >= 55 Celsius.
    #/dev/sdc -a -I 194 -W 4,45,55 -R 5 -m admin@example.com

    # An ATA disk may appear as a SCSI device to the OS. If a SCSI to
    # ATA Translation (SAT) layer is between the OS and the device then
    # this can be flagged with the '-d sat' option. This situation may
    # become common with SATA disks in SAS and FC environments.
    # /dev/sda -a -d sat

    # A very silent check.  Only report SMART health status if it fails
    # But send an email in this case
    #/dev/sdc -H -C 0 -U 0 -m admin@example.com

    # First two SCSI disks.  This will monitor everything that smartd can
    # monitor.  Start extended self-tests Wednesdays between 6-7pm and
    # Sundays between 1-2 am
    #/dev/sda -d scsi -s L/../../3/18
    #/dev/sdb -d scsi -s L/../../7/01

    # Monitor 4 ATA disks connected to a 3ware 6/7/8000 controller which uses
    # the 3w-xxxx driver. Start long self-tests Sundays between 1-2, 2-3, 3-4,
    # and 4-5 am.
    # NOTE: starting with the Linux 2.6 kernel series, the /dev/sdX interface
    # is DEPRECATED.  Use the /dev/tweN character device interface instead.
    # For example /dev/twe0, /dev/twe1, and so on.
    #/dev/sdc -d 3ware,0 -a -s L/../../7/01
    #/dev/sdc -d 3ware,1 -a -s L/../../7/02
    #/dev/sdc -d 3ware,2 -a -s L/../../7/03
    #/dev/sdc -d 3ware,3 -a -s L/../../7/04

    # Monitor 2 ATA disks connected to a 3ware 9000 controller which
    # uses the 3w-9xxx driver (Linux, FreeBSD). Start long self-tests Tuesdays
    # between 1-2 and 3-4 am.
    #/dev/twa0 -d 3ware,0 -a -s L/../../2/01
    #/dev/twa0 -d 3ware,1 -a -s L/../../2/03

    # Monitor 2 SATA (not SAS) disks connected to a 3ware 9000 controller which
    # uses the 3w-sas driver (Linux). Start long self-tests Tuesdays
    # between 1-2 and 3-4 am.
    # On FreeBSD /dev/tws0 should be used instead
    #/dev/twl0 -d 3ware,0 -a -s L/../../2/01
    #/dev/twl0 -d 3ware,1 -a -s L/../../2/03

    # Same as above for Windows. Option '-d 3ware,N' is not necessary,
    # disk (port) number is specified in device name.
    # NOTE: On Windows, DEVICESCAN works also for 3ware controllers.
    #/dev/hdc,0 -a -s L/../../2/01
    #/dev/hdc,1 -a -s L/../../2/03
    #
    # Monitor 2 disks connected to the first HP SmartArray controller which
    # uses the cciss driver. Start long tests on Sunday nights and short
    # self-tests every night and send errors to root
    #/dev/sda -d cciss,0 -a -s (L/../../7/02|S/../.././02) -m root
    #/dev/sda -d cciss,1 -a -s (L/../../7/03|S/../.././03) -m root

    # Monitor 3 ATA disks directly connected to a HighPoint RocketRAID. Start long
    # self-tests Sundays between 1-2, 2-3, and 3-4 am.
    #/dev/sdd -d hpt,1/1 -a -s L/../../7/01
    #/dev/sdd -d hpt,1/2 -a -s L/../../7/02
    #/dev/sdd -d hpt,1/3 -a -s L/../../7/03

    # Monitor 2 ATA disks connected to the same PMPort which connected to the
    # HighPoint RocketRAID. Start long self-tests Tuesdays between 1-2 and 3-4 am
    #/dev/sdd -d hpt,1/4/1 -a -s L/../../2/01
    #/dev/sdd -d hpt,1/4/2 -a -s L/../../2/03

    # HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE.
    # PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS
    #
    #   -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
    #   -T TYPE set the tolerance to one of: normal, permissive
    #   -o VAL  Enable/disable automatic offline tests (on/off)
    #   -S VAL  Enable/disable attribute autosave (on/off)
    #   -n MODE No check. MODE is one of: never, sleep, standby, idle
    #   -H      Monitor SMART Health Status, report if failed
    #   -l TYPE Monitor SMART log.  Type is one of: error, selftest
    #   -f      Monitor for failure of any 'Usage' Attributes
    #   -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
    #   -M TYPE Modify email warning behavior (see man page)
    #   -s REGE Start self-test when type/date matches regular expression (see man page)
    #   -p      Report changes in 'Prefailure' Normalized Attributes
    #   -u      Report changes in 'Usage' Normalized Attributes
    #   -t      Equivalent to -p and -u Directives
    #   -r ID   Also report Raw values of Attribute ID with -p, -u or -t
    #   -R ID   Track changes in Attribute ID Raw value with -p, -u or -t
    #   -i ID   Ignore Attribute ID for -f Directive
    #   -I ID   Ignore Attribute ID for -p, -u or -t Directive
    #   -C ID   Report if Current Pending Sector count non-zero
    #   -U ID   Report if Offline Uncorrectable count non-zero
    #   -W D,I,C Monitor Temperature D)ifference, I)nformal limit, C)ritical limit
    #   -v N,ST Modifies labeling of Attribute N (see man page)
    #   -a      Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
    #   -F TYPE Use firmware bug workaround. Type is one of: none, samsung
    #   -P TYPE Drive-specific presets: use, ignore, show, showall
    #    #      Comment: text after a hash sign is ignored
    #    \      Line continuation character
    # Attribute ID is a decimal integer 1 <= ID <= 255
    # except for -C and -U, where ID = 0 turns them off.
    # All but -d, -m and -M Directives are only implemented for ATA devices
    #
    # If the test string DEVICESCAN is the first uncommented text
    # then smartd will scan for devices.
    # DEVICESCAN may be followed by any desired Directives.
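
The -s directives in the file above are regular expressions that smartd matches against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). The sketch below lets you try an expression against a given time before putting it into the config. The function names schedule_string and schedule_matches are hypothetical. The sketch assumes that the whole string has to match, approximates the matching with grep -E -x, and relies on a date command that supports -d.

```shell
# Build the T/MM/DD/d/HH string for a test type and an optional point in time.
# usage: schedule_string TYPE ['YYYY-MM-DD hh:mm']
schedule_string() {
    if [ -n "${2:-}" ]; then
        date -d "$2" "+$1/%m/%d/%u/%H"
    else
        date "+$1/%m/%d/%u/%H"
    fi
}

# Succeed if the extended regular expression matches the whole string.
# usage: schedule_matches REGEX STRING
schedule_matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Usage: would a long test start on Saturday 2021-03-20 between 3 and 4 am?
#   schedule_matches '(S/../.././02|L/../../6/03)' \
#       "$(schedule_string L '2021-03-20 03:15')" && echo "long test would start"
```

This only checks the expression itself. smartd additionally decides when to evaluate it, so treat the result as a plausibility check and consult the smartd.conf man page for the authoritative rules.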

Rockstable Wiki: smart (last edited 2021-03-21 08:42:04 by RockStable)