Checking hard drive and filesystem health
...wip...
SMART
This section assumes that the smartctl command (manual) is available (it is part of the smartmontools package) and that your drive is SMART [1] capable with SMART enabled.
If it isn't, check this article.
SMART tests and queries can be run on a hard drive that is in use.
All smartctl commands require elevated privileges.
First, let's see what we have:
#> smartctl --scan-open
/dev/sda -d sat # /dev/sda [SAT], ATA device
/dev/sdb -d sat # /dev/sdb [SAT], ATA device
/dev/sdc -d sat # /dev/sdc [SAT], ATA device
#> smartctl --info /dev/sda # let's use /dev/sda in this article
This output includes both spinning and solid-state drives, and all of them have SMART enabled, as the relatively short --info output shows.
Is it necessary to specify the device type with -d sat for every subsequent command? Usually not: when -d is omitted, smartctl tries to guess the device type (-d auto is the default). But if you do specify it, in my experience it has to be the last option. [2]
Getting all SMART information:
#> smartctl /dev/sda -a
## including non-SMART info:
#> smartctl /dev/sda -x
That's a lot. Let's see the overall health report only:
#> smartctl /dev/sda -H
...
SMART overall-health self-assessment test result: PASSED
That's nice to hear, but a little thin. Let's see what data has been stored:
#> smartctl /dev/sda -A -l error
...
Let's store that in a file for later comparison:
#> smartctl /dev/sda -A -l error > sda.before
I highly recommend reading the -A, --attributes section of the man page to fully understand what all this means, and to avoid shock reactions from misunderstanding certain labels or values.
You can try -f [brief|old|hex] for slightly different output; the new brief format decodes additional data in the FLAGS column.
We will perform a long (extended) self-test. But first, let's see when the last such test was performed:
#> smartctl /dev/sda -a | grep -iE 'Hours|Extended'
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 10101
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 10042 -
...
The first line tells us how long the device has been in operation: 10101 hours. The last line tells us at which lifetime hour the last extended test was performed: 10042 hours.
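The subtraction described above can be scripted. The following is a minimal sketch, not a polished tool: the function name hours_since_last_long_test is made up for this example, and it assumes the ATA-style output layout shown above, with a plain number as the raw value of Power_On_Hours. Some drives report that raw value in another format, in which case the result will be wrong.

```shell
# Hypothetical helper: reads `smartctl -a` output on stdin and prints how many
# power-on hours have passed since the most recent extended self-test.
# Assumes the ATA output layout shown in this article.
hours_since_last_long_test() {
    awk '
        # Attribute table row: ID 9, name Power_On_Hours, raw value in the last field.
        $1 == 9 && $2 == "Power_On_Hours" { now = $NF }

        # Self-test log rows start with "#"; the first extended entry is the newest.
        /^# *[0-9]+ +Extended offline/ && last == "" {
            # LifeTime(hours) is the second-to-last field
            # (the last field is LBA_of_first_error).
            last = $(NF - 1)
        }

        END {
            # Fail if either value could not be found.
            if (now == "" || last == "") exit 1
            print now - last
        }
    '
}
```

Usage would look like `smartctl -a /dev/sda | hours_since_last_long_test`; with the values shown above it prints 59.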
The drive keeps the test results in its self-test log (ATA drives retain the 21 most recent entries), so you don't need to perform a new test to read out the most recent test's data.
Let's assume that the last extended test was long ago, and we want to run it now. But first, let's make sure "autosave of device vendor-specific Attributes" is enabled:
#> smartctl /dev/sda -S on
...
SMART Attribute Autosave Enabled.
I had hoped this would surface extra information after performing a test, but I cannot find any; maybe it will over time. The man page says that a) smartctl has no way of checking whether this setting is currently turned on, and b) the setting "is preserved across disk power cycles, so you should only need to issue it once".
#> smartctl /dev/sda -t long
...
All SMART tests are performed in the background. You can check how far the test has progressed with something like
#> smartctl -a /dev/sda | grep -A1 execution
Self-test execution status: ( 244) Self-test routine in progress...
40% of test remaining.
until it says ( 0) The previous self-test routine completed without error or no self-test has ever
...
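If you would rather not re-run that grep by hand, the status can be reduced to a single number. This is only a sketch: the function name selftest_remaining is invented for this example, and it assumes the two-line ATA status format shown above (NVMe drives report progress differently).

```shell
# Hypothetical helper: reads `smartctl -c` (or -a) output on stdin and prints
# the percentage of the self-test that is still remaining.
# Prints 0 when no self-test is in progress.
selftest_remaining() {
    awk '
        # The status line says whether a test is currently running.
        /Self-test execution status:/ {
            in_status = 1
            if (/in progress/) running = 1
            next
        }
        # The percentage is on the line directly below the status line.
        in_status && /% of test remaining/ {
            sub(/%.*/, "", $1)   # "40%" becomes "40"
            pct = $1
            in_status = 0
            next
        }
        { in_status = 0 }
        END {
            if (running && pct != "") print pct
            else print 0
        }
    '
}
```

A polling loop built on it could look like `while [ "$(smartctl -c /dev/sda | selftest_remaining)" -gt 0 ]; do sleep 300; done`. The five-minute interval is an arbitrary choice.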
Check results
#> smartctl /dev/sda -A -l error
...
That's the most important part. Again: read the -A, --attributes section of the man page to fully understand what all this means, and to avoid shock reactions from misunderstanding certain labels or values.
In short, you usually want to pay closest attention to RAW_VALUE.
Store it to a file again, and compare with before:
#> smartctl /dev/sda -A -l error > sda.after
#> diff sda.before sda.after
...
I see nothing that worries me there, but of course yours may be different.
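Since RAW_VALUE is what matters most, the comparison can be narrowed to attributes whose raw value changed. A rough sketch follows, assuming the default (old) attribute table format with RAW_VALUE starting in the tenth column; the function name raw_value_changes is hypothetical.

```shell
# Hypothetical helper: compares two saved `smartctl -A` outputs and prints
# every attribute whose RAW_VALUE differs between them.
# $1 = earlier snapshot, $2 = later snapshot
raw_value_changes() {
    awk '
        # Attribute rows: numeric ID, attribute name, hex flag, at least 10 columns.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ && NF >= 10 {
            # RAW_VALUE may contain spaces, so join everything from column 10 on.
            raw = $10
            for (i = 11; i <= NF; i++) raw = raw " " $i

            if (FILENAME == ARGV[1])
                before[$2] = raw              # remember values from the first file
            else if (($2 in before) && before[$2] != raw)
                print $2 ": " before[$2] " -> " raw
        }
    ' "$1" "$2"
}
```

Called as `raw_value_changes sda.before sda.after`, it prints one line per changed attribute, for example `Power_On_Hours: 10101 -> 10115`. It says nothing about the error log, so the plain diff remains useful.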
Some other values may have been logged, e.g.:
#> smartctl -l scttemp /dev/sda
...
The presence of such data varies from device to device. Please refer to the -l TYPE, --log=TYPE section of the man page.
Gsmartcontrol
I recommend installing gsmartcontrol. If you followed this article it should feel familiar already, but in addition to doing much of the legwork for you, it gives valuable additional explanations when you hover over fields with your pointer.
Badblocks
SMART only goes so far. While I believe that it works, I want to make doubly sure and check for bad blocks again, this time with badblocks.
Its man page says:
Important note: If the output of badblocks is going to be fed to the e2fsck or mke2fs programs, it is important that the block size is properly specified, since the block numbers which are generated are very dependent on the block size in use by the filesystem. For this reason, it is strongly recommended that users not run badblocks directly, but rather use the -c option of the e2fsck and mke2fs programs.
After some deliberation I decided to make a read-only check with badblocks and only use e2fsck if errors are found.
Both utilities are provided by the e2fsprogs package.
The drive/filesystem in question needs to be unmounted, so I fired up my trusty SysRescueCD USB stick.
This was simple:
#> for x in a b c; do badblocks -v /dev/sd$x > badblocks.sd$x; done
All three files were empty, which means badblocks found no bad blocks on any of the drives.
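Checking the result files by eye is fine for three drives; for more, a small summary helps. The sketch below assumes one output file per drive, as written by the loop above, with one bad block number per line; the function name summarize_badblocks is made up for this example.

```shell
# Hypothetical helper: prints a one-line summary for each badblocks output file.
# An empty file means badblocks listed no bad blocks for that drive.
summarize_badblocks() {
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            printf '%s: file not found\n' "$f"
        elif [ -s "$f" ]; then
            # Each line of a non-empty file is one bad block number.
            printf '%s: %d bad block(s) listed\n' "$f" "$(( $(wc -l < "$f") ))"
        else
            printf '%s: no bad blocks listed\n' "$f"
        fi
    done
}
```

For the run above, `summarize_badblocks badblocks.sd?` would report "no bad blocks listed" three times.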
Some helpful links:
https://askubuntu.com/questions/539184/how-do-i-check-the-integrity-of-a-storage-medium-hard-disk-or-flash-drive
https://www.thomas-krenn.com/en/wiki/SMART_tests_with_smartctl
https://blog.shadypixel.com/monitoring-hard-drive-health-on-linux-with-smartmontools/
https://www.maketecheasier.com/check-repair-filesystem-fsck-linux/
https://www.linuxtechi.com/check-hard-drive-for-bad-sector-linux/
Further reading
Careful with this dodgy site; better to disable JavaScript first: https://recoverit.wondershare.com/harddrive-tips/repair-linux-disk.html
I still found the article surprisingly useful.
[1] I know it's really "S.M.A.R.T." but I refuse to write that out every time.
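[2] Here's a bash function that will do just that:
    smartctl() {
        # Path of the real binary, bypassing this function.
        local sc="$(type -P smartctl)"
        # Words of the scan output: <device> -d <type> # ...
        local dt=( $($sc --scan-open) )
        local t=''
        # Find the first argument that names a scanned device and
        # pick up its type (two words after the device name).
        for i in "$@"; do
            for ((j=0; j<${#dt[@]}; j++)); do
                [[ "$i" == "${dt[j]}" ]] && t="${dt[j+2]}" && break 2
            done
        done
        # Append -d TYPE as the last option if a type was found.
        if [ -n "$t" ]; then
            "$sc" "$@" -d "$t"
        else
            "$sc" "$@"
        fi
    }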