Checking all hard drives for errors
First of all, I fired up my trusty SysRescueCD USB stick.
Start with getting some SMART information:
smartctl -a /dev/sda
Perform a short test - takes a few minutes only:
smartctl -t short /dev/sda
Check results:
smartctl -l selftest /dev/sda
General health report:
smartctl -H /dev/sda
My external USB 3.0 hard drive was not automatically recognized; I guessed that its device type was scsi and passed that with -d:
smartctl -d scsi -a /dev/sdc
smartctl -t short -d scsi /dev/sdc
smartctl -l selftest -d scsi /dev/sdc
smartctl -H -d scsi /dev/sdc
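For scripting, the verdict line of the health report can be checked mechanically. The following is only a minimal sketch: smart_health_ok is a hypothetical helper name (not part of smartmontools), and it assumes the usual wording of the verdict line, "PASSED" for ATA drives and "OK" for drives queried with -d scsi.

```shell
# Hypothetical helper: reads `smartctl -H` output on stdin and succeeds
# only if a health verdict line reports PASSED (ATA) or OK (SCSI).
# Assumption: the verdict lines look like
#   SMART overall-health self-assessment test result: PASSED
#   SMART Health Status: OK
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Possible usage (not executed here):
#   smartctl -H /dev/sda | smart_health_ok || echo "have a closer look at /dev/sda"
```

Matching on text is fragile if the wording differs on a given drive or smartctl version, so treat a failed match as "look manually", not as proof of a failing disk.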
A read-only badblocks scan took the longest; around 1.5h for a 250GB hard drive:
badblocks -b 4096 -c 4096 -s -v /dev/sda
An additional filesystem check - make sure that the partitions are not mounted.
With the -f switch, this took a minute or so per partition.
fdisk -l /dev/sda
fsck -fV /dev/sda1
fsck -fV /dev/sda3
fsck -fV /dev/sda4
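Since fsck should only run on unmounted partitions, a small guard can help. This is a sketch under assumptions: is_mounted is a hypothetical helper, and it only compares the device path against the first column of /proc/mounts, so a partition mounted under another name (a /dev/mapper or by-uuid path, for example) would not be caught.

```shell
# Hypothetical helper: succeeds if the given device appears as a mount
# source in a mounts table (default: /proc/mounts on Linux).
# $1: device path, e.g. /dev/sda1; $2: optional alternative mounts file.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Possible usage (not executed here):
#   is_mounted /dev/sda1 || fsck -fV /dev/sda1
```

The comparison is an exact match, so /dev/sda1 does not accidentally match /dev/sda10.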
It did find some errors on one partition, which it asked me to fix, and I answered yes. Alarmed by this, I re-ran the SMART test on that hard drive, a bit more thorough this time. First, let's see how long this will take:
smartctl -c /dev/sdb
Over an hour, ugh. Nevertheless:
smartctl -t long /dev/sdb
But after only a few minutes, smartctl -l selftest /dev/sdb tells me that the extended test completed without error. Hmmm :-|
I guess this will have to do for now.
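The time estimate reported by smartctl -c can also be extracted for scripting, for instance to decide when to look at the self-test log again. A rough sketch, assuming the ATA-style layout where the line "Extended self-test routine" is followed by "recommended polling time: ( NN) minutes."; extended_test_minutes is a hypothetical helper name:

```shell
# Hypothetical helper: reads `smartctl -c` output on stdin and prints the
# recommended polling time (in minutes) of the extended self-test.
# Assumption: the number sits on the "recommended polling time" line that
# follows the "Extended self-test routine" line.
extended_test_minutes() {
    awk '
        /Extended self-test routine/ { want = 1; next }
        want && /recommended polling time/ {
            gsub(/[^0-9]/, "")   # keep only the digits of this line
            print
            exit
        }
    '
}

# Possible usage (not executed here):
#   smartctl -c /dev/sdb | extended_test_minutes
```

If the layout differs on a given drive, the function simply prints nothing.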
Rinse and repeat for all hard drives; make sure they're not mounted.
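The "rinse and repeat" part can be sketched as a loop over all whole disks. This assumes lsblk is available on the rescue system; list_whole_disks is a hypothetical helper that filters the output of lsblk -dn -o NAME,TYPE:

```shell
# Hypothetical helper: reads `lsblk -dn -o NAME,TYPE` output on stdin and
# prints /dev/NAME for every entry of type "disk" (skips rom, loop, ...).
list_whole_disks() {
    awk '$2 == "disk" { print "/dev/" $1 }'
}

# Possible usage (not executed here): health report for every disk.
#   lsblk -dn -o NAME,TYPE | list_whole_disks | while read -r disk; do
#       echo "== $disk"
#       smartctl -H "$disk"
#   done
```

Drives that need an explicit device type, like the USB drive above with -d scsi, would still have to be handled separately.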
Some helpful links:
https://askubuntu.com/questions/539184/how-do-i-check-the-integrity-of-a-storage-medium-hard-disk-or-flash-drive
https://www.thomas-krenn.com/en/wiki/SMART_tests_with_smartctl
https://blog.shadypixel.com/monitoring-hard-drive-health-on-linux-with-smartmontools/
https://www.maketecheasier.com/check-repair-filesystem-fsck-linux/
What if Problems Are Found?
# fsck -fV /dev/sdc1
fsck from util-linux 2.32
[/usr/bin/fsck.ext4 (1) -- /home/backup] fsck.ext4 -f /dev/sdc1
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found. Create<y>? yes
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(131366912--131371007)
Fix<y>? yes
Free blocks count wrong for group #4009 (10240, counted=32768).
Fix<y>? yes
Free blocks count wrong for group #4046 (28672, counted=32768).
Fix<y>? yes
Free blocks count wrong (123602180, counted=123628804).
Fix<y>? yes
recovery+backup: ***** FILE SYSTEM WAS MODIFIED *****
recovery+backup: 205354/61046784 files (1.8% non-contiguous), 120553212/244182016 blocks
# fsck -fV /dev/sdc1
fsck from util-linux 2.32
[/usr/bin/fsck.ext4 (1) -- /home/backup] fsck.ext4 -f /dev/sdc1
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
recovery+backup: 205354/61046784 files (1.8% non-contiguous), 120553212/244182016 blocks
I had a look in lost+found, and it was empty. I assume that means no data loss, and this chapter of another useful article seems to confirm the assumption.
Nevertheless, this partition hosts my backups, so I want to be very sure:
borg check --info --verify-data /path/to/borgbackupdir
Starting repository check
Starting repository index check
Completed repository check, no problems found.
Starting archive consistency check...
Starting cryptographic data integrity verification...
Finished cryptographic data integrity verification, verified 74519 chunks with 0 integrity errors.
Analyzing archive 201709030041 (1/9)
Analyzing archive 201709031032 (2/9)
Analyzing archive 201709031221 (3/9)
Analyzing archive 201709091605 (4/9)
Analyzing archive 201709161743 (5/9)
Analyzing archive 201709240050 (6/9)
Analyzing archive 201709301658 (7/9)
Analyzing archive 201710071610 (8/9)
Analyzing archive 201710141636 (9/9)
Archive consistency check complete, no problems found.
This is very slow; for a daily or weekly check one might want to drop --verify-data.
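A small wrapper makes it easy to switch between the quick and the slow variant, for example from cron. This is only a sketch: check_repo and the BORG override variable are hypothetical names introduced here, and the only borg options used are the ones shown above.

```shell
# Hypothetical wrapper around `borg check`.
# $1: repository path; $2: "full" to add --verify-data (default: quick).
# Assumption: $BORG may override the borg executable, e.g. BORG=echo
# for a dry run that only prints the arguments.
check_repo() (
    repo=$1
    mode=${2:-quick}
    set -- check --info
    if [ "$mode" = full ]; then
        set -- "$@" --verify-data
    fi
    "${BORG:-borg}" "$@" "$repo"
)

# Possible usage (not executed here):
#   check_repo /path/to/borgbackupdir         # quick, e.g. daily or weekly
#   check_repo /path/to/borgbackupdir full    # slow, e.g. once in a while
```

The function body runs in a subshell, so rewriting the positional parameters does not leak into the calling script.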