This post is part of the ZFS SSD pool series.
This post was supposed to be a “look, I made an SSD ZFS pool” post, but it will instead be the first in a troubleshooting series about trying to get my SSDs to behave.
But — I am getting ahead of myself… Let’s start at the beginning.
The beginning
I had two unused 500 GB SSDs lying around, a Western Digital Blue and a Samsung 850 EVO. I figured I’d put them to good use in a ZFS mirror pool.
I ordered a 4×2.5" disk bay to install in my file server, and two Samsung 860 EVO 500GB SSDs.
The disk bay
I had some issues fitting the disk bay in the 5.25" slot on my Inter-Tech 4U-4416 case, as the “fingers” on the brackets were too long and actually interfered with the disk trays. I did get it installed, but had to bend some of these “fingers” out. They are going to break off if I bend them back, for sure.
The next issue was the noise: the disk bay has two tiny fans on the back, and they were now the loudest part of my file server.
I’m going to use this bay for SSDs only, and there is good airflow from the internal fans already. So I figured I didn’t need the fans — and disconnected them.
Oh… Great, now it’s constantly flashing red, indicating a fan problem.
The 850 EVO SSD
While waiting for the additional disks — I figured I’d make a pool with a single mirrored VDEV to play with. Everything was going great, but I quickly noticed that the write speed was not good.
Using atop I could see that the problem was the Samsung 850 EVO SSD: it was 100% busy but only writing 30-35 MB/s. That seemed very low. I destroyed the pool and started testing with this disk alone.
sdr busy 100% read 5 write 3248 KiB/w 103 MBr/s 0.0 MBw/s 33.0 avio 3.05 ms
sds busy 8% read 2 write 3788 KiB/w 87 MBr/s 0.0 MBw/s 32.3 avio 0.21 ms
I found that after writing 6-10 GB at the expected speed, the write performance dropped dramatically. I tested the disk on two computers, and the behaviour was the same.
After some googling I found that the 500GB 850 EVO has 6 GB of “TurboWrite buffer”. In essence this means that 6 GB of the NAND is in SLC mode, which is much faster than TLC mode. So the data will first be transferred to the fast SLC buffer, and moved to the TLC array during idle time. Unless of course — you fill the buffer in a single write operation, like I did.
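To make that cliff visible, it helps to write fixed-size chunks back to back and look at the throughput of each one. Below is a minimal sketch of the idea, not the exact test I ran: write_chunks is a made-up helper name, the target is assumed to be a plain file on the SSD under test (never the raw device), and the timing relies on GNU date for nanosecond resolution.

```shell
#!/bin/sh
# write_chunks TARGET CHUNK_MB COUNT
# Hypothetical helper: writes COUNT consecutive chunks of CHUNK_MB MiB
# to the file TARGET and prints the throughput of each chunk.
write_chunks() {
    target=$1; chunk_mb=$2; count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s%N)   # GNU date: seconds followed by nanoseconds
        # seek places each chunk right after the previous one;
        # conv=notrunc keeps earlier chunks, conv=fsync flushes before dd exits.
        dd if=/dev/zero of="$target" bs=1048576 count="$chunk_mb" \
           seek=$(( (i - 1) * chunk_mb )) conv=notrunc,fsync 2>/dev/null
        end=$(date +%s%N)
        awk -v n="$i" -v mb="$chunk_mb" -v ns="$((end - start))" \
            'BEGIN { if (ns < 1) ns = 1
                     printf "chunk %d: %.1f MiB/s\n", n, mb / (ns / 1e9) }'
        i=$((i + 1))
    done
}
```

Something like write_chunks /path/on/ssd/fill.bin 1024 12 (the path is a placeholder) writes twelve 1 GiB chunks. With a 6 GB SLC buffer I would expect the per-chunk figure to fall off somewhere around the sixth chunk. Zeros compress to almost nothing on a filesystem with compression enabled, so this only says something about the disk when compression is off.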
I still felt 30-35 MB/s was unreasonably slow, so I filed an RMA and got a new disk: an 870, since the 850 was obsolete.
The build
After receiving all of my SSDs I was ready to install and configure. Let’s get to it!
I mounted all the SSDs on trays and put them in the disk bay. A quick lsblk to see that they were all present:
$ lsblk
sdr 65:16 0 465.8G 0 disk
├─sdr1 65:17 0 465.8G 0 part
└─sdr9 65:25 0 8M 0 part
sds 65:32 0 465.8G 0 disk
sdt 65:48 0 465.8G 0 disk
sdu 65:64 0 465.8G 0 disk
Preparing the disks
Cool, now to figure out which was which:
$ sudo smartctl -i /dev/sdr
=== START OF INFORMATION SECTION ===
Device Model: WDC WDS500G2B0A-00SM50
$ sudo smartctl -i /dev/sds
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 870 EVO 500GB
$ sudo smartctl -i /dev/sdt
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 860 EVO 500GB
$ sudo smartctl -i /dev/sdu
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 860 EVO 500GB
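Running smartctl once per disk gets old quickly. Here is a small sketch that does the same in a loop; model_of and list_models are names I made up, and list_models assumes smartmontools and sudo are available, so it is only defined here, not run.

```shell
#!/bin/sh
# model_of: read `smartctl -i` output on stdin, print the Device Model value.
model_of() {
    awk -F': *' '/^Device Model:/ { print $2 }'
}

# list_models DEVICE...
# Print "device<TAB>model" for every device given as an argument.
# Assumes smartmontools is installed and sudo works; defined but not called here.
list_models() {
    for dev in "$@"; do
        printf '%s\t%s\n' "$dev" "$(sudo smartctl -i "$dev" | model_of)"
    done
}
```

On this box the call would be list_models /dev/sdr /dev/sds /dev/sdt /dev/sdu.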
My sdr had some data from earlier, let’s wipe that:
$ sudo wipefs --all /dev/sdr
/dev/sdr: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sdr: 8 bytes were erased at offset 0x7470c05e00 (gpt): 45 46 49 20 50 41 52 54
/dev/sdr: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sdr: calling ioctl to re-read partition table: Success
$ lsblk
sdr 65:16 0 465.8G 0 disk
sds 65:32 0 465.8G 0 disk
sdt 65:48 0 465.8G 0 disk
sdu 65:64 0 465.8G 0 disk
Creating the pool
Nice, now they are all clean and ready to go!
Let’s create the pool, but do a dry-run (-n) first:
$ sudo zpool create -n spool0 -o ashift=12 \
mirror /dev/disk/by-id/ata-WDC_WDS500G2B0A-00SM50_xxxxxxxxxxxx /dev/disk/by-id/ata-Samsung_SSD_870_EVO_500GB_xxxxxxxxxxxxxxx \
mirror /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_xxxxxxxxxxxxxxx /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_xxxxxxxxxxxxxxx
would create 'spool0' with the following layout:
spool0
mirror
ata-WDC_WDS500G2B0A-00SM50_191005A00184
ata-Samsung_SSD_870_EVO_500GB_S62BNJ0NC00646F
mirror
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09399X
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09451Z
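The pool is built from /dev/disk/by-id names rather than sdX names, since the sdX letters can change between boots. A rough sketch for finding the by-id name of a given disk follows; by_id_for is a hypothetical helper, and the optional second argument only exists so the directory can be swapped out.

```shell
#!/bin/sh
# by_id_for DEVICE [DIR]
# Print every ata-* link in DIR (default /dev/disk/by-id) that resolves
# to DEVICE, skipping the per-partition links.
by_id_for() (
    dev=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/ata-*; do
        [ -e "$link" ] || continue                        # glob matched nothing
        case "$link" in *-part[0-9]*) continue ;; esac    # skip partition links
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
)
```

For example, by_id_for /dev/sds should print the ata-Samsung_SSD_870_EVO_500GB_... path used in the zpool command.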
Looks good. Repeat the command without the -n, then check the result with zpool status:
pool: spool0
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
spool0 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WDS500G2B0A-00SM50_191005A00184 ONLINE 0 0 0
ata-Samsung_SSD_870_EVO_500GB_S62BNJ0NC00646F ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09399X ONLINE 0 0 0
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09451Z ONLINE 0 0 0
Our pool is created, now for compression:
$ sudo zfs set compression=lz4 spool0
$ sudo zfs get compression spool0
NAME PROPERTY VALUE SOURCE
spool0 compression lz4 local
And set a mount point:
$ sudo zfs set mountpoint=/srv/spool0 spool0
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
spool0 372K 899G 96K /srv/spool0
Create a dataset, and set myself as the owner:
$ sudo zfs create spool0/home
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
spool0 468K 899G 96K /srv/spool0
spool0/home 96K 899G 96K /srv/spool0/home
$ sudo chown hebron:hebron /srv/spool0/home/
Great! The pool is configured. It contains two mirrors, with a total of 1 TB usable space (ish).
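The 1 TB figure is simple arithmetic: each two-way mirror contributes the capacity of its smaller disk, and the mirrors add up. Here is a tiny sketch of that sum using the 465.8G sizes from lsblk. mirror_pool_gib is a hypothetical helper and assumes two-way mirrors only.

```shell
#!/bin/sh
# mirror_pool_gib SIZE_A SIZE_B [SIZE_A SIZE_B ...]
# Sum min(a, b) over pairs of disk sizes (GiB, as lsblk reports them).
mirror_pool_gib() {
    awk 'BEGIN {
        total = 0
        for (i = 1; i + 1 < ARGC; i += 2) {
            a = ARGV[i] + 0; b = ARGV[i + 1] + 0
            total += (a < b) ? a : b      # a mirror is as big as its smaller disk
        }
        printf "%.1f\n", total
    }' "$@"
}

mirror_pool_gib 465.8 465.8 465.8 465.8   # prints 931.6
```

931.6 GiB is roughly 1 TB in decimal units. zfs list shows 899G available, presumably because ZFS holds some space in reserve for itself.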
Disappointment
I should be enjoying my new SSD pool at this point, but as the heading implies, that is not what happened. When I started copying data to the pool, errors popped up in dmesg:
[ 1252.513717] ata4.00: failed command: WRITE FPDMA QUEUED
[ 1294.501815] ata6.00: failed command: WRITE FPDMA QUEUED
Fuck… 🤬
Samsung firmware
Some googling suggested that this was a problem with the Samsung firmware: the drives reportedly advertise support for queued TRIM (TRIM commands sent through NCQ) but do not handle it reliably.
A solution to this was apparently to disable NCQ support for the affected drives. I figured I’d turn it off for all my Samsung SSDs, so first to find what ata ports they were connected to:
$ lsscsi
[4:0:0:0] disk ATA Samsung SSD 870 1B6Q /dev/sds
[5:0:0:0] disk ATA Samsung SSD 860 4B6Q /dev/sdt
[6:0:0:0] disk ATA Samsung SSD 860 4B6Q /dev/sdu
$ dmesg | grep NCQ
[ 1.548626] ,NCQ
[ 1.735877] ata3.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.736197] ata1.00: NCQ Send/Recv Log not supported
[ 1.736203] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.736677] ata5.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.739051] ata4.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.739298] ata1.00: NCQ Send/Recv Log not supported
[ 1.740838] ata6.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
ata4, ata5 and ata6. Let’s disable NCQ on those ports with a kernel parameter:
$ sudo vim /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="libata.force=4.00:noncq,5.00:noncq,6.00:noncq"
$ sudo update-grub
$ sudo reboot
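The port numbers in libata.force are the N in ataN. One way to double-check which ataN a disk sits on is its resolved sysfs path, which for libata devices includes an ataN component. The sketch below pulls that component out and assembles the parameter string. Both function names are made up for illustration, and the sample path in the comment is hypothetical.

```shell
#!/bin/sh
# ata_port_from_path SYSFS_PATH
# Extract the "ataN" component from a resolved /sys/block/sdX path.
ata_port_from_path() {
    printf '%s\n' "$1" | sed -n 's|.*/\(ata[0-9][0-9]*\)/.*|\1|p'
}

# libata_force_noncq PORT...
# Build the kernel parameter that disables NCQ on the given ata port numbers.
libata_force_noncq() {
    out=
    for port in "$@"; do
        out="${out:+$out,}${port}.00:noncq"
    done
    printf 'libata.force=%s\n' "$out"
}

# On a live system (not run here):
#   ata_port_from_path "$(readlink -f /sys/block/sds)"
# A resolved path looks something like (hypothetical):
#   /sys/devices/pci0000:00/0000:00:1f.2/ata4/host4/target4:0:0/4:0:0:0/block/sds
libata_force_noncq 4 5 6   # prints libata.force=4.00:noncq,5.00:noncq,6.00:noncq
```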
And to verify that the parameters had any effect:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.15.0-135-generic root=UUID=xxxx ro libata.force=4.00:noncq,5.00:noncq,6.00:noncq
$ dmesg | grep NCQ
[ 1.563357] ,NCQ
[ 1.756057] ata1.00: NCQ Send/Recv Log not supported
[ 1.756062] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.758620] ata1.00: NCQ Send/Recv Log not supported
[ 1.759741] ata3.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.765980] ata4.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
[ 1.769174] ata6.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
[ 1.769287] ata5.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
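Eyeballing the dmesg output works, but the check can also be scripted. Here is a minimal sketch, assuming the kernel keeps printing the text NCQ (not used) for ports where NCQ is off; ncq_unused_ports is a made-up name.

```shell
#!/bin/sh
# ncq_unused_ports: read dmesg text on stdin and print, one per line,
# the ata ports whose identify line says "NCQ (not used)".
ncq_unused_ports() {
    grep 'NCQ (not used)' \
        | sed -n 's/.*\(ata[0-9][0-9]*\)\.[0-9][0-9]*:.*/\1/p' \
        | sort -u
}

# On a live system (not run here):
#   dmesg | ncq_unused_ports
```

Fed the output above it lists ata4, ata5 and ata6. Fed the dmesg output from before the change, it lists only ata6.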
Alright, NCQ not used for our Samsung SSDs. Let’s try some more copying…
[ 49.509194] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[ 49.509207] ata4.00: irq_stat 0x08000000, interface fatal error
[ 49.509215] ata4: SError: { UnrecovData Handshk }
[ 49.509223] ata4.00: failed command: WRITE DMA EXT
[ 49.509236] ata4.00: cmd 35/00:b0:50:95:15/00:00:1d:00:00/e0 tag 0 dma 90112 out
res 50/00:00:4f:95:15/00:00:1d:00:00/e0 Emask 0x10 (ATA bus error)
[ 49.509248] ata4.00: status: { DRDY }
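The SErr value in the first line is a bit field, and the names in the SError line are the kernel’s labels for the bits that are set. A small sketch to list the set bits follows; serr_bits is a hypothetical helper. As far as I can tell from the libata source, bit 8 is what it prints as UnrecovData and bit 22 is Handshk.

```shell
#!/bin/sh
# serr_bits HEX
# Print the positions of the bits set in an SErr value, lowest first.
serr_bits() {
    val=$(($1))          # shell arithmetic understands the 0x prefix
    bit=0
    out=
    while [ "$val" -gt 0 ]; do
        if [ $((val & 1)) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        val=$((val >> 1))
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

serr_bits 0x400100   # prints 8 22
```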
Still issues
Damn… Different error this time, but still an error, on ata4. Let’s try and swap the lower left (ata4) and lower right (ata6) trays.
[ 47.101223] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[ 47.101229] ata6.00: irq_stat 0x08000000, interface fatal error
[ 47.101232] ata6: SError: { UnrecovData Handshk }
[ 47.101236] ata6.00: failed command: WRITE DMA EXT
[ 47.101241] ata6.00: cmd 35/00:08:00:2a:00/00:00:2e:00:00/e0 tag 1 dma 4096 out
res 50/00:00:ff:29:00/00:00:2e:00:00/e0 Emask 0x10 (ATA bus error)
[ 47.101246] ata6.00: status: { DRDY }
So the error moved from ata4 to ata6. Probably not an issue with the disk bay then.
More errors on ata6 in dmesg:
[ 47.101250] ata6: hard resetting link
[ 47.416539] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 47.420881] ata6.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 47.420883] ata6.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 47.420885] ata6.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 47.421141] ata6.00: supports DRM functions and may not be fully accessible
[ 47.423520] ata6.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 47.423522] ata6.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 47.423523] ata6.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 47.423742] ata6.00: supports DRM functions and may not be fully accessible
[ 47.425234] ata6.00: configured for UDMA/133
BIOS or firmware?
Configured for UDMA/133? What the hell does that mean?
I booted to the BIOS setup, verified that the storage controller was set to AHCI and checked for BIOS upgrades. No new BIOS available…
I then checked if there was any new firmware available for the Samsung 860 and 870 disks — there was not 😕
Initially I got failed command: WRITE FPDMA QUEUED on both ata4 and ata6. But now only the 870 seems to report problems. Maybe there is more than one issue at play here.
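To keep track of which port fails with which command, the failed command lines can be tallied straight from dmesg. This is a rough sketch with a made-up function name, failed_by_port, which assumes the log lines look like the ones quoted in this post.

```shell
#!/bin/sh
# failed_by_port: read dmesg text on stdin and print one line per
# (port, command) pair in the form "COUNT PORT COMMAND".
failed_by_port() {
    sed -n 's/.*\(ata[0-9][0-9]*\)\.[0-9][0-9]*: failed command: \(.*\)$/\1 \2/p' \
        | sort | uniq -c \
        | awk '{ c = $1; $1 = ""; print c $0 }'   # drop the padding uniq -c adds
}

# On a live system (not run here):
#   dmesg | failed_by_port
```

Two different commands failing on the same port, or the same command failing on several ports, would at least hint at whether one problem or two is in play.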
I tried copying more files, and the same error popped up in dmesg:
[ 229.158053] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[ 229.158182] ata6.00: irq_stat 0x08000000, interface fatal error
[ 229.158274] ata6: SError: { UnrecovData Handshk }
[ 229.158349] ata6.00: failed command: WRITE DMA EXT
[ 229.158431] ata6.00: cmd 35/00:48:f8:b9:80/00:07:2e:00:00/e0 tag 8 dma 954368 out
res 50/00:00:67:82:80/00:00:2e:00:00/e0 Emask 0x10 (ATA bus error)
[ 229.158653] ata6.00: status: { DRDY }
Desperation
I tried recreating the pool with ashift=9, but that didn’t make any difference.
The 870 actually got kicked out of the pool, because it had too many errors.
pool: spool0
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: resilvered 19.1G in 0h1m with 0 errors on Mon Feb 1 21:54:17 2021
config:
NAME STATE READ WRITE CKSUM
spool0 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
ata-WDC_WDS500G2B0A-00SM50_191005A00184 ONLINE 0 0 0
ata-Samsung_SSD_870_EVO_500GB_S62BNJ0NC00646F FAULTED 0 0 0 too many errors
mirror-1 ONLINE 0 0 0
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09399X ONLINE 0 0 0
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09451Z ONLINE 0 0 0
New test; destroy the ZFS pool and wipe the disks.
Create an mdadm RAID 10 array instead:
$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/sdr /dev/sds /dev/sdt /dev/sdu
Nope, still errors in dmesg when copying. But now the errors have moved back to ata4?!
[ 1880.671647] ata4: limiting SATA link speed to 3.0 Gbps
[ 1880.671650] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[ 1880.671688] ata4.00: irq_stat 0x08000000, interface fatal error
[ 1880.671715] ata4: SError: { UnrecovData Handshk }
[ 1880.671744] ata4.00: failed command: WRITE DMA EXT
[ 1880.671773] ata4.00: cmd 35/00:00:00:10:95/00:0a:0b:00:00/e0 tag 7 dma 1310720 out
res 50/00:00:7f:8c:95/00:00:0b:00:00/e0 Emask 0x10 (ATA bus error)
[ 1880.671845] ata4.00: status: { DRDY }
[ 1880.671865] ata4: hard resetting link
[ 1880.986864] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 1880.990693] ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1880.990699] ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1880.990704] ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1880.991091] ata4.00: supports DRM functions and may not be fully accessible
[ 1880.994883] ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1880.994889] ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1880.994893] ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1880.995166] ata4.00: supports DRM functions and may not be fully accessible
[ 1880.996793] ata4.00: configured for UDMA/133
[ 1880.996813] ata4: EH complete
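The SStatus number in the SATA link up lines is a hexadecimal register value, and its middle digit (bits 7:4, the SPD field) encodes the negotiated link speed. A minimal sketch for decoding it, assuming the usual SATA encoding of 1, 2 and 3 for the three generations; sstatus_speed is a hypothetical helper.

```shell
#!/bin/sh
# sstatus_speed HEX
# Map the SPD field (bits 7:4) of a SATA SStatus value, given as hex
# digits without a 0x prefix, to a link speed.
sstatus_speed() {
    spd=$(( (0x$1 >> 4) & 0xf ))
    case "$spd" in
        1) echo "1.5 Gbps" ;;
        2) echo "3.0 Gbps" ;;
        3) echo "6.0 Gbps" ;;
        *) echo "unknown (SPD=$spd)" ;;
    esac
}

sstatus_speed 133   # prints 6.0 Gbps
sstatus_speed 123   # prints 3.0 Gbps
```

That matches the log: ata6 came back up at 6.0 Gbps with SStatus 133, while ata4 was limited to 3.0 Gbps and reports SStatus 123.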
Goddammit! I’m going to bed. 😫
To be continued…
Last commit 2024-04-05, with message: Tag cleanup.
All posts in ZFS SSD pool series
- My anticlimactic ZFS SSD pool story
- My ZFS SSD pool seems to be working!