This post is part of the ZFS SSD Pool series.
This post was supposed to be a “look, I made an SSD ZFS pool” post, but it will instead be the first in a troubleshooting series about getting my SSDs to behave.
But — I am getting ahead of myself… Let’s start at the beginning.
The beginning
I had two unused 500 GB SSDs lying around, a Western Digital Blue and a Samsung 850 EVO. I figured I’d put them to good use in a ZFS mirror pool.
I ordered a 4×2.5" disk bay to install in my file server, and two Samsung 860 EVO 500GB SSDs.
The disk bay
I had some issues fitting the disk bay in the 5.25" slot on my Inter-Tech 4U-4416 case, as the “fingers” on the brackets were too long and actually interfered with the disk trays. I did get it installed, but had to bend some of these “fingers” out. They are going to break off if I bend them back, for sure.
The next issue was the noise: the disk bay has two tiny fans on the back, and they were now the loudest part of my file server.
I’m going to use this bay for SSDs only, and there is good airflow from the internal fans already. So I figured I didn’t need the fans — and disconnected them.
Oh… Great, now it’s constantly flashing red, indicating a fan problem.
The 850 EVO SSD
While waiting for the additional disks — I figured I’d make a pool with a single mirrored VDEV to play with. Everything was going great, but I quickly noticed that the write speed was not good.
Using atop, I could see that the problem was the Samsung 850 EVO SSD: it was 100% busy but only writing 30-35 MB/s. That seemed very low. I destroyed the pool and started testing with this disk alone.
sdr busy 100% read 5 write 3248 KiB/w 103 MBr/s 0.0 MBw/s 33.0 avio 3.05 ms
sds busy 8% read 2 write 3788 KiB/w 87 MBr/s 0.0 MBw/s 32.3 avio 0.21 ms
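For reference, iostat from the sysstat package gives a similar per-device view if atop isn’t available; a rough sketch of the same measurement:
# sysstat provides iostat on Debian/Ubuntu
$ sudo apt install sysstat
# Extended stats in MB/s for the two disks, refreshed every 2 seconds
$ iostat -xm sdr sds 2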
I found that after writing 6-10 GB at the expected speed, the write performance dropped dramatically. I tested the disk on two computers, and the behaviour was the same.
After some googling I found that the 500GB 850 EVO has 6 GB of “TurboWrite buffer”. In essence this means that 6 GB of the NAND is in SLC mode, which is much faster than TLC mode. So the data will first be transferred to the fast SLC buffer, and moved to the TLC array during idle time. Unless of course — you fill the buffer in a single write operation, like I did.
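The buffer effect is easy to reproduce by writing a stream larger than the buffer in one go. A destructive sketch (it overwrites the disk; /dev/sdX is a placeholder for the drive under test):
# Sequential write of 20 GiB straight to the device, bypassing the page cache.
# Throughput should start near the drive's rated speed and collapse once the
# ~6 GB SLC buffer is full.
$ sudo dd if=/dev/zero of=/dev/sdX bs=1M count=20480 oflag=direct status=progress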
I still felt 30-35 MB/s was unreasonably slow, so I filed an RMA and got a new disk — an 870, since the 850 was obsolete.
The build
After receiving all of my SSDs I was ready to install and configure. Let’s get to it!
I mounted all the SSDs on trays and put them in the disk bay. A quick lsblk to see that they were all present:
$ lsblk
sdr 65:16 0 465.8G 0 disk
├─sdr1 65:17 0 465.8G 0 part
└─sdr9 65:25 0 8M 0 part
sds 65:32 0 465.8G 0 disk
sdt 65:48 0 465.8G 0 disk
sdu 65:64 0 465.8G 0 disk
Preparing the disks
Cool, now to figure out which was which:
$ sudo smartctl -i /dev/sdr
=== START OF INFORMATION SECTION ===
Device Model: WDC WDS500G2B0A-00SM50
$ sudo smartctl -i /dev/sds
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 870 EVO 500GB
$ sudo smartctl -i /dev/sdt
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 860 EVO 500GB
$ sudo smartctl -i /dev/sdu
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 860 EVO 500GB
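With four disks this is quick enough by hand, but a small loop does the same mapping in one go; a sketch using the same smartctl output and the device names above:
$ for d in /dev/sd{r,s,t,u}; do echo "== $d =="; sudo smartctl -i "$d" | grep -E 'Device Model|Serial Number'; done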
My sdr had some data from earlier; let’s wipe that:
$ sudo wipefs --all /dev/sdr
/dev/sdr: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sdr: 8 bytes were erased at offset 0x7470c05e00 (gpt): 45 46 49 20 50 41 52 54
/dev/sdr: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sdr: calling ioctl to re-read partition table: Success
$ lsblk
sdr 65:16 0 465.8G 0 disk
sds 65:32 0 465.8G 0 disk
sdt 65:48 0 465.8G 0 disk
sdu 65:64 0 465.8G 0 disk
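For the record, wipefs also has a dry-run mode that prints what it would erase without touching anything, which is reassuring before pointing --all at a disk; a sketch:
# --no-act only reports the signatures it would erase
$ sudo wipefs --all --no-act /dev/sdr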
Creating the pool
Nice, now they are all clean and ready to go! Let’s create the pool, but do a dry run (-n) first:
$ sudo zpool create -n spool0 -o ashift=12 \
mirror /dev/disk/by-id/ata-WDC_WDS500G2B0A-00SM50_xxxxxxxxxxxx /dev/disk/by-id/ata-Samsung_SSD_870_EVO_500GB_xxxxxxxxxxxxxxx \
mirror /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_xxxxxxxxxxxxxxx /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_xxxxxxxxxxxxxxx
would create 'spool0' with the following layout:
spool0
mirror
ata-WDC_WDS500G2B0A-00SM50_191005A00184
ata-Samsung_SSD_870_EVO_500GB_S62BNJ0NC00646F
mirror
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09399X
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09451Z
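A quick note on ashift=12: it fixes the pool’s minimum block size at 4 KiB (2^12), a common safe choice for SSDs regardless of the sector size they report. What the drives actually report can be checked with lsblk; a sketch:
$ lsblk -d -o NAME,LOG-SEC,PHY-SEC /dev/sdr /dev/sds /dev/sdt /dev/sdu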
The dry-run layout looks good — repeat the command, but without the -n, and check the result with zpool status:
pool: spool0
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
spool0 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WDS500G2B0A-00SM50_191005A00184 ONLINE 0 0 0
ata-Samsung_SSD_870_EVO_500GB_S62BNJ0NC00646F ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09399X ONLINE 0 0 0
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09451Z ONLINE 0 0 0
Our pool is created; now for compression:
$ sudo zfs set compression=lz4 spool0
$ sudo zfs get compression spool0
NAME PROPERTY VALUE SOURCE
spool0 compression lz4 local
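Once some data has landed on the pool, the actual benefit can be checked through the read-only compressratio property; a sketch:
$ zfs get compressratio spool0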
And set a mount point:
$ sudo zfs set mountpoint=/srv/spool0 spool0
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
spool0 372K 899G 96K /srv/spool0
Create a data set, and set myself as the owner:
$ sudo zfs create spool0/home
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
spool0 468K 899G 96K /srv/spool0
spool0/home 96K 899G 96K /srv/spool0/home
$ sudo chown hebron:hebron /srv/spool0/home/
Great! The pool is configured. It contains two mirrors, with a total of 1 TB usable space (ish).
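As a final sanity check of the layout and capacity, something like this should show both mirror vdevs and a bit under 1 TB of space; a sketch:
$ zpool list -v spool0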
Disappointment
I should be enjoying my new SSD pool at this point, but as the heading implies — that is not what happened. When I started copying data to the pool, errors popped up in dmesg:
[ 1252.513717] ata4.00: failed command: WRITE FPDMA QUEUED
[ 1294.501815] ata6.00: failed command: WRITE FPDMA QUEUED
Fuck… 🤬
Samsung firmware
Some googling revealed that this was a known problem with the Samsung firmware: the drives claim to support queued TRIM, but don’t handle it correctly. An apparent workaround was to disable NCQ for the affected drives (queued TRIM is only issued as part of NCQ). I figured I’d turn it off for all my Samsung SSDs, so the first step was to find which ATA ports they were connected to:
$ lsscsi
[4:0:0:0] disk ATA Samsung SSD 870 1B6Q /dev/sds
[5:0:0:0] disk ATA Samsung SSD 860 4B6Q /dev/sdt
[6:0:0:0] disk ATA Samsung SSD 860 4B6Q /dev/sdu
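Another way to map drives to ATA ports, in case the SCSI host numbers are confusing, is the by-path symlinks, which on most systems encode the ata port number; a sketch:
$ ls -l /dev/disk/by-path/ | grep -i ata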
$ dmesg | grep NCQ
[ 1.548626] ,NCQ
[ 1.735877] ata3.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.736197] ata1.00: NCQ Send/Recv Log not supported
[ 1.736203] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.736677] ata5.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.739051] ata4.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.739298] ata1.00: NCQ Send/Recv Log not supported
[ 1.740838] ata6.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
ata4, ata5 and ata6. Let’s disable NCQ on those ports with a kernel parameter:
$ sudo vim /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="libata.force=4.00:noncq,5.00:noncq,6.00:noncq"
$ sudo update-grub
$ sudo reboot
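As an aside, NCQ can reportedly also be toggled at runtime per device through sysfs, without editing GRUB and rebooting; a sketch, with sdX as a placeholder:
$ cat /sys/block/sdX/device/queue_depth    # 31 means NCQ is active
$ echo 1 | sudo tee /sys/block/sdX/device/queue_depth    # depth 1 effectively disables NCQ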
And to verify that the parameters had any effect:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.15.0-135-generic root=UUID=xxxx ro libata.force=4.00:noncq,5.00:noncq,6.00:noncq
$ dmesg | grep NCQ
[ 1.563357] ,NCQ
[ 1.756057] ata1.00: NCQ Send/Recv Log not supported
[ 1.756062] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.758620] ata1.00: NCQ Send/Recv Log not supported
[ 1.759741] ata3.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.765980] ata4.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
[ 1.769174] ata6.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
[ 1.769287] ata5.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
Alright, NCQ not used for our Samsung SSDs. Let’s try some more copying…
[ 49.509194] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[ 49.509207] ata4.00: irq_stat 0x08000000, interface fatal error
[ 49.509215] ata4: SError: { UnrecovData Handshk }
[ 49.509223] ata4.00: failed command: WRITE DMA EXT
[ 49.509236] ata4.00: cmd 35/00:b0:50:95:15/00:00:1d:00:00/e0 tag 0 dma 90112 out
res 50/00:00:4f:95:15/00:00:1d:00:00/e0 Emask 0x10 (ATA bus error)
[ 49.509248] ata4.00: status: { DRDY }
Still issues
Damn… A different error this time, but still an error — on ata4. Let’s try swapping the lower left (ata4) and lower right (ata6) trays.
[ 47.101223] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[ 47.101229] ata6.00: irq_stat 0x08000000, interface fatal error
[ 47.101232] ata6: SError: { UnrecovData Handshk }
[ 47.101236] ata6.00: failed command: WRITE DMA EXT
[ 47.101241] ata6.00: cmd 35/00:08:00:2a:00/00:00:2e:00:00/e0 tag 1 dma 4096 out
res 50/00:00:ff:29:00/00:00:2e:00:00/e0 Emask 0x10 (ATA bus error)
[ 47.101246] ata6.00: status: { DRDY }
So the error moved from ata4 to ata6, following the drive. Probably not an issue with the disk bay, then. More errors on ata6 in dmesg:
[ 47.101250] ata6: hard resetting link
[ 47.416539] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 47.420881] ata6.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 47.420883] ata6.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 47.420885] ata6.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 47.421141] ata6.00: supports DRM functions and may not be fully accessible
[ 47.423520] ata6.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 47.423522] ata6.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 47.423523] ata6.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 47.423742] ata6.00: supports DRM functions and may not be fully accessible
[ 47.425234] ata6.00: configured for UDMA/133
BIOS or firmware?
Configured for UDMA/133? What the hell does that mean?
I booted to the BIOS setup, verified that the storage controller was set to AHCI and checked for BIOS upgrades. No new BIOS available…
I then checked if there was any new firmware available for the Samsung 860 and 870 disks — there was not 😕
Initially I got failed command: WRITE FPDMA QUEUED on both ata4 and ata6. But now only the 870 seems to report problems. Maybe there is more than one issue at play here.
I tried copying more files, and the same error popped up in dmesg:
[ 229.158053] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[ 229.158182] ata6.00: irq_stat 0x08000000, interface fatal error
[ 229.158274] ata6: SError: { UnrecovData Handshk }
[ 229.158349] ata6.00: failed command: WRITE DMA EXT
[ 229.158431] ata6.00: cmd 35/00:48:f8:b9:80/00:07:2e:00:00/e0 tag 8 dma 954368 out
res 50/00:00:67:82:80/00:00:2e:00:00/e0 Emask 0x10 (ATA bus error)
[ 229.158653] ata6.00: status: { DRDY }
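One extra data point that might help narrow this down: Handshk / ATA bus errors usually point at the physical link (cable, backplane, power), and SATA link CRC errors are counted in SMART attribute 199 on most drives. A sketch, using the device names from earlier:
$ for d in /dev/sd{s,t,u}; do echo "== $d =="; sudo smartctl -A "$d" | grep -i crc; done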
Desperation
I tried recreating the pool with ashift=9; that didn’t make any difference. The 870 actually got kicked out of the pool because it had too many errors.
pool: spool0
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: resilvered 19.1G in 0h1m with 0 errors on Mon Feb 1 21:54:17 2021
config:
NAME STATE READ WRITE CKSUM
spool0 DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
ata-WDC_WDS500G2B0A-00SM50_191005A00184 ONLINE 0 0 0
ata-Samsung_SSD_870_EVO_500GB_S62BNJ0NC00646F FAULTED 0 0 0 too many errors
mirror-1 ONLINE 0 0 0
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09399X ONLINE 0 0 0
ata-Samsung_SSD_860_EVO_500GB_S4XBNJ0NB09451Z ONLINE 0 0 0
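For completeness, the status output’s own suggestion for retrying the device would look roughly like this; a sketch:
$ sudo zpool clear spool0 ata-Samsung_SSD_870_EVO_500GB_S62BNJ0NC00646F
$ zpool status spool0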
New test: destroy the ZFS pool and wipe the disks, then create an mdadm RAID 10 array instead:
$ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/sdr /dev/sds /dev/sdt /dev/sdu
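mdadm builds the array in the background, and progress can be watched in /proc/mdstat while copying; a sketch:
$ cat /proc/mdstat
$ watch -n 5 cat /proc/mdstat    # refresh every 5 seconds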
Nope, still errors in dmesg when copying. But now the errors have moved back to ata4?!
[ 1880.671647] ata4: limiting SATA link speed to 3.0 Gbps
[ 1880.671650] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[ 1880.671688] ata4.00: irq_stat 0x08000000, interface fatal error
[ 1880.671715] ata4: SError: { UnrecovData Handshk }
[ 1880.671744] ata4.00: failed command: WRITE DMA EXT
[ 1880.671773] ata4.00: cmd 35/00:00:00:10:95/00:0a:0b:00:00/e0 tag 7 dma 1310720 out
res 50/00:00:7f:8c:95/00:00:0b:00:00/e0 Emask 0x10 (ATA bus error)
[ 1880.671845] ata4.00: status: { DRDY }
[ 1880.671865] ata4: hard resetting link
[ 1880.986864] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 1880.990693] ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1880.990699] ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1880.990704] ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1880.991091] ata4.00: supports DRM functions and may not be fully accessible
[ 1880.994883] ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1880.994889] ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1880.994893] ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1880.995166] ata4.00: supports DRM functions and may not be fully accessible
[ 1880.996793] ata4.00: configured for UDMA/133
[ 1880.996813] ata4: EH complete
Goddammit! I’m going to bed. 😫
To be continued…
All posts in ZFS SSD Pool series
- My anticlimactic ZFS SSD pool story
- My ZFS SSD pool seems to be working!