Weirdly variable USB performance - from 2MB/s to 400MB/s

Running through a suite of benchmark tests on the MNT Reform 2, and I’ve come across the strangest result: USB performance that bounces all over the place.

Test setup: A USB 3.0 SSD, the same one I always use, connected to one of the Reform’s USB ports. Drive is formatted ext4, mounted, and FIO is run with the following settings:

sudo fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=read --size=500m --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting

The drive’s capable of about 400MB/s, give-or-take - I last ran the benchmark on a Helios64 at 413MB/s read, and before that on a Raspberry Pi 400 at 362MB/s read.

The Reform… well, take a look:

Starting 1 process
Jobs: 1 (f=1): [R(1)][11.7%][r=109MiB/s][r=109 IOPS][eta 00m:53s]
Jobs: 1 (f=1): [R(1)][21.7%][r=118MiB/s][r=118 IOPS][eta 00m:47s]
Jobs: 1 (f=1): [R(1)][31.7%][r=14.0MiB/s][r=14 IOPS][eta 00m:41s]
Jobs: 1 (f=1): [R(1)][41.7%][r=162MiB/s][r=162 IOPS][eta 00m:35s]
Jobs: 1 (f=1): [R(1)][51.7%][r=49.0MiB/s][r=49 IOPS][eta 00m:29s]
Jobs: 1 (f=1): [R(1)][61.7%][r=77.1MiB/s][r=77 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [R(1)][71.7%][r=351MiB/s][r=351 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [R(1)][81.7%][r=206MiB/s][r=206 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [R(1)][91.7%][r=14.0MiB/s][r=14 IOPS][eta 00m:05s]
Jobs: 1 (f=0): [f(1)][100.0%][r=36.0MiB/s][r=36 IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=7582: Thu Sep  9 11:33:33 2021
  read: IOPS=85, BW=85.7MiB/s (89.9MB/s)(5214MiB/60819msec)
    slat (usec): min=41, max=371996, avg=11271.27, stdev=41497.21
    clat (msec): min=23, max=2628, avg=360.97, stdev=499.40
     lat (msec): min=25, max=2752, avg=372.24, stdev=510.39
    clat percentiles (msec):
     |  1.00th=[   58],  5.00th=[   73], 10.00th=[   79], 20.00th=[   81],
     | 30.00th=[   83], 40.00th=[   86], 50.00th=[  102], 60.00th=[  159],
     | 70.00th=[  253], 80.00th=[  575], 90.00th=[ 1183], 95.00th=[ 1653],
     | 99.00th=[ 2022], 99.50th=[ 2089], 99.90th=[ 2467], 99.95th=[ 2635],
     | 99.99th=[ 2635]
   bw (  KiB/s): min= 2052, max=396518, per=99.88%, avg=87680.17, stdev=102817.14, samples=121
   iops        : min=    2, max=  387, avg=85.59, stdev=100.35, samples=121
  lat (msec)   : 50=0.65%, 100=49.04%, 250=19.91%, 500=8.84%, 750=4.64%
  lat (msec)   : 1000=4.78%, 2000=11.05%, >=2000=1.09%
  cpu          : usr=0.14%, sys=1.36%, ctx=2583, majf=0, minf=554
  IO depths    : 1=0.2%, 2=0.4%, 4=0.8%, 8=1.7%, 16=3.4%, 32=93.5%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.2%, 64=0.0%, >=64=0.0%
     issued rwts: total=5214,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=85.7MiB/s (89.9MB/s), 85.7MiB/s-85.7MiB/s (89.9MB/s-89.9MB/s), io=5214MiB (5467MB), run=60819-60819msec

Disk stats (read/write):
  sda: ios=10386/0, merge=0/0, ticks=3356212/0, in_queue=3356213, util=100.00%

It finishes at 89.9MB/s, which is way too low - but take a look at how it gets there: the highest throughput recorded is 396,518kiB/s, which is roughly what I’d expect to see; the lowest is just 2,025kiB/s.

I’ve never seen results like it. Oddly, there isn’t a single error to be seen. The only relevant lines:

[Thu Sep  9 11:27:52 2021] usb 2-1.2: new SuperSpeed Gen 1 USB device number 3 using xhci-hcd
[Thu Sep  9 11:27:52 2021] usb 2-1.2: New USB device found, idVendor=26bd, idProduct=9917, bcdDevice= 1.00
[Thu Sep  9 11:27:52 2021] usb 2-1.2: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[Thu Sep  9 11:27:52 2021] usb 2-1.2: Product: Portable SSD 3.0
[Thu Sep  9 11:27:52 2021] usb 2-1.2: Manufacturer: Integral
[Thu Sep  9 11:27:52 2021] usb 2-1.2: SerialNumber: 5111905021822DD30026
[Thu Sep  9 11:27:52 2021] scsi host0: uas
[Thu Sep  9 11:27:52 2021] scsi 0:0:0:0: Direct-Access     Integral Portable SSD 3.0 0    PQ: 0 ANSI: 6
[Thu Sep  9 11:27:52 2021] sd 0:0:0:0: Attached scsi generic sg0 type 0
[Thu Sep  9 11:27:52 2021] sd 0:0:0:0: [sda] Spinning up disk...
[Thu Sep  9 11:27:53 2021] .ready
[Thu Sep  9 11:27:53 2021] sd 0:0:0:0: [sda] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[Thu Sep  9 11:27:53 2021] sd 0:0:0:0: [sda] Write Protect is off
[Thu Sep  9 11:27:53 2021] sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
[Thu Sep  9 11:27:53 2021] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Thu Sep  9 11:27:53 2021] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes
[Thu Sep  9 11:27:53 2021]  sda: sda1
[Thu Sep  9 11:27:53 2021] sd 0:0:0:0: [sda] Attached SCSI disk
[Thu Sep  9 11:28:44 2021] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

Any ideas?

Something I’m curious about having looked at the schematics, but try running the test with the SSD in the USB 3 port closest to the HDMI port. That one (J17) connects directly to the SoC, while the other two are on a USB 3 hub shared with the trackball and keyboard. Curious if that changes anything.

I don’t have a USB SSD to test but I can test against a USB 3 external hard drive to see if it fluctuates for me. Something else to make sure as little else is running on the Reform, it’s possible that it could cause dips as it isn’t the fastest core.

Ah, good shout - I was using the USB port nearest the front edge. I’ll test again with the furthest one.

I’m running all these tests without the desktop loaded, so it’s as clean as I can easily make it!

Woof, there’s a difference: using the USB port nearest the HDMI port, I’m seeing an average of 419MB/s - and a minimum throughput of 370,688kiB/s. So, two orders of magnitude better.

I’ll shift back to the front USB port for a sanity check, just in case a reboot was what fixed it(!)…

So, I’m back in the front USB port… and I’m seeing 402MB/s. Stable. No massive variance like before - a test I repeated three times to make sure I wasn’t seeing things.

So a reboot did fix it. The question is: fix what?

I’ll keep an eye on it, retest every now and again - the system had been running for a quite a while when I tested it the first time, so it could be uptime or temperature related - but for now it looks like it was Just One Of Those Things.