When dealing with disks and I/O on Linux, you'd regularly run commands like lsblk, lsscsi, nvme list, etc. Each of them reports a different set of information, so I ended up running multiple commands and correlating their output by device name or number.
And then I had to run commands like these to get extra info about the current OS-level configuration settings for specific disks:
grep . /sys/class/block/sd*/device/queue_depth
grep . /sys/class/block/*/queue/nr_requests
The above commands would show the hardware-advertised device queue depth and the OS block device level (software) queue depth.
I finally had enough and created a single Python program, lsds, that shows all the interesting bits about a disk in one place. The lsds tool does not execute any other commands under the hood; it just reads the relevant info directly from the sysfs /sys/class/block/... directories.
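To illustrate the general approach (this is just a sketch of the idea, not the actual lsds code), reading block device attributes straight from sysfs with plain Python looks roughly like this:

#!/usr/bin/env python3
# Sketch: read a few block device attributes straight from sysfs.
# Illustrates the general approach only, not the actual lsds implementation.
import os

SYSBLOCK = "/sys/class/block"

def read_attr(dev, relpath):
    """Return the contents of a sysfs attribute, or '-' if it is missing."""
    try:
        with open(os.path.join(SYSBLOCK, dev, relpath)) as f:
            return f.read().strip()
    except OSError:
        return "-"

for dev in sorted(os.listdir(SYSBLOCK)):
    size_sectors = read_attr(dev, "size")              # size in 512-byte sectors
    rot          = read_attr(dev, "queue/rotational")  # 1 = spinning disk
    sched        = read_attr(dev, "queue/scheduler")   # e.g. "[none] mq-deadline kyber bfq"
    print(f"{dev:12} {size_sectors:>12} {rot:>3}  {sched}")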
See my 0x.tools toolset for more Linux performance & troubleshooting tools.
Here’s an output from my machine with 21 SSDs in it. You may need to scroll right to see the full output:
$ lsds
DEVNAME   MAJ:MIN  SIZE        TYPE      SCHED  ROT  MODEL                       QDEPTH  NR_RQ  WCACHE
nvme0n1   259:4    931.5 GiB   NVMeDisk  none   0    Samsung SSD 9100 PRO 1TB    -       1023   write back
nvme10n1  259:21   465.8 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 500GB   -       1023   write back
nvme11n1  259:22   465.8 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 500GB   -       1023   write back
nvme12n1  259:9    931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back
nvme13n1  259:16   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back
nvme14n1  259:8    931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back
nvme15n1  259:15   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back
nvme16n1  259:12   1863.0 GiB  NVMeDisk  none   0    T-FORCE TM8FF1002T          -       1023   write back
nvme17n1  259:11   1863.0 GiB  NVMeDisk  none   0    T-FORCE TM8FF1002T          -       1023   write back
nvme18n1  259:13   1863.0 GiB  NVMeDisk  none   0    T-FORCE TM8FF1002T          -       1023   write back
nvme19n1  259:14   1863.0 GiB  NVMeDisk  none   0    T-FORCE TM8FF1002T          -       1023   write back
nvme1n1   259:1    931.5 GiB   NVMeDisk  none   0    Samsung SSD 9100 PRO 1TB    -       1023   write back
nvme20n1  259:3    1863.0 GiB  NVMeDisk  none   0    Samsung SSD 990 PRO 2TB     -       1023   write back
nvme2n1   259:2    931.5 GiB   NVMeDisk  none   0    Samsung SSD 9100 PRO 1TB    -       1023   write back
nvme3n1   259:0    931.5 GiB   NVMeDisk  none   0    Samsung SSD 9100 PRO 1TB    -       1023   write back
nvme4n1   259:18   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back
nvme5n1   259:23   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back
nvme6n1   259:19   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back
nvme7n1   259:20   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back
nvme8n1   259:17   1788.5 GiB  NVMeDisk  none   0    SAMSUNG MZQL21T9HCJR-00A07  -       1023   write through
nvme9n1   259:10   1397.3 GiB  NVMeDisk  none   0    INTEL SSDPE21D015TA         -       1023   write through
All the above disks are NVMe SSDs, so the I/O scheduler (SCHED) is “none” and the rotational flag (ROT) is 0.
If you prefer narrower, pivoted output, you can use the --pivot option as shown below. I also added the --verbose flag to show where this tool got its information from:
$ lsds -pv
DEVNAME   NAME     VALUE
nvme0n1   MAJ:MIN  259:4                     /sys/class/block/nvme0n1/dev
nvme0n1   SIZE     931.5 GiB                 /sys/class/block/nvme0n1/size * 512
nvme0n1   TYPE     NVMeDisk                  devname, /sys/class/block/nvme0n1/partition
nvme0n1   SCHED    none                      /sys/class/block/nvme0n1/queue/scheduler
nvme0n1   ROT      0                         /sys/class/block/nvme0n1/queue/rotational
nvme0n1   MODEL    Samsung SSD 9100 PRO 1TB  /sys/class/block/nvme0n1/device/model
nvme0n1   QDEPTH   -                         /sys/class/block/nvme0n1/device/queue_depth (N/A for NVMe)
nvme0n1   NR_RQ    1023                      /sys/class/block/nvme0n1/queue/nr_requests
nvme0n1   WCACHE   write back                /sys/class/block/nvme0n1/queue/write_cache
...
nvme9n1   MAJ:MIN  259:10                    /sys/class/block/nvme9n1/dev
nvme9n1   SIZE     1397.3 GiB                /sys/class/block/nvme9n1/size * 512
nvme9n1   TYPE     NVMeDisk                  devname, /sys/class/block/nvme9n1/partition
nvme9n1   SCHED    none                      /sys/class/block/nvme9n1/queue/scheduler
nvme9n1   ROT      0                         /sys/class/block/nvme9n1/queue/rotational
nvme9n1   MODEL    INTEL SSDPE21D015TA       /sys/class/block/nvme9n1/device/model
nvme9n1   QDEPTH   -                         /sys/class/block/nvme9n1/device/queue_depth (N/A for NVMe)
nvme9n1   NR_RQ    1023                      /sys/class/block/nvme9n1/queue/nr_requests
nvme9n1   WCACHE   write through             /sys/class/block/nvme9n1/queue/write_cache
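As the verbose output shows, the SIZE value is derived from the sector count in /sys/class/block/<dev>/size multiplied by 512, since sysfs reports block device sizes in 512-byte units. A minimal sketch of that conversion (the helper name is mine, not from lsds):

# Sketch: convert /sys/class/block/<dev>/size (a 512-byte sector count) into GiB.
def sysfs_size_gib(dev):
    with open(f"/sys/class/block/{dev}/size") as f:
        sectors = int(f.read().strip())
    return f"{sectors * 512 / 2**30:.1f} GiB"

print(sysfs_size_gib("nvme0n1"))   # e.g. "931.5 GiB" for a 1 TB SSD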
This tool is somewhat customizable too; check the help output:
$ lsds -h
usage: lsds [-h] [-c COLUMN [COLUMN ...]] [-a COL1,COL2,...] [-l] [-v] [-p] [-r]

lsds: List Linux block devices v1.0.0 by Tanel Poder [0x.tools]

options:
  -h, --help            show this help message and exit
  -c, --columns COLUMN [COLUMN ...]
                        Specify which columns to display. Overrides defaults.
                        Default: DEVNAME MAJ:MIN SIZE TYPE SCHED ROT MODEL QDEPTH NR_RQ WCACHE
  -a, --add COL1,COL2,...
                        Comma-separated list of columns to add to the default list.
  -l, --list            List all available column names and exit.
  -v, --verbose         Show the source file path or derivation for each value.
  -p, --pivot           Pivot output: print each device/column value on a separate line.
  -r, --realpath        Show /sys source file real path instead of symlink.

Reads data directly from sysfs, does not execute external commands.
To see all currently available fields, you can use the --list option:
$ lsds -l
CAP
DAX
DEVNAME
DISCARD
DISC_GRAN
DISC_MAX
DISC_MAXHW
FUA
HWSEC
INFLIGHT
IOPOLL
IOPOLL_DEL
IOSTATS
LOGSEC
MAJ:MIN
MODEL
NR_RQ
NVME_QDEPTH
P2P_QUEUES
PHYSEC
QDEPTH
RANDOM
REMOVABLE
RO
ROT
SCHED
SIZE
TIMEOUT
TYPE
VENDOR
WBT_LAT
WCACHE
You can add the pivot and verbose flags to the --list option too, to see where it would look up the values for each field:
$ lsds -lpv
CAP          {dev_path}/capability
DAX          {dev_path}/queue/dax
DEVNAME      {dev_path}
DISCARD      {dev_path}/queue/discard_granularity
DISC_GRAN    {dev_path}/queue/discard_granularity
DISC_MAX     {dev_path}/queue/discard_max_bytes
DISC_MAXHW   {dev_path}/queue/discard_max_hw_bytes
FUA          {dev_path}/queue/fua
HWSEC        {dev_path}/queue/hw_sector_size
INFLIGHT     {dev_path}/inflight
IOPOLL       {dev_path}/queue/io_poll
IOPOLL_DEL   {dev_path}/queue/io_poll_delay
IOSTATS      {dev_path}/queue/iostats
LOGSEC       {dev_path}/queue/logical_block_size
MAJ:MIN      {dev_path}/dev
MODEL        {dev_path}/device/model
NR_RQ        {dev_path}/queue/nr_requests
NVME_QDEPTH  {module_base}/nvme*/parameters/io_queue_depth
P2P_QUEUES   {dev_path}/device/num_p2p_queues
PHYSEC       {dev_path}/queue/physical_block_size
QDEPTH       {dev_path}/device/queue_depth (N/A for NVMe)
RANDOM       {dev_path}/queue/add_random
REMOVABLE    {dev_path}/removable
RO           {dev_path}/ro
ROT          {dev_path}/queue/rotational
SCHED        {dev_path}/queue/scheduler
SIZE         {dev_path}/size * {sector_size}
TIMEOUT      {dev_path}/queue/io_timeout
TYPE         devname, {dev_path}/partition
VENDOR       {dev_path}/device/vendor
WBT_LAT      {dev_path}/queue/wbt_lat_usec
WCACHE       {dev_path}/queue/write_cache
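Essentially, a tool like this only needs a mapping from column name to a sysfs path template that gets resolved per device. A simplified sketch of that idea, with just a few of the columns listed above (the structure is my illustration, not the exact lsds internals):

# Sketch: a column-name -> sysfs path template mapping, resolved per device.
# Only a few columns shown here; see the -lpv listing above for the full set.
COLUMNS = {
    "MAJ:MIN": "{dev_path}/dev",
    "ROT":     "{dev_path}/queue/rotational",
    "SCHED":   "{dev_path}/queue/scheduler",
    "NR_RQ":   "{dev_path}/queue/nr_requests",
    "WCACHE":  "{dev_path}/queue/write_cache",
}

def get_value(devname, column):
    path = COLUMNS[column].format(dev_path=f"/sys/class/block/{devname}")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "-"

print(get_value("nvme0n1", "WCACHE"))   # e.g. "write back"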
Let’s use a different machine with both spinning disks and NVMe SSDs, to report the hardware sector size and the NVMe device queue depth too. The NVMe queue depth is different from the usual SCSI device-advertised queue depth.
NB! (Errata) The NVME_QDEPTH column unfortunately shows a Linux-level maximum queue depth per device: it is just an NVMe kernel module setting, not the physical maximum capability of each individual device. Since NVMe devices can have many submission/completion queues and namespaces, and different queues can have different lengths, there is no single /sys file that shows the total queue depth of an individual device. Usually the nvme id-ctrl command is used to run ioctl() calls against the device and parse the output. I might modify my script to optionally run such commands and present the result as another column in this table, but for now NVME_QDEPTH shows just the Linux-level maximum per device, not the individual device’s controller/namespace capacity.
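For reference, the value behind NVME_QDEPTH comes from the NVMe driver’s io_queue_depth module parameter (the {module_base}/nvme*/parameters/io_queue_depth path in the listing above). A quick sketch of reading it directly:

# Sketch: read the NVMe driver's global io_queue_depth module parameter,
# i.e. a driver-level setting, not a per-device hardware capability.
import glob

for path in glob.glob("/sys/module/nvme*/parameters/io_queue_depth"):
    with open(path) as f:
        print(path, "=", f.read().strip())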
$ lsds -a HWSEC,NVME_QDEPTH
DEVNAME  MAJ:MIN  SIZE        TYPE      SCHED        ROT  MODEL                      QDEPTH  NR_RQ  WCACHE         HWSEC  NVME_QDEPTH
nvme0n1  259:0    186.3 GiB   NVMeDisk  none         0    Micron_7400_MTFDKBA960TDZ  -       1023   write through  4096   1024
nvme0n2  259:6    200.0 GiB   NVMeDisk  none         0    Micron_7400_MTFDKBA960TDZ  -       1023   write through  4096   1024
nvme1n1  259:1    1863.0 GiB  NVMeDisk  none         0    Samsung SSD 990 PRO 2TB    -       1023   write back     512    1024
nvme2n1  259:2    260.8 GiB   NVMeDisk  none         0    INTEL SSDPED1D280GA        -       1023   write through  512    1024
sda      8:0      3726.0 GiB  Disk      mq-deadline  1    P9233                      30      60     write back     4096   -
sdb      8:16     3726.0 GiB  Disk      mq-deadline  1    P9233                      30      60     write back     4096   -
sdc      8:32     3726.0 GiB  Disk      mq-deadline  1    P9233                      30      60     write back     4096   -
sdd      8:48     3726.0 GiB  Disk      mq-deadline  1    P9233                      30      60     write back     4096   -
As you see, the bottom four sd* devices use the mq-deadline I/O scheduler, their “rotational” flag is 1 and these disks can hold 30 in-flight I/O requests in their firmware-level hardware queues. NVMe I/O handling works differently: when you scroll right, the NVME_QDEPTH column shows that each SSD can handle up to 1024 simultaneous in-flight I/O requests at the SSD controller level.
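Related to queue depths: the INFLIGHT column (see the field list above) reads /sys/class/block/<dev>/inflight, which holds two counters, in-flight reads and in-flight writes, so you can check how busy a device’s queues are right now. A small sketch, using sda as an example device name:

# Sketch: read the current in-flight read/write request counts for a block device.
def inflight(dev):
    with open(f"/sys/class/block/{dev}/inflight") as f:
        reads, writes = map(int, f.read().split())
    return reads, writes

print(inflight("sda"))   # e.g. (0, 3) when three writes are currently in flight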
With enterprise-grade SSDs that have PLP (the controller’s DRAM-based write cache has Power-Loss Protection), or Optane media that doesn’t even need write caching thanks to its low write latency, you’ll see some disks in write through caching mode. This is where I typically place my WAL, redo logs and any other files requiring low-latency durable writes.
$ lsds -a FUA,HWSEC,NVME_QDEPTH
DEVNAME  MAJ:MIN  SIZE        TYPE      SCHED        ROT  MODEL                      QDEPTH  NR_RQ  WCACHE         FUA  HWSEC  NVME_QDEPTH
nvme0n1  259:0    186.3 GiB   NVMeDisk  none         0    Micron_7400_MTFDKBA960TDZ  -       1023   write through  0    4096   1024
nvme0n2  259:6    200.0 GiB   NVMeDisk  none         0    Micron_7400_MTFDKBA960TDZ  -       1023   write through  0    4096   1024
nvme1n1  259:1    1863.0 GiB  NVMeDisk  none         0    Samsung SSD 990 PRO 2TB    -       1023   write back     1    512    1024
nvme2n1  259:2    260.8 GiB   NVMeDisk  none         0    INTEL SSDPED1D280GA        -       1023   write through  0    512    1024
sda      8:0      3726.0 GiB  Disk      mq-deadline  1    P9233                      30      60     write back     0    4096   -
sdb      8:16     3726.0 GiB  Disk      mq-deadline  1    P9233                      30      60     write back     0    4096   -
sdc      8:32     3726.0 GiB  Disk      mq-deadline  1    P9233                      30      60     write back     0    4096   -
sdd      8:48     3726.0 GiB  Disk      mq-deadline  1    P9233                      30      60     write back     0    4096   -
I’m also listing support for the Force Unit Access (FUA) flag, which allows writers to instruct the disk controller to write the I/O payload immediately to durable (NAND) media and not let it linger in the unprotected write cache, waiting for some fsync() operation that might happen later on.
In the output above, the Micron SSDs have a PLP write cache, so they do not have to honor any FUA operations, as their write cache is durable (thanks to capacitors on the devices, the SSD has enough time to flush the write cache to NAND in case of power loss). This assumes that the hardware and firmware work correctly, of course, but that is the case with everything you rely on. The same applies to the SAMSUNG MZQL21T9HCJR-00A07 disk below: it has capacitors and PLP. The INTEL SSDs in both outputs are Optane SSDs (sadly discontinued): their 3D XPoint media is so fast that it doesn’t need any write caching, so every write goes immediately to durable media and no emergency cache flushing is needed.
Therefore, both of these types of disks are in write through cache mode and report FUA = 0. They do not support the FUA flag, since they don’t need to honor it to achieve durability from the OS point of view:
$ lsds -a FUA,HWSEC,NVME_QDEPTH
DEVNAME   MAJ:MIN  SIZE        TYPE      SCHED  ROT  MODEL                       QDEPTH  NR_RQ  WCACHE         FUA  HWSEC  NVME_QDEPTH
nvme0n1   259:4    931.5 GiB   NVMeDisk  none   0    Samsung SSD 9100 PRO 1TB    -       1023   write back     1    512    1024
nvme10n1  259:21   465.8 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 500GB   -       1023   write back     1    512    1024
nvme11n1  259:22   465.8 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 500GB   -       1023   write back     1    512    1024
nvme12n1  259:9    931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back     1    512    1024
nvme13n1  259:16   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back     1    512    1024
nvme14n1  259:8    931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back     1    512    1024
nvme15n1  259:15   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back     1    512    1024
nvme16n1  259:12   1863.0 GiB  NVMeDisk  none   0    T-FORCE TM8FF1002T          -       1023   write back     1    4096   1024
nvme17n1  259:11   1863.0 GiB  NVMeDisk  none   0    T-FORCE TM8FF1002T          -       1023   write back     1    4096   1024
nvme18n1  259:13   1863.0 GiB  NVMeDisk  none   0    T-FORCE TM8FF1002T          -       1023   write back     1    4096   1024
nvme19n1  259:14   1863.0 GiB  NVMeDisk  none   0    T-FORCE TM8FF1002T          -       1023   write back     1    4096   1024
nvme1n1   259:1    931.5 GiB   NVMeDisk  none   0    Samsung SSD 9100 PRO 1TB    -       1023   write back     1    512    1024
nvme20n1  259:3    1863.0 GiB  NVMeDisk  none   0    Samsung SSD 990 PRO 2TB     -       1023   write back     1    512    1024
nvme2n1   259:2    931.5 GiB   NVMeDisk  none   0    Samsung SSD 9100 PRO 1TB    -       1023   write back     1    512    1024
nvme3n1   259:0    931.5 GiB   NVMeDisk  none   0    Samsung SSD 9100 PRO 1TB    -       1023   write back     1    512    1024
nvme4n1   259:18   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back     1    512    1024
nvme5n1   259:23   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back     1    512    1024
nvme6n1   259:19   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back     1    512    1024
nvme7n1   259:20   931.5 GiB   NVMeDisk  none   0    Samsung SSD 980 PRO 1TB     -       1023   write back     1    512    1024
nvme8n1   259:17   1788.5 GiB  NVMeDisk  none   0    SAMSUNG MZQL21T9HCJR-00A07  -       1023   write through  0    4096   1024
nvme9n1   259:10   1397.3 GiB  NVMeDisk  none   0    INTEL SSDPE21D015TA         -       1023   write through  0    512    1024
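As a user-space aside on what “durable writes” means here: an application either calls fsync() after writing or opens the file with O_DSYNC, and the kernel then uses FUA writes or cache flushes as needed for the underlying device (or neither, on write through devices). A minimal sketch of both options; the file names are just examples:

# Sketch: two common ways to make writes durable from user space on Linux.
import os

data = b"important log record\n"

# Option 1: write, then an explicit fsync()
fd = os.open("/tmp/wal-test-1", os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, data)
os.fsync(fd)          # force data (and metadata) out to durable media
os.close(fd)

# Option 2: open with O_DSYNC so each write() returns only once the data is durable
fd = os.open("/tmp/wal-test-2", os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
os.write(fd, data)    # no separate fsync() needed for the data itself
os.close(fd)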
Here’s an output from a machine with a bunch of NVMe SSDs and one crappy (external) USB disk that can handle only one I/O at a time even at the disk controller level!
$ lsds
DEVNAME   MAJ:MIN  SIZE        TYPE      SCHED        ROT  MODEL                         QDEPTH  NR_RQ  WCACHE
nvme0n1   259:3    110.3 GiB   NVMeDisk  none         0    INTEL SSDPEK1A118GA           -       1023   write through
nvme10n1  259:4    894.3 GiB   NVMeDisk  none         0    INTEL SSDPE21D960GA           -       1023   write through
nvme1n1   259:1    110.3 GiB   NVMeDisk  none         0    INTEL SSDPEK1A118GA           -       1023   write through
nvme2n1   259:2    110.3 GiB   NVMeDisk  none         0    INTEL SSDPEK1A118GA           -       1023   write through
nvme3n1   259:0    110.3 GiB   NVMeDisk  none         0    INTEL SSDPEK1A118GA           -       1023   write through
nvme4n1   259:6    931.5 GiB   NVMeDisk  none         0    Samsung SSD 970 EVO Plus 1TB  -       1023   write back
nvme5n1   259:13   931.5 GiB   NVMeDisk  none         0    Samsung SSD 970 EVO Plus 1TB  -       1023   write back
nvme6n1   259:7    931.5 GiB   NVMeDisk  none         0    Samsung SSD 970 EVO Plus 1TB  -       1023   write back
nvme7n1   259:8    931.5 GiB   NVMeDisk  none         0    Samsung SSD 970 EVO Plus 1TB  -       1023   write back
nvme8n1   259:9    953.9 GiB   NVMeDisk  none         0    Samsung SSD 970 PRO 1TB       -       1023   write back
nvme9n1   259:5    354.0 GiB   NVMeDisk  none         0    INTEL SSDPEL1D380GA           -       1023   write through
sda       8:0      3726.0 GiB  Disk      mq-deadline  1    My Book 25ED                  1       2      write through
And here’s an output from Linux running in a VMware virtual machine. Even the CD-ROM block device is displayed:
$ lsds
DEVNAME  MAJ:MIN  SIZE      TYPE      SCHED        ROT  MODEL             QDEPTH  NR_RQ  WCACHE
sda      8:0      58.6 GiB  Disk      mq-deadline  1    VMware Virtual S  32      64     write through
sda1     8:1      1.0 GiB   Part      -            -    -                 -       -      -
sda2     8:2      4.0 GiB   Part      -            -    -                 -       -      -
sda3     8:3      53.6 GiB  Part      -            -    -                 -       -      -
sr0      11:0     1.0 GiB   BlockDev  mq-deadline  1    VMware IDE CDR10  1       2      write through
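The TYPE values shown here (Disk, Part, BlockDev and NVMeDisk earlier) are derived from the device name plus the presence of a partition file, as the -lpv listing indicated. A rough sketch of how such a classification could look; this is my guess at the logic, not the exact lsds code:

# Sketch: classify a block device roughly the way the TYPE column does,
# based on the device name and the 'partition' attribute in sysfs.
import os

def block_dev_type(devname):
    dev_path = f"/sys/class/block/{devname}"
    if os.path.exists(os.path.join(dev_path, "partition")):
        return "Part"                  # partitions expose a 'partition' attribute
    if devname.startswith("nvme"):
        return "NVMeDisk"
    if devname.startswith(("sd", "vd", "xvd")):
        return "Disk"
    return "BlockDev"                  # anything else, e.g. sr0 (CD-ROM)

for dev in sorted(os.listdir("/sys/class/block")):
    print(dev, block_dev_type(dev))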
I compiled a list of interesting fields in the /sys directories and fed it to Gemini for the initial script creation. Then I went through the code, fixed a few issues that I found and added some improvements using the good old manual coding method. You need at least Python 3.6 installed, as this program uses Python f-strings. Have fun!