benchmarking your disks

March 17th, 2017

 

While at Delphix, we did a lot of storage benchmarking. The I/O response times of Delphix depend, as one would expect, heavily on the underlying disks. Sure, Delphix can cache a lot (with 1 TB of RAM and 3x compression that’s 3 TB of cached data, and that 3 TB can be shared by 10 or 100 copies, the equivalent of 30 TB or 300 TB of databases), but there will always be important I/O coming from the storage subsystem.

Now Delphix mainly runs database loads, so the best test for storage that is hooked up to Delphix is to benchmark the storage I/O for a database workload. Two questions arise:

  • What tool can benchmark the I/O?
  • What is a typical database I/O workload?

For the tool, the clear answer seems to be fio, which is quite flexible, has an active community and regular updates, and was started by the renowned Linux I/O master Jens Axboe, who still actively maintains it.

Now for database workloads there are 4 types of I/O:

  1. Random single block reads (typically lookups by index)
  2. Large multi block reads (typically full table scans)
  3. Small sequential I/Os to transaction log files
  4. Large sequential I/Os to transaction log files (generally when commits are infrequent and change rate is high)

Now in a database, all of these can be done concurrently by multiple processes, and thus we need to benchmark for concurrency as well.

Thus the matrix is the 4 types of I/O, each tested at different numbers of concurrent users.
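As a rough illustration of that matrix (this is not the actual fio.sh, just a minimal sketch), the loop below prints one fio command per combination of test, block size, and user count. The block sizes and user counts are borrowed from the sample output further down. Mapping "users" to fio's numjobs option, the /tmp/fiotest directory, the 1g file size, and the function names fio_cmd and matrix are all assumptions made up for this example. It only prints commands and runs nothing.

```shell
#!/bin/sh
# Dry-run sketch: print one fio command per (test, block size, users) cell.
# Nothing is executed here; the commands are only written to stdout.

# fio_cmd TEST BLOCKSIZE USERS -> prints a single fio command line
# (directory, size and runtime are placeholder values for illustration)
fio_cmd() {
    printf 'fio --name=%s_%s_%s --rw=%s --bs=%s --numjobs=%s' "$1" "$2" "$3" "$1" "$2" "$3"
    printf ' --direct=1 --time_based --runtime=60 --size=1g'
    printf ' --directory=/tmp/fiotest --group_reporting\n'
}

# matrix TEST "BLOCKSIZES" "USERS" -> one command per combination
matrix() {
    for bs in $2; do
        for users in $3; do
            fio_cmd "$1" "$bs" "$users"
        done
    done
}

matrix randread "8k" "1 8 16 32 64"      # random single block reads
matrix read "1m" "1 8 16 32 64"          # large multi block reads
matrix write "1k 8k 128k" "1 4 16"       # small and large sequential writes
```

Printing the commands first makes it easy to review the whole matrix before pointing it at real storage.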

Now fio doesn’t come with such prepackaged I/O benchmarks, so I created a script, fio.sh, to run configurable database I/O benchmarks.

The code and examples are on GitHub at https://github.com/khailey/fio_scripts

The prerequisite is to have a binary of the fio command ready: download the fio source and compile it.

Here is an example of running the script (-b followed by the full path to the fio binary, -w followed by the directory in which to create a temporary large file for I/O testing):

  $  fio.sh -b `pwd`/fio.opensolaris -w /domain0/fiotest   

configuration: 
    binary=/home/oracle/fiodir/fio.opensolaris
    work directory=/domain0/fiotest   
    output directory=/home/oracle/fiodir
    tests=readrand read write
    direct=1
    seconds=60
    megabytes=65536
    custom users=-1
    custom blocksize=-1
    recordsize =8k
    filename (blank if multiple files)="filename=fiodata"
    size per file of multiple files=""
proceed?
y
CREATE 65536 MB file /home/oracle/fiodir/workdir/fiodata

creating 10 MB  seed file of random data

20480+0 records in
20480+0 records out
10485760 bytes (10 MB) copied, 1.17997 seconds, 8.9 MB/s

creating 65536 MB of random data on 

............................................................. 64926 MB remaining  300 MB/s 216 seconds left
............................................................. 64316 MB remaining  600 MB/s 107 seconds left
............................................................. 63706 MB remaining  300 MB/s 212 seconds left
............................................................. 63096 MB remaining  300 MB/s 210 seconds left
............................................................. 62486 MB remaining  200 MB/s 312 seconds left
............................................................. 61876 MB remaining  100 MB/s 618 seconds left
............................................................. 61266 MB remaining  35 MB/s 1750 seconds left
............................................................. 60656 MB remaining  300 MB/s 202 seconds left
............................................................. 60046 MB remaining  150 MB/s 400 seconds left
............................................................. 59436 MB remaining  75 MB/s 792 seconds left
............................................................. 58826 MB remaining  75 MB/s 784 seconds left
............................................................. 58216 MB remaining  85 MB/s 684 seconds left
............................................................. 57606 MB remaining  75 MB/s 768 seconds left
............................................................. 56996 MB remaining  75 MB/s 759 seconds left
............................................................. 56386 MB remaining  85 MB/s 663 seconds left

(more output)

test  users size         MB       ms  IOPS    50us   1ms   4ms  10ms  20ms  50ms   .1s    1s    2s   2s+
    read  1   8K r   28.299    0.271  3622           99     0     0     0
    read  1  32K r   56.731    0.546  1815           97     1     1     0     0           0
    read  1 128K r   78.634    1.585   629           26    68     3     1     0           0
    read  1   1M r   91.763   10.890    91                 14    61    14     8     0     0
    read  8   1M r   50.784  156.160    50                              3    25    31    38     2
    read 16   1M r   52.895  296.290    52                              2    24    23    38    11
    read 32   1M r   55.120  551.610    55                              0    13    20    34    30
    read 64   1M r   58.072 1051.970    58                                    3     6    23    66     0
randread  1   8K r    0.176   44.370    22      0     1     5     2    15    42    20    10
randread  8   8K r    2.763   22.558   353            0     2    27    30    30     6     1
randread 16   8K r    3.284   37.708   420            0     2    23    28    27    11     6
randread 32   8K r    3.393   73.070   434                  1    20    24    25    12    15
randread 64   8K r    3.734  131.950   478                  1    17    16    18    11    33
   write  1   1K w    2.588    0.373  2650           98     1     0     0     0
   write  1   8K w   26.713    0.289  3419           99     0     0     0     0
   write  1 128K w   11.952   10.451    95           52    12    16     7    10     0     0           0
   write  4   1K w    6.684    0.581  6844           90     9     0     0     0     0
   write  4   8K w   15.513    2.003  1985           68    18    10     1     0     0     0
   write  4 128K w   34.005   14.647   272            0    34    13    25    22     3     0
   write 16   1K w    7.939    1.711  8130           45    52     0     0     0     0     0     0
   write 16   8K w   10.235   12.177  1310            5    42    27    15     5     2     0     0
   write 16 128K w   13.212  150.080   105                  0     0     3    10    55    26     0     2

What we see is:

  • test – the test being run: randread, write or read
  • users – number of concurrent users
  • size – size of the I/O requests. Databases typically request 8 KB at a time
  • MB – throughput in MB per second
  • ms – average latency in milliseconds
  • min – minimum latency (not shown here)
  • max – maximum latency (not shown here)
  • std – standard deviation of latency (not shown here)
  • IOPS – I/O operations per second
  • 50us 1ms 4ms 10ms 20ms 50ms .1s 1s 2s 2s+ – latency histogram: the percentage of I/Os that fell into each bucket, where a bucket holds the I/Os faster than its heading value and slower than the previous one
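The MB, IOPS, and size columns are tied together: throughput is just IOPS multiplied by the request size. As a quick sanity check on any row, here is a small sketch. The function name mb_per_sec is made up for this example, and it assumes 1 MB = 1024 KB; the results land close to the table values rather than exactly on them, presumably because the IOPS column is rounded.

```shell
#!/bin/sh
# mb_per_sec IOPS SIZE_KB -> throughput in MB/s, assuming 1 MB = 1024 KB
mb_per_sec() {
    awk -v iops="$1" -v kb="$2" 'BEGIN { printf "%.3f\n", iops * kb / 1024 }'
}

mb_per_sec 3622 8       # read, 1 user, 8K row (table reports 28.299 MB/s)
mb_per_sec 434 8        # randread, 32 users, 8K row (table reports 3.393 MB/s)
```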

 

This can be useful to run even just on your laptop.

This summer I bought a used Mac laptop that had something called a Hybrid SSD. I had been using a Mac with preinstalled SSD disks and thought the Hybrid would be similar response-wise, but once I started using it, something was clearly wrong. Before sending it back I wanted some empirical proof, so I ran fio.sh.

Here is the comparison:

SSD - came with the Mac

test    users size      MB       ms      min      max      std   IOPS
randread    1   8K  32.684    0.234    0.002    9.393    0.144   4183
randread    8   8K 240.703    0.257    0.001    2.516    0.137  30810
randread   16   8K 372.503    0.333    0.001    1.994    0.185  47680
randread   32   8K 478.863    0.520    0.001    5.281    0.294  61294
randread   64   8K 476.948    1.045    0.001   11.564    0.582  61049

SSHD - hybrid SSD, installed aftermarket

test    users size      MB       ms      min      max      std   IOPS
randread    1   8K   0.533   14.608    0.005  138.783    8.989     68
randread    8   8K   0.767   80.769    0.035  256.965   53.891     98
randread   16   8K   0.801  152.982    0.012  331.538   63.256    102
randread   32   8K   0.810  298.122    0.015  519.073   79.781    103
randread   64   8K   0.796  590.696    0.030  808.146  143.490    101

(full list of SSD vs SSHD results on my Macs at https://github.com/khailey/fio_scripts/blob/master/macssd)

The hybrid is atrocious compared to the SSD.

The hybrid’s single-user random read latency is 14.6 ms, which is the speed of a slow HDD.
A 7K RPM HDD should respond in under 10 ms.
A 15K RPM HDD should respond in around 6 ms.
The SSD on a 2-year-old Mac responds in 0.23 ms.
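For a feel of where HDD numbers like these come from: on a random read, the platter has to turn half a revolution on average before the data passes under the head, and the rest of the response time is mostly seek. The sketch below computes only that rotational part. The function name rot_latency_ms is made up, 7200 RPM is assumed as the usual speed behind "7K", and seek time is not modeled at all.

```shell
#!/bin/sh
# rot_latency_ms RPM -> average rotational delay in ms (half a revolution)
rot_latency_ms() {
    awk -v rpm="$1" 'BEGIN { printf "%.3f\n", 60000 / rpm / 2 }'
}

rot_latency_ms 7200     # assumed speed of a "7K" drive
rot_latency_ms 15000
```

Whatever remains between these values and the response times quoted above is seek and transfer time, which varies by drive.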

It’s nice to just have an easy-to-run script to test out storage.
Here is my Linux box:

test  users size         MB       ms      min      max      std    IOPS 
                                            
randread  1   8K r   14.417    0.517    0.005    8.922    0.382    1845
randread  8   8K r   26.497    2.355    0.004   12.668    0.790    3391
randread 16   8K r   24.631    5.069    0.004   15.168    1.080    3152
randread 32   8K r   24.726   10.101    0.005   32.042    2.124    3164
randread 64   8K r   24.899   20.051    0.005   37.782    4.171    3187

On my Linux desktop you can see how the throughput maxes out at about 26 MB/sec with 8 users, and after that latency just goes up proportionally as we add more concurrency.
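That pattern is what Little's law predicts: once the disk is saturated, average latency is roughly the number of users divided by IOPS. Here is a minimal sketch, assuming each user keeps exactly one I/O outstanding at all times (the function name latency_ms is made up for this example). For the 64 user row, 64 users at 3187 IOPS works out to about 20 ms, in line with the 20.051 ms measured.

```shell
#!/bin/sh
# latency_ms USERS IOPS -> expected average latency in ms,
# assuming every user always has exactly one I/O in flight
latency_ms() {
    awk -v users="$1" -v iops="$2" 'BEGIN { printf "%.3f\n", users / iops * 1000 }'
}

latency_ms 8 3391       # measured: 2.355 ms
latency_ms 32 3164      # measured: 10.101 ms
latency_ms 64 3187      # measured: 20.051 ms
```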

The GitHub repository also has R scripts to visualize the data (see the README on GitHub for details on how to generate the graphics).

 

(graph: skytap1_randread_bs_8K)

Here is an explanation of the graphics.

(graph: graph_key)

 

There are a number of factors that are important when benchmarking I/O: whether Direct I/O is used or not, the size of the cache on the host running fio, the cache size of the back end storage, the size of the file used to test I/O, how that file is initialized (with zeros, patterned data, or random data), whether the file system compresses or not, etc. Check out this blog post for some anomalies and surprises: http://datavirtualizer.com/lies-damned-lies-and-io-statistics/
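To see why the way the test file is initialized matters on storage that compresses, here is a small sketch comparing how much 1 MB of zeros and 1 MB of random data shrink. gzip is only a stand-in here for whatever compression a file system or array might apply, and the function name gz_size is made up for this example.

```shell
#!/bin/sh
# gz_size DEVICE -> bytes left after gzip-compressing 1 MB read from DEVICE
# (gzip stands in for file system compression; this is only an illustration)
gz_size() {
    dd if="$1" bs=1024 count=1024 2>/dev/null | gzip -c | wc -c | tr -d ' '
}

echo "zeros:  1048576 -> $(gz_size /dev/zero) bytes"
echo "random: 1048576 -> $(gz_size /dev/urandom) bytes"
```

The zeros collapse to a tiny fraction of their size while the random data does not shrink at all, so a zero-filled test file on compressing storage can end up measuring cache rather than disk.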


  1. Comments

  2. David Johnson
    | #1

    Are you forgetting that the default setting for each instance is to configure the processor affinity mask and I/O affinity mask automatically for all processors on the host server?
    In SQL Server Management Studio you cannot configure both processor affinity and I/O affinity for the same processor.

