Orion I/O calibration tool bug

July 14th, 2014

I use fio for all my I/O testing. Why not Orion from Oracle, since almost all of my I/O testing and benchmarking has been geared toward Oracle? There are several reasons.

fio is

  • super flexible – it can be configured for almost any type of test
  • active community – updates almost every week, many by Jens Axboe (who wrote much of the Linux I/O layer)
  • reliable – if there are problems, it’s open source and one can discuss them on the fio community email list
  • easy to distribute – just one executable; it doesn’t require getting, for example, a full Oracle distribution

Orion, on the other hand, has unfortunately had some problems that make it too undependable for me to trust.

In some cases Orion re-reads the same blocks, covering a much smaller data set than requested. The following strange behavior was seen with Orion on x86 Solaris. The Orion binary was from an 11g distribution. The root of the strange behavior is that Orion seems to revisit the same blocks over and over when doing its random read testing.

A DTrace script (io.d) was used to trace which blocks Orion was reading. The blocks in the test were on /domain.

    #!/usr/sbin/dtrace -s
    #pragma D option quiet
    /* Print the file offset of every ZFS read on a file whose path contains /domain. */
    ::zfs_read:entry
    / strstr((args[0])->v_path, "/domain") != NULL /
    {  printf("%lld\n", args[1]->_uio_offset._f); }

Steps:
  1. Created a 96GB file and put its path in /domain/mytest.lun
  2. Modified io.d to filter for /domain
  3. Ensured no non-Orion I/O was going to the filesystem
  4. Started running io.d > blocks-touched.txt
  5. Kicked off Orion with:
    export LD_LIBRARY_PATH=.  
    ./orion -testname mytest -run advanced -matrix row -num_disks 5 -cache_size 51200 \
               -duration 60 -simulate raid0 -write 0 -num_large 0

-run advanced : users can specify customizations
-matrix row : only small random I/O
-num_disks 5 : actual number of physical disks in test. Used to generate a range of loads
-cache_size 51200 : defines a warmup period
-duration 60 : duration of each point
-simulate raid0 : simulate striping across all the LUNs specified. There is only one LUN in this test
-write 0 : percentage of I/O that is write, which is zero in this test
-num_large 0 : maximum outstanding I/Os for large Random I/O. There is no large random I/O in this test.

Once the test finished, I stopped the DTrace script io.d.

Example output from a run
   ORION VERSION 11.2.0.1.0
   Command line:
   -testname mytest -run advanced -matrix row -num_disks 5 -cache_size 51200 -duration 60 
   -simulate raid0 -write 0 -num_large 0 

   These options enable these settings:
   Test: mytest
   Small IO size: 8 KB
   Large IO size: 1024 KB
   IO types: small random IOs, large random IOs
   Sequential stream pattern: RAID-0 striping for all streams
   Writes: 0%
   Cache size: 51200 MB
   Duration for each data point: 60 seconds
   Small Columns:,      0,      1,      2,      3,      4,      5,      6,      7,      8,      9,     10,     11,     12, 
                       13,     14,     15,     16,     17,     18,     19,     20,     21,     22,     23,     24,     25
   Large Columns:,      0
   Total Data Points: 26

   Name: /domain0/group0/external/lun96g	Size: 103079215104
   1 files found.

   Maximum Small IOPS=62700 @ Small=16 and Large=0
   Minimum Small Latency=81.81 usecs @ Small=2 and Large=0
Things look wrong right away.
The average latency is in the 100s of microseconds (the fastest one-minute data point above averaged 81us) over a 96G file, which is twice as big as the 48G cache.
The max throughput was 489MB/s.
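
As a quick sanity check, the throughput figure follows from the reported IOPS and the small I/O size. This is only a sketch of the arithmetic, assuming every small I/O is exactly 8 KB and that "MB" means 1024 × 1024 bytes; the variable names are made up for illustration.

```python
# Figures taken from the Orion output above.
small_iops = 62_700            # "Maximum Small IOPS"
io_size_bytes = 8 * 1024       # "Small IO size: 8 KB"

# Assumption: 1 MB = 1024 * 1024 bytes.
throughput_mb_s = small_iops * io_size_bytes / (1024 * 1024)
print(f"{throughput_mb_s:.1f} MB/s")
```
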
Total blocks read
    # wc -l blocks-touched.txt
    78954834 blocks-touched.txt
Unique blocks read
    # sort blocks-touched.txt | uniq -c | sort -rn > block-count.txt
    # wc -l block-count.txt
    98305 block-count.txt
We only hit 98,305 unique offsets in the file, yet a 96GB file has 12,582,912 unique 8k offsets.
The unique blocks hit total around 768 MB of data, which is easily cached.
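
The same counts can be produced in one pass without sorting the whole trace. The following is a minimal sketch, assuming the trace file holds one decimal byte offset per line (as printed by the DTrace script) and that every read is 8 KB; `summarize_offsets` and the sample data are hypothetical, not part of the original test.

```python
from collections import Counter

BLOCK_SIZE = 8 * 1024  # assumed read size: Orion's small I/O size in this test


def summarize_offsets(lines, block_size=BLOCK_SIZE):
    """Count accesses per offset from an iterable of decimal offset strings."""
    hits = Counter(int(line) for line in lines if line.strip())
    total = sum(hits.values())
    unique = len(hits)
    return {
        "total_reads": total,
        "unique_offsets": unique,
        "unique_mb": unique * block_size / (1024 * 1024),
        "min_hits": min(hits.values()) if hits else 0,
        "mean_hits": total / unique if unique else 0.0,
    }


# Tiny made-up sample, not data from the real trace.
sample = ["0", "8192", "0", "16384", "8192", "0"]
summary = summarize_offsets(sample)
print(summary)
```

To run it against the real trace, pass an open file instead of the sample, for example `summarize_offsets(open("blocks-touched.txt"))`. The counter only holds one entry per distinct offset, so memory use is driven by the number of unique blocks rather than the number of reads.
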
Block access frequency
    # tail block-count.txt
    695 109297664
    694 34532360192
    693 76259328
    693 34558271488
The least frequently hit blocks were hit almost 700 times and the average was over 800, yet there were 78,954,834 block accesses in a file of 12,582,912 unique blocks, so the average should have been about 6 hits per block.
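
To put numbers on how far this is from random, here is a rough sketch of what uniform random reads would look like. It assumes each read picks one of the file's 8 KB-aligned blocks independently and with equal probability, which is an idealized model and not a description of how Orion actually chooses offsets.

```python
import math

file_bytes = 103_079_215_104   # LUN size reported by Orion
block_size = 8 * 1024          # small I/O size
total_reads = 78_954_834       # wc -l blocks-touched.txt
observed_unique = 98_305       # wc -l block-count.txt

n_blocks = file_bytes // block_size
expected_hits = total_reads / n_blocks

# Expected distinct blocks after k uniform draws with replacement from N blocks
# is N * (1 - (1 - 1/N)**k), which is approximately N * (1 - exp(-k/N)).
expected_unique = n_blocks * (1 - math.exp(-total_reads / n_blocks))
observed_hits = total_reads / observed_unique

print(f"blocks in file:         {n_blocks:,}")
print(f"expected hits/block:    {expected_hits:.2f}")
print(f"expected unique blocks: {expected_unique:,.0f}")
print(f"observed hits/block:    {observed_hits:.1f}")
```

Under that model nearly every block in the file would be touched at least once, rather than fewer than 1% of them.
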

This may be caused by having multiple streams starting from the beginning of the file, or at the same “random” offset, every test duration of 60 seconds. I’m not sure. If this is the case, the only workaround would be to increase the duration to a length of time that would ensure most of the blocks from the beginning of the test get kicked out of the cache. If each thread starts out at the same location and reads the same set of “random” blocks, then there is no workaround. Ideally I’d want each stream to start from a different random location and read a different set of random blocks.
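
The second scenario can be illustrated with a toy simulation. This is a sketch of the hypothesis only, not Orion's actual implementation: the `simulate` function, the fixed seed, and the small block and read counts are all made up for illustration.

```python
import random


def simulate(n_blocks, reads_per_point, data_points, reseed_each_point):
    """Return the number of distinct blocks touched across all data points."""
    touched = set()
    rng = random.Random(42)
    for _ in range(data_points):
        if reseed_each_point:
            # Hypothetical bug: every data point replays the same "random" sequence.
            rng = random.Random(42)
        for _ in range(reads_per_point):
            touched.add(rng.randrange(n_blocks))
    return len(touched)


# 26 data points, as in the Orion run; the other sizes are scaled down arbitrarily.
same_seed = simulate(100_000, 1_000, 26, reseed_each_point=True)
fresh = simulate(100_000, 1_000, 26, reseed_each_point=False)
print(same_seed, fresh)
```

When every data point replays the same sequence, the set of touched blocks never grows beyond what a single data point reads, no matter how many data points are run. That is the kind of small, cacheable working set seen in the trace.
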

 



  1. Comments

  2. | #1

    > If the datafile was on “/tmp” it worked fine but if it was on my NFS mount it failed with the above error. Hmm – doesn’t work over NFS?

    Hi Kyle,

    Orion is sensitive to NFS mount options. What options were in place when you got the error?

    Also, have you ever explored the randomness of SLOB using dtrace? I was surprised to read that you use fio for “all” your I/O testing. A database is also a good tool to test for database I/O suitability :) SLOB!

  3. khailey
    | #2

    Hi Kevin

    Thanks for stopping by. Oh yes, you are quite right. The other half of our I/O benchmarking equation is SLOB!
    As far as exploring the randomness of SLOB with DTrace, no haven’t yet. Will put that down on the interesting things to do list.

    As far as NFS testing with Orion it’s just water under the bridge. I’ve forgotten the NFS mount parameters we used, though they would be along the lines of what Oracle recommends. At this point I have no plans to go back and test out Orion any more.

