Stress Testing and Benchmarking


  1. Stress Testing and Benchmarking
    1. Software Used
      1. Linux
      2. Windows
    2. Running the Tests on Linux
      1. Prerequisites
      2. xdd
      3. iozone
      4. bonnie++
      5. tar/untar/md5sum
      6. Putting it all together

This document describes how to test storage devices, verify (in a very limited way) that they work, and benchmark them. The main target is the iSCSI Enterprise Target, and I also use it for sbp2 devices, but it can be used with any storage device.
For the impatient: go directly here and see the results here.

Software Used

Linux


  1. xdd (http://www.ioperformance.com or  http://209.98.16.6/)
    An excerpt from the documentation:
    Xdd is a tool for measuring and characterizing disk subsystem I/O on single systems and clusters of systems. It is a command-line based tool that grew out of the UNIX world and has been ported to run in Windows environments as well. It is designed to provide consistent and reproducible performance measurements of disk I/O traffic. There are three basic components to xdd that include the xdd program itself, a timeserver program, and a gettime program. The timeserver and gettime programs are used to synchronize the clocks of xdd programs simultaneously running across multiple computer systems.
  2. iozone (http://www.iozone.org/)
    An excerpt from the web page:
    IOzone is a filesystem benchmark tool. The benchmark generates and measures a variety of file operations.
  3. bonnie++ (http://www.coker.com.au/bonnie++/)
    From the web page:
    Bonnie++ is based on the Bonnie hard drive benchmark by Tim Bray. The most notable features that have been added are support for >2G of storage and testing operations involving thousands of files in a directory. This program is used by ReiserFS developers, but can be useful for anyone who wants to know how fast their hard drive or file system is.
  4. tar/untar test
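    This is a small shell script (see below) which repeatedly untars an archive, checksums the extracted files and re-creates the archive, to verify that the data stays unchanged.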

Windows

For Windows initiators, a good test seems to be the HCT "Designed for Windows" testing, which covers much more than iSCSI but includes it.
My Windows skills are rather poor, so feedback is welcome.

Running the Tests on Linux

Prerequisites

Compile all programs and make sure the resulting binaries are in your PATH. On Gentoo, everything compiles without a hitch.

Make sure the filesystem/disk under test is larger than your main memory. Otherwise you will mainly be testing your disk caches.
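One way to check this before starting is sketched below. It is a minimal bash sketch, assuming a Linux system (MemTotal from /proc/meminfo) and POSIX "df -P" output; the function names mem_kb, fs_kb, larger_than_memory and check_size are made up for this example.

```shell
#!/bin/bash
# Sketch: check that the filesystem under test is larger than main memory.
# Assumes Linux (/proc/meminfo) and POSIX "df -P" output; all function
# names here are hypothetical, not part of any tool in this document.

# Main memory in kB, from the MemTotal line of /proc/meminfo.
mem_kb() {
  awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Total size in kB of the filesystem containing path $1.
fs_kb() {
  df -Pk "$1" | awk 'NR == 2 {print $2}'
}

# Succeeds if size $1 (kB) is strictly larger than memory size $2 (kB).
larger_than_memory() {
  [ "$1" -gt "$2" ]
}

# Prints a verdict for the mount point $1; returns 1 if it is too small
# or its size could not be determined.
check_size() {
  local fs mem
  fs=$(fs_kb "$1")
  [ -n "$fs" ] || return 1
  mem=$(mem_kb)
  if larger_than_memory "$fs" "$mem" ; then
    echo "OK: $1 has ${fs} kB, main memory is ${mem} kB"
  else
    echo "WARNING: $1 (${fs} kB) is not larger than main memory (${mem} kB)" >&2
    return 1
  fi
}
```

Usage: check_size /export (the example mount point used in this document).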

In the following examples, the device under test is /dev/sda5, mounted on /export with any supported filesystem. Replace these with your own configuration.

xdd

xdd can operate either on the raw disk or on a file.
For the raw disk:

xdd -op read -targets 1 /dev/sda5 -rwratio 100 -queuedepth 1 -blocksize 1024 -reqsize 128 -mbytes 2048 -passes 2 -verbose

To test with a filesystem, first create the filesystem (this destroys all data on /dev/sda5) and populate it with a test file:

mke2fs -j /dev/sda5

mount -t ext3 /dev/sda5 /export
dd if=/dev/zero of=/export/test.2G count=2048 bs=1M
xdd -op read -targets 1 /export/test.2G -rwratio 100 -queuedepth 1 -blocksize 1024 -reqsize 128 -mbytes 2048 -passes 2 -verbose
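The -reqsize option counts blocks of -blocksize bytes, so both commands above issue 128kB requests (1024 x 128 = 131072 bytes), as does the sample run with -blocksize 512 -reqsize 256. The small bash helpers below do that arithmetic; their names are invented for this sketch, and they assume, as the byte counts in the sample output suggest (-mbytes 2048 gives 2147483648 bytes per pass), that -mbytes counts units of 1048576 bytes.

```shell
#!/bin/bash
# Hypothetical helpers for choosing xdd parameters (not part of xdd).

# Request size in bytes: -blocksize ($1) times -reqsize ($2).
req_bytes() {
  echo $(( $1 * $2 ))
}

# -reqsize value needed for a request of $2 kB with -blocksize $1.
reqsize_for() {
  echo $(( $2 * 1024 / $1 ))
}

# Requests per pass for -blocksize $1, -reqsize $2, -mbytes $3,
# assuming -mbytes counts units of 1048576 bytes.
ops_per_pass() {
  echo $(( $3 * 1048576 / ($1 * $2) ))
}
```

For example, ops_per_pass 512 256 2048 prints 16384, the Ops value reported per pass in the sample run.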

Here is a sample run on a raw disk (2GB size, request size 128kB, sequential read, on /dev/sda5, 2 passes):

burner bin # ./xdd.linux -op read -targets 1 /dev/sda5 -rwratio 100 -queuedepth 1 -blocksize 512 -reqsize 256 -mbytes 2048 -passes 2 -verbose
IOIOIOIOIOIOIOIOIOIOI XDD version 6.2c IOIOIOIOIOIOIOIOIOIOIOI
xdd - Version,  6.2c,   I/O     Performance     Inc.
Starting time for this  run, Sun Feb  6 22:39:03 2005

ID      for     this run, 'No   ID Specified'
Maximum Process Priority, disabled
Passes, 2
Pass Delay in   seconds, 0
Maximum Error   Threshold, 0
I/O Synchronization, 0
Target Offset, 0
Total   run-time limit in seconds, 0
Output file name,       stdout
CSV output file name,
Error   output file     name, stderr
Pass seek randomization, disabled
File write      synchronization, disabled
Pass synchronization barriers,  enabled
Number  of Targets,     1
Number  of I/O Threads, 1
Target  information,
        Target[0] Q[0], /dev/sda5
                Target  directory, "./"
                Process ID, 8935
                Processor,      -1
                Read/write      ratio, 100.00,  0.00
                Throttle in MB/sec,   0.00
                Per-pass time limit in  seconds, 0
                Blocksize       in bytes, 512
                Request size, 256, blocks, 131072, bytes
                Start   offset, 0
                Number of       MegaBytes, 2048
                Pass Offset in blocks, 0
                I/O memory      buffer is a normal memory buffer
                I/O memory      buffer alignment in     bytes, 4096
                Data pattern in buffer, 0x0
                Data buffer verification is , disabled.
                Direct  I/O, disabled
                Seek pattern,   sequential
                Seek range, 1048576
                Preallocation, 0
                Queue   Depth, 1
                Timestamping, disabled
                Delete  file, disabled

Seconds before  starting, 0
                     T  Q       Bytes        Ops      Time      Rate       IOPS   Latency     %CPU
TARGET   PASS0001    0  1    2147483648    16384    38.062    56.420     430.45    0.0023    14.92
TARGET   PASS0002    0  1    2147483648    16384    38.771    55.389     422.58    0.0024    15.79
TARGET   Average     0  1    4294967296    32768    76.833    55.900     426.48    0.0023    15.36
         Combined    1  1    4294967296    32768    76.833    55.900     426.48    0.0023    15.36
Ending time     for     this run, Sun Feb  6 22:40:20 2005

Pass 1 and pass 2 should not show significant differences. If pass 2 is much faster and the CPU is much busier, disk buffers are probably at work. In that case, increase the data set size to at least twice the main memory.
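To compare the two passes without reading the table by eye, the Rate column can be pulled out of the TARGET lines with awk. This is a sketch that assumes the column layout of the sample output above (Rate is the 8th field) and a run with two passes; the function name pass_diff is hypothetical.

```shell
#!/bin/bash
# Sketch: compare the Rate of pass 1 and pass 2 in xdd output read from
# stdin. Assumes the table layout of the sample run, where Rate is the
# 8th field of the "TARGET PASS000n" lines. pass_diff is hypothetical.
# Prints the change of pass 2 relative to pass 1 in percent.
pass_diff() {
  awk '$1 == "TARGET" && $2 == "PASS0001" { r1 = $8 }
       $1 == "TARGET" && $2 == "PASS0002" { r2 = $8 }
       END {
         if (r1 == "" || r2 == "") exit 1   # one of the passes is missing
         printf "%.1f\n", (r2 - r1) / r1 * 100
       }'
}
```

Usage: pipe the xdd output into it, e.g. xdd.linux ... -passes 2 -verbose | pass_diff. For the sample run above it prints -1.8, i.e. pass 2 was 1.8% slower than pass 1.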

iozone

iozone is well known for the nice graphs it can create in conjunction with gnuplot. Even without the graphs, iozone runs a series of tests with varying request and data set sizes. With the default settings the tests run for quite a long time, so it is a good idea to limit them to a few samples. For nice graphs, though, it is best to run the full test suite.
iozone only works on filesystems, not on raw disks. By default, it tests in the current working directory.
To run one test:

iozone -i0 -c -e -S256 -s2g

This runs the write/rewrite test (-i0), including close and flush time in the timing (-c and -e), with the CPU cache size set to 256kB (-S256) and a file size of 2GB (-s2g). To run a more thorough test, use the automatic mode (-a) instead of selecting a specific test (-i<NUMBER>).

Sample runs:
A single write/rewrite test:

burner export # iozone -i0 -c -e -S256 -n16 -s2g
        Iozone: Performance Test of File I/O
                Version $Revision: 3.226 $
                Compiled for 32 bit mode.
                Build: linux

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million,
                     Jean-Marc Zucconi, Jeff Blomberg,
                     Erik Habbinga, Kris Strecker.

        Run began: Sun Feb  6 23:16:08 2005

        Include close in write timing
        Include fsync in write timing
        Using minimum file size of 16 kilobytes.
        File size set to 2097152 KB
        Command line used: iozone -i0 -c -e -S256 -n16 -s2g
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 256 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                            random  random    bkwd  record  stride
              KB  reclen   write rewrite    read    reread    read   write    read rewrite     read  fwrite frewrite   fread  freread
         2097152       4   51299   45462

iozone test complete.
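iozone reports throughput in KB/s, while xdd's Rate column appears to be in MB/s with 1 MB = 10^6 bytes (2147483648 bytes in 38.062 s is about 56.4 x 10^6 bytes/s, matching the 56.420 shown). A small sketch to convert iozone numbers for comparison, assuming iozone's KB is 1024 bytes, as its file size column (2097152 KB for -s2g) suggests; kbs_to_mbs is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: convert an iozone result in KB/s (assumed 1 KB = 1024 bytes)
# to MB/s with 1 MB = 10^6 bytes, the unit xdd's Rate column appears to use.
kbs_to_mbs() {   # $1 = throughput in KB/s as printed by iozone
  awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb * 1024 / 1000000 }'
}
```

For the sample above, kbs_to_mbs 51299 prints 52.53 (write) and kbs_to_mbs 45462 prints 46.55 (rewrite).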

An automatic test (takes a long time):
iozone -a -c -e -s2g -S256

Throughput test:
iozone -t -T -u1024

This runs in throughput mode (-t) with up to 1024 processes (-u1024), using POSIX threads (-T). The file size per process is 512kB.

bonnie++


bonnie++ has fewer features and is suitable for getting a rough idea of what to expect performance-wise. Basic usage is simple:

bonnie++ -s 2048

This runs bonnie++ in the current working directory with 2GB (2048MB) of test data. A more complex example is:

mkdir -m 0777 /export/test ; bonnie++ -s 2048 -d /export/test -m `hostname` -n 100


This first creates /export/test, then runs bonnie++ in it (-d) with 2GB of data, labels the report with the host name (-m), and includes the file creation test with 100*1024 files of size 0 (-n 100).
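To follow the "at least 2x main memory" rule from the xdd section, the -s value can be derived from the memory size. A minimal sketch; bonnie_size_mb is a made-up helper name, not a bonnie++ option.

```shell
#!/bin/bash
# Sketch: bonnie++ -s value (in MB) equal to twice the main memory.
# $1 = main memory in kB, e.g. the MemTotal value from /proc/meminfo.
# bonnie_size_mb is a hypothetical helper, not part of bonnie++.
bonnie_size_mb() {
  echo $(( 2 * $1 / 1024 ))
}
```

Usage, assuming a Linux /proc/meminfo:
mem=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
bonnie++ -s $(bonnie_size_mb $mem) -d /export/test -m `hostname` -n 100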

tar/untar/md5sum

This simple script generates some I/O and verifies that the data stays unchanged. It untars an archive, calculates the md5sum of all files and then the md5sum of those md5sums, giving a single checksum, and then re-creates the archive. It does this n times concurrently and m times sequentially. If it prints "All archives identical." at the end, everything is as expected; otherwise, data was corrupted.

#!/bin/bash
# Usage: tartest.sh n m dir file.tar
# n=number of concurrent processes
# m=number of sequential loops per process
# dir=where to test
# file.tar=a sample tar file (larger is better)

RESULT=/tmp/tartest.$$.out

function oneround { # $1=dir $2=tar file $3=loops
 for k in $(seq 1 "$3") ; do
  mkdir -p "$1"
  echo "k=$k"
  # unpack, then delete the archive so it must be re-created from disk
  ( cd "$1" && tar xf "$2" )
  rm -f "$2"
  # md5sum of every file (in a fixed order), then the md5sum of those sums
  find "$1" -type f -print0 | LC_ALL=C sort -z | xargs -0 md5sum | awk '{print $1}' | md5sum
  # re-create the archive from the unpacked files, then clean up
  ( cd "$1" && tar cf "$2" . )
  rm -rf "$1"
 done
}

if [ $# -ne 4 ] ; then
 echo "Usage: $0 n m dir file.tar"
 echo "n is number of concurrent I/O processes"
 echo "m is number of sequential loops per process"
 echo "dir is the directory where to test"
 echo "file.tar is any tar file (the larger the better)"
 exit 1
fi

# oneround changes into a subdirectory, so make the test directory absolute
DIR=$(cd "$3" && pwd) || exit 1

rm -f "$RESULT"
for j in $(seq 1 "$1") ; do
  ( echo "j=$j" >>"$RESULT" ; cp -p "$4" "$DIR/$j.tar" ; oneround "$DIR/$j" "$DIR/$j.tar" "$2" >>"$RESULT" ; rm -f "$DIR/$j.tar" ) &
done

wait

lines=$(awk '{if ($2 == "-") print $1}' "$RESULT" | sort -u | wc -l)
if [ "$lines" -ne 1 ] ; then
 echo "Archives differ. Got $lines different archives. Should be 1!"
 echo "See $RESULT for details"
 exit 10
else
 echo "All archives identical."
 exit 0
fi


A sample run with 2 concurrent processes each doing 10 loops:

harald@burner$ ./tartest.sh 2 10 /export/test test.tar

All archives identical.

Short and simple.

Putting it all together

Instead of calling each program manually and noting the results, you can use a wrapper script that runs them all automatically. The results can optionally be mailed to me (mailing is the default), so that I can collect them and publish them on my web site. Results can be found here.
Download this archive. It contains all the software needed (except the sources of xdd, iozone and bonnie++). Then run

tar xvf iotest.tar
cd iotest
make
./bench.sh MOUNTPOINT - "Description of the I/O system"


Sample:

harald@burner$ ./bench.sh /export/test - "Seagate 15k3 18GB ext3"
Result is in /tmp/bench.10506.out
Will mail the results (if you do not want to send out a mail, edit this script and set MAILRESULTS=0)
Result sent to benchresults@iscsi.studiokubota.com

The output file contains the results of all tests which were run including some data like cpuinfo and meminfo.

A remark about the run time: with a Seagate 15k3 18GB disk, for which bonnie++ reports about 50MB/s, and 256MB of main memory, the full test takes about 2h. Doubling the memory increases the time by about 50%; halving the disk throughput doubles it. Thus a machine with 1GB RAM and 10MB/s throughput will take approx. 20h. To limit the run time, reduce the memory available to Linux: boot with "mem=256m" to limit memory to 256MB.
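The rule of thumb above amounts to T = 2h x 1.5^log2(RAM / 256MB) x (50MB/s / throughput). A small awk-based sketch of that estimate; it only restates the author's rough figures, and bench_hours is a hypothetical name.

```shell
#!/bin/bash
# Sketch of the rule of thumb for the bench.sh run time:
# 2 h at 256 MB RAM and 50 MB/s disk throughput, about x1.5 for every
# doubling of RAM and x2 for every halving of throughput.
# bench_hours is a hypothetical helper name.
bench_hours() {   # $1 = RAM in MB, $2 = disk throughput in MB/s
  awk -v mem="$1" -v tp="$2" 'BEGIN {
    doublings = log(mem / 256) / log(2)   # how often RAM doubled vs. 256 MB
    printf "%.1f\n", 2 * 1.5 ^ doublings * 50 / tp
  }'
}
```

bench_hours 1024 10 prints 22.5, in line with the approx. 20h quoted above.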