
2.2 Benchmarking a file system

2.2.4 Benchmarking tool: IOzone

IOzone is an open-source benchmarking tool originally proposed by William D. Norcott (of Oracle) and further developed by Don Capps and Tom McNeal (of Hewlett-Packard). IOzone is designed exclusively for file system performance tests; it does not work on a raw disk that has no file system [5].

Much of IOzone's popularity comes from the attractive 3D graphics it can produce in conjunction with gnuplot. Gnuplot is a portable command-line graphing utility for Linux that can generate 3D graphs and plots of functions in addition to ordinary two-dimensional ones [5, 3]. Beyond that, IOzone offers many interesting options, which are discussed below.
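As a purely illustrative sketch (the data file name and its three-column layout of file size, record size, and throughput are assumptions about how one might export IOzone results, not an IOzone-defined format), such a 3D surface could be plotted from the shell with:

$ gnuplot -e "set dgrid3d 30,30; set hidden3d; splot 'iozone.dat' using 1:2:3 with lines; pause -1"

The dgrid3d setting grids the scattered measurements so that splot can draw a continuous 3D surface.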

i Why IOzone?

There are, of course, other benchmarking tools that can do similar work; file system benchmarking and some of these tools are discussed in section 2.2. In general, IOzone offers the following features and advantages [29, 19, 5] compared to the other available benchmarking tools, which is why it was chosen to benchmark the Ceph distributed file system.

• It works for all types of file systems (local, network, and distributed file systems).

• It is easy to use and runs on many platforms (operating systems), including Linux and Windows.

• It assumes its execution is bottlenecked by the storage devices, so that CPU speed and RAM size have little effect on the results.

• It supports very large file sizes.

• It supports multi-process measurement.

• It supports both single-stream and multi-stream measurement.

• It supports POSIX asynchronous I/O.

• It supports POSIX threads (Pthreads).

• It can produce I/O latency plots.

• Its processor cache size is configurable.

• It produces Excel-importable output for graph generation.

• Compared to bonnie++, IOzone has more features and generates more detailed output than just the common read and write speeds. It measures many file system operations (file I/O performance), such as: read, write, re-read, re-write, read backwards, read strided, fread, fwrite, random read/write, pread/pwrite, aio_read/write, and mmap [56, 66].

ii IOzone how-to:

One can download the latest IOzone source code from the IOzone website, http://www.iozone.org. Using the 'wget' command, one can, for example, fetch the latest source release at the time of writing (18-Mar-2011): http://www.iozone.org/src/current/iozone3_373.tar

After downloading the IOzone source code, extract it, go to its source folder, and type 'make <target>' on the command line (for example, 'make linux') to compile it for your platform, as shown below. Then, enjoy using IOzone.
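As an illustration, a build from the 3.373 source tarball might look like the following (the extracted directory layout and the 'linux' build target are assumptions that match common IOzone releases; pick the target that fits your platform):

$ wget http://www.iozone.org/src/current/iozone3_373.tar
$ tar -xf iozone3_373.tar
$ cd iozone3_373/src/current
$ make linux
$ ./iozone -a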

It is also possible to install IOzone on the Linux command line by typing:

$ apt-get install iozone3

Since file system benchmarking results are highly influenced by the size of the system's buffer cache, one needs to know the following rules before running IOzone [5]:

Rule 1: For accuracy, the maximum size of the file to be tested should be bigger than the buffer cache. If the buffer cache is dynamic, or its size is hard to determine, make the maximum file size bigger than the total physical memory of the platform [5].

Rule 2: Unless the maximum file size is set much smaller than the buffer cache, you should see at least the following three plateaus:

- File size fits in processor cache.

- File size fits in buffer cache.

- File size is bigger than buffer cache.

Rule 3: Use the -g option to set the maximum file size. Refer to the IOzone manual page (man iozone) for more information.
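As an illustration of these rules, on a machine with 2 GB of RAM (an assumed figure), the maximum file size of an automatic run could be pushed past physical memory like this:

$ iozone -a -g 4g

With the maximum file size at 4 GB, the run covers file sizes that fit in the processor cache, fit in the buffer cache, and exceed the buffer cache, so all three plateaus should appear.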

IOzone Command Line Options:

For a simple start, use the automatic mode:

$ iozone -a

-a Run in automatic mode; it generates output that covers all tested file operations for record sizes of 4 KB to 16 MB and file sizes of 64 KB to 512 MB.

See below for other important options [56]:

-b filename

IOzone will create a binary-format file containing Excel-compatible output of the results.

-c

Include close() in the timing calculations.

-C

Show bytes transferred by each child in throughput testing.

-d #

Microsecond delay out of barrier.

-e

Include flush in the timing calculations.

-f filename

Used to specify the filename for the temporary file under test.

-g #

Set maximum file size (in Kbytes) for auto mode.

-h

Displays help.

-i #

Used to specify which tests to run (0=write/re-write, 1=read/re-read, 2=random-read/write, 3=read-backwards, 4=re-write-record, 5=stride-read, 6=fwrite/re-fwrite, 7=fread/re-fread, 8=random mix, 9=pwrite/re-pwrite, 10=pread/re-pread, 11=pwritev/re-pwritev, 12=preadv/re-preadv).

One will always need to specify 0 so that any of the following tests will have a file to measure.

-i # -i # -i # is also supported so that one may select more than one test.

-s Sets the file size in KB for the test. It also accepts MB and GB, which need to be specified explicitly (-s #m for MB, -s #g for GB).

-R Generate Excel report.

For more information, read the IOzone manual page.
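Putting several of these options together, a run such as the following (the test file location, file size, and output filename are only illustrative) measures write/re-write and read/re-read with a 256 KB record size on a 2 GB file, includes close() and flush in the timings, and saves an Excel-compatible report:

$ iozone -i 0 -i 1 -e -c -r 256 -s 2g -f /mnt/ceph/iozone.tmp -R -b iozone-results.xls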

Another very useful feature of IOzone is that, for every run, its output includes a summary of the conditions of that particular run, including the command line used. A sample of these run conditions from an IOzone test output is shown below.

Sample IOzone output (run conditions only)

Run began: Sat Apr 30 17:18:32 2011

Record Size 256 KB
File size set to 83886080 KB
Command line used: iozone -i 0 -i 1 -r 256 -s 80G
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.

See Appendix B for more IOzone sample runs and outputs.

Chapter 3

Ceph and its architecture

As described in the two chapters above, Ceph is one of the most recent and most promising parallel, object-based, distributed file systems, especially in terms of its architectural benefits. This chapter discusses its general architecture and the architectural advantages that help Ceph to be extremely scalable and reliable, with excellent performance, as reported in [12, 24, 11, 1]. Its installation methods and a Ceph 'how-to' are discussed in the next chapter (Chapter 4: Methodology). One of the main goals of Ceph is to be POSIX-compatible in addition to being free software (open source, distributed under the GNU Lesser General Public License) [1].

3.1 Ceph General Architecture

Ceph's general architecture consists of Metadata Servers (MDSs), Object-based Storage Devices (OSDs), Monitors (MONs), and, of course, clients. A minimum of one of each type is required for Ceph to start, and the architecture then allows you to scale the number of MONs, MDSs, and OSDs up to thousands, according to the storage capacity you need [1]. The hardware requirements for each type of server in the Ceph cluster are given in the methodology chapter (Chapter 4).
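As a purely illustrative sketch of this minimal layout (the host names, IP address, and data paths below are invented; the actual configuration used in this work is described in Chapter 4), a cluster with one MON, one MDS, and one OSD is declared in a ceph.conf along these lines:

[mon.a]
    host = node1
    mon addr = 192.168.1.10:6789
    mon data = /data/mon.a

[mds.a]
    host = node1

[osd.0]
    host = node2
    osd data = /data/osd.0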

Figure 3.1: Ceph General Architecture

Figure 3.1 shows the general architecture of Ceph. The Ceph cluster's representative is the MON node(s). A typical example of the interaction between a client and the Ceph file system, after the client node has mounted the file system via a MON IP address (a mount sketch is given after the list), is described below:

- Client ---> MDS (open request)

- MDS ---> Client (provides file inode, file size, stripe info, ...)

- Client <---> OSDs (direct read/write from client to OSDs)

- Client ---> MDS (close request)

- MDS ---> (saves the changes)
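As an illustration only (the monitor IP address and mount point are assumptions, and mounting is covered properly in Chapter 4), the kernel client mounts the file system against a MON address such as:

$ mount -t ceph 192.168.1.10:6789:/ /mnt/ceph

Here 192.168.1.10 is the IP address of a MON node and 6789 is the default monitor port.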

The MON nodes are not mentioned in the above interaction between the client and the Ceph file system. Their main job is, in fact, to monitor the whole process; the role of the MONs in the Ceph cluster is discussed briefly in the next section (section 3.2).