Large File Sequential Write Comparison Using FFSB

Test bed:

Kernel: 2.6.21-rc4 with delayed allocation patches from Alex Tomas
Partition: 68 GB (sdc1) on a SCSI Ultra320 disk, 10,000 rpm

Tests are run on a dual-Xeon machine with 2 GB of RAM and hyper-threading enabled (4 logical CPUs).

processor      : 4
vendor_id      : GenuineIntel
cpu family     : 15
model          : 4
model name     : Intel(R) Xeon(TM) CPU 2.80GHz
cpu MHz        : 2793.078
cache size     : 1024 KB
bogomips       : 5586.59

# hdparm -t /dev/sdc1

/dev/sdc1:
 Timing buffered disk reads:  202 MB in  3.01 seconds =  67.10 MB/sec

Synopsis:

FFSB filesystem benchmarking software, version 5.1, is available at
http://sourceforge.net/projects/ffsb/

FFSB is a multi-threaded filesystem performance measurement tool. For our tests, we define three profiles that differ only in the number of threads (4, 16 or 64).
Only one operation (create) is defined, because we want to compare large-file sequential write performance across filesystems.

4-thread profile:
num_filesystems=1
num_threadgroups=1
directio=0
time=300

[filesystem0]
        location=/mnt/test/
        num_files=0
        num_dirs=0
        max_filesize=1073741824
        min_filesize=1073741824
[end0]

[threadgroup0]
        num_threads=4
        write_blocksize=65536
        create_weight=1
[end0]
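
Because the three profiles differ only in num_threads, they can be generated from a single template. The following is a minimal Python sketch of that idea, not the procedure actually used for these tests; the output file names are hypothetical.

```python
import os

# Template mirrors the 4-thread profile above; only num_threads varies.
PROFILE_TEMPLATE = """\
num_filesystems=1
num_threadgroups=1
directio=0
time=300

[filesystem0]
        location=/mnt/test/
        num_files=0
        num_dirs=0
        max_filesize=1073741824
        min_filesize=1073741824
[end0]

[threadgroup0]
        num_threads={num_threads}
        write_blocksize=65536
        create_weight=1
[end0]
"""

# Thread counts used by the three profiles.
THREAD_COUNTS = (4, 16, 64)


def render_profile(num_threads):
    """Return the profile text for the given thread count."""
    return PROFILE_TEMPLATE.format(num_threads=num_threads)


def write_profiles(directory="."):
    """Write one profile per thread count and return the file paths.

    The naming scheme (profile_<n>_threads.ffsb) is hypothetical.
    """
    paths = []
    for n in THREAD_COUNTS:
        path = os.path.join(directory, "profile_%d_threads.ffsb" % n)
        with open(path, "w") as fh:
            fh.write(render_profile(n))
        paths.append(path)
    return paths
```

Keeping everything except the thread count in one template makes it harder for the profiles to drift apart between runs.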

The 4-thread test case creates 24 1-GB files, the 16-thread test case 32 1-GB files, and the 64-thread test case 64 1-GB files.
We collect the throughput scores and the CPU utilization values (User + System) reported in the FFSB output. Each test is run three times.

The partition is formatted with mkfs and mounted with the appropriate options before each run.
The filesystems and mount options are:


     Filesystem   Mount options
1    ext3         data=writeback
2    ext4         data=writeback,extents,delalloc
3    xfs          defaults
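
The format-and-mount step before each run can be sketched as a small command builder. This is only an illustration under several assumptions: the exact mkfs invocation is not given in this report, the filesystem type names registered on the test kernel may differ from the ones used here, and mkfs may need extra flags to overwrite an existing filesystem. The sketch only prints the commands and does not run them.

```python
DEVICE = "/dev/sdc1"
MOUNT_POINT = "/mnt/test/"

# (filesystem type, mount options) pairs taken from the table above.
CONFIGS = [
    ("ext3", "data=writeback"),
    ("ext4", "data=writeback,extents,delalloc"),
    ("xfs", "defaults"),
]


def prepare_commands(fstype, options, device=DEVICE, mount_point=MOUNT_POINT):
    """Return the format and mount commands for one run as argument lists.

    The mkfs arguments are an assumption; only the mount options come
    from the report.
    """
    return [
        ["mkfs", "-t", fstype, device],
        ["mount", "-t", fstype, "-o", options, device, mount_point],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for fstype, options in CONFIGS:
        for cmd in prepare_commands(fstype, options):
            print(" ".join(cmd))
```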


Results

The following table summarizes the results for each filesystem. All numbers represent the average value of three tests.




                          ext3    ext4+extents+delalloc    xfs

4 threads
  Throughput (MB/sec)     32.7    44.7                     45.1
  CPU usage (%)           18.7    11.4                     10.9

16 threads
  Throughput (MB/sec)     34.8    45.1                     38.8
  CPU usage (%)           23.8    11.3                     10.3

64 threads
  Throughput (MB/sec)     32.8    40.5                     33.3
  CPU usage (%)           22.2    12.3                      9.0


The graphs below are obtained using the previous results.

These graphs show a gain in write throughput for ext4 over ext3 (about 23% to 37%, depending on the thread count) and a reduction in CPU usage of roughly 40% to 50%, due to delayed allocation in ext4.
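
The relative differences between ext3 and ext4 can be recomputed from the averages in the results table. The short Python sketch below does only that arithmetic; the function and variable names are illustrative.

```python
# Averages copied from the results table: thread count -> (ext3, ext4).
THROUGHPUT_MB_S = {4: (32.7, 44.7), 16: (34.8, 45.1), 64: (32.8, 40.5)}
CPU_PERCENT = {4: (18.7, 11.4), 16: (23.8, 11.3), 64: (22.2, 12.3)}


def percent_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def summarize():
    """Return (threads, throughput change, CPU usage change) for ext4 vs ext3."""
    rows = []
    for threads in sorted(THROUGHPUT_MB_S):
        ext3_tp, ext4_tp = THROUGHPUT_MB_S[threads]
        ext3_cpu, ext4_cpu = CPU_PERCENT[threads]
        rows.append((
            threads,
            percent_change(ext3_tp, ext4_tp),
            percent_change(ext3_cpu, ext4_cpu),
        ))
    return rows


if __name__ == "__main__":
    for threads, tp, cpu in summarize():
        print("%2d threads: throughput %+.1f%%, CPU usage %+.1f%%" % (threads, tp, cpu))
```

A positive throughput change and a negative CPU usage change both favor ext4.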