Utility Computing Benchmark Results

To illustrate the capabilities that utility computing services based on our AppLogic grid operating system offer users, we ran a series of tests in conjunction with our partner Layered Technologies. We chose an industry-standard open benchmark, UnixBench WHT, so that anyone familiar with the hosting industry could reference the results. On July 24, 2007 we announced the results of these tests on the world's most powerful Virtual Private Datacenter, featuring 443 CPUs and over 100 servers. The procedures, configurations and results follow.


1. Summary

An overall performance benchmark of 42,540 was achieved with the UnixBench WHT utility while consuming 443 CPUs. Based on the single CPU server result for the same benchmark of approximately 100, this implies that users can harness the equivalent of 420 single CPU dedicated servers through the utility computing service. This in no way represents the maximum performance of the system. During the benchmark we did not reach any fundamental architectural limit of the system; we simply ran out of hardware resources to add to the test. Further, performance scaled linearly from 10 servers to 100 as servers were added to the system, so small systems can be scaled on demand without sacrificing performance.
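The equivalence figure follows from simple division. A minimal sketch in Python, using only the numbers quoted in the summary; the variable names are ours, chosen for illustration:

```python
# Figures quoted in the summary.
total_index = 42540        # aggregate UnixBench WHT index
single_cpu_index = 100     # approximate index of a single-CPU dedicated server
cpus_consumed = 443        # CPUs consumed during the test

# Equivalent number of single-CPU dedicated servers.
equivalent_servers = total_index / single_cpu_index

# Average index delivered per consumed CPU (this count includes
# the CPUs allocated to benchmark control).
index_per_cpu = total_index / cpus_consumed

print(f"Equivalent single-CPU servers: {equivalent_servers:.0f}")
print(f"Index per consumed CPU: {index_per_cpu:.1f}")
```

The division gives roughly 425 servers; the summary quotes the more conservative round figure of 420.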


2. Objectives

Measure Overall Performance: To demonstrate the resources utility computing puts in the hands of users, we sought to measure the aggregate performance achievable on a mixed load benchmark representing real world workload distribution.

Demonstrate Scalability: Because the ability to start with limited resources and scale on demand is critical for web services, we measured benchmark performance as resources were added to the system to determine whether performance scales linearly.


3. Configuration

Servers: 116 HP Proliant DL145-G3, Two Dual Core Opteron 2218 CPUs (2.6GHz), 8 GB RAM, 2x250GB SATA disks
Backbone switch (1x): HP ProCurve 5412zl-96G Intelligent Edge (288 GigE ports)
Grid OS (1x): 3Tera AppLogic Grid OS 2.0.2 (beta)
Benchmark: UnixBench 4.1-wht.1, the WebHostingTalk version of the UnixBench benchmark


4. Procedure

For each measurement, we recorded all results for each individual instance: 12 numbers (11 tests plus the summary index). For the composite benchmark apps, we recorded all of the items below (about 60 numbers):

  • for each controller, each appliance: 12 numbers - 11 tests plus summary index
  • for each group of controllers with the same instance size (CPU): min/max/avg of each of the 12 numbers (36 numbers)
  • the sums of each test and the summary index (12 numbers)
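The aggregation described in the list above can be sketched as follows. This is an illustration only: the record layout, the metric names and the sample values are hypothetical placeholders, not the format or output of the actual benchmark tooling.

```python
from statistics import mean

# Hypothetical records: one dict per benchmark instance. Only two of the
# 12 numbers are shown, and the values are placeholders, not measurements.
instances = [
    {"cpu": 0.5, "dhrystone": 10.0, "index": 50.0},
    {"cpu": 0.5, "dhrystone": 12.0, "index": 54.0},
    {"cpu": 1.0, "dhrystone": 21.0, "index": 99.0},
]
METRICS = ["dhrystone", "index"]

def group_stats(records, metrics):
    """min/max/avg of each metric for each group of equal instance size."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["cpu"], []).append(rec)
    return {
        size: {m: (min(r[m] for r in recs),
                   max(r[m] for r in recs),
                   mean(r[m] for r in recs)) for m in metrics}
        for size, recs in groups.items()
    }

def totals(records, metrics):
    """Sum of each metric across all instances."""
    return {m: sum(r[m] for r in records) for m in metrics}

stats = group_stats(instances, METRICS)
sums = totals(instances, METRICS)
print(stats[0.5]["index"], sums["index"])
```

With all 12 numbers listed in `METRICS`, each instance-size group yields 36 values (min, max and avg of 12 numbers) and the totals yield 12, matching the counts given above.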

All benchmark apps were run with --cap_cpu (except the baseline tests), which ensures the instance gets only the amount of CPU designated.

Baseline Measurement
These tests were performed on individual servers to provide a reference for the benchmark results.

    1.1. Physical baseline
    • Single CPU (--nosmp on boot)
    • 4 CPU
    1.2. Xen dom0 baseline (local disk)
    • Single CPU (--nosmp on boot)
    • 4 CPU
    1.3. Xen domU baseline (local disk)
    • Single CPU (--nosmp on boot)
    • 4 CPU

Mixed load performance
This test loaded the largest configuration with a representative mix of benchmark instances to provide an approximation of real world utilization of the utility computing service.

Scalability test with incremental appliance sizes and number of servers:
We performed four runs, one for each benchmark instance size: 0.50, 1.00, 2.00 and 4.00 CPU. For each run:

  • we started a single benchmark instance and recorded its 12 numbers (11 tests + summary) in clean conditions -- with nothing else running on that server
  • fill approx 10 servers with instances of this size, collect results
  • fill approx 20 servers with instances of this size, collect results.
  • add 20 servers for a total of 40 servers, collect results
  • add 20 servers for a total of 60 servers, collect results
  • add 20 servers for a total of 80 servers, collect results
  • add 20 servers for a total of 100 servers, collect results

This allowed us to plot performance against the number of servers, from 10 servers to 100 servers.


5. Performance benchmark results

Total VDC performance index: 42,540

(comparison: 1 CPU physical server, index of 100; 4 CPU physical server of same class, index of 450; typical VPS, index of 20-50)

Total CPU Cores: 464

Detailed breakdown of CPU allocation:

  • 423.25 physical CPUs running benchmark instances
  • 19.75 physical CPUs allocated for benchmark control
  • 443.00 CPUs consumed in the test (this is the number reported publicly)
  • 1.00 CPU AppLogic controller
  • 20.00 CPUs unused due to a scheduler inefficiency:
    We were not able to achieve 100% utilization of the resources available in the virtual private datacenter because of an error we discovered in the scheduler when using large amounts of CPU and memory. This error left less than 5% of the system's CPUs unused at the time and has since been corrected.
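The CPU accounting above can be cross-checked against the hardware listed in the configuration section. A short Python sketch, using only figures from this report (variable names are ours):

```python
# Hardware from the configuration section.
servers = 116
cores_per_server = 4  # two dual-core Opteron 2218 CPUs per server
total_cores = servers * cores_per_server

# CPU allocation from the breakdown above.
benchmark_cpus = 423.25   # running benchmark instances
control_cpus = 19.75      # benchmark control
controller_cpus = 1.00    # AppLogic controller
unused_cpus = 20.00       # lost to the scheduler inefficiency

consumed = benchmark_cpus + control_cpus
accounted = consumed + controller_cpus + unused_cpus
unused_share = unused_cpus / total_cores

print(f"Total cores: {total_cores}")
print(f"CPUs consumed in the test: {consumed:.2f}")
print(f"All cores accounted for: {accounted == total_cores}")
print(f"Unused share: {unused_share:.1%}")
```

The unused share works out to a little over 4%, consistent with the "less than 5%" figure.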

Total benchmark instances: 649

Detailed breakdown of benchmark instances (distributed by CPU size of benchmark appliance instance):

  • 349x 0.25 CPU
  • 144x 0.50 CPU
  • 96x 1.00 CPU
  • 36x 2.00 CPU
  • 24x 4.00 CPU
  • 649 Total instances
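The instance mix can be reconciled with the CPU allocation reported earlier by multiplying each instance size by its count. A brief Python sketch (the `mix` dictionary is simply our encoding of the list above):

```python
# Instance size (CPUs) -> number of benchmark instances, from the list above.
mix = {0.25: 349, 0.50: 144, 1.00: 96, 2.00: 36, 4.00: 24}

total_instances = sum(mix.values())
total_cpus = sum(size * count for size, count in mix.items())

print(f"Total instances: {total_instances}")
print(f"CPUs running benchmark instances: {total_cpus:.2f}")
```

The CPU total equals the 423.25 physical CPUs listed as running benchmark instances.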

Final benchmark dashboard: [figure: Benchmark Dashboard]


6. Scalability benchmark results

Benchmark results for overall performance vs. servers available: [figure: Utility Computing Benchmark Results]


7. Detailed Test Results


Test name: Dhrystone

The test measures CPU performance relevant for most computational operations.

Dhrystone 2 using register variables
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             2.5E+08      2.63E+08     2.59E+08     2.49E+08
20             5.17E+08     5.39E+08     5.23E+08     5.17E+08
40             1.04E+09     1.09E+09     1.05E+09     1.04E+09
60             1.57E+09     1.62E+09     1.57E+09     1.57E+09
80             2.11E+09     2.18E+09     2.09E+09     2.09E+09
100            2.63E+09     2.69E+09     2.62E+09     2.62E+09


Test name: Whetstone

The test measures double-precision floating point performance.

Double-Precision Whetstone
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             133740.0     66776.0      33695.4      26017.9
20             274886.4     136068.5     67530.4      53186.1
40             552310.4     274394.1     137067.1     108232.9
60             830275.1     409718.5     204587.0     163229.4
80             1105332      549261.7     273590.9     216095.5
100            1381163      688628.0     342849.3     271397.4


Test name: Execl Throughput

Execl Throughput
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             80912.5      81103.5      48496.0      26017.9
20             177535.4     164105.2     100515.9     53186.1
40             359665.0     331889.6     202126.9     108232.9
60             520674.9     494562.9     301928.8     163229.4
80             680037.4     820897.2     404156.4     216095.5
100            841743.1     820897.2     506431.0     271397.4


Test name: File Copy

File Copy 1024 bufsize 2000 maxblocks
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             863791       851918       582482       314973
20             1779117      1721641      1193200      634443
40             3546423      3516166      2395473      1291108
60             5255347      5215475      3557845      1954853
80             6885831      6931472      4793631      2592855
100            8482386      8498686      5980745      3262302


Test name: File copy/small files

The test measures disk read and write performance for operations like user session tracking and e-mail processing.

File Copy 256 bufsize 500 maxblocks
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             257082       247483       164128       81338
20             553599       498121       329743       164810
40             1113823      1012396      671267       334148
60             1585832      1499034      1001243      505749
80             2107363      1999728      1349884      671466
100            2630062      2477165      1676241      845174


Test name: File Read

The test measures disk read speed, typical for serving web content and database searches.

File Read 4096 bufsize 8000 maxblocks
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             6331050      6276501      5108817      2675042
20             14009845     12796900     10084888     5430646
40             28048693     25909080     19746756     10967738
60             40958165     38649109     29456452     16512934
80             54193412     51174806     39610632     21822367
100            67487881     63483378     49208176     27442016


Test name: Pipe Throughput

Pipe Throughput
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             35193487     36662168     15841879     4315781
20             72820112     75451667     33575987     8923294
40             1.47E+08     1.5E+08      64969531     17563527
60             2.21E+08     2.26E+08     99335811     26840813
80             2.95E+08     3.02E+08     1.33E+08     35792034
100            3.68E+08     3.75E+08     1.66E+08     44668470


Test name: Pipe-based Context Switching

The test measures inter-application communications relevant to most advanced web serving (like RoR, J2EE, etc.)

Pipe-based Context Switching
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             4980483      4958144      3908907      2540360
20             10416977     9898892      7968602      5359988
40             20902013     20105915     15960291     10527138
60             31604373     30133655     23720171     15768209
80             41771466     40566952     31492469     21226122
100            52182900     50938741     39309164     26369379


Test name: Process Creation

The test measures how fast applications start, relevant in serving CGI requests and background processing.

Process Creation
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             159420.3     156523.4     73947        39328.8
20             350770.5     317529.4     159159.3     80261.1
40             683692.3     641240.5     321824.9     163829.8
60             1011331.0    958536.9     480587.0     246995.7
80             1338541.0    1282713.0    641091.1     326178.8
100            1652078.0    1592594.0    808136.1     410281.3


Test name: Shell Scripts

The test measures how quickly the OS inside the appliance can start concurrent scripts, relevant for scheduled processing.

Shell Scripts (8 concurrent)
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             18982.5      19311.0      14072.1      8350.7
20             40862.4      39083.5      28784.3      17064.5
40             82775.7      79264.2      57256.3      34706.6
60             121069.6     117811.7     85299.6      52396.6
80             158859.6     157631.5     114956.1     69390.3
100            195571.1     195446.5     143071.4     86941.6


Test name: System Call Overhead

System Call Overhead
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             49811555     50909868.2   49612837.7   48083786.1
20             1.04E+08     1.05E+08     99937560     98016770
40             2.07E+08     2.11E+08     2.01E+08     1.99E+08
60             3.12E+08     3.15E+08     3E+08        3.01E+08
80             4.15E+08     4.22E+08     4.01E+08     3.98E+08
100            5.18E+08     5.24E+08     5.02E+08     4.99E+08


Final Test Score

The final score summarizes overall performance across the 11 tests above.

UnixBench Final Score
# of Servers   0.5 CPU      1 CPU        2 CPUs       4 CPUs
10             4213.4       3982.1       2669.6       1513.7
20             8894.6       8088.7       5462.7       3101.6
40             17839.4      16358.2      10919.9      6258.4
60             26471.7      24396.2      16320.3      9453.1
80             35078.6      32618.6      21860.6      12543.2
100            43628.2      40476.8      27324.4      15727.0
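
One way to check the linear-scaling claim against the final scores above is to compare each result with a straight-line extrapolation from the 10-server measurement. A minimal sketch in Python; the `efficiency` helper and the variable names are ours and are not part of the benchmark tooling:

```python
# UnixBench final scores from the table above, keyed by instance size.
server_counts = [10, 20, 40, 60, 80, 100]
final_scores = {
    "0.5 CPU": [4213.4, 8894.6, 17839.4, 26471.7, 35078.6, 43628.2],
    "1 CPU":   [3982.1, 8088.7, 16358.2, 24396.2, 32618.6, 40476.8],
    "2 CPUs":  [2669.6, 5462.7, 10919.9, 16320.3, 21860.6, 27324.4],
    "4 CPUs":  [1513.7, 3101.6, 6258.4, 9453.1, 12543.2, 15727.0],
}

def efficiency(counts, scores):
    """Ratio of each score to a linear extrapolation of the first point."""
    per_server = scores[0] / counts[0]
    return [s / (per_server * n) for n, s in zip(counts, scores)]

scaling = {size: efficiency(server_counts, scores)
           for size, scores in final_scores.items()}

for size, ratios in scaling.items():
    print(size, " ".join(f"{r:.3f}" for r in ratios))
```

A ratio of 1.0 means the score grew exactly in proportion to the number of servers relative to the 10-server run. For these figures every ratio stays within a few percent of 1.0, which is what linear scaling predicts.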

How We Did It

A video demonstrating the performance results is available here.
