Utility Computing Benchmark Results
To illustrate the capabilities that utility computing services based on our AppLogic grid operating system offer users, we ran a series of tests in conjunction with our partner Layered Technologies. We chose an industry-standard open benchmark, UnixBench WHT, so that anyone familiar with the hosting industry can reference the results. On July 24, 2007 we announced the results of these tests on the world's most powerful Virtual Private Datacenter, featuring 443 CPUs and over 100 servers. The procedures, configurations and results follow.
1. Summary
2. Objectives
3. Configuration
4. Procedure
5. Performance Benchmark Results
6. Scalability Benchmark Results
7. Detailed Test Results
1. Summary
An overall performance benchmark of 42,540 was achieved with the UnixBench WHT utility while consuming 443 CPUs. Based on the single-CPU server result of approximately 100 for the same benchmark, this implies that users can harness the equivalent of 420 single-CPU dedicated servers through the utility computing service. This in no way represents the maximum performance of the system. During the benchmark we did not reach any fundamental architectural limit of the system, but ran out of hardware resources to add to the test. Further, performance scaled linearly from 10 servers to 100 as servers were added to the system, so small systems can be scaled on demand without sacrificing performance.
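The arithmetic behind the "equivalent servers" claim can be checked with a short sketch. It uses only the figures quoted in this summary; the function and variable names are illustrative, and the baseline index of 100 is the approximate value stated above, not a measured constant.

```python
# Figures quoted in the summary; the baseline is stated only as "approximately 100".
TOTAL_INDEX = 42_540       # aggregate UnixBench WHT index for the whole test
CPUS_CONSUMED = 443        # CPUs consumed by the test
SINGLE_CPU_INDEX = 100     # approximate index of one single-CPU physical server


def equivalent_servers(total_index: float, baseline_index: float) -> float:
    """Number of baseline servers whose combined index equals total_index."""
    return total_index / baseline_index


def index_per_cpu(total_index: float, cpus: float) -> float:
    """Average benchmark index delivered per consumed CPU."""
    return total_index / cpus


equiv = equivalent_servers(TOTAL_INDEX, SINGLE_CPU_INDEX)
per_cpu = index_per_cpu(TOTAL_INDEX, CPUS_CONSUMED)

print(f"Equivalent single-CPU servers: {equiv:.1f}")
print(f"Index per consumed CPU:        {per_cpu:.1f}")
```

Because the baseline is approximate, the result is best read as "a little over 420 servers" rather than as an exact count.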
2. Objectives
Measure Overall Performance: To demonstrate the resources utility computing puts in the hands of users, we sought to measure the aggregate performance achievable on a mixed load benchmark representing real world workload distribution.
Demonstrate Scalability: Because being able to start with limited resources and scale on demand is critical for web services, we measured benchmark performance at several resource allocations, adding resources to the system between measurements, to determine whether performance scales linearly.
3. Configuration
Servers: 116 HP Proliant DL145-G3, Two Dual Core Opteron 2218 CPUs (2.6GHz), 8 GB RAM, 2x250GB SATA disks
Backbone switch (1x): HP ProCurve 5412zl-96G Intelligent Edge (288 GigE ports)
Grid OS (1x): 3Tera AppLogic Grid OS 2.0.2 (beta)
Benchmark: UnixBench 4.1-wht.1 - the webhostingtalk version of the UnixBench benchmark
4. Procedure
For each measurement, we recorded all results for each individual instance: 12 numbers (11 tests plus the summary index). For the composite benchmark apps, we recorded all of the items below (about 60 numbers):
- for each controller, each appliance: 12 numbers - 11 tests plus summary index
- for each group of controllers with the same instance size (CPU): min/max/avg of each of the 12 numbers (36 numbers)
- the sums of each test and the summary index (12 numbers)
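The aggregation described in the list above can be sketched as follows. This is a minimal illustration and not the tooling used in the test: the input layout (a CPU size paired with a list of 12 scores) and the function name `summarize` are assumptions, and the sample values are made up purely to exercise the code.

```python
from collections import defaultdict
from statistics import mean

NUMBERS_PER_INSTANCE = 12  # 11 tests plus the summary index


def summarize(instances):
    """Aggregate per-instance results.

    instances: list of (cpu_size, scores) pairs, where scores holds the
    12 numbers recorded for one benchmark instance (hypothetical layout).
    Returns (per_group, totals): min/max/avg per CPU size, and the sum of
    each of the 12 numbers over all instances.
    """
    groups = defaultdict(list)
    for cpu_size, scores in instances:
        if len(scores) != NUMBERS_PER_INSTANCE:
            raise ValueError("expected 12 numbers per instance")
        groups[cpu_size].append(scores)

    per_group = {}
    for cpu_size, rows in groups.items():
        columns = list(zip(*rows))  # one tuple per test, across instances
        per_group[cpu_size] = {
            "min": [min(col) for col in columns],
            "max": [max(col) for col in columns],
            "avg": [mean(col) for col in columns],
        }

    totals = [sum(col) for col in zip(*(scores for _, scores in instances))]
    return per_group, totals


# Made-up sample data, for illustration only.
sample = [
    (0.5, [1.0] * 12),
    (0.5, [3.0] * 12),
    (1.0, [5.0] * 12),
]
per_group, totals = summarize(sample)
print(per_group[0.5]["avg"][0], totals[0])
```

Each group yields 36 numbers (min, max and average of the 12 values), and the totals add 12 more, matching the counts in the list.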
All benchmark apps were run with --cap_cpu (except the baseline tests), which ensures the instance gets only the amount of CPU designated.
Baseline Measurement
These tests were performed on individual servers to provide a reference for the benchmark results
- 1.1. Physical baseline
- Single CPU (--nosmp on boot)
- 4 CPU
- Single CPU (--nosmp on boot)
- 4 CPU
- Single CPU (--nosmp on boot)
- 4 CPU
Mixed load performance
This test loaded the largest configuration with a representative mix of benchmark instances to provide an approximation of real world utilization of the utility computing service.
Scalability test with incremental appliance sizes and number of servers:
We performed four runs, one for each benchmark instance size: 0.50, 1.00, 2.00 and 4.00 CPU. For each run:
- we started a single benchmark instance and recorded its 12 numbers (11 tests + summary) in clean conditions -- with nothing else running on that server
- fill approx 10 servers with instances of this size, collect results
- fill approx 20 servers with instances of this size, collect results.
- add 20 servers for a total of 40 servers, collect results
- add 20 servers for a total of 60 servers, collect results
- add 20 servers for a total of 80 servers, collect results
- add 20 servers for a total of 100 servers, collect results
This allowed us to plot a graph, from 10 servers to 100 servers showing scalability of performance versus number of servers.
5. Performance benchmark results
Total VDC performance index: 42,540
(comparison: 1 CPU physical server, index of 100; 4 CPU physical server of same class, index of 450; typical VPS, index of 20-50)
Total CPU Cores: 464
Detailed breakdown of CPU allocation:
- 423.25 physical CPUs running benchmark instances
- 19.75 physical CPUs allocated for benchmark control
- 443.00 CPUs consumed in the test (this is the number reported publicly)
- 1.00 CPU AppLogic controller
- 20.00 CPUs unused due to a scheduler inefficiency
We were not able to achieve 100% utilization of the resources available in the virtual private datacenter because of an error we discovered in the scheduler when allocating large amounts of CPU and memory. This error added less than 5% overhead to the system at the time and has since been corrected.
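The CPU accounting above can be cross-checked against the hardware configuration. The sketch below uses only numbers from this page (116 servers, two dual-core CPUs each, and the four allocation lines); the dictionary keys and variable names are illustrative.

```python
SERVERS = 116
CORES_PER_SERVER = 2 * 2  # two dual-core Opteron 2218 CPUs per server

# CPU allocation as listed in the breakdown.
allocation = {
    "benchmark instances": 423.25,
    "benchmark control": 19.75,
    "AppLogic controller": 1.00,
    "unused (scheduler)": 20.00,
}

total_cores = SERVERS * CORES_PER_SERVER
consumed = allocation["benchmark instances"] + allocation["benchmark control"]
accounted = sum(allocation.values())
unused_share = allocation["unused (scheduler)"] / total_cores

print(f"Total cores:            {total_cores}")
print(f"CPUs consumed in test:  {consumed:.2f}")
print(f"CPUs accounted for:     {accounted:.2f}")
print(f"Unused share of cores:  {unused_share:.1%}")
```

The four lines add up to the full core count, and the unused share stays under the 5% figure quoted for the scheduler error.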
Total benchmark instances: 649
Detailed breakdown of benchmark instances (distributed by CPU size of benchmark appliance instance):
- 349x 0.25 CPU
- 144x 0.50 CPU
- 96x 1.00 CPU
- 36x 2.00 CPU
- 24x 4.00 CPU
- 649 Total instances
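As a consistency check, the instance mix can be summed to confirm both the instance count and the CPUs running benchmark instances. The sketch below takes the counts from the list above; the variable names are illustrative.

```python
# CPU size of the benchmark appliance -> number of instances of that size.
instance_mix = {
    0.25: 349,
    0.50: 144,
    1.00: 96,
    2.00: 36,
    4.00: 24,
}

total_instances = sum(instance_mix.values())
total_cpus = sum(size * count for size, count in instance_mix.items())

print(f"Total instances: {total_instances}")
print(f"Total CPUs used by benchmark instances: {total_cpus:.2f}")
for size, count in instance_mix.items():
    share = size * count / total_cpus
    print(f"  {size:.2f} CPU x {count:3d}: {share:.1%} of benchmark CPU")
```

The CPU total agrees with the 423.25 physical CPUs reported for benchmark instances in the allocation breakdown.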
Final benchmark dashboard:
6. Scalability benchmark results
Benchmark results for overall performance vs servers available
7. Detailed Test Results
Test name: Dhrystone
The test measures CPU performance relevant for most computational operations.
| Dhrystone 2 using register variables | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 2.5E+08 | 2.63E+08 | 2.59E+08 | 2.49E+08 |
| 20 | 5.17E+08 | 5.39E+08 | 5.23E+08 | 5.17E+08 |
| 40 | 1.04E+09 | 1.09E+09 | 1.05E+09 | 1.04E+09 |
| 60 | 1.57E+09 | 1.62E+09 | 1.57E+09 | 1.57E+09 |
| 80 | 2.11E+09 | 2.18E+09 | 2.09E+09 | 2.09E+09 |
| 100 | 2.63E+09 | 2.69E+09 | 2.62E+09 | 2.62E+09 |
Test name: Whetstone
The test measures double-precision floating point performance.
| Double-Precision Whetstone | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 133740.0 | 66776.0 | 33695.4 | 26017.9 |
| 20 | 274886.4 | 136068.5 | 67530.4 | 53186.1 |
| 40 | 552310.4 | 274394.1 | 137067.1 | 108232.9 |
| 60 | 830275.1 | 409718.5 | 204587.0 | 163229.4 |
| 80 | 1105332 | 549261.7 | 273590.9 | 216095.5 |
| 100 | 1381163 | 688628.0 | 342849.3 | 271397.4 |
Test name: Execl Throughput
| Execl Throughput | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 80912.5 | 81103.5 | 48496.0 | 26017.9 |
| 20 | 177535.4 | 164105.2 | 100515.9 | 53186.1 |
| 40 | 359665.0 | 331889.6 | 202126.9 | 108232.9 |
| 60 | 520674.9 | 494562.9 | 301928.8 | 163229.4 |
| 80 | 680037.4 | 820897.2 | 404156.4 | 216095.5 |
| 100 | 841743.1 | 820897.2 | 506431.0 | 271397.4 |
Test name: File Copy
| File Copy 1024 bufsize 2000 maxblocks | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 863791 | 851918 | 582482 | 314973 |
| 20 | 1779117 | 1721641 | 1193200 | 634443 |
| 40 | 3546423 | 3516166 | 2395473 | 1291108 |
| 60 | 5255347 | 5215475 | 3557845 | 1954853 |
| 80 | 6885831 | 6931472 | 4793631 | 2592855 |
| 100 | 8482386 | 8498686 | 5980745 | 3262302 |
Test name: File copy/small files
The test measures disk read and write performance for operations like user session tracking and e-mail processing.
| File Copy 256 bufsize 500 maxblocks | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 257082 | 247483 | 164128 | 81338 |
| 20 | 553599 | 498121 | 329743 | 164810 |
| 40 | 1113823 | 1012396 | 671267 | 334148 |
| 60 | 1585832 | 1499034 | 1001243 | 505749 |
| 80 | 2107363 | 1999728 | 1349884 | 671466 |
| 100 | 2630062 | 2477165 | 1676241 | 845174 |
Test name: File Read
The test measures disk read speed, typical for serving web content and database searches.
| File Read 4096 bufsize 8000 maxblocks | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 6331050 | 6276501 | 5108817 | 2675042 |
| 20 | 14009845 | 12796900 | 10084888 | 5430646 |
| 40 | 28048693 | 25909080 | 19746756 | 10967738 |
| 60 | 40958165 | 38649109 | 29456452 | 16512934 |
| 80 | 54193412 | 51174806 | 39610632 | 21822367 |
| 100 | 67487881 | 63483378 | 49208176 | 27442016 |
Test name: Pipe Throughput
| Pipe Throughput | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 35193487 | 36662168 | 15841879 | 4315781 |
| 20 | 72820112 | 75451667 | 33575987 | 8923294 |
| 40 | 1.47E+08 | 1.5E+08 | 64969531 | 17563527 |
| 60 | 2.21E+08 | 2.26E+08 | 99335811 | 26840813 |
| 80 | 2.95E+08 | 3.02E+08 | 1.33E+08 | 35792034 |
| 100 | 3.68E+08 | 3.75E+08 | 1.66E+08 | 44668470 |
Test name: Pipe-based Context Switching
The test measures inter-application communications relevant to most advanced web serving (like RoR, J2EE, etc.)
| Pipe-based Context Switching | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 4980483 | 4958144 | 3908907 | 2540360 |
| 20 | 10416977 | 9898892 | 7968602 | 5359988 |
| 40 | 20902013 | 20105915 | 15960291 | 10527138 |
| 60 | 31604373 | 30133655 | 23720171 | 15768209 |
| 80 | 41771466 | 40566952 | 31492469 | 21226122 |
| 100 | 52182900 | 50938741 | 39309164 | 26369379 |
Test name: Process Creation
The test measures how fast applications start, relevant in serving CGI requests and background processing.
| Process Creation | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 159420.3 | 156523.4 | 73947 | 39328.8 |
| 20 | 350770.5 | 317529.4 | 159159.3 | 80261.1 |
| 40 | 683692.3 | 641240.5 | 321824.9 | 163829.8 |
| 60 | 1011331.0 | 958536.9 | 480587.0 | 246995.7 |
| 80 | 1338541.0 | 1282713.0 | 641091.1 | 326178.8 |
| 100 | 1652078.0 | 1592594.0 | 808136.1 | 410281.3 |
Test name: Shell Scripts
The test measures how quickly the OS inside the appliance can start concurrent scripts, relevant for scheduled processing.
| Shell Scripts (8 concurrent) | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 18982.5 | 19311.0 | 14072.1 | 8350.7 |
| 20 | 40862.4 | 39083.5 | 28784.3 | 17064.5 |
| 40 | 82775.7 | 79264.2 | 57256.3 | 34706.6 |
| 60 | 121069.6 | 117811.7 | 85299.6 | 52396.6 |
| 80 | 158859.6 | 157631.5 | 114956.1 | 69390.3 |
| 100 | 195571.1 | 195446.5 | 143071.4 | 86941.6 |
Test name: System Call Overhead
| System Call Overhead | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 49811555 | 50909868.2 | 49612837.7 | 48083786.1 |
| 20 | 1.04E+08 | 1.05E+08 | 99937560 | 98016770 |
| 40 | 2.07E+08 | 2.11E+08 | 2.01E+08 | 1.99E+08 |
| 60 | 3.12E+08 | 3.15E+08 | 3E+08 | 3.01E+08 |
| 80 | 4.15E+08 | 4.22E+08 | 4.01E+08 | 3.98E+08 |
| 100 | 5.18E+08 | 5.24E+08 | 5.02E+08 | 4.99E+08 |
Final Test Score
Presents overall performance measured in 11 different characteristics
| UnixBench Final Score | ||||
| # of Servers | 0.5 CPU | 1 CPU | 2 CPUs | 4 CPUs |
| 10 | 4213.4 | 3982.1 | 2669.6 | 1513.7 |
| 20 | 8894.6 | 8088.7 | 5462.7 | 3101.6 |
| 40 | 17839.4 | 16358.2 | 10919.9 | 6258.4 |
| 60 | 26471.7 | 24396.2 | 16320.3 | 9453.1 |
| 80 | 35078.6 | 32618.6 | 21860.6 | 12543.2 |
| 100 | 43628.2 | 40476.8 | 27324.4 | 15727.0 |
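One way to quantify the linear-scaling claim is to fit a straight line to the final scores and look at the goodness of fit. The sketch below does this with an ordinary least-squares fit in plain Python, using the UnixBench Final Score values from the table above. The choice of R² as the linearity measure is an assumption of this sketch, not part of the original test procedure.

```python
SERVERS = [10, 20, 40, 60, 80, 100]

# UnixBench Final Score per instance size, one value per server count.
FINAL_SCORE = {
    "0.5 CPU": [4213.4, 8894.6, 17839.4, 26471.7, 35078.6, 43628.2],
    "1 CPU": [3982.1, 8088.7, 16358.2, 24396.2, 32618.6, 40476.8],
    "2 CPUs": [2669.6, 5462.7, 10919.9, 16320.3, 21860.6, 27324.4],
    "4 CPUs": [1513.7, 3101.6, 6258.4, 9453.1, 12543.2, 15727.0],
}


def linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


fits = {size: linear_fit(SERVERS, scores) for size, scores in FINAL_SCORE.items()}

for size, (slope, intercept, r2) in fits.items():
    print(f"{size:>7}: {slope:7.1f} index points per server, R^2 = {r2:.5f}")
```

An R² close to 1 for every instance size indicates that the score grows in proportion to the number of servers over the measured range of 10 to 100.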
