1 Objective and Scope
The primary aim of this document is to highlight key considerations in Performance Testing and to provide insight into the rigor and depth that performance testing requires.
2 Performance Testing
Performance Testing can be viewed as the systematic process of collecting and monitoring the results of system usage and analyzing them to aid system improvement towards desired results. As part of the performance testing process, one needs to gather statistical information, examine logs of system state histories, determine system performance under natural and artificial conditions, and alter system modes of operation.
Performance testing complements functional testing. Functional testing can validate proper functionality under correct usage and proper error handling under incorrect usage. It cannot, however, tell how much load an application can handle before it breaks or performs improperly. Finding the breaking points and performance bottlenecks, as well as identifying functional errors that only occur under stress, requires performance testing.
The purpose of Performance testing is to demonstrate that:
• The application processes required transaction volumes within specified response times against a production-sized database (Speed).
• The application can handle various user load scenarios (stresses), ranging from a sudden load “spike” to a persistent load “soak” (Scalability).
• The application is consistent in availability and functional integrity (Stability).
• The minimum configuration that will allow the system to meet the formally stated performance expectations of stakeholders can be determined.
Basis for inclusion in Load Test
High frequency transactions: The most frequently used transactions have the potential to impact the performance of all of the other transactions if they are not efficient.
Mission Critical transactions: The more important transactions that facilitate the core objectives of the system should be included, as failure under load of these transactions has, by definition, the greatest impact.
Read Transactions: At least one READ ONLY transaction should be included, so that performance of such transactions can be differentiated from other more complex transactions.
Update Transactions: At least one update transaction should be included so that performance of such transactions can be differentiated from other transactions.
2.1 Types of Performance Testing
Benchmark Testing: The objective of Benchmark tests is to determine end-to-end timing of various critical business processes and transactions while the system is under low load with a production-sized database.
The best time to execute benchmark tests is at the earliest opportunity. Developing performance test scripts at such an early stage provides an opportunity to identify and remedy serious performance problems, and to set realistic expectations, before load testing commences.
A key indicator of the quality of a benchmark test is its repeatability. That is, the re-execution of a performance test should give the same set of results. If the results are not the same each time, differences in results cannot be attributed to changes in the application, configuration or environment being tested.
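As an illustration of this repeatability check, the following sketch (Python; the timing values and the 5% tolerance are purely illustrative assumptions) compares the mean response times of two executions of the same benchmark transaction:

    import statistics

    def repeatability_check(run_a, run_b, tolerance_pct=5.0):
        """Compare two benchmark runs (lists of response times in seconds).

        Flags the transaction as non-repeatable if the mean response times
        differ by more than tolerance_pct percent. The 5% tolerance is an
        illustrative assumption, not a prescribed threshold.
        """
        mean_a = statistics.mean(run_a)
        mean_b = statistics.mean(run_b)
        drift_pct = abs(mean_a - mean_b) / mean_a * 100
        return drift_pct <= tolerance_pct, drift_pct

    # Example: two executions of the same benchmark transaction
    run_1 = [1.21, 1.19, 1.25, 1.22, 1.20]
    run_2 = [1.23, 1.24, 1.20, 1.26, 1.21]
    repeatable, drift = repeatability_check(run_1, run_2)
    print(f"repeatable={repeatable}, drift={drift:.1f}%")

If the drift between identical runs exceeds the tolerance, the environment or test itself should be investigated before any result is attributed to an application change.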
Stress Tests:
Stress tests have one primary objective: to determine the maximum load under which a system fails, and how it fails.
It is important to know in advance whether a ‘stress’ situation will result in catastrophic system failure or whether all components of the system simply ‘just go really slow’. Catastrophic failures often require the restarting of various infrastructure components and contribute to downtime, stressful work environments for support staff and management, as well as possible financial loss and breaching of SLAs.
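A minimal stepped stress-test sketch is shown below (Python; the URL, step sizes and failure criteria are illustrative assumptions, and the third-party requests library is assumed to be available). Each step adds concurrent users until the error rate or response time indicates the breaking point:

    import concurrent.futures
    import time
    import requests

    TARGET_URL = "http://test-env.example.com/login"  # hypothetical endpoint

    def one_request():
        """Issue a single request and return (ok, elapsed_seconds)."""
        start = time.perf_counter()
        try:
            resp = requests.get(TARGET_URL, timeout=30)
            return resp.status_code == 200, time.perf_counter() - start
        except requests.RequestException:
            return False, time.perf_counter() - start

    def run_step(concurrent_users, requests_per_user=10):
        """Drive one load step and return error rate and average response time."""
        with concurrent.futures.ThreadPoolExecutor(max_workers=concurrent_users) as pool:
            futures = [pool.submit(one_request)
                       for _ in range(concurrent_users * requests_per_user)]
            results = [f.result() for f in futures]
        errors = sum(1 for ok, _ in results if not ok)
        avg_time = sum(t for _, t in results) / len(results)
        return errors / len(results), avg_time

    # Step the load up until the system shows signs of failure.
    for users in (10, 25, 50, 100, 200):
        error_rate, avg_time = run_step(users)
        print(f"{users} users: error rate {error_rate:.1%}, avg response {avg_time:.2f}s")
        if error_rate > 0.05 or avg_time > 10:  # illustrative failure criteria
            print(f"Breaking point reached at approximately {users} concurrent users")
            break

Recording how the system behaves at and beyond the breaking point (errors, timeouts, component restarts) answers the "how it fails" part of the objective.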
Targeted Infrastructure Tests:
The objective of Targeted Infrastructure tests is to individually test isolated areas of an end-to-end system configuration. This type of testing would include communications infrastructure such as:
-Load balancers;
-Web servers;
-Application servers;
-Databases.
Targeted Infrastructure testing allows for the identification of any performance issues that would fundamentally limit the overall ability of a system to deliver at a given performance level. Targeted Infrastructure testing separately generates load on each component of an end-to-end system, measuring the response of each component under load.
Each test can be simple, focusing specifically upon the individual component being tested. It is often wise to execute Targeted Infrastructure tests upon isolated components prior to Load or Stress testing as it is much easier to identify (and quicker to rectify) performance issues in this situation rather than in a full end-to-end test.
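As an illustration, the sketch below (Python; the component endpoints are hypothetical, and the requests library is assumed to be available) times each tier in isolation so that a slow component can be identified before a full end-to-end test is attempted:

    import time
    import requests

    # Hypothetical endpoints that exercise one tier each.
    COMPONENTS = {
        "load balancer": "http://lb.test-env.example.com/health",
        "web server":    "http://web1.test-env.example.com/static/ping.html",
        "app server":    "http://app1.test-env.example.com/api/health",
        "database":      "http://app1.test-env.example.com/api/db-ping",
    }

    def time_component(name, url, samples=20):
        """Time repeated requests against a single component endpoint."""
        timings = []
        for _ in range(samples):
            start = time.perf_counter()
            requests.get(url, timeout=10)
            timings.append(time.perf_counter() - start)
        print(f"{name}: avg {sum(timings) / len(timings):.3f}s, max {max(timings):.3f}s")

    for name, url in COMPONENTS.items():
        time_component(name, url)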
Soak Tests (Endurance Testing):
The objective of soak testing is to identify any performance problems that may appear after a system has been running at a high level for an extended period of time (a minimal soak-loop sketch follows the list below). It is possible that a system may ‘stop’ working after a certain number of transactions have been processed, perhaps due to:
-Serious memory leaks that would eventually result in a memory crisis;
-Failure to close connections between tiers of a multi-tiered system which could halt some or all modules of a system;
-Failure to close database cursors under some conditions which could eventually result in the entire system stalling;
-Gradual degradation in response time of some function as internal data structures become less efficient during a long high intensity test.
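The sketch below is a minimal soak loop, assuming the psutil and requests libraries and a hypothetical transaction URL; it repeats a transaction for a fixed duration and samples host memory so a gradual leak or degradation becomes visible:

    import time
    import psutil
    import requests

    TARGET_URL = "http://test-env.example.com/api/order"  # hypothetical transaction
    DURATION_HOURS = 8                                    # illustrative soak length

    end_time = time.time() + DURATION_HOURS * 3600
    iteration = 0
    while time.time() < end_time:
        iteration += 1
        try:
            requests.get(TARGET_URL, timeout=30)
        except requests.RequestException:
            pass  # errors could also be counted for trending
        if iteration % 1000 == 0:
            mem = psutil.virtual_memory()
            print(f"{iteration} iterations, host memory used {mem.percent:.1f}%")
        time.sleep(1)  # pacing; a real soak test would model realistic think time

Plotting the sampled memory figures against iteration count makes slow leaks visible long before they would cause a production outage.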
Volume Tests:
Volume tests are tests directly relating to throughput, and are usually associated with the testing of ‘messaging’, ‘batch’ or ‘conversion’ type processing situations.
The objectives of Volume tests are:
- To determine throughput associated with a specific process or transaction;
- To determine the ‘capacity drivers’ associated with a specific process or transaction.
Volume testing focuses on the throughput of a system function (say, in bytes) rather than the response time of a system function (say, in seconds); a throughput-measurement sketch appears at the end of this subsection.
It is important when designing Volume tests that the capacity drivers are identified prior to the execution of the Volume testing to ensure meaningful results are recorded. Capacity drivers in a batch processing function could be:
Record Types: The record types contained within one specific batch job run may require significant CPU processing, while other record types may invoke substantial database and disk activity. Some batch processing functions can also contain aggregation processing, and the mix of data contained within a batch job can significantly impact the processing requirements of the aggregation phase.
Database Size: The total amount of processing effort for a batch processing function may also depend upon the size and make-up of the database the batch job is interacting with.
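As an illustration of measuring throughput rather than response time, the sketch below (Python; the batch step and record layout are hypothetical) times the processing of a batch of records and reports records and bytes per second:

    import time

    def process_record(record: bytes) -> None:
        """Placeholder for the batch step under test (e.g. parse, transform, load)."""
        record.decode("utf-8").upper()

    # Hypothetical batch of 100,000 fixed-size records.
    batch = [f"record-{i:08d},type-A,100.00\n".encode("utf-8") for i in range(100_000)]

    start = time.perf_counter()
    for record in batch:
        process_record(record)
    elapsed = time.perf_counter() - start

    total_bytes = sum(len(r) for r in batch)
    print(f"{len(batch) / elapsed:,.0f} records/s, "
          f"{total_bytes / elapsed / 1024:,.1f} KB/s over {elapsed:.2f}s")

Re-running the same measurement with different mixes of record types, or against databases of different sizes, exposes the capacity drivers described above.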
Failover Tests:
The objective of Failover Tests is to bring the system under test into a steady state, then start failing components (servers, routers, etc.) and observe how response times are affected during and after the failover, and how long the system takes to transition back to a steady state.
Failover testing determines what will occur if multiple web-servers are being used under peak anticipated load, and one of them dies. Does the load balancer used in this architecture react quickly enough? Can the other web-servers handle the sudden dumping of extra load?
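A simple way to observe this behaviour is to poll the system at a fixed interval while a component is deliberately failed, as in the sketch below (Python; the URL and polling interval are illustrative and the requests library is assumed). The log shows how long errors or slow responses persist and when recovery completes:

    import time
    import requests

    TARGET_URL = "http://lb.test-env.example.com/app/home"  # hypothetical front door

    # Poll once per second while a web server is manually stopped and restarted.
    while True:
        start = time.perf_counter()
        try:
            status = requests.get(TARGET_URL, timeout=10).status_code
        except requests.RequestException as exc:
            status = type(exc).__name__
        elapsed = time.perf_counter() - start
        print(f"{time.strftime('%H:%M:%S')}  status={status}  response={elapsed:.2f}s")
        time.sleep(1)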
Network Sensitivity Tests:
Network sensitivity tests specifically focus on Wide Area Network (WAN) limitations and network activity (traffic, latency, error rates, etc.), and then measure the impact of that traffic on an application that is bandwidth dependent. The primary objectives of Network Sensitivity Tests are:
• Determine impact on system response time over a WAN;
• Determine the capacity of a system based on a given WAN;
• Determine the impact on a system under test that is under ‘dirty’ communications load.
Response time is the primary metric for Network Sensitivity testing, and is recorded as part of scenario test execution. Response time can be estimated as –
Response Time = Transmission Time + Delays + Client Processing Time + Server Processing Time
Where:
Transmission Time = Data to be transferred divided by bandwidth
Delays = Number of turns multiplied by ‘Round Trip’ response time
Client Processing Time = Time taken on the user's software to fulfill the request
Server Processing Time = Time taken on the server computer to fulfill the request
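For example, with purely illustrative figures: transferring 200 KB over a 1 Mbps WAN link gives a Transmission Time of roughly (200 × 8) / 1,000 = 1.6 seconds; 10 application turns over a link with an 80 ms round trip give Delays of 10 × 0.08 = 0.8 seconds; adding 0.5 seconds of Client Processing Time and 1.0 second of Server Processing Time yields an estimated Response Time of about 1.6 + 0.8 + 0.5 + 1.0 = 3.9 seconds.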
2.2 When to Start and Stop Performance Testing?
When to Start Performance Testing
A common practice is to start performance testing only after functional, integration, and system testing are complete; that way, it is understood that the target application is “sufficiently sound and stable” to ensure valid performance test results.
However, the problem with the above approach is that it delays performance testing until the latter part of the development lifecycle. Then, if the tests uncover performance-related problems, one has to resolve problems with potentially serious design implications at a time when the corrections made might invalidate earlier test results. In addition, the changes might destabilize the code just when one wants to freeze it, prior to beta testing or the final release.
A better approach is to begin performance testing as early as possible, just as soon as any of the application components can support the tests. This will enable the team to establish some early benchmarks against which performance can be measured as the components are developed.
When to Stop Performance Testing
The conventional approach is to stop testing once all planned tests are executed and there is a consistent and reliable pattern of performance improvement. This approach gives users accurate performance information at that point in time. However, one can quickly fall behind by just standing still. The environment in which clients will run the application will always be changing, so it is a good idea to run ongoing performance tests.
Another alternative is to set up a continual performance test and periodically examine the results. One can "overload" these tests by making use of real-world conditions. Regardless of how well it is designed, one will never be able to reproduce all the conditions that the application will have to contend with in the real-world environment.
2.3 Pre-Requisites for Performance Testing
The following prerequisites should be in place before performance testing commences –
• Quantitative, relevant, measurable, realistic, achievable requirements
As a foundation for all tests, performance requirements should be agreed prior to the test. This helps in determining whether or not the system meets the stated requirements. Requirements with these attributes make a meaningful performance comparison possible.
• Stable system
A test team attempting to construct a performance test of a system whose software is of poor quality is unlikely to be successful. If the software crashes regularly, it will probably not withstand the relatively minor stress of repeated use. Testers will not be able to record scripts in the first instance, or may not be able to execute a test for a reasonable length of time.
• Realistic test environment
The test environment should ideally be the production environment or a close simulation and be dedicated to the performance test team for the duration of the test. A test environment that bears no similarity to the actual production environment may be useful for finding obscure errors in the code, but is, however, useless for a performance test.
• Controlled test environment
Performance testers require stability not only in the hardware and software in terms of its reliability and resilience, but also need changes in the environment or software under test to be minimized. Automated scripts are extremely sensitive to changes in the behavior of the software under test. Test scripts designed to drive client software GUIs are prone to fail immediately, if the interface is changed even slightly. Changes in the operating system environment or database are equally likely to disrupt test preparation as well as execution and should be strictly controlled.
• Performance testing toolkit
The execution of a performance test must be, by its nature, completely automated. However, there are requirements for tools throughout the test process. The main tool requirements for a Performance Testing Toolkit are as follows (a resource-monitoring sketch appears after the list) -
-Test database creation/maintenance
-Load generation tools
-Resource monitoring
-Reporting tools
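As an example of the resource-monitoring part of the toolkit, the sketch below (Python, assuming the psutil library is available) samples CPU and memory on a test machine at a fixed interval and writes them to a CSV file for later correlation with response-time data:

    import csv
    import time
    import psutil

    SAMPLE_SECONDS = 5              # illustrative sampling interval
    OUTPUT_FILE = "resource_log.csv"

    with open(OUTPUT_FILE, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "cpu_percent", "memory_percent"])
        # Sample until the test run ends (stop with Ctrl+C).
        while True:
            cpu = psutil.cpu_percent(interval=SAMPLE_SECONDS)  # averaged over the interval
            mem = psutil.virtual_memory().percent
            writer.writerow([time.strftime("%Y-%m-%d %H:%M:%S"), cpu, mem])
            f.flush()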
3 Performance Testing Methodology
The typical performance testing methodology includes four phases: preparation, script development, execution/analysis and results reporting, as shown in the diagram below.
Performance Test Preparation: The first phase starts prior to commencing the performance testing. The Project Manager/QA Manager performs preparation tasks such as planning, designing, configuring the environment setup, etc.
Script Development: The second phase involves creating the performance test scenarios and relevant test scripts that will be used to test the system.
Test Execution/Analysis: The third phase includes running the scenario. The data gathered during the run is then used to analyze system performance, develop suggestions for system improvement and implement those improvements. The scenarios may be iteratively rerun to achieve load test goals.
Test Results Reporting: The purpose of the last phase is to report the outcome of the work performed for the load test.
3.1 Performance Test Preparation
The first step in a successful implementation is to perform preparation tasks which include planning, analysis/design, defining “white box” measurement, configuring the environment setup, completing product training and making any customization, if needed.
Planning: The purpose of planning is to define the implementation goals, objectives, and project timeline. Project managers and/or technical leads typically perform the planning phase in conjunction with the implementation teams –
• Project goals broadly define the problems that will be addressed and the desired outcome for testing.
• The project objectives are measurable tasks that, once completed, will help meet the goals.
• The project timeline will outline the sequence, duration and staff responsibility for each task.
Analysis/Design: In this context, the analysis/design should first identify a set of scenarios that model periods of critical system activity. This analysis is especially important in global operations where one continent’s batch processing is running concurrently with another continent’s online processing.
High volume business processes/transactions should be built into the test. Choosing too few transactions might leave gaps in the test while choosing too many will expand the script creation time. It is effective to model the most common 80% of the transaction throughput; trying to achieve greater accuracy is difficult and expensive. This is typically represented by 20% of the business processes—roughly five to 10 key business processes for each system module.
Margin for Error: Since load testing is not an exact science, there should be accommodations made to provide a margin for error in the test results. This can compensate for poor design and help avoid false positives or negatives. A load test should include at least one stress test or a peak utilization scenario. A stress test will overdrive the system for a period of time by multiplying the load by some factor—120% or greater. Peak utilization will address the testing of peak system conditions.
White-Box Measurement: The white-box measurement section defines the tools and metrics used to measure internal system-under-test (SUT) performance. This information helps to pinpoint the cause of external performance issues. It also leads to recommendations for resolving those issues.
Environment Setup: The purpose of the environment setup phase is to install and configure the system under test. Preparation includes setting up hardware, software, data, performance test tool and white-box tools. Since the sole purpose of this test environment is to conduct performance tests, it must accurately represent the production environment. It is crucial to know the specifications of the Web server, databases, or any other external dependencies the application might have.
• Software: In addition to the hardware required for a load test, the test bed must also have fully installed and functioning software. Since the performance test tool functions "just like a user," the system needs to successfully support all user actions.
• Network: Since it is probably impossible to accurately model each and every network access (FTP, print, Web browse, e-mail download, etc.), it is judicious to examine the current network utilization and understand the impact of incremental network traffic.
• Geography: Often the application under test will support a global enterprise. In this environment tests may often need to be run at remote sites across the WAN. WAN connectivity needs to be emulated in the lab, or assumptions must be made.
• Interfaces: Large systems seldom service a company’s entire information needs without interfacing to existing legacy systems. The interfaces to these external data sources need to be emulated during the test, or excluded with supporting analysis and justification.
3.2 Script Development
During the script development phase, the test team builds the tests specified in the design phase. The effort required depends on the number of tests, test complexity and the quality of the test design.
Initial Script Development
It is desirable to have a high degree of transparency between virtual users and real human users. In other words, the virtual users should perform exactly the same tasks as the human users. At the most basic level, any performance test tool offers script capture by recording test scripts as the users navigate the application. This recording simplifies test development by translating user activities into test code. Scripts can be replayed to perform exactly the same actions on the system. These scripts are specified in the design and should be self-explanatory. Any of the following issues can easily increase script development time –
Lack of Functional Support
One of the most important factors in script creation productivity is the amount of functional support provided—access to individuals who understand application functionality. This manifests itself when a test team member encounters a functional error while scripting—the business process won’t function properly. The team member typically has to stop since he or she is not equipped with the skills to solve the issue. At that point, script creation is temporarily halted until a functional team member helps resolve the issue.
Poor Quality of Test Design
The second script development factor is the quality of the test design. Ideally the test design should specify enough information for an individual with little or no application experience to build tests. System test documentation is often an excellent source of this information. Often designs are incorrect or incomplete. As a result, any omission will require functional support to complete script development.
Low Process Stability
To load/stress test a large system, the system’s business processes first need to function properly. It is typically not effective to attempt to load test a system that won’t even work for one user. This typically means that the system needs to be nearly completed.
System Changes
A key factor in script development is the frequency of system changes. For each system revision, test scripts need to be evaluated. Tests may require simple rework or complete reconstruction. While testing tools are engineered to minimize the effect of system change, limiting the system changes will reduce scripting time.
Availability of Test Data
The system will need to be loaded with development test data. This data often comes from a legacy-system conversion and will be a predecessor to the volume data for the test.
Script Parameterization
Replaying the same user actions is not a load test. This is especially true for large multi-user systems where all the users perform different actions. Virtual user development should create a more sophisticated emulation—users should iteratively perform each business process with varying data. Script development next extends the tests to run reliably with parameterized data. This process reflects the randomness of the user population activity.
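A minimal sketch of a parameterized virtual-user script is shown below (Python; the URLs, field names and data file are hypothetical, and the requests library is assumed). Each iteration draws different data so that concurrent users do not all replay identical values:

    import csv
    import random
    import requests

    BASE_URL = "http://test-env.example.com"  # hypothetical system under test

    def load_test_data(path="accounts.csv"):
        """Load parameter data (e.g. account numbers) prepared for the test."""
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def virtual_user(test_data, iterations=10):
        """One virtual user: log in, then run a business transaction with varying data."""
        session = requests.Session()
        session.post(f"{BASE_URL}/login",
                     data={"user": "perf_user", "password": "perf_pass"})
        for _ in range(iterations):
            row = random.choice(test_data)  # different data each iteration
            session.get(f"{BASE_URL}/accounts/{row['account_id']}")
            session.post(f"{BASE_URL}/payments",
                         data={"account_id": row["account_id"], "amount": row["amount"]})

    if __name__ == "__main__":
        virtual_user(load_test_data())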
Build Volume Data
In parallel with script development, volume data should be constructed to support the execution of the load test. Typically business processes consume data—each data value may be used only once. As a result, there needs to be sufficient data to support large numbers of users running for a number of iterations— often 10,000 items or more.
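The sketch below illustrates building consumable volume data alongside scripting (Python; the field layout is a hypothetical example matching the parameterized script above). It writes 10,000 unique rows so each virtual-user iteration can consume a fresh value:

    import csv
    import random

    ROWS = 10_000  # enough for users x iterations, per the guideline above

    with open("accounts.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["account_id", "amount"])
        for i in range(ROWS):
            # Unique account id per row; amount is randomised for variety.
            writer.writerow([f"ACC{i:06d}", f"{random.uniform(10, 500):.2f}"])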
3.3 Test Execution/Analysis
The execution/analysis phase is an iterative process that runs scenarios, analyzes results and debugs system issues. Test runs are performed on a system that is representative of the production environment. The performance test tool is installed on driver hardware that will create traffic against the application under test.
Data Seeding
The system should be “pre-seeded” with data consumed by the testing process. To keep testing productivity high, there should be enough data to support several iterations before requiring a system refresh.
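A minimal pre-seeding sketch is shown below (Python, with the standard sqlite3 module as a stand-in for the real database; table and column names are hypothetical). It loads enough rows to support several iterations before a refresh is needed:

    import sqlite3

    ITERATIONS = 5
    USERS = 200
    ORDERS_PER_ITERATION = USERS * 10  # illustrative consumption rate

    conn = sqlite3.connect("seed_demo.db")  # stand-in for the system's database
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (status) VALUES (?)",
        [("OPEN",) for _ in range(ITERATIONS * ORDERS_PER_ITERATION)],
    )
    conn.commit()
    print("Seeded", conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0], "orders")
    conn.close()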
System Support
The purpose of system support is to help interpret performance results and white-box data. While the performance tool will describe what occurred, system support staff can help describe why, and suggest how to remedy the problems. These suggestions can be implemented and the tests rerun. This iterative process is a natural part of the development process, just like debugging.
Light Load
The first step is to run the scenario’s test scripts with a small number of users. Since the scripts functioned properly in the development environment, the emphasis should be on recreating this functional environment for execution. Any new script execution errors will typically indicate system configuration differences. It is advisable to avoid script modifications at this stage and concentrate on the system-under-test installation.
Heavy Load
Finally the last step is to run a full-scale load test. This typically consumes 50% of the total execution/analysis time. Once the entire scenario is running, the effort shifts to analyzing the transaction response times and white-box measurements. The goal here is to determine if the system performed properly.
3.4 Test Results Reporting
Finally, the results summary describes the testing, analysis, discoveries and system improvements, as well as the status of the objectives and goals. This typically occurs after the completion of testing and during any final “go live” preparation that is outside the scope of testing.
The Performance Test deliverables could be:
• Performance Test Strategy
• Load Scenarios
• Virtual User Scripts
• Status/Analysis Reports
• Performance Test Summary Document
The following measurements can be illustrated in the performance test report (a short sketch after the list shows how some of them are derived) –
Attempted Connections: The total number of times the virtual clients attempted to connect to the AUT (Application under Test).
Connect Time: The time it takes for a virtual client to connect to the application under test, in seconds. In other words, the time it takes from the beginning of the HTTP request to the TCP/IP connection.
Hit Time: The time it takes to complete a successful HTTP request, in seconds. (Each request for each database transaction, business logic execution, etc. is a single hit). The time of a hit is the sum of the Connect Time, Send Time, Response Time, and Process Time.
Load Size: The number of virtual clients running concurrently.
Receive Time: The elapsed time between receiving the first byte and the last byte. (Network traffic)
Response Time: The time it takes the AUT to send the object of an HTTP request back to a virtual client, in seconds. In other words, the time from the end of the HTTP request until the virtual client has received the complete item it requested (Wait Time + Receive Time).
Successful Hits: The total number of times the virtual clients made an HTTP request and received the correct HTTP response from the AUT. (Each request for each database transaction, business logic execution, etc. is a single hit.)
Transactions/sec (passed): The number of completed, successful transactions performed per second.
Transactions/sec (failed): The number of incomplete failed transactions per second.
Bandwidth Utilization: Assesses the network health during performance testing.
Memory Utilization: Comparison of memory usage on the server before and during the load test.
%CPU Utilization: Comparison of % CPU utilization on the server before and during the load test.
DB Connections: Variation in the number of open database connections during the load test.
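As an illustration of how some of these measurements relate, the sketch below (Python; the raw timing values are invented for the example) derives Hit Time and Response Time from their component times, as defined above, and computes passed transactions per second:

    # Component times for one hit, in seconds (illustrative values only).
    connect_time = 0.05
    send_time    = 0.02
    wait_time    = 0.30
    receive_time = 0.10
    process_time = 0.03

    response_time = wait_time + receive_time                          # as defined above
    hit_time = connect_time + send_time + response_time + process_time

    print(f"Response Time: {response_time:.2f}s, Hit Time: {hit_time:.2f}s")

    # Transactions/sec (passed) over a measurement window.
    passed_transactions = 4_620
    window_seconds = 300
    print(f"Transactions/sec (passed): {passed_transactions / window_seconds:.1f}")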