Tuesday, July 7, 2009

JMeter Tips

JMeter is a popular open source tool for load testing, with many useful modeling features such as thread group, timer, and HTTP sampler elements. This article complements the JMeter User's Manual and provides guidelines for using some of the JMeter modeling elements to develop a quality test script.

This article also addresses an important issue in a larger context: specifying precise response-time requirements and validating test results. Specifically, a rigorous statistical method, confidence interval analysis, is applied.

Please note that I assume readers know the basics of JMeter. This article's examples are based on JMeter 2.0.3.

Determine a thread group's ramp-up period

The first ingredient in your JMeter script is a thread group, so let's review it first. As shown in Figure 1, a Thread Group element contains the following parameters:

  • Number of threads.
  • The ramp-up period.
  • The number of times to execute the test.
  • When started, whether the test runs immediately or waits until a scheduled time. If the latter, the Thread Group element must also include the start and end times.


Figure 1. JMeter Thread Group

Each thread executes the test plan independently of other threads. Therefore, a thread group is used to model concurrent users. If the client machine running JMeter lacks enough computing power to model a heavy load, JMeter's distributed testing feature allows you to control multiple remote JMeter engines from a single JMeter console.

The ramp-up period tells JMeter how much time to take to create the total number of threads. The default value is 0; if the ramp-up period is zero, JMeter creates all the threads immediately. If the ramp-up period is set to T seconds and the total number of threads is N, JMeter creates a new thread every T/N seconds.

Most of a thread group's parameters are self-explanatory, but the ramp-up period is a bit trickier, since the appropriate number is not always obvious. For one thing, the ramp-up period should not be zero if you have a large number of threads. If the ramp-up period is zero at the beginning of a load test, JMeter creates all the threads at once and sends out their first requests immediately, potentially saturating the server and, more importantly, deceptively inflating the load. That is, the server could become overloaded, not because the average hit rate is high, but because you send all the threads' first requests simultaneously, causing an unusual initial peak hit rate. You can see this effect with a JMeter Aggregate Report listener.

Since this anomaly is not desirable, the rule of thumb for determining a reasonable ramp-up period is to keep the initial hit rate close to the average hit rate. Of course, you may need to run the test plan once before discovering a reasonable number.

By the same token, a large ramp-up period is also not appropriate, since the peak load may be underestimated. That is, some of the threads might not have even started, while some initial threads have already terminated.

So how do you verify that the ramp-up period is neither too small nor too large? First, guess the average hit rate and then calculate the initial ramp-up period by dividing the number of threads by the guessed hit rate. For example, if the number of threads is 100, and the estimated hit rate is 10 hits per second, the estimated ideal ramp-up period is 100/10 = 10 seconds. How do you come up with an estimated hit rate? There is no easy way. You just have to run the test script once first.
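In code, the estimate is just a division; the little sketch below plugs in the example numbers above (100 threads, an estimated 10 hits per second), both of which are guesses to be revised after a trial run:

    // Back-of-the-envelope ramp-up estimate: number of threads divided by the
    // guessed average hit rate. The inputs are the example values from the text.
    public class RampUpEstimate {
        public static void main(String[] args) {
            int threads = 100;               // Number of Threads in the Thread Group
            double estimatedHitRate = 10.0;  // guessed average hits per second
            double rampUpSeconds = threads / estimatedHitRate;
            System.out.println("Initial ramp-up guess: " + rampUpSeconds + " seconds"); // 10.0
        }
    }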

Second, add an Aggregate Report listener, shown in Figure 2, to the test plan; it contains the average hit rate of each individual request (JMeter samplers). The hit rate of the first sampler (e.g., an HTTP request) is closely related to the ramp-up period and the number of threads. Adjust the ramp-up period so the hit rate of the test plan's first sampler is close to the average hit rate of all other samplers.


Figure 2. JMeter Aggregate Report

Third, verify in the JMeter log (located in JMeter_Home_Directory/bin) that the first thread to finish does indeed finish after the last thread starts. The time difference between those two events should be as large as possible.

In summary, the determination of a good ramp-up time is governed by the following two rules:

  • The first sampler's hit rate should be close to the average hit rate of the other samplers, which guards against a ramp-up period that is too small
  • The first thread that finishes should indeed finish after the last thread starts, preferably with as much time between the two events as possible, which guards against a ramp-up period that is too large

Sometimes the two rules conflict with each other. That is, you simply cannot find a suitable ramp-up period that passes both rules. A trivial test plan usually causes this problem, because, in such a plan, you lack enough samplers for each thread; thus, the test plan is too short, and a thread quickly finishes its work.

User think time, timer, and proxy server

An important element to consider in a load test is the think time, or the pause between successive requests. Various circumstances cause the delay: the user needs time to read the content, fill out a form, or search for the right link. Failure to properly consider think time often leads to seriously biased test results. For example, the estimated scalability, i.e., the maximum load (concurrent users) that the system can sustain, will appear lower than it really is.

JMeter provides a set of timer elements to model the think time, but a question still remains: how do you determine an appropriate think time? Fortunately, JMeter offers a good answer: the JMeter HTTP Proxy Server element.

The proxy server records your actions while you browse a Web application with a normal browser (such as Firefox or Internet Explorer). In addition, JMeter creates a test plan when recording your actions. This feature is extremely convenient for several reasons:

  • You don't need to create an HTTP request manually, especially those tedious form parameters. (However, non-English parameters may not work correctly.) JMeter will record everything in the auto-generated requests, including hidden fields.
  • In the generated test plan, JMeter includes all the browser-generated HTTP headers for you, such as User-Agent (e.g., Mozilla/4.0) or Accept-Language (e.g., zh-tw,en-us;q=0.7,zh-cn;q=0.3).
  • JMeter can create timers of your choice, where delay time is set according to the actual delay during the recording period.

Let's see how to configure JMeter with the recording feature. In the JMeter console, right-click the WorkBench element and add the HTTP Proxy Server element. Note that you right-click the WorkBench element, not the Test Plan element, because the configuration here is for recording, not for an executable test plan. The HTTP Proxy Server element's purpose is for you to configure the browser's proxy server so all requests go through JMeter.


As illustrated in Figure 3, several fields must be configured for the HTTP Proxy Server element:

  • Port: The listening port used by the proxy server.
  • Target Controller: The controller where the proxy stores the generated samples. By default, JMeter will look for a recording controller in the current test plan and store the samples there. Alternatively, you can select any controller element listed in the menu. Usually, the default is okay.
  • Grouping: How you would like to group the generated elements in the test plan. Several options are available, and the most sensible one is probably "Store 1st sampler of each group only"; otherwise, URLs embedded in a page, such as those for images and JavaScript files, will be recorded as well. However, you may want to try the default "Do not group samples" option to find out exactly what JMeter creates for you in the test plan.
  • Patterns to Include and Patterns to Exclude: Help you filter out unwanted requests (see the example after Figure 3).


Figure 3. JMeter Proxy Server
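Both pattern fields take regular expressions that are matched against the requested URLs. The values below are only an illustrative starting point (record HTML and JSP pages, skip static resources); adjust them to your own application:

    Patterns to Include:   .*\.html
                           .*\.jsp
    Patterns to Exclude:   .*\.gif
                           .*\.jpg
                           .*\.css
                           .*\.js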

When you click the Start button, the proxy server starts and begins recording the HTTP requests it receives. Of course, before clicking Start, you must configure your browser's proxy server setting.
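Once the proxy is running, you can push a single request through it to confirm everything is wired up before you start clicking around. The sketch below is just such a check; the port (9090) and the target URL are placeholders, not values dictated by JMeter -- use the port you typed into the HTTP Proxy Server element and a page of the application you are recording:

    import java.net.HttpURLConnection;
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.URL;

    // Sends one request through the recording proxy to confirm it is listening.
    public class ProxyCheck {
        public static void main(String[] args) throws Exception {
            Proxy jmeterProxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("localhost", 9090));
            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://localhost:8080/myapp/").openConnection(jmeterProxy);
            System.out.println("Response code via JMeter proxy: " + conn.getResponseCode());
        }
    }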

You can add a timer as a child of the HTTP Proxy Server element, which instructs JMeter to automatically add a timer as a child of each HTTP request it generates. JMeter automatically stores the actual recorded delay in a JMeter variable called T, so if you add a Gaussian random timer to the HTTP Proxy Server element, you should type ${T} in the Constant Delay field, as shown in Figure 4. This is another convenient feature that saves you a lot of time.


Figure 4. Add a Gaussian random timer to the HTTP Proxy Server element

Note that a timer causes the affected samplers to be delayed. That is, the affected sampling requests are not sent before the specified delay time has passed since the last received response. Therefore, you should manually remove the first sampler's generated timer since the first sampler usually does not need one.
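To make the timer's two fields concrete: each pause is essentially the constant delay (here the recorded ${T}) plus a normally distributed deviation. The snippet below only illustrates that idea with made-up numbers; it is not JMeter's actual implementation:

    import java.util.Random;

    // Illustrates how a Gaussian random timer derives each pause from a constant
    // delay (playing the role of ${T}) and a deviation.
    public class ThinkTimeSketch {
        private static final Random RANDOM = new Random();

        static long pauseMillis(long constantDelayMs, double deviationMs) {
            return Math.abs((long) (RANDOM.nextGaussian() * deviationMs) + constantDelayMs);
        }

        public static void main(String[] args) {
            // e.g., a recorded delay of 4200 ms with a 500 ms deviation
            System.out.println(pauseMillis(4200, 500.0) + " ms");
        }
    }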

Before starting the HTTP proxy server, you should add a thread group to the test plan and then, to the thread group, add a recording controller, where the generated elements will be stored. Otherwise, those elements will be added to WorkBench directly. In addition, it is important to add an HTTP Request Defaults element (a Configuration element) to the recording controller, so that the generated samplers leave blank the fields already covered by the HTTP request defaults.

After the recording, stop the HTTP proxy server; right-click the Recording Controller element to save the recorded elements in a separate file so you can retrieve them later. Don't forget to restore your browser's proxy server setting.

Specify response-time requirements and validate test results

Although not directly related to JMeter, specifying response-time requirements and validating test results are two critical tasks for load testing, with JMeter being the bridge that connects them.

In the context of Web applications, response time refers to the time elapsed between the submission of a request and the receipt of the resulting HTML. Strictly speaking, response time should also include the time for the browser to render the HTML page, but a browser typically displays the page piece by piece, so the perceived response time is shorter. In addition, a load-test tool typically calculates the response time without considering rendering time. Therefore, for practical purposes of performance testing, we adopt the definition above. If in doubt, add a constant to the measured response time, say 0.5 seconds.

There is a set of well-known rules for determining response time criteria:

  • Users do not notice a delay of less than 0.1 second
  • A delay of less than 1 second does not interrupt a user's flow of thought, but some delay is noticed
  • Users will still wait for the response if it is delayed by less than 10 seconds
  • After 10 seconds, users lose focus and start doing something else

These thresholds are well known and won't change since they are directly related to the cognitive characteristics of humans. Though you should set your response-time requirements in accordance with these rules, you should also adjust them for your particular application. For example, Amazon.com's homepage abides by the rules above, but because it prefers a more stylistic look, it sacrifices a little response time.

At first glance, there appear to be two different ways to specify response-time requirements:

  • Average response time
  • Absolute response time; that is, the response times of all responses must be under the threshold

Specifying average response-time requirements is straightforward, but the fact that this requirement fails to take into account data variation is disturbing. What if the response time of 20 percent of the samples is more than three times the average? Note that JMeter calculates the average response time as well as the standard deviation for you in the Graph Results listener.

On the other hand, the absolute response-time requirement is quite stringent and statistically not practical. What if only 0.5 percent of the samples failed to pass the tests? Again, this is related to sampling variation. Fortunately, a rigorous statistical method does consider sampling variation: the confidence interval analysis.

Let's review basic statistics before going further.

The central limit theorem

The central limit theorem states that if the population distribution has mean μ and standard deviation σ, then, for sufficiently large n (> 30), the sampling distribution of the sample mean is approximately normal, with mean μ_mean = μ and standard deviation σ_mean = σ/√n.

Note that it is the distribution of the sample mean that is normal; the distribution of the samples themselves is not necessarily normal. That is, if you run your test script many times, the distribution of the resulting average response times will be approximately normal.

Figures 5 and 6 below show two normal distributions. In our context, the horizontal axis is the sampling mean of the response time, shifted so that the population mean is at the origin. Figure 5 shows that 90 percent of the time, the sampling mean falls within the interval ±Z·σ_mean, where Z = 1.645 and σ_mean is the standard deviation of the sampling mean defined above. Figure 6 shows the 99-percent case, where Z = 2.576. For a given probability, say 90 percent, we can look up the corresponding Z value on a normal curve, and vice versa.
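In other words, writing σ for the population standard deviation as before:

    P( μ − Z·σ/√n  ≤  sampling mean  ≤  μ + Z·σ/√n )  =  confidence level

with Z = 1.645 giving a 90-percent confidence level and Z = 2.576 giving 99 percent.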



Figure 5. Z value for 90 percent



Figure 6. Z value for 99 percent

A few Websites for normal curve calculation are listed in Resources. Note that on those sites, we can calculate the probability of either a symmetric bounded region (e.g., -1.5 < X < 1.5) or a cumulative area (e.g., X < 1.5). You may also look up approximate values in the tables below.

Table 1. Standard deviation range corresponding to a given confidence interval

    Confidence Interval    Z
    0.800                  ±1.28155
    0.900                  ±1.64485
    0.950                  ±1.95996
    0.990                  ±2.57583
    0.995                  ±2.80703
    0.999                  ±3.29053


Table 2. Confidence interval corresponding to a given standard deviation

    Z    Confidence Interval
    1    0.6826895
    2    0.9544997
    3    0.9973002
    4    0.9999366
    5    0.9999994


Confidence interval

The confidence interval is defined as [sampling mean - Z*σ/√n, sampling mean + Z*σ/√n]. For example, if the confidence interval is 90 percent, we can look up the Z value to be 1.645, and the confidence interval is [sampling mean - 1.645*σ/√n, sampling mean + 1.645*σ/√n], which means that 90 percent of the time, the (unknown) population mean is within this interval. That is, our measurement is "close." Note that if σ is larger, the confidence interval will be larger, which means that it is more likely that the upper bound of the interval will exceed an acceptable value. That is, if σ is larger, it is more likely that the result is not acceptable.

Response-time requirements

Let's translate all this information into response-time requirements. First, you can define the performance requirements like so: The upper bound of the 95-percent confidence interval of the average response time must be less than 5 seconds. Of course, you must add loading requirements and specify a particular scenario as well.

Now, after the performance tests, suppose you analyze the results and discover that the average response time is 4.5 seconds, while the standard deviation is 4.9 seconds. The sample size is 120. You then calculate the 95-percent confidence interval. By looking in Table 1, you find the Z value to be 1.95996. Therefore, the confidence interval is [4.5 - 1.95996*4.9/√120, 4.5 + 1.95996*4.9/√120], which is [3.62, 5.38]. The result is not acceptable, even though the average response time looks pretty good. In fact, you can verify that the result is not acceptable even for an 80-percent confidence interval. As you can see, applying confidence interval analysis gives you a much more precise method for estimating the quality of your tests.
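The arithmetic above is easy to reproduce; the little program below uses exactly the numbers from the example (mean 4.5 s, standard deviation 4.9 s, 120 samples, Z = 1.95996) and prints the same interval:

    // Reproduces the 95-percent confidence interval from the example above.
    public class ConfidenceInterval {
        public static void main(String[] args) {
            double mean = 4.5;     // average response time, in seconds
            double stdDev = 4.9;   // standard deviation, in seconds
            int n = 120;           // sample size
            double z = 1.95996;    // Z value for a 95-percent confidence interval (Table 1)

            double halfWidth = z * stdDev / Math.sqrt(n);
            System.out.printf("[%.2f, %.2f]%n", mean - halfWidth, mean + halfWidth); // [3.62, 5.38]
        }
    }

Lowering Z to 1.28155 (the 80-percent row of Table 1) still yields an upper bound of about 5.07 seconds, confirming the remark that the result fails even at 80 percent.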

Note that in the context of Web applications, to measure a scenario's response time, we typically need to instruct the load-testing tool to send multiple requests, for example:

  1. Login
  2. Display a form
  3. Submit the form

Assume we are interested in Request 3. To conduct a confidence interval analysis, we need the average response time and the standard deviation of all of Request 3's samples, not the statistics of all samples.

Note that JMeter's Graph Results listener calculates the average response time and standard deviation of all requests. JMeter's Aggregate Report listener calculates the average response time of individual samplers for you but, unfortunately, does not give the standard deviation.
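One workaround is to compute the per-sampler statistics yourself from a saved results file. The sketch below assumes the results were saved in CSV format with the elapsed time and sampler label in the second and third columns; the label "Submit the form", the file name, and the column positions are placeholders that depend on the jmeter.save.saveservice settings in your jmeter.properties:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.List;

    // Computes the mean and standard deviation of one sampler's response times
    // from a CSV results file. Adjust the label, file name, and column indexes
    // to match your own configuration.
    public class SamplerStats {
        public static void main(String[] args) throws Exception {
            String targetLabel = "Submit the form"; // hypothetical sampler label
            int elapsedColumn = 1;                  // assumed position of the elapsed-time column
            int labelColumn = 2;                    // assumed position of the label column

            List<Double> times = new ArrayList<Double>();
            BufferedReader in = new BufferedReader(new FileReader("results.csv"));
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",");
                if (fields.length > labelColumn && targetLabel.equals(fields[labelColumn])) {
                    times.add(Double.parseDouble(fields[elapsedColumn]));
                }
            }
            in.close();

            double sum = 0;
            for (double t : times) sum += t;
            double mean = sum / times.size();

            double squares = 0;
            for (double t : times) squares += (t - mean) * (t - mean);
            double stdDev = Math.sqrt(squares / times.size());

            System.out.println(times.size() + " samples, mean " + mean
                    + " ms, standard deviation " + stdDev + " ms");
        }
    }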


In summary, specifying a requirement on average response times alone is dangerous, since it says nothing about data variation. What if the average response time is acceptable, but the requirement holds only at a 75-percent confidence level? Most likely, you cannot accept the result. Applying confidence interval analysis, however, gives you much more certainty.

Conclusion

In this article, I have discussed:

  • A fine point of specifying loads with the JMeter Thread Group element
  • Guidelines for creating a JMeter test script automatically using the JMeter Proxy Server element, with emphasis on modeling user think time
  • Confidence interval analysis, a statistical method that we can leverage to specify better response-time requirements

You can improve the quality of your JMeter scripts with the techniques described in this article. From a larger viewpoint, what I have discussed is really part of a performance testing workflow, which differs from an ordinary functional testing workflow. A performance testing workflow includes, but is not limited to, the following activities:

  • Developing performance requirements
  • Selecting testing scenarios
  • Preparing environment for testing
  • Developing test scripts
  • Performing tests
  • Reviewing test scripts and test results
  • Identifying bottlenecks
  • Writing test reports

In addition, the performance test results, including the identified bottlenecks, are fed back to the development team or to an architect for further design optimization. During this process, developing quality test scripts and reviewing test scripts are probably the trickiest parts and really need careful management. Armed with test-script writing guidelines and a good performance testing workflow, you will have a much better chance of optimizing the performance of your software under heavy loads.

About the author

Chi-Chang Kung is a Java architect with Sun Microsystems Taiwan. He is a member of IEEE Computer Society and ACM.
