Saturday, July 11, 2009

SQL for Beginners

The database model

-------Name Table-------

Field      | Type
-----------|--------
NameId     | Long
Surname    | String
FirstName  | String
MiddleName | String
Male       | Boolean
AddressId  | Long

-------Address Table-------

Field     | Type
----------|--------
AddressId | Long
Line1     | String
Line2     | String
City      | String
ZipCode   | String

  

The two tables are linked by AddressId in a one-to-many relationship.

This means that many names can be linked to one address.


 

The data in the tables

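Name Table

NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
3      | Wright  | Anne      |            | false | 3
4      | Jones   | Emily     | Anne       | false | 1
5      | Wright  | David     | Peter      | true  | 3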


 

Address Table

AddressId | Line1    | Line2   | City   | ZipCode
----------|----------|---------|--------|--------
1         | A Street |         | London |
2         | A Road   | A Town  | Oxon   |
3         | A House  | Village | Oxon   | OX1 3ED
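You can try the queries in this post yourself. Here is a minimal sketch that rebuilds the two tables in an in-memory SQLite database from Python. Using SQLite is an assumption, since the post itself targets Access and Oracle. The column types are also an assumption: SQLite has no Long or Boolean type, so INTEGER is used and Male is stored as 1 (true) or 0 (false). The helper name load_sample_db is hypothetical.

```python
import sqlite3


def load_sample_db() -> sqlite3.Connection:
    """Rebuild the post's Address and Name tables in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Address (
            AddressId INTEGER PRIMARY KEY,
            Line1 TEXT, Line2 TEXT, City TEXT, ZipCode TEXT);
        CREATE TABLE Name (
            NameId INTEGER PRIMARY KEY,
            Surname TEXT, FirstName TEXT, MiddleName TEXT,
            Male INTEGER,  -- no Boolean type in SQLite: 1 = true, 0 = false
            AddressId INTEGER REFERENCES Address(AddressId));  -- the one-to-many link
    """)
    # Empty cells in the tables above are stored as NULL (Python None).
    db.executemany("INSERT INTO Address VALUES (?, ?, ?, ?, ?)", [
        (1, "A Street", None, "London", None),
        (2, "A Road", "A Town", "Oxon", None),
        (3, "A House", "Village", "Oxon", "OX1 3ED"),
    ])
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_sample_db()
# Many names can share one address: AddressId 1 is used by NameId 1 and NameId 4.
at_address_1 = [row[0] for row in db.execute(
    "SELECT NameId FROM Name WHERE AddressId = 1 ORDER BY NameId")]
print(at_address_1)  # [1, 4]
```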

So let's get down to the SQL.

SELECT

SYNTAX: SELECT * | [{tablename}.]{fieldname}[, [{tablename}.]{fieldname} ...] FROM {tablename}

This allows us to return all, or a subset, of the data in the tables.

SQL: to return all the fields and records in the name table

SELECT * FROM Name;

Result:

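NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
3      | Wright  | Anne      |            | false | 3
4      | Jones   | Emily     | Anne       | false | 1
5      | Wright  | David     | Peter      | true  | 3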

Or we could return only certain fields.

SQL: to return the NameId, Surname and FirstName fields of all the records in the Name table

SELECT NameId, Surname, FirstName FROM Name;

Result:

NameID | Surname | FirstName
-------|---------|----------
1      | Smith   | Andrew
2      | Smithe  | Fred
3      | Wright  | Anne
4      | Jones   | Emily
5      | Wright  | David
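The two SELECT forms above can be run against a SQLite copy of the Name table. This is a minimal sketch: SQLite stands in for Access/Oracle, Male is stored as 1/0, and load_names is a hypothetical helper. ORDER BY (covered later in this post) is added only so the row order is predictable.

```python
import sqlite3


def load_names() -> sqlite3.Connection:
    """Hypothetical helper: the sample Name table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER)")
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_names()
all_fields = db.execute("SELECT * FROM Name ORDER BY NameId").fetchall()
some_fields = db.execute(
    "SELECT NameId, Surname, FirstName FROM Name ORDER BY NameId").fetchall()
print(len(all_fields), len(all_fields[0]))  # 5 6
print(some_fields[0])                       # (1, 'Smith', 'Andrew')
```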

WHERE

To return only a subset of the records, we can add a WHERE clause to the end of the statement.

SQL: to return all the fields, but only the records whose Surname is exactly 'Smith' (so 'Smithe' is not returned)

SELECT * FROM Name WHERE Surname='Smith';

Result:

NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1

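To search for a pattern rather than an exact value, use LIKE with the % wildcard, which matches any sequence of characters. (Microsoft Access in its default query mode uses * as this wildcard instead of %.)

SQL: to return all the fields, but only the records whose FirstName starts with 'An'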

SELECT * FROM Name WHERE Firstname Like 'An%';

Result:

NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
3      | Wright  | Anne      |            | false | 3
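Here is a sketch contrasting an exact match (=) with a pattern match (LIKE) on the same sample data. It runs in SQLite, which is an assumption, and load_names and name_ids are hypothetical helpers. SQLite's LIKE ignores case for ASCII letters by default, and other databases may differ. The WHERE text is pasted into the SQL only because these are fixed example strings. Values typed in by users should go through ? placeholders instead.

```python
import sqlite3


def load_names() -> sqlite3.Connection:
    """Hypothetical helper: the sample Name table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER)")
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_names()


def name_ids(where_clause: str) -> list:
    """Return the NameIds matched by a fixed WHERE clause, in NameId order."""
    sql = f"SELECT NameId FROM Name WHERE {where_clause} ORDER BY NameId"
    return [row[0] for row in db.execute(sql)]


exact = name_ids("Surname = 'Smith'")          # 'Smithe' is not an exact match
first_an = name_ids("FirstName LIKE 'An%'")    # Andrew, Anne
smith_prefix = name_ids("Surname LIKE 'Smith%'")  # Smith and Smithe
print(exact, first_an, smith_prefix)  # [1] [1, 3] [1, 2]
```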

There is no limit to the number of conditions we can combine in the WHERE clause using AND and OR.

SQL: to return all fields from the records that have the Surname 'Wright' AND that are male

SELECT * FROM Name WHERE Surname='Wright' AND Male=True;

Result:

NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
5      | Wright  | David     | Peter      | true  | 3

SQL: to return all the records that have the Surname 'Wright' OR that are Male

SELECT * FROM Name WHERE Surname='Wright' OR Male=True;

Result:

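NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
3      | Wright  | Anne      |            | false | 3
5      | Wright  | David     | Peter      | true  | 3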

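SQL: to return the records whose Surname starts with 'Smith' and whose MiddleName is 'John', together with all the records that are not male. The parentheses control which conditions are combined first.

SELECT * FROM Name WHERE (Surname Like 'Smith%' AND MiddleName='John') Or Male=False;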

Result:

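NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
3      | Wright  | Anne      |            | false | 3
4      | Jones   | Emily     | Anne       | false | 1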
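How AND, OR and parentheses interact can be checked directly. The sketch below assumes SQLite, with Male stored as 1/0, and uses the hypothetical helpers load_names and name_ids. In SQL, AND binds more tightly than OR. So removing the parentheses from the last query above changes nothing, but moving them around the OR does change the result.

```python
import sqlite3


def load_names() -> sqlite3.Connection:
    """Hypothetical helper: the sample Name table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER)")
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_names()


def name_ids(where_clause: str) -> list:
    """Return the NameIds matched by a fixed WHERE clause, in NameId order."""
    sql = f"SELECT NameId FROM Name WHERE {where_clause} ORDER BY NameId"
    return [row[0] for row in db.execute(sql)]


wright_and_male = name_ids("Surname = 'Wright' AND Male = 1")
wright_or_male = name_ids("Surname = 'Wright' OR Male = 1")
grouped_and = name_ids("(Surname LIKE 'Smith%' AND MiddleName = 'John') OR Male = 0")
# AND is evaluated before OR, so this matches the same rows as grouped_and...
no_parens = name_ids("Surname LIKE 'Smith%' AND MiddleName = 'John' OR Male = 0")
# ...but grouping the OR instead gives a different answer.
grouped_or = name_ids("Surname LIKE 'Smith%' AND (MiddleName = 'John' OR Male = 0)")
print(wright_and_male, wright_or_male)  # [5] [1, 2, 3, 5]
print(grouped_and, no_parens, grouped_or)  # [1, 2, 3, 4] [1, 2, 3, 4] [1, 2]
```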

ORDER BY

It is possible to have the records returned in a certain order by adding an ORDER BY clause:

  • Ascending order is A to Z, 0 to 9
  • Descending order is Z to A, 9 to 0

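If neither ASC nor DESC is specified, the records are sorted in ascending order. Without an ORDER BY clause at all, the database does not guarantee any particular order.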

SQL: to return the Surname and Firstname from all the records sorted in ascending order by the Firstname

SELECT Firstname, Surname FROM Name ORDER BY FirstName;

or

SELECT Firstname, Surname FROM Name ORDER BY FirstName ASC;

Result:

FirstName | Surname
----------|--------
Andrew    | Smith
Anne      | Wright
David     | Wright
Emily     | Jones
Fred      | Smithe

SQL: to return the Surname and Firstname from all the records sorted in descending order by the Firstname

SELECT Firstname, Surname FROM Name ORDER BY FirstName DESC;

Result:

FirstName | Surname
----------|--------
Fred      | Smithe
Emily     | Jones
David     | Wright
Anne      | Wright
Andrew    | Smith
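Here is a quick check of both sort directions on the sample data. It runs in SQLite (an assumption), and load_names is a hypothetical helper. Because every first name is different, the descending list is exactly the ascending list reversed.

```python
import sqlite3


def load_names() -> sqlite3.Connection:
    """Hypothetical helper: the sample Name table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER)")
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_names()
ascending = db.execute(
    "SELECT FirstName, Surname FROM Name ORDER BY FirstName").fetchall()  # ASC is the default
descending = db.execute(
    "SELECT FirstName, Surname FROM Name ORDER BY FirstName DESC").fetchall()
print(ascending[:2])                   # [('Andrew', 'Smith'), ('Anne', 'Wright')]
print(descending == ascending[::-1])   # True
```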

GROUP BY

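It is also possible to group identical information together with GROUP BY, which combines records that have the same values in the grouped fields into a single row. You have to list the fields that you want returned. You cannot use * to mean the whole table, as we have been doing in the previous examples. Every returned field must either appear in the GROUP BY clause or be an aggregate such as COUNT(*). Grouping does not by itself guarantee the order of the results, so add an ORDER BY clause if you need them sorted.

SQL: to return the male records, grouped by MiddleName, then Surname and finally FirstName

SELECT Surname, Firstname, MiddleName FROM Name WHERE Male=True GROUP BY Middlename, Surname, Firstname;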

Result:

Surname | FirstName | MiddleName
--------|-----------|-----------
Smith   | Andrew    | John
Smithe  | Fred      | John
Wright  | David     | Peter
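The sketch below runs the GROUP BY query above in SQLite (an assumption), with an ORDER BY added so the order is guaranteed. It then shows a more typical use of grouping, where each group collapses into one row with an aggregate count. The COUNT query is an illustrative addition, and load_names is a hypothetical helper.

```python
import sqlite3


def load_names() -> sqlite3.Connection:
    """Hypothetical helper: the sample Name table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER)")
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_names()
# The post's query; ORDER BY pins down the order that GROUP BY alone does not promise.
grouped = db.execute(
    "SELECT Surname, FirstName, MiddleName FROM Name WHERE Male = 1 "
    "GROUP BY MiddleName, Surname, FirstName "
    "ORDER BY MiddleName, Surname, FirstName").fetchall()
# One row per surname, with how many records share it.
per_surname = db.execute(
    "SELECT Surname, COUNT(*) FROM Name GROUP BY Surname ORDER BY Surname").fetchall()
print(grouped)
print(per_surname)  # [('Jones', 1), ('Smith', 1), ('Smithe', 1), ('Wright', 2)]
```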

What if we want to select the address for each name? For that we need to use a JOIN. The way tables are joined together in a SQL statement depends on the database, so I will give you two forms: Access and Oracle. (Oracle 9i and later also accept the INNER JOIN ... ON form.)

Access SQL:

SELECT Address.*, Name.* FROM Address INNER JOIN Name ON Address.AddressId = Name.AddressId;

Oracle SQL:

SELECT * FROM Address, Name WHERE Address.AddressId = Name.AddressId;

Result: 

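AddressId | Line1    | Line2   | City   | ZipCode | NameId | Surname | FirstName | MiddleName | Male  | AddressId
----------|----------|---------|--------|---------|--------|---------|-----------|------------|-------|----------
1         | A Street |         | London |         | 1      | Smith   | Andrew    | John       | true  | 1
1         | A Street |         | London |         | 4      | Jones   | Emily     | Anne       | false | 1
2         | A Road   | A Town  | Oxon   |         | 2      | Smithe  | Fred      | John       | true  | 2
3         | A House  | Village | Oxon   | OX1 3ED | 3      | Wright  | Anne      |            | false | 3
3         | A House  | Village | Oxon   | OX1 3ED | 5      | Wright  | David     | Peter      | true  | 3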
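SQLite happens to accept both join styles shown above, so one sketch can check that they return the same rows. SQLite is an assumption here, load_sample_db is a hypothetical helper, and ORDER BY is added only to fix the row order.

```python
import sqlite3


def load_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: the post's Address and Name tables in memory."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Address (AddressId INTEGER PRIMARY KEY, Line1 TEXT,"
               " Line2 TEXT, City TEXT, ZipCode TEXT)")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER REFERENCES Address(AddressId))")
    db.executemany("INSERT INTO Address VALUES (?, ?, ?, ?, ?)", [
        (1, "A Street", None, "London", None),
        (2, "A Road", "A Town", "Oxon", None),
        (3, "A House", "Village", "Oxon", "OX1 3ED"),
    ])
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_sample_db()
explicit = db.execute(   # Access-style INNER JOIN ... ON
    "SELECT Address.*, Name.* FROM Address INNER JOIN Name "
    "ON Address.AddressId = Name.AddressId ORDER BY Name.NameId").fetchall()
implicit = db.execute(   # Oracle-style join condition in the WHERE clause
    "SELECT * FROM Address, Name WHERE Address.AddressId = Name.AddressId "
    "ORDER BY Name.NameId").fetchall()
print(len(explicit), explicit == implicit)  # 5 True
```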

We can also add a WHERE clause to the end of the statement:

Access SQL:

SELECT Address.*, Name.* FROM Address INNER JOIN Name ON Address.AddressId = Name.AddressId WHERE Name.Surname='Wright';

Oracle SQL:

SELECT * FROM Address, Name WHERE Address.AddressId = Name.AddressId AND Name.Surname='Wright';

Result: 

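AddressId | Line1    | Line2   | City   | ZipCode | NameId | Surname | FirstName | MiddleName | Male  | AddressId
----------|----------|---------|--------|---------|--------|---------|-----------|------------|-------|----------
3         | A House  | Village | Oxon   | OX1 3ED | 3      | Wright  | Anne      |            | false | 3
3         | A House  | Village | Oxon   | OX1 3ED | 5      | Wright  | David     | Peter      | true  | 3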
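Both tables have an AddressId column, so inside a join an unqualified AddressId is ambiguous. The sketch below, in SQLite (an assumption), shows the database rejecting the unqualified form and running the fully qualified join with a WHERE clause. load_sample_db is a hypothetical helper.

```python
import sqlite3


def load_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: the post's Address and Name tables in memory."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Address (AddressId INTEGER PRIMARY KEY, Line1 TEXT,"
               " Line2 TEXT, City TEXT, ZipCode TEXT)")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER REFERENCES Address(AddressId))")
    db.executemany("INSERT INTO Address VALUES (?, ?, ?, ?, ?)", [
        (1, "A Street", None, "London", None),
        (2, "A Road", "A Town", "Oxon", None),
        (3, "A House", "Village", "Oxon", "OX1 3ED"),
    ])
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_sample_db()
try:
    db.execute("SELECT * FROM Address, Name WHERE AddressId = Name.AddressId")
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)  # SQLite cannot tell which AddressId is meant

wrights = db.execute(
    "SELECT Name.NameId, Name.FirstName, Address.ZipCode FROM Address "
    "INNER JOIN Name ON Address.AddressId = Name.AddressId "
    "WHERE Name.Surname = 'Wright' ORDER BY Name.NameId").fetchall()
print(ambiguous_error)
print(wrights)  # [(3, 'Anne', 'OX1 3ED'), (5, 'David', 'OX1 3ED')]
```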

Modifying records

It's all very well being able to select records, but now let's look at how to modify them.

A SELECT statement will not return an error as long as the SQL is correct: it might return no records, but it will run. Queries that modify records, however, can return errors. You must make sure that every field that requires a value is populated and that each field receives the correct type of data (no letters in number fields, etc.). Otherwise the database will not be able to save the record and will return an error.

Another hiccup arises when a relationship between two or more tables is enforced. You may find that you cannot add data to one table before there is a corresponding record in another table (e.g. we have to have an address in the Address table before we can create a record in the Name table that links to it). Relationships can cause problems with deleting a record as well. If records in another table are joined to the record you are trying to delete, this will again cause an error and stop the process.
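The errors described above can be reproduced in SQLite (an assumption). The schema here is cut down to three columns, and the NOT NULL rule on Surname is a hypothetical "required field". SQLite only enforces REFERENCES when the foreign_keys pragma is switched on.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite ignores REFERENCES unless this is on
db.execute("CREATE TABLE Address (AddressId INTEGER PRIMARY KEY, Line1 TEXT)")
db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY,"
           " Surname TEXT NOT NULL,"  # hypothetical rule: a surname is required
           " AddressId INTEGER REFERENCES Address(AddressId))")
db.execute("INSERT INTO Address VALUES (1, 'A Street')")
db.execute("INSERT INTO Name VALUES (1, 'Smith', 1)")


def try_sql(sql: str) -> str:
    """Run one statement and report 'ok' or the integrity error SQLite raised."""
    try:
        db.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"error: {exc}"


checks = {
    "missing surname": "INSERT INTO Name VALUES (2, NULL, 1)",
    "unknown address": "INSERT INTO Name VALUES (3, 'Jones', 99)",
    "address still in use": "DELETE FROM Address WHERE AddressId = 1",
    "valid insert": "INSERT INTO Name VALUES (4, 'Jones', 1)",
}
outcomes = {label: try_sql(sql) for label, sql in checks.items()}
for label, outcome in outcomes.items():
    print(f"{label}: {outcome}")
```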

UPDATE

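SYNTAX: UPDATE {tablename} SET [{tablename}.]{fieldname}={newvalue}[, [{tablename}.]{fieldname}={newvalue}] WHERE {criteria}

So if we want to change a record, we use the WHERE clause to pick it out. Be careful: without a WHERE clause, every record in the table is updated.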

SQL:

UPDATE Name SET Surname='Dickens' WHERE NameID=3;

Before update:

NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
3      | Wright  | Anne      |            | false | 3

After update:

NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
3      | Dickens | Anne      |            | false | 3

SQL:

UPDATE Name SET Surname='Wright', FirstName='Ann' WHERE NameID=3;

Before update:

NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
3      | Dickens | Anne      |            | false | 3

After update:

NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
3      | Wright  | Ann       |            | false | 3
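Here are the two UPDATE statements run against a SQLite copy of the table (an assumption), with load_names as a hypothetical helper. The cursor's rowcount reports how many records an UPDATE changed. Counting the matches with a SELECT first, using the same WHERE clause, is a cheap way to avoid updating more records than intended.

```python
import sqlite3


def load_names() -> sqlite3.Connection:
    """Hypothetical helper: the sample Name table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER)")
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_names()
changed = db.execute("UPDATE Name SET Surname = 'Dickens' WHERE NameId = 3").rowcount
db.execute("UPDATE Name SET Surname = 'Wright', FirstName = 'Ann' WHERE NameId = 3")
row3 = db.execute("SELECT Surname, FirstName FROM Name WHERE NameId = 3").fetchone()
# How many records would "WHERE Surname = 'Wright'" touch? Check before updating.
wrights = db.execute("SELECT COUNT(*) FROM Name WHERE Surname = 'Wright'").fetchone()[0]
print(changed, row3, wrights)  # 1 ('Wright', 'Ann') 2
```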

INSERT

OK, so now we need to add new records to the table. For this we use the INSERT command.

SYNTAX: INSERT INTO {Tablename}({fieldname}[,{fieldname}]) VALUES ({value}[,{value}])

So to add a record, we list only the fields we want to fill in. The fields we leave out (here MiddleName and AddressId) stay empty.

SQL:

INSERT INTO Name(NameId, Surname, FirstName, Male) VALUES (6, 'Davis', 'Ivan', true);

Result:

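NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
3      | Wright  | Anne      |            | false | 3
4      | Jones   | Emily     | Anne       | false | 1
5      | Wright  | David     | Peter      | true  | 3
6      | Davis   | Ivan      |            | true  |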
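Here is the INSERT above run in SQLite (an assumption, so true is written as 1), with load_names as a hypothetical helper. Fields missing from the column list come back as NULL, which Python shows as None.

```python
import sqlite3


def load_names() -> sqlite3.Connection:
    """Hypothetical helper: the sample Name table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER)")
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_names()
db.execute("INSERT INTO Name (NameId, Surname, FirstName, Male) VALUES (6, 'Davis', 'Ivan', 1)")
new_row = db.execute("SELECT * FROM Name WHERE NameId = 6").fetchone()
total = db.execute("SELECT COUNT(*) FROM Name").fetchone()[0]
print(new_row, total)  # (6, 'Davis', 'Ivan', None, 1, None) 6
```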

To add records to a table from another table assume we had another table called OtherNames:

OtherNames

Surname | Name  | Age
--------|-------|----
Green   | Vicky | 12
Black   | Steve | 32
Howells | Zara  | 25

SQL:

INSERT INTO Name(Surname, FirstName) SELECT Surname, Name FROM OtherNames;

Result:

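NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
3      | Wright  | Anne      |            | false | 3
4      | Jones   | Emily     | Anne       | false | 1
5      | Wright  | David     | Peter      | true  | 3
       | Green   | Vicky     |            |       |
       | Black   | Steve     |            |       |
       | Howells | Zara      |            |       |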

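Notice that the SELECT statement is written in just the same way as if it were a SQL query on its own, so we could have added only a subset of OtherNames to the Name table. (The new records are shown without a NameId. If NameId is a required field, for example the table's primary key, you would need to supply it or have the database generate it automatically.)

For example:

SQL:

INSERT INTO Name(Surname, FirstName) SELECT Surname, Name FROM OtherNames WHERE Age < 30;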

Result:

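NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
3      | Wright  | Anne      |            | false | 3
4      | Jones   | Emily     | Anne       | false | 1
5      | Wright  | David     | Peter      | true  | 3
       | Green   | Vicky     |            |       |
       | Howells | Zara      |            |       |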
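Here is the INSERT ... SELECT with a WHERE clause run in SQLite (an assumption), with load_names as a hypothetical helper. The SELECT follows the column list directly. One SQLite-specific detail is that a column declared INTEGER PRIMARY KEY is filled in automatically when left out, so the new records receive NameIds 6 and 7 rather than staying empty.

```python
import sqlite3


def load_names() -> sqlite3.Connection:
    """Hypothetical helper: the sample Name table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER)")
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


db = load_names()
db.execute("CREATE TABLE OtherNames (Surname TEXT, Name TEXT, Age INTEGER)")
db.executemany("INSERT INTO OtherNames VALUES (?, ?, ?)",
               [("Green", "Vicky", 12), ("Black", "Steve", 32), ("Howells", "Zara", 25)])
inserted = db.execute(
    "INSERT INTO Name (Surname, FirstName) "
    "SELECT Surname, Name FROM OtherNames WHERE Age < 30").rowcount
added = db.execute("SELECT NameId, Surname, FirstName FROM Name WHERE NameId > 5 "
                   "ORDER BY Surname").fetchall()
print(inserted, added)
```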

DELETE

Deleting records is achieved with the DELETE command.

SYNTAX: DELETE FROM {TableName} WHERE {criteria}

Be careful: if you leave out the WHERE clause, every record in the table is deleted. So if we wanted a table with just the women in it, we could use the following:

SQL:

DELETE FROM Name WHERE Male=true;

Before Delete:

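NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
3      | Wright  | Anne      |            | false | 3
4      | Jones   | Emily     | Anne       | false | 1
5      | Wright  | David     | Peter      | true  | 3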

After Delete:

NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
3      | Wright  | Anne      |            | false | 3
4      | Jones   | Emily     | Anne       | false | 1

Or if we wanted to delete just one record:

SQL:

DELETE FROM Name WHERE NameId=3;

Before Delete:

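NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
3      | Wright  | Anne      |            | false | 3
4      | Jones   | Emily     | Anne       | false | 1
5      | Wright  | David     | Peter      | true  | 3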

After Delete:

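NameID | Surname | FirstName | MiddleName | Male  | AddressId
-------|---------|-----------|------------|-------|----------
1      | Smith   | Andrew    | John       | true  | 1
2      | Smithe  | Fred      | John       | true  | 2
4      | Jones   | Emily     | Anne       | false | 1
5      | Wright  | David     | Peter      | true  | 3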
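Here are both DELETE examples run on fresh copies of the table in SQLite (an assumption). load_names and delete_and_report are hypothetical helpers. Before deleting, the sketch counts the matching records with the same WHERE clause, a simple habit that catches a too-broad condition before any data is lost. The WHERE text is inlined only because these are fixed example strings.

```python
import sqlite3


def load_names() -> sqlite3.Connection:
    """Hypothetical helper: the sample Name table in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Name (NameId INTEGER PRIMARY KEY, Surname TEXT, FirstName TEXT,"
               " MiddleName TEXT, Male INTEGER, AddressId INTEGER)")
    db.executemany("INSERT INTO Name VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Smith", "Andrew", "John", 1, 1),
        (2, "Smithe", "Fred", "John", 1, 2),
        (3, "Wright", "Anne", None, 0, 3),
        (4, "Jones", "Emily", "Anne", 0, 1),
        (5, "Wright", "David", "Peter", 1, 3),
    ])
    return db


def delete_and_report(where_clause: str) -> tuple:
    """Preview, delete, and list the remaining NameIds for a fixed WHERE clause."""
    db = load_names()
    matches = db.execute(f"SELECT COUNT(*) FROM Name WHERE {where_clause}").fetchone()[0]
    deleted = db.execute(f"DELETE FROM Name WHERE {where_clause}").rowcount
    remaining = [row[0] for row in db.execute("SELECT NameId FROM Name ORDER BY NameId")]
    return matches, deleted, remaining


women_only = delete_and_report("Male = 1")
one_record = delete_and_report("NameId = 3")
print(women_only)  # (3, 3, [3, 4])
print(one_record)  # (1, 1, [1, 2, 4, 5])
```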

This, I hope, has given you a simple idea of how SQL works. There is a lot more to it, but knowing this should allow you to create small database applications.

Tuesday, July 7, 2009

JMeter Tips

JMeter is a popular open source tool for load testing, with many useful modeling features such as thread group, timer, and HTTP sampler elements. This article complements the JMeter User's Manual and provides guidelines for using some of the JMeter modeling elements to develop a quality test script.

This article also addresses an important issue in a larger context: specifying precise response-time requirements and validating test results. Specifically, a rigorous statistical method, the confidence interval analysis, is applied.

Please note that I assume readers know the basics of JMeter. This article's examples are based on JMeter 2.0.3.

Determine a thread group's ramp-up period

The first ingredient in your JMeter script is a thread group, so let's review it first. As shown in Figure 1, a Thread Group element contains the following parameters:

  • Number of threads.
  • The ramp-up period.
  • The number of times to execute the test.
  • When started, whether the test runs immediately or waits until a scheduled time. If the latter, the Thread Group element must also include the start and end times.


Figure 1. JMeter Thread Group.

Each thread executes the test plan independently of other threads. Therefore, a thread group is used to model concurrent users. If the client machine running JMeter lacks enough computing power to model a heavy load, JMeter's distributed testing feature allows you to control multiple remote JMeter engines from a single JMeter console.

The ramp-up period tells JMeter the amount of time for creating the total number of threads. The default value is 0. If the ramp-up period is left unspecified, i.e., the ramp-up period is zero, JMeter will create all the threads immediately. If the ramp-up period is set to T seconds, and the total number of threads is N, JMeter will create a thread every T/N seconds.
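To make the T/N rule concrete, here is a small Python model of when each thread would start. It only restates the rule above and is not JMeter's own scheduling code. The function name thread_start_times is hypothetical.

```python
def thread_start_times(num_threads: int, ramp_up_s: float) -> list:
    """Start offsets in seconds when N threads are spread evenly over a T-second ramp-up."""
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    interval = ramp_up_s / num_threads  # one new thread every T/N seconds
    return [i * interval for i in range(num_threads)]


print(thread_start_times(4, 10))  # [0.0, 2.5, 5.0, 7.5]
print(thread_start_times(3, 0))   # [0.0, 0.0, 0.0]  (ramp-up 0: all threads at once)
```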

Most of a thread group's parameters are self-explanatory, but the ramp-up period is a bit trickier, since the appropriate number is not always obvious. For one thing, the ramp-up period should not be zero if you have a large number of threads. At the beginning of a load test, if the ramp-up period is zero, JMeter will create all the threads at once and send out requests immediately. This can saturate the server and, more importantly, misleadingly inflate the load. That is, the server could become overloaded not because the average hit rate is high, but because you send all the threads' first requests simultaneously, causing an unusual initial peak hit rate. You can see this effect with a JMeter Aggregate Report listener.

As this anomaly is undesirable, the rule of thumb for determining a reasonable ramp-up period is to keep the initial hit rate close to the average hit rate. Of course, you may need to run the test plan once before you can find a reasonable number.

By the same token, a large ramp-up period is also not appropriate, since the peak load may be underestimated. That is, some of the threads might not have even started, while some initial threads have already terminated.

So how do you verify that the ramp-up period is neither too small nor too large? First, guess the average hit rate and then calculate the initial ramp-up period by dividing the number of threads by the guessed hit rate. For example, if the number of threads is 100, and the estimated hit rate is 10 hits per second, the estimated ideal ramp-up period is 100/10 = 10 seconds. How do you come up with an estimated hit rate? There is no easy way. You just have to run the test script once first.

Second, add an Aggregate Report listener, shown in Figure 2, to the test plan; it contains the average hit rate of each individual request (JMeter samplers). The hit rate of the first sampler (e.g., an HTTP request) is closely related to the ramp-up period and the number of threads. Adjust the ramp-up period so the hit rate of the test plan's first sampler is close to the average hit rate of all other samplers.


Figure 2. JMeter Aggregate Report.

Third, verify in the JMeter log (located in JMeter_Home_Directory/bin) that the first thread that finishes does indeed finish after the last thread starts. The two events should be as far apart as possible.

In summary, the determination of a good ramp-up time is governed by the following two rules:

  • The first sampler's hit rate should be close to the average hit rate of the other samplers, which rules out a ramp-up period that is too small
  • The first thread that finishes should finish after the last thread starts, preferably as far apart as possible, which rules out a ramp-up period that is too large
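The initial estimate and the two rules above can be written as a small checker. This is a hypothetical sketch. The hit rates and thread start/end times are made-up inputs that you would read off the Aggregate Report and the JMeter log. The 20 percent tolerance for "close" is an assumption, not something JMeter defines.

```python
def initial_ramp_up(num_threads: int, estimated_hits_per_s: float) -> float:
    """First guess from the text: ramp-up = number of threads / estimated hit rate."""
    return num_threads / estimated_hits_per_s


def check_ramp_up(first_sampler_rate: float, other_sampler_rates: list,
                  thread_spans: list, tolerance: float = 0.2) -> tuple:
    """Apply both rules; thread_spans holds one (start_s, end_s) pair per thread."""
    average_rate = sum(other_sampler_rates) / len(other_sampler_rates)
    # Rule 1: the first sampler's hit rate is close to the others' average (not too small).
    rule1 = abs(first_sampler_rate - average_rate) <= tolerance * average_rate
    # Rule 2: the first thread to finish ends after the last thread starts (not too large).
    last_start = max(start for start, _ in thread_spans)
    first_finish = min(end for _, end in thread_spans)
    margin = first_finish - last_start  # larger is better
    return rule1, margin > 0, margin


print(initial_ramp_up(100, 10))  # 10.0
print(check_ramp_up(10.5, [10.0, 9.5, 10.5], [(0.0, 40.0), (5.0, 45.0), (9.9, 50.0)]))
```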

Sometimes the two rules conflict with each other. That is, you simply cannot find a suitable ramp-up period that passes both rules. A trivial test plan usually causes this problem, because, in such a plan, you lack enough samplers for each thread; thus, the test plan is too short, and a thread quickly finishes its work.

User think time, timer, and proxy server

An important element to consider in a load test is the think time, or the pause between successive requests. Various circumstances cause the delay: the user needs time to read the content, to fill out a form, or to search for the right link. Failure to properly consider think time often leads to seriously biased test results. For example, the estimated scalability, i.e., the maximum load (concurrent users) that the system can sustain, will appear lower than it really is, because virtual users that never pause send far more requests than real users do.

JMeter provides a set of timer elements to model the think time, but a question still remains: how do you determine an appropriate think time? Fortunately, JMeter offers a good answer: the JMeter HTTP Proxy Server element.

The proxy server records your actions while you browse a Web application with a normal browser (such as FireFox or Internet Explorer). In addition, JMeter creates a test plan when recording your actions. This feature is extremely convenient for several purposes:

  • You don't need to create an HTTP request manually, especially those tedious form parameters. (However, non-English parameters may not work correctly.) JMeter will record everything in the auto-generated requests, including hidden fields.
  • In the generated test plan, JMeter includes all the browser-generated HTTP headers for you, such as User-Agent (e.g., Mozilla/4.0) or Accept-Language (e.g., zh-tw,en-us;q=0.7,zh-cn;q=0.3).
  • JMeter can create timers of your choice, where delay time is set according to the actual delay during the recording period.

Let's see how to configure JMeter with the recording feature. In the JMeter console, right-click the WorkBench element and add the HTTP Proxy Server element. Note that you right-click the WorkBench element, not the Test Plan element, because the configuration here is for recording, not for an executable test plan. The HTTP Proxy Server element's purpose is for you to configure the browser's proxy server so all requests go through JMeter.


As illustrated in Figure 3, several fields must be configured for the HTTP Proxy Server element:

  • Port: The listening port used by the proxy server.
  • Target Controller: The controller where the proxy stores the generated samples. By default, JMeter will look for a recording controller in the current test plan and store the samples there. Alternatively, you can select any controller element listed in the menu. Usually, the default is okay.
  • Grouping: How you would like to group the generated elements in the test plan. Several options are available, and the most sensible one is probably "Store 1st sampler of each group only"; otherwise, URLs embedded in a page, such as those for images and JavaScript files, will be recorded as well. However, you may want to try the default "Do not group samples" option to find out exactly what JMeter creates for you in the test plan.
  • Patterns to Include and Patterns to Exclude: Help you filter out some unwanted requests.


Figure 3. JMeter Proxy Server.

When you click the Start button, the proxy server starts and begins recording the HTTP requests it receives. Of course, before clicking Start, you must configure your browser's proxy server setting.

You can add a timer as a child of the HTTP Proxy Server element, which will instruct JMeter to automatically add a timer as a child of the HTTP request it generates. JMeter automatically stores the actual time delay to a JMeter variable called T, so if you add a Gaussian random timer to the HTTP Proxy Server element, you should type ${T} in the Constant Delay field, as shown in Figure 4. This is another convenient feature that saves you a lot of time.


Figure 4. Add a Gaussian random timer to the HTTP Proxy Server element.

Note that a timer causes the affected samplers to be delayed. That is, the affected sampling requests are not sent before the specified delay time has passed since the last received response. Therefore, you should manually remove the first sampler's generated timer since the first sampler usually does not need one.

Before starting the HTTP proxy server, you should add a thread group to the test plan and then, to the thread group, add a recording controller, where the generated elements will be stored. Otherwise, those elements will be added to WorkBench directly. In addition, it is important to add an HTTP Request Defaults element (a Configuration element) to the recording controller, so that JMeter will leave blank those fields specified by the HTTP request defaults.

After the recording, stop the HTTP proxy server, then right-click the Recording Controller element to save the recorded elements in a separate file so you can retrieve them later. Don't forget to restore your browser's proxy server setting.

Specify response-time requirements and validate test results

Although not directly related to JMeter, specifying response-time requirements and validating test results are two critical tasks for load testing, with JMeter being the bridge that connects them.

In the context of Web applications, response time refers to the time elapsed between the submission of a request and the receipt of the resulting HTML. Technically, response time should include time for the browser to render the HTML page, but a browser typically displays the page piece by piece, making the perceived response time less. In addition, typically, a load-test tool calculates the response time without considering rendering time. Therefore, for practical purposes of performance testing, we adopt the definition described above. If in doubt, add a constant to the measured response time, say 0.5 seconds.

There is a set of well-known rules for determining response time criteria:

  • Users do not notice a delay of less than 0.1 second
  • A delay of less than 1 second does not interrupt a user's flow of thought, but some delay is noticed
  • Users will still wait for the response if it is delayed by less than 10 seconds
  • After 10 seconds, users lose focus and start doing something else

These thresholds are well known and won't change since they are directly related to the cognitive characteristics of humans. Though you should set your response-time requirements in accordance with these rules, you should also adjust them for your particular application. For example, Amazon.com's homepage abides by the rules above, but because it prefers a more stylistic look, it sacrifices a little response time.

At first glance, there appear to be two different ways to specify response-time requirements:

  • Average response time
  • Absolute response time; that is, the response times of all responses must be under the threshold

Specifying average response-time requirements is straightforward, but the fact that this requirement fails to take into account data variation is disturbing. What if the response time of 20 percent of the samples is more than three times the average? Note that JMeter calculates the average response time as well as the standard deviation for you in the Graph Results listener.

On the other hand, the absolute response-time requirement is quite stringent and statistically not practical. What if only 0.5 percent of the samples failed to pass the tests? Again, this is related to sampling variation. Fortunately, a rigorous statistical method does consider sampling variation: the confidence interval analysis.

Let's review basic statistics before going further.

The central limit theorem

The central limit theorem states that if the population distribution has mean $\mu$ and standard deviation $\sigma$, then, for sufficiently large $n$ ($n > 30$), the sampling distribution of the sample mean is approximately normal, with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

Note that it is the distribution of the sample mean that is normal; the distribution of the individual samples (the response times themselves) need not be. That is, if you run your test script many times, the resulting average response times will be approximately normally distributed.
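A short simulation shows the theorem at work. This sketch assumes made-up response times drawn from an exponential distribution with mean 1 second and standard deviation 1 second, which is skewed and clearly not normal. It repeats a "test" of n = 36 samples 2,000 times and checks the spread of the resulting averages. The repeat count and seed are arbitrary choices.

```python
import random
import statistics


def exponential_response_time(rng: random.Random) -> float:
    """One made-up response time in seconds: exponential, mean 1 s, sigma 1 s."""
    return rng.expovariate(1.0)


def sample_means(n: int, repeats: int, rng: random.Random) -> list:
    """Simulate 'repeats' test runs of n samples each and return every run's mean."""
    return [statistics.fmean(exponential_response_time(rng) for _ in range(n))
            for _ in range(repeats)]


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 36
means = sample_means(n, 2000, rng)
print(round(statistics.fmean(means), 3))  # close to mu = 1
print(round(statistics.stdev(means), 3))  # close to sigma / sqrt(n) = 1/6, about 0.167
```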

Figures 5 and 6 below show two normal distributions. In our context, the horizontal axis is the sample mean of response time, shifted so the population mean is at the origin. Figure 5 shows that 90 percent of the time, the sample means are within the interval ±Zσ, where Z=1.645 and σ is the standard deviation of the distribution shown. Figure 6 shows the 99-percent case, where Z=2.576. For a given probability, say 90 percent, we can look up the corresponding Z value from the normal curve and vice versa.



Figure 5. Z value for 90 percent



Figure 6. Z value for 99 percent

A few websites for normal curve calculation are listed in Resources. Note that on those sites, we can calculate the probability of either a symmetric bounded region (e.g., -1.5 < X < 1.5) or a cumulative area (e.g., X < 1.5). You may also look up approximate values from the tables below.

Table 1. Standard deviation range corresponding to a given confidence interval

Confidence level | Z
-----------------|---------
0.800            | ±1.28155
0.900            | ±1.64485
0.950            | ±1.95996
0.990            | ±2.57583
0.995            | ±2.80703
0.999            | ±3.29053


Table 2. Confidence interval corresponding to given standard deviation

Z | Confidence level
--|-----------------
1 | 0.6826895
2 | 0.9544997
3 | 0.9973002
4 | 0.9999366
5 | 0.9999994
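Both tables can be regenerated with Python's standard statistics.NormalDist (Python 3.8+) instead of an online calculator. Table 1 inverts the standard normal curve for a symmetric region, and Table 2 measures the area between -Z and Z. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_for_confidence(level: float) -> float:
    """Table 1: the Z such that P(-Z < X < Z) equals the given confidence level."""
    return STD_NORMAL.inv_cdf((1 + level) / 2)


def confidence_for_z(z: float) -> float:
    """Table 2: the probability of the symmetric region -Z < X < Z."""
    return STD_NORMAL.cdf(z) - STD_NORMAL.cdf(-z)


for level in (0.800, 0.900, 0.950, 0.990, 0.995, 0.999):
    print(f"{level:.3f}  +/-{z_for_confidence(level):.5f}")
for z in range(1, 6):
    print(f"{z}  {confidence_for_z(z):.7f}")
```

Small differences in the last printed digit come from rounding, whereas the printed tables appear to truncate.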


Confidence interval

The confidence interval is defined as [sample mean - Z*σ/√n, sample mean + Z*σ/√n]. For example, for a 90-percent confidence level, we can look up the Z value to be 1.645, and the confidence interval is [sample mean - 1.645*σ/√n, sample mean + 1.645*σ/√n]. This means that 90 percent of the intervals constructed this way contain the (unknown) population mean. That is, our measurement is "close." Note that if σ is larger, the confidence interval will be wider, which means that it is more likely that the upper bound of the interval will exceed an acceptable value. That is, if σ is larger, it is more likely that the result is not acceptable.

Response-time requirements

Let's translate all this information into response-time requirements. First, you can define the performance requirements like so: The upper bound of the 95-percent confidence interval of the average response time must be less than 5 seconds. Of course, you must add loading requirements and specify a particular scenario as well.

Now, after the performance tests, suppose you analyze the results and discover that the average response time is 4.5 seconds, while the standard deviation is 4.9 seconds. The sample size is 120. You then calculate the 95-percent confidence interval. By looking in Table 1, you find the Z value is 1.95996. Therefore the confidence interval is [4.5 - 1.95996*4.9/√120, 4.5 + 1.95996*4.9/√120], which is [3.62, 5.38]. The result is not acceptable, even though the average response time looks pretty good. In fact, you can verify that the result is not acceptable even for an 80-percent confidence interval. As you can see, applying confidence interval analysis gives you a much more precise method for estimating the quality of your tests.
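The worked example can be checked in a few lines of Python. This is a sketch using the standard statistics.NormalDist, and the function names are hypothetical. The last function asks the reverse question: at what confidence level does the interval's upper bound just reach the 5-second limit?

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def confidence_interval(mean: float, std_dev: float, n: int, level: float) -> tuple:
    """[mean - Z*sd/sqrt(n), mean + Z*sd/sqrt(n)] for a two-sided confidence level."""
    z = STD_NORMAL.inv_cdf((1 + level) / 2)
    half_width = z * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width


def meets_requirement(mean: float, std_dev: float, n: int, level: float, limit: float) -> bool:
    """The requirement from the text: the interval's upper bound must be below the limit."""
    return confidence_interval(mean, std_dev, n, level)[1] < limit


def level_at_limit(mean: float, std_dev: float, n: int, limit: float) -> float:
    """Confidence level at which the interval's upper bound just reaches the limit."""
    z = (limit - mean) / (std_dev / math.sqrt(n))
    return max(0.0, 2 * STD_NORMAL.cdf(z) - 1)


low, high = confidence_interval(4.5, 4.9, 120, 0.95)
print(f"[{low:.2f}, {high:.2f}]")                    # [3.62, 5.38]
print(meets_requirement(4.5, 4.9, 120, 0.95, 5.0))   # False
print(meets_requirement(4.5, 4.9, 120, 0.80, 5.0))   # False
print(round(level_at_limit(4.5, 4.9, 120, 5.0), 2))  # 0.74
```

For these numbers the 5-second requirement only holds at roughly a 74-percent confidence level.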

Note that in the context of Web applications, to measure a scenario's response time, we typically need to instruct the load-testing tool to send multiple requests, for example:

  1. Login
  2. Display a form
  3. Submit the form

Assume we are interested in Request 3. To conduct a confidence interval analysis, we need the average response time and the standard deviation of all of Request 3's samples, not the statistics of all samples.

Note that JMeter's Graph Results listener calculates the average response time and standard deviation of all requests. JMeter's Aggregate Report listener calculates the average response time of individual samplers for you, but, unfortunately, does not give the standard deviation.
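One way to get the per-sampler standard deviation is to save the raw results to a file and compute the statistics yourself. This sketch assumes the results were saved as CSV with a header row containing label (the sampler name) and elapsed (response time in milliseconds) columns, as newer JMeter versions write by default. Check your own file's header. The sample rows below are made up.

```python
import csv
import io
import statistics
from collections import defaultdict


def per_sampler_stats(csv_text: str) -> dict:
    """Mean and sample standard deviation of 'elapsed' (ms) for each sampler 'label'."""
    times = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        times[row["label"]].append(float(row["elapsed"]))
    return {label: (statistics.fmean(values),
                    statistics.stdev(values) if len(values) > 1 else 0.0)
            for label, values in times.items()}


sample = """timeStamp,elapsed,label
1000,120,Login
1100,300,Submit form
1200,140,Login
1300,500,Submit form
1400,400,Submit form
"""
stats = per_sampler_stats(sample)
print(stats["Submit form"])  # (400.0, 100.0)
```

To read a real file, pass `open(path, newline="").read()` instead of the sample string.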


In summary, specifying the requirement of average response times alone is dangerous, since it says nothing about data variation. What if the average response time is acceptable, but the requirement is met only at a 75-percent confidence level? Most likely, you cannot accept the result. Applying the confidence interval analysis, however, gives you much more certainty.

Conclusion

In this article, I have discussed:

  • A fine point of specifying loads with the JMeter Thread Group element
  • Guidelines for creating a JMeter test script automatically using the JMeter Proxy Server element, with emphasis on modeling user think time
  • Confidence interval analysis, a statistical method that we can leverage to specify better response-time requirements

You can improve the quality of your JMeter scripts with the techniques described in this article. From a larger viewpoint, what I have discussed is really part of a performance testing workflow, which differs from an ordinary functional testing workflow. A performance testing workflow includes, but is not limited to, the following activities:

  • Developing performance requirements
  • Selecting testing scenarios
  • Preparing environment for testing
  • Developing test scripts
  • Performing tests
  • Reviewing test scripts and test results
  • Identifying bottlenecks
  • Writing test reports

In addition, the performance test results, including the identified bottlenecks, are fed back to the development team or to an architect for additional optimization design. During this process, developing quality test scripts and reviewing test scripts are probably the trickiest parts and really need careful management. Armed with test-script writing guidelines and a good performance testing workflow, you will have a much better chance for optimizing the performance of your software under heavy loads.

About the author

Chi-Chang Kung is a Java architect with Sun Microsystems Taiwan. He is a member of IEEE Computer Society and ACM.