Certain tasks often require accelerating either the whole or some specific part of an application. If application works with a database, query execution time should be taken into consideration. That is when the SQL query optimization is called up.

In order to understand what queries need optimization, we need to perform some analysis first. Additionally, it can help us identify parts of our query that need improvement. In this article, we will explore how to optimize SQL queries by analyzing database query execution time and using this data to increase performance.


Contents:

Introduction
Searching critical parts
   SQL Server Profiler
Query execution analysis: Execution Plan
   Text execution plan
   Graphical Execution Plan
Query optimization
   Productive indexes
   When to not use indexes
   Optimal order of joining tables
   Hints for a query optimizer
   Statistics
Summary
References


Written by:
Anastasiya Karpenko,
Software Developer of Web Development Team

Introduction

There are many literature and internet publications on techniques and best practices of query optimization, covering all available database management systems, like Oracle and MySQL. But I decided to share my own experience in the field with the focus on SQL Server query optimization tips. I describe the steps I take and try to provide some examples for query optimization in SQL Server.

This article will provide some tips and tricks for any developers working with DBMS MS SQL Server, who want to increase the efficiency of their queries.

We will look at analysis tools provided by MS SQL Server Management Studio, and how to use them for SQL Server query optimization.

Searching critical parts

Before we start analyzing and optimizing SQL queries, we need to understand what queries require optimization.

Depending on a specific case, this task can have varying starting conditions:

  • You already know what query runs slowly;
  • You only know that your entire system has generally poor performance that requires some tuning.

In second case, we need to identify critical parts where improvements need to be made. Certain MS SQL Server query optimization tools are provided for this.

SQL Server Profiler

Query optimization in SQL Server is backed up by in-built tools. SQL Server Profiler is a tool bundled with MS SQL Server package. It helps us to trace queries received by the server. This allows us to see the text of the query and check the execution time.

You can run SQL Server Profiler from SQL Server Management Studio: go to Tools > SQL Server Profiler

Call SQL Server Profiler

Using Profiler, you can set filters to choose the longest running queries.

For this, we need to perform the following steps:

  1. Create a new trace file and go to the Events Selection in a Trace Properties window.
  2. Click the Custom Filters button.
  3. Edit Filter window will appear.
  4. Choose Duration from the list on the left.
  5. Choose the corresponding condition and set the required values.

SQL Server Profiler - filter queries with the longest execution duration

Query execution analysis: Execution Plan

After we detected the problematic queries, we can start analyzing how they are executed.

One of the ways to analyze a query is to use Execution Plan. It is created by SQL Server query optimizer, and provides you with information on how a query will be executed and show you all individual actions that constitute this query.

MS SQL Management Studio provides two ways to view Execution Plan: text and graphical.

Text execution plan

You can turn on Text Execution Plan display by using the statement: SET SHOWPLAN_TEXT ON. Execution Plan will then be shown when you click the Execute button (i.e., when executing the query).

Let’s say we have a query that we need to see an Execution Plan for:

SELECT DISTINCT TOP 100 [PPE].[ID] AS [EmployeeId]
               ,[FI].[ID] AS          [InvoiceId]
               ,[FI].[InvoiceType]
               ,[FI].[SaleDate]
               ,[FI].[Account ID] AS  [CustomerId]
FROM Employee AS PPE
    JOIN Invoice AS FI ON PPe.ID = FI.RepId

After this query is executed, instead of results we will see the execution plan in the following way:

Text execution plan

As a result, we will see the sequence of actions performed during the query execution.

Graphical Execution Plan

However, for SQL performance tips it is much more convenient to use a graphical query execution plan. For the same query, it will look like this:

Graphical execution plan

It is read from right to left and from top to bottom. Icons show what operations with data will be performed, while arrows show the amount of data being processed (larger arrow means more data).

It is worth paying attention to the parts that show how the data was retrieved from the table and how results from each table where joined.

There are several basic ways to retrieve data from the table (i.e. methods that were used to extract data from a particular table):

 

Graphical Execution Plan Icon

Showplan Operator

Description

Table Scan

Retrieves all rows from the table without using any indexes

Index Scan

Retrieves data from the table via an index that does not completely correspond to selection conditions

 

Index Seek

Retrieves data from the table via an index that fully corresponds to the statement in WHERE

Main ways to join the table are shown below:

Graphical Execution Plan Icon

Showplan Operator

Description

Nested Loops

Incremental table processing. On the first stage, data from the main table are retrieved (FROM statement), then records from the joined table (JOIN statement) are selected for every selected row.

Hash Match

Tables are processed independently from each other. Results with the least amount of data are hashed by the join key and compared with results of processing of other tables.

Merge Join

Tables are processed and sorted by the join key and then joined. Database goes through lists, alternating between them, and comparing upper lines on each step and excluding ones that are located above the upper line of the other list and returning matching lines.

You can find more about these and other operators here (https://msdn.microsoft.com/en-us/library/ms191158.aspx).

As you can see, Graphical Execution Plan contains more useful information to optimize SQL query. This view allows to understand what types of operations take the most time in relation to the time it takes to execute the whole query. In this example, we see that selecting data from tables takes the most amount of time, while joining data took only 3% of total time.

Moreover, execution plan allows us to see additional information on every object of the tree structure, which can be used in further SQL optimization techniques. For this, you need to click on an object or simply hover a mouse over it.

Detailed information on Nested Loop joins:

Nested Loops - operation details

This image shows detailed information on data, received from the Invoice table (Index Seek) and transferred further to be joined (by nested loops scheme) with data from employee table:

Execution plan - operation details

 

Execution plan also can show you missing indexes that could improve performance and thus optimize SQL query. It would look like this:

 

SQL query Execution plan - details and percentage of duration

 

Apart from showing the index that needs to be added, plan also shows what kind of effect we will get from it. In this case, it will increase performance on 38%.

After we detected what parts need to be optimized, we can start to optimize the query.

Query optimization

There are several basic ways to improve Execution Plan:

  1. Productive indexes
  2. Optimal table joining order
  3. Hints for SQL query optimizer
  4. Statistics

Productive indexes

Just having indexes does not mean that they speed up your query. Creating another indexes in a table can speed up the reading process, but will have a negative impact when changing the table.  This is why it is important to understand what kind of operations are performed with the table.

Moreover, when creating composite index, the order of columns is very important. But how to determine this order? In order to do this, we can use selectivity coefficient. This coefficient shows how effective specific column is in the query, or exactly how much records it cuts off. In other words, how much records, as compared to the general amount, are selected with condition that uses analyzed column. The fewer records we get, the faster will our query be processed. Therefore, primary keys and unique fields have the best selectivity coefficient. This is why by default for primary keys the clustered index is created.

But how to find out which column has the best coefficient? For example, we have a query:

SELECT *
FROM [Employee]
WHERE [State] = 'UT' AND [Type] = 'Admin'
  AND [expiration date] BETWEEN @beginDate AND @endDate;

We need to understand which column in this script is the most effective one. For this, we calculate the amount of records that will come back as a result of executing the query with each condition separately, and also calculate the general amount of lines in the Employee table:

SELECT count(*) FROM [Employee]                                -- 10100 rows
SELECT count(*) FROM [Employee] WHERE [State] = 'UT'           -- 2896 rows
SELECT count(*) FROM [Employee] WHERE [Type] = 'Admin'         -- 849 rows
SELECT count(*) FROM [Employee] 
WHERE [expiration date] BETWEEN @beginDate AND @endDate         -- 3380 rows
	

Therefore, the field [Type] is the most selective column.

In order to get the next index column we need to run same scripts including the first column.

SELECT count(*) FROM [Employee] WHERE [Type] = 'Admin' 
AND [State] = 'UT'                                      -- 343 rows
SELECT count(*) FROM [Employee] WHERE [Type] = 'Admin' 
AND [expiration date] BETWEEN @beginDate AND @endDate   -- 314 rows

In this case, there is no distinct difference between the last two columns so it does not matter which goes second and which is third.

As a result of this query we can form the index ([Type], [State], [expiration date]).

But this method only works when we have known constants used in query execution. In other case, selectivity of the column can be quickly evaluated by the purpose of the column, i.e. what type of data it will store. For example, selectivity of the field Age will be higher than that of the field Sex. Apart from the purpose of the column, we can find out selectivity by using statistics on the column.

When not to use indexes

Apart from knowing the importance of a column order inside the index, it is also important to know when indexes will not work, in other words, when even if some indexes are introduced, the whole table is scanned anyway.

Indexes are not used when:

  1. Function or operation is applied to a column. For example:
SELECT * FROM [Employee] WHERE UPPER([State]) = 'UT' 
SELECT * FROM [Employee] WHERE ISNULL([IsActive],0) = 1 
SELECT * FROM [Employee] WHERE [Salary] + 1000 >= 5000 

The same goes for the type conversion function:

SELECT * FROM [Employee] WHERE CAST([create date] AS DATE) >= GETDATE() 
  1. Range of values is too big.

If we need to get data from a certain range of values, then it is important to remember that if this range is too big, optimizer may decide that there is no need to use index and it’s better to simply scan the whole table.

Ideally, it is recommended to use the EQUALS operator (=) for indexed fields. Of course, you can’t get by with this single operator, you need to use operators BETWEEN, LIKE, <, >, <=, or >= for comparisons. But it is necessary to set a condition that would produce the smallest amount of results. And when creating clustered index, it is also necessary to put first columns on which EQUALS was used.

Optimal order of joining tables

The order of joining tables requires similar approach to selectivity coefficient index. It is best to join tables starting with the one that will produce the least amount of results after filtering. This method is thoroughly described with examples in the Dan Tow book «SQL Tuning» (http://www.amazon.com/SQL-Tuning-Dan-Tow/dp/0596005733).

In reality, query optimizer used by MS SQL Server chooses the optimal order to join tables automatically. But there is a way to make it follow the predesigned order. For this, we can use the query hint OPTION (FORCE ORDER), that we will look at next.

Hints for a query optimizer

Although query optimizer is fairly independent when it comes to making decisions, there is still a way to control it. We can use query hints to introduce changes in the execution plan.

The following hints exist:

  1. Hints for using indexes
  1. WITH (INDEX<index_name>) – shows which index needs to be used while retrieving data from the table.
  2. WITH (INDEX(0)) – prohibits the use of any index for a table, i.e. full table scan will be performed to get results, or if there is a clustered index, index scan will be performed. This operation can be useful when it is necessary to check the usefulness of adding a certain index.
  1. Hints for joining tables
  1. LOOP, HASH – makes optimizer to join tables using Nested Loops or Hash Match methods. If at least one such hint is present in the script, use of the OPTION (FORCE ORDER) hint is also assumed.
  2. OPTION (LOOP/HASH JOIN) – shows that Nested Loops/Hash Match method should be used for all table joins.
  3. OPTION (FORCE ORDER) – shows that joined tables need to execute in an order in which they are listed in the query.
  4. OPTION (FAST 1) – Shows optimizer that it is necessary to show beginning lines as fast as possible which automatically enables OPTION (LOOP JOIN) hint. In addition, if there is an ORDER BY sorting in your query, then execution of this hint becomes meaningless.  

You can find more information and examples about an OPTION statement here (https://msdn.microsoft.com/en-us/library/ms190322.aspx).

Statistics

Query optimizer uses statistics to create execution plans. Statistics for SQL query optimization are objects with statistical information regarding the distribution of values inside an indexed view or columns of the table. If statistics is formed for an index or a table, then query optimizer can find the optimal execution plan faster. You can form your own statistics or edit existing ones to help optimizer.

On the other hand, execution plans based on large and small volumes can be considerably different. Therefore, statistics will also be different. It is often necessary to check the work of your queries on a big database. But test data (often not real) cannot give the same precision that real data can. Also, it is not always possible to get the copy of a real-life database. In this case, we can get statistics from the real database and load it to a test base. This way, query optimizer will think that it works with large amounts of data and will create execution plans accordingly.

Summary

In this article, we've discussed some basic SQL query optimization tips. As you can see, MS SQL Server provides enough tools to conduct performance analysis and tuning of SQL scripts. The most important thing is to learn the rules on how to use them and to understand nuances of work with main database objects, such as tables and indexes. And eventually, SQL optimization and analysis will seem fun and simple.

References

SQL Server Profiler

Helpful links for working with Execution Plan

SQL Query optimization

Subscribe to updates