Increase you search performance by proper Indexes

Abstract

This is a small article on the Index in DB and defining Clustered and Non-Clustered Index with reference to the Microsoft SQL server DB.

  1. Introduction

    In our day to day business application huge amount of transaction information is being stored on our database. In the beginning when a number of records is less and everything is working fine and then suddenly after some time the support guy comes and say query is taking the time to execute and the system is taking the huge amount of time to process the records. Indexes come to prevention where we forecast our SQL queries and how DB generating the Execution SQL plan for the query. Index result in a tremendous improvement in SQL query if used properly and vice versa also is true.

2. Indexes

Indexes are used to explicitly speed up the SQL statement execution on a table. The index points to the location of the rows containing the value. The index is similar to index as in Book to easily go to the specific chapter.

The default type of the Index used in the DB are B-Tree index

By default, when we execute a select command with no primary key and index based on search criteria e.g. Take the below numbers as an Id and we will be writing our select command with id in where clause. So when the database will execute the query it will search sequentially row by row the Id right from the beginning till it finds the desired row. So we can imagine the scenario if we have billions of data.

So in order to minimize the sequential search problem there when B-Tree comes into the picture. B-Tree is a self-balancing tree data structure which is an extension of the Binary tree that keeps data sorted and allows searches, sequential, insertion, and deletion in a quick manner.

Figure 2: B-Tree

Image Source:
https://technet.microsoft.com/en-us/library/ms177443(v=sql.105).aspx

Sequential Search is called Table Scan and when they are searched using B-tree they are called Index seek or index scan. Table scan is very bad from performance point of view as we have mentioned above.

Let’s do a demo to check what kind of scan does SQL server does when there are no indexes and primary key in the table.

So below we have a table called CityName
which doesn’t have any indexer and will contain information regarding the City.

Figure 3: TblCustomer Data

So let’s run our select command with one particular Cityid selected

As shown in the diagram and we will use, Display Estimated Execution plan.

Figure 4: Demonstrating Display Estimated Execution plan option.

Execution plan tells us how the SQL query performs when we don’t have any index in our table.

Figure 5: Demonstrating the execution plan of the query.

We will make use of STATISTICS IO. STATISTICS IO provides the detailed information about the query impact on the DB server. It tells the number of logical reads, physical reads and how many tables was scanned. A good performing query is that query which has less number of logical reads.

Once we execute below sets of command and check the output statistics:

STATISTICS IO ON
SELECT * FROM CityName WHERE CityID='6137'

Figure 6:
Number of logical reads 26

  • Scan Count: This count specifies that the optimizer has chosen a plan that caused 1 Scan count.
  • Logical Reads: This number specifies the number of pages read from the cache. This is the important parameter that is needed to be focused. This number can be decreased using index structure.
  • Physical Read: This number specifies the number of pages actually read from the disk. These are pages that were not in data cache. SQL server performs everything on cache if the requested page is not present it will read the page from the disk and then put the same in the cache, then use that page.
  • Lob logical Reads: This number grows if request any large object such as an image, varchar(max), nvarchar(max).
  • Lob physical read: is same as physical read for lob pages.
  • Lob read-ahead reads: This number specifies Number of textntextimage or large value type pages placed into the cache for the query.

Now let’s follow the below steps and add an index into the table and check the performance.

Figure 7:
Table Design View and Add Indexes/Keys

Figure 8:
Adding an Index and setting Create as Clustered Index as True

Now let’s again run our execution plan and check is it again using Table Scan?

Figure 9:
Demonstrating Index scan now.

Now let’s run our performance statistics and check is there any difference?

(1 row(s) affected)

Table ‘CityName’. Scan count 1, logical reads 2 and physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

We can see the difference before the logical reads were 26 and now they have decreased to 2. So we can say there is a performance improvement by using an Index.

Let’s talk about Clustered and Non Clustered Index.

3. CLUSTERED INDEX:

Clustered Index determines the physical order of data in the table. In Clustered Index the rows are stored physically on the disk in the same order as the Index. So therefor there can only be one Clustered Index because the records can only be ordered in one order.

Figure 10:
Structure of a clustered index in a single partition

Image Source:
https://technet.microsoft.com/en-us/library/ms177443(v=sql.105).aspx

By the statement Clustered Index determines the physical order of data in the table means that data inserted in the table is order wise reference to the Index Column. That means that if we insert data in unordered way still SQL     server will insert in order wise according to the index. If the table is not a Clustered Index, its rows are inserted in unordered structure known as heap.

Let take an example and create two same table with different name and create index in one table and leave the second and try to insert the values in the table and fetch the same.

CREATE TABLE tblOrder
(
order_id int, sku_id int, description varchar(50), mrp decimal, sp decimal
)

CREATE TABLE tblOrderWithIndex
(
order_id int primary key, sku_id int, description varchar(50), mrp decimal, sp decimal )

To check the Index in the table we can use the following command

Exec Sp_helptIndex tableName

Figure 11:
Demonstrating Index created in SQL Server

So here we have created two tables, one with primary key and leaving the other table without an index. Let’s now insert some values into the table.

Inserting values in tblOrderWithIndex table

insert into tblOrderWithIndex values(10,1234,‘Apple Iphone 5’,23000,22000);

insert into tblOrderWithIndex values(1,1235,‘Samsung Note 3’,21000,20000);

Let’s run the select command to check how the records are being retrieved in an order or not?

Figure 12:
Order wise records are retrieved from the database.

Let’s try the same with table without index.


insert into tblOrder values(10,1234,‘Apple Iphone 5’,23000,22000);

insert into tblOrder values(1,1235,‘Samsung Note 3’,21000,20000);

Figure 13.
Checking if there is any Index present in table
tblOrder.

Figure 14.
Un Ordered records are retrieved from the table.

I.e the Clustered Index stores the records in an ordered format while if there is no index the records are not stored in order format.

We can have only one Clustered Index but however the index can have multiple columns can be termed as (Composite Index)

In order to create the Composite Index, we just add the column name to the Index

CREATE CLUSTERED INDEX
In_tblOrderWithIndex_skui_id_description
on
tblOrderWithIndex(description asc,sku_id asc)

If we try to execute the same we will face an error as mentioned below:

Msg 1902, Level 16, State 3, Line 2

Cannot create more than one clustered index on table ‘tblOrderWithIndex’. Drop the existing clustered index ‘PK__tblOrder__46596229992B8137’ before creating another.

So we need to first drop the first Index and create the new composite index.

Figure 15.
Deleting Index.

Delete the Index and create the composite index keys

CREATE CLUSTERED INDEX
In_tblOrderWithIndex_skui_id_description
on
tblOrderWithIndex(description asc,sku_id asc)

Now let’s execute our select command to retrieve the rows and check how the records are now stored.

Figure 16.
Fetching records with Composite Clustered Index.

We can see that now we are getting records based on sku id and description in asc mode.

4. Non-clustered Index

Non-clustered index is a pointer to the data. The data is stored in one place, the index in another place. The index will have a pointer to the location of the data. Since, the data is stored in another place than the actual data we can have a different Non-Clustered Index. We can determine the following as shown below where we have a pointer table with a row address pointing to the actual data.

order_id

sku_id

Description

mrp

sp

2

1233

Apple Iphone 4

11000

10000

10

1234

Apple Iphone 5

23000

22000

1

1235

Samsung Note 3

21000

20000

Table 1.
Actual Data Stored.

order_id ROW Locator

2

ROW ADDRESS

10

ROW ADDRESS

1

ROW ADDRESS

Table 2.
Reference Table pointing towards the Actual Table rows address.

`

Figure 17.
Non Clustered Index pointing towards Clustered Index

Image source: https://technet.microsoft.com/en-us/library/ms177484(v=sql.105).aspx

A Non Clustered Index can be typically be imagined as Book Index.

Img source: https://www.prismnet.com/~hcexres/textbook/images/print_index.gif

Figure 18.
Book Index Page.

We can have a many Non-Clustered Index as shown in above diagram where we have a number of Indexes. In a Non-Clustered Index the data is stored in the ascending and descending order of the Non-Clustered Index key. The Non-Clustered Index doesn’t influence the way records get inserted in the table. The leaf node of the Non-Clustered index is made up of index pages rather than data pages.

Concluding the Clustered and Non-Clustered.

  • We can have only one clustered Index while we can have as many Non-clustered Index.
  • Clustered Index are faster than the Non Clustered as Clustered Index determines the physical order of the records.

    Disadvantage of Clustered Index: As we have already discussed about the advantages of Clustered Index lets discuss the disadvantage of Clustered index.

    As we know that in Clustered Index determines the physical order of data in the table. So if update a record and change the value of the indexed column, the database will need to move an entire column to a new position to make sure the every row is sorted. So the resulting in update query into a delete query followed by an Insert query which obviously leads to decrease in performance.

References

http://odetocode.com/articles/70.aspx

https://www.simple-talk.com/content/article.aspx?article=934

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: