Transaction Log

First, a brief introduction to the SQL Server transaction log and a process called write-ahead logging.

It is this file (or sometimes files) that guarantees transactional consistency within SQL Server. A committed transaction will always write all transaction log records relating to that transaction to the physical log file before returning a committed notification to the client.

  • Transaction starts
  • DML operations occur and data starts being changed in the buffer pool
  • Transaction is committed
  • Transaction log records are written to disk
    • All as yet unwritten transaction log records in the log buffer are written to disk here and not just those relating to this transaction.
  • Commit success notification sent to the client
  • At some point thereafter, the checkpoint process copies the dirty data pages from the buffer pool to the physical data file(s).
    • The chronology described here could be different. The data pages could be written to disk prior to the transaction committing. SQL Server always guarantees, however, that the log records will be written to disk prior to the data pages.
    • The occurrence of a checkpoint can be controlled by specifying the recovery interval at either the server or database level. Checkpoints can also be issued manually or internally by SQL Server (see the sketch after this list).
    • All dirty pages are written to disk by the checkpoint process and not just those pertaining to a specific transaction.
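
For reference, this is roughly what those checkpoint-related knobs look like. A minimal sketch only; the database name in the last statement is just a placeholder.


-- Issue a manual checkpoint in the current database.
CHECKPOINT;

-- Server level: recovery interval in minutes (0 = let SQL Server manage it).
-- This is an advanced option, so 'show advanced options' must be enabled first.
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'recovery interval (min)', 5;
RECONFIGURE;

-- Database level (SQL Server 2012 and later): indirect checkpoints via a target recovery time.
ALTER DATABASE SomeDatabase SET TARGET_RECOVERY_TIME = 60 SECONDS;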

Next, we will have some fun and investigate what happens when a log file is deleted.

If I create a database like so


CREATE DATABASE LogDeleteTest ON PRIMARY
( NAME = N'LogDeleteTest_Data', FILENAME = N'C:\SQL\LogDeleteTest_Data.mdf', SIZE = 10MB, FILEGROWTH = 10MB, MAXSIZE = UNLIMITED )
LOG ON
( NAME = N'LogDeleteTest_Log', FILENAME = N'C:\SQL\LogDeleteTest_Log.ldf', SIZE = 10MB, FILEGROWTH = 10MB, MAXSIZE = UNLIMITED );

Simply deleting the log file in Windows Explorer is not possible and gives this error:

Log file delete error.JPG

If I take the database offline however, by issuing


ALTER DATABASE LogDeleteTest SET OFFLINE;

Then the delete works just fine. Bringing the database online


ALTER DATABASE LogDeleteTest SET ONLINE;

then gives this:

File activation failure. The physical file name "C:\SQL\LogDeleteTest_log.ldf" may be incorrect.
New log file 'C:\SQL\LogDeleteTest_log.ldf' was created.
The log file has been automatically created at C:\SQL\LogDeleteTest_log.ldf and the database is online and available.

However, we shut the database down cleanly prior to deleting the log file. This means a checkpoint occurred, any uncommitted transactions were rolled back, and all committed transactions were hardened to disk. The transaction log could therefore be rebuilt with no risk of data loss.

The fun happens when the database is not shut down cleanly prior to deleting the log file. Do this by performing a DDL or DML action to create a log record, stopping the SQL Server service on a test box and deleting the log file as before.
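
Something along the following lines will do it; a rough sketch only, where dbo.CrashMe is just a throwaway table and SHUTDOWN WITH NOWAIT stops the instance without running the usual clean-shutdown checkpoint (only ever do this on a test box).


USE LogDeleteTest;
-- Any logged change will do; this table exists purely to generate log records.
CREATE TABLE dbo.CrashMe (ID INT NOT NULL);
INSERT INTO dbo.CrashMe (ID) VALUES (1);
GO

-- Stop the instance without a checkpoint, simulating an unclean shutdown.
SHUTDOWN WITH NOWAIT;
-- Now delete C:\SQL\LogDeleteTest_Log.ldf before restarting the service.

Restart the service and the database comes back in a state of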

Recovery pending

We then have to make use of emergency mode like so


USE master;
-- Restrict access and put the damaged database into emergency mode.
ALTER DATABASE LogDeleteTest SET SINGLE_USER;
ALTER DATABASE LogDeleteTest SET EMERGENCY;
-- REPAIR_ALLOW_DATA_LOSS is the option that rebuilds the missing log.
DBCC CHECKDB (N'LogDeleteTest', REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS;
ALTER DATABASE LogDeleteTest SET MULTI_USER;

The following appears in the output; the log file has been recreated and the database is back online and available:

File activation failure. The physical file name "C:\SQL\LogDeleteTest_Log.ldf" may be incorrect.
Warning: The log for database 'LogDeleteTest' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

If you consider what happened here, this is a really bad situation for the database to be in. At the start, while talking about write-ahead logging, we described how SQL Server relies on the transaction log to roll back uncommitted transactions and roll forward committed transactions. This ensures data page updates for a transaction that was never committed are removed, and data page updates that were not previously performed for transactions that were committed are now done. Without the transaction log, this process has been completely omitted, meaning the database could very well now be in a transactionally inconsistent state, with partially performed transactions sitting inside it. At least the database is now up and available, however, if that was your primary goal.

Using emergency mode should be a last resort. You should always have a robust, documented disaster recovery solution in place to recover from such situations by other means with less exposure to data loss. If your last good log backup was taken 30 minutes before the log file was deleted, however, you would have to make a judgement call: restore from backup and lose 30 minutes of data, or use emergency mode, where the last checkpoint could have been only 10 seconds before the log file was dropped. An option here might be to use both methods and then reconcile the results manually. Always keep a copy of the files prior to taking any action, so you can revert to them if you make things worse.

 

Cost Threshold for Parallelism

Cost threshold for parallelism is a server-wide setting which specifies the cost above which the optimizer will consider a parallel plan.

The default value on installing SQL Server is 5. This means any query with an estimated cost greater than 5 is eligible to execute in parallel. There are no units for this measure and none are required. Just know that SQL Server is a cost-based engine that determines a cost for each operator in a query plan and an overall cost for the plan. We just need to know that 6 is greater than 5 and 50 is less than 199, etc.

You can view and specify this setting like so


-- 'cost threshold for parallelism' is an advanced option, so 'show advanced options'
-- needs to be enabled before this will run.
EXEC sys.sp_configure @configname = 'cost threshold for parallelism',
    @configvalue = 1000;

RECONFIGURE;

SELECT *
FROM sys.configurations
WHERE name = 'cost threshold for parallelism';

If we look at the execution plan for the following query


DBCC FREEPROCCACHE
DBCC SETCPUWEIGHT(1000)
GO

USE AdventureWorks2012;
SELECT * FROM Sales.SalesOrderHeader AS h
INNER JOIN Sales.SalesOrderDetail AS d ON d.SalesOrderID = h.SalesOrderID
GO

DBCC SETCPUWEIGHT(1)
DBCC FREEPROCCACHE
go

We see that the overall cost is 538.914 (DBCC SETCPUWEIGHT is an undocumented command allowing me to artificially increase the cost here for demo purposes).

Cost for query

This means we will have a serial plan as our cost threshold is 1000 as set above. If I change the cost threshold to 538, then the new plan looks like this

Parallel plan

We know this is a parallel plan because of the double yellow arrows on the operators.

The big question remains, however: what is the optimal value for cost threshold for parallelism? There is no definitive correct answer. There is overhead for SQL Server to split a workload over multiple cores and then merge the output back together. That is why everything shouldn't run in parallel; for low-cost queries the overhead is detrimental to performance.

The range of valid values is 0 to 32,767, and plotting workload performance across that range would probably give a bell curve: too low and everything goes parallel, too high and the majority of queries remain serial and you lose the potential performance benefit.

The answer is to test. Run a production-like workload with different cost threshold values on a test server, monitor all metrics using a third party tool and compare against your baseline. If you were feeling very brave, you could change the value in live and monitor similarly. I have heard wiser people than me say they change the default to 25 or 50 as a sensible first change.

Some further points before wrapping up. When the threads sharing the load in a parallel plan are waiting for the last one to complete, the wait type is CXPACKET. Similarly, when an exchange (repartition streams) operator is waiting on all threads to complete and deliver their results, the wait type is the same. This is why CXPACKET waits are a perfectly normal symptom of parallelism and are not necessarily something to worry about when performance tuning.
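
If you want to see how much of this wait has accumulated on an instance, the wait statistics DMV will show it. A quick sketch; the CXCONSUMER filter only applies to more recent versions, so treat it as optional.


SELECT wait_type ,
       waiting_tasks_count ,
       wait_time_ms ,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ( 'CXPACKET', 'CXCONSUMER' )
ORDER BY wait_time_ms DESC;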

Lastly, all of the above depends on the server having more than one CPU core, of course, and on parallelism not having been throttled by setting max degree of parallelism to 1 at either the instance or query level.
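
For completeness, this is roughly what that throttling looks like at each level; an illustration only, not a recommendation, and the query is the same AdventureWorks join used above.


-- Instance level: 1 disables parallelism server wide, 0 lets SQL Server decide.
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'max degree of parallelism', 1;
RECONFIGURE;

-- Query level: the hint overrides the instance setting for this statement only.
SELECT *
FROM Sales.SalesOrderHeader AS h
INNER JOIN Sales.SalesOrderDetail AS d ON d.SalesOrderID = h.SalesOrderID
OPTION ( MAXDOP 1 );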

Index Computed Columns to Help Performance

A great way to improve performance without having to make any application changes is to add a computed column and index it.

This can be demonstrated with the help of an example. Create a session log table and insert some data:


SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
SET ANSI_PADDING ON;
SET NOCOUNT ON;

IF EXISTS ( SELECT 1
            FROM sys.tables
            WHERE [object_id] = OBJECT_ID(N'dbo.SessionLog') )
BEGIN;
    DROP TABLE dbo.SessionLog;
END;

CREATE TABLE dbo.SessionLog
    (
      SessionID INT IDENTITY(1, 1) NOT NULL ,
      SessionStart DATETIME NOT NULL ,
      SessionEnd DATETIME NOT NULL ,
      CONSTRAINT CK_SessionLog_SessionStart_SessionEnd CHECK ( SessionEnd >= SessionStart )
    );

ALTER TABLE dbo.SessionLog ADD CONSTRAINT pkSessionLog PRIMARY KEY CLUSTERED (SessionID) WITH FILLFACTOR = 100;
GO

-- dbo.Numbers is a numbers/tally table; one session row is inserted per row it contains.
INSERT INTO dbo.SessionLog
        ( SessionStart ,
          SessionEnd
        )
        SELECT  DATEADD(MINUTE, ABS(CHECKSUM(NEWID())) % 721,
                        CAST(CAST(CURRENT_TIMESTAMP AS DATE) AS DATETIME)) ,
                DATEADD(MINUTE, 720 + ( ABS(CHECKSUM(NEWID())) % 721 ),
                        CAST(CAST(CURRENT_TIMESTAMP AS DATE) AS DATETIME))
        FROM    dbo.Numbers;

A query for sessions lasting under 60 minutes


SET STATISTICS IO ON;

SELECT SessionID, SessionStart, SessionEnd
FROM dbo.SessionLog
WHERE DATEDIFF(MINUTE, SessionStart, SessionEnd) < 60;

results in the following query plan

Index computed column index scan.JPG

However, if we add a computed column and index it like so


ALTER TABLE dbo.SessionLog ADD Duration AS DATEDIFF(MINUTE,SessionStart, SessionEnd);
CREATE NONCLUSTERED INDEX IX_SessionLog_Duration ON dbo.SessionLog(Duration) INCLUDE(SessionStart, SessionEnd);

The scan is replaced by a seek

Index computed column index seek.JPG

And logical reads are reduced from 237 to 2.

This is with no change to the query. Both query plans result from running the same query where the computed column Duration is not referenced explicitly


SET STATISTICS IO ON;

SELECT SessionID, SessionStart, SessionEnd
FROM dbo.SessionLog
WHERE DATEDIFF(MINUTE, SessionStart, SessionEnd) < 60;

The query optimizer just knows that our where clause predicate matches the computed column definition and uses the index. This means we can potentially vastly improve performance with no application code changes required.

There are some gotchas you have to be careful with

  • SELECT * FROM dbo.SessionLog will now bring back the additional Duration column, potentially breaking front end code. This is the reason you should never use SELECT * and should always specify an explicit column list.
  • Seven SET options must be set correctly when performing the following actions (see the Indexes on Computed Columns MSDN article):
    • Creating the index on the computed column
    • Performing any DML (INSERT/UPDATE/DELETE) on the table
    • Running any SELECT that you want to make use of the index on the computed column
  • The required settings are:
    • NUMERIC_ROUNDABORT OFF
    • ANSI_NULLS ON
    • ANSI_PADDING ON
    • ANSI_WARNINGS ON
    • ARITHABORT ON
    • CONCAT_NULL_YIELDS_NULL ON
    • QUOTED_IDENTIFIER ON

Here are some examples of errors

Create the index with SET QUOTED_IDENTIFIER OFF;

Msg 1934, Level 16, State 1, Line 40
CREATE INDEX failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER'.

Delete data with SET QUOTED_IDENTIFIER OFF;

Msg 1934, Level 16, State 1, Line 43
DELETE failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER'.
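
If you want to reproduce these errors yourself against the dbo.SessionLog table from earlier, something along these lines will do it (the second index name is purely illustrative).


-- Creating an index on the computed column fails with Msg 1934 when QUOTED_IDENTIFIER is OFF.
SET QUOTED_IDENTIFIER OFF;
CREATE NONCLUSTERED INDEX IX_SessionLog_Duration2 ON dbo.SessionLog(Duration);
GO

-- DML against the table fails in the same way once the computed column is indexed.
SET QUOTED_IDENTIFIER OFF;
DELETE FROM dbo.SessionLog WHERE SessionID = 1;
GO

SET QUOTED_IDENTIFIER ON;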

Perform the same SELECT that can make use of the index, but with SET ANSI_PADDING OFF; this causes the query plan to revert to an index scan, i.e. the index on the computed column is now ignored.

Index computed column index scan

The important point here is that you could break existing code by adding an index on a computed column if all connections don’t have the correct set options specified as required.

Missing Indexes

SQL Server can be fantastic on occasions and provide golden nuggets of information just waiting to be mined. One such great source of performance tuning gold dust is around missing indexes.

If I run a query such as

Missing index

You can see the missing index hint in green above the actual execution plan. Not only is this information shown here, but SQL Server is continually tracking these missing indexes. This is fantastic; you can inherit a database you have no prior knowledge of and still have a historical reference of missing indexes. Unfortunately this information does not persist over a server restart. Don't be lazy though, nothing worthwhile is ever easy; you would have to persist this information yourself on a regular basis.

This can be queried like so


SELECT migs.unique_compiles ,
migs.user_seeks ,
migs.user_scans ,
migs.last_user_seek ,
migs.last_user_scan ,
migs.avg_total_user_cost ,
migs.avg_user_impact ,
mid.equality_columns ,
mid.inequality_columns ,
mid.included_columns ,
mid.[statement]
FROM sys.dm_db_missing_index_group_stats AS migs
INNER JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_group_handle = migs.group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid ON mid.index_handle = mig.index_handle
ORDER BY mid.[statement];

This will display

  • Number of seeks and scans
  • Cost information – concentrate on the highest cost first
  • Equality, inequality and included columns
    • Equality typically used in a where clause equality predicate (x=y)
    • Inequality typically used in a where clause range or inequality predicate (e.g. x > y or x <> y)
    • Included columns are not used in the where clause, but are referenced elsewhere in the query. Typically these columns are in the select list.
  • The statement i.e. the object affected.
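
As mentioned above, none of this survives a service restart, so if you want history you have to capture it yourself. A minimal sketch of the idea, run on a schedule (for example from an Agent job); dbo.MissingIndexHistory is just an example table name.


IF OBJECT_ID(N'dbo.MissingIndexHistory') IS NULL
BEGIN;
    -- Create an empty table with the right shape on the first run.
    SELECT  CURRENT_TIMESTAMP AS CaptureDate ,
            migs.user_seeks ,
            migs.user_scans ,
            migs.avg_total_user_cost ,
            migs.avg_user_impact ,
            mid.equality_columns ,
            mid.inequality_columns ,
            mid.included_columns ,
            mid.[statement]
    INTO    dbo.MissingIndexHistory
    FROM    sys.dm_db_missing_index_group_stats AS migs
    INNER JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_group_handle = migs.group_handle
    INNER JOIN sys.dm_db_missing_index_details AS mid ON mid.index_handle = mig.index_handle
    WHERE   1 = 0;
END;

INSERT  INTO dbo.MissingIndexHistory
        SELECT  CURRENT_TIMESTAMP ,
                migs.user_seeks ,
                migs.user_scans ,
                migs.avg_total_user_cost ,
                migs.avg_user_impact ,
                mid.equality_columns ,
                mid.inequality_columns ,
                mid.included_columns ,
                mid.[statement]
        FROM    sys.dm_db_missing_index_group_stats AS migs
        INNER JOIN sys.dm_db_missing_index_groups AS mig ON mig.index_group_handle = migs.group_handle
        INNER JOIN sys.dm_db_missing_index_details AS mid ON mid.index_handle = mig.index_handle;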

Some points to consider

  • Only non-clustered indexes are suggested, and no advanced features such as filtered indexes.
  • Do not just blindly create all suggested indexes. There will be repetition and overlap between suggestions, so you will have to use your judgement to decide which indexes to create.
  • Carefully consider the order of the columns you specify when creating the index. Which column is the most selective? Which column is used in the most queries? Which column is used in the most expensive queries? (A worked example follows this list.)
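
To make that translation concrete, here is how a hypothetical suggestion against the AdventureWorks Sales.SalesOrderDetail table, with ProductID as an equality column and OrderQty and UnitPrice as included columns, might be turned into an index. The column choices and index name are purely illustrative.


CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID_Demo
ON Sales.SalesOrderDetail (ProductID)
INCLUDE ( OrderQty, UnitPrice );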

SQL Server is constantly collecting information on missing indexes. Know how to extract this information and use it to your full advantage.

 

What is Happening on Your Server Currently

A great tool to determine what is occurring on your SQL Server right now is the fantastic free stored procedure sp_WhoIsActive.

This procedure can be executed as an alternative to sp_who2 to give far more insightful and useful information.

There are numerous switches to control its behaviour; view all the options by executing:

EXECUTE sp_WhoIsActive @help = 1;

My particular favourite is @get_plans, which returns the estimated plan from cache for all current connections. This allows you to immediately drill through to the plan if you think you can identify the culprit behind your performance issue.
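
For example, to see what is running right now along with the cached plans:


EXECUTE dbo.sp_WhoIsActive @get_plans = 1;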

Another great option is to log the results to a table for later analysis. This can be achieved with the following code:


-- Capture 100 snapshots at 5 second intervals into a dated table.
DECLARE @destination_table VARCHAR(24) = 'dbo.WhoIsActive_'
    + CONVERT(VARCHAR, GETDATE(), 112);
DECLARE @schema NVARCHAR(MAX);

-- Ask sp_WhoIsActive for the table definition matching the chosen options.
EXEC dbo.sp_WhoIsActive @format_output = 0, @return_schema = 1, @get_plans = 1,
    @schema = @schema OUTPUT;

SET @schema = REPLACE(@schema, '<table_name>', @destination_table);

IF NOT EXISTS (SELECT * FROM sys.tables WHERE object_id = OBJECT_ID(@destination_table))
BEGIN;
    EXECUTE(@schema);
END;

DECLARE @i INT = 0;

WHILE @i < 100
BEGIN;
    EXEC dbo.sp_WhoIsActive @format_output = 0, @get_plans = 1,
        @destination_table = @destination_table;

    SET @i = @i + 1;

    WAITFOR DELAY '00:00:05';
END;
GO

Just be aware that the table might grow quite large, so this is probably not something you want to run continuously.

Disaster Recovery Testing

YOU HAVE NO BACKUPS UNTIL YOU HAVE TESTED A RESTORE.

REPEAT.

YOU HAVE NO BACKUPS UNTIL YOU HAVE TESTED A RESTORE.

Introduction

Only a person who does not value their job waits until disaster strikes before testing their backup strategy. You have no way of knowing the following unless you have tested well in advance:

  • Can you recover?
  • How long does it take you to recover?
  • How much data are you going to lose?

Whatever solution you choose, it is essential that you are well practised and have a documented, repeatable plan to work from when the cold hand of disaster rests on your shoulder.

Action Plan

  • Pretend you lost production at time X.
  • Start a timer.
  • Restore to an alternative SQL Server and bring your application up (a sketch of what the restore itself might look like follows this list).
    • Document what you do.
    • How long did the whole process take end to end?
    • How much data was lost?
  • Repeat the process with someone else performing the restore, working from your documented plan.
    • Rinse and repeat until you are 100% confident in your disaster recovery document. This means a tired person in the middle of the night can step through the document and all required actions are clearly laid out with all scripts mentioned readily available to all.
  • Take the numbers to management. Are they happy with how long it takes to recover from disaster and how much data can potentially be lost? If not, you can then have the conversation about a higher DR budget with empirical numbers to back you up.
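
As a rough illustration of the restore step itself, something like the following would form part of the documented plan. Every name, path and time here is hypothetical and would come from your own backup strategy.


-- Full backup first, left non-operational so further log backups can be applied.
RESTORE DATABASE MyApp
FROM DISK = N'\\backupserver\sql\MyApp_Full.bak'
WITH MOVE N'MyApp_Data' TO N'D:\SQL\MyApp_Data.mdf',
     MOVE N'MyApp_Log' TO N'L:\SQL\MyApp_Log.ldf',
     NORECOVERY, STATS = 10;

-- Apply log backups in sequence.
RESTORE LOG MyApp
FROM DISK = N'\\backupserver\sql\MyApp_Log_0830.trn'
WITH NORECOVERY, STATS = 10;

-- Stop at the chosen point in time and bring the database online.
RESTORE LOG MyApp
FROM DISK = N'\\backupserver\sql\MyApp_Log_0845.trn'
WITH STOPAT = N'2015-01-01T08:40:00', RECOVERY;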

Principle of Least Privilege

Introduction

Always employ the principle of least privilege when designing a security strategy or developing for SQL Server: only grant the minimum level of permissions required for a person to perform their job function and nothing more.

Some examples

Developers

Do developers need to be in the sysadmin role on your development and test servers to be able to drop databases etc.? Usually not; it is done for convenience or laziness. This can lead to lots of problems for the DBA, and it can be very time consuming to resolve issues where things have changed and the root cause is not immediately obvious.

DBA

If you authenticate to production using your Windows integrated login, do not add that login to the sysadmin fixed server role. Create a separate elevated account for each DBA that they have to use to administer production as a sysadmin. This is good practice and makes people stop and think prior to connecting, helping to avoid accidental errors.
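
As a rough sketch of that pattern, with entirely hypothetical domain and account names, the day-to-day login gets no elevated rights and only the separate _admin account is a sysadmin.


-- Day-to-day login: grant only what the role actually needs (not shown here).
CREATE LOGIN [CONTOSO\jbloggs] FROM WINDOWS;

-- Separate elevated account used only when sysadmin rights are genuinely required.
CREATE LOGIN [CONTOSO\jbloggs_admin] FROM WINDOWS;
ALTER SERVER ROLE sysadmin ADD MEMBER [CONTOSO\jbloggs_admin];  -- SQL Server 2012+ syntax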

Conclusion

The above are just some examples of this principle in action. There are countless others that are all valid and good practice.

Applying this principle strictly throughout your SQL Server estate will stand you in very good stead, resulting in a robust environment that is resistant to accidental errors and hackers alike.