数据库内核月报

数据库内核月报 - 2017 / 05

MSSQL · 应用案例 · 构建死锁自动收集系统

Author: 风移

摘要

这篇文章介绍SQL Server的一个典型的应用案例,即如何利用Event Notification与Service Broker技术相结合来实现死锁信息自动收集系统。通过这个系统,我们可以全面把控SQL Server数据库环境中所有实例上发生的死锁详细信息,供我们后期分析和解决死锁场景。

死锁自动收集系统需求分析

当 SQL Server 中某组资源的两个或多个线程或进程之间存在循环的依赖关系时,但因互相申请被其他进程所占用,而不会释放的资源处于的一种永久等待状态,将会发生死锁。SQL Server服务自动死锁检查进程默认每5分钟跑一次,当死锁发生时,会选择一个代价较小的进程做为死锁牺牲品,以此来避免死锁导致更大范围的影响。被选择做为死锁牺牲品的进程会报告如下错误:

Msg 1205, Level 13, State 51, Line 8
Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

如果进程间发生了死锁,对于用户业务系统,乃至整个SQL Server服务健康状况影响很大,轻者系统反应缓慢,服务假死;重者服务挂起,拒绝请求。那么,我们有没有一种方法可以完全自动、无人工干预的方式异步收集SQL Server系统死锁信息并远程保留死锁相关信息呢?这些信息包括但不仅限于:

Service Broker和Event Notification简介

在死锁自动收集系统介绍开始之前,先简要介绍下SQL Server Service Broker和Event Notification技术。

Service Broker简介

Service Broker是微软至SQL Server 2005开始集成到数据库引擎中的消息通讯组件,为 SQL Server提供队列和可靠的消息传递的能力,可以用来构建基于异步消息通讯为基础的应用程序。Service Broker既可用于单个 SQL Server 实例的应用程序,也可用于在多个实例间进行消息分发工作的应用程序。Service Broker使用TCP/IP端口在实例间交换消息,所包含的功能有助于防止未经授权的网络访问,并可以对通过网络发送的消息进行加密,以此来保证数据安全性。多实例之间使用Service Broker进行异步消息通讯的结构图如下所示(图片来自微软的官方文档):

01.png

Event Notification简介

Event Notification的中文名称叫事件通知,执行事件通知可对各种Transact-SQL数据定义语言(DDL)语句和SQL跟踪事件做出响应,采取的响应方式是将这些事件的相关信息发送到 Service Broker 服务。事件通知可以用来执行以下操作:

可以将事件通知用作替代DDL 触发器和SQL跟踪的编程方法。事件通知的信息媒介是以xml数据类型的信息传递给Service Broker服务,它提供了有关事件的发生时间、受影响的数据库对象、涉及的 Transact-SQL 批处理语句等详细信息。对于SQL Server死锁而言,可以使用Event Notification来跟踪死锁事件,来获取DEADLOCK_GRAPH XML信息,然后通过异步消息组件Service Broker发送到远端的Deadlock Center上的Service Broker队列,完成死锁信息收集到死锁中央服务。

死锁收集系统架构图

在介绍完Service Broker和Event Notification以后,我们来看看死锁手机系统的整体架构图。在这个系统中,存在两种类型角色:我们定义为死锁客户端(Deadlock Client)和死锁中央服务(Deadlock Center)。死锁客户端发生死锁后,首先会将Deadlock Graph XML通过Service Broker发送给死锁中央服务,死锁中央服务获取到Service Broker消息以后,解析这个XML就可以拿到客户端的死锁相关信息,最后存放到本地日志表中,供终端客户查询和分析使用。最终的死锁收集系统架构图如下所示: 02.png

详细的死锁信息收集过程介绍如下:死锁客户端通过本地SQL Server的Event Notification捕获发生在该实例上的Deadlock事件,并在死锁发生以后将Deadlock Graph XML数据存放到Event Notification绑定的队列中,然后通过绑定在该队列上的存储过程自动触发将Deadlock Graph XML通过Service Broker异步消息通讯的方式发送到死锁中央服务。中央服务在接收到Service Broker消息以后,首先放入Deadlock Center Service Broker队列中,该队列绑定了消息自动处理存储过程,用来解析Deadlock Graph XML信息,并将死锁相关的详细信息存入到Deadlock Center的Log Table中。最后,终端用户可以直接对Log Table来查询和分析所有Deadlock Client上发生的死锁信息。通过这系列的过程,最终达到了死锁信息的自动远程存储、收集,以提供后期死锁场景还原和复盘,达到死锁信息可追溯,及时监控,及时发现的目的。

Service Broker配置

系统架构设计完毕后,接下来是系统的配置和搭建过程,首先看看Service Broker的配置。这个配置还是相对比较繁琐的,包含了以下步骤:

Deadlock Client Server

以下的配置请在Deadlock Client SQL Server实例上操作。

-- Run script on client server to gather deadlock graph xml
USE master
GO
-- Create Database
IF DB_ID('DDLCenter') IS NULL
	CREATE DATABASE [DDLCenter];
GO
-- Change datbase to simple recovery model
ALTER DATABASE [DDLCenter] SET RECOVERY SIMPLE WITH NO_WAIT
GO
-- Enable Service Broker
ALTER DATABASE [DDLCenter] SET ENABLE_BROKER,TRUSTWORTHY ON
GO
-- Change database Owner to sa
ALTER AUTHORIZATION ON DATABASE::DDLCenter TO [sa]
GO

表[DDLCollector].[Deadlock_Traced_Records]:从Event Notification队里接收的消息会记录到该表中。 表[DDLCollector].[Send_Records]:Deadlock Client成功发送Service Broker消息记录 表[DDLCollector].[Error_Records]:记录发生异常情况时的信息。 存储过程[DDLCollector].[UP_ProcessDeadlockEventMsg]:Deadlock Client绑定到队里的激活存储过程,一旦队列中有消息进入,这个存储过程会被自动调用。 存储过程[DDLCollector].[UP_SendDeadlockMsg]:Deadlock Client发送异步消息给Deadlock Center,这个存储过程会被上面的激活存储过程调用。

-- Run on Client Instance
USE [DDLCenter]
GO
-- Create Schema
IF NOT EXISTS(
	SELECT TOP 1 *
	FROM sys.schemas
	WHERE name = 'DDLCollector'
)
BEGIN
	EXEC('CREATE SCHEMA DDLCollector');
END
GO

-- Create table to log Traced Deadlock Records
IF OBJECT_ID('DDLCollector.Deadlock_Traced_Records', 'U') IS NOT NULL
	DROP TABLE [DDLCollector].[Deadlock_Traced_Records]
GO

CREATE TABLE [DDLCollector].[Deadlock_Traced_Records](
	[RowId] [BIGINT] IDENTITY(1,1) NOT NULL,
	[Processed_Msg] [xml] NULL,
	[Processed_Msg_CheckSum] INT,
	[Record_Time] [datetime] NOT NULL 
		CONSTRAINT DF_Deadlock_Traced_Records_Record_Time DEFAULT(GETDATE()),
	CONSTRAINT PK_Deadlock_Traced_Records_RowId PRIMARY KEY
	(RowId ASC)
) ON [PRIMARY]
GO

-- Create table to record deadlock graph xml sent successfully log
IF OBJECT_ID('DDLCollector.Send_Records', 'U') IS NOT NULL
	DROP TABLE [DDLCollector].[Send_Records]
GO

CREATE TABLE [DDLCollector].[Send_Records](
	[RowId] [BIGINT] IDENTITY(1,1) NOT NULL,
	[Send_Msg] [xml] NULL,
	[Send_Msg_CheckSum] INT,
	[Record_Time] [datetime] NOT NULL 
		CONSTRAINT DF_Send_Records_Record_Time DEFAULT(GETDATE()),
	CONSTRAINT PK_Send_Records_RowId PRIMARY KEY
	(RowId ASC)
) ON [PRIMARY]
GO

-- Create table to record error info when exception occurs
IF OBJECT_ID('DDLCollector.Error_Records', 'U') IS NOT NULL
	DROP TABLE [DDLCollector].[Error_Records]
GO

CREATE TABLE [DDLCollector].[Error_Records](
	[RowId] [int] IDENTITY(1,1) NOT NULL,
	[Msg_Body] [xml] NULL,
	[Conversation_handle] [uniqueidentifier] NULL,
	[Message_Type] SYSNAME NULL,
	[Service_Name] SYSNAME NULL,
	[Contact_Name] SYSNAME NULL,
	[Record_Time] [datetime] NOT NULL
		CONSTRAINT DF_Error_Records_Record_Time DEFAULT(GETDATE()),
	[Error_Details] [nvarchar](4000) NULL,
	CONSTRAINT PK_Error_Records_RowId PRIMARY KEY
	(RowId ASC)
) ON [PRIMARY]
GO


USE [DDLCenter]
GO

-- Create Store Procedure to Send Deadlock Graph xml to Center Server
IF OBJECT_ID('DDLCollector.UP_SendDeadlockMsg', 'P') IS NOT NULL
	DROP PROC [DDLCollector].[UP_SendDeadlockMsg]
GO

CREATE PROCEDURE [DDLCollector].[UP_SendDeadlockMsg](
	@DeadlockMsg XML
)
AS  
BEGIN      
	SET NOCOUNT ON; 

	DECLARE 
		@handle UNIQUEIDENTIFIER
		,@Proc_Name SYSNAME
		,@Error_Details VARCHAR(2000)
	;

	-- get the store procedure name
	SELECT 
        @Proc_Name = ISNULL(QUOTENAME(SCHEMA_NAME(SCHEMA_ID)) 
        + '.' 
        + QUOTENAME(OBJECT_NAME(@@PROCID)),'')
    FROM sys.procedures
    WHERE OBJECT_ID = @@PROCID
	;
	
	BEGIN TRY
		
		-- Begin Dialog
		BEGIN DIALOG CONVERSATION @handle
		FROM SERVICE [http://soa/deadlock/service/ClientService]
		TO Service 'http://soa/deadlock/service/CenterService'
		ON CONTRACT [http://soa/deadlock/contract/CheckContract]
		;

		-- Send deadlock graph xml as the message to Center Server
		SEND ON CONVERSATION @handle
		MESSAGE TYPE [http://soa/deadlock/MsgType/Request] (@DeadlockMsg);

		-- Log it successfully
		INSERT INTO [DDLCollector].[Send_Records]([Send_Msg], [Send_Msg_CheckSum])   
		VALUES( @DeadlockMsg, CHECKSUM(CAST(@DeadlockMsg as NVARCHAR(MAX))))
	END TRY
	BEGIN CATCH
		
		-- Record the error info when exception occurs
		SET   @Error_Details=
				' Error Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) +
				' Error Message : ' + ERROR_MESSAGE() +
				' Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) +
				' Error State: ' + CAST(ERROR_STATE() AS VARCHAR(10)) +
				' Error Line: ' + CAST(ERROR_LINE() AS VARCHAR(10)) +
				' Exception Proc: ' + @Proc_Name
		;    
        
		-- record into table
		INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])         
		VALUES(@DeadlockMsg, @handle, 'http://soa/deadlock/MsgType/Request', 'http://soa/deadlock/service/ClientService', 'http://soa/deadlock/contract/CheckContract', @Error_Details); 

	END CATCH
END
GO

-- Create Store Procedure for Queue: when extend event notification queue message
-- this store procedure will be called.
IF OBJECT_ID('DDLCollector.UP_ProcessDeadlockEventMsg', 'P') IS NOT NULL
	DROP PROC [DDLCollector].[UP_ProcessDeadlockEventMsg]
GO

CREATE PROCEDURE [DDLCollector].[UP_ProcessDeadlockEventMsg]
AS
/*

SELECT * FROM [DDLCollector].[Deadlock_Traced_Records]
SELECT * FROM [DDLCollector].[Send_Records]

SELECT * FROM [DDLCollector].[Error_Records]

*/
BEGIN      
	SET NOCOUNT ON;   
	DECLARE 
		@handle UNIQUEIDENTIFIER
		, @Message_Type SYSNAME
		, @Service_Name SYSNAME
		, @Contact_Name SYSNAME
		, @Error_Details VARCHAR(2000)
		, @Message_Body XML
		, @Proc_Name SYSNAME
	;

	-- Store Procedure Name
	SELECT 
        @Proc_Name = ISNULL(QUOTENAME(SCHEMA_NAME(SCHEMA_ID)) 
        + '.' 
        + QUOTENAME(OBJECT_NAME(@@PROCID)),'')
    FROM sys.procedures
    WHERE OBJECT_ID = @@PROCID
	;

	BEGIN TRY
    
	-- Receive message from queue
	WAITFOR(RECEIVE TOP(1)        
			@handle = conversation_handle
			, @Message_Type = message_type_name
			, @Service_Name = service_name
			, @Contact_Name = service_contract_name
			, @Message_Body = message_body        
			FROM dbo.[http://soa/deadlock/queue/ClientQueue]),Timeout 500
	;
	
	-- just return if there is no message needed to process        
	IF(@@Rowcount=0)      
		BEGIN   
			RETURN        
		END
	-- Get data from message queue
	ELSE IF @Message_Type = 'http://schemas.microsoft.com/SQL/Notifications/EventNotification'      
		BEGIN               
			-- Record message log first
			INSERT INTO  [DDLCollector].[Deadlock_Traced_Records](Processed_Msg, [Processed_Msg_CheckSum])         
			VALUES(@Message_Body, CHECKSUM(CAST(@Message_Body as NVARCHAR(MAX))))
		
			-- BE NOTED HERE: PLEASE DO'T END CONVERSATION, OR ELSE EXCEPTION WILL BE THROWN OUTPUT
			/*
			Error: 17001, Severity: 16, State: 1.
			Failure to send an event notification instance of type 'DEADLOCK_GRAPH' on conversation handle '{67419386-7C34-E711-A709-001C42099969}'. Error Code = '8429'.
			Error: 17005, Severity: 16, State: 1.
			Event notification 'DeadLockNotificationEvent' in database 'master' dropped due to send time service broker errors. Check to ensure the conversation handle, service broker contract, and service specified in the event notification are active.  
			*/
			--END CONVERSATION @handle

			--Here call another Store Procedure to send deadlock graph info to center server
			EXEC [DDLCollector].[UP_SendDeadlockMsg] @Message_Body;
		END
	--End Diaglog Message Type, that means we should end this conversation      
	ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'        
		BEGIN         
			END CONVERSATION @handle;     
		END
	-- Konwn Service Broker Errors by System.   
	ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'        
		BEGIN         
			END CONVERSATION @handle       
		
			INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])         
			VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Exception Store Procedure: ' + @Proc_Name);               
		END       
	ELSE
		-- unknown Message Types.        
		BEGIN         
			END CONVERSATION @handle

			INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])          
			VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Received unexpected message type when executing Store Procedure: ' + @Proc_Name);

			-- unexpected message type         
			RAISERROR (N' Received unknown message type: %s', 16, 1, @Message_Type) WITH LOG;        
		END      
	END TRY      
	BEGIN CATCH       
	BEGIN        
		SET   @Error_Details=
				' Error Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) +
				' Error Details : ' + ERROR_MESSAGE() +
				' Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) +
				' Error State: ' + CAST(ERROR_STATE() AS VARCHAR(10)) +
				' Error Line: ' + CAST(ERROR_LINE() AS VARCHAR(10)) + 
				' Exception Proc: ' + @Proc_Name
		;    
        
		INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])         
		VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, @Error_Details); 
	END      
	END CATCH  
END  
GO
USE master
GO
-- If the master key is not available, create it. 
IF NOT EXISTS (SELECT * 
				FROM sys.symmetric_keys
				WHERE name LIKE '%MS_DatabaseMasterKey%') 
BEGIN
	CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'ClientMasterKey*'; 
END 
GO

这里请注意证书的开始生效时间要略微早于当前时间,并设置合适的证书过期日期,我这里是设置的过期日期为9999年12月30号。

USE master
GO
-- Crete Transport Layer Certification
CREATE CERTIFICATE TrpCert_ClientLocal
AUTHORIZATION dbo
WITH SUBJECT = 'TrpCert_ClientLocal',
START_DATE = '05/07/2017',
EXPIRY_DATE = '12/30/9999'
GO

-- then backup it up to local path
-- and after that copy it to Center server
BACKUP CERTIFICATE TrpCert_ClientLocal
TO FILE = 'C:\Temp\TrpCert_ClientLocal.cer';
GO

这里的证书是通过证书文件来创建的,这个证书文件来自于远程通讯的另一端Deadlock Center SQL Server的证书文件的一份拷贝。

USE master
GO
-- Create certification came from Center Server.
CREATE	CERTIFICATE TrpCert_RemoteCenter 
FROM FILE = 'C:\Temp\TrpCert_RemoteCenter.cer'
GO

这里也可以创建带密码的常规用户登录,但是为了规避安全风险,这里最好创建基于证书文件的用户登录。

USE master
GO
-- Create user login
IF NOT EXISTS(SELECT * 
			FROM sys.syslogins 
			WHERE name='SSBDbo')
BEGIN
	CREATE LOGIN SSBDbo FROM CERTIFICATE TrpCert_ClientLocal;
END
GO

这里需要注意的是,端口授权的证书一定本地实例创建的证书,而不是来自于远程服务器的那个证书。比如代码中的AUTHENTICATION = CERTIFICATE TrpCert_ClientLocal部分。

USE master
GO 		 
--Creaet Tcp endpoint for SSB comunication and grant connect to users. 	 
CREATE ENDPOINT EP_SSB_ClientLocal
STATE = STARTED 
AS TCP 
( 
 	LISTENER_PORT = 4022 
) 
FOR SERVICE_BROKER (AUTHENTICATION = CERTIFICATE TrpCert_ClientLocal,  ENCRYPTION = REQUIRED 
) 
GO 

-- Grant Connect on Endpoint to User SSBDbo
GRANT CONNECT ON ENDPOINT::EP_SSB_ClientLocal TO SSBDbo 
GO
-- Now, let's go inside to conversation database
USE DDLCenter
GO

-- Create Master Key
IF NOT EXISTS (SELECT * 
				FROM sys.symmetric_keys 
				WHERE name LIKE '%MS_DatabaseMasterKey%')
BEGIN		
	CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'DDLCenterMasterKey*';
END
GO
USE DDLCenter
GO
-- Create conversation layer certification
CREATE CERTIFICATE DlgCert_ClientLocal
AUTHORIZATION dbo
WITH SUBJECT = 'DlgCert_ClientLocal',
START_DATE = '05/07/2017',
EXPIRY_DATE = '12/30/9999'
GO

-- backup it up to local path
-- and then copy it to remote Center server
BACKUP CERTIFICATE DlgCert_ClientLocal
TO FILE = 'C:\Temp\DlgCert_ClientLocal.cer';
GO
USE DDLCenter
GO
-- Create User for login under conversation database
IF NOT EXISTS(
	SELECT TOP 1 *
	FROM sys.database_principals
	WHERE name = 'SSBDbo'
)
BEGIN
	CREATE USER SSBDbo WITHOUT LOGIN;
END
GO
USE DDLCenter
GO
-- Create converstaion layer certification came from remote Center server.
CREATE	CERTIFICATE DlgCert_RemoteCenter
AUTHORIZATION SSBDbo
FROM FILE='C:\Temp\DlgCert_RemoteCenter.cer'
GO

GRANT CONNECT TO SSBDbo;

Deadlock Client与Deadlock Center在创建Service Broker组件对象时存在差异:第一个差异是创建Service的时候,需要包含Event Notification的Contract,名称为 http://schemas.microsoft.com/SQL/Notifications/PostEventNotification;第二个差异是需要多创建一个指向本地服务的路由http://soa/deadlock/route/LocalRoute。

USE DDLCenter
GO

-- Create Message Type
CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Request]
       VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Response]
       VALIDATION = WELL_FORMED_XML;
GO

-- Create Contact
CREATE CONTRACT [http://soa/deadlock/contract/CheckContract](
	[http://soa/deadlock/MsgType/Request] SENT BY INITIATOR,
	[http://soa/deadlock/MsgType/Response] SENT BY TARGET
);
GO

-- Create Queue
CREATE QUEUE dbo.[http://soa/deadlock/queue/ClientQueue] 
WITH STATUS = ON, RETENTION = OFF
, ACTIVATION (STATUS = ON , 
				PROCEDURE_NAME = [DDLCollector].[UP_ProcessDeadlockEventMsg] , 
				MAX_QUEUE_READERS = 2 , 
				EXECUTE AS N'dbo') 
GO

-- Create Service
-- Here is very import, we have to create service for both contacts
-- to get extend event notification and SSB work.
CREATE SERVICE [http://soa/deadlock/service/ClientService]
ON QUEUE [http://soa/deadlock/queue/ClientQueue]
(
  [http://soa/deadlock/contract/CheckContract],
  [http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]
);
GO

-- Grant Send on service
GRANT SEND ON SERVICE::[http://soa/deadlock/service/ClientService] to SSBDbo;
GO

-- Create Remote Service Bingding
CREATE REMOTE SERVICE BINDING [http://soa/deadlock/RSB/CenterRSB]
TO SERVICE 'http://soa/deadlock/service/CenterService' 
WITH  USER = [SSBDbo],
ANONYMOUS=Off
GO

-- Create Route
CREATE ROUTE [http://soa/deadlock/route/CenterRoute]
WITH SERVICE_NAME = 'http://soa/deadlock/service/CenterService',
ADDRESS = 'TCP://10.211.55.3:4024';
GO

-- Create route for the DeadlockNotificationSvc
CREATE ROUTE [http://soa/deadlock/route/LocalRoute]
WITH SERVICE_NAME = 'http://soa/deadlock/service/ClientService',
ADDRESS = 'LOCAL';
GO

Deadlock Center Server

-- Run script on center server to receive client deadlock xml
USE master
GO
-- Create Database
IF DB_ID('DDLCenter') IS NULL
	CREATE DATABASE [DDLCenter];
GO
-- Change datbase to simple recovery model
ALTER DATABASE [DDLCenter] SET RECOVERY SIMPLE WITH NO_WAIT
GO
-- Enable Service Broker
ALTER DATABASE [DDLCenter] SET ENABLE_BROKER,TRUSTWORTHY ON
GO
-- Change database Owner to sa
ALTER AUTHORIZATION ON DATABASE::DDLCenter TO [sa]
GO

表[DDLCollector].[Collect_Records]:Deadlock Center成功接收到的Service Broker消息。 表[DDLCollector].[Error_Records]:记录发生异常情况的详细信息。 表[DDLCollector].[Deadlock_Info]:记录所有Deadlock Client端发生的Deadlock详细信息。 存储过程[DDLCollector].[UP_ProcessDeadlockGraphEventMsg]:Deadlock Center上绑定到队列的激活存储过程,一旦队列中有消息进入,这个存储过程会被自动调用。 存储过程[DDLCollector].[UP_ParseDeadlockGraphEventMsg]:Deadlock Center上解析Deadlock Graph XML的存储过程对象,这个存储过程会被上面的激活存储过程调用来解析XML,然后放入表[DDLCollector].[Deadlock_Info]中。

USE [DDLCenter]
GO

-- Create Schema
IF NOT EXISTS(
	SELECT TOP 1 *
	FROM sys.schemas
	WHERE name = 'DDLCollector'
)
BEGIN
	EXEC('CREATE SCHEMA DDLCollector');
END
GO

-- Create table to log the received message
IF OBJECT_ID('DDLCollector.Collect_Records', 'U') IS NOT NULL
	DROP TABLE [DDLCollector].[Collect_Records]
GO

CREATE TABLE [DDLCollector].[Collect_Records](
	[RowId] [BIGINT] IDENTITY(1,1) NOT NULL,
	[Deadlock_Graph_Msg] [xml] NULL,
	[Deadlock_Graph_Msg_CheckSum] INT,
	[Record_Time] [datetime] NOT NULL 
		CONSTRAINT DF_Collect_Records_Record_Time DEFAULT(GETDATE()),
	CONSTRAINT PK_Collect_Records_RowId PRIMARY KEY
	(RowId ASC)
) ON [PRIMARY]
GO

-- create table to record the exception when error occurs
IF OBJECT_ID('DDLCollector.Error_Records', 'U') IS NOT NULL
	DROP TABLE [DDLCollector].[Error_Records]
GO

CREATE TABLE [DDLCollector].[Error_Records](
	[RowId] [int] IDENTITY(1,1) NOT NULL,
	[Msg_Body] [xml] NULL,
	[Conversation_handle] [uniqueidentifier] NULL,
	[Message_Type] SYSNAME NULL,
	[Service_Name] SYSNAME NULL,
	[Contact_Name] SYSNAME NULL,
	[Record_Time] [datetime] NOT NULL
		CONSTRAINT DF_Error_Records_Record_Time DEFAULT(GETDATE()),
	[Error_Details] [nvarchar](4000) NULL,
	CONSTRAINT PK_Error_Records_RowId PRIMARY KEY
	(RowId ASC)
) ON [PRIMARY]
GO

-- create business table to record deadlock analysised info
IF OBJECT_ID('DDLCollector.Deadlock_Info', 'U') IS NOT NULL
	DROP TABLE [DDLCollector].[Deadlock_Info]
GO
CREATE TABLE [DDLCollector].[Deadlock_Info](
	RowId INT IDENTITY(1,1) NOT NULL
	,SQLInstance sysname NULL
	,SPid INT NULL
	,is_Vitim BIT NULL
	,DeadlockGraph XML NULL
	,DeadlockGraphCheckSum INT NULL
	,lasttranstarted DATETIME NULL
	,lastbatchstarted DATETIME NULL
	,lastbatchcompleted DATETIME NULL
	,procname SYSNAME NULL 
	,Code NVARCHAR(max) NULL
	,LockMode sysname NULL
	,Indexname sysname NULL
	,KeylockObject sysname NULL
	,IndexLockMode sysname NULL
	,Inputbuf NVARCHAR(max) NULL
	,LoginName sysname NULL
	,Clientapp sysname NULL
	,Action varchar(1000) NULL
	,status varchar(10) NULL
	,[Record_Time] [datetime] NOT NULL
		CONSTRAINT DF_Deadlock_Info_Record_Time DEFAULT(GETDATE()),
	CONSTRAINT PK_Deadlock_Info_RowId PRIMARY KEY
	(RowId ASC)
)
GO



USE [DDLCenter]
GO

-- Create store procedure to analysis deadlock graph xml
-- and log into business table
IF OBJECT_ID('DDLCollector.UP_ParseDeadlockGraphEventMsg', 'P') IS NOT NULL
	DROP PROC [DDLCollector].[UP_ParseDeadlockGraphEventMsg]
GO

CREATE PROCEDURE [DDLCollector].[UP_ParseDeadlockGraphEventMsg](
	@DeadlockGraph_Msg XML
)
AS  
BEGIN      
	SET NOCOUNT ON; 

	;WITH deadlock
	AS
	(
		SELECT
			OwnerID = T.C.value('@id', 'varchar(50)')
			,SPid = T.C.value('(./@spid)[1]','int')
			,status = T.C.value('(./@status)[1]','varchar(10)')
			,Victim = case 
						when T.C.value('@id', 'varchar(50)') = T.C.value('./../../@victim','varchar(50)') then 1 
						else 0 end
			,LockMode = T.C.value('@lockMode', 'sysname')
			,Inputbuf = T.C.value('(./inputbuf/text())[1]','nvarchar(max)')
			,Code = T.C.value('(./executionStack/frame/text())[1]','nvarchar(max)')
			,SPName = T.C.value('(./executionStack/frame/@procname)[1]','sysname')
			,Hostname = T.C.value('(./@hostname)[1]','sysname')
			,Clientapp = T.C.value('(./@clientapp)[1]','varchar(1000)')
			,lasttranstarted = T.C.value('(./@lasttranstarted)[1]','datetime')
			,lastbatchstarted = T.C.value('(./@lastbatchstarted)[1]','datetime')
			,lastbatchcompleted = T.C.value('(./@lastbatchcompleted)[1]','datetime')
			,LoginName = T.C.value('@loginname', 'sysname')
			,Action = T.C.value('(./@transactionname)[1]','varchar(1000)')
		FROM @DeadlockGraph_Msg.nodes('EVENT_INSTANCE/TextData/deadlock-list/deadlock/process-list/process') AS T(C)
	)
	,
	keylock
	AS
	(
		SELECT
			OwnerID = T.C.value('./owner[1]/@id', 'varchar(50)')
			,KeylockObject = T.C.value('./../@objectname', 'sysname')
			,Indexname = T.C.value('./../@indexname', 'sysname')
			,IndexLockMode = T.C.value('./../@mode', 'sysname')
		FROM @DeadlockGraph_Msg.nodes('EVENT_INSTANCE/TextData/deadlock-list/deadlock/resource-list/keylock/owner-list') AS T(C)
	)
	SELECT
		SQLInstance = A.Hostname 
		,A.SPid
		,is_Vitim = A.Victim
		,DeadlockGraph = @DeadlockGraph_Msg.query('EVENT_INSTANCE/TextData/deadlock-list')
		,DeadlockGraphCheckSum = CHECKSUM(CAST(@DeadlockGraph_Msg AS NVARCHAR(MAX)))
		,A.lasttranstarted
		,A.lastbatchstarted
		,A.lastbatchcompleted
		,A.SPName
		,A.Code
		,A.LockMode
		,B.Indexname
		,B.KeylockObject
		,B.IndexLockMode
		,A.Inputbuf
		,A.LoginName
		,A.Clientapp
		,A.Action
		,status
		,[Record_Time] = GETDATE()
	FROM deadlock AS A
			LEFT JOIN keylock AS B
			ON A.OwnerID = B.OwnerID
	ORDER BY A.SPid, A.Victim
	;
END
GO

-- Create store Procedure for Center server service queue to process deadlock xml
-- when message sending from client server.
IF OBJECT_ID('DDLCollector.UP_ProcessDeadlockGraphEventMsg', 'P') IS NOT NULL
	DROP PROC [DDLCollector].[UP_ProcessDeadlockGraphEventMsg]
GO

CREATE PROCEDURE [DDLCollector].[UP_ProcessDeadlockGraphEventMsg]
AS
/*
EXEC [DDLCollector].[UP_ProcessDeadlockGraphEventMsg]

SELECT * FROM [DDLCollector].[Collect_Records]

SELECT * FROM [DDLCollector].[Error_Records]

SELECT * FROM [DDLCollector].[Deadlock_Info]
*/
BEGIN      
	SET NOCOUNT ON;   
	DECLARE 
		@handle UNIQUEIDENTIFIER
		, @Message_Type SYSNAME
		, @Service_Name SYSNAME
		, @Contact_Name SYSNAME
		, @Error_Details VARCHAR(2000)
		, @Message_Body XML
		, @Proc_Name SYSNAME
	;

	-- Store Procedure name
	SELECT 
        @Proc_Name = ISNULL(QUOTENAME(SCHEMA_NAME(SCHEMA_ID)) 
        + '.' 
        + QUOTENAME(OBJECT_NAME(@@PROCID)),'')
    FROM sys.procedures
    WHERE OBJECT_ID = @@PROCID
	;

	BEGIN TRY
        
	-- Receive deadlock message from service queue
	WAITFOR(RECEIVE TOP(1)        
			@handle = conversation_handle
			, @Message_Type = message_type_name
			, @Service_Name = service_name
			, @Contact_Name = service_contract_name
			, @Message_Body = message_body        
			FROM dbo.[http://soa/deadlock/queue/CenterQueue]),Timeout 500
	;
	        
	IF(@@Rowcount=0)      
		BEGIN   
			RETURN        
		END
	-- Message type is the very correct one
	ELSE IF @Message_Type = N'http://soa/deadlock/MsgType/Request'        
		BEGIN               
			-- Record message log first
			INSERT INTO  [DDLCollector].[Collect_Records](Deadlock_Graph_Msg, [Deadlock_Graph_Msg_CheckSum])          
			VALUES(@Message_Body, CHECKSUM(cast(@Message_Body as NVARCHAR(MAX))))
		
			END CONVERSATION @handle

			--Here call another Store Procedure to process our message to record deadlock relation info
			INSERT INTO [DDLCollector].[Deadlock_Info]
			EXEC [DDLCollector].[UP_ParseDeadlockGraphEventMsg] @Message_Body;
		END
	--End Diaglog Message Type, that means we should end this conversation      
	ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'        
		BEGIN         
			END CONVERSATION @handle;        
		END
	-- Konwn Service Broker Errors by System.   
	ELSE IF @Message_Type = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'        
		BEGIN         
			END CONVERSATION @handle       
		
			INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])          
			VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Exception Store Procedure: ' + @Proc_Name);               
		END       
	ELSE
		-- unknown Message Types.        
		BEGIN         
			END CONVERSATION @handle

			INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])          
			VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, ' Received unexpected message type when executing Store Procedure: ' + @Proc_Name);

			-- unexpected message type         
			RAISERROR (N' Received unexpected message type: %s', 16, 1, @Message_Type) WITH LOG;        
		END      
	END TRY      
	BEGIN CATCH       
	BEGIN
		-- record exception record       
		SET   @Error_Details=
				' Error Number: ' + CAST(ERROR_NUMBER() AS VARCHAR(10)) +
				' Error Message : ' + ERROR_MESSAGE() +
				' Error Severity: ' + CAST(ERROR_SEVERITY() AS VARCHAR(10)) +
				' Error State: ' + CAST(ERROR_STATE() AS VARCHAR(10)) +
				' Error Line: ' + CAST(ERROR_LINE() AS VARCHAR(10)) + 
				' Exception Proc: ' + @Proc_Name
		;    
        
		INSERT INTO [DDLCollector].[Error_Records]([Msg_Body], [Conversation_handle], [Message_Type], [Service_Name], [Contact_Name], [Error_Details])          
		VALUES(@Message_Body, @handle, @Message_Type, @Service_Name, @Contact_Name, @Error_Details); 
	END      
	END CATCH  
END  
GO
USE master
GO
-- If the master key is not available, create it. 
IF NOT EXISTS (SELECT * 
				FROM sys.symmetric_keys
				WHERE name LIKE '%MS_DatabaseMasterKey%') 
BEGIN
	CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'CenterMasterKey*'; 
END 
GO 
USE master
GO
-- Crete Transport Layer Certification
CREATE CERTIFICATE TrpCert_RemoteCenter
AUTHORIZATION dbo
WITH SUBJECT = 'TrpCert_RemoteCenter',
START_DATE = '05/07/2017',
EXPIRY_DATE = '12/30/9999'
GO

-- then backup it up to local path
-- and after that copy it to Client server
BACKUP CERTIFICATE TrpCert_RemoteCenter
TO FILE = 'C:\Temp\TrpCert_RemoteCenter.cer';
GO
USE master
GO
-- Create certification came from client Server.
CREATE	CERTIFICATE TrpCert_ClientLocal 
FROM FILE = 'C:\Temp\TrpCert_ClientLocal.cer'
GO
USE master
GO
-- Create user login
IF NOT EXISTS(SELECT * 
			FROM sys.syslogins 
			WHERE name='SSBDbo')
BEGIN
	CREATE LOGIN SSBDbo FROM CERTIFICATE TrpCert_RemoteCenter;
END
GO
USE master
GO
-- Creaet Tcp endpoint for SSB comunication and grant connect to users. 	 
CREATE ENDPOINT EP_SSB_RemoteCenter
STATE = STARTED 
AS TCP 
( 
 	LISTENER_PORT = 4024
) 
FOR SERVICE_BROKER (AUTHENTICATION = CERTIFICATE TrpCert_RemoteCenter,  ENCRYPTION = REQUIRED 
) 
GO 

-- Grant Connect on Endpoint to User SSBDbo
GRANT CONNECT ON ENDPOINT::EP_SSB_RemoteCenter TO SSBDbo 
GO
-- Now, let's go inside to conversation database
USE DDLCenter
GO

-- Create Master Key
IF NOT EXISTS (SELECT * 
				FROM sys.symmetric_keys 
				WHERE name LIKE '%MS_DatabaseMasterKey%')
BEGIN		
	CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'DDLCenterMasterKey*';
END
GO
USE DDLCenter
GO
-- Create conversation layer certification
CREATE CERTIFICATE DlgCert_RemoteCenter
AUTHORIZATION dbo
WITH SUBJECT = 'DlgCert_RemoteCenter',
START_DATE = '05/07/2017',
EXPIRY_DATE = '12/30/9999'
GO

-- backup it up to local path
-- and then copy it to remote client server
BACKUP CERTIFICATE DlgCert_RemoteCenter
TO FILE = 'C:\Temp\DlgCert_RemoteCenter.cer';
GO
USE DDLCenter
GO
-- Create User for login under conversation database
IF NOT EXISTS(
	SELECT TOP 1 *
	FROM sys.database_principals
	WHERE name = 'SSBDbo'
)
BEGIN
	--CREATE USER SSBDbo FOR LOGIN SSBDbo;
	CREATE USER SSBDbo WITHOUT LOGIN;
END
GO
USE DDLCenter
GO
-- Create converstaion layer certification came from remote client server.
CREATE	CERTIFICATE DlgCert_ClientLocal
AUTHORIZATION SSBDbo
FROM FILE='C:\Temp\DlgCert_ClientLocal.cer'
GO

GRANT CONNECT TO SSBDbo;
USE DDLCenter
GO

-- Create Message Type
CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Request]
       VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [http://soa/deadlock/MsgType/Response]
       VALIDATION = WELL_FORMED_XML;
GO

-- Create Contact
CREATE CONTRACT [http://soa/deadlock/contract/CheckContract](
	[http://soa/deadlock/MsgType/Request] SENT BY INITIATOR,
	[http://soa/deadlock/MsgType/Response] SENT BY TARGET
);
GO

-- Create Queue
CREATE QUEUE [dbo].[http://soa/deadlock/queue/CenterQueue] 
WITH STATUS = ON , RETENTION = OFF
, ACTIVATION (STATUS = ON , 
				PROCEDURE_NAME = [DDLCollector].[UP_ProcessDeadlockGraphEventMsg] , 
				MAX_QUEUE_READERS = 3 , 
				EXECUTE AS N'dbo') 
GO

-- Create Service
CREATE SERVICE [http://soa/deadlock/service/CenterService]
ON QUEUE [http://soa/deadlock/queue/CenterQueue]
(
  [http://soa/deadlock/contract/CheckContract]
);
GO

-- Grant Send on service to User SSBDbo
GRANT SEND ON SERVICE::[http://soa/deadlock/service/CenterService] to SSBDbo;
GO

-- Create Remote Service Bingding
CREATE REMOTE SERVICE BINDING [http://soa/deadlock/RSB/ClientRSB]
TO SERVICE 'http://soa/deadlock/service/ClientService' 
WITH  USER = SSBDbo,
ANONYMOUS=Off
GO

-- Create Route
CREATE ROUTE [http://soa/deadlock/route/ClientRoute]
WITH SERVICE_NAME = 'http://soa/deadlock/service/ClientService',
ADDRESS = 'TCP://10.211.55.3:4022';
GO

Event Notification配置

Event Notification只需要在Deadlock Client Server创建即可,因为只需要在Deadlock Client上跟踪死锁事件。在为Deadlock Client 配置Service Broker章节,我们已经为Event Notification创建了队列、服务和路由。因此,在这里我们只需要创建Event Notification对象即可。方法参见如下的代码:

USE DDLCenter
GO

-- Create Event Notification for the deadlock_graph event.
IF EXISTS(
	SELECT * FROM sys.server_event_notifications  
	WHERE name = 'DeadLockNotificationEvent'
)
BEGIN
	DROP EVENT NOTIFICATION DeadLockNotificationEvent
	ON SERVER;
END
GO


CREATE EVENT NOTIFICATION DeadLockNotificationEvent
ON SERVER
WITH FAN_IN
FOR DEADLOCK_GRAPH
TO SERVICE 
'http://soa/deadlock/service/ClientService', 
'current database'
GO

模拟死锁

至此为止,所有对象和准备工作已经准备完成,万事俱备只欠东风,让我们在Deadlock Client实例上模拟死锁场景。首先,我们在Test数据库下创建两个测试表,表名分别为:dbo.test_deadlock1和dbo.test_deadlock2,代码如下:

IF DB_ID('Test') IS NULL
	CREATE DATABASE Test;
GO

USE Test
GO

-- create two test tables
IF OBJECT_ID('dbo.test_deadlock1','u') IS NOT NULL
    DROP TABLE dbo.test_deadlock1
GO

CREATE TABLE dbo.test_deadlock1(
id INT IDENTITY(1,1) not null PRIMARY KEY
,name VARCHAR(20) null
);

IF OBJECT_ID('dbo.test_deadlock2','u') IS NOT NULL
    DROP TABLE dbo.test_deadlock2
GO

CREATE TABLE dbo.test_deadlock2(
id INT IDENTITY(1,1) not null PRIMARY KEY
,name VARCHAR(20) null
);

INSERT INTO dbo.test_deadlock1
SELECT 'AA'
UNION ALL
SELECT 'BB';


INSERT INTO dbo.test_deadlock2
SELECT 'AA'
UNION ALL
SELECT 'BB';
GO

接下来,我们使用SSMS打开一个新的连接,我们假设叫session 1,执行如下语句:

--session 1
USE Test
GO

BEGIN TRAN 
UPDATE dbo.test_deadlock1
SET name = 'CC'
WHERE id = 1
;
WAITFOR DELAY '00:00:05'

UPDATE dbo.test_deadlock2
SET name = 'CC'
WHERE id = 1
;
ROLLBACK

紧接着,我们使用SSMS打开第二个连接,假设叫Session 2,执行下面的语句:

--session 2
USE Test
GO

BEGIN TRAN 
UPDATE dbo.test_deadlock2
SET name = 'CC'
WHERE id = 1
;

UPDATE dbo.test_deadlock1
SET name = 'CC'
WHERE id = 1
;
COMMIT

等待一会儿功夫以后,死锁发生,并且Session 2做为了死锁的牺牲品,我们会在Session 2的SSMS信息窗口中看到如下的死锁信息:

Msg 1205, Level 13, State 51, Line 8
Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

用户查询死锁信息

根据上面的模拟死锁小节,说明死锁已经真真切切的发生了,那么,死锁信息到底有没有被捕获到呢?如果终端用户想要查看和分析所有客户端的死锁信息,只需要连接Deadlock Center SQL Server,执行下面的语句:

-- Run on Deadlock Center Server
USE DDLCenter
GO

SELECT * FROM [DDLCollector].[Deadlock_Info]

由于结果集宽度太宽,人为将查询结果分两段截图,第一段结果集展示如下: 03.png

第二段结果集截图如下: 04.png

从这个结果集,我们可以清楚的看到Deadlock Client发生死锁的详细信息,包含:

…… 等等。

踩过的坑

当Deadlock Client 上SQL Server发生两次或者两次以上的Deadlock事件以后,自建的Event Notification对象(名为:DeadLockNotificationEvent)会被SQL Server系统自动删除,从而导致整个死锁收集系统无法工作。

表象

SQL Server在错误日志中会抛出如下4个错误信息:两个错误编号为17004,一个编号为17001的错误,最后是一个编号为17005错误,其中17005明确说明了,Event Notification对象被删除了。如下:

Error: 17004, Severity: 16, State: 1.
Event notification conversation on dialog handle '{4A6A0FBD-7A34-E711-A709-001C42099969}' closed without an error.
Error: 17004, Severity: 16, State: 1.
Event notification conversation on dialog handle '{476A0FBD-7A34-E711-A709-001C42099969}' closed without an error.
Error: 17001, Severity: 16, State: 1.
Failure to send an event notification instance of type 'DEADLOCK_GRAPH' on conversation handle '{F711A404-7934-E711-A709-001C42099969}'. Error Code = '8429'.
Error: 17005, Severity: 16, State: 1.
Event notification 'DeadLockNotificationEvent' in database 'master' dropped due to send time service broker errors. Check to ensure the conversation handle, service broker contract, and service specified in the event notification are active.

错误日志截图如下: 05.png

问题分析

从错误提示信息due to send time service broker errors来看,最开始花了很长时间来排查Service Broker方面的问题,在长达数小时的问题排查无果后,静下心来仔细想想:如果是Service Broker有问题的话,我们不可能完成第一、第二条死锁信息的收集,所以问题应该与Service Broker没有直接关系。于是,注意到了错误提示信息的后半部分Check to ensure the conversation handle, service broker contract, and service specified in the event notification are active,再次以可以成功收集两条deadlock错误信息为由,排除Contact和Service的问题可能性,所以最有可能出问题的地方猜测应该是conversation handle,继续排查与conversation handle相关操作的地方,发现存储过程[DDLCollector].[UP_ProcessDeadlockEventMsg]的中的代码:

...
ELSE IF @Message_Type = 'http://schemas.microsoft.com/SQL/Notifications/EventNotification'      
		BEGIN               
			-- Record message log first
			INSERT INTO  [DDLCollector].[Deadlock_Traced_Records](Processed_Msg, [Processed_Msg_CheckSum])         
			VALUES(@Message_Body, CHECKSUM(CAST(@Message_Body as NVARCHAR(MAX))))
		
			END CONVERSATION @handle

			--Here call another Store Procedure to send deadlock graph info to center server
			EXEC [DDLCollector].[UP_SendDeadlockMsg] @Message_Body;
		END
...

这个逻辑分支不应该有End Conversation的操作,因为这里是与Event Notification相关的Message Type操作,而不是Service Broker相关的Message Type操作。

解决问题

问题分析清楚了,解决方法就非常简单了,注释掉这条语句END CONVERSATION @handle后,重新创建存储过程。再多次模拟死锁操作,再也没有出现Event Notification被系统自动删除的情况了,说明这个问题已经被彻底解决,坑已经被填上了。 解决问题的代码修改和注释如下截图,以此纪念下踩过的这个坑: 06.png

福利发放

以下是关于SQL Server死锁相关的系列文章,可以帮助我们全面了解、分析和解决死锁问题,其中第一个是这篇文章的视频演示。

最后总结

这篇文章是一个完整的SQL Server死锁收集系统典型案例介绍,你甚至可以很轻松简单的将这个方案应用到你的产品环境,来收集产品环境所有SQL Server实例发生死锁的详细信息,并根据该系统收集到的场景来改进和改善死锁发生的概率,从而降低死应用发生异常错误的可能性。因此这篇文章有着非常重要的现实价值和意义。