Thursday, November 12, 2009

Clustered SQL Servers - lessons learned

We have an active/passive, clustered in our production environment. We have two SQL instances on the passive node, we do this so we don’t have a perfectly good server sitting there doing nothing. Our main application used to report off our production database, but we replicate the databases to the secondary as well as to the warehouse, so the application on the primary can report off the secondary. I feel I have to point out that it was not designed by the company the source code was purchased and we continue to enhance and expand it. The other instance on the secondary instance is used as a “warm stand by” backup Instance in case another application database fails. Note this warm backup is not mirroring. So we have our main database server on primary, a reporting instance on the secondary and an additional instance that just needs to be up and ready.

Active/Active Cluster
This is not an Active/Active cluster; an Active/Active cluster is really two clusters, each with its own shared storage accessible only from the primary node. Each server is a primary on its cluster and secondary on the other cluster, assuming only two nodes in the cluster. If one server fails the other will run all instances.

Active/Passive Cluster
In Active/Passive cluster the primary can fail over to the secondary, but if the secondary server fails then you have to run around and find a replacement server or servers for what resided on it.  So whatever you install on the secondary on an Active\Passive cluster make sure it is not critical. In our case we have a plan if the secondary goes down. Of the two instances on the secondary, one is just a warm stand by it has an up to date copy of the databases to take over the application when we decide, so its only critical if the main database server goes down. The instance for reporting is for the application and we also replicate to our data warehouse server for reporting servers and for feeds for the data warehouse nightly process. We have two options here one is point the application reporting to the main database on the primary, this is not desirable for an extended period, but it is the easiest to do. The second option is to point the application reporting to our warehouse/reporting services copy of the data. This is the desirable solution, but it does take some work. We actually have two projects; one to point the application reporting to our data warehouse copy of the data and the second is rewrite the reports in SSRS. But with all projects there are always others that are higher priority and have revenue attached to it

Memory and Clusters
When the primary fails over all three instances reside on the secondary. One thing you have to be careful of in cases like this is the memory allocation, you don’t want to over allocate the memory. For example our secondary and servers each have 6 GB of RAM, you want to leave 1 GB for the OS and then divide the remaining 5GB up between the instances. In our case we gave 4GB to the main instance in the cluster, 1 GB of the reporting instance and 50 MB to the warm standby. If instead I gave each instance 3GB and set that as the minimum memory limit for each instance, so the total allocation of 9 GB of RAM when there is only 6 GB of RAM the memory will be over allocated. Um yeah that would not be good.

Applications and Clusters
We have several SSIS packages on our primary database server that encrypt data and FTP/FTPS the data to and from other sites. We use GNU pgp for the encryption and a third party FTP software to send and retrieve data. With applications, like these, that are not cluster aware, they have to be loaded on all nodes in a cluster. This became painfully obvious this past week when our primary server failed. What we had on the secondary was an outdated GNU pgp files and an evaluation copy of the FTP software that was outdated. The GNU pgp was easy to resolve, since its file based I just copied the directory over. Also GNU pgp could have been set up on a non C: drive. My best guess is that when it was developed on local machines there was only a C: and was set up that way in SSIS packages. We’re manually sending retrieving files with the FTP on another server for now until we can install a valid version of the FTP software on the secondary node.

Conclusion
As you plan out your cluster, design it with these lessons learned; For an Active/Passive Cluster keep the secondary free of other SQL instances (and other software that SQL Server does not need) and accept the fact that it will just sit there “unused” waiting for the day for it to take over and rule the cluster’s world. For either Active/Active or Active/Passive that has a secondary with SQL instances residing on it allocate memory accordingly and do not over allocate the memory. And lastly install software that is not cluster aware on all nodes, and UPDATE IT!!

Sunday, October 25, 2009

Disaster Recovery


Disaster Recovery
For a while we had a disaster recovery site and once a quarter we would bring our backups and restore the databases and we would see how far we could get bringing the business backup in one day. The site just had Servers, I got them after the OS was loaded, so I had to load SQL Server, and start restoring. We did not try to get everything up and running the first try, we set a goal of the databases and the main website and try to processes some business. Then in later visits we added to what we knew we could do, adding Phones (Voice over IP), Domains etc. The first time I just restored from a backup, the next and subsequent visits, I did a point in time restore using log restores. Oh and we have a ton of SQL logins that needed to be restored as well. So once a quarter for over two years we did this, after the second time of figuring out the restore script and restoring the logins etc I thought that there has to be a better way. This is what I came up with.
The first thing I did was create a directory in the backup path so I knew it was being backed up and would be available during our restore. I put the following scripts in this directory as well as other things that were needed:
·         SQLDTSFTPTask.msi which is a FTP Task for DTS packages
·         SQLServer2005_DTS.msi – since we are on 2005 and running DTS package we need the DTS add on
·         Current SQL Service Pack that servers are on.
·         Notes.txt – General instructions, things that were encountered
·         CreateDatabaselist.sql – creates a list of databases and their logic file names. Most likely the database files will have to move
·         Databaselist.txt – List of databases and their logical file names. We only have a primary file group, so if you have more than one you have to try something different.
·         CreateRestoreScripts.sql – this reads the databaselist.txt and given the path of the backups returns the restore scripts by reading the header information of the backups.
·         CreateLogins.sql – Script to recreate the SQL logins., Output from sp_help_revlogin
We always did a postmortem after wards and as things came up that would be help I added to the Notes.txt file. For example initially I did not have the SQL logins, but I did have a back up of master, so I put instructions on restoring master in the notes.TXT. You have to be in single user mode to restore master and I’m pretty sure, like msdb, the server that you are restoring to has to be at the same service pack that the backup is from. I was not too keen of restoring the master, but I was in a crunch.

/*
"CreateDatabaselist.sql"


Put the output from this into the file Databaselist.txt
*/

SET nocount ON
SELECT  d.NAME + ',' + MAX(CASE WHEN m.name NOT LIKE '%log' THEN m.name
                                ELSE ''
                           END) + ','
        + MAX(CASE WHEN m.name LIKE '%log' THEN m.name
                   ELSE ''
              END)
FROM    sys.Master_files m
JOIN    sys.databases d
        ON m.database_id = d.database_id
WHERE   d.name NOT IN ( 'tempdb', 'model', 'distribution' )
GROUP BY d.name 


Here is some sample output:
database1,database1_Data,database1_Log
database2,database2_Data,database2_Log
database3,Special_Data,Special_Log

For the above script I put it in a job to create the Databaselist.txt, the same one I run sp_help_revlogin.
The Script that does the heavy lifting is CreateRestoreScripts.sql below. It has been a while since I ran this and one thing we did for our recovery test was put the backups onto USB drivers so it was easier to restore. And yes we did test the restores from tape, but we knew that worked and we had limited time. I have to say that this script evolved. I think the first time I did a point in time I scripted the restores of the .TRN files sequentially, knowing full well that the backup was done in the middle some were. So when I ran the script and tried to load the earlier log files it spit out errors until it hit the right LSN number and from there on it loaded all subsequent files. LSN is the Log sequence number and what is looked at for the sequences of files to restores.   Comments are sparse in the script, sorry, so here are some notes:
·         Assumes Backup path is formatted DatabaseName\backups
·         Full backups are .BAK, Differentials are .DIF, and logs are .TRN
·         The #Header table I took from a script online way back when and did not record the author, my bad, thanks to that unknown SQL person
·         This uses DOS commands to get the list of files, here are some of the commands used
o   Dir /B – gives only file names
o   For a% in (PATH) do Echo … - this loops through all files in PATH and uses %a as variable, %~ta is the timestamp of that file
·         This does the INSERT INTO “table” Exec proc – I try to avoid this because of performance issues I’ve run into, but this is such a limited use it’s OK.
·         When restoring databases to a point in time they must restored WITH STANDBY, which this script does the last thing it does is restore the database with recovery so it’s ready for use.
SET NOCOUNT ON
-- set locations

-- backups location
DECLARE @BackUpPath VARCHAR(250)
SET @BackUpPath = 'E:\SQLSERVER\BACKUPs\'
DECLARE @NewSQLPath VARCHAR(250)
SET @NewSQLPath = 'D:\SQLSERVER\Data\'

/* tables used  */
-- table of actual db data
DROP TABLE #DBINFO
CREATE TABLE #DBINFO
    (
      DBNAME SYSNAME,
      DataFile VARCHAR(250),
      PathDataFile VARCHAR(250),
      LogFile VARCHAR(250),
      PathLogFile VARCHAR(250)
    )
BULK INSERT #DBINFO FROM 'E:\SQLSERVER\BACKUPs\DR Scripts\Databaselist.txt'
    WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' )
-- create table to insert from cmdshell
DROP TABLE #files
CREATE TABLE #files ( FILNAME VARCHAR(1000) )

-- this is used to get the list of backup files from the backup system
DROP TABLE  #FileList
CREATE TABLE #FileList
    (
      FileDate DATETIME,
      FILNAME VARCHAR(250)
    )

-- Restore header only Result set
DROP TABLE #Header
CREATE TABLE #Header
    (
      BackupName NVARCHAR(128),
      BackupDescription NVARCHAR(255),
      BackupType SMALLINT,
      ExpirationDate DATETIME,
      Compressed TINYINT,
      Position SMALLINT,
      DeviceType TINYINT,
      UserName NVARCHAR(128),
      ServerName NVARCHAR(128),
      DatabaseName NVARCHAR(128),
      DatabaseVersion INT,
      DatabaseCreationDate DATETIME,
      BackupSize NUMERIC(20, 0),
      FirstLSN NUMERIC(25, 0),
      LastLSN NUMERIC(25, 0),
      CheckpointLSN NUMERIC(25, 0),
      DatabaseBackupLSN NUMERIC(25, 0),
      BackupStartDate DATETIME,
      BackupFinishDate DATETIME,
      SortOrder SMALLINT,
      CodePage SMALLINT,
      UnicodeLocaleId INT,
      UnicodeComparisonStyle INT,
      CompatibilityLevel TINYINT,
      SoftwareVendorId INT,
      SoftwareVersionMajor INT,
      SoftwareVersionMinor INT,
      SoftwareVersionBuild INT,
      MachineName NVARCHAR(128),
      Flags INT,
      BindingID UNIQUEIDENTIFIER,
      RecoveryForkID UNIQUEIDENTIFIER,
      Collation NVARCHAR(128),
      FamilyGUID UNIQUEIDENTIFIER,
      HasBulkLoggedData BIT,
      IsSnapshot BIT,
      IsReadOnly BIT,
      IsSingleUser BIT,
      HasBackupChecksums BIT,
      IsDamaged BIT,
      BeginsLogChain BIT,
      HasIncompleteMetaData BIT,
      IsForceOffline BIT,
      IsCopyOnly BIT,
      FirstRecoveryForkID UNIQUEIDENTIFIER,
      ForkPointLSN NUMERIC(25, 0) NULL,
      RecoveryModel NVARCHAR(60),
      DifferentialBaseLSN NUMERIC(25, 0) NULL,
      DifferentialBaseGUID UNIQUEIDENTIFIER,
      BackupTypeDescription NVARCHAR(60),
      BackupSetGUID UNIQUEIDENTIFIER
    )


DECLARE @sql VARCHAR(3000)
-- pulls double duty as a SQL command string and a DOS command string


-- loop through the databases and load them
DECLARE @dbname VARCHAR(250),
    @DbbackupFilename VARCHAR(250),
    @lsn VARCHAR(50),
    @DataFileName VARCHAR(250),
    @DataFilePath VARCHAR(250),
    @LogFileName VARCHAR(250),
    @logFilePath VARCHAR(250),
    @filedate DATETIME


DELETE  FROM #DBINFO
WHERE   dbname <> @DBNAME
SET @dbname = ( SELECT  MIN(dbname)
                FROM    #DBINFO
              )


WHILE EXISTS ( SELECT TOP 1
                        *
               FROM     #DBINFO )
    BEGIN
        PRINT '/******           ' + @dbname
            + '                 ************/'
                       
        SELECT  @DataFileName = DataFile,
                @DataFilePath = PathDataFile,
                @LogFileName = LogFile,
                @logFilePath = PathLogFile
        FROM    #DBINFO
        WHERE   dbname = @dbname

            -- get the list of backup files
        SET @sql = 'dir ' + @BackUpPath + @dbname + ' /B'
        INSERT  INTO #files
                EXEC master..xp_cmdshell @sql
            -- CLEAN UP NULL VALUE
        DELETE  FROM #files
        WHERE   filname IS NULL

            -- find the lastest .BAK file
        SET @DbbackupFilename = ( SELECT TOP 1
                                            FILNAME
                                  FROM      #files
                                  WHERE     filname LIKE '%.bak'
                                  ORDER BY  FILNAME DESC
                                )
           
            -- get last Lsn number from backup
        SET @sql = 'restore headeronly from disk=''' + @BackUpPath + @dbname
            + '\' + @DbbackupFilename + ''''

        DELETE  FROM #Header
        INSERT  INTO #Header
                EXEC ( @sql
                    )
        SET @lsn = ( SELECT MAX(LastLSN)
                     FROM   #header
                   )
        SET @filedate = ( SELECT    BackupFinishDate
                          FROM      #header
                        )


        SET @SQL = 'RESTORE DATABASE ' + @DBname + ' FROM DISK = '''
            + @BackUpPath + @dbname + '\' + @DbbackupFilename + '''
      WITH MOVE ''' + @DataFileName + ''' TO ''' + @NewSQLPath + @DataFileName
            + '.mdf'',
         MOVE ''' + @LogFileName + ''' TO ''' + @NewSQLPath + @LogFileName
            + '.ldf'',
       REPLACE, STANDBY = ''' + @BackUpPath + @DBname + '\' + @DBname
            + '_Undo.dat'''
        PRINT @SQL

            -- remove ALL FULL BACKUPS FROM LIST
        DELETE  FROM #files
        WHERE   filname LIKE '%.bak'
        TRUNCATE TABLE #files
        SET @sql = 'for %a in (' + @BackUpPath + @dbname
            + '\*.dif) do @echo INSERT INTO #FileList Values(''%~ta'',''%a'')'
        INSERT  INTO #files
                EXEC master..xp_cmdshell @sql
        DELETE  FROM #files
        WHERE   ISNULL(FILNAME, 'XX') NOT LIKE 'INSERT%'

        WHILE EXISTS ( SELECT   *
                       FROM     #files )
            BEGIN
                SET @sql = ( SELECT MIN(FILNAME)
                             FROM   #files
                           )
                PRINT @sql -- just to verify it looks at all the .dif files
                EXEC ( @sql
                    )
                DELETE  FROM #files
                WHERE   @sql = FILNAME
            END
        DELETE  FROM #FileList
        WHERE   filedate < @filedate
            -- Load differentials
        WHILE EXISTS ( SELECT TOP 1
                                *
                       FROM     #FileList
                       WHERE    filname LIKE '%.dif' )
            BEGIN
                  -- get next file name
                SET @DbbackupFilename = ( SELECT TOP 1
                                                    FILNAME
                                          FROM      #FileList
                                          WHERE     filname LIKE '%.dif'
                                          ORDER BY  FILNAME ASC
                                        )
                 
                  -- get last Lsn number from backup
                SET @sql = 'restore headeronly from disk='''
                    + @DbbackupFilename + ''''

                  -- clear out header table and repopulate
                DELETE  FROM #Header
                INSERT  INTO #Header
                        EXEC ( @sql
                            )

                SET @filedate = ( SELECT    BackupFinishDate
                                  FROM      #header
                                )
                  -- if LSN is greater than last back up then load
                IF ( SELECT LastLSN
                     FROM   #header
                   ) >= @LSN
                    BEGIN
                        SET @sql = 'RESTORE DATABASE ' + @DBName
                            + ' FROM DISK=''' + @DbbackupFilename
                            + ''' WITH STANDBY=''' + @BackUpPath + @DBname
                            + '\' + @DBname + '_Undo.dat'''
                        PRINT @SQL

                        SET @lsn = ( SELECT LastLSN
                                     FROM   #header
                                   )           
                    END
                DELETE  FROM #FileList
                WHERE   filname = @DbbackupFilename  
            END        
            -- load logs

        TRUNCATE TABLE #files
        TRUNCATE TABLE #FileList
        SET @sql = 'for %a in (' + @BackUpPath + @dbname
            + '\*.trn) do @echo INSERT INTO #FileList Values(''%~ta'',''%a'')'
        INSERT  INTO #files
                EXEC master..xp_cmdshell @sql
        DELETE  FROM #files
        WHERE   ISNULL(FILNAME, 'XX') NOT LIKE 'INSERT%'

        WHILE EXISTS ( SELECT   *
                       FROM     #files )
            BEGIN
                SET @sql = ( SELECT MIN(FILNAME)
                             FROM   #files
                           )
                PRINT @sql --just to verify that it looks at all .trn files
                EXEC ( @sql
                    )
                DELETE  FROM #files
                WHERE   @sql = FILNAME
            END
        -- remove old files
        DELETE  FROM #FileList
        WHERE   filedate < @filedate
       
        WHILE EXISTS ( SELECT TOP 1
                                *
                       FROM     #FileList
                       WHERE    filname LIKE '%.trn' )
            BEGIN
                  -- get next file name
                SET @DbbackupFilename = ( SELECT TOP 1
                                                    FILNAME
                                          FROM      #FileList
                                          WHERE     filname LIKE '%.trn'
                                          ORDER BY  FILNAME ASC
                                        )
                 
                  -- get last Lsn number from backup
                SET @sql = 'restore headeronly from disk='''
                    + @DbbackupFilename + ''''

                  -- clear out header table and repopulate
                DELETE  FROM #Header
                INSERT  INTO #Header
                        EXEC ( @sql
                            )

                  -- if LSN is greater than last back up then load
                IF ( SELECT LastLSN
                     FROM   #header
                   ) >= @LSN
                    BEGIN
                        SET @sql = 'RESTORE LOG ' + @DBName + ' FROM DISK='''
                            + @DbbackupFilename + ''' WITH STANDBY='''
                            + @BackUpPath + @DBname + '\' + @DBname
                            + '_Undo.dat'''
                        PRINT @SQL

                        SET @lsn = ( SELECT LastLSN
                                     FROM   #header
                                   )           
                    END
                DELETE  FROM #FileList
                WHERE   filname = @DbbackupFilename  
            END        

        SET @sql = 'Restore Database ' + @DBName + ' with recovery'
        PRINT @SQL

        DELETE  FROM #DBINFO
        WHERE   dbname = @dbname
        SET @dbname = ( SELECT  MIN(dbname)
                        FROM    #DBINFO
                      )
    END