Archive

Archive for April, 2022

Setting up Always ON (SQL Server)

April 16th, 2022

 

There is a great little  docker setup that starts Always On (SQL Servers instance replication not to be confused with their article i.e. table replication). The docker setup creates a primary and a secondary. Find it here https://github.com/rafaelrodrigues3092/docker-mssql-alwayson

Just do

    git clone https://github.com/rafaelrodrigues3092/docker-mssql-alwayson
    cd docker-mssql-alwayson 
    docker compose up

and voila you’ll have a primary SQL Server with a Secondary where the “SALES” database is replicated.

To connect I use “sqlcli.sh” . It’s much nicer that sqlcmd but sqlcli has the drawback that it doesn’t handle @variables. For that I use Azure Studio

For this docker setup I use a script sqlcli.sh, and then run “sqlcli.sh 2500″ or “sqlcli.sh 2600″

CMD=//usr/local/bin/mssql-cli
PORT=",${1-1433}"
USER=SA
PW=Password123
HOST=localhost$PORT
 echo "$CMD -S $HOST -U $USER -P $PW --auto-vertical-output"
 eval "$CMD -S $HOST -U $USER -P $PW --auto-vertical-output"

I must say the SQL Server world seems bit opaque to me with all of the usage of GUI tools. GUI tool examples are mainly what shows up when I do searches on how to accomplish things, so I really appreciate examples that are SQL based. Here are a couple good references on setting up Always On with SQL

 

A couple of the things I wanted to modify in this docker configuration were:

  1. Add another database to replicate
  2. Add another secondary instance

Adding another database as surprisingly easy but it did have one trick which was to back up the database

   CREATE DATABASE KYLE;
   BACKUP DATABASE kyle TO DISK= '/tmp/kyle.back'
   ALTER AVAILABILITY GROUP AG1 ADD DATABASE kyle;

The surprising thing is that with the above commands the database was replicated. I didn’t need to recover the database on the secondary.

The next thing I did took me a lot longer. I wanted to add another secondary.

The first step was easy, I just duplicated the secondary called “db2″ in the docker.yml and called the next one db3 on a new port 2700 ( the primary d1 used 2500 and secondary db2 used 2600)

    db3:
        build: ./sql
        depends_on:
            - "db1"
        environment:
            INIT_SCRIPT: "aoag_third.sql"
            INIT_WAIT: 60
        ports:
            - "2700:1433"
        container_name: db3
        hostname: db3
        volumes:
            - mssql-server-shared:/var/opt/mssql/shared
            - mssql-server-backup:/var/opt/mssql/backup
        networks:
            - sqlaoag
        cap_add:
            - NET_ADMIN

That’s all it took to get a secondary up and running with “docker compose down”, then “docker compose up”. Problem is on the primary this new secondary db3 was reported as UNHEALTHY.

There were some changes that needed to be made to the SQL scripts that set up Always on.

One thing that threw me for a loop is that as I was making changes in docker container startup scripts, none of my changes were taking effect. Turns out I still had a image of one of my containers still floating around and it needed to be deleted before docker read my local files on startup again.

With this set of containers I can look for any remaining images with

% docker images | grep alwayson
docker-mssql-alwayson_db3             latest            f1c8834cd110   3 days ago      1.67GB
% docker image rmi docker-mssql-alwayson_db3

First the PRIMARY had to be changed in  ./sql/aoag_primary.sql to have db3 registered. The script aoag_primary.sq is run when starting up ‘”db1″

     SET @cmd ='
     CREATE AVAILABILITY GROUP [AG1]
     WITH (
         CLUSTER_TYPE = NONE
     )
     FOR REPLICA ON
     N'''+@@SERVERNAME+''' WITH
     (
         ENDPOINT_URL = N''tcp://'+@@SERVERNAME+':'+@hadr_port+''',
         AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
         SEEDING_MODE = AUTOMATIC,
         FAILOVER_MODE = MANUAL,
         SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
     ),
     N''db2'' WITH
     (
         ENDPOINT_URL = N''tcp://db2:'+@hadr_port+''',
         AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
         SEEDING_MODE = AUTOMATIC,
         FAILOVER_MODE = MANUAL,
         SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
     ),
     N''db3'' WITH
     (
         ENDPOINT_URL = N''tcp://db3:'+@hadr_port+''',
         AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
         SEEDING_MODE = AUTOMATIC,
         FAILOVER_MODE = MANUAL,
         SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
     );
     ';
     exec sp_executesql @cmd

adding the part in red.

I’m not sure if the following is necessary or not. I’ll have to do some more testing but I change the always on user to have a different name on db2 than db3. I changed ./sql/aoag_secondary.sql as well as making a copy called ./sql/aoag_third.sql and change the user from aoag_login to aoag_login_db2 for secondary and aoag_login_db3 for third. For example the diffs from new and old aoag_secondary.sql  are:

     < SET @cmd = 'CREATE LOGIN aoag_login_db2 WITH PASSWORD = '''+@hadr_login_password+''', DEFAULT_DATABASE=[master],      CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF'
      ---      
     > SET @cmd = 'CREATE LOGIN aoag_login WITH PASSWORD = '''+@hadr_login_password+''', DEFAULT_DATABASE=[master],      CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF'
     25c25
     < CREATE USER aoag_user FOR LOGIN aoag_login_db2;
      --- 
     > CREATE USER aoag_user FOR LOGIN aoag_login;
     62c62
     < GRANT CONNECT ON ENDPOINT::Hadr_endpoint TO [aoag_login_db2];
      --- 
     > GRANT CONNECT ON ENDPOINT::Hadr_endpoint TO [aoag_login];

I also had to change the yaml to use ./sql/aoag_third.sql from db3, see red in yaml lines above

Code to set up 2 secondary Always on manually without using docker running the sql setup scripts. I added drops at the beginning of the scripts because I was running these multiple times.

Uncategorized

introducing network latency on Docker (SQL Server Always On testing)

April 13th, 2022

I am testing out Always On replication on SQL Server with a cool docker setup that includes primary and secondary https://github.com/rafaelrodrigues3092/docker-mssql-alwayson

When I created data on the primary there was never any lag on the secondary so I wanted to make sure my lag values were correct and wanted to introduce network latency between the primary and secondary to introduce lag time. For this there is a neat tool called “tc” that can introduced lag. For example on either or both of the docker containers , I can set up lag with

# add lag
tc qdisc add dev eth0 root netem delay 100ms
# delete lag
tc qdisc del dev eth0 root netem delay 100ms
#show lag
tc qdisc

Problem is, there was no “tc” on my docker containers and yum couldn’t find it (it is part of iproute2)

# yum install iproute2
Last metadata expiration check: 1:34:54 ago on Wed Apr 13 20:10:03 2022.
No match for argument: iproute2
Error: Unable to find a match: iproute2

I figure there is something wrong with the way the yum repos are setup on the docker images and I poked around there for a while but couldn’t get it to work, so then I tried to install by hand which turned out more onerous than I thought, so here I’m just documenting what was needed

create directory tc_build
cd tc_build
curl -o iproute2.tz  https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/snapshot/iproute2-5.17.0.tar.gz
curl -o bison.gz http://ftp.gnu.org/gnu/bison/bison-3.8.tar.gz
curl -o gettext.gz  https://ftp.gnu.org/pub/gnu/gettext/gettext-0.21.tar.gz
curl -o texinfo-6.8.tar.gz https://ftp.gnu.org/gnu/texinfo/texinfo-6.8.tar.gz
wget https://github.com/westes/flex/releases/download/v2.6.4/flex-2.6.4.tar.gz
yum install make
yum install m4
yum install libtool
yum install diffutils
for i in *z; do echo $i ; tar xvf $i ; done
mkdir trash
mv *z trash
cd textinfo*
 ./autogen.sh
 ./configure 
  make
  make install
cd ../gettext*
  make 
cd ../flex*
  make
cd ../bison*
  make  
cd ../iproute2*
  make  

The above isn’t exact but should get you mostly there. When it comes to “make” in each directory  its some variation of

  • autogen.sh
  • configure
  • make
  • make install

Not every directory needed all 4 commands but unfortunately didn’t take detailed enough notes to say which needed which. The most onerous part was “gettext” which was needed just for “autopoint” but I didn’t find a way just to install “autopoint” and “gettext” is big and takes a while to compile and install. Once it was all done “tc” was there. Then I just tar’ed the “tc” commands in /usr/sbin  in tc.tar.gz which I can copy to a docker container and voila it’s there.

 

Had to modify the docker-compose.yml to include

        cap_add:
            - NET_ADMIN

then after starting the containers , on both the primary and secondary, I set network latency

tc qdisc add dev eth0 root netem delay 500ms

The docker primary comes with SALES database that is replicated to the secondary, do I go on the primary, use SALES and create a bunch of data:

 

use sales;
create table foo(id int, content varchar(2000)) ;   
WITH mycte AS ( 
      SELECT 1 DataValue 
      UNION all 
      SELECT DataValue + 1 
      FROM    mycte    
      WHERE   DataValue + 1 <= 100000
 ) 
 INSERT INTO foo(id,content) 
 SELECT   DataValue,REPLICATE (NEWID() ,55) -- 
 FROM mycte m  
 OPTION (MAXRECURSION 0) 
 ; 

 

There is a nice sql query for monitoring Always On here https://dba.stackexchange.com/questions/278324/log-send-queue-size-and-redo-queue-size-in-ag

Here’s a short version of the query

SELECT 
    --AG info
    AGName                          = ag.name, 
    ReplicaServer                   = ar.replica_server_name, 
    DbName                          = adc.database_name, 
       Failover_ready                                    = ar.failover_mode_desc,
    -- State Info
    IsLocal                         = drs.is_local, 
    IsPrimary                       = drs.is_primary_replica, 
    SynchronizationState            = drs.synchronization_state_desc, 
    SynchronizationHealthState      = drs.synchronization_health_desc, 
    -- Queue Size & rates
    SendQueueSize                   = drs.log_send_queue_size,
    RedoQueueSize                   = drs.redo_queue_size, 
    SendRateKb                      = drs.log_send_rate, 
    RedoRateKb                      = drs.redo_rate, 
     --Oh yeah, filestream, too
    FileStreamSendRate              = drs.filestream_send_rate,
   drs.Secondary_lag_seconds 
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_databases_cluster AS adc 
             ON drs.group_id = adc.group_id AND 
                drs.group_database_id = adc.group_database_id
JOIN sys.availability_groups AS ag
             ON ag.group_id = drs.group_id
JOIN sys.availability_replicas AS ar 
             ON drs.group_id = ar.group_id AND 
                 drs.replica_id = ar.replica_id
ORDER BY 
    -- ag.name, ar.replica_server_name, adc.database_name;
    drs.log_send_queue_size + drs.redo_queue_size DESC;

primary

-[ RECORD 1 ]-------------------------
AGName                     | AG1
ReplicaServer              | db2
DbName                     | SALES
Failover_ready             | MANUAL
IsLocal                    | 0
IsPrimary                  | 0
SynchronizationState       | SYNCHRONIZED
SynchronizationHealthState | HEALTHY
SendQueueSize              | 6904
RedoQueueSize              | 9544
SendRateKb                 | 5113
RedoRateKb                 | 41343
FileStreamSendRate         | 0
Secondary_lag_seconds      | 0
-[ RECORD 2 ]-------------------------
AGName                     | AG1
ReplicaServer              | db1
DbName                     | SALES
Failover_ready             | MANUAL
IsLocal                    | 1
IsPrimary                  | 1
SynchronizationState       | SYNCHRONIZED
SynchronizationHealthState | HEALTHY
SendQueueSize              | NULL
RedoQueueSize              | NULL
SendRateKb                 | NULL
RedoRateKb                 | NULL
FileStreamSendRate         | NULL
Secondary_lag_seconds      | NULL

secondary

-[ RECORD 1 ]-------------------------
AGName                     | AG1
ReplicaServer              | db2
DbName                     | SALES
Failover_ready             | MANUAL
IsLocal                    | 1
IsPrimary                  | 0
SynchronizationState       | SYNCHRONIZED
SynchronizationHealthState | HEALTHY
SendQueueSize              | 60
RedoQueueSize              | 29068
SendRateKb                 | 4421
RedoRateKb                 | 41462
FileStreamSendRate         | 0
Secondary_lag_seconds      | NULL

another thing you can do is read the logs for Always On info

 EXEC xp_ReadErrorLog 0,1,"Always"  
-[ RECORD 1 ]-------------------------
LogDate     | 2022-04-13 18:47:56.440
ProcessInfo | spid27s
Text        | Always On: The availability replica manager is starting. This is an informational message only. No user action is required.
-[ RECORD 2 ]-------------------------
LogDate     | 2022-04-13 18:47:56.590
ProcessInfo | spid27s
Text        | Always On: The availability replica manager is waiting for the instance of SQL Server to allow client connections. This is an informational message only. No user action is required.
-[ RECORD 3 ]-------------------------
LogDate     | 2022-04-13 18:48:18.900
ProcessInfo | spid53
Text        | Always On: The local replica of availability group 'AG1' is preparing to transition to the primary role. This is an informational message only. No user action is required.
-[ RECORD 4 ...

Uncategorized