Archive

Author Archive

Setting up Always On (SQL Server)

April 16th, 2022

 

There is a great little docker setup that starts Always On (SQL Server instance-level replication, not to be confused with SQL Server replication of articles, i.e. table replication). The docker setup creates a primary and a secondary. Find it here: https://github.com/rafaelrodrigues3092/docker-mssql-alwayson

Just do

    git clone https://github.com/rafaelrodrigues3092/docker-mssql-alwayson
    cd docker-mssql-alwayson 
    docker compose up

and voila, you’ll have a primary SQL Server with a secondary where the “SALES” database is replicated.

To connect I use “sqlcli.sh”. It’s much nicer than sqlcmd, but sqlcli has the drawback that it doesn’t handle @variables; for those I use Azure Data Studio.

For this docker setup I use a script, sqlcli.sh, and then run “sqlcli.sh 2500” or “sqlcli.sh 2600”:

# sqlcli.sh - connect to a local SQL Server instance by port (default 1433)
CMD=/usr/local/bin/mssql-cli
PORT=",${1-1433}"
USER=SA
PW=Password123
HOST=localhost$PORT        # SQL Server "host,port" address syntax
echo "$CMD -S $HOST -U $USER -P $PW --auto-vertical-output"
eval "$CMD -S $HOST -U $USER -P $PW --auto-vertical-output"

I must say the SQL Server world seems a bit opaque to me with all of the usage of GUI tools. GUI tool examples are mainly what show up when I search for how to accomplish things, so I really appreciate examples that are SQL based. Here are a couple of good references on setting up Always On with SQL.

 

A couple of the things I wanted to modify in this docker configuration were:

  1. Add another database to replicate
  2. Add another secondary instance

Adding another database was surprisingly easy, but it did have one trick, which was to back up the database first:

   CREATE DATABASE KYLE;
   -- the trick: the database needs a full backup before it can be added to the availability group
   BACKUP DATABASE kyle TO DISK = '/tmp/kyle.back';
   ALTER AVAILABILITY GROUP AG1 ADD DATABASE kyle;

The surprising thing is that with the above commands the database was replicated. I didn’t need to recover the database on the secondary.
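
As a quick sanity check (my own sketch, not something from the repo), connecting to the secondary on port 2600 and running a query along these lines should show the new database present and synchronized:

    SELECT d.name,
           d.state_desc,
           drs.synchronization_state_desc
    FROM sys.databases d
    LEFT JOIN sys.dm_hadr_database_replica_states drs
           ON drs.database_id = d.database_id
          AND drs.is_local = 1
    WHERE d.name = 'KYLE';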

The next thing I did took me a lot longer. I wanted to add another secondary.

The first step was easy: I just duplicated the secondary called “db2” in docker-compose.yml and called the new one db3, on a new port 2700 (the primary db1 used 2500 and the secondary db2 used 2600).

    db3:
        build: ./sql
        depends_on:
            - "db1"
        environment:
            INIT_SCRIPT: "aoag_third.sql"
            INIT_WAIT: 60
        ports:
            - "2700:1433"
        container_name: db3
        hostname: db3
        volumes:
            - mssql-server-shared:/var/opt/mssql/shared
            - mssql-server-backup:/var/opt/mssql/backup
        networks:
            - sqlaoag
        cap_add:
            - NET_ADMIN

That’s all it took to get the new secondary up and running with “docker compose down”, then “docker compose up”. The problem is that on the primary, this new secondary db3 was reported as UNHEALTHY.
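
A quick way to see the per-replica health on the primary is a query along these lines (my own sketch using the standard Always On DMVs, not something from the repo's scripts):

    SELECT ar.replica_server_name,
           ars.role_desc,
           ars.connected_state_desc,
           ars.synchronization_health_desc
    FROM sys.dm_hadr_availability_replica_states ars
    JOIN sys.availability_replicas ar
      ON ar.replica_id = ars.replica_id;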

There were some changes that needed to be made to the SQL scripts that set up Always On.

One thing that threw me for a loop is that as I was making changes to the container startup scripts, none of my changes were taking effect. It turns out I still had an image of one of my containers floating around, and it needed to be deleted before docker would read my local files again on startup.

With this set of containers I can look for any remaining images with

% docker images | grep alwayson
docker-mssql-alwayson_db3             latest            f1c8834cd110   3 days ago      1.67GB
% docker image rm docker-mssql-alwayson_db3

First, the PRIMARY had to be changed in ./sql/aoag_primary.sql to have db3 registered. The script aoag_primary.sql is run when starting up “db1”:

     SET @cmd ='
     CREATE AVAILABILITY GROUP [AG1]
     WITH (
         CLUSTER_TYPE = NONE
     )
     FOR REPLICA ON
     N'''+@@SERVERNAME+''' WITH
     (
         ENDPOINT_URL = N''tcp://'+@@SERVERNAME+':'+@hadr_port+''',
         AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
         SEEDING_MODE = AUTOMATIC,
         FAILOVER_MODE = MANUAL,
         SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
     ),
     N''db2'' WITH
     (
         ENDPOINT_URL = N''tcp://db2:'+@hadr_port+''',
         AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
         SEEDING_MODE = AUTOMATIC,
         FAILOVER_MODE = MANUAL,
         SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
     ),
     N''db3'' WITH
     (
         ENDPOINT_URL = N''tcp://db3:'+@hadr_port+''',
         AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
         SEEDING_MODE = AUTOMATIC,
         FAILOVER_MODE = MANUAL,
         SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)
     );
     ';
     exec sp_executesql @cmd

adding the final N''db3'' replica section shown above.

I’m not sure if the following is necessary or not; I’ll have to do some more testing, but I changed the Always On login to have a different name on db2 than on db3. I changed ./sql/aoag_secondary.sql, made a copy called ./sql/aoag_third.sql, and changed the login from aoag_login to aoag_login_db2 for the secondary and aoag_login_db3 for the third. For example, the diffs between the new and old aoag_secondary.sql are:

     < SET @cmd = 'CREATE LOGIN aoag_login_db2 WITH PASSWORD = '''+@hadr_login_password+''', DEFAULT_DATABASE=[master],      CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF'
      ---      
     > SET @cmd = 'CREATE LOGIN aoag_login WITH PASSWORD = '''+@hadr_login_password+''', DEFAULT_DATABASE=[master],      CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF'
     25c25
     < CREATE USER aoag_user FOR LOGIN aoag_login_db2;
      --- 
     > CREATE USER aoag_user FOR LOGIN aoag_login;
     62c62
     < GRANT CONNECT ON ENDPOINT::Hadr_endpoint TO [aoag_login_db2];
      --- 
     > GRANT CONNECT ON ENDPOINT::Hadr_endpoint TO [aoag_login];

I also had to change the yaml so that db3 uses ./sql/aoag_third.sql (the INIT_SCRIPT setting in the db3 yaml block above).

The same SQL setup scripts can be run by hand to set up two Always On secondaries manually, without docker running them at startup. I added drops at the beginning of the scripts because I was running them multiple times.
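
The drops I mean are roughly along these lines (a sketch; AG1 and Hadr_endpoint are the names the repo's scripts use):

    -- run on the primary: remove the availability group so the create scripts can be rerun
    IF EXISTS (SELECT 1 FROM sys.availability_groups WHERE name = 'AG1')
        DROP AVAILABILITY GROUP [AG1];

    -- run on each instance: remove the mirroring endpoint created by the setup scripts
    IF EXISTS (SELECT 1 FROM sys.endpoints WHERE name = 'Hadr_endpoint')
        DROP ENDPOINT [Hadr_endpoint];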

Uncategorized

introducing network latency on Docker (SQL Server Always On testing)

April 13th, 2022

I am testing out Always On replication on SQL Server with a cool docker setup that includes a primary and a secondary: https://github.com/rafaelrodrigues3092/docker-mssql-alwayson

When I created data on the primary there was never any lag on the secondary, so to make sure my lag values were correct I wanted to introduce network latency between the primary and the secondary. For this there is a neat tool called “tc” that can introduce lag. For example, on either or both of the docker containers, I can set up lag with

# add lag
tc qdisc add dev eth0 root netem delay 100ms
# delete lag
tc qdisc del dev eth0 root netem delay 100ms
#show lag
tc qdisc

The problem is, there was no “tc” on my docker containers, and yum couldn’t find it (it is part of iproute2):

# yum install iproute2
Last metadata expiration check: 1:34:54 ago on Wed Apr 13 20:10:03 2022.
No match for argument: iproute2
Error: Unable to find a match: iproute2

I figure there is something wrong with the way the yum repos are set up on the docker images. I poked around there for a while but couldn’t get it to work, so then I tried to install it by hand, which turned out to be more onerous than I thought, so here I’m just documenting what was needed:

mkdir tc_build
cd tc_build
curl -o iproute2.tz  https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/snapshot/iproute2-5.17.0.tar.gz
curl -o bison.gz http://ftp.gnu.org/gnu/bison/bison-3.8.tar.gz
curl -o gettext.gz  https://ftp.gnu.org/pub/gnu/gettext/gettext-0.21.tar.gz
curl -o texinfo-6.8.tar.gz https://ftp.gnu.org/gnu/texinfo/texinfo-6.8.tar.gz
wget https://github.com/westes/flex/releases/download/v2.6.4/flex-2.6.4.tar.gz
yum install make
yum install m4
yum install libtool
yum install diffutils
for i in *z; do echo $i ; tar xvf $i ; done
mkdir trash
mv *z trash
cd texinfo*
 ./autogen.sh
 ./configure 
  make
  make install
cd ../gettext*
  make 
cd ../flex*
  make
cd ../bison*
  make  
cd ../iproute2*
  make  

The above isn’t exact but should get you mostly there. When it comes to “make” in each directory, it’s some variation of:

  • autogen.sh
  • configure
  • make
  • make install

Not every directory needed all four commands, but unfortunately I didn’t take detailed enough notes to say which needed which. The most onerous part was “gettext”, which was needed just for “autopoint”; I didn’t find a way to install “autopoint” on its own, and “gettext” is big and takes a while to compile and install. Once it was all done, “tc” was there. Then I just tar’ed the “tc” binaries from /usr/sbin into tc.tar.gz, which I can copy to a docker container and voila, it’s there.

 

I had to modify the docker-compose.yml to include

        cap_add:
            - NET_ADMIN

Then, after starting the containers, on both the primary and the secondary, I set the network latency:

tc qdisc add dev eth0 root netem delay 500ms

The docker primary comes with a SALES database that is replicated to the secondary, so I go on the primary, use SALES and create a bunch of data:

 

use sales;
create table foo(id int, content varchar(2000));
-- recursive CTE that generates 100,000 rows of roughly 2KB each
WITH mycte AS ( 
      SELECT 1 DataValue 
      UNION all 
      SELECT DataValue + 1 
      FROM    mycte    
      WHERE   DataValue + 1 <= 100000
 ) 
 INSERT INTO foo(id,content) 
 SELECT   DataValue, REPLICATE(NEWID(), 55)
 FROM mycte m  
 OPTION (MAXRECURSION 0) 
 ; 

 

There is a nice sql query for monitoring Always On here https://dba.stackexchange.com/questions/278324/log-send-queue-size-and-redo-queue-size-in-ag

Here’s a short version of the query

SELECT 
    --AG info
    AGName                          = ag.name, 
    ReplicaServer                   = ar.replica_server_name, 
    DbName                          = adc.database_name, 
    Failover_ready                  = ar.failover_mode_desc,
    -- State Info
    IsLocal                         = drs.is_local, 
    IsPrimary                       = drs.is_primary_replica, 
    SynchronizationState            = drs.synchronization_state_desc, 
    SynchronizationHealthState      = drs.synchronization_health_desc, 
    -- Queue Size & rates
    SendQueueSize                   = drs.log_send_queue_size,
    RedoQueueSize                   = drs.redo_queue_size, 
    SendRateKb                      = drs.log_send_rate, 
    RedoRateKb                      = drs.redo_rate, 
     --Oh yeah, filestream, too
    FileStreamSendRate              = drs.filestream_send_rate,
   drs.Secondary_lag_seconds 
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_databases_cluster AS adc 
             ON drs.group_id = adc.group_id AND 
                drs.group_database_id = adc.group_database_id
JOIN sys.availability_groups AS ag
             ON ag.group_id = drs.group_id
JOIN sys.availability_replicas AS ar 
             ON drs.group_id = ar.group_id AND 
                 drs.replica_id = ar.replica_id
ORDER BY 
    -- ag.name, ar.replica_server_name, adc.database_name;
    drs.log_send_queue_size + drs.redo_queue_size DESC;

primary

-[ RECORD 1 ]-------------------------
AGName                     | AG1
ReplicaServer              | db2
DbName                     | SALES
Failover_ready             | MANUAL
IsLocal                    | 0
IsPrimary                  | 0
SynchronizationState       | SYNCHRONIZED
SynchronizationHealthState | HEALTHY
SendQueueSize              | 6904
RedoQueueSize              | 9544
SendRateKb                 | 5113
RedoRateKb                 | 41343
FileStreamSendRate         | 0
Secondary_lag_seconds      | 0
-[ RECORD 2 ]-------------------------
AGName                     | AG1
ReplicaServer              | db1
DbName                     | SALES
Failover_ready             | MANUAL
IsLocal                    | 1
IsPrimary                  | 1
SynchronizationState       | SYNCHRONIZED
SynchronizationHealthState | HEALTHY
SendQueueSize              | NULL
RedoQueueSize              | NULL
SendRateKb                 | NULL
RedoRateKb                 | NULL
FileStreamSendRate         | NULL
Secondary_lag_seconds      | NULL

secondary

-[ RECORD 1 ]-------------------------
AGName                     | AG1
ReplicaServer              | db2
DbName                     | SALES
Failover_ready             | MANUAL
IsLocal                    | 1
IsPrimary                  | 0
SynchronizationState       | SYNCHRONIZED
SynchronizationHealthState | HEALTHY
SendQueueSize              | 60
RedoQueueSize              | 29068
SendRateKb                 | 4421
RedoRateKb                 | 41462
FileStreamSendRate         | 0
Secondary_lag_seconds      | NULL

Another thing you can do is read the error log for Always On info:

 EXEC xp_ReadErrorLog 0,1,"Always"  
-[ RECORD 1 ]-------------------------
LogDate     | 2022-04-13 18:47:56.440
ProcessInfo | spid27s
Text        | Always On: The availability replica manager is starting. This is an informational message only. No user action is required.
-[ RECORD 2 ]-------------------------
LogDate     | 2022-04-13 18:47:56.590
ProcessInfo | spid27s
Text        | Always On: The availability replica manager is waiting for the instance of SQL Server to allow client connections. This is an informational message only. No user action is required.
-[ RECORD 3 ]-------------------------
LogDate     | 2022-04-13 18:48:18.900
ProcessInfo | spid53
Text        | Always On: The local replica of availability group 'AG1' is preparing to transition to the primary role. This is an informational message only. No user action is required.
-[ RECORD 4 ...

Uncategorized

MySQL : events_waits_current – who is waiting and who is on CPU

March 25th, 2022

Anyone familiar with Oracle performance tuning knows about ASH, aka Active Session History. ASH samples every active session at each collection, which by default happens once per second. Once per second is fast enough to give a clear view of database load.

The ASH data comes from v$session, a structure that contains information on every connection and shows which connections are active, what SQL they are running (SQL_ID), and whether they are runnable on CPU or waiting on a wait event for some resource like an I/O to complete, a lock, or a latch. It’s a pretty simple select on v$session to get who is active, their wait event, and their SQL_ID.

On Oracle that gives us a query like:

select sid||':'||serial# session_id,
       username,
       s.sql_id||':'||sql_child_number sqlid,
       decode(state, 'WAITING', wait_class||':'||event, 'CPU') event
from v$session s
where   (( s.wait_time != 0  /* on CPU  */ and s.status='ACTIVE'  /*ACTIVE */)
            or s.wait_class  != 'Idle' )

On MySQL it’s not so straightforward. The view that shows who is connected to the instance is performance_schema.threads, but this view doesn’t have the wait event.

Example content of threads:

*************************** 1. row ***************************
          THREAD_ID: 469
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 357
   PROCESSLIST_USER: shopper_22
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: dbmorders_6
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 0
  PROCESSLIST_STATE: updating
   PROCESSLIST_INFO: UPDATE order_item SET sku=trim(sku) WHERE id < 553
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: TCP/IP
       THREAD_OS_ID: 246
     RESOURCE_GROUP: USR_default

Selecting out Active Session Data

SELECT
        threads.THREAD_ID,
    --  threads.PROCESSLIST_info sql_text,
        REPLACE(REPLACE(substr(threads.PROCESSLIST_info,1,40),'\r',''),'\n','') sql_text_short,
        threads.PROCESSLIST_USER,
        threads.PROCESSLIST_HOST,
        threads.PROCESSLIST_COMMAND,
        threads.PROCESSLIST_STATE
FROM
        performance_schema.threads AS threads 
 Where 
     threads.PROCESSLIST_STATE IS NOT NULL
 and threads.PROCESSLIST_COMMAND != 'Sleep'
 and threads.PROCESSLIST_COMMAND != 'Daemon'
 and threads.PROCESSLIST_ID != CONNECTION_ID()
;

+-----------+------------------------------------------+------------------+------------------+---------------------+-------------------+
| THREAD_ID | sql_text_short                           | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_COMMAND | PROCESSLIST_STATE |
+-----------+------------------------------------------+------------------+------------------+---------------------+-------------------+
|       492 | SELECT * FROM order_item where sku LIKE  | shopper_11       | localhost        | Query               | executing         |
|       503 | SELECT * FROM order_item where sku LIKE  | shopper_1        | localhost        | Query               | executing         |
|       596 | SELECT * FROM order_item where sku LIKE  | shopper_7        | localhost        | Query               | executing         |
|       621 | INSERT INTO order_status_change (dbm_ord | shopper_6        | localhost        | Query               | executing         |
|       868 | INSERT INTO order_status_change (dbm_ord | shopper_2        | localhost        | Query               | executing         |
|       926 | INSERT INTO order_status_change (dbm_ord | shopper_13       | localhost        | Query               | executing         |
|     51380 | SELECT * FROM discount where description | shopper_7        | localhost        | Query               | executing         |
|    165954 | INSERT INTO order_status_change (dbm_ord | shopper_20       | localhost        | Query               | executing         |
|     98532 | SELECT * FROM discount where description | shopper_19       | localhost        | Query               | executing         |
|    843087 | select sleep(20)                         | dbmapp           | localhost        | Query               | User sleep        |
|    800275 | select sleep(20)                         | dbmapp           | localhost        | Query               | User sleep        |
|    932043 | select sleep(20)                         | dbmapp           | localhost        | Query               | User sleep        |
+-----------+------------------------------------------+------------------+------------------+---------------------+-------------------+

A few things:

  1. We get the SQL text directly from threads. This seems like it could be a bit inefficient if we collect and store this data frequently. Compare this to Oracle, where we can just collect the SQL_ID each time and look up the SQL text just once per SQL_ID. On MySQL we could hash the text at every collection and store the text just once per hash (see the sketch after this list). TBD
  2. There is no WAIT_EVENT and no indication of who is waiting or who is runnable on CPU
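
A minimal sketch of the hashing idea (my own illustration, not something performance schema does for you):

    -- store the short hash with each sample; keep the full text only once per hash
    SELECT t.THREAD_ID,
           MD5(t.PROCESSLIST_INFO) AS sql_hash,
           t.PROCESSLIST_INFO      AS sql_text
    FROM performance_schema.threads t
    WHERE t.PROCESSLIST_INFO IS NOT NULL;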

 events_waits_current

To get wait events we should be able to look at performance_schema.events_waits_current and join it to threads using THREAD_ID. Both views have the field THREAD_ID.
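
In its most minimal form the join is just (a sketch; the full query I ended up with is further down):

    SELECT t.THREAD_ID,
           t.PROCESSLIST_STATE,
           w.EVENT_ID,
           w.END_EVENT_ID,
           w.EVENT_NAME
    FROM performance_schema.threads t
    LEFT JOIN performance_schema.events_waits_current w
           ON w.THREAD_ID = t.THREAD_ID
    WHERE t.PROCESSLIST_ID != CONNECTION_ID();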

It sounds pretty simple, as the documentation says there is one row per thread:

https://dev.mysql.com/doc/refman/5.7/en/performance-schema-events-waits-current-table.html

The table stores one row per thread showing the current status of the thread’s most recent monitored wait event, so there is no system variable for configuring the table size.

Two problems:

  1. by default performance_schema.events_waits_current is empty
  2. there are cases where there are multiple lines in performance_schema.events_waits_current for the same thread_id

For the first issue, where performance_schema.events_waits_current is empty, three things are required:

  1. performance_schema enabled
  2. setup_consumers enabled
  3. setup_instruments enabled

1. performance_schema enabled

Make sure performance_schema is set to ON:
SHOW VARIABLES LIKE 'performance_schema%';
+----------------------------------------------------------+-------+
| Variable_name                                            | Value |
+----------------------------------------------------------+-------+
| performance_schema                                       | ON    |
| performance_schema_accounts_size                         | -1    |
| performance_schema_digests_size                          | 10000 |
| performance_schema_error_size                            | 5018  |
| performance_schema_events_stages_history_long_size       | 10000 |
| performance_schema_events_stages_history_size            | 10    |
| performance_schema_events_statements_history_long_size   | 10000 |
| performance_schema_events_statements_history_size        | 10    |
| performance_schema_events_transactions_history_long_size | 10000 |
| performance_schema_events_transactions_history_size      | 10    |
| performance_schema_events_waits_history_long_size        | 10000 |
| performance_schema_events_waits_history_size             | 10    |
...

 

2. setup_consumers

https://dev.mysql.com/doc/mysql-perfschema-excerpt/5.6/en/performance-schema-wait-tables.html

The wait consumers are disabled by default:

mysql> SELECT *
       FROM performance_schema.setup_consumers
       WHERE NAME LIKE 'events_waits%';
+---------------------------+---------+
| NAME                      | ENABLED |
+---------------------------+---------+
| events_waits_current      | NO      |
| events_waits_history      | NO      |
| events_waits_history_long | NO      |
+---------------------------+---------+

To enable all the wait consumers:

      update performance_schema.setup_consumers set enabled="YES" where name like 'events_waits%';

3. setup_instruments

https://dev.mysql.com/doc/mysql-perfschema-excerpt/5.6/en/performance-schema-wait-tables.html

Some wait instruments are enabled by default; others are disabled. For example:

mysql> SELECT * FROM performance_schema.setup_instruments
       WHERE NAME LIKE 'wait/io/file/innodb%';
+--------------------------------------+---------+-------+
| NAME                                 | ENABLED | TIMED |
+--------------------------------------+---------+-------+
| wait/io/file/innodb/innodb_data_file | YES     | YES   |
| wait/io/file/innodb/innodb_log_file  | YES     | YES   |
| wait/io/file/innodb/innodb_temp_file | YES     | YES   |
+--------------------------------------+---------+-------+

mysql> SELECT * FROM performance_schema.setup_instruments
       WHERE NAME LIKE 'wait/io/socket/%';
+----------------------------------------+---------+-------+
| NAME                                   | ENABLED | TIMED |
+----------------------------------------+---------+-------+
| wait/io/socket/sql/server_tcpip_socket | NO      | NO    |
| wait/io/socket/sql/server_unix_socket  | NO      | NO    |
| wait/io/socket/sql/client_connection   | NO      | NO    |
+----------------------------------------+---------+-------+

There should be zero rows returned for

      SELECT * FROM performance_schema.setup_instruments WHERE NAME LIKE 'wait%' and ENABLED='NO';

To enable all the instruments:

      update performance_schema.setup_instruments set enabled='YES',
      timed='YES' where NAME LIKE 'wait/%';

Now we are ready to query events_waits_current to get wait events for the active thread_id values from performance_schema.threads:

 select
     t.thread_id,
     t.PROCESSLIST_STATE,
     w.EVENT_ID,
     w.END_EVENT_ID,
     w.NESTING_EVENT_TYPE ,
     w.NESTING_EVENT_ID,
     w.SOURCE,
     COALESCE(
    IF ( t.PROCESSLIST_STATE ='User sleep','User sleep',  
        IF(w.EVENT_ID=w.END_EVENT_ID,'CPU',EVENT_NAME)  )
    ,'CPUbis')
       AS wait_event,
     REPLACE(REPLACE(substr(t.PROCESSLIST_info,1,40),'\r',''),'\n','') sql_text_short
 from
            performance_schema.threads t
  left join performance_schema.events_waits_current w on w.thread_id = t.thread_id              
 where 
         t.PROCESSLIST_STATE IS NOT NULL
   and   t.PROCESSLIST_COMMAND != 'Sleep' 
   and   t.PROCESSLIST_COMMAND != 'Daemon'
   and   t.PROCESSLIST_ID != CONNECTION_ID()
   order by t.thread_id
 ;


+-----------+-------------------+-----------+--------------+--------------------+------------------+---------------------------------------+------------------------------------------+
| thread_id | PROCESSLIST_STATE | EVENT_ID  | END_EVENT_ID | NESTING_EVENT_TYPE | NESTING_EVENT_ID | wait_event                            | sql_text                                 |
+-----------+-------------------+-----------+--------------+--------------------+------------------+---------------------------------------+------------------------------------------+
|       374 | Sending data      |  47464227 |         NULL | NULL               |             NULL | idle                                  | SELECT * FROM discount where description |
|       651 | Sending data      |  48306893 |         NULL | STATEMENT          |         48299933 | wait/io/table/sql/handler             | INSERT INTO order_status_change (dbm_ord |
|       651 | Sending data      |  48306901 |         NULL | WAIT               |         48306898 | wait/synch/mutex/innodb/lock_mutex    | INSERT INTO order_status_change (dbm_ord |
|       701 | Sending data      |  47680423 |         NULL | STATEMENT          |         47309200 | wait/io/table/sql/handler             | INSERT INTO order_status_change (dbm_ord |
|       701 | Sending data      |  47680425 |         NULL | WAIT               |         47680423 | wait/synch/mutex/innodb/trx_sys_mutex | INSERT INTO order_status_change (dbm_ord |
|       712 | Sending data      |  73823545 |         NULL | NULL               |             NULL | idle                                  | SELECT * FROM order_item where sku LIKE  |

Thread_id 651 and 701 both have two rows despite what the manual says about one row per thread!

Which row do we use? Or is there some bug?

Thanks to Mark Leith (@MarkLeith), who pointed out to me that the one to use is the one with the highest EVENT_ID:

 select
     t.thread_id,
     t.PROCESSLIST_STATE,
     w.EVENT_ID,
     w.END_EVENT_ID,
     w.NESTING_EVENT_TYPE ,
     w.NESTING_EVENT_ID,
     w.SOURCE,
     COALESCE(
    IF ( t.PROCESSLIST_STATE ='User sleep','User sleep',  
        IF(w.EVENT_ID=w.END_EVENT_ID,'CPU',EVENT_NAME)  )
    ,'CPU')
       AS wait_event,
     REPLACE(REPLACE(substr(t.PROCESSLIST_info,1,40),'\r',''),'\n','') sql_text 
 from
            performance_schema.threads t
  left join performance_schema.events_waits_current w on w.thread_id = t.thread_id              
   and w.EVENT_ID in (
                           select max(w1.EVENT_ID)
                            from
                                performance_schema.threads t1
                     left join  performance_schema.events_waits_current w1
                     on w1.thread_id = t1.thread_id
                    where 
                            t1.PROCESSLIST_STATE IS NOT NULL
                      and   t1.PROCESSLIST_COMMAND != 'Sleep' 
                      and   t1.PROCESSLIST_COMMAND != 'Daemon'
                      and   t1.PROCESSLIST_ID != CONNECTION_ID()
                    group by t1.thread_id)
 where 
         t.PROCESSLIST_STATE IS NOT NULL
   and   t.PROCESSLIST_COMMAND != 'Sleep' 
   and   t.PROCESSLIST_COMMAND != 'Daemon'
   and   t.PROCESSLIST_ID != CONNECTION_ID()
   order by t.thread_id
 ;

+-----------+-------------------+-----------+--------------+--------------------+------------------+-------------------------------------------+------------------------------------------+
| thread_id | PROCESSLIST_STATE | EVENT_ID  | END_EVENT_ID | NESTING_EVENT_TYPE | NESTING_EVENT_ID | EVENT_NAME                                | sql_text                                 |
+-----------+-------------------+-----------+--------------+--------------------+------------------+-------------------------------------------+------------------------------------------+
|       672 | Sending data      |  47224916 |         NULL | STATEMENT          |         47224892 | wait/io/table/sql/handler                 | SELECT * FROM order_item where sku LIKE  |
|      1864 | User sleep        |  97625244 |         NULL | STATEMENT          |         97625228 | wait/synch/cond/sql/Item_func_sleep::cond | select sleep(20)                         |
|      1911 | updating          | 108506184 |         NULL | STATEMENT          |        108506129 | wait/io/table/sql/handler                 | update customer            set name=nam  |
|      2059 | User sleep        |  89344512 |         NULL | STATEMENT          |         89344496 | wait/synch/cond/sql/Item_func_sleep::cond | select sleep(20)                         |
+-----------+-------------------+-----------+--------------+--------------------+------------------+-------------------------------------------+------------------------------------------+

Now there are still a few problems. First, how do we know who is on CPU and who is waiting? If END_EVENT_ID has a value, then the wait has finished and the thread should be on CPU. Also, what happens if there is no corresponding row in events_waits_current for a thread_id that is active in performance_schema.threads? I imagine that also means the thread is runnable on CPU. Putting this together we get:

select
    t.thread_id,
    t.PROCESSLIST_COMMAND cmd,
    t.PROCESSLIST_STATE,
    w.EVENT_ID,
    w.END_EVENT_ID,
    w.NESTING_EVENT_TYPE ,
    w.NESTING_EVENT_ID,
    COALESCE( IF(w.EVENT_ID=w.END_EVENT_ID,'CPU',EVENT_NAME) ,'CPU') AS wait_event,
    REPLACE(REPLACE(substr(t.PROCESSLIST_info,1,40),'\r',''),'\n','') sql_text 
from
           performance_schema.threads t
 left join performance_schema.events_waits_current w on w.thread_id = t.thread_id  
  and w.EVENT_ID in (
                           select max(w1.EVENT_ID)
                            from
                                performance_schema.threads t1
                     left join  performance_schema.events_waits_current w1
                     on w1.thread_id = t1.thread_id
                    where 
                            t1.PROCESSLIST_STATE IS NOT NULL
                      and   t1.PROCESSLIST_COMMAND != 'Sleep' 
                      and   t1.PROCESSLIST_COMMAND != 'Daemon'
                      and   t1.PROCESSLIST_ID != CONNECTION_ID()
                    group by t1.thread_id)          
where 
        t.PROCESSLIST_STATE IS NOT NULL
  and   t.PROCESSLIST_COMMAND != 'Sleep' 
  and   t.PROCESSLIST_COMMAND != 'Daemon'
  and   t.PROCESSLIST_ID != CONNECTION_ID()
;

+-----------+-------------------+----------+--------------+--------------------+------------------+-------------------------------------------+------------------------------------------+
| thread_id | PROCESSLIST_STATE | EVENT_ID | END_EVENT_ID | NESTING_EVENT_TYPE | NESTING_EVENT_ID | wait_event                                | sql_text                                 |
+-----------+-------------------+----------+--------------+--------------------+------------------+-------------------------------------------+------------------------------------------+
|      1000 | Sending data      |  2101672 |         NULL | NULL               |             NULL | idle                                      | SELECT * FROM order_item where sku LIKE  |
|      1461 | User sleep        |   273962 |         NULL | NULL               |             NULL | wait/synch/cond/sql/Item_func_sleep::cond | select sleep(20)                         |
|      1466 | Sending data      |  1576483 |         NULL | NULL               |             NULL | idle                                      | SELECT * FROM discount where description |
|      1682 | Sending data      |  1037890 |         NULL | NULL               |             NULL | idle                                      | SELECT * FROM discount where description |
|      1888 | freeing items     |  1489102 |      1489102 | NULL               |             NULL | CPU                                       | COMMIT                                   |
+-----------+-------------------+----------+--------------+--------------------+------------------+-------------------------------------------+------------------------------------------+

Now there are still strange things, such as the “idle” wait_events. Why does performance_schema.threads say we are active while the corresponding row in events_waits_current says we are idle?
By the way, I only see these “idle” events on 5.6 and 5.7; on 8.0 they seem to be gone.
One nice thing is that this query runs on all three versions, which is a bit surprising considering all the performance schema changes across those versions.
Adding in performance_schema.events_statements_current & performance_schema.socket_instances

SELECT
    threads.thread_id, 
    threads.PROCESSLIST_ID,
    threads.PROCESSLIST_USER,
    threads.PROCESSLIST_HOST,
    threads.PROCESSLIST_DB,
    threads.PROCESSLIST_COMMAND,
    threads.PROCESSLIST_STATE ,
--  statement.TIMER_START AS event_timer_start,
--  statement.TIMER_END AS event_timer_end,
-- statement.LOCK_TIME,
    statement.CURRENT_SCHEMA,
--  threads.PROCESSLIST_info as SQL_TEXT,
    REPLACE(REPLACE(substr(threads.PROCESSLIST_info,1,40),'\r',''),'\n','') sql_text,
    COALESCE(
       IF ( threads.PROCESSLIST_STATE ='User sleep','User sleep',  
             IF(waits.EVENT_ID=waits.END_EVENT_ID,'CPU',waits.EVENT_NAME)  )
       ,'CPUbis') AS wait_event,
--  waits.TIMER_START AS wait_timer_start,
--  waits.TIMER_END AS wait_timer_end,
    waits.OBJECT_SCHEMA,
    waits.OBJECT_NAME,
    waits.INDEX_NAME,
    waits.OBJECT_TYPE,
    waits.SOURCE,
    socket.IP,
    socket.PORT
FROM
           performance_schema.threads                   AS threads
 LEFT JOIN performance_schema.events_waits_current      AS waits on waits.thread_id = threads.thread_id 
   and waits.EVENT_ID in (  select max(w1.EVENT_ID)
                           from   performance_schema.threads t1
                       left join  performance_schema.events_waits_current w1
                         on w1.thread_id = t1.thread_id
                   where 
                           t1.PROCESSLIST_STATE IS NOT NULL
                     and   t1.PROCESSLIST_COMMAND != 'Sleep' 
                     and   t1.PROCESSLIST_COMMAND != 'Daemon'
                     and   t1.PROCESSLIST_ID != CONNECTION_ID()
                   group by t1.thread_id)
 LEFT JOIN performance_schema.events_statements_current AS statement ON statement.thread_id = threads.thread_id
 LEFT JOIN performance_schema.socket_instances          AS socket  ON socket.thread_id = threads.thread_id
where 
        threads.PROCESSLIST_STATE IS NOT NULL
  and   threads.PROCESSLIST_COMMAND != 'Sleep' 
  and   threads.PROCESSLIST_COMMAND != 'Daemon'
  and   threads.PROCESSLIST_ID != CONNECTION_ID()
  order by threads.thread_id
;  
  

One thing that doesn’t make sense at all to me is that with the last line of the SQL, the order by, the output includes rows with PROCESSLIST_COMMAND = 'Sleep' even though there is an explicit threads.PROCESSLIST_COMMAND != 'Sleep' filter. Anyone have an idea of what is happening? Is this a bug or is there an explanation?

+-----------+----------------+------------------+------------------+----------------+---------------------+----------------------------+----------------+------------------------------------------+---------------------------+---------------+-------------+------------+-------------+--------------------+-----------+-------+
| thread_id | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB | PROCESSLIST_COMMAND | PROCESSLIST_STATE          | CURRENT_SCHEMA | sql_text                                 | wait_event                | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | OBJECT_TYPE | SOURCE             | IP        | PORT  |
+-----------+----------------+------------------+------------------+----------------+---------------------+----------------------------+----------------+------------------------------------------+---------------------------+---------------+-------------+------------+-------------+--------------------+-----------+-------+
|       557 |            445 | shopper_28       | localhost        | dbmorders_4    | Query               | executing                  | dbmorders_4    | INSERT INTO order_status_change (dbm_ord | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL               | 127.0.0.1 | 42832 |
|       582 |            470 | shopper_12       | localhost        | dbmorders_5    | Query               | executing                  | dbmorders_5    | SELECT * FROM order_item where sku LIKE  | wait/io/table/sql/handler | dbmorders_5   | order_item  | NULL       | TABLE       | handler.cc:2991    | 127.0.0.1 | 42884 |
|       608 |            496 | shopper_11       | localhost        | dbmorders_1    | Query               | executing                  | dbmorders_1    | INSERT INTO order_status_change (dbm_ord | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL               | 127.0.0.1 | 42940 |
|       750 |            638 | shopper_14       | localhost        | dbmorders_9    | Query               | waiting for handler commit | dbmorders_9    | INSERT INTO order_status_change (dbm_ord | CPU                       | NULL          | NULL        | NULL       | NULL        | sql_plugin.cc:2632 | 127.0.0.1 | 43240 |
|       789 |            677 | shopper_3        | localhost        | dbmorders_4    | Query               | executing                  | dbmorders_4    | INSERT INTO order_status_change (dbm_ord | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL               | 127.0.0.1 | 43320 |
|       811 |            699 | shopper_7        | localhost        | dbmorders      | Sleep               | NULL                       | dbmorders      | NULL                                     | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL               | 127.0.0.1 | 43366 |
|       866 |            754 | shopper_4        | localhost        | dbmorders_8    | Sleep               | NULL                       | dbmorders_8    | NULL                                     | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL               | 127.0.0.1 | 43482 |
|       921 |            809 | dbmapp           | localhost        | dbm            | Query               | updating                   | dbm            | update customer            set name=nam  | wait/io/table/sql/handler | dbm           | customer    | PRIMARY    | TABLE       | handler.cc:3262    | 127.0.0.1 | 43598 |
|       938 |            826 | dbmapp           | localhost        | dbm            | Query               | User sleep                 | dbm            | select sleep(20)                         | User sleep                | NULL          | NULL        | NULL       | NULL        | item_func.cc:5301  | 127.0.0.1 | 43636 |


whereas without the order by there are no rows with threads.PROCESSLIST_COMMAND = 'Sleep':

+-----------+----------------+------------------+------------------+----------------+---------------------+----------------------------+----------------+------------------------------------------+---------------------------+---------------+-------------+------------+-------------+-------------------+-----------+-------+
| thread_id | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB | PROCESSLIST_COMMAND | PROCESSLIST_STATE          | CURRENT_SCHEMA | sql_text                                 | wait_event                | OBJECT_SCHEMA | OBJECT_NAME | INDEX_NAME | OBJECT_TYPE | SOURCE            | IP        | PORT  |
+-----------+----------------+------------------+------------------+----------------+---------------------+----------------------------+----------------+------------------------------------------+---------------------------+---------------+-------------+------------+-------------+-------------------+-----------+-------+
|       657 |            545 | shopper_9        | localhost        | dbmorders_7    | Query               | executing                  | dbmorders_7    | SELECT * FROM dbm_user where username LI | wait/io/table/sql/handler | dbmorders_7   | dbm_user    | NULL       | TABLE       | handler.cc:2991   | 127.0.0.1 | 43046 |
|       485 |            373 | shopper_8        | localhost        | dbmorders_4    | Query               | executing                  | dbmorders_4    | SELECT * FROM dbm_user where username LI | wait/io/table/sql/handler | dbmorders_4   | dbm_user    | NULL       | TABLE       | handler.cc:2991   | 127.0.0.1 | 42666 |
|       561 |            448 | shopper_23       | localhost        | dbmorders_5    | Query               | executing                  | dbmorders_5    | INSERT INTO order_status_change (dbm_ord | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL              | 127.0.0.1 | 42840 |
|       778 |            666 | shopper_5        | localhost        | dbmorders      | Query               | executing                  | dbmorders      | INSERT INTO order_status_change (dbm_ord | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL              | 127.0.0.1 | 43298 |
|     26965 |           1061 | shopper_17       | localhost        | dbmorders_2    | Query               | executing                  | dbmorders_2    | INSERT INTO order_status_change (dbm_ord | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL              | 127.0.0.1 | 44132 |
|       720 |            608 | shopper_22       | localhost        | dbmorders_1    | Query               | executing                  | dbmorders_1    | SELECT * FROM discount where description | wait/io/table/sql/handler | dbmorders_1   | discount    | NULL       | TABLE       | handler.cc:2991   | 127.0.0.1 | 43176 |
|       726 |            614 | shopper_22       | localhost        | dbmorders_5    | Query               | executing                  | dbmorders_5    | SELECT * FROM discount where description | wait/io/table/sql/handler | dbmorders_5   | discount    | NULL       | TABLE       | handler.cc:2991   | 127.0.0.1 | 43188 |
|     10663 |            995 | shopper_25       | localhost        | dbmorders_2    | Query               | executing                  | dbmorders_2    | INSERT INTO order_status_change (dbm_ord | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL              | 127.0.0.1 | 43988 |
|      8619 |            987 | shopper_24       | localhost        | dbmorders_9    | Query               | executing                  | dbmorders_9    | INSERT INTO order_status_change (dbm_ord | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL              | 127.0.0.1 | 43970 |
|      1350 |            862 | shopper_27       | localhost        | dbmorders_5    | Query               | executing                  | dbmorders_5    | SELECT * FROM dbm_user where username LI | wait/io/table/sql/handler | dbmorders_5   | dbm_user    | NULL       | TABLE       | handler.cc:2991   | 127.0.0.1 | 43712 |
|    222821 |           1341 | dbmapp           | localhost        | dbm            | Query               | updating                   | dbm            | update customer            set name=nam  | wait/io/table/sql/handler | dbm           | customer    | PRIMARY    | TABLE       | handler.cc:3262   | 127.0.0.1 | 44734 |
|    108033 |           1233 | dbmapp           | localhost        | dbm            | Query               | User sleep                 | dbm            | select sleep(20)                         | User sleep                | NULL          | NULL        | NULL       | NULL        | item_func.cc:5301 | 127.0.0.1 | 44502 |
|     31247 |           1075 | dbmapp           | localhost        | dbm            | Query               | User sleep                 | dbm            | select sleep(20)                         | User sleep                | NULL          | NULL        | NULL       | NULL        | item_func.cc:5301 | 127.0.0.1 | 44164 |
|     58167 |           1143 | shopper_26       | localhost        | dbmorders_1    | Query               | executing                  | dbmorders_1    | INSERT INTO order_status_change (dbm_ord | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL              | 127.0.0.1 | 44306 |
|    550359 |           1871 | root             | localhost        | kyle           | Query               | updating                   | kyle           | update toto set id=4                     | wait/io/table/sql/handler | kyle          | toto        | NULL       | TABLE       | handler.cc:2991   | 127.0.0.1 | 45860 |
|    550417 |           1929 | dbm              | localhost        | sbtest         | Query               | waiting for handler commit | sbtest         | COMMIT                                   | CPUbis                    | NULL          | NULL        | NULL       | NULL        | NULL              | 127.0.0.1 | 45984 |
+-----------+----------------+------------------+------------------+----------------+---------------------+----------------------------+----------------+------------------------------------------+---------------------------+---------------+-------------+------------+-------------+-------------------+-----------+-------+

From Twitter, Øystein Grøvlen (@ogrovlen) suggests:

"Regarding your last question, there is no concurrency control for performance schema tables, and the optimizer may choose to fetch column values after sorting. I suggest to try ORDER BY threads.thread_id, threads.PROCESSLIST_COMMAND"

Ordering by "threads.thread_id, threads.PROCESSLIST_COMMAND" didn’t work for me; I was still getting Sleep rows in the output, but it gave me an idea about adding a column from events_waits_current to the order by. That did work, and now the sleeps are correctly filtered out:

order by t.thread_id, w.EVENT_NAME

Uncategorized

MySQL -> what to do with NESTING_EVENT_ID

March 24th, 2022

How does one use NESTING_EVENT_ID?

Below I’ve selected from events_waits_history_long for a single thread, but none of the NESTING_EVENT_ID values show up in EVENT_ID or END_EVENT_ID.

To simplify things, I’m just showing the data for one THREAD_ID, listed in order of TIMER_START.

mysql>  select 
            THREAD_ID,
            EVENT_ID,
            END_EVENT_ID,
            NESTING_EVENT_ID,
            NESTING_EVENT_TYPE,
            EVENT_NAME,
            SOURCE,
            TIMER_WAIT,
            SPINS,
            OBJECT_SCHEMA,
            OBJECT_NAME,
            INDEX_NAME,
            OBJECT_TYPE,
            OBJECT_TYPE,
            OBJECT_INSTANCE_BEGIN,
            OPERATION,
            NUMBER_OF_BYTES,
            FLAGS 
 from 
           performance_schema.events_waits_history_long 
 where 
            THREAD_ID in ( select max(thread_id) from performance_schema.events_waits_history_long where NESTING_EVENT_ID is not NULL )
 order by 
           THREAD_ID, TIMER_START
 Limit 100;
+-----------+----------+--------------+------------------+--------------------+-------------------------------------------+---------------------------------+---------------+-------+---------------+----------------------------+------------+-------------+-------------+-----------------------+---------------+-----------------+-------+
| THREAD_ID | EVENT_ID | END_EVENT_ID | NESTING_EVENT_ID | NESTING_EVENT_TYPE | EVENT_NAME                                | SOURCE                          | TIMER_WAIT    | SPINS | OBJECT_SCHEMA | OBJECT_NAME                | INDEX_NAME | OBJECT_TYPE | OBJECT_TYPE | OBJECT_INSTANCE_BEGIN | OPERATION     | NUMBER_OF_BYTES | FLAGS |
+-----------+----------+--------------+------------------+--------------------+-------------------------------------------+---------------------------------+---------------+-------+---------------+----------------------------+------------+-------------+-------------+-----------------------+---------------+-----------------+-------+
| 518033674 |    92174 |        92174 |            92171 | WAIT               | wait/io/socket/sql/client_connection      | viosocket.cc:145                |          NULL |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | recv          |               4 |  NULL |
| 518033674 |    92232 |        92232 |            92231 | WAIT               | wait/io/socket/sql/client_connection      | viosocket.cc:145                |          NULL |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | recv          |               0 |  NULL |
| 518033674 |    92210 |        92210 |            92207 | WAIT               | wait/io/socket/sql/client_connection      | viosocket.cc:145                |          NULL |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | recv          |               4 |  NULL |
| 518033674 |    92209 |        92209 |            92207 | WAIT               | wait/io/socket/sql/client_connection      | viosocket.cc:813                |          NULL |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | select        |            NULL |  NULL |
| 518033674 |    92208 |        92208 |            92207 | WAIT               | wait/io/socket/sql/client_connection      | viosocket.cc:145                |          NULL |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | recv          |               0 |  NULL |
| 518033674 |    92173 |        92173 |            92171 | WAIT               | wait/io/socket/sql/client_connection      | viosocket.cc:813                |          NULL |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | select        |            NULL |  NULL |
| 518033674 |    92176 |        92176 |            92175 | STATEMENT          | wait/io/socket/sql/client_connection      | viosocket.cc:145                |       1612800 |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | recv          |              97 |  NULL |
| 518033674 |    92177 |        92177 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_data   | sql_class.h:3758                |        149600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541392 | lock          |            NULL |  NULL |
| 518033674 |    92178 |        92178 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.h:2970                |        512800 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92179 |        92179 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.cc:2152               |        232000 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92180 |        92180 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.h:2970                |         69600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92181 |        92181 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.h:2970                |         73600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92182 |        92182 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_query_plan | sql_class.h:1044                |        190400 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541584 | lock          |            NULL |  NULL |
| 518033674 |    92183 |        92183 |            92175 | STATEMENT          | wait/synch/mutex/sql/LOCK_table_cache     | table_cache.h:140               |        328000 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |              94768184 | lock          |            NULL |  NULL |
| 518033674 |    92184 |        92184 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_data   | sql_class.h:3769                |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541392 | lock          |            NULL |  NULL |
| 518033674 |    92185 |        92185 |            92175 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1355                  |        214400 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92186 |        92186 |            92175 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1383                  |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92187 |        92191 |            92175 | STATEMENT          | wait/lock/table/sql/handler               | handler.cc:7789                 |       2965600 |  NULL | dbm           | info                       | PRIMARY    | TABLE       | TABLE       |        47817989761064 | read external |            NULL |  NULL |
| 518033674 |    92188 |        92188 |            92187 | WAIT               | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1355                  |         33600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92189 |        92189 |            92187 | WAIT               | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1383                  |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92191 |        92191 |            92187 | WAIT               | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1355                  |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92192 |        92192 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_query_plan | sql_class.h:1044                |        114400 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541584 | lock          |            NULL |  NULL |
| 518033674 |    92193 |        92195 |            92175 | STATEMENT          | wait/io/table/sql/handler                 | handler.cc:3280                 |      18676800 |  NULL | dbm           | info                       | PRIMARY    | TABLE       | TABLE       |        47817989761064 | fetch         |               1 |  NULL |
| 518033674 |    92194 |        92194 |            92193 | WAIT               | wait/synch/mutex/innodb/trx_sys_mutex     | read0read.cc:544                |        123200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811968171800 | lock          |            NULL |  NULL |
| 518033674 |    92195 |        92195 |            92193 | WAIT               | wait/synch/sxlock/innodb/btr_search_latch | btr0sea.ic:122                  |        158400 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811987295224 | shared_lock   |            NULL |  NULL |
| 518033674 |    92196 |        92196 |            92175 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1383                  |         45600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92197 |        92197 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_query_plan | sql_class.h:1044                |         52000 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541584 | lock          |            NULL |  NULL |
| 518033674 |    92198 |        92198 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_query_plan | sql_class.h:1044                |         66400 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541584 | lock          |            NULL |  NULL |
| 518033674 |    92199 |        92199 |            92175 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1355                  |         45600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92200 |        92200 |            92175 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1383                  |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92201 |        92201 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_data   | sql_base.cc:1727                |         45600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541392 | lock          |            NULL |  NULL |
| 518033674 |    92202 |        92202 |            92175 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | ha_innodb.cc:17396              |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92203 |        92203 |            92175 | STATEMENT          | wait/synch/mutex/sql/LOCK_table_cache     | table_cache.h:140               |         45600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |              94768184 | lock          |            NULL |  NULL |
| 518033674 |    92204 |        92204 |            92175 | STATEMENT          | wait/io/socket/sql/client_connection      | viosocket.cc:220                |      20792800 |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | send          |              77 |  NULL |
| 518033674 |    92205 |        92205 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.cc:2152               |         75200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92206 |        92206 |            92175 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.h:2970                |         34400 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92212 |        92212 |            92211 | STATEMENT          | wait/io/socket/sql/client_connection      | viosocket.cc:145                |        454400 |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | recv          |               9 |  NULL |
| 518033674 |    92213 |        92213 |            92211 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_data   | sql_class.h:3758                |         37600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541392 | lock          |            NULL |  NULL |
| 518033674 |    92214 |        92214 |            92211 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.h:2970                |         32800 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92215 |        92215 |            92211 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.cc:2152               |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92216 |        92216 |            92211 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.h:2970                |         52800 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92217 |        92217 |            92211 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.h:2970                |         47200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92218 |        92218 |            92211 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_query_plan | sql_class.h:1044                |         44800 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541584 | lock          |            NULL |  NULL |
| 518033674 |    92219 |        92219 |            92211 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1355                  |         44800 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92220 |        92220 |            92211 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | lock0lock.cc:6386               |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92221 |        92221 |            92211 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1331                  |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92222 |        92222 |            92211 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.cc:1771                 |         28800 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92223 |        92223 |            92211 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | lock0lock.cc:6406               |        121600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92224 |        92224 |            92211 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | lock0lock.cc:6458               |         27200 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92225 |        92225 |            92211 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.cc:1957                 |         44000 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92226 |        92226 |            92211 | STATEMENT          | wait/synch/mutex/innodb/trx_mutex         | trx0trx.h:1383                  |         28000 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47811978633272 | lock          |            NULL |  NULL |
| 518033674 |    92227 |        92227 |            92211 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_query_plan | sql_class.h:1044                |         45600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541584 | lock          |            NULL |  NULL |
| 518033674 |    92228 |        92228 |            92211 | STATEMENT          | wait/io/socket/sql/client_connection      | viosocket.cc:220                |       4366400 |  NULL | NULL          | ::ffff:10.135.133.25:38102 | NULL       | SOCKET      | SOCKET      |        47811962370304 | send          |              11 |  NULL |
| 518033674 |    92229 |        92229 |            92211 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.cc:2152               |         30400 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92230 |        92230 |            92211 | STATEMENT          | wait/synch/mutex/sql/THD::LOCK_thd_query  | sql_class.h:2970                |         37600 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |        47817989541440 | lock          |            NULL |  NULL |
| 518033674 |    92171 |        92174 |             NULL | NULL               | idle                                      | init_net_server_extension.cc:66 | 4024163774000 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |                     0 | idle          |            NULL |  NULL |
| 518033674 |    92207 |        92210 |             NULL | NULL               | idle                                      | init_net_server_extension.cc:66 |     761912000 |  NULL | NULL          | NULL                       | NULL       | NULL        | NULL        |                     0 | idle          |            NULL |  NULL |
+-----------+----------+--------------+------------------+--------------------+-------------------------------------------+---------------------------------+---------------+-------+---------------+----------------------------+------------+-------------+-------------+-----------------------+---------------+-----------------+-------+

mysql> select * from performance_schema.events_waits_history_long  
    where EVENT_ID in (
       select NESTING_EVENT_ID from performance_schema.events_waits_history_long  
        where
          THREAD_ID in ( select max(thread_id) from  performance_schema.events_waits_history_long where NESTING_EVENT_ID is not NULL )
    );
Empty set (0.17 sec)


mysql> select * from performance_schema.events_waits_history_long  
      where END_EVENT_ID in (
         select NESTING_EVENT_ID from performance_schema.events_waits_history_long  
         where
          THREAD_ID in ( select max(thread_id) from  performance_schema.events_waits_history_long where NESTING_EVENT_ID is not NULL )
      );
Empty set (0.14 sec)



Sometimes when selecting from performance_schema.events_waits_current I get two lines for the same thread:

 select
    t.thread_id,
    t.PROCESSLIST_COMMAND cmd,
    ww.EVENT_ID,
    ww.END_EVENT_ID,
    ww.NESTING_EVENT_TYPE ,
    ww.NESTING_EVENT_ID,
    ww.EVENT_NAME  wait_event,
    REPLACE(REPLACE(substr(t.PROCESSLIST_info,1,80),'\r',''),'\n','') sql_text 
from
                performance_schema.threads t
    left   join  performance_schema.events_waits_current ww on ww.thread_id = t.thread_id
where 
        t.PROCESSLIST_STATE IS NOT NULL
  and ( t.PROCESSLIST_COMMAND != 'Sleep' and t.PROCESSLIST_COMMAND != 'Daemon')
  and   t.PROCESSLIST_ID != CONNECTION_ID();

+-----------+-------+----------+--------------+--------------------+------------------+-------------------------------------------+----------------------------------------------------------------------------------+
| thread_id | cmd   | EVENT_ID | END_EVENT_ID | NESTING_EVENT_TYPE | NESTING_EVENT_ID | wait_event                                | sql_text                                                                         |
+-----------+-------+----------+--------------+--------------------+------------------+-------------------------------------------+----------------------------------------------------------------------------------+
| 518032847 | Query |    76405 |         NULL | STATEMENT          |            76347 | wait/io/table/sql/handler                 | SELECT customer.id_ AS customer_id_, customer.name AS customer_name, customer.de |
| 518032847 | Query |    91369 |         NULL | WAIT               |            76405 | wait/io/file/sql/hash_join                | SELECT customer.id_ AS customer_id_, customer.name AS customer_name, customer.de |

I see that NESTING_EVENT_ID 76405 of the second row, where NESTING_EVENT_TYPE = WAIT, is the same as the EVENT_ID of the first row, where NESTING_EVENT_TYPE = STATEMENT, but what information does this communicate?

 

https://dev.mysql.com/doc/refman/5.7/en/performance-schema-events-statements-current-table.html

  • NESTING_EVENT_ID, NESTING_EVENT_TYPE, NESTING_EVENT_LEVEL

    These three columns are used with other columns to provide information as follows for top-level (unnested) statements and nested statements (executed within a stored program).

    For top level statements:

    OBJECT_TYPE = NULL
    OBJECT_SCHEMA = NULL
    OBJECT_NAME = NULL
    NESTING_EVENT_ID = NULL
    NESTING_EVENT_TYPE = NULL
    NESTING_LEVEL = 0

    For nested statements:

    OBJECT_TYPE = the parent statement object type
    OBJECT_SCHEMA = the parent statement object schema
    OBJECT_NAME = the parent statement object name
    NESTING_EVENT_ID = the parent statement EVENT_ID
    NESTING_EVENT_TYPE = 'STATEMENT'
    NESTING_LEVEL = the parent statement NESTING_LEVEL plus one

Get the last statements for threads with blocking transactions

select 
   t.thread_id, 
   s.EVENT_ID, 
   REPLACE(REPLACE(substr(s.SQL_TEXT,1,40),'\r',''),'\n','') thread_proc_info_short, 
   s.TIMER_WAIT/100000000000 time
from 
        performance_schema.events_transactions_current t
  join performance_schema.events_statements_history s on s.NESTING_EVENT_ID = t.EVENT_ID
where t.thread_id in ( select BLOCKING_THREAD_ID from performance_schema.data_lock_waits )
  and s.NESTING_EVENT_TYPE='TRANSACTION'
order by t.thread_id, s.EVENT_ID;

+-----------+----------+------------------------------------------+----------+
| thread_id | EVENT_ID | thread_proc_info_short                   | time     |
+-----------+----------+------------------------------------------+----------+
|    941942 |  1157846 | update toto set id=4                     |   0.0044 |
|    941942 |  1157944 | select    t.thread_id,    s.EVENT_ID,    |   2.1082 |
|    941942 |  1158151 | update toto set id=9                     |   0.2056 |
|    941942 |  1158259 | select    t.thread_id,    s.EVENT_ID,    |   1.1989 |
|    941942 |  1158433 | select BLOCKING_THREAD_ID from performan |   0.2601 |
|    942146 |     6245 | update titi set id=2                     |   0.0089 |
|    942146 |     6354 | update toto set id=999                   | 508.0915 |
|    942146 |     6455 | SET SESSION innodb_lock_wait_timeout =10 |   0.0032 |
|    949290 |     5682 | update tutu set id=3                     |   0.1975 |
+-----------+----------+------------------------------------------+----------+

Not sure why this doesn’t work – I would think INNODB_TRX has all the transactions, but apparently not?

select 
      t.thread_id,  
      s.EVENT_ID, 
      REPLACE(REPLACE(substr(s.SQL_TEXT,1,40),'\r',''),'\n','') thread_proc_info_short,    
      s.TIMER_WAIT/100000000000 time
from  
      performance_schema.events_transactions_current t
join  performance_schema.events_statements_history_long s on s.NESTING_EVENT_ID = t.EVENT_ID
where t.thread_id  in  ( select trx_mysql_thread_id  from  information_schema.INNODB_TRX )
  and s.NESTING_EVENT_TYPE='TRANSACTION'
order by  t.thread_id, s.EVENT_ID;
Empty set (2.00 sec)
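One possible explanation (my assumption, not something verified above): trx_mysql_thread_id in INNODB_TRX is the connection (processlist) ID, not the performance_schema thread_id, so the IN list never matches a thread_id. A sketch of mapping it through performance_schema.threads would look something like this:

-- sketch: map INNODB_TRX connection IDs to performance_schema thread IDs
-- via performance_schema.threads before filtering (untested here)
select 
      t.thread_id,  
      s.EVENT_ID, 
      REPLACE(REPLACE(substr(s.SQL_TEXT,1,40),'\r',''),'\n','') thread_proc_info_short,    
      s.TIMER_WAIT/100000000000 time
from  
      performance_schema.events_transactions_current t
join  performance_schema.events_statements_history s on s.NESTING_EVENT_ID = t.EVENT_ID
where t.thread_id in (
        select th.THREAD_ID
        from   performance_schema.threads th
        join   information_schema.INNODB_TRX trx
               on trx.trx_mysql_thread_id = th.PROCESSLIST_ID )
  and s.NESTING_EVENT_TYPE='TRANSACTION'
order by t.thread_id, s.EVENT_ID;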

This works

select 
      t.thread_id,  
      s.EVENT_ID, 
      REPLACE(REPLACE(substr(s.SQL_TEXT,1,40),'\r',''),'\n','') thread_proc_info_short,    
      s.TIMER_WAIT/100000000000 time
from  
      performance_schema.events_transactions_current t
join  performance_schema.events_statements_history s on s.NESTING_EVENT_ID = t.EVENT_ID
where t.thread_id  in  ( select   distinct THREAD_ID from performance_schema.data_locks)
  and s.NESTING_EVENT_TYPE='TRANSACTION'
order by  t.thread_id, s.EVENT_ID;

+-----------+----------+------------------------------------------+----------+
| thread_id | EVENT_ID | thread_proc_info_short                   | time     |
+-----------+----------+------------------------------------------+----------+
|    941942 |  1157846 | update toto set id=4                     |   0.0044 |
|    941942 |  1157944 | select    t.thread_id,    s.EVENT_ID,    |   2.1082 |
|    941942 |  1158151 | update toto set id=9                     |   0.2056 |
|    941942 |  1158259 | select    t.thread_id,    s.EVENT_ID,    |   1.1989 |
|    941942 |  1158433 | select BLOCKING_THREAD_ID from performan |   0.2601 |
|    941942 |  1158599 | select       t.thread_id,        s.EVE   |  57.5793 |
|    941942 |  2602390 | select       t.thread_id,        s.EVE   |  18.5984 |
|    942146 |     6245 | update titi set id=2                     |   0.0089 |
|    942146 |     6354 | update toto set id=999                   | 508.0915 |
|    942146 |     6455 | SET SESSION innodb_lock_wait_timeout =10 |   0.0032 |
|    949290 |     5682 | update tutu set id=3                     |   0.1975 |
+-----------+----------+------------------------------------------+----------+
11 rows in set (3.85 sec)


Could select by thread_id, but that gives all the SQL and not just the SQL in the transaction

 select 
       s.thread_id,  
       s.EVENT_ID, 
       REPLACE(REPLACE(substr(s.SQL_TEXT,1,40),'\r',''),'\n','') thread_proc_info_short,    
       s.TIMER_WAIT/100000000000 time
 from 
 performance_schema.events_statements_history s where THREAD_ID=941942;

+-----------+----------+------------------------------------------+---------+
| thread_id | EVENT_ID | thread_proc_info_short                   | time    |
+-----------+----------+------------------------------------------+---------+
|    941942 |  1158259 | select    t.thread_id,    s.EVENT_ID,    |  1.1989 |
|    941942 |  1158433 | select BLOCKING_THREAD_ID from performan |  0.2601 |
|    941942 |  1158599 | select       t.thread_id,        s.EVE   | 57.5793 |
|    941942 |  2602390 | select       t.thread_id,        s.EVE   | 18.5984 |
|    941942 |  2604424 | select       t.thread_id,        s.EVE   | 36.9560 |
|    941942 |  1157736 | SET SESSION innodb_lock_wait_timeout =10 |  0.0152 |
|    941942 |  1157755 | start transaction                        |  0.1609 |
|    941942 |  1157846 | update toto set id=4                     |  0.0044 |
|    941942 |  1157944 | select    t.thread_id,    s.EVENT_ID,    |  2.1082 |
|    941942 |  1158151 | update toto set id=9                     |  0.2056 |
+-----------+----------+------------------------------------------+---------+

 

Reference:

https://www.bobbydurrettdba.com/2019/02/12/one-query-in-mysql-performance-schema/

Uncategorized

Protected: DataDog – the marriage of ASH, Query Stats and UI

March 23rd, 2022

Uncategorized

MySQL waits and CPU

March 23rd, 2022

Question: how do I know that one user is waiting for a lock and another  user is burning CPU on MySQL?

i.e. how can I create active session history on MySQL like on Oracle?

Two simple cases I’m looking at
  1. session waiting on lock
    1. The wait event is "wait/io/table/sql/handler", which happens for a number of potential reasons
    2. -> I can join to information_schema views INNODB_LOCK_WAITS & INNODB_TRX but this is heavy, and I’d rather only access those after I know there is a lock waiter.
  2. session burning CPU
    1. How do I know from performance schema views that the connection is active and burning CPU?
    2. It has a wait event that shows up wait/synch/mutex/sql/THD::LOCK_thd_data
    3. On Oracle, to determine CPU, we just look at connections that are active and not waiting.
Also, what is NESTING_EVENT_TYPE? I sometimes see the same thread twice in events_waits_current, with NESTING_EVENT_TYPE STAGE and WAIT.
On some systems, for the lock test below, the NESTING_EVENT_TYPE shows up as NULL and on others as STAGE.

 

I basically want to monitor user activity and collect who is active and if they are waiting for something or if they are on CPU.

Row Lock

First user
             start transaction;
             update toto set id=1;
Second user does
             update toto set id=4; -- hangs

How do I see that the second user is waiting for a row lock? Here is the output from

  • performance_schema.threads
  • performance_schema.events_waits_current

For a connection waiting on a lock:

select * from performance_schema.threads  where thread_id=5329\G
           THREAD_ID: 5329
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 5223
   PROCESSLIST_USER: root
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: kyle
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 782
  PROCESSLIST_STATE: updating
   PROCESSLIST_INFO: update toto set id=4
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: SSL/TLS
       THREAD_OS_ID: 2088
     RESOURCE_GROUP: USR_default
     
select * from performance_schema.events_waits_current  where thread_id=5329\G
*************************** 1. row ***************************
            THREAD_ID: 5329
             EVENT_ID: 740239187
         END_EVENT_ID: NULL
           EVENT_NAME: wait/io/table/sql/handler
               SOURCE: handler.cc:2991
          TIMER_START: 45606173945401364
            TIMER_END: 46413477812683064
           TIMER_WAIT: 807303867281700
                SPINS: NULL
        OBJECT_SCHEMA: kyle
          OBJECT_NAME: toto
           INDEX_NAME: NULL
          OBJECT_TYPE: TABLE
OBJECT_INSTANCE_BEGIN: 139872327141152
     NESTING_EVENT_ID: 740239186
   NESTING_EVENT_TYPE: STAGE
            OPERATION: fetch
      NUMBER_OF_BYTES: 61
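
As a side note, on MySQL 8.0 the waiter and its blocker can also be listed directly from performance_schema.data_lock_waits. A sketch (assuming the data lock instrumentation is enabled; not part of the original output above):

-- sketch: list waiter and blocker threads for current row lock waits
select w.REQUESTING_THREAD_ID  waiter_thread,
       w.BLOCKING_THREAD_ID    blocker_thread,
       tw.PROCESSLIST_INFO     waiter_sql,
       tb.PROCESSLIST_INFO     blocker_sql
from  performance_schema.data_lock_waits w
join  performance_schema.threads tw on tw.THREAD_ID = w.REQUESTING_THREAD_ID
join  performance_schema.threads tb on tb.THREAD_ID = w.BLOCKING_THREAD_ID;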

CPU

Have a user running a procedure that just burns CPU by running “call minute_rollup(100000000)”.  Here is the output from
  • performance_schema.threads
  • performance_schema.events_waits_current
 select * from performance_schema.threads where thread_id= 6273  \G
 *************************** 1. row ***************************
          THREAD_ID: 6273
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 6167
   PROCESSLIST_USER: root
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: kyle
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 668
  PROCESSLIST_STATE: Opening tables
   PROCESSLIST_INFO: call minute_rollup(100000000)
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: SSL/TLS
       THREAD_OS_ID: 2087
     RESOURCE_GROUP: USR_default
     
    select * from performance_schema.events_waits_current  where thread_id=6273\G
             THREAD_ID: 6273
             EVENT_ID: 403685579
         END_EVENT_ID: 403685579
           EVENT_NAME: wait/synch/mutex/sql/THD::LOCK_thd_data
               SOURCE: sql_class.h:4142
          TIMER_START: 46300974764007954
            TIMER_END: 46300974764070510
           TIMER_WAIT: 62556
                SPINS: NULL
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: NULL
           INDEX_NAME: NULL
          OBJECT_TYPE: NULL
OBJECT_INSTANCE_BEGIN: 139871988613512
     NESTING_EVENT_ID: 403685578
   NESTING_EVENT_TYPE: STATEMENT
            OPERATION: lock
      NUMBER_OF_BYTES: NULL
                FLAGS: NULL

Procedure to burn CPU:

DELIMITER $$
DROP PROCEDURE IF EXISTS minute_rollup$$
CREATE PROCEDURE minute_rollup(input_number INT)
BEGIN
DECLARE counter int;
DECLARE out_number float;
set counter=0;
WHILE counter <= input_number DO
SET out_number=SQRT(rand());
SET counter = counter + 1;
END WHILE;
END$$
DELIMITER ;

call minute_rollup(100000000);

I think the trick to see if the connection is on CPU is in performance_schema.events_waits_current: if EVENT_ID = END_EVENT_ID, then the last wait has completed and no new wait has started, so we are on CPU.

 

select
       t.thread_id,
       IF(w.EVENT_ID=w.END_EVENT_ID,'CPU', w.EVENT_NAME) AS wait_event,
       REPLACE( REPLACE(substr(t.PROCESSLIST_info ,1,80), '\r', '') , '\n','') sql_text 
from
           performance_schema.threads t
 left join performance_schema.events_waits_current w on w.thread_id = t.thread_id
where 
        t.PROCESSLIST_STATE IS NOT NULL
 and ( t.PROCESSLIST_COMMAND != 'Sleep' and t.PROCESSLIST_COMMAND != 'Daemon')
 and t.PROCESSLIST_ID != CONNECTION_ID();

+-----------+-------------------------------------------+----------------------------------------------------------------------------------+
| thread_id | wait_event                                | sql_text                                                                         |
+-----------+-------------------------------------------+----------------------------------------------------------------------------------+
| 518002247 | CPU                                       | SELECT anon_1.anon_2_anon_3_customer_id_ AS anon_1_anon_2_anon_3_customer_id_, a |
| 518007382 | CPU                                       | COMMIT                                                                           |
| 518006817 | wait/io/table/sql/handler                 | SELECT anon_1.anon_2_customer_id_ AS anon_1_anon_2_customer_id_, anon_1.anon_2_c |
| 518007025 | wait/io/table/sql/handler                 | update customer            set name=name            where id_ in (2, 1);         |
| 518007027 | wait/synch/cond/sql/Item_func_sleep::cond | select sleep(20)                                                                 |
| 518007028 | wait/io/table/sql/handler                 | update customer            set name=name            where id_ in (8, 4);         |
| 518007029 | wait/io/table/sql/handler                 | update customer            set name=name            where id_ in (6, 3);         |
| 518007032 | wait/synch/cond/sql/Item_func_sleep::cond | select sleep(20)                                                                 |
| 518007034 | wait/io/table/sql/handler                 | SELECT anon_1.anon_2_customer_id_ AS anon_1_anon_2_customer_id_, anon_1.anon_2_c |
| 518007035 | wait/synch/cond/sql/Item_func_sleep::cond | select sleep(20)                                                                 |
| 518007039 | wait/io/table/sql/handler                 | update customer            set name=name            where id_ in (5, 3);         |
| 518005220 | wait/io/table/sql/handler                 | SELECT anon_1.customer_id_ AS anon_1_customer_id_, anon_1.customer_name AS anon_ |
| 518007058 | wait/io/table/sql/handler                 | update customer            set name=name            where id_ in (4, 2);         |
| 518004406 | wait/io/table/sql/handler                 | SELECT anon_1.anon_2_customer_id_ AS anon_1_anon_2_customer_id_, anon_1.anon_2_c |
| 518003240 | wait/io/table/sql/handler                 | update toto set id=4                                                             |
+-----------+-------------------------------------------+----------------------------------------------------------------------------------+
 

One unusual thing is that the thread can show up more than once in events_waits_current.

 select
    t.thread_id,
    w.NESTING_EVENT_TYPE,
    IF(w.EVENT_ID=w.END_EVENT_ID,'CPU', w.EVENT_NAME)  AS wait_event,
    REPLACE(    REPLACE(substr(t.PROCESSLIST_info ,1,80), '\r', '') , '\n','') sql_text 
from
               performance_schema.threads t
    left join  performance_schema.events_waits_current w on w.thread_id = t.thread_id
where 
        t.PROCESSLIST_STATE IS NOT NULL
  and ( t.PROCESSLIST_COMMAND != 'Sleep' and t.PROCESSLIST_COMMAND != 'Daemon')
  and   t.PROCESSLIST_ID != CONNECTION_ID();

+-----------+--------------------+-------------------------------------------+----------------------------------------------------------------------------------+
| thread_id | NESTING_EVENT_TYPE | wait_event                                | sql_text                                                                         |
+-----------+--------------------+-------------------------------------------+----------------------------------------------------------------------------------+
| 518010010 | STATEMENT          | wait/io/table/sql/handler                 | SELECT customer.id_ AS customer_id_, customer.name AS customer_name, customer.de |
| 518010010 | WAIT               | wait/io/file/sql/hash_join                | SELECT customer.id_ AS customer_id_, customer.name AS customer_name, customer.de |
| 518010011 | STATEMENT          | wait/io/table/sql/handler                 | SELECT customer.id_ AS customer_id_, customer.name AS customer_name, customer.de |
| 518010011 | WAIT               | wait/io/file/sql/hash_join                | SELECT customer.id_ AS customer_id_, customer.name AS customer_name, customer.de |

When there is a row with NESTING_EVENT_TYPE of WAIT, I want that wait. If there is no WAIT row and only a STATEMENT row, then I want that wait instead (AFAICT), leading to a bit of a convoluted statement:

select
    t.thread_id,
    COALESCE( ww.NESTING_EVENT_TYPE, ws.NESTING_EVENT_TYPE) NESTING_EVENT_TYPE,
    IF(ws.EVENT_ID=ws.END_EVENT_ID,'CPU',
       COALESCE( ww.EVENT_NAME, ws.EVENT_NAME) )  AS wait_event,
    REPLACE(REPLACE(substr(t.PROCESSLIST_info,1,80),'\r',''),'\n','') sql_text 
from
                performance_schema.threads t
    left   join  performance_schema.events_waits_current ww on ww.thread_id = t.thread_id
         and ww.NESTING_EVENT_TYPE = 'WAIT'
    left   join  performance_schema.events_waits_current ws on ws.thread_id = t.thread_id
            and ws.NESTING_EVENT_TYPE = 'STATEMENT'
where 
        t.PROCESSLIST_STATE IS NOT NULL
  and ( t.PROCESSLIST_COMMAND != 'Sleep' and t.PROCESSLIST_COMMAND != 'Daemon')
  and   t.PROCESSLIST_ID != CONNECTION_ID();
+-----------+--------------------+-------------------------------------------+----------------------------------------------------------------------------------+
| thread_id | NESTING_EVENT_TYPE | wait_event                                | sql_text                                                                         |
+-----------+--------------------+-------------------------------------------+----------------------------------------------------------------------------------+
| 518009610 | STATEMENT          | wait/io/table/sql/handler                 | SELECT anon_1.anon_2_anon_3_customer_id_ AS anon_1_anon_2_anon_3_customer_id_, a |
| 518010002 | STATEMENT          | CPU                                       | SELECT current_schema, sql_text, digest, digest_text, timer_start, @startup_time |
| 518003240 | STATEMENT          | wait/io/table/sql/handler                 | update toto set id=4                                                             |
| 518010412 | STATEMENT          | wait/io/table/sql/handler                 | SELECT anon_1.anon_2_customer_id_ AS anon_1_anon_2_customer_id_, anon_1.anon_2_c |
| 518010010 | WAIT               | wait/io/file/sql/hash_join                | SELECT customer.id_ AS customer_id_, customer.name AS customer_name, customer.de |
| 518010417 | WAIT               | wait/io/file/sql/hash_join                | SELECT anon_1.customer_id_ AS anon_1_customer_id_, anon_1.customer_name AS anon_ |
+-----------+--------------------+-------------------------------------------+----------------------------------------------------------------------------------+

It gets more confusing because in some cases there are multiple lines (I’ve only seen two, not more) for the same thread with the same NESTING_EVENT_TYPE (NULL in the example below):

SELECT
        threads.PROCESSLIST_ID,
         threads.PROCESSLIST_USER,
         threads.PROCESSLIST_HOST,
         threads.PROCESSLIST_COMMAND,
         threads.PROCESSLIST_STATE,
         current_events.TIMER_START,
         current_events.TIMER_END,
         current_events.TIMER_WAIT,
         current_events.NESTING_EVENT_TYPE,
         REPLACE(REPLACE(substr(current_events.SQL_TEXT,1,40),'\r',''),'\n','')  stmt_sql_text_short,
         REPLACE(REPLACE(substr(threads.PROCESSLIST_info,1,40),'\r',''),'\n','') thread_proc_info_short,
         current_events.CURRENT_SCHEMA,
         IF ( threads.PROCESSLIST_STATE ='User sleep','User sleep', (
            IF(current_waits.EVENT_ID=current_waits.END_EVENT_ID,'CPU', current_waits.EVENT_NAME)
          )) AS wait_event
 FROM
                    performance_schema.threads AS threads
         INNER JOIN performance_schema.events_statements_current AS current_events ON threads.THREAD_ID = current_events.THREAD_ID
         INNER JOIN performance_schema.events_waits_current AS current_waits ON threads.THREAD_ID = current_waits.THREAD_ID
  Where threads.PROCESSLIST_STATE is not NULL;


+------+-----------+-------------+-----------------+--------------------+---------------------------------------------------------------------------------------------+
| PID  | PLIST_CMD | PLIST_STATE |  TIMER_WAIT     | NESTING_EVENT_TYPE |  thread_proc_info_short                   | wait_event                                       |
+------+-----------+-------------+-----------------+--------------------+---------------------------------------------------------------------------------------------+
| 8539 | Query     | executing   |    445524154000 | NULL               |  INSERT INTO order_status_change (dbm_ord | wait/io/table/sql/handler                        |
| 8539 | Query     | executing   |    445580677000 | NULL               |  INSERT INTO order_status_change (dbm_ord | CPU                                              |

| 8820 | Query     | init        |       170936000 | NULL               |  SELECT * from dbm_user WHERE id = 2404   | idle                                             |
| 8820 | Query     | init        |       170936000 | NULL               |  SELECT * from dbm_user WHERE id = 2404   | wait/io/socket/sql/client_connection             |

| 8730 | Query     | executing   |    595809657000 | NULL               |  INSERT INTO order_status_change (dbm_ord | wait/io/table/sql/handler                        |
| 8730 | Query     | executing   |    595869870000 | NULL               |  INSERT INTO order_status_change (dbm_ord | wait/synch/sxlock/innodb/lock_sys_global_rw_lock |


 

Uncategorized

Postgres performance gone south but nothing changed!

July 14th, 2021

Postgres seems pretty cool as open source databases go. I’m finding a much greater affinity for Postgres than say MySQL.

On the other hand, coming from an Oracle background there are a few things that come across as strange on Postgres. Most of the weirdness for me centers around MVCC.

Here is one new surprise for me.

I’m running a steady workload on a set of tables with inserts, updates, deletes and selects. This is putting a decent load on the database.

Then at 20:45, I simply do:


\set AUTOCOMMIT off
\echo :AUTOCOMMIT
create table toto ( id int);
insert into toto values (1);
commit;
select * from toto for update;

and throughput goes down and CPU load goes up.

 

At 20:45, “idle_in_transaction_max_time” (the last time series graph below, in yellow) starts increasing, and it keeps increasing until 21:04, when I do a “commit” after the “select * from toto for update;”.

Between 20:45 and 21:04, “idle_in_transaction_max_time” increases, the DB Load CPU time (the last chart) increases, and throughput (“Transactions.xact_commit”, the first graph) goes down.

 

 

image (3)

image (4)

 

In Postgres, dead rows are left in blocks until the vacuum process cleans them out.

The vacuum process can only clean out dead rows older than the oldest open transaction.

Thus if there is an open transaction and then new transactions doing deletes and inserts leaving dead rows, they will pile up because vacuum can’t clean them out.
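
A quick way to spot the culprit (a sketch, assuming permission to read pg_stat_activity; not from the original post) is to look for sessions sitting idle inside an open transaction:

-- sessions idle inside an open transaction, holding back vacuum cleanup
select pid, state, xact_start, now() - xact_start as xact_age, query
from   pg_stat_activity
where  state = 'idle in transaction'
order  by xact_start;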

 

Uncategorized

Covering Indexes in Postgres

June 19th, 2020

Covering indexes are where all the columns needed are in the index and, in theory, only the index needs to be accessed. This seemingly straightforward strategy is much trickier on Postgres.

Let’s take TPC-H query 6:

select
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
            l_shipdate >= date '1996-01-01'
        and l_shipdate < cast(date '1996-01-01' + interval '1 year' as date)
        and l_discount between 0.06 - 0.01 and 0.06 + 0.01
        and l_quantity < 24;

we’ve got 3 predicate filter columns

  • l_shipdate
  • l_discount
  • l_quantity

None of them is very selective on its own, but all three together are pretty good:

select count(*) from
        lineitem
where
            l_shipdate >= date '1996-01-01'
        and l_shipdate < cast(date '1996-01-01' + interval '1 year' as date)

→ 1,823,373

select count(*) from
        lineitem
where l_discount between 0.06 - 0.01 and 0.06 + 0.01;

→ 3,272,056

select count(*) from
        lineitem
where l_quantity < 24;

→ 5,517,389

select count(*) from
        lineitem
where
            l_shipdate >= date '1996-01-01'
        and l_shipdate < cast(date '1996-01-01' + interval '1 year' as date)
        and l_discount between 0.06 - 0.01 and 0.06 + 0.01
        and l_quantity < 24;

→ 228,160

select count(*) from lineitem;

→ 11,997,996

228,160/11,997,996 = 1.9%
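
So I added an index on the three predicate columns. As a sketch (the index name and column order are my assumption, not taken from the post):

-- three-column index on the query 6 predicate columns
create index lineitem_shipdate_discount_qty
    on lineitem (l_shipdate, l_discount, l_quantity);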

 

With the index added on all 3 fields, the optimizer doesn’t even use it!

https://explain.depesz.com/s/OG27

I forced the optimizer to use the index (set enable_seqscan=off;), and then, yes, the improvement is small, from ~800 ms to ~600 ms, mainly because the index results still have to go back to the table.

https://explain.depesz.com/s/bJMg

I made a covering index with all the fields in the query and it still goes back to the table!
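
As a sketch of what such a covering index looks like (Postgres 11+ INCLUDE syntax; the exact definition used in the post isn’t shown):

-- covering index: key columns for the filters, INCLUDE for the selected column
create index lineitem_q6_covering
    on lineitem (l_shipdate, l_discount, l_quantity)
    include (l_extendedprice);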

https://explain.depesz.com/s/92mV

Now, if I do a vacuum, then sure enough the covering index works and I get a 6x speedup! (870 ms to 140 ms)

https://explain.depesz.com/s/fdy5

What does the vacuum have to do with the covering index?

From Jim Nasby:

The issue with covering indexes is that you still have to visit the heap page, unless that page happens to be marked as being all visible. Pages can be marked as all visible by autovacuum (by default runs once 10% of rows in a table have been updated or deleted) or a manual vacuum. They can only be marked visible if all live rows on the page are older than the oldest running transaction in the system.

In other words, effectiveness of this technique is highly dependent on the workload.

 

Reference

https://www.percona.com/blog/2018/08/06/basic-understanding-bloat-vacuum-postgresql-mvcc/

TL;DR postgres keeps undo in the data blocks. Vacuum clears out old unnecessary rows that are “dead”

https://bitnine.net/blog-useful-information/index-only-scan-and-visibility-map-12/

“Because there is no information for version in an index tuple, it is impossible to judge whether a tuple is readable in the current transaction or not”

https://blog.makandra.com/2018/11/investigating-slow-postgres-index-only-scans/

Replica + hot_standby_feedback can affect index only scan

 

Uncategorized

Granularity impact on visualization

May 22nd, 2020

The longer the timeline we show in the graph, the more points we have to show or the larger the granularity we have to use.

For example if I show 1 hour of data, then I can show 60 points of 1 minute each granularity.

If I showed a day at 1 minute each then that would be 1440 points.

Say we wanted to limit the points to less than 300 points so our UI stays responsive no matter the time scale.

Then for 24 hours I can show 5-minute granularity, which is 288 points.

If I show 25 hours, that’s 300 points at 5-minute granularity, so we are no longer under 300 points.

At this time I’d have to jump to the next level of granularity. In my case the next step up is 1 hour.

Using one hour granularity will look different than 5 minutes.

The following is roughly a day at 5 minute granularity first and 1 hour granularity second.
image(17)

image(18)

The most common effect of larger granularity is losing the spikes, i.e. the 5-minute granularity could have highs and lows that get averaged out at the larger granularity.

What is noticeable is that the 1 hour granularity above is still spiky: the highs and lows are roughly the same as at 5 minutes. The spikiness comes from the fact that the above graphs are using points. Each point represents the average value for the width of the granularity, but since we are using points, the actual width of the granularity is not visible. What would be more accurate is using bars for each “point”, where the bar is the width of the granularity.

 

Here are a couple of examples from the tool DB Optimizer where the width of the bar represents the granularity:

Screen Shot 2020-05-22 at 10.41.44 AM

 

 

Screen Shot 2020-05-22 at 10.41.23 AM

Uncategorized

Where to add Index on Postgres

March 30th, 2020

 

Summary

The goal in this post is to lay out a methodology that can be implemented to scan explain plans and identify opportunities to optimize SQL execution with additional indexes.

The missing index opportunities outlined here are

  1. Indexes that would avoid full table scan with a predicate filter that filters out most of the returned rows
  2. Joins that are missing index on the join field
  3. Index range scans that could be made more efficient by addition of a field to existing index

Also for future consideration, but not yet outlined here:

  • Recommending covering index where the addition of fields to an index avoids table access all together
  • Indexes that support the order by clause

Basic procedure is to take the output from the “explain” command.
In the explain output, only look at nodes that are at least, say, 10-30% or more of the total cost of the whole query.
Look for full table scans; these are nodes with the following entry: “Node Type”: “Seq Scan”.
If it’s the second node of a nested loop, suggest an index on the join field.
If it’s the most expensive node of a Hash Join, suggest an index on the join field.
If it’s the most expensive node of a Merge Join, suggest an index on the join field.
If it is followed by a filter, suggest an index on the filter and join columns.

If the filter is not the most expensive node in a Hash Join, or it’s the first node in a Nested Loop, and that node is expensive and the filter is expected to return X% (for example 10%) or less of the table after applying the filter, then suggest an index on that filter column.
Additionally, index range scans that could be improved by adding fields to the index will require using “explain analyze”. With explain analyze we can see how many rows were filtered out after the index access.

Get plan with “Explain” and text from SQL statement

EXPLAIN (COSTS true, FORMAT json, verbose true)
select id from hypg1 where id = 2006;
                QUERY PLAN                 
-------------------------------------------
 [                                        +
   {                                      +
     "Plan": {                            +
       "Node Type": "Gather",             +
       "Parallel Aware": false,           +
       "Startup Cost": 1000.00,           +
       "Total Cost": 10633.43,            +
       "Plan Rows": 1,                    +
       "Plan Width": 4,                   +
       "Output": ["id"],                  +
       "Workers Planned": 2,              +
       "Single Copy": false,              +
       "Plans": [                         +
         {                                +
           "Node Type": "Seq Scan",       +
           "Parent Relationship": "Outer",+
           "Parallel Aware": true,        +
           "Relation Name": "hypg1",      +
           "Schema": "public",            +
           "Alias": "hypg1",              +
           "Startup Cost": 0.00,          +
           "Total Cost": 9633.33,         +
           "Plan Rows": 1,                +
           "Plan Width": 4,               +
           "Output": ["id"],              +
           "Filter": "(hypg1.id = 2006)"  +
         }                                +
       ]                                  +
     }                                    +
   }                                      +
 ]

 

Cost in Explain Plan

In all cases we should sort nodes by cost and only look at nodes that take up a certain percentage or more of the cost.
Each node has a “cost”. We need to extract the “total cost” at each node in the plan, compare it to the “total cost” of the whole query (the top node), calculate the % of the whole-query cost that each node represents, and only look at nodes that represent some value, say 10% or more, of the total cost. Costs include the sum of all indented (child) nodes, so to get the per-node time or cost we have to subtract out the child nodes.

If we can run “explain analyze” we can get the actual time per node. The command “explain analyze” actually runs the query, so we should plan on using “explain analyze” only with certain constraints: if the query is already being run multiple times a minute, then running it once more for tuning information is probably worth it. Time behaves just like cost, in that the time at a node includes the times of its child nodes, so we have to subtract out the child nodes to see the time elapsed at each node.
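
For example, the same statement with actual row counts and timings (note that this executes the query):

EXPLAIN (ANALYZE, COSTS true, VERBOSE true, FORMAT json)
select id from hypg1 where id = 2006;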

Here is an example of the costs at each node, showing that child node costs are subcomponents of the parent node’s cost:

 

[                                                               +
   {                                                             +
     "Plan": {                                                   +
       "Total Cost": 22308.54,                                   +
       "Plans": [                                                +
         {                                                       +
           "Total Cost": 11675.10,                               +
             {                                                   +
               "Total Cost": 10675.00,                           +
             }                                                   +
         },                                                      +
         {                                                       +
           "Total Cost": 10633.43,                               +
             {                                                   +
               "Total Cost": 9633.33,                            +
             }                                                   +
           ]                                                     +
         }                                                       +
       ]                                                         +
     }                                                           +
   }                                                             +
 ]

For example, subtracting out the child node costs gives the per-node costs:

[                                                               +
   {                                                             +
     "Plan": {                                                   +
       "Total Cost": 22308.54,  --  node cost 22308.54 - (11675.10+10633.43) = 0.01
       "Plans": [                                                +
         {                                                       +
           "Total Cost": 11675.10, --  node cost 11675.10 - 10675.00 = 1000.1
             {                                                   +
               "Total Cost": 10675.00,                           +
             }                                                   +
         },                                                      +
         {                                                       +
           "Total Cost": 10633.43, --  node cost   10633.43 - 9633.33 = 1000.1
             {                                                   +
               "Total Cost": 9633.33,                            +
             }                                                   +
           ]                                                     +
         }                                                       +
       ]                                                         +
     }                                                           +
   }                                                             +
 ]

Predicate filters missing indexes

For predicate filters that can use indexes scan the explain plan for the following string

           "Node Type": "Seq Scan",       +

and see if they are followed by a “Filter”.

           "Total Cost": 9633.33,         +
           "Plan Rows": 1,                +
           "Filter": "(hypg1.id = 2006)"  +

Verify that the node is a significant part of the overall cost of the query. This node takes up most of the cost of the plan, 9633.33 out of 10633.43, so it is the most important node to optimize.
We should only look at nodes that take up a certain percentage of the total cost, say 10% or more (the value should be configurable in our code, as it is debatable where the cutoff point is).

We can compare the rows returned (“Plan Rows”) to the total rows in the table statistics.
If the filter eliminates a large number of rows, then it is a candidate.
The following query has to be run in the database that the table is in, thus we have to collect the database name as part of the sampling of the load data.

SELECT reltuples FROM pg_class WHERE relname = 'hypg1';

reltuples
-----------
 1e+06

% filtered = 100 * ( 1e+06 - 1 ) / 1e+06 = 99.9999% of rows are filtered out, so it’s a perfect candidate.

We should only recommend an index if the filter returns 10% or less of the table (i.e. 90% of the rows returned by the full table scan are filtered out), and in the advisor recommendation for the index we should indicate how much of the table is returned via the filter. We can grade recommendations based on the %, for example:
low recommendation: 5%-10% of table returned after applying the filter
medium recommendation: 1%-5% of table returned after applying the filter
high recommendation: < 1% of table returned after applying the filter

The filter column is hypg1.id, thus we suggest an index on hypg1.id.
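
As a sketch, the corresponding recommendation would be something like (index name is my own choice):

CREATE INDEX hypg1_id_idx ON hypg1 (id);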

Joins missing indexes

Using the following query in the examples below

explain (FORMAT json)
select max (t1.data)
from t1, t2
where t1.id = t2.id
and t1.clus = 10
and t2.val > 1001
;

HASH JOIN

Look for “Hash Join” nodes with a “Hash Cond” that are followed by a “Seq Scan”.
If we find a “Seq Scan” that is preceded by a Hash Join and its cost is a significant amount of the total query cost, then suggest an index on the join column.
If both nodes of the hash join have Seq Scans, suggest the index on the join column from the more expensive node.
If that node also contains a filter, suggest including the filter column in the index.

There is a “Node Type”: “Seq Scan” on
table “Relation Name”: “t2”,
with cost “Total Cost”: 13512.67.
Looking at the hash join above the Seq Scan node, the join condition is

"Hash Cond": "(t2.id = t1.id)",  

so we suggest an index on t2.id

              "Plans": [                                                 +
                 {                                                        +
                   "Node Type": "Hash Join",                              +
                   "Parent Relationship": "Outer",                        +
                   "Parallel Aware": false,                               +
                   "Join Type": "Inner",                                  +
                   "Startup Cost": 11.40,                                 +
                   "Total Cost": 15086.97,                                +
                   "Plan Rows": 41,                                       +
                   "Plan Width": 33,                                      +
                   "Output": ["t1.data"],                                 +
                   "Inner Unique": false,                                 +
                   "Hash Cond": "(t2.id = t1.id)",                        +
                   "Plans": [                                             +
                     {                                                    +
                       "Node Type": "Seq Scan",                           +
                       "Parent Relationship": "Outer",                    +
                       "Parallel Aware": true,                            +
                       "Relation Name": "t2",                             +
                       "Schema": "public",                                +
                       "Alias": "t2",                                     +
                       "Startup Cost": 0.00,                              +
                       "Total Cost": 13512.67,                            +
                       "Plan Rows": 416667,                               +
                       "Plan Width": 4,                                   +
                       "Output": ["t2.id", "t2.clus", "t2.val", "t2.data"]+
                     },

Full explain

The total cost of the query is 41534.24.
The HJ node cost is 40532.27, on condition “Hash Cond”: “(t2.id = t1.id)”.
The most expensive node is the second node of the hash join, “Total Cost”: 24346.00.
The second node, the expensive node, is doing a full scan on “Relation Name”: “t1”.
Thus suggest an index on t1.id, the join condition of the hash join.
The t1 node also contains a filter that returns “Plan Rows”: 5000, i.e. 5000 rows of the 1M in the table (we look up total rows in pg_class). The filter is on “Filter”: “((clus) = 10)”, so we suggest an index on the combined id & clus.

Recommend index on t1 (id, clus).
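
As a sketch, that recommendation translates to (index name is my own choice):

CREATE INDEX t1_id_clus_idx ON t1 (id, clus);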

                                                         +
 [                                                        +
   {                                                      +
     "Plan": {                                            +
       "Node Type": "Aggregate",                          +
       "Strategy": "Plain",                               +
       "Partial Mode": "Finalize",                        +
       "Parallel Aware": false,                           +
       "Startup Cost": 41534.23,                          +
       "Total Cost": 41534.24,                            +
       "Plan Rows": 1,                                    +
       "Plan Width": 32,                                  +
       "Plans": [                                         +
         {                                                +
           "Node Type": "Gather",                         +
           "Parent Relationship": "Outer",                +
           "Parallel Aware": false,                       +
           "Startup Cost": 41534.01,                      +
           "Total Cost": 41534.22,                        +
           "Plan Rows": 2,                                +
           "Plan Width": 32,                              +
           "Workers Planned": 2,                          +
           "Single Copy": false,                          +
           "Plans": [                                     +
             {                                            +
               "Node Type": "Aggregate",                  +
               "Strategy": "Plain",                       +
               "Partial Mode": "Partial",                 +
               "Parent Relationship": "Outer",            +
               "Parallel Aware": false,                   +
               "Startup Cost": 40534.01,                  +
               "Total Cost": 40534.02,                    +
               "Plan Rows": 1,                            +
               "Plan Width": 32,                          +
               "Plans": [                                 +
                 {                                        +
                   "Node Type": "Hash Join",              +
                   "Parent Relationship": "Outer",        +
                   "Parallel Aware": false,               +
                   "Join Type": "Inner",                  +
                   "Startup Cost": 24408.50,              +
                   "Total Cost": 40532.27,                +
                   "Plan Rows": 695,                      +
                   "Plan Width": 33,                      +
                   "Inner Unique": false,                 +
                   "Hash Cond": "(t2.id = t1.id)",        +
                   "Plans": [                             +
                     {                                    +
                       "Node Type": "Seq Scan",           +
                       "Parent Relationship": "Outer",    +
                       "Parallel Aware": true,            +
                       "Relation Name": "t2",             +
                       "Alias": "t2",                     +
                       "Startup Cost": 0.00,              +
                       "Total Cost": 15596.00,            +
                       "Plan Rows": 138889,               +
                       "Plan Width": 4,                   +
                       "Filter": "((val + 0) > 1001)"     +
                     },                                   +
                     {                                    +
                       "Node Type": "Hash",               +
                       "Parent Relationship": "Inner",    +
                       "Parallel Aware": false,           +
                       "Startup Cost": 24346.00,          +
                       "Total Cost": 24346.00,            +
                       "Plan Rows": 5000,                 +
                       "Plan Width": 37,                  +
                       "Plans": [                         +
                         {                                +
                           "Node Type": "Seq Scan",       +
                           "Parent Relationship": "Outer",+
                           "Parallel Aware": false,       +
                           "Relation Name": "t1",         +
                           "Alias": "t1",                 +
                           "Startup Cost": 0.00,          +
                           "Total Cost": 24346.00,        +
                           "Plan Rows": 5000,             +
                           "Plan Width": 37,              +
                           "Filter": "((clus) = 10)"      +
                         }                                +
                       ]                                  +
                     }                                    +
                   ]                                      +
                 }                                        +
               ]                                          +
             }                                            +
           ]                                              +
         }                                                +
       ]                                                  +
     }                                                    +
   }                                                      +
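Putting that together, here is a minimal sketch of the checks and the suggested index. The index name and the pg_class lookup are my own; only the table and column names come from the plan above.

-- look up the planner's estimate of total rows in t1
select relname, reltuples::bigint as est_rows
from pg_class
where relname = 't1';

-- suggested composite index on the join column and the filter column
create index t1_id_clus on t1 (id, clus);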

Nested Loops

Similarly, when we find a Nested Loop join where the second node is a Seq Scan, look to see whether an index can be added to support the join. Only look at Nested Loop nodes that account for a significant % of the total query cost.

Most of the cost of the query is in the nested loop:

          "Node Type": "Nested Loop",        +
  ...
          "Total Cost": 1320240.33,          + 

The first node in the NL selects the rows that are then looked up in the second node.
The second node is doing a full table scan:

                     "Node Type": "Seq Scan",       +
                       "Relation Name": "t2",         +
                       "Plan Rows": 1000000,          +

We want to avoid full table scans on the second node of a NL:
every row returned by the first node triggers a full table scan of the second node.
The join is on

                   "Join Filter": "(t1.id = t2.id)",  +

So we suggest an index on t2.id

[                                                    +
   {                                                  +
     "Plan": {                                        +
       "Node Type": "Aggregate",                      +
       "Strategy": "Plain",                           +
       "Partial Mode": "Finalize",                    +
       "Parallel Aware": false,                       +
       "Startup Cost": 1321240.65,                    +
       "Total Cost": 1321240.66,                      +
       "Plan Rows": 1,                                +
       "Plan Width": 32,                              +
       "Plans": [                                     +
         {                                            +
           "Node Type": "Gather",                     +
           "Parent Relationship": "Outer",            +
           "Parallel Aware": false,                   +
           "Startup Cost": 1321240.44,                +
           "Total Cost": 1321240.65,                  +
           "Plan Rows": 2,                            +
           "Plan Width": 32,                          +
           "Workers Planned": 2,                      +
           "Single Copy": false,                      +
           "Plans": [                                 +
             {                                        +
               "Node Type": "Aggregate",              +
               "Strategy": "Plain",                   +
               "Partial Mode": "Partial",             +
               "Parent Relationship": "Outer",        +
               "Parallel Aware": false,               +
               "Startup Cost": 1320240.44,            +
               "Total Cost": 1320240.45,              +
               "Plan Rows": 1,                        +
               "Plan Width": 32,                      +
               "Plans": [                             +
                 {                                    +
                   "Node Type": "Nested Loop",        +
                   "Parent Relationship": "Outer",    +
                   "Parallel Aware": false,           +
                   "Join Type": "Inner",              +
                   "Startup Cost": 0.00,              +
                   "Total Cost": 1320240.33,          +
                   "Plan Rows": 41,                   +
                   "Plan Width": 33,                  +
                   "Inner Unique": false,             +
                   "Join Filter": "(t1.id = t2.id)",  +
                   "Plans": [                         +
                     {                                +
                       "Node Type": "Seq Scan",       +
                       "Parent Relationship": "Outer",+
                       "Parallel Aware": true,        +
                       "Relation Name": "t1",         +
                       "Alias": "t1",                 +
                       "Startup Cost": 0.00,          +
                       "Total Cost": 14554.33,        +
                       "Plan Rows": 41,               +
                       "Plan Width": 37,              +
                       "Filter": "(clus = 10)"        +
                     },                               +
                     {                                +
                       "Node Type": "Seq Scan",       +
                       "Parent Relationship": "Inner",+
                       "Parallel Aware": false,       +
                       "Relation Name": "t2",         +
                       "Alias": "t2",                 +
                       "Startup Cost": 0.00,          +
                       "Total Cost": 19346.00,        +
                       "Plan Rows": 1000000,          +
                       "Plan Width": 4                +
                     }                                +
                   ]                                  +
                 }                                    +
               ]                                      +
             }                                        +
           ]                                          +
         }                                            +
       ]                                              +
     }                                                +
   }                                                  +
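As a sketch, the suggested index for the nested loop case (the index name is made up; only t2.id comes from the plan):

-- index on the inner table's join column so each outer row can probe it
-- instead of triggering a full scan of t2
create index t2_id on t2 (id);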

 

Merge Join

The analysis is similar for a merge join.

-- to create a merge join example 
 set enable_nestloop to 'off';
 SET enable_hashjoin TO off;

 

explain (FORMAT json)
select max (t1.data)
from t1, t2
where t1.id = t2.id
and t1.clus = 10
and t2.val  > 1001
;

We want to turn the merge join into a NL or HJ.
In order to do so efficiently we need an index on the join column of one of the tables.
We want the index on the table that returns the larger number of rows.

t1 returns 5000 rows

                           "Relation Name": "t1",         +
                           "Plan Rows": 5000,             +

t2 returns 138889 rows

                          "Relation Name": "t2",         +
                           "Plan Rows": 138889,           +

and the join condition on t2 is

                   "Merge Cond": "(t2.id = t1.id)",       +

Thus we suggest an index on t2.id.

  [                                                        +
   {                                                      +
     "Plan": {                                            +
       "Node Type": "Aggregate",                          +
       "Strategy": "Plain",                               +
       "Partial Mode": "Finalize",                        +
       "Parallel Aware": false,                           +
       "Startup Cost": 55741.63,                          +
       "Total Cost": 55741.64,                            +
       "Plan Rows": 1,                                    +
       "Plan Width": 32,                                  +
       "Plans": [                                         +
         {                                                +
           "Node Type": "Gather",                         +
           "Parent Relationship": "Outer",                +
           "Parallel Aware": false,                       +
           "Startup Cost": 55741.42,                      +
           "Total Cost": 55741.63,                        +
           "Plan Rows": 2,                                +
           "Plan Width": 32,                              +
           "Workers Planned": 2,                          +
           "Single Copy": false,                          +
           "Plans": [                                     +
             {                                            +
               "Node Type": "Aggregate",                  +
               "Strategy": "Plain",                       +
               "Partial Mode": "Partial",                 +
               "Parent Relationship": "Outer",            +
               "Parallel Aware": false,                   +
               "Startup Cost": 54741.42,                  +
               "Total Cost": 54741.43,                    +
               "Plan Rows": 1,                            +
               "Plan Width": 32,                          +
               "Plans": [                                 +
                 {                                        +
                   "Node Type": "Merge Join",             +
                   "Parent Relationship": "Outer",        +
                   "Parallel Aware": false,               +
                   "Join Type": "Inner",                  +
                   "Startup Cost": 54013.29,              +
                   "Total Cost": 54739.68,                +
                   "Plan Rows": 695,                      +
                   "Plan Width": 33,                      +
                   "Inner Unique": false,                 +
                   "Merge Cond": "(t2.id = t1.id)",       +
                   "Plans": [                             +
                     {                                    +
                       "Node Type": "Sort",               +
                       "Parent Relationship": "Outer",    +
                       "Parallel Aware": false,           +
                       "Startup Cost": 29360.10,          +
                       "Total Cost": 29707.32,            +
                       "Plan Rows": 138889,               +
                       "Plan Width": 4,                   +
                       "Sort Key": ["t2.id"],             +
                       "Plans": [                         +
                         {                                +
                           "Node Type": "Seq Scan",       +
                           "Parent Relationship": "Outer",+
                           "Parallel Aware": true,        +
                           "Relation Name": "t2",         +
                           "Alias": "t2",                 +
                           "Startup Cost": 0.00,          +
                           "Total Cost": 15596.00,        +
                           "Plan Rows": 138889,           +
                           "Plan Width": 4,               +
                           "Filter": "((val ) > 1001)" +
                         }                                +
                       ]                                  +
                     },                                   +
                     {                                    +
                       "Node Type": "Sort",               +
                       "Parent Relationship": "Inner",    +
                       "Parallel Aware": false,           +
                       "Startup Cost": 24653.19,          +
                       "Total Cost": 24665.69,            +
                       "Plan Rows": 5000,                 +
                       "Plan Width": 37,                  +
                       "Sort Key": ["t1.id"],             +
                       "Plans": [                         +
                         {                                +
                           "Node Type": "Seq Scan",       +
                           "Parent Relationship": "Outer",+
                           "Parallel Aware": false,       +
                           "Relation Name": "t1",         +
                           "Alias": "t1",                 +
                           "Startup Cost": 0.00,          +
                           "Total Cost": 24346.00,        +
                           "Plan Rows": 5000,             +
                           "Plan Width": 37,              +
                           "Filter": "((clus + 0) = 10)"  +
                         }                                +
                       ]                                  +
                     }                                    +
                   ]                                      +
                 }                                        +
               ]                                          +
             }                                            +
           ]                                              +
         }                                                +
       ]                                                  +
     }                                                    +
   }                                                      +
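A sketch of acting on that suggestion, assuming the session settings used above to force the merge join are still in effect (the index name is mine; the query is the one from this example):

-- let the planner consider hash and nested loop joins again
set enable_nestloop to on;
set enable_hashjoin to on;

-- index on the join column of the table that returns more rows
create index if not exists t2_id on t2 (id);

-- re-run the explain to see which join method the planner now prefers
explain (format json)
select max (t1.data)
from t1, t2
where t1.id = t2.id
and t1.clus = 10
and t2.val  > 1001
;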

 

3. Mixed cases – filter and join indexing opportunities

If a filter is missing an index, causing a full scan (i.e. "Seq Scan"), and the table being scanned is also used in a join (hash join, merge join, nested loops join), suggest an index on both the filter column and the join column.
In the following plan there is a Seq Scan on t2 with a filter.
The filter returns an estimated 375K rows, or about 1/3 of the table, i.e. on its own not a good candidate for an index.
The Seq Scan is part of a hash join that represents most of the cost of the query. The hash join is on t2.id.

Total query cost: 16973.03
t2 access cost for the hash join: 14554.33
t2 has 1M rows (we have to collect this from the table stats)
The access is on t2.id, "Hash Cond": "(t2.id = t1.id)"

Suggest an index on t2 (id, val) to avoid the full table scan.

 [                                                        +
   {                                                      +
     "Plan": {                                            +
       "Node Type": "Aggregate",                          +
       "Strategy": "Plain",                               +
       "Partial Mode": "Finalize",                        +
       "Parallel Aware": false,                           +
       "Startup Cost": 16973.02,                          +
       "Total Cost": 16973.03,                            +
       "Plan Rows": 1,                                    +
       "Plan Width": 32,                                  +
       "Plans": [                                         +
         {                                                +
           "Node Type": "Gather",                         +
           "Parent Relationship": "Outer",                +
           "Parallel Aware": false,                       +
           "Startup Cost": 16972.81,                      +
           "Total Cost": 16973.02,                        +
           "Plan Rows": 2,                                +
           "Plan Width": 32,                              +
           "Workers Planned": 2,                          +
           "Single Copy": false,                          +
           "Plans": [                                     +
             {                                            +
               "Node Type": "Aggregate",                  +
               "Strategy": "Plain",                       +
               "Partial Mode": "Partial",                 +
               "Parent Relationship": "Outer",            +
               "Parallel Aware": false,                   +
               "Startup Cost": 15972.81,                  +
               "Total Cost": 15972.82,                    +
               "Plan Rows": 1,                            +
               "Plan Width": 32,                          +
               "Plans": [                                 +
                 {                                        +
                   "Node Type": "Hash Join",              +
                   "Parent Relationship": "Outer",        +
                   "Parallel Aware": false,               +
                   "Join Type": "Inner",                  +
                   "Startup Cost": 11.40,                 +
                   "Total Cost": 15972.72,                +
                   "Plan Rows": 37,                       +
                   "Plan Width": 33,                      +
                   "Inner Unique": false,                 +
                   "Hash Cond": "(t2.id = t1.id)",        +
                   "Plans": [                             +
                     {                                    +
                       "Node Type": "Seq Scan",           +
                       "Parent Relationship": "Outer",    +
                       "Parallel Aware": true,            +
                       "Relation Name": "t2",             +
                       "Alias": "t2",                     +
                       "Startup Cost": 0.00,              +
                       "Total Cost": 14554.33,            +
                       "Plan Rows": 375098,               +
                       "Plan Width": 4,                   +
                       "Filter": "(val > 1001)"           +
                     },                                   +
                     {                                    +
                       "Node Type": "Hash",               +
                       "Parent Relationship": "Inner",    +
                       "Parallel Aware": false,           +
                       "Startup Cost": 10.16,             +
                       "Total Cost": 10.16,               +
                       "Plan Rows": 99,                   +
                       "Plan Width": 37,                  +
                       "Plans": [                         +
                         {                                +
                           "Node Type": "Index Scan",     +
                           "Parent Relationship": "Outer",+
                           "Parallel Aware": false,       +
                           "Scan Direction": "Forward",   +
                           "Index Name": "t1_clus",       +
                           "Relation Name": "t1",         +
                           "Alias": "t1",                 +
                           "Startup Cost": 0.42,          +
                           "Total Cost": 10.16,           +
                           "Plan Rows": 99,               +
                           "Plan Width": 37,              +
                           "Index Cond": "(clus = 10)"    +
                         }                                +
                       ]                                  +
                     }                                    +
                   ]                                      +
                 }                                        +
               ]                                          +
             }                                            +
           ]                                              +
         }                                                +
       ]                                                  +
     }                                                    +
   }                                                      +
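A sketch of the combined filter-and-join index (the column order and index name are my choice, not prescribed by the analysis above):

-- covers both the hash join column (id) and the filter column (val);
-- since the query needs no other t2 columns this can also allow an index-only scan
create index t2_id_val on t2 (id, val);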

4. Index Range scans

To see if an index needs more columns to be efficient, it looks like it will require "explain analyze" and not just "explain".

select count(*)
from t1
where t1.clus >= 10 and t1.clus < 15
and t1.val = 1001
;

There is an index on t1.clus and the explain shows it is used,
but we don't know how many rows were retrieved by the index vs. filtered out afterwards by the Filter.

[                                                        +
   {                                                      +
     "Plan": {                                            +
       "Node Type": "Aggregate",                          +
       "Strategy": "Plain",                               +
       "Partial Mode": "Simple",                          +
       "Parallel Aware": false,                           +
       "Startup Cost": 27.14,                             +
       "Total Cost": 27.15,                               +
       "Plan Rows": 1,                                    +
       "Plan Width": 8,                                   +
       "Plans": [                                         +
         {                                                +
           "Node Type": "Index Scan",                     +
           "Parent Relationship": "Outer",                +
           "Parallel Aware": false,                       +
           "Scan Direction": "Forward",                   +
           "Index Name": "t1_clus",                       +
           "Relation Name": "t1",                         +
           "Alias": "t1",                                 +
           "Startup Cost": 0.42,                          +
           "Total Cost": 27.13,                           +
           "Plan Rows": 1,                                +
           "Plan Width": 0,                               +
           "Index Cond": "((clus >= 10) AND (clus < 15))",+
           "Filter": "(val = 1001)"                       +
         }                                                +
       ]                                                  +
     }                                                    +
   }                                                      +

but it doesn't show how many rows are retrieved by the index and how many are then filtered out by the Filter.
Using explain analyze does.
In the following output from "explain analyze",
only 1 row is returned after the index access and filter,
but the filter removes 499 rows:

"Filter": "(val = 1001)",                      +
           "Rows Removed by Filter": 499                  +

Thus we suggest adding val to the existing index

           "Index Name": "t1_clus",                       +

 

[                                                        +
   {                                                      +
     "Plan": {                                            +
       "Node Type": "Aggregate",                          +
       "Strategy": "Plain",                               +
       "Partial Mode": "Simple",                          +
       "Parallel Aware": false,                           +
       "Startup Cost": 27.14,                             +
       "Total Cost": 27.15,                               +
       "Plan Rows": 1,                                    +
       "Plan Width": 8,                                   +
       "Actual Startup Time": 0.279,                      +
       "Actual Total Time": 0.279,                        +
       "Actual Rows": 1,                                  +
       "Actual Loops": 1,                                 +
       "Plans": [                                         +
         {                                                +
           "Node Type": "Index Scan",                     +
           "Parent Relationship": "Outer",                +
           "Parallel Aware": false,                       +
           "Scan Direction": "Forward",                   +
           "Index Name": "t1_clus",                       +
           "Relation Name": "t1",                         +
           "Alias": "t1",                                 +
           "Startup Cost": 0.42,                          +
           "Total Cost": 27.13,                           +
           "Plan Rows": 1,                                +
           "Plan Width": 0,                               +
           "Actual Startup Time": 0.053,                  +
           "Actual Total Time": 0.273,                    +
           "Actual Rows": 1,                              +
           "Actual Loops": 1,                             +
           "Index Cond": "((clus >= 10) AND (clus < 15))",+
           "Rows Removed by Index Recheck": 0,            +
           "Filter": "(val = 1001)",                      +
           "Rows Removed by Filter": 499                  +
         }                                                +
       ]                                                  +
     },                                                   +
     "Planning Time": 0.270,                              +
     "Triggers": [                                        +
     ],                                                   +
     "Execution Time": 0.388                              +
   }                                                      +
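A sketch of acting on that suggestion (the index names are mine; whether to replace the existing index or keep both depends on the workload):

-- replace the single-column index with one that also covers the filter column
drop index t1_clus;
create index t1_clus_val on t1 (clus, val);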

 

Check if the index we suggest already exists

Look up the existence of an index (from https://stackoverflow.com/questions/2204058/list-columns-with-indexes-in-postgresql):

select
    t.relname as table_name,
    i.relname as index_name,
    array_to_string(array_agg(a.attname), ', ') as column_names
from
    pg_class t,
    pg_class i,
    pg_index ix,
    pg_attribute a
where
    t.oid = ix.indrelid
    and i.oid = ix.indexrelid
    and a.attrelid = t.oid
    and a.attnum = ANY(ix.indkey)
    and t.relkind = 'r'
    and t.relname like 'auth%'
group by
    t.relname,
    i.relname
order by
    t.relname,
    i.relname;
    

 table_name | index_name | column_names 
------------+------------+--------------
 authors    | authors_id | id

 

 

Creating data for test cases above

drop table t1;
 
 create table t1 ( 
    id  int,
    clus int,
    val  int,
    data VARCHAR(40)
 );
 
 insert into t1 (
    id, clus , val, data
)
  select
  i,                              -- unique id
  trunc(i/100),                   -- clus: 100 consecutive rows per value (clustered)
  mod(i,10000),                   -- val: 100 scattered rows per value
  left(md5(random()::text), 40)   -- random filler data
from generate_series(1, 1000000) s(i);
  
select count(*) from t1 where clus = 1 ;
100
select count(*) from t1 where val =1 ;
100

create table t2 as select * from t1;  
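The plans in sections 3 and 4 above reference an index named t1_clus on t1. The setup presumably also created it, with something like:

-- single-column index on clus, referenced as "t1_clus" in the plans above
create index t1_clus on t1 (clus);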
  

explain (analyze,verbose,buffers)
select max(t1.data) from  t1, t2
where t1.id = t2.id 
and t1.clus = 1
;

Parameters that affect the explain plan

SET max_parallel_workers_per_gather = 0;
 set enable_mergejoin to 'off';
 set enable_nestloop to 'off';
 SET enable_hashjoin TO off;
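To put the planner back to its defaults after experimenting, the settings can be reset:

reset max_parallel_workers_per_gather;
reset enable_mergejoin;
reset enable_nestloop;
reset enable_hashjoin;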
