Archive

Archive for January, 2020

Index Advisor ? PostgreSQL, Oracle and SQL Server : yes , MySQL : no

January 30th, 2020

How does one check if there are missing indexes in a database?

Is there a programatic way to do it?

Explain Plan
Most blogs on the subject will say look at the explain plan for your query.
Basic concept is to look for full table scans in the explain plan and the look and see if there is a predicate filter (i.e. SQL text of form “where table.field=value”) with a full table scan or a full table scan in a join ( i.e. tableA.field = tableB.field). There are other considerations, things to check, and other places for indexes, but this is a good start.

Explain Plans for whole database
What if you don’t have a particular query and instead want to check the whole running database?
Is there a programatic way to do it? If the answer to finding missing indexes for one query is looking at the explain plan then by extension the answer for the whole database is looking at the explain plans for the whole database.

Execution Plan Cache
On Oracle and SQL Server the databases have a cache of execution plans (the plans actually used by the optimizer as opposed to the plan the optimizer thinks it would use the “explain” plan. Execution plans are from executions and explain plans from the explain command without executing the statement. They should be the same but for many reason the might be different).
Thus on Oracle (V$SQL_PLAN) and SQL Server ( sys.dm_exec_query_plan) the execution plans can be programmatically queried and analyzed.

Explain plan on SQL from SQL cache
What about PostgreSQL and MySQL. Neither have a query plan cache.
Both PostgreSQL (pg_stat_statements) and MySQL (events_statements_history) do have the text for executed queries. With the SQL queries one can programmatically collect the SQL queries and run the “explain” command and get the explain plan.
Another issue is getting the full SQL text. Without the full SQL text, the Explain plan can’t be run. On MySQL the SQL text length is limited by default to 1K but can be adjusted with parameter performance_schema_max_sql_text_length system variable at server startup. Postgres is also limited to 1K by default can be increased with the parameter track_activity_query_size.

Explain Analyze
For select statements (i.e. queries that don’t modify data) one could even consider running “explain analyze” which both PostgreSQL and MySQL have. Explain Analyze which runs the query to give the execution plan as well as the estimated costs and the actual execution statistics which is useful as we will discuss later.

Explain on SQL with bind variables
Unfortunately the Explain command breaks with SQL that have variable parameters. In order to get Explain to work requires substituting real values for variables.

Bind variable capture
Bind variables can be captured on PostgreSQL with extensions like PG_qualstats. PG_qualstats also captures other information that is perfect for finding missing indexes.
Bind variable capture is not possible on MySQL even with slow query log.

So on MySQL without execution plans caches and without bind variables we can’t programatic index advisor on MySQL.

Uncategorized

SQLcmd on my Mac: Seg fault

January 23rd, 2020

trying to connect to SQL Server from my Mac so install sqlcmd:

brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
brew install mssql-tools

then get error

$ sqlcmd -S kylelfsqls.xxxxxx.us-west-2.amazonaws.com,1433 -U kylelf  
Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : SSL Provider: [OpenSSL library could not be loaded, make sure OpenSSL 1.0 or 1.1 is installed].
Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : Client unable to establish connection.
Segmentation fault: 11

Did’t find much info google, so blogging here.
Ran

ln -s /usr/local/opt/openssl@1.1 /usr/local/opt/openssl

and it worked

Uncategorized

MySQL – manually updating table and index optimizer statistics

January 22nd, 2020

Goal was to create a empty table and poplulate it with optimizer table and index stats from a poplulated table to try and get the optimizer to give the same explain plan and costs on the empty table as the populated table. The purpose of this was to be able to modify the empty table by adding indexes quickly and cheaply and still be able to get correct optimizer costs. This worked in the case of table access but unfortunately didn’t work when bring in an index.

Create procedure to populate test table

DROP PROCEDURE IF EXISTS populate_data;
DELIMITER //
CREATE PROCEDURE populate_data(max_val int)
BEGIN
DECLARE int_val INT DEFAULT 0;
test_loop : LOOP
IF (int_val = max_val) THEN
LEAVE test_loop;
END IF;
INSERT INTO source(id) VALUES (FLOOR(RAND()*(1000000+1)+1));
SET int_val = int_val +1;
END LOOP;
END;//
DELIMITER ;

Create source table and populate it.
Create source_clone that will be empty but we want to manually fill in optimizer statistics to mimic source table

CREATE TABLE source(id integer);
call populate_data(1000000);
CREATE TABLE source_clone (id integer) ;

comparing table and index stats on source and source_clone we can see as expected they are different:

select * from mysql.innodb_table_stats where table_name like 'sourc%';
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name   | last_update         | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
| kyle          | source       | 2020-01-22 18:49:48 |   9980 |                   21 |                        0 |
| kyle          | source_clone | 2020-01-22 18:49:44 |      0 |                    1 |                        0 |
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
select * from mysql.innodb_index_stats where table_name like 'sourc%';
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name   | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| kyle          | source       | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | n_diff_pfx01 |       9980 |          20 | DB_ROW_ID                         |
| kyle          | source       | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| kyle          | source       | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | size         |         21 |        NULL | Number of pages in the index      |
| kyle          | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:49:44 | n_diff_pfx01 |          0 |           1 | DB_ROW_ID                         |
| kyle          | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:49:44 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| kyle          | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:49:44 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+

running the explain plan we get different rows returned as expected

--
mysql> explain select id from kyle.source where id > 2056;
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | source | ALL  | NULL          | NULL | NULL    | NULL | 9980 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.05 sec)

mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | source_clone | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.04 sec)

Now let’s copy the stats from the populated source table to the empty source_clone data

-- TABLE  STATS
update mysql.innodb_table_stats AS `dest`,
     (SELECT  n_rows , clustered_index_size 
     FROM mysql.innodb_table_stats 
     WHERE 
         table_name = 'source'
    ) AS `src`
SET dest.n_rows  = src.n_rows , 
    dest.clustered_index_size  = src.clustered_index_size
WHERE 
         table_name = 'source_clone';
-- INDEX STATS
UPDATE mysql.innodb_index_stats AS `dest`,
(SELECT stat_value, stat_name
FROM mysql.innodb_index_stats
WHERE
table_name = 'source'
and index_name = 'GEN_CLUST_INDEX'
) AS `src`
SET dest.stat_value = src.stat_value
WHERE dest.table_name = 'source_clone'
and dest.index_name = 'GEN_CLUST_INDEX'
and dest.stat_name = src.stat_name;

The optimizer table and index statistics are now the same

mysql> select * from mysql.innodb_table_stats where table_name like 'sourc%';
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name   | last_update         | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
| kyle          | source       | 2020-01-22 18:49:48 |   9980 |                   21 |                        0 |
| kyle          | source_clone | 2020-01-22 18:54:31 |   9980 |                   21 |                        0 |
+---------------+--------------+---------------------+--------+----------------------+--------------------------+
--
mysql> select * from mysql.innodb_index_stats where table_name like 'sourc%';
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name   | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| kyle          | source       | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | n_diff_pfx01 |       9980 |          20 | DB_ROW_ID                         |
| kyle          | source       | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| kyle          | source       | GEN_CLUST_INDEX | 2020-01-22 18:49:48 | size         |         21 |        NULL | Number of pages in the index      |
| kyle          | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:54:46 | n_diff_pfx01 |       9980 |           1 | DB_ROW_ID                         |
| kyle          | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:54:46 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| kyle          | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:54:46 | size         |         21 |        NULL | Number of pages in the index      |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+

Running explain we are still getting different rows returned on the source clone

--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | source_clone | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+

But by doing a flush table on the source_clone we not get the same rows returned in the explain as the source

 FLUSH TABLE source_clone;
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | source_clone | ALL  | NULL          | NULL | NULL    | NULL | 9980 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+

Now let’s create an index and try to manually update the optimizer index stats

CREATE INDEX source_id ON source (id);
CREATE INDEX source_clone_id ON source_clone (id);
--
mysql> explain select id from kyle.source where id > 2056;
+----+-------------+--------+-------+---------------+-----------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key       | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+-----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | source | range | source_id     | source_id | 5       | NULL | 4990 | Using where; Using index |
+----+-------------+--------+-------+---------------+-----------+---------+------+------+--------------------------+
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table        | type  | possible_keys   | key             | key_len | ref  | rows | Extra                    |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | source_clone | range | source_clone_id | source_clone_id | 5       | NULL |    1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
--
UPDATE mysql.innodb_index_stats AS `dest`,
(SELECT stat_value, stat_name
FROM mysql.innodb_index_stats
WHERE
table_name = 'source'
and index_name = 'source_id'
) AS `src`
SET dest.stat_value = src.stat_value
WHERE dest.table_name = 'source_clone'
and dest.index_name = 'source_clone_id'
and dest.stat_name = src.stat_name;
--
mysql> select * from mysql.innodb_index_stats where table_name like 'sourc%';
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name   | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| kyle          | source       | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | n_diff_pfx01 |       9980 |          20 | DB_ROW_ID                         |
| kyle          | source       | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| kyle          | source       | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | size         |         21 |        NULL | Number of pages in the index      |
| kyle          | source       | source_id       | 2020-01-22 18:56:46 | n_diff_pfx01 |       9950 |          11 | id                                |
| kyle          | source       | source_id       | 2020-01-22 18:56:46 | n_diff_pfx02 |      10000 |          11 | id,DB_ROW_ID                      |
| kyle          | source       | source_id       | 2020-01-22 18:56:46 | n_leaf_pages |         11 |        NULL | Number of leaf pages in the index |
| kyle          | source       | source_id       | 2020-01-22 18:56:46 | size         |         12 |        NULL | Number of pages in the index      |
| kyle          | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | n_diff_pfx01 |       9980 |           1 | DB_ROW_ID                         |
| kyle          | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| kyle          | source_clone | GEN_CLUST_INDEX | 2020-01-22 18:56:46 | size         |         21 |        NULL | Number of pages in the index      |
| kyle          | source_clone | source_clone_id | 2020-01-22 18:57:38 | n_diff_pfx01 |       9950 |           1 | id                                |
| kyle          | source_clone | source_clone_id | 2020-01-22 18:57:38 | n_diff_pfx02 |      10000 |           1 | id,DB_ROW_ID                      |
| kyle          | source_clone | source_clone_id | 2020-01-22 18:57:38 | n_leaf_pages |         11 |        NULL | Number of leaf pages in the index |
| kyle          | source_clone | source_clone_id | 2020-01-22 18:57:38 | size         |         12 |        NULL | Number of pages in the index      |
+---------------+--------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table        | type  | possible_keys   | key             | key_len | ref  | rows | Extra                    |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | source_clone | range | source_clone_id | source_clone_id | 5       | NULL |    1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
--
 FLUSH TABLE source_clone;
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table        | type  | possible_keys   | key             | key_len | ref  | rows | Extra                    |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | source_clone | range | source_clone_id | source_clone_id | 5       | NULL |    1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
--
Flush tables;
--
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table        | type  | possible_keys   | key             | key_len | ref  | rows | Extra                    |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | source_clone | range | source_clone_id | source_clone_id | 5       | NULL |    1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
-- REBOOT database
mysql> explain select id from kyle.source_clone where id > 2056;
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table        | type  | possible_keys   | key             | key_len | ref  | rows | Extra                    |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | source_clone | range | source_clone_id | source_clone_id | 5       | NULL |    1 | Using where; Using index |
+----+-------------+--------------+-------+-----------------+-----------------+---------+------+------+--------------------------+

The optimizer index stats are the same, we have flushed the tables, and even did a reboot of the database, but the explain plan still shows different results on the source_clone.
The optimizer must be basing costs in this case on things other than mysql.innodb_index_stats and mysql.innodb_table_stats
 

 

Uncategorized