Archive for June, 2022

DataDog – the marriage of ASH, Query Stats and UI

June 30th, 2022

I’m so happy to be working at Datadog ( and we are hiring so ping me if interested. Looking for Python  developers who are interested in databases. Next up will be work on deep Oracle monitoring).

Datadog has brought together my favorite parts of database performance monitoring

  • Average Active Sessions
  • Query metrics
  • Powerful UI

I’ve long been an evangelist of the Average Active Sessions (AAS) view of database performance  but I’ve wanted the AAS approach married with rich SQL execution metrics like one would see in VividCortex (now part of Solarwinds).

I also love rich powerful graphics like Datadog has.  I recall seeing Datadog presentations years ago at conferences and thinking “wow, I’d like to work on database performance visualizations with their graphical interfaces”. At the time Datadog only had cursory database monitoring capabilities, but starting a little over a year ago they kicked off a project to heavily invest in deep database monitoring.

Now Datadog is bringing it all together – great graphics, active session driven analysis supplemented with rich SQL execution Metrics.

Instance Listing

First lets look at the instances listing page. I love that this page has the little Database Load chart as a column – kind of a spark chart. It gives a quick idea of the differences between instances in the fleet.

Screen Shot 2022-03-23 at 4.29.15 PM


There is also flyover on the load chart widget

Screen Shot 2022-03-23 at 4.31.36 PM

This is really cool – in our staging environment we have lots of instances with load being built, loaded, then scrapped. Now I can see the old instances that have been shut down (those with the bars on the left) and then new instances with load starting ( bars on right)

Screen Shot 2022-03-23 at 4.41.00 PM

If I see an instance of interest I’d just click on that instance and go to the instance detail page:



Instance Details

Here is the instance Detail Page.

The top two graphs give queries and latency showing the averages over the last hour compared to same hour yesterday compared to same hour over the last week. Pretty cool.

Then there is the ubiquitous load chart.

Then what’s really cool is the top SQL has not only the load but also detailed SQL metrics. It also shows how many users are currently executing the top SQL and how long those current executions are




Screen Shot 2022-03-23 at 4.44.04 PM




Query Details

if I see a query of interest I’d just click on that query and go to the query detail page.

One super cool thing is Datadog tracks the execution plans for a statement over time, so I can see if the plan changes.

The top 3 charts compare metrics of this query to the total of the queries on the system. Just clicking on “all other queries” will remove the other query metrics so the chart just shows metrics for the query on the page.

The same query can be run on more than one instance “Host Running this Query” will show other instances running this query.



Screen Shot 2022-03-23 at 4.49.33 PM


Finally I can click on the “metrics” tab in the middle of the screen to get access to a full set of query metrics

Screen Shot 2022-03-23 at 4.56.03 PM


Are some MySQL wait events actually CPU time ?

June 29th, 2022

Sure, wait event times in any database will include some CPU time, though ideally it should be minimal. Waits should be instrumented as close to the off CPU wait event as possible, i.e. if I’m doing an IO, then I want the wait to cover just the read call:

pread( ... );

So my function that sets up the read will do some CPU that could get caught under the cover of the wait itself and the call to pread will use some CPU but most of the time should be waiting for the actual I/O if the I/O is coming from storage.


On MySQL I’ve noticed that average active sessions on CPU are often well below what the host reports as CPU utilization. For example if I have one connection on CPU a 100% of the time on a 2 vCPU host then CPU Utilization should be around 50%.  There might be some other CPU activity by other processes but if the database is the only thing on the host and I only have one connection active then this is a pretty good approximation. One connection on CPU 100% of the time is the same a 1 average active session  (AAS) on CPU or 1 AAS on CPU.

On MySQL systems I’d often see AAS on CPU say at an average of 10% where as the CPU Utilization of the host is 50% or even 100%.

Finally I ran a test today where I had 2 sessions on a 2 vCPU host running a join using the same table 3 times and no index and no predicate filter. That basically makes the SQL have to do N * N * N lookups where N is the number of rows in the the table. In this case the table fit into memory so this should be a pure CPU operation and yes the host OS reported 100% CPU.  but low and behold in my case it came out ZERO AAS on CPU and 2 AAS on the wait ‘wait/io/table/sql/handler’.

On MySQL I am constantly seeing ‘wait/io/table/sql/handler’ in the top waits and happening when the SQL running should just be  reading buffers in memory. Here is an example:

 insert into seed(val) values (1);
 insert into seed(val) select val from seed; /* 2 */
 insert into seed(val) select val from seed; /* 4 */
 insert into seed(val) select val from seed; /* 8 */
 insert into seed(val) select val from seed; /* 16 */
 insert into seed(val) select val from seed; /* 32 */
 insert into seed(val) select val from seed; /* 64 */
 insert into seed(val) select val from seed; /* 128 */
 insert into seed(val) select val from seed; /* 256 */
 insert into seed(val) select val from seed; /* 512 */
 insert into seed(val) select val from seed; /* 1024 */
 insert into seed(val) select val from seed; /* 2048 */
 insert into seed(val) select val from seed; /* 4096 */
 insert into seed(val) select val from seed; /* 8192 */
 insert into seed(val) select val from seed; /* 16384 */
create table seed1 as select * from seed;
create table seed2 as select * from seed;

Then run the following query by a couple of sessions.

select count(*) from seed a, seed1 b, seed2 c where and;


Here is the Performance Insights dashboard in Amazon RDS. CPU Utilizaiton goes from about 5% to 100% and DB Load goes from 0 to 2 with almost all load on ‘wait/io/table/sql/handler’

Screen Shot 2021-11-05 at 11.01.59 PM

Now if we track down the IO waits. we can see it’s wrapped around fetch row without actually seeing if the fetch is doing an I/O or reading from cache:

 { result= index_read_map(buf, key, keypart_map, find_flag); })
 if (!result && m_update_generated_read_fields)
 result= update_generated_read_fields(buf, table, active_index);
 m_update_generated_read_fields= false;
  virtual int index_read_map(uchar * buf, const uchar * key,
                             key_part_map keypart_map,
                             enum ha_rkey_function find_flag)
    uint key_len= calculate_key_len(table, active_index, keypart_map);
    return  index_read(buf, key, key_len, find_flag);
uint calculate_key_len(TABLE *table, uint key,
                       key_part_map keypart_map)
  /* works only with key prefixes */
  assert(((keypart_map + 1) & keypart_map) == 0);

  KEY *key_info= table->key_info + key;
  KEY_PART_INFO *key_part= key_info->key_part;
  KEY_PART_INFO *end_key_part= key_part + actual_key_parts(key_info);
  uint length= 0;

  while (key_part < end_key_part && keypart_map)   {     length+= key_part->store_length;
    keypart_map >>= 1;
  return length;
IO Waits also are instrumented at
IO waits

MYSQL_TABLE_IO_WAIT(PSI_TABLE_FETCH_ROW, MAX_KEY, result, { result= rnd_pos(buf, pos); }) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_FETCH_ROW, active_index, result, { result= index_read_map(buf, key, keypart_map, find_flag); }) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_FETCH_ROW, index, result, { result= index_read_idx_map(buf, index, key, keypart_map, find_flag); }) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_FETCH_ROW, active_index, result, { result= index_next(buf); }) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_FETCH_ROW, active_index, result, { result= index_prev(buf); }) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_FETCH_ROW, active_index, result, { result= index_first(buf); }) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_FETCH_ROW, active_index, result, { result= index_last(buf); }) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_FETCH_ROW, active_index, result, { result= index_next_same(buf, key, keylen); }) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_WRITE_ROW, MAX_KEY, error, { error= write_row(buf); }) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_UPDATE_ROW, active_index, error, { error= update_row(old_data, new_data);}) 
MYSQL_TABLE_IO_WAIT(PSI_TABLE_DELETE_ROW, active_index, error, { error= delete_row(buf);}) 


m_psi_locker = PSI_TABLE_CALL(start_table_io_wait)


#define PSI_TABLE_CALL(M) psi_table_service->M

static struct PSI_table_locker *start_table_lock_wait_noop(
struct PSI_table_locker_state *, struct PSI_table *,
enum PSI_table_lock_operation, ulong, const char *, uint) {
return nullptr;

static PSI_table_service_t psi_table_noop = {
get_table_share_noop, release_table_share_noop,
drop_table_share_noop, open_table_noop,
unbind_table_noop, rebind_table_noop,
close_table_noop, start_table_io_wait_noop,
end_table_io_wait_noop, start_table_lock_wait_noop,
end_table_lock_wait_noop, unlock_table_noop};

PSI_table_service_t *psi_table_service = &psi_table_noop;




JDBC Request: Cannot load JDBC driver class’com.mysql.jdbc.Driver’ solution

June 22nd, 2022


Just  quick post to document a troublesome error when running Jmeter on Mysql

JDBC Request: Cannot load JDBC driver class’com.mysql.jdbc.Driver’ solution


Should be pretty simple, it’s just a question of putting the mysql-connector-java jar file, available at, into the classpath for Jmeter. The easiest way for me is to put this into the jmx file:

      <stringProp name=“TestPlan.user_define_classpath”> /Users/kyle.hailey/jars/mysql-connector-java-8.0.29.jar,/home/ec2-user/jars/mysql-connector-java-8.0.29.jar</stringProp>

Notice there is a space ” ” before the first jar file.  Apparently the space  was enough to cause the error. After removing the space , jmeter connected successfully to mysql.