Honeycomb.io for DB Load and Active Sessions

April 3rd, 2019

Honeycomb.io turns out to be a nice solution for collecting, retrieving and displaying multi-dimensional time series data, i.e. the kind of data you get from sampling.

For example, in the database world we have Active Session History (ASH) which at  it’s core tracks

  1. when – timestamp
  2. who – user
  3. command – what SQL are they running
  4. state – are they runnable on CPU or are they waiting and if waiting what are they waiting for like I/O, Lock, latch, buffer space, etc

Collecting this information is pretty easy to store in a relational database as I did when creating S-ASH (Simulated ASH) and Marcin Przepiorowski built upon over the years since, or even store in flatfiles like I did with W-ASH (web enabled ASH).

On the other hand retrieving the data in a way that can be graphed is challenging. To retrieve and display the data we need to transform the data into number time series.

WIth honeycomb.io we can store, retrieve and display data by various dimensions as time series.

Just sign up at honeycomb, then start to create a dataset. Pick any application it doesn’t matter and when you hit continue for creating a dataset, you will get a writekey. WIth that writekey you can start sending data to honeycomb.io.

I’m on a Mac using Python so I just installed with

pip install libhoney

see: https://docs.honeycomb.io/getting-data-in/python/sdk/

I then connected to a PostgreSQL database in Amazon RDS and looped, running a query to collect the sampled data

select 
       datname,
       pid, 
       usename, 
       application_name, 
       client_addr, 
       COALESCE(client_hostname,'unknown'), 
       COALESCE(wait_event_type,'cpu'), 
       COALESCE(wait_event,'cpu'), 
       query 
from 
       pg_stat_activity 
where 
       state = 'active' ; "

and putting this in a call to honeycomb.io

import libhoney
import psycopg2
import pandas as pd
import time
from time import gmtime, strftime
libhoney.init(writekey="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", dataset="honeycomb-python-example", debug=True)
PGHOST="kylelf2.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com"
PGDATABASE="postgres"
PGUSER="kylelf"
PGPASSWORD="xxxxxxxx"
conn_string = "host="+ PGHOST +" port="+ "5432" +" dbname="+ PGDATABASE +" user=" + PGUSER  +" password="+ PGPASSWORD
conn=psycopg2.connect(conn_string)
print("Connected!")
sql_command = "select datname, pid, usename, application_name, client_addr, COALESCE(client_hostname,'unknown'), COALESCE(wait_event_type,'cpu'), COALESCE(wait_event,'cpu'), query from pg_stat_activity where state = 'active' ; "
print (sql_command)
builder = libhoney.Builder()
try:
        while 1 < 2 :
                mytime=strftime("%Y-%m-%d %H:%M:%S", time.localtime())
                print "- " + mytime + " --------------------- "
                cursor = conn.cursor()
                cursor.execute(sql_command)
                for row in cursor:
                        db=row[0]
                        pid=row[1]
                        un=row[2]
                        app=row[3]
                        ip=row[4]
                        host=row[5]
                        group=row[6]
                        event=row[7]
                        query=row[8].replace('\n', ' ')
                        if group != "cpu" :
                                event= group + ":" + event
                        print '{0:10s} {1:15s} {2:15s} {3:15s} {4:40.40s}'.format(un,ip,group,event,query)
                        ev = builder.new_event()
                        ev.add_field( "hostname", ip)
                        ev.add_field( "user", un)
                        ev.add_field( "event", event)
                        ev.add_field( "sql", query)
                        #ev.created_at = mytime;
                        ev.send()
                time.sleep(1)
                cursor.close()
                conn.close()
                conn=psycopg2.connect(conn_string)

( You might notice the disconnect / connect at the end of the loop. That waste resources but for some reason querying from pg_stat_activity would return the same number of rows if I didn’t disconnect. Disconnecting it worked. For the case of a simple demo I gave up trying to figure out what was going on. This weirdness doesn’t happen for user tables)

On the honeycomb.io dashboard page I can choose “events” under “BREAK DOWN” and “count()” under (CALCULATE PER GROUP) and I get a db load chart by wait event. I can further choose to make the graph a stacked graph:

 

Screen Shot 2019-04-03 at 2.38.47 PM

 

Now there are some limitations that make this less than a full solution. For one, zooming out cause the granularity to change from graphing a point every second to graphing points every 15, 30 or 60 seconds, yet the count will count all the points in those intervals and there is no way to normalize it by the elasped time i.e. for a granularity of 60 seconds it sums up all the points in 60 seconds and graphs that value where what I want is to take that 60 second sum and divide by 60 seconds to get the *Average* active sessions in that interval and not the sum.

But over all a fun easy demo to get started with.

I found honeycomb.io to respond quickly to emails and they have a slack channel where folks were asking and answering questions that was responsive as well.

 


Uncategorized

  1. Trackbacks

  2. No trackbacks yet.
  1. Comments

  2. JB
    | #1

    Average Active Sessions is still not well understood as the fundamental concept that it is. Replace “sessions “ with “processes” or “threads” depending on context, still the same concept: time normalized load that is comparable across host and application boundaries.


one × = 6