
Archive for April, 2019

Want to change the future of Amazon RDS performance monitoring?

April 10th, 2019

UPDATE: All slots for this study have been filled.

On the other hand, I would still love your feedback. Please send any ideas about what you’d like to see in RDS performance monitoring to me at kylelf at amazon.com

Thanks

Kyle

Have you ever used the Amazon RDS console to manage performance on an RDS database and had ideas on how to make it better? The Amazon UX Research team for AWS is collaborating with me to recruit for an upcoming user research study on Amazon Relational Database Service (RDS) performance monitoring. Sessions will take place between Monday, April 22, 2019 and Friday, April 26, 2019. For this study Amazon is looking to speak with experienced Database Administrators who currently use RDS. The sessions will be conducted remotely over WebEx and will be one hour long. As appreciation for your time and feedback, participants will receive a $100 Amazon.com gift card for completing a session (if your company guidelines permit).

If you are interested, sign up for this confidential study here (note that the study has a limited number of spots, first come, first served):

https://app.acuityscheduling.com/schedule.php?owner=14014140&calendarID=2201515

After you sign up, and if you get a slot, the Amazon UX team will send you a WebEx invite for your session.


Uncategorized

Honeycomb.io for DB Load and Active Sessions

April 3rd, 2019

Honeycomb.io turns out to be a nice solution for collecting, retrieving and displaying multi-dimensional time series data, i.e. the kind of data you get from sampling.

For example, in the database world we have Active Session History (ASH), which at its core tracks:

  1. when – timestamp
  2. who – user
  3. command – what SQL are they running
  4. state – are they runnable on CPU, or are they waiting, and if waiting, what are they waiting for (I/O, lock, latch, buffer space, etc.)

This information is pretty easy to collect and store in a relational database, as I did when creating S-ASH (Simulated ASH), which Marcin Przepiorowski has built upon over the years since, or even in flat files, as I did with W-ASH (web enabled ASH).
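
For instance, here is a minimal sketch (my own illustration, not the S-ASH or W-ASH code) of storing one row per active session per sample in a plain table; the ash_samples table, its columns and the connection string are all hypothetical:

import psycopg2

# placeholder connection string
conn = psycopg2.connect("host=localhost dbname=postgres user=postgres password=secret")
conn.autocommit = True
cur = conn.cursor()

# one row per active session per sample
cur.execute("""
    CREATE TABLE IF NOT EXISTS ash_samples (
        sample_time timestamptz,
        username    text,
        sql_text    text,
        wait_event  text   -- 'cpu' when the session is on CPU rather than waiting
    )""")

sample = ("2019-04-03 14:38:01+00", "kylelf",
          "select count(*) from pgbench_accounts", "IO:DataFileRead")
cur.execute("INSERT INTO ash_samples VALUES (%s, %s, %s, %s)", sample)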

On the other hand, retrieving the data in a way that can be graphed is challenging: to retrieve and display it we need to transform the samples into numeric time series.
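
As a concrete (made-up) example of that transformation, here is the kind of reshaping a load chart needs, done in pandas: count the active sessions per second, broken down by wait event. The sample rows are invented:

import pandas as pd

# hypothetical ASH-style samples: one row per active session per 1-second poll
samples = pd.DataFrame([
    {"ts": "2019-04-03 14:38:01", "user": "kylelf", "event": "cpu"},
    {"ts": "2019-04-03 14:38:01", "user": "app1",   "event": "IO:DataFileRead"},
    {"ts": "2019-04-03 14:38:02", "user": "kylelf", "event": "Lock:transactionid"},
])
samples["ts"] = pd.to_datetime(samples["ts"])

# active sessions per second per wait event -- the numeric time series a DB load chart graphs
load = (samples
        .groupby([pd.Grouper(key="ts", freq="1S"), "event"])
        .size()
        .unstack(fill_value=0))
print(load)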

With honeycomb.io we can store, retrieve and display data by various dimensions as time series.

Just sign up at honeycomb.io, then start to create a dataset. Pick any application (it doesn’t matter), and when you hit continue to create the dataset you will get a writekey. With that writekey you can start sending data to honeycomb.io.

I’m on a Mac using Python, so I just installed the SDK with

pip install libhoney

see: https://docs.honeycomb.io/getting-data-in/python/sdk/
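
Before wiring it up to the database, a quick sanity check (my own sketch, using only the libhoney calls that appear in the full script below, plus libhoney.close() to flush pending events) is to send a single test event and confirm it shows up in the dataset:

import libhoney

# writekey and dataset name come from the dataset creation step above
libhoney.init(writekey="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
              dataset="honeycomb-python-example", debug=True)

ev = libhoney.Builder().new_event()
ev.add_field("event", "test")   # any field name/value will do
ev.send()
libhoney.close()                # flush queued events before the script exits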

I then connected to a PostgreSQL database in Amazon RDS and looped, running a query to collect the sampled data

select 
       datname,
       pid, 
       usename, 
       application_name, 
       client_addr, 
       COALESCE(client_hostname,'unknown'), 
       COALESCE(wait_event_type,'cpu'), 
       COALESCE(wait_event,'cpu'), 
       query 
from 
       pg_stat_activity 
where 
       state = 'active' ;

and put each row into a call to honeycomb.io:

import libhoney
import psycopg2
import time
from time import strftime

# writekey comes from the honeycomb.io dataset creation step above
libhoney.init(writekey="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", dataset="honeycomb-python-example", debug=True)

PGHOST = "kylelf2.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com"
PGDATABASE = "postgres"
PGUSER = "kylelf"
PGPASSWORD = "xxxxxxxx"
conn_string = "host=" + PGHOST + " port=5432 dbname=" + PGDATABASE + " user=" + PGUSER + " password=" + PGPASSWORD

conn = psycopg2.connect(conn_string)
print("Connected!")

sql_command = "select datname, pid, usename, application_name, client_addr, COALESCE(client_hostname,'unknown'), COALESCE(wait_event_type,'cpu'), COALESCE(wait_event,'cpu'), query from pg_stat_activity where state = 'active' ;"
print(sql_command)

builder = libhoney.Builder()
try:
    while True:
        mytime = strftime("%Y-%m-%d %H:%M:%S", time.localtime())
        print("- " + mytime + " --------------------- ")
        cursor = conn.cursor()
        cursor.execute(sql_command)
        for row in cursor:
            db = row[0]
            pid = row[1]
            un = str(row[2])
            app = row[3]
            ip = str(row[4])        # client_addr is NULL for local connections
            host = row[5]
            group = row[6]          # wait_event_type, or 'cpu' if the session is not waiting
            event = row[7]          # wait_event, or 'cpu' if the session is not waiting
            query = row[8].replace('\n', ' ')
            if group != "cpu":
                event = group + ":" + event   # e.g. "IO:DataFileRead", "Lock:transactionid"
            print('{0:10s} {1:15s} {2:15s} {3:15s} {4:40.40s}'.format(un, ip, group, event, query))
            # one honeycomb event per active session per sample
            ev = builder.new_event()
            ev.add_field("hostname", ip)
            ev.add_field("user", un)
            ev.add_field("event", event)
            ev.add_field("sql", query)
            # ev.created_at = mytime
            ev.send()
        cursor.close()
        time.sleep(1)
        # disconnect and reconnect each sample; see the note below
        conn.close()
        conn = psycopg2.connect(conn_string)
except KeyboardInterrupt:
    conn.close()

(You might notice the disconnect/connect at the end of the loop. That wastes resources, but for some reason querying pg_stat_activity kept returning the same rows if I didn’t disconnect; disconnecting worked. For the case of a simple demo I gave up trying to figure out what was going on. This weirdness doesn’t happen for user tables.)
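
My guess at the cause (an assumption, not something I verified for the demo): psycopg2 does not autocommit by default, so every poll runs inside one long-lived transaction, and PostgreSQL keeps returning the pg_stat_activity contents captured the first time they are read within a transaction. Ending the transaction on each iteration should therefore have the same effect as reconnecting, roughly like this:

import psycopg2

conn_string = "host=... dbname=postgres user=kylelf password=..."  # same as in the script above
conn = psycopg2.connect(conn_string)
conn.autocommit = True   # each SELECT now runs in its own transaction...
# ...or keep the default and call conn.commit() after each poll instead,
# which also ends the transaction and refreshes what pg_stat_activity returns.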

On the honeycomb.io dashboard page I can choose “events” under “BREAK DOWN” and “count()” under “CALCULATE PER GROUP” and I get a DB load chart by wait event. I can further choose to make the graph a stacked graph:

 

Screen Shot 2019-04-03 at 2.38.47 PM: honeycomb.io stacked graph of active session counts broken down by wait event

 

Now there are some limitations that make this less than a full solution. For one, zooming out causes the granularity to change from a point every second to a point every 15, 30 or 60 seconds, yet count() still adds up all the samples in each interval and there is no way to normalize by the elapsed time. With a granularity of 60 seconds it sums all the samples in those 60 seconds and graphs that value, whereas what I want is that 60-second sum divided by 60 seconds, i.e. the *average* active sessions in that interval rather than the sum.
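
To spell out the normalization I want (plain arithmetic, not an existing Honeycomb feature): divide the sample count in each bucket by the bucket width in seconds.

# 1-second samples rolled up into a hypothetical 60-second bucket
samples_in_bucket = 180         # e.g. 3 sessions active every second for 60 seconds
bucket_seconds = 60

average_active_sessions = samples_in_bucket / bucket_seconds
print(average_active_sessions)  # 3.0 -- the load level, independent of zoom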

But overall it’s a fun, easy demo to get started with.

I found honeycomb.io quick to respond to emails, and their Slack channel, where folks were asking and answering questions, was responsive as well.

 

Uncategorized