
Author Archive

Example Jmeter workload for Postgres and Oracle

February 27th, 2017

pgload.jmx is a JMX file you can load into JMeter to run a substantial load on a Postgres database. It should work just as well on Oracle if you change the test SQL from "Select 1" to "Select 1 from dual".

Install JMeter on your machine. On my Mac, I did

  • brew install jmeter

You will need the Postgres JDBC driver.


To use this file, save it as pgload.jmx and then open it with JMeter. Change the database URL to your host, port, and database name, and fill in your username and password.

The database URL looks like jdbc:postgresql://mydb.machine.com:5432/postgres


Then just hit the green triangle to start the load.

The load will create a table named authors, a sequence called serial and an index called author_id if these don’t already exist.

It will then run inserts, deletes, updates and selects on this table.
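For reference, here is a rough sketch in SQL of what that workload amounts to. This is my approximation, not the exact SQL inside pgload.jmx; the column names and types follow the authors table used in the later JMeter posts.

-- Objects the load creates if they don't already exist (approximation)
CREATE TABLE authors (id INT, name VARCHAR(20), email VARCHAR(20));
CREATE SEQUENCE serial;
CREATE INDEX author_id ON authors (id);

-- The kinds of statements the load then loops over
INSERT INTO authors (id, name, email) VALUES (nextval('serial'), 'Priya', 'p@gmail.com');
SELECT * FROM authors WHERE id = 2;
UPDATE authors SET email = 'p2@gmail.com' WHERE id = 2;
DELETE FROM authors WHERE id = 2;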

 


9th Circuit Court Ruling 3-0

February 12th, 2017

Little did I know this building that captured my visual attention and imagination so many times walking to work over the last 6 months would play a historic role in the current political climate.

Here is a picture of the US District Court House from recent articles


 

And here are some of my iPhone shots over the last few months with some Instagram filtering mixed in :)




jmeter – Variable Name must not be null in JDBC Request

January 6th, 2017

So Jmeter seems super cool.

I've only used it a little bit, but it does seem a bit touchy about some things (like spaces in input fields), the errors are often less than obvious, and I'm not finding much out there on Google about them.

Today I ran into the error

Variable Name must not be null in JDBC Request



and Googling it didn’t turn up anything.

I’m pretty sure I ran into this same error a few weeks ago when I was first starting with Jmeter, so blogging here to document it.

I was trying something new, running a procedure instead of a regular SQL statement, and I think that threw me off.

The error sounded to me like I needed to define an input or output variable.

I tried both of those, until finally I saw it's not the input or output variable but the name of the JDBC connection pool that was missing.

 


 

 

In googling around, the closest hit I could find was

http://stackoverflow.com/questions/36741446/add-summary-report-results-to-database-in-jmeter

which sounds cool: loading up the results into a table after the JMeter run.

And thanks to Ivan Rancati, who answered this same question of mine 4 weeks ago on user@jmeter.apache.org.

 

Followup

Another problem I had today was "The column index is out of range".

I was doing a

INSERT INTO authors (id,name,email) VALUES(nextval('serial'),'Priya','p@gmail.com');

The JDBC Request worked when it was just

INSERT INTO authors (id,name,email) VALUES(2,'Priya','p@gmail.com');

Turns out I had set "Parameter values" and "Parameter types". When I took them out it worked. What confuses me, and what I'll have to look back into, is that the whole reason I added the parameters was because the nextval wasn't working. I forgot what that original error was.
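My guess, and it is only a guess, is that filling in "Parameter values" and "Parameter types" makes the JDBC Request run the SQL as a prepared statement, which then expects "?" placeholders; with no placeholders there is nothing to bind the parameter to, hence the column index error. Something along these lines (the prepared-statement form is my assumption, not what I actually ran):

-- Literal SQL, no Parameter values/types set (this is the form that worked):
INSERT INTO authors (id, name, email) VALUES (nextval('serial'), 'Priya', 'p@gmail.com');

-- Prepared-statement form (assumption): placeholders in the SQL, with
-- Parameter values = 2,Priya,p@gmail.com and Parameter types = INTEGER,VARCHAR,VARCHAR
INSERT INTO authors (id, name, email) VALUES (?, ?, ?);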


jmeter – getting started

January 5th, 2017


This blog post is just a start at documenting some of my experiences with JMeter. As far as load testing tools go, JMeter looks the most promising to me. It has an active community, supports many different databases and looks quite flexible as far as architecting different workloads goes.

The flexibility of JMeter also makes it hard to use. One can use JMeter for many things besides databases, so the initial setup is a bit oblique and there look to be many paths to similar results. As such, my understanding and methods for doing things will probably change considerably as I start to use JMeter more and more.

I’m installing it on a mac and using RDS instances.

installing jmeter

brew install jmeter

Database driver download: you will need the JDBC driver for your database.

Created a test table

  • CREATE TABLE authors (id INT, name VARCHAR(20), email VARCHAR(20));
  • INSERT INTO authors (id,name,email) VALUES(2,'foo','foo@foo.com');

Start up jmeter

    $ which jmeter 
      /usr/local/bin/jmeter 
    $ jmeter 
      Writing log file to: /Users/kylelf/jmeter.log

This brings up a graphical window.


 

Add the path to your database driver at the bottom of the screen by clicking "Browse …", navigating to the driver file, and selecting it.


 

We are going to create the following (minimum setup for an example)

  1. create test: Thread group named ‘Database Users’
  2. db connection: Config element of type JDBC Connection Configuration
  3. query to run: Sampler of type JDBC Request
  4. results output: Listener of type “View Results Tree”

1. First, add a "Thread Group"

(right click on “Test Plan”)

Define how many connections to make and how many loops to make of the workload


 


The interesting parts here are:

  • “Number of Threads (users)” : can set the number of database connections
  • “Loop Count ” : can set the number of iterations of the test query

2. Add a Config Element of type JDBC Connection Configuration

 

Define what database to connect to


For Oracle, make sure to change "Select 1" to "Select 1 from dual" or you'll get a non-obvious error.
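For reference, the two validation queries side by side (the Postgres form is the one from the pgload.jmx post above):

-- Postgres: a bare SELECT with no FROM clause is valid
Select 1

-- Oracle: every SELECT needs a FROM clause, so use the dummy table DUAL
Select 1 from dual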

Name the pool. For example I call mine “orapool”

and fill out all the connection information

  • Database machine, port and SID of form: jdbc:oracle:thin:@yourmachine:1521:ORCL
  • JDBC Driver Class: oracle.jdbc.OracleDriver
  • Username
  • Password


3. Sampler of type JDBC Request

Define a SQL statement to run



 

Make sure to include the name of the JDBC connection pool created above. In my case it's called "orapool".

add a SQL statement to run


4. Listener of type “View Results Tree”

create a widget to see the output


With the final setup in place, run your load and look at the output.

Now you hit the run button, the green triangle.

Then click on “View Results Tree” to see the output.


I clicked on “View Results Tree” and then clicked on “JDBC Request” in red.

Then I see some output. I chose "Response data" because it's a bit more succinct, and there I can see the error. In this case there is an extra space " " at the end of "oracle.jdbc.OracleDriver ". JMeter is sensitive to spaces; I've gotten a lot of errors because of stray spaces in fields such as variable names.

Correcting that, it runs.


 

All the setup might sound like a bit of a pain but once it’s set up, it’s easy to click through and make modifications.

All the setup is stored in a text .jmx file, and if you are brave you can edit it directly there.

Here is the above example .jmx file on github.

Look for “my” and replace

  • myinstance.rds.amazonaws.com
  • myuser
  • mypassword

The above example is more or less pointless – sort of a “Hello World”.

From here though you can increase the number of threads, increase the number of loops, add more SQL statements.

JMeter allows a lot of customization: you can add .csv files for input values, capture output values into variables and use them as input values, have different types of loops with different users running concurrently, etc.

More to come.

Christian Antognini gave a presentation at Oaktable World SF in Sept 2016. He was gracious enough to send along his functionally rich .jmx file and I’ll blog on that soon.

 


Graphics for SQL Optimization

January 4th, 2017

Dan Tow, in his book SQL Tuning, lays out a simple method of tuning SQL queries. The method is

  • Draw a diagram of each table in the query with children above parents
  • Draw join lines for each join (many-to-many, one-to-many)
  • Mark each table that has a predicate filter and calculate how much of the table is filtered out

Then, to find a strong candidate for the optimal execution path:

  1. Start at the table with the strongest predicate filter (the filter that returns the smallest percentage of the table)
  2. Join down to children (if there are multiple children, join to the child with the strongest predicate filter)
  3. If you can't join down to children, join up to the parent

The basics are pretty simple and powerful. Of course there are many cases that get more complex and Dan goes into these complex cases in his book.

What about indexes? Well, the method will point out the joins that should happen, and if those joins are missing indexes, then it indicates that indexes should be created.

What about join type? I generally leave this to the optimizer. The join type can be important but generally order of joins and indexes are more important. I look at join type as the final optimization.

Let’s take an example query:

SELECT COUNT (*)
FROM   a,
       b,
       c
WHERE
       b.val2 = 100 AND
       a.val1 = b.id AND
       b.val1 = c.id;
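
If you want to play along, here is a minimal schema sketch that matches the query. Only the column names and the two indexes come from the post; the datatypes and index names are my assumptions.

CREATE TABLE a (val1 NUMBER);
CREATE TABLE b (id NUMBER, val1 NUMBER, val2 NUMBER);
CREATE TABLE c (id NUMBER);

-- the indexes mentioned below
CREATE INDEX b_id_idx ON b (id);
CREATE INDEX c_id_idx ON c (id);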

There are indexes on b.id and c.id. Diagramming the query in DB Optimizer gives:

[Figure: VST diagram of the query in DB Optimizer]

The red lines with crow's feet mean that, as far as the definitions go, the relations could be many-to-many.

The question is: what is the optimal execution path for this query?

One of  the best execution plans is to

  1. start at the most selective filter table
  2. join to children  if possible
  3. else join to parent

There is one filter in the diagram, represented by the green F on table B. Table B has a filter criterion in the query, "b.val2=100".

OK, table B is where we start the query. Now where do we go from B? Which table is the parent and which is the child? It's not defined in the constraints or indexes on these tables, so it's hard for us to know. Guess what? It's also hard for Oracle to figure out. Well, what does Oracle decide to do? This is where the cool part of DB Optimizer comes in.

The super cool thing with DB Optimizer is we can overlay the diagram with the actual execution path (I think this is awesome)

[Figure: the diagram overlaid with Oracle's actual execution path]

From the diagram we can see Oracle starts with B and joins to A. The result of this is joined to C. Is this the optimal path?

Well, let’s keep the same indexes and just add some constraints:

alter table c add constraint c_pk_con unique (id);
alter table b add constraint b_pk_con unique (id);

Now let’s diagram the query with DB Optimizer:

[Figure: VST diagram after adding the unique constraints]

We can now see which table is the parent and which is the child, so we can determine the optimal query path: start at B, the only table with a filter, join to the child C, then up to the parent A. Now what does Oracle do with the added constraint info?

[Figure: execution path overlay after adding the constraints]

Guess what? The execution plan has changed with the addition of the constraints, and Oracle's execution path goes from a suboptimal plan to the optimal path. The moral of the story is to define constraint information because it helps the optimizer, but what I wanted to show here is the execution plan overlay on the diagram, which makes comparing execution plans much easier. Putting the queries' VST diagrams side by side along with the overlay of the execution path, we can clearly and quickly see the differences:

[Figures: the two VST diagrams with execution path overlays, side by side]

I plan to blog more about this awesome feature. It’s really cool.

Here is an example from an article by Jonathan Lewis

http://www.simple-talk.com/sql/performance/designing-efficient-sql-a-visual-approach/

The query Jonathan discusses is

SELECT order_line_data
FROM
         customers cus
         INNER JOIN
         orders ord
         ON ord.id_customer = cus.id
         INNER JOIN
         order_lines orl
         ON orl.id_order = ord.id
         INNER JOIN
         products prd1
         ON prd1.id = orl.id_product
         INNER JOIN
         suppliers sup1
         ON sup1.id = prd1.id_supplier
   WHERE
         cus.location = 'LONDON' AND
         ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND
         sup1.location = 'LEEDS' AND
    EXISTS (SELECT NULL
            FROM
                 alternatives alt
                 INNER JOIN
                 products prd2
                 ON prd2.id = alt.id_product_sub
                 INNER JOIN
                 suppliers sup2
                 ON sup2.id = prd2.id_supplier
           WHERE
                  alt.id_product = prd1.id AND
                  sup2.location != 'LEEDS')

which diagrammed looks like

[Figure: VST diagram of Jonathan Lewis's query]

There are multiple filters, so we need to know which one is the most selective in order to know where to start. We ask DB Optimizer to display the statistics as well (blue below a table is the filter percentage, green above a table is the number of rows in the table, and the numbers on join lines are the rows returned by a join of just those two tables).

[Figure: VST diagram with filter percentages and row counts displayed]

Now that we can determine a candidate for best optimization path, does Oracle take it?

[Figure: execution path overlay on the diagram with statistics]

Can you find the optimization error?

Dark green is where execution starts. There are two starts: one for the main query body and one for the subquery.

The red is where query execution ends.

PS a big part of this work is by the lead developer Matt Vegh. Many thanks to Matt for this awesome work.

 

PPS another example from Karl Arao

[Figure: VST diagram with execution path overlay, from Karl Arao]

The dark green nodes are starts, so there are 4 separate starts. We can see how the result sets from each start are joined with each successive table join set. The red is the final step.


Apple Upset – upgrading to iPhone 7

December 22nd, 2016

Upgrading is always stressful – be it a computer, an Oracle database or an iPhone. There’s always a good chance for lost data and lost time dealing with complications.

So yesterday I picked up a new iPhone 7 from Verizon. The pickup was seamless. I had signed up for an upgrade program when I got the iPhone 6, so now I just walked in, gave them my old iPhone 6 and they gave me a new iPhone 7. It's a bit scary giving up my old phone before restoring to my new phone, but I had a backup AND I asked Verizon to please not wipe my iPhone 6 for 24 hours in case there were upgrade errors. They normally wipe the phone immediately.

The day was off to a good start. It only took about 10 minutes to get the phone, and I had taken a full backup of my iPhone 6 the day before, so I thought I'd plug in, restore the backup and, wow, that would be easy.

I get back to my office, luckily just a couple blocks away. Plug it in, try to restore the backup and it asked me for a password. I’m like ‘rrr’! The day before when I had taken the backup, I saw that the “encrypt” checkbox was filled and thought about taking it off, but then thought, “well it’s probably more prudent to leave it on”. Of course the backup didn’t ask me to verify my password. It just took the backup.


Now in order to use the backup, I had to know the password. The day before when it took the backup, I was thinking my computer had it cached, or else why didn’t it ask for the password when I took the backup?

So now I had to figure out what the password was. I tried all the conceivable passwords I could think of. My namespace of passwords is limited to about 3 core, 10 common and 20 rare passwords. I tried them all. It wasn't my Apple ID password, which it should have been. It wasn't my iPhone 6 number code. It wasn't any of the 20 passwords I've used over the past several years.

OK, fine. I’ll go back get my iPhone 6 and take a new un-encrypted backup!

I go back to the store. To their credit they gave me my old phone in a couple of minutes and it was back to the office.

I go to the backup interface and try to uncheck "Encrypt iPhone backup".

This is  where it gets fun.

*** No ***

you can’t uncheck the Encrypt!  What has been seen can never be unseen.

To uncheck the box you need the password. If I knew the password I wouldn’t be unchecking the box.

So now I  *** CAN’T *** take a backup (at least one that is usable)! Thanks a lot Apple. Are you serious?

It’s my  iPhone, my computer, I’m using everything on both and Apple won’t let me take a backup of the iPhone on the computer!

From Apple:

[screenshot of Apple's documentation on encrypted backups]

Unbelievable. There is no way for me to backup my iPhone 6 so I can upgrade to my iPhone 7.

Are they serious ??

The workaround is to back up to iCloud, which doesn't need the password. Does that make sense? Can we say slow, insecure and inefficient?

Backing up my 64G  iPhone 6 to the iCloud is a recipe for a huge waste of time.

So I go through and blow everything away on my iPhone (so much for a good backup) until I'm down to about 6G and then back up to iCloud. Of course, I have to buy more space on iCloud to do this. rrrr

I bought more space, I backed up the iPhone, I went back to Verizon, switched phones again, back to the office, restored from iCloud and it worked. Of course I’m missing all the photos, music, books and apps I rarely used. Now time to put that stuff on back by hand.

Some parts of the upgrade are magic, but this part blows me away. Why? Why so much pain? Just to force me to buy some iCloud space for a day?

What a frustrating waste of time from a company that prides itself in easy powerful user interfaces.

First, why can't you turn off encryption of backups once it has been chosen?

Second, why not use the Apple ID password?

Third, why not alert the user every time they take an encrypted backup, asking for the password? There's no point in taking an encrypted backup if you don't know what the password is.

 


Why does my full table scan take 10x longer today ?!

December 13th, 2016

Every so often a DSS query that usually takes 10 minutes ends up taking over an hour. (Or one that takes an hour never seems to finish.)

Why would this happen?

When investigating the DSS query, perhaps with wait event tracing, one finds that the query, which is doing full table scans and should be doing large multi-block reads and waiting for "db file scattered read", is instead waiting for single-block reads, i.e. "db file sequential read". What the heck is going on?

Single-block sequential reads during a full table scan query that should be doing scattered reads are a classic sign of reading rollback, and reading rollback can make a full table scan that normally takes minutes take hours.

What can happen, especially after overnight jobs, is that if an overnight job fails to finish before the DSS query is run, and if that overnight job does massive updates without committing until the end, then the DSS query will have to roll back any changes made by those updates to the tables the DSS query is accessing.

How do we quickly identify if this is our issue?

ASH is good at identifying it. On the other hand, it's often impractical to whip up an ASH query from scratch, and that's where ashmasters on GitHub comes in. This ASH query and others are on GitHub under ashmasters.

see https://github.com/khailey/ashmasters

For this case specifically see:

https://github.com/khailey/ashmasters/blob/master/ash_io_top_obj_advanced.sql

Here is the output (in a slightly different format than in the GitHub repository) of a query I used in my Oracle Performance classes:

AAS SQL_ID           %  OBJ              TABLESPACE
----- -------------  ---  ---------------  ----------
  .18 0yas01u2p9ch4    6  ITEM_PRODUCT_IX  SOEINDEX
                       6  ORDER_ITEMS_UK   SOEINDEX
                      88  ITEM_ORDER_IX    SOEINDEX
  .32 6v6gm0fd1rgrz    6  MY_BIG_Table     SOEDATA
                      94  UNDO             UNDOTBS1

I.e. 94% of the second SQL_ID's I/O was coming from UNDO. The reads will be single-block reads and will tremendously slow down the full table scans.
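The script at the GitHub link above is the one to use; as a rough sketch of the idea (my simplification, not the ashmasters query), you can group recent ASH I/O samples by SQL_ID, object and tablespace, which makes undo reads stand out because the object lookup comes back empty and the tablespace is the undo tablespace:

-- Rough sketch, not the ashmasters script: where is each SQL_ID's read I/O going?
select ash.sql_id,
       nvl(o.object_name, 'UNDO/unknown') as obj,
       f.tablespace_name                  as tablespace,
       count(*)                           as samples
from   v$active_session_history ash
       left join dba_objects    o on o.object_id = ash.current_obj#
       left join dba_data_files f on f.file_id   = ash.current_file#
where  ash.event in ('db file sequential read', 'db file scattered read')
and    ash.sample_time > sysdate - 1/24   -- last hour
group by ash.sql_id, nvl(o.object_name, 'UNDO/unknown'), f.tablespace_name
order by count(*) desc;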


Amazon announces Performance Insights

December 2nd, 2016

Excited to see the announcement of the Amazon RDS Performance Insights feature for database performance monitoring and tuning.

Having met the team for this project, I can say from my personal viewpoint that the importance and future success of this feature is clear as day to me. The team is awesomely sharp, the architecture is super impressive, and this is by far the most exciting performance monitoring and feedback system I've been involved with, surpassing the work I've done on Oracle's performance monitoring and tuning system, Embarcadero's DB Optimizer, and Quest's Foglight and Spotlight. Not only does the feature provide its own dashboard, but it will also provide an API to power the dashboards that already exist in the industry. I expect to see partners leveraging the API to provide new insights in their already existing database performance monitors.

Below are a couple of snippets that showed up on the web this week, as well as a couple of photos of Jeremiah Wilton, whom I personally consider the catalyst and moving force behind this feature, giving demonstrations of it.

PS the team is also hiring! If you are interested, check out this post on ycombinator.

On Amazon Blog

https://aws.amazon.com/blogs/aws/amazon-aurora-update-postgresql-compatibility/


On Brent Ozar’s  blog

https://www.brentozar.com/archive/2016/12/7-things-learned-aurora-aws-reinvent-2016/

 Amazon’s throwing in free performance monitoring tools. The new Performance Insights tool shows wait stats, top resource-intensive queries, lock detection, execution plans, and 35 days of data retention. It’s designed by a former Oracle (there we go again) performance tuning consultant to match his workflow. You don’t have to install an agent, configure a repository, or keep the thing running. It’s in preview for Aurora PostgreSQL today, and will be rolled out to all RDS databases (including SQL Server) in 2017.

 


 

Live Demo at the re:Invent demo grounds


Video of presentation at re:Invent

 

Starting at 39:00

 


SQL*Plus on Mac

December 1st, 2016

I would think installing SQL*Plus on the Mac would be point, click, download, point, click, install, bam, it works.

Nah

It did install fairly straightforwardly on my old Mac. Got a new Mac and no dice.

Tried installing it myself, guessing at the downloads. First of all, why isn't there just one download?

Downloaded the Instant Client and the Instant Client SQL*Plus package, which turns out to be the correct combination, but no dice. Still got errors.

Got errors. Gave up.

Came back again to look at it yesterday and followed this:

https://tomeuwork.wordpress.com/2014/05/12/how-to-install-oracle-sqlplus-and-oracle-client-in-mac-os/

worked like a charm.

Then I ran a shell script that uses SQL*Plus, oramon.sh, and got the error

dyld: Library not loaded: /ade/dosulliv_sqlplus_mac/oracle/sqlplus/lib/libsqlplus.dylib
Referenced from: /Applications/oracle/product/instantclient_64/11.2.0.4.0/bin/sqlplus
Reason: image not found

so I set my environment variable

export DYLD_LIBRARY_PATH=$ORACLE_HOME/lib

Let’s check the environment variable:

$ env | grep LIB
$

Huh?! Where is my environment variable?

Apparently you can't set DYLD_LIBRARY_PATH on the Mac without overriding some security settings, which didn't sound attractive.

I googled around and found

https://blog.caseylucas.com/2013/03/03/oracle-sqlplus-and-instant-client-on-mac-osx-without-dyld_library_path/

which didn’t work for me. Then found

https://blogs.oracle.com/taylor22/entry/sqlplus_and_dyld_library_path

which suggests setting the library path inside an alias for SQL*Plus. Cool!

alias sqlplus="DYLD_LIBRARY_PATH=/Applications/oracle/product/instantclient_64/11.2.0.4.0/lib sqlplus"

and I put that into my ~/.bashrc and it worked!


Started at Amazon! … want to join me?

August 29th, 2016

(Disclaimer: any opinions expressed here are fully my own and not representative of my employer)


photo by alvaroprieto  (cc 2.0)

Super excited to be working at Amazon on my passion, which is performance data visualization and database monitoring. Suffice it to say this is the most excited I've been about work in my career, and I've had ample opportunity to work on database performance in the past: at Oracle (where I helped design the performance pages and designed the Top Activity page), at Quest (now Dell) on Spotlight, on my own free tools (ASHMon, S-ASH, W-ASH, Oramon, etc.) and at Embarcadero, where our team produced DB Optimizer, which extended sampling and average active sessions to SQL Server, DB2 and Sybase (not to mention Visual SQL Tuning). The work here at Amazon looks to largely surpass all my previous work.

More news to come as I settle in.

In the meantime Amazon is looking to hire! We are looking for product managers, developers, service people, etc. Mainly senior people with a good track record. Please feel free to contact me if (and only if):

  • you are senior in your career  and/or
  • we have personally worked together  and/or
  • you have done something innovative already in your career (a free tool, a new design, etc).

Please refrain from contacting me about junior positions.  If you are interested in junior positions please look at the Amazon jobs listed on their website. Amazon is hiring aggressively!

These positions are almost all out of Seattle. There is some chance of working in Vancouver and Palo Alto though it would be recommended to work out of Seattle.

One specific position on my group's team is a Data Engineer to work on reporting. Here is the job listing from the Amazon site:

 

External job description:

Amazon Relational Database Service (Amazon RDS) is an industry leading web service that makes it easy to set up, operate, and scale a relational database in the cloud using any of the leading database engines – MySQL, MariaDB, PostgreSQL, SQL Server and Oracle, as well as Amazon's own MySQL-compatible database engine, Aurora. We are looking for a seasoned and talented data engineer to join the team in our Seattle Headquarters. More information on Amazon RDS is available at http://aws.amazon.com/rds.

The data engineer must be passionate about data and the insights that large amounts of data can provide, and have the ability to contribute major novel innovations for our team. The role will focus on working with a team of product and program managers, engineering leaders and business leaders to build pipelines and data analysis tools to help the organization run its business better. The role will focus on business insights, deep data and trend analysis, operational monitoring and metrics as well as new ideas we haven't had yet (but you'll help us have!). The ideal candidate will possess both a data engineering background and a strong business acumen that enables him/her to think strategically and add value to help us improve the RDS customer experience. He/she will experience a wide range of problem solving situations, strategic to real-time, requiring extensive use of data collection and analysis techniques such as data mining and machine learning. In addition, the data engineering role will act as a foundation for the business intelligence team and be forward facing to all levels within the organization.

· Develop and improve the current data architecture for RDS
· Drive insights into how our customers use RDS, how successful they are, where our revenue trends are going up or down, how we are helping customers have a remarkable experience, etc.
· Improve upon the data ingestion models, ETLs, and alarming to maintain data integrity and data availability.
· Keep up to date with advances in big data technologies and run pilots to design the data architecture to scale with the increased data sets of RDS.
· Partner with BAs across teams such as product management, operations, sales, marketing and engineering to build and verify hypotheses.
· Manage and report via dashboards and papers the results of daily, weekly, and monthly reporting

 

Basic Qualifications
· Bachelor’s Degree in Computer Science or a related technical field.
· 6+ years of experience developing data management systems, tools and architectures using SQL, databases, Redshift and/or other distributed computing systems.
· Familiarity with new advances in the data engineering space such as EMR and NoSQL technologies like Dynamo DB.
· Experience designing and operating very large Data Warehouses.
· Demonstrated strong data modelling skills in areas such as data mining and machine learning.
· Proficient in Oracle, Linux, and programming languages such as R, Python, Ruby or Java.
· Skilled in presenting findings, metrics and business information to a broad audience consisting of multiple disciplines and all levels of the organization.
· Track record for quickly learning new technologies.
· Solid experience in at least one business intelligence reporting tool, e.g. Tableau.
· An ability to work in a fast-paced environment where continuous innovation is occurring and ambiguity is the norm.

 

Preferred Qualifications
· Master’s degree in Information Systems or a related field.
· Capable of investigating, familiarizing and mastering new datasets quickly.
· Knowledge of a programming or scripting language (R, Python, Ruby, or JavaScript).
· Experience with MPP databases such as Greenplum, Vertica, or Redshift
· Experience with Java and Map Reduce frameworks such as Hive/Hadoop.
· 1+ years of experience managing an Analytic or Data Engineering team.
· Strong organizational and multitasking skills with ability to balance competing priorities.

 
