I solved a gnarly performance problem last week and I'd like to share with you what I learnt while doing it.
We received an alert from one of our automated monitoring systems that a particular query being run via an API was taking too long. And by "too long" I mean minutes. Since database engines are optimised to return data to you in milliseconds, this seemed wildly wrong.
The first step when you are checking a query for performance is to use the query analyzer. This is a very powerful tool in any database, and helps you see just what the engine is going to do for any particular bit of SQL. The term you're looking for in the documentation is EXPLAIN.
We thought the query was fairly simple. It followed this general form:
SELECT columns, COUNT(SELECT specific column WHERE conditions), FROM database WHERE conditions ORDER BY ordering criteria;
and gave us not just results with specific conditions, but the count of rows which met those conditions too.
You will notice that I did not use first set of conditions and second set of conditions. This is because the selection criteria were in fact the same. That was the first clue.
A second clue was that of these selection conditions was that we had a range check - is the column's value between A and B? [We actually had two, but having just one was sufficient to show the problem]. This was expressed as
columnA >= smallervalue AND columnA <= largervalue
So you'll see that for each row we had two comparisons to do. I tried making that condition singular, by using just >= smallervalue (I also tried <= largervalue> - no difference in timing) and while that did make the query faster, it did not reflect what we need to be able to do, so that was out.
Back to the query planner. It wasn't happy. It turns out that the most efficient plan it could come up with was using a Bitmap Index Scan. Twice. For the same query - but throwing the results of the first query away before running the whole thing again.
I knew that there had to be a better way (tm) - and there was. Here's the first thing I learned: the SQL standard has a thing called a Common Table Expression or CTE. This is a way of creating a temporary table that is used for just that query.
With this knowledge, I could now re-write the query in a considerably more efficient fashion. (It took me several days to figure this out - I'm no SQL guru!)
WITH better_query AS (SELECT columnname FROM tablename WHERE conditions) SELECT columns, count(better_query.columnname) FROM tablename GROUP BY grouping criteria ORDER BY ordering criteria;
Excellent! ... except that that was giving me all the rows in tablename which matched from better_query rather than the specific rows which met conditions. To solve this niggle I needed a join.
WITH better_query AS (SELECT columnname FROM tablename WHERE conditions) SELECT columns, count(better_query.columnname) FROM tablename TN JOIN better_query BQ on BQ.columnname = TN.columnname GROUP BY grouping criteria ORDER BY ordering criteria;
The query planner was a lot happier. Happier to the tune of a 20x improvement.
There might have been a cheer or two.
So how do we put this into our application codebase?
While a major reason $employer hired me was my Python skills, $employer isn't a Python shop by any stretch of the imagination. We're a Java shop, and I'm cross-pollinating :-).
Part of that process has been learning how we abstract database specificities out of the code - we use an Object-Relational Mapping (aka ORM) to cope with using more than one vendor's database. There are a few options for Java ORMs and the one that we use is jOOQ. This is the first time I've used an ORM (of course I've read about them, but years ago) so this too was something new that I've learned. Heading to the documentation I read a lot, getting my head around the concepts and syntax.
It took me several days of research and hacking (stackoverflow, github, come on down!) to work out the correct syntax.
Here's the gist of what we needed:
public static QueryClass { private final String withClauseName = new String("betterQuery"); private final String withClauseField = new String("fieldName"); .... CommonTableExpression<Record1<Integer>> withClause = DSL.name(withClauseName) .fields(withClauseField) .as(select(columnName) .from(tableName) .where(conditions) ); Field<Integer> results = DSL.field(context.select( count(DSL.field(columnName))) .from(withClause) .limit(1) ).as("results"); /* * append results field to array of column names in the select, then ... */ List<Typename> resultSet = DSLcontext.with(withClause) .select(columns) .from(tableName) .join(withClause) .on(withClause.field(withClauseField, typename.class) .eq(tablename.columnName)) .groupBy(columnName) .orderBy(columnName) .... };
No, it's not simple - but using jOOQ does mean that we can keep our code pretty close to how we would write bare SQL (in this case that would still be rather gnarly), and that is definitely a win.
Acknowledgement
Since this is another SQL-based post, I'd like to acknowledge the massive knowledge and confidence boost I got from reading Julia Evans (aka b0rk)'s SQL zine Become a SELECT star. She has a beautiful style and breaks complex technical concepts down into easily understandable chunks with great illustrations. Reading that zine advanced my database confidence immensely.