From 97983bfd86be6c0b897655d39c1562f1cb97da3d Mon Sep 17 00:00:00 2001
From: Olof-Joachim Frahm
Date: Sat, 25 Apr 2015 15:20:37 +0200
Subject: [PATCH] Add post about PostgreSQL insights, first post.

---
 postgresql-insights-i.post | 86 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 postgresql-insights-i.post

diff --git a/postgresql-insights-i.post b/postgresql-insights-i.post
new file mode 100644
index 0000000..9bd011e
--- /dev/null
+++ b/postgresql-insights-i.post
@@ -0,0 +1,86 @@
;;;;;
title: PostgreSQL insights I
tags: postgresql
date: 2015-04-25 13:47:23+01:00
format: md
;;;;;

After working a lot more with PostgreSQL, I stumbled upon a few things,
mostly related to query optimisation, that at least for me weren't readily
apparent.  Note that this is all with regard to PostgreSQL 9.3; unless
otherwise noted it should still hold for 9.4 as well.

## Slow `DELETE`s

Obviously, indexes are key to good query performance.  If you're already
using `EXPLAIN`, with or without `ANALYZE`, chances are good you know how
and why your queries perform the way they do.  However, I ran into a
problem with a `DELETE` query where the output of `EXPLAIN` was as
expected, i.e. it showed the optimal plan, yet performance was still
abysmal: a query to delete about 300 rows in bulk, like
`DELETE FROM table WHERE id IN (...);`, took about three minutes(!) to
complete, even though deleting the same rows was quite fast when tested
from a separately running `psql`.

In this scenario the table in question was about a million rows long, had
a primary key index on the `id` column and was referenced from three other
tables, which also had foreign key constraints set up; no other triggers
were running on any of the tables involved.
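For concreteness, a minimal sketch of the setup described above (table and
column names are made up for illustration):

    CREATE TABLE parent (
        id serial PRIMARY KEY,  -- the primary key index on id
        payload text
    );

    -- one of three tables referencing it via a foreign key constraint
    CREATE TABLE child (
        id serial PRIMARY KEY,
        parent_id integer NOT NULL REFERENCES parent (id)
    );

The slow query then deleted rows from `parent`, along the lines of
`DELETE FROM parent WHERE id IN (...);`.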
The `postgres` server process in question wasn't doing anything
interesting: it was basically taking up one core with `semop` calls as
reported by `strace`, though no other I/O was observed.

At that point I finally turned to `#postgresql`.  Since the information I
could share wasn't very helpful, there were no immediate replies, but one
hint finally helped me fix the problem.  It turns out that the (kind of
obvious) solution was to check for missing indexes on the foreign key
columns of the *referencing tables*.  With two out of three such indexes
missing, I ran `CREATE INDEX CONCURRENTLY ...;`, and about five minutes
later the `DELETE` was running in a few milliseconds.

The really frustrating part here is that none of the available statistics
pointed me towards this problem; `EXPLAIN ANALYZE` apparently doesn't
include the runtime of foreign key constraint checks, and they don't show
up anywhere else either.  In hindsight this is something I should have
checked earlier (and from now on I will), but it's also a weakness of the
analyze framework that it doesn't help the developer see the slowdown
involved in a situation like this.

## Common Table Expressions

Refactoring queries to reuse results via `WITH` queries is absolutely
worth it and improved the runtime of some rather large queries by a large
factor.  This is something that can be seen from the query plan, so when
you're using the same expression twice, look into this and see if it helps
both readability (don't repeat yourself) and performance.

## JSON result construction

We need nested JSON output in a few cases.
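Concretely, the kind of shape we're after looks roughly like this (field
names are hypothetical): a JSON object wrapping an aggregated array of
sub-objects, rather than a bare array:

    {
        "id": 1,
        "items": [
            {"name": "foo"},
            {"name": "bar"}
        ]
    }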
This means (in 9.3; there are better functions available in 9.4) that a
combination of `row_to_json(row(json_agg(...)))` was necessary to get
proper nesting of aggregated sub-objects, as well as wrapping the end
result in another object, because the output had to be formatted as a JSON
object (with curly brackets) instead of a JSON array (with square
brackets).

Technicalities aside, the JSON support is pretty good, and since the
initial code was written I've discovered that, because in many cases we
don't actually have multiple results (for `json_agg`), avoiding that
function again significantly improves performance.

That means that instead of something like the following:

    SELECT row_to_json(row(json_agg(...))) FROM ... JOIN ... GROUP BY id;

where the input to `json_agg` is a *single* row from the `JOIN`, we can
write this instead:

    SELECT row_to_json(row(ARRAY[...])) FROM ... JOIN ...;

which, if you examine the output of `EXPLAIN`, means *no sorting*, since
the `GROUP BY` clause is gone.  The convenience of `json_agg` here doesn't
justify the significant slowdown caused by the aggregation function.

Note that the array constructed via `ARRAY[]` is properly converted to
JSON, so the end result is again proper JSON.
--
1.7.10.4