Friday, March 2, 2012

Highly volatile PostgreSQL

I've had a few clients run into performance problems with PostgreSQL that build up over time because their databases are highly transactional. Mind you, these customers are each dealing with a single database that has grown large, as opposed to several databases on one server. It had gotten to the point where my colleague Rich Keggans (also from 3X Systems) was scheduling a monthly cluster to be run because of how bad the performance would get.

So in response we tried a little experiment: tweaking the autovacuum settings and adding an automated task. I felt the need to share it, as the results so far are promising; the performance degradation appears to have stopped in the two instances where we've tried it.

For autovacuum we really just made the settings a bit more aggressive. The customer's database we tested this on has a very high level of transactions taking place, so the chance of bloat building up is high. Autovacuum vacuums a table once its dead rows exceed autovacuum_vacuum_threshold plus autovacuum_vacuum_scale_factor times the table's row count (the analyze settings work the same way against rows changed since the last analyze), so lowering those values makes it kick in sooner. We set the following to fight the bloat:

autovacuum = on                         # enable autovacuum subprocess?
                                        # 'on' requires stats_start_collector
                                        # and stats_row_level to also be on
autovacuum_naptime = 1min               # time between autovacuum runs
autovacuum_vacuum_threshold = 250       # min # of tuple updates before
                                        # vacuum
autovacuum_analyze_threshold = 150      # min # of tuple updates before
                                        # analyze
autovacuum_vacuum_scale_factor = 0.2    # fraction of rel size before
                                        # vacuum
autovacuum_analyze_scale_factor = 0.1   # fraction of rel size before
                                        # analyze
autovacuum_freeze_max_age = 200000000   # maximum XID age before forced vacuum
                                        # (change requires restart)
autovacuum_vacuum_cost_delay = -1       # default vacuum cost delay for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_delay
autovacuum_vacuum_cost_limit = -1       # default vacuum cost limit for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_limit
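
With these in place, a quick way to check whether autovacuum is actually keeping up is to look at the last_autovacuum and last_autoanalyze columns in pg_stat_user_tables. This is just a sketch; those columns exist on reasonably recent versions, and the database and user names are the same ones used in the script further down:

 psql -d boxicom -U boxicom -c "select relname, last_autovacuum, last_autoanalyze from pg_stat_user_tables order by relname;"

If the busiest tables keep showing NULL there, the thresholds probably need to come down further.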

So all told we only changed a couple of values, and only by small amounts, to make autovacuum more aggressive. Paired with this I wrote a quick one-line script to run ANALYZE. ANALYZE updates the statistics for each table, which is what lets PostgreSQL's rather effective query planner do its best work. The following one-liner is the script, which I put into /sbin/:

 psql -c "analyze" -q -S -d boxicom -U boxicom > /usr/vault/log/pganalyzetask.log

This was paired with an entry in the crontab using crontab -e:
0   4,20   *   *   *   /sbin/pganalyzetask

In this case I picked 4am and 8pm because the customer has constant activity during normal business hours and again during a certain period at night. Ideally this updates the statistics right after each of those busy periods, so the customer's scripts perform well in the next one.
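
One tweak I may make to the script later: a plain analyze doesn't report what it did, so that log file won't have much in it. A variant worth trying (just a sketch, not what's currently deployed; the shebang is there so cron can execute the file directly) runs analyze verbose and redirects stderr into the log as well, since psql prints the verbose INFO lines on stderr:

 #!/bin/sh
 # Analyze and log which tables were processed; the verbose output arrives on stderr, hence 2>&1
 psql -c "analyze verbose" -q -d boxicom -U boxicom > /usr/vault/log/pganalyzetask.log 2>&1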

Further tweaking is going to be required to get these servers running just right, but it's a start. I'll likely be diving back into the postgresql.conf file to see what other changes we can make (based on the hardware available, of course). Hopefully we'll continue to see performance improvements on these PostgreSQL databases.
