Wednesday, March 21, 2012

Exchange Maintenance

I apologize for disappearing for a bit; some military training pulled my attention away from this blog. Jumping back into the swing of things, I'd like to make a quick note about Exchange maintenance policy. For those of you using a backup solution that relies on VSS to take file-level backups of the information store, this is of particular importance.

I have found that a lot of our clients run the maintenance cycle daily. The logic is that defragmenting that often keeps the database efficient and thus keeps performance up for end users. The trouble is that any backup solution that scans at the block level will then see a lot of change data: the defrag shuffles existing pages to new locations, so blocks get flagged as changed even though they contain no new data. The consequence is that your backups take longer than they should because you're sending over all of those shuffled blocks. More often is not always better.
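
To make the block-level point concrete, here is a rough, hypothetical shell sketch; the file names and block size are made up, and this is not how the 3X appliance (or any particular product) detects changes internally. It simply hashes two snapshots of a file in fixed-size blocks and counts the positions whose hashes differ. Shuffle the same data to different offsets inside the file and those positions count as "changed" even though nothing new was written.

# Hypothetical illustration only (made-up file names, 1 MB blocks, GNU/Linux tools).
# Hash two snapshots of a file in fixed-size blocks and count positions that differ.
BLOCK=$((1024*1024))
hash_blocks() {
    size=$(stat -c %s "$1")
    blocks=$(( (size + BLOCK - 1) / BLOCK ))
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # Hash block number $i of the file at a fixed offset.
        dd if="$1" bs="$BLOCK" skip="$i" count=1 2>/dev/null | md5sum | awk '{print $1}'
        i=$((i + 1))
    done
}
hash_blocks store_before.edb > /tmp/before.sums
hash_blocks store_after.edb  > /tmp/after.sums
# Every position whose hash changed is a block a block-level scanner would
# re-send, even if the bytes merely moved to a new offset during a defrag.
paste /tmp/before.sums /tmp/after.sums | awk '$1 != $2 { n++ } END { print n+0 " changed blocks" }'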

A good real-world example of this is a customer I had recently. They had a fairly small information store, around 50-60 GB, and were seeing backup jobs (using the 3X backup appliance) take as long as 5-6 hours. Upon investigation we found that each day the client on the Exchange server was transmitting upwards of 4 GB of change data. I worked with the customer to reduce the maintenance schedule to twice a week, overnight, so it didn't conflict with the backup window. The results were staggering: after the first day we saw a 20% improvement, and after a week the job was down to 2 hours, roughly a 60% reduction.

Overall, I suggest running maintenance less often if possible. Smaller deltas should speed up your backup solution and put less strain on your infrastructure while the job runs.

Ryan Koch
3X Systems
ryan.koch@3x.com

Friday, March 2, 2012

Highly Volatile PostgreSQL

I've had a few clients run into performance problems with PostgreSQL that build up over time because their databases are highly transactional. Mind you, these customers are dealing with a single database that has grown large, as opposed to several databases on one server. It had gotten to the point where my colleague Rich Keggans (also from 3X Systems) was scheduling a monthly CLUSTER run because of how bad the performance would become.

So in response we tried a little experiment: tweaking the autovacuum settings and adding an automated task. I felt the need to share it with you, as the results have been promising; performance degradation seems to have stopped in two instances so far.

For autovacuum we really just made the settings a bit more aggressive. In the case where we tested this, the customer's database had a very high rate of transactions, so the chance for bloat to build up is pretty high. We set the following to fight it:

autovacuum = on                         # enable autovacuum subprocess?
                                        # 'on' requires stats_start_collector
                                        # and stats_row_level to also be on
autovacuum_naptime = 1min               # time between autovacuum runs
autovacuum_vacuum_threshold = 250       # min # of tuple updates before
                                        # vacuum
autovacuum_analyze_threshold = 150      # min # of tuple updates before
                                        # analyze
autovacuum_vacuum_scale_factor = 0.2    # fraction of rel size before
                                        # vacuum
autovacuum_analyze_scale_factor = 0.1   # fraction of rel size before
                                        # analyze
autovacuum_freeze_max_age = 200000000   # maximum XID age before forced vacuum
                                        # (change requires restart)
autovacuum_vacuum_cost_delay = -1       # default vacuum cost delay for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_delay
autovacuum_vacuum_cost_limit = -1       # default vacuum cost limit for
                                        # autovacuum, -1 means use
                                        # vacuum_cost_limit
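
A quick note on applying these: on the versions I've worked with, the autovacuum settings above can be picked up with a configuration reload rather than a full restart, with the exception of autovacuum_freeze_max_age, which (as its comment says) requires a restart. A minimal sketch, assuming you can connect as the postgres superuser:

# Reload postgresql.conf so the new autovacuum settings take effect.
# autovacuum_freeze_max_age is the exception; changing it needs a restart.
psql -U postgres -c "SELECT pg_reload_conf();"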

So as you can see, we just nudged a couple of values to make autovacuum more aggressive. Paired with this, I wrote a quick one-line script to run ANALYZE. ANALYZE updates the table statistics, which lets PostgreSQL's rather effective query planner do its best work. The following one-liner is the script, which I put in /sbin/:

 psql -c "analyze" -q -S -d boxicom -U boxicom > /usr/vault/log/pganalyzetask.log
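
One small setup note: cron can only run the script if it's executable, and it's worth running it once by hand to confirm the connection and the log path work. Using the same paths as above:

# Make the script executable and do a manual test run; with -q the log will
# usually be empty, and any connection errors will print straight to the terminal.
chmod +x /sbin/pganalyzetask
/sbin/pganalyzetask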

This was paired with an entry in the crontab using crontab -e:
0   4,20   *   *   *   /sbin/pganalyzetask

In this case I picked 4 AM and 8 PM because the customer has constant activity during normal business hours and again during a certain window at night. Ideally this updates the stats right after each busy period, so their scripts perform well during the next one.
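
If you want to confirm the scheduled ANALYZE (and autovacuum itself) are actually firing, newer PostgreSQL versions track the last run times per table in pg_stat_user_tables. A quick check, assuming the same boxicom database and user as above:

# Show when each table was last analyzed/vacuumed (manually or by autovacuum).
psql -d boxicom -U boxicom -c "SELECT relname, last_analyze, last_autoanalyze, last_autovacuum FROM pg_stat_user_tables ORDER BY relname;"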

Further tweaking will be required to get these servers running just right, but this is a start. I'll likely be diving back into the postgresql.conf file to see what other changes we can make (based on the hardware available, of course). Hopefully we'll continue to see performance improvements on these PostgreSQL databases.