Removing old SQL records

Lean and mean is the essence of a good data-set. Applying such strategy also encompasses removing old data in order to prevent data-set and associated indexes to grow unnecessarily. This is good practice not only to prevent wasting resources but also to keep healthy query performances against the data-set itself.

The best approach to remove old data is to run a periodical cron job. As deletion of records is a stressful operation for the RDBMS, the suggestion is to identify a quiet period of the day for your specific scenario and spread all such maintenance operations among its duration. It is hence generally not recommended to perform such operations very frequently and spreaded throughout the day (ie. every 5 minutes or every hour).

An example of the script that can be run out of the crontab follows. It deletes data which is older than 10 days (or 864000 seconds). It's tailored on MySQL but underlying concepts apply to other RDBMS aswell.

 #!/bin/bash

 MYSQL="/usr/local/mysql/bin/mysql"

 echo "DELETE FROM acct WHERE stamp_inserted < DATE_SUB(NOW(), INTERVAL 864000 SECOND)" | $MYSQL -u pmacct -p<password> -D pmacct

If an auto-increment index is in use, when rows are deleted is the best moment to check whether it's going to wrap up any soon - and reset if required. An example follows and is based on MySQL and a 4-bytes auto-increment id field but concepts can be easily extended.

 #!/bin/bash

 MAX_VALUE=4000000000
 MYSQL="/usr/local/mysql/bin/mysql"

 MAX_ACCT=`echo "SELECT MAX(id) AS id FROM acct" | $MYSQL -N -u pmacct -p<password> -D pmacct`

 if [ "$MAX_ACCT" -gt "$MAX_VALUE" ]; then
        echo "ALTER TABLE acct AUTO_INCREMENT=1" | $MYSQL -N -u pmacct -p<password> -D pmacct
 fi

A slight variation on the theme is to use periodicly (ie. hourly, daily or weekly) rotating SQL tables, called dynamic tables in pmacct jargon and enabled via the sql_table_schema directive (which points to a valid file containing a CREATE TABLE statement) and a configuration snippet like the one below:

sql_history: 1h
sql_history_roundoff: h
sql_table: acct_%Y%m%d_%H
sql_table_schema: /path/to/acct.schema

In such a scenario removal of old data can be as easy as dropping entire table(s). The process is also extremely lightweight because table(s) and index(es) are totally removed instead of being modified to reflect the reduced data-set.

pmacct Wiki: RemovingOldSQLRecords (last edited 2010-05-03 18:39:42 by paolo)