The problem is that the site was, and is, never actually "crashing". The database tables were locking up, but the database service itself never failed, and neither did any other service on the server.
You could try Zenoss for monitoring. There's a free Core version, and you can set up a SQL query as a test and alert if the time it takes to run is over a threshold. It will also keep graphs of the response times, etc.
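
If you want to see the idea before setting up a monitoring tool, here is a rough sketch of the same kind of check in plain Python. This is not Zenoss's API or configuration, just a standalone illustration. The function name `timed_query_check`, the `orders` table, and the 2 second threshold are all made up for the example, and it uses an in-memory SQLite database only so that it runs anywhere.

```python
import sqlite3
import time


def timed_query_check(conn, sql, threshold_seconds):
    """Run sql on conn and report whether it took longer than the threshold.

    Hypothetical helper for illustration. Returns (elapsed_seconds, is_slow).
    """
    start = time.perf_counter()
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        cursor.fetchall()  # fetch all rows so the full query time is measured
    finally:
        cursor.close()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed > threshold_seconds


if __name__ == "__main__":
    # Assumption: an in-memory SQLite database stands in for the real one.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.execute("INSERT INTO orders (total) VALUES (9.99)")

    # Assumption: 2.0 seconds is an arbitrary example threshold.
    elapsed, is_slow = timed_query_check(conn, "SELECT COUNT(*) FROM orders", 2.0)
    if is_slow:
        print(f"ALERT: test query took {elapsed:.3f}s")
    else:
        print(f"OK: test query took {elapsed:.3f}s")
```

Against the real database you would swap in that database's own driver and pick a test query that touches the tables that were locking up. You would probably also want a query or connection timeout, so that a locked table produces an alert instead of a check that hangs.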