Break report expiration into batches
The problem we are facing with reports not being deleted and slowed down report importing is concurrency. If you want to delete many records in one transaction, the RDBMS must lock whole table and this blocks further inserts or updates. The same problem is on PostgreSQL.
You can indeed use any SQL query which will do the job of getting rid of the records. Inner join is something that is exceptionally problematic on MySQL, but it’s super slow on PostgreSQL as well.
The recommended solution to this long-standing issue is to break the deletion into chunks. Each chunk needs to be just hundreds of thousands of records, then do sleep (dozens of seconds or minutes) so the locked transactions have a chance to complete. You need to find the right balance of deletion of records so you actually keep up with incoming records.
To do that, we really need to use joins because whole chain of joined records must be deleted in a single transaction (we don’t have ON DELETE CASCADE set for those). Or if MySQL supports cascade deletion, it is worth setting that and then it is as easy as deleting batch of records from reports table but this was problematic with Rails AFAIK.
Currently the SQL statements are:
DELETE FROM “logs” WHERE “logs”.“report_id” IN (SELECT “reports”.“id” FROM “reports” WHERE “reports”.“type” IN (‘ConfigReport’) AND (reports.created_at < ‘2018-05-10 13:36:21’)) DELETE FROM “messages” WHERE (id not IN (SELECT DISTINCT message_id FROM “logs”)) DELETE FROM “sources” WHERE (id not IN (SELECT DISTINCT source_id FROM “logs”)) DELETE FROM “reports” WHERE “reports”.“type” IN (‘ConfigReport’) AND (reports.created_at < ‘2018-05-10 13:36:21’)