Foreman should have a mechanism for stripping sensitive information out of the database for debugging and reporting purposes

Added by Martin Jackson over 8 years ago. Updated over 8 years ago.

There should be a way to programmatically remove sensitive or identifying information from the Foreman database while keeping it usable for modelling problems such as poor query behavior (for example, N+1 queries).

We have developed a process for doing this, which involves a script and a second machine. The process is destructive to the database, so we wouldn't want to run this script on the production instance.

The outline of the process is as follows:

• Export a database snapshot
• Load the snapshot on a “safe” test machine
• Change the production access restrictions (i.e. change the database password for the foreman user in the DB)
• Perform the following data transformations:
o Delete authentication sources that are not internal (i.e. LDAP integrations)
o Set the admin user password to “changeme”
o Delete all non-admin users
o Replace the value of every non-smartclass parameter with the string “Overridden”
o Set all Foreman global settings to their defaults
o Change all parameter overrides to obfuscated strings that are syntactically valid but carry no real data
o Purge all user sessions
o Purge all audit records
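
As a rough illustration of the transformation step above, the following minimal sketch runs a JSON list of SQL statements in sequence inside one transaction. It is not the attached script: the JSON layout, the table and column names (auth_sources.type, users.admin, parameters.value, settings.value), and the helper names sanitize and build_demo_db are all assumptions, and it runs against a toy in-memory SQLite database rather than a real Foreman database. Resetting the admin password and obfuscating parameter overrides are left out because they depend on details not given here.

```python
import json
import sqlite3

# Hypothetical statement list. Table and column names are assumptions
# about the schema, not the contents of the attached JSON file.
STATEMENTS_JSON = """
[
  "DELETE FROM auth_sources WHERE type <> 'AuthSourceInternal'",
  "DELETE FROM users WHERE NOT admin",
  "UPDATE parameters SET value = 'Overridden'",
  "UPDATE settings SET value = NULL",
  "DELETE FROM sessions",
  "DELETE FROM audits"
]
"""


def sanitize(conn, statements):
    """Run every statement in order; roll back all of them if one fails."""
    with conn:  # commits on success, rolls back on an exception
        for sql in statements:
            conn.execute(sql)


def build_demo_db():
    """Create a toy in-memory database standing in for a loaded snapshot."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE auth_sources (id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT, admin INTEGER);
        CREATE TABLE parameters (id INTEGER PRIMARY KEY, name TEXT, value TEXT);
        CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT);
        CREATE TABLE sessions (id INTEGER PRIMARY KEY, data TEXT);
        CREATE TABLE audits (id INTEGER PRIMARY KEY, action TEXT);
        CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO auth_sources VALUES (1, 'AuthSourceInternal'), (2, 'AuthSourceLdap');
        INSERT INTO users VALUES (1, 'admin', 1), (2, 'alice', 0);
        INSERT INTO parameters VALUES (1, 'ntp_server', '10.0.0.1'), (2, 'api_key', 's3cret');
        INSERT INTO settings VALUES ('foreman_url', 'https://foreman.example.com');
        INSERT INTO sessions VALUES (1, 'session-data');
        INSERT INTO audits VALUES (1, 'update');
        INSERT INTO hosts VALUES (1, 'web01.example.com');
    """)
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    sanitize(demo, json.loads(STATEMENTS_JSON))
    # Host data is untouched; only the sensitive tables were changed.
    print(demo.execute("SELECT login FROM users").fetchall())
    print(demo.execute("SELECT name FROM hosts").fetchall())
```

Running everything in a single transaction means a failed statement leaves the copy unchanged, so the run can be repeated after fixing the statement list.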

The database still contains all of our regular hostnames, facts, and report data.

I’ve validated that the resulting database works with a standalone application instance and can be used, for example, to explore expensive database queries and fact queries.

Files

  • Python3 script to programmatically execute SQL statements (built to run against rh-python34 SCL), 777 Bytes. Martin Jackson, 07/21/2015 10:46 AM
  • sanitize_foreman.json, 2.16 KB: JSON file containing SQL statements to run, in sequence, to "sanitize" foreman database. Martin Jackson, 07/21/2015 10:47 AM

Updated by Dominic Cleal over 8 years ago

  • Category set to Database
Updated by Ohad Levy over 8 years ago

I can think of a few more:

  • audits
  • compute resources
  • report content? (e.g. diff)

