Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when migrating existing fields into pgcrypto fields #246

Closed
fisle opened this issue Jul 31, 2020 · 7 comments
Closed

Error when migrating existing fields into pgcrypto fields #246

fisle opened this issue Jul 31, 2020 · 7 comments
Labels

Comments

@fisle
Copy link

fisle commented Jul 31, 2020

Hey,

I am upgrading my Django application from 2.2.9 to Django 3.x and ran into errors on migrations that are converting normal fields (like CharField) into pgcrypto fields (CharPGPSymmetricKeyField in this case).

I'm not sure if this is a bug in Django, psycopg2 or django-pgcrypto-fields. I'm hoping you could assist me.
These migrations will work when running Django 2.2.9 but when I update Django package, they won't work any more.

I've managed to reproduce this in a fresh Django app.

I'm using Ubuntu 19.10, running Python 3.8.5 in a Docker container with the following packages:
django==3.0 (or any 3.x)
django-pgcrypto-fields==2.5.1
psycopg2==2.8.5

PostgreSQL is 11.1 and pgcrypto extension is installed.

Steps to reproduce:

  1. Create a model with normal fields:
class CryptoTest(models.Model):
    name = fields.CharField(blank=True, null=True, max_length=255)
  1. Create migration: python manage.py makemigrations

  2. Modify model:

class CryptoTest(models.Model):
    name = fields.CharPGPSymmetricKeyField(blank=True, null=True, max_length=255)
  1. Create migration: python manage.py makemigrations
  2. Run migrations: python manage.py migrate

Traceback

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.DatatypeMismatch: column "name" cannot be cast automatically to type bytea
HINT:  You might need to specify "USING name::bytea".


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    main()
  File "./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/commands/test.py", line 23, in run_from_argv
    super().run_from_argv(argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 328, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 369, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/commands/test.py", line 53, in handle
    failures = test_runner.run_tests(test_labels)
  File "/usr/local/lib/python3.8/site-packages/django/test/runner.py", line 684, in run_tests
    old_config = self.setup_databases(aliases=databases)
  File "/usr/local/lib/python3.8/site-packages/django/test/runner.py", line 604, in setup_databases
    return _setup_databases(
  File "/usr/local/lib/python3.8/site-packages/django/test/utils.py", line 169, in setup_databases
    connection.creation.create_test_db(
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/creation.py", line 67, in create_test_db
    call_command(
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 168, in call_command
    return command.execute(*args, **defaults)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 369, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 83, in wrapped
    res = handle_func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/commands/migrate.py", line 231, in handle
    post_migrate_state = executor.migrate(
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/executor.py", line 117, in migrate
    state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
    state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/executor.py", line 245, in apply_migration
    state = migration.apply(state, schema_editor)
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/migration.py", line 124, in apply
    operation.database_forwards(self.app_label, schema_editor, old_state, project_state)
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/operations/fields.py", line 249, in database_forwards
    schema_editor.alter_field(from_model, from_field, to_field)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/schema.py", line 564, in alter_field
    self._alter_field(model, old_field, new_field, old_type, new_type,
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/postgresql/schema.py", line 147, in _alter_field
    super()._alter_field(
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/schema.py", line 710, in _alter_field
    self.execute(
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/schema.py", line 142, in execute
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: column "name" cannot be cast automatically to type bytea
HINT:  You might need to specify "USING name::bytea".

If I was to create my model using pgcrypto fields from the start, it works - but the model in my application was converted later on in the development and now these migrations are in the production aswell.

If you feel like I haven't given enough details, feel free to ask and I will try to explain more.

Thanks in advance for any help.

@9mido
Copy link

9mido commented Aug 8, 2020

Could it be because there is already previous unencrypted data in the column/field you are now trying to encrypt and it won't let you? If so, maybe one approach for a package PR would be to encrypt the entire column of previous unencrypted data then for every new row of data in that newly configured column, encrypt it normally? It may be that this package is only setup to encrypt only fresh new columns without any previous data in it.

@fisle Are you using a docker volumes or bind mounts to store the data? Maybe share your docker setup. Run some inspect commands on the volumes/containers and share (or make an image and share the docker hub link). Can you also comment what is happening with your previous unencrypted data in the column you are trying to encrypt? Try setting it up with fresh blank data and report back what happens. Did you run into this error before you updated Django or is this only happening with the new updates?

I agree that this is a big issue and should definitely be solved. It is essential for a site/database administrator to be able to make updates to their code especially if an auditor comes along and says your data is not compliant for security encryption standards. The previous data in the column should be encrypted also to protect all of those user's PII data.

Another PR approach might be https://docs.djangoproject.com/en/3.0/howto/writing-migrations/

Or adding a new field, encrypting old data into it, and removing old field.

Contributors @peterfarrell @Ian-Foote @meshy @adam-thomas @kevinetienne @maxpeterson @Minglee01 @U039b

Any advice or insight?

@peterfarrell
Copy link
Collaborator

@9mido @fisle This library does not take care of encrypting unencrypted data in a field when you migrate the same field name. I suggest you create a new field with PGcrypto, write your own forwards migration to copy unencrypted data from field A -> B, delete the old field, and then rename the new field to what you want.

@9mido
Copy link

9mido commented Aug 10, 2020

@peterfarrell I was able to figure it out using the steps mentioned above. But is there a chance this feature will ever make its way to the library so we do not have to do the custom data migrations?

Maybe this should go under the limitations section of the docs and provide an example of how to deal with this type of situation.

@peterfarrell
Copy link
Collaborator

@9mido I don't believe this will ever become part of the library as Django does not provide functionality to migrate from one field type to another type automatically. It would be impossible for the library to know how to reliably translate one field to a pgcrypto field and back. For example, change a models.CharField -> models.IntegerField would fail in Django if there was any data that couldn't be cast into integer by the database.

However, a PR is welcome if you want to add to the documentation a section on how to migrate unencrypted pre-existing fields to encrypted fields using a forwards() migration.

@fisle
Copy link
Author

fisle commented Aug 18, 2020

Hey, thanks for looking at the issue.

What I'm most puzzled with is that the migration fails even without any existing data in the database.
Also that this used to work previously when running on Django 2.x.

Is this a 'feature' that was never supposed to be there?

I guess I'm gonna have to do some dirty manual work to fix my existing migrations 🤔

@kevinetienne
Copy link
Collaborator

kevinetienne commented Aug 18, 2020

tldr: everything quoted here is the same as #246 (comment) and #246 (comment)

Hi @fisle and @9mido and thank you for taking the time to explain what you did. It looks like you're right and we might have had an unexpected feature! I would be careful however if you had a previous project where you were able to cast a field from a non encrypted to an encrypted as data integrity is not guaranteed.

Migration were never really supported except to create/activate the pgcrypto extension in postgres. Migrating data is complicated as there might be a few things to consider such as:

  • the shape of the data
  • validations/constrains done on the table/model/form and anywhere else
  • ..

The library has no way of doing all these guesses or to take all these decisions. I think it would be interesting to see if an extension like this exists or if there are any recommendations made available in the django documentations.

From what you are describing I can suggest three ways to solve it:

  1. when there's no data in the db it should be possible to start from scratch by recreating the db
  2. when there's no data in the table it should be possible to recreate the table
  3. when there's data or if the project is shared it should be possible to do it in a non destructive way

No data is in the db

  • drop the database
  • squash the migrations
  • recreate the db

No data in the table

  • create a migration to drop the table
  • create a new migration for the table with the encrypted field
  • optionally squash the migration

Migrating in a non-destructive way

The goal here is to be able to use to legacy field if something goes wrong.

1st step:

  • create new field
  • when data is saved write both to legacy and new field
  • create a data migration to cast data from legacy field to new field
  • check existing data from legacy and new field are the same if possible

2nd step:

  • rename the fields and drop legacy fields
  • optionally update the code to use only the new field

@peterfarrell
Copy link
Collaborator

@kevinetienne Thank you for the concise tl:dr summary. I have used it as a basis and updated the README documentation with a section on migrating existing unencrypted data to pgcrypto fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants