
Upgrade to Postgres 13 #3451

Merged 10 commits into develop from tt/upgrade-postgres on Jan 13, 2022

Conversation

@rajadain (Member) commented Jan 6, 2022

Overview

Upgrades Postgres to 13.4, the latest available on RDS, from 9.6.22. Necessary as AWS is going to hard switch to 12 come January 18. This PR upgrades Postgres in development and staging. It will be upgraded on production in #3444.

Connects #3379

Demo

ssh mmw-stg "psql -h database.service.mmw.internal -d modelmywatershed -U modelmywatershed -c 'SELECT version();'"
Password for user modelmywatershed: ***

                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
(1 row)


Notes

Here are the full steps that were taken to perform this upgrade on staging:

General Setup for Working with Deployments

  1. Have a virtualenv with Python 3.5+ (preferably 3.8 or 3.9) for working with the deployment code in ~/deployment. Activate it.
  2. Install all the deployment requirements: pip install -r requirements.txt (a combined sketch of items 1-3 follows this list)
  3. Ensure you have the mmw-stg AWS profile configured correctly
  4. Ensure you have access to the mmw-stg.pem file for SSHing in to Bastion
  5. Ensure you have a staging.yml deployment configuration file available locally, akin to the default.yml used for deployments in the fileshare.
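A combined sketch of items 1-3 above (the virtualenv path is just an example; the profile name is the one used throughout):

    python3 -m venv ~/envs/mmw-deployment && source ~/envs/mmw-deployment/bin/activate
    cd ~/deployment && pip install -r requirements.txt
    # Configure the mmw-stg AWS profile if it is not already set up
    aws configure --profile mmw-stg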

Take a Database Snapshot

(this was not done for staging, but should be done for production)

  1. Log in to AWS using the mmw-stg credentials in LastPass
  2. Go to RDS, select Snapshots → Take snapshot, and name it "pre-postgres-upgrade" (an equivalent CLI call is sketched below)
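The same snapshot can be taken from the AWS CLI if preferred (a sketch; the instance ID is the one referenced in the steps below):

    aws rds create-db-snapshot \
      --profile mmw-stg \
      --db-instance-identifier dd1gc3iuv75ep7t \
      --db-snapshot-identifier pre-postgres-upgrade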

Upgrade Bastion

The bastion currently runs on 16.04 Xenial, whereas everything else has been upgraded to 20.04 Focal.

  1. Generate a plan for the data plane upgrade and save it as JSON:
    python mmw_stack.py launch-stacks --aws-profile "mmw-stg" --mmw-profile "staging" --mmw-config-path ~/azavea/model-my-watershed/scratch/staging.yml --data-plane --print-json > data-plane.json
  2. In that plan, find the default value of BastionHostAMI:
    jq '.Parameters.BastionHostAMI.Default' data-plane.json
    "ami-06992628e0a8e044c"
  3. Log in to AWS using the mmw-stg credentials in LastPass
  4. Go to CloudFormation, select DataPlane → Change sets → Create change set
  5. Select "Use current template" and update the BastionHostAMI with the value from above
  6. Set RDSParameterGroupName to mmw-postgres96
  7. Click Next, don't change any Stack options, click "Create change set" after reviewing it
  8. In the description box, write "Upgrading Bastion to Ubuntu 20.04", and create it
  9. Once it is created, "Execute" the change set. Choose to execute it immediately, rather than in a scheduled window
  10. Observe the change set being executed. In case of failures, you'll see reasons for them in CloudFormation's DataPlane → Events. Fix them and try again.
  11. Once the new Bastion is up, ensure you can SSH in to it using the mmw-stg.pem file (steps 11-15 are sketched together after this list)
  12. From within the Bastion, try SSHing in to an app or worker VM to ensure that still works
  13. Install the Postgres client:
    sudo apt install postgresql-client
  14. Create a new tmux session for psql:
    tmux new-session -A -s psql
  15. Start psql in the tmux session, and then detach from it using Ctrl+B, D
    psql -h database.service.mmw.internal -d modelmywatershed -U modelmywatershed
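Steps 11-15 above, put together as a single sketch (the SSH user and host addresses are placeholders):

    # Step 11: SSH in to the new Bastion using the mmw-stg.pem key
    ssh -i mmw-stg.pem ubuntu@<bastion-host>
    # Step 12: from the Bastion, hop to an app or worker VM
    ssh <app-or-worker-vm>
    # Steps 13-15: install the Postgres client, then run psql inside a detachable tmux session
    sudo apt install postgresql-client
    tmux new-session -A -s psql
    psql -h database.service.mmw.internal -d modelmywatershed -U modelmywatershed
    # Detach from tmux with Ctrl+B, D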

Upgrade Postgres to 9.6.23

This is the latest version of Postgres 9.6, and will make subsequent upgrades easier.

  1. Go to RDS → Databases → dd1gc3iuv75ep7t (or whatever is the database ID) → Modify
  2. Set DB engine version to 9.6.23, DB Instance class to db.t3.medium
  3. Click "Continue"
  4. Select "Apply Immediately", and click "Modify DB Instance"
  5. To see the progress, go to dd1gc3iuv75ep7t (or whatever is the database ID) → Logs & Events, and sort the Logs by "Last Written" in descending order
  6. Open the most recent log to see the upgrade
  7. Once the RDS instance is up, SSH in to the Bastion, and attach to the psql tmux instance
    tmux new-session -A -s psql
  8. In psql, check the installed version of Postgres to ensure it is 9.6.23:
    SELECT version();
  9. In psql, vacuum the database:
    SET maintenance_work_mem='2GB';
    \timing
    VACUUM (FREEZE, ANALYZE, VERBOSE);
    This took 33 minutes on staging.
  10. Upgrade extensions to the very latest they can be:
    SELECT * FROM pg_available_extensions WHERE installed_version IS NOT NULL;
    
    SELECT * FROM pg_available_extension_versions WHERE name in ('postgis', 'pg_trgm', 'plpgsql');
    
    ALTER EXTENSION pg_trgm UPDATE;
    
    ALTER EXTENSION postgis UPDATE;
    
    SELECT PostGIS_Extensions_Upgrade();
    
    DROP EXTENSION postgis_raster;
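Steps 1-4 of this section can also be done from the AWS CLI (a sketch; since 9.6.22 → 9.6.23 is a minor version bump, no --allow-major-version-upgrade flag is needed here, unlike the 12.8 and 13.4 upgrades below):

    aws rds modify-db-instance \
      --profile mmw-stg \
      --db-instance-identifier dd1gc3iuv75ep7t \
      --engine-version 9.6.23 \
      --db-instance-class db.t3.medium \
      --apply-immediately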

Upgrade Postgres to 12.8

This is the highest version we can go to while on PostGIS 2.5, which is the highest version of PostGIS supported by Postgres 9.6.

  1. Go to RDS → Parameter Groups → Create parameter group, and make one based off the default Postgres 12 group and call it mmw-postgres12
  2. Go to RDS → Databases → dd1gc3iuv75ep7t (or whatever is the database ID) → Modify
  3. Set DB engine version to 12.8, DB parameter group to mmw-postgres12
  4. Click "Continue"
  5. Select "Apply Immediately", and click "Modify DB Instance"
  6. To see the progress, go to dd1gc3iuv75ep7t (or whatever is the database ID) → Logs & Events, and sort the Logs by "Last Written" in descending order
  7. Open the most recent log to see the upgrade output
  8. In psql, check the installed version of Postgres to ensure it is 12.8:
    SELECT version();
  9. In psql, vacuum the database:
    SET maintenance_work_mem='2GB';
    \timing
    VACUUM (FREEZE, ANALYZE, VERBOSE);
    This took 3.5 minutes on staging.
  10. Upgrade extensions to the very latest they can be:
    SELECT * FROM pg_available_extensions WHERE installed_version IS NOT NULL;
    
    SELECT * FROM pg_available_extension_versions WHERE name in ('postgis', 'pg_trgm', 'plpgsql');
    
    ALTER EXTENSION pg_trgm UPDATE;
    
    ALTER EXTENSION postgis UPDATE;
    
    SELECT PostGIS_Extensions_Upgrade();
    
    DROP EXTENSION postgis_raster;

Upgrade Postgres to 13.4

Now with PostGIS at 3.1, we can upgrade to Postgres 13.4.

  1. Go to RDS → Parameter Groups → Create parameter group, and make one based off the default Postgres 13 group and call it mmw-postgres13
  2. Edit it so that log_min_duration_statement has a value of 500 (to match the mmw-postgres96 parameter group)
  3. Go to RDS → Databases → dd1gc3iuv75ep7t (or whatever is the database ID) → Modify
  4. Set DB engine version to 13.4, DB parameter group to mmw-postgres13
  5. Click "Continue"
  6. Select "Apply Immediately", and click "Modify DB Instance"
  7. To see the progress, go to dd1gc3iuv75ep7t (or whatever is the database ID) → Logs & Events, and sort the Logs by "Last Written" in descending order
  8. Open the most recent log to see the upgrade output
  9. In psql, check the installed version of Postgres to ensure it is 13.4:
    SELECT version();
  10. In psql, vacuum the database:
    SET maintenance_work_mem='2GB';
    \timing
    VACUUM (FREEZE, ANALYZE, VERBOSE);
    This took 3.5 minutes on staging.
  11. Upgrade extensions to the very latest they can be (the end state can be verified with the sketch after this list):
    SELECT * FROM pg_available_extensions WHERE installed_version IS NOT NULL;
    
    SELECT * FROM pg_available_extension_versions WHERE name in ('postgis', 'pg_trgm', 'plpgsql');
    
    ALTER EXTENSION pg_trgm UPDATE;
    
    ALTER EXTENSION postgis UPDATE;
    
    SELECT PostGIS_Extensions_Upgrade();
    
    DROP EXTENSION postgis_raster;
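Once the final extension upgrade is done, the end state can be confirmed from the Bastion with a quick psql check, mirroring the Demo above (PostGIS_Full_Version is a standard PostGIS reporting function):

    psql -h database.service.mmw.internal -d modelmywatershed -U modelmywatershed \
      -c 'SELECT version();' \
      -c 'SELECT PostGIS_Full_Version();'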

Downgrade RDS Instance to t3.micro

Since the instance was a db.t2.micro before the upgrade, we now reduce it from the temporary db.t3.medium down to db.t3.micro.

  1. Go to RDS → Databases → dd1gc3iuv75ep7t (or whatever is the database ID) → Modify
  2. Set the RDSInstanceType to db.t3.micro
  3. Click "Continue"
  4. Select "Apply Immediately", and click "Modify DB Instance"
  5. To see the progress, go to dd1gc3iuv75ep7t (or whatever is the database ID) → Logs & Events, and sort the Logs by "Last Written" in descending order
  6. Open the most recent log to see the upgrade output

Resolve CloudFormation Template

Now, with everything updated manually, run the CloudFormation data plane plan to ensure everything matches:

python mmw_stack.py launch-stacks --aws-profile "mmw-stg" --mmw-profile "staging" --mmw-config-path ~/azavea/model-my-watershed/scratch/staging.yml --data-plane
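The generated plan can also be spot-checked to confirm the manually applied values are what the template now expects, following the same jq pattern used for BastionHostAMI above (a sketch):

    python mmw_stack.py launch-stacks --aws-profile "mmw-stg" --mmw-profile "staging" --mmw-config-path ~/azavea/model-my-watershed/scratch/staging.yml --data-plane --print-json > data-plane.json
    # e.g. confirm the parameter group and instance type defaults
    jq '.Parameters.RDSParameterGroupName.Default' data-plane.json
    jq '.Parameters.RDSInstanceType.Default' data-plane.json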

Testing Instructions

RDS has only recently added support for PostGIS 3.1 as of
October 2021: https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-rds-postgresql-postgis-3-1/

Unfortunately, PostGIS 3.1 and 3.0 kept failing in local
provisioning for both Postgres 12 and 13. So we're going with
PostGIS 3.2 for now, although we'll keep all our features
within PostGIS 3.1 support.
This is what we'll be using in staging / production.
Had been previously missed
We have been on 9.6.22 since #3149
This prepares us for a full jump to the latest 13.4.
@rajadain added the GEN Funding Source: General label Jan 6, 2022
@rajadain requested a review from jwalgran January 6, 2022 22:41
@rajadain (Member Author) commented Jan 6, 2022

CI is failing with this message:

TASK [model-my-watershed.postgresql-support : Install client API libraries for PostgreSQL] ***
Thursday 06 January 2022  23:05:00 +0000 (0:00:00.645)       0:00:20.647 ****** 
fatal: [app]: FAILED! => {"cache_update_time": 1627494653, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"      install 'libpq5=13.*.pgdg20.04+1' 'libpq-dev=13.*.pgdg20.04+1'' failed: E: Packages were downgraded and -y was used without --allow-downgrades.\n", "rc": 100, "stderr": "E: Packages were downgraded and -y was used without --allow-downgrades.\n", "stderr_lines": ["E: Packages were downgraded and -y was used without --allow-downgrades."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nSuggested packages:\n  postgresql-doc-13\nThe following packages will be DOWNGRADED:\n  libpq-dev libpq5\n0 upgraded, 0 newly installed, 2 downgraded, 0 to remove and 143 not upgraded.\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Suggested packages:", "  postgresql-doc-13", "The following packages will be DOWNGRADED:", "  libpq-dev libpq5", "0 upgraded, 0 newly installed, 2 downgraded, 0 to remove and 143 not upgraded."]}
Reading package lists...
Building dependency tree...
Reading state information...
Suggested packages:
  postgresql-doc-13
The following packages will be DOWNGRADED:
  libpq-dev libpq5
0 upgraded, 0 newly installed, 2 downgraded, 0 to remove and 143 not upgraded.

which I am unable to reproduce locally. Going to take a look on Jenkins.

@rajadain (Member Author) commented Jan 7, 2022

Fixed CI by logging in to civicci01 as jenkins and then running vagrant box update && vagrant destroy -f for both the PR builder and develop builder, and retrying the build.

The hi-res streams data is very large, and often fails to import into
the default 30GB disk. This plugin allows for setting the disk size
in the Vagrantfile, and we add a shell provisioner to increase the
logical volume and file system to use it.

The README is updated accordingly.
@rajadain (Member Author) commented Jan 9, 2022

I am noticing that importing the development data with Postgres 13 takes ~33% more time. Provisioning a new services VM and importing all the development data using this series of commands:

vagrant destroy -f services &&
vagrant up services &&
vagrant ssh app -c 'cd /vagrant && ./scripts/aws/setupdb.sh -bc' &&
vagrant ssh app -c 'cd /vagrant && ./scripts/aws/setupdb.sh -dmpq' &&
vagrant ssh app -c 'cd /vagrant && ./scripts/aws/setupdb.sh -sS'

Takes about 3h4m on develop, and around 4h1m on this branch (about 83% of this time is spent on importing high resolution stream data). Since imports are done infrequently, this may not be too bad. But this could also be an indication of other performance issues, either due to lack of optimization or regression in new Postgres.

We have to do the upgrade as part of AWS requirements, so I'm not going to look further into this here, but making a note of it.
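If we ever do want to narrow it down, a crude first step would be timing each import stage separately on develop and on this branch, using the same commands as above (a sketch):

    time vagrant ssh app -c 'cd /vagrant && ./scripts/aws/setupdb.sh -bc'
    time vagrant ssh app -c 'cd /vagrant && ./scripts/aws/setupdb.sh -dmpq'
    time vagrant ssh app -c 'cd /vagrant && ./scripts/aws/setupdb.sh -sS'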

@jwalgran (Contributor) left a comment:
Looks good. As always it is a pleasure to review PRs that are so logically split into incremental, progressive commits.

Comment on lines +94 to +95:

    sudo lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv
    sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
@jwalgran (Contributor):
Do we have a documentation reference for this? This appears to be some advanced Linux file system configuration that I have not seen before.

@rajadain (Member Author):
Good catch. This was cobbled together from various sources including https://stackoverflow.com/a/66117279/6995854, https://medium.com/@kanrangsan/how-to-automatically-resize-virtual-box-disk-with-vagrant-9f0f48aa46b3, and https://marcbrandner.com/blog/increasing-disk-space-of-a-linux-based-vagrant-box-on-provisioning/, with some trial and error to get the minimum set of instructions. I did not know this at the time, but we've done something similar in https://github.com/azavea/cicero/pull/1617

From what I understand these commands are Ubuntu-specific, with CentOS and the like using other utilities (e.g. xfs_growfs). Essentially, by the time we get to the shell provisioner, the physical disk size is set to 64GB as specified in the Vagrantfile. The logical volume is still stuck at ~30GB, possibly initialized by the base box we're using. lvextend extends the logical volume to take up all the free space in the physical volume. resize2fs resizes the file system to fill the logical volume.

I'll add this explanation to the commit message.
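For reference, the full sequence with before/after checks looks roughly like this (a sketch using the device paths quoted above; vgs and df are only there for verification):

    # Inspect the volume group and the current root filesystem size
    sudo vgs
    df -h /
    # Grow the logical volume into all free space in the volume group,
    # then grow the ext4 filesystem to fill the logical volume
    sudo lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv
    sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
    # Confirm the filesystem now sees the extra space
    df -h /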

Member:

These commands are not Ubuntu-specific or terribly bleeding-edge. lvextend is a standard part of the lvm2 tools, and resize2fs will work with all variants of the ext filesystem. xfs_growfs is for XFS; filesystems generally implement their own tools to expand the filesystem.

@rajadain (Member Author):

Ah so xfs_growfs vs resize2fs are because of the file system, not because of the distro. Thanks for pointing that out!

@rajadain (Member Author):

Added a comment explaining this in 4356133.

@jwalgran assigned rajadain and unassigned jwalgran Jan 13, 2022
Since we use Postgres 13, we want to use libpq-dev 13 as well.
During provisioning, at some point it gets upgraded to 14, and
then reprovisioning fails since downgrades are not allowed by
default. This enables downgrades to ensure we stay on the same version.
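Roughly what the role's apt install amounts to once downgrades are allowed, based on the command in the CI failure above (a sketch; how the flag is wired into the Ansible role is not shown here):

    sudo apt-get install -y --allow-downgrades 'libpq5=13.*.pgdg20.04+1' 'libpq-dev=13.*.pgdg20.04+1'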
@rajadain merged commit 7d793c3 into develop Jan 13, 2022
@rajadain deleted the tt/upgrade-postgres branch January 13, 2022 15:30