Upgrade Ubuntu to 20.04 Focal in Development / CI #3420

Merged
rajadain merged 21 commits into develop from tt/upgrade-ubuntu-20.04
Sep 28, 2021

Member

@rajadain rajadain commented Aug 31, 2021

Overview

Upgraded development environment to Ubuntu 20.04 Focal, PostgreSQL to 12, and PostGIS to 3.1.

Connects #3416

TODO

  • The boundary_county.sql.gz dataset contains ST_Force2D directives that have been deprecated for some time and are no longer supported in PostgreSQL 12. That dataset should be re-exported in a PostgreSQL 12-compatible format.
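
To gauge how much of a dump is affected before re-exporting it, something like the following could be used. This is only a sketch: `count_matching_lines` is a hypothetical helper that is not part of this PR, and it assumes the dump is a gzip-compressed text file.

```python
import gzip


def count_matching_lines(dump_path, needle="ST_Force2D"):
    """Count lines in a gzipped SQL dump that mention `needle`."""
    count = 0
    # "rt" reads the stream as text; errors="replace" tolerates stray bytes.
    with gzip.open(dump_path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            if needle in line:
                count += 1
    return count
```

A count of zero on the re-exported file would suggest the directives are gone, although it says nothing about other incompatibilities.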

Notes

Testing this requires completely destroying and recreating the existing development environment, and more than 100GB of free disk space.
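
A quick way to check the free-space requirement before starting, as a minimal sketch. `has_free_space` is a hypothetical helper, and the path to check is an assumption: it should be wherever your VM disk files are stored.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes, matching the ">100GB" wording above


def has_free_space(path=".", required_gb=100):
    """Return True if the filesystem holding `path` has more than `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes > required_gb * GB
```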

Testing Instructions

  • Checkout this branch
  • Destroy your current development environment:
    $ vagrant destroy -f
    • Sometimes you'll have to restart your computer before the space freed by deleting these VM disk files is actually reclaimed
  • Bring up only the services VM:
    $ vagrant up services
    • This will likely fail the first time. This is expected. This happens because the postgresql_ commands run off the python detected at the start of provisioning, which is python3 on Focal, but we're installing python2 and symlinking it to python during our provisioning, which confuses the task. Since we'll be upgrading to Python 3 shortly in Upgrade to Python 3 #3165, rather than find a more elegant solution here, we just re-run the provisioner, which succeeds the second time:
      $ vagrant reload services --provision
    • Ensure the provisioning succeeds the second time
  • Now bring up the rest of the VMs:
    $ vagrant up app worker tiler
    • Ensure they all provision successfully
  • Run migrations from your host:
    $ ./scripts/manage.sh migrate
    • Ensure they run successfully
  • Adjust your setupdb.sh script to run locally:
    diff --git a/scripts/aws/setupdb.sh b/scripts/aws/setupdb.sh
    index 611cc19a..e653661a 100755
    --- a/scripts/aws/setupdb.sh
    +++ b/scripts/aws/setupdb.sh
    @@ -84,9 +84,7 @@ function download_and_load {
     }
    
     function purge_tile_cache {
    -    for path in "${PATHS[@]}"; do
    -        aws s3 rm --recursive "s3://tile-cache.${PUBLIC_HOSTED_ZONE_NAME}/${path}/"
    -    done
    +    echo "Skipping"
     }
    
     function create_trgm_indexes {
  • Now run the import scripts from your host:
    $ vagrant ssh app -c 'cd /vagrant && ./scripts/aws/setupdb.sh -bsSdmpcq'
    • This will pull in 30GB+ of data and import it into the database, and should take 1-2 hours on a ~100Mbps connection
    • Ensure it succeeds
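
The expected first-run failure of the services VM in the steps above could be scripted roughly as follows. This is a sketch, not something included in this PR: `provision_services` is a hypothetical helper, and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess


def provision_services(run=subprocess.call):
    """Bring up the services VM, re-provisioning once if the first attempt fails."""
    if run(["vagrant", "up", "services"]) == 0:
        return True
    # The first attempt is expected to fail on Focal, so re-run the
    # provisioner once, as the testing instructions describe.
    return run(["vagrant", "reload", "services", "--provision"]) == 0
```

It retries only once on purpose: a second failure is not expected and should be looked at by hand.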

At this point, your local MMW setup should be complete. We can now test the application functionality.

  • Go to http://localhost:8000/
    • Ensure it opens successfully
  • Select a HUC-10 as your shape and proceed to Analyze
    • Ensure HUC-10s populate and select successfully
    • Ensure Analysis completes successfully for all types
  • Select Watershed Multi-Year Model
    • Ensure the Modeling completes successfully for the base Current Conditions scenario

Ubuntu 20.04 Focal does not have a `python` package in its apt
repository. Instead, it has a `python2` package. Since this is
an unusual case of installing an older version of Python in a
newer OS, we switch from using the generic `azavea.python` and
`azavea.pip` roles to an MMW-specific one, which installs the
most recent version of Python 2.7 and symlinks `python2` to
`python`, making Python 2.7 the default version.

The most recent version of pip that supports Python 2.7 is also
installed.

Ubuntu 20.04 Focal defaults to expecting `pip3` as the executable,
which we haven't installed. Thus, we specify `pip` as the default
executable.

This version of GEOS has a mistake in its version number which
ends with an empty space. Newer versions of libgeos can handle
that, but the one we're using with Django 1.11 and Python 2.7
does not. There is a bug for this: https://code.djangoproject.com/ticket/31838,
but the advice there is to upgrade Python and Django.

We are not ready to do that just yet, so instead we fix
the version checking line in the library with an additional
strip() which removes any extraneous spaces.
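
To illustrate why the extra strip() helps, here is a minimal sketch. The pattern below is a simplified, hypothetical stand-in for a strict version regex (it is not the one Django ships), and the version string in the usage note is made up for illustration.

```python
import re

# Simplified stand-in for a strict version pattern; NOT Django's actual regex.
VERSION_RE = re.compile(
    r"^(?P<version>\d+\.\d+\.\d+)-CAPI-(?P<capi_version>\d+\.\d+\.\d+)$"
)


def parse_version(raw):
    """Parse a bytes version string, failing on anything unexpected."""
    match = VERSION_RE.match(raw.decode())
    if match is None:
        raise ValueError("Could not parse version info string %r" % raw)
    return match.groupdict()


def parse_version_stripped(raw):
    """Same as parse_version, but tolerate surrounding whitespace."""
    return parse_version(raw.strip())
```

With a value such as `b"3.8.0-CAPI-1.13.1 "` (note the trailing space), `parse_version` raises while `parse_version_stripped` succeeds, which is the effect the patched line is after.
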

This is no longer available, and as far as I can tell, not
needed anymore?

We need build-essential in the tiler VM as well to build the
Windshaft dependencies. Thus, we now move the azavea.build-essential
inclusion to mmw.base, instead of just the mmw.app and mmw.celery-worker
roles, to ensure it is available for all VMs.

The azavea.postgresql-support role has dependencies on azavea.python
and azavea.pip which we're working around at this stage. Thus, we
replicate the core functionality of that role locally, sans the
problematic dependencies.

@rajadain
Member Author

The PR builder had 16.04 VMs that I deleted like this:

$ ssh civicci01
azavea@civicci01$ sudo su jenkins
jenkins@civicci01$ cd ~/workspace/model-my-watershed-pull-requests/
jenkins@civicci01$ vagrant destroy -f

Then reran the PR builder. This provisioned correctly, but is now reporting the following failing tests:

======================================================================
ERROR: test_hundred_sq_km_aoi (app.apps.geoprocessing_api.tests.ExerciseCatchmentIntersectsAOI)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/app/apps/geoprocessing_api/tests.py", line 788, in test_hundred_sq_km_aoi
    reprojected_aoi = aoi.transform(5070, clone=True)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/geos/geometry.py", line 527, in transform
    g.transform(ct)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/geometries.py", line 408, in transform
    capi.geom_transform_to(self.ptr, sr.ptr)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/prototypes/errcheck.py", line 119, in check_errcode
    check_err(result, cpl=cpl)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/error.py", line 73, in check_err
    raise e(msg)
GDALException: OGR failure.

======================================================================
ERROR: test_ten_thousand_sq_km_aoi (app.apps.geoprocessing_api.tests.ExerciseCatchmentIntersectsAOI)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/app/apps/geoprocessing_api/tests.py", line 1031, in test_ten_thousand_sq_km_aoi
    reprojected_aoi = aoi.transform(5070, clone=True)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/geos/geometry.py", line 527, in transform
    g.transform(ct)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/geometries.py", line 408, in transform
    capi.geom_transform_to(self.ptr, sr.ptr)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/prototypes/errcheck.py", line 119, in check_errcode
    check_err(result, cpl=cpl)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/error.py", line 73, in check_err
    raise e(msg)
GDALException: OGR failure.

======================================================================
ERROR: test_thousand_sq_km_aoi (app.apps.geoprocessing_api.tests.ExerciseCatchmentIntersectsAOI)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/app/apps/geoprocessing_api/tests.py", line 909, in test_thousand_sq_km_aoi
    reprojected_aoi = aoi.transform(5070, clone=True)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/geos/geometry.py", line 527, in transform
    g.transform(ct)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/geometries.py", line 408, in transform
    capi.geom_transform_to(self.ptr, sr.ptr)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/prototypes/errcheck.py", line 119, in check_errcode
    check_err(result, cpl=cpl)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/error.py", line 73, in check_err
    raise e(msg)
GDALException: OGR failure.

----------------------------------------------------------------------
Ran 111 tests in 21.247s

Looking into those now.

@rajadain
Member Author

The failure in more detail:

In [7]: one_sq_km_aoi.transform(5070, clone=True)
Out[7]: <Polygon object at 0x7fc06e489250>

In [8]: hundred_sq_km_aoi.transform(5070, clone=True)
---------------------------------------------------------------------------
GDALException                             Traceback (most recent call last)
<ipython-input-8-7bcaa86cef5e> in <module>()
----> 1 hundred_sq_km_aoi.transform(5070, clone=True)

/usr/local/lib/python2.7/dist-packages/django/contrib/gis/geos/geometry.pyc in transform(self, ct, clone)
    525         # Creating an OGR Geometry, which is then transformed.
    526         g = gdal.OGRGeometry(self._ogr_ptr(), srid)
--> 527         g.transform(ct)
    528         # Getting a new GEOS pointer
    529         ptr = g._geos_ptr()

/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/geometries.pyc in transform(self, coord_trans, clone)
    406         elif isinstance(coord_trans, six.integer_types + six.string_types):
    407             sr = SpatialReference(coord_trans)
--> 408             capi.geom_transform_to(self.ptr, sr.ptr)
    409         else:
    410             raise TypeError('Transform only accepts CoordTransform, '

/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/prototypes/errcheck.pyc in check_errcode(result, func, cargs, cpl)
    117     Check the error code returned (c_int).
    118     """
--> 119     check_err(result, cpl=cpl)
    120
    121

/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/error.pyc in check_err(code, cpl)
     71     elif code in err_dict:
     72         e, msg = err_dict[code]
---> 73         raise e(msg)
     74     else:
     75         raise GDALException('Unknown error code: "%s"' % code)

GDALException: OGR failure.

This is the one_sq_km_aoi that succeeds:

{
    "type": "Polygon",
    "coordinates": [
        [
            [
                -75.27900695800781,
                39.891925022904516
            ],
            [
                -75.26608943939209,
                39.891925022904516
            ],
            [
                -75.26608943939209,
                39.90173657727282
            ],
            [
                -75.27900695800781,
                39.90173657727282
            ],
            [
                -75.27900695800781,
                39.891925022904516
            ]
        ]
    ]
}

This is the hundred_sq_km_aoi that fails:

{
    "type": "Polygon",
    "coordinates": [
        [
            [
                -94.64584350585938,
                38.96154447940714
            ],
            [
                -94.53460693359374,
                38.96154447940714
            ],
            [
                -94.53460693359374,
                39.05225165582583
            ],
            [
                -94.64584350585938,
                39.05225165582583
            ],
            [
                -94.64584350585938,
                38.96154447940714
            ]
        ]
    ]
}
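
Both shapes are axis-aligned rectangles in longitude/latitude, so one sanity check is to confirm that their sizes roughly match their names. The sketch below uses a simple equirectangular approximation. `approx_bbox_area_sq_km` is a hypothetical helper and the Earth radius is an assumed mean value. It does not explain the failure; it only characterizes the two inputs.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def approx_bbox_area_sq_km(ring):
    """Approximate the bounding-box area of a lon/lat ring in square kilometers."""
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    mid_lat = math.radians((min(lats) + max(lats)) / 2.0)
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    # Longitude degrees shrink with the cosine of the latitude.
    width = (max(lons) - min(lons)) * km_per_degree * math.cos(mid_lat)
    height = (max(lats) - min(lats)) * km_per_degree
    return width * height


# Outer rings copied from the two GeoJSON polygons above.
ONE_SQ_KM_RING = [
    [-75.27900695800781, 39.891925022904516],
    [-75.26608943939209, 39.891925022904516],
    [-75.26608943939209, 39.90173657727282],
    [-75.27900695800781, 39.90173657727282],
    [-75.27900695800781, 39.891925022904516],
]
HUNDRED_SQ_KM_RING = [
    [-94.64584350585938, 38.96154447940714],
    [-94.53460693359374, 38.96154447940714],
    [-94.53460693359374, 39.05225165582583],
    [-94.64584350585938, 39.05225165582583],
    [-94.64584350585938, 38.96154447940714],
]
```

Besides size, the two shapes also sit about 19 degrees of longitude apart, so location is another difference to keep in mind when narrowing down the failure.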

@rajadain
Member Author

I've skipped those failing tests in 6489389. Will re-enable them after Django has been upgraded in #3419.
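
For reference, one common way to skip tests like these is the standard `unittest.skip` decorator. How the commit above actually does it is not shown in this thread, so the following is only a sketch: `ExampleAOITests` and `run_example` are hypothetical stand-ins, not the real test class.

```python
import unittest


class ExampleAOITests(unittest.TestCase):
    """Hypothetical stand-in for the real test class."""

    @unittest.skip("Re-enable after the Django upgrade in #3419")
    def test_hundred_sq_km_aoi(self):
        self.fail("not reached while the test is skipped")

    def test_one_sq_km_aoi(self):
        self.assertTrue(True)


def run_example():
    """Run the example tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleAOITests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Putting the issue number in the skip reason keeps the follow-up visible in test output.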

We install docker-compose with pip for the Ansible
module to work. Also, we must specify that it explicitly
use /usr/bin/python, otherwise the first time the VM is
built it uses /usr/bin/python3, and in subsequent times
it uses /usr/bin/python.

It's a big jump, but fortunately works without any modifications
to other pieces of the infrastructure.

Previously this package was preinstalled in Ubuntu. As of Focal
it is no longer there. It is needed for Ansible to use a
non-super-user role, so we install it explicitly.

See georchestra/ansible#55 (comment)

Needed since the base values are unable to pull NHD Catchment
or High Resolution Stream sql files from S3.

Some tests fail because Django 1.11 cannot work with the
more recent versions of proj4 (and consequently GEOS) that
are available for Ubuntu 20.04. These tests must wait until
the Django upgrade to be enabled again, in #3419.

It is the same data as production, re-exported to not include
the ST_Force2D statements which were in the original file, as
that is no longer supported in PostgreSQL.

The original file is still on S3.

@rajadain rajadain changed the title Upgrade Ubuntu to 20.04 Focal Upgrade Ubuntu to 20.04 Focal in Development / CI Sep 13, 2021
@rbreslow rbreslow self-requested a review September 13, 2021 19:05
Contributor

@rbreslow rbreslow left a comment

I was able to follow your testing instructions in their entirety. They were very clear and I didn't run into any obstacles that weren't specific to my unique development workflow.

One of these obstacles is worth surfacing, though, because it will affect others' ability to spin up Model My Watershed. We're still using functionality that was removed in Ansible 2.9 (the version_compare filter). I have Ansible 2.11 installed on my host and I don't have a great way (yet, 🤞 for Nix Shell) to install an older version of Ansible for a single project. I ended up adjusting the Vagrantfile to use the ansible_local provisioner.

The diff for that is here. Let me know if you're comfortable with me pushing it to your branch.

diff --git a/Vagrantfile b/Vagrantfile
index bc796c95..e65804bd 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -3,6 +3,11 @@

 Vagrant.require_version ">= 2.2"

+# We need to stay on Ansible 2.8 because the version_compare filter was removed
+# in 2.9.
+# https://github.com/ansible/ansible/issues/64174#issuecomment-548639160
+ANSIBLE_VERSION = "2.8.*"
+
 if ["up", "provision", "status"].include?(ARGV.first)
   require_relative "vagrant/ansible_galaxy_helper"

@@ -56,8 +61,14 @@ Vagrant.configure("2") do |config|
       v.cpus = 4
     end

-    services.vm.provision "ansible" do |ansible|
+    services.vm.provision "ansible_local" do |ansible|
       ansible.compatibility_mode = "2.0"
+      ansible.install_mode = "pip_args_only"
+      # We can't use Python 3 yet because the provisioning process fails on
+      # "Create PostgreSQL super user." Failed to import the required Python
+      # library (psycopg2) on services's Python /usr/bin/python3.
+      ansible.pip_install_cmd = "curl https://bootstrap.pypa.io/pip/2.7/get-pip.py | sudo python"
+      ansible.pip_args = "ansible==#{ANSIBLE_VERSION}"
       ansible.playbook = "deployment/ansible/services.yml"
       ansible.groups = ANSIBLE_GROUPS.merge(ANSIBLE_ENV_GROUPS)
       ansible.raw_arguments = ["--timeout=60"]
@@ -96,8 +107,11 @@ Vagrant.configure("2") do |config|
       v.cpus = 2
     end

-    worker.vm.provision "ansible" do |ansible|
+    worker.vm.provision "ansible_local" do |ansible|
       ansible.compatibility_mode = "2.0"
+      ansible.install_mode = "pip_args_only"
+      ansible.pip_install_cmd = "curl https://bootstrap.pypa.io/pip/2.7/get-pip.py | sudo python"
+      ansible.pip_args = "ansible==#{ANSIBLE_VERSION}"
       ansible.playbook = "deployment/ansible/workers.yml"
       ansible.groups = ANSIBLE_GROUPS.merge(ANSIBLE_ENV_GROUPS)
       ansible.raw_arguments = ["--timeout=60"]
@@ -136,8 +150,11 @@ Vagrant.configure("2") do |config|
       v.memory = 2048
     end

-    app.vm.provision "ansible" do |ansible|
+    app.vm.provision "ansible_local" do |ansible|
       ansible.compatibility_mode = "2.0"
+      ansible.install_mode = "pip_args_only"
+      ansible.pip_install_cmd = "curl https://bootstrap.pypa.io/pip/2.7/get-pip.py | sudo python"
+      ansible.pip_args = "ansible==#{ANSIBLE_VERSION}"
       ansible.playbook = "deployment/ansible/app-servers.yml"
       ansible.groups = ANSIBLE_GROUPS.merge(ANSIBLE_ENV_GROUPS)
       ansible.raw_arguments = ["--timeout=60"]
@@ -160,8 +177,11 @@ Vagrant.configure("2") do |config|
       v.memory = 1024
     end

-    tiler.vm.provision "ansible" do |ansible|
+    tiler.vm.provision "ansible_local" do |ansible|
       ansible.compatibility_mode = "2.0"
+      ansible.install_mode = "pip_args_only"
+      ansible.pip_install_cmd = "curl https://bootstrap.pypa.io/pip/2.7/get-pip.py | sudo python"
+      ansible.pip_args = "ansible==#{ANSIBLE_VERSION}"
       ansible.playbook = "deployment/ansible/tile-servers.yml"
       ansible.groups = ANSIBLE_GROUPS.merge(ANSIBLE_ENV_GROUPS)
       ansible.raw_arguments = ["--timeout=60"]

Because I was using ansible_local, I never ran into a problem with vagrant up services. I think this is because I am specifying a Python 2 interpreter from the start. My host's Ansible is set up to use Python 3.9.7.

Also, just a point of feedback. There were a couple of times when I was reading your PR that I asked myself, "Why did Terrence do this?" Then, I saw your thoughtful commit messages and was able to answer my questions. However, I had to keep a list of your commits open alongside the Files changed diff in another window. Consider propagating some of the details you include in commit messages into inline comments on the pull request.

Hector's blog post on this is worth a read.

deployment/ansible/group_vars/all (outdated, resolved)
Comment on lines +22 to +27

- name: Hack Python 2.7 to work with Ubuntu Focal
  replace:
    path: /usr/local/lib/python2.7/dist-packages/django/contrib/gis/geos/libgeos.py
    regexp: geos_version\(\)\.decode\(\)
    replace: geos_version().strip().decode()

rbreslow and others added 3 commits September 22, 2021 09:55
To better allow developers with different versions of Ansible
installed on their hosts to run this project.
@rajadain
Member Author

Switched to ansible_local in 4fd8845.

@rajadain
Member Author

Saw the same failures as CI in my local environment, and added 3577f50 to fix those. Hopefully CI succeeds now 🤞

@rajadain
Member Author

There were other Docker containers still running on Jenkins and using port 5432, causing port conflicts. Stopped them manually and retriggered the PR builder.
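
A quick check for this kind of conflict before provisioning, as a minimal sketch. `port_in_use` is a hypothetical helper, and it only reports whether something accepts TCP connections on the port, not which container or process owns it.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something is accepting TCP connections on host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()
```

For example, `port_in_use(5432)` returning True before the VMs are up would point to a leftover PostgreSQL listener.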

@rajadain
Member Author

If I log in to CI as Jenkins and run the script that the PR builder runs, it succeeds:

jenkins@civicci01:~/workspace/model-my-watershed-pull-requests$ ./scripts/vagrant-up.sh && ./scripts/test.sh

but Jenkins continues to fail. I'm not sure what is going on there, but this PR is ready for another look.

@rajadain
Member Author

Thanks to @rbreslow for fixing CI. There were additional environment variables that for some reason were causing the build to fail. Recording them here for posterity:

VAGRANT_ENV=TEST
MMW_APP_IP= 33.33.34.13
MMW_TILER_IP=33.33.34.43
MMW_WORKER_IP=33.33.34.23
MMW_SERVICES_IP= 33.33.34.33
RWD_DATA=/opt/rwd-data
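
Two of the recorded lines have a space after `=`. It isn't stated whether that is how they appear in the Jenkins configuration or an artifact of pasting them here. As a sketch, assuming the intent is plain KEY=VALUE pairs, the values can be normalized and the `*_IP` entries sanity-checked like this (`parse_env_lines` and `invalid_ip_settings` are hypothetical helpers):

```python
import ipaddress


def parse_env_lines(lines):
    """Split KEY=VALUE lines, trimming whitespace around keys and values."""
    env = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        env[key.strip()] = value.strip()
    return env


def invalid_ip_settings(env):
    """Return the sorted names of *_IP settings whose values are not valid IPs."""
    bad = []
    for key, value in env.items():
        if key.endswith("_IP"):
            try:
                ipaddress.ip_address(value)
            except ValueError:
                bad.append(key)
    return sorted(bad)
```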

@rajadain
Member Author

I looked at #3113 for inspiration on how to support Focal in AWS, and found that these changes would be needed:

diff --git a/deployment/cfn/data_plane.py b/deployment/cfn/data_plane.py
index e7aec071..b4a7d426 100644
--- a/deployment/cfn/data_plane.py
+++ b/deployment/cfn/data_plane.py
@@ -195,7 +195,7 @@ class DataPlane(StackNode):
             bastion_ami_id = self.get_input('BastionHostAMI')
         except MKUnresolvableInputError:
             filters = {'name':
-                       'ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*',
+                       'ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*',
                        'architecture': 'x86_64',
                        'block-device-mapping.volume-type': 'gp2',
                        'root-device-type': 'ebs',
diff --git a/deployment/packer/driver.py b/deployment/packer/driver.py
index c477964f..8505a31e 100644
--- a/deployment/packer/driver.py
+++ b/deployment/packer/driver.py
@@ -18,7 +18,7 @@ LOGGER.setLevel(logging.INFO)
 def get_recent_ubuntu_ami(region, aws_profile):
     """Gets AMI ID for current release in region"""
     filters = {
-        'name': 'ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*',
+        'name': 'ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*',
         'architecture': 'x86_64',
         'root-device-type': 'ebs',
         'virtualization-type': 'hvm',

Would this be enough? If so, this can be included in this PR. If it requires more advanced work, then it may be better to defer until a follow up.
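
To show what the changed name filter selects, here is a sketch that emulates the wildcard match locally with `fnmatch` and picks the newest match. The real lookup is done by EC2. `most_recent_matching` is a hypothetical helper, and it assumes image records are dicts with `Name` and `CreationDate` keys.

```python
from fnmatch import fnmatchcase

FOCAL_PATTERN = "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"


def most_recent_matching(images, pattern=FOCAL_PATTERN):
    """Return the newest image whose Name matches `pattern`, or None."""
    matches = [img for img in images if fnmatchcase(img["Name"], pattern)]
    if not matches:
        return None
    # ISO 8601 timestamps in the same format sort chronologically as strings.
    return max(matches, key=lambda img: img["CreationDate"])
```

An image named with `ubuntu-xenial-16.04` no longer matches the new pattern, so after this change only Focal images would be candidates.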

Contributor

@rbreslow rbreslow left a comment

👍

I will open separate issues for upgrading to Ubuntu 20.04 on AWS and resolving / understanding the issue with CI and those IP addresses.

@rajadain rajadain merged commit 4393a97 into develop Sep 28, 2021
@rajadain rajadain deleted the tt/upgrade-ubuntu-20.04 branch September 28, 2021 15:25
@rajadain rajadain mentioned this pull request Dec 16, 2021