Upgrade Ubuntu to 20.04 Focal in Development / CI #3420
Ubuntu 20.04 Focal does not have a `python` package in its apt repository; instead, it has a `python2` package. Since installing an older version of Python on a newer OS is an unusual case, we switch from the generic `azavea.python` and `azavea.pip` roles to an MMW-specific one, which installs the most recent version of Python 2.7 and symlinks `python2` to `python`, making Python 2.7 the default. The most recent version of pip that supports Python 2.7 is also installed.
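The symlinking step can be illustrated with a small, self-contained Python sketch. The paths here are stand-ins created in a temporary directory (not the real `/usr/bin/python2` and `/usr/local/bin/python` the role touches), so it runs anywhere:

```python
import os
import tempfile

# Simulate the role's symlinking step in a scratch directory: a fake
# "python2" script stands in for the interpreter, and "python" is
# symlinked to it, mirroring `ln -s /usr/bin/python2 /usr/local/bin/python`.
tmp = tempfile.mkdtemp()
python2 = os.path.join(tmp, "python2")
with open(python2, "w") as f:
    f.write("#!/bin/sh\necho Python 2.7.18\n")
os.chmod(python2, 0o755)

python = os.path.join(tmp, "python")
os.symlink(python2, python)  # `python` now resolves to `python2`

print(os.path.realpath(python).endswith("python2"))  # → True
```

After this, anything invoking plain `python` gets the Python 2 interpreter, which is the behavior the MMW role relies on.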
Ubuntu 20.04 Focal defaults to expecting `pip3` as the executable, which we haven't installed. Thus, we specify `pip` as the default executable.
This version of GEOS has a mistake in its version number, which ends with a trailing space. Newer versions of Django's libgeos module can handle that, but the one we're using with Django 1.11 and Python 2.7 does not. There is a Django ticket for this: https://code.djangoproject.com/ticket/31838, but the advice there is to upgrade Python and Django. We are not ready to do that just yet, so instead we fix the version-checking line in the library with an additional strip(), which removes any extraneous spaces.
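The failure mode is easy to reproduce in plain Python: Django parses the output of geos_version() with a regex anchored at both ends, so a trailing space breaks the match. This is a simplified stand-in for Django's actual pattern, and the version string is illustrative, not the exact one from Focal:

```python
import re

# Simplified stand-in for Django's GEOS version regex, anchored at both ends.
version_regex = re.compile(
    r'^(?P<version>\d+\.\d+\.\d+)-CAPI-(?P<capi>\d+\.\d+\.\d+)$'
)

raw = "3.8.0-CAPI-1.13.1 "  # note the trailing space in the GEOS build

print(version_regex.match(raw))  # → None: the anchored regex rejects it
print(version_regex.match(raw.strip()).group("version"))  # → 3.8.0
```

Calling strip() before matching, as the patched line does, makes the same string parse cleanly.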
This is no longer available, and as far as I can tell, not needed anymore?
We need build-essential in the tiler VM as well to build the Windshaft dependencies. Thus, we now move the azavea.build-essential inclusion to mmw.base, instead of just the mmw.app and mmw.celery-worker roles, to ensure it is available for all VMs.
The azavea.postgresql-support role has dependencies on azavea.python and azavea.pip which we're working around at this stage. Thus, we replicate the core functionality of that role locally, sans the problematic dependencies.
The PR builder had 16.04 VMs that I deleted like this:

$ ssh civicci01
azavea@civicci01$ sudo su jenkins
jenkins@civicci01$ cd ~/workspace/model-my-watershed-pull-requests/
jenkins@civicci01$ vagrant destroy -f

Then I reran the PR builder. This provisioned correctly, but is now reporting the following failing tests:
Looking into those now.
The failure in more detail:

In [7]: one_sq_km_aoi.transform(5070, clone=True)
Out[7]: <Polygon object at 0x7fc06e489250>
In [8]: hundred_sq_km_aoi.transform(5070, clone=True)
---------------------------------------------------------------------------
GDALException Traceback (most recent call last)
<ipython-input-8-7bcaa86cef5e> in <module>()
----> 1 hundred_sq_km_aoi.transform(5070, clone=True)
/usr/local/lib/python2.7/dist-packages/django/contrib/gis/geos/geometry.pyc in transform(self, ct, clone)
525 # Creating an OGR Geometry, which is then transformed.
526 g = gdal.OGRGeometry(self._ogr_ptr(), srid)
--> 527 g.transform(ct)
528 # Getting a new GEOS pointer
529 ptr = g._geos_ptr()
/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/geometries.pyc in transform(self, coord_trans, clone)
406 elif isinstance(coord_trans, six.integer_types + six.string_types):
407 sr = SpatialReference(coord_trans)
--> 408 capi.geom_transform_to(self.ptr, sr.ptr)
409 else:
410 raise TypeError('Transform only accepts CoordTransform, '
/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/prototypes/errcheck.pyc in check_errcode(result, func, cargs, cpl)
117 Check the error code returned (c_int).
118 """
--> 119 check_err(result, cpl=cpl)
120
121
/usr/local/lib/python2.7/dist-packages/django/contrib/gis/gdal/error.pyc in check_err(code, cpl)
71 elif code in err_dict:
72 e, msg = err_dict[code]
---> 73 raise e(msg)
74 else:
75 raise GDALException('Unknown error code: "%s"' % code)
GDALException: OGR failure.

This is the one_sq_km_aoi:
{
"type": "Polygon",
"coordinates": [
[
[
-75.27900695800781,
39.891925022904516
],
[
-75.26608943939209,
39.891925022904516
],
[
-75.26608943939209,
39.90173657727282
],
[
-75.27900695800781,
39.90173657727282
],
[
-75.27900695800781,
39.891925022904516
]
]
]
}

This is the hundred_sq_km_aoi:
{
"type": "Polygon",
"coordinates": [
[
[
-94.64584350585938,
38.96154447940714
],
[
-94.53460693359374,
38.96154447940714
],
[
-94.53460693359374,
39.05225165582583
],
[
-94.64584350585938,
39.05225165582583
],
[
-94.64584350585938,
38.96154447940714
]
]
]
}
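As a sanity check on those geometries, a rough equirectangular approximation (plain Python, no GEOS involved) confirms the two AOIs match their names, so the failure really does only hit the larger one:

```python
import math

def approx_area_km2(ring):
    """Approximate the area of a small lon/lat bounding-box ring in km^2,
    using an equirectangular projection about the ring's mean latitude.
    Good enough at this scale; not a substitute for a real projection."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    mean_lat = math.radians(sum(lats) / len(lats))
    km_per_deg_lat = 111.32
    km_per_deg_lon = 111.32 * math.cos(mean_lat)
    width = (max(lons) - min(lons)) * km_per_deg_lon
    height = (max(lats) - min(lats)) * km_per_deg_lat
    return width * height

one_sq_km = [[-75.27900695800781, 39.891925022904516],
             [-75.26608943939209, 39.891925022904516],
             [-75.26608943939209, 39.90173657727282],
             [-75.27900695800781, 39.90173657727282],
             [-75.27900695800781, 39.891925022904516]]

hundred_sq_km = [[-94.64584350585938, 38.96154447940714],
                 [-94.53460693359374, 38.96154447940714],
                 [-94.53460693359374, 39.05225165582583],
                 [-94.64584350585938, 39.05225165582583],
                 [-94.64584350585938, 38.96154447940714]]

print(approx_area_km2(one_sq_km))      # roughly 1 km^2
print(approx_area_km2(hundred_sq_km))  # roughly 100 km^2
```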
We install docker-compose with pip for the Ansible module to work. Also, we must explicitly specify /usr/bin/python; otherwise, the first time the VM is built it uses /usr/bin/python3, and on subsequent runs it uses /usr/bin/python.
It's a big jump, but fortunately works without any modifications to other pieces of the infrastructure.
Previously this package was preinstalled in Ubuntu. As of Focal it is no longer there. It is needed for Ansible to use a non-super-user role, so we install it explicitly. See georchestra/ansible#55 (comment)
Needed since the base values are unable to pull NHD Catchment or High Resolution Stream sql files from S3.
Some tests fail because Django 1.11 cannot work with the more recent versions of proj4 (and consequently GEOS) available for Ubuntu 20.04. These tests must wait for the Django upgrade in #3419 to be re-enabled.
It is the same data as production, re-exported to exclude the ST_Force2D statements present in the original file, since those are no longer supported in PostgreSQL. The original file is still on S3.
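A rough sketch of how such a re-export could be scripted. The `strip_force2d` helper is hypothetical (not part of the actual migration) and assumes the simple, non-nested calls typical of a dump:

```python
import re

def strip_force2d(sql):
    """Remove simple ST_Force2D(...) wrappers from a SQL dump.

    Hypothetical helper: assumes the wrapped expression contains no
    nested parentheses; a dump with nested calls would need a real parser.
    """
    return re.sub(r'ST_Force2D\(([^()]*)\)', r'\1', sql)

print(strip_force2d("SELECT ST_Force2D(geom) FROM boundary_county;"))
# → SELECT geom FROM boundary_county;
```

In practice the safer route is the one taken here: re-export the data from a database where the function still resolves, so the new dump simply never contains the calls.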
I was able to follow your testing instructions in their entirety. They were very clear and I didn't run into any obstacles that weren't specific to my unique development workflow.
One of these obstacles is worth surfacing, though, because it will impact others' ability to spin up Model My Watershed. We're still using functionality that was deprecated in Ansible 2.9. I have Ansible 2.11 installed on my host and I don't have a great way (yet, 🤞 for Nix Shell) to install an older version of Ansible for a single project. I ended up adjusting the Vagrantfile to use the ansible_local provisioner.
The diff for that is here. Let me know if you're comfortable with me pushing it to your branch.
diff --git a/Vagrantfile b/Vagrantfile
index bc796c95..e65804bd 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -3,6 +3,11 @@
Vagrant.require_version ">= 2.2"
+# We need to stay on Ansible 2.8 because the version_compare filter was removed
+# in 2.9.
+# https://github.com/ansible/ansible/issues/64174#issuecomment-548639160
+ANSIBLE_VERSION = "2.8.*"
+
if ["up", "provision", "status"].include?(ARGV.first)
require_relative "vagrant/ansible_galaxy_helper"
@@ -56,8 +61,14 @@ Vagrant.configure("2") do |config|
v.cpus = 4
end
- services.vm.provision "ansible" do |ansible|
+ services.vm.provision "ansible_local" do |ansible|
ansible.compatibility_mode = "2.0"
+ ansible.install_mode = "pip_args_only"
+ # We can't use Python 3 yet because the provisioning process fails on
+ # "Create PostgreSQL super user." Failed to import the required Python
+ # library (psycopg2) on services's Python /usr/bin/python3.
+ ansible.pip_install_cmd = "curl https://bootstrap.pypa.io/pip/2.7/get-pip.py | sudo python"
+ ansible.pip_args = "ansible==#{ANSIBLE_VERSION}"
ansible.playbook = "deployment/ansible/services.yml"
ansible.groups = ANSIBLE_GROUPS.merge(ANSIBLE_ENV_GROUPS)
ansible.raw_arguments = ["--timeout=60"]
@@ -96,8 +107,11 @@ Vagrant.configure("2") do |config|
v.cpus = 2
end
- worker.vm.provision "ansible" do |ansible|
+ worker.vm.provision "ansible_local" do |ansible|
ansible.compatibility_mode = "2.0"
+ ansible.install_mode = "pip_args_only"
+ ansible.pip_install_cmd = "curl https://bootstrap.pypa.io/pip/2.7/get-pip.py | sudo python"
+ ansible.pip_args = "ansible==#{ANSIBLE_VERSION}"
ansible.playbook = "deployment/ansible/workers.yml"
ansible.groups = ANSIBLE_GROUPS.merge(ANSIBLE_ENV_GROUPS)
ansible.raw_arguments = ["--timeout=60"]
@@ -136,8 +150,11 @@ Vagrant.configure("2") do |config|
v.memory = 2048
end
- app.vm.provision "ansible" do |ansible|
+ app.vm.provision "ansible_local" do |ansible|
ansible.compatibility_mode = "2.0"
+ ansible.install_mode = "pip_args_only"
+ ansible.pip_install_cmd = "curl https://bootstrap.pypa.io/pip/2.7/get-pip.py | sudo python"
+ ansible.pip_args = "ansible==#{ANSIBLE_VERSION}"
ansible.playbook = "deployment/ansible/app-servers.yml"
ansible.groups = ANSIBLE_GROUPS.merge(ANSIBLE_ENV_GROUPS)
ansible.raw_arguments = ["--timeout=60"]
@@ -160,8 +177,11 @@ Vagrant.configure("2") do |config|
v.memory = 1024
end
- tiler.vm.provision "ansible" do |ansible|
+ tiler.vm.provision "ansible_local" do |ansible|
ansible.compatibility_mode = "2.0"
+ ansible.install_mode = "pip_args_only"
+ ansible.pip_install_cmd = "curl https://bootstrap.pypa.io/pip/2.7/get-pip.py | sudo python"
+ ansible.pip_args = "ansible==#{ANSIBLE_VERSION}"
ansible.playbook = "deployment/ansible/tile-servers.yml"
ansible.groups = ANSIBLE_GROUPS.merge(ANSIBLE_ENV_GROUPS)
ansible.raw_arguments = ["--timeout=60"]
Because I was using ansible_local, I never ran into a problem with vagrant up services. I think this is because I am specifying a Python 2 interpreter from the start. My host's Ansible is set up to use Python 3.9.7.
Also, just a point of feedback. There were a couple of times when I was reading your PR that I asked myself, "Why did Terrence do this?" Then, I saw your thoughtful commit messages and was able to answer my questions. However, I had to keep a list of your commits open alongside the Files changed diff in another window. Consider propagating some of the details you include in commit messages into inline comments on the pull request.
Hector's blog post on this is worth a read.
deployment/ansible/roles/model-my-watershed.app/tasks/dependencies.yml
- name: Hack Python 2.7 to work with Ubuntu Focal
  replace:
    path: /usr/local/lib/python2.7/dist-packages/django/contrib/gis/geos/libgeos.py
    regexp: geos_version\(\)\.decode\(\)
    replace: geos_version().strip().decode()
deployment/ansible/roles/model-my-watershed.docker/tasks/main.yml
To better allow developers with different versions of Ansible installed on their hosts to run this project.
Switched to ansible_local.
They would not work in the old format after the ansible_local switch.
Saw the same failures as CI in my local environment; added 3577f50 to fix those. Hopefully CI succeeds now 🤞
There were other, non-stopped Docker containers using port 5432 on Jenkins, causing port conflicts. Stopped them manually, retriggered the PR builder.
If I log in to CI as Jenkins and run the script that the PR builder runs, it succeeds:
but Jenkins continues to fail. I'm not sure what is going on there, but this PR is ready for another look.
Thanks to @rbreslow for fixing CI. There were additional environment variables that for some reason were causing the build to fail. Recording them here for posterity:
I looked at #3113 for inspiration on how to support Focal in AWS, and found these changes to be made:

diff --git a/deployment/cfn/data_plane.py b/deployment/cfn/data_plane.py
index e7aec071..b4a7d426 100644
--- a/deployment/cfn/data_plane.py
+++ b/deployment/cfn/data_plane.py
@@ -195,7 +195,7 @@ class DataPlane(StackNode):
bastion_ami_id = self.get_input('BastionHostAMI')
except MKUnresolvableInputError:
filters = {'name':
- 'ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*',
+ 'ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*',
'architecture': 'x86_64',
'block-device-mapping.volume-type': 'gp2',
'root-device-type': 'ebs',
diff --git a/deployment/packer/driver.py b/deployment/packer/driver.py
index c477964f..8505a31e 100644
--- a/deployment/packer/driver.py
+++ b/deployment/packer/driver.py
@@ -18,7 +18,7 @@ LOGGER.setLevel(logging.INFO)
def get_recent_ubuntu_ami(region, aws_profile):
"""Gets AMI ID for current release in region"""
filters = {
- 'name': 'ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*',
+ 'name': 'ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*',
'architecture': 'x86_64',
'root-device-type': 'ebs',
        'virtualization-type': 'hvm',

Would this be enough? If so, this can be included in this PR. If it requires more advanced work, it may be better to defer to a follow-up.
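Those filters feed get_recent_ubuntu_ami, which then boils down to "pick the newest matching image." A sketch of that selection step in pure Python; the image records below are fabricated examples shaped like an EC2 describe_images response, not real AMIs:

```python
# Pick the most recent Ubuntu Focal AMI from a list of image records,
# as get_recent_ubuntu_ami would after applying the name filter.
# The records are made-up examples mimicking describe_images output.
images = [
    {"ImageId": "ami-aaa",
     "Name": "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210315",
     "CreationDate": "2021-03-15T00:00:00.000Z"},
    {"ImageId": "ami-bbb",
     "Name": "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210720",
     "CreationDate": "2021-07-20T00:00:00.000Z"},
]

# ISO-8601 timestamps sort lexicographically, so max() on the
# CreationDate string picks the newest image.
latest = max(images, key=lambda img: img["CreationDate"])
print(latest["ImageId"])  # → ami-bbb
```

If the name-filter change is all the driver needs, swapping xenial for focal in the two files above should be sufficient.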
👍
I will open separate issues for upgrading to Ubuntu 20.04 on AWS and resolving / understanding the issue with CI and those IP addresses.
Overview

Upgraded development environment to Ubuntu 20.04 Focal, PostgreSQL to 12, and PostGIS to 3.1.

Connects #3416

TODO

- The boundary_county.sql.gz dataset contains ST_Force2D directives that have been deprecated for some time and are no longer supported in PostgreSQL 12. That dataset should be re-exported in a PostgreSQL 12 compatible format.

Notes

Testing this will require a complete destruction and recreation of the existing development environment setup, and >100GB of free space.

Testing Instructions

- Destroy the existing environment: $ vagrant destroy -f
- Bring up the services VM: $ vagrant up services
  - The postgresql_ commands run off the python detected at the start of provisioning, which is python3 on Focal; but we're installing python2 and symlinking it to python during our provisioning, which confuses the task. Since we'll be upgrading to Python 3 shortly in Upgrade to Python 3 #3165, rather than find a more elegant solution here, we just re-run the provisioner, which succeeds the second time: $ vagrant reload services --provision
- Bring up the remaining VMs: $ vagrant up app worker tiler
- Run migrations: $ ./scripts/manage.sh migrate
- Run the setupdb.sh script locally: $ vagrant ssh app -c 'cd /vagrant && ./scripts/aws/setupdb.sh -bsSdmpcq'

At this time, your local MMW setup should be complete. We can now test the application functionality.