Ingest NHD High Resolution Streams, Add Tile Layer #3417

Merged
merged 2 commits into develop from tt/ingest-nhd-hires-streams on Aug 23, 2021

Conversation

rajadain (Member)

Overview

The full ingest process is documented in #3415 (comment). The process took several days, not just to figure out the data shape but to actually process it, because of the sheer volume. This is by far the largest volume of data we've added to MMW, and it may have non-trivial consequences for performance and hosting costs that will reveal themselves as this moves to staging and production.

The work here adds the NHD High Resolution stream data to the MMW database, and wires up the tiler to render it on the map. It does not switch our analyses and models to the new High Resolution data; those still use Medium Resolution, and will be switched over in future cards.

I was initially unable to provision the tiler VM locally; the only fix was to upgrade the NPM version on that VM. It seems to work well enough, and since the tiler is isolated from the app VM, this should be fine.

Connects #3415

Demo

[demo screenshot]

Notes

The compressed data is ~17GB, and in the database it is ~26GB:

SELECT pg_size_pretty(pg_total_relation_size('nhdflowlinehr'));

 pg_size_pretty 
----------------
 26 GB
(1 row)

There's a total of ~24.5M rows:

SELECT COUNT(*) FROM nhdflowlinehr;

  count   
----------
 24517604
(1 row)

I attempted to drop the extraneous columns in the table to see if that would reduce the size much, but it didn't. Most of the data is likely the stream geometries themselves, which we cannot elide.
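
To confirm that hunch, one could sum the stored size of the geometry values directly. A minimal sketch, assuming the geometry column is named geom (adjust for the actual schema):

-- Assumption: the geometry column is named "geom"
SELECT pg_size_pretty(SUM(pg_column_size(geom))) FROM nhdflowlinehr;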

About 2% of the streams do not have a value for stream_order or slope, which were obtained by joining with the Value Added Attributes table:

SELECT COUNT(*) FROM nhdflowlinehr WHERE stream_order IS NULL;

 count  
--------
 493431
(1 row)
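
For context, the join that pulled those attributes in looks roughly like this. A sketch only: the table and column names assume the standard NHDPlus HR schema (NHDPlusFlowlineVAA carries StreamOrde and Slope, keyed by NHDPlusID), and the actual ingest script may differ:

-- Sketch: assumes standard NHDPlus HR table/column names
UPDATE nhdflowlinehr f
SET stream_order = v.streamorde,
    slope = v.slope
FROM nhdplusflowlinevaa v
WHERE f.nhdplusid = v.nhdplusid;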

Testing Instructions

Before you begin, ensure you have ~80+ GB of free space on your host computer.

  • Import the new high resolution streams with:
    $ vagrant ssh app -c 'cd /vagrant && ./scripts/aws/setupdb.sh -S'
    This might take 30-60 minutes.
    • If this fails, try giving your services VM more resources:
      diff --git a/Vagrantfile b/Vagrantfile
      index 600fc14d..561d833c 100644
      --- a/Vagrantfile
      +++ b/Vagrantfile
      @@ -52,7 +52,8 @@ Vagrant.configure("2") do |config|
      
           services.vm.provider "virtualbox" do |v|
             v.customize ["guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold", 10000 ]
      -      v.memory = 2048
      +      v.memory = 6144
      +      v.cpus = 4
           end
      
           services.vm.provision "ansible" do |ansible|
    • Ensure it succeeds (see the sanity-check query after this list)
  • While the above is happening, reprovision your tiler:
    $ vagrant reload --provision tiler
    • Ensure it succeeds
  • Once the import is complete, go to http://localhost:8000/
  • Turn on the Continental US High Resolution Stream Network layer
    • Ensure it renders as expected
  • Turn on the Continental US Medium Resolution Stream Network layer
    • Ensure that still works as before
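
If you want a sanity check independent of the UI, this suggested query (not part of the official steps) should confirm the import:

-- Expect ~24.5M rows and ~26 GB if the import completed fully
SELECT COUNT(*) AS row_count,
       pg_size_pretty(pg_total_relation_size('nhdflowlinehr')) AS table_size
FROM nhdflowlinehr;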

@rajadain added the PA DEP (Funding Source: Pennsylvania Department of Environmental Protection) label on Aug 18, 2021
@rajadain requested a review from jwalgran on August 18, 2021 17:43
@@ -92,6 +92,90 @@
}
}

#nhdflowlinehr {
rajadain (Member Author)

These styles are identical to those for #nhdflowline

@jwalgran (Contributor)

An update: I set up a new set of Vagrant VMs on an external drive and increased the memory and CPU as suggested. Loading the high res data has been chugging on ALTER TABLE for a while now, a lot longer than the suggested "30-60 minutes":

+ curl -s https://s3.amazonaws.com/data.mmw.azavea.com/nhdflowlinehr.sql.gz
+ gunzip -q
+ psql --single-transaction
SET
SET
SET
SET
SET
 set_config
------------

(1 row)

SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
ALTER TABLE
ALTER SEQUENCE
ALTER TABLE
COPY 24517604
  setval
----------
 24517604
(1 row)

ALTER TABLE

The disk on my services VM is up to about 33GB.
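
For anyone wanting to confirm which statement the load is stuck on, one option (a suggestion, assuming psql access to the services VM's database; not something run in this thread):

-- Show currently running statements and their runtimes
SELECT pid, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle';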

[screenshot, 2021-08-22 8:30 PM]

I will let it go overnight.

@jwalgran (Contributor) left a comment

I was unable to see the data load complete in a reasonable amount of time when attempting to run the VM from my external spinning hard disk. The process did appear to be working, and the changes in the PR are mostly a reuse of existing functionality. 👍

@jwalgran assigned rajadain and unassigned himself on Aug 23, 2021
@rajadain (Member Author)

Thanks for taking a look. I'll merge this and we can get this on staging and evaluate there. (Although we may have to wait for #3416 before staging deployments work again.)

@rajadain merged commit b2aa875 into develop on Aug 23, 2021
@rajadain deleted the tt/ingest-nhd-hires-streams branch on August 23, 2021 16:05