Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoordinateSequence: create using external buffer #747

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dbaston
Copy link
Member

@dbaston dbaston commented Nov 30, 2022

This PR adds methods to create a CoordinateSequence backed by an external buffer. Currently this can only be done for XYZ or XYZM sequences until GEOS algorithms are made "2d-safe."

@dbaston
Copy link
Member Author

dbaston commented Nov 30, 2022

image

Overall this change causes no performance regression, and a small gain in a few cases.

@nyalldawson
Copy link

Nice! Would you consider an equivalent for x/y/... stored in seperate buffers? (Ie the qgis model)

@pramsey
Copy link
Member

pramsey commented Nov 30, 2022

Not really... the only reason this works is because the external buffer is exactly the same shape as the internal buffer.

@dr-jts
Copy link
Contributor

dr-jts commented Nov 30, 2022

Currently this can only be done for XYZ or XYZM sequences until GEOS algorithms are made "2d-safe."

What needs to be done to make things "2d-safe"?

@dbaston
Copy link
Member Author

dbaston commented Nov 30, 2022

What needs to be done to make things "2d-safe"?

Need to stop reading 3D coordinates (Coordinate) by reference without checking whether the sequence is actually 3D or 4D. Not a small task!

@pramsey
Copy link
Member

pramsey commented Nov 30, 2022

Can we mostly start reading CoordinateXY by reference instead?

@nyalldawson
Copy link

@pramsey

Not really... the only reason this works is because the external buffer is exactly the same shape as the internal buffer.

Fair enough -- this might be motivation enough for me to rework how QGIS internally stores these arrays!

@dbaston
Copy link
Member Author

dbaston commented Nov 30, 2022

Can we mostly start reading CoordinateXY by reference instead?

It's always safe to read CoordinateXY&, though the assumption that we have a Coordinate& shows up in a surprising number of places.

Simplification and triangulation are good examples...they're 2D algorithms, but they also have this assumption that we're going to preserve z values from the input.

@dbaston
Copy link
Member Author

dbaston commented Feb 9, 2023

Here's another test I did:

  • Removed the part of this PR that copies an external XY buffer to XYZ. This should give best case performance but could crash if the sequence is used in parts of the code that assume 3d coordinates.
  • Updated PostGIS so that GEOS coordinate sequences reference the memory in a POINTARRAY with no copies
  • Ran ST_IsValid on all polygons in 10 copies of the GADM country boundaries dataset (328 million vertices total), with STORAGE EXTERNAL

When geometries are copied into GEOS, this takes 100.58 seconds. When geometries are created with no coordinate copies, this takes 100.55 seconds. Not a big win for zero-copy!

A second test of a self-join with ST_Intersects takes 80 seconds when coordinates are copied and 74 seconds when the coordinates are reference. Maybe a 7.5% gain is worthwhile?

@dbaston
Copy link
Member Author

dbaston commented Feb 9, 2023

About a 10% gain on ST_Centroid. I have never been too worried about ST_Centroid as a bottleneck, but maybe better performance for very fast functions could still be useful for clients like GDAL that don't have all of the PostGIS-style short circuits. Calling GEOSIntersection on geometries that are likely to be disjoint, for example. (Although GEOSIntersection currently requires padded inputs)

To safely apply these changes, we'd need to have C API functions that require padded coordinates check their inputs and automatically copy into a new padded coordinate sequence. (This is easy to do with something like seq->reserve(seq->size()).)

@pramsey
Copy link
Member

pramsey commented Feb 10, 2023

As a PostGIS guy, this leaves me cold. The only win is for use cases where they are wrapping and using things like Area() and Length() and Distance() frequently, which... yeah it happens, but as your testing shows, for the "that's too hard to write myself, I need GEOS to do it" use cases, there's no win. I'd vote for GEOS simplicity and letting people write their own easy calculations. And honestly, of course the Rust guys are going to write Rust implementations of Area and Length and Distance! Purity! We saw the same thing with Turf.js and JSTS.

@@ -2034,7 +2042,24 @@ extern void GEOS_DLL GEOSFree(void *buffer);
extern GEOSCoordSequence GEOS_DLL *GEOSCoordSeq_create(unsigned int size, unsigned int dims);

/**
<<<<<<< HEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge conflict markers

extern GEOSCoordSequence GEOS_DLL *GEOSCoordSeq_createFromBuffer_r(
GEOSContextHandle_t handle,
double* buf,
unsigned int size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't that be size_t ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the other coordinate sequence functions did not use unsigned int

}

T* m_buf;
std::uint32_t m_capacity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use size_t for m_capacity and m_size ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C API uses 32 bits for sizes and incides, I'm doubtful a coordinate sequence with > 1 billion points would function in practice, and I'm trying to be sensitive to the object size of Point, whose API requires that it be backed by a CoordinateSequence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants