Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark geom_map() as superseded #3721

Closed
yutannihilation opened this issue Jan 10, 2020 · 48 comments · Fixed by #4865
Closed

Mark geom_map() as superseded #3721

yutannihilation opened this issue Jan 10, 2020 · 48 comments · Fixed by #4865
Assignees
Milestone

Comments

@yutannihilation
Copy link
Member

yutannihilation commented Jan 10, 2020

I believe we are already in the mood of moving away from geom_map() as geom_sf() almost superseded it. But, AFAIK, this hasn't been discussed formally so far. Considering we keep developing many codes around geom_sf(), I think it's good to mark geom_map() soft-deprecated before we face difficulties about the different behaviors between them.

Here I file this issue to discuss what functionalities are missing from geom_sf() and its related functions, which might drive people to fall back to geom_map(). Any thoughts?

Edit: to be clear, I don't think we can remove geom_map() considering its wide spread usages (so I chose the word "soft-deprecated"). Still, I believe we can discourage the usage of geom_map() to prevent further confusions by improving geom_sf() or providing some alternatives.

@yutannihilation
Copy link
Member Author

One thing in my mind is map_data() for sf.

@microly
Copy link
Contributor

microly commented Jan 11, 2020

geom_map can accept data.frame object, but geom_sf has to use sf object.

Before deprecate geom_map, I think #3659 , #3655 and #3654 should be accomplished.

@yutannihilation
Copy link
Member Author

Agreed, those ongoing pull requests are definitely needed to replace geom_map(). Thanks.

@microly
Copy link
Contributor

microly commented Jan 11, 2020

Maybe, sf_data() is needed to transform data.frame to sf for geom_sf(), just as map_data() did for geom_map().

It can be another solution for #3659 , #3655 and #3654.

As I mentioned, #3655 (comment), it is a issue about data, so just handle the data.

@yutannihilation
Copy link
Member Author

sf_data() is needed to transform data.frame to sf for geom_sf()

Isn't this can be done with st_as_sf(coords = c("x", "y"))...?

@microly
Copy link
Contributor

microly commented Jan 11, 2020

1.st_as_sf() can only output point(s), if the input is a data.frame.
However, the users may want to draw a line or a polygon.

2.Sf indeed supply more constructor, such as st_linestring and st_polygon, to create a line or polygon.
But it is too complicated for ggplot2 users, I think most of them are not familiar with the internal structure of sf geometry.

3.So, a light weighted and user friendly sf_data() would be needed.
It can only handle point, line, polygon and polygon with hole, this is enough for ggplot2 users because the geometry (points) and multiple geometry (multipoint) look the same in the plot.

it may be sth like:

sf_data(data, coords = c("x", "y"), crs = 4326, group = NULL, subgroup = NULL, type = c("point", "line", "polygon"))

data, a data.frame
coords, just like the coords for st_as_sf
crs, coordinate reference system
group, for creating a line or polygon
subgroup, for the hole in the polygon, just like geom_polygon
type, sf geometry type

This is a tentative suggestion, please forgive me if it is wrong~

@clauswilke
Copy link
Member

@microly I agree with your general thoughts on this, but I'd like to propose a different approach: How about a geom_map_sf() that takes exactly the same inputs at geom_map() but then converts everything into the sf framework?

@yutannihilation
Copy link
Member Author

1.st_as_sf() can only output point(s), if the input is a data.frame.
However, the users may want to draw a line or a polygon.

Ah, makes sense. But, I'm a bit skeptical about implementing it in ggplot2. IIUC, in the pre-sf days, we needed to define ggplot2's own convention about the map data and provide the utilities (like various fortify() methods) for this. So, I feel something like sf_data() should belong to sf (or some sf extension package).

How about a geom_map_sf() that takes exactly the same inputs at geom_map() but then converts everything into the sf framework?

Sounds good. But, isn't it possible to convert data.frame transparently in Layer$setup_layer() (when some option like convert = TRUE is specified) so that we don't need another geom?

@microly
Copy link
Contributor

microly commented Jan 12, 2020

@clauswilke
Wonderful idea!

@yutannihilation
I think it could be done in the stat step.

@yutannihilation
Copy link
Member Author

yutannihilation commented Jan 12, 2020

I feel the geometry column should be determined before stat. Probably it also can be done in the stat step, but setup_layer() seems the better place to me.

ggplot2/R/layer-sf.R

Lines 29 to 37 in 0183011

# automatically determine the name of the geometry column
# and add the mapping if it doesn't exist
if ((isTRUE(self$inherit.aes) && is.null(self$mapping$geometry) && is.null(plot$mapping$geometry)) ||
(!isTRUE(self$inherit.aes) && is.null(self$mapping$geometry))) {
if (is_sf(data)) {
geometry_col <- attr(data, "sf_column")
self$mapping$geometry <- as.name(geometry_col)
}
}

@microly
Copy link
Contributor

microly commented Jan 12, 2020

@yutannihilation
sorry, I misunderstand your comment.

Another geom seems to be needed.
For short, geom_sf() is for sf object, geom_map_sf() is for data.frame.
This would be more clear and user-friendly.

As I see, a lot of ggplot2 users do not have enough knowlege about spatial data object, such as sf.

@microly
Copy link
Contributor

microly commented Jan 12, 2020

Just like the author of #3717, a lot of ggplot2 users know few about sf.
They just want to throw a data.frame into ggplot2 and get a map as easily as possible.
Then, geom_map_sf() fill the bill.

@clauswilke
Copy link
Member

@yutannihilation When you say "so we don't need another geom", do you mean we just modify geom_map() so it works with coord_sf(), either as default or when a specific parameter is set? This sounds reasonable to me. The current geom_map() code is only a thin wrapper around geom_polygon() anyways, so a more general geom_map() would simply provide two alternative wrappers.

@microly
Copy link
Contributor

microly commented Jan 12, 2020

@clauswilke
Agreed!

We can just refactor the geom_map().

@yutannihilation
Copy link
Member Author

@microly
I see, I agree with you to some extent. Having different geoms for sf and for data.frame sounds good.

@clauswilke
I meant to say about extending geom_sf() to accept data.frame with conversion, but modifying geom_map() sounds reasonable to me.

@clauswilke
Copy link
Member

I meant to say about extending geom_sf() to accept data.frame with conversion, but modifying geom_map() sounds reasonable to me.

Yeah, I don't think modifying geom_sf() makes sense, for two reasons:

  1. We'd have to add a map argument, which is ugly and goes against the entire concept of sf.

  2. Currently, geom_map() doesn't modify position scales. If we want to keep this behavior, then we'd have to give geom_sf() a special argument to ignore position data, and again that seems rather ugly. Better to leave the ugliness in the existing geom_map().

@yutannihilation
Copy link
Member Author

Let me clarify. One of the reasons I opened this issue was you suggested discouraging use of geom_map() in favor of geom_sf() in #3454 (comment). I agreed your comment in that geom_map() is seemingly simple, but the user might face the unexpected behaviour like this due to the mismatch with ggplot2's own convention about map data. Do you think the interface of geom_map() is still useful?

@clauswilke
Copy link
Member

A lot of code out in the wild uses geom_map() and geom_polygon() to draw maps. See e.g. Kieran Healy's book, which was only published last year: http://socviz.co/maps.html#maps
I think it would be good if we could provide a framework for people where they can stick to their existing such code as much as possible but at the same time transition to sf. However, now looking at the book, I see that the interfaces to geom_polygon() and geom_map() are very different, even though their relative purpose is essentially the same, except one trains position scales and the other does not, so everything is a mess already.

In conclusion: I'm not sure what the best approach is.

Maybe provide a geom that can take the same input data as geom_polygon but converts it to sf and that can be told to either train or not train position scales?

@microly
Copy link
Contributor

microly commented Jan 12, 2020

I think geom_map() is still useful for the users who know few about sf and want to draw a map using data.frame as succinctly as possible.
Also, some users want to lay a map layer using data.frame on the geom_sf() layer.

In sum, geom_map() is useful in the sense that users may want to draw a map layer using data.frame.

But geom_map() need a deep refactoring, such as adding a StatMap.

@clauswilke
Copy link
Member

How about this:

  • geom_map(): Behaves like current geom_map(), but by default converts to sf. Has a switch that turns off conversion, thus providing backwards compatibility for existing code.

  • geom_polygon_map(): Takes the same input data as current geom_polygon(), but then converts the data to sf and basically behaves like geom_sf().

@microly
Copy link
Contributor

microly commented Jan 12, 2020

This indicated that more geoms should be added, such as geom_point_map() and geom_line_map() to darw the map layer of point(s) and line(s).

It seems not so good...

@microly
Copy link
Contributor

microly commented Jan 12, 2020

In sum, there are 2 problems:

  1. data.frame has to add a crs information.
  2. how to handle different geometry types, at least four of these types: point, line, polygon, polygon with hole.

line and polygon need a identifier (eg. id or group) other than xy coords.
polygon with hole need a more identifier (eg. subgroup).

@clauswilke
Copy link
Member

We may be going round in circles now: I think I have to finish up #3659 first, as it will resolve a lot of these issues. For example, annotation_map() just works in an sf context with this PR. Similarly, geom_polygon() and geom_map() will just work.

@yutannihilation
Copy link
Member Author

Thanks. geom_polygon_map() sounds good, but let's wait for #3659 before talking about data.frame-version of geom_sf() (sorry that I haven't contributed the discussion there so far...).

However, now looking at the book, I see that the interfaces to geom_polygon() and geom_map() are very different, even though their relative purpose is essentially the same, except one trains position scales and the other does not, so everything is a mess already.

I see... I don't think we can remove geom_map() in the foreseeable future.

@yutannihilation
Copy link
Member Author

yutannihilation commented Jan 12, 2020

Btw, apart from the discussion about data.frame, I was wondering about if there's any use cases of geom_map() to plot sp data that cannot be converted to sf by st_as_sf() but can be to data.frame by fortify().

@microly
Copy link
Contributor

microly commented Jan 12, 2020

How about create a new stat?

For drawing points:
ggplot() + stat_map(aes(x = col_x, y = col_y), data = df, crs = 4326, geom = "point")

For drawing lines:
ggplot() + stat_map(aes(x = col_x, y = col_y, group = col_group), data = df, crs = 4326, geom = "line")

For drawing polygons:
ggplot() + stat_map(aes(x = col_x, y = col_y, group = col_group), data = df, crs = 4326, geom = "polygon")

For drawing polygons with holes:
ggplot() + stat_map(aes(x = col_x, y = col_y, group = col_group, subgroup = col_subgroup), data = df, crs = 4326, geom = "polygon")

@microly
Copy link
Contributor

microly commented Jan 12, 2020

@yutannihilation

Sp can hold one or some kinds of wired spatial data structure, which cannot be represented by sf.
In this case, sp cannot be converted to sf by st_as_sf().

I saw this message somewhere, but I am sorry that I forget where and I do not know whether it is still true now.

@yutannihilation
Copy link
Member Author

Yeah, I vaguely know not all sp data can be represented by simple features. What I'm wondering is that there are any cases when such data can still be handled by fortify(). I expect these cases are few, though.

@microly
Copy link
Contributor

microly commented Jan 12, 2020

I believe that these cases are very very few.
This indicates the design of sp is not so strictly, but not a flaw of sf.

@clauswilke
Copy link
Member

Just a tentative suggestion:

For drawing points:
ggplot() + geom_map(aes(x = col_x, y = col_y), data = df, crs = 4326, geom = "point")

For drawing lines:
ggplot() + geom_map(aes(x = col_x, y = col_y, group = col_group), data = df, crs = 4326, geom = "line")

For drawing polygons:
ggplot() + geom_map(aes(x = col_x, y = col_y, group = col_group), data = df, crs = 4326, geom = "polygon")

For drawing polygons with holes:
ggplot() + geom_map(aes(x = col_x, y = col_y, group = col_group, subgroup = col_subgroup), data = df, crs = 4326, geom = "polygon")

None of this is necessary if we merge #3659, since all regular geoms work with it. I suggest you try it out and see if you run into any problems. It's basically working at this time, the only thing missing is being able to specify limits in coord_sf() in the default crs.

@microly
Copy link
Contributor

microly commented Jan 12, 2020

@clauswilke

Thanks for your reply.
I will try it out.
Maybe, a new stat, stat_map(), is better than a new edition of geom_map().

@hadley
Copy link
Member

hadley commented Mar 4, 2020

I would recommend superseding geom_map(), making it clear that we no longer recommend and it will only receive critical updates. Then it could be deprecated at some point further in the future.

@clauswilke
Copy link
Member

I agree this is the way to go. @hadley or @yutannihilation, could I get an approval for PR #3659, so we can move forward with this? I think merging this code early in the current development cycle is a good idea, as it gives us more opportunity to catch any issues I've overlooked.

@yutannihilation yutannihilation changed the title Soft-deprecate geom_map()? Mark geom_map() as superseded Mar 5, 2020
@yutannihilation
Copy link
Member Author

Thanks, makes sense. I changed the issue title.

@microly
Copy link
Contributor

microly commented Mar 5, 2020

@clauswilke

I have a small (if it is not foolish) question:

If I have several data.frames which are using different crs, how can I plot them on one plot through #3659?

@clauswilke
Copy link
Member

@microly Nothing fundamentally changes from before:

  • If you have different geometry columns with different crs they all get automatically transformed to one common crs.
  • If you have different regular data columns without associated crs you have to transform to one common crs before handing the data to ggplot.

@microly
Copy link
Contributor

microly commented Mar 11, 2020

@clauswilke

Thanks for your kind reply and sorry for this late reply!

My chief concern is the regular data.frame with different crs.
It seems not very convenient that users have to transform crs before ploting, especially for the users who know few about sf (or GIS).
They just want to throw the data.frame into ggplot2 and plot the data on a map.
After all, crs is the most difficult and complicated difference between regular data.frame and sf (or GIS data).
It would be better that ggplot2 takes care of the different crs of multiple regular data.frames just as it dose for sf:)

@thomasp85
Copy link
Member

If you have different data frames with different crs you are obligated to know a bit about spatial data processing — I don’t think that is too much to ask. Basically what you describe requires the user to:

  1. Understand what crs is
  2. Know that their data is in different crs and what those are
  3. Understand that this is a problem during plotting

At that point, sitting down and understanding the basics of sf becomes an imperative and I don’t think it makes sense to complicate the ggplot2 API for the users who do not wish to do that

@microly
Copy link
Contributor

microly commented Mar 12, 2020

I do not mean to change the ggplot2 API.

I think a new stat may be needed to solve this problem.
The stat can transform the data.frame (with its crs) to sf.
Three sf geometry types are enough: point, line, polygon (with holes).

#3659 is great, especially it can work with geom_mark_hull(), it is wonderful!

But there is a question:
If users have to transform the crs of data.frame before ploting, they can plot without #3659 .
All they need to do is that just transform the crs to the same as the sf of first layer.

As a GISer, I saw a lot of ggplot2 users that know few about sf (or GIS) and just want to plot one or some of data.frames onto a map.
Usually, the task is too urgent for them to learn sf.
Or they only need to draw very a few map, it is too expensive for them to learn sf.
This problem makes them very frustrated.
So I hope that ggplot2 can be convenience enough for them.

@microly
Copy link
Contributor

microly commented Mar 12, 2020

If users can noly throw the data.frame into ggplot2 and tell ggplot2 the crs, then ggplot2 can draw the data onto a map.
This will make them very happy:)

@microly
Copy link
Contributor

microly commented Mar 12, 2020

I saw this problem from a view of GISer, not a desinger of ggplot2, so please forgive me if my idea is wrong or even foolish.

@clauswilke
Copy link
Member

clauswilke commented Mar 12, 2020

Users will be able to transform their data frame with one function call, see this example:
#3659 (comment)

Also, @paleolimbot is working on additional stats that make this easier. Not all of them need to live in ggplot2 itself: https://cran.r-project.org/web/packages/ggspatial/index.html

@microly
Copy link
Contributor

microly commented Mar 12, 2020

I see~
Thank you:)

@yutannihilation yutannihilation added this to the ggplot2 3.4.0 milestone Jun 27, 2020
@yutannihilation
Copy link
Member Author

Now that #3659 is merged, let's add this to the milestones of the next release.

@clauswilke
Copy link
Member

I think it's more important to mark coord_map() and coord_quickmap() as deprecated and to show how geom_map() can be used with coord_sf(). (Look at the documentation for annotation_map() for an example.)

Specifically, here are a few proposed changes for the examples provided with geom_map():

# modified from `geom_map()` examples, part "Better example"
library(maps)
library(ggplot2)

crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests)

# Equivalent to crimes %>% tidyr::pivot_longer(Murder:Rape)
vars <- lapply(names(crimes)[-1], function(j) {
  data.frame(state = crimes$state, variable = j, value = crimes[[j]])
})
crimes_long <- do.call("rbind", vars)

states_map <- map_data("state")

# current example, leave in
ggplot(crimes, aes(map_id = state)) +
  geom_map(aes(fill = Murder), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat)

# current example, remove
last_plot() + coord_map()

# proposed new example
ggplot(crimes, aes(map_id = state)) +
  geom_map(aes(fill = Murder), map = states_map) +
  # crs = 102003 is an Albers equal area projection, see: https://epsg.io/102003
  coord_sf(crs = 102003, xlim = c(-125, -70), ylim = c(25, 52))

# current example, remove
ggplot(crimes_long, aes(map_id = state)) +
  geom_map(aes(fill = value), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  facet_wrap( ~ variable)

# proposed new example
ggplot(crimes_long, aes(map_id = state)) +
  geom_map(aes(fill = value), map = states_map) +
  coord_sf(crs = 102003, xlim = c(-125, -70), ylim = c(25, 52)) +
  facet_wrap( ~ variable)

Created on 2020-06-26 by the reprex package (v0.3.0)

@yutannihilation
Copy link
Member Author

Thanks, agreed. We need to review the related documentations. The new example looks good to me!

@thomasp85
Copy link
Member

@clauswilke are you interested in making a conclusion on this? As far as I understand we landed on simply updating the docs?

@clauswilke
Copy link
Member

@thomasp85 Yes, correct, update the documentation and the examples and otherwise leave everything as is. I can add it to my todo list.

@clauswilke clauswilke self-assigned this May 18, 2022
clauswilke added a commit to wilkelab/ggplot2_archive that referenced this issue Jun 7, 2022
clauswilke added a commit that referenced this issue Jun 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants