Incorrect plotting of exactly overlapping scatter with hue and hue_order #3728

Open
eloyvallinaes opened this issue Jul 12, 2024 · 3 comments

Comments

@eloyvallinaes

While working with sns.scatterplot for representing locations on a grid, I discovered an issue where using hue and hue_order produces an incorrect plot: markers that should be perfectly overlapping—they have identical (x, y) coordinates—are drawn at a small offset, such that the edge of one can be seen intersecting the other. Here's a minimal example that reproduces the issue with matplotlib 3.9.1 and seaborn 0.13.2:

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.DataFrame.from_dict({
    'x': [6.3, 6.3, 6.3, 6.3, 6.633333, 6.633333, 6.633333, 6.633333, 33.48, 33.48, 33.48, 33.48, 33.813333, 33.813333, 33.813333, 33.813333],
    'y': [-12.42, -12.42, -4.0, -4.0, -12.42, -12.42, -4.0, -4.0, -12.42, -12.42, -4.0, -4.0, -12.42, -12.42, -4.0, -4.0],
    'locid': ['loc1', 'loc1', 'loc1', 'loc1', 'loc2', 'loc2', 'loc2', 'loc2', 'loc1', 'loc1', 'loc1', 'loc1', 'loc2', 'loc2', 'loc2', 'loc2']
})

sns.scatterplot(
    data=df,
    x='x',
    y='y',
    marker="o",
    hue='locid',
    hue_order=['loc1'],
)
print('Pandas version: ', pd.__version__)  # 2.2.2
print('Matplotlib version: ', matplotlib.__version__)  # 3.9.1
print('Seaborn version: ', sns.__version__)  # 0.13.2

That code produces the following plot:
[image: bugPlot]
At each corner, the edge of the second marker clearly intersects the face of the first.

From my brief dive into this problem:

  1. As the example shows, a tall stack of overlapping markers is not required: there are only two points with the exact (6.3, -12.42) coordinates and the problem still appears.
  2. The issue is seaborn-specific. Using matplotlib's plt.scatter does yield a correct plot.
  3. Both hue and hue_order need to be used in order for the issue to appear. Slicing the data with df[df.locid == 'loc1'] makes a correct plot.
  4. The problem persists with marker='.', marker='s', marker='v' and marker='d', but not with marker='x'.
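The controls in points 2 and 3 can be sketched as follows. This is a minimal illustration, not the exact code used for the checks above: the data is trimmed to one pair of coincident points per locid, and the `Agg` backend and variable names are assumptions made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Trimmed version of the data above: one pair of coincident points per locid.
df = pd.DataFrame({
    "x": [6.3, 6.3, 6.633333, 6.633333],
    "y": [-12.42, -12.42, -12.42, -12.42],
    "locid": ["loc1", "loc1", "loc2", "loc2"],
})

# Control for point 3: drop the unwanted level before plotting.
loc1 = df[df["locid"] == "loc1"]

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(8, 3))

# Control for point 2: plain matplotlib on the sliced data.
mpl_points = ax_mpl.scatter(loc1["x"], loc1["y"], marker="o")

# seaborn on the sliced data, with hue but without hue_order.
sns.scatterplot(data=loc1, x="x", y="y", marker="o", hue="locid", ax=ax_sns)
sns_points = ax_sns.collections[0]  # the PathCollection holding the markers

# Number of markers each call actually put on its axes.
n_mpl = len(mpl_points.get_offsets())
n_sns = len(sns_points.get_offsets())
print(n_mpl, n_sns)
```

In both panels only the loc1 rows reach the axes, so no markers from the other level are present to show through.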
@mwaskom
Owner

mwaskom commented Jul 12, 2024

I don't think anything is wrong with the position things are plotted in here. Rather, using hue_order in scatterplot doesn't suppress the datapoints from the plot, but it does cause the facecolor to be null. So you're seeing the edges from the loc2 points. I could have sworn there was an issue about this already but couldn't find it quickly.

@mwaskom
Owner

mwaskom commented Jul 15, 2024

Oh, here it is: #3601

@eloyvallinaes
Author

Ah! You're right of course! 😄 Thanks!

I took a look at relational._ScatterPlotter.plot and thought some logic could be added to fix this problem, so here's a pull request #3730. I assumed the intended behaviour is to have transparent edges whenever hue_order has made a face transparent, while preserving whatever edgecolor (white is default) was passed to the plot method.

It's a bit of a patch but it covers all the use cases I could think of.
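For anyone who needs this behaviour before a fix lands, the same idea can be applied from user code. The sketch below is not the code in the pull request: `hide_edges_of_transparent_faces` is a hypothetical helper, and the scatter is built with plain matplotlib to mimic the state described above (white edges, some faces with zero alpha).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def hide_edges_of_transparent_faces(collection):
    """Set edge alpha to 0 for every marker whose face alpha is 0.

    Hypothetical helper for illustration; not part of seaborn or matplotlib.
    """
    faces = collection.get_facecolors()
    edges = collection.get_edgecolors()
    if len(faces) == 0 or len(edges) == 0:
        return  # no faces or no edges are drawn, so there is nothing to adjust
    n = max(len(faces), len(edges))
    # A single shared color (e.g. the white default edge) becomes one row per marker.
    faces = np.broadcast_to(faces, (n, 4))
    edges = np.broadcast_to(edges, (n, 4)).copy()
    edges[faces[:, 3] == 0, 3] = 0.0
    collection.set_edgecolors(edges)


fig, ax = plt.subplots()
points = ax.scatter(
    [6.3, 6.3, 6.633333, 6.633333],
    [-12.42, -12.42, -12.42, -12.42],
    edgecolors="w",
)
# Mimic the reported state: the last two markers get a fully transparent face.
points.set_facecolors(
    [(0.1, 0.4, 0.7, 1.0)] * 2 + [(0.0, 0.0, 0.0, 0.0)] * 2
)

hide_edges_of_transparent_faces(points)
edge_alpha = points.get_edgecolors()[:, 3]
print(edge_alpha)
```

With a seaborn plot, the helper would presumably be called on `ax.collections[0]` after `sns.scatterplot(...)`, on the assumption that the markers are the first collection on the axes.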
