Guidelines to choose the best parameters for bivariate analysis using Visium HD data #133

Open
Rafael-Silva-Oliveira opened this issue Aug 13, 2024 · 5 comments

Rafael-Silva-Oliveira commented Aug 13, 2024

Hello again!

I've been trying out the bivariate approach with the Visium HD data, and I've been testing some of the parameters, mainly bandwidth and max_neighbours:

li.ut.spatial_neighbors(
    adata,
    bandwidth=1000,
    cutoff=0.1,
    kernel="gaussian",
    set_diag=False,
    max_neighbours=500,
)
li.mt.bivariate(
    adata,
    layer="lognorm_counts",
    resource_name="consensus",  # NOTE: uses HUMAN gene symbols!
    local_name="cosine",  # Name of the function
    global_name="morans",  # Name global function
    n_perms=75,  # Number of permutations to calculate a p-value
    mask_negatives=False,  # Whether to mask LowLow/NegativeNegative interactions
    add_categories=True,  # Whether to add local categories to the results
    nz_prop=0.01,  # Minimum expr. proportion for ligands/receptors and their subunits
    use_raw=False,
    verbose=True,
)

And the results from the spatial plot look like this:

With max neighbors 500 and bandwidth 1000:
[image: output3]

Max neighbors 100 and bandwidth 250:
[image]

Both plots naturally show circular regions because of these settings, but I'd like to ask for some guidelines, given the following:

  • Each "bin" or "spot" seen in the picture is an actual cell (processed by Bin2Cell)
  • Each spot is assigned a cell type label (not proportion)

Given that these methods can take a while to produce results, are there any other factors I should consider adjusting so that things look a bit "smoother", like the tutorials in the LIANA+ documentation?

Would it be a good idea to try the Jaccard index, considering the actual labelled categories? Should I choose a max_neighbours of 1-2 instead, and a bandwidth of 50-100?

Thanks once again for the support :)

For reference:

[image: bandwidth]
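
A rough way to see why the enriched regions grow with the bandwidth: the sketch below assumes the Gaussian kernel has the common form w(d) = exp(-d^2 / (2 * bandwidth^2)) and that weights below `cutoff` are dropped. LIANA+'s exact implementation may differ, and the helper names `gaussian_weight` and `effective_radius` are hypothetical, not part of the LIANA+ API.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight for a distance given in the same units as bandwidth."""
    return math.exp(-(distance ** 2) / (2 * bandwidth ** 2))


def effective_radius(bandwidth, cutoff):
    """Distance at which the assumed Gaussian weight falls to `cutoff`."""
    return bandwidth * math.sqrt(-2 * math.log(cutoff))


# The two settings tried above, both with cutoff=0.1
for bw in (1000, 250):
    radius = effective_radius(bw, cutoff=0.1)
    print(f"bandwidth={bw}: neighbours kept up to ~{radius:.0f} coordinate units")
```

Under this assumption, the radius of influence is roughly 2.1 times the bandwidth at cutoff=0.1, so a bandwidth of 1000 reaches cells more than 2000 coordinate units away (unless max_neighbours truncates the neighbourhood first).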

Rafael-Silva-Oliveira added the "bug" and "help wanted" labels Aug 13, 2024
@dbdimitrov
Collaborator

Hi @Rafael-Silva-Oliveira,

I assume the bandwidth is being calculated in pixels, i.e. the x,y units stored in adata.obsm['spatial']? Pixels in Visium HD images should correspond to somewhere between 0.5 and 2 microns, I guess. If you can check this, you can calculate how many microns there are per pixel; a commonly used assumption for the diffusion distance is ~100 microns.

Or alternatively, you could set it to 10 or 20 cells.

Both cases are obviously oversimplifications, since diffusion depends on the ligand and you also have membrane-bound interactions. Also, this is gene expression and not protein, so, to me, the decision is a bit arbitrary and case-dependent.

Hope this helps :)
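
The conversion suggested above can be sketched as follows. The ~100 micron target and the 0.5 to 2 microns-per-pixel range come from the reply; the function name `microns_to_pixels` is hypothetical, and the real microns-per-pixel value has to be checked for your own dataset (especially after Bin2Cell processing).

```python
def microns_to_pixels(distance_um, microns_per_pixel):
    """Convert a physical distance in microns to pixel (coordinate) units."""
    if microns_per_pixel <= 0:
        raise ValueError("microns_per_pixel must be positive")
    return distance_um / microns_per_pixel


# Candidate bandwidths for a ~100 micron diffusion distance,
# across the range of pixel sizes mentioned above
for mpp in (0.5, 1.0, 2.0):
    bandwidth = microns_to_pixels(100, mpp)
    print(f"{mpp} microns/pixel -> bandwidth of {bandwidth:.0f} pixels")
```

The resulting value would then be passed as `bandwidth` to `li.ut.spatial_neighbors`, provided adata.obsm['spatial'] really is in pixel units.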

@Rafael-Silva-Oliveira
Author

Rafael-Silva-Oliveira commented Aug 13, 2024

Thank you for the swift reply once again!

Indeed, in the original Visium HD dataset these would be the coordinates of each bin, but since I've processed the data with Bin2Cell, the coordinates got "aggregated" in some way, so I'll have to confirm that :)

Just by following your suggestions, I got these plots, which look a bit more like what we'd want to see with this type of data!

I've also switched from cosine to the Jaccard index, as it might be better suited to categorical data, but I'll try cosine too.

[image]

Thanks again!

@Rafael-Silva-Oliveira
Author

Rafael-Silva-Oliveira commented Aug 14, 2024

Hello again! I don't think this would require opening a new issue, but whenever I run this part of the tutorial (the decoupleR component of the bivariate analysis using LIANA+):


# Estimate cosine similarity
li.mt.bivariate(
    mdata,
    x_mod="comps",
    y_mod="tf",
    local_name="cosine",
    interactions=interactions,
    mask_negatives=True,
    add_categories=True,
    x_use_raw=False,
    y_use_raw=False,
    nz_prop=0.01,
    xy_sep="<->",
    x_name="celltype",
    y_name="tf",
)


My terminal gets killed (I assume because it runs out of memory; there are no other warnings).

I have 1100 interactions, 22 cell types and 520k cells. I converted the cell types from string labels to a one-hot encoding: instead of "proportions", here 1 spot = 1 cell, so each cell has 1 for the cell type it was predicted as and 0 for all the others.

I tried reducing to just the top 5 highly variable TFs, but it still crashed.

Thanks again :)
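
For a sense of scale, here is a back-of-the-envelope estimate of what a single dense result array would cost at the sizes quoted above. Whether LIANA+ actually holds arrays of this shape densely in memory is an assumption that has not been checked against its code, and `dense_array_gib` is a hypothetical helper.

```python
def dense_array_gib(n_rows, n_cols, bytes_per_value=8):
    """Size in GiB of a dense n_rows x n_cols array (float64 by default)."""
    return n_rows * n_cols * bytes_per_value / 1024 ** 3


n_cells = 520_000
n_interactions = 1_100

float64_gib = dense_array_gib(n_cells, n_interactions)
float32_gib = dense_array_gib(n_cells, n_interactions, bytes_per_value=4)

print(f"one float64 array of local scores: {float64_gib:.2f} GiB")
print(f"one float32 array of local scores: {float32_gib:.2f} GiB")
```

If several arrays of this size exist at once (for example scores, categories and intermediate copies), the total could plausibly exceed the RAM of a typical workstation, which would be consistent with the process being killed without a warning.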

@dbdimitrov
Collaborator

Hi @Rafael-Silva-Oliveira,

You could try setting add_categories and mask_negatives to False. Perhaps this is what's causing the issue. If it is, I could have another look, as there might be a way to make it work on a laptop too.

Daniel

dbdimitrov reopened this Aug 15, 2024
@Rafael-Silva-Oliveira
Author

Hey Daniel, I tried that approach and it still crashed. I haven't checked the underlying code yet, but I can also have a look and see where it might be crashing.
