This vignette outlines the steps of inference, analysis and visualization of cell-cell communication networks for spatial multiomics data using CellChat. We showcase CellChat’s application by applying it to a mouse spleen dataset generated by SPOTS, a high-throughput spatial transcriptomics and proteomics co-profiling technology.
CellChat requires gene expression, protein abundance and spatial location data of spots/cells as the user input. It models the probability of cell-cell communication by integrating gene expression and protein abundance with spatial distance and with prior knowledge of the interactions between signaling ligands, receptors and their cofactors.
Upon inferring the intercellular communication network, CellChat’s various functions can be used for further data exploration, analysis, and visualization.
ptm = Sys.time()
library(CellChat)
library(patchwork)
options(stringsAsFactors = FALSE)
When inferring spatially proximal cell-cell communication from spatial multiomics data, users should also provide the spatial coordinates/locations of spot/cell centroids. In addition, to filter out cell-cell communication beyond the maximum diffusion range of molecules (e.g., ~250 μm), CellChat needs to compute the cell centroid-to-centroid distance in micrometers. Therefore, for spatial technologies that only provide spatial coordinates in pixels, CellChat requires users to input the conversion factor and uses it to convert spatial coordinates from pixels to micrometers.
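The conversion and range filter described above can be sketched in base R. This is only an illustration of the idea, not CellChat's internal code: the coordinates, the `ratio` value and the variable names are made up, and the tolerance factor is not modeled here.

```r
# Toy spot centroids in pixels (made-up values, not from the dataset)
coords.px <- data.frame(x = c(0, 300, 300), y = c(0, 0, 400),
                        row.names = c("spot1", "spot2", "spot3"))
ratio <- 0.6              # hypothetical conversion factor (micrometers per pixel)
interaction.range <- 250  # maximum diffusion range in micrometers

# Pairwise centroid-to-centroid distances, converted from pixels to micrometers
d.um <- as.matrix(dist(coords.px)) * ratio

# Pairs of distinct spots whose distance is within the interaction range
within.range <- d.um <= interaction.range & d.um > 0

print(round(d.um, 1))
print(within.range)
```

In this toy layout, spot1 and spot3 are 300 micrometers apart and would therefore be excluded, while the other two pairs stay within range.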
CellChat requires four user inputs:
data.input (Gene expression and protein
abundance of spots/cells): genes/proteins should be in rows
with rownames and cells in columns with colnames. Normalized data (e.g.,
library-size normalization and then log-transformed with a pseudocount
of 1) is required as input for CellChat analysis.
meta (User assigned cell labels and samples
labels): a data frame (rows are cells with rownames) consisting
of cell information, which will be used for defining cell groups. A
column named samples should be provided for spatial
transcriptomics analysis, which is useful for analyzing cell-cell
communication by aggregating multiple samples/replicates. Of note, for
comparison analysis across different conditions, users still need to
create a CellChat object separately for each condition.
coordinates (Spatial coordinates of
spots/cells): a data frame in which each row gives the spatial
coordinates/locations of each cell/spot centroid.
spatial.factors (Spatial factors of spatial
distance): a data frame containing two distance factors,
ratio and tol, which depend on the spatial
transcriptomics technology (and the specific dataset). (i)
ratio: the conversion factor when converting spatial
coordinates from pixels or other units to micrometers (i.e., microns).
For example, setting ratio = 0.18 indicates that 1 pixel
equals 0.18 μm in the coordinates. (ii) tol: the tolerance
factor to increase the robustness when comparing the center-to-center
distance against the interaction.range. This can be half
of the cell/spot size in μm. Please check the
vignette
of FAQ on applying CellChat to spatially resolved transcriptomics
data for detailed explanations and settings for different spatial
transcriptomics technologies.
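As a rough illustration of the expected shape of data.input and meta, the following base R sketch normalizes a toy count matrix (library-size normalization followed by a log transform with a pseudocount of 1) and builds a matching meta data frame. The counts, the feature names, the labels and the scale factor of 10,000 are assumptions chosen for the example, not values from the dataset.

```r
# Toy count matrix: features in rows, spots in columns (made-up values)
counts <- matrix(c(10, 0, 5, 20,
                    0, 3, 5, 10,
                   90, 7, 0, 70),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 paste0("spot", 1:4)))

scale.factor <- 10000        # assumed scale factor for library-size normalization
lib.size <- colSums(counts)  # total counts per spot

# Divide each column by its library size, rescale, then apply log(1 + x)
data.norm <- log1p(sweep(counts, 2, lib.size, "/") * scale.factor)

# Meta data: one row per spot, with row names matching the column names above
meta.toy <- data.frame(labels = c("B-cells", "T-cells", "T-cells", "B-cells"),
                       samples = factor("sample1"),
                       row.names = colnames(data.norm))

print(round(data.norm, 2))
print(identical(rownames(meta.toy), colnames(data.norm)))
```

Checking that the row names of the meta data match the column names of the expression matrix is a cheap safeguard before creating the CellChat object.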
When inferring contact-dependent or juxtacrine signaling by setting
contact.dependent = TRUE in computeCommunProb,
and using L-R pairs from Cell-Cell Contact signaling
classified in CellChatDB$interaction$annotation, CellChat
requires one additional user input:
contact.range: a value giving the
interaction range (unit: microns) to restrict the contact-dependent
signaling. For spatial transcriptomics at single-cell resolution,
contact.range is approximately equal to the estimated cell
diameter (i.e., the cell center-to-center distance), which means that
contact-dependent or juxtacrine signaling can only happen when the two
cells are in contact with each other. Typically,
contact.range = 10, which is a typical human cell size.
However, for low-resolution spatial data such as 10X Visium, it should
be the cell center-to-center distance (i.e.,
contact.range = 100 for 10X Visium data). The function
computeCellDistance can compute the center-to-center
distance. Instead of providing contact.range, users may
alternatively provide a value of contact.knn.k, in order to
restrict the contact-dependent signaling within the k-nearest neighbors
(knn).
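To make the difference between a distance cutoff and a k-nearest-neighbor restriction concrete, here is a minimal base R sketch on four toy cells. It only illustrates the two ideas; the coordinates and variable names are hypothetical and this is not how CellChat implements contact.range or contact.knn.k internally.

```r
# Toy cell centroids in micrometers (made-up values)
coords.um <- matrix(c(0, 0,
                      10, 0,
                      0, 10,
                      100, 100),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(paste0("cell", 1:4), c("x", "y")))
d <- as.matrix(dist(coords.um))

# Option 1: distance cutoff, keep pairs of distinct cells within contact.range
contact.range <- 10
contact.by.range <- d <= contact.range & d > 0

# Option 2: k-nearest neighbors, keep the k closest other cells of each cell
k <- 2
knn.idx <- t(apply(d, 1, function(row) order(row)[2:(k + 1)]))  # position 1 is the cell itself

print(contact.by.range)
print(knn.idx)
```

Note how cell4, which is far from the others, has no partner under the distance cutoff but still receives k neighbors under the knn restriction.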
# Here we load the RNA and ADT (protein) data of the mouse spleen SPOTS dataset and the associated cell meta data
load("/Users/suoqinjin/Library/CloudStorage/OneDrive-Personal/works/CellChat/tutorial/data_mouse_spleen_RNA_ADT.RData")
# Prepare input data for CellChat analysis
data.list <- list(RNA = data.input.rna, ADT = data.input.adt)
res.multi <- preProcMultiomics(data.list, db = CellChatDB.mouse)
data.input <- res.multi$data.input
# define the meta data:
# a column named `samples` should be provided for spatial transcriptomics analysis, which is useful for analyzing cell-cell communication by aggregating multiple samples/replicates. Of note, for comparison analysis across different conditions, users still need to create a CellChat object separately for each condition.
meta = data.frame(labels = meta$annotations, samples = "sample1", row.names = colnames(data.input)) # manually create a dataframe consisting of the cell labels
meta$samples <- factor(meta$samples)
unique(meta$labels) # check the cell labels
#> [1] Mac-1 B-cells Mac-2 T-cells
#> Levels: Mac-1 Mac-2 B-cells T-cells
unique(meta$samples) # check the sample labels
#> [1] sample1
#> Levels: sample1
# load spatial transcriptomics information
# Spatial locations of spots from full (NOT high/low) resolution images are required. For 10X Visium, this information is in `tissue_positions.csv`.
spatial.locs <- read.csv("/Users/suoqinjin/Library/CloudStorage/OneDrive-Personal/works/CellChat/tutorial/spatial_imaging_data-mouse_spleen/tissue_positions_list.csv", header = FALSE, row.names = 1)
spatial.locs = spatial.locs[rownames(meta), c(4,5)]
# Spatial factors of spatial coordinates
# For 10X Visium, the conversion factor of converting spatial coordinates from Pixels to Micrometers can be computed as the ratio of the theoretical spot size (i.e., 65um) over the number of pixels that span the diameter of a theoretical spot size in the full-resolution image (i.e., 'spot_diameter_fullres' in pixels in the 'scalefactors_json.json' file).
# Of note, the 'spot_diameter_fullres' factor is different from the `spot` in Seurat object and thus users still need to get the value from the original json file.
scalefactors = jsonlite::fromJSON(txt = file.path("/Users/suoqinjin/Library/CloudStorage/OneDrive-Personal/works/CellChat/tutorial/spatial_imaging_data-mouse_spleen", 'scalefactors_json.json'))
spot.size = 65 # the theoretical spot size (um) in 10X Visium
conversion.factor = spot.size/scalefactors$spot_diameter_fullres
spatial.factors = data.frame(ratio = conversion.factor, tol = spot.size/2)
d.spatial <- computeCellDistance(coordinates = spatial.locs, ratio = spatial.factors$ratio, tol = spatial.factors$tol)
min(d.spatial[d.spatial!=0]) # this value should approximately equal 100um for 10X Visium data
#> [1] 98.89004
Users can create a new CellChat object from a data matrix or a
Seurat object. If the input is a Seurat object, the meta data in the
object will be used by default, and users must provide
group.by to define the cell groups, e.g., group.by = “ident”
for the default cell identities in the Seurat object.
NB: If users load a previously calculated CellChat object
(version < 2.1.0), please update the object via
updateCellChat.
cellchat <- createCellChat(object = data.input, meta = meta, group.by = "labels",
datatype = "spatial", coordinates = spatial.locs, spatial.factors = spatial.factors)
#> [1] "Create a CellChat object from a data matrix"
#> Create a CellChat object from spatial transcriptomics data...
#> Set cell identities for the new CellChat object
#> The cell groups used for CellChat analysis are Mac-1, Mac-2, B-cells, T-cells
cellchat
#> An object of class CellChat created from a single dataset
#> 3044 genes.
#> 2568 cells.
#> CellChat analysis of spatial data! The input spatial locations are
#> x_cent y_cent
#> AAACACCAATAACTGC-1 2033 928
#> AAACAGAGCGACTCCT-1 727 2180
#> AAACAGCTTTCAGAAG-1 1568 761
#> AAACAGGGTCTATATT-1 1684 828
#> AAACCGGGTAGGTACC-1 1539 1078
#> AAACCGTTCGTCCAGG-1 1830 1312
Before users can employ CellChat to infer cell-cell communication, they need to set the ligand-receptor interaction database and identify over-expressed ligands or receptors.
Our database CellChatDB is a manually curated database of literature-supported ligand-receptor interactions in both human and mouse. CellChatDB v2 contains ~3,300 validated molecular interactions, including ~40% secreted autocrine/paracrine signaling interactions, ~17% extracellular matrix (ECM)-receptor interactions, ~13% cell-cell contact interactions and ~30% non-protein signaling. Compared to CellChatDB v1, CellChatDB v2 adds more than 1000 protein and non-protein interactions such as metabolic and synaptic signaling. It should be noted that for molecules that are not directly related to genes measured in scRNA-seq, CellChat v2 estimates the expression of ligands and receptors using those molecules’ key mediators or enzymes for potential communication mediated by non-proteins.
CellChatDB v2 also adds functional annotations of ligand-receptor pairs, such as UniProtKB keywords (including biological process, molecular function, functional class, disease, etc.), subcellular location and relevance to neurotransmitters.
Users can update CellChatDB by adding their own curated ligand-receptor pairs. Please check the tutorial on updating the ligand-receptor interaction database CellChatDB.
When analyzing human samples, use the database
CellChatDB.human; when analyzing mouse
samples, use the database
CellChatDB.mouse. CellChatDB categorizes
ligand-receptor pairs into different types, including “Secreted
Signaling”, “ECM-Receptor”, “Cell-Cell Contact” and “Non-protein
Signaling”. By default, “Non-protein Signaling” is not used.
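The role of the annotation categories can be illustrated with a toy table in base R. The data frame below is a hypothetical stand-in for an interaction table with an `annotation` column; the pair names are invented and do not come from CellChatDB.

```r
# Hypothetical interaction table (invented pair names, real category labels)
toy.interaction <- data.frame(
  interaction_name = c("LIG1_REC1", "LIG2_REC2", "LIG3_REC3", "LIG4_REC4"),
  annotation = c("Secreted Signaling", "ECM-Receptor",
                 "Cell-Cell Contact", "Non-protein Signaling"),
  stringsAsFactors = FALSE
)

# Number of pairs per category
print(table(toy.interaction$annotation))

# Mimic the default behavior described above: drop non-protein signaling
toy.use <- toy.interaction[toy.interaction$annotation != "Non-protein Signaling", ]

# Pairs that would be treated as contact-dependent
contact.pairs <- toy.use$interaction_name[toy.use$annotation == "Cell-Cell Contact"]
print(contact.pairs)
```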
CellChatDB.use <- res.multi$db.use
# set the used database in the object
cellchat@DB <- CellChatDB.use
To infer the cell state-specific communications, CellChat identifies over-expressed ligands or receptors in one cell group and then identifies over-expressed ligand-receptor interactions if either the ligand or the receptor is over-expressed.
We also provide a function to project gene expression data onto a
protein-protein interaction (PPI) network. Specifically, a diffusion
process is used to smooth genes’ expression values based on their
neighbors defined in a high-confidence experimentally validated
protein-protein network. This function is useful when analyzing
single-cell data with shallow sequencing depth because the projection
reduces the dropout effects of signaling genes, in particular for
possible zero expression of subunits of ligands/receptors. One might be
concerned about possible artifacts introduced by this diffusion
process; however, it will only introduce very weak communications. By
default, CellChat uses the raw data (i.e.,
object@data.signaling) instead of the projected data. To
use the projected data, users should run the function
projectData before running computeCommunProb,
and then set raw.use = FALSE when running
computeCommunProb.
# subset the expression data of signaling genes for saving computation cost
cellchat <- subsetData(cellchat) # This step is necessary even if using the whole database
future::plan("multisession", workers = 4)
cellchat <- identifyOverExpressedGenes(cellchat)
cellchat <- identifyOverExpressedInteractions(cellchat, variable.both = FALSE)
#> The number of highly variable ligand-receptor pairs used for signaling inference is 43
# project gene expression data onto PPI (Optional: when running it, users should set `raw.use = FALSE` in the function `computeCommunProb()` in order to use the projected data)
# cellchat <- projectData(cellchat, PPI.mouse)
execution.time = Sys.time() - ptm
print(as.numeric(execution.time, units = "secs"))
#> [1] 16.54874
CellChat infers biologically significant cell-cell communication by assigning each interaction a probability value and performing a permutation test. CellChat models the probability of cell-cell communication by integrating gene expression with prior knowledge of the interactions between signaling ligands, receptors and their cofactors using the law of mass action.
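One common way to turn mass-action binding into a bounded score is a Hill-type saturation term. The base R sketch below shows such a term under stated assumptions: the function name `hill.prob`, the half-saturation constant `Kh = 0.5` and the input values are chosen for illustration, and cofactors, cell-group sizes and the spatial distance constraint are deliberately left out. It should not be read as CellChat's exact formula.

```r
# Simplified Hill-type score for one ligand-receptor pair.
# L, R: average expression of the ligand in the sender group and of the
#       receptor in the receiver group (assumed to be non-negative)
# Kh:   assumed half-saturation constant
hill.prob <- function(L, R, Kh = 0.5) {
  LR <- L * R  # mass-action term: product of ligand and receptor levels
  LR / (Kh + LR)
}

# No signal if either partner is absent
print(hill.prob(L = 0, R = 2))

# The score increases with expression but saturates below 1
print(hill.prob(L = c(0.5, 1, 2, 10), R = 1))
```

The saturation keeps highly expressed pairs from dominating the scale, while a zero level of either partner gives a score of exactly zero.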
CAUTION: The number of inferred ligand-receptor pairs clearly depends
on the method for calculating the average gene expression per
cell group. By default, CellChat uses a statistically robust
mean method called ‘trimean’, which produces fewer interactions than
other methods. However, we find that CellChat performs well at
predicting stronger interactions, which is very helpful for narrowing
down on interactions for further experimental validations. In
computeCommunProb, we provide an option for using other
methods, such as 5% and 10% truncated mean, for calculating the average
gene expression. Of note, ‘trimean’ approximates 25% truncated mean,
implying that the average gene expression is zero if the percent of
expressed cells in one group is less than 25%. To use 10% truncated
mean, users can set type = "truncatedMean" and
trim = 0.1. To determine a proper value of trim, CellChat
provides a function computeAveExpr, which can help to check
the average expression of signaling genes of interest, e.g.,
computeAveExpr(cellchat, features = c("CXCL12","CXCR4"), type = "truncatedMean", trim = 0.1).
Therefore, if well-known signaling pathways in the studied biological
process are not predicted, users can try truncatedMean with
lower values of trim to change the method for calculating
the average gene expression per cell group.
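The effect of the averaging method can be checked on a toy vector in base R. The sketch below uses Tukey's trimean, (Q1 + 2 × median + Q3) / 4, and the built-in `trim` argument of `mean`; the helper name `trimean` and the expression values are assumptions for illustration and are not taken from the CellChat source.

```r
# Tukey's trimean: weighted average of the quartiles and the median
trimean <- function(x) mean(quantile(x, probs = c(0.25, 0.5, 0.5, 0.75)))

# Toy gene: expressed (value 2) in only 4 of 20 cells of a group, i.e. 20%
expr <- c(rep(0, 16), rep(2, 4))

avg.trimean <- trimean(expr)             # 0: fewer than 25% of cells express the gene
avg.trunc10 <- mean(expr, trim = 0.1)    # 0.25: drops the 2 lowest and 2 highest values
avg.trunc05 <- mean(expr, trim = 0.05)   # trims less, so more expressing cells are kept

print(c(trimean = avg.trimean, trunc10 = avg.trunc10, trunc05 = avg.trunc05))
```

With the trimean this gene contributes nothing to the communication probability, whereas a 10% truncated mean keeps a small positive average, which is why lowering trim can recover weakly expressed pathways.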
To quickly examine the inference results, users can set
nboot = 20 in computeCommunProb. Then “pvalue
< 0.05” means none of the permutation results are larger than the
observed communication probability (with 20 permutations, the smallest
nonzero p-value is 1/20 = 0.05).
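The statement about nboot = 20 follows from how a permutation p-value is counted. The base R sketch below permutes group labels on toy data and uses the group mean as a stand-in statistic; the data, the statistic and the variable names are assumptions, and CellChat's actual test statistic is the communication probability rather than a simple mean.

```r
set.seed(1)
nboot <- 20

# Toy data: a gene expressed only in the 10 cells of group "A" (made-up values)
expr <- c(rep(5, 10), rep(0, 30))
labels <- rep(c("A", "B"), times = c(10, 30))

# Stand-in statistic: average expression in group "A"
stat <- function(lab) mean(expr[lab == "A"])

obs <- stat(labels)                             # observed value
perm <- replicate(nboot, stat(sample(labels)))  # values after shuffling the labels

# Fraction of permutations that reach or exceed the observed value
pval <- sum(perm >= obs) / nboot

# With nboot permutations, the p-value can only be a multiple of 1/nboot
possible.pvals <- (0:nboot) / nboot
print(pval)
print(head(possible.pvals, 3))
```

Because the only attainable value below 0.05 is 0, a quick run with 20 permutations is useful for screening but gives a coarse p-value; a larger nboot gives finer resolution.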
Users may need to adjust the parameter scale.distance
when working on data from other spatial transcriptomics technologies.
Please check the documentation in detail via
?computeCommunProb.
When inferring contact-dependent or juxtacrine signaling, users
should provide a value of contact.range and set
contact.dependent = TRUE. Briefly, users can set
contact.range = 10, which is a typical human cell size.
However, for low-resolution spatial data such as 10X Visium, it should
be the cell center-to-center distance (i.e.,
contact.range = 100 for 10X Visium data). Please check the
vignette
of FAQ on applying CellChat to spatially resolved transcriptomics
data for detailed explanations. In this example, we did not use the
L-R pairs from Cell-Cell Contact signaling, so we
could set contact.dependent = FALSE and
contact.range = NULL. As an illustration, however, we use the
following settings, which lead to the same results.
ptm = Sys.time()
cellchat <- computeCommunProb(cellchat, type = "truncatedMean", trim = 0.1,
distance.use = TRUE, interaction.range = 250, scale.distance = 0.01,
contact.dependent = TRUE, contact.range = 100)
#> truncatedMean is used for calculating the average gene expression per cell group.
#> [1] ">>> Run CellChat on spatial transcriptomics data using distances as constraints of the computed communication probability <<< [2025-04-16 16:54:38.343122]"
#> The input L-R pairs have both secreted signaling and contact-dependent signaling. Run CellChat in a contact-dependent manner for `Cell-Cell Contact` signaling, and in a diffusion manner based on the `interaction.range` for other L-R pairs.
#> [1] ">>> CellChat inference is done. Parameter values are stored in `object@options$parameter` <<< [2025-04-16 16:54:44.828905]"
Users can filter out the cell-cell communication if there are only a few cells in certain cell groups. By default, the minimum number of cells required in each cell group for cell-cell communication is 10.
cellchat <- filterCommunication(cellchat, min.cells = 10)
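Before filtering, it can help to see which groups fall below the threshold. The following base R sketch counts cells per group on made-up labels; the group sizes are invented for the example and do not reflect the spleen dataset.

```r
# Made-up cell labels: 25 Mac-1, 40 B-cells and 6 T-cells
labels <- factor(c(rep("Mac-1", 25), rep("B-cells", 40), rep("T-cells", 6)))
min.cells <- 10

# Number of cells per group
n.per.group <- table(labels)
print(n.per.group)

# Groups with fewer than min.cells cells, whose communications would be filtered out
small.groups <- names(n.per.group)[n.per.group < min.cells]
print(small.groups)
```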
We provide a function subsetCommunication to easily
access the inferred cell-cell communications of interest. For
example,
df.net <- subsetCommunication(cellchat) returns a
data frame consisting of all the inferred cell-cell communications at
the level of ligands/receptors. Set slot.name = "netP" to
access the inferred communications at the level of signaling
pathways.
df.net <- subsetCommunication(cellchat, sources.use = c(1,2), targets.use = c(4,5))
gives the inferred cell-cell communications sending from cell groups 1
and 2 to cell groups 4 and 5.
df.net <- subsetCommunication(cellchat, signaling = c("WNT", "TGFb"))
gives the inferred cell-cell communications mediated by signaling WNT
and TGFb.
CellChat computes the communication probability at the signaling pathway level by summarizing the communication probabilities of all ligand-receptor interactions associated with each signaling pathway.
NB: The inferred intercellular communication networks of each ligand-receptor pair and each signaling pathway are stored in the slots ‘net’ and ‘netP’, respectively.
cellchat <- computeCommunProbPathway(cellchat)
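Summarizing from ligand-receptor pairs to pathways can be pictured as a sum over the third dimension of a sender × receiver × pair array. The base R sketch below uses a toy array with made-up probabilities and a hypothetical pair-to-pathway mapping; it ignores significance filtering and is an illustration of the idea rather than CellChat's code.

```r
groups <- c("Mac-1", "T-cells")
lr.pairs <- c("LR1", "LR2", "LR3")

# Toy probabilities: sender x receiver x ligand-receptor pair (made-up values)
prob <- array(0, dim = c(2, 2, 3), dimnames = list(groups, groups, lr.pairs))
prob["Mac-1", "T-cells", "LR1"] <- 0.25
prob["Mac-1", "T-cells", "LR2"] <- 0.5
prob["T-cells", "Mac-1", "LR3"] <- 0.125

# Hypothetical mapping of each pair to a signaling pathway
pathway.of.lr <- c(LR1 = "PW1", LR2 = "PW1", LR3 = "PW2")
pathways <- unique(unname(pathway.of.lr))

# Sum the probabilities of all pairs that belong to the same pathway
prob.pathway <- array(0, dim = c(2, 2, length(pathways)),
                      dimnames = list(groups, groups, pathways))
for (pw in pathways) {
  prob.pathway[, , pw] <- apply(prob[, , pathway.of.lr == pw, drop = FALSE],
                                c(1, 2), sum)
}

print(prob.pathway)
```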
We can calculate the aggregated cell-cell communication network by
counting the number of links or summarizing the communication
probability. Users can also calculate the aggregated network among a
subset of cell groups by setting sources.use and
targets.use.
cellchat <- aggregateNet(cellchat)
execution.time = Sys.time() - ptm
print(as.numeric(execution.time, units = "secs"))
#> [1] 7.520704
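The two aggregation modes mentioned above, counting links and summing probabilities, can be sketched on a toy array in base R. The values and names below are made up, the significance threshold is ignored, and the subsetting at the end only mimics the idea behind sources.use and targets.use.

```r
groups <- c("Mac-1", "B-cells", "T-cells")
lr.pairs <- c("LR1", "LR2", "LR3")

# Toy probabilities: sender x receiver x ligand-receptor pair (made-up values)
prob <- array(0, dim = c(3, 3, 3), dimnames = list(groups, groups, lr.pairs))
prob["Mac-1", "T-cells", "LR1"] <- 0.25
prob["Mac-1", "T-cells", "LR2"] <- 0.5
prob["B-cells", "Mac-1", "LR3"] <- 0.125

# Aggregated network: number of links and total interaction strength per pair of groups
count <- apply(prob > 0, c(1, 2), sum)
weight <- apply(prob, c(1, 2), sum)

# Restrict the aggregated network to chosen senders and receivers
sources.use <- "Mac-1"
targets.use <- c("B-cells", "T-cells")
weight.sub <- weight[sources.use, targets.use, drop = FALSE]

print(count)
print(weight)
print(weight.sub)
```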
Upon inferring the cell-cell communication network, CellChat provides
various functions for further data exploration, analysis, and
visualization. Here we only showcase the chord diagram and
the new spatial plot.
Visualization of cell-cell communication at different
levels: One can visualize the inferred communication network of
signaling pathways using netVisual_aggregate, and visualize
the inferred communication networks of individual L-R pairs associated
with that signaling pathway using netVisual_individual.
Here we take input of one signaling pathway as an example. All the
signaling pathways showing significant communications can be accessed by
cellchat@netP$pathways.
pathways.show <- c("IL16")
# Chord diagram
par(mfrow=c(1,1), xpd = TRUE) # `xpd = TRUE` should be added to show the title
netVisual_aggregate(cellchat, signaling = pathways.show, layout = "chord", scale = TRUE)
execution.time = Sys.time() - ptm
print(as.numeric(execution.time, units = "secs"))
#> [1] 7.658171
Compute and visualize the network centrality scores:
# Compute the network centrality scores
cellchat <- netAnalysis_computeCentrality(cellchat, slot.name = "netP") # the slot 'netP' means the inferred intercellular communication network of signaling pathways
#> Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
#> ('future_sapply-1') unexpectedly generated random numbers without declaring so.
#> There is a risk that those random numbers are not statistically sound and the
#> overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
#> ensures that proper, parallel-safe random numbers are produced via the
#> L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set
#> option 'future.rng.onMisuse' to "ignore".
# Visualize the computed centrality scores using heatmap, allowing ready identification of major signaling roles of cell groups
par(mfrow=c(1,1))
netAnalysis_signalingRole_network(cellchat, signaling = pathways.show, width = 4, height = 2.5, font.size = 10)
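The simplest centrality scores behind a signaling-role view are the weighted out-degree (outgoing signaling, i.e. sender strength) and in-degree (incoming signaling, i.e. receiver strength). The base R sketch below computes both from a toy weight matrix with made-up values; other measures such as betweenness are not covered, and this is not the code used by netAnalysis_computeCentrality.

```r
groups <- c("Mac-1", "B-cells", "T-cells")

# Toy pathway-level weights: rows are senders, columns are receivers (made-up values)
w <- matrix(c(0,     0.5,  0.25,
              0.125, 0,    0,
              0,     0.25, 0),
            nrow = 3, byrow = TRUE, dimnames = list(groups, groups))

outdeg <- rowSums(w)  # total outgoing signaling per group
indeg <- colSums(w)   # total incoming signaling per group

dominant.sender <- names(which.max(outdeg))
dominant.receiver <- names(which.max(indeg))

print(rbind(outdeg, indeg))
print(c(sender = dominant.sender, receiver = dominant.receiver))
```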
Visualize gene expression distribution on tissue
# Take an input of a few genes
gg1 <- spatialFeaturePlot(cellchat, features = c("Il16"), point.size = 0.8, color.heatmap = "Reds", direction = 1, show.legend = FALSE)
gg2 <- spatialFeaturePlot(cellchat, features = c("CD4"), point.size = 0.8, color.heatmap = "Blues", direction = 1, show.legend = FALSE)
patchwork::wrap_plots(gg1, gg2, ncol =2)
# Take an input of a ligand-receptor pair
# spatialFeaturePlot(cellchat, pairLR.use = "IL16_CD4", point.size = 0.5, do.binary = FALSE, cutoff = NULL, enriched.only = F, color.heatmap = "Reds", direction = 1)
NB: Upon inferring the intercellular communication network from spatial transcriptomics data, CellChat’s various functions can be used for further data exploration, analysis, and visualization. Please check other functionalities in the basic tutorial of CellChat.
saveRDS(cellchat, file = "cellchat_SPOTS_mouse_spleen.rds")
runCellChatApp(cellchat)
Please check the vignette of FAQ on applying CellChat to spatially resolved transcriptomics data for detailed setting of different technologies of spatial transcriptomics data.
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.5
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Asia/Shanghai
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] patchwork_1.3.0 CellChat_2.2.0 Biobase_2.60.0
#> [4] BiocGenerics_0.46.0 ggplot2_3.5.1 igraph_1.3.5
#> [7] dplyr_1.1.3
#>
#> loaded via a namespace (and not attached):
#> [1] pbapply_1.7-2 rlang_1.1.1 magrittr_2.0.3
#> [4] clue_0.3-64 GetoptLong_1.0.5 gridBase_0.4-7
#> [7] matrixStats_1.0.0 compiler_4.3.1 systemfonts_1.0.4
#> [10] png_0.1-8 vctrs_0.6.3 reshape2_1.4.4
#> [13] ggalluvial_0.12.5 stringr_1.5.0 pkgconfig_2.0.3
#> [16] shape_1.4.6 crayon_1.5.2 fastmap_1.1.1
#> [19] magick_2.8.1 backports_1.4.1 ellipsis_0.3.2
#> [22] labeling_0.4.3 utf8_1.2.3 promises_1.2.1
#> [25] rmarkdown_2.24 network_1.18.1 purrr_1.0.2
#> [28] xfun_0.40 cachem_1.0.8 jsonlite_1.8.7
#> [31] highr_0.10 later_1.3.1 BiocParallel_1.34.2
#> [34] irlba_2.3.5.1 broom_1.0.5 parallel_4.3.1
#> [37] cluster_2.1.4 R6_2.5.1 bslib_0.5.1
#> [40] stringi_1.7.12 RColorBrewer_1.1-3 reticulate_1.31
#> [43] parallelly_1.36.0 car_3.1-2 jquerylib_0.1.4
#> [46] Rcpp_1.0.11.6 iterators_1.0.14 knitr_1.43
#> [49] future.apply_1.11.0 IRanges_2.34.1 FNN_1.1.3.2
#> [52] httpuv_1.6.11 Matrix_1.6-5 tidyselect_1.2.0
#> [55] abind_1.4-5 rstudioapi_0.15.0 yaml_2.3.7
#> [58] doParallel_1.0.17 codetools_0.2-19 listenv_0.9.0
#> [61] lattice_0.21-8 tibble_3.2.1 plyr_1.8.8
#> [64] shiny_1.7.5 withr_2.5.0 coda_0.19-4
#> [67] evaluate_0.21 future_1.33.0 circlize_0.4.16
#> [70] pillar_1.9.0 BiocManager_1.30.22 ggpubr_0.6.0
#> [73] carData_3.0-5 rngtools_1.5.2 foreach_1.5.2
#> [76] stats4_4.3.1 generics_0.1.3 S4Vectors_0.38.1
#> [79] munsell_0.5.0 scales_1.3.0 NMF_0.26
#> [82] ggnetwork_0.5.12 globals_0.16.2 xtable_1.8-4
#> [85] glue_1.6.2 tools_4.3.1 data.table_1.14.9
#> [88] BiocNeighbors_1.18.0 RSpectra_0.16-1 ggsignif_0.6.4
#> [91] registry_0.5-1 Cairo_1.6-2 cowplot_1.1.1
#> [94] grid_4.3.1 tidyr_1.3.0 colorspace_2.1-0
#> [97] presto_1.0.0 cli_3.6.1 fansi_1.0.4
#> [100] svglite_2.1.1 ComplexHeatmap_2.15.4 gtable_0.3.4
#> [103] rstatix_0.7.2.999 sass_0.4.7 digest_0.6.33
#> [106] ggrepel_0.9.3 sna_2.7-1 farver_2.1.1
#> [109] rjson_0.2.21 htmltools_0.5.6 lifecycle_1.0.3
#> [112] statnet.common_4.9.0 GlobalOptions_0.1.2 mime_0.12