Single Sample Gene Set Enrichment Analysis (ssGSEA)¶
Single-sample Gene Set Enrichment Analysis (ssGSEA) is an variation of the GSEA algorithm that instead of calculating enrichment scores for groups of samples (i.e Control vs Disease) and sets of genes (i.e pathways), it provides a score for each each sample and gene set pair (https://www.genepattern.org/modules/docs/ssGSEAProjection/4).
In CKG this function is implemented in the function: run_ssgsea (analytics_core.analytics).
To visualize the results, they can be coupled with a Principal Component Analysis (PCA).
In this case, we are applied ssGSEA on two existing projects in CKG: PP4-interactomics experiments in two ovarian cancer cell lines (OVCAR-5 and COV318) expressing N-terminal FLAG- and C-terminal V5-tagged CT45, respectively from Coscia et al 2018 (https://pubmed.ncbi.nlm.nih.gov/30241606/).
Other References: - https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm
[1]:
from ckg.report_manager import project
from ckg.analytics_core.analytics import analytics
from ckg.analytics_core.viz import viz
from plotly.offline import init_notebook_mode, iplot
%matplotlib inline
init_notebook_mode(connected=True)
c:\users\sande\.conda\envs\pip_rev\lib\site-packages\outdated\utils.py:18: OutdatedPackageWarning:
The package pingouin is out of date. Your version is 0.3.11, the latest is 0.3.12.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
WGCNA functions will not work. Module Rpy2 not installed.
R functions will not work. Module Rpy2 not installed.
Reading the Project Data¶
These projects are stored in CKG’s graph database and the project reports need to be generated.
[2]:
project1 = "P0000004"
project2 = "P0000005"
[3]:
p1 = project.Project(project1, datasets={}, knowledge=None, report={}, configuration_files=None)
p1.build_project(force=False)
p1.generate_report()
[4]:
p2 = project.Project(project2, datasets={}, knowledge=None, report={}, configuration_files=None)
p2.build_project(force=False)
p2.generate_report()
[5]:
proteomics_dataset1 = p1.get_dataset("interactomics")
proteomics_dataset2 = p2.get_dataset("interactomics")
Gene Ontology Annotation¶
We can use the Gene Ontology annotation generated during the report creation and available as a dataframe to provide the gene/protein sets for the ssGSEA analysis.
One could also obtain this annotation directly from the graph database using one of the existing Cypher queries and the list of proteins (set).
[6]:
annotations1 = proteomics_dataset1.get_dataframe("go annotation")
annotations2 = proteomics_dataset2.get_dataframe("go annotation")
[7]:
annotations1
[7]:
annotation | group | identifier | source | |
---|---|---|---|---|
0 | mitochondrial genome maintenance | None | TYMP~P19971 | UniProt |
1 | mitochondrial genome maintenance | None | MGME1~Q9BQP7 | UniProt |
2 | mitochondrial genome maintenance | None | SLC25A4~P12235 | UniProt |
3 | mitochondrial genome maintenance | None | SLC25A33~Q9BSK2 | UniProt |
4 | mitochondrial genome maintenance | None | OPA1~O60313 | UniProt |
5 | mitochondrial genome maintenance | None | MSTO1~Q9BUK6 | UniProt |
6 | single strand break repair | None | ERCC6~Q03468 | UniProt |
7 | single strand break repair | None | SIRT1~Q96EB6 | UniProt |
8 | single strand break repair | None | APTX~Q7Z2E3 | UniProt |
9 | single strand break repair | None | APLF~Q8IW19 | UniProt |
10 | single strand break repair | None | XRCC1~P18887 | UniProt |
11 | regulation of DNA recombination | None | THOC1~Q96FV9 | UniProt |
12 | regulation of DNA recombination | background | KPNA2~P52292 | UniProt |
13 | regulation of DNA recombination | background | ALYREF~Q86V81 | UniProt |
14 | regulation of DNA recombination | None | KPNA1~P52294 | UniProt |
15 | regulation of DNA recombination | None | SMARCAD1~Q9H4L7 | UniProt |
16 | regulation of mitotic recombination | None | MRE11~P49959 | UniProt |
17 | regulation of mitotic recombination | None | RAD50~Q92878 | UniProt |
18 | mitotic spindle elongation | None | PRC1~O43663 | UniProt |
19 | mitotic spindle elongation | None | KIF23~Q02241 | UniProt |
20 | maltose metabolic process | None | MGAM~O43451 | UniProt |
21 | maltose metabolic process | None | GANC~Q8TET4 | UniProt |
22 | maltose metabolic process | None | GAA~P10253 | UniProt |
23 | ribosomal large subunit assembly | None | MDN1~Q9NU22 | UniProt |
24 | ribosomal large subunit assembly | background | RPL11~P62913 | UniProt |
25 | ribosomal large subunit assembly | background | BRIX1~Q8TDN6 | UniProt |
26 | ribosomal large subunit assembly | background | RPL12~P30050 | UniProt |
27 | ribosomal large subunit assembly | background | RPL6~Q02878 | UniProt |
28 | ribosomal large subunit assembly | background | RPL3~P39023 | UniProt |
29 | ribosomal large subunit assembly | None | PPAN~Q9NQ55 | UniProt |
... | ... | ... | ... | ... |
70530 | regulation of store-operated calcium entry | None | SPINK1~P00995 | UniProt |
70531 | negative regulation of cation channel activity | None | MMP9~P14780 | UniProt |
70532 | positive regulation of cation channel activity | None | ANK3~Q12955 | UniProt |
70533 | positive regulation of cation channel activity | None | CTSS~P25774 | UniProt |
70534 | regulation of semaphorin-plexin signaling pathway | None | NCAM1~P13591 | UniProt |
70535 | negative regulation of cysteine-type endopepti... | None | PIH1D1~Q9NWS0 | UniProt |
70536 | negative regulation of cysteine-type endopepti... | None | PLAUR~Q03405 | UniProt |
70537 | negative regulation of cysteine-type endopepti... | background | PARK7~Q99497 | UniProt |
70538 | negative regulation of cysteine-type endopepti... | None | CAAP1~Q9H8G2 | UniProt |
70539 | negative regulation of cysteine-type endopepti... | None | MMP9~P14780 | UniProt |
70540 | negative regulation of cysteine-type endopepti... | None | NLE1~Q9NVX2 | UniProt |
70541 | positive regulation of cysteine-type endopepti... | None | TFAP4~Q01664 | UniProt |
70542 | positive regulation of cysteine-type endopepti... | None | HTRA2~O43464 | UniProt |
70543 | positive regulation of cysteine-type endopepti... | None | GSN~P06396 | UniProt |
70544 | positive regulation of cysteine-type endopepti... | None | LGALS9~O00182 | UniProt |
70545 | positive regulation of cysteine-type endopepti... | None | FAS~P25445 | UniProt |
70546 | negative regulation of cysteine-type endopepti... | None | PAK2~Q13177 | UniProt |
70547 | positive regulation of cysteine-type endopepti... | background | RPS3~P23396 | UniProt |
70548 | positive regulation of unsaturated fatty acid ... | None | ABCD1~P33897 | UniProt |
70549 | regulation of caveolin-mediated endocytosis | None | SRC~P12931 | UniProt |
70550 | negative regulation of caveolin-mediated endoc... | None | PROM2~Q8N271 | UniProt |
70551 | positive regulation of caveolin-mediated endoc... | None | ITSN1~Q15811 | UniProt |
70552 | positive regulation of caveolin-mediated endoc... | None | NEDD4L~Q96PU5 | UniProt |
70553 | malonyl-CoA catabolic process | None | MLYCD~O95822 | UniProt |
70554 | malonyl-CoA biosynthetic process | None | ACACA~Q13085 | UniProt |
70555 | malonyl-CoA biosynthetic process | None | ACACB~O00763 | UniProt |
70556 | lipoxin biosynthetic process | None | ALOX5~P09917 | UniProt |
70557 | lipoxin biosynthetic process | None | ALOX5AP~P20292 | UniProt |
70558 | lipoxin A4 biosynthetic process | None | ALOX15~P16050 | UniProt |
70559 | lysobisphosphatidic acid metabolic process | None | ACP6~Q9NPH0 | UniProt |
70560 rows × 4 columns
[8]:
processed_dataset1 = proteomics_dataset1.get_dataframe('processed')
processed_dataset2 = proteomics_dataset2.get_dataframe('processed')
[9]:
processed_dataset1.head()
[9]:
ABCD3~P28288 | ABLIM1~O14639 | ACAP2~Q15057 | ACIN1~Q9UKV3 | ACTA1~P68133 | ACTG1~P63261 | ACTN4~O43707 | ACTR1A~P61163 | ACTR3~P61158 | ADAR~P55265 | ... | ZCCHC17~Q9NP64 | ZFR~Q96KR1 | ZNF326~Q5BKZ1 | ZNF629~Q9UEG4 | ZNF638~Q14966 | ZNRD2~O60232 | ZRANB2~O95218 | group | sample | subject | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 27.472355 | 24.510301 | 31.742221 | 29.535102 | 30.760022 | 35.118418 | 26.787807 | 26.299604 | 27.753916 | 24.588465 | ... | 28.397584 | 25.921480 | 27.055173 | 24.576694 | 23.742444 | 29.115204 | 29.057783 | FLAG | AS943 | S138 |
1 | 27.460296 | 26.011139 | 31.011320 | 31.247341 | 30.421617 | 34.972458 | 29.027229 | 26.210650 | 25.878253 | 25.974869 | ... | 28.232973 | 26.284133 | 26.887625 | 23.669367 | 25.411049 | 28.268343 | 28.489300 | FLAG | AS944 | S139 |
2 | 27.505821 | 23.608262 | 31.572265 | 29.430139 | 31.061174 | 35.345990 | 25.906052 | 25.960580 | 27.496556 | 24.543980 | ... | 28.640687 | 25.899775 | 27.219189 | 24.685440 | 24.414733 | 29.080731 | 29.234413 | FLAG | AS945 | S140 |
3 | 27.012120 | 25.254434 | 31.353738 | 29.941257 | 30.312787 | 34.827996 | 25.787982 | 25.991854 | 26.494452 | 23.478376 | ... | 28.908190 | 25.790942 | 26.534750 | 24.141646 | 25.989279 | 28.972269 | 30.331381 | FLAG+CT45 | AS946 | S141 |
4 | 26.591493 | 25.377726 | 31.171406 | 31.171585 | 30.140900 | 35.013676 | 27.243544 | 25.666928 | 26.412464 | 23.679287 | ... | 28.588887 | 25.749550 | 27.004746 | 24.608964 | 26.024305 | 28.598112 | 29.874860 | FLAG+CT45 | AS947 | S142 |
5 rows × 923 columns
ssGSEA¶
The function needs only the processed dataset (interactomics data matrix) and the annotation matrix. The result provides two dataframes:
es – Enrichment scores – degree of overrepresentation
nes – Normalized Enrichment scores – corrects by set size and correlations between gene sets and the dataset
[10]:
ssgsea_result1 = analytics.run_ssgsea(processed_dataset1, annotations1, annotation_col='annotation', identifier_col='identifier', set_index=['group', 'sample','subject'], outdir=None, min_size=10, scale=False, permutations=0)
[11]:
ssgsea_result1['nes']
[11]:
DNA duplex unwinding | DNA repair | DNA replication | Fc-epsilon receptor signaling pathway | Fc-gamma receptor signaling pathway involved in phagocytosis | G2/M transition of mitotic cell cycle | MAPK cascade | NIK/NF-kappaB signaling | RNA export from nucleus | RNA metabolic process | ... | translational initiation | transmembrane transport | tumor necrosis factor-mediated signaling pathway | ubiquitin-dependent protein catabolic process | vesicle-mediated transport | viral process | viral transcription | group | sample | subject | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FLAG_AS943_S138 | 0.011977 | 0.176476 | 0.052199 | 0.299403 | 0.211067 | 0.152337 | 0.336622 | 0.251237 | 0.117620 | 0.200810 | ... | 0.429929 | 0.254752 | 0.304584 | 0.069326 | 0.002255 | 0.120948 | 0.464412 | FLAG | AS943 | S138 |
FLAG_AS944_S139 | 0.063279 | 0.147166 | 0.069440 | 0.248400 | 0.064426 | 0.096150 | 0.328776 | 0.239456 | 0.221997 | 0.254738 | ... | 0.432992 | 0.261562 | 0.314577 | 0.021409 | 0.002524 | 0.098014 | 0.468801 | FLAG | AS944 | S139 |
FLAG_AS945_S140 | 0.044976 | 0.164713 | 0.038787 | 0.294322 | 0.225206 | 0.159580 | 0.339241 | 0.246767 | 0.111281 | 0.173833 | ... | 0.432074 | 0.266003 | 0.300269 | 0.058290 | 0.017175 | 0.099421 | 0.468696 | FLAG | AS945 | S140 |
FLAG+CT45_AS946_S141 | 0.023172 | 0.153221 | 0.062612 | 0.289909 | 0.137606 | 0.088625 | 0.355786 | 0.289914 | 0.111026 | 0.147257 | ... | 0.496105 | 0.260089 | 0.307746 | 0.076835 | -0.011761 | 0.069829 | 0.538429 | FLAG+CT45 | AS946 | S141 |
FLAG+CT45_AS947_S142 | 0.062469 | 0.151500 | 0.055752 | 0.247872 | 0.110875 | 0.067426 | 0.325069 | 0.252344 | 0.209442 | 0.207055 | ... | 0.484496 | 0.233157 | 0.285062 | 0.057515 | -0.024422 | 0.071443 | 0.524618 | FLAG+CT45 | AS947 | S142 |
FLAG+CT45_AS948_S143 | 0.080360 | 0.165221 | 0.061832 | 0.278415 | 0.106264 | 0.091682 | 0.341581 | 0.266505 | 0.128023 | 0.169994 | ... | 0.496060 | 0.245163 | 0.298267 | 0.059050 | -0.000553 | 0.079137 | 0.535138 | FLAG+CT45 | AS948 | S143 |
6 rows × 153 columns
[12]:
pca_result, args = analytics.run_pca(ssgsea_result1['nes'], drop_cols=['sample', 'subject'], group='group')
[13]:
args.update({"loadings":15, "title":'PCA plot groups', 'height':600, 'width':700, 'factor':1})
plot = viz.get_pca_plot(pca_result, identifier='pca', args=args)
iplot(plot.figure)
[14]:
ssgsea_result2 = analytics.run_ssgsea(processed_dataset2, annotations2, annotation_col='annotation', identifier_col='identifier', set_index=['group', 'sample','subject'], outdir=None, min_size=10, scale=False, permutations=0)
[15]:
ssgsea_result2['nes']
[15]:
ATP-dependent chromatin remodeling | COPII vesicle coating | DNA duplex unwinding | DNA repair | Fc-epsilon receptor signaling pathway | Fc-gamma receptor signaling pathway involved in phagocytosis | G2/M transition of mitotic cell cycle | MAPK cascade | NIK/NF-kappaB signaling | RNA export from nucleus | ... | translation | translational initiation | transmembrane transport | tumor necrosis factor-mediated signaling pathway | ubiquitin-dependent protein catabolic process | viral process | viral transcription | group | sample | subject | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
V5_AS921_S116 | 0.044262 | -0.010696 | 0.204359 | 0.170231 | -0.166454 | 0.278150 | 0.180737 | -0.070821 | -0.434774 | 0.058232 | ... | 0.387313 | 0.381839 | -0.212196 | -0.191530 | -0.261737 | 0.112756 | 0.417030 | V5 | AS921 | S116 |
V5_AS922_S117 | 0.053901 | -0.048848 | 0.160694 | 0.154869 | -0.225246 | 0.328772 | 0.165380 | -0.101982 | -0.542500 | 0.041150 | ... | 0.380391 | 0.377981 | -0.269878 | -0.224860 | -0.292436 | 0.100454 | 0.410628 | V5 | AS922 | S117 |
V5_AS923_S118 | -0.052357 | -0.011790 | 0.169802 | 0.169539 | -0.167112 | 0.219772 | 0.192022 | -0.040777 | -0.411418 | 0.043397 | ... | 0.395837 | 0.390451 | -0.190605 | -0.173792 | -0.278322 | 0.108900 | 0.428926 | V5 | AS923 | S118 |
V5+CT45_AS924_S119 | 0.062566 | -0.088893 | 0.197607 | 0.224874 | -0.112584 | 0.238658 | 0.229253 | -0.034617 | -0.182225 | -0.050214 | ... | 0.387171 | 0.373498 | -0.169060 | -0.074586 | -0.128237 | 0.103801 | 0.417750 | V5+CT45 | AS924 | S119 |
V5+CT45_AS925_S120 | 0.073479 | -0.093424 | 0.194265 | 0.216908 | -0.069579 | 0.207769 | 0.211539 | -0.013577 | -0.146667 | -0.012140 | ... | 0.382935 | 0.375096 | -0.141614 | -0.058460 | -0.130907 | 0.093867 | 0.413617 | V5+CT45 | AS925 | S120 |
V5+CT45_AS926_S121 | 0.079657 | -0.091373 | 0.180172 | 0.183295 | -0.134906 | 0.270098 | 0.203470 | -0.042416 | -0.206953 | -0.033957 | ... | 0.379303 | 0.368977 | -0.177034 | -0.106843 | -0.090751 | 0.082380 | 0.413381 | V5+CT45 | AS926 | S121 |
6 rows × 167 columns
[16]:
pca_result, args = analytics.run_pca(ssgsea_result2['nes'], drop_cols=['sample', 'subject'], group='group')
[17]:
args.update({"loadings":15, "title":'PCA plot groups', 'height':600, 'width':700, 'factor':1})
plot = viz.get_pca_plot(pca_result, identifier='pca', args=args)
iplot(plot.figure)
[ ]: