Tutorial#


In this tutorial we demonstrate how to use the CoRegTor tool to find transcriptional co-regulators of a gene from gene expression data.

Objective#

The aim of this tutorial is to find potential co-regulators of the gene GFAP by analyzing tissue gene expression data for the frontal cortex of the adult brain.

Step 1 : Install and import the CoRegTor package and other dependencies#

Using pip: pip install coregtor

Or, with Poetry, run poetry add coregtor to add the package as a dependency of your project.

# Install coregtor if not already installed, then import it
try:
    import coregtor
except ImportError:
    %pip install coregtor
    import coregtor

# Additional imports
from pathlib import Path
import pandas as pd

Step 2 : Get data and load it#

Let’s gather all the data we require:

  • Gene expression data, brain_ge.gct: this file contains tissue gene expression data for the Frontal Cortex (BA9) in the adult brain. The data was downloaded from the GTEx portal.

  • List of transcription factors, human_tf.txt: this file was downloaded from aertslab.org.

We load this data into pandas DataFrames. Now we are ready to use CoRegTor!

base_path = Path("docs/temp") # UPDATE THIS
data_file_path = Path(base_path/"brain_ge.gct") # UPDATE THIS 
tf_file_path = Path(base_path/"human_tf.txt") # UPDATE THIS
target_gene_name = "GFAP" # the gene we are interested in 

# load data 
ge_data = coregtor.utils.read_GE_data(file_path=data_file_path) # this is just a utility method
tf_data = pd.read_csv(tf_file_path, names=["gene_name"], header=None)
ge_data
gene_name DDX11L1 WASH7P MIR6859-1 MIR1302-2HG FAM138A OR4G4P OR4G11P OR4F5 ENSG00000238009 CICP27 ... MT-ND4 MT-TH MT-TS2 MT-TL2 MT-ND5 MT-ND6 MT-TE MT-CYB MT-TT MT-TP
sample_name
GTEX-1117F-0011-R10b-SM-GI4VE 0.000000 3.57928 0.0 0.093825 0.000000 0.000000 0.028731 0.046554 0.039501 0.058675 ... 49762.2 1.177570 2.754330 0.000000 7311.39 4788.56 6.47666 28676.5 3.077750 1.19489
GTEX-111FC-0011-R10a-SM-GIN8G 0.000000 2.32926 0.0 0.025333 0.000000 0.052233 0.031030 0.016759 0.000000 0.031684 ... 44692.0 0.953824 0.000000 1.544930 6831.00 5164.36 6.67677 26950.9 1.661970 3.54879
GTEX-117XS-0011-R10b-SM-GIN8Z 0.000000 4.79425 0.0 0.000000 0.046843 0.067977 0.020191 0.043622 0.013880 0.032987 ... 39249.9 0.827551 0.967814 1.206360 5603.53 3585.51 6.20663 20794.9 0.432584 2.93902
GTEX-1192W-0011-R10b-SM-GHWOF 0.000000 3.83774 0.0 0.032159 0.045693 0.000000 0.039392 0.053189 0.013539 0.000000 ... 50750.5 1.614480 2.832190 1.176750 9433.33 7697.90 12.51220 23405.4 1.265900 3.68601
GTEX-1192X-0011-R10a-SM-DO941 0.040388 1.47233 0.0 0.040318 0.000000 0.000000 0.049385 0.040010 0.050922 0.000000 ... 31566.9 2.024070 0.591784 0.983528 4424.64 3568.41 4.55416 14051.5 0.529019 1.54038
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
GTEX-ZVZQ-0011-R10b-SM-51MRT 0.017553 1.91964 0.0 0.070089 0.000000 0.180647 0.064389 0.092738 0.044262 0.043831 ... 44939.6 3.078850 1.543150 2.137230 7019.94 6874.29 16.71380 24296.2 1.379490 2.23152
GTEX-ZXG5-0011-R10a-SM-57WDD 0.000000 1.07536 0.0 0.036646 0.000000 0.000000 0.044887 0.084853 0.000000 0.000000 ... 62226.7 2.759570 1.613650 4.469730 11407.90 11061.80 15.17770 38732.2 1.442500 1.86677
GTEX-ZYFD-0011-R10a-SM-GPI91 0.000000 2.71020 0.0 0.000000 0.000000 0.000000 0.037432 0.000000 0.000000 0.030577 ... 43740.3 0.000000 2.691290 0.745473 6574.39 5241.85 9.20498 24934.3 0.000000 1.55672
GTEX-ZYY3-0011-R10a-SM-GNTAZ 0.015919 3.29538 0.0 0.000000 0.000000 0.065533 0.058395 0.031540 0.066902 0.079502 ... 40835.8 1.196680 0.933006 3.101260 6228.45 5626.94 10.37120 20992.5 5.004310 2.83332
GTEX-ZZPT-0011-R10b-SM-GPI8B 0.000000 2.85899 0.0 0.049821 0.000000 0.000000 0.030513 0.000000 0.041949 0.024925 ... 40128.3 1.250570 0.000000 1.215350 8300.34 8785.10 28.13790 30362.8 2.614830 3.80689

269 rows × 59033 columns

tf_data
gene_name
0 ZNF354C
1 KLF12
2 ZNF143
3 ZIC2
4 ZNF274
... ...
1887 ZNF826P
1888 ZNF827
1889 ZNF831
1890 ZRSR2
1891 ZSWIM1

1892 rows × 1 columns
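For readers wondering what a GCT reader has to do, here is a minimal sketch, assuming the standard GCT v1.2 layout (a version line, a dimensions line, then a tab-separated table whose first two columns are Name and Description). It is not CoRegTor's read_GE_data implementation; the function name read_gct and the toy file contents are made up for illustration.

```python
import io
import pandas as pd

# Tiny made-up GCT v1.2 file: version line, dimensions line, then the table.
TOY_GCT = (
    "#1.2\n"
    "3\t2\n"
    "Name\tDescription\tSAMPLE-1\tSAMPLE-2\n"
    "ENSG_TOY_1\tGENE_A\t0.0\t1.5\n"
    "ENSG_TOY_2\tGENE_B\t2.0\t0.5\n"
    "ENSG_TOY_3\tGENE_C\t7.0\t3.0\n"
)

def read_gct(handle):
    """Hypothetical helper: parse a GCT v1.2 file into a samples x genes table."""
    # Skip the version and dimensions lines; the third line is the header.
    table = pd.read_csv(handle, sep="\t", skiprows=2)
    # Use the gene symbol (Description) as the gene label and drop the ID column.
    table = table.drop(columns="Name").set_index("Description")
    table.index.name = "gene_name"
    # Transpose so that rows are samples and columns are genes.
    expression = table.T
    expression.index.name = "sample_name"
    return expression

toy_expression = read_gct(io.StringIO(TOY_GCT))
print(toy_expression)
```

The transpose gives the same orientation as the ge_data table above: one row per sample and one column per gene.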

Step 3 : Create Ensemble model#

The first step in the process is to generate a random forest ensemble model using the gene expression data that predicts the expression value of the gene “GFAP” based on all other genes.

Since we are interested in identifying potential transcriptional co-regulators, we filter the data to include only transcription factors. We use the create_model_input function to filter the data and prepare the training input. It takes a dataframe t_factors, which should have a column gene_name listing the transcription factors to consider. The function returns a tuple of two dataframes: X with the feature genes and Y with the target gene. These can then be passed to the create_model function.
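The filtering described above can be pictured with plain pandas. The following is only a sketch of the idea on made-up data, not the code behind create_model_input; the helper name split_features_and_target and the toy gene names are hypothetical.

```python
import pandas as pd

def split_features_and_target(expression, target_gene, t_factors):
    """Hypothetical helper: keep transcription-factor columns as X, the target as Y."""
    tf_names = set(t_factors["gene_name"])
    # Keep only genes that are listed as transcription factors and are not the target.
    feature_genes = [
        gene for gene in expression.columns
        if gene in tf_names and gene != target_gene
    ]
    X = expression[feature_genes]
    Y = expression[[target_gene]]
    return X, Y

# Made-up expression table: rows are samples, columns are genes.
toy_expression = pd.DataFrame(
    {
        "TF_A": [1.0, 2.0, 3.0],
        "TF_B": [0.5, 0.1, 0.9],
        "OTHER": [5.0, 6.0, 7.0],
        "TARGET": [2.0, 4.0, 6.0],
    },
    index=["S1", "S2", "S3"],
)
# TF_MISSING is listed as a transcription factor but absent from the expression data.
toy_tfs = pd.DataFrame({"gene_name": ["TF_A", "TF_B", "TF_MISSING"]})

X_toy, Y_toy = split_features_and_target(toy_expression, "TARGET", toy_tfs)
print(list(X_toy.columns), Y_toy.shape)
```

Transcription factors that do not occur in the expression data are simply ignored in this sketch, and non-TF genes never become features.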

# first generate the training input for the model
X,Y = coregtor.create_model_input(ge_data,target_gene_name,tf_data)

# use the training data to create a model
model = coregtor.create_model(X,Y,"rf")
model
RandomForestRegressor()

Step 4 : Generating tree paths#

The genes at the root nodes of the trees are of particular interest. Root genes also serve as potential regulators in other tree-based GRN inference methods.

Forest-based ensemble models contain multiple decision trees. We want to analyze the structure of the trees in the model.

Each tree contains multiple root-to-leaf paths. We first enumerate all the paths in all the trees in the model.

all_paths = coregtor.tree_paths(model,X,Y)
all_paths
tree source target path_length node1 node2 node3 node4 node5 node6 ... node9 node10 node11 node12 node13 node14 node15 node16 node17 node18
0 0 YBX1 GFAP 8 TEAD2 HES7 ISL1 LHX4 ZNF135 ZNF536 ... None None None None None None None None None None
1 0 YBX1 GFAP 12 TEAD2 TSC22D4 PAX5 EBF1 GSX2 TFEB ... VENTX PAX3 ZNF70 None None None None None None None
2 0 YBX1 GFAP 13 TEAD2 TSC22D4 PAX5 EBF1 GSX2 TFEB ... VENTX PAX3 NR1I2 TBX21 None None None None None None
3 0 YBX1 GFAP 6 POU5F1B ZNF433 GLIS3 ZKSCAN3 ZNF436 None ... None None None None None None None None None None
4 0 YBX1 GFAP 10 TEAD2 TSC22D4 PAX5 SNAI2 RFX6 HOXA5 ... SMAD2 None None None None None None None None None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11013 99 FEZ1 GFAP 8 YBX1 IRF9 HMG20B ZBTB18 ZNF683 ZNF556 ... None None None None None None None None None None
11014 99 FEZ1 GFAP 14 YBX1 IRF9 HMG20B ZBTB18 TFEB NKX2-2 ... ZNF683 TBX5 BHLHE40 MAGEA8 E2F3 None None None None None
11015 99 FEZ1 GFAP 15 YBX1 IRF9 HMG20B ZBTB18 TFEB NKX2-2 ... TBX22 FOXP1 CREB3L4 ZNF554 MAP4K2 ZNF814 None None None None
11016 99 FEZ1 GFAP 14 YBX1 IRF9 HMG20B ZBTB18 TFEB NKX2-2 ... TBX22 FOXP1 CREB3L4 ZNF554 MAP4K2 None None None None None
11017 99 FEZ1 GFAP 7 YBX1 IRF9 HMG20B TAF1L NHLH1 NAP1L1 ... None None None None None None None None None None

11018 rows × 22 columns
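To make the enumeration concrete, here is a rough sketch of how root-to-leaf paths can be read from a fitted scikit-learn forest through the tree_ attribute of each estimator. It runs on synthetic data and is not the implementation of coregtor.tree_paths; the helper root_to_leaf_paths and the feature names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def root_to_leaf_paths(estimator, feature_names):
    """Return the split features along every root-to-leaf path of one fitted tree."""
    tree = estimator.tree_
    paths = []
    stack = [(0, [])]  # (node id, features seen so far); node 0 is the root
    while stack:
        node, prefix = stack.pop()
        left, right = tree.children_left[node], tree.children_right[node]
        if left == right:  # scikit-learn marks a leaf by setting both children to -1
            paths.append(prefix)
            continue
        current = prefix + [feature_names[tree.feature[node]]]
        stack.append((right, current))
        stack.append((left, current))
    return paths

# Synthetic data: the target depends mostly on the first and third features.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(80, 4))
y_toy = 2.0 * X_toy[:, 0] - X_toy[:, 2] + rng.normal(scale=0.1, size=80)
names = ["TF_A", "TF_B", "TF_C", "TF_D"]

forest = RandomForestRegressor(n_estimators=5, max_depth=3, random_state=0)
forest.fit(X_toy, y_toy)

# One (tree id, path) pair per leaf, across all trees in the forest.
all_toy_paths = [
    (tree_id, path)
    for tree_id, est in enumerate(forest.estimators_)
    for path in root_to_leaf_paths(est, names)
]
print(len(all_toy_paths), all_toy_paths[0])
```

Each tree contributes exactly one path per leaf, and the first entry of every path is the feature tested at the root.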

Step 5 : Generating a set of common sub paths (or the context) for all root nodes#

In the table of paths above, many distinct genes appear as root nodes. Since every decision tree is trained to predict the same target gene, all paths end at the same target.

We can thus consider these paths as potential regulatory links, where the gene at the root regulates the target. All the genes at the root become potential regulators of the target. Note, however, that there are multiple intermediate nodes between the root and the target.

To find out whether two root genes are co-regulators, we compare how similar the intermediate nodes on their paths are.

pathset = coregtor.create_context(all_paths)
pathset.keys()
dict_keys(['YBX1', 'POU2AF1', 'PIR', 'ALX3', 'HCLS1', 'SP110', 'TIGD1', 'ZNF577', 'TAGLN2', 'KLF15', 'ZRSR2', 'FOXB2', 'ZNF837', 'IRF9', 'HMBOX1', 'PPP2R3B', 'DPRX', 'NFYA', 'ZNF775', 'SMC3', 'ID1', 'ZNF768', 'NKX6-2', 'ZNF706', 'CLK1', 'HDAC1', 'ZNF322', 'RAD21', 'PHF21A', 'MAFK', 'CREB3L2', 'TBX19', 'HTATIP2', 'ZSCAN16', 'IKZF2', 'SOX10', 'MXI1', 'BCL6', 'FEZ1'])
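One simple way to picture such a context is a dictionary that maps each root gene to the list of sub-paths found below it. The sketch below illustrates that reading on made-up paths; the actual structure returned by create_context may differ, and group_paths_by_root is a hypothetical name.

```python
from collections import defaultdict

# Made-up paths: the first entry is the root gene, the rest are intermediate genes.
toy_paths = [
    ["TF_A", "TF_X", "TF_Y"],
    ["TF_A", "TF_X", "TF_Z"],
    ["TF_B", "TF_X"],
    ["TF_B", "TF_Y", "TF_W"],
]

def group_paths_by_root(paths):
    """Hypothetical helper: collect, for every root gene, the sub-paths below it."""
    context = defaultdict(list)
    for path in paths:
        if not path:  # skip empty paths (a tree that never split)
            continue
        root, *intermediate = path
        context[root].append(intermediate)
    return dict(context)

toy_context = group_paths_by_root(toy_paths)
print(toy_context.keys())
```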

Step 6 : Comparing context of all root nodes with each other#

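To measure similarity, we first transform each root gene's context into a gene-frequency vector and then compare these vectors pairwise using cosine similarity.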

# transforming the context into a more comparable representation 
gf_histogram = coregtor.transform_context(pathset,method="gene_frequency")
gf_histogram
HES7 ISL1 LHX4 ZNF135 ZNF536 TSC22D4 PAX5 EBF1 GSX2 TFEB ... ZNF34 TCEAL6 RXRB FOXI1 PSMD12 KLF12 ZNF783 ZNF749 ZCCHC14 FOXP1
YBX1 8 4 4 3 4 362 143 42 42 26 ... 0 0 0 0 0 0 0 0 0 0
POU2AF1 0 5 0 0 0 74 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
PIR 0 0 0 0 0 12 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
ALX3 2 0 0 0 0 61 0 0 0 2 ... 0 0 0 0 0 0 0 0 0 0
HCLS1 0 0 1 0 0 63 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
SP110 0 0 0 0 0 48 47 6 1 0 ... 0 0 0 0 0 0 0 0 0 0
TIGD1 0 0 0 0 0 72 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
ZNF577 0 0 0 0 0 3 21 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
TAGLN2 0 0 3 0 0 69 12 0 5 0 ... 0 0 0 0 0 0 0 0 0 0
KLF15 1 0 0 0 0 139 22 0 0 6 ... 0 0 0 0 0 0 0 0 0 0
ZRSR2 0 0 0 0 0 0 0 2 0 0 ... 0 0 0 0 0 0 0 0 0 0
FOXB2 0 0 0 0 1 178 1 14 0 0 ... 0 0 0 0 0 0 0 0 0 0
ZNF837 0 0 0 0 0 0 1 0 1 0 ... 0 0 0 0 0 0 0 0 0 0
IRF9 2 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
HMBOX1 3 2 0 0 12 443 117 0 11 0 ... 0 0 0 0 0 0 0 0 0 0
PPP2R3B 0 0 0 0 1 0 0 0 6 0 ... 0 0 0 0 0 0 0 0 0 0
DPRX 0 0 1 0 1 114 10 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
NFYA 0 0 0 0 0 88 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
ZNF775 0 0 0 1 0 1 0 0 4 0 ... 0 0 0 0 0 0 0 0 0 0
SMC3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
ID1 0 0 0 0 0 57 0 3 30 5 ... 0 0 0 0 0 0 0 0 0 0
ZNF768 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
NKX6-2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
ZNF706 0 1 0 0 2 33 0 0 7 0 ... 0 0 0 0 0 0 0 0 0 0
CLK1 7 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
HDAC1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
ZNF322 0 0 0 0 0 66 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
RAD21 0 0 0 0 0 0 8 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
PHF21A 0 0 2 0 0 79 0 1 5 0 ... 1 0 0 0 0 0 0 0 0 0
MAFK 0 0 0 0 0 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
CREB3L2 0 0 0 0 0 44 42 0 2 0 ... 0 0 1 0 0 0 0 0 0 0
TBX19 0 0 0 0 0 0 0 3 0 0 ... 1 0 0 3 0 0 0 0 0 0
HTATIP2 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 0 1 4 1 0 0 0
ZSCAN16 0 0 0 0 0 45 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
IKZF2 0 0 0 0 0 30 14 0 0 3 ... 1 0 0 0 0 0 0 1 0 0
SOX10 0 0 0 0 0 32 0 0 0 0 ... 0 0 0 0 0 0 0 0 2 0
MXI1 2 0 0 0 0 0 17 0 0 0 ... 0 0 0 0 3 0 0 0 0 0
BCL6 0 0 0 0 3 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
FEZ1 3 0 0 0 0 0 0 0 0 24 ... 0 0 0 0 0 0 0 0 0 4

39 rows × 1815 columns
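The table above has one row per root gene and one column per gene seen in its sub-paths. A comparable table can be built by counting gene occurrences, as in the sketch below. This is an illustration on made-up data under the assumption that "gene_frequency" means raw occurrence counts; it is not the code behind transform_context.

```python
from collections import Counter
import pandas as pd

# Made-up context: root gene -> list of sub-paths (intermediate genes only).
toy_context = {
    "TF_A": [["TF_X", "TF_Y"], ["TF_X", "TF_Z"]],
    "TF_B": [["TF_X"], ["TF_Y", "TF_Y"]],
}

def gene_frequency(context):
    """Hypothetical helper: count how often each gene occurs below each root."""
    rows = {
        root: Counter(gene for sub_path in sub_paths for gene in sub_path)
        for root, sub_paths in context.items()
    }
    # Genes never seen below a root get a count of 0 instead of NaN.
    return pd.DataFrame.from_dict(rows, orient="index").fillna(0).astype(int)

toy_histogram = gene_frequency(toy_context)
print(toy_histogram)
```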

sim_matrix = coregtor.compare_context(gf_histogram,"cosine")
sim_matrix
YBX1 POU2AF1 PIR ALX3 HCLS1 SP110 TIGD1 ZNF577 TAGLN2 KLF15 ... MAFK CREB3L2 TBX19 HTATIP2 ZSCAN16 IKZF2 SOX10 MXI1 BCL6 FEZ1
YBX1 1.000000 0.390690 0.282963 0.441397 0.347206 0.357401 0.437412 0.178489 0.427796 0.527324 ... 0.207780 0.443300 0.268885 0.110692 0.464645 0.458956 0.444353 0.229340 0.132132 0.048082
POU2AF1 0.390690 1.000000 0.097367 0.189427 0.191951 0.271433 0.280371 0.039441 0.151647 0.246245 ... 0.057232 0.274348 0.135393 0.009952 0.208366 0.147100 0.172599 0.025864 0.054748 0.272656
PIR 0.282963 0.097367 1.000000 0.271940 0.081994 0.174818 0.071998 0.030674 0.101351 0.290197 ... 0.055558 0.295340 0.404131 0.010030 0.387235 0.326790 0.390717 0.013391 0.030086 0.006771
ALX3 0.441397 0.189427 0.271940 1.000000 0.462176 0.183100 0.205235 0.306286 0.466232 0.456986 ... 0.297251 0.291815 0.278464 0.088860 0.355313 0.272073 0.332201 0.017341 0.036216 0.077981
HCLS1 0.347206 0.191951 0.081994 0.462176 1.000000 0.182966 0.269786 0.328728 0.485007 0.328482 ... 0.336335 0.152378 0.052297 0.017688 0.166456 0.104280 0.114887 0.083077 0.110200 0.024756
SP110 0.357401 0.271433 0.174818 0.183100 0.182966 1.000000 0.251293 0.045317 0.140399 0.264442 ... 0.066215 0.278668 0.182943 0.060086 0.228899 0.191671 0.188913 0.135039 0.049885 0.072055
TIGD1 0.437412 0.280371 0.071998 0.205235 0.269786 0.251293 1.000000 0.027062 0.219067 0.350281 ... 0.023705 0.221418 0.030539 0.049867 0.235625 0.214205 0.204016 0.124777 0.068017 0.005000
ZNF577 0.178489 0.039441 0.030674 0.306286 0.328728 0.045317 0.027062 1.000000 0.344919 0.170948 ... 0.279431 0.082814 0.004852 0.069558 0.039215 0.026043 0.008941 0.027977 0.024586 0.001571
TAGLN2 0.427796 0.151647 0.101351 0.466232 0.485007 0.140399 0.219067 0.344919 1.000000 0.309809 ... 0.433702 0.177477 0.062660 0.066611 0.173707 0.171206 0.169182 0.281993 0.025474 0.040933
KLF15 0.527324 0.246245 0.290197 0.456986 0.328482 0.264442 0.350281 0.170948 0.309809 1.000000 ... 0.132012 0.342737 0.285807 0.428690 0.413125 0.599421 0.434714 0.308292 0.299045 0.192649
ZRSR2 0.033044 0.263826 0.008146 0.132157 0.014655 0.058607 0.010481 0.027319 0.078656 0.047919 ... 0.022265 0.110977 0.006141 0.004314 0.005255 0.011769 0.017606 0.006950 0.028671 0.314943
FOXB2 0.610984 0.357531 0.276302 0.345941 0.287703 0.249759 0.464244 0.024721 0.279255 0.550246 ... 0.014182 0.369243 0.254869 0.171139 0.487059 0.503505 0.461591 0.202323 0.219223 0.006372
ZNF837 0.295305 0.064716 0.330254 0.247707 0.039826 0.184614 0.047311 0.007792 0.188687 0.248245 ... 0.072768 0.236638 0.390586 0.004310 0.359435 0.313760 0.357238 0.156812 0.016106 0.002320
IRF9 0.295182 0.064347 0.335614 0.376965 0.061234 0.166046 0.039170 0.009453 0.094527 0.300712 ... 0.005489 0.259299 0.464632 0.122566 0.338762 0.285473 0.344640 0.028046 0.042869 0.098552
HMBOX1 0.647789 0.329978 0.241217 0.462533 0.423149 0.397572 0.376354 0.216785 0.415346 0.647799 ... 0.189737 0.395000 0.263515 0.341196 0.382428 0.520478 0.392212 0.374563 0.368752 0.215369
PPP2R3B 0.048033 0.009664 0.016664 0.039575 0.145264 0.053923 0.015605 0.030973 0.032809 0.009607 ... 0.006765 0.034600 0.139428 0.054337 0.041394 0.004065 0.010393 0.018624 0.011406 0.044133
DPRX 0.465090 0.272496 0.073816 0.339412 0.450710 0.202101 0.390178 0.181192 0.475072 0.414972 ... 0.234993 0.192898 0.051587 0.114427 0.204696 0.246931 0.198077 0.253205 0.199409 0.049899
NFYA 0.438038 0.296309 0.086302 0.221245 0.392647 0.187856 0.436239 0.018539 0.211032 0.328365 ... 0.005740 0.246735 0.087293 0.015994 0.331019 0.177932 0.235494 0.012583 0.001819 0.030263
ZNF775 0.288816 0.076733 0.302994 0.202763 0.010507 0.131376 0.064825 0.004195 0.105839 0.472321 ... 0.023656 0.245449 0.344232 0.332860 0.315589 0.621905 0.406507 0.399331 0.325837 0.000000
SMC3 0.274880 0.362013 0.310281 0.251701 0.012456 0.184213 0.003516 0.057674 0.071365 0.228840 ... 0.061498 0.354918 0.384980 0.009969 0.325361 0.273689 0.324689 0.007600 0.019026 0.349449
ID1 0.556863 0.264554 0.300524 0.303235 0.211422 0.221503 0.240994 0.023985 0.303998 0.486265 ... 0.128613 0.294769 0.379898 0.229686 0.394576 0.519505 0.446451 0.403588 0.210834 0.061608
ZNF768 0.272612 0.056057 0.376421 0.261556 0.033323 0.198101 0.020632 0.040113 0.069010 0.298104 ... 0.018190 0.282073 0.441665 0.017832 0.386520 0.346977 0.395583 0.014390 0.202221 0.067571
NKX6-2 0.285358 0.062473 0.415863 0.284157 0.021398 0.202981 0.017929 0.002389 0.077482 0.285184 ... 0.047676 0.297630 0.465344 0.017281 0.452095 0.359295 0.428401 0.023911 0.020326 0.002228
ZNF706 0.359139 0.291627 0.266523 0.466796 0.329536 0.240143 0.139865 0.229249 0.335848 0.412629 ... 0.240955 0.277752 0.377293 0.048290 0.323392 0.305714 0.374505 0.069505 0.136062 0.114209
CLK1 0.247502 0.070450 0.353188 0.384848 0.004387 0.157336 0.005762 0.018712 0.156468 0.305405 ... 0.003527 0.256807 0.413187 0.130467 0.377501 0.318310 0.376075 0.004852 0.003137 0.130224
HDAC1 0.065214 0.015740 0.021484 0.021804 0.280865 0.070350 0.016731 0.080937 0.196380 0.007186 ... 0.006690 0.048311 0.012432 0.081104 0.043150 0.010721 0.034793 0.148792 0.129979 0.008133
ZNF322 0.383818 0.326568 0.063726 0.226584 0.234236 0.161013 0.374197 0.013847 0.184562 0.257207 ... 0.009382 0.228177 0.014658 0.000900 0.237554 0.151398 0.189613 0.009085 0.002382 0.105703
RAD21 0.255462 0.067092 0.339623 0.269857 0.013692 0.224968 0.002939 0.013755 0.077211 0.284662 ... 0.001262 0.270840 0.416535 0.014091 0.355123 0.365356 0.399475 0.011585 0.003843 0.009626
PHF21A 0.442503 0.189444 0.217075 0.618650 0.599663 0.183091 0.260008 0.401262 0.497952 0.569636 ... 0.334362 0.246240 0.234853 0.193900 0.294922 0.417305 0.354604 0.199902 0.208485 0.042961
MAFK 0.207780 0.057232 0.055558 0.297251 0.336335 0.066215 0.023705 0.279431 0.433702 0.132012 ... 1.000000 0.008271 0.005600 0.008667 0.011967 0.016547 0.011851 0.163160 0.018630 0.024411
CREB3L2 0.443300 0.274348 0.295340 0.291815 0.152378 0.278668 0.221418 0.082814 0.177477 0.342737 ... 0.008271 1.000000 0.269212 0.031385 0.393787 0.338274 0.341131 0.063796 0.016362 0.137304
TBX19 0.268885 0.135393 0.404131 0.278464 0.052297 0.182943 0.030539 0.004852 0.062660 0.285807 ... 0.005600 0.269212 1.000000 0.007271 0.398668 0.350113 0.428055 0.022620 0.015436 0.015548
HTATIP2 0.110692 0.009952 0.010030 0.088860 0.017688 0.060086 0.049867 0.069558 0.066611 0.428690 ... 0.008667 0.031385 0.007271 1.000000 0.033336 0.354398 0.088057 0.355422 0.302257 0.420583
ZSCAN16 0.464645 0.208366 0.387235 0.355313 0.166456 0.228899 0.235625 0.039215 0.173707 0.413125 ... 0.011967 0.393787 0.398668 0.033336 1.000000 0.443097 0.476257 0.006599 0.010767 0.048056
IKZF2 0.458956 0.147100 0.326790 0.272073 0.104280 0.191671 0.214205 0.026043 0.171206 0.599421 ... 0.016547 0.338274 0.350113 0.354398 0.443097 1.000000 0.479705 0.395397 0.316269 0.044545
SOX10 0.444353 0.172599 0.390717 0.332201 0.114887 0.188913 0.204016 0.008941 0.169182 0.434714 ... 0.011851 0.341131 0.428055 0.088057 0.476257 0.479705 1.000000 0.123353 0.083377 0.008350
MXI1 0.229340 0.025864 0.013391 0.017341 0.083077 0.135039 0.124777 0.027977 0.281993 0.308292 ... 0.163160 0.063796 0.022620 0.355422 0.006599 0.395397 0.123353 1.000000 0.338336 0.006669
BCL6 0.132132 0.054748 0.030086 0.036216 0.110200 0.049885 0.068017 0.024586 0.025474 0.299045 ... 0.018630 0.016362 0.015436 0.302257 0.010767 0.316269 0.083377 0.338336 1.000000 0.070355
FEZ1 0.048082 0.272656 0.006771 0.077981 0.024756 0.072055 0.005000 0.001571 0.040933 0.192649 ... 0.024411 0.137304 0.015548 0.420583 0.048056 0.044545 0.008350 0.006669 0.070355 1.000000

39 rows × 39 columns
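Cosine similarity between two count vectors is their dot product divided by the product of their lengths, which is why the diagonal of the matrix above is 1 and the matrix is symmetric. A small NumPy sketch on made-up counts (not the internals of compare_context):

```python
import numpy as np

def cosine_similarity_matrix(counts):
    """Pairwise cosine similarity between the rows of a count matrix."""
    counts = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1)
    # Avoid division by zero: an all-zero row then has similarity 0 to every row.
    norms[norms == 0] = 1.0
    unit = counts / norms[:, None]
    return unit @ unit.T

# Made-up gene-frequency rows for three root genes.
toy_counts = np.array([
    [2, 1, 1],
    [1, 2, 0],
    [0, 0, 3],
])
toy_sim = cosine_similarity_matrix(toy_counts)
print(np.round(toy_sim, 3))
```

Rows that share no genes, such as the second and third rows here, get a similarity of 0.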

Step 7 : Interactive generation of co-regulating gene clusters#

# Dendrogram 
coregtor.plot_dendrogram(sim_matrix)
_images/39fce2012fa75b47ae525e4c1b97034a3d534d9487aad34c1fe65d263e4bfb52.png
(<Figure size 1500x900 with 1 Axes>,
 <Axes: title={'center': 'Hierarchical Clustering Dendrogram (average linkage)'}, xlabel='Gene', ylabel='Distance'>,
 array([[ 9.        , 14.        ,  0.35220059,  2.        ],
        [18.        , 34.        ,  0.37809491,  2.        ],
        [ 3.        , 28.        ,  0.38135024,  2.        ],
        [ 0.        , 11.        ,  0.38901568,  2.        ],
        [39.        , 42.        ,  0.42850833,  4.        ],
        [ 4.        , 41.        ,  0.46908031,  3.        ],
        [20.        , 43.        ,  0.47638252,  5.        ],
        [ 8.        , 44.        ,  0.51693641,  4.        ],
        [33.        , 35.        ,  0.52374254,  2.        ],
        [22.        , 31.        ,  0.53465561,  2.        ],
        [40.        , 45.        ,  0.53960328,  7.        ],
        [21.        , 48.        ,  0.54866987,  3.        ],
        [16.        , 46.        ,  0.56050718,  5.        ],
        [ 6.        , 17.        ,  0.5637613 ,  2.        ],
        [47.        , 49.        ,  0.57385334,  9.        ],
        [32.        , 38.        ,  0.57941738,  2.        ],
        [13.        , 50.        ,  0.58256298,  4.        ],
        [24.        , 55.        ,  0.58741151,  5.        ],
        [23.        , 51.        ,  0.59591912,  6.        ],
        [26.        , 52.        ,  0.60223391,  3.        ],
        [27.        , 56.        ,  0.61093111,  6.        ],
        [ 2.        , 59.        ,  0.62919343,  7.        ],
        [12.        , 60.        ,  0.636039  ,  8.        ],
        [ 1.        , 19.        ,  0.63798693,  2.        ],
        [30.        , 53.        ,  0.64847878, 10.        ],
        [36.        , 37.        ,  0.661664  ,  2.        ],
        [57.        , 63.        ,  0.68634988, 16.        ],
        [10.        , 62.        ,  0.70978445,  3.        ],
        [ 7.        , 29.        ,  0.72056934,  2.        ],
        [58.        , 65.        ,  0.73153913, 19.        ],
        [ 5.        , 68.        ,  0.77672799, 20.        ],
        [61.        , 69.        ,  0.78269649, 28.        ],
        [54.        , 64.        ,  0.81632422,  4.        ],
        [66.        , 70.        ,  0.83765542, 31.        ],
        [71.        , 72.        ,  0.89525603, 35.        ],
        [67.        , 73.        ,  0.90940586, 37.        ],
        [15.        , 74.        ,  0.94759734, 38.        ],
        [25.        , 75.        ,  0.94942835, 39.        ]]))
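The array in the output above has the shape of a SciPy linkage matrix: each row lists the two clusters being merged, the distance at which they merge, and the size of the resulting cluster. The first row merges genes 9 and 14 (KLF15 and HMBOX1) at 0.3522, which equals 1 − 0.647799, their cosine similarity, so the distance appears to be 1 − similarity. The sketch below reproduces that kind of matrix on a made-up similarity matrix, under that assumption and with average linkage; it is not the code of plot_dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

toy_genes = ["TF_A", "TF_B", "TF_C", "TF_D"]
# Made-up symmetric similarity matrix with ones on the diagonal.
toy_sim = np.array([
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
])

# Assumption: distance = 1 - similarity.
toy_dist = 1.0 - toy_sim
np.fill_diagonal(toy_dist, 0.0)

# linkage expects the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(toy_dist)
Z = linkage(condensed, method="average")

# Each row: cluster index, cluster index, merge distance, size of the new cluster.
# Indices >= len(toy_genes) refer to clusters created by earlier rows.
for left, right, height, size in Z:
    print(int(left), int(right), round(float(height), 3), int(size))
```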
# Cophenetic distance
coregtor.plot_cophenetic(sim_matrix,methods=["average"])
_images/dcd05c965d29aa055ca96ba1a17aa0a717c964a4650dcfc80c20b8bd9b37e943.png
(<Figure size 1200x500 with 1 Axes>,
 [<Axes: title={'center': 'Average\nCCC = 0.739'}, xlabel='Cophenetic distance', ylabel='Original distance'>],
 {'average': np.float64(0.7391324338478736)},
 np.float64(0.6350354525242414))
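The cophenetic correlation coefficient (CCC) measures how faithfully a dendrogram preserves the original pairwise distances; values closer to 1 indicate a better fit. SciPy exposes it through scipy.cluster.hierarchy.cophenet. The sketch below compares a few linkage methods on a made-up distance matrix and is not the code of plot_cophenetic.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

# Made-up symmetric distance matrix with zeros on the diagonal.
toy_dist = np.array([
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.4],
    [0.9, 0.8, 0.4, 0.0],
])
condensed = squareform(toy_dist)

toy_ccc = {}
for method in ("average", "single", "complete"):
    Z = linkage(condensed, method=method)
    # cophenet returns the correlation and the condensed cophenetic distances.
    ccc, _ = cophenet(Z, condensed)
    toy_ccc[method] = float(ccc)

for method, value in toy_ccc.items():
    print(method, round(value, 3))
```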
# generate results
results1,_ = coregtor.identify_coregulators(sim_matrix,target_gene=target_gene_name,distance_threshold=0.65)
results1
target_gene gene_cluster n_genes cluster_id
0 GFAP CREB3L2,FOXB2,HMBOX1,ID1,IKZF2,KLF15,SOX10,YBX... 10 0
1 GFAP CLK1,IRF9,NKX6-2,PIR,RAD21,TBX19,ZNF768,ZNF837 8 3
2 GFAP ALX3,DPRX,HCLS1,PHF21A,TAGLN2,ZNF706 6 2
3 GFAP NFYA,TIGD1,ZNF322 3 4
4 GFAP POU2AF1,SMC3 2 1
5 GFAP FEZ1,HTATIP2 2 5
results1 = results1[results1["n_genes"]<10]
results1
target_gene gene_cluster n_genes cluster_id
1 GFAP CLK1,IRF9,NKX6-2,PIR,RAD21,TBX19,ZNF768,ZNF837 8 3
2 GFAP ALX3,DPRX,HCLS1,PHF21A,TAGLN2,ZNF706 6 2
3 GFAP NFYA,TIGD1,ZNF322 3 4
4 GFAP POU2AF1,SMC3 2 1
5 GFAP FEZ1,HTATIP2 2 5
results2,_ = coregtor.identify_coregulators(sim_matrix,target_gene=target_gene_name,distance_threshold=0.75)
results2
target_gene gene_cluster n_genes cluster_id
0 GFAP ALX3,CREB3L2,DPRX,FOXB2,HCLS1,HMBOX1,ID1,IKZF2... 19 0
1 GFAP CLK1,IRF9,NKX6-2,PIR,RAD21,TBX19,ZNF768,ZNF837 8 3
2 GFAP POU2AF1,SMC3,ZRSR2 3 2
3 GFAP MAFK,ZNF577 2 1
4 GFAP BCL6,MXI1 2 4
5 GFAP FEZ1,HTATIP2 2 5
results2 = results2[results2["n_genes"]<10]
results2
target_gene gene_cluster n_genes cluster_id
1 GFAP CLK1,IRF9,NKX6-2,PIR,RAD21,TBX19,ZNF768,ZNF837 8 3
2 GFAP POU2AF1,SMC3,ZRSR2 3 2
3 GFAP MAFK,ZNF577 2 1
4 GFAP BCL6,MXI1 2 4
5 GFAP FEZ1,HTATIP2 2 5
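Raising the distance threshold from 0.65 to 0.75 cuts the dendrogram higher, so more genes merge into larger clusters, as the two result tables show. The effect can be sketched with SciPy's fcluster on a made-up distance matrix. This only illustrates the thresholding idea and is not the implementation of identify_coregulators; the helper clusters_at is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

toy_genes = ["TF_A", "TF_B", "TF_C", "TF_D"]
# Made-up symmetric distance matrix with zeros on the diagonal.
toy_dist = np.array([
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.4],
    [0.9, 0.8, 0.4, 0.0],
])
Z = linkage(squareform(toy_dist), method="average")

def clusters_at(Z, genes, threshold):
    """Hypothetical helper: cut the dendrogram at a distance threshold."""
    # Genes whose cophenetic distance is at most the threshold share a label.
    labels = fcluster(Z, t=threshold, criterion="distance")
    groups = {}
    for gene, label in zip(genes, labels):
        groups.setdefault(int(label), []).append(gene)
    # Largest clusters first.
    return sorted(groups.values(), key=len, reverse=True)

for threshold in (0.3, 0.5, 0.9):
    print(threshold, clusters_at(Z, toy_genes, threshold))
```

In this toy example the low threshold leaves two genes unclustered, the middle one yields two pairs, and the high one merges everything into a single cluster.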

Validation of results#