Tutorial#

In this tutorial we demonstrate how to use the CoRegTor tool to find transcriptional co-regulators of a gene from gene expression data.

Objective#

The aim of this tutorial is to find potential co-regulators of the gene GFAP by analyzing tissue gene expression data for the frontal cortex of the adult brain.

Step 1 : Install and import the `CoRegTor` package and other dependencies#

Install the package with pip: pip install coregtor

Or run poetry add coregtor to add the package as a dependency of your Poetry project.

# Install coregtor if not already installed, then import it
try:
    import coregtor
except ImportError:
    %pip install coregtor
    import coregtor

# Additional imports
from pathlib import Path
import pandas as pd

Step 2 : Get data and load it#

Let's gather the data we need:

Gene expression data, brain_ge.gct: tissue gene expression data for the frontal cortex (BA9) of the adult brain, downloaded from the GTEx portal.
List of transcription factors, human_tf.txt: downloaded from aertslab.org.

We load both files into dataframes. Now we are ready to use CoRegTor!

base_path = Path("docs/temp") # UPDATE THIS
data_file_path = Path(base_path/"brain_ge.gct") # UPDATE THIS 
tf_file_path = Path(base_path/"human_tf.txt") # UPDATE THIS
target_gene_name = "GFAP" # the gene we are interested in 

# load data 
ge_data = coregtor.utils.read_GE_data(file_path=data_file_path) # this is just a utility method
tf_data = pd.read_csv(tf_file_path, names=["gene_name"], header=None)

ge_data

gene_name	DDX11L1	WASH7P	MIR6859-1	MIR1302-2HG	FAM138A	OR4G4P	OR4G11P	OR4F5	ENSG00000238009	CICP27	...	MT-ND4	MT-TH	MT-TS2	MT-TL2	MT-ND5	MT-ND6	MT-TE	MT-CYB	MT-TT	MT-TP
sample_name
GTEX-1117F-0011-R10b-SM-GI4VE	0.000000	3.57928	0.0	0.093825	0.000000	0.000000	0.028731	0.046554	0.039501	0.058675	...	49762.2	1.177570	2.754330	0.000000	7311.39	4788.56	6.47666	28676.5	3.077750	1.19489
GTEX-111FC-0011-R10a-SM-GIN8G	0.000000	2.32926	0.0	0.025333	0.000000	0.052233	0.031030	0.016759	0.000000	0.031684	...	44692.0	0.953824	0.000000	1.544930	6831.00	5164.36	6.67677	26950.9	1.661970	3.54879
GTEX-117XS-0011-R10b-SM-GIN8Z	0.000000	4.79425	0.0	0.000000	0.046843	0.067977	0.020191	0.043622	0.013880	0.032987	...	39249.9	0.827551	0.967814	1.206360	5603.53	3585.51	6.20663	20794.9	0.432584	2.93902
GTEX-1192W-0011-R10b-SM-GHWOF	0.000000	3.83774	0.0	0.032159	0.045693	0.000000	0.039392	0.053189	0.013539	0.000000	...	50750.5	1.614480	2.832190	1.176750	9433.33	7697.90	12.51220	23405.4	1.265900	3.68601
GTEX-1192X-0011-R10a-SM-DO941	0.040388	1.47233	0.0	0.040318	0.000000	0.000000	0.049385	0.040010	0.050922	0.000000	...	31566.9	2.024070	0.591784	0.983528	4424.64	3568.41	4.55416	14051.5	0.529019	1.54038
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
GTEX-ZVZQ-0011-R10b-SM-51MRT	0.017553	1.91964	0.0	0.070089	0.000000	0.180647	0.064389	0.092738	0.044262	0.043831	...	44939.6	3.078850	1.543150	2.137230	7019.94	6874.29	16.71380	24296.2	1.379490	2.23152
GTEX-ZXG5-0011-R10a-SM-57WDD	0.000000	1.07536	0.0	0.036646	0.000000	0.000000	0.044887	0.084853	0.000000	0.000000	...	62226.7	2.759570	1.613650	4.469730	11407.90	11061.80	15.17770	38732.2	1.442500	1.86677
GTEX-ZYFD-0011-R10a-SM-GPI91	0.000000	2.71020	0.0	0.000000	0.000000	0.000000	0.037432	0.000000	0.000000	0.030577	...	43740.3	0.000000	2.691290	0.745473	6574.39	5241.85	9.20498	24934.3	0.000000	1.55672
GTEX-ZYY3-0011-R10a-SM-GNTAZ	0.015919	3.29538	0.0	0.000000	0.000000	0.065533	0.058395	0.031540	0.066902	0.079502	...	40835.8	1.196680	0.933006	3.101260	6228.45	5626.94	10.37120	20992.5	5.004310	2.83332
GTEX-ZZPT-0011-R10b-SM-GPI8B	0.000000	2.85899	0.0	0.049821	0.000000	0.000000	0.030513	0.000000	0.041949	0.024925	...	40128.3	1.250570	0.000000	1.215350	8300.34	8785.10	28.13790	30362.8	2.614830	3.80689

269 rows × 59033 columns
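The table above has samples as rows and genes as columns, whereas a GCT file stores genes as rows. The following is a minimal sketch of how such a file could be loaded with pandas alone, assuming the GCT v1.2 layout (a version line, a dimensions line, then a header with Name and Description columns). The helper read_gct and the toy data are hypothetical; whether read_GE_data works this way internally is an assumption.

```python
import io

import pandas as pd


def read_gct(source) -> pd.DataFrame:
    """Hypothetical helper: parse a GCT v1.2 file into a samples x genes table."""
    # The first two lines hold the version tag and the matrix dimensions.
    raw = pd.read_csv(source, sep="\t", skiprows=2)
    # 'Name' is the gene ID and 'Description' the gene symbol.
    genes_by_samples = raw.drop(columns=["Name"]).set_index("Description")
    genes_by_samples.index.name = "gene_name"
    # Transpose so that each row is a sample and each column is a gene.
    samples_by_genes = genes_by_samples.T
    samples_by_genes.index.name = "sample_name"
    return samples_by_genes


# Tiny in-memory stand-in for a real .gct file (made-up values).
demo_file = io.StringIO(
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tS1\tS2\n"
    "ENSG1\tGFAP\t1.0\t2.0\n"
    "ENSG2\tYBX1\t3.0\t4.0\n"
)
demo_ge = read_gct(demo_file)
print(demo_ge)
```

A real file path can be passed in place of the StringIO object, since pandas accepts both.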

tf_data

	gene_name
0	ZNF354C
1	KLF12
2	ZNF143
3	ZIC2
4	ZNF274
...	...
1887	ZNF826P
1888	ZNF827
1889	ZNF831
1890	ZRSR2
1891	ZSWIM1

1892 rows × 1 columns

Step 3 : Create Ensemble model#

The first step is to train a random forest ensemble model on the gene expression data. The model predicts the expression value of the gene GFAP from the expression values of other genes.

Since we are interested in identifying potential transcriptional co-regulators, we filter the data to include only transcription factors. We use the create_model_input function to filter the data and prepare the training input. It takes a dataframe t_factors, which should have a column gene_name listing the transcription factors to consider. The function returns a tuple of two dataframes: X with the feature genes and Y with the target gene. These can then be passed to the create_model function.

# first generate the training input for the model
X,Y = coregtor.create_model_input(ge_data,target_gene_name,tf_data)

# use the training data to create a model
model = coregtor.create_model(X,Y,"rf")

model

RandomForestRegressor()
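To make the filtering and training step concrete, here is a minimal sketch using scikit-learn directly on synthetic data. The helper split_features_target and all toy values are hypothetical; this only illustrates the idea described above and is not CoRegTor's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic expression matrix: 30 samples x 5 genes (made-up values).
rng = np.random.default_rng(0)
toy_ge = pd.DataFrame(
    rng.random((30, 5)),
    columns=["GFAP", "YBX1", "SOX10", "ACTB", "KLF15"],
)
# Transcription factor list; ZNF000 is a made-up name absent from the matrix.
toy_tfs = pd.DataFrame({"gene_name": ["YBX1", "SOX10", "KLF15", "ZNF000"]})


def split_features_target(expr, target, t_factors):
    """Hypothetical helper: keep TF columns as features, the target as response."""
    tf_names = set(t_factors["gene_name"])
    feature_cols = [g for g in expr.columns if g in tf_names and g != target]
    return expr[feature_cols], expr[target]


toy_X, toy_Y = split_features_target(toy_ge, "GFAP", toy_tfs)

# A small forest keeps the example fast; random_state makes it repeatable.
toy_model = RandomForestRegressor(n_estimators=10, random_state=0)
toy_model.fit(toy_X, toy_Y)
print(list(toy_X.columns), len(toy_model.estimators_))
```

ACTB is dropped because it is not in the transcription factor list, and GFAP is excluded from the features because it is the target.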


Step 4 : Generating tree paths#

A forest-based ensemble model contains multiple decision trees, and we want to analyze the structure of these trees.

The genes at the root nodes are important. They also serve as potential regulators in other tree-based GRN inference methods.

Each tree contains multiple root-to-leaf paths. We first enumerate all the paths in all the trees of the model.

all_paths = coregtor.tree_paths(model,X,Y)

all_paths

	tree	source	target	path_length	node1	node2	node3	node4	node5	node6	...	node9	node10	node11	node12	node13	node14	node15	node16	node17	node18
0	0	YBX1	GFAP	8	TEAD2	HES7	ISL1	LHX4	ZNF135	ZNF536	...	None	None	None	None	None	None	None	None	None	None
1	0	YBX1	GFAP	12	TEAD2	TSC22D4	PAX5	EBF1	GSX2	TFEB	...	VENTX	PAX3	ZNF70	None	None	None	None	None	None	None
2	0	YBX1	GFAP	13	TEAD2	TSC22D4	PAX5	EBF1	GSX2	TFEB	...	VENTX	PAX3	NR1I2	TBX21	None	None	None	None	None	None
3	0	YBX1	GFAP	6	POU5F1B	ZNF433	GLIS3	ZKSCAN3	ZNF436	None	...	None	None	None	None	None	None	None	None	None	None
4	0	YBX1	GFAP	10	TEAD2	TSC22D4	PAX5	SNAI2	RFX6	HOXA5	...	SMAD2	None	None	None	None	None	None	None	None	None
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
11013	99	FEZ1	GFAP	8	YBX1	IRF9	HMG20B	ZBTB18	ZNF683	ZNF556	...	None	None	None	None	None	None	None	None	None	None
11014	99	FEZ1	GFAP	14	YBX1	IRF9	HMG20B	ZBTB18	TFEB	NKX2-2	...	ZNF683	TBX5	BHLHE40	MAGEA8	E2F3	None	None	None	None	None
11015	99	FEZ1	GFAP	15	YBX1	IRF9	HMG20B	ZBTB18	TFEB	NKX2-2	...	TBX22	FOXP1	CREB3L4	ZNF554	MAP4K2	ZNF814	None	None	None	None
11016	99	FEZ1	GFAP	14	YBX1	IRF9	HMG20B	ZBTB18	TFEB	NKX2-2	...	TBX22	FOXP1	CREB3L4	ZNF554	MAP4K2	None	None	None	None	None
11017	99	FEZ1	GFAP	7	YBX1	IRF9	HMG20B	TAF1L	NHLH1	NAP1L1	...	None	None	None	None	None	None	None	None	None	None

11018 rows × 22 columns
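For readers who want to see how root-to-leaf paths can be read out of a fitted tree, the sketch below walks a single scikit-learn decision tree through its tree_ attribute (children_left, children_right and feature). The function root_to_leaf_paths and the toy data are hypothetical, and the output format differs from the table returned by tree_paths.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def root_to_leaf_paths(fitted_tree, feature_names):
    """Return, for every leaf, the list of split features from the root down."""
    t = fitted_tree.tree_
    paths = []
    stack = [(0, [])]  # (node id, features seen so far); node 0 is the root
    while stack:
        node, prefix = stack.pop()
        if t.children_left[node] == -1:  # -1 marks a leaf in scikit-learn
            paths.append(prefix)
            continue
        here = prefix + [feature_names[t.feature[node]]]
        stack.append((t.children_right[node], here))
        stack.append((t.children_left[node], here))
    return paths


# Synthetic data: the response depends on the first two features.
rng = np.random.default_rng(0)
toy_X = rng.random((40, 3))
toy_y = toy_X[:, 0] + 2 * toy_X[:, 1]
toy_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(toy_X, toy_y)

toy_paths = root_to_leaf_paths(toy_tree, ["YBX1", "TEAD2", "HES7"])
for path in toy_paths:
    print(" -> ".join(path))
```

For a random forest, the same function could be applied to each tree in model.estimators_. Every path of one tree starts with the same gene, which matches the table above, where all paths of tree 0 share the source YBX1.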

Step 5 : Generating a set of common sub paths (or the `context`) for all root nodes#

In the table of paths above, many different genes appear as root nodes. Since every decision tree is trained to predict the same target gene, the leaf nodes of all paths represent the same target.

We can thus treat these paths as potential regulatory links, where the gene at the root regulates the target. All the root genes become potential regulators of the target. Note, however, that there are multiple intermediate nodes between the root and the target.

To find out whether two root genes are co-regulators, we compare how similar the intermediate nodes on their paths are.

pathset = coregtor.create_context(all_paths)

pathset.keys()

dict_keys(['YBX1', 'POU2AF1', 'PIR', 'ALX3', 'HCLS1', 'SP110', 'TIGD1', 'ZNF577', 'TAGLN2', 'KLF15', 'ZRSR2', 'FOXB2', 'ZNF837', 'IRF9', 'HMBOX1', 'PPP2R3B', 'DPRX', 'NFYA', 'ZNF775', 'SMC3', 'ID1', 'ZNF768', 'NKX6-2', 'ZNF706', 'CLK1', 'HDAC1', 'ZNF322', 'RAD21', 'PHF21A', 'MAFK', 'CREB3L2', 'TBX19', 'HTATIP2', 'ZSCAN16', 'IKZF2', 'SOX10', 'MXI1', 'BCL6', 'FEZ1'])
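The keys above are the root genes. As a rough illustration of the grouping idea, the sketch below collects the intermediate genes of each path under its root gene, using only the standard library. The toy paths are taken loosely from the table in Step 4, and the dictionary layout is an assumption; the actual structure of the values stored in pathset is not shown in this tutorial.

```python
from collections import defaultdict

# (root gene, intermediate genes) pairs, a made-up subset for illustration.
toy_paths = [
    ("YBX1", ["TEAD2", "HES7", "ISL1"]),
    ("YBX1", ["TEAD2", "TSC22D4", "PAX5"]),
    ("FEZ1", ["YBX1", "IRF9", "HMG20B"]),
]


def group_by_root(paths):
    """Hypothetical helper: map each root gene to the list of its sub paths."""
    context = defaultdict(list)
    for root, intermediates in paths:
        context[root].append(list(intermediates))
    return dict(context)


toy_context = group_by_root(toy_paths)
print(toy_context.keys())
```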

Step 6 : Comparing context of all root nodes with each other#

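We now measure how similar the contexts of the root genes are to each other. The context is first transformed into a gene frequency histogram, and the histograms are then compared using cosine similarity.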

# transforming the context into a more comparable representation 
gf_histogram = coregtor.transform_context(pathset,method="gene_frequency")

gf_histogram

	HES7	ISL1	LHX4	ZNF135	ZNF536	TSC22D4	PAX5	EBF1	GSX2	TFEB	...	ZNF34	TCEAL6	RXRB	FOXI1	PSMD12	KLF12	ZNF783	ZNF749	ZCCHC14	FOXP1
YBX1	8	4	4	3	4	362	143	42	42	26	...	0	0	0	0	0	0	0	0	0	0
POU2AF1	0	5	0	0	0	74	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
PIR	0	0	0	0	0	12	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
ALX3	2	0	0	0	0	61	0	0	0	2	...	0	0	0	0	0	0	0	0	0	0
HCLS1	0	0	1	0	0	63	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
SP110	0	0	0	0	0	48	47	6	1	0	...	0	0	0	0	0	0	0	0	0	0
TIGD1	0	0	0	0	0	72	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
ZNF577	0	0	0	0	0	3	21	0	0	0	...	0	0	0	0	0	0	0	0	0	0
TAGLN2	0	0	3	0	0	69	12	0	5	0	...	0	0	0	0	0	0	0	0	0	0
KLF15	1	0	0	0	0	139	22	0	0	6	...	0	0	0	0	0	0	0	0	0	0
ZRSR2	0	0	0	0	0	0	0	2	0	0	...	0	0	0	0	0	0	0	0	0	0
FOXB2	0	0	0	0	1	178	1	14	0	0	...	0	0	0	0	0	0	0	0	0	0
ZNF837	0	0	0	0	0	0	1	0	1	0	...	0	0	0	0	0	0	0	0	0	0
IRF9	2	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
HMBOX1	3	2	0	0	12	443	117	0	11	0	...	0	0	0	0	0	0	0	0	0	0
PPP2R3B	0	0	0	0	1	0	0	0	6	0	...	0	0	0	0	0	0	0	0	0	0
DPRX	0	0	1	0	1	114	10	1	0	0	...	0	0	0	0	0	0	0	0	0	0
NFYA	0	0	0	0	0	88	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
ZNF775	0	0	0	1	0	1	0	0	4	0	...	0	0	0	0	0	0	0	0	0	0
SMC3	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
ID1	0	0	0	0	0	57	0	3	30	5	...	0	0	0	0	0	0	0	0	0	0
ZNF768	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
NKX6-2	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
ZNF706	0	1	0	0	2	33	0	0	7	0	...	0	0	0	0	0	0	0	0	0	0
CLK1	7	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
HDAC1	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
ZNF322	0	0	0	0	0	66	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
RAD21	0	0	0	0	0	0	8	0	0	0	...	0	0	0	0	0	0	0	0	0	0
PHF21A	0	0	2	0	0	79	0	1	5	0	...	1	0	0	0	0	0	0	0	0	0
MAFK	0	0	0	0	0	0	0	0	0	0	...	0	1	0	0	0	0	0	0	0	0
CREB3L2	0	0	0	0	0	44	42	0	2	0	...	0	0	1	0	0	0	0	0	0	0
TBX19	0	0	0	0	0	0	0	3	0	0	...	1	0	0	3	0	0	0	0	0	0
HTATIP2	0	0	0	0	0	0	0	0	0	1	...	0	0	0	0	1	4	1	0	0	0
ZSCAN16	0	0	0	0	0	45	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
IKZF2	0	0	0	0	0	30	14	0	0	3	...	1	0	0	0	0	0	0	1	0	0
SOX10	0	0	0	0	0	32	0	0	0	0	...	0	0	0	0	0	0	0	0	2	0
MXI1	2	0	0	0	0	0	17	0	0	0	...	0	0	0	0	3	0	0	0	0	0
BCL6	0	0	0	0	3	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
FEZ1	3	0	0	0	0	0	0	0	0	24	...	0	0	0	0	0	0	0	0	0	4

39 rows × 1815 columns
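Each row of the table above counts how often a gene appears among the paths of one root gene. A minimal standard-library sketch of such a count is shown below; the function gene_frequency and the toy context are hypothetical, and the exact counting rules used by transform_context are an assumption.

```python
from collections import Counter

# Toy context: root gene -> list of sub paths (made-up data).
toy_context = {
    "YBX1": [["TEAD2", "HES7"], ["TEAD2", "TSC22D4"]],
    "FEZ1": [["IRF9", "TEAD2"]],
}


def gene_frequency(context):
    """Count gene occurrences per root and align them on a shared vocabulary."""
    counts = {
        root: Counter(gene for path in paths for gene in path)
        for root, paths in context.items()
    }
    vocabulary = sorted({gene for c in counts.values() for gene in c})
    # A Counter returns 0 for genes that never appear under a given root.
    histogram = {root: [c[g] for g in vocabulary] for root, c in counts.items()}
    return vocabulary, histogram


toy_vocab, toy_hist = gene_frequency(toy_context)
print(toy_vocab)
print(toy_hist)
```

Aligning all roots on one vocabulary is what produces the many zeros seen in the table: most genes never occur in the paths of a given root.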

sim_matrix = coregtor.compare_context(gf_histogram,"cosine")

sim_matrix

	YBX1	POU2AF1	PIR	ALX3	HCLS1	SP110	TIGD1	ZNF577	TAGLN2	KLF15	...	MAFK	CREB3L2	TBX19	HTATIP2	ZSCAN16	IKZF2	SOX10	MXI1	BCL6	FEZ1
YBX1	1.000000	0.390690	0.282963	0.441397	0.347206	0.357401	0.437412	0.178489	0.427796	0.527324	...	0.207780	0.443300	0.268885	0.110692	0.464645	0.458956	0.444353	0.229340	0.132132	0.048082
POU2AF1	0.390690	1.000000	0.097367	0.189427	0.191951	0.271433	0.280371	0.039441	0.151647	0.246245	...	0.057232	0.274348	0.135393	0.009952	0.208366	0.147100	0.172599	0.025864	0.054748	0.272656
PIR	0.282963	0.097367	1.000000	0.271940	0.081994	0.174818	0.071998	0.030674	0.101351	0.290197	...	0.055558	0.295340	0.404131	0.010030	0.387235	0.326790	0.390717	0.013391	0.030086	0.006771
ALX3	0.441397	0.189427	0.271940	1.000000	0.462176	0.183100	0.205235	0.306286	0.466232	0.456986	...	0.297251	0.291815	0.278464	0.088860	0.355313	0.272073	0.332201	0.017341	0.036216	0.077981
HCLS1	0.347206	0.191951	0.081994	0.462176	1.000000	0.182966	0.269786	0.328728	0.485007	0.328482	...	0.336335	0.152378	0.052297	0.017688	0.166456	0.104280	0.114887	0.083077	0.110200	0.024756
SP110	0.357401	0.271433	0.174818	0.183100	0.182966	1.000000	0.251293	0.045317	0.140399	0.264442	...	0.066215	0.278668	0.182943	0.060086	0.228899	0.191671	0.188913	0.135039	0.049885	0.072055
TIGD1	0.437412	0.280371	0.071998	0.205235	0.269786	0.251293	1.000000	0.027062	0.219067	0.350281	...	0.023705	0.221418	0.030539	0.049867	0.235625	0.214205	0.204016	0.124777	0.068017	0.005000
ZNF577	0.178489	0.039441	0.030674	0.306286	0.328728	0.045317	0.027062	1.000000	0.344919	0.170948	...	0.279431	0.082814	0.004852	0.069558	0.039215	0.026043	0.008941	0.027977	0.024586	0.001571
TAGLN2	0.427796	0.151647	0.101351	0.466232	0.485007	0.140399	0.219067	0.344919	1.000000	0.309809	...	0.433702	0.177477	0.062660	0.066611	0.173707	0.171206	0.169182	0.281993	0.025474	0.040933
KLF15	0.527324	0.246245	0.290197	0.456986	0.328482	0.264442	0.350281	0.170948	0.309809	1.000000	...	0.132012	0.342737	0.285807	0.428690	0.413125	0.599421	0.434714	0.308292	0.299045	0.192649
ZRSR2	0.033044	0.263826	0.008146	0.132157	0.014655	0.058607	0.010481	0.027319	0.078656	0.047919	...	0.022265	0.110977	0.006141	0.004314	0.005255	0.011769	0.017606	0.006950	0.028671	0.314943
FOXB2	0.610984	0.357531	0.276302	0.345941	0.287703	0.249759	0.464244	0.024721	0.279255	0.550246	...	0.014182	0.369243	0.254869	0.171139	0.487059	0.503505	0.461591	0.202323	0.219223	0.006372
ZNF837	0.295305	0.064716	0.330254	0.247707	0.039826	0.184614	0.047311	0.007792	0.188687	0.248245	...	0.072768	0.236638	0.390586	0.004310	0.359435	0.313760	0.357238	0.156812	0.016106	0.002320
IRF9	0.295182	0.064347	0.335614	0.376965	0.061234	0.166046	0.039170	0.009453	0.094527	0.300712	...	0.005489	0.259299	0.464632	0.122566	0.338762	0.285473	0.344640	0.028046	0.042869	0.098552
HMBOX1	0.647789	0.329978	0.241217	0.462533	0.423149	0.397572	0.376354	0.216785	0.415346	0.647799	...	0.189737	0.395000	0.263515	0.341196	0.382428	0.520478	0.392212	0.374563	0.368752	0.215369
PPP2R3B	0.048033	0.009664	0.016664	0.039575	0.145264	0.053923	0.015605	0.030973	0.032809	0.009607	...	0.006765	0.034600	0.139428	0.054337	0.041394	0.004065	0.010393	0.018624	0.011406	0.044133
DPRX	0.465090	0.272496	0.073816	0.339412	0.450710	0.202101	0.390178	0.181192	0.475072	0.414972	...	0.234993	0.192898	0.051587	0.114427	0.204696	0.246931	0.198077	0.253205	0.199409	0.049899
NFYA	0.438038	0.296309	0.086302	0.221245	0.392647	0.187856	0.436239	0.018539	0.211032	0.328365	...	0.005740	0.246735	0.087293	0.015994	0.331019	0.177932	0.235494	0.012583	0.001819	0.030263
ZNF775	0.288816	0.076733	0.302994	0.202763	0.010507	0.131376	0.064825	0.004195	0.105839	0.472321	...	0.023656	0.245449	0.344232	0.332860	0.315589	0.621905	0.406507	0.399331	0.325837	0.000000
SMC3	0.274880	0.362013	0.310281	0.251701	0.012456	0.184213	0.003516	0.057674	0.071365	0.228840	...	0.061498	0.354918	0.384980	0.009969	0.325361	0.273689	0.324689	0.007600	0.019026	0.349449
ID1	0.556863	0.264554	0.300524	0.303235	0.211422	0.221503	0.240994	0.023985	0.303998	0.486265	...	0.128613	0.294769	0.379898	0.229686	0.394576	0.519505	0.446451	0.403588	0.210834	0.061608
ZNF768	0.272612	0.056057	0.376421	0.261556	0.033323	0.198101	0.020632	0.040113	0.069010	0.298104	...	0.018190	0.282073	0.441665	0.017832	0.386520	0.346977	0.395583	0.014390	0.202221	0.067571
NKX6-2	0.285358	0.062473	0.415863	0.284157	0.021398	0.202981	0.017929	0.002389	0.077482	0.285184	...	0.047676	0.297630	0.465344	0.017281	0.452095	0.359295	0.428401	0.023911	0.020326	0.002228
ZNF706	0.359139	0.291627	0.266523	0.466796	0.329536	0.240143	0.139865	0.229249	0.335848	0.412629	...	0.240955	0.277752	0.377293	0.048290	0.323392	0.305714	0.374505	0.069505	0.136062	0.114209
CLK1	0.247502	0.070450	0.353188	0.384848	0.004387	0.157336	0.005762	0.018712	0.156468	0.305405	...	0.003527	0.256807	0.413187	0.130467	0.377501	0.318310	0.376075	0.004852	0.003137	0.130224
HDAC1	0.065214	0.015740	0.021484	0.021804	0.280865	0.070350	0.016731	0.080937	0.196380	0.007186	...	0.006690	0.048311	0.012432	0.081104	0.043150	0.010721	0.034793	0.148792	0.129979	0.008133
ZNF322	0.383818	0.326568	0.063726	0.226584	0.234236	0.161013	0.374197	0.013847	0.184562	0.257207	...	0.009382	0.228177	0.014658	0.000900	0.237554	0.151398	0.189613	0.009085	0.002382	0.105703
RAD21	0.255462	0.067092	0.339623	0.269857	0.013692	0.224968	0.002939	0.013755	0.077211	0.284662	...	0.001262	0.270840	0.416535	0.014091	0.355123	0.365356	0.399475	0.011585	0.003843	0.009626
PHF21A	0.442503	0.189444	0.217075	0.618650	0.599663	0.183091	0.260008	0.401262	0.497952	0.569636	...	0.334362	0.246240	0.234853	0.193900	0.294922	0.417305	0.354604	0.199902	0.208485	0.042961
MAFK	0.207780	0.057232	0.055558	0.297251	0.336335	0.066215	0.023705	0.279431	0.433702	0.132012	...	1.000000	0.008271	0.005600	0.008667	0.011967	0.016547	0.011851	0.163160	0.018630	0.024411
CREB3L2	0.443300	0.274348	0.295340	0.291815	0.152378	0.278668	0.221418	0.082814	0.177477	0.342737	...	0.008271	1.000000	0.269212	0.031385	0.393787	0.338274	0.341131	0.063796	0.016362	0.137304
TBX19	0.268885	0.135393	0.404131	0.278464	0.052297	0.182943	0.030539	0.004852	0.062660	0.285807	...	0.005600	0.269212	1.000000	0.007271	0.398668	0.350113	0.428055	0.022620	0.015436	0.015548
HTATIP2	0.110692	0.009952	0.010030	0.088860	0.017688	0.060086	0.049867	0.069558	0.066611	0.428690	...	0.008667	0.031385	0.007271	1.000000	0.033336	0.354398	0.088057	0.355422	0.302257	0.420583
ZSCAN16	0.464645	0.208366	0.387235	0.355313	0.166456	0.228899	0.235625	0.039215	0.173707	0.413125	...	0.011967	0.393787	0.398668	0.033336	1.000000	0.443097	0.476257	0.006599	0.010767	0.048056
IKZF2	0.458956	0.147100	0.326790	0.272073	0.104280	0.191671	0.214205	0.026043	0.171206	0.599421	...	0.016547	0.338274	0.350113	0.354398	0.443097	1.000000	0.479705	0.395397	0.316269	0.044545
SOX10	0.444353	0.172599	0.390717	0.332201	0.114887	0.188913	0.204016	0.008941	0.169182	0.434714	...	0.011851	0.341131	0.428055	0.088057	0.476257	0.479705	1.000000	0.123353	0.083377	0.008350
MXI1	0.229340	0.025864	0.013391	0.017341	0.083077	0.135039	0.124777	0.027977	0.281993	0.308292	...	0.163160	0.063796	0.022620	0.355422	0.006599	0.395397	0.123353	1.000000	0.338336	0.006669
BCL6	0.132132	0.054748	0.030086	0.036216	0.110200	0.049885	0.068017	0.024586	0.025474	0.299045	...	0.018630	0.016362	0.015436	0.302257	0.010767	0.316269	0.083377	0.338336	1.000000	0.070355
FEZ1	0.048082	0.272656	0.006771	0.077981	0.024756	0.072055	0.005000	0.001571	0.040933	0.192649	...	0.024411	0.137304	0.015548	0.420583	0.048056	0.044545	0.008350	0.006669	0.070355	1.000000

39 rows × 39 columns
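Cosine similarity is the dot product of two vectors divided by the product of their lengths. For non-negative count vectors it lies between 0 and 1, it equals 1 on the diagonal, and it equals 0 when two root genes share no intermediate genes (as for ZNF775 and FEZ1 above). The sketch below computes it for two made-up count vectors; returning 0 for an all-zero vector is a convention chosen here, not necessarily what compare_context does.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for empty histograms (an assumption)
    return dot / (norm_u * norm_v)


# Made-up gene frequency vectors for two root genes.
toy_a = [1, 0, 2, 1]
toy_b = [0, 1, 1, 0]
toy_sim = cosine_similarity(toy_a, toy_b)
print(round(toy_sim, 4))  # 0.5774
```

Here the dot product is 2 and the vector lengths are sqrt(6) and sqrt(2), so the similarity is 2 / sqrt(12).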

Step 7 : Interactive generation of co-regulating gene clusters#

# Dendrogram 
coregtor.plot_dendrogram(sim_matrix)

[Figure: Hierarchical Clustering Dendrogram (average linkage); x-axis: Gene, y-axis: Distance]

(<Figure size 1500x900 with 1 Axes>,
 <Axes: title={'center': 'Hierarchical Clustering Dendrogram (average linkage)'}, xlabel='Gene', ylabel='Distance'>,
 array([[ 9.        , 14.        ,  0.35220059,  2.        ],
        [18.        , 34.        ,  0.37809491,  2.        ],
        [ 3.        , 28.        ,  0.38135024,  2.        ],
        [ 0.        , 11.        ,  0.38901568,  2.        ],
        [39.        , 42.        ,  0.42850833,  4.        ],
        [ 4.        , 41.        ,  0.46908031,  3.        ],
        [20.        , 43.        ,  0.47638252,  5.        ],
        [ 8.        , 44.        ,  0.51693641,  4.        ],
        [33.        , 35.        ,  0.52374254,  2.        ],
        [22.        , 31.        ,  0.53465561,  2.        ],
        [40.        , 45.        ,  0.53960328,  7.        ],
        [21.        , 48.        ,  0.54866987,  3.        ],
        [16.        , 46.        ,  0.56050718,  5.        ],
        [ 6.        , 17.        ,  0.5637613 ,  2.        ],
        [47.        , 49.        ,  0.57385334,  9.        ],
        [32.        , 38.        ,  0.57941738,  2.        ],
        [13.        , 50.        ,  0.58256298,  4.        ],
        [24.        , 55.        ,  0.58741151,  5.        ],
        [23.        , 51.        ,  0.59591912,  6.        ],
        [26.        , 52.        ,  0.60223391,  3.        ],
        [27.        , 56.        ,  0.61093111,  6.        ],
        [ 2.        , 59.        ,  0.62919343,  7.        ],
        [12.        , 60.        ,  0.636039  ,  8.        ],
        [ 1.        , 19.        ,  0.63798693,  2.        ],
        [30.        , 53.        ,  0.64847878, 10.        ],
        [36.        , 37.        ,  0.661664  ,  2.        ],
        [57.        , 63.        ,  0.68634988, 16.        ],
        [10.        , 62.        ,  0.70978445,  3.        ],
        [ 7.        , 29.        ,  0.72056934,  2.        ],
        [58.        , 65.        ,  0.73153913, 19.        ],
        [ 5.        , 68.        ,  0.77672799, 20.        ],
        [61.        , 69.        ,  0.78269649, 28.        ],
        [54.        , 64.        ,  0.81632422,  4.        ],
        [66.        , 70.        ,  0.83765542, 31.        ],
        [71.        , 72.        ,  0.89525603, 35.        ],
        [67.        , 73.        ,  0.90940586, 37.        ],
        [15.        , 74.        ,  0.94759734, 38.        ],
        [25.        , 75.        ,  0.94942835, 39.        ]]))
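The array above has the layout of a SciPy linkage matrix: each row lists the two clusters being merged, the distance at which they merge, and the size of the new cluster. Indices 0 to 38 are the 39 root genes in the order of sim_matrix, and an index of 39 or more refers to a cluster formed in an earlier row. The first row merges genes 9 and 14 (KLF15 and HMBOX1) at 0.352, which equals 1 minus their cosine similarity of 0.647799, so the distance appears to be 1 minus similarity. The sketch below reproduces this kind of matrix on a made-up 4 x 4 similarity matrix; it is an illustration under that assumption, not the code behind plot_dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Made-up similarity matrix for four genes A, B, C, D.
toy_sim = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])

# Turn similarity into distance and force an exact zero diagonal.
toy_dist = 1.0 - toy_sim
np.fill_diagonal(toy_dist, 0.0)

# linkage expects the condensed (upper triangle) form of the distance matrix.
toy_Z = linkage(squareform(toy_dist), method="average")
print(toy_Z)
```

In the toy result, A and B merge first at distance 0.1, then C and D at 0.2, and the two pairs join last at the average of their four pairwise distances, 0.8.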

# Cophenetic distance
coregtor.plot_cophenetic(sim_matrix,methods=["average"])

[Figure: Average linkage, CCC = 0.739; x-axis: Cophenetic distance, y-axis: Original distance]

(<Figure size 1200x500 with 1 Axes>,
 [<Axes: title={'center': 'Average\nCCC = 0.739'}, xlabel='Cophenetic distance', ylabel='Original distance'>],
 {'average': np.float64(0.7391324338478736)},
 np.float64(0.6350354525242414))
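The cophenetic correlation coefficient (CCC) measures how faithfully a dendrogram preserves the original pairwise distances; values closer to 1 indicate a better fit. A minimal sketch of computing it with SciPy's cophenet function on a made-up distance matrix follows. The toy data are hypothetical, and this is not necessarily how plot_cophenetic computes its values.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

# Made-up distance matrix for four genes (symmetric, zero diagonal).
toy_dist = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])
toy_condensed = squareform(toy_dist)

toy_Z = linkage(toy_condensed, method="average")

# cophenet returns the correlation and the cophenetic distances
# (the merge height at which each pair first shares a cluster).
toy_ccc, toy_coph = cophenet(toy_Z, toy_condensed)
print(round(float(toy_ccc), 3))
```

Repeating this for several linkage methods and keeping the one with the highest coefficient is a common way to choose a method.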

# generate results
results1,_ = coregtor.identify_coregulators(sim_matrix,target_gene=target_gene_name,distance_threshold=0.65)

results1

	target_gene	gene_cluster	n_genes	cluster_id
0	GFAP	CREB3L2,FOXB2,HMBOX1,ID1,IKZF2,KLF15,SOX10,YBX...	10	0
1	GFAP	CLK1,IRF9,NKX6-2,PIR,RAD21,TBX19,ZNF768,ZNF837	8	3
2	GFAP	ALX3,DPRX,HCLS1,PHF21A,TAGLN2,ZNF706	6	2
3	GFAP	NFYA,TIGD1,ZNF322	3	4
4	GFAP	POU2AF1,SMC3	2	1
5	GFAP	FEZ1,HTATIP2	2	5

results1 = results1[results1["n_genes"]<10]
results1

	target_gene	gene_cluster	n_genes	cluster_id
1	GFAP	CLK1,IRF9,NKX6-2,PIR,RAD21,TBX19,ZNF768,ZNF837	8	3
2	GFAP	ALX3,DPRX,HCLS1,PHF21A,TAGLN2,ZNF706	6	2
3	GFAP	NFYA,TIGD1,ZNF322	3	4
4	GFAP	POU2AF1,SMC3	2	1
5	GFAP	FEZ1,HTATIP2	2	5

results2,_ = coregtor.identify_coregulators(sim_matrix,target_gene=target_gene_name,distance_threshold=0.75)
results2

	target_gene	gene_cluster	n_genes	cluster_id
0	GFAP	ALX3,CREB3L2,DPRX,FOXB2,HCLS1,HMBOX1,ID1,IKZF2...	19	0
1	GFAP	CLK1,IRF9,NKX6-2,PIR,RAD21,TBX19,ZNF768,ZNF837	8	3
2	GFAP	POU2AF1,SMC3,ZRSR2	3	2
3	GFAP	MAFK,ZNF577	2	1
4	GFAP	BCL6,MXI1	2	4
5	GFAP	FEZ1,HTATIP2	2	5

results2 = results2[results2["n_genes"]<10]
results2

	target_gene	gene_cluster	n_genes	cluster_id
1	GFAP	CLK1,IRF9,NKX6-2,PIR,RAD21,TBX19,ZNF768,ZNF837	8	3
2	GFAP	POU2AF1,SMC3,ZRSR2	3	2
3	GFAP	MAFK,ZNF577	2	1
4	GFAP	BCL6,MXI1	2	4
5	GFAP	FEZ1,HTATIP2	2	5
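The two runs above show the effect of the distance threshold: raising it from 0.65 to 0.75 lets more genes merge, and the largest cluster grows from 10 to 19 genes. The sketch below shows the same effect with SciPy's fcluster on a made-up distance matrix, cutting the dendrogram at two thresholds. It assumes a distance-based cut and is only an illustration; the internals of identify_coregulators are not shown in this tutorial.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

toy_genes = ["A", "B", "C", "D"]
# Made-up distance matrix (symmetric, zero diagonal).
toy_dist = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])
toy_Z = linkage(squareform(toy_dist), method="average")

# Genes whose merge distance is at most the threshold share a label.
tight_labels = fcluster(toy_Z, t=0.5, criterion="distance")
loose_labels = fcluster(toy_Z, t=0.9, criterion="distance")

for name, labels in [("t=0.5", tight_labels), ("t=0.9", loose_labels)]:
    clusters = {}
    for gene, label in zip(toy_genes, labels):
        clusters.setdefault(int(label), []).append(gene)
    print(name, list(clusters.values()))
```

At the lower threshold the toy genes split into two pairs; at the higher threshold all four fall into one cluster, because the final merge happens at distance 0.8.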

