Tutorial#
Using TFitPy to evaluate the performance of a set of co-regulators for a given target
Getting things ready#
The first step is to download the required datasets and generate the processing files. This is a one-time step that must be repeated each time a new version of the package is installed. You need to specify the path of a folder where the data will be stored; the same folder can be reused.
from tfitpy.datasets import install
data_path = "$HOME/datasets"
install(data_path)
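The path above contains the shell variable `$HOME`, which Python does not expand on its own. The following is a minimal sketch, using only the standard library, of how such a path can be resolved to an absolute location before it is used; `resolve_data_path` is a hypothetical helper for illustration and is not part of TFitPy.

```python
import os
from pathlib import Path


def resolve_data_path(raw_path: str) -> Path:
    """Expand '~' and environment variables, then return an absolute path.

    Hypothetical helper for illustration; not part of the TFitPy API.
    """
    expanded = os.path.expandvars(os.path.expanduser(raw_path))
    return Path(expanded).resolve()


# Same path string as in the tutorial
data_dir = resolve_data_path("$HOME/datasets")
print(data_dir)
```

If a variable is not set in the environment, `os.path.expandvars` leaves it unchanged, so printing the resolved path is a quick way to check that the expansion did what was expected.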
We are now ready to evaluate our TFs!
Data#
In this tutorial we illustrate the use of the package on a potential set of co-regulators for a single target gene:
data1 = {
    "sources": ["IKZF4", "TBP", "ZNF841"],
    "target": "CLK3",
}
Load datasets cache#
import importlib
import os
from pathlib import Path
import tfitpy as tt
# Set the folder path where data was stored during setup ("$HOME" is expanded here)
folder_path = os.path.expandvars(data_path)
cache = tt.load_cache(data_path=folder_path)
# Reloading is only needed during development, to pick up local edits to the package
importlib.reload(tt)
PPI based scores#
GO Functional Similarity#
import tfitpy.indices.go as g
# Reload only after the module has been imported; needed only during development
importlib.reload(g)
s,df = g.goa_resnik_similarity(data1["sources"],datasets=cache)
s
1.3942496226771082
df
| | tf1 | tf2 | score | n_terms_tf1 | n_terms_tf2 |
|---|---|---|---|---|---|
| 0 | IKZF4 | TBP | 1.686905 | 17 | 36 |
| 1 | IKZF4 | ZNF841 | 1.573010 | 17 | 7 |
| 2 | TBP | ZNF841 | 0.922834 | 36 | 7 |
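In this particular output, the aggregate score `s` agrees with the arithmetic mean of the three pairwise scores to within the rounding of the table. The sketch below reproduces that check with the standard library; it is an observation about these numbers, not a statement of how `goa_resnik_similarity` is implemented.

```python
from itertools import combinations
from statistics import mean

sources = ["IKZF4", "TBP", "ZNF841"]

# Pairwise scores copied from the table above (rounded to six decimals)
pair_scores = {
    ("IKZF4", "TBP"): 1.686905,
    ("IKZF4", "ZNF841"): 1.573010,
    ("TBP", "ZNF841"): 0.922834,
}

# All unordered pairs of sources, in the same order as the table rows
pairs = list(combinations(sources, 2))

# Mean of the pairwise scores, to compare with the aggregate score s
mean_score = mean(pair_scores[pair] for pair in pairs)
print(pairs)
print(mean_score)
```

With three sources there are three unordered pairs, which matches the three rows of `df`; the printed mean is about 1.39425, compared with `s` = 1.3942496226771082.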
gene2go = cache["go"]["gene2go"]
print(list(gene2go.keys())[:5])  # keys should be gene symbols
for gene in data1["sources"] + [data1["target"]]:
    print(f"{gene}: {len(gene2go.get(gene, set()))} GO terms")
['NUDT4B', 'TRBV20OR9-2', 'IGKV3-7', 'IGKV1D-42', 'IGLV4-69']
IKZF4: 17 GO terms
TBP: 36 GO terms
ZNF841: 7 GO terms
CLK3: 21 GO terms
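A gene with no GO annotations cannot contribute to a GO-based similarity, so it can be useful to flag such genes before scoring. The sketch below assumes, as the lookup above suggests, that `gene2go` maps a gene symbol to a set of GO term IDs. The helper names and the toy mapping are hypothetical and only illustrate the check.

```python
def annotation_counts(genes, gene2go):
    """Return {gene: number of GO terms}, with 0 for genes missing from the mapping."""
    return {gene: len(gene2go.get(gene, set())) for gene in genes}


def unannotated(genes, gene2go):
    """Return the genes that have no GO terms in the mapping."""
    return [gene for gene in genes if not gene2go.get(gene)]


# Toy mapping with placeholder gene names (not real annotations)
toy_gene2go = {
    "GENE_A": {"GO:0005634", "GO:0006355"},
    "GENE_B": {"GO:0005634"},
}
toy_genes = ["GENE_A", "GENE_B", "GENE_C"]

counts = annotation_counts(toy_genes, toy_gene2go)
missing = unannotated(toy_genes, toy_gene2go)
print(counts)
print(missing)
```

In the tutorial's case, all three sources and the target have at least seven GO terms, so none of them would be flagged.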