Tutorial#

Using TFitPy to evaluate the performance of a set of co-regulators for a given target.

Getting things ready#

The first step is to download the required datasets and generate the processing files. This is a one-time step that must be repeated each time a new version is installed. You need to specify the path of a folder where the data will be stored; this folder can be reused.

from tfitpy.datasets import install
data_path = "$HOME/datasets"
install(data_path)
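The path above contains the environment variable `$HOME`, which Python does not expand in a plain string. The following is a minimal sketch, using only the standard library, of resolving such a path and making sure the folder exists before it is used. `resolve_data_dir` is a hypothetical helper and not part of TFitPy; whether `install` expands variables on its own is not stated in this tutorial.

```python
import os
from pathlib import Path


def resolve_data_dir(raw_path: str) -> Path:
    """Expand '~' and environment variables, then create the folder if needed.

    Hypothetical helper for illustration; not part of TFitPy.
    """
    expanded = os.path.expandvars(os.path.expanduser(raw_path))
    folder = Path(expanded)
    # parents=True creates missing parent folders; exist_ok=True makes reuse safe
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

For example, `resolve_data_dir("$HOME/datasets")` would return the expanded path as a `Path` object, creating the folder on first use and leaving it untouched afterwards.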

We are now ready to evaluate our TFs!

Data#

In this tutorial we illustrate the use of the package on a potential set of co-regulators:

data1 = {
    "sources":["IKZF4","TBP","ZNF841"],
    "target":"CLK3"
}
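Pairwise indices, such as the GO similarity table shown later in this tutorial, report one row per unordered pair of sources. As a small sketch using only the standard library, the pairs for this input can be listed up front to see how many comparisons to expect; this is an illustration, not a TFitPy call.

```python
from itertools import combinations

data1 = {
    "sources": ["IKZF4", "TBP", "ZNF841"],
    "target": "CLK3",
}

# Every unordered pair of source TFs: n * (n - 1) / 2 pairs for n sources
pairs = list(combinations(data1["sources"], 2))

for tf1, tf2 in pairs:
    print(tf1, tf2)
```

With three sources this gives three pairs, so the cost of a pairwise index grows quadratically as more candidate co-regulators are added.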

Load datasets cache#

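# Set the folder path where data was stored during setup
import importlib
import os

import tfitpy as tt

# Expand $HOME so the cache loader receives a concrete path
folder_path = os.path.expandvars(data_path)
cache = tt.load_cache(data_path=folder_path)

# Development only: reload tfitpy to pick up source changes without restarting the kernel
importlib.reload(tt)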

PPI based scores#


GO Functional Similarity#

import tfitpy.indices.go as g

# Returns a scalar score and a table with one row per pair of TFs
s, df = g.goa_resnik_similarity(data1["sources"], datasets=cache)
s
1.3942496226771082
df
     tf1     tf2     score  n_terms_tf1  n_terms_tf2
0  IKZF4     TBP  1.686905           17           36
1  IKZF4  ZNF841  1.573010           17            7
2    TBP  ZNF841  0.922834           36            7
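In this output, the scalar `s` agrees with the arithmetic mean of the three pairwise scores. The sketch below checks that using the values copied from the table; whether `goa_resnik_similarity` always aggregates this way is an assumption drawn from this one result, not something the tutorial confirms.

```python
from statistics import mean

# Pairwise scores copied from the table above
pairwise_scores = {
    ("IKZF4", "TBP"): 1.686905,
    ("IKZF4", "ZNF841"): 1.573010,
    ("TBP", "ZNF841"): 0.922834,
}

# Arithmetic mean over all TF pairs
overall = mean(list(pairwise_scores.values()))
print(overall)
```

The result matches the reported `s` to about six decimal places, which is the precision of the rounded table values. The lowest pair (TBP and ZNF841) therefore pulls the overall score down noticeably.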
gene2go = cache["go"]["gene2go"]
print(list(gene2go.keys())[:5])          # First five keys; these should be gene symbols

for gene in data1["sources"] + [data1["target"]]:
    print(f"{gene}: {len(gene2go.get(gene, set()))} GO terms")
['NUDT4B', 'TRBV20OR9-2', 'IGKV3-7', 'IGKV1D-42', 'IGLV4-69']
IKZF4: 17 GO terms
TBP: 36 GO terms
ZNF841: 7 GO terms
CLK3: 21 GO terms
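The check above can be turned into a small reusable helper that flags genes with few or no GO terms, since pairs involving sparsely annotated genes may be less reliable. This is a sketch under assumptions: `annotation_counts` and `poorly_annotated` are hypothetical names, the mapping is assumed to be dict-like from gene symbol to a set of terms (as the `gene2go.get(gene, set())` call above suggests), and the toy mapping uses placeholder labels rather than real annotations.

```python
def annotation_counts(genes, gene2go):
    """Return {gene: number of GO terms}, counting missing genes as 0."""
    return {gene: len(gene2go.get(gene, set())) for gene in genes}


def poorly_annotated(genes, gene2go, min_terms=1):
    """List genes that have fewer than `min_terms` GO terms."""
    counts = annotation_counts(genes, gene2go)
    return [gene for gene, n in counts.items() if n < min_terms]


# Toy mapping with placeholder labels, standing in for cache["go"]["gene2go"]
toy_gene2go = {
    "GENE_A": {"term_1", "term_2"},
    "GENE_B": {"term_3"},
}

print(poorly_annotated(["GENE_A", "GENE_B", "GENE_C"], toy_gene2go))
print(poorly_annotated(["GENE_A", "GENE_B", "GENE_C"], toy_gene2go, min_terms=2))
```

With the real cache, the same call would be `poorly_annotated(data1["sources"] + [data1["target"]], gene2go, min_terms=...)`, with the threshold chosen by the user.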