The following co-expression coefficient features were obtained from COXPRESdb:
http://coxpresdb.jp/download.shtml
Open this page and click "bulk download".
Then download the budding yeast file.
At the bottom of the page there is also a description of the file format.
Under the directory named Hsa.coex.v6, 19777 files will appear.
Hsa.coex.v6 ----- 1
              |-- 10
              |-- 100
              |-- ...
              |-- 9997

File 1:
462 8.1 0.596
2158 10.9 0.590
189 12.7 0.574
...
220963 19749.5 -0.163
130367 19760.5 -0.175

File 10:
80168 4.9 0.553
10223 5.8 0.650
27284 5.9 0.608
...
84058 19772.0 -0.276
83871 19775.5 -0.304

File 100:
85449 37.9 0.478
140807 47.7 0.391
636 50.2 0.469
...
126969 19269.8 -0.113
55930 19273.0 -0.082

Column 1; Entrez Gene ID of an opposite gene of coexpression (19776 genes)
Column 2; MR (Mutual Rank) as a final measure of coexpression. Lines are sorted by this value.
Column 3; Pearson's correlation coefficient of gene expression pattern
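
As a minimal sketch of how one of these per-gene files can be read, the snippet below parses the first three lines of the example file 1 into a table indexed by the partner Gene ID. The function name read_coex and the column names gene_id, mr and pcc are hypothetical labels chosen for illustration; the files themselves have no header row.

```python
import io

import pandas as pd

# First three lines of the example file "1" from the format description.
SAMPLE = """462 8.1 0.596
2158 10.9 0.590
189 12.7 0.574
"""


def read_coex(source):
    """Read one per-gene file (path or file-like object) into a DataFrame.

    The result is indexed by the partner gene's Entrez Gene ID, with one
    column for the Mutual Rank and one for the Pearson correlation.
    """
    return pd.read_csv(
        source,
        sep=r"\s+",                       # accepts either spaces or tabs
        header=None,                      # the files have no header row
        names=["gene_id", "mr", "pcc"],   # hypothetical column labels
        index_col="gene_id",
    )


coex = read_coex(io.StringIO(SAMPLE))
# Look up the MR and correlation between gene 1 and partner gene 2158.
print(coex.loc[2158, "mr"], coex.loc[2158, "pcc"])
```

Passing a real path such as a file inside the downloaded directory instead of the StringIO object should work the same way, assuming the file follows the three-column layout described above.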
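# -*- coding: utf-8 -*-
"""
Created on Thu Nov 10 10:49:21 2016

@author: sun
"""
import os

import pandas as pd

# Gold-standard yeast protein pairs (UniProt accessions in idA / idB).
yeast_gold_protein_pair = pd.read_csv('yeast_gold_protein_pair.csv',
                                      usecols=['idA', 'idB'])
# UniProt accession -> Entrez Gene ID mapping, indexed by accession.
GeneID = pd.read_csv('uniprot_to_geneid.csv',
                     usecols=['Entry', 'Cross-reference (GeneID)'],
                     index_col=0)

# Note: loc selects data by label, iloc selects data by position.
# reindex does the same label lookup as loc, but gives NaN rows for
# accessions missing from the mapping (recent pandas raises KeyError with loc).
idA = GeneID.reindex(yeast_gold_protein_pair.idA).reset_index(drop=True)
idB = GeneID.reindex(yeast_gold_protein_pair.idB).reset_index(drop=True)

mr = []
cor = []
for i in range(len(idA)):
    GeneIDA = str(idA.iloc[i].values)
    GeneIDB = str(idB.iloc[i].values)
    # Default when an ID, a file, or a partner entry is missing.
    mr_value, cor_value = "nan", "nan"
    if GeneIDB != '[nan]' and GeneIDA != '[nan]':
        # Characters 2..7 of the stringified cell are taken as the Gene ID;
        # this assumes six-digit IDs.
        GeneIDA = GeneIDA[2:8]
        GeneIDB = int(GeneIDB[2:8])
        # One file per gene, named after its Gene ID.
        path = 'Sce.v14-08.G4461-S3819.rma.mrgeo.d/' + GeneIDA
        if os.path.exists(path):
            coex = pd.read_csv(path, header=None, sep=' ', index_col=0)
            if GeneIDB in coex.index:
                mr_value = coex.loc[GeneIDB, 1]   # column 2: Mutual Rank
                cor_value = coex.loc[GeneIDB, 2]  # column 3: Pearson correlation
    mr.append(mr_value)
    cor.append(cor_value)

yeast_gold_protein_pair['MR'] = mr
yeast_gold_protein_pair['COR'] = cor
yeast_gold_protein_pair.to_csv('coexpression.csv', index=False)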
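
The script above extracts the Gene ID by slicing characters [2:8] of the stringified cell, which only works when the ID has exactly six digits. A possible alternative, sketched here under the assumption that each 'Cross-reference (GeneID)' cell is a semicolon-separated string of IDs, is to split on the semicolon instead. The helper name first_gene_id and the example IDs are hypothetical and are not part of the original script.

```python
import math


def first_gene_id(cell):
    """Return the first Entrez Gene ID found in a cross-reference cell.

    Assumes the cell is a semicolon-separated string of numeric IDs.
    Returns None for missing values (None or NaN) or when no numeric
    ID is present.
    """
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        return None
    for part in str(cell).split(";"):
        part = part.strip()
        if part.isdigit():
            return int(part)
    return None


# Hypothetical cells, for illustration only.
print(first_gene_id("123456;"))           # six-digit ID
print(first_gene_id("1234567;123456;"))   # seven-digit ID listed first
print(first_gene_id(float("nan")))        # missing mapping
```

With a helper like this, the string comparison against '[nan]' could be replaced by a check for None, at the cost of always taking the first ID when a protein maps to several genes.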