json: Hierarchical clustering a pairwise distance matrix of precomputed distances

Saturday, 27 June 2015

I have a pairwise distance dataframe that I've made with pandas:

# Get files
import glob
import itertools

import pandas

one_dimension = glob.glob('*.pdb')

# Build one row per unordered pair of files (get_rmsd is not shown here)
dataframe = []
for pdb_1, pdb_2 in itertools.combinations(one_dimension, 2):
    entry = {'pdb_1': pdb_1, 'pdb_2': pdb_2, 'rmsd': get_rmsd(pdb_1, pdb_2)}
    dataframe.append(entry)

# Convert the list of dicts into a DataFrame with columns pdb_1, pdb_2, rmsd
dataframe = pandas.DataFrame(dataframe)
dataframe

All I want to do is cluster the dataframe so that every cluster contains only pdbs whose pairwise rmsd is below some cutoff (let's say less than 2). I have read that complete linkage is the way to go.

For instance:

pdb_1, pdb_2 have an rmsd of 1.56
pdb_3, pdb_2 have an rmsd of 1.03
pdb_3, pdb_1 have an rmsd of 1.60

So they can all appear in a cluster together. But if a new pdb is considered for the cluster and its rmsd to any member already in the cluster is > 2, it will be rejected.

As I understand it, this is complete linkage with a distance cutoff.
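To illustrate why the choice of linkage matters for this rule, here is a minimal sketch with three made-up distances (not my real data). It assumes scipy's `linkage` and `fcluster` with `criterion="distance"`, which cuts the tree at cophenetic distance <= `t`. Items 0 and 1 are close, items 1 and 2 are close, but items 0 and 2 are farther apart than the cutoff.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Made-up condensed distances for three items, in the order (0,1), (0,2), (1,2).
condensed = np.array([1.4, 2.9, 1.5])
cutoff = 2.0

# Single linkage merges on the closest pair between clusters, so it can chain
# item 2 onto {0, 1} even though d(0, 2) exceeds the cutoff.
single = fcluster(linkage(condensed, method="single"),
                  t=cutoff, criterion="distance")

# Complete linkage merges on the farthest pair between clusters, so every
# pair inside a resulting cluster is within the cutoff.
complete = fcluster(linkage(condensed, method="complete"),
                    t=cutoff, criterion="distance")

n_single = len(set(single))      # number of clusters under single linkage
n_complete = len(set(complete))  # number of clusters under complete linkage
```

With these numbers, single linkage puts all three items in one cluster, while complete linkage keeps item 2 apart, which is the behaviour described above.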

I have looked into scipy.cluster.hierarchy.linkage, but I'm having an extremely hard time getting my data into the array format that linkage expects.

What is the best way to complete this task?
How do I go from my dataframe to something usable by scipy.cluster?
Should I turn it into an R dataframe?
How do I find out which members are in each cluster once I have transformed the pairwise distances into an array?
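One possible route, sketched under assumptions: pivot the long-form table into a symmetric square matrix, convert it with `scipy.spatial.distance.squareform` to the condensed vector that `linkage` accepts, then map the labels from `fcluster` back to file names. The file names and rmsd values below are hypothetical stand-ins for the real dataframe, and the variable names are only illustrative.

```python
import itertools

import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical stand-in for the pairwise table; the rmsd values are made up.
names = ["a.pdb", "b.pdb", "c.pdb", "d.pdb"]
rmsds = [1.56, 1.60, 5.20, 1.03, 4.80, 6.10]
pairs = pd.DataFrame(
    [{"pdb_1": p, "pdb_2": q, "rmsd": r}
     for (p, q), r in zip(itertools.combinations(names, 2), rmsds)]
)

# Build a symmetric square matrix with a zero diagonal.
labels = sorted(set(pairs["pdb_1"]) | set(pairs["pdb_2"]))
square = pd.DataFrame(0.0, index=labels, columns=labels)
for row in pairs.itertuples(index=False):
    square.loc[row.pdb_1, row.pdb_2] = row.rmsd
    square.loc[row.pdb_2, row.pdb_1] = row.rmsd

# squareform turns the square matrix into the condensed 1-D vector.
condensed = squareform(square.to_numpy())
tree = linkage(condensed, method="complete")

# Cut the tree at the cutoff; cluster_ids[i] is the cluster of labels[i].
cluster_ids = fcluster(tree, t=2.0, criterion="distance")

# Group file names by cluster id to see who ended up together.
members = {}
for name, cid in zip(labels, cluster_ids):
    members.setdefault(int(cid), []).append(name)
clusters = sorted(members.values())
```

Because the order of `cluster_ids` follows `labels`, the membership question reduces to zipping the two together. Going through the square matrix is slower than passing the rmsd column directly, but it does not depend on the rows being in any particular order.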

I have found this, this, and this question to be similar, and I also found this tutorial.
