rnaglib.algorithms.SimFunctionNode

class rnaglib.algorithms.SimFunctionNode(method, depth, decay=0.5, idf=False, normalization=None, hash_init_path='/home/docs/checkouts/readthedocs.org/user_builds/rnaglib/envs/latest/lib/python3.10/site-packages/rnaglib/algorithms/../data/hashing/NR_chops_hash.p')[source]

Factory object to compute all node similarities. These methods take as input an annotated pair of nodes and compare them.

These methods are detailed in the supplement of the paper and include five methods. They frequently rely on the Hungarian algorithm, which finds an optimal matching between two sets according to a cost function.

Three of them compare the edges:

  • R_1 compares the histograms of each ring, possibly with an IDF weighting (to emphasize differences in rare edges)

  • R_iso compares each ring with the best matching based on the isostericity values

  • hungarian compares the whole annotation at once, with rings differentiated by an additional ‘depth’ field; all nodes are then compared based on isostericity and this depth field.

Two of them compare the graphlets. The underlying idea is that comparing lists of edges alone does not constrain the graph structure, whereas assembling graphlets does so more strongly (exceptions exist, but for most graphs, knowing all graphlets at each depth is enough to recreate the graph):

  • R_graphlets works like R_iso, except that the isostericity is replaced by the GED

  • graphlet works like hungarian, except that the isostericity is replaced by the GED
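The optimal matching these functions rely on can be sketched as follows. This is a pure-Python brute force standing in for the Hungarian algorithm (which finds the same optimum in O(n³)), and the cost matrix is hypothetical, e.g. `1 - isostericity` between ring elements:

```python
from itertools import permutations

def min_cost_matching(cost):
    """Brute-force optimal assignment -- the same optimum the Hungarian
    algorithm computes; adequate for the small rings compared here."""
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# Hypothetical cost matrix: cost[i][j] could be 1 - isostericity between
# element i of one ring and element j of the other.
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.6, 0.3],
]
total, matching = min_cost_matching(cost)  # diagonal matching, total cost 0.6
```

In practice a polynomial-time implementation such as `scipy.optimize.linear_sum_assignment` would be used instead of enumerating permutations.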

Parameters:
  • method – a string identifying which of these methods to use

  • depth – the depth to use in the annotation rings

  • decay – when using a ring comparison function, the weight decay of importance based on depth (the closest rings weigh more, as they are central to the comparison)

  • idf – whether to use IDF weighting on the frequency of labels

  • normalization – we experiment with three normalization schemes: the basal one simply divides the score by the maximum value, ‘sqrt’ uses the square root of the ratio as a power of the raw value, and ‘log’ uses the log. The underlying idea is to put more emphasis on long matches than on matching just a few nodes.

  • hash_init_path – for the graphlet comparisons, a hashing path must be supplied so that GED values can be stored and reused based on their hash.
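To illustrate how the decay parameter could weight the rings, here is a pure-Python sketch (not rnaglib's actual code; the helper `combine_rings` is hypothetical) in which per-ring similarities are combined with geometrically decaying weights, so that the rings closest to the compared nodes dominate:

```python
def combine_rings(ring_scores, decay=0.5):
    """Combine per-depth ring similarities into one score.

    ring_scores[k] is the similarity of the two rings at depth k;
    the decay makes closer rings (small k) weigh more.
    """
    weights = [decay ** k for k in range(len(ring_scores))]
    # Weighted average, so a perfect match at every depth yields 1.0.
    return sum(w * s for w, s in zip(weights, ring_scores)) / sum(weights)

# A perfect depth-0 match outweighs a total mismatch at depth 2:
score = combine_rings([1.0, 0.5, 0.0])  # ~0.714 with decay=0.5
```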

__init__(method, depth, decay=0.5, idf=False, normalization=None, hash_init_path='/home/docs/checkouts/readthedocs.org/user_builds/rnaglib/envs/latest/lib/python3.10/site-packages/rnaglib/algorithms/../data/hashing/NR_chops_hash.p')[source]


Methods

R_1(ring1, ring2)

Compute the R_1 function over lists of features by counting the intersection and normalizing by the number of elements

R_graphlets(ring1, ring2)

Compute the R_graphlets function over lists of features.

R_iso(list1, list2)

Compute the R_iso function over lists of features by matching each ring with the Hungarian algorithm on the isostericity values

__init__(method, depth[, decay, idf, ...])

Factory object to compute all node similarities.

add_hashtable(hash_init_path)

compare(rings1, rings2)

Compares two nodes represented as their rings.

delta_indices_sim(i, j[, distance])

We need a score for matching nodes at different depths.

get_cost_nodes(node_i, node_j[, bb, pos])

Compare two nodes and returns a cost.

get_length(ring1, ring2[, graphlets])

This is meant to return an adapted 'length' that represents the optimal score obtained when matching all the elements in the two rings at hand

graphlet(rings1, rings2)

This function performs an operation similar to hungarian, using the GED between graphlets instead of isostericity.

graphlet_cost_nodes(node_i, node_j[, pos, ...])

Returns a node distance between nodes represented as graphlets

hungarian(rings1, rings2)

Compute the hungarian function over lists of features by adding a depth field to each ring (based on its index in rings).

normalize(unnormalized, max_score)

We want our normalization to be more lenient toward longer matches
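One possible reading of the three normalization schemes described in the constructor is sketched below. This is a hedged illustration, not rnaglib's exact formulas, and the standalone `normalize` helper here is hypothetical; it only shows how 'sqrt' and 'log' can make the normalized score more lenient toward long matches than a plain ratio:

```python
import math

def normalize(unnormalized, max_score, scheme=None):
    """Map a raw similarity score into [0, 1] given the maximum
    achievable score (sketch of the schemes named in the docstring)."""
    ratio = unnormalized / max_score
    if scheme == 'sqrt':
        # Use the square root of the ratio as a power of the raw value;
        # since sqrt(ratio) < 1 for partial matches, this lifts the score.
        return ratio ** math.sqrt(ratio) if ratio > 0 else 0.0
    if scheme == 'log':
        # Log-based rescaling; the +1 keeps the arguments positive.
        return math.log(1 + unnormalized) / math.log(1 + max_score)
    # Basal scheme: plain division by the maximum value.
    return ratio
```

Under this reading, a partial match of 2 out of a maximum score of 8 gives 0.25 with the basal scheme but 0.5 with 'sqrt', i.e. the non-basal schemes are gentler on partial matches.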