rnaglib.algorithms.SimFunctionNode

class rnaglib.algorithms.SimFunctionNode(method, depth, decay=0.5, idf=False, normalization=None, hash_init_path='/home/docs/checkouts/readthedocs.org/user_builds/rnaglib/envs/latest/lib/python3.10/site-packages/rnaglib/algorithms/../data/hashing/NR_chops_hash.p')[source]

Factory object to compute all node similarities. These methods take as input an annotated pair of nodes and compare them.

These methods are detailed in the supplement of the paper and include five methods. They frequently rely on the Hungarian algorithm, which finds an optimal matching between two sets according to a cost function.

Three of them compare the edges:

  • R_1 compares the histograms of each ring, possibly with an IDF weighting (to emphasize differences in rare edges)

  • R_iso compares each ring with the best matching based on the isostericity values

  • hungarian compares the whole annotation at once, with rings differentiated by an additional ‘depth’ field; all nodes are then compared based on isostericity and this depth field.

Two of them compare the graphlets. The underlying idea is that comparing lists of edges alone does not constrain the graph structure, whereas assembling graphlets does so more strongly (exceptions exist, but for most graphs, knowing all graphlets at each depth is enough to recreate the graph):

  • R_graphlets works like R_iso, except that the isostericity is replaced by the GED

  • graphlet works like hungarian, except that the isostericity is replaced by the GED
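The optimal matching these functions rely on can be sketched as follows. This is a pure-Python brute force standing in for the Hungarian algorithm (which finds the same optimum in O(n³)), and the cost matrix is hypothetical, e.g. `1 - isostericity` between ring elements:

```python
from itertools import permutations

def min_cost_matching(cost):
    """Brute-force optimal assignment -- the same optimum the Hungarian
    algorithm computes; adequate for the small rings compared here."""
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# Hypothetical cost matrix: cost[i][j] could be 1 - isostericity between
# element i of one ring and element j of the other.
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.6, 0.3],
]
total, matching = min_cost_matching(cost)  # diagonal matching, total cost 0.6
```

In practice a polynomial-time implementation such as `scipy.optimize.linear_sum_assignment` would be used instead of enumerating permutations.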

Parameters:
  • method – a string identifying which of these methods to use

  • depth – the depth to use in the annotation rings

  • decay – when using a ring comparison function, the weight decay of importance based on depth (the closest rings weigh more, as they are central to the comparison)

  • idf – whether to use IDF weighting on the frequency of labels

  • normalization – we experiment with three normalization schemes: the basal one simply divides the score by the maximum value, ‘sqrt’ uses the square root of the ratio as a power of the raw value, and ‘log’ uses the log. The underlying idea is to put more emphasis on long matches than on matching just a few nodes.

  • hash_init_path – for the graphlet comparisons, a hashing path must be supplied so that GED values can be stored and reused based on their hash.
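To illustrate how the decay parameter could weight the rings, here is a pure-Python sketch (not rnaglib's actual code; the helper `combine_rings` is hypothetical) in which per-ring similarities are combined with geometrically decaying weights, so that the rings closest to the compared nodes dominate:

```python
def combine_rings(ring_scores, decay=0.5):
    """Combine per-depth ring similarities into one score.

    ring_scores[k] is the similarity of the two rings at depth k;
    the decay makes closer rings (small k) weigh more.
    """
    weights = [decay ** k for k in range(len(ring_scores))]
    # Weighted average, so a perfect match at every depth yields 1.0.
    return sum(w * s for w, s in zip(weights, ring_scores)) / sum(weights)

# A perfect depth-0 match outweighs a total mismatch at depth 2:
score = combine_rings([1.0, 0.5, 0.0])  # ~0.714 with decay=0.5
```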

__init__(method, depth, decay=0.5, idf=False, normalization=None, hash_init_path='/home/docs/checkouts/readthedocs.org/user_builds/rnaglib/envs/latest/lib/python3.10/site-packages/rnaglib/algorithms/../data/hashing/NR_chops_hash.p')[source]


Methods

R_1(ring1, ring2)

Compute the R_1 function over lists of features by counting the intersection and normalizing by the number of elements

R_graphlets(ring1, ring2)

Compute the R_graphlets function over lists of features.

R_iso(list1, list2)

Compute the R_iso function over lists of features by matching each ring with the Hungarian algorithm on the isostericity values

__init__(method, depth[, decay, idf, ...])

Factory object to compute all node similarities.

add_hashtable(hash_init_path)

compare(rings1, rings2)

Compares two nodes represented as their rings.

delta_indices_sim(i, j[, distance])

We need a score for matching nodes at different depths.

get_cost_nodes(node_i, node_j[, bb, pos])

Compare two nodes and returns a cost.

get_length(ring1, ring2[, graphlets])

This is meant to return an adapted 'length' that represents the optimal score obtained when matching all the elements in the two rings at hand

graphlet(rings1, rings2)

This function performs an operation similar to hungarian, using the GED between graphlets instead of isostericity.

graphlet_cost_nodes(node_i, node_j[, pos, ...])

Returns a node distance between nodes represented as graphlets

hungarian(rings1, rings2)

Compute the hungarian function over lists of features by adding a depth field to each ring (based on its index in rings).

normalize(unnormalized, max_score)

We want our normalization to be more lenient toward longer matches
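One possible reading of the three normalization schemes described in the constructor is sketched below. This is a hedged illustration, not rnaglib's exact formulas, and the standalone `normalize` helper here is hypothetical; it only shows how 'sqrt' and 'log' can make the normalized score more lenient toward long matches than a plain ratio:

```python
import math

def normalize(unnormalized, max_score, scheme=None):
    """Map a raw similarity score into [0, 1] given the maximum
    achievable score (sketch of the schemes named in the docstring)."""
    ratio = unnormalized / max_score
    if scheme == 'sqrt':
        # Use the square root of the ratio as a power of the raw value;
        # since sqrt(ratio) < 1 for partial matches, this lifts the score.
        return ratio ** math.sqrt(ratio) if ratio > 0 else 0.0
    if scheme == 'log':
        # Log-based rescaling; the +1 keeps the arguments positive.
        return math.log(1 + unnormalized) / math.log(1 + max_score)
    # Basal scheme: plain division by the maximum value.
    return ratio
```

Under this reading, a partial match of 2 out of a maximum score of 8 gives 0.25 with the basal scheme but 0.5 with 'sqrt', i.e. the non-basal schemes are gentler on partial matches.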