rnaglib.dataset_transforms.RedundancyRemover

class rnaglib.dataset_transforms.RedundancyRemover(distance_name='USalign', threshold=0.95)

Dataset transform that removes redundancy from a dataset by clustering its RNAs and keeping only the highest-resolution RNA within each cluster.

Parameters:
  • distance_name (str) – name of the distance metric used to perform the clustering. This distance must already have been computed on the dataset (see DistanceComputer).

  • threshold (float) – similarity threshold used to perform the clustering, where similarity is defined as 1 - distance.
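
The selection logic described above can be illustrated with a minimal, library-independent sketch. It is not rnaglib's implementation. The function name remove_redundancy, the connected-components clustering, and the convention that a smaller resolution value (in Å) means a higher resolution are all assumptions made for illustration; the clustering algorithm actually used by RedundancyRemover may differ.

```python
from collections import defaultdict


def remove_redundancy(names, distances, resolutions, threshold=0.95):
    """Hypothetical helper: keep one representative per cluster.

    names       -- list of item identifiers
    distances   -- symmetric matrix (list of lists), values in [0, 1]
    resolutions -- resolution per item; ASSUMPTION: smaller value = better
    threshold   -- similarity threshold, with similarity = 1 - distance
    """
    n = len(names)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # ASSUMPTION: two items share a cluster when they are linked by a chain
    # of pairs whose similarity reaches the threshold (connected components).
    for i in range(n):
        for j in range(i + 1, n):
            if 1.0 - distances[i][j] >= threshold:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(n):
        clusters[find(i)].append(i)

    # Keep the best-resolved member of each cluster, in original order.
    keep = [min(members, key=lambda i: resolutions[i])
            for members in clusters.values()]
    return [names[i] for i in sorted(keep)]


# Toy data: "a"/"b" are near-duplicates, and so are "c"/"d".
names = ["a", "b", "c", "d"]
distances = [
    [0.00, 0.02, 0.50, 0.60],
    [0.02, 0.00, 0.50, 0.60],
    [0.50, 0.50, 0.00, 0.01],
    [0.60, 0.60, 0.01, 0.00],
]
resolutions = [2.5, 1.8, 3.0, 3.2]

kept = remove_redundancy(names, distances, resolutions, threshold=0.95)
print(kept)
```

With the default threshold of 0.95, any pair at distance 0.05 or less counts as redundant in this sketch, so raising the threshold keeps more items and lowering it merges more of them.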

__init__(distance_name='USalign', threshold=0.95)

Methods

__init__([distance_name, threshold])