rnaglib.dataset_transforms.RedundancyRemover
- class rnaglib.dataset_transforms.RedundancyRemover(distance_name='USalign', threshold=0.95)
Dataset transform that removes redundancy by clustering the dataset and then keeping only the RNA with the highest resolution in each cluster.
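The "keep the best member of each cluster" step can be sketched in plain Python as follows. This is a minimal illustration, not rnaglib's implementation: `keep_best_per_cluster` is a hypothetical helper, the clusters are assumed to be already computed, and the RNA identifiers and resolution values are made up. Because crystallographic resolution is reported in Ångström and a smaller value means a better-resolved structure, "highest resolution" is taken here to mean the smallest value. Treating a missing resolution as infinitely poor is also an assumption of this sketch.

```python
from typing import Dict, List


def keep_best_per_cluster(clusters: List[List[str]], resolution: Dict[str, float]) -> List[str]:
    """Hypothetical helper: return one representative per cluster.

    The representative is the member with the smallest resolution value
    (in Angstrom), i.e. the highest-resolution structure.
    """
    kept = []
    for cluster in clusters:
        # Assumption: RNAs without a resolution value (e.g. NMR entries)
        # are ranked last by treating their resolution as infinite.
        # min() breaks ties by keeping the first member encountered.
        best = min(cluster, key=lambda rna: resolution.get(rna, float("inf")))
        kept.append(best)
    return kept


# Made-up example data: two clusters of PDB-style identifiers.
clusters = [["1abc", "2xyz", "3def"], ["4ghi"]]
resolution = {"1abc": 2.8, "2xyz": 1.9, "3def": 3.5, "4ghi": 2.2}
print(keep_best_per_cluster(clusters, resolution))  # ['2xyz', '4ghi']
```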
- Parameters:
distance_name (str) – the name of the distance metric used for clustering. This distance must already have been computed on the dataset (see DistanceComputer).
threshold (float) – the similarity threshold used for clustering, where similarity is defined as 1 - distance.
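To illustrate how `threshold` relates to a precomputed distance, the sketch below converts pairwise distances to similarities (1 - distance) and groups RNAs with single-linkage clustering via union-find. Single linkage, the inclusive `>=` comparison and the hypothetical `cluster_by_similarity` function are all assumptions made for illustration; rnaglib may use a different clustering algorithm or boundary convention. The 3×3 distance matrix is made-up example data.

```python
from typing import Dict, List


def cluster_by_similarity(distance: List[List[float]], threshold: float) -> List[List[int]]:
    """Hypothetical helper: single-linkage clustering on a symmetric distance matrix.

    Two items are linked when their similarity (1 - distance) is at least
    `threshold`; clusters are the connected components of that graph.
    """
    n = len(distance)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if 1.0 - distance[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Made-up distances: items 0 and 1 are near-duplicates, item 2 is distinct.
distance = [
    [0.00, 0.02, 0.40],
    [0.02, 0.00, 0.45],
    [0.40, 0.45, 0.00],
]
print(cluster_by_similarity(distance, threshold=0.95))  # [[0, 1], [2]]
```

With the default threshold of 0.95, only pairs whose distance is at most 0.05 end up in the same cluster. Note that single linkage can chain: A and C may share a cluster through B even if A and C are themselves dissimilar.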
Methods
__init__([distance_name, threshold])