rnaglib.splitters.CDHitSplitter
- class rnaglib.splitters.CDHitSplitter(similarity_threshold=0.3, n_jobs=-1, seed=0, *args, **kwargs)
Splits a dataset based on sequence similarity computed with CD-HIT. NOTE: Make sure the cd-hit executable is on your PATH.
- __init__(similarity_threshold=0.3, n_jobs=-1, seed=0, *args, **kwargs)
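Because the splitter depends on an external cd-hit binary, it can help to check for it before constructing the splitter. The following is a minimal sketch using only the Python standard library; the helper name cd_hit_available is hypothetical and is not part of rnaglib.

```python
import shutil


def cd_hit_available(executable: str = "cd-hit") -> bool:
    """Return True if `executable` can be found on the current PATH.

    Hypothetical helper for illustration; not part of rnaglib.
    """
    return shutil.which(executable) is not None


if not cd_hit_available():
    print("cd-hit was not found on PATH; install it before using CDHitSplitter.")
```

If the check fails, installing cd-hit or adding its directory to PATH is presumably required before the splitter can run.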
Methods
__init__([similarity_threshold, n_jobs, seed])
cluster_split(dataset, frac[, n])    Fast cluster-based splitting adapted from ProteinShake (https://github.com/BorgwardtLab/proteinshake_release/blob/main/structure_split.py).
compute_similarity_matrix(dataset)    Computes sequence similarity between all pairs of RNAs.
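To illustrate the general idea behind cluster-based splitting, here is a small self-contained sketch. It is not rnaglib's or ProteinShake's implementation: the function split_by_cluster, its parameters, and the greedy filling strategy are assumptions made for illustration. It takes one cluster label per item (such as the labels a sequence-clustering tool might produce) and assigns whole clusters to splits, so that similar sequences never end up in different splits.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole clusters to splits (hypothetical sketch, not rnaglib's API).

    cluster_ids: one cluster label per dataset item.
    fractions:   approximate target share of items for each split.
    Returns one list of item indices per split.
    """
    # Group item indices by their cluster label.
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)

    # Shuffle clusters reproducibly so the assignment depends on the seed.
    groups = list(clusters.values())
    random.Random(seed).shuffle(groups)

    targets = [f * len(cluster_ids) for f in fractions]
    splits = [[] for _ in fractions]
    current = 0
    for group in groups:
        # Advance to the next split once the current one has reached its target.
        while current < len(splits) - 1 and len(splits[current]) >= targets[current]:
            current += 1
        # A cluster is never divided between splits.
        splits[current].extend(group)
    return splits


# Toy labels: items with the same label are treated as similar sequences.
example_clusters = [0, 0, 0, 1, 1, 2, 3, 3, 4, 5]
train, val, test = split_by_cluster(example_clusters)
```

Since clusters are kept intact, the resulting split sizes only approximate the requested fractions, and with few or very large clusters a later split may end up small or empty.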