rnaglib.splitters.RNAalignSplitter
- class rnaglib.splitters.RNAalignSplitter(structures_dir, seed=0, use_substructures=False, *args, **kwargs)
Splits a dataset based on structural similarity computed with RNAalign. NOTE: this splitter requires the RNAalign executable to be in your PATH. Installation instructions are available at https://zhanggroup.org/RNA-align/download.html.
Use RNAalign to split structures.
- Parameters:
structures_dir (Union[str, PathLike]) – path to the folder containing mmCIF files for all elements in the dataset.
seed (int) – random seed for reproducibility.
use_substructures (bool) – if True, only uses residues in the dataset item’s graph.nodes(). Useful for pocket tasks. Otherwise uses the full mmCIF file from the PDB.
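Because the splitter depends on an external executable, it can help to check for it before constructing the splitter. The following is a minimal sketch using only the constructor signature documented above; the helper names `rnaalign_available` and `make_splitter` are hypothetical and not part of rnaglib, and the executable name `RNAalign` is taken from the note above.

```python
import shutil


def rnaalign_available(executable: str = "RNAalign") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


def make_splitter(structures_dir, seed=0, use_substructures=False):
    """Hypothetical helper: build an RNAalignSplitter only if RNAalign is installed."""
    if not rnaalign_available():
        raise RuntimeError(
            "RNAalign was not found on PATH; see "
            "https://zhanggroup.org/RNA-align/download.html"
        )
    # Imported lazily so the PATH check also works where rnaglib is not installed.
    from rnaglib.splitters import RNAalignSplitter

    return RNAalignSplitter(
        structures_dir, seed=seed, use_substructures=use_substructures
    )
```

Calling `make_splitter("path/to/mmcif_dir", use_substructures=True)` would then correspond to the pocket-task setting described above; the directory path is a placeholder.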
- __init__(structures_dir, seed=0, use_substructures=False, *args, **kwargs)
Use RNAalign to split structures.
- Parameters:
structures_dir (Union[str, PathLike]) – path to the folder containing mmCIF files for all elements in the dataset.
seed (int) – random seed for reproducibility.
use_substructures (bool) – if True, only uses residues in the dataset item’s graph.nodes(). Useful for pocket tasks. Otherwise uses the full mmCIF file from the PDB.
Methods
__init__(structures_dir[, seed, ...]) – Use RNAalign to split structures.
cluster_split(dataset, frac[, n]) – Fast cluster-based splitting adapted from ProteinShake (https://github.com/BorgwardtLab/proteinshake_release/blob/main/structure_split.py).
compute_similarity_matrix(dataset) – Computes pairwise structural similarity between all pairs of RNAs using RNAalign.
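To illustrate the general idea of splitting from a pairwise similarity matrix, here is a small self-contained sketch. It is not rnaglib's or ProteinShake's implementation: the function names, the `threshold` parameter, and the interpretation of `frac` as the target test fraction are all assumptions made for this example. Items whose similarity meets the threshold are grouped into clusters, and whole clusters are assigned to one side of the split so that similar structures never straddle train and test.

```python
import random


def similarity_clusters(sim, threshold):
    """Group indices into connected components where sim[i][j] >= threshold."""
    n = len(sim)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if not seen[j] and sim[i][j] >= threshold:
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


def split_by_cluster(sim, frac, threshold=0.5, seed=0):
    """Return (train, test) index lists; each cluster goes wholly to one side.

    `frac` is assumed here to be the target fraction of items in the test set.
    """
    clusters = similarity_clusters(sim, threshold)
    random.Random(seed).shuffle(clusters)  # seeded for reproducibility
    target = frac * len(sim)
    train, test = [], []
    for cluster in clusters:
        # Fill the test side first, without overshooting the target size.
        if len(test) + len(cluster) <= target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return sorted(train), sorted(test)
```

With a toy matrix in which items 0 and 1 are similar to each other and items 2 and 3 are similar to each other, `split_by_cluster(sim, frac=0.5)` places one pair in the test set and the other in the training set; which pair ends up where depends on the seed.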