rnaglib.splitters.RNAalignSplitter

class rnaglib.splitters.RNAalignSplitter(structures_dir, seed=0, use_substructures=False, *args, **kwargs)

Splits a dataset by structural similarity, computed with RNAalign. NOTE: this splitter requires the RNAalign executable to be on your PATH. Installation instructions are available at https://zhanggroup.org/RNA-align/download.html.
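Because the splitter depends on an external executable, it can help to check that it is visible before building a split. The following is a minimal sketch using only the Python standard library; the helper name find_executable is hypothetical and is not part of rnaglib, and the executable name "RNAalign" is assumed to match the name used in the note above.

```python
import shutil


def find_executable(name="RNAalign"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    # shutil.which performs the same lookup the shell would do on PATH.
    return shutil.which(name)


path = find_executable()
if path is None:
    print("RNAalign was not found on PATH; install it before using this splitter.")
else:
    print(f"Found RNAalign at {path}")
```

If the check prints the "not found" message, add the directory containing the executable to PATH and run it again.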

Use RNAalign to split structures.

Parameters:
  • structures_dir (Union[str, PathLike]) – path to the folder containing mmCIF files for all elements in the dataset.

  • seed (int) – random seed for reproducibility.

  • use_substructures (bool) – if True, uses only the residues in the dataset item’s graph.nodes(), which is useful for pocket tasks. Otherwise, uses the full mmCIF file from the PDB.

Methods

__init__(structures_dir[, seed, ...])

Use RNAalign to split structures.

cluster_split(dataset, frac[, n])

Fast cluster-based splitting adapted from ProteinShake (https://github.com/BorgwardtLab/proteinshake_release/blob/main/structure_split.py).
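To illustrate the general idea of cluster-based splitting, here is a minimal sketch. It is not the rnaglib or ProteinShake implementation: the function name cluster_split_sketch, the threshold parameter, and the choice of connected components as clusters are all assumptions made for illustration. The point it shows is that whole clusters of similar items are assigned to one side of the split, so no held-out item has a close structural neighbour among the kept items.

```python
import random


def cluster_split_sketch(similarity, frac, threshold=0.5, seed=0):
    """Return (kept, held_out) index lists; held_out has at most frac * n items.

    `similarity` is a square, symmetric matrix (list of lists).
    All names and the clustering rule here are illustrative assumptions.
    """
    n = len(similarity)
    rng = random.Random(seed)

    # Clusters = connected components of the graph that links every
    # pair of items whose similarity is at least `threshold`.
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        component, stack = [start], [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if j not in seen and similarity[i][j] >= threshold:
                    seen.add(j)
                    component.append(j)
                    stack.append(j)
        clusters.append(sorted(component))

    # Shuffle clusters reproducibly, then move whole clusters into the
    # held-out set as long as the target size is not exceeded.
    rng.shuffle(clusters)
    target = frac * n
    kept, held_out = [], []
    for cluster in clusters:
        if len(held_out) + len(cluster) <= target:
            held_out.extend(cluster)
        else:
            kept.extend(cluster)
    return sorted(kept), sorted(held_out)


# Toy matrix: items 0 and 1 are similar, items 2 and 3 are similar.
toy = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
kept, held_out = cluster_split_sketch(toy, frac=0.5)
print(kept, held_out)
```

With this toy matrix, one of the two pairs is held out as a unit; which one depends on the seed.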

compute_similarity_matrix(dataset)

Computes the structural similarity between all pairs of RNAs using RNAalign.
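The shape of an all-pairs computation can be sketched as follows. This is an illustration only, not the library's code: pairwise_similarity_matrix and toy_similarity are hypothetical names, and toy_similarity is a stand-in for a real structural alignment score. The sketch also assumes the score is symmetric and that self-similarity is 1.0, which may not hold for every alignment score.

```python
def pairwise_similarity_matrix(items, similarity):
    """Build a symmetric n x n matrix by scoring each unordered pair once."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # assumption: an item is maximally similar to itself
        for j in range(i + 1, n):
            score = similarity(items[i], items[j])
            # Assumption: the score is symmetric, so mirror it.
            matrix[i][j] = score
            matrix[j][i] = score
    return matrix


def toy_similarity(a, b):
    """Hypothetical stand-in for an alignment score: ratio of lengths."""
    return min(len(a), len(b)) / max(len(a), len(b))


matrix = pairwise_similarity_matrix(["GGCA", "GGCAUU", "AU"], toy_similarity)
for row in matrix:
    print(row)
```

Scoring each unordered pair once halves the number of alignment calls, which matters when every call launches an external program.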