rnaglib.dataset_transforms

Dataset transforms are transforms that process a whole dataset. They take a dataset as input and return the same dataset with features added or modified, or with elements added or removed.

Abstract classes

Subclass these to create your own dataset transforms.

DSTransform()

Base class for transforms that process a whole RNADataset

Splitter([split_train, split_valid, ...])

Objects that split an RNADataset into train, validation and test sets

DistanceComputer(name[, recompute])

Dataset transform that adds to the dataset's attributes a distance matrix encoding the pairwise distances between all RNAs of the dataset

RedundancyRemover([distance_name, threshold])

Dataset transform that removes redundancy from a dataset by clustering it, then keeping only the RNA with the highest resolution within each cluster
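
The cluster-then-keep-best idea behind RedundancyRemover can be sketched in a few lines. This is an illustration only: it does not use rnaglib, the helper name `remove_redundancy` is hypothetical, the grouping is a simple single-linkage pass over a precomputed distance matrix, and "highest resolution" is assumed to mean the smallest value in angstroms. The library's own clustering and threshold semantics may differ.

```python
import numpy as np


def remove_redundancy(distances, resolutions, threshold):
    """Return sorted indices of the items kept (hypothetical helper, not rnaglib)."""
    n = len(resolutions)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of items closer than the threshold (single linkage).
    for i in range(n):
        for j in range(i + 1, n):
            if distances[i, j] <= threshold:
                parent[find(i)] = find(j)

    # Within each cluster, keep the item with the smallest resolution value.
    best = {}
    for i in range(n):
        root = find(i)
        if root not in best or resolutions[i] < resolutions[best[root]]:
            best[root] = i
    return sorted(best.values())


# Toy data: items 0 and 1 are near-duplicates, item 2 stands alone.
distances = np.array([
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.9, 0.8, 0.0],
])
resolutions = [3.2, 2.1, 2.8]  # assumed to be in angstroms, lower is better
kept = remove_redundancy(distances, resolutions, threshold=0.2)
```

Here items 0 and 1 fall into one cluster, so only the better-resolved of the two survives alongside item 2.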

Splitters

Ways to split your data (all of these are subclasses of the Splitter abstract class).

ClusterSplitter([similarity_threshold, ...])

Abstract class for splitting by clustering with a similarity function.

RandomSplitter([seed])

Splits a dataset randomly.

NameSplitter(train_names, val_names, ...)

Splits a dataset based on hard-coded lists of RNA names to be included in train, val and test sets
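
To make the difference between random and name-based splitting concrete, here is a minimal sketch in plain Python. It does not call rnaglib: `random_split` and `name_split` are hypothetical functions, and only the parameter names (`split_train`, `split_valid`, `seed`, `train_names`, `val_names`) are borrowed from the signatures listed above. The default fractions and the `test_names` parameter are assumptions.

```python
import random


def random_split(names, split_train=0.7, split_valid=0.15, seed=0):
    """Shuffle names with a fixed seed and cut them into three lists (hypothetical)."""
    rng = random.Random(seed)  # local generator, so the split is reproducible
    shuffled = list(names)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * split_train)
    n_valid = round(len(shuffled) * split_valid)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to test
    return train, valid, test


def name_split(names, train_names, val_names, test_names):
    """Assign each name to a split according to fixed lists (hypothetical)."""
    train_set, val_set, test_set = set(train_names), set(val_names), set(test_names)
    train = [n for n in names if n in train_set]
    valid = [n for n in names if n in val_set]
    test = [n for n in names if n in test_set]
    return train, valid, test


names = [f"rna_{i}" for i in range(20)]
train, valid, test = random_split(names, seed=0)
fixed = name_split(names, ["rna_0", "rna_1"], ["rna_2"], ["rna_3"])
```

A random split depends only on the seed, whereas a name-based split stays identical even if the dataset is reordered or grows, which is useful for publishing a fixed benchmark.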

Distance computers

Ways to add a distance matrix to the dataset, giving the distance between each pair of samples (all of these are subclasses of the DistanceComputer abstract class).

CDHitComputer([similarity_threshold])

StructureDistanceComputer([name, ...])

Distance computer computing a structure-based pairwise distance between RNAs from a dataset
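
The sketch below shows what a pairwise distance matrix stored under a name might look like. It is a stand-in, not the library's method: the k-mer Jaccard distance, the helper names `kmer_set` and `pairwise_distances`, and the `dataset_distances` dictionary keyed by `"toy_kmer"` are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


def kmer_set(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pairwise_distances(seqs, k=3):
    """Symmetric matrix of k-mer Jaccard distances (hypothetical toy metric)."""
    kmers = [kmer_set(s, k) for s in seqs]
    n = len(seqs)
    dist = np.zeros((n, n))  # diagonal stays 0: each item is at distance 0 from itself
    for i in range(n):
        for j in range(i + 1, n):
            union = kmers[i] | kmers[j]
            jaccard = len(kmers[i] & kmers[j]) / len(union) if union else 1.0
            dist[i, j] = dist[j, i] = 1.0 - jaccard
    return dist


seqs = ["GGGAAACCC", "GGGAAACCU", "AUAUAUAUA"]
# Store the matrix under a name, mirroring the idea of a named dataset attribute.
dataset_distances = {"toy_kmer": pairwise_distances(seqs)}
```

Row and column `i` of the matrix refer to the `i`-th sample, so downstream steps such as clustering or redundancy removal can look up any pair by index.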

Loading

Tools for loading RNAs stored in an RNADataset batch-wise for deep learning models.

Collater(dataset)

Wrapper for the collate function, so that different node similarities can be used.

get_loader(dataset[, batch_size, ...])

Fetch a loader object for a given dataset.

EdgeLoaderGenerator(graph_loader[, ...])

This turns a graph dataloader or dataset into an edge data loader generator.

DefaultBasePairLoader([dataset, data_path, ...])

Dataloader that yields base pairs
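
Batch-wise loading with a collate function follows a common pattern, sketched here in plain Python. Nothing in this block is rnaglib code: `collate` and `iter_batches` are hypothetical names, and the per-RNA dictionaries with `name` and `length` keys are made-up sample data.

```python
def collate(samples):
    """Merge a list of per-RNA dicts into one dict of lists (hypothetical)."""
    keys = samples[0].keys()
    return {key: [sample[key] for sample in samples] for key in keys}


def iter_batches(dataset, batch_size=2, collate_fn=collate):
    """Yield collated batches of at most batch_size consecutive samples."""
    for start in range(0, len(dataset), batch_size):
        yield collate_fn(dataset[start:start + batch_size])


# Made-up samples standing in for the items of a dataset.
dataset = [
    {"name": "rna_a", "length": 76},
    {"name": "rna_b", "length": 120},
    {"name": "rna_c", "length": 54},
]
batches = list(iter_batches(dataset, batch_size=2))
```

The collate function is the place where per-sample representations are merged into one batch, which is why swapping it changes what the model receives without touching the dataset itself.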