rnaglib.tasks.Task

class rnaglib.tasks.Task(root, debug=False, in_memory=False, recompute=False, precomputed=True, additional_metadata=None, size_thresholds=None, splitter=None, redundancy_removal=True)

Abstract class for a benchmarking task using the rnaglib datasets.

This class handles the logic for building the underlying dataset which is held in an rnaglib.dataset.RNADataset object. Once the dataset is created, the splitter is invoked to create the train/val/test indices. Tasks also define an evaluate() function to yield appropriate model performance metrics.
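The flow described above (build the dataset, call the splitter, evaluate a model) can be sketched without rnaglib. The following is a minimal illustration of the pattern only: `ToyTask`, `LengthTask`, the positional 60/20/20 split, and the `(sequence, label)` item format are all hypothetical and are not part of rnaglib.

```python
from abc import ABC, abstractmethod


class ToyTask(ABC):
    """Hypothetical stand-in for the Task pattern; not rnaglib code."""

    def __init__(self, splitter=None):
        # Fall back to the task's own default when no splitter is passed.
        self.splitter = splitter if splitter is not None else self.default_splitter
        self.dataset = self.process()
        self.train_ind, self.val_ind, self.test_ind = self.split(self.dataset)

    @property
    def default_splitter(self):
        # Deterministic 60/20/20 split by position (an assumption of this sketch).
        def split_by_position(dataset):
            n = len(dataset)
            a, b = int(0.6 * n), int(0.8 * n)
            idx = list(range(n))
            return idx[:a], idx[a:b], idx[b:]

        return split_by_position

    @abstractmethod
    def process(self):
        """Subclasses build and return the dataset here."""

    def split(self, dataset):
        # Delegate to the splitter, which returns train/val/test indices.
        return self.splitter(dataset)

    def evaluate(self, model, indices):
        # Accuracy of `model` on the selected items.
        items = [self.dataset[i] for i in indices]
        correct = sum(model(x) == y for x, y in items)
        return {"accuracy": correct / len(items)}


class LengthTask(ToyTask):
    def process(self):
        # Toy data: (sequence, label), where the label marks an even length.
        seqs = ["A", "AU", "GCA", "GGCC", "AUGCA", "AUGCAU", "G", "GC", "AAA", "UUUU"]
        return [(s, len(s) % 2 == 0) for s in seqs]


task = LengthTask()
metrics = task.evaluate(lambda s: len(s) % 2 == 0, task.test_ind)
print(len(task.train_ind), len(task.val_ind), len(task.test_ind), metrics)
```

In this sketch the subclass only supplies `process()`; splitting and evaluation are inherited, which mirrors the division of labour the class description suggests.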

Parameters:

root (Union[str, PathLike]) – path to a folder where the task information will be stored for fast loading.
debug (bool) – if True, only a fraction of the dataset is used.
in_memory (bool) – if True, the whole dataset is loaded from disk into memory once instead of being read on the fly.
recompute (bool) – whether to recompute the task info from scratch or use what is stored in root.
precomputed (bool) – if True, tries to download the processed task from Zenodo.
additional_metadata (Optional[Mapping]) – dictionary with metadata to include in task.
size_thresholds (Optional[Sequence]) – two-element list with the lower and upper bounds on the RNA size to consider.
splitter (Optional[Splitter]) – rnaglib.dataset_transforms.Splitter object that handles splitting of data into train/val/test indices. If None, the task’s default_splitter attribute is used.
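As an illustration of what a two-element `size_thresholds` bound could mean in practice, here is a small sketch. The helper `within_size`, the example sizes, and the choice of inclusive bounds are assumptions made for this example; the source does not state how rnaglib measures size or treats the endpoints.

```python
def within_size(num_residues, size_thresholds=None):
    """Return True if the RNA size passes the optional [lower, upper] bounds."""
    if size_thresholds is None:
        # No thresholds given: keep every RNA.
        return True
    lower, upper = size_thresholds
    # Inclusive bounds are an assumption of this sketch.
    return lower <= num_residues <= upper


# Hypothetical RNA names mapped to hypothetical residue counts.
sizes = {"rna_a": 12, "rna_b": 150, "rna_c": 900}
kept = [name for name, n in sizes.items() if within_size(n, [50, 500])]
print(kept)
```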

__init__(root, debug=False, in_memory=False, recompute=False, precomputed=True, additional_metadata=None, size_thresholds=None, splitter=None, redundancy_removal=True)

Methods

`__init__`(root[, debug, in_memory, ...])
`add_feature`(feature[, feature_level, is_input])	Add a feature to the dataset.
`add_representation`(representation)	Add a representation transform to the dataset.
`add_rna_to_building_list`(all_rnas, rna)	Add an RNA to the building list.
`compute_distances`()	Compute similarity distances between RNAs in the dataset.
`create_dataset_from_list`(rnas)	Compute an RNADataset object from the lists touched in add_rna_to_building_list.
`describe`()	Get description of task dataset.
`evaluate`(model, loader)	Evaluate model performance on a dataset.
`from_scratch`(size_thresholds)	Create task dataset from scratch.
`from_zenodo`()	Download the task dataset from Zenodo and load it.
`get_split_datasets`([recompute])	Get train, validation, and test datasets.
`get_split_loaders`([recompute])	Get train, validation, and test dataloaders.
`get_task_vars`()	Define a FeaturesComputer object to set which input and output variables will be used in the task.
`init_metadata`([additional_metadata])	Initialize the self.metadata dictionary of key/value pairs.
`load`()	Load dataset and splits from disk.
`post_process`()	Apply post-processing steps to remove redundancy.
`process`()	Tasks must implement this method.
`remove_redundancy`()	Remove redundant RNAs from the dataset based on similarity.
`remove_representation`(representation_name)	Remove a representation transform from the dataset.
`set_datasets`([recompute])	Set the train, val and test datasets.
`set_loaders`([recompute])	Set the dataloader properties.
`split`(dataset)	Calls the splitter and returns train, val, test splits.
`to_csv`(path)	Write a single CSV with all task data.
`write`()	Save task data and splits to root.

Attributes

`default_splitter`	The splitter used if no other splitter is specified.
`task_id`	Hash of all RNA IDs and node IDs in the dataset.
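A hash over all RNA IDs and node IDs gives a compact fingerprint of the dataset contents. The sketch below shows one way such an identifier could be computed; the function `toy_task_id`, the use of SHA-256, the sorting step, and the example IDs are assumptions for illustration, since the source does not say which hash function or ordering rnaglib uses.

```python
import hashlib


def toy_task_id(rna_to_nodes):
    """Hash RNA IDs and their node IDs into one hex digest (illustrative only)."""
    h = hashlib.sha256()
    # Sort so the digest does not depend on insertion order.
    for rna_id in sorted(rna_to_nodes):
        h.update(b"rna:" + rna_id.encode("utf-8") + b"\n")
        for node_id in sorted(rna_to_nodes[rna_id]):
            h.update(b"node:" + node_id.encode("utf-8") + b"\n")
    return h.hexdigest()


# Hypothetical RNA IDs and node IDs.
data = {"rna1": ["rna1.A.1", "rna1.A.2"], "rna2": ["rna2.B.1"]}
reordered = {"rna2": ["rna2.B.1"], "rna1": ["rna1.A.2", "rna1.A.1"]}
changed = {"rna1": ["rna1.A.1"], "rna2": ["rna2.B.1"]}

print(toy_task_id(data) == toy_task_id(reordered))
print(toy_task_id(data) == toy_task_id(changed))
```

With this construction, two datasets that contain the same RNAs and nodes get the same identifier regardless of order, while adding or dropping a node changes it.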