rnaglib.tasks.Task
- class rnaglib.tasks.Task(root, debug=False, in_memory=False, recompute=False, precomputed=True, additional_metadata=None, size_thresholds=None, splitter=None)
Abstract class for a benchmarking task using the rnaglib datasets. This class handles the logic for building the underlying dataset which is held in an rnaglib.dataset.RNADataset object.
Once the dataset is created, the splitter is invoked to create the train/val/test indices. Tasks also define an evaluate() function to yield appropriate model performance metrics.
- Parameters:
root (Union[str, PathLike]) – path to a folder where the task information will be stored for fast loading.
debug (bool) – if True, only a fraction of the dataset is used.
in_memory (bool) – if True, the whole dataset is loaded from disk into memory once instead of on the fly.
recompute (bool) – whether to recompute the task info from scratch or use what is stored in root.
precomputed (bool) – if True, tries to download the processed task from Zenodo.
additional_metadata (Optional[Mapping]) – dictionary with metadata to include in the task.
size_thresholds (Optional[Sequence]) – two-element list with lower and upper bounds on the RNA size to consider.
splitter (Optional[Splitter]) – rnaglib.dataset_transforms.Splitter object that handles splitting of data into train/val/test indices. If None, the task's default_splitter is used.
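The behaviour of size_thresholds and the splitter fallback can be pictured with a small self-contained sketch. It does not use rnaglib: the helper names `filter_by_size` and `resolve_splitter`, the example RNA names, and the choice to treat both bounds as inclusive are assumptions made for illustration, not the library's implementation.

```python
from typing import Callable, Optional, Sequence


def filter_by_size(sizes: dict, size_thresholds: Optional[Sequence[int]] = None) -> list:
    """Keep the names of RNAs whose size lies within [lower, upper].

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    if size_thresholds is None:
        return list(sizes)
    lower, upper = size_thresholds
    return [name for name, n in sizes.items() if lower <= n <= upper]


def resolve_splitter(splitter: Optional[Callable], default_splitter: Callable) -> Callable:
    """Mirror the documented fallback: use the default when splitter is None."""
    return default_splitter if splitter is None else splitter


# Hypothetical RNA names mapped to their number of residues.
rna_sizes = {"rna_a": 12, "rna_b": 80, "rna_c": 450}
kept = filter_by_size(rna_sizes, size_thresholds=[15, 200])
print(kept)
```

Passing no thresholds keeps every RNA, which matches the None default in the signature.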
- __init__(root, debug=False, in_memory=False, recompute=False, precomputed=True, additional_metadata=None, size_thresholds=None, splitter=None)
Methods
__init__(root[, debug, in_memory, ...])
add_feature(feature[, feature_level, is_input]) – Shortcut to RNADataset.add_feature
add_representation(representation)
add_rna_to_building_list(all_rnas, rna)
compute_distances()
create_dataset_from_list(rnas) – Computes an RNADataset object from the lists touched in add_rna_to_building_list
describe() – Get description of task dataset.
evaluate(model, loader)
from_scratch(size_thresholds)
from_zenodo() – Downloads the task dataset from Zenodo and loads it.
get_split_datasets([recompute])
get_split_loaders([recompute])
get_task_vars() – Define a FeaturesComputer object to set which input and output variables will be used in the task.
init_metadata([additional_metadata]) – Initialize dictionary to hold key/value pairs to self.metadata.
load() – Load dataset and splits from disk.
post_process() – The most common post_processing steps to remove redundancy.
process() – Tasks must implement this method.
remove_redundancy()
remove_representation(representation_name)
set_datasets([recompute]) – Sets the train, val and test datasets. Call this each time you modify self.dataset.
set_loaders([recompute]) – Sets the dataloader properties.
split(dataset) – Calls the splitter and returns train, val, test splits.
to_csv(path) – Write a single CSV with all task data.
write() – Save task data and splits to root.
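The division of labour among process(), split() and evaluate() can be sketched with a toy abstract class. This is a stand-in written for illustration only: `ToyTask`, `ParityTask`, the percentage-based random split, and an evaluate() that takes index lists rather than a loader are all assumptions, and none of it is rnaglib code.

```python
import random
from abc import ABC, abstractmethod


class ToyTask(ABC):
    """Stand-in for the build, split, evaluate pattern; not rnaglib code."""

    def __init__(self, seed=0):
        self.seed = seed
        # Subclasses build the dataset, then the base class splits it.
        self.dataset = self.process()
        self.train_ind, self.val_ind, self.test_ind = self.split(self.dataset)

    @abstractmethod
    def process(self):
        """Return the list of (input, label) examples for this task."""

    def split(self, dataset, percents=(70, 15, 15)):
        """Shuffle indices with a fixed seed and cut them into three lists."""
        indices = list(range(len(dataset)))
        random.Random(self.seed).shuffle(indices)
        n_train = len(indices) * percents[0] // 100
        n_val = len(indices) * percents[1] // 100
        return (
            indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:],
        )

    def evaluate(self, model, indices):
        """Accuracy of a callable model on the selected examples."""
        correct = sum(model(self.dataset[i][0]) == self.dataset[i][1] for i in indices)
        return {"accuracy": correct / len(indices)}


class ParityTask(ToyTask):
    """Concrete task: predict whether an integer is odd."""

    def process(self):
        return [(x, x % 2) for x in range(20)]


task = ParityTask()
metrics = task.evaluate(lambda x: x % 2, task.test_ind)
print(len(task.train_ind), len(task.val_ind), len(task.test_ind), metrics)
```

Only process() is abstract here, echoing the note that tasks must implement it, while splitting and evaluation live in the base class.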
Attributes
default_splitter – The splitter used if no other splitter is specified.
task_id – A hash of all RNA IDs and node IDs in the dataset.
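A content hash like task_id could be computed along the following lines. The source only states that the hash covers all RNA IDs and node IDs; the use of SHA-256, the sorting that makes the result order-independent, the separator byte, and the example ID strings are all assumptions of this sketch.

```python
import hashlib


def toy_task_id(rnas: dict) -> str:
    """Hash RNA IDs and their node IDs so that ordering does not matter."""
    digest = hashlib.sha256()
    for rna_id in sorted(rnas):
        digest.update(rna_id.encode("utf-8") + b"\x00")
        for node_id in sorted(rnas[rna_id]):
            digest.update(node_id.encode("utf-8") + b"\x00")
    return digest.hexdigest()


# Hypothetical RNA IDs mapped to hypothetical node IDs.
dataset_a = {"1abc": ["1abc.A.1", "1abc.A.2"], "2xyz": ["2xyz.B.10"]}
dataset_b = {"2xyz": ["2xyz.B.10"], "1abc": ["1abc.A.2", "1abc.A.1"]}
dataset_c = {"1abc": ["1abc.A.1"], "2xyz": ["2xyz.B.10"]}

print(toy_task_id(dataset_a) == toy_task_id(dataset_b))
print(toy_task_id(dataset_a) == toy_task_id(dataset_c))
```

An identifier of this kind changes whenever an RNA or node is added or removed, which makes it convenient for checking that two copies of a task hold the same data.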