rnaglib.tasks.Task
- class rnaglib.tasks.Task(root, recompute=False, splitter=None, debug=False, save=True, in_memory=True)
Abstract class for a benchmarking task using the rnaglib datasets. This class handles the logic for building the underlying dataset which is held in an rnaglib.data_loading.RNADataset object. Once the dataset is created, the splitter is invoked to create the train/val/test indices. Tasks also define an evaluate() function to yield appropriate model performance metrics.
- Parameters:
  - root (Union[str, PathLike]) – path to a folder where the task information will be stored for fast loading.
  - recompute (bool) – whether to recompute the task info from scratch or use what is stored in root.
  - splitter (Optional[Splitter]) – rnaglib.splitters.Splitter object that handles splitting of data into train/val/test indices. If None, uses the task’s default_splitter() attribute.
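The build-then-split flow described above can be illustrated with a small, self-contained sketch. It does not import rnaglib and is not the library's implementation: `ToyTask`, `RandomSplitter`, `ParityTask`, the 70/15/15 split ratio, and the accuracy metric are all hypothetical stand-ins. They mimic only the documented behaviour: fall back to a default splitter when `splitter` is None, build the dataset in `process()`, split it into train/val/test indices, and return a dict from `evaluate()`.

```python
import random
from abc import ABC, abstractmethod


class RandomSplitter:
    """Hypothetical splitter: shuffles indices and cuts them 70/15/15."""

    def __init__(self, seed=0):
        self.seed = seed

    def __call__(self, dataset):
        indices = list(range(len(dataset)))
        random.Random(self.seed).shuffle(indices)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(indices) * 70 // 100
        n_val = len(indices) * 15 // 100
        return (
            indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:],
        )


class ToyTask(ABC):
    """Toy stand-in for the documented lifecycle; NOT rnaglib's code."""

    def __init__(self, splitter=None):
        # Fall back to the task's default splitter when none is given.
        self.splitter = splitter if splitter is not None else self.default_splitter
        # Build the dataset first, then compute the split indices.
        self.dataset = self.process()
        self.train_ind, self.val_ind, self.test_ind = self.split(self.dataset)

    @property
    def default_splitter(self):
        return RandomSplitter()

    @abstractmethod
    def process(self):
        """Subclasses build and return the dataset here."""

    def split(self, dataset):
        return self.splitter(dataset)

    def evaluate(self, model, loader):
        # Return metrics as a dict, matching the documented return type.
        correct = total = 0
        for x, y in loader:
            correct += int(model(x) == y)
            total += 1
        return {"accuracy": correct / total if total else 0.0}


class ParityTask(ToyTask):
    """Hypothetical task: predict whether an integer is odd."""

    def process(self):
        return [(i, i % 2) for i in range(20)]


task = ParityTask()  # no splitter passed, so the default is used
test_loader = [task.dataset[i] for i in task.test_ind]
metrics = task.evaluate(lambda x: x % 2, test_loader)
```

Keeping the splitter as a separate object means the same dataset can be re-split under a different strategy without touching the task's `process()` logic.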
Methods
- __init__(root[, recompute, splitter, debug, ...])
- describe([recompute]) – Get description of task dataset, including dimensions needed for model initialization and other relevant statistics.
- evaluate(model, loader) – Return type: dict.
- get_split_datasets([recompute])
- get_split_loaders([recompute])
- get_task_vars() – Define a FeaturesComputer object to set which input and output variables will be used in the task.
- init_metadata() – Optionally adds some key/value pairs to self.metadata.
- load() – Load dataset and splits from disk.
- process() – Tasks must implement this method.
- set_datasets([recompute]) – Sets the train, val and test datasets. Call this each time you modify self.dataset.
- set_loaders([recompute]) – Sets the dataloader properties.
- split(dataset) – Calls the splitter and returns train, val, test splits.
- write() – Save task data and splits to root.
Attributes
- default_splitter
- task_id – Hash of all RNA IDs and node IDs in the dataset.
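To make the idea of a content-derived task identifier concrete, here is a minimal sketch of hashing RNA IDs and node IDs. It is an illustration only, not how rnaglib computes `task_id`: the function name `toy_task_id`, the choice of SHA-256, the sorting step, the separator byte, and the example ID strings are all assumptions.

```python
import hashlib


def toy_task_id(rnas):
    """Hash RNA IDs and their node IDs in a deterministic order.

    `rnas` maps an RNA ID to a list of node IDs (hypothetical layout).
    """
    hasher = hashlib.sha256()
    for rna_id in sorted(rnas):
        hasher.update(rna_id.encode("utf-8"))
        hasher.update(b"\0")  # separator so adjacent IDs cannot run together
        for node_id in sorted(rnas[rna_id]):
            hasher.update(node_id.encode("utf-8"))
            hasher.update(b"\0")
    return hasher.hexdigest()


# Same content in a different order, and a dataset with one node removed.
rnas_a = {"1abc": ["1abc.A.1", "1abc.A.2"], "2xyz": ["2xyz.B.7"]}
rnas_b = {"2xyz": ["2xyz.B.7"], "1abc": ["1abc.A.2", "1abc.A.1"]}
rnas_c = {"1abc": ["1abc.A.1"], "2xyz": ["2xyz.B.7"]}

same = toy_task_id(rnas_a) == toy_task_id(rnas_b)
changed = toy_task_id(rnas_a) != toy_task_id(rnas_c)
```

Under these assumptions, two datasets with identical RNAs and nodes produce the same identifier regardless of ordering, while adding or removing a single node changes it.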