rnaglib.tasks.Task

class rnaglib.tasks.Task(root, recompute=False, splitter=None, debug=False, save=True, in_memory=True)

Abstract class for a benchmarking task using the rnaglib datasets. This class handles the logic for building the underlying dataset which is held in an rnaglib.data_loading.RNADataset object. Once the dataset is created, the splitter is invoked to create the train/val/test indices. Tasks also define an evaluate() function to yield appropriate model performance metrics.

Parameters:
  • root (Union[str, PathLike]) – path to a folder where the task information will be stored for fast loading.

  • recompute (bool) – whether to recompute the task info from scratch or use what is stored in root.

  • splitter (Optional[Splitter]) – rnaglib.splitters.Splitter object that handles splitting of data into train/val/test indices. If None, the task's default_splitter attribute is used.
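The flow described above (build the dataset, then invoke the splitter, falling back to a default when none is passed) can be illustrated with a small self-contained sketch. It does not import rnaglib: ToySplitter, ToyTask and ToyLengthTask are hypothetical stand-ins written for this example, and the real Task and Splitter classes may differ in their details.

```python
from abc import ABC, abstractmethod
import random


class ToySplitter:
    """Hypothetical stand-in for a splitter: returns train/val/test index lists."""

    def __init__(self, train=0.8, val=0.1, seed=0):
        self.train, self.val, self.seed = train, val, seed

    def __call__(self, dataset):
        indices = list(range(len(dataset)))
        random.Random(self.seed).shuffle(indices)
        n_train = round(self.train * len(indices))
        n_val = round(self.val * len(indices))
        return (indices[:n_train],
                indices[n_train:n_train + n_val],
                indices[n_train + n_val:])


class ToyTask(ABC):
    """Hypothetical miniature of the Task pattern; this is not rnaglib code."""

    def __init__(self, splitter=None):
        # Fall back to the task's default splitter when none is given.
        self.splitter = self.default_splitter if splitter is None else splitter
        # Build the dataset first, then compute the split indices from it.
        self.dataset = self.process()
        self.train_ind, self.val_ind, self.test_ind = self.split(self.dataset)

    @property
    def default_splitter(self):
        return ToySplitter()

    @abstractmethod
    def process(self):
        """Build and return the dataset; subclasses must implement this."""

    def split(self, dataset):
        return self.splitter(dataset)


class ToyLengthTask(ToyTask):
    def process(self):
        # Ten made-up sequences standing in for a real dataset.
        return ["ACGU", "GGCC", "AUAU", "CCGG", "UUAA",
                "GCGC", "AAUU", "CGCG", "UAUA", "GGAA"]


task = ToyLengthTask()  # uses default_splitter
custom = ToyLengthTask(splitter=ToySplitter(train=0.6, val=0.2))
```

Passing an explicit splitter only changes how the indices are partitioned; the dataset returned by process() is the same in both cases.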

__init__(root, recompute=False, splitter=None, debug=False, save=True, in_memory=True)

Methods

__init__(root[, recompute, splitter, debug, ...])

describe([recompute])

Get description of task dataset, including dimensions needed for model initialization and other relevant statistics.

evaluate(model, loader)

Return type: dict

get_split_datasets([recompute])

get_split_loaders([recompute])

get_task_vars()

Define a FeaturesComputer object to set which input and output variables will be used in the task.

init_metadata()

Optionally adds some key/value pairs to self.metadata.

load()

Load dataset and splits from disk.

process()

Tasks must implement this method.

set_datasets([recompute])

Sets the train, val and test datasets. Call this each time you modify self.dataset.

set_loaders([recompute])

Sets the dataloader properties.

split(dataset)

Calls the splitter and returns train, val, test splits.

write()

Save task data and splits to root.
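How root, recompute, load() and write() might interact can be sketched as a simple load-or-build cache. This is an illustration only: the helper load_or_build, the file name task.json and the JSON layout are assumptions made for this example and do not reflect how rnaglib actually stores a task on disk.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(root, build, recompute=False):
    """Return (data, was_rebuilt). `build` is a hypothetical zero-argument builder."""
    cache = Path(root) / "task.json"  # hypothetical file name
    if cache.exists() and not recompute:
        # Fast path: reuse what is already stored in root.
        return json.loads(cache.read_text()), False
    # Slow path: build from scratch, then save for the next call.
    data = build()
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(json.dumps(data))
    return data, True


def builder():
    # Made-up payload standing in for the dataset and its splits.
    return {"rnas": ["1abc", "2xyz"], "train": [0], "test": [1]}


with tempfile.TemporaryDirectory() as root:
    _, first = load_or_build(root, builder)                   # nothing cached yet
    _, second = load_or_build(root, builder)                  # served from disk
    _, forced = load_or_build(root, builder, recompute=True)  # rebuilt on request
```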

Attributes

default_splitter

task_id

The task hash is a hash of all RNA IDs and node IDs in the dataset.
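One way such an identifier could be computed is sketched below. The function toy_task_id, the choice of SHA-256, the sorting step and the input layout (a dict mapping each RNA ID to a list of node IDs) are all assumptions made for illustration; the hashing scheme rnaglib actually uses is not specified here.

```python
import hashlib


def toy_task_id(rnas):
    """`rnas` maps an RNA ID to its list of node IDs (hypothetical layout)."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on insertion or listing order.
    for rna_id in sorted(rnas):
        digest.update(rna_id.encode() + b"\0")
        for node_id in sorted(rnas[rna_id]):
            digest.update(node_id.encode() + b"\0")
    return digest.hexdigest()


# Same content in a different order gives the same ID.
a = toy_task_id({"1abc": ["1abc.A.1", "1abc.A.2"], "2xyz": ["2xyz.B.7"]})
b = toy_task_id({"2xyz": ["2xyz.B.7"], "1abc": ["1abc.A.2", "1abc.A.1"]})
# Dropping a node changes the ID.
c = toy_task_id({"1abc": ["1abc.A.1"], "2xyz": ["2xyz.B.7"]})
```

An identifier of this kind changes whenever an RNA or a node is added or removed, which makes it usable as a check that two task instances were built from the same data.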