rnaglib.tasks.RNAGo

class rnaglib.tasks.RNAGo(size_thresholds=(15, 500), **kwargs)

Predict the GO terms associated with the Rfam family of a given RNA chain. This task is solved by definition, since Rfam families are constructed using covariance models. It can nevertheless test a model's ability to capture characteristic structural features from 3D structure.

Task type: multi-class classification

Task level: RNA-level

Parameters:

size_thresholds (tuple[int]) – range of RNA sizes to keep in the task dataset (default (15, 500))
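The effect of size_thresholds can be pictured with a small standalone sketch. It does not call rnaglib; the helper name within_size_thresholds, the chain names, and the lengths are hypothetical, and treating both bounds as inclusive is an assumption rather than documented behaviour.

```python
from typing import Tuple


def within_size_thresholds(
    num_residues: int, size_thresholds: Tuple[int, int] = (15, 500)
) -> bool:
    """Return True if an RNA of this length falls inside the kept size range.

    Assumption: both bounds are inclusive; the library may treat them differently.
    """
    lower, upper = size_thresholds
    return lower <= num_residues <= upper


# Hypothetical chain names mapped to residue counts (illustrative values only).
chain_lengths = {
    "chain_a": 12,
    "chain_b": 15,
    "chain_c": 76,
    "chain_d": 500,
    "chain_e": 2900,
}

# Keep only the chains whose length lies within the default range.
kept = [name for name, n in chain_lengths.items() if within_size_thresholds(n)]
print(kept)
```

Under these assumptions, narrowing the range, for example size_thresholds=(50, 200), would keep fewer RNAs in the task dataset.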

__init__(size_thresholds=(15, 500), **kwargs)

Methods

__init__([size_thresholds])

add_feature(feature[, feature_level, is_input])

Shortcut to RNADataset.add_feature

add_representation(representation)

add_rna_to_building_list(all_rnas, rna)

compute_distances()

compute_metrics(all_preds, all_probs, all_labels)

compute_one_metric(preds, probs, labels)

create_dataset_from_list(rnas)

Builds an RNADataset object from the lists filled in by add_rna_to_building_list.

describe()

Get description of task dataset.

dummy_inference()

evaluate(model, loader)

from_scratch(size_thresholds)

from_zenodo()

Downloads the task dataset from Zenodo and loads it.

get_split_datasets([recompute])

get_split_loaders([recompute])

get_task_vars()

Specifies the task's FeaturesComputer object, which defines the features to add to the RNAs (graphs) and nucleotides (graph nodes).

init_metadata([additional_metadata])

Initialize the dictionary of key/value pairs held in self.metadata.

load()

Load dataset and splits from disk.

post_process()

Computes sequence similarity between all pairs of RNAs using CD-HIT.

process()

Creates the task-specific dataset.

remove_redundancy()

remove_representation(representation_name)

set_datasets([recompute])

Sets the train, val and test datasets. Call this each time you modify self.dataset.

set_loaders([recompute])

Sets the dataloader properties.

split(dataset)

Calls the splitter and returns train, val, test splits.

to_csv(path)

Write a single CSV with all task data.

write()

Save task data and splits to root.

Attributes

default_metric

default_splitter

Returns the splitting strategy to be used for this specific task.

dummy_model

input_var

name

target_var

task_id

Task hash is a hash of all RNA IDs and node IDs in the dataset.

version
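The task_id description above (a hash over all RNA IDs and node IDs) can be illustrated with a minimal sketch. This is not rnaglib's implementation: the function sketch_task_id, the choice of SHA-256, the sorted traversal, the separator byte, and the example IDs are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def sketch_task_id(rna_to_nodes: Dict[str, List[str]]) -> str:
    """Hash every RNA ID and node ID into one hex digest.

    Assumptions: SHA-256, and IDs visited in sorted order so the digest
    does not depend on the order in which RNAs or nodes were stored.
    """
    hasher = hashlib.sha256()
    for rna_id in sorted(rna_to_nodes):
        hasher.update(rna_id.encode("utf-8"))
        hasher.update(b"\x00")  # separator so adjacent IDs cannot run together
        for node_id in sorted(rna_to_nodes[rna_id]):
            hasher.update(node_id.encode("utf-8"))
            hasher.update(b"\x00")
    return hasher.hexdigest()


# Hypothetical RNA IDs and node IDs (illustrative only).
dataset = {
    "rna_1": ["rna_1.A.1", "rna_1.A.2"],
    "rna_2": ["rna_2.B.1"],
}
print(sketch_task_id(dataset))
```

A hash of this kind changes whenever an RNA or a nucleotide is added or removed, which is what makes it usable as an identifier for one exact version of a task dataset.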