rnaglib.tasks.RNAGo
- class rnaglib.tasks.RNAGo(size_thresholds=(15, 500), **kwargs)
Predict the GO terms associated with the Rfam family of a given RNA chain. Because Rfam families are constructed using covariance models, this task is solved by definition. It is still useful as a test of whether a model can capture characteristic structural features from 3D structure.
Task type: multi-class classification
Task level: RNA-level
- Parameters:
size_thresholds (tuple[int]) – range of RNA sizes to keep in the task dataset (default: (15, 500))
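To illustrate what a size range such as the default (15, 500) means in practice, here is a minimal, library-independent sketch. It is not rnaglib's implementation: the helper name `filter_by_size`, the toy chain names, and the choice of inclusive bounds are all assumptions made for illustration.

```python
def filter_by_size(lengths_by_id, size_thresholds=(15, 500)):
    """Return the IDs whose length falls inside the size range.

    Hypothetical helper: bounds are treated as inclusive here, which is
    an assumption and may differ from what rnaglib actually does.
    """
    low, high = size_thresholds
    return [rna_id for rna_id, n in lengths_by_id.items() if low <= n <= high]


# Toy data: chain ID -> number of nucleotides (made-up values).
chains = {"chain_a": 12, "chain_b": 76, "chain_c": 500, "chain_d": 2900}

kept = filter_by_size(chains)
print(kept)
```

With these toy values, the 12-nucleotide and 2900-nucleotide chains fall outside the range and are dropped.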
Methods
__init__([size_thresholds])
add_feature(feature[, feature_level, is_input]) – Shortcut to RNADataset.add_feature
add_representation(representation)
add_rna_to_building_list(all_rnas, rna)
compute_distances()
compute_metrics(all_preds, all_probs, all_labels)
compute_one_metric(preds, probs, labels)
create_dataset_from_list(rnas) – Computes an RNADataset object from the lists touched in add_rna_to_building_list
describe() – Get description of task dataset.
dummy_inference()
evaluate(model, loader)
from_scratch(size_thresholds)
from_zenodo() – Downloads the task dataset from Zenodo and loads it.
get_split_datasets([recompute])
get_split_loaders([recompute])
get_task_vars() – Specifies the FeaturesComputer object of the tasks which defines the features which have to be added to the RNAs (graphs) and nucleotides (graph nodes)
init_metadata([additional_metadata]) – Initialize dictionary to hold key/value pairs to self.metadata.
load() – Load dataset and splits from disk.
post_process() – Computes sequence similarity between all atom pairs using CD-Hit
process() – Creates the task-specific dataset.
remove_redundancy()
remove_representation(representation_name)
set_datasets([recompute]) – Sets the train, val and test datasets. Call this each time you modify self.dataset.
set_loaders([recompute]) – Sets the dataloader properties.
split(dataset) – Calls the splitter and returns train, val, test splits.
to_csv(path) – Write a single CSV with all task data.
write() – Save task data and splits to root.
Attributes
default_metric
default_splitter – Returns the splitting strategy to be used for this specific task.
dummy_model
input_var
name
target_var
task_id – Hash of all RNA IDs and node IDs in the dataset.
version
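The task_id attribute is described as a hash of all RNA IDs and node IDs in the dataset. The sketch below shows one way such an identifier could be derived so that it does not depend on iteration order. It is an illustration only: the function name `task_hash`, the use of SHA-256, the separator byte, and the toy IDs are assumptions, not rnaglib's actual scheme.

```python
import hashlib


def task_hash(node_ids_by_rna):
    """Hash RNA IDs and their node IDs in a deterministic order.

    Hypothetical helper: sorting makes the digest independent of the
    order in which RNAs and nodes are stored. SHA-256 is an assumed
    choice of hash function.
    """
    h = hashlib.sha256()
    for rna_id in sorted(node_ids_by_rna):
        h.update(rna_id.encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent IDs cannot run together
        for node_id in sorted(node_ids_by_rna[rna_id]):
            h.update(node_id.encode("utf-8"))
            h.update(b"\x00")
    return h.hexdigest()


# Toy dataset: RNA ID -> node IDs (made-up identifiers).
toy = {"1abc_A": ["1abc.A.1", "1abc.A.2"], "2xyz_B": ["2xyz.B.10"]}
reordered = {"2xyz_B": ["2xyz.B.10"], "1abc_A": ["1abc.A.2", "1abc.A.1"]}
smaller = {"1abc_A": ["1abc.A.1"], "2xyz_B": ["2xyz.B.10"]}

same_id = task_hash(toy) == task_hash(reordered)
changed_id = task_hash(toy) != task_hash(smaller)
print(same_id, changed_id)
```

An identifier built this way stays stable when the same RNAs and nodes are stored in a different order, and changes as soon as any RNA or node is added or removed.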