rnaglib.tasks.BenchmarkBindingSite

class rnaglib.tasks.BenchmarkBindingSite(cutoff=6.0, **kwargs)[source]

Version of RNA-Site implemented using the data and splits from the experiment by Su et al. (2021).

Hong Su, Zhenling Peng, and Jianyi Yang. Recognition of small molecule–RNA binding sites using RNA sequence and structure. Bioinformatics, 37(1):36–42, 2021. <https://doi.org/10.1093/bioinformatics/btaa1092>

Task type: binary classification

Task level: residue-level

Parameters:

cutoff (float) – distance (in Angstroms) between an RNA atom and any small-molecule atom below which the RNA residue is considered part of a binding site (default 6.0)
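
The cutoff rule described above can be illustrated with a minimal, self-contained sketch. It does not use rnaglib and is not the library's implementation: the function name label_binding_residues, the residue IDs, and the toy coordinates are all hypothetical, and the comparison is assumed to be strict ("below" the cutoff).

```python
import math


def label_binding_residues(residue_atoms, ligand_atoms, cutoff=6.0):
    """Label each residue 1 if any of its atoms lies closer than
    `cutoff` Angstroms to any ligand atom, else 0.

    Hypothetical helper for illustration only; not part of rnaglib.
    """
    labels = {}
    for res_id, atoms in residue_atoms.items():
        labels[res_id] = int(
            any(math.dist(a, l) < cutoff for a in atoms for l in ligand_atoms)
        )
    return labels


# Toy coordinates (in Angstroms), invented for this example.
residue_atoms = {
    "A.1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A.2": [(10.0, 0.0, 0.0)],
    "A.3": [(20.0, 0.0, 0.0)],
}
ligand_atoms = [(5.0, 0.0, 0.0)]

labels = label_binding_residues(residue_atoms, ligand_atoms)
print(labels)  # {'A.1': 1, 'A.2': 1, 'A.3': 0}
```

Lowering the cutoff shrinks the binding site: with cutoff=4.0, residue A.2 (5.0 Angstroms from the ligand atom) is no longer labeled as binding.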

__init__(cutoff=6.0, **kwargs)[source]

Methods

__init__([cutoff])

add_feature(feature[, feature_level, is_input])

Add a feature to the dataset.

add_representation(representation)

Add a representation transform to the dataset.

add_rna_to_building_list(all_rnas, rna)

Add an RNA to the building list.

compute_distances()

Compute similarity distances between RNAs in the dataset.

compute_metrics(all_preds, all_probs, all_labels)

Compute classification metrics aggregated across all predictions.

compute_one_metric(preds, probs, labels)

Compute classification metrics for a single set of predictions.

create_dataset_from_list(rnas)

Create an RNADataset object from the lists populated by add_rna_to_building_list.

describe()

Get description of task dataset.

dummy_inference()

Run dummy inference on the test dataset.

evaluate(model, loader)

Evaluate model performance on a dataset.

from_scratch(size_thresholds)

Create task dataset from scratch.

from_zenodo()

Download the task dataset from Zenodo and load it.

get_split_datasets([recompute])

Get train, validation, and test datasets.

get_split_loaders([recompute])

Get train, validation, and test dataloaders.

get_task_vars()

Specify the task's FeaturesComputer object, which defines the features to be added to the RNAs (graphs) and nucleotides (graph nodes).

init_metadata([additional_metadata])

Initialize the self.metadata dictionary, which holds key/value pairs.

load()

Load dataset and splits from disk.

post_process()

Apply post-processing steps to remove redundancy.

process()

Create the task-specific dataset.

remove_redundancy()

Remove redundant RNAs from the dataset based on similarity.

remove_representation(representation_name)

Remove a representation transform from the dataset.

set_datasets([recompute])

Set the train, val and test datasets.

set_loaders([recompute])

Set the dataloader properties.

split(dataset)

Calls the splitter and returns train, val, test splits.

to_csv(path)

Write a single CSV with all task data.

write()

Save task data and splits to root.
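
Since this is a binary, residue-level classification task, compute_one_metric and compute_metrics receive predictions, probabilities, and labels. The sketch below shows what such a computation can look like in plain Python. Which metrics rnaglib actually reports is not stated on this page, so the function binary_metrics, the choice of metrics, and the 0.5 threshold are assumptions made for illustration.

```python
def binary_metrics(preds, labels):
    """Compute accuracy, precision, recall and F1 for 0/1 predictions.

    Hypothetical helper for illustration only; not part of rnaglib.
    """
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)

    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy per-residue probabilities and ground-truth labels, invented here.
probs = [0.9, 0.2, 0.6, 0.4, 0.1]
labels = [1, 0, 0, 1, 0]
preds = [int(p >= 0.5) for p in probs]  # assumed decision threshold of 0.5

metrics = binary_metrics(preds, labels)
print(metrics)
```

In this toy case there is one true positive, one false positive, one false negative, and two true negatives, so accuracy is 0.6 and precision, recall, and F1 are all 0.5.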

Attributes

default_metric

default_splitter

Returns the splitting strategy to be used for this specific task.

dummy_model

Get a dummy model for testing purposes.

input_var

name

target_var

task_id

The task hash is a hash of all RNA IDs and node IDs in the dataset.

version
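
The task_id attribute above is described as a hash over all RNA IDs and node IDs. The following sketch shows one way such an identifier could be derived so that it is independent of iteration order. The hash function (SHA-256), the sorting, the separator bytes, and the name task_hash are assumptions; the scheme rnaglib actually uses is not documented on this page.

```python
import hashlib


def task_hash(rna_to_nodes):
    """Hash a mapping {rna_id: iterable of node_ids} deterministically.

    Hypothetical helper for illustration only; not part of rnaglib.
    IDs are sorted so the result does not depend on insertion order.
    """
    h = hashlib.sha256()
    for rna_id in sorted(rna_to_nodes):
        h.update(rna_id.encode("utf-8"))
        for node_id in sorted(rna_to_nodes[rna_id]):
            # Separator byte keeps adjacent IDs from running together.
            h.update(b"\x00" + node_id.encode("utf-8"))
        h.update(b"\x01")  # marks the end of one RNA's node list
    return h.hexdigest()


# Toy IDs, invented for this example.
dataset_a = {"1abc": ["1abc.A.1", "1abc.A.2"], "2xyz": ["2xyz.B.7"]}
dataset_b = {"2xyz": ["2xyz.B.7"], "1abc": ["1abc.A.2", "1abc.A.1"]}
dataset_c = {"1abc": ["1abc.A.1"], "2xyz": ["2xyz.B.7"]}

print(task_hash(dataset_a) == task_hash(dataset_b))  # True: same content
print(task_hash(dataset_a) == task_hash(dataset_c))  # False: a node is missing
```

An identifier of this kind changes whenever an RNA or a node is added or removed, which makes it usable for checking that two runs were performed on the same task data.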