rnaglib.tasks.LigandIdentification

class rnaglib.tasks.LigandIdentification(size_thresholds=(15, 500), admissible_ligands=('PAR', 'LLL', '8UZ'), use_balanced_sampler=False, **kwargs)

Binding-pocket-level task: predict which small-molecule ligand is most likely to bind a binding pocket with a given structure.

Task type: multi-class classification

Task level: substructure-level

Parameters:
  • size_thresholds (tuple[int]) – range of RNA sizes to keep in the task dataset (default (15, 500))

  • admissible_ligands (tuple[str]) – names of the ligands to include in the dataset (default ('PAR', 'LLL', '8UZ')). The defaults are paromomycin (PAR), LLL and 8UZ, chosen because they are among the most frequent small molecules binding RNAs in our database.

  • use_balanced_sampler (bool) – whether to sample RNAs according to the distribution of their classes (default False)
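To make size_thresholds and admissible_ligands concrete, the filtering they describe can be sketched in plain Python, without calling rnaglib. The Pocket record, its fields, the example IDs, and keep_pocket are hypothetical names used only for illustration, and treating both size bounds as inclusive is an assumption; the library's own filtering may differ.

```python
from dataclasses import dataclass


@dataclass
class Pocket:
    """Hypothetical stand-in for one binding pocket record (not an rnaglib type)."""
    rna_id: str
    rna_size: int  # number of nucleotides in the parent RNA
    ligand: str    # ligand identifier, e.g. "PAR"


def keep_pocket(pocket, size_thresholds=(15, 500),
                admissible_ligands=("PAR", "LLL", "8UZ")):
    """Return True if the pocket passes both filters.

    Assumption: both size bounds are inclusive.
    """
    min_size, max_size = size_thresholds
    size_ok = min_size <= pocket.rna_size <= max_size
    ligand_ok = pocket.ligand in admissible_ligands
    return size_ok and ligand_ok


# Made-up records, chosen to exercise each filter.
pockets = [
    Pocket("rna_a", 120, "PAR"),
    Pocket("rna_b", 10, "PAR"),   # rejected: RNA smaller than the lower bound
    Pocket("rna_c", 300, "SAM"),  # rejected: ligand not in admissible_ligands
    Pocket("rna_d", 500, "8UZ"),  # kept: size equals the upper bound
]
kept = [p.rna_id for p in pockets if keep_pocket(p)]
```

With these records, only rna_a and rna_d remain in kept.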

__init__(size_thresholds=(15, 500), admissible_ligands=('PAR', 'LLL', '8UZ'), use_balanced_sampler=False, **kwargs)
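The use_balanced_sampler flag concerns how examples are drawn across ligand classes. One common way to balance classes is to weight each example by the inverse of its class frequency, so that every class carries the same total sampling weight. The sketch below shows that idea in plain Python; balanced_sample_weights is a hypothetical helper, the labels are made up, and whether rnaglib implements balancing this way is an assumption, not something this page states.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each example by 1 / (number of examples sharing its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Made-up, imbalanced label list: three PAR, one LLL, two 8UZ.
labels = ["PAR", "PAR", "PAR", "LLL", "8UZ", "8UZ"]
weights = balanced_sample_weights(labels)

# Sum the weights per class to check that each class has the same total mass.
class_mass = {}
for label, weight in zip(labels, weights):
    class_mass[label] = class_mass.get(label, 0.0) + weight
```

Weights of this kind can be passed to a weighted sampler, for example random.choices in the standard library or torch.utils.data.WeightedRandomSampler in PyTorch.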

Methods

__init__([size_thresholds, ...])

add_feature(feature[, feature_level, is_input])

Shortcut to RNADataset.add_feature

add_representation(representation)

add_rna_to_building_list(all_rnas, rna)

compute_distances()

compute_metrics(all_preds, all_probs, all_labels)

compute_one_metric(preds, probs, labels)

create_dataset_from_list(rnas)

Builds an RNADataset object from the lists populated by add_rna_to_building_list.

describe()

Get description of task dataset.

dummy_inference()

evaluate(model, loader)

from_scratch(size_thresholds)

from_zenodo()

Downloads the task dataset from Zenodo and loads it.

get_split_datasets([recompute])

get_split_loaders([recompute])

get_task_vars()

Specifies the task's FeaturesComputer object, which defines the features to add to the RNAs (graphs) and nucleotides (graph nodes).

init_metadata([additional_metadata])

Initialize the dictionary of key/value pairs held in self.metadata.

load()

Load dataset and splits from disk.

post_process()

The task-specific post-processing steps, which remove redundancy and compute the distances used by the splitters.

process()

Creates the task-specific dataset.

remove_redundancy()

remove_representation(representation_name)

set_datasets([recompute])

Sets the train, val and test datasets. Call this each time you modify self.dataset.

set_loaders([recompute])

Sets the dataloader properties.

split(dataset)

Calls the splitter and returns train, val, test splits.

to_csv(path)

Write a single CSV with all task data.

write()

Save task data and splits to root.
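compute_metrics and compute_one_metric take predictions, probabilities, and labels, and the task is multi-class classification. As an illustration of what scoring such a task involves, the sketch below computes accuracy and macro-averaged F1 from integer class predictions in plain Python. The function multiclass_scores and the example data are hypothetical, and which metrics rnaglib actually reports is not stated on this page.

```python
def multiclass_scores(preds, labels, num_classes):
    """Return accuracy and macro-averaged F1 for integer class predictions."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")

    pairs = list(zip(preds, labels))
    accuracy = sum(p == y for p, y in pairs) / len(pairs)

    f1_per_class = []
    for c in range(num_classes):
        tp = sum(p == c and y == c for p, y in pairs)  # true positives
        fp = sum(p == c and y != c for p, y in pairs)  # false positives
        fn = sum(p != c and y == c for p, y in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # A class with no predictions and no true examples scores 0 here.
        f1_per_class.append(2 * tp / denom if denom else 0.0)

    return {"accuracy": accuracy, "macro_f1": sum(f1_per_class) / num_classes}


# Made-up example: three classes, e.g. indices into admissible_ligands.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
scores = multiclass_scores(preds, labels, num_classes=3)
```

Macro averaging gives each ligand class equal weight regardless of how many pockets it has, which is one reason it is often reported alongside accuracy on imbalanced data.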

Attributes

default_metric

default_splitter

Returns the splitting strategy to be used for this specific task.

dummy_model

input_var

name

target_var

task_id

Task hash: a hash of all RNA IDs and node IDs in the dataset.

version
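The task_id description says the hash covers all RNA IDs and node IDs in the dataset. A minimal sketch of such an identifier follows, using hashlib from the standard library. The function sketch_task_hash, the choice of SHA-256, the sorting of IDs for order independence, and the ID formats in the example are all assumptions made for illustration; this is not the library's actual hashing scheme.

```python
import hashlib


def sketch_task_hash(rna_to_nodes):
    """Hash a mapping {rna_id: iterable of node_ids} into a hex digest.

    IDs are sorted first so the result does not depend on iteration order.
    """
    hasher = hashlib.sha256()
    for rna_id in sorted(rna_to_nodes):
        hasher.update(rna_id.encode("utf-8"))
        for node_id in sorted(rna_to_nodes[rna_id]):
            # Separator byte keeps adjacent IDs from running together.
            hasher.update(b"\x00" + node_id.encode("utf-8"))
        hasher.update(b"\x01")  # marks the end of one RNA's node list
    return hasher.hexdigest()


# Made-up IDs; the same content in a different order gives the same digest.
dataset_a = {"rna_a": ["rna_a.A.1", "rna_a.A.2"], "rna_b": ["rna_b.B.7"]}
dataset_b = {"rna_b": ["rna_b.B.7"], "rna_a": ["rna_a.A.2", "rna_a.A.1"]}
dataset_c = {"rna_a": ["rna_a.A.1"], "rna_b": ["rna_b.B.7"]}

hash_a = sketch_task_hash(dataset_a)
hash_b = sketch_task_hash(dataset_b)
hash_c = sketch_task_hash(dataset_c)
```

An identifier built this way changes whenever an RNA or a node is added or removed, which makes it usable as a check that two runs used the same dataset contents.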