rnaglib.transforms

Transforms are objects which modify RNA dictionaries in various ways. You can apply a transform to an individual RNA or to a collection (filters can only be applied to collections).

In this example, we add a field 'rfam' with the Rfam ID of an RNA:

>>> from rnaglib.transforms import RfamTransform
>>> from rnaglib.dataset import RNADataset
>>> dataset = RNADataset(debug=True)
>>> t = RfamTransform()
>>> dataset = t(dataset)
>>> dataset[2]['rna'].graph['rfam']
'RF00005'

Note

You can often speed up a transform by passing parallel=True to its constructor, which applies the transform in parallel.

Generic transforms

This is the general form of a transform, from which the specific transforms described below are derived.

Transform([parallel, num_workers])

Transforms modify and add information to an RNA graph via the networkx.Graph data structure.
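The idea that one transform object can be applied either to an individual RNA or to a collection can be sketched in plain Python. This is an illustration only and not rnaglib's implementation: the class names `SketchTransform` and `AddLength`, the `forward` method, and the flat dictionary layout are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Union

RNADict = Dict[str, Any]  # hypothetical stand-in for an RNA dictionary


class SketchTransform:
    """Toy base class: dispatch on a single item versus a collection."""

    def forward(self, rna_dict: RNADict) -> RNADict:
        raise NotImplementedError

    def __call__(self, data: Union[RNADict, Iterable[RNADict]]):
        # A single RNA is a dict; anything else is treated as a collection.
        if isinstance(data, dict):
            return self.forward(data)
        return [self.forward(item) for item in data]


class AddLength(SketchTransform):
    """Toy annotation: store the sequence length under a new key."""

    def forward(self, rna_dict: RNADict) -> RNADict:
        rna_dict["length"] = len(rna_dict["sequence"])
        return rna_dict


single = AddLength()({"name": "a", "sequence": "GGCA"})
many = AddLength()([
    {"name": "b", "sequence": "AU"},
    {"name": "c", "sequence": "GCGCG"},
])
```

Subclasses only implement the per-RNA logic, and the base class decides how to iterate, which is also the natural place to hook in parallel execution.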

Annotation Transforms

These transforms update the information stored in an RNA dictionary.

AnnotationTransform(**kwargs)

A transform that computes an annotation for the RNA.

RfamTransform([parallel, num_workers])

Obtain the Rfam classification of an RNA and store as a graph attribute.

RNAFMTransform([chunking_strategy, ...])

Use the RNA-FM model to compute residue-level embeddings.

PDBIDNameTransform([parallel, num_workers])

Assign the RNA name using its PDBID.

ChainNameTransform([parallel, num_workers])

Set the rna.name field using the pdbid and chain ID.

SecondaryStructureTransform(structures_dir)

Compute secondary structure in dot-bracket notation for each chain in the RNA and store in a graph-level dictionary.

SmallMoleculeBindingTransform(structures_dir)

Annotate RNAs with small molecule binding information.

RBPTransform(structures_dir[, ...])

Add residue-level information about the protein content of each residue's environment.

AnnotatorFromDict(annotation_dict, name, ...)

Generic annotator that adds node-level features to a dataset using only a dictionary that maps node names to the desired node features.
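A dictionary-driven annotation of this kind can be sketched without any graph library. The snippet below is a minimal illustration, not the library's code: the function `annotate_from_dict`, its `default` parameter, the node names, and the score values are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical stand-in for graph nodes: node name -> attribute dict.
nodes: Dict[str, Dict[str, Any]] = {
    "1abc.A.1": {"nt": "G"},
    "1abc.A.2": {"nt": "C"},
    "1abc.A.3": {"nt": "A"},
}

# Mapping from node names to the feature to attach (made-up values).
annotation_dict = {"1abc.A.1": 0.7, "1abc.A.3": 0.1}


def annotate_from_dict(nodes, annotation_dict, name, default=None):
    """Write annotation_dict[node] into each node's attributes under `name`."""
    for node_id, attrs in nodes.items():
        # Nodes missing from the mapping receive the default value.
        attrs[name] = annotation_dict.get(node_id, default)
    return nodes


annotate_from_dict(nodes, annotation_dict, name="score")
```

How nodes absent from the dictionary should be handled is a design choice; this sketch fills them with a default rather than raising an error.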

DummyAnnotator(**kwargs)

Add a dummy attribute with value 1 to all nodes in an RNA graph.

CifMetadata(structures_dir)

BindingSiteAnnotator([include_ions, cutoff])

Annotation transform that adds to each node of the dataset a binary node feature indicating whether it is part of a binding site.

Filters

These transforms filter out RNAs from a collection of RNAs based on various criteria.

FilterTransform([parallel, num_workers])

Reject items from a dataset based on some conditions.

SizeFilter([min_size, max_size])

Reject RNAs that are not in the given size bounds.
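Filtering a collection by size bounds can be sketched as a generator. This is a hedged illustration rather than the library's implementation: the function `size_filter`, the use of sequence length as the size, and the inclusive bounds are assumptions.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def size_filter(
    rnas: Iterable[Dict[str, Any]],
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield only the RNAs whose size lies within the (inclusive) bounds."""
    for rna in rnas:
        size = len(rna["sequence"])  # assumption: size = number of residues
        if min_size is not None and size < min_size:
            continue
        if max_size is not None and size > max_size:
            continue
        yield rna


rnas = [
    {"name": "short", "sequence": "GC"},
    {"name": "mid", "sequence": "GCAU"},
    {"name": "long", "sequence": "GCAUGCAU"},
]
kept = [rna["name"] for rna in size_filter(rnas, min_size=3, max_size=5)]
```

Because a filter decides which items of a collection survive, it only makes sense on a collection, which matches the restriction stated at the top of this page.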

RNAAttributeFilter(attribute[, value_checker])

Reject RNAs that lack a certain annotation at the whole RNA level.

ResidueAttributeFilter(attribute[, ...])

Reject RNAs that lack a certain annotation at the residue level.

ResidueNameFilter([value_checker, min_valid])

Filter RNAs based on their residues' names.

RibosomalFilter(**kwargs)

Remove an RNA if it is ribosomal.

NameFilter(names, **kwargs)

Filter RNAs based on their names.

ChainFilter(valid_chains_dict, **kwargs)

Filter RNAs based on valid chain names for each structure.

ResolutionFilter(resolution_threshold, **kwargs)

Filter RNAs based on their resolution.

Partitions

These transforms take an RNA and return an iterator of RNAs. They are useful for splitting an RNA into substructures (e.g. by chain ID or binding pocket).

PartitionTransform([parallel, num_workers])

Break up a whole RNA into substructures.

ChainSplitTransform([parallel, num_workers])

Split up an RNA by chain.
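Splitting by chain, with one input RNA producing an iterator of smaller RNAs, can be sketched as follows. The function `split_by_chain`, the dictionary layout, and the `pdbid.chain.position` node naming are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator


def split_by_chain(rna: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one sub-RNA per chain, in order of first appearance."""
    groups = defaultdict(list)
    for node_id in rna["nodes"]:
        # Assumed node naming: "<pdbid>.<chain>.<position>".
        chain = node_id.split(".")[1]
        groups[chain].append(node_id)
    for chain, members in groups.items():
        yield {"name": f"{rna['name']}.{chain}", "nodes": members}


rna = {"name": "1abc", "nodes": ["1abc.A.1", "1abc.A.2", "1abc.B.1"]}
parts = list(split_by_chain(rna))
```

Returning an iterator rather than a list lets a caller stream substructures from large RNAs without holding them all in memory.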

ConnectedComponentPartition([parallel, ...])

Split up an RNA by connected components.
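Connected components can be found with a breadth-first search over the graph's edges. The sketch below uses only the standard library and is not the library's implementation; the function `connected_components` and the toy node names are hypothetical.

```python
from collections import deque
from typing import Hashable, Iterable, Iterator, List, Tuple


def connected_components(
    nodes: List[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]
) -> Iterator[List[Hashable]]:
    """Yield each connected component as a sorted list of nodes."""
    # Build an undirected adjacency map.
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            current = queue.popleft()
            component.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        yield sorted(component)


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = list(connected_components(nodes, edges))
```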

PartitionFromDict(partition_dict, **kwargs)

Partitions an RNA according to a partition defined in a dictionary.

Representations

These transforms convert a raw RNA into a geometric representation such as a graph, a voxel grid, or a point cloud.

Representation()

Callable object that accepts a raw RNA networkx object along with features and target vector representations and returns a representation of it (e.g. graph, voxel, point cloud).

SequenceRepresentation([framework, backbone])

Represents RNA as a linear sequence following the 5' to 3' order of backbone edges.
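Reading a sequence off backbone edges amounts to walking a chain from its 5' end. The following is a minimal sketch under strong assumptions (a single unbranched chain, directed edges pointing 5' to 3'); the function `sequence_from_backbone` and the node names are hypothetical and not part of the library.

```python
from typing import Dict, Iterable, Tuple


def sequence_from_backbone(
    nucleotides: Dict[str, str], backbone_edges: Iterable[Tuple[str, str]]
) -> str:
    """Walk directed backbone edges from the 5' end and join nucleotides."""
    successor = dict(backbone_edges)  # node -> next node toward the 3' end
    targets = set(successor.values())
    # The 5' end is the only node that no backbone edge points to.
    starts = [node for node in nucleotides if node not in targets]
    current = starts[0]
    letters = []
    while current is not None:
        letters.append(nucleotides[current])
        current = successor.get(current)
    return "".join(letters)


# Node order in the dict is deliberately shuffled; the edges define the order.
nucleotides = {"n3": "A", "n1": "G", "n2": "C"}
backbone_edges = [("n1", "n2"), ("n2", "n3")]
sequence = sequence_from_backbone(nucleotides, backbone_edges)
```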

GraphRepresentation([framework, ...])

Converts RNA into a Leontis-Westhof graph (2.5D) where nodes are residues and edges are either base pairs or backbones.

PointCloudRepresentation([hstack, sorted_nodes])

Converts RNA into a point-cloud-based representation.

VoxelRepresentation([spacing, padding, sigma])

Converts RNA into a voxel-based representation.

RingRepresentation([node_simfunc, ...])

Converts RNA into a ring-based representation.

Featurizers

These transforms take an annotation in the RNA and cast it into a feature vector.

FeaturesComputer([nt_features, nt_targets, ...])

This class takes as input an RNA in networkx form and computes the features_dict, which maps node IDs to a tensor of features.
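Casting an annotation into a feature vector can be illustrated with a one-hot encoding of the nucleotide type. This is a dependency-free sketch, not the library's code: plain Python lists stand in for tensors, and the names `one_hot`, `compute_features`, and the `nt` attribute key are assumptions.

```python
from typing import Any, Dict, List

NUCLEOTIDES = ("A", "C", "G", "U")


def one_hot(nt: str) -> List[float]:
    """Encode a nucleotide as a one-hot vector; unknown codes give all zeros."""
    return [1.0 if nt == ref else 0.0 for ref in NUCLEOTIDES]


def compute_features(nodes: Dict[str, Dict[str, Any]]) -> Dict[str, List[float]]:
    """Map each node ID to a feature vector built from its 'nt' annotation."""
    return {node_id: one_hot(attrs["nt"]) for node_id, attrs in nodes.items()}


nodes = {
    "1abc.A.1": {"nt": "G"},
    "1abc.A.2": {"nt": "U"},
    "1abc.A.3": {"nt": "N"},  # non-standard code, encoded as all zeros
}
features_dict = compute_features(nodes)
```

The result has the same shape of mapping described above, with node IDs as keys and one fixed-length vector per node as values.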