rnaglib.transforms

Transforms are objects which modify RNA dictionaries in various ways. You can apply a transform to an individual RNA or to a collection (filters can only be applied to collections).

In this example, we add a field 'rfam' with the Rfam ID of an RNA:

>>> from rnaglib.transforms import RfamTransform
>>> from rnaglib.data_loading import RNADataset
>>> dataset = RNADataset(debug=True)
>>> t = RfamTransform()
>>> t(dataset)
>>> dataset[1]['rna'].graph['rfam']
'RF00005'

Note

You can often speed up a transform by passing parallel=True to its constructor, so that it is applied in parallel.
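As a rough illustration of what applying a transform in parallel means, here is a minimal standard-library sketch. It does not use rnaglib: `annotate_length` and `apply_to_collection` are hypothetical helpers, the RNA dictionaries are plain dicts with an assumed 'sequence' key, and a thread pool is used only as an example of a worker pool, since the backend rnaglib uses is not described here.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate_length(rna_dict):
    """Hypothetical per-RNA transform: record the sequence length."""
    out = dict(rna_dict)  # copy so the input is left untouched
    out["length"] = len(rna_dict["sequence"])  # 'sequence' is an assumed key
    return out


def apply_to_collection(transform, rnas, parallel=False, num_workers=2):
    """Apply `transform` to every item, optionally through a worker pool."""
    if not parallel:
        return [transform(rna) for rna in rnas]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(transform, rnas))


rnas = [
    {"name": "a", "sequence": "GGCU"},
    {"name": "b", "sequence": "AUGCAU"},
]
serial = apply_to_collection(annotate_length, rnas)
threaded = apply_to_collection(annotate_length, rnas, parallel=True)
print(serial == threaded)
```

Both code paths give the same result; the pool only changes how the per-item work is scheduled.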

Simple Transforms

These transforms update the information stored in an RNA dictionary.

Transform([parallel, num_workers])

Transforms modify and add information to an RNA graph via the networkx.Graph data structure.

RfamTransform([parallel, num_workers])

Obtain the Rfam classification of an RNA and store as a graph attribute.

RNAFMTransform([chunking_strategy, ...])

Use the RNA-FM model to compute residue-level embeddings.

PDBIDNameTransform([parallel, num_workers])

Assign the RNA name using its PDB ID.

ChainNameTransform([parallel, num_workers])

Set the rna.name field using the pdbid and chain ID.
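To make the idea of a simple transform concrete, the following is a minimal sketch that can be applied either to one RNA dictionary or to a collection. It is not rnaglib code: `StubGraph` stands in for networkx.Graph, `ToyChainNameTransform` is a hypothetical class, and the 'pdbid' and 'chain' attribute keys are assumptions made for the example.

```python
class StubGraph:
    """Stand-in for networkx.Graph: only `name` and the `graph` attribute dict."""

    def __init__(self, **attrs):
        self.name = ""
        self.graph = dict(attrs)


class ToyChainNameTransform:
    """Hypothetical transform: name an RNA after its PDB ID and chain."""

    def forward(self, rna_dict):
        g = rna_dict["rna"]
        # 'pdbid' and 'chain' are assumed attribute keys for this sketch.
        g.name = f"{g.graph['pdbid']}.{g.graph['chain']}"
        return rna_dict

    def __call__(self, data):
        # Accept a single RNA dictionary or an iterable of them.
        if isinstance(data, dict):
            return self.forward(data)
        return [self.forward(item) for item in data]


single = {"rna": StubGraph(pdbid="1abc", chain="A")}
batch = [
    {"rna": StubGraph(pdbid="2xyz", chain="B")},
    {"rna": StubGraph(pdbid="2xyz", chain="C")},
]

t = ToyChainNameTransform()
t(single)
t(batch)
print(single["rna"].name)
print([item["rna"].name for item in batch])
```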

Filters

These transforms filter out RNAs from a collection of RNAs based on various criteria.

FilterTransform([parallel, num_workers])

Reject items from a dataset based on some conditions.

SizeFilter([min_size, max_size])

Reject RNAs that are not in the given size bounds.

RNAAttributeFilter(attribute, **kwargs)

Reject RNAs that lack a certain annotation at the whole RNA level.

ResidueAttributeFilter(attribute[, ...])

Reject RNAs that lack a certain annotation at the residue level.

RibosomalFilter(**kwargs)

Remove RNAs that are ribosomal.
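The filtering idea can be sketched without rnaglib as a generator that keeps only the items passing a test. In this sketch `toy_size_filter` is a hypothetical function, RNAs are plain dicts with an assumed 'residues' key, and the bounds are treated as inclusive, which may differ from how SizeFilter handles them.

```python
def toy_size_filter(rnas, min_size=None, max_size=None):
    """Yield only the RNAs whose residue count lies within the bounds."""
    for rna in rnas:
        n = len(rna["residues"])  # 'residues' is an assumed key
        if min_size is not None and n < min_size:
            continue  # too small: reject
        if max_size is not None and n > max_size:
            continue  # too large: reject
        yield rna


collection = [
    {"name": "short", "residues": [f"A.{i}" for i in range(1, 3)]},
    {"name": "medium", "residues": [f"A.{i}" for i in range(1, 11)]},
    {"name": "long", "residues": [f"A.{i}" for i in range(1, 101)]},
]

kept = [rna["name"] for rna in toy_size_filter(collection, min_size=5, max_size=50)]
print(kept)
```

Because a filter decides which items of a collection survive, it only makes sense on a collection rather than on a single RNA.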

Partitions

These transforms take an RNA and return an iterator of RNAs. Useful for splitting the RNA into substructures (e.g. by chain ID, binding pockets, etc.)

PartitionTransform([parallel, num_workers])

Break up whole RNAs into substructures.

ChainSplitTransform([parallel, num_workers])

Split up an RNA by chain.
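A partition takes one RNA and yields several. The sketch below shows the idea for splitting by chain, using plain dicts instead of rnaglib objects. `split_by_chain` is a hypothetical function, and the 'pdbid.chain.position' format assumed for residue IDs is an assumption of this example.

```python
from collections import defaultdict


def split_by_chain(rna):
    """Yield one sub-RNA per chain found in the residue IDs."""
    groups = defaultdict(list)
    for res_id in rna["residues"]:
        # Assumed ID layout: '<pdbid>.<chain>.<position>'
        chain = res_id.split(".")[1]
        groups[chain].append(res_id)
    # Dicts keep insertion order, so chains come out in order of first appearance.
    for chain, residues in groups.items():
        yield {"name": f"{rna['name']}.{chain}", "residues": residues}


rna = {
    "name": "1abc",
    "residues": ["1abc.A.1", "1abc.A.2", "1abc.B.1", "1abc.A.3", "1abc.B.2"],
}

parts = list(split_by_chain(rna))
print([p["name"] for p in parts])
print([len(p["residues"]) for p in parts])
```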

Representations

These transforms convert a raw RNA into a geometric representation such as a graph, a voxel grid, or a point cloud.

Representation()

Callable object that accepts a raw RNA networkx object along with features and target vector representations and returns a representation of it (e.g.

GraphRepresentation([clean_edges, ...])

Converts RNA into a Leontis-Westhof graph (2.5D) where nodes are residues and edges are either base pairs or backbones.

PointCloudRepresentation([hstack, sorted_nodes])

Converts RNA into a point-cloud-based representation.

VoxelRepresentation([spacing, padding, sigma])

Converts RNA into a voxel-based representation.
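As a sketch of what converting an RNA into a representation involves, the following turns per-residue coordinates into a point cloud: an ordered list of node IDs plus a matching list of 3D points. It is a standard-library illustration only. `to_point_cloud`, its `sort_ids` flag, and the input layout are hypothetical and do not reflect the actual behaviour of PointCloudRepresentation.

```python
def to_point_cloud(coords_by_residue, sort_ids=True):
    """Return node IDs and their coordinates as two aligned lists."""
    node_ids = list(coords_by_residue)
    if sort_ids:
        node_ids.sort()  # plain lexicographic order on the ID strings
    points = [coords_by_residue[node_id] for node_id in node_ids]
    return {"node_ids": node_ids, "points": points}


# Assumed input: residue ID -> (x, y, z) coordinates of one atom per residue.
coords = {
    "A.2": (1.0, 0.0, 0.0),
    "A.1": (0.0, 0.0, 0.0),
    "A.3": (2.0, 0.5, 0.0),
}

cloud = to_point_cloud(coords)
print(cloud["node_ids"])
print(cloud["points"])
```

Keeping the node IDs next to the points lets per-residue features or targets be aligned with the coordinates later.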

Featurizers

These transforms take an annotation in the RNA and cast it into a feature vector.

FeaturesComputer([nt_features, nt_targets, ...])

This class takes as input an RNA in networkx form and computes the features_dict, which maps node IDs to a tensor of features.
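The mapping from node IDs to feature vectors can be illustrated with a one-hot encoding of the nucleotide type. This is a minimal sketch, not the FeaturesComputer implementation: `one_hot` and `compute_features` are hypothetical helpers, the 'nt_code' annotation key is an assumption, and plain lists stand in for tensors.

```python
NUCLEOTIDES = "ACGU"


def one_hot(nt):
    """Encode a nucleotide letter as a 4-dimensional one-hot vector."""
    return [1.0 if nt == ref else 0.0 for ref in NUCLEOTIDES]


def compute_features(node_annotations, key="nt_code"):
    """Map each node ID to a feature vector built from one annotation."""
    return {
        node_id: one_hot(annotation[key])
        for node_id, annotation in node_annotations.items()
    }


# Assumed input: node ID -> dict of residue-level annotations.
nodes = {
    "A.1": {"nt_code": "G"},
    "A.2": {"nt_code": "C"},
    "A.3": {"nt_code": "U"},
}

features_dict = compute_features(nodes)
print(features_dict["A.1"])
```

A letter outside the assumed alphabet, such as a modified residue, would get an all-zero vector in this sketch.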