EnsembleGen: RNA Ensemble Structure Selection Based on NMR Data

 

EnsembleGen (also named "RESSD") is an efficient algorithm for selecting structural ensembles that integrates residual dipolar coupling (RDC) data from nuclear magnetic resonance (NMR) experiments. This page provides instructions for using the program. The whole package can be downloaded here. If you use this tool, please cite our publication: Chang AT, Chen L, Song L, Zhang S, Nikonowicz EP. Biochemistry. 2020 Sep 8;59(35):3225-3234. doi: 10.1021/acs.biochem.0c00369.

 

Prerequisites:

1. EnsembleGen has no operating system requirement. However, you will need a Python interpreter to execute the program. Python 2.7 is recommended.

2. Please download and install relax-4.0.3 for your platform from this website: https://www.nmr-relax.com/download.html#series_4.0.


Usage:
                             *** EnsembleGen ***

EnsembleGen [-c/--config <config file>]
            [-o/--output <output file>]
            [-n/--nproc <num cpu>]
            [-r/--restart <restart file>]
            [-h/--help]

    e.g.,  EnsembleGen -c config
           EnsembleGen -c config -o output -n 8 >> somefile
           EnsembleGen -c config -r restartfile

Description:

This script aims to generate a reasonable structural ensemble given residual dipolar coupling (RDC) restraints and a pool of conformers (e.g., generated by MD simulations). There are many ways to build a structural ensemble from MD simulation snapshots (e.g., random selection, clustering analysis). However, combining the snapshots with NMR restraints gives a more accurate ensemble, especially when a fixed number of conformers (or states) must represent the dynamics over a microsecond timescale.

EnsembleGen uses the relax package (www.nmr-relax.com) to calculate the N-state RDC values and compare them with the experimental values. To select the best N-state model from a large pool of conformers, a clustering simulated annealing algorithm is used. This algorithm parallelizes the N-state model selection and has demonstrated a significant improvement in RMSD convergence.
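
The selection idea described above can be illustrated with a minimal, single-process sketch. It is not the EnsembleGen implementation: the scoring function below is a stand-in that averages hypothetical per-conformer RDC predictions, whereas EnsembleGen delegates the N-state RDC calculation to relax. The function names, the geometric cooling schedule, and the toy data are all assumptions made for illustration.

```python
import math
import random

def ensemble_rmsd(selection, predicted, experimental):
    """RMSD between the ensemble-averaged predictions and the experimental RDCs.

    Stand-in score (assumption): a plain average over the selected conformers.
    """
    n = len(selection)
    total = 0.0
    for k, target in enumerate(experimental):
        mean = sum(predicted[i][k] for i in selection) / n
        total += (mean - target) ** 2
    return math.sqrt(total / len(experimental))

def anneal(predicted, experimental, n_state, n_steps, t_hot, t_cold,
           n_mutation=1, seed=0):
    """Pick n_state conformers from the pool by simulated annealing."""
    rng = random.Random(seed)
    pool = list(range(len(predicted)))
    current = rng.sample(pool, n_state)
    cur_score = ensemble_rmsd(current, predicted, experimental)
    best, best_score = list(current), cur_score
    for step in range(n_steps):
        # Geometric cooling from t_hot to t_cold (an assumed schedule).
        frac = step / float(max(n_steps - 1, 1))
        temp = t_hot * (t_cold / t_hot) ** frac
        # Mutation: swap n_mutation members for conformers outside the set.
        trial = list(current)
        for _ in range(n_mutation):
            outside = [i for i in pool if i not in trial]
            trial[rng.randrange(n_state)] = rng.choice(outside)
        trial_score = ensemble_rmsd(trial, predicted, experimental)
        delta = trial_score - cur_score
        # Metropolis criterion: always accept improvements, sometimes accept
        # worse trials, with a probability that shrinks as temp drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_score = trial, trial_score
            if cur_score < best_score:
                best, best_score = list(current), cur_score
    return sorted(best), best_score

# Toy data: 30 conformers with 5 RDCs each; the "experimental" values are
# the average of three hidden conformers.
data_rng = random.Random(1)
predicted = [[data_rng.uniform(-20, 20) for _ in range(5)] for _ in range(30)]
experimental = [sum(predicted[i][k] for i in (2, 7, 11)) / 3.0 for k in range(5)]
best, best_score = anneal(predicted, experimental, n_state=3, n_steps=200,
                          t_hot=5.0, t_cold=1e-3)
```

The arguments n_state, n_mutation, and n_steps loosely mirror the n_state, n_mutation, and n_steps_ttl entries of the config file; how EnsembleGen actually uses T_start, T_max, and T_end is not specified here, so the sketch takes only a hot and a cold temperature.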


Example Input Files:

[config file]
// note: The current implementation cannot handle paths that contain delimiters such as spaces, tabs, or commas. Please avoid them.
// the n_steps_rst option is useful when combined with multiprocessing
// n_steps_rst must be smaller than n_steps_ttl
e.g.,
  name            test            // project name
  pdb_dir         selex_test      // folder containing the pdb files
  rdc_file        rdc_test.txt    // rdc file
  verbosity       0               // verbosity of the stdout
  n_state         20              // number of models to select
  n_mutation      1               // number of mutations per simulated annealing step
  n_steps_ttl     100             // total number of steps
  n_steps_rst     10              // restart from the best solution every n_steps_rst steps
  T_start         1e-3            // starting temperature
  T_max           500             // maximum temperature
  T_end           1e-3            // ending temperature
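
The layout of the config file (a key, a value, and an optional // comment per line) can be read with a few lines of Python. The sketch below is only an illustration of the format, not the parser that ships with EnsembleGen; the names read_config and check_steps are hypothetical.

```python
EXAMPLE_CONFIG = """\
// n_steps_rst must be smaller than n_steps_ttl
e.g.,
  name            test            // project name
  pdb_dir         selex_test      // pdb folder
  n_state         20              // models to select
  n_steps_ttl     100             // total steps
  n_steps_rst     10              // restart interval
  T_start         1e-3            // starting temperature
"""

def read_config(text):
    """Return a dict of key -> value strings, ignoring // comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop the trailing comment
        parts = line.split()
        if len(parts) != 2:
            continue  # blank lines, comment-only lines, labels such as "e.g.,"
        key, value = parts
        settings[key] = value
    return settings

def check_steps(settings):
    """Enforce the documented rule n_steps_rst < n_steps_ttl."""
    if int(settings["n_steps_rst"]) >= int(settings["n_steps_ttl"]):
        raise ValueError("n_steps_rst must be smaller than n_steps_ttl")

settings = read_config(EXAMPLE_CONFIG)
check_steps(settings)
```

Because values are split on whitespace, a path that contains a space would yield more than two fields and be skipped, which is consistent with the note above about avoiding delimiters in paths.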

[rdc file]
// column 1 is the atom name of spin 1 (":" precedes the residue ID)
// column 2 is the atom name of spin 2 ("@" precedes the atom name)
// column 3 is the RDC value in Hz
// please refer to the relax manual for more on the atom naming syntax

e.g., an rdc file containing two rdc values
  :12@C6   :12@H6   13.7
  :13@C1'  :13@H1'  -8.0
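
To make the three-column layout concrete, here is a minimal sketch that splits each line into residue number, atom name, and RDC value. It assumes only the simple ":residue@atom" form shown above and does not cover the rest of the relax spin ID syntax; the function name parse_rdc is hypothetical and is not part of EnsembleGen or relax.

```python
import re

# Matches the simple form ":<residue number>@<atom name>", e.g. ":13@C1'".
SPIN_ID = re.compile(r"^:(\d+)@(\S+)$")

EXAMPLE_RDC = """\
// column 3 is the RDC value in Hz
  :12@C6   :12@H6   13.7
  :13@C1'  :13@H1'  -8.0
"""

def parse_rdc(text):
    """Return (res1, atom1, res2, atom2, rdc_hz) tuples for well-formed lines."""
    records = []
    for raw in text.splitlines():
        fields = raw.split("//", 1)[0].split()
        if len(fields) != 3:
            continue  # blank, comment-only, or malformed line
        spin1 = SPIN_ID.match(fields[0])
        spin2 = SPIN_ID.match(fields[1])
        if spin1 is None or spin2 is None:
            continue
        try:
            value = float(fields[2])
        except ValueError:
            continue  # third column is not a number
        records.append((int(spin1.group(1)), spin1.group(2),
                        int(spin2.group(1)), spin2.group(2), value))
    return records

records = parse_rdc(EXAMPLE_RDC)
```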


Options:

  -h, --help            show this help message and exit
  -c CONFIGFILE, --config=CONFIGFILE
                        SA_relax config file (default = config)
  -o OUTPUTFILE, --output=OUTPUTFILE
                        output file (default = sa_result)
  -n NPROC, --nproc=NPROC
                        number of processors (default = 1)
  -r RESTARTFILE, --restart=RESTARTFILE
                        restart file (default = __SA.RESTART__)
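
For readers who want to wrap or script the tool, the option table above can be mirrored with Python's standard optparse module. This is a sketch reconstructed from the help text only; the dest names and the build_parser function are assumptions, and the actual EnsembleGen source may define its options differently.

```python
from optparse import OptionParser

def build_parser():
    """Recreate the documented options and their defaults (illustrative only)."""
    parser = OptionParser(usage="EnsembleGen [options]")
    parser.add_option("-c", "--config", dest="configfile", default="config",
                      help="SA_relax config file (default = config)")
    parser.add_option("-o", "--output", dest="outputfile", default="sa_result",
                      help="output file (default = sa_result)")
    parser.add_option("-n", "--nproc", dest="nproc", type="int", default=1,
                      help="number of processors (default = 1)")
    parser.add_option("-r", "--restart", dest="restartfile",
                      default="__SA.RESTART__",
                      help="restart file (default = __SA.RESTART__)")
    return parser

# Equivalent to the command line: EnsembleGen -c config -n 8
options, args = build_parser().parse_args(["-c", "config", "-n", "8"])
```

Options that are not given on the command line fall back to the defaults listed in the table, so options.outputfile is "sa_result" and options.restartfile is "__SA.RESTART__" in this example.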