The basic idea of this handbook (how to use me)

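In the "Getting Started with Rosetta" part, I record the biological problems Rosetta can solve and the sampling strategies it uses. The main questions are: How do you turn your problem into a biological problem Rosetta can understand? Which biological problems can Rosetta solve? How do you analyze the results Rosetta returns? What biophysics do I need to know in order to use Rosetta?

The next four parts cover, in turn, Rosetta modeling, docking, protein engineering and design, and the use of special sampling models. Each gives the brief principles and the methods (the commands used). You can jump straight to a given part and run your task with the annotated commands.

In the last part, I will record use cases from the Rosetta community and recent development progress.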

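Getting Started with Rosetta (basics)

Rosetta's basic architecture and the official documentation:

Official Rosetta documentation: start here to see how the Rosetta developers introduce their work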

https://www.rosettacommons.org/docs/latest/getting_started/Getting-Started

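Put simply, the Rosetta we download is a folder. It contains the biophysical database Rosetta needs at run time, the programs for all Rosetta methods, and some other configuration files. To use Rosetta, work through the following steps in order:

Define the biological problem you want to solve, and learn the corresponding Rosetta Protocol, or even write your own Protocol;

Prepare the input files to operate on, for example a PDB file of the protein structure and the relevant parameter lists;

Read the documentation of that Protocol to understand its principle, parameters, and command format;

Run your task, wait for the results, then read the guide to analyzing results in the official Rosetta documentation;

Some Protocols provide detailed results, but most require you to analyze the results yourself; much of the time you need to cluster the results with a suitable choice of cluster centers and distance;

If your results need further refinement, go back to the first step and start another round.

So the first thing to do is learn which biological problems Rosetta can solve. As the figure shows, Rosetta can do almost everything in the figure below (protein computational biology). However, these methods (Protocols) are stored independently in Rosetta's method library, so you need a complete and clear framework for your project before you use them.

By comparison, if Rosetta is the most recommendable toolkit in protein design, then Schrödinger is the most recommendable toolkit in drug design. The former has openly available source code and is free for academic users but is hard to get started with; the latter is easy to pick up but paid. These two packages, or any other bioinformatics software you are good at, can complement one another, as long as you are fluent in the file formats that bridge them (such as PDB, SDF, Mol2). This again requires the experimenter to build a framework first and then use the best tool in each area, or write a tool that fits the need, rather than simply following a single toolkit.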

Below is an introduction to all the toolkits Rosetta provides:

Biological problems Rosetta can solve:

Basic Rosetta components:

Protein Structure Prediction

De Novo Modeling

Ab initio Modeling Tutorial

Tutorial on protein folding using the Broker

Ab initio

Comparative Modeling (Homology Modeling)

Comparative Modeling Tutorial

Comparative Modeling (potentially out of date)

Comparative Modeling via RosettaScripts (uses RosettaScripts)

Specialized Protocols

Symmetric folding and docking of homooligomeric proteins.

Homology modeling of antibody variable fragments.

Ab initio modeling of membrane proteins.

Protein–Protein Docking

Docking preparation:

Docking:

Overall strategy for protein-protein docking:

Introductory tutorial on protein-protein docking

Homo-oligomers:

Symmetric Docking

symmetry file

Protein-peptide docking:

protein–peptide docking

Protein–Ligand Docking:

RosettaLigand

RosettaLigand via RosettaScripts

Docking Approach using Ray Casting (DARC)

Protein Design:

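Generating custom Rosetta scripts: Rosetta scripting interfaces

About residue files (a protein must be prepared before it can be used for protein design): resfiles

Protein redesign:

Side-chain optimization: fixed backbone design

Side-chain optimization to improve hydrophobic surfaces: fixed backbone design can be run with consideration of hydrophobic surface patches

Other applications:

scanning for stabilizing point mutations,

specificity prediction and library design with sequence tolerance

multistate design of different functions in different contexts

RosettaRemodel is a generalized framework for flexible backbone design

Using Rosetta: online servers, local installation, and configuration:

Rosetta can be used in several ways, including online servers and locally installed Rosetta in its Python (PyRosetta) and command-line forms. You can learn about them here: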

https://www.rosettacommons.org/software

https://www.rosettacommons.org/software/ways-to-use

You can register as an academic user and download Rosetta directly here:

https://www.rosettacommons.org/software/license-and-download

After downloading, install it following the recommendations here:

https://www.rosettacommons.org/docs/latest/build_documentation/Build-Documentation

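On Ubuntu, if you downloaded the precompiled package (about 13 GB), you can use it directly. If you downloaded the source package (about 3 GB), unpack it, change into the Rosetta/main/source directory, and run: ./scons.py -j2 mode=release bin (here -j2 means the build uses two cores; mode=release, as given in the official build documentation, produces the executables ending in "release" that are used later in this handbook). Afterwards, remember to add the path of your Rosetta folder to your bashrc; this will save a lot of time later.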
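Built executables carry a platform, compiler, and mode suffix (the example later in this handbook calls pmut_scan_parallel.linuxgccrelease), so it can help to look a binary up rather than hard-code the suffix. Below is a minimal Python sketch, assuming the Rosetta/main/source/bin layout described above. The function name find_rosetta_binary is hypothetical and is not part of Rosetta.

```python
from pathlib import Path


def find_rosetta_binary(rosetta_root, app_name):
    """Return built executables for app_name under <rosetta_root>/main/source/bin.

    Assumption: binaries are named <app_name>.<suffix>, for example
    pmut_scan_parallel.linuxgccrelease, as in the command used in this handbook.
    """
    bin_dir = Path(rosetta_root).expanduser() / "main" / "source" / "bin"
    # Sort so the result is deterministic when several builds coexist.
    return sorted(bin_dir.glob(f"{app_name}.*"))
```

An empty list means the application has not been built (or the root path is wrong), which is a quick check to make before starting a long task.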

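Rosetta introductory example: how do you scan a protein for stabilizing mutations?

Rosetta can be used as a black box, but, as its official documentation explains, treating Rosetta purely as a black box would do great harm to your research. The purpose of this handbook may seem to run counter to that, so it must be stressed: this handbook can serve as an introductory reference for using Rosetta and as a starting point into the field of protein design, but it should never be the end point of your use of Rosetta.

The general approach to analyzing a biophysical problem with Rosetta is roughly as shown in the figure below:

First, you need a structure, and the more reliable it is the better. You can of course use Rosetta to generate this structure from an electron density map, protein sequence information, and so on;

You need to choose the Rosetta method that helps solve your problem, and the analysis method suited to that Rosetta method (in general, users must analyze the results themselves);

Computing resources: you need ample computing resources. Rosetta was written with supercomputers in mind; it can of course run at other scales, but ample computing resources get you results faster.

In this example, we will use a simple Rosetta method for the introduction.

For a single Rosetta task, you need to provide:

Input files, usually a protein structure;

A command, which includes the location of the Protocol being called, its parameters, and some other content;

Just two things; it is that simple.

So, for a pdb file saved in our home directory, using Rosetta's Protocol for scanning stability-improving mutations, the whole analysis should follow this workflow:

1: Prepare the structure file to use, preferably in a dedicated directory;

So we put this file into a folder called Rosetta_task in the home directory. This folder is the standard working folder used throughout this handbook: assume it is emptied before each task and then filled with the files we need, and that results are written to it as well. Input files are named in_n and result files out_n, where n is a sequence number. Our protein file path is therefore: ~/Rosetta_task/in_1.pdb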
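The naming convention above can be captured in a few lines so that every task derives its paths the same way. This is only an illustrative sketch; the names TASK_DIR and task_paths are hypothetical.

```python
from pathlib import Path

# The handbook's standard working folder: ~/Rosetta_task
TASK_DIR = Path.home() / "Rosetta_task"


def task_paths(n, task_dir=TASK_DIR):
    """Return (input_pdb, output_file) for task number n: in_n.pdb and out_n."""
    task_dir = Path(task_dir)
    return task_dir / f"in_{n}.pdb", task_dir / f"out_{n}"
```

For example, task_paths(1) gives the in_1.pdb and out_1 paths inside ~/Rosetta_task used in the rest of this example.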

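2: Open the documentation of the corresponding Protocol and read the relevant information, especially the Protocol's intended use, the command parameters, examples, and the underlying principle;

Open the documentation: https://www.rosettacommons.org/docs/latest/application_documentation/design/pmut-scan-parallel

Key information learned:

Input file options: -s gives the list of structures to process, in pdb format; -l gives a text file containing the file names of all structures to process, one per line;

-database specifies path/to/rosetta/main/database;

Task settings:

-double_mutant_scan          scan double mutants

-mutants_list <file>         only make the specific mutations listed in the file

-output_mutant_structures    output the mutated structures

-DDG_cutoff                  only output mutations whose ddG is better than this cutoff

(other information omitted)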
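When several structures are to be scanned, the -l option described above expects a text file with one structure file name per line. The following Python sketch writes such a file; the function name write_structure_list and the choice to collect every .pdb file in one folder are assumptions made for illustration.

```python
from pathlib import Path


def write_structure_list(pdb_dir, list_path):
    """Write the path of every .pdb file in pdb_dir to list_path, one per line."""
    pdb_files = sorted(Path(pdb_dir).glob("*.pdb"))
    Path(list_path).write_text("".join(f"{p}\n" for p in pdb_files))
    return pdb_files
```

The resulting file would then be passed with -l in place of the -s option.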

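3: Settle on the command to use, record it, and run it in a terminal.

The command we use is (run from the Rosetta directory):

./main/source/bin/pmut_scan_parallel.linuxgccrelease -database ./main/database -s ~/Rosetta_task/in_1.pdb -ex1 -ex2 -extrachi_cutoff 1 -use_input_sc -ignore_unrecognized_res -no_his_his_pairE -multi_cool_annealer 10 -mute basic core > ~/Rosetta_task/out_1

The path to the method, which is the Protocol's executable file;

The location of the Rosetta database;

The location of the protein structure file;

The biophysics-related parameters of the task;

The results are redirected to the given path, creating a text record;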
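The parts of the command listed above can also be assembled programmatically, which makes it easier to reuse the same flags across tasks. This is a minimal Python sketch, not an official interface: build_pmut_scan_command is a hypothetical helper, the flags are copied from the command in this handbook, and the database is assumed to live at main/database under the Rosetta folder, following the -database description in the Protocol documentation.

```python
import shlex
from pathlib import Path


def build_pmut_scan_command(rosetta_root, input_pdb):
    """Assemble the pmut_scan_parallel argument list used in this handbook."""
    root = Path(rosetta_root)
    return [
        # 1. path to the Protocol's executable
        str(root / "main/source/bin/pmut_scan_parallel.linuxgccrelease"),
        # 2. location of the Rosetta database (assumed: <root>/main/database)
        "-database", str(root / "main/database"),
        # 3. location of the protein structure file
        "-s", str(input_pdb),
        # 4. task parameters, copied from the handbook's command
        "-ex1", "-ex2", "-extrachi_cutoff", "1",
        "-use_input_sc", "-ignore_unrecognized_res",
        "-no_his_his_pairE", "-multi_cool_annealer", "10",
        "-mute", "basic", "core",
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own.
    cmd = build_pmut_scan_command("/path/to/Rosetta", "in_1.pdb")
    print(shlex.join(cmd))
    # To run it and capture standard output in out_1 (step 5 of the list above):
    # import subprocess
    # with open("out_1", "w") as out:
    #     subprocess.run(cmd, stdout=out, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.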

4: Inspect the result file and analyze it

The output of this task looks like this:

mutation mutation_PDB_numbering average_ddG average_total_energy

A-P55S A-P64S -64.671 305.99

A-P55R A-P64R -63.541 307.12

A-P55H A-P64H -63.462 307.19

A-P55T A-P64T -63.232 307.42

A-P55K A-P64K -62.909 307.75

A-P55N A-P64N -62.753 307.9

A-P55Q A-P64Q -62.628 308.03

……

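Clearly we need to sort the results and pick the better ones. We can open this text file in Excel, set the space character as the column delimiter, and then sort by the average_ddG column.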
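The same sorting can be done without Excel. The Python sketch below assumes the four-column, whitespace-separated format shown above and ranks the most negative average_ddG first, on the assumption that a more negative ddG means a more favorable mutation. The function names are hypothetical, and the sample rows are taken from the output listed above.

```python
def parse_pmut_scan(text):
    """Parse whitespace-separated result rows into dicts, skipping other lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # blank lines, truncation marks, anything not a data row
        try:
            ddg, total = float(fields[2]), float(fields[3])
        except ValueError:
            continue  # the header row has text in the numeric columns
        rows.append({
            "mutation": fields[0],
            "pdb_numbering": fields[1],
            "average_ddG": ddg,
            "average_total_energy": total,
        })
    return rows


def rank_by_ddg(rows):
    """Sort ascending by average_ddG, so the most negative value comes first."""
    return sorted(rows, key=lambda r: r["average_ddG"])


sample = """mutation mutation_PDB_numbering average_ddG average_total_energy
A-P55K A-P64K -62.909 307.75
A-P55S A-P64S -64.671 305.99
A-P55R A-P64R -63.541 307.12
"""

for row in rank_by_ddg(parse_pmut_scan(sample)):
    print(row["mutation"], row["average_ddG"])
```

To use it on a real result file, read the file's text (for example ~/Rosetta_task/out_1) and pass it to parse_pmut_scan in place of sample.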

You can also consult the following to learn how to take Rosetta further:

https://www.rosettacommons.org/docs/latest/getting_started/Analyzing-Results

https://www.rosettacommons.org/docs/latest/getting_started/Determining-what-a-problem-is

https://www.rosettacommons.org/docs/latest/getting_started/Resources-for-learning-biophysics-and-computational-modeling

https://www.rosettacommons.org/docs/latest/rosetta_basics/Incorporating-Experimental-Data

Rosetta modeling:

Rosetta docking:

Rosetta protein engineering and design:

Rosetta special sampling models:

Rosetta community highlights:

Appendix 1: SampleX components

Structure determination via fragment substitution

AbscriptLoopCloserCM

handles loop closure in ab initio relax circumstances

AbscriptMover

Structure generation

BuildSheet

PerturbBundleHelix

MakeBundleHelix

BackboneSampler

FitSimpleHelix

InsertPoseIntoPoseMover pose combination

build_Ala_pose

SetupForSymmetry Necessary before doing anything else symmetrically

AddHydrogens adds and optimizes missing hydrogens

SymmetricAddMembraneMover

GrowPeptides

From electron density:

IdealizeHelices

RecomputeDensityMap

CartesianSampler

Residue Insertion and Deletion

InsertPoseIntoPoseMover

Structure optimization

IdealizeMover

Replace every residue with a version with bond lengths and angles from the database. Add constraints to maintain original hydrogen bonds. Then, minimize every side-chain and backbone dihedral (except proline phi) using dfpmin.

FinalMinimizer

SaneMinMover

TaskAwareMinMover

Symmetrizer

Functionally an optimization mover; will take a pose with sufficiently small deviations from symmetry and resolve them.

TaskAwareSymMinMover

SymMinMover

minimize with symmetry

LocalRelax

FastRelax

Repeatedly repack sidechains and minimize sidechains and backbone while ramping the repulsive weight up and down. Respects resfiles, movemaps, and task operations.

RepackMinimize Like a single cycle of relax, with a constant repulsive weight.

NormalModeMinimizer

Ensemble generation

FastRelax

Backrub

ParallelTempering

CanonicalSampling

BBGaussian

GenericSimulatedAnnealer

GeneralizedKIC

Backbone degrees of freedom

Backrub

BackrubDD

BackrubSidechain

ShortBackrubMover

A particular form of backbone movement intended to coordinate with maintaining particular side chain positions.

Small Make small perturbations to a backbone degree of freedom

Shear Make small perturbations to one dihedral of a residue and contravarying perturbations to the other dihedral, to avoid a "lever arm effect"

SetTorsion

Either set a torsion to a value or perturb a torsion by a value (with the perturb flag)

MinimizeBackbone

Just minimize the backbone

RandomOmegaFlipMover Flip a random omega angle; most useful for peptoids

BackboneTorsionPerturbation

BackboneTorsionSampler

BBGaussian

Sidechain degrees of freedom

SetChiMover

SymRotamerTrialsMover

PackRotamersMover

RotamerTrialsMinMover

RepackTrial

BoltzmannRotamerMover

PackRotamersMoverPartGreedyMover

Prepack

SymPackRotamersMover

PerturbRotamerSidechain

DnaInterfacePacker

PerturbChiSidechain

Any conformational degree of freedom

RandomTorsionMover

Perturbs a random torsion selected from a movemap

Loop conformational sampling

AnchoredGraftMover

a composite mover that does a lot of loop modeling followed by repacking to graft in residues

KicMover

LegacyKicSampler

SmallMinCCDTrial

ShearMinCCDTrial

LoopBuilder

LoopCM

LoopCreationMover

LoopFinder

LoopHash

LoopHashDiversifier

LoopHashLoopClosureMover

The LoopHash algorithms constitute a very rapid way to draw on loop conformations from fragment libraries that could achieve a given closure

LoopLengthChange

LoopModeler

LoopMoverFromCommandLine

LoopMover_Perturb_CCD

LoopMover_Perturb_KIC

LoopMover_Perturb_QuickCCD

LoopMover_Perturb_QuickCCD_Moves

LoopMover_Refine_Backrub

LoopMover_Refine_CCD

LoopMover_Refine_KIC

LoopMover_SlidingWindow

LoopProtocol

LoopRefineInnerCycleContainer

LoopRelaxMover

LoopRemodel

LoophashLoopInserter

LoopmodelWrapper

CCDEndsGraftMover

CCDLoopCloser

CCDLoopClosureMover

DefineMovableLoops

GeneralizedKIC

An enormous, intricate system that largely operates on its own to perform kinematic loop closure on an arbitrary sequence of atoms.

Docking

DARC app

Via a ray casting algorithm particularly fast on GPUs

FlexPepDock

Concurrently samples backbone degrees of freedom on the peptide

SymDockProtocol

Symmetric oligomer docking

RigidBodyTransMover

manually manipulate the relative position of two bodies across a jump

RigidBodyPerturbNoCenter

UnbiasedRigidBodyPerturbNoCenter

UniformRigidBodyCM

Docking

DockingInitialPerturbation

DockingProtocol

DnaInterfaceMinMover

SymFoldandDockRbTrialMover

HighResDocker

DockSetupMover

DockWithHotspotMover

Chemical connectivity

ForceDisulfides

Given a list of residue pairs (for example, disulfides), repack residue shells around them but do not change the CYS-type residues themselves.

DisulfideInsertion

Mutates two residue positions to CYS:disulfide, link them conformationally, and add constraints to have good disulfide distance, angle, and dihedral to the pose. Intended for adding a disulfide to short potentially macrocyclic peptides.

DisulfideMover

Given two residue positions, mutate both to CYS:disulfide and link them conformationally; do no repacking or minimization

Disulfidize

Tries every possible pair of residues in a pose to try to introduce one or more new disulfides as long as they score well

Design

FastDesign

Controlling amino acid composition during design

CoupledMover

FastRelax mover that does design during repacking

RemodelMover

Extremely diverse function: can do design, repacking, complete backbone remodeling, disulfide construction, and so forth

enzyme design