Quick Start
quick_start.Rmd
This document demonstrates how to use the gluedocking package.
Note: Ensure dependency packages are installed:
install.packages(c("httr", "bio3d"))
library(gluedocking)
Setup
Before using gluedocking, you need to configure the environment by specifying the paths to the required external tools:
library(gluedocking)
prepare_for_gluedocking(
python_path = "/your_path/autodock/python.exe",
prepare_receptor_script = "/your_path/autodock/Lib/site-packages/AutoDockTools/Utilities24/prepare_receptor4.py",
prepare_ligand_script = "/your_path/autodock/Lib/site-packages/AutoDockTools/Utilities24/prepare_ligand4.py",
prepare_split_alt_script = "/your_path/autodock/Lib/site-packages/AutoDockTools/Utilities24/prepare_pdb_split_alt_confs.py",
obabel_path = "/your_path/obabel.exe",
vina_path = "/your_path/anaconda/Library/bin/qvinaw.exe"
)
This will store the paths in your R environment file, so you only need to run this once.
Usage Examples
1. Download Receptor Structures
First, we need to obtain the receptor (protein) structures. The
download_receptor
function retrieves PDB files from the
RCSB Protein Data Bank.
# Download PDB files from RCSB PDB database
pdb_files <- download_receptor(c("1iep", "4hg7"), output_dir = "receptors")
This step downloads the 3D structures of proteins that will serve as receptors (targets) for our docking experiments. The PDB files contain atomic coordinates of the protein structures determined by experimental methods like X-ray crystallography or NMR.
2. Download Ligand Structures
Next, we need small molecules (ligands) to dock into our receptors.
The download_ligand
function retrieves chemical structures
from PubChem.
# Download ligand structures from PubChem
ligand_files <- download_ligand(c("2244", "5090"), output_dir = "ligands")
This step obtains the 3D structures of small molecules that will be docked into the binding sites of our receptor proteins. These compounds are potential drug candidates or known drugs that we want to study for their binding interactions.
3. Prepare Receptor Files
Raw PDB files often contain elements that need to be removed or modified before docking. We’ll process them in several steps:
# Step 1: Remove non-protein atoms (water, ligands, etc.)
trimmed_files<-trim_receptor("./receptors/", output_dir = "trimmed")
# Step 2: Split alternate conformations (if present)
pbd_files<-split_alt(
inputs = trimmed_files,
output_dir = "split_alt",
keep_label = "A" # Keep only the "A" conformation
)
# Step 3: Convert PDB files to PDBQT format for docking
receptor_pdbqt <- prepare_receptor(
pbd_files,
output_dir = "prepared_receptors"
)
These steps prepare the receptor structures for docking:
- trim_receptor removes non-protein atoms like water molecules, ions, and co-crystallized ligands
- split_alt handles alternate conformations of amino acid side chains, keeping only the primary (A) conformation
- prepare_receptor converts the cleaned PDB files to PDBQT format, adding partial charges and atom types required by AutoDock Vina
4. Prepare Ligand Files
Ligands also need to be prepared for docking:
# Step 1: Convert SDF files to MOL2 format
converted_files <- convert_molecule(
input_file = "./ligands/",
output_format = "mol2",
output_dir = "./ligands_mol2/"
)
# Step 2: Convert ligand files to PDBQT format for docking
ligand_pdbqt <- prepare_ligand(
converted_files,
output_dir = "prepared_ligands"
)
These steps prepare the ligand structures:
- convert_molecule converts the downloaded structures to MOL2 format using OpenBabel
- prepare_ligand converts the MOL2 files to PDBQT format, adding partial charges, atom types, and setting up rotatable bonds
5. Calculate Docking Box Parameters
Before docking, we need to define the search space (docking box) where the ligand will be positioned:
# Determine appropriate docking box dimensions and center coordinates
box_params <- calculate_box(receptor_pdbqt, padding = 5)
print(box_params)
This step analyzes the receptor structures to determine the optimal dimensions and center coordinates for the docking box. The padding parameter adds extra space around the protein to ensure the entire binding site is included in the search space.
6. Generate Configuration Files
Now we’ll create configuration files for AutoDock Vina:
# Create configuration files for each receptor-ligand pair
config_files <- write_configs(
receptor_paths = receptor_pdbqt,
ligand_paths = ligand_pdbqt,
box_df = box_params,
output_dir = "configs",
exhaustiveness = 8 # Search thoroughness
)
This step generates configuration files that specify all parameters for the docking runs, including:
- Paths to receptor and ligand files
- Docking box center and dimensions
- Search exhaustiveness (higher values give more thorough but slower searches)
7. Run Docking
Now we’re ready to perform the actual docking simulations:
# Run molecular docking using AutoDock Vina
results <- run_vina(
config_paths = "configs",
output_dir = "docked",
logs_dir = "logs",
cpu = 4 # Use 4 CPU cores
)
This step executes AutoDock Vina to perform the docking simulations for all receptor-ligand pairs. The docking algorithm searches for optimal binding poses and scores them based on estimated binding affinity. Using multiple CPU cores can significantly speed up the process.
8. Check and Rerun Failed Docking Jobs
Sometimes docking jobs may fail, so we can check and rerun them:
# Check log files and rerun any failed docking jobs
check_logs(
logs_dir = "logs",
output_dir = "docked",
config_paths = "configs"
)
This step examines the log files to identify any failed docking runs and automatically reruns them. This ensures that all receptor-ligand pairs are successfully processed.
9. Parse Docking Results
Finally, we’ll analyze the docking results:
# Parse all log files and combine results
results_df <- parse_logs(logs_dir = "logs")
# View top binding poses sorted by binding affinity
top_results <- head(results_df[order(results_df$affinity_kcalmol), ])
print(top_results)
# Save results to CSV file
write.csv(results_df, "docking_results.csv", row.names = FALSE)
This step extracts and organizes the docking results from all log files, including:
- Binding affinities (lower values indicate stronger binding)
- Receptor and ligand identifiers
- Ranking of binding poses