AlphaFold on ACCRE

 

Overview

AlphaFold is a deep-learning program for protein structure prediction, developed by DeepMind, that has transformed the field. This guide explains how to run AlphaFold on the ACCRE cluster at Vanderbilt University.

Prerequisites

Before running AlphaFold on ACCRE, you need to have the following:

  • An ACCRE account (See Instructions)
  • Access to CSB GPUs (check your group memberships by running id at the ACCRE command prompt)
  • A working knowledge of Linux commands and ACCRE’s batch job submission system, SLURM
  • Amino acid sequences in FASTA format of the protein(s) you want to predict the structure of
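
Before submitting a job, it can help to confirm that the input file really is in FASTA format. The following is a minimal sketch, not part of the ACCRE or CSB tooling: check_fasta is a hypothetical helper name, and it only checks that the file is non-empty, that its first non-blank line is a header starting with ">", and how many sequence records it contains.

```shell
# Hypothetical helper: basic sanity check of a FASTA file before submitting.
# Prints the number of sequence records; returns non-zero if the file looks wrong.
check_fasta() {
    local fasta="$1"
    if [ ! -s "$fasta" ]; then
        echo "error: $fasta is missing or empty" >&2
        return 1
    fi
    # The first non-blank line of a FASTA file must be a header starting with '>'
    local first
    first=$(grep -m 1 -v '^[[:space:]]*$' "$fasta")
    case "$first" in
        '>'*) ;;
        *) echo "error: $fasta does not start with a '>' header line" >&2
           return 1 ;;
    esac
    # Count header lines, i.e. the number of sequences in the file
    grep -c '^>' "$fasta"
}
```

For example, check_fasta CTD-EF.fasta prints 1 for a file holding a single sequence. The check does not validate the residue letters themselves.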

Instructions

  1. If you do not have an account, you can apply for one at ACCRE
  2. Instructions for requesting a new CSB/Mchaourab account (New Users):
    • Open the New CSB User form in a web browser and log in with username sbrequest and password welcomevsb.
    • Fill out the form completely. Request a password that is NOT the same as your Vanderbilt University e-password. Leave the shell as tcsh.
    • Under the Email section fill in item #1 with your Vanderbilt email address, leave #2 blank.
    • For the associated lab, choose ‘Mchaourab’ in the drop down box. If you have an office address and phone # please put that information into the fields. The home phone # is optional.
    • Click ‘Continue’ at bottom.
    • Check your information and, once it is verified, click ‘Submit’ to send the request.
  3. After logging into ACCRE, copy the script below and edit it for your run.
    • Set the input/output data path in the script. Replace /path/to/your/input/and/output/data with your own input/output data path.
    • Set the path to the input FASTA file. Replace CTD-EF.fasta with the name of your own FASTA file.
    • Replace the values of the CALCDIR, AF2_MINICONDA, and AF2_REPO variables with the paths for your data and for the version of AlphaFold you want to use (this example uses version 2.1, installed under /sb/apps/alphafold211).
  4. Submit the script using SLURM by running the command sbatch <filename>, where <filename> is the name of the script file you created.
  5. While waiting for the job to finish, you can check its status using the command squeue -u <username>. (You can log out of ACCRE while you are waiting for your jobs to finish.)
  6. The output files will be in the input/output data path you set in the script.
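
Step 3 above can also be scripted rather than done by hand in an editor. The following is a rough sketch under stated assumptions: make_job_script is a hypothetical helper, af2_template.slurm and af2_run.slurm are placeholder file names for a saved copy of this guide's script and its customized version, and the paths must not contain the characters "|" or "&", which sed would interpret.

```shell
# Hypothetical helper: write a customized copy of the template job script.
# Usage: make_job_script TEMPLATE CALCDIR FASTA > my_job.slurm
make_job_script() {
    local template="$1" calcdir="$2" fasta="$3"
    # Rewrite only the two variable assignments; '|' is the sed delimiter
    # because the paths contain '/'
    sed -e "s|^CALCDIR=.*|CALCDIR=$calcdir|" \
        -e "s|^FASTA=.*|FASTA=$fasta|" \
        "$template"
}

# Example (commented out; file names and paths are placeholders):
# make_job_script af2_template.slurm /path/to/my/run my_protein.fasta > af2_run.slurm
# sbatch af2_run.slurm
# squeue -u "$USER"
```

Keeping the template untouched and generating one script per run makes it easier to see later which settings produced which output directory.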

To use AlphaFold on the ACCRE cluster, you can use the following SBATCH script:



#!/bin/bash --norc

#SBATCH --mail-user=youremail@vanderbilt.edu
#SBATCH --account=csb_gpu_acc
#SBATCH --partition=turing
#SBATCH --constraint=csbtmp
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --gres=gpu:1
#SBATCH --mem=24G
#SBATCH --time=16:00:00
#SBATCH --job-name=af2-test
#SBATCH --output=af2-test.log

# Set your input/output data path
CALCDIR=/path/to/your/input/and/output/data

# Your input fasta should be in the directory above:
FASTA=CTD-EF.fasta
# Where is the AF2 miniconda environment
AF2_MINICONDA=/sb/apps/alphafold211/miniconda3
# Where is the AF2 Inference data
AF2_DATADIR=/csbtmp/alphafold-data
# Where is the AF2 Git?
AF2_REPO=/sb/apps/alphafold211/alphafold

# Move to the working directory; stop if it does not exist
cd "$CALCDIR" || exit 1

# Look at the driver and GPUs
nvidia-smi

echo "Running on $SLURM_JOB_NODELIST"

# Activate CSB Alphafold2 miniconda environment
source $AF2_MINICONDA/bin/activate af2

python $AF2_REPO/run_alphafold.py \
      --fasta_paths=$FASTA \
      --max_template_date=9999-12-31 \
      --data_dir=$AF2_DATADIR \
      --output_dir=$CALCDIR \
      --uniref90_database_path=$AF2_DATADIR/uniref90/uniref90.fasta \
      --mgnify_database_path=$AF2_DATADIR/mgnify/mgy_clusters.fa \
      --uniclust30_database_path=$AF2_DATADIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
      --bfd_database_path=$AF2_DATADIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
      --pdb70_database_path=$AF2_DATADIR/pdb70/pdb70 \
      --template_mmcif_dir=$AF2_DATADIR/pdb_mmcif/mmcif_files \
      --obsolete_pdbs_path=$AF2_DATADIR/pdb_mmcif/obsolete.dat

This example script runs AlphaFold in the CSB-supported Miniconda af2 environment on the turing partition at ACCRE. It requires that you or a collaborator have an ACCRE account with access to the CSB GPUs. Note: larger proteins may need more than the 24 GB of memory requested in this example, in which case the job will fail.
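
If a job fails, one quick way to check whether memory was the cause is to search the job log. This is a minimal sketch, assuming the failure leaves wording such as "oom-kill" or "out of memory" in the log; the exact message depends on the SLURM version and on whether host or GPU memory ran out. log_shows_oom is a hypothetical helper name, and the 48G value and af2_run.slurm file name in the commented example are arbitrary placeholders.

```shell
# Hypothetical helper: report whether a SLURM log shows an out-of-memory failure.
# Returns 0 if an OOM message is found, non-zero otherwise.
log_shows_oom() {
    local log="$1"
    # Case-insensitive match on common OOM wording
    grep -qiE 'oom-kill|out of memory' "$log"
}

# Example (commented out): resubmit with a larger host-memory request.
# Options given on the sbatch command line override the #SBATCH lines in the script.
# if log_shows_oom af2-test.log; then
#     sbatch --mem=48G af2_run.slurm
# fi
```

Raising --mem only helps when the host ran out of memory. A GPU out-of-memory error is limited by the card itself and is not fixed by a larger --mem request.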

Monitoring the Job

You can monitor the status of your job using the squeue command:


      squeue -u your_username
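
To wait for one specific job instead of re-running squeue by hand, a small polling loop is enough. The following is a sketch only: wait_for_job is a hypothetical helper, the job ID in the commented example is a placeholder for the number sbatch prints at submission, and the 60-second default interval is an arbitrary choice.

```shell
# Hypothetical helper: block until a job no longer appears in the queue.
# Usage: wait_for_job JOBID [SECONDS_BETWEEN_CHECKS]
wait_for_job() {
    local jobid="$1" interval="${2:-60}"
    # squeue -h suppresses the header, so empty output means the job has left the queue
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Example (commented out; 1234567 is a placeholder job ID):
# wait_for_job 1234567 && tail -n 20 af2-test.log
```

The loop only reports that the job has left the queue, not whether it succeeded, so check the log file afterwards.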
      

Conclusion

That’s it! This guide should have given you a good idea of how to run AlphaFold on the ACCRE cluster using SBATCH examples. If you have any questions or run into any issues, feel free to contact the ACCRE support team for assistance.

Documentation

See here for more information on AlphaFold.
AlphaFold2 on GitHub

AlphaFold Changelog

Version 2.3

  • Improved accuracy of protein structure predictions
  • New AlphaFold-Multimer model parameters with improved accuracy on large protein complexes
  • Updated training data and methods

Version 2.2

  • Improved performance and speed of protein structure predictions
  • Updated AlphaFold-Multimer model parameters that produce fewer steric clashes in predicted complexes
  • Updated training data and methods

Version 2.1

  • Improved accuracy and reliability of protein structure predictions
  • Added AlphaFold-Multimer for predicting the structures of protein complexes
  • Updated training data and methods

Version 2.0

  • Initial release of AlphaFold 2
  • Revolutionized protein structure prediction with unprecedented accuracy
  • Based on deep learning neural networks and advanced modeling techniques

 


This document has been developed by the Center for Applied AI in Protein Dynamics.