Refs:
- Conda setup, original GitHub repository: https://github.com/kalininalab/alphafold_non_docker
- Official GitHub repository: https://github.com/google-deepmind/alphafold
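(I) Downloading the databases required to run AlphaFold
This step is independent of the environment setup, so the databases can be downloaded before or after it. It is described first so that you can decide whether to continue with the conda setup based on how well the database download goes.
Note: the official documentation recommends not downloading the databases into a subdirectory of the alphafold directory.
1. Required databases
Figure 1 shows the downloaded data as listed by the official repository. In total, 9 files need to be downloaded (counted before extraction).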
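Once the download has finished, the layout can be checked with a small script. The following is a minimal sketch, not part of either repository: check_af_databases is a hypothetical helper, and the list of subdirectory names is an assumption based on the full_dbs layout described in the official README for v2.3.x, so adjust it if your release differs.

```shell
# Hypothetical helper: report which expected database subdirectories exist.
# The directory names are an assumption (full_dbs layout, AlphaFold v2.3.x).
check_af_databases() {
  data_dir="$1"
  missing=0
  for name in bfd mgnify params pdb70 pdb_mmcif pdb_seqres uniprot uniref30 uniref90; do
    if [ -d "$data_dir/$name" ]; then
      echo "ok $name"
    else
      echo "MISSING $name"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when nothing is missing.
  [ "$missing" -eq 0 ]
}

# Usage (path taken from the run example later in this article):
#   check_af_databases /data/afdb
```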
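2. Three ways to download
2.1 Official AlphaFold repository
Run scripts/download_all_data.sh. This is very demanding on the network connection and is very unlikely to succeed in a single attempt.
2.2 The script provided in Ref. 1
The author had already finished downloading the databases before finding this script, so it is untested here; feel free to try it yourself.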
https://github.com/kalininalab/alphafold_non_docker/blob/main/download_db.sh
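2.3 Copy each database URL and download locally
- Method 1: Open the individual .sh files shown in Figure 2 to find the download URLs, copy them to a local machine, download the files there, and transfer them back to the Linux server. Then extract the downloaded .gz and .tar files one by one and arrange them into the file tree shown in Figure 1.
- Method 2: This blogger has shared a Xunlei cloud drive link from which the databases can be downloaded: https://www.bilibili.com/read/cv26467969/
Note: pdb_mmcif can be downloaded directly with the .sh file shown in Figure 2, since it can resume interrupted transfers, but it requires tools such as wget, rsync, gunzip, and tar.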
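For Method 1, the download and extraction steps might be scripted roughly as follows. This is a sketch under assumptions: fetch_resumable and unpack_archive are hypothetical helper names, and the URL and paths passed to them are whatever you copied from the .sh files. The only tool options relied on are wget -c (continue a partial download), tar -x, and gunzip -c.

```shell
# Hypothetical helper: download a URL into a directory, resuming a partial file.
fetch_resumable() {
  url="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir"
  wget -c -P "$dest_dir" "$url"
}

# Hypothetical helper: unpack an archive into a directory based on its extension.
unpack_archive() {
  archive="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir"
  case "$archive" in
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest_dir" ;;
    *.tar)          tar -xf "$archive" -C "$dest_dir" ;;
    *.gz)           gunzip -c "$archive" > "$dest_dir/$(basename "$archive" .gz)" ;;
    *)              echo "unknown archive type: $archive" >&2; return 1 ;;
  esac
}

# Usage, with placeholder arguments:
#   fetch_resumable "<url copied from the .sh file>" /data/afdb/downloads
#   unpack_archive /data/afdb/downloads/<file>.tar.gz /data/afdb/<database_name>
```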
(II) Conda environment setup
1. Create the virtual environment
conda create --name alphafold python==3.8
conda activate alphafold
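2. Install dependencies
2.1 Packages that rarely cause errors
Set the cudatoolkit version to match your own CUDA installation. AlphaFold currently supports OpenMM only up to 7.5.1; newer versions raise errors unless certain files are modified. Installing tensorflow-cpu is not a concern for GPU use.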
conda install -y -c conda-forge openmm==7.5.1 cudatoolkit==11.3 pdbfixer
conda install -y -c bioconda hmmer hhsuite==3.3.0 kalign2
pip install absl-py==1.0.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.9 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.3.25 ml-collections==0.1.0 numpy==1.21.6 pandas==1.3.4 protobuf==3.20.1 scipy==1.7.0 tensorflow-cpu==2.9.0
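2.2 jax and jaxlib, which are error-prone
- Installation often fails with various errors, for example: No matching distribution found for jaxlib
- The jaxlib build must match your CUDA version; otherwise running AlphaFold later still fails with: Unable to initialize backend 'cuda': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
If you hit these errors, look up the matching build in the JAX release index and install it from there: https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
Author's setup: cuda113 and cudnn804 (CUDA 11.3, cuDNN 8.0.4); the installed wheel was: jaxlib-0.3.25+cuda11.cudnn805-cp38-cp38-manylinux2014_x86_64.whl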
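One way to install a matching wheel is to give pip the release index above as a find-links source. The sketch below assumes the version tags from the author's wheel name; jaxlib_requirement and install_jaxlib are hypothetical helper names, and the tags need to be changed to whatever matches your own CUDA and cuDNN installation.

```shell
# Release index from the article, used as a pip find-links source.
JAX_WHEELS=https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

# Hypothetical helper: build the pip requirement string for a CUDA jaxlib wheel.
# $1 = jaxlib version, $2 = CUDA major version, $3 = cuDNN tag
jaxlib_requirement() {
  echo "jaxlib==$1+cuda$2.cudnn$3"
}

# Hypothetical helper: install that wheel from the release index.
install_jaxlib() {
  pip install "$(jaxlib_requirement "$1" "$2" "$3")" -f "$JAX_WHEELS"
}

# Usage, matching the author's wheel:
#   install_jaxlib 0.3.25 11 805
# Afterwards, check which devices JAX can see:
#   python -c "import jax; print(jax.devices())"
```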
3. Download the AlphaFold2 code, v2.3.1
3.1 Download the repository
Any download method works; the following is one option.
wget https://github.com/deepmind/alphafold/archive/refs/tags/v2.3.1.tar.gz && tar -xzf v2.3.1.tar.gz && export alphafold_path="$(pwd)/alphafold-2.3.1"
3.2 Download the stereochemical properties file into the common folder
After downloading the whole alphafold repository, download stereo_chemical_props.txt into the /alphafold-2.3.1/alphafold/common folder.
wget -q -P $alphafold_path/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
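Because wget -q prints nothing, a failed download is easy to miss. A quick check along the following lines can confirm the file arrived; require_nonempty_file is a hypothetical helper, and $alphafold_path is assumed to still be set from step 3.1.

```shell
# Hypothetical helper: succeed only if the given file exists and is not empty.
require_nonempty_file() {
  if [ -s "$1" ]; then
    echo "found: $1"
  else
    echo "missing or empty: $1" >&2
    return 1
  fi
}

# Usage:
#   require_nonempty_file "$alphafold_path/alphafold/common/stereo_chemical_props.txt"
```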
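4. Test whether the environment is set up correctly
Run run_alphafold_test.py. If the output matches the figure, the setup succeeded. If it fails, install whatever additional dependencies the error messages point to.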
python /alphafold-2.3.1/run_alphafold_test.py
(III) Running AlphaFold2
1. Download run_alphafold.sh from Ref. 1, fill in the parameters, and run it
Usage: run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir> Path to directory of supporting data
-o <output_dir> Path to a directory that will store the results.
-f <fasta_paths> Path to FASTA files containing sequences. If a FASTA file contains multiple sequences, then it will be folded as a multimer. To fold more sequences one after another, write the files separated by a comma
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: true)
-r <run_relax> Whether to run the final relaxation step on the predicted models. Turning relax off might result in predictions with distracting stereochemical violations but might help in case you are having issues with the relaxation stage (default: true)
-e <enable_gpu_relax> Run relax on GPU if GPU is enabled (default: true)
-n <openmm_threads> OpenMM threads (default: all available cores)
-a <gpu_devices> Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-m <model_preset> Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model (default: 'monomer')
-c <db_preset> Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: 'full_dbs')
-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed (default: 'false')
-l <num_multimer_predictions_per_model> How many predictions (each with a different random seed) will be generated per model. E.g. if this is 2 and there are 5 models then there will be 10 predictions per input. Note: this FLAG only applies if model_preset=multimer (default: 5)
-b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'false')
Example runs: -d, -o, -f, and -t are required parameters
# Example run (Uses the GPU with index id 0 as default)
bash run_alphafold.sh -d /data/afdb -o /alphafold-2.3.1/output -f /alphafold-2.3.1/input/input.fasta -t 2023-12-26
# OR for CPU only run
bash run_alphafold.sh -d /data/afdb -o /alphafold-2.3.1/output -f /alphafold-2.3.1/input/input.fasta -t 2023-12-26 -g False
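The usage text says that several FASTA files can be passed to -f as a comma-separated list. A sketch of building that list from a directory follows; join_fasta_paths is a hypothetical helper, and it assumes the input files end in .fasta and have no commas in their names.

```shell
# Hypothetical helper: join all *.fasta files in a directory with commas.
join_fasta_paths() {
  dir="$1"
  joined=""
  for f in "$dir"/*.fasta; do
    # Skip the unexpanded pattern when the directory has no .fasta files.
    [ -e "$f" ] || continue
    if [ -z "$joined" ]; then
      joined="$f"
    else
      joined="$joined,$f"
    fi
  done
  echo "$joined"
}

# Usage, with the paths from the example above:
#   bash run_alphafold.sh -d /data/afdb -o /alphafold-2.3.1/output \
#     -f "$(join_fasta_paths /alphafold-2.3.1/input)" -t 2023-12-26
```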
2. Possible errors
No compatible CUDA device is available
Solution: see https://github.com/google-deepmind/alphafold/issues/403.
# Set the GPU compute mode (0 = Default)
nvidia-smi -c 0
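Before changing anything, it may help to look at the current compute mode of each GPU. The sketch below uses the compute_mode field of nvidia-smi's --query-gpu option; compute_mode_hint is a hypothetical helper, and the mode strings it matches (such as Default) are an assumption about how nvidia-smi prints the value.

```shell
# Print "index, compute_mode" for every GPU.
show_gpu_compute_modes() {
  nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader
}

# Hypothetical helper: say whether a reported mode needs resetting.
# The exact mode strings are an assumption about nvidia-smi's output.
compute_mode_hint() {
  case "$1" in
    Default) echo "ok" ;;
    *)       echo "reset with: nvidia-smi -c 0" ;;
  esac
}

# Usage:
#   show_gpu_compute_modes
#   compute_mode_hint Default
```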
3. Prediction time
Hardware: CPU (256 GB); 1 GPU (3090, 24 GB)
716 residues, five models, full databases: about 2 hours
(Other programs were using the server at the same time.)
4. Results
The generated PDB files are all in the output/protein_name folder.
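To pick out the final structures from that folder, something like the following may be enough. It assumes the ranked_N.pdb naming described in the official README, where ranked_0.pdb is the highest-confidence prediction; list_ranked_predictions is a hypothetical helper.

```shell
# Hypothetical helper: list the ranked_*.pdb files in a result folder.
# The ranked_N.pdb naming is an assumption taken from the official README.
list_ranked_predictions() {
  result_dir="$1"
  for f in "$result_dir"/ranked_*.pdb; do
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
}

# Usage (protein_name is the folder named after the input FASTA file):
#   list_ranked_predictions /alphafold-2.3.1/output/protein_name
```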