The first step in running any software is installing it. This can be a difficult and challenging process: downloading
the software and its dependencies, installing multiple packages, and dealing with dependency conflicts.
This is where BioContainers plays its major role. Here is how to download and “install” BLAST on your local machine:
$ docker pull biocontainers/blast
This is the magic of Docker and containers: the software is distributed with all the dependencies and the shared OS layers it needs to run.
Docker allows applications to be isolated into containers, together with instructions for exactly what they need to run, so they can be easily ported from machine to machine. If you have 30 Docker containers that you want to run, you can run them all on a single VM.
$ docker run biocontainers/blast blastp -help
This will print the help page for the `blastp` tool. The first part of the command, `docker run biocontainers/blast`, lets Docker identify the correct container image in your local registry. The second part, `blastp -help`, is the command that you want to run inside the container.
To list the images available in your local registry, run:
$ docker images
For this example, let’s try something practical. Suppose that we are molecular biologists studying prion proteins, and we want to find out whether the zebrafish, a model organism, has a prion protein similar to the human form.
1) Downloading the human prion sequence
We can grab the human prion FASTA sequence from UniProt:
$ wget http://www.uniprot.org/uniprot/P04156.fasta
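Before moving on, it can be worth checking that the download really is a FASTA file. The sketch below counts FASTA records by counting header lines (lines starting with `>`). The function name `count_fasta_records` is a hypothetical helper introduced here for illustration; it is not part of BLAST or BioContainers.

```shell
# Hypothetical helper: count the records in a FASTA file.
# Every FASTA record starts with a header line beginning with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage (assumes the file from the step above is in the current folder):
#   count_fasta_records P04156.fasta
```

A download of a single UniProt entry is expected to contain one record; a count of zero suggests the download returned something other than FASTA, such as an error page.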
2) Downloading the zebrafish database
Now, let's download and unpack our database from NCBI:
$ curl -O ftp://ftp.ncbi.nih.gov/refseq/D_rerio/mRNA_Prot/zebrafish.1.protein.faa.gz
$ gunzip zebrafish.1.protein.faa.gz
3) Preparing the database
We need to prepare the zebrafish database with `makeblastdb` for the search, but first we need to make our files available inside the container. The `docker run` command has an option called volume (-v) that lets us map a folder from our operating system into the container. That way, all files in that folder are visible inside the container, and the BLAST results are also available to us outside the container. In the example below, I'm mapping the folder /Users/yperez/workplace (my computer) to /data/ (the container). When running the command on your computer, you should use the correct paths for your files.
$ docker run -v /Users/yperez/workplace:/data/ biocontainers/blast makeblastdb -in zebrafish.1.protein.faa -dbtype prot
The program's log will be displayed in the terminal, indicating whether the program finished correctly. You will also see some new files in your local folder; those are part of the BLAST database.
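The option `-v /Users/yperez/workplace:/data/` mounts the `workplace` folder, where the downloaded files are stored, at `/data/` inside the container. This is a bind mount rather than a symbolic link: the container reads and writes the host folder directly. See the Docker documentation on volumes for more details.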
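Since the host path differs from machine to machine, one option is to wrap the volume mapping in a small shell function. This is only a sketch: `run_blast` and the `DRY_RUN` variable are hypothetical names introduced here, not part of Docker or BioContainers. With `DRY_RUN=1` the function prints the command instead of running it, which is a convenient way to check the mapping first.

```shell
# Hypothetical wrapper: run a BLAST tool in the container with a host
# folder mounted at /data/. Set DRY_RUN=1 to print the command only.
run_blast() {
    host_dir="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker run -v ${host_dir}:/data/ biocontainers/blast $*"
    else
        docker run -v "${host_dir}:/data/" biocontainers/blast "$@"
    fi
}

# Preview the makeblastdb call for the current folder without running it.
DRY_RUN=1 run_blast "$PWD" makeblastdb -in zebrafish.1.protein.faa -dbtype prot
```

Dropping `DRY_RUN=1` runs the same command for real, with the current folder taking the place of /Users/yperez/workplace.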
Now that you know how to run a container with all the tricks, let's go for the final alignment:
$ docker run -v /Users/yperez/workplace:/data/ biocontainers/blast blastp -query P04156.fasta -db zebrafish.1.protein.faa -out results.txt
The results will be saved in the results.txt file, and you can then proceed to analyse the matches. Looking at the list of best hits, we can observe that zebrafish has a few predicted proteins that match the human prion with better scores than the predicted prion protein (score: 33.9, e-value: 0.22). That’s interesting, isn’t it?
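The default report is meant for reading. For ranking hits from a script, BLAST+ can also write tabular output with `-outfmt 6`, which the commands above do not use. The sketch below assumes you reran `blastp` with `-outfmt 6 -out results.tsv`. The file name `results.tsv` and the function `top_hits` are hypothetical. In this format, column 2 is the subject ID, column 11 the e-value, and column 12 the bit score.

```shell
# Hypothetical helper: list the best hits from BLAST tabular output
# (-outfmt 6), sorted by bit score (column 12), highest first.
# Prints subject ID, e-value and bit score for the top N hits (default 5).
top_hits() {
    file="$1"
    n="${2:-5}"
    LC_ALL=C sort -t "$(printf '\t')" -k12,12nr "$file" | head -n "$n" | cut -f 2,11,12
}

# Only runs if the assumed tabular results file exists.
if [ -f results.tsv ]; then
    top_hits results.tsv 5
fi
```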
Now that you have enough information to start comparing sequences using BLAST, you can move your analysis even further.
We hope that this short example sheds some light on how important and easy it is to run containerized software.
$ cd /home/user/workplace
$ docker pull biocontainers/blast
$ docker run biocontainers/blast blastp -help
$ wget http://www.uniprot.org/uniprot/P04156.fasta
$ curl -O ftp://ftp.ncbi.nih.gov/refseq/D_rerio/mRNA_Prot/zebrafish.1.protein.faa.gz
$ gunzip zebrafish.1.protein.faa.gz
$ docker run -v /Users/yperez/workplace:/data/ biocontainers/blast makeblastdb -in zebrafish.1.protein.faa -dbtype prot
$ docker run -v /Users/yperez/workplace:/data/ biocontainers/blast blastp -query P04156.fasta -db zebrafish.1.protein.faa -out results.txt