BACKGROUND INFORMATION: The three BLAST programs that one will commonly use are BLASTN, BLASTP and BLASTX. BLASTN will compare your DNA sequence with all the DNA sequences in the nonredundant database (nr). BLASTP will compare your protein sequence with all the protein sequences in nr. In BLASTX your nucleotide sequence will be translated in all six reading frames and the products compared with the nr protein database. A tutorial is available at NCBI.
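To make the six-reading-frame idea behind BLASTX concrete, here is a minimal Python sketch. It assumes the standard genetic code and an unambiguous DNA query; the function names are illustrative only and are not part of BLAST itself.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptides."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, peptide in six_frame_translation("ATGGCCTAA").items():
        print(label, peptide)
```

Each of the six peptides would then be compared with the protein database; stop codons appear as "*".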
Nucleotide BLAST (BLASTN) N.B. The default database is "human genomic and transcript", not "nucleotide collection (nt/nr)".
Protein BLAST (BLASTP) N.B. This program is also coupled with a motif search. If you suspect that your protein may only show weak sequence similarity to other proteins, I would suggest clicking on the
Translated BLAST (BLASTX)
TBLASTX searches translated nucleotide databases using a translated nucleotide query, while TBLASTN searches translated nucleotide databases using a protein query. These are useful resources if you are interested in homologs in unfinished genomes. Under "Databases" select "genomic survey sequences", "High throughput genomic sequences" or "whole-genome shotgun reads".
Blast with Microbial Genomes (BLASTN, TBLASTN, TBLASTX etc.). Permits one to compare a nucleic acid or protein sequence against finished archaeal and bacterial genomes.
N.B.
1. Depending upon the time of day, your results may appear almost immediately, or your search may be delayed or not accepted at all. Be prepared for plenty of results. You may only want to print the first few pages (e.g. 1-5). Alternatively, under "Algorithm Parameters" change the "Maximum targets" from 100 (default) to 10 or 50.
2. For PSI-BLAST and other searches I frequently enter information in the "Entrez Query" section, e.g. Escherichia coli[organism] or Viruses[organism], to see "hits" specifically to E. coli or viruses/bacteriophages (see here for details).
3. It is advisable to always select "
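For scripted searches, the "Maximum targets" and "Entrez Query" settings from the notes above have counterparts in the NCBI BLAST URL API. The sketch below only builds the submission URL and does not contact NCBI. It assumes the parameter names CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY and HITLIST_SIZE as documented for that API; the helper name and the example peptide are hypothetical, so check the current NCBI documentation and usage guidelines before relying on it.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST URL API (assumed unchanged; verify before use).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_submission_url(query, program="blastp", database="nr",
                               entrez_query=None, max_targets=50):
    """Build (but do not send) a search submission URL.

    entrez_query restricts hits, e.g. "Escherichia coli[organism]".
    max_targets corresponds to "Maximum targets" on the web form.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
        "HITLIST_SIZE": max_targets,
    }
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query
    return BLAST_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical peptide used purely for illustration.
    print(build_blast_submission_url(
        "MKVLAAGIVGLLLAQ",
        entrez_query="Escherichia coli[organism]",
        max_targets=10,
    ))
```

Keeping URL construction separate from submission makes it easy to inspect the request before sending it.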
EMB BLAST - (European Molecular Biology network - Swiss node). Very convenient since it permits one to specifically search databases such as prokaryote, bacteriophage, fungal, & 16S rRNA using BLASTN, and specific bacterial genomes or SwissProt using BLASTX or BLASTN.
ParAlign (CMBN Bioinformatics Group, University of Oslo, Norway) - employs a heuristic method for sequence alignment. In essence, ParAlign is about as sensitive as Smith-Waterman but runs at the speed of BLAST. Nice graphics.
GTOP Sequence Homology Search (Laboratory for Gene-Product Informatics, National Institute of Genetics, Japan) - offers BLASTP search capability against individual Archaea, Bacteria, Eukaryota, and viruses.
T4-like Phage NCBI MegaBLAST (Tulane Univ., New Orleans, U.S.A. & CNRS, Toulouse, France) - includes a growing list of completed T4-like phage sequences as well as those in the draft and contig stages of completion.
WU-BLAST (Washington University BLAST) - The emphasis of this tool is to find regions of sequence similarity quickly, with minimum loss of sensitivity. This will yield functional and evolutionary clues about the structure and function of your novel sequence.
Batch BLAST (Greengene web server) - developed by Michael V. Graves for DNA or protein BLAST sequence analysis against the NCBI databases. It allows one to submit a file that contains multiple sequences and then organizes the results by each individual sequence contained in the file. The results will be available on the server as individual html files. An alternative site is here.
For more sophisticated studies you might want to employ:
PSI-BLAST or PHI-BLAST search - (NCBI) Position-Specific Iterative BLAST creates a profile after the initial search. This profile is used in subsequent searches. Tutorial.
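As a rough illustration of what a position-specific profile is, the following Python sketch turns a set of equal-length aligned hits into per-column log-odds scores and scores a candidate against them. It is a deliberate simplification: it assumes a uniform background frequency and a flat pseudocount, whereas PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts. The function names and example sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column.

    aligned: equal-length sequences; gap characters are ignored.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    profile = []
    for col in range(length):
        counts = Counter(
            seq[col] for seq in aligned if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_against_profile(profile, seq):
    """Sum the column scores for an ungapped sequence of profile length."""
    return sum(column[aa] for column, aa in zip(profile, seq))


if __name__ == "__main__":
    profile = build_profile(["MKV", "MKI", "MRV"])
    print(score_against_profile(profile, "MKV"))
    print(score_against_profile(profile, "AAA"))
```

In an iterative search, the hits found with the profile would be added to the alignment and the profile rebuilt before the next round.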
BLAST 2 - (NCBI) BLAST two sequences against one another. N.B. This utilizes BLASTN, BLASTP and BLASTX as well as TBLASTN and TBLASTX.
Gene Context Tool - is an incredible tool for visualizing the genome context of a gene or group of genes (synteny). In the following diagram an RpoN (Sigma54) protein was analyzed. (Reference: R. Ciria et al. (2004) Bioinformatics 20: 2307-2308).

Other search engines include:
Fasta33 - (EBI) I particularly like the Visual Fasta presentation of the data. I think that it is better than what one gets on BLAST searches.
TC-BLAST (Saier Laboratory Bioinformatics Group, Univ. of California, San Diego, U.S.A.) - Scans the transport protein database (TC-DB), producing alignments and phylogenetic trees. The TC-DB details a comprehensive classification system for membrane transport proteins known as the Transporter Classification (TC) system.
MEROPS BLAST - permits one to screen protein sequences against an extensive database of characterized peptidases (Rawlings, N.D., O'Brien, E. A. & Barrett, A.J. (2002) MEROPS: the protease database. Nucleic Acids Res. 30, 343-346).
SEARCHGTr - is web-based software for the analysis of glycosyltransferases (GTrs) involved in the biosynthesis of a variety of pharmaceutically important compounds such as adriamycin, erythromycin and vancomycin. This software has been developed based on a comprehensive analysis of sequence/structural features of 102 GTrs of known specificity from 52 natural product biosynthetic gene clusters (Reference: Kamra, P. et al. 2005. Nucleic Acids Research 33 (Web Server Issue): W220-W225).
PipeAlign (Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire, France) offers an integrated approach to protein family analysis through a cascade of five different sequence analysis steps: BALLAST, the DbClustal multiple alignment program, Rascal alignment analysis, removal by NorMD of any sequences that do not belong to the protein family, and clustering into potential functional subfamilies using Secator or DPC. Reference: F. Plewniak et al. 2003. Nucleic Acids Research, 31: 3829-3832.
MPsrch (EMBL-EBI) - this sequence comparison tool implements the true Smith and Waterman algorithm, identifying hits in cases where BLAST and Fasta fail, and also reports fewer false positives. Provides information on: Match %; % Query Match (% of the query sequence matched); Conservative changes; Mismatches; Indels; and Gaps.
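For readers unfamiliar with the Smith and Waterman algorithm mentioned above, here is a minimal Python sketch of local alignment by dynamic programming. It assumes a simple match/mismatch score and a linear gap penalty; the default values are illustrative and are not MPsrch's actual scoring scheme, which for proteins would use a substitution matrix.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("ACACACTA", "AGCACACA")
    print(score)
    print(top)
    print(bottom)
```

Because every cell of the matrix is filled, the method is exhaustive, which is why it is more sensitive but slower than the heuristic searches used by BLAST and Fasta.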
GOAnno (University of Strasbourg, France) - this web tool automatically annotates proteins according to the Gene Ontology using hierarchised multiple alignments. Positioning the query protein in its aligned functional subfamily represents a key step to obtain highly reliable predicted GO annotation based on the GOAnno algorithm.
COMPASS - is a profile-based method for the detection of remote sequence similarity and the prediction of protein structure. The server features three major developments: (i) improved statistical accuracy; (ii) increased speed from parallel implementation; and (iii) new functional features facilitating structure prediction. These features include visualization tools that allow the user to quickly and effectively analyze specific local structural region predictions suggested by COMPASS alignments. (Reference: R.I. Sadreyev et al. 2009. Nucl. Acids Res. 37 (Web Server issue): W90-W94).
Unique search engine:
MineBlast - performs BLASTP searches in UniProt to identify names and synonyms based on homologous proteins and subsequently queries PubMed, using combined search terms in order to find and present relevant literature. This tool only allows max. 100 queries per user per day. (Reference: G. Dieterich et al. 2005. Bioinformatics 21: 3450-3451).
Comparison of homology between two small genomes:
SCAN2 (Softberry.com) provides one with a colour-coded graphical alignment of genome length DNAs in Java. In the top panel regions of high sequence identity are presented in red. By highlighting the gray, yellow, green, black boxes one can select specific regions for examination of the sequence alignment. For additional information on the output see here. This site appears to work best with Internet Explorer.
Advanced PipMaker (Schwartz et al. Genome Research Vol. 10, Issue 4, 577-586, April 2000) aligns two DNA sequences and returns a percent identity plot of that alignment, together with a traditional textual form of the alignment. You might want to download Laj (Penn State - Bioinformatics Group, U.S.A.) for viewing and manipulating the output from pairwise alignment programs such as PipMaker.
JDotter: A Java Dot Plot Viewer (Viral Bioinformatics Resource Center, University of Victoria, Canada) - a dot matrix plotter for Java. Produces similar diagrams to the above-mentioned programs, but with better control over the output.
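The idea behind a dot matrix plot can be sketched in a few lines of Python: a dot is placed wherever a short word from one sequence occurs exactly in the other, so that similar regions show up as diagonals. This is a minimal illustration under the assumption of exact word matches; it is not how JDotter is implemented, and the function names are hypothetical.

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return (i, j) pairs where seq_a[i:i+word] equals seq_b[j:j+word]."""
    # Index every word of seq_b by its start positions.
    positions = {}
    for j in range(len(seq_b) - word + 1):
        positions.setdefault(seq_b[j:j + word], []).append(j)
    dots = []
    for i in range(len(seq_a) - word + 1):
        for j in positions.get(seq_a[i:i + word], []):
            dots.append((i, j))
    return dots


def render(dots, len_a, len_b):
    """Draw the dots as text: rows follow seq_a, columns follow seq_b."""
    grid = [["."] * len_b for _ in range(len_a)]
    for i, j in dots:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    a, b = "ACGTACGTTT", "TTACGTACGA"
    print(render(dot_plot(a, b, word=3), len(a), len(b)))
```

Increasing the word size removes background noise at the cost of missing short or diverged matches.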
multi-zPicture: multiple sequence alignment tool (Comparative Genomics Center, Lawrence Livermore National Laboratory, U.S.A.) - provides nice dotplot graphs and dynamic visualizations. If simple gene locations are provided (e.g. "> 2000 5000 RNA_polymerase" indicates that the RNA polymerase gene is found on the plus strand between bases 2000 and 5000), this data will be added to the dynamic visualization. zPicture alignments can be automatically submitted to rVista to identify conserved transcription factor binding sites.
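If you keep gene locations in the simple form quoted above, a small script can check them before upload. The Python sketch below parses one line of that form. Only the ">" (plus strand) marker is described here; treating "<" as the minus-strand marker is an assumption, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class GeneLocation(NamedTuple):
    strand: str  # "+" or "-"
    start: int
    end: int
    name: str


def parse_gene_location(line):
    """Parse a line such as '> 2000 5000 RNA_polymerase'.

    '>' marks the plus strand; '<' for the minus strand is an assumption.
    """
    fields = line.split()
    if len(fields) != 4 or fields[0] not in (">", "<"):
        raise ValueError(f"unrecognised gene location line: {line!r}")
    marker, start, end, name = fields
    start, end = int(start), int(end)
    if start > end:
        raise ValueError("start must not exceed end")
    return GeneLocation("+" if marker == ">" else "-", start, end, name)


if __name__ == "__main__":
    print(parse_gene_location("> 2000 5000 RNA_polymerase"))
```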
GeneOrder 3.0 (D. Seto, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) is ideal for comparing small GenBank genomes (up to 2 Mb). Each gene from the Query sequence is compared to all of the genes from the Reference sequence using BLASTP. There are two display formats: graphical and tabular. Currently the graph is an applet and must be saved as a "SCREEN SHOT". If your data is not present in GenBank use this site.
CoreGenes (D. Seto, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) is designed to analyze two to five genomes simultaneously, generating a table of related genes (orthologs and putative orthologs). These entries are linked to their GenBank data. It has a limit of 0.35 Mb, while the newer version, CoreGenes 2.0, extends the limit to approx. 2.0 Mb. If your data is not present in GenBank use this site. The following diagram is from an analysis of coliphages T3, T7, Yersinia phage phi-YeO3-12 and Roseophage S10I.

CoreGenes 3.0 - is the latest member in the CoreGenes family of tools. It determines unique genes contained in a pair of proteomes. (Caveat: Currently only supports a single pair of genomes). This has proved extremely useful in determining unique genes in comparisons between large Myoviridae.