PowerShell Remoting Project Home

Wednesday, December 28, 2005

NCBI Blastn under MSH command line

From NCBI web site:
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

Blastn is one of my most frequently used tools from NCBI. In stead of going to their web site, I can now run blastn under command line. It is convenient and even faster than web interfaces. Although I use "nr" database here, you can change the URL string to use other database or other blast program.

As for scripting, nothing magic here. Just rewrite original perl script using .NET class: System.Net.WebClient . For Regular Expression, we can use "-match" expression.
Thanks for Lee Holmes Blog

# Begin of script
# =============================================================
# This code is for test purposes only. Use it at your own risk.
# Please do not submit or retrieve more than one request every
# two seconds. Results will be kept at NCBI for 24 hours. For
# best batch performance,they recommend that you submit requests
# after 2000 EST (0100 GMT) and retrieve results before 0500 EST
# (1000 GMT).
# reference: http://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html
# reference: http://www.ncbi.nlm.nih.gov/blast/docs/web_blast.pl
# =============================================================
# return codes:
# 0 – success
# 1 - invalid arguments
# 2 - no hits found
# 3 - rid expired
# 4 - search failed
# 5 - unknown error
# =========================================================
param([string] $query)

if (-not $query)
"Please specify query seqence!"
return 1

#Submit query sequence
"Query sequence:"
"Submit query sequence..."
$uri="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&PROGRAM=blastn&DATABASE=nr&QUERY=" + $query
$BlastClient = new-object System.Net.WebClient
$pagecontent = $BlastClient.DownloadString($uri);
# Get RID
$pagecontent -match " RID = (.*)"
"RID=" + $RID
" "
$pagecontent -match " RTOE = (.*)"
$TimeToComplete= $Matches[1]
"Time to complete search: " + $TimeToComplete + " seconds or sooner. Waiting..."
" "
Start-sleep $TimeToComplete

#Waiting for results
While ($true)
Start-sleep 5
$uri= "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&FORMAT_OBJECT=SearchInfo&RID=" + $RID

$pagecontent = $BlastClient.DownloadString($uri);

if ($pagecontent -match "Status=WAITING")
"Results not ready, waiting..."

if ($pagecontent -match "Status=FAILED")
"Search failed, exiting..."
return 2

if ($pagecontent -match "Status=UNKNOWN")
"Search expired, exiting..."
return 3

if ($pagecontent -match "Status=READY")
if ($pagecontent -match "ThereAreHits=yes")
"Search complete, retrieving results...";
"No hits found.\n";
return 4
return 5

# Get results
$uri= "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=" + $RID + "&ALIGNMENTS=500&ALIGNMENT_VIEW=QueryAnchored&FORMATOBJECT=Alignment&FORMAT_TYPE=TEXT"
$pagecontent = $BlastClient.DownloadString($uri);
$pagecontent >.\blast.results
Get-content .\blast.results | more
return 0
#End of script

So if we do this under MSH prompt:

the output would like:

Submit query sequence...
Time to complete search: 12 seconds or sooner. Waiting...
Results not ready, waiting...
Search complete, retrieving results...

BLASTN 2.2.12 [Aug-07-2005]Reference: Altschul, Stephen F., Thomas L. Madden,
Alejandro A. Schffer,Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
(1997), "Gapped BLAST and PSI-BLAST: a new generation ofprotein database search
programs", Nucleic Acids Res. 25:3389-3402.

RID: 1135712055-18130-135523741431.BLASTQ1

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,GSS,environmental
samples or phase 0, 1 or 2 HTGS sequences) 3,659,359 sequences; 16,326,075,880
total letters

Query= (23 letters)

Score ESequences producing significant alignments: (Bits) Valued
AK202294.1 Mus musculus cDNA, clone:Y1G0140P22, strand:m... 40.1 0.067gb
AF467007.1AF467007S1 Homo sapiens histidine N-methyltrans... 38.2 0.27
dbjAK192209.1 Mus musculus cDNA, clone:Y1G0108D05, strand:m... 38.2 0.27
embAJ333598.1HSA333598 Homo sapiens genomic sequence surrou... 38.2 0.27
embAJ343149.1HSA343149 Homo sapiens genomic sequence surrou... 38.2 0.27
embAJ343845.1HSA343845 Homo sapiens genomic sequence surrou... 38.2 0.27
gbAY811874.1 Schistosoma japonicum SJCHGC01957 protein mRNA, p 36.2 1.0
dbjAB232923.1 Oryzias latipes hox gene cluster, complete cds, 36.2 1.0
dbjBS000120.2 Pan troglodytes chromosome 22 clone:RP43-007D... 36.2 1.0

>gbAY811874.1 Schistosoma japonicum SJCHGC01957 protein mRNA, partial cds
Score = 36.2 bits (18),
Expect = 1.0
Identities = 18/18 (100%),
Gaps = 0/18 (0%)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmentalsamples or phase 0, 1 or 2 HTGS sequences)
Posted date: Dec 26, 2005 4:16 AM
Number of letters in database: -853,793,300
Number of sequences in database: 3,659,359
Lambda K H 1.37 0.711 1.31
GappedLambda K H 1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties:
Existence: 5,
Extension: 2
Number of Sequences: 3659359
Number of Hits to DB: 2403072
Number of extensions: 36138
Number of successful extensions: 36138
Number of sequences better than 10: 21
Number of HSP's better than 10 without gapping: 21
Number of HSP's gapped: 36138
Number of HSP's successfully gapped: 21
Number of extra gapped extensions for HSPs above 10: 36070
Length of query: 23
Length of database: 16326075880
Length adjustment: 18
Effective length of query: 5
Effective length of database: 16260207418
Effective search space: 81301037090
Effective search space used: 81301037090
A: 0X1: 11 (21.8 bits)
X2: 15 (29.7 bits)
X3: 25 (49.6 bits)
S1: 11 (22.3 bits)
S2: 17 (34.2 bits)
Have fun!

[Edit: Monad has now been renamed to Windows PowerShell. This script or discussion may require slight adjustments before it applies directly to newer builds.]


In English class, one girl Cheap WoW Gold said never buy WoW Gold give up when discussing. It just reminded me wow goldsomething about myself.WoW Gold Maybe it is wow power leveling something about love.WoW Gold My love, which began on July 3rd of WoW Gold 2005 and finished on August wow leveling 23rd of 2008, taught me many things. wow leveling It is not a pleasant world of warcraft power leveling thing to look back on world of warcraft power leveling that. But I know I world of warcraft power leveling must learn something from it,world of warcraft leveling no matter what it is, world of warcraft leveling happiness or sorrow.Our love did not go on so smoothly and we went through many things.
your post is helpful and informative
Really very good blog for wow gold its really excellent. The content of your blog Is really superb.
Oes Tsetnoc one of the ways in which we can learn seo besides Mengembalikan Jati Diri Bangsa. By participating in the Oes Tsetnoc or Mengembalikan Jati Diri Bangsa we can improve our seo skills. To find more information about Oest Tsetnoc please visit my Oes Tsetnoc pages. And to find more information about Mengembalikan Jati Diri Bangsa please visit my Mengembalikan Jati Diri Bangsa pages. Thank you So much.
Oes Tsetnoc | Semangat Mengembalikan Jati Diri Bangsa

Post a Comment

Links to this post:

Create a Link

<< Home