Tuesday, May 30, 2006
Download Gene Sequences Using NCBI eFetch Tools
I have an old post about the NCBI Entrez eUtils tools. Today, I will use the eFetch tool (also part of eUtils). The script is simple and self-explanatory.
You need a text file (genes.txt), with one gene ID per line, to test this script.
# ===========================================================================
#
# Author: Tony (http://MSHForFun.blogspot.com)
# File: Efetch.ps1
# Description: Download gene sequences using NCBI eUtils.eFetch tool
# Reference: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_example.pl
# Reference: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html
# Reference: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
#
# ===========================================================================
param
(
    [string] $Path=$(throw "Please specify a file")
)

$BaseURL = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id="
$Option= "&rettype=fasta&retmode=text"
$WebClient = new-object System.Net.WebClient
$SavePath = $Path + ".result"

# Remove the result file left over from a previous run
if (test-path $SavePath)
{
    del $SavePath
}

foreach ( $id in (get-content $Path))
{
    # Construct eFetch URL
    $URL=$BaseURL + $id + $Option
    Write-Progress -Activity "Download Sequences" -Status "Submit gene $id"

    # Submit and download data
    $Data = $WebClient.DownloadString($URL)

    # Parse data: anything longer than one character is treated as a sequence
    if ($Data.Length -gt 1)
    {
        Write-Progress -Activity "Download Sequences" -Status "$id OK"
        # Write to console
        $Data
        # Write to file
        $Data >> $SavePath
    }
    else
    {
        Write-Progress -Activity "Download Sequences" -Status "$id is not found!"
        "$id is not found!`r`n"
        "$id is not found!`r`n" >> $SavePath
    }

    # Try not to overload the NCBI server
    start-sleep 1
}

# Clear progress pane
Write-Progress -Activity "Download Sequences" -Status "Done" -completed
My genes.txt looks like this:
0
NM_008176
NM_009140
NM_009141
NM_011333
NM_013654
NM_016960
NM_009142
NM_008491
NM_031168
NM_009883
NM_007679
NM_010030
NM_009971
NM_010809
NM_008607
NM_030612
NM_011198
NM_007987
If you are a biologist, you can see what kind of genes I am interested in. The first "0" is only there to cause a "Not Found" error. You can run the script like this:
.\efetch.ps1 genes.txt
The results are printed to the screen and also saved to the "genes.txt.result" file.
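If you want to see which URL the script requests for a given ID before downloading anything, the URL construction can be pulled out on its own. This is only a sketch: the function name Get-EFetchUrl is made up for illustration and is not part of Efetch.ps1, while the base URL and options are copied from the script. It does not contact NCBI.

```powershell
# Hypothetical helper (not part of Efetch.ps1): builds the same URL
# the script requests for one gene ID.
function Get-EFetchUrl
{
    param
    (
        [string] $Id
    )

    # Same base URL and options as in Efetch.ps1
    $BaseURL = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id="
    $Option = "&rettype=fasta&retmode=text"

    return $BaseURL + $Id + $Option
}

# Print the URL for a couple of IDs from genes.txt
foreach ($id in "NM_008176", "NM_009140")
{
    Get-EFetchUrl $id
}
```

Each ID gives one URL, made up of the base URL, the ID, and the rettype/retmode options, which is exactly what the script passes to DownloadString.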
Have Fun
Tags: msh monad PowerShell