Tuesday, May 30, 2006

Download Gene Sequences Using NCBI eFetch Tools

Recently, I was working on a bioinformatics research project that required downloading the mRNA sequences of hundreds of genes. I had all the gene IDs in one text file, so a simple PowerShell script could solve my problem.

I have an older post about the NCBI Entrez eUtils tools. Today, I will use the eFetch tool, which is also part of eUtils. The script is simple and self-explanatory.
# ===========================================================================
#
# Author:      Tony (http://MSHForFun.blogspot.com)
# File:        Efetch.ps1
# Description: Download gene sequences using NCBI eUtils.eFetch tool
# Reference: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_example.pl
# Reference: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetch_help.html
# Reference: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html
#
# ===========================================================================
param
(
  [string] $Path=$(throw "Please Specify a file")
)
$BaseURL = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id="
$Option= "&rettype=fasta&retmode=text"
$WebClient = new-object System.Net.WebClient
$SavePath = $Path + ".result"
if (test-path $savePath)
{
  del $SavePath
}
foreach ( $id in (get-content $path))
{
  # Construct eFetch URL
  $URL=$BaseURL + $id + $Option
  Write-Progress -Activity "Download Sequences" -Status "Submit gene $Id"
  # Submit and download data
  $Data = $WebClient.DownloadString($URL)
  # Parse Data
  if ($Data.Length -gt 1)
  {
    Write-Progress -Activity "Download Sequences" -Status "$id OK"
    # Write to Console
    $data
    # Write to file
    $data >> $SavePath
  }
  else
  {
    Write-Progress -Activity "Download Sequences" -Status "$Id is not found!"
    "$Id is not found!`r`n"
    "$Id is not found!`r`n" >> $SavePath
  }
  # Try not to overload NCBI Server
  start-sleep 1
}
# Clear Progress pane
Write-Progress -Activity "Download Sequences" -Status "Done" -completed
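The script above sends one request per gene ID. eFetch also accepts several IDs in a single id parameter, separated by commas, so the same list could probably be fetched in fewer requests. What follows is only a sketch of that idea, not part of the original script: the function names Get-EFetchUrl and Split-IdBatch and the batch size of 5 are made up for illustration, and the -join operator assumes PowerShell 2.0 or later.

```powershell
# Hypothetical helper (not in Efetch.ps1): builds one eFetch URL for a batch
# of IDs by joining them with commas. Defaults mirror the original script.
function Get-EFetchUrl {
    param(
        [string[]] $Id,
        [string] $BaseUrl = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=",
        [string] $Option = "&rettype=fasta&retmode=text"
    )
    return $BaseUrl + ($Id -join ",") + $Option
}

# Hypothetical helper: splits a list of IDs into batches of at most $Size items.
function Split-IdBatch {
    param(
        [string[]] $Id,
        [int] $Size = 5   # assumed batch size, chosen arbitrarily
    )
    for ($i = 0; $i -lt $Id.Count; $i += $Size) {
        $last = [Math]::Min($i + $Size, $Id.Count) - 1
        # The leading comma emits each batch as one array object
        , @($Id[$i..$last])
    }
}

# Possible usage (needs network access, so it is left commented out):
# $WebClient = New-Object System.Net.WebClient
# Split-IdBatch -Id (Get-Content genes.txt) | ForEach-Object {
#     $WebClient.DownloadString((Get-EFetchUrl -Id $_))
#     Start-Sleep 1   # still pause between requests
# }
```

One trade-off: with batched requests, the per-ID "is not found!" check of the original script no longer applies directly, because a response may cover several IDs at once.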
To test Efetch.ps1, you need a text file (genes.txt) with one gene ID per line:
0
NM_008176
NM_009140
NM_009141
NM_011333
NM_013654
NM_016960
NM_009142
NM_008491
NM_031168
NM_009883
NM_007679
NM_010030
NM_009971
NM_010809
NM_008607
NM_030612
NM_011198
NM_007987
If you are a biologist, you can see what kind of genes I am interested in. The leading "0" is there only to trigger a "not found" error. You can run the script like this:
.\efetch.ps1 genes.txt
The results are printed to the screen and also written to the file "genes.txt.result".
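Since each FASTA record starts with a ">" header line and the script writes "<id> is not found!" for failed lookups, the result file can be checked with a quick count. This is a rough sketch only: Get-ResultSummary is a hypothetical name, and New-Object with -Property assumes PowerShell 2.0 or later.

```powershell
# Hypothetical helper (not in Efetch.ps1): counts FASTA headers and
# "is not found!" markers in the lines of a result file.
function Get-ResultSummary {
    param([string[]] $Line)

    # A FASTA record begins with a header line starting with ">"
    $found = @($Line | Where-Object { $_ -like ">*" }).Count
    # The download script writes this marker when an ID returns no data
    $missing = @($Line | Where-Object { $_ -like "*is not found!*" }).Count

    New-Object PSObject -Property @{ Sequences = $found; NotFound = $missing }
}

# Possible usage against the file produced by the script:
# Get-ResultSummary -Line (Get-Content genes.txt.result)
```

Comparing the two counts with the number of lines in genes.txt gives a simple sanity check that every ID was handled.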

Have Fun
