PowerShell Remoting Project Home

Friday, January 13, 2006

Use Search-Entrez Cmdlet and Format Results

In my previous blog entry, "Author a Monad Cmdlet as Web Service Client", I created a cmdlet named Search-Entrez. Let's see what we can do with it.
get-command search-entrez | format-list
Name : search-entrez
CommandType : Cmdlet
Definition : search-entrez [-Database] String [-Keywords] String
[[-MaxRecord] String] [[-Email] String] [[-Field] String]
[[-RelativeDate] String] [[-MinimumDate] String]
[[-MaximumDate] String] [[-DateType] String] [-Verbose] [-Debug]
[-ErrorAction ActionPreference] [-ErrorVariable String]
[-OutVariable String] [-OutBuffer Int32] [-WhatIf] [-Confirm]

Path :
AssemblyInfo :
DLL : D:\msh\Entrez-mshsnapin.dll
HelpFile : Entrez-mshsnapin.dll-Help.xml
ParameterSets : {__AllParameterSets}
Type : Entrez.SearchEntrezCmd
Verb : search
Noun : entrez

Database: the Entrez database name (can be pubmed, protein, nucleotide, nuccore, nucgss, nucest, structure, genome, books, cancerchromosomes, cdd, domains, gene, genomeprj, gensat, geo, gds, homologene, journals, mesh, ncbisearch, nlmcatalog, omia, omim, pmc, popset, probe, pcassay, pccompound, pcsubstance, snp, taxonomy, unigene, unists)

Keywords: the search strategy. The string must be URL encoded, which here means every blank is replaced with "+". Keywords can be combined with the operators AND, OR and NOT.

The other parameters are optional.
MaxRecord limits the number of results returned to you.
Email gives the NCBI server an address so it can inform you if something goes wrong.
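
The Keywords encoding described above can be sketched as a small helper. This is a minimal sketch, not part of the Entrez snap-in: ConvertTo-EntrezKeywords is a hypothetical name, and it only replaces blanks with "+", so any other reserved characters would still need proper URL encoding.

```powershell
# Hypothetical helper (not part of the Entrez snap-in).
# Trims the query and replaces each run of whitespace with a single '+'.
function ConvertTo-EntrezKeywords {
    param([string]$Query)
    ($Query.Trim() -replace '\s+', '+')
}

ConvertTo-EntrezKeywords 'cancer T cell'        # cancer+T+cell
ConvertTo-EntrezKeywords 'cancer AND T cell'    # cancer+AND+T+cell
```

The result can then be passed as the Keywords argument in place of a hand-typed string.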

$resultSummary=search-entrez pubmed "cancer+T+cell" -MaxRecord 10 -Email test@test.com -verbose
VERBOSE: Searching Entrez database...
Database: pubmed
Keywords: cancer+T+cell
VERBOSE: Creating Entrez WebService...
VERBOSE: Submit search...
VERBOSE: Number of results item found:51521
VERBOSE: Getting results summary...
VERBOSE: Number of item retrieved:10
VERBOSE: Write results to pipline...

Because I was too lazy to write code for parsing the results, I just emitted an eUtils.eSummaryResultType object to the Monad pipeline, which makes the Search-Entrez cmdlet return output that is not human-readable.
$resultSummary | format-list
ERROR :
DocSum : {16408214, 16407851, 16406172, 16404742, 16404738, 16404427, 16403911, 16403282, 16401550, 16399573}

Here ERROR is a string holding any error message, and DocSum is a collection of DocSumType objects.
public class DocSumType {
    private string idField;
    private ItemType[] itemField;
    public string Id {
        get { return this.idField; }
        set { this.idField = value; }
    }
    public ItemType[] Item {
        get { return this.itemField; }
        set { this.itemField = value; }
    }
}
To make things even more complicated, DocSumType.Item is a collection of NESTED ItemType objects. As you can see, ItemType.Items is itself a collection of ItemType objects.
public class ItemType {
    private ItemType[] itemsField;
    private string[] textField;
    private string nameField;
    private ItemTypeType typeField;
    public ItemType[] Items {
        get { return this.itemsField; }
        set { this.itemsField = value; }
    }
    public string[] Text {
        get { return this.textField; }
        set { this.textField = value; }
    }
    public string Name {
        get { return this.nameField; }
        set { this.nameField = value; }
    }
    public ItemTypeType Type {
        get { return this.typeField; }
        set { this.typeField = value; }
    }
}
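
Given these two classes, pulling a single field out of one DocSumType means scanning its Item array by Name. Here is a minimal sketch, assuming a current PowerShell (3.0 or later, for [pscustomobject]) rather than the Monad beta. Get-EntrezField is a hypothetical helper, and the $docSum object is stand-in data shaped like DocSumType/ItemType so the sketch runs without calling the web service.

```powershell
# Hypothetical helper: returns the Text of the first top-level item
# whose Name matches, joined into one string. Returns nothing if absent.
function Get-EntrezField {
    param($DocSum, [string]$Name)
    foreach ($item in @($DocSum.Item)) {
        if ($null -ne $item -and $item.Name -eq $Name) {
            return ($item.Text -join ' ')
        }
    }
}

# Stand-in data with the same property names as DocSumType and ItemType.
$docSum = [pscustomobject]@{
    Id   = '16408214'
    Item = @(
        [pscustomobject]@{ Name = 'Source'; Text = @('Cancer Immunol Immunother'); Items = $null },
        [pscustomobject]@{ Name = 'Pages';  Text = @('1-9'); Items = $null }
    )
}

Get-EntrezField -DocSum $docSum -Name 'Source'
```

With a real result, the same helper could presumably be applied to each element of the DocSum collection to list, say, every Title.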
The good news is that they seem to nest only TWO levels deep for pubmed results. So I wrote a script to format the results.
# Format-eSummary.msh
# Script to format (pubmed) results from Search-Entrez cmdlet
#requires -MshSnapIn EntrezSnapin
param
(
    $Summary = $(throw "Please specify an eUtils.eSummaryResultType object to parse.")
)
# Nothing to format if the search returned no document summaries.
if ($Summary.DocSum.Length -eq 0)
{
    "No hits found!"
    return 1
}
foreach ($sum in $Summary.DocSum)
{
    "Primary IDs:" + $sum.Id
    # First level: one line per item.
    foreach ($Item in $sum.Item)
    {
        $Item.Name + ": " + $Item.Text
        # Second level: child items, indented with a tab.
        if ($Item.Items)
        {
            foreach ($ChildItem in $Item.Items)
            {
                "`t" + $ChildItem.Name + ": " + $ChildItem.Text
            }
        }
    }
    # Separator between document summaries.
    "====================================================="
}
return 0

.\Format-eSummary.msh $resultSummary

Primary IDs:16408214
PubDate: 2006 Jan 12
EPubDate: 2006 Jan 12
Source: Cancer Immunol Immunother
AuthorList:
    Author: Prell RA
    Author: Gearin L
    Author: Simmons A
    Author: Vanroey M
    Author: Jooss K
Title: The anti-tumor efficacy of a GM-CSF-secreting tumor cell vaccine is not inhibited by docetaxel administration.
Volume:
Issue:
Pages: 1-9
LangList:
    Lang: English
NlmUniqueID: 8605732
ISSN: 0340-7004
ESSN: 1432-0851
PubTypeList:
    PubType: Journal Article
RecordStatus: PubMed - as supplied by publisher
PubStatus: aheadofprint
ArticleIds:
    doi: 10.1007/s00262-005-0116-4
    pubmed: 16408214
DOI: 10.1007/s00262-005-0116-4
History:
    received: 2005/11/08 00:00
    accepted: 2005/12/15 00:00
    aheadofprint: 2006/01/12 00:00
    pubmed: 2006/01/13 09:00
    medline: 2006/01/13 09:00
References:
HasAbstract: 1
PmcRefCount: 0
FullJournalName: Cancer immunology, immunotherapy : CII.
SO: 2006 Jan 12;:1-9
=====================================================
Primary IDs:16407851
PubDate: 2006 Jan 9
EPubDate: 2006 Jan 9
Source: Oncogene
......
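
The script handles exactly two levels of nesting. If another Entrez database turned out to nest deeper, a recursive walk would cope with any depth. Here is a minimal sketch, assuming a current PowerShell (3.0 or later) rather than the Monad beta. Format-EntrezItem is a hypothetical name, and $sample is stand-in data shaped like ItemType, not real web service output.

```powershell
# Hypothetical recursive formatter: emits one line per item,
# indented by one tab per nesting level.
function Format-EntrezItem {
    param($Item, [int]$Depth = 0)
    ("`t" * $Depth) + $Item.Name + ": " + ($Item.Text -join ' ')
    foreach ($child in @($Item.Items)) {
        # @($null) still holds one null element, so skip it explicitly.
        if ($null -ne $child) {
            Format-EntrezItem -Item $child -Depth ($Depth + 1)
        }
    }
}

# Stand-in data with the same property names as ItemType.
$sample = [pscustomobject]@{
    Name  = 'AuthorList'
    Text  = $null
    Items = @(
        [pscustomobject]@{ Name = 'Author'; Text = @('Prell RA'); Items = $null },
        [pscustomobject]@{ Name = 'Author'; Text = @('Gearin L'); Items = $null }
    )
}

$lines = @(Format-EntrezItem -Item $sample)
$lines
```

For two-level data this produces the same layout as the script; the only difference is that deeper children would keep indenting instead of being dropped.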
