文章详情

  • 游戏榜单
  • 软件榜单
关闭导航
热搜榜
热门下载
热门标签
php爱好者> php文档>remote BLAST 的使用!

remote BLAST 的使用!

时间:2010-06-24  来源:aids260

Accesing remote BLAST
Written by Administrator   
Problem: given a set of sequnces, run remote BLAST at NCBI and get neighborhood near the hit. Here\'s implementation in Perl + BioPerl.
1. read sequences from text file (each sequnce on separate line). This can be easily adapted to reading in FASTA-format.

2. Run remote blastn from NCBI server. Specify the following parameters:
    - minimum  e-value
    -  nr database, search in only viruses (txid10239 [ORGN]). Again, this needs to be modified for your needs.

3. BLAST result is saved as  XML. For each hit also download the sequnce, containing the hit and cut neighborhood of given length ($DELTA).

4. Output to text file for each initial sequence. Each file can contain several hits.

Code was adapted from BioPerl demo and is not optimized. For example, sequence containing a hit (usually full genome) should be cached instead of downloading it separtely each time.

All parameters are specified in the code. Additional Perl module misc.pm is used only for trimming spaces from a string, trim(). This is for additional safety and can be removed.

Please, feel free to post comments/questions.

       
 
#
# BLAST nucleotide sequences against Genbank
#
use List::Util qw[min max];
use Bio::SearchIO;
use Bio::Tools::Run::RemoteBlast;
use Bio::DB::GenBank;
use misc;
 
  my $infile=\"seqs.txt\";   # file with sequences
  my $prog = \'blastn\';
  my $blastdb = \'nr\';
  my $e_val= \'1\';  
  $DELTA=100; #how many nucleotides before and after hit to preserve
 
  $db = new Bio::DB::GenBank; #connect to NCBI
  my @params = ( \'-prog\' => $prog,
                 \'-data\' => $blastdb,
                 \'-expect\' => $e_val,
                 \'-readmethod\' => \'xml\' );
  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
 
#SEE http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html for more params
  $factory->retrieve_parameter(\'FORMAT_TYPE\', \'XML\'); # tells NCBI to send XML back 
 
#change a query paramter
  $Bio::Tools::Run::RemoteBlast::HEADER{\'ENTREZ_QUERY\'} = \'txid10239 [ORGN]\';  # all viruses
  #$Bio::Tools::Run::RemoteBlast::HEADER{\'ENTREZ_QUERY\'} = \'txid629 [ORGN]\'; # Yersinia
  
  $Bio::Tools::Run::RemoteBlast::HEADER{\'FILTER\'} = \'L\';
  $Bio::Tools::Run::RemoteBlast::HEADER{\'WORD_SIZE\'} = \'7\';
  $Bio::Tools::Run::RemoteBlast::HEADER{\'FORMAT_TYPE\'} = \'XML\';
#see also OTHER_ADVANCED for specifying word size etc
 
  #howto: change a retrieval parameter
  #$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{\'DESCRIPTIONS\'} = 1000;
  #remove a parameter
  #delete $Bio::Tools::Run::RemoteBlast::HEADER{\'FILTER\'};
 
  open(INFILE,$infile) or die \"error opening file\";
  while () {
    my $str=$_; chomp($str);
    my $pos=index $str, \"#\";
    if ($pos==0) { next; }  #skipping commentaries
    
    $seq=trim($str);
 
    $outfile=$seq.\".txt\";
    # to allow re-run
    if (-e $outfile) { print \"file $outfile already exists - skipping\\n\"; next; }
      
    open(OUTFILE,\">$outfile\"); 
    $old_fh=select(OUTFILE);$|=1;select($old_fh);  #disable buffering
 
    my $seqobj = Bio::Seq->new(-seq =>$seq);
    my $r = $factory->submit_blast($seqobj);
      
    while ( my @rids = $factory->each_rid ) {   #RID - request ID   
  #    my @rids = $factory->each_rid ;    #RID - request ID
      $rid= $rids[-1];  #taking last one
     
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) { if($rc<0) { $factory->remove_rid($rid); } print \".\"; sleep 1;}
      else {
        my $result = $rc->next_result();
        my $filename = $seq.\".blres.xml\";
        $factory->save_output($filename); # save balst result to xml  
       
        # parsing xml file
        my $searchio = new Bio::SearchIO( -format => \'blastxml\', -file   => $filename );
        while ( my $res = $searchio->next_result() ) {
          while( my $hit = $res->next_hit ) { # Bio::Search::Hit::HitI object
            my $query_id=$hit->accession;
            my $hitdescr=$hit->description;
            while( my $hsp = $hit->next_hsp ) { # Bio::Search::HSP::HSPI object
              $start=$hsp->{HIT_START};
              $end=$hsp->{HIT_END};
              $evalue=$hsp->{EVALUE};
 
#NB: do this before retrieving the sequence!              
              if ($start>$end) { print \"Error: start $start > end $end  Swapping them\\n\";
                my $tmp=$start;
                $start=$end; $end=$tmp;
              }
              
              $delta=$DELTA; #NB reassign it here each time
            
              print \"retrieving $query_id...\";
              $seq = $db->get_Seq_by_acc($query_id);
              print \"done\\n\";
              
              if ((($start-$delta)<=0)||(($end+$delta)>length($seq->seq)))
              { $delta = min($start-1, length($seq->seq)-$end, 100); }  #change delta
              
             # for debug: 
             # print \"debug start= $start end = $end delta=$delta seqlength=\".length($seq->seq).\" \";
             # printing output
              print OUTFILE \"$query_id\\t$hitdescr\\n\";
              print OUTFILE \"$start\\t$end\\t$hsp->{IDENTICAL}\\t$evalue\\n\";
       
              $subseq = $seq->primary_seq->subseq($start-$delta,$end+$delta); 
              print OUTFILE \"$subseq\\n\";
              
              for(my $i=0;$i<$delta;$i++) { print OUTFILE \" \";};
              print OUTFILE \"$hsp->{QUERY_SEQ}\\n\";
              for(my $i=0;$i<$delta;$i++) { print OUTFILE \" \";};
              print OUTFILE \"$hsp->{HOMOLOGY_SEQ}\\n\";
              for(my $i=0;$i<$delta;$i++) { print OUTFILE \" \";};
              print OUTFILE \"$hsp->{HIT_SEQ}\\n\";
            }
            print OUTFILE \"\\n\"; #separator between records
          } # for each hsp
        } #$result = $searchio->next_result()
        last; 
      } 
    }# while ( my @rids = $factory->each_rid ) 
    print \"\\nfinished for sequence $seq\\n\";
    close(OUTFILE);
  }#reading $infile
  close(INFILE);
  print \"#work is finished\";
相关阅读 更多 +
排行榜 更多 +
太空飞船终极攻击

太空飞船终极攻击

飞行射击 下载
化作星辰

化作星辰

飞行射击 下载
枪战火柴人中文版

枪战火柴人中文版

飞行射击 下载