How to extract the lines that fall between two markers (CU post)

Date: 2009-03-19  Source: mouse.rice

Given a file like this:
CC   -!- FUNCTION: Rapidly .
CC   -!- CATALYTIC ACTIVITY: Acetylcholine.
CC   -!- SUBUNIT: Homotetramer; composed .
CC       Interacts with PRIMA1.
CC       anchor it to the basal
CC       (By similarity).
CC   -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC       similarity). Cell membrane; Peripheral membrane protein (By
CC       similarity).
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC       anchor; Extracellular side (By similarity).
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=2;
I want to extract the short sections that begin with SUBCELLULAR LOCATION, as follows:
SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
  similarity). Cell membrane; Peripheral membrane protein (By
  similarity).
SUBCELLULAR LOCATION: Isoform 2: Cell membrane; Lipid-anchor, GPI-
   anchor; Extracellular side (By similarity).
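No1. Here is a general solution for this class of problem.

This is a lightweight, line-oriented approach: it handles one line at a time and never needs the whole file in memory.
I find it far more elegant than approaches that pattern-match against the entire file.

$start is the pattern for the start marker and $end is the pattern for the end marker. The condition
if ( (/$start/ .. /$end/) and !/$end/ ){
selects every line from the start marker through the end marker, but excludes the end-marker line itself.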

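#! /usr/bin/env perl

use strict;
use warnings;

# A section starts at a "SUBCELLULAR LOCATION" header ...
my $start = qr/^CC\s+-!- SUBCELLULAR LOCATION/;
# ... and ends at the next "-!-" header that is NOT another
# "SUBCELLULAR LOCATION" header (hence the negative lookahead).
my $end = qr/^CC\s+-!- (?!SUBCELLULAR LOCATION)/;

while (<DATA>) {
    # The range operator ".." turns true on a $start line and stays true
    # through the next $end line; "!/$end/" then drops that end line.
    if ( (/$start/ .. /$end/) and !/$end/ ) {
        print "*** $_";    # inside a wanted section
    }
    else {
        print "--- $_";    # everything else
    }
}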
__END__
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;

Output:

flw@debian:~$ ./ttt.pl
--- CC -!- FUNCTION: Rapidly .
--- CC -!- CATALYTIC ACTIVITY: Acetylcholine.
--- CC -!- SUBUNIT: Homotetramer; composed .
--- CC Interacts with PRIMA1.
--- CC anchor it to the basal
--- CC (By similarity).
*** CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
*** CC similarity). Cell membrane; Peripheral membrane protein (By
*** CC similarity).
*** CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
*** CC anchor; Extracellular side (By similarity).
--- CC -!- ALTERNATIVE PRODUCTS:
--- CC Event=Alternative splicing; Named isoforms=2;
flw@debian:~$
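
The script above only marks the wanted lines. As a minimal sketch (not part of the original post), the same range condition can also produce the output asked for in the question, by removing the "CC   -!- " prefix from header lines and trimming continuation lines. The heredoc sample, the @out array, and the choice to leave a two-space indent on continuation lines are assumptions made for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input taken from the question, split into lines (newlines kept).
my @lines = split /^/, <<'EOT';
CC   -!- SUBUNIT: Homotetramer; composed .
CC       Interacts with PRIMA1.
CC   -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC       similarity). Cell membrane; Peripheral membrane protein (By
CC       similarity).
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC       anchor; Extracellular side (By similarity).
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=2;
EOT

my $start = qr/^CC\s+-!- SUBCELLULAR LOCATION/;
my $end   = qr/^CC\s+-!- (?!SUBCELLULAR LOCATION)/;

my @out;    # assumed name: collects the cleaned-up lines
for (@lines) {
    # Same condition as in the script above: inside the range,
    # but not the end-marker line itself.
    next unless (/$start/ .. /$end/) and !/$end/;

    my $line = $_;    # work on a copy so @lines is left untouched
    unless ($line =~ s/^CC\s+-!-\s+//) {    # header line: drop "CC   -!- "
        $line =~ s/^CC\s{5}//;              # continuation: drop "CC" plus 5 spaces
    }
    push @out, $line;
}

print @out;
```

With this sample, @out holds five lines: the two SUBCELLULAR LOCATION headers without their prefix, and the three continuation lines with a two-space indent.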

No2.

 

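#!/usr/bin/perl

use strict;
use warnings;

# Slurp the whole DATA section into $_ as one string.
my @data = <DATA>;
$_ = join '', @data;

# Capture each run from "SUBCELLULAR" up to the next "CC -!-" header.
# /s lets "." match newlines; /g collects every section.
my @t = /(SUBCELLULAR.*?)CC\s+-!-/msg;

# Strip the "CC" prefix from continuation lines. Anchoring with ^ (under /m)
# keeps a "CC" that appears inside the text from being removed.
s/^CC\s+//mg for @t;
print @t;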

__DATA__
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
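
When matching against the whole input, a section that runs to the end of the input has no "CC -!-" header after it, so a pattern that requires one as a terminator will skip it. A minimal sketch of one way to handle that case, assuming a hypothetical helper named extract_sections (not from the original post): end each capture with a lookahead that accepts either the next header or the end of the input.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns every SUBCELLULAR section found in $text,
# with the "CC" prefix removed from continuation lines.
sub extract_sections {
    my ($text) = @_;

    # (?= ... ) stops the capture at the next "CC -!-" header without
    # consuming it; \z lets the last section end at the end of the input.
    # /m makes ^ match at line starts, /s lets "." match newlines.
    my @sections = $text =~ /(SUBCELLULAR.*?)(?=^CC\s+-!-|\z)/msg;

    s/^CC\s+//mg for @sections;    # strip the prefix on continuation lines
    return @sections;
}

# In this sample the SUBCELLULAR section is the last thing in the input.
my $text = <<'EOT';
CC   -!- FUNCTION: Rapidly .
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC       anchor; Extracellular side (By similarity).
EOT

print extract_sections($text);
```

Because the lookahead does not consume the following header, back-to-back SUBCELLULAR LOCATION sections are still returned as separate elements.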

No3.

 

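#!/usr/bin/env perl

use warnings;
use strict;

my $key = 0;    # true while we are inside a SUBCELLULAR LOCATION section

while (<DATA>) {
    # Any "-!-" header ends the current section.
    if (/-!-/) {
        $key = 0;
    }
    # A SUBCELLULAR LOCATION header starts a new section.
    if (/SUBCELLULAR LOCATION/) {
        print;
        $key = 1;
        next;
    }
    # Continuation lines are printed only inside a wanted section.
    if ($key) {
        print;
    }
}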

__END__
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
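
The flag-based loop can also gather each section into a single record instead of printing line by line. The following is a rough sketch under stated assumptions: collect_sections is a hypothetical helper name, and joining continuation lines with one space is an illustrative choice (a word hyphenated across lines in the real data would need different handling).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns one string per SUBCELLULAR LOCATION section,
# with its continuation lines joined onto the header line.
sub collect_sections {
    my @lines = @_;
    my (@sections, $in_section);

    for my $line (@lines) {
        chomp(my $text = $line);

        if ($text =~ s/^CC\s+-!-\s+//) {
            # Every "-!-" header either starts a wanted section or ends one.
            $in_section = $text =~ /^SUBCELLULAR LOCATION/;
            push @sections, $text if $in_section;
        }
        elsif ($in_section && $text =~ s/^CC\s+//) {
            # Continuation line of the current wanted section.
            $sections[-1] .= " $text";
        }
    }
    return @sections;
}

my @sample = (
    "CC   -!- SUBUNIT: Homotetramer; composed .\n",
    "CC       Interacts with PRIMA1.\n",
    "CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;\n",
    "CC       anchor; Extracellular side (By similarity).\n",
    "CC   -!- ALTERNATIVE PRODUCTS:\n",
);

print "$_\n" for collect_sections(@sample);
```

Returning a list keeps the extraction separate from the printing, so the caller can decide how to format or filter the records.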

 
