How to extract the lines that fall within a given range - CU post
Date: 2009-03-19 Source: mouse.rice
CC -!- FUNCTION: Rapidly .
CC -!- CATALYTIC ACTIVITY: Acetylcholine.
CC -!- SUBUNIT: Homotetramer; composed .
CC Interacts with PRIMA1.
CC anchor it to the basal
CC (By similarity).
CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
CC similarity). Cell membrane; Peripheral membrane protein (By
CC similarity).
CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;
CC anchor; Extracellular side (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
I want to extract the part of this file that begins with SUBCELLULAR LOCATION, like this:
SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By
similarity). Cell membrane; Peripheral membrane protein (By
similarity).
SUBCELLULAR LOCATION: Isoform 2: Cell membrane; Lipid-anchor, GPI-
anchor; Extracellular side (By similarity).
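No1. Here is a general solution to this class of problem.
It is a lightweight, line-oriented approach,
and far more elegant than methods that pattern-match against the whole file.
$start is the pattern for the start marker and $end is the pattern for the end marker.
if ( (/$start/ .. /$end/) and !/$end/ ){
keeps the lines from the start marker up to the end marker, but not the end line itself.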
#! /usr/bin/env perl |
Output:
flw@debian:~$ ./ttt.pl |
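The range-operator idea described under No1 can be sketched roughly as follows. This is not the script from the post. The helper name extract_range, the two marker patterns, and the step that strips the "CC" prefix are all assumptions made for illustration, and the sample lines are copied from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the lines from the
# line matching $start up to, but not including, the line matching $end.
# Caveat: a flip-flop keeps its state between calls, so if $end never matches,
# the next call starts with the range still "open".
sub extract_range {
    my ($start, $end, @lines) = @_;
    my @kept;
    for (@lines) {
        # The flip-flop becomes true on a $start line and stays true through
        # the $end line; the extra !/$end/ test drops that closing line.
        if ( (/$start/ .. /$end/) and !/$end/ ) {
            # Assumption: strip the leading "CC" and "-!-" to match the
            # output the poster asked for.
            (my $text = $_) =~ s/^CC\s+(?:-!-\s+)?//;
            push @kept, $text;
        }
    }
    return @kept;
}

# Sample lines taken from the question.
my @sample = (
    'CC -!- SUBUNIT: Homotetramer; composed .',
    'CC -!- SUBCELLULAR LOCATION: Cell junction, synapse. Secreted (By',
    'CC similarity).',
    'CC -!- SUBCELLULAR LOCATION: Isoform 2: Cell membrane;',
    'CC anchor; Extracellular side (By similarity).',
    'CC -!- ALTERNATIVE PRODUCTS:',
    'CC Event=Alternative splicing; Named isoforms=2;',
);

# Assumed markers: start at SUBCELLULAR LOCATION, stop at ALTERNATIVE PRODUCTS.
print "$_\n" for extract_range(
    qr/-!- SUBCELLULAR LOCATION/,
    qr/-!- ALTERNATIVE PRODUCTS/,
    @sample,
);
```

Because the file is handled one line at a time, nothing has to be slurped into memory, and the same condition works unchanged inside a while (<>) loop.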
No2.
#!/usr/bin/perl |
No3.
#! /bin/perl |