Learning Perl: fetching novels from Sina Reading

Date: 2009-07-30  Source: n7611

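[This is a practice script for learning Perl. Please do not use it commercially. Respect other people's work: visit the site that provides the material, and preferably look at its ads.]
This Perl script downloads a novel from Sina Reading and merges it into one large text file. My wife likes reading novels on her PDA, so it made a good practice project.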

#!/usr/bin/perl

use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder;
use URI;

#download html files

my $caturl = 'http://vip.book.sina.com.cn/book/index_100419.html';
my $remotefile;
my $localfile;
my $remotedir ='';
my $locdir = 'index_100419';

# Optional command-line argument overrides the default catalogue URL.
$caturl = shift(@ARGV) if @ARGV;

my $uri = URI->new($caturl);
my @path=$uri->path_segments;
# There will always be an empty first component.

shift(@path);

$locdir = pop(@path);
$locdir =~ s/\.html$//;
foreach my $dir (@path)
{
  $remotedir .= $dir.'/' ;
}
mkdir($locdir);
my $html = get($caturl);
die "Could not fetch $caturl\n" unless defined $html;

my $file ;
my $tree = HTML::TreeBuilder->new;
$tree->parse_content($html); # !

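# Every chapter link on the catalogue page has an href starting with "chapter_".
# The "|| ''" guard avoids warnings for <a> elements that have no href.
foreach my $paras ( $tree->look_down('_tag', 'a', sub { ($_[0]->attr('href') || '') =~ /^chapter_/ }) )
{
    $file = $paras->attr('href');
    chomp $file;
    $localfile = $locdir.'/'.$file;
    $remotefile = $remotedir.$file;
    $uri->path($remotefile);
    getstore($uri->as_string, $localfile);
}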
$tree->delete; # clear memory!


# now merge files

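my $filename;
# Sort on the number after "chapter_" so the chapters are merged in order.
open(my $dirfh, '-|', "ls $locdir | grep 'html\$' | sort -t_ -k2 -n")
    or die "Cannot list $locdir: $!";
while(<$dirfh>)
{
    chomp;
    $filename = $_;
    print $filename, "...\r\n";
    process_file($filename);
}
close $dirfh;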

sub process_file
{
    my $infile = shift;
    $infile = "$locdir/$infile";
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($infile);

    # The chapter text is in the <p> elements inside <div id="contTxt">.
    foreach my $divs ( $tree->look_down(_tag => 'div', 'id' => 'contTxt') )
    {
        foreach my $paras ( $divs->look_down('_tag', 'p') )
        {
            my $text = $paras->as_text;
            chomp $text;
            print $text, "\r\n";
        }
    }
    $tree->delete; # clear memory!
}
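
The merge step above relies on the shell (ls, grep, sort) to list the downloaded chapters in order. As a rough sketch of a shell-free alternative, the helper below does the same listing in pure Perl. The name sorted_chapter_files is hypothetical and not part of the original script, and the sketch assumes that the first number in each file name is the chapter number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return the .html files
# in $dir, ordered by the first number that appears in each file name.
sub sorted_chapter_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @files = grep { /\.html$/ } readdir($dh);
    closedir($dh);

    # Schwartzian transform: pair each name with its number, sort on the
    # number (falling back to the name on ties), then drop the number again.
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ (/(\d+)/ ? $1 : 0), $_ ] } @files;
}

# Usage sketch: print the ordered chapter list of a directory given
# on the command line.
if (@ARGV) {
    print "$_\n" for sorted_chapter_files($ARGV[0]);
}
```

In the script itself, the while loop over the pipe could then become a plain foreach over sorted_chapter_files($locdir), calling process_file on each name. This avoids depending on a Unix shell and on how sort treats the non-numeric "chapter_" prefix.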
