My First Python Program

Date: 2010-09-20  Source: niuniu2006t

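Yesterday a friend asked me to help pull every housing listing and its price out of the search results on SouFun (搜房网), and I said I would give it a try. It was a good chance to practice Python. The logic is simple, but because I am not familiar with the language it still took me a long time to type out. In the end I got it working; there are a few small bugs, but it should be usable. BeautifulSoup is very convenient to use, haha.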

# Python 2 script; BeautifulSoup here is the BeautifulSoup 3 package.
import sys
import urllib2

from BeautifulSoup import BeautifulSoup


def get_name_price(url):
    """Fetch one result page; return (name tags, price tags, last-page href)."""
    print(url)
    response = urllib2.urlopen(url)
    print("get response")
    html = response.read()
    # The site serves GBK-encoded pages.
    soup = BeautifulSoup(html, fromEncoding="gbk")
    print("soup complete")
    name = soup.findAll("div", {"class": "name"})
    print("get name")
    price = soup.findAll("span", {"class": "price_type"})
    print("get price")
    pager = soup.findAll("div", {"class": "searchListPage"})

    last_label = u'\u5c3e\u9875'  # link text of the "last page" button
    lasturl = ""
    if pager:
        # Look through the pager's "s4" elements for the last-page link.
        for item in pager[0].findAll(attrs={"class": "s4"}):
            link = item.find("a")
            if link is not None and link.string == last_label:
                lasturl = link["href"]
                break

    # No last-page link found: treat the current page as the only one.
    if lasturl == "":
        lasturl = url
    return (name, price, lasturl)

if len(sys.argv) < 2:
    url="http://newhouse.wf.soufun.com/house/%CE%AB%B7%BB______%D7%A1%D5%AC___________1.htm"
else:
    url=sys.argv[1]


# Strip the trailing "1.htm" to get the URL prefix shared by every page.
ptn = url[:-5]
(name, price, lasturl) = get_name_price(url)
if url != lasturl:
    # The last-page href is site-relative, so prepend the host.
    lasturl = "http://newhouse.wf.soufun.com" + lasturl
# Whatever sits between the prefix and ".htm" is the last page number.
cnt = int(lasturl[len(ptn):-4])


def write_rows(f, name, price):
    """Print each listing and append it to f as: name, price, link."""
    for name_tag, price_tag in zip(name, price):
        link = name_tag.contents[1]
        print link.string, price_tag.string, link['href']
        f.write(link.string.encode("gbk"))
        f.write(" ")
        f.write(price_tag.string.encode("gbk"))
        f.write(" ")
        f.write(link['href'].encode("gbk") + "\r\n")
        f.flush()
        print '-------------------------------'


# Despite the .xls extension, the output is space-separated GBK text.
f = open("house_price.xls", "w")
try:
    write_rows(f, name, price)  # page 1 was already fetched above
    for page in range(2, cnt + 1):
        (name, price, tem) = get_name_price(ptn + str(page) + ".htm")
        write_rows(f, name, price)
finally:
    f.close()
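
The page-count step in the script depends on slicing the URL strings, which is easy to get wrong by one character. As a rough sketch in Python 3 (the script itself targets Python 2), the same arithmetic can be isolated into a small pure function and checked without touching the network. The name `page_urls`, its parameters, and the `example.com` URLs are hypothetical and not part of the original script. The sketch assumes, as the script does, that page URLs end in `<number>.htm` and that the last-page link is site-relative.

```python
def page_urls(first_url, last_href, host):
    """Return the URLs of every result page, first to last.

    Assumes first_url ends in "1.htm" and that last_href, once joined to
    host, shares the same prefix and ends in "<last page number>.htm".
    """
    suffix = ".htm"
    prefix = first_url[:-len("1" + suffix)]  # drop the trailing "1.htm"
    # A site-relative href needs the host; an absolute one is used as-is.
    last_url = last_href if last_href.startswith("http") else host + last_href
    if not (last_url.startswith(prefix) and last_url.endswith(suffix)):
        raise ValueError("last-page URL does not match the first-page pattern")
    last_page = int(last_url[len(prefix):-len(suffix)])
    return [prefix + str(n) + suffix for n in range(1, last_page + 1)]


# Hypothetical URLs, used only to exercise the slicing.
urls = page_urls(
    "http://example.com/house/list_1.htm",
    "/house/list_3.htm",
    "http://example.com",
)
for u in urls:
    print(u)
```

Raising an error when the last-page URL does not share the first page's prefix is a design choice of this sketch: it surfaces a mismatch (for example, a differently encoded href) instead of silently computing a wrong page count.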

