A Python script to check Google rankings for a spe

Date: 2009-04-06  Source: cobrawgl

Using Python’s pycurl (cURL) and re (Regular Expression) libraries, it’s possible to write a script that will check the Google ranking of a specific domain for a specific search term.
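The core idea, pulling result links out of HTML with a regular expression and counting until the domain turns up, can be sketched without any network access. This is a minimal illustration only: SAMPLE_HTML is made-up markup, find_position is a hypothetical helper name, the pattern is a simplified one, and a plain substring test stands in for a stricter domain match. It is written for Python 3.

```python
import re

# Made-up sample markup, shaped like the '<h3 class=r><a href="...">' result links.
SAMPLE_HTML = (
    '<h3 class=r><a href="http://example.org/page">One</a></h3>'
    '<h3 class=r><a href="http://www.geekology.co.za/blog/">Two</a></h3>'
    '<h3 class=r><a href="https://example.com/">Three</a></h3>'
)

# Simplified pattern (an assumption): capture everything up to the closing quote.
RESULT_LINK = re.compile(r'<h3 class=r><a href="(https?://[^"]+)"')


def find_position(html, domain):
    """Return the 1-based position of the first result URL containing domain, or 0."""
    for position, match in enumerate(RESULT_LINK.finditer(html), start=1):
        if domain in match.group(1):
            return position
    return 0


print(find_position(SAMPLE_HTML, "geekology.co.za"))  # prints 2
```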

To check for and install Python 2.4 and the py-curl library on Mac OS X:

Follow these instructions to install MacPorts if it hasn’t been installed yet, then open a new Terminal window and enter the following command to see a listing of all installed ports:

sudo port installed

If ‘python24’ and ‘py-curl’ are not listed amongst the installed ports, install them by entering:

sudo port install python24
sudo port install py-curl

To check for and install Python 2.4 and the pycurl library on Ubuntu Linux:

Open a new Terminal window and enter the following command to install Python and the pycurl library (you’ll be notified if they’ve already been installed):

sudo apt-get install python2.4
sudo apt-get install python-pycurl
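On either platform, one way to confirm that the library is visible to the interpreter is to ask Python whether it can locate the module. The sketch below assumes Python 3 (importlib.util is not available in Python 2.4), and module_available is a hypothetical helper name.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called name can be located."""
    return importlib.util.find_spec(name) is not None


# Reports whether pycurl can be found by this interpreter.
print("pycurl available:", module_available("pycurl"))
```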

To run the rankcheck.py script:

Download Geekology’s version of this script here, or copy the code below to create your own rankcheck.py script file:

#!/usr/bin/python
"""
This script accepts Domain, Search String and Google Locale arguments, then
returns which Search String results page for the Google Locale the Domain
appears on.

Usage example:
rankcheck {domain} {searchstring} {locale}

Output example:
rankcheck geekology.co.za 'bash scripting' .co.za
 - The domain 'geekology.co.za' is listed in position 2 (page 1) for the search 'bash+scripting' on google.co.za
"""

__author__ = "Willem van Zyl ([email protected])"
__version__ = "$Revision: 1.5 $"
__date__ = "$Date: 2009/02/10 12:10:24 $"
__license__ = "GPLv3"

import sys, pycurl, re

# check if all arguments were specified and whether help was requested:
if len(sys.argv) < 4:
    if len(sys.argv) > 1 and sys.argv[1] == '--help':
        print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
        print "Check the Search String page ranking of a Domain on a specific Google Locale"
        print "\nExample: rankcheck geekology.co.za 'bash scripting' .co.za"
        print "\nReport bugs to <[email protected]>."
    else:
        print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
        print "`rankcheck --help' for more information."
    sys.exit()

# some initial setup:
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0)'
FIND_DOMAIN = sys.argv[1]
SEARCH_STRING = sys.argv[2].replace(' ', '+')
LOCALE = sys.argv[3]

# check if the locale is valid:
if LOCALE == '.co.za':
    SEARCH_COUNTRY = '&meta=cr%3DcountryZA'
elif LOCALE == '.co.uk':
    SEARCH_COUNTRY = '&meta=cr%3DcountryUK'
elif LOCALE == '.com':
    SEARCH_COUNTRY = ''
else:
    print "Only the '.com', '.co.uk' and '.co.za' locales are allowed."
    sys.exit()

ENGINE_URL = 'http://www.google' + LOCALE + '/search?q=' + SEARCH_STRING + SEARCH_COUNTRY

# define class to store result:
class RankCheck:
    def __init__(self):
        self.contents = ''

    def body_callback(self, buf):
        # pycurl calls this with each chunk of the response body
        self.contents = self.contents + buf

# instantiate curl and result objects:
rankRequest = pycurl.Curl()
rankCheck = RankCheck()

# set up curl:
rankRequest.setopt(pycurl.USERAGENT, USER_AGENT)
rankRequest.setopt(pycurl.FOLLOWLOCATION, 1)
rankRequest.setopt(pycurl.AUTOREFERER, 1)
rankRequest.setopt(pycurl.WRITEFUNCTION, rankCheck.body_callback)
rankRequest.setopt(pycurl.COOKIEFILE, '')
rankRequest.setopt(pycurl.HTTPGET, 1)
rankRequest.setopt(pycurl.REFERER, '')

# run curl, fetching the first 10 results pages (10 results per page):
for i in range(0, 10):
    rankRequest.setopt(pycurl.URL, ENGINE_URL + '&start=' + str(i * 10))
    rankRequest.perform()

# close curl:
rankRequest.close()

# collect the search results
html = rankCheck.contents
counter = 0
result = 0

# matches the opening of each result link, up to the end of its URL:
url = re.compile(r'(<h3 class=r><a href=")((https?):((//))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)')

# matches a URL that contains the domain (re.escape keeps its dots literal):
google_url_regex = re.compile(r"((https?):((//))+([\w\d:#@%/;$()~_?\+-=\\\.&])*" + re.escape(FIND_DOMAIN) + r"([\w\d:#@%/;$()~_?\+-=\\\.&])*)")

for google_result in url.finditer(html):
    # strip the 21-character '<h3 class=r><a href="' prefix to leave the URL
    this_url = google_result.group()[21:]
    counter += 1

    if google_url_regex.match(this_url):
        result = counter
        break

# show results
if result == 0:
    print " - The domain '" + FIND_DOMAIN + "' wasn't listed in the first 10 pages for the search '" + SEARCH_STRING + "' on google" + LOCALE
else:
    # positions 1-10 are on page 1, 11-20 on page 2, and so on
    print " - The domain '" + FIND_DOMAIN + "' is listed in position " + str(result) + " (page " + str(((result - 1) / 10) + 1) + ") for the search '" + SEARCH_STRING + "' on google" + LOCALE
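The paging arithmetic behind the '&start=' parameter and the reported page number can be checked in isolation. The sketch below is illustrative only: start_offsets and page_for_position are hypothetical helper names, it assumes ten results per page as the script's '&start=' steps imply, and it is written for Python 3 (hence the // integer division).

```python
RESULTS_PER_PAGE = 10  # assumption: ten results per page


def start_offsets(pages, per_page=RESULTS_PER_PAGE):
    """Return the '&start=' value for each of the first `pages` results pages."""
    return [page * per_page for page in range(pages)]


def page_for_position(position, per_page=RESULTS_PER_PAGE):
    """Map a 1-based result position to its 1-based results page."""
    return (position - 1) // per_page + 1


print(start_offsets(3))       # prints [0, 10, 20]
print(page_for_position(10))  # prints 1
print(page_for_position(11))  # prints 2
```

Subtracting one before dividing keeps positions 10, 20, 30 and so on at the end of their own page rather than pushing them onto the next one.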

Open a new Terminal window and navigate to the folder containing the script, then execute it by entering:

python ./rankcheck.py {domain} '{search string}' {locale}

… filling in the Domain, Search String and Locale that you want to check.
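How the three arguments end up in the request URL can be sketched as follows. This is a hedged illustration for Python 3, not the script's own code: build_search_url and COUNTRY_PARAMS are hypothetical names, the locale table is copied from the script, and urllib.parse.quote_plus is swapped in for the script's simple replacement of spaces with '+'.

```python
from urllib.parse import quote_plus

# Locale-to-country-parameter table, copied from the script's locale check.
COUNTRY_PARAMS = {
    ".com": "",
    ".co.uk": "&meta=cr%3DcountryUK",
    ".co.za": "&meta=cr%3DcountryZA",
}


def build_search_url(search_string, locale, page=0):
    """Build the search URL for one results page (page 0 is the first page)."""
    if locale not in COUNTRY_PARAMS:
        raise ValueError("Only the '.com', '.co.uk' and '.co.za' locales are allowed.")
    return (
        "http://www.google" + locale
        + "/search?q=" + quote_plus(search_string)
        + COUNTRY_PARAMS[locale]
        + "&start=" + str(page * 10)
    )


print(build_search_url("bash scripting", ".co.za"))
# prints http://www.google.co.za/search?q=bash+scripting&meta=cr%3DcountryZA&start=0
```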

Because the Python script file starts with ‘#!/usr/bin/python’, you’ll be able to execute it from the command line without invoking the python executable if you set execute permissions on the file:

sudo chmod 744 rankcheck.py
./rankcheck.py {domain} '{search string}' {locale}
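To see what the octal mode 744 grants, the standard library can render it in the familiar ls-style notation. This is a small Python 3 sketch; describe_mode is a hypothetical helper name and it assumes a regular file.

```python
import stat


def describe_mode(octal_mode):
    """Render permission bits for a regular file in ls-style notation."""
    return stat.filemode(stat.S_IFREG | octal_mode)


print(describe_mode(0o744))  # prints -rwxr--r--
```

The owner gets read, write and execute, while group and others get read only, which is enough for the owner to run the script directly.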