A Python script to check Google rankings for a spe
时间:2009-04-06 来源:cobrawgl
Using Python’s pycurl (cURL) and re (Regular Expression) libraries, it’s possible to write a script that will check the Google ranking of a specific domain for a specific search term.
To check for and install Python 2.4 and the py-curl library on Mac OS X:
Follow these instructions to install MacPorts if it hasn’t been installed yet, then open a new Terminal window and enter the following command to see a listing of all installed ports:
sudo port installed
If ‘python24‘ and ‘py-curl‘ are not listed amongst the installed ports, install them by entering:
sudo port install python24 sudo port install py-curl
To check for and install Python 2.4 and the pycurl library on Ubuntu Linux:
Open a new Terminal window and enter the following command to install Python and the pycurl library (you’ll be notified if they’ve already been installed):
sudo apt-get install python2.4 sudo apt-get install python-pycurl
To run the rankcheck.py script:
Download Geekology’s version of this script here, or copy the code below to create your own rankcheck.py script file:
#!/usr/bin/python """ This script accepts Domain, Search String and Google Locale arguments, then returns which Search String results page for the Google Locale the Domain appears on. Usage example: rankcheck {domain} {searchstring} {locale} Output example: rankcheck geekology.co.za 'bash scripting' .co.za - The domain 'geekology.co.za' is listed in position 2 (page 1) for the search 'bash+scripting' on google.co.za """ __author__ = "Willem van Zyl ([email protected])" __version__ = "$Revision: 1.5 $" __date__ = "$Date: 2009/02/10 12:10:24 $" __license__ = "GPLv3" import sys, pycurl, re # check if all arguments were specified and whether help was requested: if len(sys.argv) < 4: if len(sys.argv) == 1: print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"; print "`rankcheck --help' for more information." sys.exit() elif sys.argv[1] == '--help': print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE" print "Check the Search String page ranking of a Domain on a specific Google Locale" print "\nExample: rankcheck geekology.co.za 'bash scripting' .co.za" print "\nReport bugs to <[email protected]>." sys.exit() else: print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"; print "`rankcheck --help' for more information." sys.exit() # some initial setup: USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0)' FIND_DOMAIN = sys.argv[1] SEARCH_STRING = sys.argv[2].replace(' ', '+') LOCALE = sys.argv[3] # check if the locale is valid: if sys.argv[3] == '.co.za': SEARCH_COUNTRY = '&meta=cr%3DcountryZA' elif sys.argv[3] == '.co.uk': SEARCH_COUNTRY = '&meta=cr%3DcountryUK' elif sys.argv[3] == '.com': SEARCH_COUNTRY = '' else: print "Only the '.com', '.co.uk' and '.co.za' locales are allowed." sys.exit() ENGINE_URL = 'http://www.google' + LOCALE + '/search?q=' + SEARCH_STRING + SEARCH_COUNTRY # define class to store result: class RankCheck: def __init__(self): self.contents = '' def body_callback(self, buf): self.contents = self.contents + buf # instantiate curl and result objects: rankRequest = pycurl.Curl() rankCheck = RankCheck(); # set up curl: rankRequest.setopt(pycurl.USERAGENT, USER_AGENT) rankRequest.setopt(pycurl.FOLLOWLOCATION, 1) rankRequest.setopt(pycurl.AUTOREFERER, 1) rankRequest.setopt(pycurl.WRITEFUNCTION, rankCheck.body_callback) rankRequest.setopt(pycurl.COOKIEFILE, '') rankRequest.setopt(pycurl.HTTPGET, 1) rankRequest.setopt(pycurl.REFERER, '') # run curl: for i in range(0, 10): rankRequest.setopt(pycurl.URL, ENGINE_URL + '&start=' + str(i * 10)) rankRequest.perform() # close curl: rankRequest.close() # collect the search results html = rankCheck.contents counter = 0 result = 0 url=unicode(r'(<h3 class=r><a href=")((https?):((//))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)') for google_result in re.finditer(url, html): # print m.group() this_url = google_result.group() this_url = this_url[21:] counter += 1 google_url_regex = re.compile("((https?):((//))+([\w\d:#@%/;$()~_?\+-=\\\.&])*" + FIND_DOMAIN + "+([\w\d:#@%/;$()~_?\+-=\\\.&])*)") google_url_regex_result = google_url_regex.match(this_url) if google_url_regex_result: result = counter break # show results if result == 0: print " - The domain '" + FIND_DOMAIN + "' wasn't listed in the first 10 pages for the search '" + SEARCH_STRING + "' on google" + LOCALE else: print " - The domain '" + FIND_DOMAIN + "' is listed in position " + str(result) + " (page " + str((result / 10) + 1) + ") for the search '" + SEARCH_STRING + "' on google" + LOCALE
Open a new Terminal window and navigate to the folder containing the script, then execute it by entering:
python ./rankcheck.py {domain} '{search string}' {locale}
… filling in the Domain, Search String and Locale that you want to check.
Because the Python script file starts with ‘#!/usr/bin/python‘, you’ll be able to execute it from the command line without invoking the python executeable if you set execute permissions on the file:
sudo chmod 744 rankcheck.py ./rankcheck.py {domain} '{search string}' {locale}