urlgrabber
Date: 2006-10-20 Source: linxh
http://linux.duke.edu/projects/urlgrabber/
urlgrabber is a pure Python package that drastically simplifies the fetching of files. It is designed to be used in programs that need common (but not necessarily simple) URL-fetching features. It is extremely simple to drop into an existing program and provides a clean interface to protocol-independent file access. Best of all, urlgrabber takes care of all those pesky file-fetching details, and lets you focus on whatever it is that your program is written to do!
urlgrabber came into existence as the part of yum that downloads rpms and header files, but it quickly became clear that this is a general problem that many applications must deal with.
Features
Using urlgrabber, data can be fetched in three basic ways:
urlgrab(url) | copy the file to the local filesystem
urlopen(url) | open the remote file and return a file object
urlread(url) | return the contents of the file as a string
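The three entry points above can be exercised side by side. The following is a minimal sketch, assuming the urlgrabber package is installed and importable; the helper name fetch_three_ways and its arguments are hypothetical and exist only for illustration. The exact return types (text or bytes) depend on the urlgrabber and Python versions in use.

```python
def fetch_three_ways(url, dest_path):
    """Fetch `url` once with each of urlgrabber's three basic functions.

    Hypothetical helper for illustration; not part of urlgrabber itself.
    """
    # Imported inside the function so this sketch can be loaded even on
    # a machine where urlgrabber is not installed.
    from urlgrabber import urlgrab, urlopen, urlread

    # 1. urlgrab: copy the file to the local filesystem.
    #    It returns the name of the local file.
    local_name = urlgrab(url, dest_path)

    # 2. urlopen: open the remote file and read from the returned
    #    file object incrementally (here, only the first 16 bytes).
    remote = urlopen(url)
    try:
        head = remote.read(16)
    finally:
        remote.close()

    # 3. urlread: return the whole contents of the file in one call.
    contents = urlread(url)

    return local_name, head, contents
```

A call might look like fetch_three_ways("http://example.com/file.txt", "file.txt"), where both the URL and the destination name are placeholders. Because behavior is meant to be identical across protocols, the same call should work with an ftp:// or file:// URL.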
When using these functions (or methods), urlgrabber supports the following features:
- identical behavior for http://, ftp://, and file:// urls
- http keepalive - faster downloads of many files by using only a single connection
- byte ranges - fetch only a portion of the file
- reget - for a urlgrab, resume a partial download
- progress meters - the ability to report download progress automatically, even when using urlopen!
- throttling - restrict bandwidth usage
- batched downloads using threads - download multiple files simultaneously (feature still in progress)
- retries - automatically retry a download if it fails. The number of retries and failure types are configurable
- authenticated server access for http and ftp
- proxy support - support for authenticated http and ftp proxies
- mirror groups - treat a list of mirrors as a single source, automatically switching mirrors if there is a failure
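To make the retry and mirror-group ideas from the list above concrete, here is a minimal sketch in plain Python. It is not urlgrabber's implementation and does not use its API: the function fetch_from_mirrors, its parameters, and the pluggable fetch callable are all assumptions made for illustration. The idea is simply to treat a list of mirrors as one source, retry each mirror a configurable number of times, and move to the next mirror on failure.

```python
import time


def fetch_from_mirrors(mirrors, path, fetch, retries=2, delay=0.0):
    """Return (url, data) from the first mirror that serves `path`.

    `fetch` is any callable that takes a full URL and returns the data,
    raising OSError on failure. Each mirror gets `retries` extra
    attempts before the next mirror is tried.
    """
    last_error = None
    for base in mirrors:
        # Join the mirror base and the relative path with exactly one slash.
        url = base.rstrip("/") + "/" + path.lstrip("/")
        for attempt in range(retries + 1):
            try:
                return url, fetch(url)
            except OSError as err:
                last_error = err
                # Pause before retrying the same mirror, but not
                # before switching to the next one.
                if attempt < retries:
                    time.sleep(delay)
    raise OSError("all mirrors failed for %r: %s" % (path, last_error))
```

Passing the fetch function in as a parameter keeps the failover logic independent of the transport, so it can be tried out with a fake fetcher before being wired to a real downloader.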
Not sure if urlgrabber is the tool for you? Check out our comparison of the major options.
Documentation, Examples, and Help
There are many sources of urlgrabber-related assistance and information:
- The urlgrabber package documentation, built from the __doc__ strings using pydoc
- The examples page
- The urlgrabber package contents including the source code
- Browsable urlgrabber CVS (synced every few hours)
- The yum-devel mailing list. For now, urlgrabber is piggy-backing on this list. If it becomes necessary, we will get our own list. When posting to this list, please indicate that it is a urlgrabber-related post by beginning the subject with [UG].
Authors and Credits
urlgrabber is written and maintained by Michael Stenner and Ryan Tomayko. We would like to thank Seth Vidal for many valuable ideas and suggestions, and also for the very earliest version of the code that became urlgrabber. We would also like to thank Linux@DUKE and Duke University for the resources they have provided.
All urlgrabber-related mail (questions, comments, requests, bug reports, praise) should be directed to the yum-devel mailing list. Please indicate that it is a urlgrabber-related post by beginning the subject with [UG].
License and Copyright
urlgrabber is © 2002-2006 Michael D. Stenner and Ryan Tomayko.
This software is licensed under the GNU LGPL and comes without any warranty, written or implied. For more information about the GNU LGPL, please see http://www.gnu.org/licenses/lgpl.html.
Download
Release information:
urlgrabber follows kernel-style version numbering. As such, the 3.0.x series is the current "stable" branch, and 3.1.x is considered development.
We are no longer providing RPMs for urlgrabber for two reasons:
- It's too much pain to build them for all the relevant systems. We just don't have access to them.
- It's really easy to build an RPM from the tarball. Simply unpack the tarball, cd into it, and run:
python setup.py bdist_rpm
Name         : urlgrabber
Relocations  : (not relocatable)
Version      : 2.9.5
Vendor       : Michael D. Stenner, Ryan Tomayko
Release      : 1
Build Date   : Wed 02 Mar 2005 07:54:58 PM EST
Install Date : (not installed)
Build Host   : bird.ece.arizona.edu
Group        : Development/Libraries
Source RPM   : (none)
Size         : 74288
License      : LGPL
Signature    : (none)
URL          : http://linux.duke.edu/projects/urlgrabber/
Summary      : A high-level cross-protocol url-grabber
Description  :
A high-level cross-protocol url-grabber.

Using urlgrabber, data can be fetched in three basic ways:

  urlgrab(url) copy the file to the local filesystem
  urlopen(url) open the remote file and return a file object (like urllib2.urlopen)
  urlread(url) return the contents of the file as a string

When using these functions (or methods), urlgrabber supports the following features:

  * identical behavior for http://, ftp://, and file:// urls
  * http keepalive - faster downloads of many files by using only a single connection
  * byte ranges - fetch only a portion of the file
  * reget - for a urlgrab, resume a partial download
  * progress meters - the ability to report download progress automatically, even when using urlopen!
  * throttling - restrict bandwidth usage
  * retries - automatically retry a download if it fails. The number of retries and failure types are configurable.
  * authenticated server access for http and ftp
  * proxy support - support for authenticated http and ftp proxies
  * mirror groups - treat a list of mirrors as a single source, automatically switching mirrors if there is a failure.