Zero-copy user-space access
Date: 2006-05-13 Source: rwen2012
[Posted April 14, 2003 by corbet]
This article is part of the LWN Porting Drivers to 2.6 series.
This article looks at how to port drivers which used the kiobuf interface in 2.4. We'll proceed on the assumption that the real feature of interest was direct access to user space; there wasn't much motivation to use a kiobuf otherwise.
Zero-copy block I/O
The 2.6 kernel has a well-developed direct I/O capability for block devices. So, in general, it will not be necessary for block driver writers to do anything to implement direct I/O themselves. It all "just works."
Should you have a need to perform zero-copy block operations, it's worth noting the presence of a useful helper function:
struct bio *bio_map_user(struct block_device *bdev,
                         unsigned long uaddr,
                         unsigned int len,
                         int write_to_vm);
This function will return a BIO describing a direct operation to the given block device bdev. The parameters uaddr and len describe the user-space buffer to be transferred; callers must check the returned BIO, however, since the area actually mapped might be smaller than what was requested. The write_to_vm flag is set if the operation will change memory; that is, if it is a read-from-disk operation. The returned BIO (which can be NULL, so check it) is ready for submission to the appropriate device driver.
When the operation is complete, undo the mapping with:
void bio_unmap_user(struct bio *bio, int write_to_vm);
Mapping user-space pages
If you have a char driver which needs direct user-space access (a high-performance streaming tape driver, say), then you'll want to map user-space pages yourself. The modern equivalent of map_user_kiobuf() is a function called get_user_pages():
int get_user_pages(struct task_struct *task,
                   struct mm_struct *mm,
                   unsigned long start,
                   int len,
                   int write,
                   int force,
                   struct page **pages,
                   struct vm_area_struct **vmas);
task is the process performing the mapping; the primary purpose of this argument is to say who gets charged for page faults incurred while mapping the pages. This parameter is almost always passed as current. The memory management structure for the user's address space is passed in the mm parameter; it is usually current->mm. Note that get_user_pages() expects that the caller will have a read lock on mm->mmap_sem. The start and len parameters describe the user buffer to be mapped; note that len is counted in pages, not bytes. If the memory will be written to, write should be non-zero. The force flag forces read or write access, even if the current page protection would otherwise not allow that access. The pages array (which should be big enough to hold len entries) will be filled with pointers to the page structures for the user pages. If vmas is non-NULL, it will be filled with a pointer to the vm_area_struct structure containing each page.
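Since len is counted in pages while a driver's read() or write() method usually receives a byte count, some arithmetic is needed before the call. The following is a minimal user-space sketch of that calculation, not code from the kernel: the SKETCH_ constants assume 4 KiB pages (in the kernel, PAGE_SHIFT and PAGE_SIZE come from <asm/page.h>), and the function names are hypothetical.

```c
#include <stddef.h>

/* Assumed page geometry: 4 KiB pages. These stand in for the kernel's
 * PAGE_SHIFT and PAGE_SIZE so the arithmetic can be run in user space. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Number of pages touched by the byte range [uaddr, uaddr + count).
 * This is the value a caller would pass as len, and the number of
 * entries the pages array must be able to hold. */
unsigned long sketch_pages_spanned(unsigned long uaddr, size_t count)
{
    unsigned long first, last;

    if (count == 0)
        return 0;
    first = uaddr >> SKETCH_PAGE_SHIFT;               /* page of first byte */
    last = (uaddr + count - 1) >> SKETCH_PAGE_SHIFT;  /* page of last byte */
    return last - first + 1;
}

/* Offset of the buffer's first byte within its first page. */
unsigned long sketch_first_offset(unsigned long uaddr)
{
    return uaddr & (SKETCH_PAGE_SIZE - 1);
}
```

Under these assumptions, a 32-byte buffer starting at address 0x1ff0 straddles a page boundary, so it needs two entries in the pages array even though it is far smaller than a page.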
The return value is the number of pages actually mapped, or a negative error code if something goes wrong. Assuming things worked, the user pages will be present (and locked) in memory, and can be accessed by way of the struct page pointers. Be aware, of course, that some or all of the pages could be in high memory.
There is no equivalent put_user_pages() function, so callers of get_user_pages() must perform the cleanup themselves. There are two things that need to be done: marking of modified pages, and releasing them from the page cache. If your device modified the user pages, the virtual memory subsystem may not know about it, and may fail to write the pages to permanent storage (or swap). That, of course, could lead to data corruption and grumpy users. The way to avoid this problem is to call:
SetPageDirty(struct page *page);
for each page in the mapping. Current (2.6.3) kernel code checks to ensure that pages are not reserved first with code like:
if (!PageReserved(page))
    SetPageDirty(page);
But pages mapped from user space should not, normally, be marked reserved in the first place.
Finally, every mapped page must be released from the page cache, or it will stay there forever; simply pass each page structure to:
void page_cache_release(struct page *page);
After you have released the page, of course, you should not access it again.
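The two cleanup steps described above can be combined into a single loop over the pages array. What follows is a user-space mock-up rather than kernel code: struct page, PageReserved(), SetPageDirty() and page_cache_release() are simplified stand-ins defined here so that the loop can be compiled and run outside the kernel, and sketch_release_user_pages() is a hypothetical helper name. In a real driver the kernel's own definitions would be used instead, and nr_pages would be the count returned by get_user_pages().

```c
#include <stddef.h>

/* Stand-in for the kernel's struct page: just enough state to observe
 * what the cleanup loop does. Hypothetical, not the real layout. */
struct page {
    int reserved;   /* models what PageReserved() reports */
    int dirty;      /* models the flag set by SetPageDirty() */
    int refcount;   /* models the reference dropped on release */
};

/* User-space stand-ins for the kernel helpers named in the text. */
int PageReserved(const struct page *page) { return page->reserved; }
void SetPageDirty(struct page *page) { page->dirty = 1; }
void page_cache_release(struct page *page) { page->refcount--; }

/*
 * Hypothetical helper: undo a mapping of nr_pages pages.
 * dirtied is non-zero if the device wrote into the user pages.
 */
void sketch_release_user_pages(struct page **pages, int nr_pages, int dirtied)
{
    int i;

    for (i = 0; i < nr_pages; i++) {
        /* Tell the VM the page changed, skipping reserved pages. */
        if (dirtied && !PageReserved(pages[i]))
            SetPageDirty(pages[i]);
        /* Release the page; it must not be accessed after this. */
        page_cache_release(pages[i]);
    }
}
```

The dirty marking is done before the release so that the page is never touched once it has been given back.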
For a good example of how to use get_user_pages() in a char driver, see the definition of sgl_map_user_pages() in drivers/scsi/st.c.
Driver porting: Zero-copy user-space access
This seems to rule out performing DMA directly from user space but I would like to be told that I'm wrong.