
Linux那些事儿之我是Block层 (11): The Legendary Memory Mapping (Part 1)

Date: 2010-08-09    Source: victorzhangl

"If we get the chance to shake hands with the central leaders, could they please not wash it off? That way, when they come back and shake hands with us, it will be as if the leaders had shaken our hands too." On October 17, 2007, Huang Jinlian, principal of the Sanming Special Education School in Fujian and a delegate to the 17th Party Congress, relayed this request from her students. The internet mob heaped ridicule and scorn on the episode, but I think that was quite unnecessary. The students' idea looks naive, yet it embodies a profound concept: the concept of memory mapping in Linux.

In Linux we constantly meet this situation: here a user-space buffer, there a kernel-space buffer; one belongs to an application, the other to a device driver. They have no connection. They are forever mentioned in the same breath, yet forever brush past each other, like a bird in the sky and a fish in the water: perhaps they could fall in love, but where would they build their nest? The answer is mapping: two seemingly unconnected worlds become related through a mapping. But why connect them at all? If the user buffer is the students in the story and the kernel buffer is Principal Huang, you see at once that the students want to shake Principal Huang's hand not because she has any star quality, but because she shook hands with the central leaders. So who plays the central leaders here? Think for a second: what is a device driver for? Driving a device. Exactly: the real protagonist is not the device driver but the device. An application is willing to map its user buffer to a kernel buffer precisely because the kernel buffer is connected with the device itself. To shake hands with the kernel buffer is to shake hands with the device.

Let us take two block-layer functions as examples: blk_rq_map_user and blk_rq_map_kern, both from block/ll_rw_blk.c. When we analyzed the sd module and its ioctl path, what we ultimately called was sg_io(), and sg_io() needs to call blk_rq_map_user(), so we look at that function first.

2394 /**
2395  * blk_rq_map_user - map user data to a request, for REQ_BLOCK_PC usage
2396  * @q:      request queue where request should be inserted
2397  * @rq:     request structure to fill
2398  * @ubuf:   the user buffer
2399  * @len:    length of user data
2400  *
2401  * Description:
2402  *    Data will be mapped directly for zero copy io, if possible. Otherwise
2403  *    a kernel bounce buffer is used.
2404  *
2405  *    A matching blk_rq_unmap_user() must be issued at the end of io, while
2406  *    still in process context.
2407  *
2408  *    Note: The mapped bio may need to be bounced through blk_queue_bounce()
2409  *    before being submitted to the device, as pages mapped may be out of
2410  *    reach. It's the callers responsibility to make sure this happens. The
2411  *    original bio must be passed back in to blk_rq_unmap_user() for proper
2412  *    unmapping.
2413  */
2414 int blk_rq_map_user(request_queue_t *q, struct request *rq, void __user *ubuf,
2415                     unsigned long len)
2416 {
2417     unsigned long bytes_read = 0;
2418     struct bio *bio = NULL;
2419     int ret;
2420
2421     if (len > (q->max_hw_sectors << 9))
2422         return -EINVAL;
2423     if (!len || !ubuf)
2424         return -EINVAL;
2425
2426     while (bytes_read != len) {
2427         unsigned long map_len, end, start;
2428
2429         map_len = min_t(unsigned long, len - bytes_read, BIO_MAX_SIZE);
2430         end = ((unsigned long)ubuf + map_len + PAGE_SIZE - 1)
2431                                                         >> PAGE_SHIFT;
2432         start = (unsigned long)ubuf >> PAGE_SHIFT;
2433
2434         /*
2435          * A bad offset could cause us to require BIO_MAX_PAGES + 1
2436          * pages. If this happens we just lower the requested
2437          * mapping len by a page so that we can fit
2438          */
2439         if (end - start > BIO_MAX_PAGES)
2440             map_len -= PAGE_SIZE;
2441
2442         ret = __blk_rq_map_user(q, rq, ubuf, map_len);
2443         if (ret < 0)
2444             goto unmap_rq;
2445         if (!bio)
2446             bio = rq->bio;
2447         bytes_read += ret;
2448         ubuf += ret;
2449     }
2450
2451     rq->buffer = rq->data = NULL;
2452     return 0;
2453 unmap_rq:
2454     blk_rq_unmap_user(bio);
2455     return ret;
2456 }

The parameter ubuf is none other than the buffer handed down from user space, the user-space buffer, and len is the length of that buffer.
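Before going inside, it helps to see the calling convention from above. The following is a minimal sketch, not the real sg_io() source, of how a 2.6.22-era caller typically drives blk_rq_map_user(); the function my_send_command() and its parameters are hypothetical, and the actual SCSI command setup is elided:

    #include <linux/blkdev.h>

    static int my_send_command(request_queue_t *q, struct gendisk *disk,
                               void __user *ubuf, unsigned long len)
    {
        struct request *rq;
        struct bio *bio;
        int err;

        rq = blk_get_request(q, READ, GFP_KERNEL);   /* may sleep */
        if (!rq)
            return -ENOMEM;
        rq->cmd_type = REQ_TYPE_BLOCK_PC;   /* a "packet command" request */
        /* ... a real caller would now fill rq->cmd[], rq->cmd_len,
         *     rq->timeout and friends ... */

        err = blk_rq_map_user(q, rq, ubuf, len);     /* the mapping we study */
        if (err)
            goto out;

        bio = rq->bio;                  /* remember the original bio */
        blk_execute_rq(q, disk, rq, 0); /* run the command and wait */
        err = blk_rq_unmap_user(bio);   /* undo the mapping */
    out:
        blk_put_request(rq);
        return err;
    }

Note how the bio created by the mapping is saved before the request executes and handed back to blk_rq_unmap_user() afterwards, exactly as the comment block above demands.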
Perhaps we should have talked about struct bio long ago. This is without question one of the most fundamental, most central, most dashing, most stylish, coolest structures in the generic block layer. It represents one block-device I/O operation in flight. Every classic Linux book describes this structure in loving detail, but as members of the post-80s generation we refuse to follow the crowd and must express our own personality, so we will not dwell on it here. We will only tell you that it comes from include/linux/bio.h:

 68 /*
 69  * main unit of I/O for the block layer and lower layers (ie drivers and
 70  * stacking drivers)
 71  */
 72 struct bio {
 73     sector_t             bi_sector;   /* device address in 512 byte
 74                                          sectors */
 75     struct bio           *bi_next;    /* request queue link */
 76     struct block_device  *bi_bdev;
 77     unsigned long        bi_flags;    /* status, command, etc */
 78     unsigned long        bi_rw;       /* bottom bits READ/WRITE,
 79                                        * top bits priority
 80                                        */
 81
 82     unsigned short       bi_vcnt;     /* how many bio_vec's */
 83     unsigned short       bi_idx;      /* current index into bvl_vec */
 84
 85     /* Number of segments in this BIO after
 86      * physical address coalescing is performed.
 87      */
 88     unsigned short       bi_phys_segments;
 89
 90     /* Number of segments after physical and DMA remapping
 91      * hardware coalescing is performed.
 92      */
 93     unsigned short       bi_hw_segments;
 94
 95     unsigned int         bi_size;     /* residual I/O count */
 96
 97     /*
 98      * To keep track of the max hw size, we account for the
 99      * sizes of the first and last virtually mergeable segments
100      * in this bio
101      */
102     unsigned int         bi_hw_front_size;
103     unsigned int         bi_hw_back_size;
104
105     unsigned int         bi_max_vecs; /* max bvl_vecs we can hold */
106
107     struct bio_vec       *bi_io_vec;  /* the actual vec list */
108
109     bio_end_io_t         *bi_end_io;
110     atomic_t             bi_cnt;      /* pin count */
111
112     void                 *bi_private;
113
114     bio_destructor_t     *bi_destructor; /* destructor */
115 };

And it does not exist in isolation: it is tied to the request. struct request has a member struct bio *bio, representing that request's bios, since one request can contain several I/O operations. The main job of blk_rq_map_user() is to establish the mapping between the user buffer and a bio; the concrete work is left to __blk_rq_map_user().
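Before we read __blk_rq_map_user(), a short aside on how a finished bio gets consumed. The members bi_io_vec and bi_vcnt (we will meet struct bio_vec properly in a moment) describe the data segments, and include/linux/bio.h provides the bio_for_each_segment() macro to walk them. A minimal sketch, with my_dump_bio() being our own hypothetical name:

    #include <linux/bio.h>

    /* Hypothetical helper: print every segment of a bio.  In 2.6.22,
     * bio_for_each_segment(bvl, bio, i) walks bi_io_vec from bi_idx
     * up to bi_vcnt. */
    static void my_dump_bio(struct bio *bio)
    {
        struct bio_vec *bvec;
        int i;

        bio_for_each_segment(bvec, bio, i)
            printk(KERN_DEBUG "seg %d: page %p, %u bytes at offset %u\n",
                   i, bvec->bv_page, bvec->bv_len, bvec->bv_offset);
    }

With that, on to __blk_rq_map_user(), also from block/ll_rw_blk.c: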
2341 static int __blk_rq_map_user(request_queue_t *q, struct request *rq,
2342                              void __user *ubuf, unsigned int len)
2343 {
2344     unsigned long uaddr;
2345     struct bio *bio, *orig_bio;
2346     int reading, ret;
2347
2348     reading = rq_data_dir(rq) == READ;
2349
2350     /*
2351      * if alignment requirement is satisfied, map in user pages for
2352      * direct dma. else, set up kernel bounce buffers
2353      */
2354     uaddr = (unsigned long) ubuf;
2355     if (!(uaddr & queue_dma_alignment(q)) && !(len & queue_dma_alignment(q)))
2356         bio = bio_map_user(q, NULL, uaddr, len, reading);
2357     else
2358         bio = bio_copy_user(q, uaddr, len, reading);
2359
2360     if (IS_ERR(bio))
2361         return PTR_ERR(bio);
2362
2363     orig_bio = bio;
2364     blk_queue_bounce(q, &bio);
2365
2366     /*
2367      * We link the bounce buffer in and could have to traverse it
2368      * later so we have to get a ref to prevent it from being freed
2369      */
2370     bio_get(bio);
2371
2372     if (!rq->bio)
2373         blk_rq_bio_prep(q, rq, bio);
2374     else if (!ll_back_merge_fn(q, rq, bio)) {
2375         ret = -EINVAL;
2376         goto unmap_bio;
2377     } else {
2378         rq->biotail->bi_next = bio;
2379         rq->biotail = bio;
2380
2381         rq->data_len += bio->bi_size;
2382     }
2383
2384     return bio->bi_size;
2385
2386 unmap_bio:
2387     /* if it was boucned we must call the end io function */
2388     bio_endio(bio, bio->bi_size, 0);
2389     __blk_rq_unmap_user(orig_bio);
2390     bio_put(bio);
2391     return ret;
2392 }

So far, though, bio is nothing but an airy pointer, all style and no substance. Who allocated memory for it? Let us dig deeper. The next function to look at is bio_map_user(). uaddr is the virtual address of ubuf; if it satisfies the DMA alignment requirement of the queue, bio_map_user() is called. (Otherwise bio_copy_user() is called to set up a so-called bounce buffer, which we will not cover here.) The function comes from fs/bio.c:

713 /**
714  * bio_map_user - map user address into bio
715  * @q: the request_queue_t for the bio
716  * @bdev: destination block device
717  * @uaddr: start of user address
718  * @len: length in bytes
719  * @write_to_vm: bool indicating writing to pages or not
720  *
721  *  Map the user space address into a bio suitable for io to a block
722  *  device. Returns an error pointer in case of error.
723  */
724 struct bio *bio_map_user(request_queue_t *q, struct block_device *bdev,
725                          unsigned long uaddr, unsigned int len, int write_to_vm)
726 {
727     struct sg_iovec iov;
728
729     iov.iov_base = (void __user *)uaddr;
730     iov.iov_len = len;
731
732     return bio_map_user_iov(q, bdev, &iov, 1, write_to_vm);
733 }

Here struct sg_iovec should look familiar. Think back: we met this structure when covering ioctl in sd; it describes one element of a scatter-gather array. iovec means io vector, i.e., a structure made of a base address and a length.

The comment explains each parameter clearly, as well as the purpose of the function: it returns a bio pointer describing one I/O operation. The real work, however, is done by bio_map_user_iov(), so we march on to it, also in fs/bio.c:

735 /**
736  * bio_map_user_iov - map user sg_iovec table into bio
737  * @q: the request_queue_t for the bio
738  * @bdev: destination block device
739  * @iov: the iovec.
740  * @iov_count: number of elements in the iovec
741  * @write_to_vm: bool indicating writing to pages or not
742  *
743  *  Map the user space address into a bio suitable for io to a block
744  *  device. Returns an error pointer in case of error.
745  */
746 struct bio *bio_map_user_iov(request_queue_t *q, struct block_device *bdev,
747                              struct sg_iovec *iov, int iov_count,
748                              int write_to_vm)
749 {
750     struct bio *bio;
751
752     bio = __bio_map_user_iov(q, bdev, iov, iov_count, write_to_vm);
753
754     if (IS_ERR(bio))
755         return bio;
756
757     /*
758      * subtle -- if __bio_map_user() ended up bouncing a bio,
759      * it would normally disappear when its bi_end_io is run.
760      * however, we need it for the unmap, so grab an extra
761      * reference to it
762      */
763     bio_get(bio);
764
765     return bio;
766 }

Still not the end of the line; we continue into __bio_map_user_iov():

603 static struct bio *__bio_map_user_iov(request_queue_t *q,
604                                       struct block_device *bdev,
605                                       struct sg_iovec *iov, int iov_count,
606                                       int write_to_vm)
607 {
608     int i, j;
609     int nr_pages = 0;
610     struct page **pages;
611     struct bio *bio;
612     int cur_page = 0;
613     int ret, offset;
614
615     for (i = 0; i < iov_count; i++) {
616         unsigned long uaddr = (unsigned long)iov[i].iov_base;
617         unsigned long len = iov[i].iov_len;
618         unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
619         unsigned long start = uaddr >> PAGE_SHIFT;
620
621         nr_pages += end - start;
622         /*
623          * buffer must be aligned to at least hardsector size for now
624          */
625         if (uaddr & queue_dma_alignment(q))
626             return ERR_PTR(-EINVAL);
627     }
628
629     if (!nr_pages)
630         return ERR_PTR(-EINVAL);
631
632     bio = bio_alloc(GFP_KERNEL, nr_pages);
633     if (!bio)
634         return ERR_PTR(-ENOMEM);
635
636     ret = -ENOMEM;
637     pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
638     if (!pages)
639         goto out;
640
641     for (i = 0; i < iov_count; i++) {
642         unsigned long uaddr = (unsigned long)iov[i].iov_base;
643         unsigned long len = iov[i].iov_len;
644         unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
645         unsigned long start = uaddr >> PAGE_SHIFT;
646         const int local_nr_pages = end - start;
647         const int page_limit = cur_page + local_nr_pages;
648
649         down_read(&current->mm->mmap_sem);
650         ret = get_user_pages(current, current->mm, uaddr,
651                              local_nr_pages,
652                              write_to_vm, 0, &pages[cur_page], NULL);
653         up_read(&current->mm->mmap_sem);
654
655         if (ret < local_nr_pages) {
656             ret = -EFAULT;
657             goto out_unmap;
658         }
659
660         offset = uaddr & ~PAGE_MASK;
661         for (j = cur_page; j < page_limit; j++) {
662             unsigned int bytes = PAGE_SIZE - offset;
663
664             if (len <= 0)
665                 break;
666
667             if (bytes > len)
668                 bytes = len;
669
670             /*
671              * sorry...
672              */
673             if (bio_add_pc_page(q, bio, pages[j], bytes, offset) <
674                 bytes)
675                 break;
676
677             len -= bytes;
678             offset = 0;
679         }
680
681         cur_page = j;
682         /*
683          * release the pages we didn't map into the bio, if any
684          */
685         while (j < page_limit)
686             page_cache_release(pages[j++]);
687     }
688
689     kfree(pages);
690
691     /*
692      * set data direction, and check if mapped pages need bouncing
693      */
694     if (!write_to_vm)
695         bio->bi_rw |= (1 << BIO_RW);
696
697     bio->bi_bdev = bdev;
698     bio->bi_flags |= (1 << BIO_USER_MAPPED);
699     return bio;
700
701  out_unmap:
702     for (i = 0; i < nr_pages; i++) {
703         if(!pages[i])
704             break;
705         page_cache_release(pages[i]);
706     }
707  out:
708     kfree(pages);
709     bio_put(bio);
710     return ERR_PTR(ret);
711 }

And where did the bio itself get its memory? Back at line 632 it was brought into this world by bio_alloc(), also from fs/bio.c:

187 struct bio *bio_alloc(gfp_t gfp_mask, int nr_iovecs)
188 {
189     struct bio *bio = bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
190
191     if (bio)
192         bio->bi_destructor = bio_fs_destructor;
193
194     return bio;
195 }

Which is really just a call to bio_alloc_bioset(), from the same file:

147 /**
148  * bio_alloc_bioset - allocate a bio for I/O
149  * @gfp_mask:   the GFP_ mask given to the slab allocator
150  * @nr_iovecs:  number of iovecs to pre-allocate
151  * @bs:         the bio_set to allocate from
152  *
153  * Description:
154  *   bio_alloc_bioset will first try it's on mempool to satisfy the allocation.
155  *   If %__GFP_WAIT is set then we will block on the internal pool waiting
156  *   for a &struct bio to become free.
157  *
158  *   allocate bio and iovecs from the memory pools specified by the
159  *   bio_set structure.
160  **/
161 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
162 {
163     struct bio *bio = mempool_alloc(bs->bio_pool, gfp_mask);
164
165     if (likely(bio)) {
166         struct bio_vec *bvl = NULL;
167
168         bio_init(bio);
169         if (likely(nr_iovecs)) {
170             unsigned long idx = 0; /* shut up gcc */
171
172             bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
173             if (unlikely(!bvl)) {
174                 mempool_free(bio, bs->bio_pool);
175                 bio = NULL;
176                 goto out;
177             }
178             bio->bi_flags |= idx << BIO_POOL_OFFSET;
179             bio->bi_max_vecs = bvec_slabs[idx].nr_vecs;
180         }
181         bio->bi_io_vec = bvl;
182     }
183 out:
184     return bio;
185 }

By now the picture is basically clear. mempool_alloc tells us plainly that memory has been allocated for the bio, and bio_init() then initializes it. We will skip the remaining details; the one thing worth tracking is nr_iovecs, passed all the way down: __bio_map_user_iov() handed nr_pages to bio_alloc(), and lines 615 to 627 computed nr_pages with a for loop that runs iov_count times, each pass accumulating the difference between end and start. Clearly the final nr_pages is the number of pages covered by the iov array, iov being the third parameter of __bio_map_user_iov(); equally clearly, iov_count is the number of elements in that array, and since bio_map_user() passes 1 for iov_count when it calls bio_map_user_iov(), iov_count here is simply 1. None of that matters too much, though; what matters is that we now have a bio. So we leave bio_alloc() and continue down __bio_map_user_iov(): line 637 allocates another thing, pages, a double pointer; deep in our bones we sense this will be an array of pointers.
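Backing up for a moment to the arithmetic at lines 618-621: here is the same computation as a tiny user-space program, with a made-up address and length, showing why a buffer only two pages long can still straddle three pages:

    /* Worked example of the page-count math at lines 618-619.  The address
     * 0x0804b123 and length 8192 are made-up values; PAGE_SIZE = 4096 and
     * PAGE_SHIFT = 12, as on x86. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long uaddr = 0x0804b123;  /* hypothetical iov_base */
        unsigned long len   = 8192;        /* hypothetical iov_len  */
        unsigned long end   = (uaddr + len + 4096 - 1) >> 12;  /* 0x0804e */
        unsigned long start = uaddr >> 12;                     /* 0x0804b */

        /* The buffer starts 0x123 bytes into page 0x0804b and ends inside
         * page 0x0804d, so it touches 3 pages although len is 2 pages. */
        printf("nr_pages = %lu\n", end - start);   /* prints 3 */
        return 0;
    }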
Immediately after that comes another for loop, in which get_user_pages() obtains the page descriptors. This line of code is of soul-level importance: from this moment on, the user-space buffer and the kernel are bound in matrimony. [Figure from the original post: a bio whose bi_io_vec entries point at the mapped pages.]

The most important members of a bio are bi_io_vec and bi_vcnt. bi_io_vec is a pointer to struct bio_vec, which is defined in include/linux/bio.h:

 54 /*
 55  * was unsigned short, but we might as well be ready for > 64kB I/O pages
 56  */
 57 struct bio_vec {
 58     struct page   *bv_page;
 59     unsigned int  bv_len;
 60     unsigned int  bv_offset;
 61 };

bi_io_vec in fact represents an array of struct bio_vec, and bi_vcnt is the number of elements in that array. As the figure shows, the bv_page member of each bio_vec points to one of the mapped pages. What establishes the mapping is precisely the great get_user_pages() function we just saw: it is what ties these pages to the user-space buffer, while bio_add_pc_page() is what makes each bv_page point at its page. The reason the pages must be mapped to the user-space buffer at all is that the block layer recognizes only bio, not user buffers: block-layer functions all operate on bios, could not care less about user space, mind only their own bios, and know only that every request corresponds to its bio.

The prototype of get_user_pages lives in include/linux/mm.h:

795 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
796                 int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);

Here start and len describe the user-space buffer (len is counted in pages: len == 3 means three pages). The purpose of the function is to map this user-space buffer into kernel space, and pages and vmas are its outputs. pages is a double pointer, in other words a pointer array holding a crowd of page pointers, and those pages are exactly where this user-space buffer lives. The return value is the number of pages actually mapped. As for vmas, we need not worry: we pass NULL here, so it plays no role.

Let us gossip a little more about get_user_pages. Just as behind every successful man there stands a woman (or several) — think of Zhang Bin, of Zhao Zhongxiang, of Li Jindou — behind every Linux process there stands a page table. When a process is created, page tables are set up in its address space. On x86 a page table holds 1024 entries, each of which can describe one page; whether that page currently resides in physical memory is quite another question. Think of the 1024 entries as 1024 pointers of 32 bits each; one of those bits is called the Present bit. If it is 1, the page is in physical memory; if it is 0, it is not.

What does this have to do with get_user_pages? Its parameters start and len describe linear addresses. On x86 a linear address is 32 bits, split into three fields: bits 31-22 are the Directory, the index into the Page Directory; bits 21-12 are the Table, the index into the Page Table; and bits 11-0 are the Offset. A given virtual address, i.e. a linear address, therefore pins down a position in the Page Directory and a position in the Page Table; in other words, it pins down a page. If that page is in physical memory, fine; but what if it is not? That is when get_user_pages() shows its heroic colors: it allocates a page frame and sets up the page-table entries accordingly. From then on that stretch of virtual addresses has a patron, a physical address backing it up, so the application can access it, and the device driver can access it too. Except the driver never touches those addresses directly: as said before, the block layer recognizes only bio, neither pages nor virtual addresses, which is why the next function exists — bio_add_pc_page(), whose job is to connect the pages with the bio.
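To make the 10/10/12 split concrete, this throwaway user-space snippet (the address is made up) extracts the three fields from a 32-bit linear address:

    /* Splitting a 32-bit x86 linear address into Directory (bits 31-22),
     * Table (bits 21-12) and Offset (bits 11-0).  The address is made up. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long addr   = 0x0804b123;           /* hypothetical address */
        unsigned long dir    = addr >> 22;           /* Page Directory index */
        unsigned long table  = (addr >> 12) & 0x3ff; /* Page Table index     */
        unsigned long offset = addr & 0xfff;         /* offset within page   */

        /* 0x0804b123: dir = 0x20, table = 0x4b, offset = 0x123 */
        printf("dir=%#lx table=%#lx offset=%#lx\n", dir, table, offset);
        return 0;
    }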
So let us look at bio_add_pc_page, from fs/bio.c:

414 /**
415  *  bio_add_pc_page - attempt to add page to bio
416  *  @q: the target queue
417  *  @bio: destination bio
418  *  @page: page to add
419  *  @len: vec entry length
420  *  @offset: vec entry offset
421  *
422  *  Attempt to add a page to the bio_vec maplist. This can fail for a
423  *  number of reasons, such as the bio being full or target block
424  *  device limitations. The target block device must allow bio's
425  *  smaller than PAGE_SIZE, so it is always possible to add a single
426  *  page to an empty bio. This should only be used by REQ_PC bios.
427  */
428 int bio_add_pc_page(request_queue_t *q, struct bio *bio, struct page *page,
429                     unsigned int len, unsigned int offset)
430 {
431     return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
432 }

And __bio_add_page comes from the same file:

318 static int __bio_add_page(request_queue_t *q, struct bio *bio, struct page
319                           *page, unsigned int len, unsigned int offset,
320                           unsigned short max_sectors)
321 {
322     int retried_segments = 0;
323     struct bio_vec *bvec;
324
325     /*
326      * cloned bio must not modify vec list
327      */
328     if (unlikely(bio_flagged(bio, BIO_CLONED)))
329         return 0;
330
331     if (((bio->bi_size + len) >> 9) > max_sectors)
332         return 0;
333
334     /*
335      * For filesystems with a blocksize smaller than the pagesize
336      * we will often be called with the same page as last time and
337      * a consecutive offset.  Optimize this special case.
338      */
339     if (bio->bi_vcnt > 0) {
340         struct bio_vec *prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
341
342         if (page == prev->bv_page &&
343             offset == prev->bv_offset + prev->bv_len) {
344             prev->bv_len += len;
345             if (q->merge_bvec_fn &&
346                 q->merge_bvec_fn(q, bio, prev) < len) {
347                 prev->bv_len -= len;
348                 return 0;
349             }
350
351             goto done;
352         }
353     }
354
355     if (bio->bi_vcnt >= bio->bi_max_vecs)
356         return 0;
357
358     /*
359      * we might lose a segment or two here, but rather that than
360      * make this too complex.
361      */
362
363     while (bio->bi_phys_segments >= q->max_phys_segments
364            || bio->bi_hw_segments >= q->max_hw_segments
365            || BIOVEC_VIRT_OVERSIZE(bio->bi_size)) {
366
367         if (retried_segments)
368             return 0;
369
370         retried_segments = 1;
371         blk_recount_segments(q, bio);
372     }
373
374     /*
375      * setup the new entry, we might clear it again later if we
376      * cannot add the page
377      */
378     bvec = &bio->bi_io_vec[bio->bi_vcnt];
379     bvec->bv_page = page;
380     bvec->bv_len = len;
381     bvec->bv_offset = offset;
382
383     /*
384      * if queue has other restrictions (eg varying max sector size
385      * depending on offset), it can specify a merge_bvec_fn in the
386      * queue to get further control
387      */
388     if (q->merge_bvec_fn) {
389         /*
390          * merge_bvec_fn() returns number of bytes it can accept
391          * at this offset
392          */
393         if (q->merge_bvec_fn(q, bio, bvec) < len) {
394             bvec->bv_page = NULL;
395             bvec->bv_len = 0;
396             bvec->bv_offset = 0;
397             return 0;
398         }
399     }
400
401     /* If we may be able to merge these biovecs, force a recount */
402     if (bio->bi_vcnt && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec) ||
403         BIOVEC_VIRT_MERGEABLE(bvec-1, bvec)))
404         bio->bi_flags &= ~(1 << BIO_SEG_VALID);
405
406     bio->bi_vcnt++;
407     bio->bi_phys_segments++;
408     bio->bi_hw_segments++;
409  done:
410     bio->bi_size += len;
411     return len;
412 }

Much of the block layer exists to serve RAID, this merge_bvec_fn function pointer being one example: an ordinary disk driver has no such wretched pointer, or rather the pointer points at thin air. The nice thing is that without it, __bio_add_page() becomes quite simple, which makes us happy. The most meaningful code here is lines 378 to 381, which fill in bvec, and lines 406 to 410, which update the bio. A friendly reminder about the assignment at line 410: bio->bi_size is just the accumulation of len, and if you trace it carefully you will find that, after all the twists and turns, bio->bi_size ends up being exactly the len originally passed down from user space.
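The flip side of "the block layer only understands bio" is that any kernel code can build its own bio and feed it to the block layer. Below is a minimal, hypothetical sketch in the 2.6.22 style (my_read_page and my_end_io are our own names; the bi_end_io signature changed in later kernels, and error handling is mostly omitted): allocate a bio, point it at a device sector, attach a page with bio_add_page() — the non-REQ_PC cousin of bio_add_pc_page() — and submit:

    #include <linux/bio.h>
    #include <linux/blkdev.h>

    /* Completion callback, 2.6.22-style: called until bi_size drops to 0. */
    static int my_end_io(struct bio *bio, unsigned int bytes_done, int err)
    {
        if (bio->bi_size)          /* not fully done yet */
            return 1;
        bio_put(bio);
        return 0;
    }

    /* Hypothetical: read one page from sector `sector` of bdev. */
    static int my_read_page(struct block_device *bdev, sector_t sector,
                            struct page *page)
    {
        struct bio *bio = bio_alloc(GFP_KERNEL, 1);  /* room for 1 bio_vec */

        if (!bio)
            return -ENOMEM;
        bio->bi_bdev = bdev;
        bio->bi_sector = sector;   /* device address, in 512-byte sectors */
        bio->bi_end_io = my_end_io;

        /* fills bi_io_vec[0]: bv_page, bv_len, bv_offset */
        if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
            bio_put(bio);
            return -EIO;
        }

        submit_bio(READ, bio);     /* hand the bio to the block layer */
        return 0;
    }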
Back in __bio_map_user_iov(), the for loop from lines 661 to 679 adds all those pages, one by one, into the bio's bi_io_vec table, giving every bv_page something to point at.

Then, at line 699, __bio_map_user_iov() returns, and what it returns is the bio. bio_map_user_iov() and bio_map_user() then return in turn, each with this same bio as its return value, which brings us back to __blk_rq_map_user().

But as we just saw, the bio now exists, the bio and the pages have an affair going, the bio and the user buffer have an affair going — is that enough? Obviously the bio should also be joined to the request; a bio that has not joined a request is of no use. [Figure from the original post: the relationship between request and bio.] The work is done by blk_rq_bio_prep(), called at line 2373 and found in block/ll_rw_blk.c:

3669 void blk_rq_bio_prep(request_queue_t *q, struct request *rq, struct bio *bio)
3670 {
3671     /* first two bits are identical in rq->cmd_flags and bio->bi_rw */
3672     rq->cmd_flags |= (bio->bi_rw & 3);
3673
3674     rq->nr_phys_segments = bio_phys_segments(q, bio);
3675     rq->nr_hw_segments = bio_hw_segments(q, bio);
3676     rq->current_nr_sectors = bio_cur_sectors(bio);
3677     rq->hard_cur_sectors = rq->current_nr_sectors;
3678     rq->hard_nr_sectors = rq->nr_sectors = bio_sectors(bio);
3679     rq->buffer = bio_data(bio);
3680     rq->data_len = bio->bi_size;
3681
3682     rq->bio = rq->biotail = bio;
3683 }

With this, the bio is formally married into rq.

Back in __blk_rq_map_user(), it is time to return as well: line 2384 returns bio->bi_size, which, as just said, is the length of the user buffer passed down from user space. And back in blk_rq_map_user() we find that this function is about to finish too; on success it returns 0. With that, the grand mapping project is complete.

Yet netizen "贱男村村长" raised a question: when do these bios actually get used? They did not seem to come up when we discussed SCSI commands. Actually they did: back then there was a function called scsi_setup_blk_pc_cmnd, and its line 1104 checks whether req->bio is NULL. If it is not NULL, the bio gets its due treatment: a function named scsi_init_io() is called, which builds a scatter-gather array corresponding to the bi_io_vec vectors inside this bio.
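To see the whole chain from the far end, here is a user-space sketch that triggers exactly the path we have been tracing: an SG_IO ioctl whose dxferp buffer is the ubuf that sg_io() will hand to blk_rq_map_user(). It issues a standard SCSI INQUIRY; the device path and buffer sizes are arbitrary choices:

    /* User-space view: the `buf` passed via dxferp is the user buffer
     * that sg_io() -> blk_rq_map_user() maps into a bio. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <scsi/sg.h>

    int main(void)
    {
        unsigned char cdb[6] = { 0x12, 0, 0, 0, 96, 0 }; /* INQUIRY, 96 bytes */
        unsigned char buf[96], sense[32];
        struct sg_io_hdr hdr;
        int fd = open("/dev/sda", O_RDONLY);

        if (fd < 0)
            return 1;
        memset(&hdr, 0, sizeof(hdr));
        hdr.interface_id = 'S';
        hdr.cmdp = cdb;
        hdr.cmd_len = sizeof(cdb);
        hdr.dxfer_direction = SG_DXFER_FROM_DEV;
        hdr.dxferp = buf;            /* the user buffer to be mapped */
        hdr.dxfer_len = sizeof(buf);
        hdr.sbp = sense;
        hdr.mx_sb_len = sizeof(sense);
        hdr.timeout = 5000;          /* milliseconds */

        if (ioctl(fd, SG_IO, &hdr) == 0)
            printf("vendor: %.8s model: %.16s\n",
                   (char *)buf + 8, (char *)buf + 16);
        close(fd);
        return 0;
    }

Run it against a real disk (root required) and the vendor and model strings it prints will have traveled through precisely the bio mapping described above.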