InnoDB uses a doublewrite buffer to avoid data corruption in case of partial page writes. A partial page write occurs when a disk write doesn’t complete fully, and only a portion of a 16 KB page is written to disk. There are a variety of reasons (crashes, bugs, and so on) that a page might be partially written to disk. The doublewrite buffer guards against data corruption if this happens. --- Note: the doublewrite buffer protects against a page being only partially written to disk; the data can be recovered from this buffer.
The doublewrite buffer is a special reserved area of the tablespace, large enough to hold 100 pages in a contiguous block. --- Note: although it is called a buffer, it does not live in memory; it is a contiguous on-disk area of 100 pages. Because it is a buffer, its contents are eventually overwritten. It is essentially a backup copy of recently written pages. When InnoDB flushes pages from the buffer pool to the disk, it writes (and flushes) them first to the doublewrite buffer, then to the main data area where they really belong. This ensures that every page write is atomic and durable. Doesn’t this mean that every page is written twice? Yes, it does, but because InnoDB writes several pages to the doublewrite buffer sequentially and only then calls fsync() to sync them to disk, the performance impact is relatively small, generally a few percentage points. More importantly, this strategy allows the log files to be much more efficient. --- Note: this strategy keeps InnoDB's transaction log small and efficient.
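The write order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not InnoDB's implementation: the two files, the helper names (`create_file`, `flush_batch`, `demo`) and the slot layout are all hypothetical. The point it shows is the ordering: the whole batch goes sequentially into the doublewrite area and is synced once, and only then are the pages written to their real locations.

```python
import os
import tempfile

PAGE_SIZE = 16 * 1024   # 16 KB page, as in the text
DBLWR_PAGES = 100       # size of the doublewrite area in pages, as in the text


def create_file(path, n_pages):
    """Create a zero-filled file of n_pages pages (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(b"\x00" * (n_pages * PAGE_SIZE))


def flush_batch(dblwr_path, data_path, pages):
    """Flush {page_no: page_bytes} using the double-write order."""
    assert len(pages) <= DBLWR_PAGES
    assert all(len(p) == PAGE_SIZE for p in pages.values())

    # Step 1: write the whole batch sequentially into the doublewrite
    # area, then sync once for the entire batch.
    with open(dblwr_path, "r+b") as dblwr:
        dblwr.seek(0)
        for data in pages.values():
            dblwr.write(data)
        dblwr.flush()
        os.fsync(dblwr.fileno())

    # Step 2: only after the backup copies are durable, write each page
    # to its real location in the data file and sync again.
    with open(data_path, "r+b") as datafile:
        for page_no, data in pages.items():
            datafile.seek(page_no * PAGE_SIZE)
            datafile.write(data)
        datafile.flush()
        os.fsync(datafile.fileno())


def demo():
    """Flush two pages and check that page 3 exists in both places."""
    with tempfile.TemporaryDirectory() as d:
        dblwr = os.path.join(d, "doublewrite.bin")
        data = os.path.join(d, "data.bin")
        create_file(dblwr, DBLWR_PAGES)
        create_file(data, 8)
        pages = {3: b"A" * PAGE_SIZE, 5: b"B" * PAGE_SIZE}
        flush_batch(dblwr, data, pages)
        with open(data, "rb") as f:
            f.seek(3 * PAGE_SIZE)
            home_copy = f.read(PAGE_SIZE)
        with open(dblwr, "rb") as f:
            backup_copy = f.read(PAGE_SIZE)  # slot 0 holds page 3
        return home_copy == backup_copy == pages[3]
```

Because step 1 is a single sequential write followed by one sync, the cost of writing every page twice stays well below double, which matches the small overhead the text describes.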
Because the doublewrite buffer gives InnoDB a very strong guarantee that the data pages are not corrupt, InnoDB’s log records don’t have to contain full pages; they are more like binary deltas to pages. --- Note: InnoDB's transaction log only needs to record the binary changes to a page, so that the data can be rebuilt from the log after a sudden crash. The InnoDB log is essentially a physical (page-level) log rather than a logical one. If there’s a partial page write to the doublewrite buffer itself, the original page will still be on disk in its real location. --- Note: if the write to the doublewrite buffer itself fails, the page never reaches its real location; InnoDB loads the original page from disk, applies the transaction log to compute the correct contents, and writes them out again through the doublewrite buffer. When InnoDB recovers, it will use the original page instead of the corrupted copy in the doublewrite buffer. However, if the doublewrite buffer succeeds and the write to the page’s real location fails, InnoDB will use the copy in the doublewrite buffer during recovery. --- Note: in this case InnoDB does not need to recompute the page from the transaction log; it simply rewrites the page from the copy in the doublewrite buffer. InnoDB knows when a page is corrupt because each page has a checksum at the end; the checksum is the last thing to be written, so if the page’s contents don’t match the checksum, the page is corrupt. Upon recovery, therefore, InnoDB just reads each page in the doublewrite buffer and verifies the checksums. If a page’s checksum is incorrect, it reads the page from its original location. --- Note: during recovery InnoDB simply checks each page's checksum; if it is wrong, it loads the original page from disk and replays the transaction log to derive the correct data. This is why InnoDB recovery can take a long time.
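The two recovery cases above can be made concrete with a small in-memory simulation. It is a sketch under assumptions, not InnoDB's real page format or recovery code: the page layout (a 4-byte page number at the start and a CRC32 in the last 4 bytes) and the names `make_page`, `is_valid`, `recover` and `demo` are hypothetical, chosen only to show how a trailing checksum exposes a torn page and decides which copy wins.

```python
import struct
import zlib

PAGE_SIZE = 16 * 1024
BODY_SIZE = PAGE_SIZE - 4   # assumed layout: last 4 bytes hold the checksum


def make_page(page_no, payload):
    """Build a page: 4-byte page number, zero-padded payload, trailing CRC32."""
    body = struct.pack(">I", page_no) + payload
    if len(body) > BODY_SIZE:
        raise ValueError("payload too large for one page")
    body = body.ljust(BODY_SIZE, b"\x00")
    return body + struct.pack(">I", zlib.crc32(body))


def is_valid(page):
    """A page is intact if its contents match the checksum at its end."""
    if len(page) != PAGE_SIZE:
        return False
    stored = struct.unpack(">I", page[BODY_SIZE:])[0]
    return stored == zlib.crc32(page[:BODY_SIZE])


def recover(doublewrite, datafile):
    """Repair datafile (dict page_no -> bytes) from the doublewrite copies.

    Returns the page numbers that were restored from the doublewrite area.
    """
    restored = []
    for copy in doublewrite:
        if not is_valid(copy):
            # Torn write into the doublewrite area itself: the page at its
            # real location was never touched, so keep the original.
            continue
        page_no = struct.unpack(">I", copy[:4])[0]
        if not is_valid(datafile.get(page_no, b"")):
            # The write to the real location was torn: use the good copy.
            datafile[page_no] = copy
            restored.append(page_no)
    return restored


def demo():
    old = make_page(7, b"old row data")
    new = make_page(7, b"new row data")
    torn = new[:4096] + old[4096:]   # only the first 4 KB reached the disk

    # Case 1: doublewrite copy is complete, write to the real location tore.
    datafile = {7: torn}
    case1 = recover([new], datafile) == [7] and datafile[7] == new

    # Case 2: the doublewrite copy tore, the real location still holds old.
    datafile = {7: old}
    case2 = recover([torn], datafile) == [] and datafile[7] == old
    return case1, case2
```

In case 2 the page is left in its old but consistent state; bringing it up to date is then the job of the transaction log, as the notes above describe.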
In some cases, the doublewrite buffer really isn’t necessary—for example, you might want to disable it on slaves. Also, some filesystems (such as ZFS) do the same thing themselves, so it is redundant for InnoDB to do it. You can disable the doublewrite buffer by setting innodb_doublewrite to 0.
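Overall, the doublewrite buffer serves two purposes. It makes the process of flushing InnoDB's cached pages to disk safer. As an indirect benefit, InnoDB's transaction log does not need to contain full before and after images of the data, only binary deltas, which saves a great deal of I/O.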