
Implementing efficient counters with MySQL

Date: 2007-07-05  Source: wangchunning

 

Many sites keep a counter of some kind. The feature is easy to implement, but it is expensive: a naive counter implementation converts read load into write load, because the counter is updated every time the object is fetched. For a small site the extra load is not noticeable - after the data is read, the update causes no synchronous IO and little background IO. As the system grows you will implement something like memcache to reduce database reads, but that does nothing to optimize the writes. With replication (rather than partitioning) it is even worse - it is easy to overload replication, especially since the slave serially executes transactions that the master was able to run in parallel. That is very bad, so how do you deal with it? Assume the system really does need the counters, and that semi-realtime updates to them are acceptable.

First, use a separate counter table with just an id and a counter column. A separate table alone solves most of the update problem: suppose you have a blogpost table averaging 2KB per row with 10 million rows - 20GB in total, which cannot fit in cache. The new table, at about 20 bytes per row, is roughly 200MB and fits in memory. Keeping this working set in cache is very important: you can then execute thousands of updates per second without hurting performance, provided each statement only updates a few rows.
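A minimal sketch of such a counter table and its per-view update (table and column names are illustrative, not from the post):

-- Narrow table: about 20 bytes per row, so 10M rows stay in memory.
CREATE TABLE counters (
    id      INT UNSIGNED NOT NULL PRIMARY KEY,  -- same id as the blogpost row
    counter INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- Executed on each page view, instead of touching the wide blogpost table.
UPDATE counters SET counter = counter + 1 WHERE id = 12345;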
The next problem to deal with is log flushes. If you have innodb_flush_log_at_trx_commit=0 or 2, or if you use MyISAM tables, you will not have this problem; if you have innodb_flush_log_at_trx_commit=1, log flushes can well be the problem - even with a BBU, thousands of log flushes per second may cause significant overhead or may simply be the bottleneck. If you can reduce your durability requirements, that is the best thing to do; if not, either use a MyISAM table as the counter table (which can work well if you do not run into a table-lock bottleneck) or implement a delayed table update as explained below.

At this point we have made each update cheap, but the volume of updates is still high and needs to be reduced. The weakness of memcache here is that it is a passive cache: it only stores objects, and when an object is evicted, or eviction runs periodically, you cannot tell memcache to perform a specific action, such as writing the value back to the database. If you do systems-level programming you could implement such a layer yourself; it would serve the counters in real time and write the updates back to the database. Alternatively, use memcache plus another MySQL instance (or a non-replicated database) that logs the updates to a HEAP/MyISAM/ARCHIVE table, and flush them to the main database once an hour. If memcache is large enough, the objects updated in the last hour will still be in the cache, so this stays very fast.
This kind of delayed update not only reduces the number of update statements, it also lets you group all of them into a single transaction, reducing the number of transaction commits.
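A minimal sketch of the delayed-update scheme (the hits log table and the hourly job are illustrative, not from the post; counters is the table sketched above):

-- On each view, append to a cheap log table (possibly on a separate,
-- non-replicated instance) instead of updating counters in place.
CREATE TABLE hits (
    id INT UNSIGNED NOT NULL    -- id of the object that was viewed
) ENGINE=MyISAM;

INSERT INTO hits (id) VALUES (12345);

-- Hourly job: apply all logged hits in one transaction, one commit.
-- Assumes the counter rows already exist.
START TRANSACTION;
UPDATE counters c
JOIN (SELECT id, COUNT(*) AS cnt FROM hits GROUP BY id) h ON h.id = c.id
SET c.counter = c.counter + h.cnt;
DELETE FROM hits;  -- hits inserted between the UPDATE and this DELETE could
COMMIT;            -- be lost; the shadow-table swap below avoids that race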
As mentioned above, you can use a MyISAM or HEAP table for this; so how do you deal with the table locks? You can use shadow tables: keep two tables, insert into one while you process and empty the other, and swap them with MySQL's RENAME TABLE command.
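A minimal sketch of the shadow-table swap, reusing the hypothetical hits table from above:

-- Swap the live log for an empty shadow in one atomic step, so page views
-- keep inserting into a fresh table while the old one is aggregated.
CREATE TABLE hits_shadow LIKE hits;
RENAME TABLE hits TO hits_old, hits_shadow TO hits;

-- Run the same hourly aggregation as above against hits_old, then:
DROP TABLE hits_old;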
Posted by peter @ 8:33 am :: production, tips
 
12 Comments »
1. Sam’s random musings » Implementing efficient counters with MySQL
[…] Implementing efficient counters with MySQL: […]
Pingback :: July 1, 2007 @ 9:07 am
2. Haiken
Alternative to memcache: MCache (http://www.mohawksoft.org/?q=node/8), which offers access to memory data objects across multiple servers and can commit objects to the filesystem or a MySQL database.
Comment :: July 1, 2007 @ 4:07 pm
3. Brian Aker
Hi Peter!
Here are my thoughts:
http://krow.livejournal.com/532321.html
I think you should consider whether or not an exact count is required at all times. Cheers,
-Brian
Comment :: July 1, 2007 @ 10:07 pm
4. Robin
Hi Peter,
For my employer we have thousands of updates every N minutes (N = ~5). We’ve encountered the problems you mention and came up with similar solutions (having a two-column table with only the foreign key and the count, and grouping the hits from the last five minutes by key to reduce the number of updates).
Another optimization is to group the update queries by their counts. For example, if you have 100 rows that need to
be increased by 3 you could group these into one query like this: UPDATE table SET count = count + 3 WHERE id IN (…);
This greatly reduces the number of update queries.
During rush hours you may want to introduce a threshold: don’t update the '+1' records but keep them (reinsert them) in your log table, so you don’t need to bother the counts table with these minor updates.
Also, table partitioning based on the foreign key works well for better performance.
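(As a minimal illustration of the grouping idea, using the hypothetical hits and counters tables sketched in the article body:)

-- Aggregate pending hits, then issue one UPDATE per distinct increment.
SELECT id, COUNT(*) AS cnt FROM hits GROUP BY id;

-- Every row that gained 3 views is updated in a single statement:
UPDATE counters SET counter = counter + 3 WHERE id IN (17, 42, 99);
-- ...and likewise one statement for each other value of cnt.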
Best, Robin
Comment :: July 2, 2007 @ 12:07 am
5. peter
Robin,
Right. Grouping updates by their count is one solution. Another approach, if you have some form of cache on top and know the original data values, is to use a multiple-value REPLACE instead of an UPDATE. How to handle the +1 values is a very application-dependent question: some objects may have their count increased only once per day, say, so reinserting them into the log table and waiting until the count grows would be rather inefficient.
Comment :: July 2, 2007 @ 1:07 am
6. peter
Brian,
First, in my post I specifically mentioned that we’re looking at the case where we want counters and want to keep them semi-realtime. I’m not discussing whether you need them at all; that is a whole other story. Now, it looks like you read my post only to the middle, because using an access-log type table is exactly the solution I mention in the last part. True, I mention it as a special table only for this particular task, but if you can merge it with tables you use for other needs, that is also fine. However, as we’re speaking about semi-realtime counters, access logs alone are not enough, which is why you will find realtime updates to memcache in my solution. Obviously that can be dropped if you do not care, or you can use data-specific optimizations: for example, if there are 0 views in total you may want to do the update, as skipping it would be too visible, while if the view count is already 123400, skipping a few updates will go unnoticed by most visitors.

Now, about replication. I’m not sure why you speak of contention as something fixed by MySQL 5.1 row-level replication. If you do single-row updates by primary key, that already gets pretty close to row-based replication, and you would not save much, at least not at the storage engine level. I would be surprised to see any dramatic improvement from MySQL 5.1 here. Besides, for a table which fits in memory, even with transaction commits you can already do 5000+ updates per second from the replication thread. If you need more than that, you should have moved to a scale-out type of architecture long ago.
Comment :: July 2, 2007 @ 1:07 am
7. Robin
Not really contributing to the discussion, but maybe someone gets inspired to write a new storage engine: when implementing a ‘log’ table and a ‘counts’ table which is regenerated every so often, it would be nice to have a ‘Constant DB’ (CDB, see: http://cr.yp.to/cdb.html) storage engine, wouldn’t it? It has limited features (write once, read many, no updates) but it provides fantastic read/write performance.
Shouldn’t be too difficult to implement something like that, as cdb’s API is similar to bdb. Of course UPDATE, REPLACE and DELETE wouldn’t be supported in such an engine, as it is ‘constant’.
Comment :: July 2, 2007 @ 2:07 am
8. guillaume
The other idea is to use a dedicated database instance for view summaries. It works quite well. And after all, you don’t need to re-read the value every time: just use ON DUPLICATE KEY UPDATE instead!
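(A minimal sketch of that, reusing the hypothetical counters table from the article body; no prior read or pre-existing row is needed:)

-- The first view inserts the row with counter = 1; later views increment it.
INSERT INTO counters (id, counter) VALUES (12345, 1)
ON DUPLICATE KEY UPDATE counter = counter + 1;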
Comment :: July 2, 2007 @ 7:07 am
9. Jay Pipes
Hi Peter!
You wrote: “When implementing a ‘log’ table and a ‘counts’ table which is regenerated every so often it would be nice to have a ‘Constant DB’ (CDB, see: http://cr.yp.to/cdb.html) storage engine, wouldn’t it? It has limited features (write once, read many - no updates) but it provides fantastic read/write performance.” What about ARCHIVE? No updates, good read and write performance. Cheers, and BTW, I 100% agree with everything you say in this article; I’ve been including this information in my slides for a while now.
-jay
Comment :: July 2, 2007 @ 8:07 am
10. peter
Jay, Robin wrote that, not me.
I mentioned ARCHIVE as a possible candidate for the temporary log table.
Comment :: July 2, 2007 @ 8:07 am
11. peter
guillaume,
Sure. I was mostly considering the single-instance case. But using another instance for this seems like a good idea, especially to offload replication. As you do not show too many counters per page, fetching them from another instance does not add a lot of overhead. And while a counter is zero you do not need to store it, so INSERT ... ON DUPLICATE KEY UPDATE can indeed be a lot of help.
Comment :: July 2, 2007 @ 8:07 am
12. sebcio
If your application is in PHP you can use XCache and its API to provide a nice counter.
A small pseudo-code example, adapted from its wiki:
<?php
if (!xcache_isset("count")) {
    xcache_set("count", load_count_from_mysql());
}
?>
This guest book has been visited <?php echo xcache_inc("count"); ?> times.