【原创】MySQL的CSV引擎应用实例解析

时间：2008-08-11 来源：yueliangdao0608

如果您想把EXCEL的数据或者CSV格式的数据导入到MySQL中，MySQL的CSV引擎再适合不过了。
MySQL的CSV引擎在5.0后开始提供，不过不支持WINDOWS，到了5.1才支持。
今天我测试的版本号是5.0.45

一、注意几点：
1、没有索引，跟MySQL5的数据字典库一样。
2、可以直接用任何文本编辑器来编辑数据文件。
3、非英文编码问题。
我的字符终端和表都是UTF-8的，所以要把上传的CSV文件保存为UTF-8的编码。
4、编码转化工具，我这边在WINDOWS下用EDITPLUS来转化，在LINUX下可以用ICONV命令行工具来转化编码。
二、示例数据文件。
"1","Designed for 99.999% Availability","MySQL Cluster provides a fault tolerant architecture that ensures your organization's mission critical applications achieve 99.999% availability. This means less than 5 minutes downtime per year, including scheduled maintenance time. MySQL Cluster implements automatic node recoverability to ensure an application automatically fails over to another database node that contains a consistent data set, if one or more database nodes fail. Should all nodes fail due to hardware faults for example, MySQL Cluster ensures an entire system can be safely recovered in a consistent state by using a combination of checkpoints and log execution. Furthermore, MySQL Cluster ensures systems are available and consistent across geographies by enabling entire clusters to be replicated across regions."
"2","High Performance Only a Main Memory Database Can Deliver","MySQL Cluster provides the response time and throughput to meet the most demanding high volume enterprise applications. MySQL Cluster achieves its performance advantage by being a main memory clustered database solution, which keeps all data in memory and limits IO bottlenecks by asynchronously writing transaction logs to disk. MySQL Cluster also enables servers to share processing within a cluster, taking full advantage of all hardware. Typical response times for MySQL Cluster are in the range of a few milliseconds and MySQL Cluster has been proven to handle tens of thousands of distributed transactions per second that are also replicated across database nodes."
"3","Extremely Fast Automatic Failover","MySQL delivers extremely fast failover time with sub-second response so your applications can recover quickly in the event of application, network or hardware failure. MySQL Cluster uses synchronous replication to propagate transaction information to all the appropriate database nodes so applications can automatically fail over to another node extremely quickly. This eliminates the time consuming operation of recreating and replaying log files required by 'Shared-Disk' architectures to fail over successfully. Plus, MySQL Cluster database nodes are able to automatically restart, recover, and dynamically reconfigure themselves in case of failures without having to program advanced features into the application."
"4","Flexible Distributed Architecture with No Single Point of Failure","The parallel server architecture combines database nodes, management server nodes, and application nodes that can be distributed across computers and geographies to ensure there is no single point of failure. Any node can be stopped or started without stopping the applications that use the database. And MySQL Cluster is highly configurable so you can implement the appropriate level of performance, scalability and fault tolerance to match your application requirements."
"5","Significantly Reduce Costly Downtime","MySQL Cluster not only lowers up-front license costs with affordable commercial licensing under a dual licensing mechanism, but it also significantly reduces system downtime - the number one contributor to the Total Cost of Ownership (TCO) of database software. Furthermore, a highly portable standards-based environment allows you to cost-effectively distribute your applications using commodity hardware and open source software infrastructure."
"6","Lower Maintenance Costs","MySQL Cluster is designed to be largely self-governing so very few system parameters actually need fine-tuning, further reducing the risk of costly errors. As a result, there are typically fewer conflicts with other software and hardware, and less need for manual intervention. This also means that MySQL Cluster will have a much lower maintenance costs, with less fine tuning required by Database Administrators."
"7","Easy-to-use Administration","MySQL Cluster includes easy to use and powerful tools for administering your clustered environment. Command line tools enable you to monitor database nodes, control access to applications, and create and restore backups."
"8","Services and Support","MySQL provides extensive consulting, training and technical support services to ensure the success of your next mission-critical database application project. MySQL has a proven track record gained through millions of successful customer deployments that can lower your risk and maximize return on investment."
"9","MySQL CLUSTER","以上是集群的特性概括"
"10","月亮他爸","我博客地址：http://yueliangdao0608.cublog.cn"
"11","最要注意的一点","最后的一行必须有个空格！如果是中文，好奇怪！"

三、直接COPY数据文件到MySQL数据库目录下。
1、用EDITPLUS转码，然后进入MySQL命令行。
mysql> \C gbk
Charset changed
mysql> flush tables;
Query OK, 0 rows affected (0.00 sec)
mysql> select sql_calc_found_rows * from ytt order by id desc limit 2\G
*************************** 1. row ***************************
     id: 11
title: 最要注意的一点
summary: 最后的一行必须有个空格！如果是中文，好奇怪！"
*************************** 2. row ***************************
     id: 10
title: 月亮他爸
summary: 我博客地址：http://yueliangdao0608.cublog.cn"
2 rows in set (0.00 sec)

mysql> select found_rows();
+--------------+
| found_rows() |
+--------------+
|           11 |
+--------------+
1 row in set (0.00 sec)
2、用ICONV转码
现在可以直接用EXCEL来编辑数据文件，不过可惜的是EXCEL的编码是CP936的，所以传上去后要用ICONV来转码。

[root@localhost t_girl]# iconv -f CP936 -t UTF-8 ytt.CSV -o ytt.new
[root@localhost t_girl]# chown mysql:mysql ytt.new
[root@localhost t_girl]# cp -uf ytt.new ytt.CSV
[root@localhost t_girl]# ls -sihl
total 56K
13797305 8.0K -rw-rw---- 1 mysql mysql   61 Apr 30 15:19 db.opt
30539777 8.0K -rw-rw---- 1 mysql mysql 383 Jun 19 09:54 show_user.frm
13797308 12K -rw-r--r-- 1 mysql mysql 4.5K Aug 9 14:59 ytt.CSV
13797306 16K -rw-rw---- 1 mysql mysql 8.5K Aug 8 17:39 ytt.frm
13797307 12K -rw-r--r-- 1 mysql mysql 4.5K Aug 9 14:57 ytt.new
[root@localhost t_girl]# sed -i 's/^/"/g' ytt.CSV
[root@localhost t_girl]# sed -i 's/,/","/g' ytt.CSV
[root@localhost t_girl]# sed -i 's/$/"/g' ytt.CSV

用SED替换是为了用双引号来分割各个字段。