ITPUB SQL大赛之BUG(七)
时间:2011-04-10 来源:yangtingkun
由于SQL大赛题目相对比较困难,不但需要使用大量的特性,且SQL实现十分复杂,一般运行时间也会比较长,这些因素导致碰到bug的几率直线上升。这里介绍SQL大赛期间碰到的几个bug。
这篇给出ORA-1489错误的原因和解决方法。
ITPUB SQL大赛之BUG(一):http://yangtingkun.itpub.net/post/468/515815
ITPUB SQL大赛之BUG(二):http://yangtingkun.itpub.net/post/468/515815
ITPUB SQL大赛之BUG(三):http://yangtingkun.itpub.net/post/468/515982
ITPUB SQL大赛之BUG(四):http://yangtingkun.itpub.net/post/468/516023
ITPUB SQL大赛之BUG(五):http://yangtingkun.itpub.net/post/468/516219
ITPUB SQL大赛之BUG(六):http://yangtingkun.itpub.net/post/468/516307
由于篇幅所限,前两篇只是描述了现象,并没有解释问题产生的真正原因。
其实导致问题的关键是递归子查询中的UNION ALL语句,当递归WITH第一次运行时,LINES列的数据类型已经确定下来,而随后的运行过程中,如果列返回的长度超过了列的定义,则会导致ORA-1489错误。
SQL> CREATE TABLE T_UNION_ALL AS
2 SELECT '1' C FROM DUAL
3 UNION ALL
4 SELECT '1234' FROM DUAL;
表已创建。
SQL> DESC T_UNION_ALL
名称 是否为空? 类型
----------------------------------- -------- --------------------------------------------
C VARCHAR2(4)
可以看得,对于UNION ALL语句,数据类型长度的确定由各个子句共同决定。
不过递归WITH子句则更复杂一些:
SQL> CREATE TABLE T_RES_WITH AS
2 WITH A (N, C) AS
3 (
4 SELECT 1, '1'
5 FROM DUAL
6 UNION ALL
7 SELECT N + 1, C || '1'
8 FROM A R, DUAL A
9 WHERE N <= 2
10 )
11 SELECT * FROM A;
表已创建。
SQL> DESC T_RES_WITH
名称 是否为空? 类型
----------------------------------- -------- --------------------------------------------
N NUMBER
C VARCHAR2(3)
对于递归WITH而言,情况就要复杂一些,首先UNION ALL的第二个子句中的C,其实就是UNION ALL第一个子句中的’1’,这个长度是1,而当前表达式的长度则是2。也就是说新的C列的长度是2,而递归WITH语句中出现的最大长度是C || ‘1’,因此,递归WITH语句的最终长度是3。
为了更好的说明这个问题,看一个和大赛SQL有关的例子:
SQL> CREATE TABLE T_RES_WITH2 AS
2 WITH ROUTE_ALL_S (C1, C2, LINES, DIS) AS
3 (
4 SELECT CITY1, CITY2, CITY1 || CITY2, DISTANCE
5 FROM ROUTES
6 UNION ALL
7 SELECT SUBSTR(LINES, 1, 1), CITY2, LINES || '-' || CITY1 || CITY2, DISTANCE + DIS
8 FROM ROUTES R, ROUTE_ALL_S A
9 WHERE A.C2 = R.CITY1
10 )
11 SELECT LINES
12 FROM ROUTE_ALL_S
13 WHERE ROWNUM < 10;
表已创建。
SQL> DESC T_RES_WITH2
名称 是否为空? 类型
----------------------------------- -------- --------------------------------------------
LINES VARCHAR2(62)
由于CITY1和CITY2的长度都是10,因此LINES || ‘-’ || CITY1 || CITY2的总长度就是20 + 1 + 10 + 10 = 41,而递归WITH最终字符串长度就是41 + 1 + 10 + 10 = 62。
下面看看CONCAT和||有什么区别:
SQL> CREATE TABLE T_RES_WITH3 AS
2 WITH ROUTE_ALL_S (C1, C2, LINES, DIS) AS
3 (
4 SELECT CITY1, CITY2, CITY1 || CITY2, DISTANCE
5 FROM ROUTES
6 UNION ALL
7 SELECT SUBSTR(LINES, 1, 1), CITY2, CONCAT(LINES, '-' || CITY1 || CITY2), DISTANCE + DIS
8 FROM ROUTES R, ROUTE_ALL_S A
9 WHERE A.C2 = R.CITY1
10 )
11 SELECT LINES
12 FROM ROUTE_ALL_S
13 WHERE ROWNUM < 10;
表已创建。
SQL> DESC T_RES_WITH3
名称 是否为空? 类型
----------------------------------- -------- --------------------------------------------
LINES VARCHAR2(62)
可以看得,无论是||还是CONCAT,在生成字符串长度的时候并没有区别,不过执行下面的SQL就会看到区别:
SQL> WITH ROUTE_D AS
2 (
3 SELECT CITY1 R, CITY2 T, DISTANCE DIS
4 FROM ROUTES
5 UNION ALL
6 SELECT CITY2, CITY1, DISTANCE
7 FROM ROUTES
8 ),
9 ROUTE_ALL_D (C1, C2, LINES) AS
10 (
11 SELECT R, T, R || T
12 FROM ROUTE_D
13 WHERE R = 'E'
14 UNION ALL
15 SELECT A.C1, T, LINES || RPAD('-', 20, '-') || R || T || RPAD('-', 20, '-')
16 FROM ROUTE_D R, ROUTE_ALL_D A
17 WHERE A.C2 = R.R
18 )
19 CYCLE LINES SET DUP_LINES TO 'Y' DEFAULT 'N'
20 SELECT LENGTH(LINES)
21 FROM ROUTE_ALL_D;
LENGTH(LINES)
-------------
2
2
44
44
44
44
44
44
44
44
44
44
86
86
86
.
.
.
86
86
ERROR:
ORA-01489: 字符串连接的结果过长
已选择51行。
这个SQL从SQL大赛结果中抽取了部分内容并进行了简单的变形后得到的,为了是尽快出现ORA-1489错误。
根据前面计算递归WITH的字符串长度,这里允许的最大长度应为:20 + 60 + 60 = 140,而递归到下一层的实际长度只有86 + 20 + 1 + 1 + 20 = 128,小于列的最大长度。
看一下使用CONCAT函数的情况:
SQL> WITH ROUTE_D AS
2 (
3 SELECT CITY1 R, CITY2 T, DISTANCE DIS
4 FROM ROUTES
5 UNION ALL
6 SELECT CITY2, CITY1, DISTANCE
7 FROM ROUTES
8 ),
9 ROUTE_ALL_D (C1, C2, LINES) AS
10 (
11 SELECT R, T, R || T
12 FROM ROUTE_D
13 WHERE R = 'E'
14 UNION ALL
15 SELECT A.C1, T, CONCAT(LINES, RPAD('-', 20, '-') || R || T || RPAD('-', 20, '-'))
16 FROM ROUTE_D R, ROUTE_ALL_D A
17 WHERE A.C2 = R.R
18 )
19 CYCLE LINES SET DUP_LINES TO 'Y' DEFAULT 'N'
20 SELECT LENGTH(LINES)
21 FROM ROUTE_ALL_D;
LENGTH(LINES)
-------------
2
2
44
44
44
44
44
44
44
44
44
44
86
86
86
.
.
.
86
86
128
128
.
.
.
128
128
128
ERROR:
ORA-01489: 字符串连接的结果过长
已选择211行。
可以看到,使用CONCAT函数,则递归WITH的长度可以达到刚才计算的值128,这比使用||递归深了一层,可以处理更多的数据,这也是为什么上一篇文章中使用||会报错,而使用CONCAT可以顺利执行的原因。使用||时,Oracle允许的最大的长度也是140,为什么在递归WITH执行的时候,字符串长度到了86后,再次迭加就会报错,怀疑||操作这里存在bug,Oracle计算长度的时候使用了列的定义,而不是实际的长度。Oracle可能认为86的下一层长度会达到86 + 20 + 10 + 10 + 20 = 146,超过了最大的长度,因此报错。
下面再看第五篇文章的问题:当使用C1、C2作为ROUTE_S的列,则会出现ORA-1489的错误,而如果使用SUBSTR的方式,则不会导致错误。
其实这个道理很简单,由于使用了SUBSTR,使得列的长度发生了变化:
SQL> CREATE TABLE T_WITH_SUBSTR AS
2 WITH ROUTE_ALL_S (C1, C2, LINES, DIS) AS
3 (
4 SELECT CITY1, CITY2, CITY1 || CITY2, DISTANCE
5 FROM ROUTES
6 UNION ALL
7 SELECT SUBSTR(LINES, 1, 1), CITY2, LINES || '-' || CITY1 || CITY2, DISTANCE + DIS
8 FROM ROUTES R, ROUTE_ALL_S A
9 WHERE A.C2 = R.CITY1
10 ),
11 ROUTE_S AS
12 (
13 SELECT SUBSTR(LINES, 1, 1) R, SUBSTR(LINES, LENGTH(LINES)) T, MIN(DIS) DIS
14 FROM ROUTE_ALL_S
15 GROUP BY SUBSTR(LINES, 1, 1), SUBSTR(LINES, LENGTH(LINES))
16 ),
17 ROUTE_D AS
18 (
19 SELECT R, T, DIS
20 FROM ROUTE_S
21 UNION ALL
22 SELECT T, R, DIS
23 FROM ROUTE_S
24 ),
25 ROUTE_ALL_D (C1, C2, LINES, DISTANCE) AS
26 (
27 SELECT R, T, R || T, DIS
28 FROM ROUTE_D
29 UNION ALL
30 SELECT SUBSTR(LINES, 1, 1), T, LINES || '-' || R || T, DIS + DISTANCE
31 FROM ROUTE_D R, ROUTE_ALL_D A
32 WHERE A.C2 = R.R
33 AND INSTR(LINES, R || T) = 0
34 AND INSTR(LINES, T || R) = 0
35 AND C1 != T
36 AND INSTR(LINES, R, 1, 2) = 0
37 AND INSTR(LINES, T, 1, 2) = 0
38 AND DISTANCE + DIS <= NVL((SELECT DISTANCE FROM ROUTE_D RS WHERE C1 = RS.R AND R.T = RS.T), 9.9E38)
39 )
40 SELECT LINES
41 FROM ROUTE_ALL_D
42 WHERE ROWNUM < 1;
表已创建。
SQL> DESC T_WITH_SUBSTR
名称 是否为空? 类型
----------------------------------- -------- --------------------------------------------
LINES VARCHAR2(497)
SQL> CREATE TABLE T_WITH_SUBSTR2 AS
2 WITH ROUTE_ALL_S (C1, C2, LINES, DIS) AS
3 (
4 SELECT CITY1, CITY2, CITY1 || CITY2, DISTANCE
5 FROM ROUTES
6 UNION ALL
7 SELECT SUBSTR(LINES, 1, 1), CITY2, LINES || '-' || CITY1 || CITY2, DISTANCE + DIS
8 FROM ROUTES R, ROUTE_ALL_S A
9 WHERE A.C2 = R.CITY1
10 ),
11 ROUTE_S AS
12 (
13 SELECT SUBSTR(LINES, 1, 1) R, SUBSTR(LINES, LENGTH(LINES)) T, MIN(DIS) DIS
14 FROM ROUTE_ALL_S
15 GROUP BY SUBSTR(LINES, 1, 1), SUBSTR(LINES, LENGTH(LINES))
16 )
17 SELECT * FROM ROUTE_S;
表已创建。
SQL> DESC T_WITH_SUBSTR2
名称 是否为空? 类型
----------------------------------- -------- --------------------------------------------
R VARCHAR2(2)
T VARCHAR2(124)
DIS NUMBER
SQL> CREATE TABLE T_WITH_SUBSTR3 AS
2 WITH ROUTE_ALL_S (C1, C2, LINES, DIS) AS
3 (
4 SELECT CITY1, CITY2, CITY1 || CITY2, DISTANCE
5 FROM ROUTES
6 UNION ALL
7 SELECT SUBSTR(LINES, 1, 1), CITY2, LINES || '-' || CITY1 || CITY2, DISTANCE + DIS
8 FROM ROUTES R, ROUTE_ALL_S A
9 WHERE A.C2 = R.CITY1
10 ),
11 ROUTE_S AS
12 (
13 SELECT SUBSTR(LINES, 1, 1) R, SUBSTR(LINES, LENGTH(LINES)) T, MIN(DIS) DIS
14 FROM ROUTE_ALL_S
15 GROUP BY SUBSTR(LINES, 1, 1), SUBSTR(LINES, LENGTH(LINES))
16 ),
17 ROUTE_D AS
18 (
19 SELECT R, T, DIS
20 FROM ROUTE_S
21 UNION ALL
22 SELECT T, R, DIS
23 FROM ROUTE_S
24 )
25 SELECT * FROM ROUTE_D;
表已创建。
SQL> DESC T_WITH_SUBSTR3
名称 是否为空? 类型
----------------------------------- -------- --------------------------------------------
R VARCHAR2(124)
T VARCHAR2(124)
DIS NUMBER
对于SUBSTR(LINES, 1, 1)而言,考虑到双字节字符集的因素,因此最大长度是2,而对于SUBSTR(LINES, LENGTH(LINES))而言,这个最大长度虽然也应该是2,但是对于Oracle而言是未知的,因为不是标量,无法在计算之前获取,因此Oracle认为最大长度是LINES长度的2倍(考虑双字节字符集的原因),而LINES的长度可以通过上面的方法来进行计算,是62,所以R和T的长度分别变成了2和124,而再经过UNION ALL的操作,这两个值的长度都变成了124。
这时Oracle计算的LINES值达到了124 + 124 + 1 + 124 + 124 = 497。注意,可能是由于字符串已经达到了一定的长度,这里Oracle没有在这个长度的基础上增加一次递归的长度。不过即使是497,对于当前的问题也是足够,这就是第五篇文章问题中使用SUBSTR可以得到结果,而直接使用C1、C2列会报错的原因。
说了这么多似乎还没有给出解决方法,其实最稳妥的方法是利用CAST来指定列的长度,从而避免错误的产生:
SQL> WITH ROUTE_D AS
2 (
3 SELECT CITY1 R, CITY2 T, DISTANCE DIS
4 FROM ROUTES
5 UNION ALL
6 SELECT CITY2, CITY1, DISTANCE
7 FROM ROUTES
8 ),
9 ROUTE_ALL_D (C1, C2, LINES, DISTANCE) AS
10 (
11 SELECT R, T, CAST('"' || R || '"' || T || '"' AS VARCHAR2(4000)), DIS
12 FROM ROUTE_D
13 UNION ALL
14 SELECT A.C1, T, LINES || T || '"', DIS + DISTANCE
15 FROM ROUTE_D R, ROUTE_ALL_D A
16 WHERE A.C2 = R.R
17 AND INSTR(LINES, '"' || T || '"', 1, 1) = 0
18 AND DISTANCE + DIS <= NVL((SELECT DISTANCE FROM ROUTES RS WHERE (A.C1 = RS.CITY1 AND R.T = RS.CITY2) OR (A.C1 = RS.CITY2 AND R.T = RS.CITY1)), 9.9E38)
19 ),
20 RESULT AS
21 (
22 SELECT C1 R, C2 T,
23 SUM(MIN(DISTANCE) * 2 * MAX(C.MEMBERS)) OVER(PARTITION BY C1, C2) COST,
24 SUM(MIN(DISTANCE) * 2 * MAX(C.MEMBERS)) OVER(PARTITION BY C1) COST_CITY
25 FROM ROUTE_ALL_D R, CITIES C
26 WHERE R.C2 = C.CITY_NAME(+)
27 GROUP BY R.C1, R.C2
28 )
29 SELECT R, NVL(T, 'TOTAL') T, NVL(SUM(COST), 0) COST
30 FROM RESULT
31 WHERE COST_CITY = (SELECT MIN(COST_CITY) FROM RESULT)
32 GROUP BY GROUPING SETS ((R, T), R)
33 ORDER BY R, DECODE(T, 'TOTAL', CHR(0), T);
R T COST
---------- ---------- ----------
D TOTAL 68356
D A 3200
D B 3224
D C 3634
D E 7300
D F 7598
D G 3840
D H 14580
D I 4400
D J 12352
D K 8228
D L 0
已选择12行。