Powerful awk Arrays
Date: 2009-07-08 Source: benben3956
In awk, arrays are very dynamic and behave more like hash tables. The index of an array can be a number or a string, which is why such an array is called an "associative array". In fact, a numeric index is converted to a string when it is used.
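The number-to-string conversion can be checked with a minimal sketch like the one below. The variable names `a`, `n`, `k` and `result` are only illustrative. Because the subscripts 1 and "1" convert to the same string, both assignments land on one element.

```shell
# Numeric and string subscripts that convert to the same string share one element.
result=$(awk 'BEGIN {
    a[1] = "x"        # the subscript 1 is stored as the string "1"
    a["1"] = "y"      # overwrites the same element
    n = 0
    for (k in a) n++  # count the distinct indexes
    print n, a[1]
}')
echo "$result"
```

This prints `1 y`: the array holds a single index, and its value is the one assigned last.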
Let's look at an example. It counts the number of lines that contain "widget":
/widget/ {count["widget"]++}
END {print count["widget"]}
We can use a special loop to visit every member of the array. The loop variable takes each index in turn, in no guaranteed order:
for (item in array)
    print item, array[item]
Or we can test whether an index exists in an array:
if (item in array)
    print item " exists"
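One reason to prefer the `in` test over comparing `array[item]` with an empty string is that merely referencing an element creates it. The following is a small sketch of that side effect; the array name `seen` and its keys are made up for illustration.

```shell
result=$(awk 'BEGIN {
    seen["a"] = 1
    # The in operator only tests for the index; it does not create it
    if ("b" in seen)
        print "b present"
    else
        print "b absent"
    # Referencing seen["c"] creates an empty element as a side effect
    if (seen["c"] == "")
        print "c empty"
    n = 0
    for (k in seen) n++
    print n " indexes"
}')
echo "$result"
```

The output is `b absent`, `c empty`, `2 indexes`: the array ends up with "a" and "c", while "b" was never created.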
Here is another example where arrays show their power.
The file to parse looks like this:
09:55:54: ERROR1 /tmp/error/log.3 50 times
09:56:09: ERROR1 /tmp/error/log.14 50 times
10:56:12: ERROR1 /tmp/error/log.14 100 times
10:56:23: ERROR2 /tmp/error/log.5 50 times
11:56:26: ERROR2 /tmp/error/log.1 50 times
11:56:27: ERROR2 /tmp/error/log.5 100 times
15:56:29: ERROR3 /tmp/error/log.1 100 times
15:56:32: ERROR3 /tmp/error/log.1 150 times
16:56:33: ERROR4 /tmp/error/log.6 50 times
16:56:36: ERROR4 /tmp/error/log.6 100 times
16:56:40: ERROR4 /tmp/error/log.12 50 times
We want to count how many errors occur in each hour. Without an array, the code looks like this:
awk -F'[: ]+' 'BEGIN {timeframe = ""; count = 0}
{
    if ($1 != timeframe) {
        # The hour changed: report the previous hour before starting a new count
        if (timeframe != "") {
            print count " errors take place at " timeframe
        }
        timeframe = $1
        count = 1
    }
    else
        count++
}
END {print count " errors take place at " timeframe}'
With an array, the code becomes much simpler:
awk -F'[: ]+' '{count[$1]++}
END {for(i in count) print count[i] " errors take place at " i}'
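In terms of performance, though, the first version should be faster and use less memory, because it keeps only one counter at a time. It does rely on the lines for each hour being adjacent, as they are in a chronological log.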
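One thing to keep in mind with the array version is that `for (i in count)` visits the indexes in an unspecified order, so the hours may not come out chronologically. A minimal sketch of one way to handle this is to pipe the result through `sort`. The function name `errors_per_hour` is hypothetical, and the sketch assumes the hours are zero-padded as in the sample file.

```shell
# errors_per_hour is a hypothetical helper name; it reads log lines on stdin.
errors_per_hour() {
    awk -F'[: ]+' '{count[$1]++}
    END {for (i in count) print count[i] " errors take place at " i}' |
        sort -k6,6   # field 6 of each output line is the hour
}

# Usage with the first three lines of the sample file
printf '%s\n' \
    '09:55:54: ERROR1 /tmp/error/log.3 50 times' \
    '09:56:09: ERROR1 /tmp/error/log.14 50 times' \
    '10:56:12: ERROR1 /tmp/error/log.14 100 times' |
    errors_per_hour
```

For these three lines the result is "2 errors take place at 09" followed by "1 errors take place at 10". Unlike the version without an array, this one also works when the lines for the same hour are not adjacent.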