Powerful awk arrays

Date: 2009-07-08  Source: benben3956

In awk, arrays are very dynamic: they need no declaration or fixed size, and they behave more like hash tables than like arrays in C. An array index (subscript) can be a number or a string, which is why such arrays are called "associative arrays". In fact, a numeric subscript is converted to a string before it is used.
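To see the string conversion at work, the following sketch (assuming a POSIX shell and a POSIX awk; the shell variable `out` is just an illustrative name) stores a value under the numeric subscript 1 and then reads it back with the string subscript "1". Both name the same element.

```shell
# a[1] and a["1"] are the same element: the number 1 becomes the string "1"
out=$(awk 'BEGIN {
    a[1] = "one"
    print a["1"]                 # prints: one
    if ("1" in a) print "found"  # the string "1" is an existing index
}')
echo "$out"
```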

Let's look at an example that counts the number of lines containing "widget":

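/widget/ { count["widget"]++ }   # quote the index: a bare widget is an empty variable
END { print count["widget"] + 0 }  # + 0 prints 0 instead of an empty line when nothing matches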
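A quick way to try the program is to pipe a few lines into it. This sketch assumes a POSIX shell; the three input lines are made up for illustration. Note that the third line contains "widget" twice but is counted once, because the pattern is tested per line.

```shell
# Hypothetical input: the first and third lines contain "widget"
out=$(printf 'one widget\nno match here\nwidget and another widget\n' |
    awk '/widget/ { count["widget"]++ } END { print count["widget"] + 0 }')
echo "$out"   # lines are counted, not occurrences
```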

A special form of the for loop visits every index of an array (the order in which the indexes are visited is unspecified):

for (item in array)
    print item, array[item]   # item takes each index in turn
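Because the visiting order is unspecified, output produced by a for-in loop is often piped through sort. A minimal sketch, assuming a POSIX shell; the input words and the array name `n` are illustrative only:

```shell
# Count each first word, then list every index with its count.
# for (item in n) may visit the indexes in any order, so sort the result.
out=$(printf 'b\na\nb\n' |
    awk '{ n[$1]++ } END { for (item in n) print item, n[item] }' |
    sort)
echo "$out"
```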

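We can also test whether an index exists in an array. Unlike a reference such as array[item], which silently creates the element, the in test does not create it: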

if (item in array) print item " exists"
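The difference between testing with in and referencing an element can be seen in this small sketch (assuming a POSIX shell and awk; the array `a` and index "x" are illustrative). Merely reading a["x"] creates the element, so the second in test succeeds.

```shell
out=$(awk 'BEGIN {
    if ("x" in a) print "x present"; else print "x absent"
    if (a["x"] == "") print "a[x] is empty"   # this reference creates a["x"]
    if ("x" in a) print "x present"
}')
echo "$out"
```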

Here is another example that shows how powerful arrays can be.

The file to be parsed looks like this:
09:55:54: ERROR1 /tmp/error/log.3 50 times
09:56:09: ERROR1 /tmp/error/log.14 50 times
10:56:12: ERROR1 /tmp/error/log.14 100 times
10:56:23: ERROR2 /tmp/error/log.5 50 times
11:56:26: ERROR2 /tmp/error/log.1 50 times
11:56:27: ERROR2 /tmp/error/log.5 100 times
15:56:29: ERROR3 /tmp/error/log.1 100 times
15:56:32: ERROR3 /tmp/error/log.1 150 times
16:56:33: ERROR4 /tmp/error/log.6 50 times
16:56:36: ERROR4 /tmp/error/log.6 100 times
16:56:40: ERROR4 /tmp/error/log.12 50 times

We want to count how many errors occur in each hour. Without an array, the code looks like the following. It relies on the input being sorted by time, so that all lines of the same hour are adjacent.

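awk -F'[: ]+' 'BEGIN { timeframe = ""; count = 0 }
{
   # $1 is the hour; this only works if lines of the same hour are adjacent
   if ($1 != timeframe) {
      # a new hour begins: report the previous hour first
      if (timeframe != "")
          print count " errors take place at " timeframe
      timeframe = $1
      count = 1
   }
   else
      count++
}
END { if (timeframe != "") print count " errors take place at " timeframe }'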

If we use an array instead, the code becomes much simpler, and it no longer depends on the order of the input lines:
awk -F'[: ]+' '{ count[$1]++ }   # one counter per hour ($1); output order is unspecified
END { for (i in count) print count[i] " errors take place at " i }'
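To run the array version on the sample lines shown earlier, the sketch below (assuming a POSIX shell; the variable names `data` and `out` are illustrative) feeds them through printf and sorts the result numerically on the sixth field, the hour, since for-in gives no particular order.

```shell
# The sample log lines from the article
data='09:55:54: ERROR1 /tmp/error/log.3 50 times
09:56:09: ERROR1 /tmp/error/log.14 50 times
10:56:12: ERROR1 /tmp/error/log.14 100 times
10:56:23: ERROR2 /tmp/error/log.5 50 times
11:56:26: ERROR2 /tmp/error/log.1 50 times
11:56:27: ERROR2 /tmp/error/log.5 100 times
15:56:29: ERROR3 /tmp/error/log.1 100 times
15:56:32: ERROR3 /tmp/error/log.1 150 times
16:56:33: ERROR4 /tmp/error/log.6 50 times
16:56:36: ERROR4 /tmp/error/log.6 100 times
16:56:40: ERROR4 /tmp/error/log.12 50 times'

# Count per hour, then sort by the hour (field 6 of each output line)
out=$(printf '%s\n' "$data" | awk -F'[: ]+' '{ count[$1]++ }
END { for (i in count) print count[i] " errors take place at " i }' | sort -k6,6n)
echo "$out"
```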

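In terms of performance, however, the first version should be somewhat faster and use less memory, because it keeps a single counter instead of one array element per distinct hour. In return, it only gives correct results when lines for the same hour are adjacent.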
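The price of the first version's efficiency shows up on unsorted input. The sketch below, assuming a POSIX shell and awk, runs a condensed rewrite of the no-array logic (the variable name `tf` is illustrative) and the array version on a small hypothetical input in which hour 09 appears twice, separated by hour 10. The no-array version reports hour 09 twice, while the array version merges both lines.

```shell
# Hypothetical unsorted input
input='09:00:01: ERROR1
10:00:02: ERROR2
09:00:03: ERROR3'

# Without an array: each line is compared only with the previous line's hour
plain=$(printf '%s\n' "$input" | awk -F'[: ]+' '
    BEGIN { tf = "" }
    $1 != tf { if (tf != "") print count " errors take place at " tf; tf = $1; count = 0 }
    { count++ }
    END { if (tf != "") print count " errors take place at " tf }')

# With an array: one counter per hour, whatever the line order
arr=$(printf '%s\n' "$input" | awk -F'[: ]+' '{ count[$1]++ }
    END { for (i in count) print count[i] " errors take place at " i }' | sort -k6,6n)

echo "$plain"
echo "$arr"
```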
