Monsieur Boulet nails it in a panel from this comic
— David Lee Roth
Color study. Will do a few more. Then order some temp tats.
— Peter Drucker
RT @strcpy (via @sdague): haha, SOPA - What would Bender do? http://t.co/dCjYoYJj
So I was thinking about the last post, and got on the machine that had the original (albeit broken) solution. It was so much more elegant than the working solution, even though it missed some cases that the working version caught. Here it is; bask in the simplicity:
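#!/bin/awk -f
# First draft. It always quotes the last column, and extra fields are
# glued onto it without spaces.
function isnum(x) { return (x == x + 0) }
BEGIN { OFS = ";"; ORS = "" }
{
    if (NR == 1) {
        # The first line holds the column names.
        fields = NF
        for (i = 1; i <= NF; i++) {
            headers[i] = $i
        }
    }
    else {
        print "{"
        for (i = 1; i < fields; i++) {
            print "\"" headers[i] "\": "
            if (isnum($i)) {
                print $i
            }
            else {
                print "\"" $i "\""
            }
            print ","
        }
        # Everything from the last header onward goes into one string.
        print "\"" headers[fields] "\": \""
        for (i = fields; i <= NF; i++) {
            print $i
        }
        print "\"}\n"
    }
}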
I was messing around with getting data into BUGswarm from some standard Linux/UNIX commands like ps, iostat, vmstat. I started with a sed statement that made a JSON string out of every output line:
ps -ef |sed 's/\(.*\)/[\"\1\"]/' | produce.py
It was a decent start: it counted as valid JSON and I was able to post to my swarm. But the data wasn’t organized, just messy strings:
["UID PID PPID C STIME TTY TIME CMD"]
["root 1 0 0 Oct25 ? 00:00:02 /sbin/init"]
["root 2 0 0 Oct25 ? 00:00:00 [kthreadd]"]
["root 3 2 0 Oct25 ? 00:00:00 [migration/0]"]
Next I took a stab at creating a JSON object from vmstat’s output. I made an ugly ugly single awk print statement that hard coded every parameter from vmstat.
{ print "{\"r\": "$1", \"b\": "$2", \"swpd\": "$3", \"free\": "$4", \"buff\": "$5", \"cache\": "$6"," \
"\"si\": "$7", \"so\": "$8", \"bi\": "$9", \"bo\": "$10", \"in\": "$11", \"cs\": "$12", \"us\": " \
$13", \"sy\": "$14", \"id\": "$15", \"wa\": "$16"}"}
Run vmstat, pipe into that monstrosity and get something kinda useful:
{"r": 3, "b": 0, "swpd": 162668, "free": 461012, "buff": 3036440, "cache": 634388,"si": 0, "so": 0, "bi": 9, "bo": 6, "in": 1, "cs": 1, "us": 4, "sy": 1, "id": 94, "wa": 0}
Cool. So I went home. All the way it stewed in my brain. I wanted to read in the column headers and use that to tag the data. As long as there was a line that had the data description, I could make the object.
stew stew stew….
Then I sat down and wrote this messy little piece of awkward goodness. The default is to look for header descriptions on the first line. Some commands like vmstat and iostat give other bits before the data columns. I put in the header var to account for that. On the command line, specify header= followed by the number of the line that holds the column headers.
A few hints as you read through the code. NR is the number of records read so far; think of it as the line number. NF is the number of fields in the current record. Let’s take a walk through the code.
First up is a little function to tell us if something is a numeric value:
function isnum(x){return(x==x+0)}
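To get a feel for what isnum accepts, here is a small sketch that runs the same comparison over a few sample values of the kind ps and vmstat print. The classify wrapper and the sample values are made up for illustration. The trick relies on awk treating number-looking input fields as numbers and comparing everything else as strings.

```shell
# Label each whitespace-separated field on stdin as "num" or "str".
classify() {
  awk '
    function isnum(x) { return (x == x + 0) }
    {
      for (i = 1; i <= NF; i++)
        printf "%s=%s\n", $i, (isnum($i) ? "num" : "str")
    }'
}

printf '%s\n' '42 -7 Oct25 00:00:02 ?' | classify
```

The two integers come out as num; Oct25, 00:00:02 and ? come out as str, so they would be quoted in the JSON.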
In the BEGIN clause I set ORS (output record separator) to nothing; this way I can piece together the record across multiple print statements. I also check if header was passed in from the command line. If not, set the default of 1:
BEGIN {
    ORS = ""
    if (!header) {
        header = 1
    }
}
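One subtlety worth knowing here: an assignment like header=2 given as an operand after the program is only applied once BEGIN has finished. So the check in BEGIN always sees header unset and stores the default, and the command-line value then replaces it before the first record is read. The end result is what the script wants. A small sketch comparing the operand form with awk's -v option, which assigns before BEGIN (the prog string and the sample line are made up for illustration):

```shell
prog='BEGIN { if (!header) header = 1; print "BEGIN sees " header }
      { print "main sees " header }'

# Operand form, as used in this post: assigned after BEGIN has run.
as_operand=$(printf '%s\n' 'one line' | awk "$prog" header=2)

# -v form: assigned before BEGIN runs.
as_option=$(printf '%s\n' 'one line' | awk -v header=2 "$prog")

printf '%s\n---\n%s\n' "$as_operand" "$as_option"
```

With the operand form BEGIN reports 1 and the main rule reports 2; with -v both report 2.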
If the current record number is less than our specified header line, throw the whole line away:
NR < header {next}
This if statement could have been a separate clause for NR==header. I’m primarily a C coder (I didn’t figure out the clause check until this morning) so I went with an if to check if the line was the header line.
The assumption is that the number of headers is the number of fields we should expect. But the output of ps will come in with a higher field count on lines that have complete command lines for running processes (ps, I’m looking at you!). Later we use this count to concatenate the remaining fields into the last entry:
{
    if (NR == header) {
        fields = NF
        for (i = 1; i <= NF; i++) {
            headers[i] = $i
        }
    }
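As mentioned, the header check could also be written as its own pattern rule instead of an if inside the main block. A minimal sketch of that variant, assuming header is always supplied on the command line (the capture_headers wrapper and the sample input are hypothetical):

```shell
capture_headers() {
  awk '
    NR == header {                 # the header line: remember column names
      fields = NF
      for (i = 1; i <= NF; i++) headers[i] = $i
      next
    }
    NR > header {                  # first data line: report and stop
      print fields " columns, first is " headers[1]
      exit
    }' header="$1"
}

printf '%s\n' 'procs ---memory---' 'r b swpd' '1 0 162736' | capture_headers 2
```

Lines before the header match neither rule, so in this form they are dropped without a separate NR < header clause.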
Now to the guts. Read in a line, print the header as the object field description, print the value (as a string if necessary).
    else {
        print "{"
        for (i = 1; i < fields; i++) {
            print "\"" headers[i] "\": "
            if (isnum($i)) {
                print $i
            }
            else {
                print "\"" $i "\""
            }
            print ", "
        }
        print "\"" headers[fields] "\": "
The first draft of code is always so pretty. Then you find corner cases that the short elegant solution doesn’t cover. Feh. The logic goes like this: if the number of fields in this record matches what was expected, print the last field and be done. Printing of course checks for numeric or string and accommodates:
        if (NF == fields) {
            if (isnum($i)) {
                print $i
            }
            else {
                print "\"" $i "\""
            }
        }
But if we have more fields than expected, concatenate them into the last field (ps, this is all about you). Since a number wouldn’t be in multiple parts, assume stringage. Finish up the record and have a nice day
        else {
            print "\""
            for (i = fields; i <= NF; i++) {
                print $i " "
            }
            print "\""
        }
        print "}\n"
    }
Oh right. Since this is piping into another program that will send it into the cloud, flush out this line and let the magic happen
    fflush()
}
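One corner case the script does not cover: a value that itself contains a double quote or a backslash (plausible in a ps command line) is printed as-is, which would break the JSON. A rough sketch of one way to handle it, using a hypothetical json_escape helper that is not part of the script above. It only deals with quotes and backslashes, not control characters, and walks the string one character at a time to stay clear of gsub's backslash rules.

```shell
quote_json() {
  awk '
    # Hypothetical helper: put a backslash before each " and \ in s.
    function json_escape(s,    out, n, i, c) {
      out = ""
      n = length(s)
      for (i = 1; i <= n; i++) {
        c = substr(s, i, 1)
        if (c == "\"" || c == "\\") out = out "\\"
        out = out c
      }
      return out
    }
    # Emit each input line as one escaped JSON string.
    { print "\"" json_escape($0) "\"" }'
}

printf '%s\n' 'sh -c "echo a\b"' | quote_json
```

In the script, the same helper could wrap $i wherever a value is printed between quotes.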
Well, that’s it. Next up I’ll be specifying another variable on the command line to name the object. This way the application that is digesting the data that is passed through the swarm will know what it’s getting.
Oh heck, you want to see how it works? I run it like this:
vmstat 2 |awk -f ../awkward header=2
and get output like this (actually I don’t get output, I pipe it into produce.py and it goes into the specified swarm)
{"r": 1, "b": 0, "swpd": 162736, "free": 533052, "buff": 2989260, "cache": 641272, "si": 0, "so": 0, "bi": 0, "bo": 10, "in": 966, "cs": 3122, "us": 10, "sy": 3, "id": 87, "wa": 0}
{"r": 1, "b": 0, "swpd": 162736, "free": 533044, "buff": 2989260, "cache": 641280, "si": 0, "so": 0, "bi": 0, "bo": 0, "in": 1014, "cs": 3205, "us": 9, "sy": 4, "id": 87, "wa": 0}
{"r": 1, "b": 0, "swpd": 162736, "free": 533060, "buff": 2989260, "cache": 641276, "si": 0, "so": 0, "bi": 0, "bo": 0, "in": 1002, "cs": 3033, "us": 9, "sy": 3, "id": 88, "wa": 0}
{"r": 1, "b": 0, "swpd": 162736, "free": 533044, "buff": 2989264, "cache": 641280, "si": 0, "so": 0, "bi": 0, "bo": 10, "in": 909, "cs": 2924, "us": 10, "sy": 3, "id": 86, "wa": 0}
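To try this without a live vmstat or a swarm to post to, here is a sketch of the fragments from the walkthrough stitched into one shell function and fed canned input. The function name awkward mirrors the script file used above, the sample lines are invented, and fflush() is left out because nothing downstream is waiting on the output.

```shell
awkward() {
  awk '
    function isnum(x) { return (x == x + 0) }
    BEGIN { ORS = ""; if (!header) header = 1 }
    NR < header { next }
    {
      if (NR == header) {            # remember the column names
        fields = NF
        for (i = 1; i <= NF; i++) headers[i] = $i
      } else {                       # one JSON object per data line
        print "{"
        for (i = 1; i < fields; i++) {
          print "\"" headers[i] "\": "
          if (isnum($i)) print $i; else print "\"" $i "\""
          print ", "
        }
        print "\"" headers[fields] "\": "
        if (NF == fields) {
          if (isnum($i)) print $i; else print "\"" $i "\""
        } else {                     # extra fields: join into one string
          print "\""
          for (i = fields; i <= NF; i++) print $i " "
          print "\""
        }
        print "}\n"
      }
    }' "$@"
}

# vmstat-style input: column names on line 2.
printf '%s\n' 'procs ---memory---' 'r b swpd' '1 0 162736' | awkward header=2
# ps-style input: the command line spills past the last column.
printf '%s\n' 'PID TTY CMD' '1 ? /sbin/init --verbose now' | awkward
```

Note the trailing space inside the joined CMD string; it comes from printing a space after every extra field.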
Q.E.D
RT @JillEBond: #GOP has introduced these bills: 44 on abortion, 99 on religion, 71 family relationships, 36 on marriage, 522 on taxation …
Grumpy pancake says “Do not eat!” http://t.co/RsI7wJut
RT @doctorow: Free Bieber: campaign to kill proposed law that would send you to prison for 5 years for singing copyrighted music http:// …
RT @BUGswarm: Introducing BUGswarm: A new way to acquire data from and control embedded devices using JavaScript or plain old HTTP. htt …
Elton John: There’s a Global War Against the Right of Gay People to Live and Love. We Need to Fight Back http://t.co/qnfQASaq