Text Reformatting
I have a text file that looks like this:
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,other , 0.68,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,other , 0.69
I need it to look like this (first line is a header line):
Date,oracle,nis,other,network,memory_management,other_user_root,
03/04/04,0.55,0.43,0.61,0.11,0.10,3.76,
03/04/04,0.00,0.68,0.11,0.07,3.14,0.69,
In English... basically every application polled needs
to be in columnar format. Applications can be added or removed during the
month, so it should place a zero value if there is no value for that particular
sample period.
I can use shell or perl, but shell is preferred. I
wrote a script to do it, but it is verrrry slooow, and I didn't know how
to handle an application that wasn't consistent throughout the sample period.
Do you guys know of an easy way to do this?
---------->
Your problem would be simpler to solve if there
were a line delimiting the periods in your file.
For example:
03/04/04,NEW_PERIOD
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,other , 0.68,
03/04/04,NEW_PERIOD
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,other , 0.69
---------->
Well, I could probably accomplish that by extracting an
additional time field. For example, the date probably has a sample time
that would be unique to each sample period. I would just omit that field
on the output. I'm checking to see if that is an option.
---------->
#kludge, but fun
Sfile=/tmp/$$
sort -o $Sfile $1
for d in $(awk -F, '{ print $1 }' $Sfile|sort -u)
do
  f2=$(echo $(grep $d $Sfile|awk -F, '{ print $2 }' |sort -u))
  echo "Date $(echo $f2|sed 's/ /,/g'),"
  v1=""
  v2=""
  for f in $f2
  do
    vg1=$(grep -c "$d,.*$f ," $Sfile)
    if [ $vg1 = 1 ]
    then
      v1="$v1, 0"
    else
      v1="$v1, $(echo $(grep "$d,.*$f ," $Sfile|head -1|awk -F, '{ print $3 }'))"
    fi
    v2="$v2, $(echo $(grep "$d,.*$f ," $Sfile|tail -1|awk -F, '{ print $3 }'))"
  done
  echo "$d $v1,"
  echo "$d $v2,"
done
rm $Sfile
---------->
The following awk script assumes that a delimiting line
exists for each period (this line starts a new period and is
identified by the NEW_PERIOD application).
The input data:
03/04/04,NEW_PERIOD
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,NEW_PERIOD
03/04/04,other , 0.68,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,NEW_PERIOD
03/04/04,other , 0.69
The awk script :
#!/usr/bin/awk -f
#
# Initialize
#
BEGIN {
   FS = ",";
   header = "Date";
}
#
# New period,
# Memorize in periods[]
#
$2=="NEW_PERIOD" {
   periods[++period] = $1;
   next;
}
#
# Application,
# Memorize appli in applis[] and value in values[]
#
{
   # Get appli (without trailing spaces) and value
   app = $2;
   val = $3;
   sub(/[[:space:]]*$/,"",app);
   # Get appli id; the first time, assign an id and memorize it
   if (app in applis)
      app_id = applis[app];
   else {
      app_id = ++appli;
      applis[app] = app_id;
      header = header "," app;
   }
   # Memorize value for application in period
   values[period, app_id ] += val;
}
#
# End of data, print result
#
END {
   print header;
   # For each period, display the value of each application
   for (periode_id=1; periode_id<=period; periode_id++) {
      line = periods[periode_id];
      for (app_id=1; app_id<=appli; app_id++) {
         id = periode_id SUBSEP app_id;
         if (id in values)
            line = line "," values[id];
         else
            line = line ",0.00";
      }
      print line;
   }
}
The result :
Date,oracle,nis,other,network,memory_management,other_user_root
03/04/04,0.55,0.43,0.61,0.11,0.1,3.76
03/04/04,0.00,0.00,0.68,0.11,0.07,3.14
03/04/04,0.00,0.00,0.69,0.00,0.00,0.00
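If the input has no NEW_PERIOD lines yet, they could perhaps be generated first. The sketch below is only a heuristic and an assumption about the data: it starts a new period whenever the date changes or an application name shows up a second time. The function name add_period_markers is made up for this example.

```shell
# Sketch: insert a NEW_PERIOD marker whenever the date changes or an
# application name repeats within the current period (assumed heuristic).
add_period_markers() {
    awk -F, '
    {
        app = $2
        gsub(/^[ \t]+|[ \t]+$/, "", app)      # trim the application name
        if ($1 != date || (app in seen)) {
            # Start a new period: forget the names seen so far.
            for (k in seen) delete seen[k]
            date = $1
            print date ",NEW_PERIOD"
        }
        seen[app] = 1
        print                                  # pass the record through unchanged
    }' "$@"
}

# Usage (file name is hypothetical):
#   add_period_markers samples.txt > samples_with_periods.txt
```

On the original sample file this puts a marker before oracle, before "other , 0.68" and before "other , 0.69", which matches the input data shown above. It will guess wrong if an application is legitimately reported twice in one period.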
---------->
I thought G-M wanted only the 1st and last.
For file:
03/04/04,oracle , 0.55,
03/04/04,nis , 0.43
03/04/04,other , 0.61,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.10,
03/04/04,other_user_root , 3.76,
03/04/04,other , 0.68,
03/04/04,network , 0.11,
03/04/04,memory_management , 0.07,
03/04/04,other_user_root , 3.14,
03/04/04,other , 0.69
03/05/04,oracle , 0.55,
03/05/04,nis , 0.43
03/05/04,other , 0.81,
03/05/04,ZOrt , 0.81,
03/05/04,network , 0.11,
03/05/04,other_user_root , 3.76,
03/05/04,other , 0.88,
03/05/04,network , 0.11,
03/05/04,other_user_root , 3.14,
03/05/04,other , 0.89
my script gives:
Date memory_management,network,nis,oracle,other,other_user_root,
03/04/04 , 0.07, 0.11, 0, 0, 0.61, 3.14,
03/04/04 , 0.10, 0.11, 0.43, 0.55, 0.69, 3.76,
Date network,nis,oracle,other,other_user_root,ZOrt,
03/05/04 , 0.11, 0, 0, 0.81, 3.14, 0,
03/05/04 , 0.11, 0.43, 0.55, 0.89, 3.76, 0.81,
---------->
First of all... thanks everyone for your kind suggestions.
To help determine the "NEW_PERIOD" iteration, I was able
to change the data format so that it could be more easily ascertained.
Here's a sample:
03/15/04,1079308800,oracle , 0.21,
03/15/04,1079308800,other , 0.64,
03/15/04,1079308800,network , 0.10,
03/15/04,1079308800,memory_management , 0.05,
03/15/04,1079308800,other_user_root , 1.51,
03/15/04,1079312400,other , 0.63,
03/15/04,1079312400,network , 0.11,
03/15/04,1079312400,memory_management , 0.05,
03/15/04,1079312400,other_user_root , 1.51,
The second field is seconds since 1970 (longtime).
I appreciate the sample scripts. I will work through them
to learn a little more about sorting using these methods.
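With the timestamp in the second field, the NEW_PERIOD marker is no longer needed, since date plus timestamp identifies a sample period. A minimal sketch of the pivot for this four-field format, assuming columns should appear in order of first appearance and the timestamp should be left out of the output (the function name pivot_samples is made up here):

```shell
# Sketch: pivot "date,epoch,app,value," records into one row per sample.
pivot_samples() {
    awk -F, '
    {
        app = $3
        gsub(/^[ \t]+|[ \t]+$/, "", app)      # trim the application name
        key = $1 SUBSEP $2                    # date + epoch = one sample period
        if (!(key in row)) { row[key] = ++nrows; date[nrows] = $1 }
        if (!(app in col)) { col[app] = ++ncols; name[ncols] = app }
        val[row[key], col[app]] = $4 + 0      # store the value as a number
    }
    END {
        line = "Date"
        for (c = 1; c <= ncols; c++) line = line "," name[c]
        print line
        for (r = 1; r <= nrows; r++) {
            line = date[r]
            for (c = 1; c <= ncols; c++)
                line = line "," sprintf("%.2f", val[r, c])   # missing cell prints 0.00
            print line
        }
    }' "$@"
}

# Usage (file name is hypothetical):
#   pivot_samples samples.txt
```

For the nine sample lines above this prints:

Date,oracle,other,network,memory_management,other_user_root
03/15/04,0.21,0.64,0.10,0.05,1.51
03/15/04,0.00,0.63,0.11,0.05,1.51

It is a single pass over the file, so it should be much faster than running grep once per application.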
(c) www.gotothings.com All material on this site is Copyright.