Compare Lines Within File
Example data:
aaa20040110
aaa20040110
aaa20040110
aaa20040111
aaa20040111
The file may contain data for a single day or for two dates. I
need to find out whether it holds one day of data or two.
---------------------------------------
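#!/bin/ksh
# Read junk.data on file descriptor 3 and print each line that has exactly two words.
exec 3< ./junk.data
while read -u3 line
do
    if [[ $(print -r -- "$line" | wc -w) -eq 2 ]]
    then
        print -r -- "$line"
    fi
done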
---------------------------------------
The script above prints each line that contains exactly 2 words.
To determine how many unique dates your file contains
(keying on word1 in each line), you can isolate that word (with awk or
cut, for example), and then do a unique sort, which eliminates duplicates.
This is Korn shell syntax, but it can be converted for other shells.
k=$(awk '{print $1}' myfile.txt | sort -u | wc -l)
echo "number of unique dates:" $k
The following script will summarize your file on word1:
awk '
    {date[$1]++}
END {for (i in date)
         print i, date[i]}
' myfile.txt
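As a quick check, both commands can be run against the sample data from the question. This is a minimal sketch written in plain POSIX shell so it also runs outside ksh; the temporary file name is a placeholder, and the final sort is added only because awk visits array keys in an unspecified order.

```shell
#!/bin/sh
# Sample data copied from the question; the file name is a placeholder.
data=${TMPDIR:-/tmp}/dates_sample.$$
cat > "$data" <<'EOF'
aaa20040110
aaa20040110
aaa20040110
aaa20040111
aaa20040111
EOF

# Count the distinct values of word1. tr strips the padding some wc versions add.
k=$(awk '{print $1}' "$data" | sort -u | wc -l | tr -d ' ')

# Record count per distinct word1, sorted so the order is predictable.
summary=$(awk '{date[$1]++} END {for (i in date) print i, date[i]}' "$data" | sort)

echo "number of unique dates: $k"
echo "$summary"
rm -f "$data"
```

With this input, k is 2 and the summary lists aaa20040110 with 3 records and aaa20040111 with 2.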
---------------------------------------
Thanks a lot. The next hurdle: when the file holds more than one
date, I want to know how many records there are for each date.
For the example above:
20040110 - 3
20040111 - 2
Once again thanks a lot.
---------------------------------------
The first script gives you a count ($k) of how many unique
dates you have.
The second script provides the summary you are asking
for.
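To get the exact "date - count" layout requested, the summary script can trim the key and change the print format. The sketch below assumes the date is the last eight characters of word1; the sample data suggests that, but the thread never states it, so adjust the substr call to the real record layout.

```shell
#!/bin/sh
# Sample data copied from the question; the file name is a placeholder.
data=${TMPDIR:-/tmp}/dates_fmt.$$
cat > "$data" <<'EOF'
aaa20040110
aaa20040110
aaa20040110
aaa20040111
aaa20040111
EOF

report=$(awk '
    # Assumption: the last 8 characters of word1 are the date (YYYYMMDD).
    { d = substr($1, length($1) - 7); count[d]++ }
    END { for (d in count) print d " - " count[d] }
' "$data" | sort)

echo "$report"
rm -f "$data"
```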
---------------------------------------
It's working great. This may be a dumb question, but how can I get the
summary data into a Unix shell variable (or array) and manipulate it?
---------------------------------------
I also need to skip the first and last lines of the file.
---------------------------------------
It is very easy to discard the first line. awk could do
an initial getline to throw it away, or tail +2 would do it (tail -n +2
on current systems; tail +1 would print the whole file). But the last line
is much more of a pain. awk could do that too, but only awkwardly (sorry
about that) by holding each line and processing it on a delayed basis.
I took the easy way below and just let sed strip both lines.
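For comparison, here is roughly what the awk-only "hold each line" approach could look like. It is a sketch, not the method used in this thread: the HEADER and TRAILER lines and the file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical input with one header and one trailer line.
data=${TMPDIR:-/tmp}/dates_hdr.$$
cat > "$data" <<'EOF'
HEADER
aaa20040110
aaa20040110
aaa20040111
TRAILER
EOF

summary=$(awk '
    NR == 1 { next }              # discard the first line
    NR > 2  { date[held]++ }      # count the line saved on the previous pass
            { held = $1 }         # hold this line; the last one is never counted
    END     { for (i in date) print i, date[i] }
' "$data" | sort)

echo "$summary"
rm -f "$data"
```

Each record is counted one line late, so the final line is held but never reaches the array. That delay is the awkward part, and it is why piping through sed first is easier to read.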
There are several ways to get output from awk (or from
any command) back to the shell. But when the output is any number of lines,
you need a construct that can process multiple lines. Below, the awk output
is piped into a while-loop, where each iteration will process one line.
By the way, in some shells (bash, and the pdksh that many Linux systems
install as ksh), the last stage of a pipeline runs in a subshell, so any
shell variables set within that while-loop will go away after the while-loop.
AT&T ksh runs the loop in the current shell, so the values survive there.
sed '1d;$d' myfile.txt |
awk '
    {date[$1]++}
END {for (i in date)
         print i, date[i]}' |
while read date k
do
    echo "date=$date k=$k"
done
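If the values computed inside the loop have to survive in a shell that runs pipeline stages in a subshell, one workaround is to capture the awk output first and feed the loop through a here-document instead of a pipe. The sketch below assumes made-up HEADER and TRAILER lines and a placeholder file name; the variables dates and total are introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical input with one header and one trailer line.
data=${TMPDIR:-/tmp}/dates_loop.$$
cat > "$data" <<'EOF'
HEADER
aaa20040110
aaa20040110
aaa20040110
aaa20040111
aaa20040111
TRAILER
EOF

# Capture the summary first, so the loop below is not part of a pipeline.
summary=$(sed '1d;$d' "$data" |
    awk '{date[$1]++} END {for (i in date) print i, date[i]}')

dates=0
total=0
while read -r date k
do
    [ -n "$date" ] || continue     # skip the blank line an empty summary would produce
    dates=$((dates + 1))
    total=$((total + k))
done <<EOF
$summary
EOF

# The loop ran in the current shell, so both variables are still set here.
echo "dates=$dates total=$total"
rm -f "$data"
```

For this input the last line prints dates=2 total=5.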
---------------------------------------
I used

set -A record_count `awk 'BEGIN {FS = "|"} {if (NF != 1) date[$1]++}
END {for (i in date) print date[i]}' "$fposted"`

and now the array record_count holds the data I require.
The if (NF != 1) test also gets rid of the header and trailer lines, since
they do not contain the delimiter (which I did not mention in my post, sorry).
Thanks for all your help
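One caution about the set -A approach: it stores only the counts, and awk's for (i in date) loop visits keys in an unspecified order, so nothing ties a count back to its date. A sketch that keeps each date next to its count follows. It uses POSIX positional parameters so it runs outside ksh as well; in ksh the same word list could be handed to set -A. The "|"-delimited record layout, the HEADER and TRAILER lines, and the file name are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical "|"-delimited input; header and trailer have no delimiter.
data=${TMPDIR:-/tmp}/dates_pipe.$$
cat > "$data" <<'EOF'
HEADER
20040110|a
20040110|b
20040110|c
20040111|d
20040111|e
TRAILER
EOF

# Load "date count date count ..." into the positional parameters, sorted by date.
set -- $(awk -F'|' 'NF != 1 {date[$1]++}
    END {for (i in date) print i, date[i]}' "$data" | sort)

pairs=$(( $# / 2 ))
report=""
while [ $# -ge 2 ]
do
    report="$report$1=$2;"      # $1 is the date, $2 is its record count
    shift 2
done

echo "pairs=$pairs report=$report"
rm -f "$data"
```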