Update Column from a Reference file

Update Column from a Reference file

I have one reference file: f1
Staff# Staff name
------------------
34567 John smith
12344 Mark Wood
45432 Sara James

and a raw data file: f2
Staff# Tel# Job
----------------------
45432 9665875 Engineer
34567 6587542 Manager
12345 5878952 CEO
58785 8547585 trainee

I would like to produce a file f3:

Staff name Staff# Tel# Job
----------
Sara James 45432 9665875 Engineer
Mark Wood 34567 6587542 Manager
John smith 12345 5878952 CEO
58785 8547585 train

What is the best way to do that in unix?

----------------------------
awk can store the f1 file into an array at startup, then pull from the array as it processes f2.
I suspect that you want a consistent number of words (columns) in your output file. ?In support of this, f1 entries with missing names will default to NOFIRST and
NOLAST, and additional words in an f1 line are ignored. f2 entries that are not in f1 will default to NO F1.

awk 'BEGIN \
{while ((getline <?"f1") > 0)
???if (NF>0)
??????{if (NF==1)
??????????fullname="NOFIRST NOLAST"
???????else
???????if (NF==2)
??????????fullname=$2 " NOLAST"
???????else
???????????fullname=$2 " " $3
???????namearray[$1] = fullname}
}
{if ($1 in namearray)
????fullname=namearray[$1]
?else
????fullname="NO F1"
?print fullname, $0
}' f2 > f3

----------------------------
You could also look at the join command.
join -1 1 -2 1 f1 f2
Staff# Staff name Tel# Job
45432 Sara James 9665875 Engineer

join -a -1 1 -2 1 f1 f2
Staff# Staff name Tel# Job
----------------------
34567 John smith
12344 Mark Wood
45432 Sara James 9665875 Engineer
34567 6587542 Manager
12345 5878952 CEO
58785 8547585 trainee

----------------------------
Good tip.
The join command will require f1 and f2 to be in collating sequence on their respective fields to be joined.

----------------------------
Thank you. It worked under bash (cygwin). But I tried to apply it on another example and it didn't seem working:
f1 file:

cell_name cell_id
-------------
ALW102B 426-01-101-1022 BSC4 4
ALW102A 426-01-101-1021 BSC4 4
BAN107A 426-01-101-1071 BSC7 4
BAN107B 426-01-101-8923 BSC7 4
BHS104A 426-01-101-0272 BSC7 4
BHS104B 426-01-101-8272 BSC7 4
ALB109A 426-01-101-1091 BSC7 4
SRF180A 426-01-101-0173 BSC4 4

f2 file:

a b loc cell_id
============================================
0440553621| 0733655565| GMS1
0440553621| 0736815743| GMS1
0440553621| 0736847914| GMS2
0440655665| 0733553621| MSC1| 426-01-101-0401
0440553621| 0734553810| GMS1|
0440655665| 0736553621| MSC1| 426-01-101-8401
0440655665| 0739553621| MSC1| 426-01-101-8923
0440553621| 008888| MSC3| 426-01-103-8272
0440553621| 008888| MSC3| 426-01-103-0272

I need to update cell_id with cell_name from f1 file, something like:

f3 file:
a b loc cell_id cell_name
============================================
0440553621| 0733655565| GMS1|
0440553621| 0736815743| GMS1|
0440553621| 0736847914| GMS2|
0440655665| 0733553621| MSC1| 426-01-101-0401| *No_f1_record*
0440553621| 008888| MSC3| 426-01-103-8272| BHS104B

-----------------------
awk 'BEGIN \
{FS=" "}
{while ((getline < "f1") > 0)
if ($1>0) # skip header rows
fullname=$1 " " $2
else
fullname=""
namearray[$1] = fullname
}
{FS="|"}
{
if (NF==4) # check cell_id rows
if ($4 in namearray)
fullname=namearray[$1]
else
fullname="*No_f1_record*"

print $0,fullname
}' f2 > f3
-----------------

but i got result:

a b loc cell_id *No_f1_record*
============================================ *No_f1_record*
0440553621| 0733655565| GMS1 *No_f1_record*
0440553621| 0736815743| GMS1 *No_f1_record*
0440553621| 0736847914| GMS2 *No_f1_record*
0440655665| 0733553621| MSC1| 426-01-101-0401 *No_f1_record*
0440553621| 0734553810| GMS1| *No_f1_record*
0440655665| 0736553621| MSC1| 426-01-101-8401 *No_f1_record*
0440655665| 0739553621| MSC1| 426-01-101-8923 *No_f1_record*
0440553621| 008888| MSC3| 426-01-103-8272 *No_f1_record*
0440553621| 008888| MSC3| 426-01-103-0272 *No_f1_record*
-----------------

Your valuable assistance is appreciated.

----------------------------
A BEGIN statement is a single (compound) statement. As coded, your BEGIN statement was just an FS assignment and you need it to include the while statement also.
I do not know what your actual f1 header statements look like, but I coded for the ones posted. ?We could also bypass the first n lines of f1, but if you sometimes do not have f1 hdrs, then you are bypassing data lines. If bypassing f1 hdrs based on content (like I coded), hopefully those strings will never appear in data lines.

When storing f1, note that we are storing field1 into an array slot indexed by field2. ?And when processing an f2 line, we are checking if field4 exists as an index in the array. If it does, we pull the contents of the array slot as indexed by that same field4.

You show f2 field separation as bar-and-space. ?If we process f2 using bar as the FS, those spaces become part of the fields. To keep this from messing up our match-ups etc, I use a gsub to blow all spaces away.

I did not reinsert the spaces because field separation is cleaner with bar character only. ?But if you need bar-and-space, changing OFS (Output Field Separator) to "| " will not work unless you print individual field lists (print $1,$2,$3,$4,loc). ?You could use:
???gsub("|","| ")
to put a space after each bar character.

On processing f2, I opted to bypass f2 hdr records by assuming that they will always be there as records 1 and 2. ?This is a good approach only if you can be sure of the number of header records. Might be better to bypass or process by matching on line content like I did for f1.

The 5th line of f2 terminates with a field separator, which means this line has 4 fields, the fourth field being null.

awk 'BEGIN \
{while ((getline <?"f1") > 0)
????if (!match($1,"cell|----")) # skip f1 hdrs
???????locarray[$2]=$1
# ??else
# ?????print "Bypassing f1 header: " $0
?FS="|"
?OFS="|"
}

{if (NR?)
???{print # output f2 hdrs
????next}
?gsub(" ","")
?if (NF!=4)
????print
?else
???{if ($4 in locarray)
???????loc=locarray[$4]
????else
???????loc="*No_f1_record*"
????print $0,loc}
}' f2 > f3

----------------------------
One of my lines above did not post properly. In the line below:
{if (NR?)

within the parentheses should be 4 characters, namely:

N R leftanglebracket 3

I will try to post it properly:

{if (NR?)

----------------------------
Nope, could not get it to post.

Have a Unix Problem
Do you have a UNIX Question?

Unix Books :-
UNIX Programming, Certification, System Administration, Performance Tuning Reference Books

Return to : - Unix System Administration Hints and Tips

(c) www.gotothings.com All material on this site is Copyright.
Every effort is made to ensure the content integrity. Information used on this site is at your own risk.
All product names are trademarks of their respective companies.
The site www.gotothings.com is in no way affiliated with or endorsed by any company listed at this site.
Any unauthorised copying or mirroring is prohibited.