Format a unix file on Teradata
I have a requirement to format a file, and I am noticing a bug and an inefficiency. Can somebody give some suggestions?
Input data (I mean column headers and rows):
current_licl_nbr| policy_id | plociyhold_id|mail_allowed_id|email_address_txt
Output data should look like:
701000002990|200000000175|200000000175|2|XYZ@ABCD.COM
The current commands we are using have a bug, are very inefficient, and generate output like the below.
Current output generated
The code we have in the script is csplit -ks -f ${DWH_OUT}/other/a1prefix ${DWH_OUT}/other/a1_xxxx.tmp
3
The amount of data that is being formatted would be around 6,000,000.
Have you thought about using Perl?
Here are 2 variations of one Perl solution (with Perl, "there is always more than one way to do anything"). These will not make the changes I noted above, but those could easily be added.

From the command line:

perl -pe "s/\.\|\s*/|/g" input.txt > output.txt

or

perl -pi -e "s/\.\|\s*/|/g" input.txt

The second one does an in-place edit of the original file. I ran a benchmark test on a 6,000,000 line file and it took between 120 and 130 seconds to complete on a slow Windows PII 550 machine.
----------------------------------
For your reference, here are the complete scripts that I used to test/benchmark.

Script to create the source file:
open OUT, ">braveking.txt" or die $!; for (1..6000000) {
benchmark script:
use Time::HiRes 'time'; for $i (1..50) {
(c) www.gotothings.com All material on this site is Copyright.