Trim last 3 characters of a line WITHOUT using sed, or perl, etc

I've got a shell script outputting data like this:

1234567890  *
1234567891  *

I need to remove JUST the last three characters " *". I know I can do it via

(whatever) | sed 's/\(.*\).../\1/'

But I DON'T want to use sed for speed purposes. It will always be the same last 3 characters.

Any quick way of cleaning up the output?

Solution 1:

Here's an old-fashioned unix trick for removing the last 3 characters from a line that makes no use of sed OR awk...

> echo 987654321 | rev | cut -c 4- | rev

987654

Unlike the earlier example using 'cut', this does not require knowledge of the line length.

Solution 2:

I can guarantee you that bash alone won't be any faster than sed for this task. Starting up external processes in bash is a generally bad idea but only if you do it a lot.

So, if you're starting a sed process for each line of your input, I'd be concerned. But you're not. You only need to start one sed which will do all the work for you.

You may however find that the following sed will be a bit faster than your version:

(whatever) | sed 's/...$//'

All this does is remove the last three characters on each line, rather than substituting the whole line with a shorter version of itself. Now maybe more modern RE engines can optimise your command but why take the risk.

To be honest, about the only way I can think of that would be faster would be to hand-craft your own C-based filter program. And the only reason that may be faster than sed is because you can take advantage of the extra knowledge you have on your processing needs (sed has to allow for generalised procession so may be slower because of that).

Don't forget the optimisation mantra: "Measure, don't guess!"

If you really want to do this one line at a time in bash (and I still maintain that it's a bad idea), you can use:

pax> line=123456789abc
pax> line2=${line%%???}
pax> echo ${line2}
123456789
pax> _

You may also want to investigate whether you actually need a speed improvement. If you process the lines as one big chunk, you'll see that sed is plenty fast. Type in the following:

#!/usr/bin/bash

echo This is a pretty chunky line with three bad characters at the end.XXX >qq1
for i in 4 16 64 256 1024 4096 16384 65536 ; do
    cat qq1 qq1 >qq2
    cat qq2 qq2 >qq1
done

head -20000l qq1 >qq2
wc -l qq2

date
time sed 's/...$//' qq2 >qq1
date
head -3l qq1

and run it. Here's the output on my (not very fast at all) R40 laptop:

pax> ./chk.sh
20000 qq2
Sat Jul 24 13:09:15 WAST 2010

real    0m0.851s
user    0m0.781s
sys     0m0.050s
Sat Jul 24 13:09:16 WAST 2010
This is a pretty chunky line with three bad characters at the end.
This is a pretty chunky line with three bad characters at the end.
This is a pretty chunky line with three bad characters at the end.

That's 20,000 lines in under a second, pretty good for something that's only done every hour.

Trim last 3 characters of a line WITHOUT using sed, or perl, etc

Solution 1:

Solution 2:

Related

Recent Posts