Print word containing string and first word
Solution 1:
Bash/grep version:
#!/bin/bash
# string-and-first-word.sh
# Finds a string and the first word of the line that contains that string.
text_file="$1"
shift
for string; do
    # Find string in file. Process output one line at a time.
    # "--" keeps a string that starts with "-" from being read as a grep option.
    grep -- "$string" "$text_file" |
        while read -r line
        do
            # Get the first word of the line.
            first_word="${line%% *}"
            # Remove special characters from the first word.
            first_word="${first_word//[^[:alnum:]]/}"
            # If the first word is the same as the string, don't print it twice.
            if [[ "$string" != "$first_word" ]]; then
                printf '%s\t' "$first_word"
            fi
            printf '%s\n' "$string"
        done
done
Call it like so:
./string-and-first-word.sh /path/to/file text thing try Better
Output:
This text
Another thing
It try
Better
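To see what the two parameter expansions in the script do on their own, here is a minimal sketch. The function name first_word and the sample lines are made up for illustration; they are not part of the script above.

```shell
#!/bin/bash
# first_word LINE: print the first space-delimited word of LINE with every
# non-alphanumeric character removed.
# (first_word is a hypothetical helper name used only for this illustration.)
first_word() {
    local word="${1%% *}"            # drop everything from the first space onward
    word="${word//[^[:alnum:]]/}"    # delete all non-alphanumeric characters
    printf '%s\n' "$word"
}

# Hypothetical sample lines:
first_word '"This is some text"'      # prints: This
first_word 'Better late than never'   # prints: Better
```

Note that "${1%% *}" splits on spaces only, so a line whose first word is followed by a tab would need a different pattern.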
Solution 2:
Perl to the rescue!
#!/usr/bin/perl
use warnings;
use strict;
my $file = shift;
my $regex = join '|', map quotemeta, @ARGV;
$regex = qr/\b($regex)\b/;
open my $IN, '<', $file or die "$file: $!";
while (<$IN>) {
    if (my ($match) = /$regex/) {
        print my ($first) = /^\S+/g;
        if ($match ne $first) {
            print "\t$match";
        }
        print "\n";
    }
}
Save as first-plus-word, run as
perl first-plus-word file.txt text thing try Better
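It builds a single regex from the input words, escaping each one with quotemeta so it is matched literally and only as a whole word. Each line is then matched against that regex; if there's a match, the first word of the line is printed, and if the matched word differs from the first word, it is printed too, separated by a tab.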
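The same idea (one alternation built from all the words, one match per line) can be sketched in bash. This is an illustration under assumptions, not a translation of the Perl script: escape_ere, build_regex and print_matches are hypothetical helper names, the sed call is only a rough stand-in for quotemeta, and the word boundary is emulated with explicit non-word characters instead of \b.

```shell
#!/bin/bash
# Sketch: join the search words into one alternation and match each line once.
# escape_ere, build_regex and print_matches are hypothetical helper names.

# Backslash-escape ERE metacharacters so a word is matched literally
# (a rough stand-in for Perl's quotemeta).
escape_ere() {
    printf '%s' "$1" | sed 's/[][\.*^$(){}?+|]/\\&/g'
}

# Build (^|non-word)(w1|w2|...)($|non-word) from the arguments.
build_regex() {
    local word alternation=
    for word; do
        alternation+="${alternation:+|}$(escape_ere "$word")"
    done
    printf '%s' "(^|[^[:alnum:]_])($alternation)(\$|[^[:alnum:]_])"
}

# Read lines from stdin; for each line containing one of the words, print
# the first word of the line, then a tab and the matched word if they differ.
print_matches() {
    local regex line first match
    regex="$(build_regex "$@")"
    while read -r line; do
        [[ $line =~ $regex ]] || continue
        match="${BASH_REMATCH[2]}"
        first="${line%%[[:space:]]*}"
        if [[ $match != "$first" ]]; then
            printf '%s\t%s\n' "$first" "$match"
        else
            printf '%s\n' "$first"
        fi
    done
}

# Hypothetical sample input. "Nothing here" is skipped because "thing"
# occurs there only inside a longer word.
printf '%s\n' 'This is some text' 'Nothing here' 'Better late than never' |
    print_matches text thing try Better
```

Unlike the Perl version, this sketch does not strip anything from the first word, and when several alternatives could match at the same position the regex library, not the order of the arguments, decides which one wins.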
Solution 3:
Here's an awk version:
awk '
    NR==FNR {a[$0]++; next;}
    {
        gsub(/"/,"",$0);
        for (i=1; i<=NF; i++)
            if ($i in a) printf "%s\n", i==1? $i : $1"\t"$i;
    }
' file2 file1
where file2 is the word list and file1 contains the phrases.
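To try the awk version without touching real files, the following sketch runs it against two small temporary files. The file contents are made-up sample data and the workdir and result variables are names chosen for this example; the question's actual input may look different.

```shell
#!/bin/bash
# Sketch: run the awk program against two small, made-up input files.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Hypothetical word list, one word per line.
printf '%s\n' text thing try Better > "$workdir/file2"
# Hypothetical phrases, wrapped in double quotes.
printf '%s\n' '"This is some text"' \
              '"Another thing here"' \
              '"Better late than never"' > "$workdir/file1"

result="$(awk '
    NR == FNR { a[$0]++; next }     # first file: remember every word
    {
        gsub(/"/, "", $0)           # strip double quotes; awk re-splits the fields
        for (i = 1; i <= NF; i++)
            if ($i in a) printf "%s\n", (i == 1 ? $i : $1 "\t" $i)
    }
' "$workdir/file2" "$workdir/file1")"

printf '%s\n' "$result"
```

With this sample data the first two phrases produce the first word, a tab, and the matched word, while the third produces only "Better" because the match is the first field. A phrase containing several listed words would print one output line per matching field.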