grep expression

Jan 18, 2015 at 12:57am
I am having trouble finding an expression that matches a word that starts with "b", ends with "t", and contains "o". The following expression matches something that begins with "b" and ends with "t" but doesn't account for the "o": ^[Bb]..t$ I understand that placing bracket expressions next to each other, [][], matches consecutive letters in a word. I just can't figure out how to look for, say, a letter five characters further along. Any help would be nice.
Last edited on Jan 18, 2015 at 12:59am
Jan 18, 2015 at 2:31am
I figured it out. I piped it.

grep "^b" file.txt | grep "t$" | grep "o"
Jan 18, 2015 at 2:47am
A regex will work too:
grep "^[Bb].*o.*t$" file.txt

Read the first few paragraphs of the man page: http://man7.org/linux/man-pages/man7/regex.7.html
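To compare the two approaches above, here is a small sketch run against a made-up word list (the words are an assumption for illustration, not from the original file.txt). It suggests the piped version and the single regex agree except for the capital B that [Bb] also accepts.

```shell
#!/bin/bash
# Sample word list, one word per line (hypothetical data for illustration).
words=$(printf '%s\n' bot boat bat Boost toast robot)

# Three chained filters: starts with b, ends with t, contains o.
piped=$(printf '%s\n' "$words" | grep '^b' | grep 't$' | grep 'o')

# One anchored regex; [Bb] also accepts a capital B.
single=$(printf '%s\n' "$words" | grep '^[Bb].*o.*t$')

printf 'piped:\n%s\n\nsingle:\n%s\n' "$piped" "$single"
```

Both forms only work when each line holds exactly one word, because ^ and $ anchor to the line, not to the word.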
Jan 19, 2015 at 4:28pm
Why is it that when I use this regex with grep on a file,
grep  ^$1.*$2.*$3$ $4

it only works if there is one word per line?
How it is entered on the command line: Unix> ./find_words.sh M a y userD.txt

Also, when using a variable, how do I do ^[Bb]? Would ^[*$1] work?
Last edited on Jan 19, 2015 at 4:31pm
Jan 19, 2015 at 7:38pm
^ - beginning of line
$ - end of line

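So your regex only matches whole lines: the line must begin with (using your original example) B or b, contain an o, and end with t. The .* matches any characters in between, spaces included, so a line with several words is tested as a single unit rather than word by word.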

Try using this Perl regex (-P):
grep -P \\b\(?i\)$1.*$2.*$3\\b $4

(?i) - turn on case-insensitive matching
\b - word boundary

The backslashes and parentheses have special meaning in bash, so they need to be escaped to be passed to grep literally.
Jan 19, 2015 at 8:01pm
I tried a word boundary earlier and could not get it to work. I am on a VM running Lubuntu. Would that matter? Here is what happens -

Enter this in command line - ./find_words.sh F e d test.txt

Script -
#!/bin/bash
# find_words.sh

# test for 4 command line arguments
if [ $# -ne 4 ]; then

        # print out error message and exit
        echo usage: Need 4 arguments
        exit 1
fi
#       grep -P \\b\(?i\)$1.*$2.*$3\\b $4  test.txt | while read -r line ; do
#       echo $line
#done
#       grep  ^$1.*$2.*$3$ $4

        grep -P \\b\(?i\)"$1".*"$2".*"$3"\\b "$4"

test.txt -
Fred that Foed
Fred this
Mary

output -
Fred that Foed
Fred this

I just don't understand why word boundary isn't working.

Jan 19, 2015 at 8:22pm
retroCheck wrote:
I am on vm using lubuntu. Would that matter?
Don't think so. I'm running Debian Sid.

Are you expecting only the words that match to print out? If so, grep won't work.
From the grep man page:
grep, egrep, fgrep, rgrep - print lines matching a pattern


Jan 19, 2015 at 8:36pm
The man page says that -o should allow me to, but it still doesn't work.

-o
--only-matching
Show only the part of a matching line that matches PATTERN.
Jan 19, 2015 at 9:43pm
I couldn't get -o to work either!

Long way from C++, but how about a simple Perl script:
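#!/usr/bin/perl
use strict;
use warnings;

# Expect exactly four arguments: first letter, inner letter, last letter, file.
if (@ARGV != 4) {
    die "Requires 4 arguments\n";
}

my ($begin, $letter, $end, $file) = @ARGV;

open(my $in, '<', $file)
    or die "Couldn't read from $file: $!\n";

while (my $line = <$in>) {
    # The braces make sure Perl reads $begin followed by a character class,
    # not an element of an array named @begin.
    my @matches = ($line =~ /\b${begin}[a-z]*${letter}[a-z]*${end}\b/gi);
    for my $word (@matches) {
        print $word, "\n";
    }
}

close $in;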
Jan 20, 2015 at 12:37am
> The backslashes and parentheses have special meaning in bash
> so they need to be escaped for literal interpretation
you may also quote them
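As a rough sketch of that quoting alternative, the snippet below builds the same pattern twice, once with backslash escapes and once with quotes, and prints both. The values F, e and d are stand-ins for the script's $1, $2 and $3, chosen here only for illustration.

```shell
#!/bin/bash
# Hypothetical values standing in for the script's $1, $2 and $3.
set -- F e d

# Backslash-escaped, as in the earlier post.
escaped=\\b\(?i\)$1.*$2.*$3\\b

# Single-quoted literals with double-quoted parameters.
quoted='\b(?i)'"$1"'.*'"$2"'.*'"$3"'\b'

# Both lines print the same pattern text.
printf '%s\n%s\n' "$escaped" "$quoted"
```

One practical difference: when the escaped form is used directly as a command argument, the unquoted ? and * are still subject to filename globbing, which the quoted form avoids.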


> I couldn't get -o to work either!
The period . matches any single character, so it matches spaces too. With .* the match can therefore stretch across several words.

In your Perl script you replaced it with [a-z].
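A minimal sketch of that difference, using the first line of the test.txt posted earlier. It assumes GNU grep, where \b, \w and -o are available without -P; other grep implementations may behave differently.

```shell
#!/bin/bash
line='Fred that Foed'

# .* is greedy and crosses spaces, so the match runs from the first F
# to the last d on the line.
greedy=$(printf '%s\n' "$line" | grep -o '\bF.*e.*d\b')

# \w* stays inside one word, so each word is matched separately.
per_word=$(printf '%s\n' "$line" | grep -o '\bF\w*e\w*d\b')

printf 'greedy:\n%s\n\nper word:\n%s\n' "$greedy" "$per_word"
```

With .* the -o option still prints only the matched part, but the matched part happens to be the whole line, which is why it looked as if -o had no effect.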


> Long way from C++
oh yeah, this is a C++ forum
Jan 20, 2015 at 12:39pm
# echo "foo bar gaz bur bum bear" | grep -o "\bb[^ ]*a[^ ]*r\b"
bar
bear
# rpm -q grep
grep-2.6.3-6.el6.x86_64

(The newline between matched words is semi-surprising.)

The "\bb[^ ]*a[^ ]*r\b" contains:
\b word boundary
b literal 'b'
[^ ]* 0 or more non-space
a literal 'a'
[^ ]* 0 or more non-space
r literal 'r'
\b word boundary

Apparently the el6 grep takes the \b without the perl-flag.
Positional parameter substitution and escape characters with BASH script ... you already had the hang of them.
Jan 20, 2015 at 1:25pm
Got it to work, seems much of the issue was with quotes.

grep -o '\b'$1'\w*'$2'\w*'$3'\b' $4

Of course keskiverto's example works too -
grep -o "\b"$1"[^ ]*"$2"[^ ]*"$3"\b" $4
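The two patterns are not quite interchangeable. The sketch below, on a made-up input line and again assuming GNU grep, shows one case where they differ: [^ ]* accepts any non-space character, while \w* accepts only letters, digits and underscore.

```shell
#!/bin/bash
# Hypothetical letters and input line, chosen to show the difference.
set -- b a r
line='bar b.a.r bear'

# [^ ]* allows punctuation inside the match.
non_space=$(printf '%s\n' "$line" | grep -o "\b$1[^ ]*$2[^ ]*$3\b")

# \w* allows only word characters, so "b.a.r" is skipped.
word_chars=$(printf '%s\n' "$line" | grep -o "\b$1\w*$2\w*$3\b")

printf 'non-space:\n%s\n\nword chars:\n%s\n' "$non_space" "$word_chars"
```

Which one is preferable depends on whether tokens such as "b.a.r" should count as words in the input.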

Thanks!!! I was struggling with this for a while.
Last edited on Jan 20, 2015 at 1:38pm
Jan 21, 2015 at 12:14am
¿why do you leave your variables outside the quotes?
grep -o "\b$1[^ ]*$2[^ ]*$3\b" "$4"
Jan 21, 2015 at 9:43am
He was using both single and double quotes. One prevents variable substitution, the other does not. It can be tricky to notice all the possibilities.
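To illustrate the difference between the two quote styles, here is a short sketch. The letter b stands in for the script's first argument and is only an example value.

```shell
#!/bin/bash
# Hypothetical first argument.
set -- b

# Single quotes: $1 is kept as literal text, no substitution happens.
single='\b$1\w*'

# Double quotes: $1 is replaced by its value, the backslashes are kept.
double="\b$1\w*"

printf 'single: %s\ndouble: %s\n' "$single" "$double"
```

This is why the variables can sit inside double quotes, as in the command above, but have to be placed outside when the literal parts are single-quoted.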