[CS-FSLUG] regexp help

Tim Young Tim.Young at LightSys.org
Thu Nov 11 16:10:05 CST 2004


Ahh...  I think Frank is a Perl or PHP guy... :)

Sed is a little "old-school" here and does not have the nifty regex niceties
that Perl has built in.  The reason I wrote ([a-Z0-9][a-Z0-9]*), with
[a-Z0-9] twice, is that the * after the second one can match zero characters.
If there were just one of them, ([a-Z0-9]*), the expression could match the
following patterns:
abcdef-
-abcdef
-
and change them to
abcdef--
--abcdef
--
If he is looking to match ONLY:
wordA-wordB

then you need at least one character before the - and one after.  Hence the
([a-Z0-9][a-Z0-9]*): the first [a-Z0-9] matches exactly one character, and
the [a-Z0-9]* matches zero or more of the ones that follow.  The doubling is
needed because sed's basic regular expressions have no + operator.
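
To see the difference for yourself, here is a rough sketch.  The function
names loose and strict are made up for this example.  I have also written
the groups as \( \) and used [[:alnum:]] in place of [a-Z0-9], on the
assumption that this is more portable across sed versions and locales.

```shell
# Hypothetical helpers for illustration; not part of the original command.

# Zero-or-more on both sides: a bare hyphen is enough to match.
loose() {
  printf '%s\n' "$1" | sed 's/\([[:alnum:]]*\)-\([[:alnum:]]*\)/\1--\2/g'
}

# Doubled class on both sides: at least one character is required
# before and after the hyphen.
strict() {
  printf '%s\n' "$1" | sed 's/\([[:alnum:]][[:alnum:]]*\)-\([[:alnum:]][[:alnum:]]*\)/\1--\2/g'
}

# Run both versions over the four cases discussed above.
for s in 'abcdef-' '-abcdef' '-' 'wordA-wordB'; do
  printf '%-12s loose: %-14s strict: %s\n' "$s" "$(loose "$s")" "$(strict "$s")"
done
```

The loose version doubles the hyphen in all four inputs.  The strict version
leaves the first three alone and only changes wordA-wordB to wordA--wordB.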

Yes, Perl is much simpler.  I am not a Perl guru and would probably have used
the quantifiers (+, *, 0, and there are one or two others) wrong.  I chose sed,
odd as it is, because I could get it to work.  Perl would have been much more
elegant (and probably faster if you needed to do a lot of them).
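
If your sed supports extended regular expressions, you can get the + without
leaving sed.  This is only a sketch: it assumes a sed that accepts -E (GNU
and BSD sed both do), the function names are made up, and [[:alnum:]] again
stands in for [a-Z0-9].  The second function is the shorter single-character
form Frank suggests in the message quoted below.

```shell
# Hypothetical helpers for illustration only.

# With -E, + means "one or more", so the class does not need to be doubled.
dash_plus() {
  printf '%s\n' "$1" | sed -E 's/([[:alnum:]]+)-([[:alnum:]]+)/\1--\2/g'
}

# Frank's shorter form: just one character on each side of the hyphen.
# The pattern is not anchored, so the rest of each word is left untouched.
dash_single() {
  printf '%s\n' "$1" | sed -E 's/([[:alnum:]])-([[:alnum:]])/\1--\2/g'
}

# Compare the two on the cases from this thread.
for s in 'abcdef-' '-abcdef' '-' 'wordA-wordB'; do
  printf '%-12s plus: %-14s single: %s\n' "$s" "$(dash_plus "$s")" "$(dash_single "$s")"
done
```

On these four inputs both forms agree: only wordA-wordB is changed, to
wordA--wordB.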

    - Tim

Frank Bax wrote:

> At 03:55 PM 11/11/04, Tim Young wrote:
> >cat [myfile] | sed 's/\([a-Z0-9][a-Z0-9]*\)-\([a-Z0-9][a-Z0-9]*\)/\1--\2/g'
>
> Isn't [a-Z0-9][a-Z0-9]* equivalent to [a-Z0-9]+
>
> In Ed's original email you mentioned "or practically any ASCII char".  The
> above will work as long as the single character immediately preceding
> *and* the single character immediately following the hyphen are
> alphanumeric.  The regexp doesn't look at the rest of the word.
>
> As soon as I typed that explanation, I realised Tim's regexp could use
> simply [a-Z0-9] instead of [a-Z0-9][a-Z0-9]*  (since the * can match zero
> characters, it isn't really doing anything in this case).
>
> Frank
