[CS-FSLUG] regexp help
Tim Young
Tim.Young at LightSys.org
Thu Nov 11 16:10:05 CST 2004
Ahh... I think Frank is a Perl or PHP guy... :)
Sed is a little "old-school" here and does not use the nifty regex niceties
that Perl has built in. The reason I wrote ([a-Z0-9][a-Z0-9]*), with
[a-Z0-9] twice, is that the * after the second one can match zero characters.
If there were just the one, ([a-Z0-9]*), it could match the following patterns:
abcdef-
-abcdef
-
and change them to
abcdef--
--abcdef
--
If he was looking to match ONLY:
wordA-wordB
You need at least one character before the - and one after. Hence the
([a-Z0-9][a-Z0-9]*): the first [a-Z0-9] matches exactly one character, and
the [a-Z0-9]* matches zero or more characters following it. It is a
peculiarity of sed, whose basic regular expression syntax does not treat +
as a repetition operator.
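The difference between the star-only pattern and the one-or-more pattern can be
checked directly. This is only a sketch, not the exact command from the thread:
it assumes POSIX basic regular expressions with escaped group parentheses, and
it spells the class as [A-Za-z0-9] because a range like a-Z depends on the
locale. The function names star_only and one_or_more are made up for
illustration.

```shell
#!/bin/sh
# Star-only groups: each side may match zero characters, so a lone
# hyphen, a leading hyphen, or a trailing hyphen all count as matches.
star_only() {
    printf '%s\n' "$1" |
        sed 's/\([A-Za-z0-9]*\)-\([A-Za-z0-9]*\)/\1--\2/g'
}

# One character followed by zero or more: each side of the hyphen
# must contain at least one alphanumeric character.
one_or_more() {
    printf '%s\n' "$1" |
        sed 's/\([A-Za-z0-9][A-Za-z0-9]*\)-\([A-Za-z0-9][A-Za-z0-9]*\)/\1--\2/g'
}

# Print each input next to what the two patterns turn it into.
for s in 'abcdef-' '-abcdef' '-' 'wordA-wordB'; do
    printf '%-12s star_only: %-14s one_or_more: %s\n' \
        "$s" "$(star_only "$s")" "$(one_or_more "$s")"
done
```

Only wordA-wordB should come out changed by one_or_more, while star_only
doubles the hyphen in all four inputs.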
Yes, Perl is much simpler. I am not a Perl guru and would have probably used
the +, *, ? (and there are one or two others) wrong. I chose sed, even though
odd, because I could get it to work. Perl would have been much more elegant
(and probably faster if you needed to do a lot of them).
- Tim
Frank Bax wrote:
> At 03:55 PM 11/11/04, Tim Young wrote:
> >cat [myfile] | sed 's/\([a-Z0-9][a-Z0-9]*\)-\([a-Z0-9][a-Z0-9]*\)/\1--\2/g'
>
> Isn't [a-Z0-9][a-Z0-9]* equivalent to [a-Z0-9]+ ?
>
> In Ed's original email you mentioned "or practically any ASCII char". The
> above will work as long as the single character immediately preceding
> *and* the single character immediately following the hyphen are
> alphanumeric. The regexp doesn't look at the rest of the word.
>
> As soon as I typed that explanation, I realised Tim's regexp could use
> simply [a-Z0-9] instead of [a-Z0-9][a-Z0-9]* (since the * can match zero
> characters, it isn't really doing anything in this case).
>
> Frank
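Both of Frank's observations can be tried side by side. The following is a
rough sketch under a few assumptions: the class is written as [A-Za-z0-9]
rather than the locale-dependent a-Z range, the group parentheses are escaped
for basic regex syntax, and the + form relies on the -E (extended regex) option
being available in the installed sed. The function names bre_pair, ere_plus and
single_char are hypothetical.

```shell
#!/bin/sh
# Basic regex spelling: one character, then zero or more of the same class.
bre_pair() {
    printf '%s\n' "$1" |
        sed 's/\([A-Za-z0-9][A-Za-z0-9]*\)-\([A-Za-z0-9][A-Za-z0-9]*\)/\1--\2/g'
}

# The + form Frank asks about; assumes sed accepts -E for extended regex.
ere_plus() {
    printf '%s\n' "$1" |
        sed -E 's/([A-Za-z0-9]+)-([A-Za-z0-9]+)/\1--\2/g'
}

# Frank's simplification: only the characters touching the hyphen are tested.
single_char() {
    printf '%s\n' "$1" |
        sed 's/\([A-Za-z0-9]\)-\([A-Za-z0-9]\)/\1--\2/g'
}

# Print the input followed by the result of each variant.
for s in 'wordA-wordB' 'a-b-c' 'abcdef-' 'x - y'; do
    printf '%s | %s | %s | %s\n' \
        "$s" "$(bre_pair "$s")" "$(ere_plus "$s")" "$(single_char "$s")"
done
```

On these inputs the three variants should agree, since whatever the longer
groups capture is written back unchanged on either side of the doubled hyphen.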