Saturday, May 28, 2011

PHP regex lookaround assertion: lookbehind and lookahead

Regular expressions are a mighty beast, but once you tame them they give you great leverage for solving text-processing problems.

A Real World Example:

A regex lookaround assertion checks whether the text before or after the current position matches a pattern, without making that text part of the match. Suppose we want to find every '.mp3' or '.mp4' in our text, but we only want the value 'mp3' or 'mp4', without the dot.

One solution is:

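$text = 'demo.mp3, shakira.mp4,lorum ipsum';

// Match each extension together with its leading dot.
if (preg_match_all('/\.mp3|\.mp4/', $text, $matches)) {
    // $matches[0] holds the full matches; strip the dot from each one.
    foreach ($matches[0] as $key => $match) {
        $matches[0][$key] = str_replace('.', '', $match);
    }
}
var_dump($matches);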

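But this is ugly. We can use a lookbehind assertion to make the code simpler. The dot is escaped so that it means a literal '.', and the alternatives are grouped so that the lookbehind applies to both of them:

preg_match_all('/(?<=\.)(?:mp3|mp4)/', $text, $matches);
var_dump($matches);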
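
Two details matter in a pattern like this: an unescaped dot matches any character, and an ungrouped alternation splits the whole pattern in two, so the lookbehind guards only the first alternative. The sketch below compares a loose and a strict version; the sample text and the variable names $loose and $strict are made up for illustration.

```php
<?php
// Hypothetical sample text: 'xmp3' has no dot before 'mp3', and 'mp4' stands alone.
$text = 'demo.mp3, xmp3, mp4';

// Unescaped dot, ungrouped alternation: '(?<=.)mp3' OR a plain 'mp4'.
preg_match_all('/(?<=.)mp3|mp4/', $text, $loose);

// Escaped dot, grouped alternation: a literal '.' must precede either value.
preg_match_all('/(?<=\.)(?:mp3|mp4)/', $text, $strict);

var_dump($loose[0]);  // three matches: 'mp3', 'mp3', 'mp4'
var_dump($strict[0]); // one match: the 'mp3' from 'demo.mp3'
```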
 
Let's Go Back and Check the Basics:
What are lookahead and lookbehind assertions, and what is their syntax?

Assertions only check whether the characters following or preceding the current position match a pattern; they do not consume any characters. For example, 'lorum(?=ipsum)' matches 'lorum' in the text 'lorumipsum'. Note that it does not match 'lorumipsum': it only checks that 'ipsum' comes right after 'lorum'.
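
One way to see that the asserted text is not consumed is to run a replacement with the same pattern. This is a minimal sketch; the replacement string 'LORUM' and the variable $replaced are chosen only for illustration.

```php
<?php
$text = 'lorumipsum';

// The lookahead only checks for 'ipsum'; the reported match is just 'lorum'.
preg_match('/lorum(?=ipsum)/', $text, $match);
var_dump($match[0]); // string(5) "lorum"

// Because 'ipsum' was not consumed, a replacement leaves it in place.
$replaced = preg_replace('/lorum(?=ipsum)/', 'LORUM', $text);
var_dump($replaced); // string(10) "LORUMipsum"
```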

Lookahead Syntax:
Positive lookahead: (?=)
Negative lookahead: (?!)

A lookahead assertion checks whether the characters following the current position match (positive lookahead) or do not match (negative lookahead) the given pattern.

Positive Lookahead:
The following example matches 'lorum' first, then checks whether 'ipsum' comes next. If the check succeeds, the function reports a successful match.

preg_match('/lorum(?=ipsum)/','lorumipsum', $match);
var_dump($match); 
 
Negative Lookahead:
The following example matches 'lorum' first, then checks that 'test' does not come next. If the check succeeds, the function reports a successful match.

preg_match('/lorum(?!test)/','lorumipsum', $match);
var_dump($match); 
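
To see both lookaheads succeed and fail, the two patterns can be run against a second subject. In this sketch, 'lorumtest' is a hypothetical subject added to show the failing case, and $results is an illustrative name.

```php
<?php
// 'lorumtest' is a hypothetical subject added to show the failing cases.
$subjects = ['lorumipsum', 'lorumtest'];
$results = [];

foreach ($subjects as $subject) {
    $results[$subject] = [
        // 1 only when 'ipsum' directly follows 'lorum'.
        'positive' => preg_match('/lorum(?=ipsum)/', $subject),
        // 1 only when 'test' does NOT directly follow 'lorum'.
        'negative' => preg_match('/lorum(?!test)/', $subject),
    ];
}

var_dump($results);
// 'lorumipsum' => positive 1, negative 1
// 'lorumtest'  => positive 0, negative 0
```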
 
Lookbehind Syntax:
Positive lookbehind: (?<=)
Negative lookbehind: (?<!)

Positive Lookbehind:
This example matches 'ipsum' only where 'lorum' comes immediately before it. If the check succeeds, the function reports a successful match.


preg_match('/(?<=lorum)ipsum/','lorumipsum', $match);
var_dump($match); 
 
 ******
One important thing to remember: the regex engine scans the subject from left to right and tries the pattern at each position. A lookbehind does not move the current position; it only looks back from it. With '(?<=lorum)ipsum', the first position where the lookbehind succeeds is the one right after 'lorum', and 'ipsum' is then matched from there. Suppose we wrote the pattern as '^(?<=lorum)ipsum' instead: the engine would first match the start of the line, then check whether 'lorum' comes immediately before that position, which obviously fails because no characters precede the start of the string.
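
The effect of the anchor can be checked directly. This is a small sketch that uses PREG_OFFSET_CAPTURE to report where the match starts; the variable names $found and $anchored are illustrative.

```php
<?php
$text = 'lorumipsum';

// Unanchored: the lookbehind succeeds at offset 5, right after 'lorum'.
$found = preg_match('/(?<=lorum)ipsum/', $text, $match, PREG_OFFSET_CAPTURE);
var_dump($found);    // int(1)
var_dump($match[0]); // 'ipsum' together with its offset, 5

// Anchored with ^: at offset 0 nothing precedes the position, so the lookbehind fails.
$anchored = preg_match('/^(?<=lorum)ipsum/', $text);
var_dump($anchored); // int(0)
```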



Negative Lookbehind:
The following example matches 'ipsum' only where 'hello' does not come immediately before it. If the check succeeds, the function reports a successful match.

preg_match('/(?<!hello)ipsum/','lorumipsum', $match);
var_dump($match); 
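
For contrast, the same pattern fails when 'hello' really does precede 'ipsum'. In this sketch, 'helloipsum' is a hypothetical subject, and $allowed and $blocked are illustrative names.

```php
<?php
// 'helloipsum' is a hypothetical subject added to show the failing case.
$allowed = preg_match('/(?<!hello)ipsum/', 'lorumipsum');
$blocked = preg_match('/(?<!hello)ipsum/', 'helloipsum');

var_dump($allowed); // int(1): 'ipsum' is preceded by 'lorum', not 'hello'
var_dump($blocked); // int(0): 'ipsum' is preceded by 'hello'
```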

1 comment:

  1. Hello, I'm trying to find IMG tags which don't contain ALT attribute and I'm using:

    preg_match_all('@<img((?!alt).)*?>@', $text, $matches);

    But this negative look-ahead expression doesn't return anything, while there are such IMG tags.
    Any ideas why is it not working?


About Me

Web Developer From Dhaka, Bangladesh.