Last updated: Tue, 09 Jul 2002

LXXXIX. Regular Expressions

Regular expressions are used for complex string manipulation. The functions are: ereg(), ereg_replace(), eregi(), eregi_replace(), split(), spliti() and sql_regcase().

All of these functions take a regular expression as their first argument. PHP uses the POSIX extended regular expressions defined by POSIX 1003.2. For a full description of these expressions, see the manual pages included in the PHP distribution directory.

Example 1. Regular expressions

<?php
ereg("abc",$string);
/* Returns TRUE if "abc"
   is found anywhere in $string. */
ereg("^abc",$string);
/* Returns TRUE if "abc"
   is found at the beginning of $string. */
ereg("abc$",$string);
/* Returns TRUE if "abc"
   is found at the end of $string. */
eregi("(ozilla.[23]|MSIE.3)",$HTTP_USER_AGENT);
/* Returns TRUE if the client browser
   is Netscape 2, 3 or MSIE 3. */
ereg("([[:alnum:]]+) ([[:alnum:]]+) ([[:alnum:]]+)",
     $string,$regs);
/* Places three space-separated words
   into $regs[1], $regs[2] and $regs[3]. */
$string = ereg_replace("^","<BR>",$string);
/* Inserts a <BR> tag at the beginning of $string. */
$string = ereg_replace("$","<BR>",$string);
/* Inserts a <BR> tag at the end of $string. */
$string = ereg_replace("\n","",$string);
/* Removes all newline characters from $string. */
?>
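
The ereg family was deprecated in PHP 5.3 and removed in PHP 7.0, so the example above does not run on current PHP. As a rough sketch only, assuming the PCRE functions preg_match() and preg_replace() are an acceptable substitute, the same checks might look like the following. The sample values in $string and $text are made up for illustration, and PCRE is a different engine from POSIX, so behaviour can differ in edge cases.

```php
<?php
// Hypothetical sample input; not part of the original example.
$string = "abc one two three";

// 1 if "abc" appears anywhere in $string, 0 otherwise.
$anywhere = preg_match('/abc/', $string);
// 1 if $string starts with "abc".
$atStart = preg_match('/^abc/', $string);
// 1 if $string ends with "abc".
$atEnd = preg_match('/abc$/', $string);

// Capture three space-separated words into $regs[1], $regs[2] and $regs[3].
preg_match('/([[:alnum:]]+) ([[:alnum:]]+) ([[:alnum:]]+)/', $string, $regs);

// Insert a <BR> tag at the beginning and at the end of the string.
$withLeadingBr = preg_replace('/^/', '<BR>', $string);
$withTrailingBr = preg_replace('/$/', '<BR>', $string);

// Remove every newline character.
$text = "line one\nline two\n";
$noNewlines = preg_replace('/\n/', '', $text);
```

Unlike ereg(), preg_match() returns 1 or 0 rather than a length or FALSE, and the pattern must be wrapped in delimiters (here, slashes).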

Table of Contents
ereg_replace -- Replace by regular expression.
ereg -- Regular expression match.
eregi_replace -- Replace by case-insensitive regular expression.
eregi -- Case-insensitive regular expression match.
split -- Split a string into an array by regular expression.
spliti -- Split a string into an array by case-insensitive regular expression.
sql_regcase -- Prepare a regular expression for case-insensitive matching.
User Contributed Notes
Regular Expressions
07-Mar-2001 05:38
If you don't have commandline access to the manpage cited above, note that the "POSIX 1003.2 Regular Expressions" manpage is also widely re-published on the web. See, for instance:



The "POSIX 1003.2 Regular Expressions" manpage provides a good basic reference for the syntax used by ereg_* functions. Most tutorials on "extended regular expressions" are also applicable.

[email protected]
07-Mar-2001 12:53

Dario seems to have made a nice tutorial about regular expressions:



Thanks Dario! ...

[email protected]
18-Dec-2001 11:39

I noticed that Cyro's link had gone stale, so I made a copy of the regex manpage and placed it on my site. You can get it from the following address:



This is primarily for Windows users, who have no access to the man pages in Linux distributions.

03-Feb-2002 01:02
If you are looking for abbreviations such as tab and carriage return, or for regex class definitions, you should look here:


some excerpts:

control characters:
\a bell
\b backspace
\f form feed
\n line feed
\r carriage return
\t horizontal tab
\v vertical tab

class example
\cLu all uppercase letters

[email protected]
21-Feb-2002 03:12

It's easy to exclude characters, but excluding words with a regular expression is a bit trickier. For parentheses there is no equivalent to the ^ for brackets. The only way I've found to exclude a string is to proceed by inverse logic: accept all the words that do NOT correspond to the string. So if you want to accept all strings except those _beginning_ with "abc", you'd have to accept any string that matches one of the following:
^(ab[^c])
^(a[^b]c)
^(a[^b][^c])
^([^a]bc)
^([^a]b[^c])
^([^a][^b]c)
^([^a][^b][^c])

which, put together, gives the regex
^(ab[^c]|a[^b]c|a[^b][^c]|[^a]bc|[^a]b[^c]|[^a][^b]c|[^a][^b][^c])

Note that this won't work to detect the word "abc" anywhere in a string. You need some way of anchoring the inverse word match,
like: ^(a[^b]|[^a]b|[^a][^b]) ;"ab" not at beginning of line
or: (a[^b]|[^a]b|[^a][^b])$ ;"ab" not at end of line
or: 123(a[^b]|[^a]b|[^a][^b]) ;"ab" not after "123"

I don't know why "(abc){0,0}" is invalid syntax. It would have made all this much simpler.


Slightly off-topic, here's a regex date validator (format yyyy-mm-dd, remove all spaces and linefeeds):
^(19|20)([0-9]{2}-((0[13-9]|1[0-2])-(0[1-9]|[12][0-9]|30)|
(0[13578]|1[02])-31|02-(0[1-9]|1[0-9]|2[0-8]))|([2468]0|
[02468][48]|[13579][26])-02-29)$
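
A quick way to see what this validator accepts is to run it over a few dates. The sketch below uses PCRE's preg_match() as a stand-in for ereg() (an assumption, since ereg() is gone from current PHP); the helper name looksLikeValidDate and the sample dates are hypothetical.

```php
<?php
// The validator from the note, joined into one pattern with spaces and linefeeds removed.
$datePattern = '/^(19|20)([0-9]{2}-((0[13-9]|1[0-2])-(0[1-9]|[12][0-9]|30)|'
    . '(0[13578]|1[02])-31|02-(0[1-9]|1[0-9]|2[0-8]))|([2468]0|'
    . '[02468][48]|[13579][26])-02-29)$/';

// Hypothetical helper: true when $date passes the pattern.
function looksLikeValidDate(string $date, string $pattern): bool
{
    return preg_match($pattern, $date) === 1;
}

// Made-up sample dates covering month lengths and leap years.
$samples = ['2002-12-31', '2002-04-31', '2004-02-29', '2001-02-29', '2000-02-29'];
foreach ($samples as $date) {
    echo $date, ' => ', looksLikeValidDate($date, $datePattern) ? 'accepted' : 'rejected', "\n";
}
```

As far as I can tell, the leap-year branch does not list years ending in 00, so 2000-02-29 is rejected even though 2000 was a leap year; that is worth knowing before relying on the pattern.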

luciano_at_braziliantranslation.net
03-Mar-2002 06:15

mholdgate wrote a very nice quick reference guide in the next page (), but I felt it could be improved a little:
________________

^ Start of line
$ End of line
n? Zero or one occurrence of character 'n'
n* Zero or more occurrences of character 'n'
n+ One or more occurrences of character 'n'
n{2} Exactly two occurrences of 'n'
n{2,} Two or more occurrences of 'n'
n{2,4} From 2 to 4 occurrences of 'n'
. Any single character
() Parentheses to group expressions
(.*) Zero or more occurrences of any single character, i.e., anything!
(n|a) Either 'n' or 'a'
[1-6] Any single digit in the range between 1 and 6
[c-h] Any single lower case letter in the range between c and h
[D-M] Any single upper case letter in the range between D and M
[^a-z] Any single character EXCEPT any lower case letter between a and z.

Pitfall: the ^ symbol only acts as an EXCEPT rule if it is the
very first character inside a range, and it denies the
entire range including the ^ symbol itself if it appears again
later in the range. Also remember that if it is the first
character in the entire expression, it means "start of line".
In any other place, it is always treated as a regular ^ symbol.
In other words, you cannot deny a word with ^undesired_word
or a group with ^(undesired_phrase).
Read more detailed regex documentation to find out what is
necessary to achieve this.

[_4^a-zA-Z] Any single character which can be the underscore or the
number 4 or the ^ symbol or any letter, lower or upper case
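
The two roles of ^ inside brackets can be checked with a small sketch. It assumes PCRE's preg_match() in place of ereg(), and the outer ^ and $ anchors are my addition so that each test string must be exactly one character long.

```php
<?php
// Leading ^ negates the class: any single character that is NOT a-z.
$notLowerCase = '/^[^a-z]$/';

// A ^ that is not first in the class is literal: underscore, 4, ^, or any letter.
$literalCaret = '/^[_4^a-zA-Z]$/';

// Made-up one-character inputs.
foreach (['A', 'a', '^', '4', '5', '_'] as $char) {
    printf(
        "%s  negated class: %d  literal-caret class: %d\n",
        $char,
        preg_match($notLowerCase, $char),
        preg_match($literalCaret, $char)
    );
}
```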

?, +, * and the {} count parameters can be appended not only to a single character, but also to a group() or a range[].

therefore,
^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$
would mean:

^.{2} = A line beginning with any two characters,
[a-z]{1,2} = followed by either 1 or 2 lower case letters,
_? = followed by an optional underscore,
[0-9]* = followed by zero or more digits,
([1-6]|[a-f]) = followed by either a digit between 1 and 6 OR a
lower case letter between a and f,
[^1-9]{2} = followed by any two characters except digits
between 1 and 9 (0 is possible),
a+$ = followed by one or more
occurrences of 'a' at the end of a line.
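
To try the walkthrough above, the pattern can be run against a few strings. This is only a sketch using PCRE's preg_match() rather than ereg(), and the candidate strings are made up for illustration.

```php
<?php
// The pattern from the walkthrough, wrapped in PCRE delimiters.
$pattern = '/^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$/';

// Hypothetical candidates:
//   XY | q | _ | 42 | c | 0z | aa  follows every step of the walkthrough
$candidates = [
    'XYq_42c0zaa', // every part present
    'XYq_42c12aa', // "12" sits where two non-1-9 characters are required
    'XYq_42c0z',   // no trailing 'a'
];

foreach ($candidates as $candidate) {
    echo $candidate, ' => ', preg_match($pattern, $candidate) ? 'match' : 'no match', "\n";
}
```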

[email protected]
07-Mar-2002 04:26

Sorry to be picky here, but saying that ^ is the beginning of a line or $ is the end of a line is rather misleading if you work with regexes on a daily basis.

That is correct most of the time, BUT on some occasions you'd be better off thinking of ^ as "start of string" and $ as "end of string".

There are ways to make your regex engine forget about your system's notion of a newline; this is what is commonly referred to as multiline regexes...
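
The distinction is easy to see with the PCRE functions, which are an assumption here since the note does not name an engine. By default preg_match() treats ^ and $ as string anchors, and the m modifier switches them to line anchors. The sample text is made up.

```php
<?php
// Hypothetical two-line input.
$text = "first line\nsecond line";

// Without the m modifier, ^ only matches at the very start of the string.
$startDefault = preg_match('/^second/', $text);
// With m, ^ also matches right after each newline.
$startMultiline = preg_match('/^second/m', $text);

// Likewise, $ only matches at the end of the string by default...
$endDefault = preg_match('/first line$/', $text);
// ...and before each newline when m is set.
$endMultiline = preg_match('/first line$/m', $text);

printf("start: %d vs %d, end: %d vs %d\n",
    $startDefault, $startMultiline, $endDefault, $endMultiline);
```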

[email protected]
08-Mar-2002 04:33

Follow-up to my previous post:
Some simple optimization allowed me to realize that excluding a word at the beginning of a string has a degree of complexity O(n) rather than O(n^2). I only had to follow the logic:

if str[0] != badword[0] then OK
else
if str[1] != badword[1] then OK
else
if str[2] != badword[2] then OK
else ...

So excluding the word 'abc' at the beginning of a string is much simpler than I had made it out to be:
^([^a]|a[^b]|ab[^c])
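
The sketch below runs this shorter pattern through PCRE's preg_match() (an assumption; the note was written for ereg()) next to a PCRE negative lookahead, a technique that POSIX regular expressions do not offer and that is not part of the note. The input strings are made up.

```php
<?php
// The pattern from the note: the string must differ from "abc" at some position.
$posixStyle = '/^([^a]|a[^b]|ab[^c])/';

// PCRE-only alternative (not from the note): assert that "abc" does not follow.
$lookahead = '/^(?!abc)/';

// Hypothetical inputs, including one shorter than the excluded word.
$inputs = ['abd', 'xbc', 'abc', 'abcdef', 'ab'];
foreach ($inputs as $input) {
    printf(
        "%-7s posix-style: %d  lookahead: %d\n",
        $input,
        preg_match($posixStyle, $input),
        preg_match($lookahead, $input)
    );
}
```

One difference shows up on short input: the pattern from the note needs a character that actually differs, so "ab" is rejected even though it does not begin with "abc", while the lookahead accepts it.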

[email protected]
09-Mar-2002 04:40

Sadly, the POSIX regexp evaluator (PHP 4.1.2) does not seem to support multi-character collating sequences, even though such sequences are included in the man-page documentation.

Specifically, the man-page discusses the expression "[[.ch.]]*c" which matches the first five characters of "chchcc". Running this expression in ereg_replace generates the error "Warning: REG_ECOLLATE". (Running an equivalent expression with only one character between the periods does work, however.)

Multi-character collating sequences are not supported!

This is really, really too bad, because it would have provided a simple way to exclude words from the target.

I'm going to go learn PCRE, now. :-(

[email protected]
22-Aug-2002 01:40

Something that really got me: I'm used to using Perl's regexps, and so I used \s to check for a whitespace character in a password on a website. My PHP book (Wrox Press, Professional PHP Programming) agreed with me that this is exactly the same as [ \r\n\t\f\v], but it's NOT. In fact, what it did was keep anyone from joining the site if they put an 's' in their password! So beware, check for subtle differences between what you're used to and PHP.

[[:space:]] works fine, by the way.
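
For anyone who does move to the pcre functions, here is a minimal sketch of the same password check, assuming preg_match(); the function name containsWhitespace and the sample passwords are hypothetical. In PCRE both [[:space:]] and \s denote whitespace, so the 's' problem described above should not occur there.

```php
<?php
// Hypothetical helper: true when the password contains any whitespace character.
function containsWhitespace(string $password): bool
{
    return preg_match('/[[:space:]]/', $password) === 1;
}

// Made-up sample passwords.
foreach (['pass word', "tab\tbed", 'password', 'secrets'] as $password) {
    echo json_encode($password), ' => ',
        containsWhitespace($password) ? 'has whitespace' : 'ok', "\n";
}

// In PCRE, \s is a whitespace class too, so a plain 's' is not flagged.
$plainS = preg_match('/\s/', 'password');
```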

I'm going to use the pcre functions from now on... I like Perl :o)

Copyright © 2001, 2002 The PHP Group
All rights reserved.
Last updated: Sat Aug 31 06:19:44 2002 CEST