XC. Regular Expression Functions (POSIX Extended)
Introduction
Note:
When using the PCRE functions, PHP also supports regular expressions
with a Perl-compatible syntax. Those functions support non-greedy
matching, assertions, conditional subpatterns, and a number of other
features that the POSIX extended regular expression syntax does not
support.
Warning:
Unlike the PCRE functions, these regular expression functions are not
binary-safe.
Regular expressions are used for complex string manipulation in PHP.
The following functions support regular expressions:
- ereg()
- ereg_replace()
- eregi()
- eregi_replace()
- split()
- spliti()
All of these functions take a regular expression as their first
argument. PHP uses the POSIX extended regular expressions defined by
POSIX 1003.2. For a full description of POSIX regular expressions, see
the regex man pages in the regex directory of the PHP distribution.
Because the description is in manpage format, read it with a command
such as man /usr/local/src/regex/regex.7.
Requirements
No external libraries are needed to build this extension.
Installation
To enable regexp support, configure PHP with the option
--with-regex[=TYPE]. TYPE can be one of system, apache, or php.
The default is php.
Note:
Do not change TYPE unless you know what you are doing.
The Windows version of PHP has built-in support for this extension.
You do not need to load any additional extension in order to use these
functions.
Runtime Configuration
This extension defines no configuration directives in php.ini.
Resource Types
This extension defines no resource types.
Predefined Constants
This extension defines no constants.
Examples
Example 1. Regular expression examples
ereg ("abc", $string);
/* Returns true if "abc" is found anywhere in $string. */
ereg ("^abc", $string);
/* Returns true if "abc" is found at the beginning of $string. */
ereg ("abc$", $string);
/* Returns true if "abc" is found at the end of $string. */
eregi ("(ozilla.[23]|MSIE.3)", $HTTP_USER_AGENT);
/* Returns true if the client browser is Netscape 2, 3 or MSIE 3. */
ereg ("([[:alnum:]]+) ([[:alnum:]]+) ([[:alnum:]]+)", $string, $regs);
/* Places three space-separated words into $regs[1], $regs[2]
and $regs[3]. */
$string = ereg_replace ("^", "<br />", $string);
/* Puts a <br /> tag at the beginning of $string. */
$string = ereg_replace ("$", "<br />", $string);
/* Puts a <br /> tag at the end of $string. */
$string = ereg_replace ("\n", "", $string);
/* Removes all line breaks from $string. */
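The same checks can also be written with the PCRE functions mentioned in the introduction. The following is only a sketch of how that might look: the sample values in $string and $agent are invented for illustration, and the patterns are simply the ones above wrapped in / delimiters.

```php
<?php
// Sketch only: PCRE counterparts of some of the ereg examples above.
// $string and $agent are made-up sample values.
$string = "abc def ghi";
$agent  = "Mozilla/3.0 (compatible)";

// "abc" anywhere, at the beginning, and at the end of $string.
$anywhere = preg_match('/abc/', $string);
$atStart  = preg_match('/^abc/', $string);
$atEnd    = preg_match('/abc$/', $string);

// The case-insensitive eregi() call becomes the /i modifier.
$oldBrowser = preg_match('/(ozilla.[23]|MSIE.3)/i', $agent);

// Capture three space-separated words into $regs[1], $regs[2], $regs[3].
preg_match('/([[:alnum:]]+) ([[:alnum:]]+) ([[:alnum:]]+)/', $string, $regs);

// Remove all line breaks from a sample string.
$flat = preg_replace('/\n/', '', "one\ntwo\n");
```

Note that preg_match() returns 1 or 0 rather than true or false, so comparisons should be written accordingly.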
|
See Also
For regular expressions with a Perl-compatible syntax, see the PCRE
functions. fnmatch() offers matching against wildcard patterns in the
simpler shell style.
- Table of Contents
- ereg_replace -- Replaces a regular expression
- ereg -- Searches for matches of a regular expression
- eregi_replace -- Replaces a regular expression, case insensitive
- eregi -- Searches for a match of a regular expression, case
insensitive
- split -- Splits a string into an array by a regular expression
- spliti -- Splits a string into an array by a regular expression,
case insensitive
- sql_regcase -- Builds a regular expression for a case-insensitive
match
User Contributed Notes: Regular Expression Functions (POSIX Extended)
|
07-Mar-2001 06:38 |
|
If you don't have commandline access to the manpage cited above, note that
the "POSIX 1003.2 Regular Expressions" manpage is also widely
re-published on the web. See, for instance:
The
"POSIX 1003.2 Regular Expressions" manpage provides a good basic
reference for the syntax used by ereg_* functions. Most tutorials on
"extended regular expressions" are also applicable.
|
|
bart at framers dot nl
07-Mar-2001 01:53 |
|
Dario seems to have made a nice tutorial about regular
expressions:
Thanks
Dario! ...
|
|
webmaster at datamike dot org
18-Dec-2001 12:39 |
|
I noticed Cyro's link had gone dead, so I made a copy of the regex manpage
and placed it on my site. You can get it from the following
address:
This
is primarily for Windows users, who have no access to the man pages in
Linux distributions.
|
|
03-Feb-2002 02:02 |
|
if you are looking for the abbreviations like tab, carriage return,
regex-class definitions
you should look here:
some excerpts:
control characters
\a bell
\b backspace
\f form feed
\n line feed
\r carriage return
\t horizontal tab
\v vertical tab
class example
\cLu all uppercase letters
|
|
regex at dan42 dot cjb dot net
21-Feb-2002 04:12 |
|
It's easy to exclude characters but excluding words with a regular
expression is a bit more tricky. For parentheses there is no equivalent to
the ^ for brackets. The only way I've found to exclude a string is to
proceed by inverse logic: accept all the words that do NOT correspond to
the string. So if you want to accept all strings except those _beginning_
with "abc", you'd have to accept any string that matches one of
the following:
^(ab[^c])
^(a[^b]c)
^(a[^b][^c])
^([^a]bc)
^([^a]b[^c])
^([^a][^b]c)
^([^a][^b][^c])
which, put together, gives the regex
^(ab[^c]|a[^b]c|a[^b][^c]|[^a]bc|[^a]b[^c]|[^a][^b]c|[^a][^b][^c])
Note that this won't work to detect the word "abc" anywhere in a
string. You need to have some way of anchoring the inverse word
match like:
^(a[^b]|[^a]b|[^a][^b]) ;"ab" not at beginning of line
or: (a[^b]|[^a]b|[^a][^b])$ ;"ab" not at end of line
or: 123(a[^b]|[^a]b|[^a][^b]) ;"ab" not after "123"
I don't know why "(abc){0,0}" is an invalid syntax. It would've made
all this much simpler.
Slightly off-topic, here's a regex date validator (format yyyy-mm-dd,
remove all spaces and linefeeds):
^(19|20)([0-9]{2}-((0[13-9]|1[0-2])-(0[1-9]|[12][0-9]|30)|
(0[13578]|1[02])-31|02-(0[1-9]|1[0-9]|2[0-8]))|([2468]0|
[02468][48]|[13579][26])-02-29)$
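To try the date validator, here is a minimal sketch that joins the three pattern lines and runs them through preg_match(). The PCRE function is used only for convenience, since the pattern contains nothing POSIX-specific. The helper name isValidDate is invented for this example. One thing the sketch makes visible: the leap-year branch has no alternative for "00", so 2000-02-29 is rejected even though it is a real date.

```php
<?php
// The pattern from the note, joined into one string and wrapped in
// PCRE delimiters.
$datePattern = '/^(19|20)([0-9]{2}-((0[13-9]|1[0-2])-(0[1-9]|[12][0-9]|30)|'
             . '(0[13578]|1[02])-31|02-(0[1-9]|1[0-9]|2[0-8]))|([2468]0|'
             . '[02468][48]|[13579][26])-02-29)$/';

// Hypothetical helper: true if $date matches the yyyy-mm-dd pattern.
function isValidDate($date, $pattern)
{
    return preg_match($pattern, $date) === 1;
}

$ordinaryDay  = isValidDate('2002-02-21', $datePattern); // plain February day
$leapDay      = isValidDate('2004-02-29', $datePattern); // 2004 is a leap year
$notLeapDay   = isValidDate('2001-02-29', $datePattern); // 2001 is not
$aprilThirty1 = isValidDate('2002-04-31', $datePattern); // April has 30 days
$year2000     = isValidDate('2000-02-29', $datePattern); // limitation: rejected
```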
|
|
luciano_at_braziliantranslation.net
03-Mar-2002 07:15 |
|
mholdgate wrote a very nice quick reference guide in the next page,
but I felt it could be improved a little:
________________
^ Start of line
$ End of line
n? Zero or only one single occurrence of character 'n'
n* Zero or more occurrences of character 'n'
n+ At least one or more occurrences of character 'n'
n{2} Exactly two occurrences of 'n'
n{2,} At least 2 or more occurrences of 'n'
n{2,4} From 2 to 4 occurrences of 'n'
. Any single character
() Parenthesis to group expressions
(.*) Zero or more occurrences of any single character, ie, anything!
(n|a) Either 'n' or 'a'
[1-6] Any single digit in the range between 1 and 6
[c-h] Any single lower case letter in the range between c and h
[D-M] Any single upper case letter in the range between D and M
[^a-z] Any single character EXCEPT any lower case letter between a
and z.
Pitfall: the ^ symbol only acts as an EXCEPT rule if it is the very
first character inside a range, and it denies the entire range
including the ^ symbol itself if it appears again later in the range.
Also remember that if it is the first character in the entire
expression, it means "start of line". In any other place, it is always
treated as a regular ^ symbol. In other words, you cannot deny a word
with ^undesired_word or a group with ^(undesired_phrase). Read more
detailed regex documentation to find out what is necessary to achieve
this.
[_4^a-zA-Z] Any single character which can be the underscore or the
number 4 or the ^ symbol or any letter, lower or upper case
?, +, * and the {} count parameters can be appended not only to a
single character, but also to a group() or a range[].
therefore, ^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$ would
mean:
^.{2} = A line beginning with any two characters,
[a-z]{1,2} = followed by either 1 or 2 lower case letters,
_? = followed by an optional underscore,
[0-9]* = followed by zero or more digits,
([1-6]|[a-f]) = followed by either a digit between 1 and 6 OR a lower
case letter between a and f,
[^1-9]{2} = followed by any two characters except digits between 1
and 9 (0 is possible),
a+$ = followed by at least one or more occurrences of 'a' at the end
of a line.
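As a quick check of the breakdown above, here is a small sketch that runs the example expression against two strings. Both sample strings are made up for illustration, and preg_match() is used here only because it accepts the same pattern once it is wrapped in / delimiters.

```php
<?php
// The expression from the walkthrough, wrapped in PCRE delimiters.
$pattern = '/^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$/';

// "XY" + "ab" + "_" + "123" + "4" + "00" + "aa": one piece per rule.
$accepted = preg_match($pattern, 'XYab_123400aa');

// Same string, but "99" sits where two non-1-9 characters are required.
$rejected = preg_match($pattern, 'XYab_123499aa');
```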
|
|
spiceee at potentialvalleys dot com
07-Mar-2002 05:26 |
|
sorry to be picky here but saying ^ is beginning of a line or $ is end of
line is rather misleading, if you're working on a daily basis with
regexes.
it might be that it is most of the time correct BUT in
some occasions you'd be better off to think of ^ as "start of
string" and $ as "end of string".
there are ways to
make your regex engine forget about your system's notion of a newline,
it's what is commonly referred to as multiline regexes...
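A small sketch of the difference, using the PCRE functions since they expose a multiline switch (the /m modifier). The sample text is invented; this illustrates the general idea and says nothing specific about how the ereg functions treat newlines.

```php
<?php
// Two lines in one string.
$text = "first\nsecond";

// Without /m, ^ only matches at the very start of the string.
$wholeString = preg_match('/^second$/', $text);

// With /m, ^ and $ also match at the start and end of each line.
$perLine = preg_match('/^second$/m', $text);
```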
|
|
regex at dan42 dot cjb dot net
08-Mar-2002 05:33 |
|
Follow-up to my previous post: Some simple optimization allowed me to
realize that excluding a word at the beginning of a string has a degree of
complexity O(n) rather than O(n^2). I only had to follow the
logic:
if str[0] != badword[0] then OK
else if str[1] != badword[1] then OK
else if str[2] != badword[2] then OK
else ...
So excluding the word 'abc' at the beginning of a string
is much more simple than I had made it out to be:
^([^a]|a[^b]|ab[^c])
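Here is a minimal sketch for trying the shorter expression. The helper name startsWithoutAbc is invented, and preg_match() stands in for ereg() because the pattern is the same in both syntaxes. Note that every branch needs a character that differs from the bad word, so the empty string and the bare prefixes "a" and "ab" do not match either.

```php
<?php
// The simplified exclusion pattern, wrapped in PCRE delimiters.
$notAbc = '/^([^a]|a[^b]|ab[^c])/';

// Hypothetical helper: true if $s matches the exclusion pattern,
// that is, it visibly diverges from "abc" within its first characters.
function startsWithoutAbc($s, $pattern)
{
    return preg_match($pattern, $s) === 1;
}

$blocked   = startsWithoutAbc('abcdef', $notAbc); // begins with "abc"
$allowed1  = startsWithoutAbc('abd', $notAbc);    // differs at 3rd character
$allowed2  = startsWithoutAbc('xyz', $notAbc);    // differs at 1st character
$tooShort  = startsWithoutAbc('ab', $notAbc);     // no 3rd character to test
```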
|
|
david at NOgreenhammerSPAM dot com
09-Mar-2002 05:40 |
|
Sadly, the Posix regexp evaluator (PHP 4.1.2) does not seem to support
multi-character collating sequences, even though such sequences are
included in the man-page documentation.
Specifically, the man-page
discusses the expression "[[.ch.]]*c" which matches the first
five characters of "chchcc". Running this expression in
ereg_replace generates the error "Warning: REG_ECOLLATE".
(Running an equivalent expression with only one character between the
periods does work, however.)
Multi-character collating sequences
are not supported!
This is really, really too bad, because it would
have provided a simple way to exclude words from the target.
I'm
going to go learn PCRE, now. :-(
|
|
bps7j at yahoo dot com
22-Aug-2002 02:40 |
|
Something that really got me: I'm used to using Perl's regexps, and so I
used \s to check for a whitespace character in a password on a website. My
PHP book (Wrox Press, Professional PHP Programming) agreed with me that
this is exactly the same as [ \r\n\t\f\v], but it's NOT. In fact, what it
did was keep anyone from joining the site if they put an 's' in their
password! So beware, check for subtle differences between what you're used
to and PHP.
[[:space:]] works fine, by the way.
I'm going to
use the pcre functions from now on... I like Perl :o)
|
|
paper
09-Sep-2002 06:57 |
|
I have also experienced the same problem as bps7j at yahoo dot com had
been experiencing, except I did not recognize the problem until after
many hours of debugging.
"\s" does not seem to represent
spaces, however "[[:space:]]" does.
Another problem I was
having was matching dashes/hyphens '-'. You must escape them
"\-" and place them at the end of a bracket
expression.
Example: To match a blank string or a string containing
only uppercase letters, underscores, spaces, and
hyphens:
^([A-Z_\-]|[[:space:]])*$
Hope this saves someone
some time from debugging like I was. :)
|
|
moc DOT liamtoh AT ssengnorw
18-Oct-2002 04:28 |
|
In PCRE, \s matches whitespace, both on its own and inside a character
class:
preg_match ('/\s/', ' ') // match
preg_match ('/[\s]/', ' ') // match
Within a character class [:space:] is treated as a
single character that matches any single whitespace
character:
$pattern = '/[[:space:]]/';
$subject = "space tab\tnewline\n";
preg_match_all($pattern, $subject, $out) // == 3
To match a hyphen from within a character class, it must either
be first or last; otherwise, it will act as a range
operator.
Example: To match a blank string or a string containing
only uppercase letters, underscores, spaces, and
hyphens:
preg_match('/^[A-Z_ -]*$/', $subject)
To match any
whitespace, not just spaces:
preg_match('/^[A-Z_[:space:]-]*$/',
$subject)
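The two patterns above can be compared side by side with a short sketch. The sample subjects are made up for illustration; the only difference between the patterns is whether a tab counts as allowed whitespace.

```php
<?php
// Hyphen placed last in the class, so it is a literal hyphen.
$spacesOnly    = '/^[A-Z_ -]*$/';
$anyWhitespace = '/^[A-Z_[:space:]-]*$/';

$withHyphen = preg_match($spacesOnly, 'FOO-BAR BAZ');   // letters, hyphen, space
$blank      = preg_match($spacesOnly, '');              // empty string is allowed
$lowercase  = preg_match($spacesOnly, 'foo');           // lowercase is not in the class
$tabStrict  = preg_match($spacesOnly, "FOO\tBAR");      // tab is not a plain space
$tabLoose   = preg_match($anyWhitespace, "FOO\tBAR");   // [:space:] includes tab
```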
|
|
Robin
15-Jan-2003 05:53 |
|
Ever wondered how to exclude "[" and "]"? Here it
goes: "[^][]". Extra characters to exclude can be added right in
the middle like this: "[^]fobar[]".
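A short sketch of that bracket expression in use. It is run through preg_match() here, which accepts the same class syntax, and the sample strings are invented for the example.

```php
<?php
// "]" comes first after the ^ so it is taken literally; "[" follows it.
$noBrackets = '/^[^][]*$/';

$plain    = preg_match($noBrackets, 'plain text'); // no brackets anywhere
$hasIndex = preg_match($noBrackets, 'a[0]');       // contains "[" and "]"
```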
|
|
Anand Thakur
25-Mar-2003 06:43 |
|
I saw a link to this page somewhere. It is a library of user-submitted
regular expressions for various things. Some good stuff there.
|
|