The equivalent regex

This isn't a question, but more of a challenge (very easy though).

I was playing with some code and trying to do what I wanted without loading the regex engine and came up with this:

Code:
substr($url, rindex($url, '/'), (length($url) - rindex($url, '/'))) = undef;

Can anyone tell me what the corresponding regex is?

I wonder whether it is any faster than a regex.

Last edited by: Paul, Jul 31, 2002, 6:45 AM
Re: [Paul] The equivalent regex
Code:
my $url =~ s!/[^/]*$!!;

- wil
Re: [Wil] The equivalent regex
Hmm close enough but you wouldn't want "my" there :)
Re: [Paul] The equivalent regex
Other than that, the regex does as you requested, though?

Have you benchmarked them?

- wil
Re: [Paul] The equivalent regex
why wouldn't you use

$url = substr ($url, 0, rindex ($url, '/'));

--
jsu
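
For reference, a minimal sketch that runs the three approaches posted so far side by side on one sample URL. The sub names are illustrative, and `''` is assigned where the original snippet assigns `undef` so the script stays quiet under `use warnings`; this is not anyone's actual test script.

```perl
use strict;
use warnings;

# Regex version: delete the last '/' and everything after it.
sub strip_regex {
    my ($url) = @_;
    $url =~ s!/[^/]*$!!;
    return $url;
}

# Lvalue substr version (the original snippet, with '' in place of undef).
sub strip_substr_lvalue {
    my ($url) = @_;
    substr($url, rindex($url, '/'), length($url) - rindex($url, '/')) = '';
    return $url;
}

# Keep-the-prefix version: everything before the last '/'.
sub strip_substr_prefix {
    my ($url) = @_;
    return substr($url, 0, rindex($url, '/'));
}

my $sample = 'http://www.wiredon.net/foo/bar';
print "$_\n" for strip_regex($sample),
                 strip_substr_lvalue($sample),
                 strip_substr_prefix($sample);
```

All three agree as long as the string contains a `/`. When it does not, `rindex` returns -1, so both substr versions chop the last character, while the regex leaves the string untouched.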
Re: [Wil] The equivalent regex
Yeah, the regex seems to kick ass....10 million iterations....

Code:
Benchmark: timing 10000000 iterations of regex, substr...

regex: 3 wallclock secs ( 2.84 usr + -0.00 sys = 2.84 CPU) @ 3517411.19/s (n=10000000)

substr: 19 wallclock secs (17.65 usr + 0.04 sys = 17.69 CPU) @ 565131.39/s (n=10000000)
Re: [Seto Kaiba] The equivalent regex
Indeed I could, but like I said, I was playing about.
Re: [Paul] The equivalent regex
Wow. That's quite a bit faster. I guess your first snippet was calling four different functions.

What would be neat would be downloading Parrot and benchmarking the same regex on the Perl 5 vs. the Perl 6 engine.

- wil
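
On the number of function calls: the third argument in the first snippet only says "up to the end of the string", which is what `substr` does anyway when the length is left out. A minimal sketch of two shorter spellings, assuming the URL contains at least one `/` (the variable names are illustrative, and `''` stands in for the original `undef`):

```perl
use strict;
use warnings;

my $url_a = 'http://www.wiredon.net/foo/bar';
my $url_b = $url_a;

# Two-argument lvalue substr: everything from the last '/' to the end.
substr($url_a, rindex($url_a, '/')) = '';

# Four-argument substr: same span, with the replacement passed as the
# fourth argument instead of assigning to an lvalue.
my $pos = rindex($url_b, '/');
substr($url_b, $pos, length($url_b) - $pos, '');

print "$url_a\n$url_b\n";
```

Whether either spelling is measurably faster is not something this sketch shows; it only cuts down the calls.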
Re: [Paul] The equivalent regex
well.. mine is still faster than the regex..
regex = wil
substr1 = paul
substr2 = me

Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 15 wallclock secs (13.23 usr + 0.01 sys = 13.24 CPU) @ 755287.01/s (n=10000000)
substr1: 20 wallclock secs (18.77 usr + 0.00 sys = 18.77 CPU) @ 532765.05/s (n=10000000)
substr2: 10 wallclock secs (10.39 usr + 0.00 sys = 10.39 CPU) @ 962463.91/s (n=10000000)


and to double check..

Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 12 wallclock secs (12.87 usr + -0.01 sys = 12.86 CPU) @ 777604.98/s (n=10000000)
substr1: 19 wallclock secs (18.43 usr + 0.01 sys = 18.44 CPU) @ 542299.35/s (n=10000000)
substr2: 10 wallclock secs (10.71 usr + -0.02 sys = 10.69 CPU) @ 935453.70/s (n=10000000)


and on m$ windoze.

Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 20 wallclock secs (16.43 usr + 0.01 sys = 16.44 CPU) @ 608124.54/s (n=10000000)
substr1: 29 wallclock secs (25.52 usr + 0.05 sys = 25.57 CPU) @ 391129.19/s (n=10000000)
substr2: 11 wallclock secs (10.84 usr + 0.00 sys = 10.84 CPU) @ 922083.91/s (n=10000000)


--
jsu

Last edited by: Seto Kaiba, Jul 31, 2002, 9:00 AM
Re: [Seto Kaiba] The equivalent regex
This is what I get with the same benchmark...

Code:
Benchmark: timing 10000000 iterations of regex, substr, substr2...

regex: 3 wallclock secs ( 3.18 usr + 0.02 sys = 3.20 CPU) @ 3128911.14/s (n=10000000)

substr: 19 wallclock secs (17.96 usr + 0.03 sys = 17.99 CPU) @ 555957.08/s (n=10000000)

substr2: 13 wallclock secs ( 9.88 usr + 0.01 sys = 9.89 CPU) @ 1010611.42/s (n=10000000)
Re: [Paul] The equivalent regex
your computer is a regex supermachine.

i forget which version of perl i use.

5.6.1 on linux

and same for windows.

--
jsu
Re: [Seto Kaiba] The equivalent regex
I'm using 5.6.1 on WinXP Pro
Re: [Paul] The equivalent regex
i think i figured out what you did..

i got these results in windoze.

Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 3 wallclock secs ( 2.77 usr + 0.01 sys = 2.78 CPU) @ 3590664.27/s (n=10000000)
substr1: 18 wallclock secs (15.11 usr + 0.02 sys = 15.13 CPU) @ 660894.85/s (n=10000000)
substr2: 8 wallclock secs ( 7.37 usr + -0.01 sys = 7.36 CPU) @ 1358511.07/s (n=10000000)


but.. that's testing on empty values for $url. what's the point of doing that? of course regex won't do anything.

--
jsu

Last edited by: Seto Kaiba, Jul 31, 2002, 9:10 AM
Re: [Seto Kaiba] The equivalent regex
>>
but.. that's testing on empty values for $url. what's the point of doing that? of course regex won't do anything.
<<


my $url = 'http://www.wiredon.net/foo/bar';
Re: [Paul] The equivalent regex
be sure your regex is actually using that value, because my perl on both windows 2000 and redhat linux 7.2 doesn't time anywhere near as well (the regex still takes longer than substr2).

--
jsu
Re: [Seto Kaiba] The equivalent regex
I think I know what it is. I'm using the anonymous code ref method of benchmarking, but $url is declared outside the block. I need to move it inside or make it global.

Last edited by: Paul, Jul 31, 2002, 9:15 AM
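
A minimal sketch of one way this kind of skew can happen, assuming the benchmarked code is a closure over a `$url` that is declared once outside it. The loop stands in for Benchmark's repeated calls, and the names are illustrative; the actual script was not posted.

```perl
use strict;
use warnings;

my $url = 'http://www.wiredon.net/foo/bar';

# Closure over the outer $url: every call modifies the same string.
my $shared = sub { $url =~ s!/[^/]*$!!; };

my @seen;
for my $call (1 .. 6) {
    $shared->();
    push @seen, $url;
}
print "$_\n" for @seen;

# Fresh copy per call: every iteration does the same amount of work.
my $fresh = sub {
    my $copy = 'http://www.wiredon.net/foo/bar';
    $copy =~ s!/[^/]*$!!;
    return $copy;
};
print $fresh->(), "\n";
```

In the shared version the string runs out of slashes after four calls, so nearly all of the iterations would be timing a substitution that fails to match.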
Re: [Paul] The equivalent regex
What's your benchmark code so I can try it out on a few machines?

- wil
Re: [Paul] The equivalent regex
that's exactly what i meant by the $url not being used by the regex. my code is..

Code:
#!perl

use Benchmark;

timethese (10000000, {
    'regex'   => \&wil,
    'substr1' => \&pau,
    'substr2' => \&set
});

sub wil {
    my $url = "http://www.gossamer-threads.com/lala/lala";
    $url =~ s!/[^/]*!!;
}

sub pau {
    my $url = "http://www.gossamer-threads.com/lala/lala";
    substr($url, rindex($url, '/'), (length($url) - rindex($url, '/'))) = undef;
}

sub set {
    my $url = "http://www.gossamer-threads.com/lala/lala";
    $url = substr ($url, 0, rindex ($url, '/'));
}

[edit] i hate the advanced editor.

--
jsu

Last edited by: Seto Kaiba, Jul 31, 2002, 9:34 AM
Re: [Seto Kaiba] The equivalent regex
Code:
This is perl, version 5.005_03 built for i386-linux

Benchmark: timing 1000000 iterations of regex, substr1, substr2...
regex: 6 wallclock secs ( 4.54 usr + 0.01 sys = 4.55 CPU)
substr1: 7 wallclock secs ( 6.08 usr + 0.03 sys = 6.11 CPU)
substr2: 3 wallclock secs ( 3.15 usr + 0.04 sys = 3.19 CPU)

- wil
Re: [Wil] The equivalent regex
Yep. A simple substr done right beats regex any day.

--
jsu
Re: [Wil] The equivalent regex
Code:
This is perl, v5.6.0 built for i386-linux

Benchmark: timing 1000000 iterations of regex, substr1, substr2...
regex: 16 wallclock secs (14.91 usr + 0.02 sys = 14.93 CPU) @ 66979.24/s (n=1000000)
substr1: 24 wallclock secs (23.85 usr + 0.02 sys = 23.87 CPU) @ 41893.59/s (n=1000000)
substr2: 11 wallclock secs (10.12 usr + -0.03 sys = 10.09 CPU) @ 99108.03/s (n=1000000)

- wil
Re: [Wil] The equivalent regex
Hey!!

Hang on a minute, Jerry.

You modified my regex to drop off the $ anchor!

- wil
Re: [Wil] The equivalent regex
oops. didn't notice that.. but it's still the same thing.. if you think about it.. the [^/]* pretty much means ... get the last /..

but with the $ anchor.. i get even slower times for the regex.. here are my results.

Code:
Windows 2000. Perl 5.6.1.

Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 35 wallclock secs (31.03 usr + 0.04 sys = 31.07 CPU) @ 321895.32/s (n=10000000)
substr1: 29 wallclock secs (26.40 usr + 0.02 sys = 26.42 CPU) @ 378515.46/s (n=10000000)
substr2: 14 wallclock secs (12.54 usr + 0.04 sys = 12.58 CPU) @ 794975.75/s (n=10000000)

RH Linux 7.2. Perl 5.6.1.

Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 22 wallclock secs (21.23 usr + 0.01 sys = 21.24 CPU) @ 470809.79/s (n=10000000)
substr1: 15 wallclock secs (17.00 usr + 0.01 sys = 17.01 CPU) @ 587889.48/s (n=10000000)
substr2: 11 wallclock secs (10.71 usr + 0.02 sys = 10.73 CPU) @ 931966.45/s (n=10000000)

so.. still horribly slow.

i think it's obvious why substr1 is slower than substr2. it calls substr once, rindex twice and length once. while substr2 calls substr and rindex once. regex.. i wouldn't know how to explain.. but it's not faster.. so i won't explain it.

--
jsu

Last edited by: Seto Kaiba, Jul 31, 2002, 1:57 PM
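
On whether dropping the `$` anchor leaves the regex "the same thing": a minimal sketch that applies both patterns to the URL from the benchmark script suggests it does not. The variable names are illustrative.

```perl
use strict;
use warnings;

my $with_anchor    = 'http://www.gossamer-threads.com/lala/lala';
my $without_anchor = $with_anchor;

# Anchored: the match has to reach the end of the string,
# so it can only succeed starting at the last '/'.
$with_anchor =~ s!/[^/]*$!!;

# Unanchored: the leftmost '/' matches, followed by zero non-slash
# characters, so only the first slash of "//" is removed.
$without_anchor =~ s!/[^/]*!!;

print "$with_anchor\n$without_anchor\n";
```

So the earlier runs without the anchor were timing a different, cheaper substitution, which fits the regex numbers getting worse once the `$` was restored.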
Re: [Seto Kaiba] The equivalent regex
>>
i wouldn't know how to explain.. but it's not faster.. so i won't explain it.
<<

I'm guessing part of the reason is that perl has to init the regex engine.
Re: [Paul] The equivalent regex
why would your substr inflate the time on windows then?

is "length ($url)" too much for windows? i hope not. maybe it's the subtraction. stupid windows.

--
jsu