Gossamer Forum: General: Perl Programming: The equivalent regex

Veteran (19537 posts)

Jul 31, 2002, 6:44 AM

Post #1 of 32

The equivalent regex

This isn't a question, but more of a challenge (very easy though).

I was playing with some code, trying to do what I wanted without loading the regex engine, and came up with this:

Code:
substr($url, rindex($url, '/'), (length($url) - rindex($url, '/'))) = undef;

Can anyone tell me what the corresponding regex is?

I wonder whether it is any faster than a regex.
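One quick way to check that a regex answer really does the same job as the substr trick is to run both on a few sample URLs. This is a minimal sketch: the helper names `chop_with_substr` and `chop_with_regex` and the sample URLs are made up for illustration, the anchored pattern `s!/[^/]*$!!` is assumed as the regex counterpart, and the substr version assigns the empty string rather than undef so it stays quiet under `use warnings`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the no-regex version (substr + rindex + length).
sub chop_with_substr {
    my ($url) = @_;
    substr($url, rindex($url, '/'), (length($url) - rindex($url, '/'))) = '';
    return $url;
}

# Hypothetical helper: delete the last '/' and everything after it.
sub chop_with_regex {
    my ($url) = @_;
    $url =~ s!/[^/]*$!!;
    return $url;
}

for my $url ('http://www.wiredon.net/foo/bar', 'http://example.com/dir/', '/a/b/c') {
    my ($s, $r) = (chop_with_substr($url), chop_with_regex($url));
    print "$url: substr gives '$s', regex gives '$r' (",
        ($s eq $r ? 'same' : 'different'), ")\n";
}
```

Both only agree when the URL actually contains a '/'; strings without one are a separate case.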

Last edited by:

Paul: Jul 31, 2002, 6:45 AM

Veteran / Moderator (4108 posts)

Jul 31, 2002, 7:41 AM

Post #2 of 32

Re: [Paul] The equivalent regex

Code:
my $url =~ s!/[^/]*$!!;

- wil

Veteran (19537 posts)

Jul 31, 2002, 8:21 AM

Post #3 of 32

Re: [Wil] The equivalent regex

Hmm, close enough, but you wouldn't want "my" there :)
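For reference, `my $url =~ s!...!!` declares a brand-new, undefined `$url` and runs the substitution on that. If the goal is a trimmed copy while keeping the original, the usual Perl idiom wraps the declaration and assignment in parentheses so `s///` runs on the new variable. A minimal sketch; the variable name `$dir` and the sample URL are just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $url = 'http://www.wiredon.net/foo/bar';

# Copy-then-modify: declare $dir, assign $url to it, then run s/// on $dir only.
(my $dir = $url) =~ s!/[^/]*$!!;

print "$dir\n";   # http://www.wiredon.net/foo
print "$url\n";   # http://www.wiredon.net/foo/bar (unchanged)

# Without "my", the substitution modifies $url itself.
$url =~ s!/[^/]*$!!;
print "$url\n";   # http://www.wiredon.net/foo
```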

Veteran / Moderator (4108 posts)

Jul 31, 2002, 8:23 AM

Post #4 of 32

Re: [Paul] The equivalent regex

Other than that, the regex does as you requested, though?

Have you benchmarked them?

- wil

User (105 posts)

Jul 31, 2002, 8:25 AM

Post #5 of 32

Re: [Paul] The equivalent regex

Why wouldn't you use:

Code:
$url = substr($url, 0, rindex($url, '/'));
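One caveat with both substr versions: if `$url` contains no '/', `rindex` returns -1, and `substr($url, 0, -1)` quietly drops the last character, whereas the regex leaves the string alone. A minimal guarded sketch (the sub name `strip_last_segment` is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop the last '/' and everything after it,
# leaving strings without any '/' untouched (as s!/[^/]*$!! does).
sub strip_last_segment {
    my ($url) = @_;
    my $pos = rindex($url, '/');    # -1 when there is no '/'
    return $url if $pos < 0;
    return substr($url, 0, $pos);
}

print strip_last_segment('http://www.wiredon.net/foo/bar'), "\n";    # http://www.wiredon.net/foo
print strip_last_segment('no-slash-here'), "\n";                     # no-slash-here
print substr('no-slash-here', 0, rindex('no-slash-here', '/')), "\n";  # no-slash-her (unguarded)
```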

--
jsu

Veteran (19537 posts)

Jul 31, 2002, 8:31 AM

Post #6 of 32

Re: [Wil] The equivalent regex

Yeah, the regex seems to kick ass... 10 million iterations:

Code:
Benchmark: timing 10000000 iterations of regex, substr...
regex: 3 wallclock secs ( 2.84 usr + -0.00 sys = 2.84 CPU) @ 3517411.19/s (n=10000000)
substr: 19 wallclock secs (17.65 usr + 0.04 sys = 17.69 CPU) @ 565131.39/s (n=10000000)

Veteran (19537 posts)

Jul 31, 2002, 8:32 AM

Post #7 of 32

Re: [Seto Kaiba] The equivalent regex

Indeed I could, but like I said, I was playing about.

Veteran / Moderator (4108 posts)

Jul 31, 2002, 8:37 AM

Post #8 of 32

Re: [Paul] The equivalent regex

Wow. That's quite a bit faster. I guess your first snippet was making four function calls (substr, rindex twice, and length).

It would be neat to download Parrot and benchmark the same regex on the Perl 5 engine vs. the Perl 6 engine.

- wil

User (105 posts)

Jul 31, 2002, 8:50 AM

Post #9 of 32

Re: [Paul] The equivalent regex

Well, mine is still faster than the regex.

regex = Wil
substr1 = Paul
substr2 = me

Code:
Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 15 wallclock secs (13.23 usr + 0.01 sys = 13.24 CPU) @ 755287.01/s (n=10000000)
substr1: 20 wallclock secs (18.77 usr + 0.00 sys = 18.77 CPU) @ 532765.05/s (n=10000000)
substr2: 10 wallclock secs (10.39 usr + 0.00 sys = 10.39 CPU) @ 962463.91/s (n=10000000)

And to double-check:

Code:
Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 12 wallclock secs (12.87 usr + -0.01 sys = 12.86 CPU) @ 777604.98/s (n=10000000)
substr1: 19 wallclock secs (18.43 usr + 0.01 sys = 18.44 CPU) @ 542299.35/s (n=10000000)
substr2: 10 wallclock secs (10.71 usr + -0.02 sys = 10.69 CPU) @ 935453.70/s (n=10000000)

And on M$ Windoze:

Code:
Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 20 wallclock secs (16.43 usr + 0.01 sys = 16.44 CPU) @ 608124.54/s (n=10000000)
substr1: 29 wallclock secs (25.52 usr + 0.05 sys = 25.57 CPU) @ 391129.19/s (n=10000000)
substr2: 11 wallclock secs (10.84 usr + 0.00 sys = 10.84 CPU) @ 922083.91/s (n=10000000)

--
jsu

Last edited by:

Seto Kaiba: Jul 31, 2002, 9:00 AM

Veteran (19537 posts)

Jul 31, 2002, 9:01 AM

Post #10 of 32

Re: [Seto Kaiba] The equivalent regex

This is what I get with the same benchmark...

Code:
Benchmark: timing 10000000 iterations of regex, substr, substr2...
regex: 3 wallclock secs ( 3.18 usr + 0.02 sys = 3.20 CPU) @ 3128911.14/s (n=10000000)
substr: 19 wallclock secs (17.96 usr + 0.03 sys = 17.99 CPU) @ 555957.08/s (n=10000000)
substr2: 13 wallclock secs ( 9.88 usr + 0.01 sys = 9.89 CPU) @ 1010611.42/s (n=10000000)

User (105 posts)

Jul 31, 2002, 9:05 AM

Post #11 of 32

Re: [Paul] The equivalent regex

Your computer is a regex supermachine.

I forgot to mention which version of Perl I use: 5.6.1 on Linux, and the same on Windows.

--
jsu

Veteran (19537 posts)

Jul 31, 2002, 9:08 AM

Post #12 of 32

Re: [Seto Kaiba] The equivalent regex

I'm using 5.6.1 on WinXP Pro.

User (105 posts)

Jul 31, 2002, 9:08 AM

Post #13 of 32

Re: [Paul] The equivalent regex

I think I figured out what you did.

I got these results on Windoze:

Code:
Benchmark: timing 10000000 iterations of regex, substr1, substr2...
regex: 3 wallclock secs ( 2.77 usr + 0.01 sys = 2.78 CPU) @ 3590664.27/s (n=10000000)
substr1: 18 wallclock secs (15.11 usr + 0.02 sys = 15.13 CPU) @ 660894.85/s (n=10000000)
substr2: 8 wallclock secs ( 7.37 usr + -0.01 sys = 7.36 CPU) @ 1358511.07/s (n=10000000)

But that's testing on empty values for $url. What's the point of doing that? Of course the regex won't do anything.

--
jsu

Last edited by:

Seto Kaiba: Jul 31, 2002, 9:10 AM

Veteran (19537 posts)

Jul 31, 2002, 9:10 AM

Post #14 of 32

Re: [Seto Kaiba] The equivalent regex

>>
but.. that's testing on empty values for $url. what's the point of doing that? of course regex won't do anything.
<<

Code:
my $url = 'http://www.wiredon.net/foo/bar';

User (105 posts)

Jul 31, 2002, 9:12 AM

Post #15 of 32

Re: [Paul] The equivalent regex

Be sure your regex is actually using that value, because my Perl on both Windows 2000 and Red Hat Linux 7.2 isn't timing as well (the regex still takes longer than substr2).

--
jsu

Veteran (19537 posts)

Jul 31, 2002, 9:15 AM

Post #16 of 32

Re: [Seto Kaiba] The equivalent regex

I think I know what it is. I'm using the anonymous code ref method of benchmarking, but $url is declared outside the block; I need to move it inside or make it global.
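A minimal sketch of why that matters, using the URL from the post above and a loop of six calls standing in for Benchmark's millions of iterations: a code ref that modifies a variable declared outside it keeps working on the already-shortened value, so after a few calls the regex has nothing left to match and simply fails quickly. This is one plausible explanation for the very fast regex numbers, not a measured result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declared once, outside the code ref, like $url in the benchmark described above.
my $url = 'http://www.wiredon.net/foo/bar';
my $strip = sub { $url =~ s!/[^/]*$!! };    # closure: edits the outer $url

for my $call (1 .. 6) {
    my $result = $strip->() ? 'matched' : 'no match';
    print "call $call: $result, \$url is now '$url'\n";
}
```

Calls 1 to 4 shorten `$url` to `http://www.wiredon.net/foo`, `http://www.wiredon.net`, `http:/` and finally `http:`; from call 5 on there is no '/' left, so every further iteration is just a failed match.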

Last edited by:

Paul: Jul 31, 2002, 9:15 AM

Veteran / Moderator (4108 posts)

Jul 31, 2002, 9:22 AM

Post #17 of 32

Re: [Paul] The equivalent regex

What's your benchmark code, so I can try it out on a few machines?

- wil

User (105 posts)

Jul 31, 2002, 9:30 AM

Post #18 of 32

Re: [Paul] The equivalent regex

That's exactly what I meant by $url not being used by the regex. My code is:

Code:
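#!perl
# Benchmark the three snippets from this thread against each other.
use Benchmark;

timethese(10000000, {
    'regex'   => \&wil,
    'substr1' => \&pau,
    'substr2' => \&set,
});

# Each sub declares its own $url, so every iteration starts from the full URL.

# regex: Wil's substitution
sub wil {
    my $url = "http://www.gossamer-threads.com/lala/lala";
    $url =~ s!/[^/]*!!;
}

# substr1: Paul's substr/rindex/length version
sub pau {
    my $url = "http://www.gossamer-threads.com/lala/lala";
    substr($url, rindex($url, '/'), (length($url) - rindex($url, '/'))) = undef;
}

# substr2: substr with a single rindex call
sub set {
    my $url = "http://www.gossamer-threads.com/lala/lala";
    $url = substr($url, 0, rindex($url, '/'));
}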
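For comparison, here is a sketch of the same benchmark with a `$` anchor on the regex, the empty string instead of `undef` in the substr1 version, a sanity check that all three return the same string, and `cmpthese` from the core Benchmark module to print relative speeds. The negative count (`-1`) asks Benchmark to run each snippet for at least one CPU second rather than a fixed number of iterations. No results are claimed here; numbers will depend on the machine and Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Each sub declares its own $url, so every iteration starts from the full URL.
sub wil {
    my $url = 'http://www.gossamer-threads.com/lala/lala';
    $url =~ s!/[^/]*$!!;    # anchored: drop the last '/' and what follows
    return $url;
}

sub pau {
    my $url = 'http://www.gossamer-threads.com/lala/lala';
    # assign '' rather than undef (quieter under use warnings)
    substr($url, rindex($url, '/'), (length($url) - rindex($url, '/'))) = '';
    return $url;
}

sub set {
    my $url = 'http://www.gossamer-threads.com/lala/lala';
    $url = substr($url, 0, rindex($url, '/'));
    return $url;
}

# Sanity check: all three should produce the same string before timing them.
my %seen = map { $_ => 1 } (wil(), pau(), set());
die "versions disagree\n" unless keys %seen == 1;

cmpthese(-1, {
    regex   => \&wil,
    substr1 => \&pau,
    substr2 => \&set,
});
```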

[edit] I hate the advanced editor.

--
jsu

Last edited by:

Seto Kaiba: Jul 31, 2002, 9:34 AM

Veteran / Moderator (4108 posts)

Jul 31, 2002, 9:41 AM

Post #19 of 32

Re: [Seto Kaiba] The equivalent regex

Code:
This is perl, version 5.005_03 built for i386-linux 

Benchmark: timing 1000000 iterations of regex, substr1, substr2... 
     regex:  6 wallclock secs ( 4.54 usr +  0.01 sys =  4.55 CPU) 
   substr1:  7 wallclock secs ( 6.08 usr +  0.03 sys =  6.11 CPU) 
   substr2:  3 wallclock secs ( 3.15 usr +  0.04 sys =  3.19 CPU)

- wil

User (105 posts)

Jul 31, 2002, 9:43 AM

Post #20 of 32

Re: [Wil] The equivalent regex

Yep. A simple substr done right beats a regex any day.

--
jsu

Veteran / Moderator (4108 posts)

Jul 31, 2002, 9:45 AM

Post #21 of 32

Re: [Wil] The equivalent regex

Code:
This is perl, v5.6.0 built for i386-linux 

Benchmark: timing 1000000 iterations of regex, substr1, substr2... 
     regex: 16 wallclock secs (14.91 usr +  0.02 sys = 14.93 CPU) @ 66979.24/s (n=1000000) 
   substr1: 24 wallclock secs (23.85 usr +  0.02 sys = 23.87 CPU) @ 41893.59/s (n=1000000) 
   substr2: 11 wallclock secs (10.12 usr + -0.03 sys = 10.09 CPU) @ 99108.03/s (n=1000000)

- wil

Veteran / Moderator (4108 posts)

Jul 31, 2002, 9:50 AM

Post #22 of 32

Re: [Wil] The equivalent regex

Hey!!

Hang on a minute, Jerry.

You modified my regex to drop off the $ anchor!
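Dropping the anchor changes what gets removed. A quick check, using the sample URL from the benchmark script above: without `$`, the engine takes the leftmost '/' it can match, and since `[^/]*` is allowed to match zero characters, it deletes just the first slash of `http://`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $anchored   = 'http://www.gossamer-threads.com/lala/lala';
my $unanchored = $anchored;

$anchored   =~ s!/[^/]*$!!;    # last '/' plus the trailing segment
$unanchored =~ s!/[^/]*!!;     # first '/' plus any non-'/' chars right after it

print "$anchored\n";     # http://www.gossamer-threads.com/lala
print "$unanchored\n";   # http:/www.gossamer-threads.com/lala/lala
```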

- wil

User (105 posts)

Jul 31, 2002, 1:41 PM

Post #23 of 32

Re: [Wil] The equivalent regex

Oops, I didn't notice that. But it's still the same thing if you think about it: the [^/]* pretty much means "get the last /".

But with the $ anchor, I get even slower times for the regex. Here are my results:

Code:
  Windows 2000. Perl 5.6.1.   

Benchmark: timing 10000000 iterations of regex, substr1, substr2...    
     regex: 35 wallclock secs (31.03 usr + 0.04 sys = 31.07 CPU) @ 321895.32/s (n=10000000)    
   substr1: 29 wallclock secs (26.40 usr + 0.02 sys = 26.42 CPU) @ 378515.46/s (n=10000000)   
   substr2: 14 wallclock secs (12.54 usr + 0.04 sys = 12.58 CPU) @ 794975.75/s (n=10000000)   

RH Linux 7.2. Perl 5.6.1.   

Benchmark: timing 10000000 iterations of regex, substr1, substr2...   
     regex: 22 wallclock secs (21.23 usr +  0.01 sys = 21.24 CPU) @ 470809.79/s (n=10000000)   
   substr1: 15 wallclock secs (17.00 usr +  0.01 sys = 17.01 CPU) @ 587889.48/s (n=10000000)   
   substr2: 11 wallclock secs (10.71 usr +  0.02 sys = 10.73 CPU) @ 931966.45/s (n=10000000)

So the regex is still horribly slow.

I think it's obvious why substr1 is slower than substr2: it calls substr once, rindex twice, and length once, while substr2 calls substr and rindex only once each. As for the regex, I wouldn't know how to explain... but it's not faster, so I won't explain it.
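Following that reasoning, the substr1 approach could call rindex only once by saving the position in a variable and using two-argument lvalue substr, which runs to the end of the string. A sketch (the `$pos` name is just for illustration, and the `>= 0` guard also leaves URLs without a '/' alone); whether it actually beats substr2 would still need benchmarking:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $url = 'http://www.gossamer-threads.com/lala/lala';

my $pos = rindex($url, '/');            # look up the last '/' only once
substr($url, $pos) = '' if $pos >= 0;   # truncate from there to the end

print "$url\n";   # http://www.gossamer-threads.com/lala
```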

--
jsu

Last edited by:

Seto Kaiba: Jul 31, 2002, 1:57 PM

Veteran (19537 posts)

Jul 31, 2002, 1:56 PM

Post #24 of 32

Re: [Seto Kaiba] The equivalent regex

>>
i wouldn't know how to explain.. but it's not faster.. so i won't explain it.
<<

I'm guessing part of the reason is that Perl has to initialize the regex engine.

Jul 31, 2002, 1:59 PM