Gossamer Forum

Better way to do this?

$Description =~ tr/-/ /;
$Description =~ tr/,/ /;
$Description =~ tr/./ /;
$Description =~ tr/;/ /;
$Description =~ tr/"/ /;
$Description =~ tr/'/ /;
$Description =~ tr/:/ /;
$Description =~ tr/)/ /;
$Description =~ tr/(/ /;
$Description =~ tr/*/ /;
$Description =~ tr/@/ /;
$Description =~ tr/%/ /;
$Description =~ tr/$/ /;
$Description =~ tr/ /+/;


Is there a better way to do this? Basically I want to remove all the punctuation characters and then replace any spaces with a + sign.

ex: I--hope%& to get a reply)--"to my question"

TO

I+hope+to+get+a+reply+to+my+question
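
A possible shorter version, offered as a sketch rather than the definitive answer: a single tr/// with the c (complement) and s (squeeze) flags turns every run of non-alphanumeric characters into one space, after which the ends are trimmed and the remaining spaces become +. The helper name plus_join is made up for illustration, and the sketch assumes that only ASCII letters and digits should be kept.

```perl
use strict;
use warnings;

# plus_join() is a hypothetical helper name, not something from the thread.
sub plus_join {
    my ($text) = @_;
    # c = complement the list, s = squeeze: every run of characters that are
    # NOT letters or digits collapses into a single space.
    $text =~ tr/a-zA-Z0-9/ /cs;
    $text =~ s/^ | $//g;   # drop a leading or trailing space, if any
    $text =~ tr/ /+/;      # the remaining single spaces become +
    return $text;
}

print plus_join(q{I--hope%& to get a reply)--"to my question"}), "\n";
# prints: I+hope+to+get+a+reply+to+my+question
```

Listing the characters to keep, instead of the ones to remove, means a new punctuation character never needs its own line.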

Re: Better way to do this? In reply to
One way:

#!/usr/bin/perl

$description = qq~I--hope%& to get a reply)--"to my question"~;

$description =~ s/\W/ /g;
$description =~ s/( +)\b/+/g;

print "$description\n";

--Mark
Re: Better way to do this? In reply to
Why are we putting one or more spaces in pattern memory with "( +)" (the parentheses)?

/ +\b/ matches one or more spaces that are immediately followed by a word boundary (\b), i.e. spaces that come right before a word. This means spaces at the end of the string are left alone rather than turned into a "+", but what about the beginning of the string?
There a leading run of spaces would become a "+" (the example is not a problem).

What about tabs (these weren't in the example either)?

s/\W/ /g;      # replace anything that is not a word character with a space
s/[\t ]+/+/g;  # one or more tabs or spaces become a single +
               # (could include newlines by using \s instead of [\t ])
s/^\+//;       # remove + at beginning of string (the + must be escaped)
s/\+$//;       # remove + at end of string
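
To check how \b behaves at the two ends of a string, here is a small sketch that runs both approaches on a made-up input with punctuation at the start and the end. The helper names mark_style and trimmed_style are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

# mark_style(): the two substitutions from the earlier reply (hypothetical name).
sub mark_style {
    my ($s) = @_;
    $s =~ s/\W/ /g;
    $s =~ s/ +\b/+/g;   # spaces followed by the start of a word
    return $s;
}

# trimmed_style(): collapse whitespace, then trim a + at either end (hypothetical name).
sub trimmed_style {
    my ($s) = @_;
    $s =~ s/\W/ /g;
    $s =~ s/\s+/+/g;
    $s =~ s/^\+//;
    $s =~ s/\+$//;
    return $s;
}

my $input = ' (hello) world! ';
printf "[%s]\n", mark_style($input);      # prints: [+hello+world  ]
printf "[%s]\n", trimmed_style($input);   # prints: [hello+world]
```

With this input the \b version yields a leading + and leaves the trailing spaces in place, because there is no word boundary between a space and the end of the string.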
Re: Better way to do this? In reply to
I don't know why I threw the parens on there; I put the whole thing together in like 30 seconds. I was probably separating my portions while working on it. It will work fine without them.

As for the other things, I didn't say this was a perfect way, just one. Put together quickly, and works with the example listed.

:)

--mark
Re: Better way to do this? In reply to
Mark,

Didn't mean to criticize. I've been learning and working with patterns, and this was an attempt to understand what you were doing. The fine details of \W, \b and () still escape me (e.g. parentheses are also used for precedence, and pattern memory can be avoided with (?:foo), but how much does this matter?). It seems like your use of \b would leave spaces at the end of the string untouched, and would put a + at the start if the string begins with punctuation.
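
On the capturing question, a minimal sketch of the difference between ( ) and (?: ), using a made-up test string. For a pattern this small the cost of capturing is unlikely to matter; the point is only what ends up in $1.

```perl
use strict;
use warnings;

my $text = 'one   two';

# Capturing group: the matched run of spaces is remembered in $1.
if ($text =~ /( +)\b/) {
    printf "captured %d spaces\n", length $1;    # prints: captured 3 spaces
}

# Non-capturing group: the same text matches, but nothing is stored.
if ($text =~ /(?: +)\b/) {
    print defined $1 ? "captured\n" : "nothing captured\n";   # prints: nothing captured
}

# Grouping is only needed when a quantifier or alternation has to apply
# to more than one character, e.g. (?:ab)+ versus ab+.
print "abab" =~ /^(?:ab)+$/ ? "grouped match\n" : "no match\n";   # prints: grouped match
```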
Re: Better way to do this? In reply to
Thanks guys, I will try that.
Re: Better way to do this? In reply to
OK, that does not seem to work.

Here is the parse sub I have:

# Parse query.

sub ParseQuery {
    @pairs = split(/&/, $ENV{'QUERY_STRING'});
    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $name =~ tr/+/ /;
        $name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $value =~ s/<!--(.|\n)*-->//g;
        $value =~ s/<([^>]|\n)*>//g;
        $value =~ s/\|//g;
        $QUERY{$name} = $value;
        &AssignVariables2;
        &LogSearch2;
    }
}


and here is the assign variables:

sub AssignVariables2 {
    $query =~ s/\W/ /g;
    $query =~ s/( +)\b/+/g;
    $affiliate = $QUERY{'affiliate'};
    $affiliate =~ tr/ /+/;
}

If for example the user input was /cgi-bin/search.cgi?query=information_management_systems

then the query passed on should be:
information+management+systems

Anyone? Thanks
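
Two things in the code as posted might explain the result, though both are guesses from what is shown: $query is never assigned from $QUERY{'query'} in AssignVariables2 (it may be set elsewhere), and \W does not match an underscore, since _ counts as a word character. A simplified sketch follows; parse_query and plus_words are hypothetical names, and the query string is passed in as an argument instead of being read from $ENV{'QUERY_STRING'}.

```perl
use strict;
use warnings;

# parse_query(): hypothetical, simplified stand-in for ParseQuery.
sub parse_query {
    my ($query_string) = @_;
    my %query;
    for my $pair (split /&/, $query_string) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                                      # + means space
            s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;    # decode %XX
        }
        $query{$name} = $value;
    }
    return %query;
}

# plus_words(): hypothetical helper that joins the words with +.
sub plus_words {
    my ($text) = @_;
    $text =~ s/[^A-Za-z0-9]+/ /g;   # unlike \W, this also replaces "_"
    $text =~ s/^ | $//g;            # trim a leading or trailing space
    $text =~ tr/ /+/;
    return $text;
}

my %q = parse_query('query=information_management_systems&affiliate=abc');
print plus_words($q{'query'}), "\n";   # prints: information+management+systems
```

The clean-up runs once, after the whole query string has been parsed, and works on the value stored under the query key.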
Re: Better way to do this? In reply to
hi socrates..

i would do this for URLS..

Code:
$query =~ s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/eg;

it encodes everything including % $ * & ) ( [ @ EVERYTHING.. to what urls read it as..

for spaces.. that line gives %20.. add a tr/ /+/ after it if you want a plus sign..

jerry
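
A small sketch of the URL-encoding idea, assuming the goal is to keep the characters in encoded form rather than drop them. The helper name url_encode is hypothetical; spaces are left out of the percent-encoding step and turned into + afterwards.

```perl
use strict;
use warnings;

# url_encode(): hypothetical helper, not a function from the thread.
sub url_encode {
    my ($s) = @_;
    # Percent-encode everything except letters, digits, _ - . and space.
    $s =~ s/([^a-zA-Z0-9_\-. ])/sprintf("%%%02X", ord($1))/eg;
    $s =~ tr/ /+/;   # spaces become +
    return $s;
}

print url_encode('computer management & more'), "\n";
# prints: computer+management+%26+more
```

Note that this preserves the punctuation in encoded form, which is different from removing it as the original question asked.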
Re: Better way to do this? In reply to
if you want to go with mark's thing.. i suggest to do this:

Code:
$query =~ s/\W/ /g;
$query =~ s/\s+/+/g;
Re: Better way to do this? In reply to
widgetz,

Thanks for your reply, but it does not seem to work. As a matter of fact, the parse query sub I posted earlier works, except that when the command line query contains more than one word, it only sends the first word and chops the rest.

In other words, if there is ?query=computer management then it will pass only computer

I want all words sent in with all the characters removed and a (+) sign in between.

What do you say?

Thanks