Gossamer Forum

find duplicate entries

I want to search for duplicate content in multiple fields. For example, if a spammer enters the same text in firstname, lastname, address, etc., I want to display an invalid entry error or just delete the entry. I've been using the following code:
Code:
my ($i, $j, $howmany);

$i = 0;
$howmany = @db_compare_fields; # number of elements
while ($howmany > $i) {
    $j = $i + 1;
    while ($howmany > $j) {
        if ($in{$db_compare_fields[$i]} && $in{$db_compare_fields[$j]}) { # don't compare blank fields
            if ($in{$db_compare_fields[$i]} eq $in{$db_compare_fields[$j]}) {
                push (@input_err, "Invalid $db_compare_fields[$j]"); # show second field in error
            } # if match
        } # if fields not blank
        $j++;
    }
    $i++;
}

@db_compare_fields is a list of the fields I want to compare.
There must be a simpler way to do this. In addition, the error message contains duplicates: if the person enters the same text in firstname, lastname, and phone, the error says

Code:
Invalid lastname
Invalid phone
Invalid phone
I only want phone listed once. Please help.
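A simpler single-pass version is sketched below, assuming exact string comparison (no trimming or case-folding) and that %in, @db_compare_fields and @input_err are the same globals as in the loop above; the sample values are made up. It remembers the first field that held each value in a hash, so every later field with the same value is reported once and only once.

Code:
```perl
use strict;
use warnings;

# Made-up sample data standing in for the real form input and field list.
our %in = (
    firstname => 'buy pills',
    lastname  => 'buy pills',
    phone     => 'buy pills',
    email     => 'me@example.com',
);
our @db_compare_fields = qw(firstname lastname phone email);
our @input_err;

my %first_field_for;    # value => first field in which it was entered
foreach my $field (@db_compare_fields) {
    my $value = $in{$field};
    next unless defined $value && length $value;    # don't compare blank fields

    if (exists $first_field_for{$value}) {
        push @input_err, "Invalid $field";    # a later field repeats an earlier one
    }
    else {
        $first_field_for{$value} = $field;
    }
}

print "$_\n" foreach @input_err;    # Invalid lastname, then Invalid phone
```

Because each field is visited only once, phone can only be pushed once, however many earlier fields it matches.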

Re: [delicia] find duplicate entries
I would do something more like:

Code:
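sub do_spam_checks {
    my $FORM     = $_[0];    # hash reference holding the submitted form values
    my $counters = {};

    # Count each value, but only in the fields we care about (skipping blanks).
    foreach (keys %$FORM) {
        if ($_ =~ /^(col1|col2|col3)$/ && length $FORM->{$_}) {
            $counters->{ $FORM->{$_} }++;
        }
    }

    # Collect the values that appeared more than twice (i.e. in 3+ fields).
    my @bad;
    foreach (keys %$counters) {
        if ($counters->{$_} > 2) {
            push @bad, $_;
        }
    }

    return @bad;
}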

That only looks at the fields you care about and counts each of their values in $counters->{value}. The final loop then goes through the counters and returns the bad VALUES (not the field names): any value entered in more than two of those fields.
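Since the original question asks for field names rather than values, here is a hedged variant of the same counting idea. find_repeated_fields and its arguments are hypothetical names; it groups the checked fields by value and returns every field whose value occurs at least $min_repeats times. The "> 2" test above corresponds to $min_repeats = 3.

Code:
```perl
use strict;
use warnings;

# Hypothetical variant of do_spam_checks that returns FIELD names, not values.
# $form is a hash ref, $fields an array ref of field names to check.
sub find_repeated_fields {
    my ($form, $fields, $min_repeats) = @_;
    my %fields_by_value;    # value => [ fields holding that value ]

    foreach my $field (@$fields) {
        my $value = $form->{$field};
        next unless defined $value && length $value;    # ignore blank fields
        push @{ $fields_by_value{$value} }, $field;
    }

    # Keep every field whose value turned up at least $min_repeats times.
    return map  { @$_ }
           grep { @$_ >= $min_repeats }
           values %fields_by_value;
}

my %form = (
    Firstname => 'cheap loans',
    Lastname  => 'cheap loans',
    Phone     => 'cheap loans',
    Email     => 'someone@example.com',
);
my @bad = find_repeated_fields(\%form, [qw(Firstname Lastname Phone Email)], 2);
print join(', ', sort @bad), "\n";    # Firstname, Lastname, Phone
```

If you only want to flag the later fields (as in the original loop), skip the first element of each group before returning it.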

Personally, though, I would be more inclined to set up something like Google reCAPTCHA if you are worried about spammers, because checks like the one above are tricky to get right. For example, this is a port of something I did a while back for someone else's contact form:

Code:
sub do_url_checks {
    my $FORM     = $_[0];    # hash reference holding the submitted form values
    my $counters = {};

    foreach my $field (keys %$FORM) {
        # Count every http(s) URL found in this field.
        while ($FORM->{$field} =~ m/"?(https?:\/\/.+?)([\s"]+|$)/ig) {
            $counters->{$1}++;
        }

        # Count anything ending in one of these spam-heavy domain endings.
        while ($FORM->{$field} =~ m/(.+?\.(online|news|porn|work|date|wang|men|click|loan|top|site|fit|live|life|rest|club|country|stream|download|xin|gdn|racing|jetzt|win|bid|vip|ninja|ren|rocks|faith|kim|ly|mom|party|review|science|space|trade|accountants|xyz|webcam))/ig) {
            $counters->{$1}++;
        }
    }

    # use Data::Dumper;
    # print Dumper($counters, $FORM);

    # Reject the submission if any URL/domain turns up more than 3 times.
    foreach (keys %$counters) {
        return 0 if $counters->{$_} > 3;
    }

    return 1;    # all good
}

It checks for URLs being spammed into the contact form and rejects the submission if it finds lots of repeats, as well as bad words (the list of domain endings). Spammers are very ingenious, though, and will enter variants of the same thing, such as HTTP://www.foo.com and https://foo.com. If it were a phone number, they might enter +44(0) 1234 1234567 in one input and +44 0234 1234567 in another.
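One way to make repeats like HTTP://www.foo.com and https://foo.com count as the same URL is to normalise each match before using it as a $counters key. The normalize_url helper below is a hypothetical sketch: it only lower-cases the whole string and strips the scheme, a leading "www." and trailing slashes, so it is a deliberately coarse key and determined spammers will still find other variants.

Code:
```perl
use strict;
use warnings;

# Hypothetical helper: reduce a URL to a coarse canonical key so trivial
# variants (scheme, letter case, leading "www.", trailing slash) collapse.
sub normalize_url {
    my ($url) = @_;
    my $key = lc $url;
    $key =~ s{^https?://}{};    # drop the scheme
    $key =~ s{^www\.}{};        # drop a leading "www."
    $key =~ s{/+$}{};           # drop trailing slashes
    return $key;
}

my %counters;
$counters{ normalize_url($_) }++
    for ('HTTP://www.foo.com', 'https://foo.com', 'http://foo.com/');
print "$_ => $counters{$_}\n" for sort keys %counters;    # foo.com => 3
```

Inside do_url_checks this would mean writing $counters->{ normalize_url($1) }++ instead of $counters->{$1}++.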

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] find duplicate entries
I already have a good "dirty word" checker, so I'm more interested in the first one. What do these two lines do?

Code:
my $FORM = $_[0];

...


if ($_ =~ /^(col1|col2|col3)$/) {
Re: [delicia] find duplicate entries
Hi,

Code:
my $FORM = $_[0];

This takes the form values passed in, as a hash reference. So in your case, the call would probably be something like:

Code:
my @test = do_spam_checks(\%in); # pass a reference to your %in hash
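To show why the argument has to be a reference: my $FORM = $_[0]; takes only the first argument, so the sub must receive one hash reference rather than a flattened hash. The count_fields sub below is a hypothetical stand-in that reads its argument the same way; if your values are in %in, pass \%in, and if you already hold a hash reference such as $in, pass it as it is.

Code:
```perl
use strict;
use warnings;

# Hypothetical stand-in that reads its argument the same way as do_spam_checks.
sub count_fields {
    my $FORM = $_[0];    # expected: a reference to the form hash
    return scalar keys %$FORM;
}

my %in = (Firstname => 'a', Lastname => 'b');

print count_fields(\%in), "\n";    # prints 2: \%in passes the whole hash as one reference

# count_fields(%in) would flatten the hash into a key/value list, so $_[0]
# would be a single key string and "%$FORM" dies under "use strict".
my $ok = eval { count_fields(%in); 1 };
print $ok ? "no error\n" : "error: $@";
```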


These are the fields you want to check for duplication in:

Code:
if ($_ =~ /^(col1|col2|col3)$/) {

So if you have fields "phone", "address1" and "address2", you could put:

Code:
if ($_ =~ /^(address1|address2|phone)$/) {

All it is doing is working out which fields to work on (and skipping the rest).

Cheers

Andy (mod)
Re: [Andy] find duplicate entries
So if I have a variable such as:
Code:
@db_compare_fields = qw(Firstname Lastname Phone Email);
would I change
Code:
if ($_ =~ /^(col1|col2|col3)$/) {
to
Code:
if ($_ =~ @db_compare_fields) {
I may have different field names for different tables or installs.
Re: [delicia] find duplicate entries

Code:
if ($_ =~ /^(Firstname|Lastname|Phone|Email)$/) {

Or, better if you change that array often:

Code:
my $r = join ("|", @db_compare_fields);
if ($_ =~ /^($r)$/) {
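A slightly more defensive version of the same idea, sketched under the assumption that field names are plain strings: quotemeta escapes any regex metacharacters in the names, qr// compiles the pattern once instead of on every pass through the loop, and a lookup hash (%wanted, a hypothetical name) avoids the regex entirely. Either test can replace the if line inside the loop.

Code:
```perl
use strict;
use warnings;

# Field names you want to compare (plain strings, so quote them with qw()).
my @db_compare_fields = qw(Firstname Lastname Phone Email);

# Option 1: an anchored regex; quotemeta escapes any regex metacharacters.
my $r        = join '|', map { quotemeta } @db_compare_fields;
my $field_re = qr/^(?:$r)$/;

# Option 2: a lookup hash, which avoids regexes altogether.
my %wanted = map { $_ => 1 } @db_compare_fields;

foreach my $name (qw(Firstname Comments Email)) {
    my $by_regex = $name =~ $field_re ? 'yes' : 'no';
    my $by_hash  = $wanted{$name}     ? 'yes' : 'no';
    print "$name: regex=$by_regex hash=$by_hash\n";
}
```

The hash version is usually the simpler choice when the field list changes per table or install, since nothing needs escaping.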

Andy (mod)
Re: [Andy] find duplicate entries
Thanks, I will work on this later!