Gossamer Forum

Darn regex :(

Hi,

I don't suppose anyone has any ideas why this regex isn't working?

Sample HTML:

<span
lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>raYIbZlPS
Visit coursework ge in ge fo ge for ge more paper ge Do ge not ge redistribute
raYIbZlPS</span>

Code:
$desc =~ s|\Q<span\E[\n\s+]\Qlang=EN-GB\E[\n\s+]\Qstyle='font-size:\E[\n\s+]\Q7.5pt;font-family:Verdana;\E[\n\s+]\Qcolor:#CCFFFF'>\E([\W\w\d\s\n]+?)\Q</span>\E||ig;

It's really annoying me. Basically, the format can look like any of these:

Quote:
<span
lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>cocf cfr
secfcfw orcf cfk incf focf cf!</span>


<span
lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>cocd cdr
secdcdw </span>

<span lang=EN-GB style='font-size:7.5pt;font-family:
Verdana;color:#FFCCFF'>VgODf6j4 from VgODf6j4 coursewrok VgODf6j4 work VgODf6j4
info VgODf6j4 </span>


<span lang=EN-GB
style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>11u2737Yu Visit
coursework gd in gd fo gd for gd more paper gd Do gd not gd redistribute
11u2737Yu</span>

<span
lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>cogb gbr
segbgbw orgb gbk ingb fogb gb;</span>

Note how the newlines fall in different places in each of them. I have a feeling this is what's causing the problem :(
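
A minimal sketch of one likely culprit, offered as a guess rather than a confirmed diagnosis: inside square brackets `+` is a literal character, so `[\n\s+]` matches exactly one character (a whitespace character or a plus sign) and that character is required. The pattern above also demands such a character between `font-size:` and `7.5pt`, where the sample has none. The `hit` helper is hypothetical and only exists for this demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text matches $re, otherwise 0.
sub hit {
    my ($text, $re) = @_;
    return $text =~ $re ? 1 : 0;
}

# The bracketed class matches exactly ONE character.
my $one_ws = hit("<span\nlang",  qr/<span[\n\s+]lang/);    # one newline
my $two_ws = hit("<span \nlang", qr/<span[\n\s+]lang/);    # space + newline
my $plus   = hit("<span+lang",   qr/<span[\n\s+]lang/);    # literal plus sign

# The class is mandatory, so adjacent text with no whitespace fails.
my $no_ws   = hit("font-size:7.5pt", qr/font-size:[\n\s+]7\.5pt/);
my $relaxed = hit("font-size:7.5pt", qr/font-size:\s*7\.5pt/);

print "one_ws=$one_ws two_ws=$two_ws plus=$plus no_ws=$no_ws relaxed=$relaxed\n";
# prints: one_ws=1 two_ws=0 plus=1 no_ws=0 relaxed=1
```

Replacing each `[\n\s+]` with `\s*` (or `\s+` where whitespace really is required) would be the smallest change; `\s` already covers `\n`.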

The idea behind this code is to remove all tags that look like:

Code:
<span
lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>CONTENT</span>

..and;

Code:
<span
lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#FFCCFF'>CONTENT</span>


Anyone got any suggestions? :)

TIA

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Darn regex :( In reply to
What do you want the end result to look like?

- wil
Re: [Wil] Darn regex :( In reply to
Hi,

Thanks for the reply :)

This content is just junk, and it needs removing totally. I tried it with;

Code:
$desc =~ s|<span([\W\w\d\s\n]+?)>(.*?)</span>||sig;

.. but that also filters out the valid ones :/

The main problem seems to be the newlines. The regex could easily be;

Code:
\Q<span lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>\E(.*?)\Q</span>\E

.. but considering some of the lines appear as (for example);

Code:
<span
lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>

...or;

Code:
<span lang=
EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>

.. and loads of other weird combinations :(
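
One way to cope with line breaks landing anywhere in the tag is to allow optional whitespace between every pair of characters of the known opening tag. The sketch below is only that, a sketch: `loose_literal` and `strip_junk_spans` are hypothetical names, and it assumes the junk spans always use one of the two colours shown in this thread and never contain nested spans.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a literal string into a pattern that allows
# optional whitespace (including newlines) between any two characters.
sub loose_literal {
    my ($literal) = @_;
    return join '\s*', map { quotemeta } grep { /\S/ } split //, $literal;
}

# Opening tag up to the colour value, then either of the two colours
# seen in the samples (assumption: no other colours mark junk spans).
my $open = loose_literal("<span lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#")
         . '\s*(?:CCFFFF|FFCCFF)\s*' . quotemeta("'") . '\s*>';

# /s lets "." cross newlines; the lazy ".*?" stops at the first </span>.
my $junk_span = qr{$open.*?</span>}is;

sub strip_junk_spans {
    my ($html) = @_;
    $html =~ s/$junk_span//g;
    return $html;
}

my $desc = "<span\nlang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>cocf cfr\n"
         . "secfcfw</span><span class=\"ok\">keep me</span>";
print strip_junk_spans($desc), "\n";    # prints: <span class="ok">keep me</span>
```

The pattern is deliberately looser than it needs to be (it would also accept whitespace inside `span` itself), which seems an acceptable trade for not having to guess where the next line break will fall.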

TIA for any suggestions :)

Cheers

Andy (mod)
Re: [Andy] Darn regex :( In reply to
Yes, but what do you want the end result to look like after it's been through your regex? What do you want to get rid of, and what do you want to keep?

- wil
Re: [Wil] Darn regex :( In reply to
Hi,

I literally just want to get rid of these span tags (<span lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>...</span>), as they just hold junk text, e.g.:

<span lang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>sdf sfsdfs djos dgjopsd gopsjdg sg</span>

Hope that explains better what I'm trying to do :)

Cheers

Andy (mod)
Re: [Andy] Darn regex :( In reply to
Maybe using an HTML parser module would be a better idea, if you want the job done safely...
Just my 2 cents.
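
A rough sketch of what the parser route might look like, assuming the HTML::TokeParser module (part of the HTML-Parser distribution on CPAN, not core Perl) is installed. `remove_junk_spans` and `%junk_style` are hypothetical names, and the rule for what counts as junk is an assumption drawn from the samples in this thread.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # CPAN module, assumed to be installed

# Assumption: a span is junk when lang is EN-GB and its style, with all
# whitespace removed, is one of the two values seen in the samples.
my %junk_style = map { $_ => 1 } (
    "font-size:7.5pt;font-family:Verdana;color:#CCFFFF",
    "font-size:7.5pt;font-family:Verdana;color:#FFCCFF",
);

sub remove_junk_spans {
    my ($html) = @_;
    my $p     = HTML::TokeParser->new(\$html) or die "cannot parse document";
    my $out   = '';
    my $depth = 0;    # greater than 0 while inside a junk span

    while (my $t = $p->get_token) {
        my $type = $t->[0];
        if ($type eq 'S' && $t->[1] eq 'span') {
            my $lang = $t->[2]{lang} // '';
            (my $style = $t->[2]{style} // '') =~ s/\s+//g;
            if ($depth || ($lang eq 'EN-GB' && $junk_style{$style})) {
                $depth++;    # entering a junk span, or a span nested in one
                next;
            }
        }
        elsif ($type eq 'E' && $t->[1] eq 'span' && $depth) {
            $depth--;
            next;
        }
        next if $depth;      # drop everything inside a junk span
        # Text tokens keep their text in slot 1; other tokens keep the
        # original source text in their last slot.
        $out .= $type eq 'T' ? $t->[1] : $t->[-1];
    }
    return $out;
}

my $desc = "<p>a</p><span\nlang=EN-GB style='font-size:7.5pt;font-family:\n"
         . "Verdana;color:#FFCCFF'>VgODf6j4 info</span><span class=\"ok\">keep</span>b";
print remove_junk_spans($desc), "\n";
```

Because the parser hands back attributes already split out, line breaks between attributes stop mattering; only whitespace inside the style value has to be squeezed out by hand.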

Best regards,
Webmaster33


Paid Support
from Webmaster33. Expert in Perl programming & Gossamer Threads applications. (click here for prices)
Webmaster33's products (upd.2004.09.26) | Private message | Contact me | Was my post helpful? Donate my help...
Re: [Andy] Darn regex :( In reply to
Code:
$html =~ s|</?span[^>]*>||sig;

Will that do what you want?
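
For reference, a small sketch of what that substitution does to a mixed input. The `class='keep'` span is a made-up stand-in for a "valid" span; the second span is one of the junk samples from earlier in the thread.

```perl
use strict;
use warnings;

my $html = join '',
    "<span class='keep'>real text</span> ",
    "<span\nlang=EN-GB style='font-size:7.5pt;font-family:Verdana;color:#CCFFFF'>cocf cfr</span>";

# Same substitution as above, applied to a copy of the input.
(my $stripped = $html) =~ s|</?span[^>]*>||sig;

print "$stripped\n";    # prints: real text cocf cfr
```

So it copes with the newlines, since `[^>]` matches them, but it removes every span tag, valid or not, and leaves the junk text itself in place.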
Re: [Andy] Darn regex :( In reply to
I'm not a pro like these guys, but here's a simple way:
($content) = ( $lines =~ />(.*?)</s );
or
($content) = ( $lines =~ />(.*?)<\/td>/s );

You can also do something like:
($content) = ( $lines =~ />(.*?)<\/span>/s );

(The /s flag lets . match newlines, since the content can run over several lines.)