Hi,
I've parsed my first batch of DMOZ links with Parse_RDF.pl, and it worked great. I'd like to add other DMOZ links/categories to my existing links/categories. However, for some reason the parser completely skips over the entries I've specified and adds nothing to the database.
Any idea what the problem is? I've posted the config portion of the script below.
Thanks! :)
Katina
--
# 1. Set this to the location of the content.rdf file (note you can leave the file gzipped).
my $CONTENT = './content.rdf.gz';
# 2. You can leave the file gzipped if you are short on disk space; just tell
# the program where your gzip program is. The -c decompresses to stdout and is
# required. The -d says to decompress (you can use gunzip as well).
my $GZIP = '/usr/bin/gzip -cd';
# 3. Set what subset of the Open Directory you want to parse.
my $SUBSET = 'Top/Computers/Internet/Commercial Services/Access Providers/By Region/North America/United States/Christian';
# 4. You can insert the categories into an existing subcategory, or if you leave this
# blank, links will be added to the existing category.
my $PREFIX = 'Computers & Internet/Internet Service Providers/';
# 5. Append? If set to 1, the script will add all links and categories; if set to 0, the
# script will only add links/categories that don't already exist (slows down the parsing).
my $APPEND = 1;
# 6. Defaults to use for Add_Date and Contact Name/Contact Email. Note: Don't set add
# date to today, otherwise you will end up with WAY TOO MANY new links.
my $ADD_DATE = '1999-12-01';
my $CONTACT_N = 'DMOZ';
my $CONTACT_E = '';
# 7. Max lines per category. Shouldn't need to touch this.
my $max_limit = 5000;
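One quick way to check whether $SUBSET matches anything at all is to count the matching categories in the dump directly. The script below is a hypothetical diagnostic, not part of Parse_RDF.pl: `count_subset_matches` is an illustrative helper, and it assumes each category in content.rdf opens with a line of the form `<Topic r:id="Top/...">`. It reuses the $CONTENT, $GZIP and $SUBSET values from the config above.

```perl
#!/usr/bin/perl
# Hypothetical diagnostic (not part of Parse_RDF.pl): count how many <Topic>
# entries in the RDF dump sit at or below the configured $SUBSET.
use strict;
use warnings;

# Returns (topics seen, topics at or below $subset) read from $fh.
# Assumption: each category line looks like <Topic r:id="Top/...">.
sub count_subset_matches {
    my ($fh, $subset) = @_;
    my ($topics, $matches) = (0, 0);
    while (my $line = <$fh>) {
        next unless $line =~ /<Topic r:id="([^"]*)"/;
        $topics++;
        my $id = $1;
        # Count the subset itself and anything beneath it, but not siblings
        # that merely share a prefix (e.g. ".../Christian Schools").
        $matches++ if $id eq $subset || index($id, "$subset/") == 0;
    }
    return ($topics, $matches);
}

# Same values as in the Parse_RDF.pl config above.
my $CONTENT = './content.rdf.gz';
my $GZIP    = '/usr/bin/gzip -cd';
my $SUBSET  = 'Top/Computers/Internet/Commercial Services/Access Providers/By Region/North America/United States/Christian';

# Only scan when the dump is actually present.
if (-e $CONTENT) {
    # Read the decompressed file through a pipe, as the $GZIP setting implies.
    open(my $fh, '-|', "$GZIP $CONTENT") or die "Can't run $GZIP $CONTENT: $!";
    my ($topics, $matches) = count_subset_matches($fh, $SUBSET);
    close($fh);
    print "Scanned $topics topics; $matches at or below '$SUBSET'.\n";
}
```

If this reports 0 matches, the $SUBSET string does not match the r:id values in the file character for character (spaces versus underscores, capitalization, and so on), so comparing it against a real `<Topic r:id=...>` line from the dump would be the next thing to check.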