Gossamer Forum

Convert existing database to DBman format

I have a database of about 1000 files with MD5-style filenames (e.g. 98fj91348np97h97nd2983.sgm), which are viewed through a script that attaches a page top and bottom to each one. Each file holds its data inside generic tags such as <category></category>. A sample file looks something like this:

<model>97 Judy Sl</model>
<name>Adam Borowy</name>
<price>$180</price>
<from>Minnetonka Mn</from>
<comments></comments>
<brand>rockshox</brand>
<email></email>
<date>3/7/98</date>
<category>forx</category>

I am starting to install the DBman script. Rather than manually entering some 1000 entries, is there a script that can convert my existing files into the DBman format?

Any help would be greatly appreciated! If anything about my question is unclear, just let me know and I'll clarify.

Thanks.

------------------
Jason
Extreme mtb
http://extreme.nas.net
Re: Convert existing database to DBman format
Is it one file per record? You could create a perl script to do the job. Something like:

#!/usr/local/bin/perl

# Get the list of files to work on.
open (DIR, "/path/to/sgm/files") or die $!;
@files = grep $_ !~ /^\.\.?$/, readdir (DIR);
close DIR;

# Open the output file.
open (DB, ">output.db") or die $!;

# Go through each file.
foreach (@files) {
    open (FILE, "/path/to/sgm/files/$_") or die $!;
    # Store in @rec one record.
    while (<FILE>) {
        chomp;
        s/<.+?>//g; # Remove <> tags.
        push (@rec, $_);
    }
    close FILE;

    # Print the record out to the database. We add an id counter as the first field.
    print OUT join ("|", $id++, @rec);
    print OUT "\n";
    @rec = ();
}
close OUT;

Hope that gives you a start (it is untested, but it does compile). =)

Cheers,

Alex
Re: Convert existing database to DBman format
Yes, it's one file per record. I have a database full of files like 98jfp0q9384mf0q3848j0q93f4.sgm, all of which have different filenames.

I changed the path to point to a test directory with a couple of .sgm files in it. The script creates output.db, but the file has no contents.

Any ideas?

Thanks a million for your help!!


------------------
Jason
Extreme mtb
http://extreme.nas.net
Re: Convert existing database to DBman format
Try a little debugging to see what's going on. Replace the foreach loop with:

# Go through each file.
foreach (@files) {
    print "Working on file: $file ... ";
    open (FILE, "/path/to/sgm/files/$_") or die $!;
    # Store in @rec one record.
    while (<FILE>) {
        chomp;
        s/<.+?>//g; # Remove <> tags.
        push (@rec, $_);
    }
    close FILE;

    # Print the record out to the database. We add an id counter as the first field.
    print OUT join ("|", $id++, @rec);
    print OUT "\n";
    print "Added record size: $#rec\n\t@rec\n";
    @rec = ();
    print "\n";
}

You should be able to see which files it is working on, and then check that each file produces a record of the appropriate size.
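A note on reading that debug output: `$#rec` is the index of the last element, so it is one less than the number of fields, and the loop variable in the snippet above is `$_` (nothing assigns `$file`). A minimal sketch of the same debug print with a named variable, using made-up in-memory lines in place of a real .sgm file; the `strip_tags` helper and the `sample.sgm` name are hypothetical:

```perl
use strict;
use warnings;

# Strip the tags from each line of one record and return the field values.
sub strip_tags {
    my @lines = @_;            # work on a copy of the input lines
    my @rec;
    foreach my $line (@lines) {
        chomp $line;
        $line =~ s/<[^>]+>//g; # remove <> tags
        push @rec, $line;
    }
    return @rec;
}

# Hypothetical sample standing in for the contents of one .sgm file.
my $file  = 'sample.sgm';
my @lines = ("<model>97 Judy Sl</model>\n",
             "<price>\$180</price>\n",
             "<email></email>\n");

my @rec = strip_tags(@lines);
print "Working on file: $file ... ";
print "last index: $#rec, field count: ", scalar(@rec), "\n";
```

With the three sample lines this reports a last index of 2 and a field count of 3, which is why a "record size" printed with `$#rec` looks one short.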

Hope that helps,

Alex
Re: Convert existing database to DBman format
I added the code that you gave and ran it from the UNIX command line as ./convert.cgi (that's just the name that I gave the file). It paused for a second and went back to the prompt without printing "Working on file" or "Added record size"; it just made an output.db file containing no data.

Again, thanks for your help and an amazing script!


------------------
Jason
Extreme mtb
http://extreme.nas.net
Re: Convert existing database to DBman format
Did you catch my mistake?

Quote:
open (DIR, "/path/to/sgm/files") or die $!;
@files = grep $_ !~ /^\.\.?$/, readdir (DIR);
close DIR;

That should be opendir, not open.
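For reference, a small self-contained sketch of the opendir / readdir / closedir pattern. On many Unix systems a plain open on a directory succeeds, so the `or die` never fires, and readdir on a handle that was not opened with opendir returns nothing; that would be consistent with the empty output.db reported earlier in the thread. The `list_files` helper, the temporary directory, and the two file names are hypothetical, there only so the example runs anywhere:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the names in a directory, skipping the "." and ".." entries.
sub list_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot opendir $dir: $!";
    my @files = grep { $_ !~ /^\.\.?$/ } readdir($dh);
    closedir($dh);             # closedir, not close, for a directory handle
    my @sorted = sort @files;
    return @sorted;
}

# Build a throwaway directory holding two empty, made-up record files.
my $dir = tempdir(CLEANUP => 1);
foreach my $name ('a1.sgm', 'b2.sgm') {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

print join(", ", list_files($dir)), "\n";
```

Note that a directory handle is paired with closedir rather than close.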

Cheers,

Alex
Re: Convert existing database to DBman format
So the top part now reads:

opendir (DIR, "/srvs/vcgi/CG0019/db/auth") or die $!;
@files = grep $_ !~ /^\.\.?$/, readdir (DIR);
close DIR;

I ran the script and it appears to be working but the output.db file's size is 0 and some of the words are cut off. The following was displayed when I ran it:

"Working on file: ... Added record size: 9
money for the weight it saved and the ride. Columbia, Sc Kick Arse!! Answer mcs"

There should be about 30 more words before the word "money", and about 30 more are cut off after "mcs". The script says "Working on file: $file ...", but it looks like the file name isn't displaying either. One more thing: how would the "|" be inserted between each field?

Thanks again for all your help!!

------------------
Jason
Extreme mtb
http://extreme.nas.net
Re: Convert existing database to DBman format
Man, I should have tested it (but I didn't have the input files, that's my excuse). =)

Notice:

open (DB, ">output.db") or die $!;
...
print OUT join ("|", $id++, @rec);
print OUT "\n";

That should be print DB, not print OUT.
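This kind of slip is easy to miss because printing to a handle that was never opened does not stop the script; with warnings enabled, Perl does complain about it. A minimal sketch, assuming only core Perl, that prints to the unopened OUT handle and collects whatever warnings result (the sample field values are taken from the record earlier in the thread; the `@warnings` array is just for illustration):

```perl
use strict;
use warnings;

# Collect warnings in an array so they can be inspected afterwards.
my @warnings;
BEGIN { $SIG{__WARN__} = sub { push @warnings, $_[0] } }

# OUT is never opened anywhere in this script, so this print writes nothing.
# join is what puts the "|" between the id counter and each field.
my $ok = print OUT join("|", 0, "97 Judy Sl", "rockshox"), "\n";

print "print returned ", ($ok ? "true" : "false"), "\n";
print "warning: $_" foreach @warnings;
```

Running the converter with `perl -w` (or adding `use warnings;`) should surface the same complaint about the unopened filehandle on the terminal.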

Give that a whirl..

Alex
Re: Convert existing database to DBman format
I am still getting a file size of 0 on output.db.

I put a few .sgm files into a public directory so you can have a look at them for testing. You can see them at extreme.nas.net/alex. I added .txt to the extension so your browser can read 'em.

------------------
Jason
Extreme mtb
http://extreme.nas.net
Re: Convert existing database to DBman format
All right, here's a tested working version 3.0. =)

#!/usr/local/bin/perl -w

use strict;
my $dir = 'test';
my $out = 'output.db';

# Read the directory listing, skipping the "." and ".." entries.
opendir (DIR, $dir) or die $!;
my @files = grep $_ !~ /^\.\.?$/, readdir(DIR);
closedir (DIR);

open (DB, ">$out") or die $!;
my $id = 0;
foreach (@files) {
    my @rec = ();
    open (FILE, "$dir/$_") or die $!;
    # Strip the tags from each line and keep what is left as one field.
    while (<FILE>) {
        chomp;
        s/<[^>]+>//g;
        push (@rec, $_);
    }
    close FILE;
    # Write the record: id counter first, then the fields, "|"-delimited.
    print DB join ("|", $id++, @rec), "\n";
}
close DB;
# ----------------------------------------

Let me know if you have any other problems.
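One thing to watch with the line-by-line approach: it relies on every file listing the same tags in the same order, one per line, and on no value containing a "|". A hedged sketch of a tag-aware variant that places each value by tag name instead. The field order is copied from the sample file at the top of the thread; `clean_field`, `make_record`, and the `~~` placeholder for a literal pipe are assumptions for illustration, so check how your own DBMan setup expects the delimiter to be encoded:

```perl
use strict;
use warnings;

# Output field order (from the sample file in this thread; adjust to match
# the fields defined in your own DBMan configuration).
my @fields = qw(model name price from comments brand email date category);

# Hypothetical helper: make a value safe for a one-line, "|"-delimited record.
sub clean_field {
    my ($value) = @_;
    $value = '' unless defined $value;   # missing tag becomes an empty field
    $value =~ s/\s+/ /g;                 # collapse newlines and extra spaces
    $value =~ s/^ | $//g;                # trim leading/trailing space
    $value =~ s/\|/~~/g;                 # assumption: placeholder for a pipe
    return $value;
}

# Turn the text of one .sgm file into a single record line.
sub make_record {
    my ($id, $text) = @_;
    my %data;
    # Capture <tag>value</tag> pairs; /s lets a value span several lines.
    while ($text =~ m{<(\w+)>(.*?)</\1>}gs) {
        $data{lc $1} = $2;
    }
    return join("|", $id, map { clean_field($data{$_}) } @fields);
}

# Made-up sample text, including a multi-line value with a literal pipe.
my $sample = "<model>97 Judy Sl</model>\n<name>Adam Borowy</name>\n"
           . "<price>\$180</price>\n"
           . "<comments>Light | stiff,\ngreat ride</comments>\n"
           . "<category>forx</category>\n";

print make_record(1, $sample), "\n";
```

For the sample text this prints `1|97 Judy Sl|Adam Borowy|$180||Light ~~ stiff, great ride||||forx`: tags absent from the file come out as empty fields, so every record keeps the same number of columns.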

Alex
Re: Convert existing database to DBman format
Alex, you are THE MAN!!! It works like a charm. I thought that I was going to have to do all of the data input by hand which would take forever for 1000 records. You saved me sooooooooooooooooo much time. All that I have to do is go into the db and plug in the extra fields that are missing. I can't thank you enough for all of your help and for making this amazing program as well as Links which I am also using.

Thanks again!!


------------------
Jason
Extreme mtb
http://extreme.nas.net