As for the program I use, it's a hack (and a bad one!). It's pretty stupid: it just brute-force opens the file, looks for certain strings, and writes out what it finds to a new file. I have to run it multiple times, when really I should be able to run it once and generate all the cuts, but I've never had the time to fix that.
I do have cuts available, where I've taken the top-level sections and put them into separate files, each with the proper headers. I don't know how important the headers are, but it works. I try to generate the cuts once a month, so my new sites get a fresher copy of the DMOZ data.
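A one-pass version of that cut generator might look something like this. It's a rough sketch, not the program I actually use, and it assumes the dump's records open with `<Topic ` or `<ExternalPage ` and close with `</Topic>` or `</ExternalPage>` at the end of a line, and that the category shows up as `r:id="Top/..."` or `<topic>Top/...</topic>`. The names `split_into_cuts` and `open_cut` are made up for illustration. It holds only one record in memory at a time, copies the shared header into each cut, and writes every top-level cut in a single run.

```python
import re

# Assumed record markers for the DMOZ RDF dump (check against your copy).
RECORD_OPEN = ("<Topic ", "<ExternalPage ")
RECORD_CLOSE = ("</Topic>", "</ExternalPage>")
# Top-level category from r:id="Top/X..." or <topic>Top/X...</topic>
CATEGORY = re.compile(r'(?:r:id="|<topic>)Top/([^/"<]+)')


def split_into_cuts(lines, open_cut):
    """Write every top-level cut in one pass over the dump.

    open_cut(category) is a caller-supplied (hypothetical) function that
    returns a writable text handle. Only one record is held in memory.
    Returns the handles so the caller can close them.
    """
    header, trailer = [], []
    handles = {}
    record = None
    seen_record = False
    for line in lines:
        stripped = line.strip()
        if record is None:
            if not stripped.startswith(RECORD_OPEN):
                # Before the first record: header shared by every cut.
                # After it: possible trailer such as </RDF>.
                (trailer if seen_record else header).append(line)
                continue
            record = []
            seen_record = True
            trailer.clear()  # lines between records (usually blank) are dropped
        record.append(line)
        if stripped.endswith(RECORD_CLOSE):
            text = "".join(record)
            match = CATEGORY.search(text)
            category = match.group(1) if match else "_unknown"
            if category not in handles:
                # New cut: open its file and give it the proper header first.
                handles[category] = open_cut(category)
                handles[category].writelines(header)
            handles[category].write(text)
            record = None
    for handle in handles.values():
        handle.writelines(trailer)  # close each document, e.g. </RDF>
    return handles
```

To run it on a real dump you'd pass the open dump file plus a function that opens one output file per category (something like `lambda cat: open(f"cuts/{cat}.rdf", "w", encoding="utf-8")`, with the `cuts` directory already created), then close the returned handles.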
Part of the problem for most people is simply the size of the files. They don't have enough disk space or RAM to deal with them. The smaller "cuts" work better, and they work better for general use as well.
As for vi or any of the other editors: if you can open the file in "read-only" mode, so the editor has lower overhead, you might be able to mark the part you want and save it to another file. (I know joe and Emacs can do that; I've never gotten the hang of vi.)
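If the editor chokes on the file, a few lines of script can do the same "mark and save" job without ever loading the whole thing. This is a minimal sketch; the name `copy_region` and the marker strings are my own for illustration, and it assumes the section you want starts at one recognizable line and ends just before another (for example, the next top-level `<Topic r:id=...>` line).

```python
def copy_region(lines, start_marker, stop_marker, out):
    """Copy from the first line containing start_marker up to, but not
    including, the next line containing stop_marker. Streams line by line,
    so the whole file is never held in memory. Returns lines copied."""
    copying = False
    count = 0
    for line in lines:
        if copying and stop_marker in line:
            break  # reached the end of the "marked" block
        if not copying and start_marker in line:
            copying = True  # start of the "marked" block
        if copying:
            out.write(line)
            count += 1
    return count
```

For example, opening `content.rdf.u8` for reading and `arts.rdf` for writing, then calling `copy_region(src, 'r:id="Top/Arts"', 'r:id="Top/Business"', dst)`, would only give you the whole Arts section if Business really does follow Arts in your copy of the dump, so check the ordering first. Like a block saved from an editor, the result has no RDF header of its own.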
It would really be a good job for a summer (or fall) intern to take the import program, merge it with the parse routines, and make it able to pre-parse the dump, take out a cut of DMOZ, then import that cut. It would be nice to do it all at once, but doing it in several passes (like old-style compilers) uses fewer resources, and it allows setting up categories, import locations, and such, which would require loads of RAM and linked lists to do in a single pass.
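Roughly, the multi-pass idea could be sketched like this. Pass 1 scans the dump and keeps only the category paths inside the cut (small enough to hold in memory, and sorted so parents get set up before children); pass 2 streams the dump again and yields only the links that land in those categories. The regex patterns, the function names, and the `Top/Arts` prefix are assumptions for illustration, and the final import pass is left out because I'm not going to guess at the import program's interface; each `(url, category)` pair would be handed to it.

```python
import re

# Hypothetical patterns for the DMOZ content dump (an assumption, not a spec).
TOPIC_ID = re.compile(r'<Topic r:id="([^"]+)"')
PAGE_URL = re.compile(r'<ExternalPage about="([^"]*)"')
PAGE_TOPIC = re.compile(r"<topic>([^<]+)</topic>")


def in_cut(path, prefix):
    """True if path is the cut's root category or anything below it."""
    return path == prefix or path.startswith(prefix + "/")


def pass1_categories(lines, prefix):
    """Pass 1: remember only the category paths inside the cut."""
    found = set()
    for line in lines:
        match = TOPIC_ID.search(line)
        if match and in_cut(match.group(1), prefix):
            found.add(match.group(1))
    # Sorting puts every parent before its children, so categories
    # can be created top-down before any links are imported.
    return sorted(found)


def pass2_links(lines, categories):
    """Pass 2: stream the dump again, yielding (url, category) in the cut."""
    wanted = set(categories)
    url = None
    for line in lines:
        match = PAGE_URL.search(line)
        if match:
            url = match.group(1)  # start of an ExternalPage record
            continue
        match = PAGE_TOPIC.search(line)
        if match and url is not None:
            if match.group(1) in wanted:
                yield url, match.group(1)
            url = None  # this page's category has been handled
```

Each pass just reopens the dump, so memory use stays at about the size of the category list, which is the trade-off the multi-pass approach is after.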
Anyway....
PUGDOG® Enterprises, Inc.
FAQ: http://LinkSQL.com/FAQ
Plugins: http://LinkSQL.com/plugin