Search engine similar to XAV Indexed Search?

Hi!

Does anyone know of a hacked version of XAV Indexed Search's search.pl? One that works faster with larger databases and possibly has better navigation?

I've used XAV's build.pl and search.pl for 2 months and am very happy with the results. Its features are solid... it has a nice weighting feature (by title, date, meta, etc.). But there are a few reasons why I need to look for a new one.

Quote:
Depending on the power of your server, Xavatoria will begin to slow down excessively when scanning more than five or ten megabytes of text. If you need to scan a larger set, you may wish to try some of the more powerful engines listed at www.cgi-resources.com.

-from XAV website

But over at CGI-Resources, I couldn't find more powerful search engines than XAV. Or at least, they weren't obvious to me.

Also, the script uses a "click here for the next 20" type of navigation. I wonder if it's technically possible to show:
1-10 | 11-20 | 21-30 hits instead.
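
A range-style bar like this is mostly arithmetic on the total hit count. Here is a minimal Perl sketch, assuming a hypothetical helper called page_ranges (it is not part of XAV's search.pl) that only needs the total number of hits and the page size:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not taken from search.pl: returns the labels
# "1-10", "11-20", ... for $total hits shown $per_page at a time.
sub page_ranges {
    my ($total, $per_page) = @_;
    return () if !$per_page || $per_page < 1;   # avoid an endless loop
    my @labels;
    for (my $first = 1; $first <= $total; $first += $per_page) {
        my $last = $first + $per_page - 1;
        $last = $total if $last > $total;       # the final page may be short
        push @labels, "$first-$last";
    }
    return @labels;
}

print join(' | ', page_ranges(25, 10)), "\n";   # prints 1-10 | 11-20 | 21-25
```

Each label would still need to be turned into a link that tells the script which hit to start from.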

I now have a 6.1MB file of text, from 1,500 pages. The search is still pretty fast, mind you, and this is on a virtual account. The file grows by about 150k a day, I reckon. I just want to stay ahead of the text file :-)

Thanks in advance.

Regards,
Khayu
Re: Search engine similar to XAV Indexed Search?
khayu,

Hi there. I was wondering how you have build.pl automatically indexing your site. Are you using a cron process or something to index the site on a daily basis?

My understanding is that you have to manually re-index the site.

If you know of a way to automate the build.pl file, I would be extremely grateful.

Thanks.

------------------
Eliot Lee
Founder and Editor
Anthro TECH, L.L.C
http://www.anthrotech.com/
info@anthrotech.com
==========================
Coconino Community College
http://www.coco.cc.az.us/
Web Technology
Coordinator
elee@coco.cc.az.us
Re: Search engine similar to XAV Indexed Search?
No, Eliot. I don't have a cron on it. It's part of my daily NEWS update routine. It's just a link (pointing to build.pl) that I click after I do my news update.

Regards,
Khayu
Re: Search engine similar to XAV Indexed Search?
Ah...That sucks. I am really looking for a way to automate this process...If only I had $4,000 to spend on UltraSeek - software.infoseek.com . It is the best product on the market.

Oh well....

------------------
Eliot Lee
Re: Search engine similar to XAV Indexed Search?
Someone should pick up where XAV left off, because those are great scripts. They should be updated and hacked further. It would be a shame if they were left unsupported and nothing came along to replace them.

Whatever happened to the author anyway?

*sigh*
Re: Search engine similar to XAV Indexed Search?
I am currently working on customizing the heck out of the Xav Indexed Search script.
Yet the major obstacle I have come across is automating the build.pl file so that it does not negatively impact the server. I could set up a counter program that would cycle until the date designated to run the program, but this eats up virtual memory.

If anyone knows a way to automate the build.pl on NT, let me know...Thanks.

------------------
Eliot Lee
Re: Search engine similar to XAV Indexed Search?
Hi,

I've added some features to Xav Search...

1º bolding search terms;
2º results bar (like altavista);
3º more search options;
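
For readers wondering how the first item might work, here is a rough Perl sketch of bolding search terms in a result summary. It is only an illustration under stated assumptions: bold_terms is a hypothetical helper, not the code from the modified script, and it assumes the summary is plain text with no HTML tags in it.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every occurrence of each search term in <b> tags.
# Assumes $text is plain text (no HTML tags inside it).
sub bold_terms {
    my ($text, @terms) = @_;
    @terms = grep { length($_) } @terms;
    return $text unless @terms;
    # One alternation, longest terms first, so the text is scanned once
    # and the inserted <b> tags are never matched again.
    my $pattern = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @terms;
    $text =~ s/($pattern)/<b>$1<\/b>/gi;
    return $text;
}

print bold_terms('Universo Online is online', 'universo', 'online'), "\n";
# prints <b>Universo</b> <b>Online</b> is <b>online</b>
```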

try this: www.thiago.he.com.br

search for: universo online


------------------
[]'s

Lucas Saud - #19815087
Re: Search engine similar to XAV Indexed Search?
Hi Lucas (and Elliott),

>1º bolding search terms;
>2º results bar (like altavista);
>3º more search options;


How do you do that? Care to post the mods? If it's called for, I'd donate some webspace, a cgi-bin, and a URL database to test all posted mods on.

Regards,
Khayu
Re: Search engine similar to XAV Indexed Search?
Hi guys,

I will post all the mods in this thread...

I've added all the mods to the Xavatoria Indexed Search (http://www.xav.com/search.txt). On my computer I tested it with a large database (30MB) and the script works fast... (30 seconds).

In four days I'll post the download URL...

Thanks!

------------------
[]'s

Lucas Saud - #19815087
Re: Search engine similar to XAV Indexed Search?
Eliot,

Code:
# Rebuild Search Engine Index file - Daily at 3 a.m.
0 3 * * * /path/to/xav/build.pl

I hope this helps.
Re: Search engine similar to XAV Indexed Search?
Re: UltraSeek and $4,000 (currently listed as $4,995 for up to 10,000 documents)

Quote:
You should also be aware that the general licensing terms of Ultraseek Server do not allow you to use the software to create a public search service for content you do not create or maintain. If you would like to use this software to provide such a service, please contact us to discuss special terms and conditions.

Wonder how much more they will ding you for that capability.

Ditto on the wish to see further development of the XAV script.


[This message has been edited by Dave (edited June 25, 1999).]
Re: Search engine similar to XAV Indexed Search?
Thunderstone? Their products have a pretty good reputation. The Webinator is free for indexes of up to 10,000 documents, although your results page will carry the Thunderstone logo. But they have commercial versions too...

www.thunderstone.com

adam

[This message has been edited by dahamsta (edited June 25, 1999).]
Re: Search engine similar to XAV Indexed Search?
Bobsie,

No problem. I am very glad that I found a solution.

Lucas,

I will be posting my version in this thread in the next week or so. I'd like to add in your mods and then post it. You can "pick it up" here.

------------------
Eliot Lee
Re: Search engine similar to XAV Indexed Search?
Hi,

I've made all the modifications to ftp.xav.com/search.txt, because I plan to create a search engine using this script.

If you have problems adding my mods to your script, let me know... I'll help you!

Questions:

1 - Sort by Date: I'm working on this;
2 - Fuzzy Search Option (app = apple): in the future, maybe...
3 - Building multiple index files for different web sites and having the ability to search the multiple database files via ONE search form: sorry, I don't have plans to write this, because I use the script for external search, not internal search;
4 - Sort by score of results (Excite-style): maybe I'll have this done next week.

More mods are coming soon...

Download URL:

http://www.weblinker.he.com.br/xav.txt

PS: Eliot, can you send me your script?

------------------
[]'s

Lucas Saud - #19815087
Re: Search engine similar to XAV Indexed Search?
Eliot,

Oops, I didn't see your message that mentioned Windows NT. I was responding to an earlier one that said:

Quote:
Ah...That sucks. I am really looking for a way to automate this process...If only I had $4,000 to spend on UltraSeek - software.infoseek.com . It is the best product on the market.

Oh well....

Anyway, glad you found a solution.
Re: Search engine similar to XAV Indexed Search?
Bobsie,

Thanks for the suggestion; however, cron DOES NOT run on NT. I did find a solution to this problem on NT via WinAT, which does allow cron-like processes to run. With the perl command entered in AT, it now indexes the site nightly.

Webinator is a decent product; however, it is disk space and virtual memory intensive. Webinator takes up about 30% of your total disk space, with very little room for filters. Xav Indexed Search Engine is a better script (yet it is limited in that it can only operate with up to 6 MB in the data file).

I, too, have added a bunch of customizations to the Xav Indexed Search, including:

1) Match Terms Options:

(All Terms - And)
(Any Terms - Or)
(As Phrase)

2) Show Results:

(Compact Form)
(Titles and Summaries)
(Titles Only)

3) Matches Per Page:

(10 hits)
(25 hits)
(50 hits)
(100 hits)

4) Case Settings:

(Case Insensitive)
(Case Sensitive)

5) Rotating Search Tips (refreshes automatically when pages are reloaded or when people visit the NEXT MATCHES). It is fully customizable in that you can add your own search tips in a .txt database file (which allows HTML codes for formatting text).

6) Pop-Up Window Option Link for links

To see this in action, go to

www.coco.cc.az.us/search/

I also have made the script very user friendly by implementing user options, including:

1) Font Styles
2) Table Cell Colors
3) Links (Search Page, Title, etc.)
4) Ability to Insert Footer and Header Files.

I also added more variables and cleaned up the Summary File. (I am in the process of adding a logging program that will provide meaningful statistics on keywords searched for and their results. I tinkered with mirroring the structure of the IIS log files, but without much success.)

I have also added some code to the build.pl file that prevents batch directories and files from being indexed. (Very useful for those annoying Front Page hidden directories.)
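
As an illustration of the idea (not the actual code added to build.pl), here is a minimal Perl sketch that walks a directory tree with the core File::Find module and skips hidden directories. The helper name files_to_index and the skip pattern are assumptions; FrontPage typically creates directories such as _vti_cnf and _private, but check what your own site contains.

```perl
use strict;
use warnings;
use File::Find;

# Assumption: the directories to skip start with "_vti_" or "_private";
# adjust the pattern for your own site.
my $skip_dir = qr/^(?:_vti_|_private)/;

# Hypothetical helper: return the .htm/.html files under $root,
# without descending into directories that match $skip_dir.
sub files_to_index {
    my ($root) = @_;
    my @files;
    find(sub {
        if (-d $_ && $_ =~ $skip_dir) {
            $File::Find::prune = 1;   # do not descend into this directory
            return;
        }
        push @files, $File::Find::name if -f $_ && /\.html?$/i;
    }, $root);
    return sort @files;
}

# List the files that would be indexed under the given (or current) directory.
print "$_\n" for files_to_index(@ARGV ? $ARGV[0] : '.');
```

Setting $File::Find::prune is what keeps the walk from entering a skipped directory at all, which is cheaper than filtering its files afterwards.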

Lucas,

Great job adding those cool features and offering the mods publicly! I will try those modifications next week at the office. (Really excited to have the spanning results available.) I will let you know how it works.

Question: Will it be easy to add parts of the new script to the older version of Xav Indexed Search? I briefly looked at the script and its structure seems to have dramatically changed (and it also now allows people to submit URLs to the database?). I looked at the section that shows spanning results and it seems VERY similar to the older version. Just wanted your input on the ease of adding in the modifications you've written. Let me know.

Question: Have you thought about adding in the following features?:

1) Sort by Date (days - most recent docs)
2) Fuzzy Search Option (app = apple)
3) Building multiple index files for different web sites and having the ability to search the multiple database files via ONE search form.

(Glad to see that the Xavatoria Indexed Search script will be upgraded.)

Dave,

Sorry I misquoted the price. Actually there is an educational discount, which is 70% off the license price (not technical support). I spoke with an InfoSeek Customer Service rep for about an hour.

khayu,

I will be posting my mods here in the next few weeks after I have cleaned up the code and also added in the features that Lucas has so generously worked on.

See ya.

------------------
Eliot Lee

[This message has been edited by Eliot (edited June 26, 1999).]

Re: Search engine similar to XAV Indexed Search?
Lucas et al,

Nice scripting. However, it is not very portable to older or modified versions of Xav Indexed Search. I spent over 10 hours trying to get the spanning search results to work, with limited success. The main problem I have is that since I allow end users to choose the number of hits per page, it screws up the calculations for the spanning results.
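
One way this kind of problem is commonly handled is to pass the chosen page size along in every navigation link and derive all offsets from it. The Perl sketch below is only a guess at the shape of such a fix: nav_bar is a hypothetical helper, the parameter names q, hits and start are invented for the example, and $query is assumed to be URL-escaped already.

```perl
use strict;
use warnings;

# Hypothetical helper: build the navigation bar for $total hits when the
# visitor chose $per_page hits per page and is viewing the page that
# starts at hit number $start (1-based).
sub nav_bar {
    my ($total, $per_page, $start, $query) = @_;
    $per_page = 10 if !$per_page || $per_page < 1;   # fall back to a default
    $start    = 1  if !$start    || $start < 1;
    # Snap $start to a page boundary so that a changed page size cannot
    # leave the visitor halfway between two pages.
    $start = int(($start - 1) / $per_page) * $per_page + 1;
    my @links;
    for (my $first = 1; $first <= $total; $first += $per_page) {
        my $last = $first + $per_page - 1;
        $last = $total if $last > $total;
        my $label = "$first-$last";
        push @links, $first == $start
            ? "<b>$label</b>"
            : qq{<a href="search.cgi?q=$query&amp;hits=$per_page&amp;start=$first">$label</a>};
    }
    return join ' | ', @links;
}

print nav_bar(60, 25, 26, 'perl'), "\n";
```

Because every link repeats the hits value, the next request computes its ranges with the same page size the visitor picked.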

If you have any suggestions for how to correct this error, please send me an email message or post your response in this Item.

(Sorry, I realized that with the Bad Referer sub-routine I have placed in the script, the search.cgi cannot be accessed unless you use our search form. So, I cannot post an example.)

The other problem is that our modifications are quite different, which makes it complicated to implement each other's mods.

But great job with your script! Keep it up!

Hasta la vista,

------------------
Eliot Lee

[This message has been edited by Eliot (edited June 30, 1999).]

[This message has been edited by Eliot (edited July 03, 1999).]

[This message has been edited by Eliot (edited July 09, 1999).]
Re: Search engine similar to XAV Indexed Search?
Eliot, have you tried Wincron to automate build.pl? Wincron is available at http://www.erols.com/graysteel/wincron.html and works on Win95/98/NT and is similar to cron. If it works, I'll let you know where to send the $4,000! ;-)

[This message has been edited by RJ Ackerman (edited June 30, 1999).]
Re: Search engine similar to XAV Indexed Search?
Already figured it out...Actually it is the WinAT program from the NT Resource Kit that allows cron-like jobs to run.

*laughs*

Now, I got a good one for you. The XAV Search Engine only searches one data file. How can we make it search multiple data files (i.e., multiple web sites on servers)? This is the next thing I am about to tackle.

------------------
Eliot Lee
Re: Search engine similar to XAV Indexed Search?
  
Quote:
How can we make it search multiple data files (i.e., multiple web sites on servers)? This is the next thing I am about to tackle.

But it does that already.

http://woodworkingsearch.hypermart.net/

Script comes with this:
$IndexFile{'Remote Web Pages'} = 'remotes.txt';

I currently have this:
$IndexFile{'CWB/WWP mags'} = 'cwbmag.txt';
$IndexFile{'Cabinetmaker/FDM mags'} = 'fdmmag.txt';
$IndexFile{'Woodworkers Central'} = 'wood_org.txt';
$IndexFile{'Badger Pond'} = 'badger.txt';
$IndexFile{'Woodweb'} = 'woodweb.txt';

That's it, nothing to it.
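
For anyone on a version without this feature, the general idea of looping over such a hash can be sketched in Perl as follows. This is an illustration only: search_all is a hypothetical helper, and it assumes one record per line, which may not match the real XAV index layout.

```perl
use strict;
use warnings;

# Hypothetical helper: scan every file listed in %$index_files for $term
# and return the matching lines grouped by realm name.
sub search_all {
    my ($term, $index_files) = @_;
    my %hits;
    for my $realm (sort keys %$index_files) {
        my $file = $index_files->{$realm};
        my $fh;
        unless (open($fh, '<', $file)) {
            warn "Cannot open $file: $!\n";   # skip a missing index file
            next;
        }
        while (my $line = <$fh>) {
            chomp $line;
            push @{ $hits{$realm} }, $line if $line =~ /\Q$term\E/i;
        }
        close $fh;
    }
    return \%hits;
}

my %IndexFile = (
    'Woodweb'     => 'woodweb.txt',
    'Badger Pond' => 'badger.txt',
);
my $hits = search_all('dovetail', \%IndexFile);
for my $realm (sort keys %$hits) {
    printf "%s: %d match(es)\n", $realm, scalar @{ $hits->{$realm} };
}
```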

[This message has been edited by Dave (edited July 05, 1999).]
Re: Search engine similar to XAV Indexed Search?
Great, I will try that in the next few days.

Thanks.

Now, how about the following options:

1) Fuzzy searches (ant -> anthropology)
2) Sorted by Date

Do you know how to implement these options??
(I know Lucas has been working on these options, but I have not seen him on this Forum and he has not provided his email address.)
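
To make the two options concrete, here is a small Perl sketch under loudly stated assumptions: the record layout (title, text, mtime) is invented for the example and is not the XAV index format, and "fuzzy" is taken to mean simple prefix matching, as in the ant -> anthropology example.

```perl
use strict;
use warnings;

# Hypothetical records: a title, some text, and a last-modified time in
# epoch seconds.
my @docs = (
    { title => 'Anthropology 101', text => 'Intro to anthropology', mtime => 930_000_000 },
    { title => 'Gardening',        text => 'Plants and ants',       mtime => 931_000_000 },
    { title => 'Antiques',         text => 'Old furniture',         mtime => 929_000_000 },
    { title => 'Cooking',          text => 'Soup and bread',        mtime => 932_000_000 },
);

# Prefix matching: "ant" matches any word that starts with "ant"
# (ants, anthropology, antiques), case-insensitively.
sub prefix_search {
    my ($term, @records) = @_;
    my $re = qr/\b\Q$term\E\w*/i;
    return grep { join(' ', $_->{title}, $_->{text}) =~ $re } @records;
}

# Newest documents first.
sub by_date_desc {
    return sort { $b->{mtime} <=> $a->{mtime} } @_;
}

print "$_->{title}\n" for by_date_desc(prefix_search('ant', @docs));
```

Note that the word boundary keeps "Plants" from matching on its own; that record matches only because of "ants".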


------------------
Eliot Lee
Re: Search engine similar to XAV Indexed Search?
Dave,

Quote:
That's it, nothing to it.

Sorry, but I disagree. If people are using the new Xav search script (which is way less user friendly than the earlier version, I might add), then it works. But I am using a hacked version, because I do not need a search engine that adds sites dynamically into the database file (I use LINKS for that). My intention for using Xav is to build an index file of web sites within our web server and then have that file be searchable.

I looked at the script and it is very confusing how Realm is defined and used in the script. But I will attempt to take your suggestion as a starting point and see if I can't get it to work with the hacked version of Xav I am using.

Regards,

------------------
Eliot Lee
Re: Search engine similar to XAV Indexed Search?
Hello,

Can this be used to index external sites and then create a text file?

Dave,

I tried the search on your site. How do you actually index all those sites?

Feedback appreciated.

Thanks
Re: Search engine similar to XAV Indexed Search?
Eliot,

I knew that line was asking for trouble.

Quote:
I do not need a search engine that adds sites dynamically into the database file (I use LINKS for that). My intention for using Xav is to build an index file of web sites within our web server and then have that file be searchable.

I'm not sure what you mean; I think (being cautious here) it does what you are describing. If you want full-text indexed search capability for those sites (each with a separate index file), XAV can do it, as remotes, local realms or runtimes. From what I can tell, the end result is the same with either version (I use both and have local and remote indexes).

Seems your main problem is incorporating your mods into the new script, or realms into the old. By user friendly you must mean friendly to hack, as the 'new' version has an excellent admin function.

socrates,

It indexes internal and/or external sites, creating a full-text searchable .txt file (it gets big fast). I need to compare sizes, as I have been playing with a rather large group of exclude words (one problem: they are case sensitive).
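
A common way around case-sensitive exclude words is to lower-case both the list and each candidate word before comparing. Here is a minimal Perl sketch of that idea; the exclude list and the helper name index_words are made up for the example and do not come from the XAV scripts.

```perl
use strict;
use warnings;

# Hypothetical exclude list; every entry is lower-cased once so the lookup
# below does not depend on how the words were typed.
my %exclude = map { (lc($_) => 1) } qw(The AND of a);

# Return the words of $text that are not on the exclude list.
sub index_words {
    my ($text) = @_;
    return grep { length($_) && !$exclude{ lc($_) } } split /\W+/, $text;
}

print join(' ', index_words('The Art of Woodworking and THE bench')), "\n";
# prints Art Woodworking bench
```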

It's actually pretty easy to get sites/pages indexed; it's a spider that goes out, parses pages and saves them in a format the search function understands. It can also return a list of links found which can, in turn, be added to the index (individual, selected or all). The admin section has editing and updating functions. Descriptions can be lousy depending on the meta description; if there is no tag, it uses the first so many characters of the document.
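
The description fallback described here could look roughly like the following Perl sketch. It is a guess at the behaviour, not the spider's actual code: describe is a hypothetical helper, the 150-character default is an assumption, and a regex is only a rough stand-in for a real HTML parser (script and title text, for example, would leak into the fallback).

```perl
use strict;
use warnings;

# Hypothetical helper: use the meta description if the page has one,
# otherwise the first $max characters of the page with tags stripped.
sub describe {
    my ($html, $max) = @_;
    $max ||= 150;
    if ($html =~ /<meta\s+name\s*=\s*["']description["']\s+content\s*=\s*"([^"]*)"/i) {
        return $1 if length($1);
    }
    (my $text = $html) =~ s/<[^>]*>/ /g;   # replace tags with spaces
    $text =~ s/\s+/ /g;                     # collapse runs of whitespace
    $text =~ s/^ | $//g;                    # trim both ends
    return substr($text, 0, $max);
}

print describe('<html><body><h1>Hello</h1> world</body></html>', 8), "\n";
```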

...Looks like we may have an update www.gossamer-threads.com/scripts/forum/resources/Forum8/HTML/000536.html

...Yup, and it has a few improvements/changes. All I had to do is type in existing realm names/files, which is now handled through admin, and I was up with my existing indexes. He fixed the header which is now used on all script pages (should do the same with footers) and the search form html stuff is up near the top (easier to find/mod). The new search tips page is worth a read, looks like a word is anything without a space, * now supports full words and some special searches like link:URLtext are possible.

[This message has been edited by Dave (edited July 07, 1999).]

Re: Search engine similar to XAV Indexed Search?
Dave,

Yes, I mean that the newer version is leaps and bounds away from the older version. They are two totally different scripts. Yes, I am having problems fitting the new version into my version. I started to configure the new version manually and it was a nightmare. While it may be "user-friendly" for novice programmers, it is a nightmare for advanced programmers who like "clean" scripts with descriptions/comments for sections of the script (and I might add like to customize the script manually).

I do not need to add sites or pages from other servers. I just need to build database files for our various web sites on our web server, and be able to have one search script that will query data from all the database files.

(Also, there are many things missing in the newer version that would be nice for all programmers, including calling external header and footer files. I am a strong proponent of calling external header and footer files to maintain consistency across web sites without having to go into a Perl script and paste code whenever a header or footer changes.)
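
A minimal Perl sketch of that approach, assuming hypothetical file names header.html and footer.html and a made-up helper called include_file (none of this is taken from either version of the Xav script):

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of an HTML fragment file, or an
# empty string (with a warning) if it cannot be read, so that a missing
# header does not break the results page.
sub include_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or do {
        warn "Cannot read $path: $!\n";
        return '';
    };
    local $/;                 # slurp mode: read the whole file at once
    my $html = <$fh>;
    close $fh;
    return defined $html ? $html : '';
}

# header.html and footer.html are assumed file names.
print "Content-type: text/html\n\n";
print include_file('header.html');
print "<p>Search results go here.</p>\n";
print include_file('footer.html');
```

With this in place, changing the site header means editing one HTML file rather than the script.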

Welp, we shall see what happens. "Don't fix it if it ain't broken" definitely applies. I will play around with the scripts in baby steps. I may install the new script and then try to add the mods I have in my script.

Thanks for the suggestions though...I really appreciate it.

Cheers (as Alex says),

------------------
Eliot Lee

[This message has been edited by Eliot (edited July 07, 1999).]