DotNetFirebird.org DotNetFirebird
Using Firebird SQL in .NET.
Monday, January 10, 2005

DotLuceneFbDirectory: DotLucene Index Storage for Firebird

I have created an add-on for the DotLucene search engine that stores the Lucene index in a Firebird database. You can download the source code here. It is licensed under the Apache Software License 2.0, so you can use it freely in your applications. You are only required to include an acknowledgement (see the NOTICE file). It should work with DotLucene 1.3 (tested) and 1.4 (it uses the same interfaces). Keep in mind, however, that the index file format changed in 1.4.

Performance tips:

  1. Use FSDirectory instead... No, really: performance drops if you store the index in the database (in my tests the database was about twice as slow as the filesystem). Use database storage only when you have no other choice.
  2. Use the compound index format (writer.SetUseCompoundFile(true);). This is the default in DotLucene 1.4, but in 1.3 you have to enable it manually.
  3. Create the index in memory, optimize it, then save it to disk using FbDirectory.Copy(). This only helps if you are rebuilding the whole index from scratch.
  4. If you are adding a document to the index from a desktop application, do it in the background (in a separate thread). You can still search while a new document is being added.
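Tip 4 can be sketched as a single background writer thread. The blog's own code is C#, so take this as a language-neutral illustration in Java, not DotLucene code. `BackgroundIndexer` is hypothetical. Its `ConcurrentHashMap` "index" stands in for a real IndexWriter/IndexSearcher pair. Additions go through a single-thread executor, because Lucene allows only one writer per index at a time. Searches run on the caller's thread and do not wait for pending additions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an index that is updated in the background.
// The single-thread executor plays the role of the one allowed index writer;
// searches read the concurrent map directly and never block on the writer.
public class BackgroundIndexer implements AutoCloseable {
    private final Map<String, String> index = new ConcurrentHashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Queue a document for indexing and return immediately (the UI stays responsive).
    public Future<?> addAsync(String path, String text) {
        return writer.submit(() -> {
            // A real indexer would analyze the text and write index segments here.
            index.put(path, text.toLowerCase());
        });
    }

    // Searching works at any time, even while additions are still queued.
    public boolean contains(String word) {
        String w = word.toLowerCase();
        for (String body : index.values()) {
            if (body.contains(w)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void close() {
        writer.shutdown(); // stop accepting new additions
        try {
            writer.awaitTermination(10, TimeUnit.SECONDS); // let queued additions finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        try (BackgroundIndexer indexer = new BackgroundIndexer()) {
            Future<?> pending = indexer.addAsync("a.txt", "Firebird stores the index");
            // A search here may or may not see the new document yet.
            pending.get(); // wait until the writer thread has finished this addition
            System.out.println(indexer.contains("firebird")); // true
        }
    }
}
```

The single writer thread serializes all index modifications, which mirrors Lucene's write lock. Readers only ever see complete additions.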

Comments:

What I don't understand is whether this approach is faster than the standard Lucene index. Have you done any tests or benchmarks?
As I wrote in the performance tips, response times (indexing and searching) are about twice as long when using FbDirectory. Don't store the index in Firebird when performance is your top priority. It makes sense to use the database when you can't store the index in the filesystem. The advantages of using Firebird are:
1) you don't need any special filesystem permissions (this may be useful on a limited web hosting plan)
2) you can have all data and indexes in one (DB) file (useful for desktop applications)
This is cool. I have a desktop application whose database is over 1 GB. Will Firebird be able to handle large indexes? Or what is the maximum database size at which Firebird still works without losing speed on an average desktop?

I'm hoping that if I build the index and save it to the CD, the user will be able to search the CD without copying its contents to the hard disk.

Thanks
Robinson
Another big advantage is in clustered environments. In EJB clusters, for example, you are not allowed to write to the file system (usually you just have cheap nodes without a cluster file system).
Robinson: For this purpose you should store the DotLucene index directly on the CD rather than in the database. As I said, using the database for the index means a performance loss.

As to Firebird database size: the bigger the database, the slower it works (that's how every database works). In my experience Firebird scales well (i.e. speed decreases in proportion to database size). The only way to know for your case is to run a test. Database speed depends on the DB design more than on anything else.

If you have a 1 GB database for a desktop application, you are probably storing BLOB data in it. To make it faster, store that data directly in the filesystem and keep only the metadata in the DotLucene index and/or the database.
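The suggestion above (bulky data in files, only metadata in the index or database) can be sketched as follows. This is a minimal Java illustration under stated assumptions. `BlobStore`, `DocMeta`, and the `.bin` naming are hypothetical. The `HashMap` merely stands in for the DotLucene index or a Firebird table holding the metadata rows. The point is that the index or table stores a path and a few small fields rather than the content itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: bulky content lives in plain files, while only a small
// metadata record (title, location, size) is kept where the index/database would be.
public class BlobStore {
    // Small record that is cheap to store in an index field or a database row.
    public static final class DocMeta {
        public final String title;
        public final Path location;
        public final long sizeBytes;

        DocMeta(String title, Path location, long sizeBytes) {
            this.title = title;
            this.location = location;
            this.sizeBytes = sizeBytes;
        }
    }

    private final Path root;
    // Stands in for the metadata table or index; it never holds file content.
    private final Map<String, DocMeta> metadata = new HashMap<>();

    public BlobStore(Path root) {
        this.root = root;
    }

    public DocMeta put(String id, String title, byte[] content) throws IOException {
        Path file = root.resolve(id + ".bin");
        Files.write(file, content);                  // large data goes to the filesystem
        DocMeta meta = new DocMeta(title, file, content.length);
        metadata.put(id, meta);                      // only metadata is "indexed"
        return meta;
    }

    public byte[] load(String id) throws IOException {
        return Files.readAllBytes(metadata.get(id).location); // read content on demand
    }

    public static void main(String[] args) throws IOException {
        BlobStore store = new BlobStore(Files.createTempDirectory("blobs"));
        byte[] data = "large attachment".getBytes(StandardCharsets.UTF_8);
        DocMeta meta = store.put("doc1", "Report", data);
        System.out.println(meta.title + " " + meta.sizeBytes); // Report 16
        System.out.println(new String(store.load("doc1"), StandardCharsets.UTF_8));
    }
}
```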
Hello Dan

I have been asked to build a Google-esque search facility within an ASP.NET application. The database to be searched is a normalised Firebird 1.5 database.

I am considering evaluating DotLucene for this purpose. However, I have a few concerns and wondered if you could shed some light on them for me.

The example you give searches a fairly static documentation data source. My data source, however, consists of records that are inserted, updated, and deleted frequently. My main question is this: how would I keep the external index consistent with the database data? Also, since I have to republish the external search index every time a user changes some data, will this impair search performance? Will lockouts occur?

Many thanks for any assistance with this.
Does this work with Lucene.NET 1.9.1? I made some small syntax changes and got it installed with this version, and things mostly work. However, two things look suspect to me:

1. When I delete all files from the index, the two tables still contain records.

2. After a delete operation (single or multiple, it doesn't matter), if I then call create index again to add the document back, it does not appear in the index. However, if I immediately repeat the call, the document is available.

Thanks for any info here.
It was written for DotLucene 1.3. AFAIK, not much has changed in the Directory class.

Are you closing the IndexReader properly after the deletes?
Thanks for that info. I'll do some more testing on other versions to see what happens. Are the rows in the tables supposed to get deleted when the index entries get deleted?

Here's the delete code. The Search code is fairly similar. Hope it wraps well.

I'll post back with more results this weekend, but if anything stands out, let me know.

Thanks much!

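// Requires: using Lucene.Net.Index; (IndexReader, Term)
Store.Directory directory = Store.FbDirectory.GetDirectory(CONN_STR);
IndexReader reader = IndexReader.Open(directory);
try
{
    Term term = new Term("path", fileName);
    reader.Delete(term);   // marks every document whose "path" equals fileName as deleted
}
finally
{
    reader.Close();        // writes the deletions to the directory
    directory.Close();
}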
The index will always contain some files (i.e. the tables will always contain some rows) even if you delete all the Documents (you can try it out with FSDirectory).
I tested both Lucene.Net 1.4.3 and 1.9.1 with the same results. In both versions, FSDirectory behaved correctly through an index/delete/index cycle, while FbDirectory required index/delete/index/index before the document showed up in search results.

This definitely seems to be a problem with FbDirectory. I'll continue digging into that area and report back, but if you are at least able to reproduce this behavior, I would greatly appreciate hearing that.

Thanks again.
Oh, FWIW, I'm using very simple modifications of the demo projects in Lucene.Net. The only change is to obtain an FbDirectory instead of an FSDirectory.
I'm using DotLucene 1.4 with the filesystem to store the index. Whenever I add a new file to my data store (which consists of XML and HTML files) that has not been indexed yet, I want it to be added to the existing index.
Can you please help me find out whether a particular file (identified by name) already exists in the index?
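One common approach to the last question is to store each file's name in an untokenized field. The `path` field used in the delete code above is an example. Before adding a file, you would look up that exact term and skip the file if it is already indexed. The Java sketch below shows only that check-then-add logic. `IncrementalIndexer` is hypothetical, and its `HashSet` stands in for the term lookup a real Lucene index would perform. No DotLucene call is shown.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the index: the set plays the role of the
// untokenized "path" field, so a membership test corresponds to looking up
// Term("path", name) in a real index.
public class IncrementalIndexer {
    private final Set<String> indexedPaths = new HashSet<>();

    public boolean isIndexed(String path) {
        return indexedPaths.contains(path);
    }

    // Adds the file only if no document with this path exists yet.
    // Returns true if the file was indexed by this call.
    public boolean addIfNew(String path) {
        if (isIndexed(path)) {
            return false; // already in the index: skip it
        }
        // A real indexer would build a Document with the "path" field here
        // and add it through the index writer.
        indexedPaths.add(path);
        return true;
    }

    public static void main(String[] args) {
        IncrementalIndexer ix = new IncrementalIndexer();
        System.out.println(ix.addIfNew("docs/a.xml"));   // true  (new file)
        System.out.println(ix.addIfNew("docs/a.xml"));   // false (already indexed)
        System.out.println(ix.isIndexed("docs/b.html")); // false
    }
}
```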
Blog comments are closed.


