lucene2.2.0 Learning Example

Date: 2007-10-07 Source: linxh

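This article is reposted from http://blog.csdn.net/hongjue/archive/2007/09/22/1795729.aspx. The original text follows.

The org.apache.lucene package is a full-text indexing and search toolkit written in pure Java. Lucene's author is a veteran full-text indexing and search expert. Lucene was first published on his personal home page and was contributed to Apache in October 2001, becoming a subproject of Apache Jakarta. Lucene is widely used in full-text indexing and search projects, and many applications already base their search features on it, for example the search function of the Eclipse help system. Lucene builds indexes over text data, so as long as you can convert the data you want to index into text, Lucene can index and search it. For example, to index HTML or PDF documents you first convert them to plain text, then hand the converted content to Lucene for indexing, then save the resulting index files to disk or keep them in memory, and finally run the user's query against the index. Because Lucene does not prescribe the format of the documents to be indexed, it suits almost any search application.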
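To make the "convert to text, index, then query" flow more concrete, here is a minimal sketch of the idea behind a full-text index: a map from each term to the documents that contain it. It is plain Java (8 or later assumed) and does not use Lucene at all; the class `InvertedIndexSketch` and its methods are hypothetical names invented for illustration, and Lucene's real on-disk structures are far more elaborate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Conceptual stand-in for a full-text index; this is NOT Lucene code.
class InvertedIndexSketch {
    // term -> sorted set of ids of the documents containing that term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Split text into lowercase terms on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index one document's plain text and return the id assigned to it.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Return the ids of all documents containing the term, in ascending order.
    List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("Lucene indexes plain text");
        index.addDocument("Convert HTML and PDF to text first");
        System.out.println(index.search("text")); // [0, 1]
        System.out.println(index.search("pdf"));  // [1]
    }
}
```

Because a query is a lookup in the map rather than a scan of every document, searching stays fast as the collection grows. That is the reason for building an index up front.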

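While consulting earlier code, I found that lucene2.2.0 has introduced some (very small) changes, so I am sharing my code here. Below is example code that uses the lucene2.2.0 package to index text files and query them by keyword: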

1. txtFileIndex.java
Main program

/**
 * Build an index over txt files with lucene2.2.0.
 */
package luceneTest;

import java.io.*;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

/**
 * @author 红角 (somesongs) 2007-9-20
 * mail:[email protected]
 * msn:[email protected]
 */
public class txtFileIndex {

    /**
     * @param args args[0], if given, is the directory to index;
     *             with no arguments the program searches the existing index
     */
    public static void main(String[] args) {
        // Directory where the index is stored (backslashes must be escaped in a Java string)
        File indexDir = new File("E:\\JavaGo\\luceneTestData\\index");
        try {
            if (args.length > 0) {
                // Directory holding the data to be indexed
                File dataDir = new File(args[0]);
                Analyzer luceneAnalyzer = new StandardAnalyzer();
                // true: create a new index, replacing any existing one
                IndexWriter writer = new IndexWriter(indexDir, luceneAnalyzer, true);
                long startTime = new Date().getTime();
                IndexDir theIndexDir = new IndexDir();
                theIndexDir.indexDocs(writer, dataDir);
                writer.optimize();
                writer.close();
                long endTime = new Date().getTime();
                // Date.getTime() returns milliseconds, so the difference is in ms
                System.out.println("Took " + (endTime - startTime) + " ms; index stored in: " + indexDir.getCanonicalPath());
            } else {
                System.out.println("Enter the keyword to search for:");
                BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
                String keyWords = stdin.readLine();
                SearchIndex searchIndex = new SearchIndex();
                searchIndex.search(keyWords, indexDir);
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

2. IndexDir.java
Class that indexes the text files under a given directory

/**
 * Build an index over the files in a directory.
 */
package luceneTest;

import java.io.*;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * @author 红角 (somesongs) 2007-9-20
 * mail:[email protected]
 * msn:[email protected]
 */
public class IndexDir {

    IndexDir() {}

    public void indexDocs(IndexWriter writer, File dataDir) {
        if (dataDir.isDirectory()) {
            File[] dataFiles = dataDir.listFiles();
            if (dataFiles != null) {
                for (int i = 0; i < dataFiles.length; i++) {
                    indexDocs(writer, dataFiles[i]); // recurse into each entry
                }
            }
        } else if (dataDir.getName().endsWith(".txt")) {
            indexFile(dataDir, writer);
        }
    }

    private void indexFile(File dataFile, IndexWriter writer) {
        try {
            System.out.println("Indexing file: " + dataFile.getCanonicalPath());
            Document doc = new Document();
            Reader txtReader = new FileReader(dataFile);
            // path: stored in the index and indexed as a single, untokenized term
            doc.add(new Field("path", dataFile.getCanonicalPath(), Field.Store.YES, Field.Index.UN_TOKENIZED));
            // contents: read from the Reader, tokenized and indexed, but not stored
            doc.add(new Field("contents", txtReader));
            writer.addDocument(doc);
        } catch (IOException e) {
            System.out.println(e);
        }
    }
}
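The recursion in indexDocs can be tried on its own, without Lucene on the classpath. The sketch below keeps only the traversal: it walks a directory tree and collects the files whose names end in ".txt", mirroring the isDirectory / listFiles / endsWith checks above. `TxtFileWalkSketch` and `collectTxtFiles` are hypothetical names introduced here for illustration; the code assumes Java 7 or later.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Traversal only; no Lucene calls are made here.
class TxtFileWalkSketch {
    // Collect every file ending in ".txt" under root, descending into subdirectories.
    static List<File> collectTxtFiles(File root) {
        List<File> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(File f, List<File> found) {
        if (f.isDirectory()) {
            File[] children = f.listFiles(); // null if the directory cannot be read
            if (children != null) {
                for (File child : children) {
                    collect(child, found);
                }
            }
        } else if (f.getName().endsWith(".txt")) {
            found.add(f);
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : collectTxtFiles(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

Running the walk first is a cheap way to check which files would be indexed before building a real index. Note that the suffix test is case-sensitive, so a file named NOTES.TXT is skipped, just as in indexDocs.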

3. SearchIndex.java
Class that searches for a keyword

/**
 * Search the index for a keyword.
 */
package luceneTest;

import java.io.File;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;

/**
 * @author 红角 (somesongs) 2007-9-20
 * mail:[email protected]
 * msn:[email protected]
 */
public class SearchIndex {

    SearchIndex() {}

    public void search(String keyWords, File indexDir) {
        if (!indexDir.exists()) {
            System.out.println("Index directory does not exist!");
            return;
        }
        try {
            FSDirectory indexDirectory = FSDirectory.getDirectory(indexDir);
            IndexSearcher searcher = new IndexSearcher(indexDirectory);
            // TermQuery matches one exact term without analysis, so lowercase the keyword by hand
            Term term = new Term("contents", keyWords.toLowerCase());
            TermQuery termQuery = new TermQuery(term);
            Hits hits = searcher.search(termQuery);
            System.out.println("Index holds " + searcher.maxDoc() + " entries; " + hits.length() + " matched");
            for (int i = 0; i < hits.length(); i++) {
                int docId = hits.id(i);
                String docPath = hits.doc(i).get("path");
                System.out.println(docId + ":" + docPath);
            }
            searcher.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
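The search code lowercases the keyword itself because TermQuery looks up one exact term and runs no analyzer, whereas StandardAnalyzer lowercases tokens when the index is built. A practical consequence is that input with mixed case or several words matches nothing unless it is normalized first. The sketch below imitates that behavior with a plain set of strings. It is not Lucene code: the analysis step is a deliberately simplified assumption, and `ExactTermLookupSketch` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for exact-term matching; this is NOT Lucene code.
class ExactTermLookupSketch {
    // Index-time analysis, simplified: lowercase, then split on non-alphanumerics.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Query-time lookup in the style of a single-term query: the input is
    // compared as one whole term, with no splitting and no case folding.
    static boolean matchesExactTerm(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> terms = analyze("Hello Lucene World");
        System.out.println(matchesExactTerm(terms, "lucene"));      // true
        System.out.println(matchesExactTerm(terms, "Lucene"));      // false
        System.out.println(matchesExactTerm(terms, "hello world")); // false
    }
}
```

For free-form user input, running the query text through the same analyzer used at index time is the usual way to avoid these mismatches.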

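The biggest difference between the code above and code written for earlier versions is:

A. In the article 《实战 Lucene，第 1 部分: 初识 Lucene》 (Lucene in Practice, Part 1: Getting to Know Lucene) and in idior's code, fields are added like this:

document.add(Field.Text("path",dataFiles[i].getCanonicalPath()));
document.add(Field.Text("contents",txtReader));

In lucene2.2.0 this has changed, as follows:

doc.add(new Field("path",dataFile.getCanonicalPath(),Field.Store.YES,Field.Index.UN_TOKENIZED));
doc.add(new Field("contents",txtReader));

A Field now has to be created as a new instance rather than obtained through a static call.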
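The constructor form also makes two decisions explicit that the old Field.Text shortcut made implicitly: whether the original value is stored so it can be read back from a hit, and whether the value is split into terms or indexed as one whole term. The sketch below models just those two choices in plain Java. The class `FieldOptionsSketch` and its enums are hypothetical stand-ins that only imitate the idea; they are not Lucene's classes and cover only the options used in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the store/index choices; NOT Lucene's Field class.
class FieldOptionsSketch {
    enum Store { YES, NO }
    enum Index { TOKENIZED, UN_TOKENIZED }

    final String name;
    final String value;
    final Store store;
    final Index index;

    FieldOptionsSketch(String name, String value, Store store, Index index) {
        this.name = name;
        this.value = value;
        this.store = store;
        this.index = index;
    }

    // Terms a search could match for this field.
    List<String> indexedTerms() {
        List<String> terms = new ArrayList<>();
        if (index == Index.UN_TOKENIZED) {
            terms.add(value); // the whole value is kept as a single term
        } else {
            for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) {
                    terms.add(t);
                }
            }
        }
        return terms;
    }

    // Value that could be read back from a search hit, or null if not stored.
    String storedValue() {
        return store == Store.YES ? value : null;
    }

    public static void main(String[] args) {
        FieldOptionsSketch path = new FieldOptionsSketch(
                "path", "E:\\data\\a.txt", Store.YES, Index.UN_TOKENIZED);
        FieldOptionsSketch contents = new FieldOptionsSketch(
                "contents", "Hello Lucene", Store.NO, Index.TOKENIZED);
        System.out.println(path.indexedTerms() + " stored=" + path.storedValue());
        System.out.println(contents.indexedTerms() + " stored=" + contents.storedValue());
    }
}
```

This matches how the example uses the two fields: "path" is stored and untokenized so that hits.doc(i).get("path") can return the full path, while "contents" is tokenized for searching but not stored.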