Architecture

Example: a Lucene synonym analyzer

package player.kent.chen.temp.lucene.synonymon;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.AttributeSource;

public class MySynonymFilter extends TokenFilter {

    private final TermAttribute termAttr;
    private final PositionIncrementAttribute piAttr;
    private final Queue<String> synonyms = new LinkedList<String>();
    private AttributeSource.State attrsState;

    protected MySynonymFilter(TokenStream input) {
        super(input);
        this.piAttr = addAttribute(PositionIncrementAttribute.class);
        this.termAttr = addAttribute(TermAttribute.class);
        …


Analysis is needed at both indexing and searching time

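Analysis is needed both when indexing and when searching. At indexing time it splits each document into tokens, which are then indexed. In Lucene, for example:

IndexWriter writer = new IndexWriter(dir, analyzer, …);

At search time it splits the user's input into tokens, which are then matched against the index:

QueryParser parser = new QueryParser(Version.xxx, "contents", analyzer);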
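To make the two roles of analysis concrete, here is a minimal sketch in plain Java. It is a toy model and does not use Lucene: the class `ToyAnalysis`, its `analyze` method, and the in-memory postings map are all hypothetical names invented for illustration. The point it shows is that the same tokenization runs once when a document is added and again when the user's input is searched, so the two sides produce comparable tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: plain Java collections, not Lucene's Analyzer or IndexWriter.
class ToyAnalysis {
    // token -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // The shared "analyzer": lowercase the text and split on
    // anything that is not a letter or a digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing time: analyze the document text and record each token.
    void addDocument(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search time: analyze the user input with the same analyzer, then
    // return the documents that contain every resulting token.
    Set<Integer> search(String userInput) {
        Set<Integer> result = null;
        for (String token : analyze(userInput)) {
            Set<Integer> docs = postings.getOrDefault(token, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyAnalysis index = new ToyAnalysis();
        index.addDocument(1, "Hadoop got its start in Nutch");
        index.addDocument(2, "Hadoop: The Definitive Guide");
        System.out.println(index.search("HADOOP guide"));
    }
}
```

Because both sides lowercase their input, the upper-case query term still matches the indexed token. If the search side skipped analysis, or used a different analyzer, the tokens could fail to line up even when the text looks the same to a human.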

Lucene near-real-time search code example

package player.kent.chen.temp.lucene.nrts;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MyNearRealTimeSearch {

    public static void main(String[] args) throws Exception {
        // Create the index writer
        Directory indexDir = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(indexDir, new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Index the first document, but do not commit()
        String …


Examples of several Lucene Query types

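package player.kent.chen.temp.lucene.miscquery;

import java.io.IOException;

// Imports required by the classes used below
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MyLuceneMiscQueryDemo {

    public static void main(String[] args) throws Exception {
        // Create the index writer
        Directory indexDir = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(indexDir, new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

        String text1 = "adam";
        Document doc1 = new Document();
        doc1.add(new Field("content", text1, Field.Store.YES, Field.Index.ANALYZED));
        indexWriter.addDocument(doc1);

        String text2 = "brings";
        Document doc2 = new Document();
        …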


[lucene] What does the default field in QueryParser mean?

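Straight to an example. Suppose an index already exists: text files were indexed with two fields, the file name (fileName) and the file content (content).

Use content as the default field:

QueryParser qp = new QueryParser(Version.LUCENE_30, "content", new StandardAnalyzer(Version.LUCENE_30));
Query contentQuery = qp.parse("人"); // finds documents whose content contains "人"
Query fileNameQuery = qp.parse("fileName:人"); // finds documents whose file name contains "人"

This shows that:
1. A parser built with content as the default field can still search other fields.
2. If a search term does not specify a field, the parser uses content as the target field by default.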
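The fallback rule can be sketched in a few lines of plain Java. This is a toy model of the rule only, not Lucene's QueryParser or its grammar: the class `ToyDefaultFieldParser` and its `parse` method are hypothetical, and the search term "hadoop" is an arbitrary example. It handles a single `field:term` input and nothing else.

```java
// Toy model only: mimics the default-field rule, not Lucene's QueryParser grammar.
class ToyDefaultFieldParser {
    private final String defaultField;

    ToyDefaultFieldParser(String defaultField) {
        this.defaultField = defaultField;
    }

    // Returns {field, term}. A "field:term" input names its field explicitly;
    // an input without a field prefix falls back to the default field.
    String[] parse(String input) {
        int colon = input.indexOf(':');
        if (colon > 0) {
            return new String[] { input.substring(0, colon), input.substring(colon + 1) };
        }
        return new String[] { defaultField, input };
    }

    public static void main(String[] args) {
        ToyDefaultFieldParser qp = new ToyDefaultFieldParser("content");
        // No field given: the term is searched in the default field.
        System.out.println(String.join(":", qp.parse("hadoop")));
        // Field given explicitly: the default field is not used.
        System.out.println(String.join(":", qp.parse("fileName:hadoop")));
    }
}
```

The default field is only a fallback chosen when the parser is constructed. It does not restrict which fields a query may target.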

Picturing the logical structure of a Lucene index

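Imagine a text made up of the following parts:

      title: "Hadoop: The Definitive Guide"
    content: "Hadoop got its start in Nutch"
unbreakable: "united kingdom" (ignore what "unbreakable" means for now)
    ignored: "Hadoop Nonsense" (same note as above)

If the index is built with the statements below, roughly what will it look like?

Document doc = …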


Simple Lucene indexer/searcher code example

For copy-and-paste only.

<!-- pom.xml -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.0.0</version>
</dependency>

package player.kent.chen.temp.lucene;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MyLuceneIndexer {

    public static void main(String[] args) throws Exception {
        String rootDir = "/home/kent/diskD/home-kent-dev/workspace/kent-temp/data/lucene";
        File contentDir = new File(rootDir, "content");
        File indexDir …
