Architecture – Page 6 – CHEN Jian's Java Blog

Example: a Lucene synonym analyzer

Architecture / January 14, 2013

package player.kent.chen.temp.lucene.synonymon;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.AttributeSource;

public class MySynonymFilter extends TokenFilter {

    private final TermAttribute termAttr;
    private final PositionIncrementAttribute piAttr;
    private final Queue<String> synonyms = new LinkedList<String>();
    private AttributeSource.State attrsState;

    protected MySynonymFilter(TokenStream input) {
        super(input);
        this.piAttr = addAttribute(PositionIncrementAttribute.class);
        this.termAttr = addAttribute(TermAttribute.class);
        …


Analysis is needed at both indexing and search time

Architecture / January 5, 2013

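Analysis is needed in two places. At indexing time, it splits a document into tokens, which are then indexed. In Lucene, for example:
IndexWriter writer = new IndexWriter(dir, analyzer, …);
At search time, it splits the user's input into tokens, which are then matched against the index:
QueryParser parser = new QueryParser(Version.xxx, "contents", analyzer);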
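To make the two uses of analysis concrete, here is a minimal sketch in plain Java. It is a toy model and does not use Lucene's API: the class ToyAnalysisDemo, its analyze method, and the sample documents are all hypothetical names chosen for illustration. The point it tries to show is that the same tokenizing step runs once when a document is indexed and again when the user's input is searched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: this is NOT Lucene's API, just the same idea in plain Java.
public class ToyAnalysisDemo {

    // Hypothetical "analyzer": lowercases the text and splits on runs of
    // characters that are neither letters nor digits.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Inverted index: token -> ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexing side: analyze the document, then record each token.
    void index(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Searching side: analyze the user's input with the SAME analyzer,
    // then keep only the documents that contain every query token.
    Set<Integer> search(String userInput) {
        Set<Integer> result = null;
        for (String token : analyze(userInput)) {
            Set<Integer> docs = postings.getOrDefault(token, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyAnalysisDemo idx = new ToyAnalysisDemo();
        idx.index(1, "Hadoop got its start in Nutch");
        idx.index(2, "Hadoop: The Definitive Guide");
        System.out.println(idx.search("HADOOP"));
        System.out.println(idx.search("definitive hadoop"));
    }
}
```

Because both sides lowercase and split the text the same way, the query "HADOOP" still matches documents that were indexed as "Hadoop". If the two sides used different analysis, the tokens could fail to line up.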

Lucene near-real-time search: code example

Architecture / December 29, 2012

package player.kent.chen.temp.lucene.nrts;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MyNearRealTimeSearch {

    public static void main(String[] args) throws Exception {
        // Create the index writer
        Directory indexDir = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(indexDir, new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Index the first document, but do not commit()
        String …


Examples of several Lucene Query types

Architecture / December 29, 2012

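package player.kent.chen.temp.lucene.miscquery;

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MyLuceneMiscQueryDemo {

    public static void main(String[] args) throws Exception {
        // Create the index writer
        Directory indexDir = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(indexDir, new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

        String text1 = "adam";
        Document doc1 = new Document();
        doc1.add(new Field("content", text1, Field.Store.YES, Field.Index.ANALYZED));
        indexWriter.addDocument(doc1);

        String text2 = "brings";
        Document doc2 = new Document();
        …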


[lucene] Query objects

Architecture / December 29, 2012

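The most direct approach is to build a concrete Query object by hand and pass it to the searcher. More commonly, though, a parser is used to generate the Query object from a string: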

[lucene] What does the default field in QueryParser mean?

Architecture / December 29, 2012

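Straight to an example. Assume an index already exists: a set of text files was indexed with two fields, the file name (fileName) and the file content (content).
Using content as the default field:
QueryParser qp = new QueryParser(Version.LUCENE_30, "content", new StandardAnalyzer(Version.LUCENE_30));
Query query1 = qp.parse("人"); // finds documents whose content contains "人"
Query query2 = qp.parse("fileName:人"); // finds documents whose file name contains "人"
From this we can see:
1. A parser built with content as the default field can still search other fields.
2. If a search term does not specify a field, the parser uses content as the target field by default.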
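The default-field rule above can be sketched in a few lines of plain Java. This is a toy model and not Lucene's QueryParser: the class DefaultFieldDemo and its FieldTerm type are hypothetical, and the real query syntax supports much more than a single "field:term" pair. It only illustrates how a bare term falls back to the default field while an explicit prefix overrides it.

```java
// Toy model only: NOT Lucene's QueryParser, just the default-field rule in plain Java.
public class DefaultFieldDemo {

    // Hypothetical result type: which field to search, and for what term.
    static final class FieldTerm {
        final String field;
        final String term;

        FieldTerm(String field, String term) {
            this.field = field;
            this.term = term;
        }

        @Override
        public String toString() {
            return field + ":" + term;
        }
    }

    private final String defaultField;

    DefaultFieldDemo(String defaultField) {
        this.defaultField = defaultField;
    }

    // "field:term" targets the named field;
    // a bare "term" falls back to the default field.
    FieldTerm parse(String query) {
        int colon = query.indexOf(':');
        if (colon > 0) {
            return new FieldTerm(query.substring(0, colon), query.substring(colon + 1));
        }
        return new FieldTerm(defaultField, query);
    }

    public static void main(String[] args) {
        DefaultFieldDemo qp = new DefaultFieldDemo("content");
        System.out.println(qp.parse("人"));          // no field given: uses the default field
        System.out.println(qp.parse("fileName:人")); // explicit field: overrides the default
    }
}
```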

Bookmarking a Lucene index viewer: Luke

Architecture / December 22, 2012

http://code.google.com/p/luke/

Picturing the logical structure of a Lucene index

Architecture / December 22, 2012

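Imagine a text made up of the following parts:
title: "Hadoop: The Definitive Guide"
content: "Hadoop got its start in Nutch"
unbreakable: "united kingdom" (ignore what unbreakable means for now)
ignored: "Hadoop Nonsense" (same note as above)
If the index is built with the statements below, roughly what will it look like?
Document doc = …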


Lucene indexer/searcher: simple code example

Architecture / December 21, 2012

For copy-and-paste only.

<!-- pom.xml -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.0.0</version>
</dependency>

package player.kent.chen.temp.lucene;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MyLuceneIndexer {

    public static void main(String[] args) throws Exception {
        String rootDir = "/home/kent/diskD/home-kent-dev/workspace/kent-temp/data/lucene";
        File contentDir = new File(rootDir, "content");
        File indexDir …


Two metrics for measuring search quality

Architecture / December 20, 2012

From "Lucene in Action", two metrics for measuring the quality of a search application:
1. Recall: how well it finds relevant documents
2. Precision: how well it filters out irrelevant documents
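As a small worked example of the two metrics, the sketch below uses the standard set-based definitions: recall is the share of relevant documents that were returned, and precision is the share of returned documents that are relevant. The class SearchQualityDemo, the document ids, and the choice to return 0.0 for an empty set are illustrative assumptions, not something taken from the book.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: recall and precision computed from two sets of document ids.
public class SearchQualityDemo {

    // Number of documents that are both relevant and returned.
    static int hits(Set<Integer> relevant, Set<Integer> retrieved) {
        Set<Integer> both = new HashSet<>(relevant);
        both.retainAll(retrieved);
        return both.size();
    }

    // Recall: fraction of the relevant documents that the search returned.
    // Returning 0.0 when nothing is relevant is an assumed convention.
    static double recall(Set<Integer> relevant, Set<Integer> retrieved) {
        return relevant.isEmpty() ? 0.0 : (double) hits(relevant, retrieved) / relevant.size();
    }

    // Precision: fraction of the returned documents that are relevant.
    // Returning 0.0 when nothing is returned is an assumed convention.
    static double precision(Set<Integer> relevant, Set<Integer> retrieved) {
        return retrieved.isEmpty() ? 0.0 : (double) hits(relevant, retrieved) / retrieved.size();
    }

    public static void main(String[] args) {
        // Hypothetical data: documents 1-4 are relevant; the search returned 3, 4 and 5.
        Set<Integer> relevant = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> retrieved = new HashSet<>(Arrays.asList(3, 4, 5));

        System.out.println("recall    = " + recall(relevant, retrieved));    // 2 of 4 relevant found
        System.out.println("precision = " + precision(relevant, retrieved)); // 2 of 3 results relevant
    }
}
```

In this made-up case the search finds half of the relevant documents, and two thirds of what it returns is relevant. Returning more documents tends to raise recall at the expense of precision, so the two are usually read together.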