Lucene6-indexing-flow
2016-09-02 19:31:05
Lucene 6 is an open-source full-text search engine library that provides facilities for building and querying indexes. Its indexing flow consists of the following steps: first, the document content is split into terms (tokenization); then each term is normalized. Next, an inverted index is built for each term, recording every document that contains it. Finally, the inverted index is written to index files on disk. Along the way, filters can be configured to improve the index, for example removing stop words or substituting synonyms. Lucene 6 supports many data sources, such as text files and database tables, and integrates easily into a wide range of applications.
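The classes named in the diagram below (IndexWriter, IndexWriterConfig, StandardAnalyzer, MMapDirectory) come together at the standard Lucene indexing entry point. A minimal sketch, assuming lucene-core and lucene-analyzers-common 6.x on the classpath; the index path and field name are illustrative:

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

public class IndexingSketch {
    public static void main(String[] args) throws Exception {
        // MMapDirectory and StandardAnalyzer match the fields shown
        // in the IndexWriter box of the diagram.
        Directory dir = new MMapDirectory(Paths.get("/tmp/lucene-index"));
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            Document doc = new Document();
            doc.add(new TextField("body", "hello lucene", Field.Store.YES));
            // addDocument() kicks off the per-thread flow charted below:
            // DocumentsWriter -> DocumentsWriterPerThread -> DefaultIndexingChain.
            writer.addDocument(doc);
        }
    }
}
```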
Outline / Content
return seq
docWriter.ensureInitialized(ThreadState)
DocumentsWriterFlushControl
perThreadPool.release(perThread)
threadStates: List
Then
consumer.processDocument()
termsHash.startDocument()
docWriter: DocumentsWriterPerThread
nextTermsHash: TermVectorsConsumer
call
storedFieldsWriter.finishDocument()
do
threadState.lock()
directory: MMapDirectory
config: IndexWriterConfig
writer: IndexWriter
perThreadPool: DocumentsWriterPerThreadPool {threadStates: List}
flushControl: DocumentsWriterFlushControl
DefaultIndexingChain#PerField
termsHashPerField: FreqProxTermsWriterPerField
similarity: BM25Similarity
norms: NormValuesWriter
invertState: FieldInvertState
docValuesWriter
pointValuesWriter
tokenStream
next: PerField
finishStoredFields()
return
storedFieldsWriter.startDocument()
DefaultIndexingChain
IndexWriter
perField = fieldHash[field]
nextTermsHash.finishDocument()
new TermVectorsConsumerPerField()
hasEvents=preUpdate()
threadState.dwpt = new
FreqProxTermsWriter extends TermsHash
config: IndexWriterConfig
documentsWriter: DocumentsWriter
perThreadPool: DocumentsWriterPerThreadPool {threadStates: List}
flushControl.obtainAndLock()
foreach perField in fields:
startStoredFields()
perField.setInvertState():
  |- new FieldInvertState()
  |- new NormValuesWriter()
nextTermsHash.startDocument()
new FreqProxTermsWriterPerField()
DocumentsWriterPerThreadPool
directory
directoryOrig
consumer: DefaultIndexingChain
indexWriter: IndexWriter
indexWriterConfig: IndexWriterConfig
intBlockAllocator: DocumentsWriterPerThread$IntBlockAllocator
byteBlockAllocator: ByteBlockPool$DirectTrackingAllocator
Do
finishDocument(delTerm)
DocumentsWriterPerThread
perField.finish():
  norms.addValue()
  termsHashPerField.finish()
ThreadState: perThread=
then
seq=
directory: MMapDirectory
config: IndexWriterConfig
analyzer: StandardAnalyzer
docWriter: DocumentsWriter
mergeScheduler: ConcurrentMergeScheduler
DocumentsWriter
Start
CompressingStoredFieldsWriter
termsHash.finishDocument()
docWriter: DocumentsWriterPerThread
perFields: TermVectorsConsumerPerField
writer: TermVectorsWriter
TermVectorsConsumer extends TermsHash
ThreadState extends ReentrantLock
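Read together, the node labels above trace a single addDocument() call through the writer internals. A hedged pseudocode summary of that path; the method names come from the diagram, while the ordering is one reading of the chart, not verified against Lucene source:

```
IndexWriter.addDocument(doc)
  hasEvents = docWriter.preUpdate()
  threadState = flushControl.obtainAndLock()    // ThreadState extends ReentrantLock
  docWriter.ensureInitialized(threadState)      // threadState.dwpt = new DocumentsWriterPerThread
  dwpt.updateDocument(doc, delTerm):
    consumer.processDocument():                 // consumer = DefaultIndexingChain
      startStoredFields()                       //   storedFieldsWriter.startDocument()
      termsHash.startDocument()                 //   nextTermsHash.startDocument()
      foreach field in doc:
        look up or create PerField              //   new FreqProxTermsWriterPerField(),
                                                //   new TermVectorsConsumerPerField()
        invert terms into termsHashPerField
      foreach perField in fields:
        perField.finish()
      finishStoredFields()                      //   storedFieldsWriter.finishDocument()
      termsHash.finishDocument()                //   nextTermsHash.finishDocument()
    finishDocument(delTerm)                     // seq = ...
  perThreadPool.release(perThread)
  return seq
```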