Lucene began development in 1997 as Doug Cutting’s side project during his time at the web search engine Excite. Anticipating the burst of the internet bubble, Cutting reduced his working hours to teach himself Java and begin working on a set of search tools. These would become the popular search engine library Lucene, named after his wife’s middle name.
The operations performed by Lucene can essentially be simplified to two key steps. Firstly, it creates indexes of the content to be made searchable. Secondly, Lucene queries the created indexes to search for content.
Within a Lucene index, Lucene documents are constructed. These Lucene documents contain fields with associated values, which are essentially key and value pairs. To create an index, whatever document is being indexed must be parsed and its fields extracted. As the Lucene document stored in the index is independent of file type, any type of document that can be parsed into fields and values can be made searchable. This advantage is indispensable, giving Lucene the ability to index structured database objects as well as unstructured or semi-structured documents, such as Word documents or PDF files.
When indexes are queried, Lucene looks for matches between the indexed values and the query terms. Deciding how to correctly structure the way in which fields are extracted from documents is an important stage in setting up the search. To ensure that the search returns good results in response to queries, the fields must be extracted appropriately for the target data type. For some documents, in particular structured documents, field and value pairs will be fairly trivial to associate.
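To make the two steps described here (indexing, then querying) more concrete, the following is a minimal sketch in plain Java. It is not Lucene's API: `TinyIndex`, `addDocument`, `search`, and `tokenize` are hypothetical names invented for this illustration, and the tokenizer is deliberately naive. It only shows the general idea of documents as field and value pairs whose terms are recorded in an index and later matched against a query term. In Lucene itself, these roles are played by classes such as `Document`, `IndexWriter`, and `IndexSearcher`.

```java
import java.util.*;

// Toy illustration only: this is NOT Lucene's API, just the underlying idea.
class TinyIndex {
    // Each stored document is a set of field -> value pairs.
    private final List<Map<String, String>> documents = new ArrayList<>();
    // Inverted index: "field:term" -> ids of documents containing that term in that field.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Step 1 (indexing): store the document and record every term of every field value.
    int addDocument(Map<String, String> fields) {
        int docId = documents.size();
        documents.add(new LinkedHashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String term : tokenize(field.getValue())) {
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>())
                        .add(docId);
            }
        }
        return docId;
    }

    // Step 2 (searching): match a single query term against the indexed values of one field.
    List<Integer> search(String field, String queryTerm) {
        String key = field + ":" + queryTerm.toLowerCase(Locale.ROOT);
        return new ArrayList<>(postings.getOrDefault(key, Collections.emptySortedSet()));
    }

    // Returns the stored field/value pairs for a document id.
    Map<String, String> getDocument(int docId) {
        return documents.get(docId);
    }

    // Naive analysis (an assumption of this sketch): lower-case, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.addDocument(Map.of("title", "Lucene in Action",
                                 "body", "Indexing and searching with Lucene"));
        index.addDocument(Map.of("title", "Cooking Basics",
                                 "body", "Searching for the perfect recipe"));
        System.out.println(index.search("body", "searching")); // prints [0, 1]
        System.out.println(index.search("title", "lucene"));   // prints [0]
    }
}
```

Because the index key combines the field name with the term, the same word can match in one field and not in another, which is one reason the choice of fields at extraction time affects the quality of the results.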