Monday, January 19, 2015

Elasticsearch - Keywords


  • term: A term is an exact value that is indexed in Elasticsearch. A term query matches only that exact value.
    • Example: The term 'States' does not match 'states', 'StaTes' or 'STATES'.
  • text: Text is ordinary unstructured text, which is analyzed; the resulting terms are indexed in Elasticsearch.
  • analysis: It is the process of converting text into terms, which are then indexed.
    • Example: The texts 'united states' and 'United States' are both indexed as the terms 'united' and 'states'.
  • cluster: A cluster consists of one or more nodes that share the same cluster name. The cluster automatically chooses a master node, and if the master node fails it automatically chooses another one.
  • node: A node is a running instance of Elasticsearch and belongs to a cluster. Any number of nodes can be started on a server, but usually one node per server is recommended. As soon as a node is started it looks for its cluster by name and joins it. It uses multicast or unicast for this discovery.
  • document: A document is a JSON object stored in an Elasticsearch index and is similar to a row in a relational database. Each document has an id and a type. The original document we indexed is stored in the "_source" field.
  • index: It is like a database in a relational database system. It has a mapping which defines multiple types (a type is like a table in a relational database).
  • mapping: It is like a schema definition in a relational database. It can be generated automatically when a document is indexed or defined explicitly. It describes how each field in a document of a given type is analyzed.
  • type: It is like a table in a relational database. It has a list of fields that documents of that type can contain.
  • id(index/type/id): Each document has an id, and the index/type/id combination must be unique. The id is auto-generated if not supplied.
  • field: It is like a column in a table. A document contains a list of fields, or key-value pairs. A value can be scalar data or nested data.
  • Lucene: Apache Lucene is a free, open source information retrieval software library, originally written in Java.
  • shard: A shard is a single instance of Apache Lucene. Shards are managed automatically by Elasticsearch rather than by the user. An index is a logical namespace that points to primary and replica shards. We can specify the number of primary and replica shards for an index.
  • primary shard: Each document is stored in a single primary shard. When we index a document, it is indexed on the primary shard first and then on all replicas of that shard. By default an index has 5 primary shards. A different number can be specified when the index is created, but it cannot be changed afterwards.
  • replica shard: Each primary shard has zero or more replica shards. If a primary shard fails, a replica is promoted to primary, which provides failover. Replica shards also increase performance, because get and search requests can be handled by either a primary or a replica shard. By default each primary shard has one replica, and the number of replicas can be changed dynamically on an existing index. A replica shard is never started on the same node as its primary shard.
  • routing: When a document is indexed, it is stored on a single primary shard. This shard is chosen by hashing the routing value. By default the routing value is derived from the document's id or, if the document has a parent, from the parent document's id. This ensures that parent and child documents are stored on the same shard. The routing value can be overridden by specifying it at index time or in the mapping.
  • source field: It is the "_source" field of a document, which holds the original JSON document we indexed.
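
To make the term, text and analysis entries above concrete, here is a minimal sketch in plain Python. It does not use Elasticsearch at all: the `analyze`, `build_index` and `term_lookup` functions are hypothetical helpers, and the analyzer is assumed to do nothing more than lowercase the text and split it on whitespace, which is a simplification of what a real analyzer does.

```python
def analyze(text):
    # Simplified stand-in for an analyzer (an assumption for this sketch):
    # lowercase the text, then split it into terms on whitespace.
    return text.lower().split()


def build_index(docs):
    # Map each term to the set of document ids that contain it.
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_lookup(index, term):
    # A term lookup is exact: the query value is NOT analyzed.
    return index.get(term, set())


docs = {"1": "united states", "2": "United States"}
index = build_index(docs)

print(sorted(index))                          # ['states', 'united']
print(sorted(term_lookup(index, "states")))   # ['1', '2']
print(sorted(term_lookup(index, "States")))   # []
```

Both documents end up under the same two terms, and an exact lookup for 'States' finds nothing because only the lowercased term was indexed.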

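The routing entry above can be sketched the same way. This is only an illustration of "hash the routing value, then pick a shard": `pick_shard` and `routing_value` are hypothetical names, and CRC32 is used as a stand-in hash because it is deterministic and in the standard library. It is not the hash function Elasticsearch uses internally.

```python
import zlib


def routing_value(doc_id, parent_id=None, custom_routing=None):
    # An explicit routing value wins; otherwise use the parent id if the
    # document has a parent, and fall back to the document's own id.
    if custom_routing is not None:
        return custom_routing
    if parent_id is not None:
        return parent_id
    return doc_id


def pick_shard(routing, num_primary_shards):
    # Stand-in hash (assumption): CRC32 of the routing value, reduced
    # modulo the number of primary shards.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_PRIMARY_SHARDS = 5  # the default mentioned in the list above

parent_shard = pick_shard(routing_value("blog-1"), NUM_PRIMARY_SHARDS)
child_shard = pick_shard(
    routing_value("comment-7", parent_id="blog-1"), NUM_PRIMARY_SHARDS
)
print(parent_shard == child_shard)  # True: same routing value, same shard
```

Because the chosen shard depends on the number of primary shards, changing that number would send existing ids to different shards. This is one way to see why the number of primary shards is fixed once the index is created.
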
Reference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html