Monday, January 19, 2015

Getting Started with Elasticsearch


  • Download link
  • Run
    • Run bin/elasticsearch on Unix
    • Run bin/elasticsearch.bat on Windows 
  • Run 'curl -X GET http://localhost:9200/' in a command prompt to verify that the node responds. If cURL is not installed yet, set it up first.
  • By default the node starts in a cluster named 'elasticsearch'.
  • To improve the performance of Elasticsearch, we should do the following:
    • Increase JVM memory.
    • Increase number of open file descriptors.
    • Increase virtual memory.
    • Disable swapping.
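
The startup check above can be scripted. The following is a minimal sketch, assuming cURL is installed and the node listens on the default http://localhost:9200/; the function name wait_for_node and its retry defaults are illustrative, not part of Elasticsearch.

```shell
# Poll the node's root endpoint until it answers or the attempts run out.
# Arguments (all optional, names are illustrative):
#   $1 URL to poll            (default http://localhost:9200/)
#   $2 number of attempts     (default 10)
#   $3 seconds between tries  (default 2)
wait_for_node() {
    url="${1:-http://localhost:9200/}"
    attempts="${2:-10}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s hides progress output, -f turns HTTP errors into a non-zero exit
        if curl -s -f --max-time 5 -X GET "$url" >/dev/null 2>&1; then
            echo "node is up at $url"
            return 0
        fi
        # Only wait if another attempt follows
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "no response from $url after $attempts attempts" >&2
    return 1
}
```

After starting the node in another terminal, calling 'wait_for_node' with no arguments polls the default address and returns a non-zero status if the node never answers.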
Configurations:
  • By default the process runs in the foreground. Pass '-d' to run it in the background; older releases used '-f' to run it in the foreground.
  • JVM options (-X) and Elasticsearch settings (-D) can be passed on the command line when starting a node; they override the defaults from JAVA_OPTS or ES_JAVA_OPTS.
    • Example: 'bin/elasticsearch -Xmx2g -Xms2g -Des.index.store.type=memory --node.name=my-node'
    • -Xmx sets the maximum size of the memory allocation pool (heap) of the Java Virtual Machine (JVM).
    • -Xms sets the initial size of the memory allocation pool (heap) of the JVM.
    • -Xmx1024k - 1024 kilobytes
    • -Xmx512m - 512 MB
    • -Xmx8g - 8 GB
  • ES_HEAP_SIZE sets the heap memory allocated to the Elasticsearch Java process; it applies the same value to both ES_MIN_MEM and ES_MAX_MEM. The minimum and maximum can also be set separately through ES_MIN_MEM and ES_MAX_MEM.
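
To make the size suffixes above concrete, here is a small sketch that converts a value such as 512m into bytes. The helper name heap_to_bytes is hypothetical; it only mirrors the k/m/g convention of the -Xmx and -Xms flags and does not validate its input.

```shell
# Convert a JVM-style size (e.g. 1024k, 512m, 8g) to bytes.
# A value without a suffix is treated as a plain byte count.
heap_to_bytes() {
    value="$1"
    # Strip one trailing unit letter, if present
    number="${value%[kKmMgG]}"
    case "$value" in
        *[kK]) echo $((number * 1024)) ;;
        *[mM]) echo $((number * 1024 * 1024)) ;;
        *[gG]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;
    esac
}

heap_to_bytes 1024k   # 1048576
heap_to_bytes 512m    # 536870912
heap_to_bytes 8g      # 8589934592
```

Setting -Xms and -Xmx to the same value, as in the example command above, keeps the heap at a fixed size from startup.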

System Configurations:
  • file descriptors
    • Set maximum file descriptors.
      • On Windows, the limit set through '_setmaxstdio' defaults to 512 and can be raised to at most 2048.
      • The recommended limit is 32k or even 64k.
    • To check the limit available to the process, start the node with '-Des.max-open-files=true', which prints the maximum number of open files at startup.
    • Alternatively, use 'curl localhost:9200/_nodes/process?pretty'.
  • virtual memory
    • The operating system's default limit on mmap counts is usually too low and should be raised.
      • Run 'sysctl -w vm.max_map_count=262144' on Linux.
        • To make the change permanent, set 'vm.max_map_count=262144' in '/etc/sysctl.conf'.
  • memory settings
    • swap
      • By default Linux swaps out memory that is not in active use, which leads to poor node stability, so swapping should be disabled.
      • There are three options for dealing with swap:
        • Disable swap completely.
          • sudo swapoff -a
          • Permanent Setting: comment out lines for 'swap' in '/etc/fstab'
        • Set 'vm.swappiness' to 0. The kernel then avoids swapping under normal circumstances but can still swap under emergency conditions.
        • Use mlockall, which locks the process address space into RAM and prevents it from being swapped out.
          • Set 'bootstrap.mlockall: true' in 'config/elasticsearch.yml'.
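
The system settings above can be checked before starting a node. This is a rough sketch for Linux only: the helper names meets_minimum and report_limits are made up for illustration, 32000 is used as an assumed reading of the "32k" recommendation, and the values are read from 'ulimit -n' and the /proc/sys/vm files.

```shell
# Succeed when a limit is "unlimited" or at least the given minimum.
meets_minimum() {
    actual="$1"
    minimum="$2"
    [ "$actual" = "unlimited" ] || [ "$actual" -ge "$minimum" ]
}

# Print the current file descriptor, mmap count and swappiness settings.
report_limits() {
    fd_limit="$(ulimit -n)"
    if meets_minimum "$fd_limit" 32000; then
        echo "open files: $fd_limit (ok)"
    else
        echo "open files: $fd_limit (below 32000)"
    fi

    # The /proc/sys/vm files exist on Linux only
    if [ -r /proc/sys/vm/max_map_count ]; then
        map_count="$(cat /proc/sys/vm/max_map_count)"
        if meets_minimum "$map_count" 262144; then
            echo "vm.max_map_count: $map_count (ok)"
        else
            echo "vm.max_map_count: $map_count (below 262144)"
        fi
    fi

    if [ -r /proc/sys/vm/swappiness ]; then
        echo "vm.swappiness: $(cat /proc/sys/vm/swappiness)"
    fi
}
```

Running 'report_limits' as the user that starts Elasticsearch shows the limits that user's shell actually has, which may differ from the limits of root.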
Elasticsearch Settings:
  • Configuration files are found under 'ES_HOME/config'.
    • 'elasticsearch.yml' configures the different Elasticsearch modules.
    • 'logging.yml' configures Elasticsearch logging.
  • Paths for logs and data (path)
    • Usage: path.logs = 'path for logs', path.data = 'path for data'
    • Usage in commands: "-Des.path.logs=/var/log/elasticsearch"
    • path:
          logs: /var/log/elasticsearch
          data: /var/data/elasticsearch
  • Cluster Name (cluster)
    • Usage: cluster.name = 'name of your cluster'
    • Usage in commands: "-Des.cluster.name='name of your cluster'"
    • cluster:
          name: <NAME OF YOUR CLUSTER>
  • Node Name (node): if no name is set, a random Marvel character name is assigned at startup.
    • Usage: node.name = 'name of your node'
    • Usage in commands: "-Des.node.name='name of your node'"
    • node:
          name: <NAME OF YOUR NODE>
  • The configuration uses YAML by default. JSON can be used instead if necessary, in which case the node name is written as:
    • {
          "node" : {
              "name" : "NAME OF YOUR NODE"
          }
      }
  • An external configuration file can be specified with '-Des.config=/path/to/config/file'.
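
The settings above can be collected into one file. The sketch below writes them in the flat dotted form (for example 'cluster.name: ...'), which is equivalent to the nested YAML shown in the list. The function name write_es_config and the sample values are placeholders, not Elasticsearch defaults.

```shell
# Write a minimal Elasticsearch configuration file.
# Arguments: target file, cluster name, node name, log path, data path.
write_es_config() {
    target="$1"
    cluster="$2"
    node="$3"
    logs="$4"
    data="$5"
    cat > "$target" <<EOF
cluster.name: $cluster
node.name: $node
path.logs: $logs
path.data: $data
EOF
}

# Example with placeholder values, written to a temporary file
config_file="$(mktemp)"
write_es_config "$config_file" my-cluster my-node /var/log/elasticsearch /var/data/elasticsearch
cat "$config_file"
```

The resulting file can be copied to 'ES_HOME/config/elasticsearch.yml' or passed to the node as an external configuration file.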
index settings
  • Indices can be stored in memory or on the file system. The default is file-system storage; memory storage can be selected by passing a YAML or JSON parameter.
    • Usage in commands: "-Des.index.store.type=memory"
logging
  • Elasticsearch uses log4j and supports the YAML, JSON and properties formats. If multiple logging configuration files are present, they are all merged.
  • Files to be merged must use the prefix: logging.
  • Supported suffixes: .yml, .yaml, .json, .properties
  • Folder contains required java packages. 
multiple data paths
  • Several data directories can be listed, either comma separated or as an array:
    • path.data: /mnt/first,/mnt/second
    • path.data: ["/mnt/first", "/mnt/second"]
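
As a small illustration of the comma-separated form, the sketch below joins any number of directories into a single 'path.data' line. The helper name data_path_setting is hypothetical.

```shell
# Build a comma-separated path.data line from the given directories.
data_path_setting() {
    joined=""
    for dir in "$@"; do
        if [ -z "$joined" ]; then
            joined="$dir"
        else
            joined="$joined,$dir"
        fi
    done
    echo "path.data: $joined"
}

data_path_setting /mnt/first /mnt/second   # path.data: /mnt/first,/mnt/second
```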