elasto mania
@about_andrefs
2014
.
.
.
what is it?
...
.
Elasticsearch is a flexible and powerful
open source, distributed, real-time search
and analytics engine.
elasticsearch.org/overview/
.
.
.
talk disclaimers
• introduction to ES (sorry, no heavy stuff)
• focused on Elasticsearch itself (not so much
on integration with Kibana, Logstash, etc)
• heavily based on Andrew Cholakian’s book
Exploring Elasticsearch
• Tiririca method
• not all disclaimers have necessarily been
disclaimed
.
.
.
getting
started
.
.
.
buzzword driven slide
• real time analytics
• conflict management
• per-operation
persistence
• document oriented
• built on top of
Apache Lucene™
• Apache 2 Open
Source License
• real time data
• distributed
• multi-tenancy
• RESTful API
• schema free
• full text search
• high availability
.
.
.
use cases
• search a large number of product descriptions for
a specific phrase and return the best results
• search for words that sound like a given word
• auto-complete a search box based on previously issued
searches, allowing for misspellings
• store large quantities of semi-structured (JSON)
data in a distributed fashion, with redundancy
.
.
.
don’t use cases
• calculate how many items are left in an inventory
• figure out the sum of all items in a given month’s
invoices
• execute operations transactionally with rollback
support
• guarantee item uniqueness across multiple fields
.
.
.
history
2004: Shay Banon creates Compass (Java
search engine framework)
2009: big parts of Compass would need to
be rewritten to release a third version
focused on scalability
Feb 2010: Elasticsearch 0.4.0
Mar 2012: Elasticsearch 0.19.0
Apr 2013: Elasticsearch 0.90.0
Feb 2014: Elasticsearch 1.0.0
Mar 2014: Elasticsearch 1.1.0
.
.
.
the basics
.
.
.
JSON over HTTP
• primary data format for ES is JSON
• main protocol consists of HTTP requests with
JSON payload
• _id is unique, and generated automatically if
unassigned
• internally, JSON is converted to flat fields for
Lucene’s key/value API
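The flattening mentioned in the last bullet can be pictured with a small Python sketch. It is an illustration only: the function name `flatten` and the dotted-path naming are assumptions made for this example, not a description of what Elasticsearch or Lucene do internally.

```python
def flatten(doc, prefix=""):
    """Flatten a nested JSON-like dict into dotted field names mapped to lists of values."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects contribute their leaves under "parent.child" names.
            for sub_path, sub_values in flatten(value, path + ".").items():
                flat.setdefault(sub_path, []).extend(sub_values)
        elif isinstance(value, list):
            # Arrays become multi-valued fields.
            for item in value:
                if isinstance(item, dict):
                    for sub_path, sub_values in flatten(item, path + ".").items():
                        flat.setdefault(sub_path, []).extend(sub_values)
                else:
                    flat.setdefault(path, []).append(item)
        else:
            flat.setdefault(path, []).append(value)
    return flat


song = {
    "title": "The Vampyre of Time and Memory",
    "album": {"title": "...Like Clockwork", "year": 2013},
    "genres": ["alternative rock", "piano rock"],
}
flat_song = flatten(song)
```

Here `flat_song` has one entry per leaf field, for example `album.year`, and the `genres` array ends up as a single field with two values.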
.
.
.
mnemonic
relational DB        Elasticsearch
database             index
table                type
schema definition    mapping
column               field
row                  document
elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html
.
.
.
documents
• like a row in a table in an RDB
• JSON objects
• each is stored in an index, has a type and an
id
• each contains zero or more fields
.
.
.
sample document
PUT /music/songs/1
{
  "_id" : 1,
  "title" : "The Vampyre of Time and Memory",
  "author" : "Queens of the Stone Age",
  "album" : {
    "title" : "...Like Clockwork",
    "year" : 2013,
    "track" : 3
  },
  "genres" : ["alternative rock", "piano rock"]
}
.
.
.
fields
• key-value pairs
• value can be a scalar or a nested structure
• each field has a type, defined in a mapping
.
.
.
types
type         definition
string       text
integer      32-bit integers
long         64-bit integers
float        IEEE floats
double       double precision floats
boolean      true or false
date         UTC Date/Time
geo_point    latitude/longitude
null         the value null
array        any field
object       type omitted, properties field
nested       separate document
.
.
.
mapping
• defines the types of a document’s fields
• and the way they are indexed
• scopes _ids (documents with different types
may have identical _ids)
• defines a bunch of index-wide settings
• can be defined explicitly or automatically
when a document is indexed
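A rough Python sketch of the "defined automatically" case, i.e. guessing a mapping from the first document seen. The functions `infer_type` and `infer_mapping` are hypothetical names for this example, and the rules are a simplification: JSON integers are assumed to become `long` and floating point numbers `double`, and date detection is left out entirely.

```python
def infer_type(value):
    """Guess a field mapping for a JSON value (simplified; no date detection)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        # Arrays have no type of their own; use the first element's type.
        return infer_type(value[0]) if value else None
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}
    return None  # null carries no type information


def infer_mapping(doc):
    """Build a "properties" dict for every field whose type can be guessed."""
    properties = {}
    for field, value in doc.items():
        guessed = infer_type(value)
        if guessed is not None:
            properties[field] = guessed
    return properties


guessed_mapping = infer_mapping({
    "title": "The Vampyre of Time and Memory",
    "album": {"title": "...Like Clockwork", "year": 2013},
    "genres": ["alternative rock", "piano rock"],
})
```

An explicit mapping is still the safer choice when the guessed types are not the ones you want, since the guess depends on whichever document happens to be indexed first.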
.
.
.
sample mapping
PUT /music/songs/_mapping
{
  "songs" : {
    "properties" : {
      "title" : { "type" : "string" },
      "author" : { "type" : "string" },
      "album" : {
        "properties" : {
          "title" : { "type" : "string" },
          "year" : { "type" : "integer" },
          "track" : { "type" : "integer" }
        }
      },
      "genres" : { "type" : "string" }
    }
  }
}
.
.
.
indexes
• like a database in an RDB
• has a mapping which defines types
• logical namespace
• maps to one or more primary shards
• can have zero or more replica shards
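How a document ends up on one of an index's primary shards is commonly described as hashing a routing value (by default the document id) and taking it modulo the number of primary shards. A minimal Python sketch of that idea follows; `pick_shard` is a hypothetical helper, and `zlib.crc32` is only a stand-in hash chosen for the example, not the function Elasticsearch uses.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Map a routing value to a shard number in [0, number_of_primary_shards)."""
    # Stand-in hash for illustration; the real hash function differs.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % number_of_primary_shards


shard_for_doc_1 = pick_shard("1", 5)
```

The modulo also hints at why the number of primary shards is awkward to change after the fact: a different divisor would send the same id to a different shard.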
.
.
.
CRUD I
PUT /music

PUT /music/songs/_mapping
{
  "songs" : {
    "properties" : {
      ...
    }
  }
}
.
.
.
CRUD II
PUT /music/songs/1
{
  "title" : "The Vampyre of Time and Memory",
  ...
}

GET /music/songs/1

POST /music/songs/1/_update
{ "doc" : { "year" : 2014 }}

DELETE /music/songs/1
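The same four calls can be prepared from Python with nothing but the standard library. This is a sketch under assumptions: `BASE_URL` points at a local node on port 9200, `build_request` is a helper invented for this example, and the requests are only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def build_request(method, path, body=None):
    """Create (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


requests_to_send = [
    build_request("PUT", "/music/songs/1", {"title": "The Vampyre of Time and Memory"}),
    build_request("GET", "/music/songs/1"),
    build_request("POST", "/music/songs/1/_update", {"doc": {"year": 2014}}),
    build_request("DELETE", "/music/songs/1"),
]

# To actually send one of them (needs a running node):
#   with urllib.request.urlopen(requests_to_send[1]) as response:
#       print(json.load(response))
```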
.
.
.
search
.
.
.
search fundamentals
1. boolean search
2. scoring
.
.
.
ES Search API
Includes:
• Query DSL
• Filter API
• Facet API
• Sort API
• …
Endpoints:
• /index/_search
• /index/type/_search
.
.
.
filters
• filtered queries: nested in the query field; affect
both query results and facet counts
• top-level filters: specified at the root of the search
request; affect only the returned hits, not facet counts
• facet-level filters: pre-filter the data before it is
aggregated; affect only one specific facet
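The difference between a filtered query and a top-level filter is easiest to see on facet counts. The Python sketch below imitates it in memory; the people, their hobbies and the helper names are all invented for the example, and no Elasticsearch call is involved.

```python
from collections import Counter

people = [
    {"handle": "ana", "hobbies": ["coding", "chess"]},
    {"handle": "rui", "hobbies": ["coding"]},
    {"handle": "eva", "hobbies": ["surfing"]},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of a multi-valued field."""
    return Counter(value for doc in docs for value in doc[field])


def likes_coding(doc):
    return "coding" in doc["hobbies"]


# Filtered query: the filter narrows the document set before facets are computed.
filtered_hits = [doc for doc in people if likes_coding(doc)]
filtered_facets = facet_counts(filtered_hits, "hobbies")

# Top-level filter: facets see every document matched by the query (here: all of them),
# and the filter is applied to the returned hits only.
query_hits = list(people)
top_level_facets = facet_counts(query_hits, "hobbies")
top_level_hits = [doc for doc in query_hits if likes_coding(doc)]
```

Both variants return the same two hits, but only the top-level variant still reports a count for "surfing".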
.
.
.
search sample I
POST /music/_search
{
  "query" : {
    "fuzzy" : { "title" : "vampires" }
  }
}
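Fuzzy matching rests on edit distance: a term matches when it can be turned into the query term with only a few single-character changes. A minimal Python sketch of the Levenshtein distance shows why "vampires" can find "vampyre"; the function name is chosen for this example, and how many edits a fuzzy query tolerates is a setting not covered here.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


distance = edit_distance("vampires", "vampyre")
```

The two words are two edits apart: substitute "i" with "y" and drop the trailing "s".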
.
.
.
search sample II
POST /planet/_search
{
  "from" : 0,
  "size" : 15,
  "query" : { "match_all" : {} },
  "sort" : { "handle" : "desc" },
  "filter" : { "term" : { "_all" : "coding" }},
  "facets" : {
    "hobbies" : {
      "terms" : { "field" : "hobbies" }
    }
  }
}
.
.
.
analysis
• performed when documents are added
• manipulates data to ensure better indexing
• 3 steps:
1. character filtering
2. tokenization
3. token filtering
• distinct analyzers for each field
• multiple analyzers for each field
• custom analyzers
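The three steps can be mimicked in a few lines of Python. This is a sketch of the pipeline's shape, not of any built-in analyzer: the `&` replacement, the letter-and-digit tokenizer and the tiny stopword list are all assumptions made for the example.

```python
import re

STOPWORDS = {"the", "of", "and"}  # assumption: a tiny stopword list for the example


def char_filter(text):
    """Step 1: clean up raw characters (here: replace '&' with 'and')."""
    return text.replace("&", " and ")


def tokenize(text):
    """Step 2: split the text into tokens on anything that is not a letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filter(tokens):
    """Step 3: normalize tokens (lowercase) and drop stopwords."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    """Run the three steps in order, as an analyzer would."""
    return token_filter(tokenize(char_filter(text)))


tokens = analyze("The Vampyre of Time & Memory")
```

Swapping any one of the three functions gives a different analyzer, which is the idea behind custom analyzers.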
.
.
.
analyzers
PUT /music/songs/_mapping
{
  "songs" : {
    "properties" : {
      "title" : {
        "type" : "string",
        "fields" : {
          "title_exact" : { "type" : "string", "index" : "not_analyzed" },
          "title_simple" : { "type" : "string", "analyzer" : "simple" },
          "title_snow" : { "type" : "string", "analyzer" : "snowball" }
        }
      },
      ...
    }
  }
}
.
.
.
highlighting
POST /publications/books/_search
{
  "query" : {
    "match" : { "text" : "spaceship" }
  },
  "fields" : ["title", "isbn"],
  "highlight" : {
    "fields" : {
      "text" : { "number_of_fragments" : 3 }
    }
  }
}
.
.
.
search phrases
POST /publications/books/_search
{
  "query" : {
    "match_phrase" : { "text" : "laser beam" }
  },
  "fields" : ["title", "isbn"],
  "highlight" : {
    "fields" : {
      "text" : { "number_of_fragments" : 3 }
    }
  }
}
.
.
.
going wild
.
.
.
aggregations
Unit of work that builds analytic information over a
set of documents
• bucketing: documents are evaluated and placed into buckets
according to previously defined criteria
• metric: keeps track of metrics which are computed over a
set of documents
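Bucketing and metrics combine naturally: first group the documents, then compute a number per group. The Python sketch below does this in memory; the song data and the helper names `bucket_by` and `average` are invented for the example and say nothing about how Elasticsearch executes aggregations.

```python
from collections import defaultdict

songs = [
    {"genre": "rock", "year": 2013, "length_s": 215},
    {"genre": "rock", "year": 2007, "length_s": 185},
    {"genre": "jazz", "year": 1959, "length_s": 565},
]


def bucket_by(docs, key):
    """Bucketing: place each document into the bucket named by one of its fields."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[key]].append(doc)
    return dict(buckets)


def average(docs, field):
    """Metric: a single number computed over a set of documents."""
    return sum(doc[field] for doc in docs) / len(docs)


buckets = bucket_by(songs, "genre")
avg_length_per_genre = {
    genre: average(members, "length_s") for genre, members in buckets.items()
}
```

Because a bucket is itself a set of documents, it can be bucketed again or given further metrics, which is what makes aggregations composable.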
.
.
.
percolations
.
.
.
more stuff
• routing
• uri search
• suggesters
• count API
• validate API
• explain API
• more like this API
• …
.
.
.
scalability
.
.
.
tools
.
.
.
Logstash
.
.
.
Kibana
.
.
.
Marvel
.
.
.
what about
now
.
.
.
new features
2014
• Apr 3rd: count
• Mar 6th: Tribe nodes
• Jan 17th: the cat API
• Jan 29th: Marvel
• Jan 21st: snapshot & restore
2013
• Sep 24th: official Elasticsearch clients for Ruby,
Python, PHP and Perl
• Nov 28th: Lucene 4.x doc values
• …
.
.
.
go read a book
• Exploring Elasticsearch, Andrew Cholakian
• Elasticsearch – The Definitive Guide,
Clinton Gormley, Zachary Tong
.
.
.
getting in touch
• https://github.com/elasticsearch
• @elasticsearch
• irc.freenode.org #elasticsearch
• irc.perl.org #elasticsearch
• http://www.elasticsearch.org/blog/
• Elasticsearch User mailing list
.
.
.
references
• Elastic Search Mega Manual
• http://solr-vs-elasticsearch.com/
• Elastic Search in Production
• Exploring Elasticsearch, Andrew Cholakian
• Elasticsearch – The Definitive Guide,
Clinton Gormley, Zachary Tong
.
.
.
job’s done
questions?
