Realtime Search: Solr vs Elasticsearch

Socialcast Engineering
Tuesday May 31st, 2011 by Ryan Sonnek
What is Elasticsearch?

Elasticsearch is a REST-based, distributed search engine powered by the excellent Lucene library. The built-in JSON + HTTP API provides an elegant platform that is easy to integrate with (for example, through the elastic_searchable Ruby gem). It’s simple, scalable and “cool, bonsai cool”.
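To give a feel for the JSON + HTTP API, here is a minimal Python sketch that builds (but does not send) an indexing request. The host localhost:9200, the index name "posts", the type name "post" and the document fields are all assumptions made up for illustration, and the /{index}/{type}/{id} path reflects the Elasticsearch API of this era, so it may differ in later versions.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build a PUT request that would index `document` as JSON.

    Nothing is sent here; pass the result to urllib.request.urlopen()
    against a running node to actually index the document.
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index, type, and document used only for illustration.
request = build_index_request(
    "http://localhost:9200", "posts", "post", 1,
    {"author": "ryan", "body": "Realtime search is fun"},
)
```

The point is that an indexing call is nothing more than a URL plus a JSON body, which is why thin client libraries are easy to write.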
Why is it better than Solr?

First of all, let’s set the record straight: Solr is fast. I’m serious…it’s really fast! Solr is the de facto search engine for a reason. It’s stable, reliable and, out of the box, it outperforms nearly every search solution for basic vanilla searches (including Elasticsearch).

Unfortunately, it is really easy to break Solr as well. All it takes is performing searches while concurrently updating the index with new content. This is a pretty serious problem if you need to update your search index regularly.

Now throw a few million documents into the index and Solr will be buckling at the knees while Elasticsearch doesn’t break a sweat!

It is painfully apparent that Solr’s architecture was not built for realtime search applications. The demands of realtime web applications require delivery of updates in near realtime as new content is generated by users. The distributed nature of Elasticsearch allows it to keep up with concurrent search + index requests without skipping a beat.
Realworld Results…

After transitioning our search infrastructure from Solr to Elasticsearch, we saw an instant ~50x improvement in search performance!
And now for something a bit more interesting…

The typical realtime search architecture goes something like this:

index user content into the search engine
perform set of queries against search engine to determine if content matches particular criteria
perform specific logic notifying registered channels that new content is available
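The three steps above can be sketched as a toy, in-memory Python model. This is not Solr or Elasticsearch code: the term-matching rule, the channel names and the document shape are hypothetical, and a real deployment would issue each saved search as a request to the search engine.

```python
def matches(document, required_terms):
    """Return True if every required term appears in the document body."""
    words = set(document["body"].lower().split())
    return all(term in words for term in required_terms)


def index_and_poll(index, saved_searches, document):
    """Index a document, then run every saved search over the whole index.

    `saved_searches` maps a channel name to the list of terms it cares about.
    Returns the channels that should be notified about `document`.
    """
    index.append(document)  # step 1: index the content
    notified = []
    for channel, terms in saved_searches.items():
        # step 2: one query per saved search, issued after every write
        hits = [doc for doc in index if matches(doc, terms)]
        if document in hits:
            notified.append(channel)  # step 3: notify the channel
    return notified


index = []
saved_searches = {"solr-watchers": ["solr"], "es-watchers": ["elasticsearch"]}
notified = index_and_poll(
    index,
    saved_searches,
    {"id": 1, "body": "Elasticsearch handles realtime search"},
)
```

Note that every write triggers one query per saved search, and each query scans the whole index. That cost is what makes this model expensive as the number of channels grows.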

Elasticsearch can support this model quite well, but it also offers a feature that turns this entire workflow on its head.
Introducing: Percolation!

Elasticsearch percolation is similar to webhooks. The idea is to have Elasticsearch notify your application when new content matches your filters instead of having to constantly poll the search engine to check for new updates.

The new workflow looks like this:

register specific query (percolation) in Elasticsearch
index new content (passing a flag to trigger percolation)
the response to the indexing operation will contain the matched percolations
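As a rough illustration of this workflow, here is a toy in-memory model in Python. It is a sketch of the idea only, not the Elasticsearch API: the class name ToyPercolator, the term-matching rule, the `percolate` argument and the shape of the response dictionary are all assumptions made for the example.

```python
class ToyPercolator:
    """In-memory stand-in for the register-then-index workflow."""

    def __init__(self):
        self.queries = {}    # percolation name -> required terms
        self.documents = []  # indexed documents

    def register(self, name, required_terms):
        """Step 1: register a named query ahead of time."""
        self.queries[name] = [t.lower() for t in required_terms]

    def index(self, document, percolate=False):
        """Steps 2 and 3: index a document and report matching queries."""
        self.documents.append(document)
        response = {"ok": True}
        if percolate:
            # Match the single incoming document against the stored queries,
            # instead of running the queries against the whole index.
            words = set(document["body"].lower().split())
            response["matches"] = sorted(
                name for name, terms in self.queries.items()
                if all(term in words for term in terms)
            )
        return response


engine = ToyPercolator()
engine.register("mentions-solr", ["solr"])
engine.register("mentions-realtime", ["realtime"])
response = engine.index({"body": "Realtime search without polling"}, percolate=True)
```

The design choice worth noticing is the inversion: queries are stored and the new document is tested against them at write time, so the application learns about matches from the indexing response and never has to poll.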

This is the perfect architecture for realtime search and a true gamechanger.
The Bottom Line

Solr may be the weapon of choice when building standard search applications, but Elasticsearch takes it to the next level with an architecture for creating modern realtime search applications. Percolation is an exciting and innovative feature that singlehandedly blows Solr right out of the water. Elasticsearch is scalable, speedy and a dream to integrate with. Adios Solr, it was nice knowing you.
Tagged: search
Comments

David says:

Cool article. Now, i know why I love ES ! ;-)
Commented on May 31, 2011
jrawlings says:

Was the ‘Search Fresh Index while Idle’ performed against an elasticsearch 5 shard index (the default setup for a newly created index) or a single shard index?
Commented on May 31, 2011
Ryan Sonnek says:

@jrawlings these benchmarks are for the “out of the box” vanilla install of Elasticsearch and Solr so yes, this is using the 5 shard index setting.
Commented on May 31, 2011
umad says:

Elasticsearch is a peach, when it doesn’t break. I’ve had so many nightmares trying to recover from a broken elasticsearch cluster that I wouldn’t recommend it to anyone.

I guess for small sites it’s ok. For serious business, I’ll stick with solr.

It would be nice to see a comparison with riaksearch as well.
Commented on May 31, 2011
Ryan Sonnek says:

@umad in our experience, the exact opposite is true. We pushed Solr so hard to try and support realtime search that we constantly had to deal with Java out of memory issues. Elasticsearch is much more stable (even for a beta application) and runs *so* much smoother.

I’m not sure what you classify as a “small” site. Our search index contains millions of documents and we’re performing hundreds of requests per minute, and Elasticsearch has not had a single hiccup yet.
Commented on June 1, 2011
Philip Ingram says:

That percolation business is awesome. Webhooks make updating realtime data sources easy, and it’s brilliant that Elasticsearch takes that approach. Thanks for sharing.
Commented on May 31, 2011
Ben says:

Good blog post. What were some of the parameters around index sizes (per shard) and commit rates? We have some massive warming times on our Solr indexes that require us to batch our adds before a commit, certainly not a position to be in with realtime search though. I can see how, without tuning and with default cache warming, you might run into bunches of overlapping warming searchers.
Commented on May 31, 2011
MarcMarc says:

And why not use a master-slave configuration in Solr? Isn’t that the perfect solution for separating add doc/query operations?
Commented on June 1, 2011
Ryan Sonnek says:

@MarcMarc master-slave really isn’t an option for realtime search applications. The current Solr replication solution is not synchronous, so once your update operation is complete on the master, the data is not yet available on all slaves for subsequent searches.

Introducing master-slave for the search index also introduces a lot of operational complexity that if you can avoid, you really should. :)
Commented on June 1, 2011
Vlad Zloteanu says:

Ryan, what was the commit strategy you used with Solr? Commit after each request, autocommit after X secs, autocommit after X docs? This can greatly impact update performance. See http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs, http://blog.raspberry.nl/2011/04/08/solr-update-performance/ and http://www.elevatedcode.com/articles/2009/01/14/speeding-up-solr-indexing/
Commented on June 1, 2011
Ryan Sonnek says:

@vlad we require all content to be immediately available for searches after indexing, so we commit after each update operation. This is the nature of the beast when building a true realtime search application and, as you point out, is not the “preferred” way to integrate with Solr.
Commented on June 1, 2011
Otis Gospodnetic says:

Nice post. You’ll need to compare ES and Solr once Solr starts making use of the underlying Lucene NRT mechanism.

Just to make it clear to readers not familiar with the underlying details:
It is Lucene that adds the NRT support. ES uses it, while Solr does not use it yet, which is different from Solr using the same Lucene API as ES and still performing poorly.
Commented on June 1, 2011
Peter Bengtsson says:

Being a Xapian fan as of many years I’d love to see Xapian benchmarked against ES.
Commented on June 1, 2011
Andy says:

What’s the difference between “search fresh index” and “search full index”?

Were you running Solr and ElasticSearch on the same hardware?
Commented on June 1, 2011
Ryan Sonnek says:

@andy the fresh index benchmarks are done against an empty/clean index. the “full index” benchmarks were done after populating the index with a few million documents. The index is never technically “full”, but it was just a quick way of getting more realistic and real world benchmarks.
Commented on June 1, 2011
db says:

Interesting that umad says he had so many issues with broken clusters, that he stopped recommending ES for production usage. We’ve been running in production for 6 months with significant traffic volume on behalf of demanding clients.

There have been some nice robustness improvements in ES 0.16

We evaluated Solr vs ES and for our data with a wide range of queries, ES was significantly faster than Solr. Tuning Solr is challenging.

David
Commented on June 7, 2011
Steven Hildreth says:

Solr doesn’t support GeoPolygons either, so if you need spatial searches look to ElasticSearch.
Commented on August 24, 2011
David says:

Field collapsing (grouping, or whatever you call it) is still awaited in ES, but exists in Solr.

In some particular use cases this is a must-have feature (think about SKUs in an index where the search results must be products, not SKUs).
Commented on September 16, 2011

Reposted from: http://www.cnblogs.com/lexus/archive/2011/10/11/2207984
2019-03-27 01:08
