Open Source Search Engines in Java

    Compass

    The Compass Framework is a first-class open source Java framework that brings the power of search-engine semantics to your application stack declaratively. Built on top of the excellent Lucene search engine, Compass integrates seamlessly with popular development frameworks such as Hibernate and Spring. It adds search capability to your application's data model and keeps the index synchronised with the datasource. With Compass you write less code and find data more quickly.
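
    To give a flavour of the programming model, here is a rough sketch of typical Compass usage based on its documented core API. The Book class, the index location and the exact class names are illustrative assumptions and may differ between Compass versions.

    import org.compass.annotations.Searchable;
    import org.compass.annotations.SearchableId;
    import org.compass.annotations.SearchableProperty;
    import org.compass.core.Compass;
    import org.compass.core.CompassHits;
    import org.compass.core.CompassSession;
    import org.compass.core.CompassTransaction;
    import org.compass.core.config.CompassConfiguration;

    public class CompassSketch {

        // An assumed, illustrative domain class mapped with Compass annotations.
        @Searchable
        public static class Book {
            @SearchableId
            private String id;
            @SearchableProperty
            private String title;

            public Book() {}
            public Book(String id, String title) { this.id = id; this.title = title; }
        }

        public static void main(String[] args) {
            Compass compass = new CompassConfiguration()
                    .setConnection("target/index")   // file-system location of the Lucene index (assumed)
                    .addClass(Book.class)
                    .buildCompass();

            CompassSession session = compass.openSession();
            CompassTransaction tx = session.beginTransaction();

            session.save(new Book("1", "Lucene in Action")); // index a domain object
            CompassHits hits = session.find("lucene");       // free-text search over mapped objects
            System.out.println("hits: " + hits.length());

            tx.commit();
            session.close();
            compass.close();
        }
    }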

    Go To Compass
    Oxyus

    Oxyus Search Engine is a Java-based application for indexing web documents for searching from an intranet or the Internet, similar to other proprietary search engines in the industry. Oxyus has a web module that presents search results to clients through web browsers using Java Server Pages that access a JDBC repository through JavaBeans.

    Go To Oxyus
    BDDBot

    BDDBot is a web robot, search engine, and web server written entirely in Java. It was written as an example for a chapter on how to write your own search engine, and as such it is very simplistic.

    Go To BDDBot
    Egothor

    Egothor is an open source, high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. It can be configured as a standalone engine, metasearcher, or peer-to-peer HUB, and, moreover, it can be used as a library by an application that needs full-text search.

    Go To Egothor
    Nutch

    Nutch is a nascent effort to implement an open-source web search engine. Nutch provides a transparent alternative to commercial web search engines.

    Go To Nutch
    Lucene

    Jakarta Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
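
    A minimal sketch of the core Lucene indexing and search API follows. The class names match recent Lucene releases and differ in older Jakarta-era versions; the index path, field name and sample text are illustrative.

    import java.nio.file.Paths;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class LuceneSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(Paths.get("lucene-index"));
            StandardAnalyzer analyzer = new StandardAnalyzer();

            // Index a single document with one stored, tokenised full-text field.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new TextField("content", "Lucene is a full-text search library", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Parse a query against the same field and print the matching documents.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new QueryParser("content", analyzer).parse("search library");
                for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                    System.out.println(searcher.doc(hit.doc).get("content"));
                }
            }
        }
    }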

    Go To Lucene
    Zilverline

    Zilverline is what you could call a 'Reverse Search Engine'.

    It indexes documents from your local disks (and UNC-path-style network disks) and lets you search through them locally or, if you're away from your machine, through a web server running on it.

    Zilverline supports collections. A collection is a set of files and directories in a directory. PDF, Word, txt, java, CHM and HTML files are supported, as well as zip and rar archives. A collection can be indexed and searched. The results of a search can be retrieved from local disk or remotely, if you run a web server on your machine. Files inside zip, rar and chm files are extracted, indexed and can be cached. The cache can be mapped to sit behind your web server as well.

    Go To Zilverline
    YaCy

    YaCy is a distributed web crawler and also a caching HTTP proxy. Its online interface lets you configure your personal settings, proxy settings, access control and crawling properties. You can also use it to start crawls, send messages to other peers, and monitor your index, cache status and crawling processes. Most importantly, the search page lets you search either your own index or the global index.

    Go To YaCy
    Lius

    LIUS - Lucene Index Update and Search
    LIUS is an indexing Java framework based on the Jakarta Lucene project. The LIUS framework adds to Lucene the ability to index many file formats, such as: MS Word, MS Excel, MS PowerPoint, RTF, PDF, XML, HTML, TXT, the OpenOffice suite and JavaBeans.
    LIUS is very easy to use; all indexing configuration (types of files to be indexed, fields, etc.) as well as searching is defined in an XML file, so the user only has to write a few lines of code to carry out indexing or searching.

    LIUS has been developed from a range of Java technologies and fully open source applications.

    Go To Lius
    Solr

    Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP APIs, caching, replication, and a web administration interface.
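
    Because Solr exposes search over plain XML/HTTP, it can be queried with nothing but the JDK. The sketch below assumes a default local installation; the host, port 8983, core name and query field are assumptions to adjust for your setup.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    public class SolrQuerySketch {
        public static void main(String[] args) throws Exception {
            // Assumed default Solr location and core name.
            String baseUrl = "http://localhost:8983/solr/mycore/select";
            String q = URLEncoder.encode("title:lucene", StandardCharsets.UTF_8.name());

            // wt=xml asks Solr for an XML response; wt=json is also available.
            URL url = new URL(baseUrl + "?q=" + q + "&wt=xml&rows=10");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);   // raw XML response from Solr
                }
            }
        }
    }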

    Go To Solr
    regain

    'regain' is a fast search engine on top of Jakarta Lucene. It crawls through files or web pages using a plugin architecture of preparators for several file formats and data sources. Search requests are handled via a browser-based user interface using Java Server Pages. regain is released under the LGPL and comes in two versions:

    1. a standalone desktop search program including a crawler and HTTP server
    2. a server-based installation providing full-text search functionality for a website or intranet file server, using XML configuration files.

    Go To regain
    MG4J

    MG4J (Managing Gigabytes for Java) is a collaborative effort aimed at providing a free Java implementation of inverted-index compression techniques; as a by-product, it offers several general-purpose optimised classes, including fast and compact mutable strings, bit-level I/O, fast unsynchronised buffered streams, and (possibly signed) minimal perfect hashing. MG4J functions as a full-fledged text-indexing system that can analyze, index, and query very large document collections.
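
    As an illustration of the general-purpose classes mentioned above, here is a small sketch that uses MG4J's bit-level I/O streams to write and read gamma-coded integers, the kind of coding used to compress inverted-index gaps. The package names have moved between MG4J and its companion dsiutils library across releases, so treat the imports as assumptions.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;

    import it.unimi.dsi.io.InputBitStream;
    import it.unimi.dsi.io.OutputBitStream;

    public class BitStreamSketch {
        public static void main(String[] args) throws Exception {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();

            // Write a few small gaps using Elias gamma coding.
            OutputBitStream obs = new OutputBitStream(bytes);
            for (int gap : new int[] { 3, 1, 7, 2 }) {
                obs.writeGamma(gap);
            }
            obs.close();

            // Read the same gaps back from the compressed bit stream.
            InputBitStream ibs = new InputBitStream(new ByteArrayInputStream(bytes.toByteArray()));
            for (int i = 0; i < 4; i++) {
                System.out.println(ibs.readGamma());
            }
            ibs.close();
        }
    }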

    Go To MG4J
    Piscator

    Piscator is a small SQL/XML search engine. Once an XML feed is loaded, it can be queried using plain SQL. The setup is almost identical to the DB2 side tables approach.

    Go To Piscator
    Hounder

    Hounder is a simple and complete search system. Out of the box, Hounder crawls the web, targeting only those documents of interest, and presents them through a simple search web page and through an API, ideal for integration into other projects. It is designed to scale on all fronts: the number of indexed pages, the crawling speed and the number of simultaneous search queries. It is in use in many large-scale search systems.

    Go To Hounder
    HSearch
    HSearch is an open source, NoSQL Search Engine built on Hadoop and HBase. HSearch features include:



     * Multiple document formats

     * Record and document level search access control

     * Continuous index updating

     * Parallel indexing using multiple machines

     * Embeddable application

     * A REST-ful Web service gateway that supports XML

     * Auto sharding

     * Auto replication


Reposted from: http://www.cnblogs.com/lexus/archive/2012/03/07/2383599
基于snowflake雪花算法分布式ID生成器 snowflake雪花算法分布式ID生成器几大特点: 41bit的时间戳可以支持该算法使用到2082年 10bit的工作机器id可以支持1024台机器 序列号支持1毫秒产生4096个自增序列id 整体上按照时间自增排序 整个分布式系统内不会产生ID碰撞 每秒能够产生26万ID左右 Twitter的 Snowflake分布式ID生成器的JAVA实现方案