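There are plenty of Sphinx configuration guides online, but following them kept leading to problems, so I spent some time studying the subject and wrote up this summary for future reference. I hope it also saves others who are learning Sphinx a few detours. For installing Coreseek, see: http://blog.chinaunix.net/uid-20639775-id-3261834.html .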
1. Sphinx configuration
- Structure of the Sphinx configuration file
A Sphinx configuration file is structured as follows:
source source_name_1 {
    # Add a data source: connection parameters such as the database IP address, user name, and password
    # Also sets sql_query, sql_query_pre, sql_query_range, and so on; these are covered in detail with the example below
    ...
}
index index_name_1 {
    source = source_name_1
    # Full-text index settings
    ...
}
indexer {
    # Options for the indexer program, such as its memory limit
    ...
}
searchd {
    # Parameters of the searchd daemon itself
    ...
}
Multiple source and index blocks can be defined.
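To check which blocks a configuration file declares, the block headers can be scanned with a few lines of Python. This is only an illustrative sketch: the `list_blocks` helper and its regular expression are assumptions made for this example and are not part of Sphinx, and the script ignores everything inside the braces.

```python
import re

# Matches block headers such as "source main", "index delta : main",
# "indexer" and "searchd". The opening brace may sit on the same line
# or on the next one. Setting lines like "source = main" do not match.
HEADER = re.compile(
    r"^\s*(source|index|indexer|searchd)\b[ \t]*([\w-]+)?"
    r"(?:[ \t]*:[ \t]*([\w-]+))?[ \t]*\{?[ \t]*$",
    re.IGNORECASE,
)


def list_blocks(conf_text):
    """Return (block_type, name, parent) tuples for each block header found."""
    blocks = []
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = HEADER.match(line)
        if match:
            kind, name, parent = match.groups()
            blocks.append((kind.lower(), name, parent))
    return blocks


if __name__ == "__main__":
    sample = "source a {\n}\nsource b : a {\n}\nindex i {\n    source = a\n}\n"
    for block in list_blocks(sample):
        print(block)
```

A block that inherits from another one (written as `name : parent`) shows up with its parent in the third field; `indexer` and `searchd` have neither a name nor a parent.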
- A Sphinx configuration example, explained in detail
What follows is a complete configuration example with each setting explained:
# Define a data source
source search_main
{
    # Database type
    type                 = mysql
    # Database host (IP address or hostname)
    sql_host             = localhost
    # Database user
    sql_user             = root
    # Database password
    sql_pass             = test123
    # Database name
    sql_db               = test
    # SQL statements run after connecting and before fetching data
    sql_query_pre        = SET NAMES utf8
    sql_query_pre        = SET SESSION query_cache_type=OFF
    # Create the sph_counter table used for incremental (delta) indexing
    sql_query_pre        = CREATE TABLE IF NOT EXISTS sph_counter \
                           ( counter_id INTEGER PRIMARY KEY NOT NULL, max_doc_id INTEGER NOT NULL )
    # Before fetching, record the table's current maximum id in sph_counter
    sql_query_pre        = REPLACE INTO sph_counter SELECT 1, MAX(searchid) FROM v9_search
    # Main fetch query; the first column must be a unique positive integer document ID.
    # The upper bound is inclusive (<=), so the row whose searchid equals max_doc_id
    # belongs to the main index; the delta source takes everything above it.
    sql_query            = SELECT searchid,typeid,id,adddate,data FROM v9_search WHERE \
                           searchid<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) \
                           AND searchid>=$start AND searchid<=$end
    # sql_attr_uint and sql_attr_timestamp declare attributes that the API can filter or sort on;
    # repeat the line to declare several columns
    sql_attr_uint        = typeid
    sql_attr_uint        = id
    sql_attr_timestamp   = adddate
    # Ranged query: fetches the minimum and maximum ids that bound $start and $end
    sql_query_range      = SELECT MIN(searchid),MAX(searchid) FROM v9_search
    # Step size of the ranged query
    sql_range_step       = 1000
    # Pause between ranged query steps, in milliseconds (0 means no pause)
    sql_ranged_throttle  = 0
    # Used by the command-line search tool for debugging
    sql_query_info       = SELECT * FROM v9_search WHERE searchid=$id
}
# Define a delta (incremental) source
source search_main_delta : search_main
{
    # Overriding sql_query_pre discards all inherited sql_query_pre statements,
    # so a delta run does not update sph_counter
    sql_query_pre        = SET NAMES utf8
    # The delta source fetches only the rows added since the main index was last built.
    # A new row whose searchid is lower than the maximum recorded at that time will be missed.
    sql_query            = SELECT searchid,typeid,id,adddate,data FROM v9_search WHERE \
                           searchid>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) \
                           AND searchid>=$start AND searchid<=$end
    sql_query_range      = SELECT MIN(searchid),MAX(searchid) FROM v9_search WHERE \
                           searchid>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

# Define the index_search_main index
index index_search_main
{
    # Source for this index
    source               = search_main
    # Path where the generated index files are stored
    path                 = /usr/local/coreseek/var/data/index_search_main
    # Document info storage mode; extern stores the document info (attributes)
    # separately from the full-text index data
    docinfo              = extern
    # Memory locking of cached data; 0 means no locking
    mlock                = 0
    # List of morphology processors; none disables them all
    morphology           = none
    # Minimum length of an indexed word
    min_word_len         = 1
    # Character set type; UTF-8 here, matching the database
    charset_type         = zh_cn.utf-8
    # Directory holding the word segmentation dictionary
    charset_dictpath     = /usr/local/mmseg3/etc
    # File listing the stopwords (words excluded from search)
    stopwords            = /usr/local/coreseek/var/data/stopwords.txt
    # Whether to strip HTML markup from the incoming full text
    html_strip           = 0
}
# Define the delta index
index index_search_main_delta : index_search_main
{
    source               = search_main_delta
    path                 = /usr/local/coreseek/var/data/index_search_main_delta
}

# indexer options
indexer
{
    # Memory limit for the indexing process
    mem_limit            = 512M
}

# searchd daemon options
searchd
{
    # IP address and port to listen on
    #listen              = 127.0.0.1
    #listen              = 172.16.88.100:3312
    listen               = 3312
    listen               = /var/run/searchd.sock
    # Location of the daemon log
    log                  = /usr/local/coreseek/var/log/searchd.log
    # Location of the query log
    query_log            = /usr/local/coreseek/var/log/query.log
    # Read timeout for network client requests, in seconds
    read_timeout         = 5
    # Maximum number of child processes
    max_children         = 300
    # Name of the searchd pid file
    pid_file             = /usr/local/coreseek/var/log/searchd.pid
    # Maximum number of matches the daemon keeps in memory per index and returns to the client
    max_matches          = 100000
    # Enable seamless rotation, which keeps searchd from stalling while it rotates
    # indexes that need a lot of data precached.
    # Queries can be served at any moment, from either the old index or the new one.
    seamless_rotate      = 1
    # Force all index files to be pre-opened at startup
    preopen_indexes      = 1
    # Delete the .old index copies after a successful rotation
    unlink_old           = 1
    # Size of the shared pool that holds in-memory updates to multi-valued attributes (MVA)
    mva_updates_pool     = 1M
    # Maximum allowed packet size
    max_packet_size      = 32M
    # Maximum allowed number of filters
    max_filters          = 256
    # Maximum allowed number of values per filter
    max_filter_values    = 4096
}
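The ranged fetch and the main/delta split in the configuration above can be illustrated with a small Python sketch. It is not Sphinx code: `range_windows` and `split_main_delta` are hypothetical helpers that only mimic the behaviour. The sketch assumes ranges are inclusive at both ends, that the main source takes ids up to and including `max_doc_id`, and that the delta source takes ids above it.

```python
def range_windows(min_id, max_id, step):
    """Yield inclusive (start, end) pairs, one per ranged query step.

    min_id and max_id stand for the two values returned by sql_query_range,
    step for sql_range_step; each pair is what $start and $end would be.
    """
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)
        yield start, end
        start = end + 1


def split_main_delta(ids, max_doc_id):
    """Partition document ids around the counter stored in sph_counter."""
    main = [i for i in ids if i <= max_doc_id]   # rows indexed by the main source
    delta = [i for i in ids if i > max_doc_id]   # rows added after the main build
    return main, delta


if __name__ == "__main__":
    for start, end in range_windows(1, 2345, 1000):
        print("fetch rows with searchid between", start, "and", end)
    print(split_main_delta([1, 5, 10, 11, 12], max_doc_id=10))
```

With ids from 1 to 2345 and a step of 1000, the fetch query runs three times. Every id falls into exactly one of the two lists returned by `split_main_delta`, which is the property the main and delta sources need.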
2. Sphinx administration
- Build the Chinese word segmentation dictionary (newer versions ship it prebuilt in /usr/local/mmseg3/etc)
cd /usr/local/mmseg3/etc
/usr/local/mmseg3/bin/mmseg -u unigram.txt
mv unigram.txt.uni uni.lib
- Build the Chinese synonym thesaurus
# With a thesaurus, a search for 深圳 (Shenzhen) also returns entries containing terms such as 深圳湾 (Shenzhen Bay)
/data/software/sphinx/coreseek-3.2.14/mmseg-3.2.14/script/build_thesaurus.py unigram.txt > thesaurus.txt
/usr/local/mmseg3/bin/mmseg -t thesaurus.txt
Put thesaurus.lib in the same directory as uni.lib.
- Build all indexes
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --all
If the searchd daemon is already running, add the --rotate option:
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --all --rotate
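The choice between these two commands can be automated. The sketch below builds the indexer argument list and appends --rotate only when the searchd pid file exists. Treating the presence of the pid file as "searchd is running" is a simplifying assumption (a stale pid file would give a false positive), and `build_all_command` is a hypothetical helper name. The paths are the ones used in this article.

```python
import os

INDEXER = "/usr/local/coreseek/bin/indexer"
CONFIG = "/usr/local/coreseek/etc/sphinx.conf"
PID_FILE = "/usr/local/coreseek/var/log/searchd.pid"  # pid_file from the searchd block


def build_all_command(pid_file=PID_FILE):
    """Return the argv list that rebuilds all indexes.

    --rotate is appended only if the pid file exists, which this sketch
    takes as a sign that searchd is currently running.
    """
    argv = [INDEXER, "--config", CONFIG, "--all"]
    if os.path.exists(pid_file):
        argv.append("--rotate")
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_all_command()))
```

To actually run the command, the returned list can be passed to `subprocess.run(..., check=True)`.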
- Start the searchd daemon
/usr/local/coreseek/bin/searchd --config /usr/local/coreseek/etc/sphinx.conf
- Rebuild the main index
Put this in a shell script and add it to crontab so that the main index is rebuilt every day at 1 a.m.
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main
- Rebuild the delta index
Put this in a shell script and add it to crontab so that it runs every 10 minutes
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main_delta
- Merge the delta index into the main index
Put this in a shell script and add it as a scheduled task that runs every 15 minutes
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --merge index_search_main index_search_main_delta --rotate
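The three scheduled jobs above (main rebuild, delta rebuild, merge) can be collected in one place. The following is a sketch under assumptions: the `JOBS` table and the `job_command` and `crontab_line` helpers are invented for illustration, and the crontab entries call indexer directly rather than through a wrapper script. Only the indexer arguments and the schedules are taken from this article.

```python
INDEXER = "/usr/local/coreseek/bin/indexer"
CONFIG = "/usr/local/coreseek/etc/sphinx.conf"

# Job name -> (cron schedule, indexer arguments), as described in this article:
# main rebuild daily at 01:00, delta every 10 minutes, merge every 15 minutes.
JOBS = {
    "main": ("0 1 * * *", ["--rotate", "index_search_main"]),
    "delta": ("*/10 * * * *", ["--rotate", "index_search_main_delta"]),
    "merge": ("*/15 * * * *",
              ["--merge", "index_search_main", "index_search_main_delta", "--rotate"]),
}


def job_command(job):
    """Return the indexer argv list for a named job."""
    _, args = JOBS[job]
    return [INDEXER, "--config", CONFIG] + args


def crontab_line(job):
    """Return one crontab entry: five schedule fields followed by the command."""
    schedule, _ = JOBS[job]
    return schedule + " " + " ".join(job_command(job))


if __name__ == "__main__":
    for name in JOBS:
        print(crontab_line(name))
```

Running the script prints three lines that can be pasted into `crontab -e`.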
- Search an index from the command line with the search tool
/usr/local/coreseek/bin/search --config /usr/local/coreseek/etc/sphinx.conf 游戏
3. Reference articles:
http://baobeituping.iteye.com/blog/870354
http://www.sphinxsearch.org/sphinx-tutorial
http://blog.s135.com/post/360/
http://youngerblue.iteye.com/blog/1513140
