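Use the Elasticsearch component to provide full-text search over the text content of Word, PDF, Excel, PowerPoint, and similar files.
Deploy the DataEngine big data platform, then add the Elasticsearch service component on 2 nodes to form an Elasticsearch cluster.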
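Before installing any plugin, you may want to confirm that both nodes have actually joined the cluster. The sketch below reads the standard `_cluster/health` endpoint and compares `number_of_nodes` with the expected count. `ES_URL` is assumed to be the node address used in the later steps, and `node_count`/`check_cluster` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: check that the expected number of ES nodes have joined the cluster.
# ES_URL is an assumption taken from the address used later in this guide.
ES_URL=${ES_URL:-http://172.168.1.179:9200}

# Extract number_of_nodes from _cluster/health JSON read on stdin
node_count() {
  sed -n 's/.*"number_of_nodes" *: *\([0-9][0-9]*\).*/\1/p'
}

# Query cluster health and compare with the expected node count (default 2)
check_cluster() {
  expected=${1:-2}
  actual=$(curl -s "$ES_URL/_cluster/health" | node_count)
  if [ "$actual" = "$expected" ]; then
    echo "cluster OK: $actual nodes"
  else
    echo "expected $expected nodes, got ${actual:-none}" >&2
    return 1
  fi
}
```

Usage: source the file, then run `check_cluster 2`. The `sed` pattern works with both the compact and the `?pretty` form of the response.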
1. Plugin:
elasticsearch-mapper-attachments-3.1.0: full-text search plugin for Word, PDF, Excel, and PowerPoint files (compatible with ES 2.1.0)
2. Install the plugin on every ES node: upload the plugin zip to the plugins directory, unzip it there, then restart the ES service.
[root@node4 ~]# cd /usr/share/elasticsearch/plugins
[root@node4 plugins]# ls
elasticsearch-analysis-ik-1.6.1 head elasticsearch-mapper-attachments-3.1.0.zip
[root@node4 plugins]# unzip elasticsearch-mapper-attachments-3.1.0.zip -d ./elasticsearch-mapper
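Because the plugin is installed by hand, it is worth checking the plugins directory before the restart. ES 2.x expects each plugin directory to have `plugin-descriptor.properties` at its top level, and a leftover file such as the uploaded `.zip` can stop a node from starting, so deleting the zip after unzipping is a reasonable precaution. The sketch below flags both cases; `PLUGIN_DIR` defaults to the path above and `check_plugins` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: sanity-check an ES 2.x plugins directory after a manual unzip.
# PLUGIN_DIR defaults to the path used above; override it to check elsewhere.
PLUGIN_DIR=${PLUGIN_DIR:-/usr/share/elasticsearch/plugins}

check_plugins() {
  status=0
  for entry in "$PLUGIN_DIR"/*; do
    [ -e "$entry" ] || continue   # empty directory: the glob did not match
    name=$(basename "$entry")
    if [ -f "$entry" ]; then
      # A plain file (e.g. the uploaded .zip) is not a plugin directory
      echo "WARN stray file: $name"
      status=1
    elif [ ! -f "$entry/plugin-descriptor.properties" ]; then
      # Descriptor missing, e.g. the zip unpacked into a nested folder
      echo "WARN no descriptor: $name"
      status=1
    else
      echo "OK $name"
    fi
  done
  return $status
}
```

Run `check_plugins` on each node; a non-zero exit status means something in the directory should be fixed before restarting ES.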
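3. Create the index: open the ES web UI (the head plugin), go to the compound query (复合查询) tab, and send the request below with method PUT.
URL: http://172.168.1.179:9200/smartsegmentation/
After you click Submit Request (提交请求), a response containing "acknowledged": true means the index was created.
Request body: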
{
  "mappings": {
    "parseword": {
      "properties": {
        "parse_file": {
          "type": "attachment",
          "fields": {
            "content": {
              "type": "string",
              "term_vector": "with_positions_offsets",
              "analyzer": "ik_smart",
              "store": true
            },
            "content_type": {
              "type": "string",
              "store": true
            },
            "name": {
              "type": "string",
              "store": true
            }
          }
        }
      }
    },
    "parsepdf": {
      "properties": {
        "parse_file": {
          "type": "attachment",
          "fields": {
            "content": {
              "type": "string",
              "term_vector": "with_positions_offsets",
              "analyzer": "ik_smart",
              "store": true
            },
            "content_type": {
              "type": "string",
              "store": true
            },
            "name": {
              "type": "string",
              "store": true
            }
          }
        }
      }
    },
    "parseexcel": {
      "properties": {
        "parse_file": {
          "type": "attachment",
          "fields": {
            "content": {
              "type": "string",
              "term_vector": "with_positions_offsets",
              "analyzer": "ik_smart",
              "store": true
            },
            "content_type": {
              "type": "string",
              "store": true
            },
            "name": {
              "type": "string",
              "store": true
            }
          }
        }
      }
    },
    "parsepowerpoint": {
      "properties": {
        "parse_file": {
          "type": "attachment",
          "fields": {
            "content": {
              "type": "string",
              "term_vector": "with_positions_offsets",
              "analyzer": "ik_smart",
              "store": true
            },
            "content_type": {
              "type": "string",
              "store": true
            },
            "name": {
              "type": "string",
              "store": true
            }
          }
        }
      }
    }
  }
}
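The same index can also be created from the command line. The four type mappings above are identical, so the sketch below generates them from one template; keeping them identical also matters because ES 2.x requires same-named fields in different types of one index to have compatible mappings. `ES_URL` and `INDEX` come from the steps above; `build_mappings`, `is_acknowledged`, and `create_index` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: create the smartsegmentation index with curl instead of the head UI.
ES_URL=${ES_URL:-http://172.168.1.179:9200}
INDEX=smartsegmentation

# parse_file mapping shared by all four types (same fields as the body above)
ATTACHMENT_TYPE='{"properties":{"parse_file":{"type":"attachment","fields":{
"content":{"type":"string","term_vector":"with_positions_offsets","analyzer":"ik_smart","store":true},
"content_type":{"type":"string","store":true},
"name":{"type":"string","store":true}}}}}'

# Print the full index body: one copy of the mapping per document type
build_mappings() {
  body='{"mappings":{'
  sep=''
  for t in parseword parsepdf parseexcel parsepowerpoint; do
    body="$body$sep\"$t\":$ATTACHMENT_TYPE"
    sep=','
  done
  printf '%s}}\n' "$body"
}

# Succeeds if an ES response reports "acknowledged":true
is_acknowledged() {
  printf '%s' "$1" | grep -q '"acknowledged" *: *true'
}

# PUT the body to the index URL, print the response, and check it
create_index() {
  resp=$(build_mappings | curl -s -X PUT "$ES_URL/$INDEX" --data-binary @-)
  echo "$resp"
  is_acknowledged "$resp"
}
```

Running `create_index` a second time should fail the check, because ES refuses to create an index that already exists.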
4. Specify the file upload directory:
[root@node4 files]# ll
total 100
drwxr-xr-x 2 root root 4096 Jun 2 11:37 filetoup
-rw-r--r-- 1 root root 89557 Jun 2 11:37 json.file
-rwxr-xr-x 1 root root 374 Jun 2 10:42 upfile.sh
-rwxr-xr-x 1 root root 374 Jun 2 11:37 upfile-test.sh
[root@node4 files]# pwd
/opt/files
5. Write the upload script and make it executable with chmod +x upfile.sh:
vi upfile.sh
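#!/bin/sh
# Document to index; adjust the path for each upload
file_path='/opt/files/filetoup/云计算重大工程项目奖励评选.docx'
file_name=$(basename "$file_path")
# Base64-encode on a single line (-w 0, GNU coreutils) so the JSON string has no line breaks
file=$(base64 -w 0 "$file_path")
# MIME type of .docx; mapper-attachments reads _content_type, _name and _content
json="{\"parse_file\": {\"_content_type\": \"application/vnd.openxmlformats-officedocument.wordprocessingml.document\", \"_name\": \"$file_name\", \"_content\": \"$file\"}}"
echo "$json" > json.file
# Index into type parseword; Elasticsearch generates the document ID
curl -X POST "http://172.168.1.179:9200/smartsegmentation/parseword" -d @json.file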
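The script above indexes one hard-coded .docx file into `parseword`. To index a whole upload directory, the mapping type and MIME type can be chosen from each file's extension, as in the sketch below. The extension table, `type_and_mime`, `doc_json`, and `upload_all` are assumptions for illustration; `ES_URL` and `UPLOAD_DIR` follow the paths above.

```shell
#!/bin/sh
# Sketch: index every file in the upload directory, picking the mapping type
# and MIME type from the file extension (lowercase extensions only).
ES_URL=${ES_URL:-http://172.168.1.179:9200}
UPLOAD_DIR=${UPLOAD_DIR:-/opt/files/filetoup}

# Print "<type> <mime>" for a file name; print nothing if the extension is unknown
type_and_mime() {
  case "$1" in
    *.doc)  echo "parseword application/msword" ;;
    *.docx) echo "parseword application/vnd.openxmlformats-officedocument.wordprocessingml.document" ;;
    *.pdf)  echo "parsepdf application/pdf" ;;
    *.xls)  echo "parseexcel application/vnd.ms-excel" ;;
    *.xlsx) echo "parseexcel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" ;;
    *.ppt)  echo "parsepowerpoint application/vnd.ms-powerpoint" ;;
    *.pptx) echo "parsepowerpoint application/vnd.openxmlformats-officedocument.presentationml.presentation" ;;
  esac
}

# Print the attachment document for one file: $1 = path, $2 = MIME type.
# base64 -w 0 (GNU coreutils) keeps the encoded content on a single line.
doc_json() {
  printf '{"parse_file":{"_content_type":"%s","_name":"%s","_content":"%s"}}\n' \
    "$2" "$(basename "$1")" "$(base64 -w 0 "$1")"
}

# Index each supported file; ES assigns the document IDs
upload_all() {
  for f in "$UPLOAD_DIR"/*; do
    [ -f "$f" ] || continue
    tm=$(type_and_mime "$f")
    if [ -z "$tm" ]; then
      echo "skipped: $f" >&2
      continue
    fi
    doc_json "$f" "${tm#* }" |
      curl -s -X POST "$ES_URL/smartsegmentation/${tm%% *}" --data-binary @-
    echo
  done
}
```

File names are inserted into the JSON verbatim, so names containing double quotes or backslashes would need escaping first.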
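6. Upload the files
Upload the files to /opt/files/filetoup with Xshell, and note the following setting:
use UTF-8 encoding, otherwise Chinese characters will not display correctly.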
7. Query ES to get the results: send the request below to the _search endpoint with method POST.
http://172.168.1.179:9200/smartsegmentation/_search
{
  "fields": [
    "parse_file.content_type",
    "parse_file.name",
    "parse_file.content"
  ],
  "query": {
    "match": {
      "parse_file.content": "云计算"
    }
  },
  "highlight": {
    "fields": {
      "parse_file.content": {}
    }
  }
}
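The same search can be run with curl, which is handy for scripting. The sketch below builds the request body shown above for any keyword and posts it to the index's `_search` endpoint; `search_body` and `search_docs` are hypothetical helpers, and the keyword is inserted verbatim, so it must not contain double quotes or backslashes.

```shell
#!/bin/sh
# Sketch: run the keyword search from step 7 with curl.
ES_URL=${ES_URL:-http://172.168.1.179:9200}

# Print the search body for one keyword (same fields and highlight as above)
search_body() {
  printf '{"fields":["parse_file.content_type","parse_file.name","parse_file.content"],"query":{"match":{"parse_file.content":"%s"}},"highlight":{"fields":{"parse_file.content":{}}}}\n' "$1"
}

# POST the body to the _search endpoint of the index and pretty-print the hits
search_docs() {
  search_body "$1" |
    curl -s -X POST "$ES_URL/smartsegmentation/_search?pretty" --data-binary @-
}
```

Usage: `search_docs 云计算`. Because `content` is mapped with `term_vector: with_positions_offsets`, ES can use the fast vector highlighter for the `highlight` section.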