Creating an ElasticSearch Index - AnyLine

Last updated: 2025-07-24 11:08:52 | Status: incomplete | Related database: ElasticSearch-Elasticsearch

If you have no special requirements, you can create the index with the generic Table class.

// Drop the index first if it already exists (table_name is assumed to be defined elsewhere)
Table table = ServiceProxy.metadata().table(table_name);
if(null != table){
	ServiceProxy.ddl().drop(table);
}
// A simple index that only needs columns can be created with a plain Table
table = new Table(table_name);
table.addColumn("id"            	, "integer"     ).setStore(true); // business primary key
table.addColumn("effect"        	, "integer"     ).setStore(true); // legal effect
table.addColumn("pub_ymd"       	, "date"        ).setStore(true); // publication date
table.addColumn("update_ymd"    	, "keyword"     ).setStore(true); // update date
table.addColumn("law_qty"       	, "integer"     ).setStore(true); // number of related regulations
table.addColumn("qa_qty"        	, "integer"     ).setStore(true); // number of related Q&As
table.addColumn("file_qty"      	, "integer"     ).setStore(true); // number of related files
table.addColumn("read_qty"      	, "integer"     ).setStore(true); // read count
table.addColumn("title"      	, "text"        ).setStore(true); // title
table.addColumn("content"      	, "text"        ).setStore(true); // body text (matches the "content" key inserted below)
ServiceProxy.ddl().create(table);
// Insert some test data
String txt = "Elasticsearch 是一个分布式、高扩展、高实时的搜索与数据分析引擎。它能很方便的使大量数据具有搜索、分析和探索的能力。充分利用Elasticsearch的水平伸缩性，能使数据在生产环境变得更有价值。Elasticsearch 的实现原理主要分为以下几个步骤，首先用户将数据提交到Elasticsearch 数据库中，再通过分词控制器去将对应的语句分词，将其权重和分词结果一并存入数据，当用户搜索数据时候，再根据权重将结果排名，打分，再将返回结果呈现给用户。\n" +
	"Elasticsearch是与名为Logstash的数据收集和日志解析引擎以及名为Kibana的分析和可视化平台一起开发的。这三个产品被设计成一个集成解决方案，称为“Elastic Stack”（以前称为“ELK stack”）。\n" +
	"Elasticsearch可以用于搜索各种文档。它提供可扩展的搜索，具有接近实时的搜索，并支持多租户。Elasticsearch是分布式的，这意味着索引可以被分成分片，每个分片可以有0个或多个副本。每个节点托管一个或多个分片，并充当协调器将操作委托给正确的分片。再平衡和路由是自动完成的。相关数据通常存储在同一个索引中，该索引由一个或多个主分片和零个或多个复制分片组成。一旦创建了索引，就不能更改主分片的数量。\n" +
	"Elasticsearch使用Lucene，并试图通过JSON和Java API提供其所有特性。它支持facetting和percolating，如果新文档与注册查询匹配，这对于通知非常有用。另一个特性称为“网关”，处理索引的长期持久性；例如，在服务器崩溃的情况下，可以从网关恢复索引。Elasticsearch支持实时GET请求，适合作为NoSQL数据存储，但缺少分布式事务。\n在2018年6月，Elastic提交了首次公开募股申请，估值在15亿到30亿美元之间。公司于2018年10月5日在纽约证券交易所挂牌上市。一些组织将Elasticsearch作为托管服务提供。这些托管服务提供托管、部署、备份和其他支持。大多数托管服务还包括对Kibana的支持。\n" +
	"Elasticsearch 自从诞生以来，其应用越来越广泛，特别是大数据领域，功能也越来越强大，但是如何有效的监控管理 Elasticsearch 一直是公司所面对的难题，由于 Elasticsearch 集群的稳定性，决定了其业务发展的高度，对于一个应用来说其稳定是第一目标，所以完善的监控体系是必不可少的。此外，Elasticsearch 写入和查询对资源的消耗都很大，如何合理有效地控制资源，既能满足写入和查询的需求，又能满足资源充分利用，这是公司必须面对的问题。\n" +
	"在国内，还没较为完善的面向 Elasticsearch 的监控管理平台，很多企业往往只关注搭建一套简单分布式的集群环境，而对这个集群的缺乏监控和管理，元数据混乱，写入和查询耦合，缺乏监控一旦集群出现问题，就会导致数据丢失，甚至很容易导致线上应用故障。\n插入一段关于anyline的广告\n" +
	"相比于小公司，中大型公司的资金较为充足，所以中大型公司，会选择为每个应用去维护一套集群，但是这每当资源不够需要扩容或者缩容时，极其不方便，需要增加删除节点，其运维成本过高。而且对每个应用来说，可能不能够充分利用资源，但是如果和其他应用混合部署，但是又涉及到复杂的资源分配问题，而且随着应用的发展，资源经常需要变动。\n" +
	"在国外，ELasticsearch 的应用也很广泛，也有对 Elasticsearch 进行很好的监控和管理，Amazon AWS中也有基于 Elasticsearch 构建的平台服务，帮助电商应用程序，网站等提供安全、高可靠、低成本、低延时、高吞吐量的个性化搜索。\n" +
	"虽然，对集群进行了监控和管理，但是管理的维度还是集群级别的，而对于应用往往是模板级别的，如果应用无法与集群一一对应，那就无法进行更高效的管理。这无法满足公司级别想要高效利用资源，集群内部能支持多个应用的场景";
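// Split the sample text into paragraphs and index one document per paragraph
String[] lines = txt.split("\n");
DataSet set = new DataSet();
for(String line:lines){
	DataRow row = new ElasticSearchRow();
	row.put("title", BasicUtil.cut(line, 0, 20)); // use the beginning of the paragraph as the title
	row.put("yyyy", BasicUtil.getRandomNumber(2000, 2020)); // random year; "yyyy" is not declared on the table above
	row.put("content", line);
	set.add(row);
}
ServiceProxy.insert(table_name, set);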
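
As a rough mental model, each addColumn(name, type).setStore(true) call above corresponds to one entry under mappings.properties in the body of an Elasticsearch create-index request. The sketch below renders that JSON with plain Java so the shape is visible. It does not use AnyLine; the class and method names (MappingSketch, toMappingJson) are hypothetical, and the exact request AnyLine sends may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of AnyLine: shows the mapping JSON shape only.
class MappingSketch {

    // Render {"mappings":{"properties":{...}}} from an ordered field -> type map.
    // Every field gets the same "store" flag, mirroring setStore(true) above.
    static String toMappingJson(Map<String, String> columns, boolean store) {
        StringJoiner props = new StringJoiner(",");
        for (Map.Entry<String, String> e : columns.entrySet()) {
            props.add("\"" + e.getKey() + "\":{\"type\":\"" + e.getValue()
                    + "\",\"store\":" + store + "}");
        }
        return "{\"mappings\":{\"properties\":{" + props + "}}}";
    }

    public static void main(String[] args) {
        // A subset of the columns declared on the Table above
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "integer");
        columns.put("pub_ymd", "date");
        columns.put("title", "text");
        columns.put("content", "text");
        System.out.println(toMappingJson(columns, true));
    }
}
```

Comparing this output with the mapping returned by GET /<index>/_mapping after the index is created is one way to confirm what was actually generated.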

If you need more complex properties (such as custom analyzers), create the index with ElasticSearchIndex.

ElasticSearchIndex table = new ElasticSearchIndex("test_index_us");
table.addColumn("id"            	, "integer"     ).setStore(true); // business primary key
table.addColumn("type"          	, "integer"     ).setStore(true); // 0: regulation, 1: Q&A
// Set the index-time analyzer and the search-time analyzer; the names correspond to settings.analysis.analyzer[key]
// At index time, tokenize with the max-word analyzer (us_max_word)
// At query time, tokenize with the smart analyzer (us_smart) and recognize synonyms
table.addColumn("title"         	, "text"        ).setStore(true).setAnalyzer("us_max_word").setSearchAnalyzer("us_smart"); // title
table.addColumn("content"       	, "text"        ).setStore(false).setAnalyzer("us_max_word").setSearchAnalyzer("us_smart");// content
table.addColumn("summary"       	, "text"        ).setStore(true).setAnalyzer("us_max_word").setSearchAnalyzer("us_smart"); // summary
table.addColumn("file_code"     	, "text"        ).setStore(true).setAnalyzer("us_max_word").setSearchAnalyzer("us_smart"); // document number
table.addColumn("query_file_code" , "keyword"     ).setStore(true).setIgnoreAbove(200); // document number for fuzzy search
table.addColumn("sector"        	, "keyword"     ).setStore(true); // first-level department
table.addColumn("sub_sector"    	, "keyword"     ).setStore(true); // second-level department
table.addColumn("yyyy"          	, "integer"     ).setStore(true); // year
table.addColumn("tax_ids"       	, "integer"     ).setStore(true); // tax type IDs
table.addColumn("tax_nms"       	, "keyword"     ).setStore(true); // tax type names
table.addColumn("industry_ids"  	, "integer"     ).setStore(true); // industry IDs
table.addColumn("industry_nms"  	, "keyword"     ).setStore(true); // industry names
table.addColumn("subject_ids"   	, "integer"     ).setStore(true); // topic IDs
table.addColumn("subject_nms"   	, "keyword"     ).setStore(true); // topic names
table.addColumn("vip_lvl"       	, "integer"     ).setStore(true); // required VIP level
table.addColumn("keywords"      	, "keyword"     ).setStore(true).setIgnoreAbove(20); // keywords
table.addColumn("effect"        	, "integer"     ).setStore(true); // legal effect
table.addColumn("pub_ymd"       	, "date"        ).setStore(true); // publication date
table.addColumn("update_ymd"    	, "keyword"     ).setStore(true); // update date
table.addColumn("law_qty"       	, "integer"     ).setStore(true); // number of related regulations
table.addColumn("qa_qty"        	, "integer"     ).setStore(true); // number of related Q&As
table.addColumn("file_qty"      	, "integer"     ).setStore(true); // number of related files
table.addColumn("read_qty"      	, "integer"     ).setStore(true); // read count

ElasticSearchAnalysis analysis = new ElasticSearchAnalysis();
table.setAnalysis(analysis);

// Configure synonyms; see https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html
ElasticSearchFilter filter = new ElasticSearchFilter();
analysis.addFilter("us_synonym_filter", filter);
filter.setType("synonym_graph");
// Specify the synonyms through a file; see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-with-synonyms.html#synonyms-store-synonyms-file
// The file must actually exist; the path is relative to the Elasticsearch config directory, by default /usr/share/elasticsearch/config/analysis/us_synonym.txt
// Each line of the file holds one group of synonyms, for example:
// 个税,个人所得税,个得税
// i-pod, i pod => ipod
// foo => foo bar, baz

filter.setSynonymsPath("analysis/us_synonym.txt");

// For a small number of synonyms you can also add them inline; see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-with-synonyms.html#synonyms-store-synonyms-inline
//["pc => personal computer", "computer, pc, laptop"]
//filter.addSynonym("pc => personal computer");
//filter.addSynonym("computer, pc, laptop");

// Synonyms can also be managed through the synonyms API; see https://www.elastic.co/guide/en/elasticsearch/reference/current/synonyms-apis.html
//filter.setSynonymsSet("us-synonym-set");

ElasticSearchAnalyzer us_max_word = new ElasticSearchAnalyzer();
analysis.addAnalyzer("us_max_word", us_max_word);

us_max_word.setTokenizer("ik_max_word");
us_max_word.addFilter("us_synonym_filter");

ElasticSearchAnalyzer us_smart = new ElasticSearchAnalyzer();
analysis.addAnalyzer("us_smart", us_smart);

us_smart.setTokenizer("ik_smart");
us_smart.addFilter("us_synonym_filter");

ServiceProxy.ddl().create(table);
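
To make the comment about settings.analysis.analyzer[key] more concrete, the sketch below prints the kind of settings JSON that the filter and the two analyzers configured above are expected to correspond to in a create-index request. It is plain Java with no AnyLine dependency; the class and method names (AnalysisSettingsSketch, synonymFilter, customAnalyzer, settings) are hypothetical, and the request AnyLine actually generates may differ in detail.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of AnyLine: shows the analysis settings JSON shape only.
class AnalysisSettingsSketch {

    // One entry of settings.analysis.filter: a synonym_graph filter that reads its rules from a file.
    static String synonymFilter(String name, String synonymsPath) {
        return "\"" + name + "\":{\"type\":\"synonym_graph\",\"synonyms_path\":\""
                + synonymsPath + "\"}";
    }

    // One entry of settings.analysis.analyzer: a custom analyzer made of a tokenizer plus token filters.
    static String customAnalyzer(String name, String tokenizer, List<String> filters) {
        StringJoiner quoted = new StringJoiner(",", "[", "]");
        for (String f : filters) {
            quoted.add("\"" + f + "\"");
        }
        return "\"" + name + "\":{\"type\":\"custom\",\"tokenizer\":\"" + tokenizer
                + "\",\"filter\":" + quoted + "}";
    }

    // Assemble the settings block using the names from the example above.
    static String settings() {
        List<String> filters = Arrays.asList("us_synonym_filter");
        String filter = synonymFilter("us_synonym_filter", "analysis/us_synonym.txt");
        String analyzers = customAnalyzer("us_max_word", "ik_max_word", filters)
                + "," + customAnalyzer("us_smart", "ik_smart", filters);
        return "{\"settings\":{\"analysis\":{\"filter\":{" + filter
                + "},\"analyzer\":{" + analyzers + "}}}}";
    }

    public static void main(String[] args) {
        System.out.println(settings());
    }
}
```

The ik_max_word and ik_smart tokenizers come from the IK analysis plugin, which has to be installed on the Elasticsearch nodes. Elasticsearch's reference documentation describes synonym_graph as intended for search analyzers, so it is worth checking whether attaching the filter to the index-time analyzer us_max_word behaves as expected on your version.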