1 - HugeGraph Configuration

1 Overview

The configuration files live under hugegraph-release/conf; all configuration for the service and for the graphs themselves is in this directory.

The main configuration files are gremlin-server.yaml, rest-server.properties and hugegraph.properties.

HugeGraphServer integrates a GremlinServer and a RestServer internally, and gremlin-server.yaml and rest-server.properties are used to configure these two servers.

  • GremlinServer: accepts gremlin statements from users, parses them, and then calls the Core code.
  • RestServer: provides the RESTful API and dispatches each HTTP request to the corresponding Core API; if the request body is a gremlin statement, it is forwarded to GremlinServer, enabling operations on graph data.

The three configuration files are introduced one by one below.

2 gremlin-server.yaml

The default content of the gremlin-server.yaml file is as follows:

# host and port of gremlin server, need to be consistent with host and port in rest-server.properties
#host: 127.0.0.1
#port: 8182

# timeout in ms of gremlin query
evaluationTimeout: 30000

channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
# don't set graphs here; this will be handled after dynamically adding graphs is supported
graphs: {
}
scriptEngines: {
  gremlin-groovy: {
    staticImports: [
      org.opencypher.gremlin.process.traversal.CustomPredicates.*,
      org.opencypher.gremlin.traversal.CustomFunctions.*
    ],
    plugins: {
      org.apache.hugegraph.plugin.HugeGraphGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {
        classImports: [
          java.lang.Math,
          org.apache.hugegraph.backend.id.IdGenerator,
          org.apache.hugegraph.type.define.Directions,
          org.apache.hugegraph.type.define.NodeRole,
          org.apache.hugegraph.traversal.algorithm.CollectionPathsTraverser,
          org.apache.hugegraph.traversal.algorithm.CountTraverser,
          org.apache.hugegraph.traversal.algorithm.CustomizedCrosspointsTraverser,
          org.apache.hugegraph.traversal.algorithm.CustomizePathsTraverser,
          org.apache.hugegraph.traversal.algorithm.FusiformSimilarityTraverser,
          org.apache.hugegraph.traversal.algorithm.HugeTraverser,
          org.apache.hugegraph.traversal.algorithm.JaccardSimilarTraverser,
          org.apache.hugegraph.traversal.algorithm.KneighborTraverser,
          org.apache.hugegraph.traversal.algorithm.KoutTraverser,
          org.apache.hugegraph.traversal.algorithm.MultiNodeShortestPathTraverser,
          org.apache.hugegraph.traversal.algorithm.NeighborRankTraverser,
          org.apache.hugegraph.traversal.algorithm.PathsTraverser,
          org.apache.hugegraph.traversal.algorithm.PersonalRankTraverser,
          org.apache.hugegraph.traversal.algorithm.SameNeighborTraverser,
          org.apache.hugegraph.traversal.algorithm.ShortestPathTraverser,
          org.apache.hugegraph.traversal.algorithm.SingleSourceShortestPathTraverser,
          org.apache.hugegraph.traversal.algorithm.SubGraphTraverser,
          org.apache.hugegraph.traversal.algorithm.TemplatePathsTraverser,
          org.apache.hugegraph.traversal.algorithm.steps.EdgeStep,
          org.apache.hugegraph.traversal.algorithm.steps.RepeatEdgeStep,
          org.apache.hugegraph.traversal.algorithm.steps.WeightedEdgeStep,
          org.apache.hugegraph.traversal.optimize.ConditionP,
          org.apache.hugegraph.traversal.optimize.Text,
          org.apache.hugegraph.traversal.optimize.TraversalUtil,
          org.apache.hugegraph.util.DateUtil,
          org.opencypher.gremlin.traversal.CustomFunctions,
          org.opencypher.gremlin.traversal.CustomPredicate
        ],
        methodImports: [
          java.lang.Math#*,
          org.opencypher.gremlin.traversal.CustomPredicate#*,
          org.opencypher.gremlin.traversal.CustomFunctions#*
        ]
      },
      org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {
        files: [scripts/empty-sample.groovy]
      }
    }
  }
}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1,
      config: {
        serializeResultToString: false,
        ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
      }
  }
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0,
      config: {
        serializeResultToString: false,
        ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
      }
  }
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0,
      config: {
        serializeResultToString: false,
        ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
      }
  }
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0,
      config: {
        serializeResultToString: false,
        ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
      }
  }
metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: ./metrics/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: false, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}
}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false
}

There are many options above, but for now only two need attention: channelizer and graphs.

  • graphs: the graphs to open when GremlinServer starts; this is a map whose key is the graph name and whose value is the path of that graph's configuration file;
  • channelizer: GremlinServer can communicate with clients in two ways, WebSocket and HTTP (the default). With WebSocket, users can quickly try out HugeGraph features through Gremlin-Console, but it does not support large-scale data import; HTTP is recommended, since HugeGraph's surrounding components are all built on HTTP;

By default, GremlinServer serves at localhost:8182. To change this, configure host and port:

  • host: the hostname or IP of the machine where GremlinServer is deployed; HugeGraphServer does not yet support distributed deployment, and GremlinServer is not exposed to users directly;
  • port: the port of the machine where GremlinServer is deployed;

A matching option gremlinserver.url=http://host:port must also be added to rest-server.properties, as in the sketch below.
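For example, moving GremlinServer to port 8183 (a hypothetical value chosen for illustration) means changing the two files in step:

# gremlin-server.yaml
host: 127.0.0.1
port: 8183

# rest-server.properties
gremlinserver.url=http://127.0.0.1:8183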

3 rest-server.properties

The default content of the rest-server.properties file is as follows:

# bind url
restserver.url=http://127.0.0.1:8080
# gremlin server url, need to be consistent with host and port in gremlin-server.yaml
#gremlinserver.url=http://127.0.0.1:8182

# graphs list with pair NAME:CONF_PATH
graphs=[hugegraph:conf/hugegraph.properties]

# authentication
#auth.authenticator=
#auth.admin_token=
#auth.user_tokens=[]

server.id=server-1
server.role=master

  • restserver.url: the url at which RestServer provides service; adjust it to the actual environment;
  • graphs: RestServer also needs to open graphs at startup; this is a map whose key is the graph name and whose value is the path of that graph's configuration file;

Note: both gremlin-server.yaml and rest-server.properties contain the graphs option, and the init-store command initializes the graphs listed under graphs in gremlin-server.yaml.

The option gremlinserver.url is the url at which GremlinServer provides service to RestServer. It defaults to http://localhost:8182; if changed, it must match the host and port in gremlin-server.yaml.
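As a quick check that RestServer can forward gremlin statements to the GremlinServer behind it, a statement can be posted to RestServer's /gremlin endpoint (a minimal sketch, assuming the default addresses and an initialized graph named hugegraph):

curl -X POST -H "Content-Type: application/json" \
     -d '{"gremlin": "hugegraph.traversal().V().limit(1)"}' \
     http://127.0.0.1:8080/gremlin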

4 hugegraph.properties

hugegraph.properties is a family of files: if the system hosts multiple graphs, there will be several similar files, one per graph. The file configures parameters related to graph storage and queries; its default content is as follows:

# gremlin entrance to create graph
gremlin.graph=org.apache.hugegraph.HugeFactory

# cache config
#schema.cache_capacity=100000
# vertex-cache default is 1000w, 10min expired
#vertex.cache_capacity=10000000
#vertex.cache_expire=600
# edge-cache default is 100w, 10min expired
#edge.cache_capacity=1000000
#edge.cache_expire=600

# schema illegal name template
#schema.illegal_name_regex=\s+|~.*

#vertex.default_label=vertex

backend=rocksdb
serializer=binary

store=hugegraph

raft.mode=false
raft.safe_read=false
raft.use_snapshot=false
raft.endpoint=127.0.0.1:8281
raft.group_peers=127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283
raft.path=./raft-log
raft.use_replicator_pipeline=true
raft.election_timeout=10000
raft.snapshot_interval=3600
raft.backend_threads=48
raft.read_index_threads=8
raft.queue_size=16384
raft.queue_publish_timeout=60
raft.apply_batch=1
raft.rpc_threads=80
raft.rpc_connect_timeout=5000
raft.rpc_timeout=60000

# if use 'ikanalyzer', need download jar from 'https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar' to lib directory
search.text_analyzer=jieba
search.text_analyzer_mode=INDEX

# rocksdb backend config
#rocksdb.data_path=/path/to/disk
#rocksdb.wal_path=/path/to/disk

# cassandra backend config
cassandra.host=localhost
cassandra.port=9042
cassandra.username=
cassandra.password=
#cassandra.connect_timeout=5
#cassandra.read_timeout=20
#cassandra.keyspace.strategy=SimpleStrategy
#cassandra.keyspace.replication=3

# hbase backend config
#hbase.hosts=localhost
#hbase.port=2181
#hbase.znode_parent=/hbase
#hbase.threads_max=64

# mysql backend config
#jdbc.driver=com.mysql.jdbc.Driver
#jdbc.url=jdbc:mysql://127.0.0.1:3306
#jdbc.username=root
#jdbc.password=
#jdbc.reconnect_max_times=3
#jdbc.reconnect_interval=3
#jdbc.ssl_mode=false

# postgresql & cockroachdb backend config
#jdbc.driver=org.postgresql.Driver
#jdbc.url=jdbc:postgresql://localhost:5432/
#jdbc.username=postgres
#jdbc.password=

# palo backend config
#palo.host=127.0.0.1
#palo.poll_interval=10
#palo.temp_dir=./palo-data
#palo.file_limit_size=32

Focus on the uncommented options:

  • gremlin.graph: the boot entry for GremlinServer; users should not modify this option;
  • backend: the backend store to use; valid values are memory, cassandra, scylladb, mysql, hbase, postgresql and rocksdb;
  • serializer: mainly for internal use, serializing schema, vertices and edges to the backend; valid values are text, cassandra, scylladb and binary. (Note: for the rocksdb backend the value must be binary; for the other backends, serializer must match backend, e.g. hbase for the hbase backend; see the sketch below);
  • store: the name of the database used to store the graph in the backend; for cassandra and scylladb it is the keyspace name. This value is unrelated to the graph names in GremlinServer and RestServer, but for clarity it is recommended to use the same name;
  • cassandra.host: only meaningful when backend is cassandra or scylladb; the seeds of the cassandra/scylladb cluster;
  • cassandra.port: only meaningful when backend is cassandra or scylladb; the native port of the cassandra/scylladb cluster;
  • rocksdb.data_path: only meaningful when backend is rocksdb; the rocksdb data directory;
  • rocksdb.wal_path: only meaningful when backend is rocksdb; the rocksdb log directory;
  • admin.token: a token used to retrieve the server's configuration, e.g.: http://localhost:8080/graphs/hugegraph/conf?token=162f7848-0b6d-4faf-b557-3a0797869c55
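For instance, switching a graph to the cassandra backend only requires keeping backend, serializer and store consistent and pointing at the cluster seeds (a minimal sketch using the defaults shown in the file above; host/port values are illustrative):

backend=cassandra
serializer=cassandra
store=hugegraph
cassandra.host=localhost
cassandra.port=9042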

5 Multi-Graph Configuration

Our system can host multiple graphs, and the graphs' backends can differ; for example, graphs hugegraph_rocksdb and hugegraph_mysql, where hugegraph_rocksdb uses RocksDB as its backend and hugegraph_mysql uses MySQL.

The configuration is simple:

[Optional]: modify rest-server.properties

The directory of graph configuration files is set through the graphs option in rest-server.properties. The default is graphs=./conf/graphs; to use a different directory, adjust this option, e.g. to graphs=/etc/hugegraph/graphs. Example:

graphs=./conf/graphs

Under the conf/graphs path, derive hugegraph_mysql_backend.properties and hugegraph_rocksdb_backend.properties from hugegraph.properties.

The modified parts of hugegraph_mysql_backend.properties are as follows:

backend=mysql
serializer=mysql

store=hugegraph_mysql

# mysql backend config
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://127.0.0.1:3306
jdbc.username=root
jdbc.password=123456
jdbc.reconnect_max_times=3
jdbc.reconnect_interval=3
jdbc.ssl_mode=false

The modified parts of hugegraph_rocksdb_backend.properties are as follows:

backend=rocksdb
serializer=binary

store=hugegraph_rocksdb

Stop the server, run init-store.sh (to create the databases for the new graphs), then restart the server:

$ ./bin/stop-hugegraph.sh
$ ./bin/init-store.sh

Initializing HugeGraph Store...
2023-06-11 14:16:14 [main] [INFO] o.a.h.u.ConfigUtil - Scanning option 'graphs' directory './conf/graphs'
2023-06-11 14:16:14 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_rocksdb_backend.properties
...
2023-06-11 14:16:15 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_rocksdb' has been initialized
2023-06-11 14:16:15 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_mysql_backend.properties
...
2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_mysql' has been initialized
2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Close graph standardhugegraph[hugegraph_rocksdb]
...
2023-06-11 14:16:16 [main] [INFO] o.a.h.HugeFactory - HugeFactory shutdown
2023-06-11 14:16:16 [hugegraph-shutdown] [INFO] o.a.h.HugeFactory - HugeGraph is shutting down
Initialization finished.
$ ./bin/start-hugegraph.sh

Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)...OK
Started [pid 21614]

List the created graphs:

curl http://127.0.0.1:8080/graphs/

{"graphs":["hugegraph_rocksdb","hugegraph_mysql"]}

View the information of a specific graph:

curl http://127.0.0.1:8080/graphs/hugegraph_mysql_backend

{"name":"hugegraph_mysql","backend":"mysql"}
curl http://127.0.0.1:8080/graphs/hugegraph_rocksdb_backend

{"name":"hugegraph_rocksdb","backend":"rocksdb"}

2 - HugeGraph Config Options

Gremlin Server Config Options

Corresponding configuration file: gremlin-server.yaml

| config option | default value | description |
|---|---|---|
| host | 127.0.0.1 | The host or ip of Gremlin Server. |
| port | 8182 | The listening port of Gremlin Server. |
| graphs | hugegraph: conf/hugegraph.properties | The map of graphs with name and config file path. |
| scriptEvaluationTimeout | 30000 | The timeout for gremlin script execution(millisecond). |
| channelizer | org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer | Indicates the protocol which the Gremlin Server provides service. |
| authentication | authenticator: org.apache.hugegraph.auth.StandardAuthenticator, config: {tokens: conf/rest-server.properties} | The authenticator and config(contains tokens path) of authentication mechanism. |

Rest Server & API 配置项

对应配置文件rest-server.properties

| config option | default value | description |
|---|---|---|
| graphs | [hugegraph:conf/hugegraph.properties] | The map of graphs' name and config file. |
| server.id | server-1 | The id of rest server, used for license verification. |
| server.role | master | The role of nodes in the cluster, available types are [master, worker, computer] |
| restserver.url | http://127.0.0.1:8080 | The url for listening of rest server. |
| ssl.keystore_file | server.keystore | The path of server keystore file used when https protocol is enabled. |
| ssl.keystore_password | | The password of the server keystore file used when the https protocol is enabled. |
| restserver.max_worker_threads | 2 * CPUs | The maximum worker threads of rest server. |
| restserver.min_free_memory | 64 | The minimum free memory(MB) of rest server, requests will be rejected when the available memory of system is lower than this value. |
| restserver.request_timeout | 30 | The time in seconds within which a request must complete, -1 means no timeout. |
| restserver.connection_idle_timeout | 30 | The time in seconds to keep an inactive connection alive, -1 means no timeout. |
| restserver.connection_max_requests | 256 | The max number of HTTP requests allowed to be processed on one keep-alive connection, -1 means unlimited. |
| gremlinserver.url | http://127.0.0.1:8182 | The url of gremlin server. |
| gremlinserver.max_route | 8 | The max route number for gremlin server. |
| gremlinserver.timeout | 30 | The timeout in seconds of waiting for gremlin server. |
| batch.max_edges_per_batch | 500 | The maximum number of edges submitted per batch. |
| batch.max_vertices_per_batch | 500 | The maximum number of vertices submitted per batch. |
| batch.max_write_ratio | 50 | The maximum thread ratio for batch writing, only take effect if the batch.max_write_threads is 0. |
| batch.max_write_threads | 0 | The maximum threads for batch writing, if the value is 0, the actual value will be set to batch.max_write_ratio * restserver.max_worker_threads. |
| auth.authenticator | | The class path of authenticator implementation. e.g., org.apache.hugegraph.auth.StandardAuthenticator, or org.apache.hugegraph.auth.ConfigAuthenticator. |
| auth.admin_token | 162f7848-0b6d-4faf-b557-3a0797869c55 | Token for administrator operations, only for org.apache.hugegraph.auth.ConfigAuthenticator. |
| auth.graph_store | hugegraph | The name of graph used to store authentication information, like users, only for org.apache.hugegraph.auth.StandardAuthenticator. |
| auth.user_tokens | [hugegraph:9fd95c9c-711b-415b-b85f-d4df46ba5c31] | The map of user tokens with name and password, only for org.apache.hugegraph.auth.ConfigAuthenticator. |
| auth.audit_log_rate | 1000.0 | The max rate of audit log output per user, default value is 1000 records per second. |
| auth.cache_capacity | 10240 | The max cache capacity of each auth cache item. |
| auth.cache_expire | 600 | The expiration time in seconds of auth cache. |
| auth.remote_url | | If the address is empty, it provides auth service; otherwise it is an auth client and also provides auth service through rpc forwarding. The remote url can be set to multiple addresses, which are concatenated by ','. |
| auth.token_expire | 86400 | The expiration time in seconds after the token is created. |
| auth.token_secret | FXQXbJtbCLxODc6tGci732pkH1cyf8Qg | Secret key of HS256 algorithm. |
| exception.allow_trace | false | Whether to allow exception trace stack. |

Basic Config Options

The basic and backend config options correspond to the configuration file {graph-name}.properties, e.g. hugegraph.properties

| config option | default value | description |
|---|---|---|
| gremlin.graph | org.apache.hugegraph.HugeFactory | Gremlin entrance to create graph. |
| backend | rocksdb | The data store type, available values are [memory, rocksdb, cassandra, scylladb, hbase, mysql]. |
| serializer | binary | The serializer for backend store, available values are [text, binary, cassandra, hbase, mysql]. |
| store | hugegraph | The database name like Cassandra Keyspace. |
| store.connection_detect_interval | 600 | The interval in seconds for detecting connections, if the idle time of a connection exceeds this value, detect it and reconnect if needed before using, value 0 means detecting every time. |
| store.graph | g | The graph table name, which store vertex, edge and property. |
| store.schema | m | The schema table name, which store meta data. |
| store.system | s | The system table name, which store system data. |
| schema.illegal_name_regex | .\s+$\|~. | The regex specified the illegal format for schema name. |
| schema.cache_capacity | 10000 | The max cache size(items) of schema cache. |
| vertex.cache_type | l2 | The type of vertex cache, allowed values are [l1, l2]. |
| vertex.cache_capacity | 10000000 | The max cache size(items) of vertex cache. |
| vertex.cache_expire | 600 | The expire time in seconds of vertex cache. |
| vertex.check_customized_id_exist | false | Whether to check the vertices exist for those using customized id strategy. |
| vertex.default_label | vertex | The default vertex label. |
| vertex.tx_capacity | 10000 | The max size(items) of vertices(uncommitted) in transaction. |
| vertex.check_adjacent_vertex_exist | false | Whether to check the adjacent vertices of edges exist. |
| vertex.lazy_load_adjacent_vertex | true | Whether to lazy load adjacent vertices of edges. |
| vertex.part_edge_commit_size | 5000 | Whether to enable the mode to commit part of edges of vertex, enabled if commit size > 0, 0 means disabled. |
| vertex.encode_primary_key_number | true | Whether to encode number value of primary key in vertex id. |
| vertex.remove_left_index_at_overwrite | false | Whether remove left index at overwrite. |
| edge.cache_type | l2 | The type of edge cache, allowed values are [l1, l2]. |
| edge.cache_capacity | 1000000 | The max cache size(items) of edge cache. |
| edge.cache_expire | 600 | The expiration time in seconds of edge cache. |
| edge.tx_capacity | 10000 | The max size(items) of edges(uncommitted) in transaction. |
| query.page_size | 500 | The size of each page when querying by paging. |
| query.batch_size | 1000 | The size of each batch when querying by batch. |
| query.ignore_invalid_data | true | Whether to ignore invalid data of vertex or edge. |
| query.index_intersect_threshold | 1000 | The maximum number of intermediate results to intersect indexes when querying by multiple single index properties. |
| query.ramtable_edges_capacity | 20000000 | The maximum number of edges in ramtable, include OUT and IN edges. |
| query.ramtable_enable | false | Whether to enable ramtable for query of adjacent edges. |
| query.ramtable_vertices_capacity | 10000000 | The maximum number of vertices in ramtable, generally the largest vertex id is used as capacity. |
| query.optimize_aggregate_by_index | false | Whether to optimize aggregate query(like count) by index. |
| oltp.concurrent_depth | 10 | The min depth to enable concurrent oltp algorithm. |
| oltp.concurrent_threads | 10 | Thread number to concurrently execute oltp algorithm. |
| oltp.collection_type | EC | The implementation type of collections used in oltp algorithm. |
| rate_limit.read | 0 | The max rate(times/s) to execute query of vertices/edges. |
| rate_limit.write | 0 | The max rate(items/s) to add/update/delete vertices/edges. |
| task.wait_timeout | 10 | Timeout in seconds for waiting for the task to complete, such as when truncating or clearing the backend. |
| task.input_size_limit | 16777216 | The job input size limit in bytes. |
| task.result_size_limit | 16777216 | The job result size limit in bytes. |
| task.sync_deletion | false | Whether to delete schema or expired data synchronously. |
| task.ttl_delete_batch | 1 | The batch size used to delete expired data. |
| computer.config | /conf/computer.yaml | The config file path of computer job. |
| search.text_analyzer | ikanalyzer | Choose a text analyzer for searching the vertex/edge properties, available type are [word, ansj, hanlp, smartcn, jieba, jcseg, mmseg4j, ikanalyzer]. # if use 'ikanalyzer', need download jar from 'https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar' to lib directory |
| search.text_analyzer_mode | smart | Specify the mode for the text analyzer, the available mode of analyzer are {word: [MaximumMatching, ReverseMaximumMatching, MinimumMatching, ReverseMinimumMatching, BidirectionalMaximumMatching, BidirectionalMinimumMatching, BidirectionalMaximumMinimumMatching, FullSegmentation, MinimalWordCount, MaxNgramScore, PureEnglish], ansj: [BaseAnalysis, IndexAnalysis, ToAnalysis, NlpAnalysis], hanlp: [standard, nlp, index, nShort, shortest, speed], smartcn: [], jieba: [SEARCH, INDEX], jcseg: [Simple, Complex], mmseg4j: [Simple, Complex, MaxWord], ikanalyzer: [smart, max_word]}. |
| snowflake.datecenter_id | 0 | The datacenter id of snowflake id generator. |
| snowflake.force_string | false | Whether to force the snowflake long id to be a string. |
| snowflake.worker_id | 0 | The worker id of snowflake id generator. |
| raft.mode | false | Whether the backend storage works in raft mode. |
| raft.safe_read | false | Whether to use linearly consistent read. |
| raft.use_snapshot | false | Whether to use snapshot. |
| raft.endpoint | 127.0.0.1:8281 | The peerid of current raft node. |
| raft.group_peers | 127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283 | The peers of current raft group. |
| raft.path | ./raft-log | The log path of current raft node. |
| raft.use_replicator_pipeline | true | Whether to use replicator pipeline; when turned on, multiple logs can be sent in parallel, and the next log doesn't have to wait for the ack message of the current log to be sent. |
| raft.election_timeout | 10000 | Timeout in milliseconds to launch a round of election. |
| raft.snapshot_interval | 3600 | The interval in seconds to trigger snapshot save. |
| raft.backend_threads | current CPU v-cores | The thread number used to apply task to backend. |
| raft.read_index_threads | 8 | The thread number used to execute reading index. |
| raft.apply_batch | 1 | The apply batch size to trigger disruptor event handler. |
| raft.queue_size | 16384 | The disruptor buffers size for jraft RaftNode, StateMachine and LogManager. |
| raft.queue_publish_timeout | 60 | The timeout in second when publish event into disruptor. |
| raft.rpc_threads | 80 | The rpc threads for jraft RPC layer. |
| raft.rpc_connect_timeout | 5000 | The rpc connect timeout for jraft rpc. |
| raft.rpc_timeout | 60000 | The rpc timeout for jraft rpc. |
| raft.rpc_buf_low_water_mark | 10485760 | The ChannelOutboundBuffer's low water mark of netty, when buffer size less than this size, the method ChannelOutboundBuffer.isWritable() will return true, it means that low downstream pressure or good network. |
| raft.rpc_buf_high_water_mark | 20971520 | The ChannelOutboundBuffer's high water mark of netty, only when buffer size exceed this size, the method ChannelOutboundBuffer.isWritable() will return false, it means that the downstream pressure is too great to process the request or network is very congested, upstream needs to limit rate at this time. |
| raft.read_strategy | ReadOnlyLeaseBased | The linearizability of read strategy. |

RPC Server Config Options

| config option | default value | description |
|---|---|---|
| rpc.client_connect_timeout | 20 | The timeout(in seconds) of rpc client connect to rpc server. |
| rpc.client_load_balancer | consistentHash | The rpc client uses a load-balancing algorithm to access multiple rpc servers in one cluster. Default value is 'consistentHash', means forwarding by request parameters. |
| rpc.client_read_timeout | 40 | The timeout(in seconds) of rpc client read from rpc server. |
| rpc.client_reconnect_period | 10 | The period(in seconds) of rpc client reconnect to rpc server. |
| rpc.client_retries | 3 | Failed retry number of rpc client calls to rpc server. |
| rpc.config_order | 999 | Sofa rpc configuration file loading order, the larger the more later loading. |
| rpc.logger_impl | com.alipay.sofa.rpc.log.SLF4JLoggerImpl | Sofa rpc log implementation class. |
| rpc.protocol | bolt | Rpc communication protocol, client and server need to be specified the same value. |
| rpc.remote_url | | The remote urls of rpc peers, it can be set to multiple addresses, which are concatenated by ',', empty value means not enabled. |
| rpc.server_adaptive_port | false | Whether the bound port is adaptive, if it's enabled, when the port is in use, automatically +1 to detect the next available port. Note that this process is not atomic, so there may still be port conflicts. |
| rpc.server_host | | The hosts/ips bound by rpc server to provide services, empty value means not enabled. |
| rpc.server_port | 8090 | The port bound by rpc server to provide services. |
| rpc.server_timeout | 30 | The timeout(in seconds) of rpc server execution. |

Cassandra Backend Config Options

| config option | default value | description |
|---|---|---|
| backend | | Must be set to cassandra. |
| serializer | | Must be set to cassandra. |
| cassandra.host | localhost | The seeds hostname or ip address of cassandra cluster. |
| cassandra.port | 9042 | The seeds port address of cassandra cluster. |
| cassandra.connect_timeout | 5 | The cassandra driver connect server timeout(seconds). |
| cassandra.read_timeout | 20 | The cassandra driver read from server timeout(seconds). |
| cassandra.keyspace.strategy | SimpleStrategy | The replication strategy of keyspace, valid value is SimpleStrategy or NetworkTopologyStrategy. |
| cassandra.keyspace.replication | [3] | The keyspace replication factor of SimpleStrategy, like '[3]'. Or replicas in each datacenter of NetworkTopologyStrategy, like '[dc1:2,dc2:1]'. |
| cassandra.username | | The username to use to login to cassandra cluster. |
| cassandra.password | | The password corresponding to cassandra.username. |
| cassandra.compression_type | none | The compression algorithm of cassandra transport: none/snappy/lz4. |
| cassandra.jmx_port | 7199 | The port of JMX API service for cassandra. |
| cassandra.aggregation_timeout | 43200 | The timeout in seconds of waiting for aggregation. |

ScyllaDB Backend Config Options

| config option | default value | description |
|---|---|---|
| backend | | Must be set to scylladb. |
| serializer | | Must be set to scylladb. |

Other options are the same as for the Cassandra backend.

RocksDB Backend Config Options

| config option | default value | description |
|---|---|---|
| backend | | Must be set to rocksdb. |
| serializer | | Must be set to binary. |
| rocksdb.data_disks | [] | The optimized disks for storing data of RocksDB. The format of each element: STORE/TABLE: /path/disk. Allowed keys are [g/vertex, g/edge_out, g/edge_in, g/vertex_label_index, g/edge_label_index, g/range_int_index, g/range_float_index, g/range_long_index, g/range_double_index, g/secondary_index, g/search_index, g/shard_index, g/unique_index, g/olap] |
| rocksdb.data_path | rocksdb-data/data | The path for storing data of RocksDB. |
| rocksdb.wal_path | rocksdb-data/wal | The path for storing WAL of RocksDB. |
| rocksdb.allow_mmap_reads | false | Allow the OS to mmap file for reading sst tables. |
| rocksdb.allow_mmap_writes | false | Allow the OS to mmap file for writing. |
| rocksdb.block_cache_capacity | 8388608 | The amount of block cache in bytes that will be used by RocksDB, 0 means no block cache. |
| rocksdb.bloom_filter_bits_per_key | -1 | The bits per key in bloom filter, a good value is 10, which yields a filter with ~ 1% false positive rate, -1 means no bloom filter. |
| rocksdb.bloom_filter_block_based_mode | false | Use block based filter rather than full filter. |
| rocksdb.bloom_filter_whole_key_filtering | true | True if place whole keys in the bloom filter, else place the prefix of keys. |
| rocksdb.bottommost_compression | NO_COMPRESSION | The compression algorithm for the bottommost level of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd. |
| rocksdb.bulkload_mode | false | Switch to the mode to bulk load data into RocksDB. |
| rocksdb.cache_index_and_filter_blocks | false | Indicating if we'd put index/filter blocks to the block cache. |
| rocksdb.compaction_style | LEVEL | Set compaction style for RocksDB: LEVEL/UNIVERSAL/FIFO. |
| rocksdb.compression | SNAPPY_COMPRESSION | The compression algorithm for compressing blocks of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd. |
| rocksdb.compression_per_level | [NO_COMPRESSION, NO_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION] | The compression algorithms for different levels of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd. |
| rocksdb.delayed_write_rate | 16777216 | The rate limit in bytes/s of user write requests when need to slow down if the compaction gets behind. |
| rocksdb.log_level | INFO | The info log level of RocksDB. |
| rocksdb.max_background_jobs | 8 | Maximum number of concurrent background jobs, including flushes and compactions. |
| rocksdb.level_compaction_dynamic_level_bytes | false | Whether to enable level_compaction_dynamic_level_bytes, if it's enabled we give max_bytes_for_level_multiplier a priority against max_bytes_for_level_base, the bytes of base level is dynamic for a more predictable LSM tree, it is useful to limit worse case space amplification. Turning this feature on/off for an existing DB can cause unexpected LSM tree structure so it's not recommended. |
| rocksdb.max_bytes_for_level_base | 536870912 | The upper-bound of the total size of level-1 files in bytes. |
| rocksdb.max_bytes_for_level_multiplier | 10.0 | The ratio between the total size of level (L+1) files and the total size of level L files for all L. |
| rocksdb.max_open_files | -1 | The maximum number of open files that can be cached by RocksDB, -1 means no limit. |
| rocksdb.max_subcompactions | 4 | The value represents the maximum number of threads per compaction job. |
| rocksdb.max_write_buffer_number | 6 | The maximum number of write buffers that are built up in memory. |
| rocksdb.max_write_buffer_number_to_maintain | 0 | The total maximum number of write buffers to maintain in memory. |
| rocksdb.min_write_buffer_number_to_merge | 2 | The minimum number of write buffers that will be merged together. |
| rocksdb.num_levels | 7 | Set the number of levels for this database. |
| rocksdb.optimize_filters_for_hits | false | This flag allows us to not store filters for the last level. |
| rocksdb.optimize_mode | true | Optimize for heavy workloads and big datasets. |
| rocksdb.pin_l0_filter_and_index_blocks_in_cache | false | Indicating if we'd put index/filter blocks to the block cache. |
| rocksdb.sst_path | | The path for ingesting SST file into RocksDB. |
| rocksdb.target_file_size_base | 67108864 | The target file size for compaction in bytes. |
| rocksdb.target_file_size_multiplier | 1 | The size ratio between a level L file and a level (L+1) file. |
| rocksdb.use_direct_io_for_flush_and_compaction | false | Enable the OS to use direct read/writes in flush and compaction. |
| rocksdb.use_direct_reads | false | Enable the OS to use direct I/O for reading sst tables. |
| rocksdb.write_buffer_size | 134217728 | Amount of data in bytes to build up in memory. |
| rocksdb.max_manifest_file_size | 104857600 | The max size of manifest file in bytes. |
| rocksdb.skip_stats_update_on_db_open | false | Whether to skip statistics update when opening the database, setting this flag true allows us to not update statistics. |
| rocksdb.max_file_opening_threads | 16 | The max number of threads used to open files. |
| rocksdb.max_total_wal_size | 0 | Total size of WAL files in bytes. Once WALs exceed this size, we will start forcing the flush of column families related, 0 means no limit. |
| rocksdb.db_write_buffer_size | 0 | Total size of write buffers in bytes across all column families, 0 means no limit. |
| rocksdb.delete_obsolete_files_period | 21600 | The periodicity in seconds when obsolete files get deleted, 0 means always do full purge. |
| rocksdb.hard_pending_compaction_bytes_limit | 274877906944 | The hard limit to impose on pending compaction in bytes. |
| rocksdb.level0_file_num_compaction_trigger | 2 | Number of files to trigger level-0 compaction. |
| rocksdb.level0_slowdown_writes_trigger | 20 | Soft limit on number of level-0 files for slowing down writes. |
| rocksdb.level0_stop_writes_trigger | 36 | Hard limit on number of level-0 files for stopping writes. |
| rocksdb.soft_pending_compaction_bytes_limit | 68719476736 | The soft limit to impose on pending compaction in bytes. |

HBase Backend Config Options

| config option | default value | description |
|---|---|---|
| backend | | Must be set to hbase. |
| serializer | | Must be set to hbase. |
| hbase.hosts | localhost | The hostnames or ip addresses of HBase zookeeper, separated with commas. |
| hbase.port | 2181 | The port address of HBase zookeeper. |
| hbase.threads_max | 64 | The max threads num of hbase connections. |
| hbase.znode_parent | /hbase | The znode parent path of HBase zookeeper. |
| hbase.zk_retry | 3 | The recovery retry times of HBase zookeeper. |
| hbase.aggregation_timeout | 43200 | The timeout in seconds of waiting for aggregation. |
| hbase.kerberos_enable | false | Is Kerberos authentication enabled for HBase. |
| hbase.kerberos_keytab | | The HBase's key tab file for kerberos authentication. |
| hbase.kerberos_principal | | The HBase's principal for kerberos authentication. |
| hbase.krb5_conf | etc/krb5.conf | Kerberos configuration file, including KDC IP, default realm, etc. |
| hbase.hbase_site | /etc/hbase/conf/hbase-site.xml | The HBase's configuration file. |
| hbase.enable_partition | true | Is pre-split partitions enabled for HBase. |
| hbase.vertex_partitions | 10 | The number of partitions of the HBase vertex table. |
| hbase.edge_partitions | 30 | The number of partitions of the HBase edge table. |

MySQL & PostgreSQL 后端配置项

| config option | default value | description |
|---|---|---|
| backend | | Must be set to mysql. |
| serializer | | Must be set to mysql. |
| jdbc.driver | com.mysql.jdbc.Driver | The JDBC driver class to connect database. |
| jdbc.url | jdbc:mysql://127.0.0.1:3306 | The url of database in JDBC format. |
| jdbc.username | root | The username to login database. |
| jdbc.password | ****** | The password corresponding to jdbc.username. |
| jdbc.ssl_mode | false | The SSL mode of connections with database. |
| jdbc.reconnect_interval | 3 | The interval(seconds) between reconnections when the database connection fails. |
| jdbc.reconnect_max_times | 3 | The reconnect times when the database connection fails. |
| jdbc.storage_engine | InnoDB | The storage engine of backend store database, like InnoDB/MyISAM/RocksDB for MySQL. |
| jdbc.postgresql.connect_database | template1 | The database used to connect when init store, drop store or check store exist. |

PostgreSQL Backend Config Options

| config option | default value | description |
|---|---|---|
| backend | | Must be set to postgresql. |
| serializer | | Must be set to postgresql. |

Other options are the same as for the MySQL backend.

The driver and url of the PostgreSQL backend should be set to:

  • jdbc.driver=org.postgresql.Driver
  • jdbc.url=jdbc:postgresql://localhost:5432/
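
Putting these together with the defaults listed above, a minimal PostgreSQL section of {graph-name}.properties might look like this sketch (the username/password values are placeholders):

backend=postgresql
serializer=postgresql
jdbc.driver=org.postgresql.Driver
jdbc.url=jdbc:postgresql://localhost:5432/
jdbc.username=postgres
jdbc.password=******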

3 - HugeGraph Built-in User Authentication and Authorization Configuration and Usage

Overview

To make authorization easy to use in different scenarios, HugeGraph currently ships a complete StandardAuthenticator permission mode supporting multi-user authentication and fine-grained access control. It uses a 4-layer "user - user group - operation - resource" design to flexibly control user roles and permissions (multiple GraphServers are supported).

Some core designs of the StandardAuthenticator mode:

  • A super administrator (admin) user is created at initialization; other users are then created by the super administrator, and a newly created user can create or manage more users once granted sufficient permissions
  • Users, user groups and resources can be created dynamically, and permissions can be granted or revoked dynamically
  • A user can belong to one or more user groups, and each user group can hold operation permissions on any number of resources; operation types include read, write, delete, execute and so on
  • A "resource" describes data in the graph database, e.g. vertices meeting certain criteria. Each resource consists of three elements: type, label and properties. There are 18 types in total, and any label with any properties can be combined into a resource. Conditions within a single resource are ANDed, while conditions across multiple resources are ORed

For example:

// Scenario: a user only has read permission for data in the Beijing area
user(name=xx) -belong-> group(name=xx) -access(read)-> target(graph=graph1, resource={label: person, city: Beijing})

Configure User Authentication

HugeGraph does not enable user authentication by default; it has to be turned on by modifying the configuration files. (Note: for production or public-network use, please use Java 11 and enable authentication to avoid security risks.)

The StandardAuthenticator mode is built in and supports multi-user authentication and fine-grained access control. In addition, developers can implement the HugeAuthenticator interface to integrate their own permission system.

Authentication uses HTTP Basic Authentication: when sending an HTTP request, choose Basic for the Authorization header and supply the corresponding username and password. The corresponding HTTP plaintext looks like this:

GET http://localhost:8080/graphs/hugegraph/schema/vertexlabels
Authorization: Basic admin xxxx
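
With curl, the same header is produced by the -u option, which base64-encodes user:password into a standard Basic Authorization header (a sketch; admin/xxxx are placeholder credentials):

curl -u admin:xxxx "http://localhost:8080/graphs/hugegraph/schema/vertexlabels"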

StandardAuthenticator Mode

The StandardAuthenticator mode supports authentication and permission control by storing user information in the backend database. It authenticates by the user names and passwords stored in the database (passwords are encrypted) and controls user permissions at fine granularity based on user roles. The detailed configuration steps are below (restart the service for them to take effect):

Configure the authenticator and its rest-server file path in gremlin-server.yaml:

authentication: {
  authenticator: org.apache.hugegraph.auth.StandardAuthenticator,
  authenticationHandler: org.apache.hugegraph.auth.WsAndHttpBasicAuthHandler,
  config: {tokens: conf/rest-server.properties}
}

Configure the authenticator and its graph_store information in rest-server.properties:

auth.authenticator=org.apache.hugegraph.auth.StandardAuthenticator
auth.graph_store=hugegraph

# auth client config
# If GraphServer and AuthServer are deployed separately, the following config is also required; fill in the AuthServer's IP:RPC-port
#auth.remote_url=127.0.0.1:8899,127.0.0.1:8898,127.0.0.1:8897

The graph_store option specifies which graph is used to store user information; if there are multiple graphs, any one of them can be chosen.

Configure the gremlin.graph information in the configuration file hugegraph{n}.properties:

gremlin.graph=org.apache.hugegraph.auth.HugeFactoryAuthProxy

For detailed permission API calls and explanations, please refer to the Authentication-API documentation.
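
As a rough illustration of what such a call looks like, creating a new user with the admin account might be done as below; the exact endpoint and field names are defined in the Authentication-API document, and the password values here are placeholders:

curl -u admin:${ADMIN_PASSWORD} -X POST -H "Content-Type: application/json" \
     -d '{"user_name": "boss", "user_password": "******"}' \
     http://localhost:8080/graphs/hugegraph/auth/users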

Custom User Authentication System

If a more flexible user system is needed, the authenticator can be extended: implement the interface org.apache.hugegraph.auth.HugeAuthenticator, then change the authenticator option in the configuration file to point to that implementation.

Starting with Authentication Enabled

After authentication is configured, the admin password must be entered on the command line the first time init-store.sh is executed (in non-docker deployments).

If deploying from a docker image, or if HugeGraph has already been initialized and needs to switch to authentication mode, the related graph data must be deleted and HugeGraph restarted. If the graph already contains business data, it is temporarily impossible to convert to authentication mode directly (hugegraph version <= 1.2.0).

An improvement for this has been released in the latest version (available in Docker latest); see PR 2411, with which the switch is seamless.

# stop the hugeGraph firstly
bin/stop-hugegraph.sh

# delete the store data (here we use the default path for rocksdb)
# Note: no need to delete data in the latest code (fixed in https://github.com/apache/incubator-hugegraph/pull/2411)
rm -rf rocksdb-data/

# init store again
bin/init-store.sh

# start hugeGraph again
bin/start-hugegraph.sh

Enabling Authentication Mode with Docker

For hugegraph/hugegraph images of version 1.2.0 and above, authentication mode can be enabled while starting the docker image.

The steps are as follows:

1. Using docker run

Adding the environment variable PASSWORD=123456 (the password can be chosen freely) to docker run enables authentication mode:

docker run -itd -e PASSWORD=123456 --name=server -p 8080:8080 hugegraph/hugegraph:1.2.0

2. Using docker-compose

With docker-compose, set PASSWORD=123456 in the environment variables:

version: '3'
services:
  server:
    image: hugegraph/hugegraph:1.2.0
    container_name: server
    ports:
      - 8080:8080
    environment:
      - PASSWORD=123456

3. Re-enabling authentication mode inside the container

First enter the container:

docker exec -it server bash
# for quickly modifying the config; the pre-modification files are saved in the conf-bak folder
bin/enable-auth.sh

Then follow Starting with Authentication Enabled above.
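
To verify that authentication is effective, a request without credentials should be rejected while the same request with the admin credentials succeeds (a sketch, assuming PASSWORD=123456 as set above):

# rejected (401) without credentials
curl http://localhost:8080/graphs
# succeeds with the admin credentials
curl -u admin:123456 http://localhost:8080/graphs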

4 - Configuring HugeGraphServer to Use the HTTPS Protocol

Overview

HugeGraphServer uses the http protocol by default. If request security is required, https can be configured instead.

Server Configuration

Modify the conf/rest-server.properties configuration file and change the schema part of restserver.url to https.

# set the protocol to https
restserver.url=https://127.0.0.1:8080
# the path of the server keystore file; this default takes effect when the protocol is https, change it as needed
ssl.keystore_file=conf/hugegraph-server.keystore
# the password of the server keystore file; this default takes effect when the protocol is https, change it as needed
ssl.keystore_password=******

A keystore file hugegraph-server.keystore is already provided under the server's conf directory, and its password is hugegraph; both are the defaults when the https protocol is enabled. Users can generate their own keystore file and password and then modify the values of ssl.keystore_file and ssl.keystore_password.

Client Configuration

Using HTTPS in HugeGraph-Client

Pass the https-related configuration when constructing a HugeClient; code example:

String url = "https://localhost:8080";
String graphName = "hugegraph";
HugeClientBuilder builder = HugeClient.builder(url, graphName);
// client truststore file path
String trustStoreFilePath = "hugegraph.truststore";
// client truststore password
String trustStorePassword = "******";
builder.configSSL(trustStoreFilePath, trustStorePassword);
HugeClient hugeClient = builder.build();

Note: before version 1.9.0, HugeGraph-Client was created directly with new and did not support the https protocol; since version 1.9.0 it is created with a builder and supports configuring the https protocol.

Using HTTPS in HugeGraph-Loader

When starting an import task, add the following options on the command line:

# https
--protocol https
# client certificate file path; when --protocol is https, the default conf/hugegraph.truststore takes effect automatically, change it as needed
--trust-store-file {file}
# client certificate file password; when --protocol is https, the default hugegraph takes effect automatically, change it as needed
--trust-store-password {password}

A default client certificate file hugegraph.truststore is already placed under hugegraph-loader's conf directory; its password is hugegraph.
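
A full loader invocation might therefore look like the sketch below; the graph name and the struct/schema files are placeholders for an actual import task:

bin/hugegraph-loader -g hugegraph -f example/struct.json -s example/schema.groovy \
    -h localhost -p 8080 --protocol https \
    --trust-store-file conf/hugegraph.truststore --trust-store-password hugegraph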

Using HTTPS in HugeGraph-Tools

When executing a command, add the following options on the command line:

# client certificate file path; when the url uses the https protocol, the default conf/hugegraph.truststore takes effect automatically, change it as needed
--trust-store-file {file}
# client certificate file password; when the url uses the https protocol, the default hugegraph takes effect automatically, change it as needed
--trust-store-password {password}
# for migration commands, when --target-url uses the https protocol, the default conf/hugegraph.truststore takes effect automatically, change it as needed
--target-trust-store-file {target-file}
# for migration commands, when --target-url uses the https protocol, the default hugegraph takes effect automatically, change it as needed
--target-trust-store-password {target-password}

A default client certificate file hugegraph.truststore is already placed under hugegraph-tools' conf directory; its password is hugegraph.

How to Generate Certificate Files

This part shows an example of generating certificates. If the default certificates are sufficient, or you already know how to generate them, skip this part.

Server

  1. Generate the server private key and import it into the server keystore file; server.keystore is used by the server and holds its own private key
keytool -genkey -alias serverkey -keyalg RSA -keystore server.keystore

Fill in the description fields as needed during the process; the description of the default certificate is as follows:

First and last name: hugegraph
Organizational unit: hugegraph
Organization: hugegraph
City or locality: BJ
State or province: BJ
Country code: CN
  2. Export the server certificate from the server private key
keytool -export -alias serverkey -keystore server.keystore -file server.crt

server.crt is the server's certificate

Client

keytool -import -alias serverkey -file server.crt -keystore client.truststore

client.truststore is used by the client and holds the trusted certificates

5 - HugeGraph-Computer Config

Computer Config Options

| config option | default value | description |
|---|---|---|
| algorithm.message_class | org.apache.hugegraph.computer.core.config.Null | The class of message passed when compute vertex. |
| algorithm.params_class | org.apache.hugegraph.computer.core.config.Null | The class used to transfer algorithms' parameters before algorithm been run. |
| algorithm.result_class | org.apache.hugegraph.computer.core.config.Null | The class of vertex's value, the instance is used to store computation result for the vertex. |
| allocator.max_vertices_per_thread | 10000 | Maximum number of vertices per thread processed in each memory allocator |
| bsp.etcd_endpoints | http://localhost:2379 | The end points to access etcd. |
| bsp.log_interval | 30000 | The log interval(in ms) to print the log while waiting bsp event. |
| bsp.max_super_step | 10 | The max super step of the algorithm. |
| bsp.register_timeout | 300000 | The max timeout to wait for master and works to register. |
| bsp.wait_master_timeout | 86400000 | The max timeout(in ms) to wait for master bsp event. |
| bsp.wait_workers_timeout | 86400000 | The max timeout to wait for workers bsp event. |
| hgkv.max_data_block_size | 65536 | The max byte size of hgkv-file data block. |
| hgkv.max_file_size | 2147483648 | The max number of bytes in each hgkv-file. |
| hgkv.max_merge_files | 10 | The max number of files to merge at one time. |
| hgkv.temp_file_dir | /tmp/hgkv | This folder is used to store temporary files, temporary files will be generated during the file merging process. |
| hugegraph.name | hugegraph | The graph name to load data and write results back. |
| hugegraph.url | http://127.0.0.1:8080 | The hugegraph url to load data and write results back. |
| input.edge_direction | OUT | The data of the edge in which direction is loaded, when the value is BOTH, the edges in both OUT and IN direction will be loaded. |
| input.edge_freq | MULTIPLE | The frequency of edges can exist between a pair of vertices, allowed values: [SINGLE, SINGLE_PER_LABEL, MULTIPLE]. SINGLE means that only one edge can exist between a pair of vertices, use sourceId + targetId to identify it; SINGLE_PER_LABEL means that each edge label can exist one edge between a pair of vertices, use sourceId + edgelabel + targetId to identify it; MULTIPLE means that many edge can exist between a pair of vertices, use sourceId + edgelabel + sortValues + targetId to identify it. |
| input.filter_class | org.apache.hugegraph.computer.core.input.filter.DefaultInputFilter | The class to create input-filter object, input-filter is used to Filter vertex edges according to user needs. |
| input.loader_schema_path | | The schema path of loader input, only takes effect when the input.source_type=loader is enabled |
| input.loader_struct_path | | The struct path of loader input, only takes effect when the input.source_type=loader is enabled |
| input.max_edges_in_one_vertex | 200 | The maximum number of adjacent edges allowed to be attached to a vertex, the adjacent edges will be stored and transferred together as a batch unit. |
| input.source_type | hugegraph-server | The source type to load input data, allowed values: ['hugegraph-server', 'hugegraph-loader'], the 'hugegraph-loader' means use hugegraph-loader load data from HDFS or file, if use 'hugegraph-loader' load data then please config 'input.loader_struct_path' and 'input.loader_schema_path'. |
| input.split_fetch_timeout | 300 | The timeout in seconds to fetch input splits |
| input.split_max_splits | 10000000 | The maximum number of input splits |
| input.split_page_size | 500 | The page size for streamed load input split data |
| input.split_size | 1048576 | The input split size in bytes |
| job.id | local_0001 | The job id on Yarn cluster or K8s cluster. |
| job.partitions_count | 1 | The partitions count for computing one graph algorithm job. |
| job.partitions_thread_nums | 4 | The number of threads for partition parallel compute. |
| job.workers_count | 1 | The workers count for computing one graph algorithm job. |
| master.computation_class | org.apache.hugegraph.computer.core.master.DefaultMasterComputation | Master-computation is computation that can determine whether to continue next superstep. It runs at the end of each superstep on master. |
| output.batch_size | 500 | The batch size of output |
| output.batch_threads | 1 | The threads number used to batch output |
| output.hdfs_core_site_path | | The hdfs core site path. |
| output.hdfs_delimiter | , | The delimiter of hdfs output. |
| output.hdfs_kerberos_enable | false | Is Kerberos authentication enabled for Hdfs. |
| output.hdfs_kerberos_keytab | | The Hdfs's key tab file for kerberos authentication. |
| output.hdfs_kerberos_principal | | The Hdfs's principal for kerberos authentication. |
| output.hdfs_krb5_conf | /etc/krb5.conf | Kerberos configuration file. |
| output.hdfs_merge_partitions | true | Whether merge output files of multiple partitions. |
| output.hdfs_path_prefix | /hugegraph-computer/results | The directory of hdfs output result. |
| output.hdfs_replication | 3 | The replication number of hdfs. |
| output.hdfs_site_path | | The hdfs site path. |
| output.hdfs_url | hdfs://127.0.0.1:9000 | The hdfs url of output. |
| output.hdfs_user | hadoop | The hdfs user of output. |
| output.output_class | org.apache.hugegraph.computer.core.output.LogOutput | The class to output the computation result of each vertex. Be called after iteration computation. |
| output.result_name | value | The value is assigned dynamically by #name() of instance created by WORKER_COMPUTATION_CLASS. |
| output.result_write_type | OLAP_COMMON | The result write-type to output to hugegraph, allowed values are: [OLAP_COMMON, OLAP_SECONDARY, OLAP_RANGE]. |
| output.retry_interval | 10 | The retry interval when output failed |
| output.retry_times | 3 | The retry times when output failed |
| output.single_threads | 1 | The threads number used to single output |
| output.thread_pool_shutdown_timeout | 60 | The timeout seconds of output threads pool shutdown |
| output.with_adjacent_edges | false | Output the adjacent edges of the vertex or not |
| output.with_edge_properties | false | Output the properties of the edge or not |
| output.with_vertex_properties | false | Output the properties of the vertex or not |
| sort.thread_nums | 4 | The number of threads performing internal sorting. |
| transport.client_connect_timeout | 3000 | The timeout(in ms) of client connect to server. |
| transport.client_threads | 4 | The number of transport threads for client. |
| transport.close_timeout | 10000 | The timeout(in ms) of close server or close client. |
| transport.finish_session_timeout | 0 | The timeout(in ms) to finish session, 0 means using (transport.sync_request_timeout * transport.max_pending_requests). |
| transport.heartbeat_interval | 20000 | The minimum interval(in ms) between heartbeats on client side. |
| transport.io_mode | AUTO | The network IO Mode, either 'NIO', 'EPOLL', 'AUTO', the 'AUTO' means selecting the property mode automatically. |
| transport.max_pending_requests | 8 | The max number of client unreceived ack, it will trigger the sending unavailable if the number of unreceived ack >= max_pending_requests. |
| transport.max_syn_backlog | 511 | The capacity of SYN queue on server side, 0 means using system default value. |
| transport.max_timeout_heartbeat_count | 120 | The maximum times of timeout heartbeat on client side, if the number of timeouts waiting for heartbeat response continuously > max_heartbeat_timeouts the channel will be closed from client side. |
| transport.min_ack_interval | 200 | The minimum interval(in ms) of server reply ack. |
| transport.min_pending_requests | 6 | The minimum number of client unreceived ack, it will trigger the sending available if the number of unreceived ack < min_pending_requests. |
| transport.network_retries | 3 | The number of retry attempts for network communication, if network unstable. |
| transport.provider_class | org.apache.hugegraph.computer.core.network.netty.NettyTransportProvider | The transport provider, currently only supports Netty. |
| transport.receive_buffer_size | 0 | The size of socket receive-buffer in bytes, 0 means using system default value. |
| transport.recv_file_mode | true | Whether enable receive buffer-file mode, it will receive buffer write file from socket by zero-copy if enable. |
| transport.send_buffer_size | 0 | The size of socket send-buffer in bytes, 0 means using system default value. |
| transport.server_host | 127.0.0.1 | The server hostname or ip to listen on to transfer data. |
| transport.server_idle_timeout | 360000 | The max timeout(in ms) of server idle. |
| transport.server_port | 0 | The server port to listen on to transfer data. The system will assign a random port if it's set to 0. |
| transport.server_threads | 4 | The number of transport threads for server. |
| transport.sync_request_timeout | 10000 | The timeout(in ms) to wait response after sending sync-request. |
| transport.tcp_keep_alive | true | Whether enable TCP keep-alive. |
| transport.transport_epoll_lt | false | Whether enable EPOLL level-trigger. |
| transport.write_buffer_high_mark | 67108864 | The high water mark for write buffer in bytes, it will trigger the sending unavailable if the number of queued bytes > write_buffer_high_mark. |
| transport.write_buffer_low_mark | 33554432 | The low water mark for write buffer in bytes, it will trigger the sending available if the number of queued bytes < write_buffer_low_mark. |
| transport.write_socket_timeout | 3000 | The timeout(in ms) to write data to socket buffer. |
| valuefile.max_segment_size | 1073741824 | The max number of bytes in each segment of value-file. |
| worker.combiner_class | org.apache.hugegraph.computer.core.config.Null | Combiner can combine messages into one value for a vertex, for example page-rank algorithm can combine messages of a vertex to a sum value. |
| worker.computation_class | org.apache.hugegraph.computer.core.config.Null | The class to create worker-computation object, worker-computation is used to compute each vertex in each superstep. |
| worker.data_dirs | [jobs] | The directories separated by ',' that received vertices and messages can persist into. |
| worker.edge_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same edge into one properties at inputstep. |
| worker.partitioner | org.apache.hugegraph.computer.core.graph.partition.HashPartitioner | The partitioner that decides which partition a vertex should be in, and which worker a partition should be in. |
| worker.received_buffers_bytes_limit | 104857600 | The limit bytes of buffers of received data, the total size of all buffers can't exceed this limit. If received buffers reach this limit, they will be merged into a file. |
| worker.vertex_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same vertex into one properties at inputstep. |
| worker.wait_finish_messages_timeout | 86400000 | The max timeout(in ms) message-handler wait for finish-message of all workers. |
| worker.wait_sort_timeout | 600000 | The max timeout(in ms) message-handler wait for sort-thread to sort one batch of buffers. |
| worker.write_buffer_capacity | 52428800 | The initial size of write buffer that used to store vertex or message. |
| worker.write_buffer_threshold | 52428800 | The threshold of write buffer, exceeding it will trigger sorting, the write buffer is used to store vertex or message. |

K8s Operator Config Options

NOTE: Option needs to be converted through environment variable settings, e.g. k8s.internal_etcd_url => INTERNAL_ETCD_URL

| config option | default value | description |
|---|---|---|
| k8s.auto_destroy_pod | true | Whether to automatically destroy all pods when the job is completed or failed. |
| k8s.close_reconciler_timeout | 120 | The max timeout(in ms) to close reconciler. |
| k8s.internal_etcd_url | http://127.0.0.1:2379 | The internal etcd url for operator system. |
| k8s.max_reconcile_retry | 3 | The max retry times of reconcile. |
| k8s.probe_backlog | 50 | The maximum backlog for serving health probes. |
| k8s.probe_port | 9892 | The value is the port that the controller bind to for serving health probes. |
| k8s.ready_check_internal | 1000 | The time interval(ms) of check ready. |
| k8s.ready_timeout | 30000 | The max timeout(in ms) of check ready. |
| k8s.reconciler_count | 10 | The max number of reconciler thread. |
| k8s.resync_period | 600000 | The minimum frequency at which watched resources are reconciled. |
| k8s.timezone | Asia/Shanghai | The timezone of computer job and operator. |
| k8s.watch_namespace | hugegraph-computer-system | The value is watch custom resources in the namespace, ignore other namespaces, the '*' means is all namespaces will be watched. |
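
For example, to point the operator at an external etcd, the k8s.internal_etcd_url option would be supplied as the INTERNAL_ETCD_URL environment variable of the operator container, following the conversion rule in the note above (a sketch of the relevant fragment; the etcd address is hypothetical):

env:
  - name: INTERNAL_ETCD_URL
    value: "http://etcd.hugegraph-computer-system:2379"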

HugeGraph-Computer CRD

CRD: https://github.com/apache/hugegraph-computer/blob/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml

| spec | default value | description | required |
|---|---|---|---|
| algorithmName | | The name of algorithm. | true |
| jobId | | The job id. | true |
| image | | The image of algorithm. | true |
| computerConf | | The map of computer config options. | true |
| workerInstances | | The number of worker instances, it will override the 'job.workers_count' option. | true |
| pullPolicy | Always | The pull-policy of image, detail please refer to: https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy | false |
| pullSecrets | | The pull-secrets of Image, detail please refer to: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod | false |
| masterCpu | | The cpu limit of master, the unit can be 'm' or without unit, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false |
| workerCpu | | The cpu limit of worker, the unit can be 'm' or without unit, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false |
| masterMemory | | The memory limit of master, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false |
| workerMemory | | The memory limit of worker, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false |
| log4jXml | | The content of log4j.xml for computer job. | false |
| jarFile | | The jar path of computer algorithm. | false |
| remoteJarUri | | The remote jar uri of computer algorithm, it will overlay algorithm image. | false |
| jvmOptions | | The java startup parameters of computer job. | false |
| envVars | | please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/ | false |
| envFrom | | please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/ | false |
| masterCommand | bin/start-computer.sh | The run command of master, equivalent to 'Entrypoint' field of Docker. | false |
| masterArgs | ["-r master", "-d k8s"] | The run args of master, equivalent to 'Cmd' field of Docker. | false |
| workerCommand | bin/start-computer.sh | The run command of worker, equivalent to 'Entrypoint' field of Docker. | false |
| workerArgs | ["-r worker", "-d k8s"] | The run args of worker, equivalent to 'Cmd' field of Docker. | false |
| volumes | | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false |
| volumeMounts | | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false |
| secretPaths | | The map of k8s-secret name and mount path. | false |
| configMapPaths | | The map of k8s-configmap name and mount path. | false |
| podTemplateSpec | | Please refer to: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-template-v1/#PodTemplateSpec | false |
| securityContext | | Please refer to: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | false |
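
Putting the required fields together, a job manifest based on the CRD linked above might look like the following sketch; the apiVersion/kind are assumed to follow that CRD file, and all values (job name, image, option values) are illustrative:

apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-system
  name: pagerank-sample
spec:
  jobId: pagerank-sample
  algorithmName: page_rank
  image: hugegraph/hugegraph-computer:latest
  workerInstances: 3
  computerConf:
    job.partitions_count: "5"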

KubeDriver Config Options

| config option | default value | description |
|---|---|---|
| k8s.build_image_bash_path | | The path of command used to build image. |
| k8s.enable_internal_algorithm | true | Whether enable internal algorithm. |
| k8s.framework_image_url | hugegraph/hugegraph-computer:latest | The image url of computer framework. |
| k8s.image_repository_password | | The password for login image repository. |
| k8s.image_repository_registry | | The address for login image repository. |
| k8s.image_repository_url | hugegraph/hugegraph-computer | The url of image repository. |
| k8s.image_repository_username | | The username for login image repository. |
| k8s.internal_algorithm | [pageRank] | The name list of all internal algorithm. |
| k8s.internal_algorithm_image_url | hugegraph/hugegraph-computer:latest | The image url of internal algorithm. |
| k8s.jar_file_dir | /cache/jars/ | The directory where the algorithm jar to upload location. |
| k8s.kube_config | ~/.kube/config | The path of k8s config file. |
| k8s.log4j_xml_path | | The log4j.xml path for computer job. |
| k8s.namespace | hugegraph-computer-system | The namespace of hugegraph-computer system. |
| k8s.pull_secret_names | [] | The names of pull-secret for pulling image. |