HugeGraph-Computer Configuration Reference

Computer Config Options

Default Value Notes:

  • Configuration items listed below show the code default values (defined in ComputerOptions.java)
  • When the packaged configuration file (conf/computer.properties in the distribution) specifies a different value, it’s noted as: value (packaged: value)
  • Example: 300000 (packaged: 100000) means the code default is 300000, but the distributed package defaults to 100000
  • For production deployments, the packaged defaults take precedence unless you explicitly override them

1. Basic Configuration

Core job settings for HugeGraph-Computer.

| config option | default value | description |
|---|---|---|
| hugegraph.url | http://127.0.0.1:8080 | The HugeGraph server URL to load data from and write results back to. |
| hugegraph.name | hugegraph | The graph name to load data from and write results back to. |
| hugegraph.username | "" (empty) | The username for HugeGraph authentication (leave empty if authentication is disabled). |
| hugegraph.password | "" (empty) | The password for HugeGraph authentication (leave empty if authentication is disabled). |
| job.id | local_0001 (packaged: local_001) | The job identifier on a YARN or K8s cluster. |
| job.namespace | "" (empty) | The job namespace that can separate different data sources. 🔒 Managed by system - do not modify manually. |
| job.workers_count | 1 | The number of workers for computing one graph algorithm job. 🔒 Managed by system - do not modify manually in K8s. |
| job.partitions_count | 1 | The number of partitions for computing one graph algorithm job. |
| job.partitions_thread_nums | 4 | The number of threads for parallel partition computation. |
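
For illustration only, a minimal connection setup in conf/computer.properties could look like the following (the URL, graph name, and partition counts are placeholder values, not recommendations):

hugegraph.url=http://127.0.0.1:8080
hugegraph.name=hugegraph
job.partitions_count=4
job.partitions_thread_nums=4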

2. Algorithm Configuration

Algorithm-specific configuration for computation logic.

| config option | default value | description |
|---|---|---|
| algorithm.params_class | org.apache.hugegraph.computer.core.config.Null | ⚠️ REQUIRED The class used to transfer algorithm parameters before the algorithm is run. |
| algorithm.result_class | org.apache.hugegraph.computer.core.config.Null | The class of the vertex’s value, used to store the computation result for the vertex. |
| algorithm.message_class | org.apache.hugegraph.computer.core.config.Null | The class of the message passed when computing a vertex. |
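
As a sketch, assuming a hypothetical parameter class com.example.MyAlgorithmParams shipped in the algorithm jar, only the params class usually needs to be set explicitly; the result and message classes are typically registered by the params class itself:

# com.example.MyAlgorithmParams is a placeholder, not a class provided by HugeGraph-Computer
algorithm.params_class=com.example.MyAlgorithmParams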

3. Input Configuration

Configuration for loading input data from HugeGraph or other sources.

3.1 Input Source

| config option | default value | description |
|---|---|---|
| input.source_type | hugegraph-server | The source type to load input data from, allowed values: [‘hugegraph-server’, ‘hugegraph-loader’]. ‘hugegraph-loader’ means using hugegraph-loader to load data from HDFS or a file. When using ‘hugegraph-loader’, also configure ‘input.loader_struct_path’ and ‘input.loader_schema_path’. |
| input.loader_struct_path | "" (empty) | The struct path of loader input; only takes effect when input.source_type is set to ‘hugegraph-loader’. |
| input.loader_schema_path | "" (empty) | The schema path of loader input; only takes effect when input.source_type is set to ‘hugegraph-loader’. |
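
A hedged sketch of switching the input source to hugegraph-loader; the struct and schema paths are placeholders:

input.source_type=hugegraph-loader
input.loader_struct_path=/path/to/struct.json
input.loader_schema_path=/path/to/schema.json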

3.2 Input Splits

| config option | default value | description |
|---|---|---|
| input.split_size | 1048576 (1 MB) | The input split size in bytes. |
| input.split_max_splits | 10000000 | The maximum number of input splits. |
| input.split_page_size | 500 | The page size for streamed loading of input split data. |
| input.split_fetch_timeout | 300 | The timeout in seconds to fetch input splits. |

3.3 Input Processing

| config option | default value | description |
|---|---|---|
| input.filter_class | org.apache.hugegraph.computer.core.input.filter.DefaultInputFilter | The class used to create the input-filter object. The input-filter filters vertices and edges according to user needs. |
| input.edge_direction | OUT | The direction of edges to load, allowed values: [OUT, IN, BOTH]. When the value is BOTH, edges in both OUT and IN directions are loaded. |
| input.edge_freq | MULTIPLE | The number of edges allowed between a pair of vertices, allowed values: [SINGLE, SINGLE_PER_LABEL, MULTIPLE]. SINGLE means only one edge can exist between a pair of vertices (identified by sourceId + targetId); SINGLE_PER_LABEL means one edge per edge label can exist between a pair of vertices (identified by sourceId + edgeLabel + targetId); MULTIPLE means many edges can exist between a pair of vertices (identified by sourceId + edgeLabel + sortValues + targetId). |
| input.max_edges_in_one_vertex | 200 | The maximum number of adjacent edges allowed to be attached to a vertex. The adjacent edges are stored and transferred together as a batch unit. |
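
For example, to load edges in both directions while allowing at most one edge per label between any two vertices (an illustrative combination, not a recommendation):

input.edge_direction=BOTH
input.edge_freq=SINGLE_PER_LABEL
input.max_edges_in_one_vertex=200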

3.4 Input Performance

| config option | default value | description |
|---|---|---|
| input.send_thread_nums | 4 | The number of threads for parallel sending of vertices or edges. |

4. Snapshot & Storage Configuration

HugeGraph-Computer supports snapshot functionality to save vertex/edge partitions to local storage or MinIO object storage, enabling checkpoint recovery or accelerating repeated computations.

4.1 Basic Snapshot Configuration

| config option | default value | description |
|---|---|---|
| snapshot.write | false | Whether to write snapshots of input vertex/edge partitions. |
| snapshot.load | false | Whether to load from snapshots of vertex/edge partitions. |
| snapshot.name | "" (empty) | User-defined snapshot name to distinguish different snapshots. |

4.2 MinIO Integration (Optional)

MinIO can be used as a distributed object storage backend for snapshots in K8s deployments.

| config option | default value | description |
|---|---|---|
| snapshot.minio_endpoint | "" (empty) | MinIO service endpoint (e.g., http://minio:9000). Required when using MinIO. |
| snapshot.minio_access_key | minioadmin | MinIO access key for authentication. |
| snapshot.minio_secret_key | minioadmin | MinIO secret key for authentication. |
| snapshot.minio_bucket_name | "" (empty) | MinIO bucket name for storing snapshot data. |

Usage Scenarios:

  • Checkpoint Recovery: Resume from snapshots after job failures, avoiding data reloading
  • Repeated Computations: Load data from snapshots when running the same algorithm multiple times
  • A/B Testing: Save multiple snapshot versions of the same dataset to test different algorithm parameters

Example: Local Snapshot (in computer.properties):

snapshot.write=true
snapshot.name=pagerank-snapshot-20260201

Example: MinIO Snapshot (in K8s CRD computerConf):

computerConf:
  snapshot.write: "true"
  snapshot.name: "pagerank-snapshot-v1"
  snapshot.minio_endpoint: "http://minio:9000"
  snapshot.minio_access_key: "my-access-key"
  snapshot.minio_secret_key: "my-secret-key"
  snapshot.minio_bucket_name: "hugegraph-snapshots"

5. Worker & Master Configuration

Configuration for worker and master computation logic.

5.1 Master Configuration

| config option | default value | description |
|---|---|---|
| master.computation_class | org.apache.hugegraph.computer.core.master.DefaultMasterComputation | Master-computation is the computation that can determine whether to continue to the next superstep. It runs at the end of each superstep on the master. |

5.2 Worker Computation

| config option | default value | description |
|---|---|---|
| worker.computation_class | org.apache.hugegraph.computer.core.config.Null | The class used to create the worker-computation object. Worker-computation is used to compute each vertex in each superstep. |
| worker.combiner_class | org.apache.hugegraph.computer.core.config.Null | A combiner can combine messages into one value for a vertex. For example, the PageRank algorithm can combine the messages of a vertex into a sum value. |
| worker.partitioner | org.apache.hugegraph.computer.core.graph.partition.HashPartitioner | The partitioner that decides which partition a vertex should be in, and which worker a partition should be in. |
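
A sketch of how these options fit together, assuming hypothetical classes com.example.MyComputation and com.example.MyMessageCombiner shipped in the algorithm jar:

worker.computation_class=com.example.MyComputation
worker.combiner_class=com.example.MyMessageCombiner
# Keep the default hash partitioner unless a custom partitioning strategy is required
worker.partitioner=org.apache.hugegraph.computer.core.graph.partition.HashPartitioner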

5.3 Worker Combiners

| config option | default value | description |
|---|---|---|
| worker.vertex_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner that can combine several properties of the same vertex into one set of properties at the input step. |
| worker.edge_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner that can combine several properties of the same edge into one set of properties at the input step. |

5.4 Worker Buffers

| config option | default value | description |
|---|---|---|
| worker.received_buffers_bytes_limit | 104857600 (100 MB) | The limit in bytes of buffers of received data. The total size of all buffers can’t exceed this limit; when received buffers reach this limit, they will be merged into a file (spill to disk). |
| worker.write_buffer_capacity | 52428800 (50 MB) | The initial size of the write buffer that is used to store vertices or messages. |
| worker.write_buffer_threshold | 52428800 (50 MB) | The threshold of the write buffer; exceeding it will trigger sorting. The write buffer is used to store vertices or messages. |
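
As an illustration of how the three limits relate, a memory-rich worker might raise them proportionally; the numbers below are examples only, not tuning advice:

# 200 MB of received buffers before spilling to disk
worker.received_buffers_bytes_limit=209715200
# 100 MB write buffer; sorting is triggered once the threshold is exceeded
worker.write_buffer_capacity=104857600
worker.write_buffer_threshold=104857600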

5.5 Worker Data & Timeouts

| config option | default value | description |
|---|---|---|
| worker.data_dirs | [jobs] | The directories, separated by ‘,’, that received vertices and messages can persist into. |
| worker.wait_sort_timeout | 600000 (10 minutes) | The max timeout (in ms) for the message-handler to wait for the sort-thread to sort one batch of buffers. |
| worker.wait_finish_messages_timeout | 86400000 (24 hours) | The max timeout (in ms) for the message-handler to wait for the finish-message of all workers. |
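
Since worker.data_dirs is a comma-separated list, spill files can be spread across multiple disks; the paths below are placeholders:

worker.data_dirs=/data1/computer/jobs,/data2/computer/jobs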

6. I/O & Output Configuration

Configuration for output computation results.

6.1 Output Class & Result

| config option | default value | description |
|---|---|---|
| output.output_class | org.apache.hugegraph.computer.core.output.LogOutput | The class to output the computation result of each vertex. Called after the iteration computation. |
| output.result_name | value | The name of the output result; assigned dynamically by #name() of the instance created by worker.computation_class. |
| output.result_write_type | OLAP_COMMON | The write-type used when outputting results to HugeGraph, allowed values: [OLAP_COMMON, OLAP_SECONDARY, OLAP_RANGE]. |
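
To write results back to HugeGraph instead of only logging them, the output class is replaced; the class below is a placeholder that only illustrates the shape of the configuration, and the actual class depends on the algorithm:

# com.example.MyHugeGraphOutput is a placeholder output class
output.output_class=com.example.MyHugeGraphOutput
output.result_name=pagerank
output.result_write_type=OLAP_RANGE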

6.2 Output Behavior

| config option | default value | description |
|---|---|---|
| output.with_adjacent_edges | false | Whether to output the adjacent edges of the vertex. |
| output.with_vertex_properties | false | Whether to output the properties of the vertex. |
| output.with_edge_properties | false | Whether to output the properties of the edge. |

6.3 Batch Output

| config option | default value | description |
|---|---|---|
| output.batch_size | 500 | The batch size of output. |
| output.batch_threads | 1 | The number of threads used for batch output. |
| output.single_threads | 1 | The number of threads used for single output. |

6.4 HDFS Output

| config option | default value | description |
|---|---|---|
| output.hdfs_url | hdfs://127.0.0.1:9000 | The HDFS URL for output. |
| output.hdfs_user | hadoop | The HDFS user for output. |
| output.hdfs_path_prefix | /hugegraph-computer/results | The directory of HDFS output results. |
| output.hdfs_delimiter | , (comma) | The delimiter of HDFS output. |
| output.hdfs_merge_partitions | true | Whether to merge output files of multiple partitions. |
| output.hdfs_replication | 3 | The replication number of HDFS. |
| output.hdfs_core_site_path | "" (empty) | The HDFS core site path. |
| output.hdfs_site_path | "" (empty) | The HDFS site path. |
| output.hdfs_kerberos_enable | false | Whether Kerberos authentication is enabled for HDFS. |
| output.hdfs_kerberos_principal | "" (empty) | The HDFS principal for Kerberos authentication. |
| output.hdfs_kerberos_keytab | "" (empty) | The HDFS keytab file for Kerberos authentication. |
| output.hdfs_krb5_conf | /etc/krb5.conf | Kerberos configuration file path. |
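
A sketch of a non-Kerberos HDFS output configuration (hostnames and paths are placeholders; an HDFS-capable class must also be set via output.output_class):

output.hdfs_url=hdfs://namenode:9000
output.hdfs_user=hadoop
output.hdfs_path_prefix=/hugegraph-computer/results
output.hdfs_delimiter=,
output.hdfs_merge_partitions=true
output.hdfs_kerberos_enable=false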

6.5 Retry & Timeout

| config option | default value | description |
|---|---|---|
| output.retry_times | 3 | The retry times when output fails. |
| output.retry_interval | 10 | The retry interval (in seconds) when output fails. |
| output.thread_pool_shutdown_timeout | 60 | The timeout (in seconds) of output thread pool shutdown. |

7. Network & Transport Configuration

Configuration for network communication between workers and master.

7.1 Server Configuration

| config option | default value | description |
|---|---|---|
| transport.server_host | 127.0.0.1 | 🔒 Managed by system The server hostname or IP to listen on to transfer data. Do not modify manually. |
| transport.server_port | 0 | 🔒 Managed by system The server port to listen on to transfer data. The system will assign a random port if set to 0. Do not modify manually. |
| transport.server_threads | 4 | The number of transport threads for the server. |

7.2 Client Configuration

| config option | default value | description |
|---|---|---|
| transport.client_threads | 4 | The number of transport threads for the client. |
| transport.client_connect_timeout | 3000 | The timeout (in ms) for the client to connect to the server. |

7.3 Protocol Configuration

| config option | default value | description |
|---|---|---|
| transport.provider_class | org.apache.hugegraph.computer.core.network.netty.NettyTransportProvider | The transport provider; currently only Netty is supported. |
| transport.io_mode | AUTO | The network IO mode, allowed values: [NIO, EPOLL, AUTO]. AUTO means selecting the appropriate mode automatically. |
| transport.tcp_keep_alive | true | Whether to enable TCP keep-alive. |
| transport.transport_epoll_lt | false | Whether to enable EPOLL level-trigger mode (only effective when io_mode=EPOLL). |

7.4 Buffer Configuration

| config option | default value | description |
|---|---|---|
| transport.send_buffer_size | 0 | The size of the socket send-buffer in bytes. 0 means using the system default value. |
| transport.receive_buffer_size | 0 | The size of the socket receive-buffer in bytes. 0 means using the system default value. |
| transport.write_buffer_high_mark | 67108864 (64 MB) | The high water mark of the write buffer in bytes. Sending becomes unavailable when the number of queued bytes > write_buffer_high_mark. |
| transport.write_buffer_low_mark | 33554432 (32 MB) | The low water mark of the write buffer in bytes. Sending becomes available again when the number of queued bytes < write_buffer_low_mark. |

7.5 Flow Control

| config option | default value | description |
|---|---|---|
| transport.max_pending_requests | 8 | The max number of unreceived ACKs for the client. Sending becomes unavailable when the number of unreceived ACKs >= max_pending_requests. |
| transport.min_pending_requests | 6 | The minimum number of unreceived ACKs for the client. Sending becomes available again when the number of unreceived ACKs < min_pending_requests. |
| transport.min_ack_interval | 200 | The minimum interval (in ms) at which the server replies with an ACK. |
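
The water marks and pending-request thresholds form a simple hysteresis: sending pauses at the high value and resumes at the low value. An illustrative widening of both windows (example numbers, not recommendations):

transport.write_buffer_high_mark=134217728
transport.write_buffer_low_mark=67108864
transport.max_pending_requests=16
transport.min_pending_requests=8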

7.6 Timeouts

| config option | default value | description |
|---|---|---|
| transport.close_timeout | 10000 | The timeout (in ms) to close the server or the client. |
| transport.sync_request_timeout | 10000 | The timeout (in ms) to wait for a response after sending a sync-request. |
| transport.finish_session_timeout | 0 | The timeout (in ms) to finish a session. 0 means using (transport.sync_request_timeout × transport.max_pending_requests). |
| transport.write_socket_timeout | 3000 | The timeout (in ms) to write data to the socket buffer. |
| transport.server_idle_timeout | 360000 (6 minutes) | The max timeout (in ms) of server idle. |

7.7 Heartbeat

| config option | default value | description |
|---|---|---|
| transport.heartbeat_interval | 20000 (20 seconds) | The minimum interval (in ms) between heartbeats on the client side. |
| transport.max_timeout_heartbeat_count | 120 | The maximum number of consecutive heartbeat timeouts on the client side. If the number of consecutive timeouts waiting for a heartbeat response exceeds max_timeout_heartbeat_count, the channel will be closed from the client side. |

7.8 Advanced Network Settings

| config option | default value | description |
|---|---|---|
| transport.max_syn_backlog | 511 | The capacity of the SYN queue on the server side. 0 means using the system default value. |
| transport.recv_file_mode | true | Whether to enable receive buffer-file mode. If enabled, buffers are received and written to file from the socket using zero-copy. Note: requires OS support for zero-copy (e.g., Linux sendfile/splice). |
| transport.network_retries | 3 | The number of retry attempts for network communication if the network is unstable. |

8. Storage & Persistence Configuration

Configuration for HGKV (HugeGraph Key-Value) storage engine and value files.

8.1 HGKV Configuration

| config option | default value | description |
|---|---|---|
| hgkv.max_file_size | 2147483648 (2 GB) | The max number of bytes in each HGKV file. |
| hgkv.max_data_block_size | 65536 (64 KB) | The max byte size of an HGKV file data block. |
| hgkv.max_merge_files | 10 | The max number of files to merge at one time. |
| hgkv.temp_file_dir | /tmp/hgkv | The folder used to store temporary files during the file merging process. |
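
For example, moving the merge temp directory to a larger disk and allowing more files per merge pass (illustrative values):

hgkv.temp_file_dir=/data1/computer/hgkv-tmp
hgkv.max_merge_files=20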

8.2 Value File Configuration

| config option | default value | description |
|---|---|---|
| valuefile.max_segment_size | 1073741824 (1 GB) | The max number of bytes in each segment of the value-file. |

9. BSP & Coordination Configuration

Configuration for Bulk Synchronous Parallel (BSP) protocol and etcd coordination.

| config option | default value | description |
|---|---|---|
| bsp.etcd_endpoints | http://localhost:2379 | 🔒 Managed by system in K8s The endpoints to access etcd. For multiple endpoints, use a comma-separated list: http://host1:port1,http://host2:port2. Do not modify manually in K8s deployments. |
| bsp.max_super_step | 10 (packaged: 2) | The max super step of the algorithm. |
| bsp.register_timeout | 300000 (packaged: 100000) | The max timeout (in ms) to wait for master and workers to register. |
| bsp.wait_workers_timeout | 86400000 (24 hours) | The max timeout (in ms) to wait for the workers' BSP event. |
| bsp.wait_master_timeout | 86400000 (24 hours) | The max timeout (in ms) to wait for the master's BSP event. |
| bsp.log_interval | 30000 (30 seconds) | The log interval (in ms) to print the log while waiting for a BSP event. |
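
In practice the superstep limit is the option most algorithms need to adjust; an illustrative setting for a 10-superstep run with a shorter registration window:

bsp.max_super_step=10
bsp.register_timeout=100000
bsp.log_interval=30000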

10. Performance Tuning Configuration

Configuration for performance optimization.

| config option | default value | description |
|---|---|---|
| allocator.max_vertices_per_thread | 10000 | The maximum number of vertices per thread processed by each memory allocator. |
| sort.thread_nums | 4 | The number of threads performing internal sorting. |
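
An illustrative tuning for a worker with more CPU cores (example values only, not a recommendation):

sort.thread_nums=8
allocator.max_vertices_per_thread=10000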

11. System Administration Configuration

⚠️ Configuration items managed by the system - users are prohibited from modifying these manually.

The following configuration items are automatically managed by the K8s Operator, Driver, or runtime system. Manual modification will cause cluster communication failures or job scheduling errors.

| config option | managed by | description |
|---|---|---|
| bsp.etcd_endpoints | K8s Operator | Automatically set to the operator’s etcd service address |
| transport.server_host | Runtime | Automatically set to the pod/container hostname |
| transport.server_port | Runtime | Automatically assigned a random port |
| job.namespace | K8s Operator | Automatically set to the job namespace |
| job.id | K8s Operator | Automatically set to the job ID from the CRD |
| job.workers_count | K8s Operator | Automatically set from the CRD workerInstances |
| rpc.server_host | Runtime | RPC server hostname (system-managed) |
| rpc.server_port | Runtime | RPC server port (system-managed) |
| rpc.remote_url | Runtime | RPC remote URL (system-managed) |

Why These Are Forbidden:

  • BSP/RPC Configuration: Must match the actual deployed etcd/RPC services. Manual overrides break coordination.
  • Job Configuration: Must match K8s CRD specifications. Mismatches cause worker count errors.
  • Transport Configuration: Must use actual pod hostnames/ports. Manual values prevent inter-worker communication.

K8s Operator Config Options

NOTE: Operator options are passed as environment variables; the option name maps to an upper-case variable name without the ‘k8s.’ prefix, e.g. k8s.internal_etcd_url => INTERNAL_ETCD_URL.

| config option | default value | description |
|---|---|---|
| k8s.auto_destroy_pod | true | Whether to automatically destroy all pods when the job is completed or failed. |
| k8s.close_reconciler_timeout | 120 | The max timeout (in ms) to close the reconciler. |
| k8s.internal_etcd_url | http://127.0.0.1:2379 | The internal etcd URL for the operator system. |
| k8s.max_reconcile_retry | 3 | The max number of reconcile retries. |
| k8s.probe_backlog | 50 | The maximum backlog for serving health probes. |
| k8s.probe_port | 9892 | The port that the controller binds to for serving health probes. |
| k8s.ready_check_internal | 1000 | The time interval (in ms) of the ready check. |
| k8s.ready_timeout | 30000 | The max timeout (in ms) of the ready check. |
| k8s.reconciler_count | 10 | The max number of reconciler threads. |
| k8s.resync_period | 600000 | The minimum frequency at which watched resources are reconciled. |
| k8s.timezone | Asia/Shanghai | The timezone of the computer job and operator. |
| k8s.watch_namespace | hugegraph-computer-system | The namespace to watch custom resources in. Use ‘*’ to watch all namespaces. |
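
Following the naming rule above (k8s.xxx => XXX), the operator deployment would carry these options as container environment variables; the snippet below is only a sketch of that mapping, not a complete deployment manifest:

env:
  - name: INTERNAL_ETCD_URL
    value: "http://etcd.hugegraph-computer-system:2379"
  - name: WATCH_NAMESPACE
    value: "hugegraph-computer-system"
  - name: TIMEZONE
    value: "Asia/Shanghai"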

HugeGraph-Computer CRD

CRD: https://github.com/apache/hugegraph-computer/blob/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml

| spec | default value | description | required |
|---|---|---|---|
| algorithmName | | The name of the algorithm. | true |
| jobId | | The job id. | true |
| image | | The image of the algorithm. | true |
| computerConf | | The map of computer config options. | true |
| workerInstances | | The number of worker instances; it will override the ‘job.workers_count’ option. | true |
| pullPolicy | Always | The pull-policy of the image, for details please refer to: https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy | false |
| pullSecrets | | The pull-secrets of the image, for details please refer to: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod | false |
| masterCpu | | The CPU limit of the master; the unit can be ‘m’ or unitless. For details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false |
| workerCpu | | The CPU limit of the worker; the unit can be ‘m’ or unitless. For details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false |
| masterMemory | | The memory limit of the master; the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki. For details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false |
| workerMemory | | The memory limit of the worker; the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki. For details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false |
| log4jXml | | The content of log4j.xml for the computer job. | false |
| jarFile | | The jar path of the computer algorithm. | false |
| remoteJarUri | | The remote jar URI of the computer algorithm; it will override the jar in the algorithm image. | false |
| jvmOptions | | The Java startup parameters of the computer job. | false |
| envVars | | Please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/ | false |
| envFrom | | Please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/ | false |
| masterCommand | bin/start-computer.sh | The run command of the master, equivalent to the ‘Entrypoint’ field of Docker. | false |
| masterArgs | ["-r master", "-d k8s"] | The run args of the master, equivalent to the ‘Cmd’ field of Docker. | false |
| workerCommand | bin/start-computer.sh | The run command of the worker, equivalent to the ‘Entrypoint’ field of Docker. | false |
| workerArgs | ["-r worker", "-d k8s"] | The run args of the worker, equivalent to the ‘Cmd’ field of Docker. | false |
| volumes | | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false |
| volumeMounts | | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false |
| secretPaths | | The map of k8s-secret name and mount path. | false |
| configMapPaths | | The map of k8s-configmap name and mount path. | false |
| podTemplateSpec | | Please refer to: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-template-v1/#PodTemplateSpec | false |
| securityContext | | Please refer to: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | false |
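
Putting the required spec fields together, a minimal job resource might look like the sketch below; the apiVersion and kind should follow the CRD manifest linked above, and the image and params class shown here are placeholders:

apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-system
  name: pagerank-sample
spec:
  jobId: pagerank-sample
  algorithmName: page_rank
  image: hugegraph/pagerank:latest          # placeholder algorithm image
  workerInstances: 3
  computerConf:
    hugegraph.url: "http://hugegraph-server:8080"
    algorithm.params_class: "com.example.PageRankParams"   # placeholder params class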

KubeDriver Config Options

| config option | default value | description |
|---|---|---|
| k8s.build_image_bash_path | | The path of the command used to build the image. |
| k8s.enable_internal_algorithm | true | Whether to enable internal algorithms. |
| k8s.framework_image_url | hugegraph/hugegraph-computer:latest | The image URL of the computer framework. |
| k8s.image_repository_password | | The password for logging in to the image repository. |
| k8s.image_repository_registry | | The registry address for logging in to the image repository. |
| k8s.image_repository_url | hugegraph/hugegraph-computer | The URL of the image repository. |
| k8s.image_repository_username | | The username for logging in to the image repository. |
| k8s.internal_algorithm | [pageRank] | The name list of all internal algorithms. Note: algorithm names use camelCase here (e.g., pageRank), but algorithm implementations return underscore_case (e.g., page_rank). |
| k8s.internal_algorithm_image_url | hugegraph/hugegraph-computer:latest | The image URL of the internal algorithms. |
| k8s.jar_file_dir | /cache/jars/ | The directory where the algorithm jar will be uploaded. |
| k8s.kube_config | ~/.kube/config | The path of the k8s config file. |
| k8s.log4j_xml_path | | The log4j.xml path for the computer job. |
| k8s.namespace | hugegraph-computer-system | The namespace of the hugegraph-computer system. |
| k8s.pull_secret_names | [] | The names of the pull-secrets for pulling images. |
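
An illustrative driver configuration pointing at a local kube config and the default namespace (values are examples only):

k8s.kube_config=~/.kube/config
k8s.namespace=hugegraph-computer-system
k8s.framework_image_url=hugegraph/hugegraph-computer:latest
k8s.enable_internal_algorithm=true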