Quick Start
1 - HugeGraph (OLTP)
1.1 - HugeGraph-Server Quick Start
1 HugeGraph-Server Overview
HugeGraph-Server is the core part of the HugeGraph project. It contains submodules such as graph-core, backend, and API.
The Core module is an implementation of the TinkerPop interface. The Backend module saves the graph data to a data store; currently supported backends include Memory, Cassandra, ScyllaDB, and RocksDB. The API module provides an HTTP server, which converts the client's HTTP requests into calls to the Core module.
Two spellings appear in this document, HugeGraph-Server and HugeGraphServer (other modules follow the same pattern). Their meanings differ only slightly:
HugeGraph-Server refers to the code of the server-related components, while HugeGraphServer refers to the service process.
2 Dependency for Building/Running
2.1 Install Java 11 (JDK 11)
You need Java 11 to run HugeGraph-Server (versions before 1.5.0 are also compatible with Java 8, but this is not recommended). Install and configure the JDK yourself.
Be sure to run the java -version command to check the JDK version before reading on.
Note: Running on Java 8 loses some security guarantees. We recommend Java 11 in production.
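The version check can also be scripted. The following is a minimal sketch, not part of HugeGraph: java_major is a hypothetical helper name, and it assumes the usual `version "..."` line printed by java -version, covering both the legacy 1.8.x numbering and the modern 11.x numbering.

```shell
# Print the major version parsed from `java -version` style output.
# Legacy scheme: "1.8.0_292" -> 8; modern scheme: "11.0.2" -> 11.
java_major() {
  # $1: the full text printed by `java -version`
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
  esac
}

# Usage (java -version writes to stderr, hence 2>&1):
# major=$(java_major "$(java -version 2>&1)")
# [ "$major" -ge 11 ] || echo "Java 11 is recommended, found major version: $major"
```

The function only parses text, so it can be tried without a JDK installed by passing a sample string.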
3 Deploy
There are four ways to deploy HugeGraph-Server components:
- Method 1: Use Docker container (Convenient for Test/Dev)
- Method 2: Download the binary tarball
- Method 3: Source code compilation
- Method 4: One-click deployment
Note: If the server is exposed to the public network, you must enable Auth authentication to keep it safe (this also applies to legacy versions).
3.1 Use Docker container (Convenient for Test/Dev)
You can refer to the Docker deployment guide.
We can use docker run -itd --name=graph -e PASSWORD=xxx -p 8080:8080 hugegraph/hugegraph:1.5.0 to quickly start a HugeGraph server with the built-in RocksDB backend in the background.
Optional:
- use docker exec -it graph bash to enter the container and perform operations.
- use docker run -itd --name=graph -p 8080:8080 -e PRELOAD="true" hugegraph/hugegraph:1.5.0 to start with a built-in example graph. We can use the RESTful API to verify the result. For the detailed steps, refer to 5.1.8.
- use -e PASSWORD=xxx to enable auth mode and set the password for admin. You can find more details in Config Authentication.
If you use docker desktop, you can set the option like:

Also, if we want to manage other HugeGraph-related instances in one file, we can deploy with docker-compose using the command docker-compose up -d (you can also configure only the server). Here is an example docker-compose.yml:
version: '3'
services:
  server:
    image: hugegraph/hugegraph:1.5.0
    container_name: server
    environment:
      - PASSWORD=xxx
      # PASSWORD is optional: it enables auth mode with the password you set.
      # - PRELOAD=true
      # PRELOAD is optional: it preloads a built-in sample graph on initialization.
    ports:
      - 8080:8080
Note:
The Docker image of HugeGraph is a convenience release for starting quickly, not an official distribution artifact. You can find more details in the ASF Release Distribution Policy.
We recommend using a release tag (like 1.5.0) for the stable version. Use the latest tag to try the newest features in development.
3.2 Download the binary tarball
You could download the binary tarball from the download page of the ASF site like this:
# use the latest version, here is 1.5.0 for example
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}.tar.gz
tar zxf *hugegraph*.tar.gz
# (Optional) verify the integrity with SHA512 (recommended)
shasum -a 512 apache-hugegraph-incubating-{version}.tar.gz
curl https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}.tar.gz.sha512
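The two commands above print the local digest and the published digest, leaving the comparison to the reader. The sketch below automates that comparison. verify_sha512 is a hypothetical helper, and it assumes the .sha512 file starts with the hex digest as its first whitespace-separated field; if the published file uses another layout, compare the digests manually instead.

```shell
# Verify FILE against a previously downloaded .sha512 file.
# Returns 0 when the digests match, non-zero otherwise.
verify_sha512() {
  file=$1
  sumfile=$2
  # Assumption: the hex digest is the first field on the first line of the .sha512 file.
  expected=$(awk 'NR == 1 { print tolower($1) }' "$sumfile")
  if command -v sha512sum >/dev/null 2>&1; then
    actual=$(sha512sum "$file" | awk '{ print tolower($1) }')
  else
    actual=$(shasum -a 512 "$file" | awk '{ print tolower($1) }')
  fi
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (file names follow the download commands above):
# curl -s -o hugegraph.sha512 https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}.tar.gz.sha512
# verify_sha512 apache-hugegraph-incubating-{version}.tar.gz hugegraph.sha512 && echo "checksum OK"
```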
3.3 Source code compilation
Please ensure that the wget command is installed before compiling the source code.
We can get the HugeGraph source code in two ways (the same applies to the other HugeGraph repos/modules):
- download the stable/release version from the ASF site
- clone the unstable/latest version from GitBox (ASF) or GitHub
# Way 1. download the release package from the ASF site
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}-src.tar.gz
tar zxf *hugegraph*.tar.gz
# (Optional) verify the integrity with SHA512 (recommended)
shasum -a 512 apache-hugegraph-incubating-{version}-src.tar.gz
curl https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}-src.tar.gz.sha512
# Way 2. clone the latest code with git (e.g. from GitHub)
git clone https://github.com/apache/hugegraph.git
Compile and generate tarball
cd *hugegraph
# (Optional) add the "-P stage" param if the build fails with the latest code (during a pre-release period)
mvn package -DskipTests -ntp
The execution log is as follows:
......
[INFO] Reactor Summary for hugegraph 1.5.0:
[INFO] 
[INFO] hugegraph .......................................... SUCCESS [  2.405 s]
[INFO] hugegraph-core ..................................... SUCCESS [ 13.405 s]
[INFO] hugegraph-api ...................................... SUCCESS [ 25.943 s]
[INFO] hugegraph-cassandra ................................ SUCCESS [ 54.270 s]
[INFO] hugegraph-scylladb ................................. SUCCESS [  1.032 s]
[INFO] hugegraph-rocksdb .................................. SUCCESS [ 34.752 s]
[INFO] hugegraph-mysql .................................... SUCCESS [  1.778 s]
[INFO] hugegraph-palo ..................................... SUCCESS [  1.070 s]
[INFO] hugegraph-hbase .................................... SUCCESS [ 32.124 s]
[INFO] hugegraph-postgresql ............................... SUCCESS [  1.823 s]
[INFO] hugegraph-dist ..................................... SUCCESS [ 17.426 s]
[INFO] hugegraph-example .................................. SUCCESS [  1.941 s]
[INFO] hugegraph-test ..................................... SUCCESS [01:01 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
......
After successful execution, the *hugegraph-*.tar.gz file, which is the tarball produced by the build, will be generated in the hugegraph directory.
Outdated tools
3.4 One-click deployment (Outdated)
HugeGraph-Tools provides a command-line tool for one-click deployment. Users can use this tool to quickly download, decompress, configure, and start HugeGraphServer and HugeGraph-Hubble.
Of course, you should download the tarball of HugeGraph-Toolchain first.
# download toolchain binary package, it includes loader + tool + hubble
# please check the latest version (e.g. here is 1.5.0)
wget https://downloads.apache.org/incubator/hugegraph/1.5.0/apache-hugegraph-toolchain-incubating-1.5.0.tar.gz
tar zxf *hugegraph-*.tar.gz
# enter the tool's package
cd *hugegraph*/*tool* 
Note:
${version} is the version number. For the latest version, refer to the Download page, or download directly from the link on that page.
The general entry script for HugeGraph-Tools is bin/hugegraph. Users can use the help command to view its usage; only the commands for one-click deployment are introduced here.
bin/hugegraph deploy -v {hugegraph-version} -p {install-path} [-u {download-path-prefix}]
- {hugegraph-version} is the version of HugeGraphServer and HugeGraphStudio to deploy. Users can check the conf/version-mapping.yaml file for version information.
- {install-path} specifies the installation directory of HugeGraphServer and HugeGraphStudio.
- {download-path-prefix} is optional and specifies the download address of the HugeGraphServer and HugeGraphStudio tarballs. The default download URL is used if it is not provided.
For example, to start HugeGraph-Server and HugeGraphStudio version 0.6, write the above command as bin/hugegraph deploy -v 0.6 -p services.
4 Config
If you only need to start HugeGraph quickly for testing, you only need to modify a few configuration items (see the next section). For a detailed introduction to configuration, please refer to the configuration document and the introduction to configuration items.
5 Startup
5.1 Use a startup script to startup
Startup is divided into "first startup" and "non-first startup". The distinction exists because the backend database must be initialized before the first startup, and only then is the service started. After the service is stopped manually, or when it needs to be started again for other reasons, you can start it directly because the backend database is persistent.
When HugeGraphServer starts, it connects to the backend storage and checks its version number. If the backend is not initialized, or has been initialized but the version does not match (old version data), HugeGraphServer will fail to start and report an error.
If you need to access HugeGraphServer externally, modify the restserver.url configuration item in rest-server.properties (default is http://127.0.0.1:8080) to the machine name or IP address.
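Editing restserver.url by hand works fine; for scripted setups, something like the following sketch can set a key in a properties file. set_property is a hypothetical helper, not a HugeGraph tool, and it assumes keys are written as key=value with no spaces around the equals sign.

```shell
# Set KEY=VALUE in a Java-style .properties file.
# Replaces an existing "KEY=..." line, or appends one if the key is absent.
set_property() {
  sp_file=$1
  sp_key=$2
  sp_value=$3
  sp_tmp=$(mktemp)
  # awk compares the key literally, so dots in the key are not regex wildcards.
  # Commented lines such as "#KEY=..." are left untouched.
  awk -v k="$sp_key" -v v="$sp_value" '
    BEGIN { FS = "="; done = 0 }
    $1 == k { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$sp_file" > "$sp_tmp" && cat "$sp_tmp" > "$sp_file"
  rm -f "$sp_tmp"
}

# Usage: make the REST server listen on all interfaces.
# set_property conf/rest-server.properties restserver.url http://0.0.0.0:8080
```

Writing the result back with cat rather than mv keeps the original file's permissions and ownership.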
Since the configuration (hugegraph.properties) and startup steps required by various backends are slightly different, the following will introduce the configuration and startup of each backend one by one.
Before starting the Server, follow the Server Authentication Configuration.
5.1.1 Distributed Storage (HStore)
Distributed storage is a new feature introduced after HugeGraph 1.5.0, which implements distributed data storage and computation based on HugeGraph-PD and HugeGraph-Store components.
To use the distributed storage engine, you need to deploy HugeGraph-PD and HugeGraph-Store first. See HugeGraph-PD Quick Start and HugeGraph-Store Quick Start.
After ensuring that both PD and Store services are started, modify the hugegraph.properties configuration of HugeGraph-Server:
backend=hstore
serializer=binary
task.scheduler_type=distributed
# PD service address, multiple PD addresses are separated by commas, configure PD's RPC port
pd.peers=127.0.0.1:8686,127.0.0.1:8687,127.0.0.1:8688
If configuring multiple HugeGraph-Server nodes, you need to modify the rest-server.properties configuration file for each node, for example:
Node 1 (Master node):
restserver.url=http://127.0.0.1:8081
gremlinserver.url=http://127.0.0.1:8181
rpc.server_host=127.0.0.1
rpc.server_port=8091
server.id=server-1
server.role=master
Node 2 (Worker node):
restserver.url=http://127.0.0.1:8082
gremlinserver.url=http://127.0.0.1:8182
rpc.server_host=127.0.0.1
rpc.server_port=8092
server.id=server-2
server.role=worker
Also, you need to modify the port configuration in gremlin-server.yaml for each node:
Node 1:
host: 127.0.0.1
port: 8181
Node 2:
host: 127.0.0.1
port: 8182
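With more than two nodes, the per-node settings follow a simple pattern. The sketch below prints the rest-server.properties overrides for node N. The port scheme (808N, 818N, 809N) and the rule that only node 1 is the master are assumptions extrapolated from the two-node example above, and node_rest_conf is a hypothetical helper name.

```shell
# Print rest-server.properties overrides for node N (1, 2, ...).
# The 808N/818N/809N port scheme mirrors the two-node example and only works for N <= 9.
node_rest_conf() {
  n=$1
  # Assumption: node 1 is the master, every other node is a worker.
  if [ "$n" -eq 1 ]; then role=master; else role=worker; fi
  cat <<EOF
restserver.url=http://127.0.0.1:808$n
gremlinserver.url=http://127.0.0.1:818$n
rpc.server_host=127.0.0.1
rpc.server_port=809$n
server.id=server-$n
server.role=$role
EOF
}

# Usage: print the settings for a third node.
# node_rest_conf 3
```

The port in gremlin-server.yaml for each node still has to match the gremlinserver.url printed here.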
Initialize the database:
cd *hugegraph-${version}
bin/init-store.sh
Start the Server:
bin/start-hugegraph.sh
The startup sequence for using the distributed storage engine is:
- Start HugeGraph-PD
- Start HugeGraph-Store
- Initialize the database (only for the first time)
- Start HugeGraph-Server
Verify that the service is started properly:
curl http://localhost:8081/graphs
# Should return: {"graphs":["hugegraph"]}
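The server may need a few seconds before it answers requests, so a single curl right after startup can fail. A small retry loop is one way to wait for it. This is a generic sketch: retry is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary.

```shell
# Run a command until it succeeds, at most MAX times, sleeping DELAY seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Usage: poll the endpoint from the verification step, up to 30 times, 2 seconds apart.
# curl -f makes HTTP error statuses count as failures.
# retry 30 2 curl -sf -o /dev/null http://localhost:8081/graphs && echo "server is up"
```

If authentication is enabled, an unauthenticated request is expected to be rejected, so the curl call would need credentials added.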
The sequence to stop the services should be the reverse of the startup sequence:
- Stop HugeGraph-Server
- Stop HugeGraph-Store
- Stop HugeGraph-PD
bin/stop-hugegraph.sh
5.1.2 Memory
Update hugegraph.properties
backend=memory
serializer=text
The data of the Memory backend is stored in memory and cannot be persisted. It does not need to initialize the backend. This is the only backend that does not require initialization.
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
The URL shown in the prompt is the same as the restserver.url configured in rest-server.properties.
5.1.3 RocksDB / ToplingDB
RocksDB is an embedded database that does not require manual installation and deployment. It requires GCC version >= 4.3.0 (GLIBCXX_3.4.10); if your version is older, upgrade GCC in advance.
Update hugegraph.properties
backend=rocksdb
serializer=binary
rocksdb.data_path=.
rocksdb.wal_path=.
Initialize the database (required on the first startup, or after a new configuration is manually added under 'conf/graphs/')
cd *hugegraph-${version}
bin/init-store.sh
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
ToplingDB (Beta): As a high-performance alternative to RocksDB, please refer to the configuration guide: ToplingDB Quick Start
5.1.4 Cassandra
Users need to install Cassandra themselves (download link); version 3.0 or above is required.
Update hugegraph.properties
backend=cassandra
serializer=cassandra
# cassandra backend config
cassandra.host=localhost
cassandra.port=9042
cassandra.username=
cassandra.password=
#cassandra.connect_timeout=5
#cassandra.read_timeout=20
#cassandra.keyspace.strategy=SimpleStrategy
#cassandra.keyspace.replication=3
Initialize the database (required on the first startup, or after a new configuration is manually added under 'conf/graphs/')
cd *hugegraph-${version}
bin/init-store.sh
Initing HugeGraph Store...
2017-12-01 11:26:51 1424  [main] [INFO ] org.apache.hugegraph.HugeGraph [] - Opening backend store: 'cassandra'
2017-12-01 11:26:52 2389  [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph, try init keyspace later
2017-12-01 11:26:52 2472  [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph, try init keyspace later
2017-12-01 11:26:52 2557  [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph, try init keyspace later
2017-12-01 11:26:53 2797  [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_graph
2017-12-01 11:26:53 2945  [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_schema
2017-12-01 11:26:53 3044  [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_index
2017-12-01 11:26:53 3046  [pool-3-thread-1] [INFO ] org.apache.hugegraph.backend.Transaction [] - Clear cache on event 'store.init'
2017-12-01 11:26:59 9720  [main] [INFO ] org.apache.hugegraph.HugeGraph [] - Opening backend store: 'cassandra'
2017-12-01 11:27:00 9805  [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph1, try init keyspace later
2017-12-01 11:27:00 9886  [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph1, try init keyspace later
2017-12-01 11:27:00 9955  [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph1, try init keyspace later
2017-12-01 11:27:00 10175 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_graph
2017-12-01 11:27:00 10321 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_schema
2017-12-01 11:27:00 10413 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_index
2017-12-01 11:27:00 10413 [pool-3-thread-1] [INFO ] org.apache.hugegraph.backend.Transaction [] - Clear cache on event 'store.init'
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
5.1.5 ScyllaDB
Users need to install ScyllaDB themselves (download link); version 2.1 or above is recommended.
Update hugegraph.properties
backend=scylladb
serializer=scylladb
# cassandra backend config
cassandra.host=localhost
cassandra.port=9042
cassandra.username=
cassandra.password=
#cassandra.connect_timeout=5
#cassandra.read_timeout=20
#cassandra.keyspace.strategy=SimpleStrategy
#cassandra.keyspace.replication=3
Since ScyllaDB is itself an "optimized version" of Cassandra, users who do not have ScyllaDB installed can also use Cassandra directly as the backend storage. They only need to set the backend and serializer to scylladb and point the host and port to the seeds and port of the Cassandra cluster. This works, but it is not recommended, because it does not take advantage of ScyllaDB itself.
Initialize the database (required on the first startup, or after a new configuration is manually added under 'conf/graphs/')
cd *hugegraph-${version}
bin/init-store.sh
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
5.1.6 HBase
Users need to install HBase themselves (download link); version 2.0 or above is required.
Update hugegraph.properties
backend=hbase
serializer=hbase
# hbase backend config
hbase.hosts=localhost
hbase.port=2181
# Note: recommend to modify the HBase partition number by the actual/env data amount & RS amount before init store
# it may influence the loading speed a lot
#hbase.enable_partition=true
#hbase.vertex_partitions=10
#hbase.edge_partitions=30
Initialize the database (required on the first startup, or after a new configuration is manually added under 'conf/graphs/')
cd *hugegraph-${version}
bin/init-store.sh
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
For more backend configurations, please refer to the introduction to configuration options.
5.1.7 MySQL
Because MySQL is licensed under the GPL, which is incompatible with the Apache License, users must install MySQL themselves (download link).
Download the MySQL driver package, such as mysql-connector-java-8.0.30.jar, and place it in the lib directory of HugeGraph-Server.
Update hugegraph.properties to configure the database URL, username, and password.
store is the database name; it will be created automatically if it doesn’t exist.
backend=mysql
serializer=mysql
store=hugegraph
# mysql backend config
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://127.0.0.1:3306
jdbc.username=
jdbc.password=
jdbc.reconnect_max_times=3
jdbc.reconnect_interval=3
jdbc.ssl_mode=false
Initialize the database (required for first startup or when manually adding new configurations to conf/graphs/)
cd *hugegraph-${version}
bin/init-store.sh
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
5.1.8 Create an example graph at startup
Pass the -p true argument to the startup script (-p stands for preload) to create a sample graph.
bin/start-hugegraph.sh -p true
Starting HugeGraphServer in daemon mode...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)......OK
And use the RESTful API to request HugeGraphServer and get the following result:
> curl "http://localhost:8080/graphs/hugegraph/graph/vertices" | gunzip
{"vertices":[{"id":"2:lop","label":"software","type":"vertex","properties":{"name":"lop","lang":"java","price":328}},{"id":"1:josh","label":"person","type":"vertex","properties":{"name":"josh","age":32,"city":"Beijing"}},{"id":"1:marko","label":"person","type":"vertex","properties":{"name":"marko","age":29,"city":"Beijing"}},{"id":"1:peter","label":"person","type":"vertex","properties":{"name":"peter","age":35,"city":"Shanghai"}},{"id":"1:vadas","label":"person","type":"vertex","properties":{"name":"vadas","age":27,"city":"Hongkong"}},{"id":"2:ripple","label":"software","type":"vertex","properties":{"name":"ripple","lang":"java","price":199}}]}
This indicates the successful creation of the sample graph.
5.2 Use Docker to startup
In 3.1 Use Docker container, we introduced how to deploy hugegraph-server with Docker. The server can also preload an example graph when the corresponding parameter is set.
5.2.1 Use Cassandra as storage
When using Docker, we can use Cassandra as the backend storage. We highly recommend using docker-compose directly to manage both the server and Cassandra.
The sample docker-compose.yml can be obtained on GitHub, and you can start it with docker-compose up -d. (If using Cassandra 4.0 as the backend storage, it takes approximately two minutes to initialize. Please be patient.)
version: "3"
services:
  graph:
    image: hugegraph/hugegraph
    container_name: cas-server
    ports:
      - 8080:8080
    environment:
      hugegraph.backend: cassandra
      hugegraph.serializer: cassandra
      hugegraph.cassandra.host: cas-cassandra
      hugegraph.cassandra.port: 9042
    networks:
      - ca-network
    depends_on:
      - cassandra
    healthcheck:
      test: ["CMD", "bin/gremlin-console.sh", "--" ,"-e", "scripts/remote-connect.groovy"]
      interval: 10s
      timeout: 30s
      retries: 3
  cassandra:
    image: cassandra:4
    container_name: cas-cassandra
    ports:
      - 7000:7000
      - 9042:9042
    security_opt:
      - seccomp:unconfined
    networks:
      - ca-network
    healthcheck:
      test: ["CMD", "cqlsh", "--execute", "describe keyspaces;"]
      interval: 10s
      timeout: 30s
      retries: 5
networks:
  ca-network:
volumes:
  hugegraph-data:
In this YAML file, Cassandra-related configuration parameters need to be passed as environment variables in the format hugegraph.<parameter_name>.
Specifically, the configuration file hugegraph.properties contains settings like backend=xxx and cassandra.host=xxx. To pass these settings as environment variables, prepend hugegraph. to their names, as in hugegraph.backend and hugegraph.cassandra.host.
For the remaining configuration options, refer to 4 Config.
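The naming rule can be applied mechanically to an existing properties file. The sketch below turns key=value lines into -e arguments for docker run. props_to_docker_env is a hypothetical helper. Using these variables with docker run instead of docker-compose is an assumption based on the rule above, and the usage line relies on values that contain no spaces.

```shell
# Read hugegraph.properties-style lines on stdin and print one
# "-e hugegraph.<key>=<value>" argument per setting.
# Comment lines and blank lines are skipped.
props_to_docker_env() {
  sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' -e 's/^/-e hugegraph./'
}

# Usage (unquoted on purpose so the shell splits the arguments; values must not contain spaces):
# docker run -itd --name=server -p 8080:8080 $(props_to_docker_env < my-backend.properties) hugegraph/hugegraph
```

The file name my-backend.properties is only an illustration; pass just the settings you want to override.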
5.2.2 Create an example graph when starting a server
Set the environment variable PRELOAD=true when starting Docker to load data during the execution of the startup script.
- Use docker run
  Use docker run -itd --name=server -p 8080:8080 -e PRELOAD=true hugegraph/hugegraph:1.5.0
- Use docker-compose
  Create docker-compose.yml as follows. We should set the environment variable PRELOAD=true. example.groovy is a predefined script that preloads the sample data. If needed, we can mount a new example.groovy to change the preloaded data.
  version: '3'
  services:
    server:
      image: hugegraph/hugegraph:1.5.0
      container_name: server
      environment:
        - PRELOAD=true
        - PASSWORD=xxx
      ports:
        - 8080:8080
  Use docker-compose up -d to start the container
And use the RESTful API to request HugeGraphServer and get the following result:
> curl "http://localhost:8080/graphs/hugegraph/graph/vertices" | gunzip
{"vertices":[{"id":"2:lop","label":"software","type":"vertex","properties":{"name":"lop","lang":"java","price":328}},{"id":"1:josh","label":"person","type":"vertex","properties":{"name":"josh","age":32,"city":"Beijing"}},{"id":"1:marko","label":"person","type":"vertex","properties":{"name":"marko","age":29,"city":"Beijing"}},{"id":"1:peter","label":"person","type":"vertex","properties":{"name":"peter","age":35,"city":"Shanghai"}},{"id":"1:vadas","label":"person","type":"vertex","properties":{"name":"vadas","age":27,"city":"Hongkong"}},{"id":"2:ripple","label":"software","type":"vertex","properties":{"name":"ripple","lang":"java","price":199}}]}
This indicates the successful creation of the sample graph.
6 Access server
6.1 Service startup status check
Use jps to see the service process
jps
6475 HugeGraphServer
Send a curl request to the RESTful API
echo `curl -o /dev/null -s -w %{http_code} "http://localhost:8080/graphs/hugegraph/graph/vertices"`
A return value of 200 means the server has started normally.
6.2 Request Server
The RESTful API of HugeGraphServer covers several types of resources, typically graph, schema, gremlin, traverser, and task.
- graph contains vertices and edges
- schema contains vertexlabels, propertykeys, edgelabels, and indexlabels
- gremlin contains various Gremlin statements, such as g.V(), which can be executed synchronously or asynchronously
- traverser contains various advanced queries, including shortest paths, intersections, N-step reachable neighbors, etc.
- task contains query and delete operations for asynchronous tasks
6.2.1 Get vertices and their properties in hugegraph
curl http://localhost:8080/graphs/hugegraph/graph/vertices 
Explanation:
- Since the graph may contain many vertices and edges, the server compresses the response of list-type requests (such as getting all vertices or all edges). With plain curl you therefore get garbled characters; pipe the output to gunzip to decompress it. It is recommended to use the Chrome browser with the Restlet plugin to send HTTP requests for testing.
  curl "http://localhost:8080/graphs/hugegraph/graph/vertices" | gunzip
- With the current default configuration, HugeGraphServer can only be accessed locally. Modify the configuration to make it accessible from other machines.
  vim conf/rest-server.properties
  restserver.url=http://0.0.0.0:8080
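Piping to gunzip fails when a response happens not to be compressed. The following sketch decompresses only when the body really is gzip data and passes anything else through unchanged. maybe_gunzip is a hypothetical helper, not part of HugeGraph.

```shell
# Read a response body on stdin; decompress it if it is valid gzip data,
# otherwise print it unchanged.
maybe_gunzip() {
  body=$(mktemp)
  cat > "$body"
  if gzip -t < "$body" 2>/dev/null; then
    gzip -dc < "$body"
  else
    cat "$body"
  fi
  rm -f "$body"
}

# Usage:
# curl -s "http://localhost:8080/graphs/hugegraph/graph/vertices" | maybe_gunzip
```

The body is buffered in a temporary file because it has to be read twice, once for the integrity test and once for output.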
response body:
{
    "vertices": [
        {
            "id": "2lop",
            "label": "software",
            "type": "vertex",
            "properties": {
                "price": [
                    {
                        "id": "price",
                        "value": 328
                    }
                ],
                "name": [
                    {
                        "id": "name",
                        "value": "lop"
                    }
                ],
                "lang": [
                    {
                        "id": "lang",
                        "value": "java"
                    }
                ]
            }
        },
        {
            "id": "1josh",
            "label": "person",
            "type": "vertex",
            "properties": {
                "name": [
                    {
                        "id": "name",
                        "value": "josh"
                    }
                ],
                "age": [
                    {
                        "id": "age",
                        "value": 32
                    }
                ]
            }
        },
        ...
    ]
}
For the detailed API, please refer to RESTful-API.
You can also visit localhost:8080/swagger-ui/index.html to check the API.

When using Swagger UI to debug the API provided by HugeGraph, if HugeGraph Server turns on authentication mode, you can enter authentication information on the Swagger page.

Currently, HugeGraph supports setting authentication information in two forms: Basic and Bearer.

7 Stop Server
$ cd *hugegraph-${version}
$ bin/stop-hugegraph.sh
8 Debug Server with IntelliJ IDEA
Please refer to Setup Server in IDEA
1.2 - HugeGraph-PD Quick Start
1 HugeGraph-PD Overview
HugeGraph-PD (Placement Driver) is the metadata management component of HugeGraph’s distributed version, responsible for managing the distribution of graph data and coordinating storage nodes. It plays a central role in distributed HugeGraph, maintaining cluster status and coordinating HugeGraph-Store storage nodes.
2 Prerequisites
2.1 Requirements
- Operating System: Linux or macOS (Windows has not been fully tested)
- Java version: ≥ 11
- Maven version: ≥ 3.5.0
3 Deployment
There are two ways to deploy the HugeGraph-PD component:
- Method 1: Download the tar package
- Method 2: Compile from source
3.1 Download the tar package
Download the latest version of HugeGraph-PD from the Apache HugeGraph official download page:
# Replace {version} with the latest version number, e.g., 1.5.0
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}.tar.gz  
tar zxf apache-hugegraph-incubating-{version}.tar.gz
cd apache-hugegraph-incubating-{version}/apache-hugegraph-pd-incubating-{version}
3.2 Compile from source
# 1. Clone the source code
git clone https://github.com/apache/hugegraph.git
# 2. Build the project
cd hugegraph
mvn clean install -DskipTests=true
# 3. After successful compilation, the PD module build artifacts will be located at
#    apache-hugegraph-incubating-{version}/apache-hugegraph-pd-incubating-{version}
#    target/apache-hugegraph-incubating-{version}.tar.gz
4 Configuration
The main configuration file for PD is conf/application.yml. Here are the key configuration items:
spring:
  application:
    name: hugegraph-pd
grpc:
  # gRPC port for cluster mode
  port: 8686
  host: 127.0.0.1
server:
  # REST service port
  port: 8620
pd:
  # Storage path
  data-path: ./pd_data
  # Auto-expansion check cycle (seconds)
  patrol-interval: 1800
  # Initial store list, stores in the list are automatically activated
  initial-store-count: 1
  # Store configuration information, format is IP:gRPC port
  initial-store-list: 127.0.0.1:8500
raft:
  # Cluster mode
  address: 127.0.0.1:8610
  # Raft addresses of all PD nodes in the cluster
  peers-list: 127.0.0.1:8610
store:
  # Store offline time (seconds). After this time, the store is considered permanently unavailable
  max-down-time: 172800
  # Whether to enable store monitoring data storage
  monitor_data_enabled: true
  # Monitoring data interval
  monitor_data_interval: 1 minute
  # Monitoring data retention time
  monitor_data_retention: 1 day
  initial-store-count: 1
partition:
  # Default number of replicas per partition
  default-shard-count: 1
  # Default maximum number of replicas per machine
  store-max-shard-count: 12
For multi-node deployment, you need to modify the port and address configurations for each node to ensure proper communication between nodes.
5 Start and Stop
5.1 Start PD
In the PD installation directory, execute:
./bin/start-hugegraph-pd.sh
After successful startup, you can see logs similar to the following in logs/hugegraph-pd-stdout.log:
2024-xx-xx xx:xx:xx [main] [INFO] o.a.h.p.b.HugePDServer - Started HugePDServer in x.xxx seconds (JVM running for x.xxx)
5.2 Stop PD
In the PD installation directory, execute:
./bin/stop-hugegraph-pd.sh
6 Verification
Confirm that the PD service is running properly:
curl http://localhost:8620/actuator/health
If it returns {"status":"UP"}, it indicates that the PD service has been successfully started.
Additionally, you can verify the status of the Store node by querying the PD API:
curl http://localhost:8620/v1/stores
If the Store is configured successfully, the response of the above interface should contain the status information of the current node. The state "Up" indicates that the node is running normally. Only the response for a single-node configuration is shown here. If three nodes are configured successfully and all are running, the stores list in the response should contain three storeId values, and the Up field in stateCountMap as well as the numOfService and numOfNormalService fields should all be 3.
{
  "message": "OK",
  "data": {
    "stores": [
      {
        "storeId": 8319292642220586694,
        "address": "127.0.0.1:8500",
        "raftAddress": "127.0.0.1:8510",
        "version": "",
        "state": "Up",
        "deployPath": "/Users/{your_user_name}/hugegraph/apache-hugegraph-incubating-1.5.0/apache-hugegraph-store-incubating-1.5.0/lib/hg-store-node-1.5.0.jar",
        "dataPath": "./storage",
        "startTimeStamp": 1754027127969,
        "registedTimeStamp": 1754027127969,
        "lastHeartBeat": 1754027909444,
        "capacity": 494384795648,
        "available": 346535829504,
        "partitionCount": 0,
        "graphSize": 0,
        "keyCount": 0,
        "leaderCount": 0,
        "serviceName": "127.0.0.1:8500-store",
        "serviceVersion": "",
        "serviceCreatedTimeStamp": 1754027127000,
        "partitions": []
      }
    ],
    "stateCountMap": {
      "Up": 1
    },
    "numOfService": 1,
    "numOfNormalService": 1
  },
  "status": 0
}
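The check described above can also be scripted. The following is a minimal Python sketch, assuming the response has the shape of the sample shown above; the function name `all_stores_up` and its `expected` parameter are illustrative and not part of HugeGraph.

```python
import json


def all_stores_up(payload: dict, expected: int) -> bool:
    """Return True if the /v1/stores payload reports `expected` stores, all Up."""
    data = payload.get("data", {})
    stores = data.get("stores", [])
    # Collect the IDs of stores whose state is "Up".
    up_ids = [s["storeId"] for s in stores if s.get("state") == "Up"]
    return (
        len(up_ids) == expected
        and data.get("stateCountMap", {}).get("Up") == expected
        and data.get("numOfService") == expected
        and data.get("numOfNormalService") == expected
    )


# Trimmed copy of the single-node sample response.
sample = json.loads("""
{
  "message": "OK",
  "data": {
    "stores": [{"storeId": 8319292642220586694, "state": "Up"}],
    "stateCountMap": {"Up": 1},
    "numOfService": 1,
    "numOfNormalService": 1
  },
  "status": 0
}
""")

print(all_stores_up(sample, expected=1))
print(all_stores_up(sample, expected=3))
```

For a three-node cluster, the same function would be called with `expected=3` on the parsed output of the curl command.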
1.3 - HugeGraph-Store Quick Start
1 HugeGraph-Store Overview
HugeGraph-Store is the storage node component of HugeGraph’s distributed version, responsible for actually storing and managing graph data. It works in conjunction with HugeGraph-PD to form HugeGraph’s distributed storage engine, providing high availability and horizontal scalability.
2 Prerequisites
2.1 Requirements
- Operating System: Linux or macOS (Windows has not been fully tested)
- Java version: ≥ 11
- Maven version: ≥ 3.5.0
- Deployed HugeGraph-PD (for multi-node deployment)
3 Deployment
There are two ways to deploy the HugeGraph-Store component:
- Method 1: Download the tar package
- Method 2: Compile from source
3.1 Download the tar package
Download the latest version of HugeGraph-Store from the Apache HugeGraph official download page:
# Replace {version} with the latest version number, e.g., 1.5.0
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}.tar.gz  
tar zxf apache-hugegraph-incubating-{version}.tar.gz
cd apache-hugegraph-incubating-{version}/apache-hugegraph-store-incubating-{version}
3.2 Compile from source
# 1. Clone the source code
git clone https://github.com/apache/hugegraph.git
# 2. Build the project
cd hugegraph
mvn clean install -DskipTests=true
# 3. After a successful build, the Store module artifacts are located at
#    apache-hugegraph-incubating-{version}/apache-hugegraph-store-incubating-{version}
#    target/apache-hugegraph-incubating-{version}.tar.gz
4 Configuration
The main configuration file for Store is conf/application.yml. Here are the key configuration items:
pdserver:
  # PD service address, multiple PD addresses are separated by commas (configure PD's gRPC port)
  address: 127.0.0.1:8686
grpc:
  # gRPC service address
  host: 127.0.0.1
  port: 8500
  netty-server:
    max-inbound-message-size: 1000MB
raft:
  # raft cache queue size
  disruptorBufferSize: 1024
  address: 127.0.0.1:8510
  max-log-file-size: 600000000000
  # Snapshot generation time interval, in seconds
  snapshotInterval: 1800
server:
  # REST service address
  port: 8520
app:
  # Storage path, supports multiple paths separated by commas
  data-path: ./storage
  #raft-path: ./storage
spring:
  application:
    name: store-node-grpc-server
  profiles:
    active: default
    include: pd
logging:
  config: 'file:./conf/log4j2.xml'
  level:
    root: info
For multi-node deployment, you need to modify the following configurations for each Store node:
- grpc.port (RPC port) for each node
- raft.address (Raft protocol port) for each node
- server.port (REST port) for each node
- app.data-path (data storage path) for each node
5 Start and Stop
5.1 Start Store
Ensure that the PD service is already started, then in the Store installation directory, execute:
./bin/start-hugegraph-store.sh
After successful startup, you can see logs similar to the following in logs/hugegraph-store-server.log:
2024-xx-xx xx:xx:xx [main] [INFO] o.a.h.s.n.StoreNodeApplication - Started StoreNodeApplication in x.xxx seconds (JVM running for x.xxx)
5.2 Stop Store
In the Store installation directory, execute:
./bin/stop-hugegraph-store.sh
6 Multi-Node Deployment Example
Below is a configuration example for a three-node deployment:
6.1 Three-Node Configuration Reference
- 3 PD nodes
  - raft ports: 8610, 8611, 8612
  - rpc ports: 8686, 8687, 8688
  - rest ports: 8620, 8621, 8622
- 3 Store nodes
  - raft ports: 8510, 8511, 8512
  - rpc ports: 8500, 8501, 8502
  - rest ports: 8520, 8521, 8522
6.2 Store Node Configuration
For the three Store nodes, the main configuration differences are as follows:
Node A:
grpc:
  port: 8500
raft:
  address: 127.0.0.1:8510
server:
  port: 8520
app:
  data-path: ./storage-a
Node B:
grpc:
  port: 8501
raft:
  address: 127.0.0.1:8511
server:
  port: 8521
app:
  data-path: ./storage-b
Node C:
grpc:
  port: 8502
raft:
  address: 127.0.0.1:8512
server:
  port: 8522
app:
  data-path: ./storage-c
All nodes should point to the same PD cluster:
pdserver:
  address: 127.0.0.1:8686,127.0.0.1:8687,127.0.0.1:8688
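The per-node differences follow a simple pattern: each port is the base port plus the node index. As a minimal Python sketch of that pattern (the helper names `store_node_settings` and `pd_addresses` are hypothetical, and the sketch assumes all nodes run on one host as in the example above):

```python
def store_node_settings(index: int, host: str = "127.0.0.1") -> dict:
    """Per-node overrides for the index-th Store node (0 = Node A)."""
    # With base ports 8500/8510/8520, an index of 10 or more would collide.
    if not 0 <= index <= 9:
        raise ValueError("index must be between 0 and 9")
    suffix = chr(ord("a") + index)
    return {
        "grpc": {"port": 8500 + index},
        "raft": {"address": f"{host}:{8510 + index}"},
        "server": {"port": 8520 + index},
        "app": {"data-path": f"./storage-{suffix}"},
    }


def pd_addresses(count: int, host: str = "127.0.0.1") -> str:
    """Comma-separated PD gRPC addresses for pdserver.address."""
    return ",".join(f"{host}:{8686 + i}" for i in range(count))


for i in range(3):
    print(store_node_settings(i))
print(pd_addresses(3))
```

The printed dictionaries mirror the Node A, B, and C fragments above; they are meant as a cross-check when editing conf/application.yml by hand, not as a replacement for it.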
7 Verify Store Service
Confirm that the Store service is running properly:
curl http://localhost:8520/actuator/health
If it returns {"status":"UP"}, it indicates that the Store service has been successfully started.
Additionally, you can check the status of Store nodes in the cluster through the PD API:
curl http://localhost:8620/v1/stores
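Because the Store may take a few seconds to start, the health check is often repeated until it reports UP. The following is a minimal Python sketch of such a polling loop; `fetch_health` and `wait_until_up` are illustrative names, and the retry count and delay are arbitrary assumptions. The demonstration at the end uses canned replies instead of a real network call.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str) -> str:
    """Fetch the raw body of a health endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def wait_until_up(url: str,
                  fetch: Callable[[str], str] = fetch_health,
                  attempts: int = 10,
                  delay: float = 1.0) -> bool:
    """Poll `url` until the body is {"status": "UP"} or attempts run out."""
    for _ in range(attempts):
        try:
            body = json.loads(fetch(url))
            if isinstance(body, dict) and body.get("status") == "UP":
                return True
        except (OSError, ValueError):
            # Connection refused or a non-JSON body: the service is not ready yet.
            pass
        time.sleep(delay)
    return False


# Demonstration with canned replies (no network access needed).
replies = iter(["not json", '{"status":"DOWN"}', '{"status":"UP"}'])
print(wait_until_up("http://localhost:8520/actuator/health",
                    fetch=lambda _url: next(replies), delay=0))
```

Against a running node, calling `wait_until_up("http://localhost:8520/actuator/health")` with the default `fetch` performs the same check as the curl command.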
2 - HugeGraph ToolChain
Testing Guide: For running toolchain tests locally, please refer to HugeGraph Toolchain Local Testing Guide
2.1 - HugeGraph-Hubble Quick Start
1 HugeGraph-Hubble Overview
Note: The current version of Hubble does not yet provide Auth/Login interfaces or standalone protection; these will be added in the next release (> 1.5). Do not expose Hubble to the public network or to untrusted networks, to avoid security issues (you can also use an IP & port whitelist + HTTPS).
Testing Guide: For running HugeGraph-Hubble tests locally, please refer to HugeGraph Toolchain Local Testing Guide
HugeGraph-Hubble is HugeGraph’s one-stop visual analysis platform. It covers the whole process from data modeling and efficient data import to real-time and offline data analysis and unified graph management, guiding users through graph applications with a wizard-style workflow. It is designed to make the platform smoother to use, lower the barrier to entry, and provide a more efficient and easy-to-use experience.
The platform mainly includes the following modules:
Graph Management
The graph management module connects the platform to graph data by creating graphs, and provides unified management of multiple graphs, including graph access, editing, deletion, and query.
Metadata Modeling
The metadata modeling module builds and manages graph models by creating attribute libraries, vertex types, edge types, and index types. The platform provides two modes, list mode and graph mode, which display the metadata model in real time and make it more intuitive. It also provides a function for reusing metadata across graphs, which avoids repeatedly creating the same metadata, greatly improves modeling efficiency, and enhances ease of use.
Graph Analysis
By entering statements in the graph traversal language Gremlin, you can run high-performance general analysis of graph data, as well as functions such as customized multidimensional path queries on vertices. Three display modes are provided for graph results: graph, table, and JSON; this multidimensional display of data meets the needs of various usage scenarios. The module also records executed queries and lets you save frequently used statements as favorites, making graph operations traceable and query input reusable and shareable. Graph data can be exported in JSON format.
Task Management
For Gremlin tasks that need to traverse the whole graph, index creation and reconstruction, and other time-consuming asynchronous tasks, the platform provides corresponding task management functions to achieve unified management and result viewing of asynchronous tasks.
Data Import
“Note: The data import function is currently suitable for preliminary use. For formal data import, please use hugegraph-loader, which has much better performance, stability, and functionality.”
Data import converts the user’s business data into the vertices and edges of a graph and inserts them into the graph database. The platform provides a wizard-style visual import module. Creating import tasks makes it possible to manage them and to run multiple import tasks in parallel, which improves import performance. After entering an import task, you only need to follow the step-by-step prompts, upload files as needed, and fill in the content to import graph data. Resuming from breakpoints and an error retry mechanism are also supported, which reduces import costs and improves efficiency.
2 Deploy
There are three ways to deploy hugegraph-hubble
- Use Docker (Convenient for Test/Dev)
- Download the Toolchain binary package
- Source code compilation
2.1 Use docker (Convenient for Test/Dev)
Special Note: If you start hubble with Docker, and hubble and the server are on the same host, do not set the hostname for the graph on the Hubble web page directly to localhost/127.0.0.1. This would refer to the hubble container itself rather than the host machine, resulting in a connection failure to the server.
If hubble and server are in the same Docker network, we recommend using the container_name (in our example, it is server) as the hostname, and 8080 as the port. Alternatively, you can use the host IP as the hostname, with the port that the host exposes for the server.
We can use docker run -itd --name=hubble -p 8088:8088 hugegraph/hubble:1.5.0 to quickly start hubble.
Alternatively, you can use Docker Compose to start hubble. Additionally, if hubble and the graph are in the same Docker network, you can access the graph using the container name of the graph, eliminating the need for the host machine’s IP address.
Run docker-compose up -d with the following docker-compose.yml:
version: '3'
services:
  server:
    image: hugegraph/hugegraph:1.5.0
    container_name: server
    environment:
      - PASSWORD=xxx
    ports:
      - 8080:8080
  hubble:
    image: hugegraph/hubble:1.5.0
    container_name: hubble
    ports:
      - 8088:8088
Note:
The Docker image of hugegraph-hubble is a convenience release for starting hugegraph-hubble quickly, not an official distribution artifact. You can find more details in the ASF Release Distribution Policy.
We recommend using a release tag (like 1.5.0) for the stable version. Use the latest tag to experience the newest functions in development.
2.2 Download the Toolchain binary package
hubble is in the toolchain project. First, download the binary tarball:
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-toolchain-incubating-{version}.tar.gz
tar -xvf apache-hugegraph-toolchain-incubating-{version}.tar.gz
cd apache-hugegraph-toolchain-incubating-{version}/apache-hugegraph-hubble-incubating-{version}
Run hubble
bin/start-hubble.sh
Then, we can see:
starting HugeGraphHubble ..............timed out with http status 502
2023-08-30 20:38:34 [main] [INFO ] o.a.h.HugeGraphHubble [] - Starting HugeGraphHubble v1.0.0 on cpu05 with PID xxx (~/apache-hugegraph-toolchain-incubating-1.0.0/apache-hugegraph-hubble-incubating-1.0.0/lib/hubble-be-1.0.0.jar started by $USER in ~/apache-hugegraph-toolchain-incubating-1.0.0/apache-hugegraph-hubble-incubating-1.0.0)
...
2023-08-30 20:38:38 [main] [INFO ] c.z.h.HikariDataSource [] - hugegraph-hubble-HikariCP - Start completed.
2023-08-30 20:38:41 [main] [INFO ] o.a.c.h.Http11NioProtocol [] - Starting ProtocolHandler ["http-nio-0.0.0.0-8088"]
2023-08-30 20:38:41 [main] [INFO ] o.a.h.HugeGraphHubble [] - Started HugeGraphHubble in 7.379 seconds (JVM running for 8.499)
Then use a web browser to access ip:8088 and you can see the Hubble page. You can stop the service using bin/stop-hubble.sh.
2.3 Source code compilation
Note: The plugin frontend-maven-plugin has been added to hugegraph-hubble/hubble-be/pom.xml. To compile hubble, you do not need to install Nodejs V16.x and yarn environment in your local environment in advance. You can directly execute the following steps.
Download the toolchain source code.
git clone https://github.com/apache/hugegraph-toolchain.git
Compile hubble. It depends on the loader and client, so you need to build these dependencies in advance during the compilation process (you can skip this step later).
cd hugegraph-toolchain
sudo pip install -r hugegraph-hubble/hubble-dist/assembly/travis/requirements.txt
mvn install -pl hugegraph-client,hugegraph-loader -am -Dmaven.javadoc.skip=true -DskipTests -ntp
cd hugegraph-hubble
mvn -e compile package -Dmaven.javadoc.skip=true -Dmaven.test.skip=true -ntp
cd apache-hugegraph-hubble-incubating*
Run hubble
bin/start-hubble.sh -d
3 Platform Workflows
The module usage process of the platform is as follows:

4 Platform Instructions
4.1 Graph Management
4.1.1 Graph creation
Under the graph management module, click [Create graph], and realize the connection of multiple graphs by filling in the graph ID, graph name, host name, port number, username, and password information.

Create graph by filling in the content as follows:

Special Note: If you start hubble with Docker, and hubble and the server are on the same host, do not set the hostname for the graph on the Hubble web page directly to localhost/127.0.0.1. If hubble and server are in the same Docker network, we recommend using the container_name (in our example, it is graph) as the hostname, and 8080 as the port. Alternatively, you can use the host IP as the hostname, with the port that the host exposes for the server.
4.1.2 Graph Access
Enter a graph to access its information. After entering, you can perform operations such as multidimensional query analysis, metadata management, data import, and algorithm analysis of the graph.

4.1.3 Graph management
- Users can achieve unified management of graphs through overview, search, and information editing and deletion of single graphs.
- Search range: You can search for the graph name and ID.

4.2 Metadata Modeling (list + graph mode)
4.2.1 Module entry
Left navigation:

4.2.2 Property type
4.2.2.1 Create type
- Fill in or select the attribute name, data type, and cardinality to complete the creation of the attribute.
- Created attributes can be used as attributes of vertex type and edge type.
List mode:

Graph mode:

4.2.2.2 Reuse
- The platform provides the [Reuse] function, which can directly reuse the metadata of other graphs.
- Select the graph ID that needs to be reused, and continue to select the attributes that need to be reused. After that, the platform will check whether there is a conflict. After passing, the metadata can be reused.
Select reuse items:

Check reuse items:

4.2.2.3 Management
- You can delete a single item or delete it in batches in the attribute list.
4.2.3 Vertex type
4.2.3.1 Create type
- To create a vertex type, fill in or select the vertex type name, ID strategy, associated attributes, primary key attributes, vertex style, the content displayed below the vertex in query results, and the index information (whether to create a type index, and the details of the attribute indexes).
List mode:

Graph mode:

4.2.3.2 Reuse
- Reusing a vertex type also reuses the attributes and attribute indexes associated with that type.
- The reuse method is similar to the property reuse, see 4.2.2.2.
4.2.3.3 Administration
- Editing operations are available. The vertex style, association type, vertex display content, and attribute index can be edited, and the rest cannot be edited. 
- You can delete a single item or delete it in batches. 

4.2.4 Edge Types
4.2.4.1 Create
- To create an edge type, fill in or select the edge type name, start point type, end point type, associated attributes, whether to allow multiple connections, edge style, the content displayed below the edge in query results, and the index information (whether to create a type index, and the details of the attribute indexes).
List mode:

Graph mode:

4.2.4.2 Reuse
- The reuse of the edge type will reuse the start point type, end point type, associated attribute and attribute index of this type.
- The reuse method is similar to the property reuse, see 4.2.2.2.
4.2.4.3 Administration
- Editing operations are available. Edge styles, associated attributes, edge display content, and attribute indexes can be edited, and the rest cannot be edited, the same as the vertex type.
- You can delete a single item or delete it in batches.
4.2.5 Index Types
Displays vertex and edge indices for vertex types and edge types.
4.3 Data Import
Note: Currently, we recommend using hugegraph-loader for formal data import. The built-in import of hubble is intended for testing and getting started.
The usage process of data import is as follows:

4.3.1 Module entrance
Left navigation:

4.3.2 Create task
- Fill in the task name and remarks (optional) to create an import task.
- Multiple import tasks can be created and imported in parallel.

4.3.3 Uploading files
- Upload the files from which the graph will be built. The currently supported format is CSV; more formats will be added in the future.
- Multiple files can be uploaded at the same time.

4.3.4 Setting up data mapping
- Set up data mapping for uploaded files, including file settings and type settings
- File settings: Check or fill in the settings of the file itself, such as whether it includes a header, the separator, and the encoding format. All of them have default values, so nothing needs to be filled in manually
- Type settings:
  - Vertex map and edge map:
    - 【Vertex Type】: Select the vertex type, and map column data from the uploaded file to its ID;
    - 【Edge Type】: Select the edge type, and map column data from the uploaded file to the ID columns of its start point type and end point type;
  - Mapping settings: Map column data from the uploaded file to the attributes of the selected vertex type. If an attribute name is the same as a header name in the file, the mapping is matched automatically and does not need to be selected manually.
- After completing the settings, the settings list is displayed before proceeding to the next step. Mappings can be added, edited, and deleted.
Fill in the settings map:

Mapping list:

4.3.5 Import data
Before importing, you need to fill in the import setting parameters. After filling them in, you can start importing data into the graph.
- Import settings
- The import setting parameters are shown in the figure below. All of them have default values, so nothing needs to be filled in manually

- Import details
- Click Start Import to start the file import task
- The import details show, for each uploaded file, the configured mapping type, import speed, import progress, elapsed time, and the current status of the task. Each task can be paused, resumed, or stopped
- If the import fails, you can view the specific reason

4.4 Data Analysis
4.4.1 Module entry
Left navigation:

4.4.2 Multi-graphs switching
Use the entry on the left to switch flexibly between the operation spaces of multiple graphs.

4.4.3 Graph Analysis and Processing
HugeGraph supports Gremlin, the graph traversal language of Apache TinkerPop3. Gremlin is a general-purpose graph database query language. By entering Gremlin statements and clicking execute, you can query and analyze graph data, create and delete vertices/edges, modify vertex/edge attributes, and so on.
Below the Gremlin query is the result display area, which provides 3 display modes for graph results: [Graph Mode], [Table Mode], [Json Mode].
Zoom, center, full screen, export and other operations are supported.
【Graph mode】

【Table mode】

【Json mode】

4.4.4 Data Details
Click the vertex/edge entity to view the data details of the vertex/edge, including vertex/edge type, vertex ID, attribute and corresponding value, expand the information display dimension of the graph, and improve the usability.
4.4.5 Multidimensional Path Query of Graph Results
In addition to the global query, an in-depth customized query and hidden operations can be performed for the vertices in the query result to realize customized mining of graph results.
Right-click a vertex to open its menu, which offers operations such as Expand, Query, and Hide.
- Expand: Click to display the vertices associated with the selected point.
- Query: By selecting the edge type and edge direction associated with the selected point, and then selecting its attributes and corresponding filtering rules under this condition, a customized path display can be realized.
- Hide: When clicked, hides the selected point and its associated edges.
Double-clicking a vertex also displays the vertex associated with the selected point.

4.4.6 Add vertex/edge
4.4.6.1 Add vertex
In the graph area, two entries can be used to dynamically add vertices, as follows:
- Click on the graph area panel, the Add Vertex entry appears
- Click the first icon in the action bar in the upper right corner
Complete the addition of vertices by selecting or filling in the vertex type, ID value, and attribute information.
The entry is as follows:

Add the vertex content as follows:

4.4.6.2 Add edge
Right-click a vertex in the graph result to add the outgoing or incoming edge of that point.
4.4.7 Execution records and favorite queries
- Each query is recorded at the bottom of the graph area, including: query time, execution type, content, status, and elapsed time, together with the [favorite] and [load] operations. This gives a complete, traceable record of graph executions and lets you quickly load and reuse executed content
- Frequently used statements can be saved as favorites, which makes it convenient to call them again quickly.

4.5 Task Management
4.5.1 Module entry
Left navigation:

4.5.2 Task Management
- Provide unified management and result viewing of asynchronous tasks. There are 4 types of asynchronous tasks, namely:
  - gremlin: Gremlin tasks
  - algorithm: OLAP algorithm task
  - remove_schema: remove metadata
  - rebuild_index: rebuild the index
- The list displays the asynchronous task information of the current graph, including task ID, task name, task type, creation time, time-consuming, status, operation, and realizes the management of asynchronous tasks.
- Support filtering by task type and status
- Support searching for task ID and task name
- Asynchronous tasks can be deleted or deleted in batches

4.5.3 Gremlin asynchronous tasks
- Create a task
  - The data analysis module currently supports two Gremlin operations, Gremlin query and Gremlin task. If the user switches to Gremlin task, clicking execute creates an asynchronous task in the asynchronous task center
- Task submission
  - After the task is submitted successfully, the graph area returns the submission result and task ID
- Task details
  - The [View] entry jumps to the task details, where you can view the execution of the current task. After jumping to the task center, the row of the currently executing task is displayed directly

Click to view the entry to jump to the task management list, as follows:

- View the results
- The results are displayed in the form of JSON
4.5.4 OLAP algorithm tasks
There is no visual OLAP algorithm execution on Hubble. You can call the RESTful API to perform OLAP algorithm tasks, find the corresponding tasks by ID in the task management, and view the progress and results.
4.5.5 Delete metadata, rebuild index
- Create a task
- In the metadata modeling module, when deleting metadata, an asynchronous task for deleting metadata can be created

- When editing an existing vertex/edge type and adding an index, an asynchronous task for creating the index can be created

- Task details
- After confirming/saving, you can jump to the task center to view the details of the current task

2.2 - HugeGraph-Loader Quick Start
1 HugeGraph-Loader Overview
HugeGraph-Loader is the data import component of HugeGraph, which can convert data from various data sources into graph vertices and edges and import them into the graph database in batches.
Currently supported data sources include:
- Local disk file or directory, supports TEXT, CSV and JSON format files, supports compressed files
- HDFS file or directory, supports compressed files
- Mainstream relational databases, such as MySQL, PostgreSQL, Oracle, SQL Server
Local disk files and HDFS files support resumable uploads.
It will be explained in detail below.
Note: HugeGraph-Loader requires the HugeGraph Server service. Please refer to HugeGraph-Server Quick Start to download and start the Server.
Testing Guide: For running HugeGraph-Loader tests locally, please refer to HugeGraph Toolchain Local Testing Guide
2 Get HugeGraph-Loader
There are three ways to get HugeGraph-Loader:
- Use docker image (Convenient for Test/Dev)
- Download the compiled tarball
- Clone source code then compile and install
2.1 Use Docker image (Convenient for Test/Dev)
We can deploy the loader service using docker run -itd --name loader hugegraph/loader:1.5.0. For the data that needs to be loaded, it can be copied into the loader container either by mounting -v /path/to/data/file:/loader/file or by using docker cp.
Alternatively, to start the loader using docker-compose, the command is docker-compose up -d. An example of the docker-compose.yml is as follows:
version: '3'
services:
  server:
    image: hugegraph/hugegraph:1.5.0
    container_name: server
    ports:
      - 8080:8080
  loader:
    image: hugegraph/loader:1.5.0
    container_name: loader
    # mount your own data here
    # volumes:
      # - /path/to/data/file:/loader/file
For the specific data loading process, see 4.5 Use Docker to load data.
Note:
The Docker image of hugegraph-loader is a convenience release for starting hugegraph-loader quickly, not an official distribution artifact. You can find more details in the ASF Release Distribution Policy.
We recommend using a release tag (like 1.5.0) for the stable version. Use the latest tag to experience the newest functions in development.
2.2 Download the compiled archive
Download the latest version of the HugeGraph-Toolchain release package:
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-toolchain-incubating-{version}.tar.gz
tar zxf *hugegraph*.tar.gz
2.3 Clone source code to compile and install
Clone the latest version of HugeGraph-Loader source package:
# 1. get from github
git clone https://github.com/apache/hugegraph-toolchain.git
# 2. get from the direct download URL (replace {version} with the latest version, e.g. 1.5.0)
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-toolchain-incubating-{version}-src.tar.gz
How to install ojdbc
Due to the license limitation of the Oracle OJDBC, you need to manually install ojdbc to the local maven repository.
Visit the Oracle jdbc downloads page. Select Oracle Database 12c Release 2 (12.2.0.1) drivers, as shown in the following figure.
After opening the link, select “ojdbc8.jar”.
Install ojdbc8 to the local maven repository, enter the directory where ojdbc8.jar is located, and execute the following command.
mvn install:install-file -Dfile=./ojdbc8.jar -DgroupId=com.oracle -DartifactId=ojdbc8 -Dversion=12.2.0.1 -Dpackaging=jar
Compile and generate tar package:
cd hugegraph-loader
mvn clean package -DskipTests
3 How to use
The basic process of using HugeGraph-Loader is divided into the following steps:
- Write graph schema
- Prepare data files
- Write input source map files
- Execute command import
3.1 Construct graph schema
This step is the modeling process. Users need to have a clear idea of their existing data and the graph model they want to create, and then write the schema to build the graph model.
For example, suppose you want to create a graph with two types of vertices and two types of edges: the vertices are “person” and “software”, and the edges are “person knows person” and “person created software”. These vertices and edges have some attributes; for example, the vertex “person” has “name”, “age” and other attributes, “software” has “name”, “price” and other attributes, and the edge “knows” has the “date” attribute and so on.

graph model example
After designing the graph model, we can use groovy to write the definition of schema and save it to a file, here named schema.groovy.
// Create some properties
schema.propertyKey("name").asText().ifNotExist().create();
schema.propertyKey("age").asInt().ifNotExist().create();
schema.propertyKey("city").asText().ifNotExist().create();
schema.propertyKey("date").asText().ifNotExist().create();
schema.propertyKey("price").asDouble().ifNotExist().create();
// Create the person vertex type, which has three attributes: name, age, city, and the primary key is name
schema.vertexLabel("person").properties("name", "age", "city").primaryKeys("name").ifNotExist().create();
// Create a software vertex type, which has two properties: name, price, the primary key is name
schema.vertexLabel("software").properties("name", "price").primaryKeys("name").ifNotExist().create();
// Create the knows edge type, which goes from person to person
schema.edgeLabel("knows").sourceLabel("person").targetLabel("person").ifNotExist().create();
// Create the created edge type, which points from person to software
schema.edgeLabel("created").sourceLabel("person").targetLabel("software").ifNotExist().create();
Please refer to the corresponding section in hugegraph-client for the detailed description of the schema.
3.2 Prepare data
The data sources currently supported by HugeGraph-Loader include:
- local disk file or directory
- HDFS file or directory
- Some relational databases
- Kafka topic
3.2.1 Data source structure
3.2.1.1 Local disk file or directory
The user can specify a local disk file as the data source. If the data is scattered in multiple files, a certain directory is also supported as the data source, but multiple directories are not supported as the data source for the time being.
For example, my data is scattered in multiple files, part-0, part-1 … part-n. To perform the import, it must be ensured that they are placed in one directory. Then in the loader’s mapping file, specify path as the directory.
Supported file formats include:
- TEXT
- CSV
- JSON
TEXT is a text file with a custom delimiter. The first line is usually the header, which records the name of each column; a file without a header line is also allowed (in that case the header is specified in the mapping file). Each remaining line represents a record, which will be converted into a vertex/edge; each column of the line corresponds to a field, which will be converted into the id, label or attribute of the vertex/edge.
An example is as follows:
id|name|lang|price|ISBN
1|lop|java|328|ISBN978-7-107-18618-5
2|ripple|java|199|ISBN978-7-100-13678-5
CSV is a TEXT file with commas , as delimiters. When a column value itself contains a comma, the column value needs to be enclosed in double quotes, for example:
marko,29,Beijing
"li,nary",26,"Wu,han"
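To see how the TEXT and CSV rules above turn lines into fields, the two samples can be parsed with Python's standard csv module. This is only an illustration of the file formats, not the loader's own parser; the helper `read_records` is hypothetical, and the column names given for the headerless CSV sample (name, age, city) are an assumption based on the person schema.

```python
import csv
import io


def read_records(text: str, delimiter: str = ",", header=None) -> list:
    """Parse delimited text into dicts; use the first line as header if none is given."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if header is None:
        header, rows = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in rows]


# TEXT sample: "|" as delimiter, header on the first line.
text_data = (
    "id|name|lang|price|ISBN\n"
    "1|lop|java|328|ISBN978-7-107-18618-5\n"
    "2|ripple|java|199|ISBN978-7-100-13678-5\n"
)

# CSV sample: no header line, values containing commas are double-quoted.
csv_data = 'marko,29,Beijing\n"li,nary",26,"Wu,han"\n'

for record in read_records(text_data, delimiter="|"):
    print(record)
for record in read_records(csv_data, header=["name", "age", "city"]):
    print(record)
```

The quoted values in the second CSV line are kept as single fields, which is why the double quotes are required when a value contains the delimiter.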
The JSON file requires that each line is a JSON string, and the format of each line needs to be consistent.
{"source_name": "marko", "target_name": "vadas", "date": "20160110", "weight": 0.5}
{"source_name": "marko", "target_name": "josh", "date": "20130220", "weight": 1.0}
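A file in this format can be sanity-checked before import. The following minimal Python sketch parses one JSON object per line and verifies that all lines share the same field names; `read_json_lines` is a hypothetical helper, and treating "consistent format" as "same set of keys" is an assumption made for illustration.

```python
import json


def read_json_lines(text: str) -> list:
    """Parse one JSON object per non-empty line and check that keys are consistent."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    key_sets = {frozenset(record) for record in records}
    if len(key_sets) > 1:
        raise ValueError("lines do not share the same set of fields")
    return records


json_data = (
    '{"source_name": "marko", "target_name": "vadas", "date": "20160110", "weight": 0.5}\n'
    '{"source_name": "marko", "target_name": "josh", "date": "20130220", "weight": 1.0}\n'
)

for record in read_json_lines(json_data):
    print(record["source_name"], "->", record["target_name"], record["weight"])
```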
3.2.1.2 HDFS file or directory
Users can also specify HDFS files or directories as data sources; all of the above requirements for local disk files or directories apply here. In addition, since HDFS usually stores compressed files, the loader supports compressed files, both on HDFS and in local disk files or directories.
Currently supported compressed file types include: GZIP, BZ2, XZ, LZMA, SNAPPY_RAW, SNAPPY_FRAMED, Z, DEFLATE, LZ4_BLOCK, LZ4_FRAMED, ORC, and PARQUET.
3.2.1.3 Mainstream relational database
The loader also supports some relational databases as data sources, and currently supports MySQL, PostgreSQL, Oracle, and SQL Server.
However, the requirements for the table structure are currently strict: a table structure that requires an associated query during import is not supported. An associated query means that, after reading a row of the table, the value of a certain column (such as a foreign key) cannot be used directly, and another query is needed to determine the true value of the column.
For example, suppose there are three tables: person, software and created.
// person schema
id | name | age | city
// software schema
id | name | lang | price
// created schema
id | p_id | s_id | date
Suppose the id strategy of person and software is specified as PRIMARY_KEY when modeling (schema), and name is chosen as the primary key (note: this is the vertex-label concept in HugeGraph). When importing edge data, the ids of the source and target vertices must then be spliced from name, so the loader would have to look up the corresponding name in the person/software table using p_id/s_id. The loader does not currently support schemas that require this additional query. In this case, the following two methods can be used instead:
- Keep the id strategy of person and software as PRIMARY_KEY, but use the id column of the person table and the software table as the primary key attribute of the vertex, so that the vertex id can be generated by directly splicing p_id and s_id with the label of the vertex when importing an edge;
- Specify the id policy of person and software as CUSTOMIZE, and use the id column of the person table and the software table directly as the vertex id, so that p_id and s_id can be used directly when importing edges.
The key point is that the edge can use p_id and s_id directly, without a second lookup.
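The difference between the unsupported case and the workarounds can be sketched in Python. The rows below are invented sample data that follow the three table schemas, and the function names are hypothetical; the loader itself is not involved.

```python
# Invented rows following the person / software / created schemas above.
person = [{"id": 1, "name": "marko", "age": 29, "city": "Beijing"}]
software = [{"id": 10, "name": "lop", "lang": "java", "price": 328}]
created = [{"id": 100, "p_id": 1, "s_id": 10, "date": "2017-12-10"}]

def endpoints_direct(edge_row):
    """Workaround case: vertex ids are built from the id column,
    so p_id and s_id are usable as they are."""
    return edge_row["p_id"], edge_row["s_id"]

def endpoints_with_lookup(edge_row, person_rows, software_rows):
    """Unsupported case: name is the primary key, so every edge row
    needs an associated query to translate p_id/s_id into names."""
    person_name = next(p["name"] for p in person_rows if p["id"] == edge_row["p_id"])
    software_name = next(s["name"] for s in software_rows if s["id"] == edge_row["s_id"])
    return person_name, software_name

print(endpoints_direct(created[0]))                         # (1, 10)
print(endpoints_with_lookup(created[0], person, software))  # ('marko', 'lop')
```

The second function is exactly the extra per-row query that the loader avoids; both workarounds make the first function sufficient.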
3.2.2 Prepare vertex and edge data
3.2.2.1 Vertex Data
The vertex data file consists of data line by line. Generally, each line is used as a vertex, and each column is used as a vertex attribute. The following description uses CSV format as an example.
- person vertex data (the data itself does not contain a header)
Tom,48,Beijing
Jerry,36,Shanghai
- software vertex data (the data itself contains the header)
name,price
Photoshop,999
Office,388
3.2.2.2 Edge data
The edge data file consists of data line by line. Generally, each line is used as an edge. Some columns are used as the IDs of the source and target vertices, and other columns are used as edge attributes. The following uses JSON format as an example.
- knows edge data
{"source_name": "Tom", "target_name": "Jerry", "date": "2008-12-12"}
- created edge data
{"source_name": "Tom", "target_name": "Photoshop"}
{"source_name": "Tom", "target_name": "Office"}
{"source_name": "Jerry", "target_name": "Office"}
3.3 Write data source mapping file
3.3.1 Mapping file overview
The mapping file of the input source describes how to map the input source data to the vertex types/edge types of the graph. It is organized in JSON format and consists of multiple mapping blocks, each of which is responsible for mapping one input source to vertices and edges.
Specifically, each mapping block contains one input source block and multiple vertex mapping and edge mapping blocks. The input source block corresponds to a local disk file or directory, an HDFS file or directory, or a relational database, and describes the basic information of the data source, such as where the data is, its format, and its delimiter. The vertex/edge mappings are bound to the input source and describe which columns of the input source are selected, which columns are used as ids, which columns are used as attributes, which attribute each column maps to, which attribute value each column value maps to, and so on.
In the simplest terms, each mapping block describes where the file to be imported is, which type of vertex/edge each line of the file becomes, which columns of the file need to be imported, and which vertex/edge properties these columns correspond to.
Note: The mapping file format before version 0.11.0 differs greatly from the format after 0.11.0. For convenience, the mapping file format before 0.11.0 is called version 1.0, and the format after 0.11.0 is called version 2.0. Unless otherwise specified, “mapping file” refers to version 2.0.
Skeleton of the mapping file for version 2.0:
{
  "version": "2.0",
  "structs": [
    {
      "id": "1",
      "input": {
      },
      "vertices": [
        {},
        {}
      ],
      "edges": [
        {},
        {}
      ]
    }
  ]
}
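A quick structural check of a mapping file against this skeleton could look like the following Python sketch. The rules it applies are only a reading of the skeleton shown here, not the loader's actual validation, and check_mapping_skeleton is a hypothetical helper.

```python
def check_mapping_skeleton(mapping):
    """Return a list of problems found in a version 2.0 mapping dict."""
    problems = []
    if mapping.get("version") != "2.0":
        problems.append('version should be "2.0"')
    structs = mapping.get("structs")
    if not isinstance(structs, list) or not structs:
        problems.append("structs should be a non-empty list")
        return problems
    for index, struct in enumerate(structs):
        # Every mapping block is bound to exactly one input source.
        if not isinstance(struct.get("input"), dict):
            problems.append(f"structs[{index}] is missing an input block")
        # vertices and edges are lists of mapping blocks (possibly empty).
        for key in ("vertices", "edges"):
            if not isinstance(struct.get(key, []), list):
                problems.append(f"structs[{index}].{key} should be a list")
    return problems

skeleton = {
    "version": "2.0",
    "structs": [
        {"id": "1", "input": {}, "vertices": [{}, {}], "edges": [{}, {}]}
    ],
}
print(check_mapping_skeleton(skeleton))  # []
```

In practice the dict would come from json.load on the mapping file.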
Both versions of the mapping file for the graph model and data files described above are given here.
Mapping file for version 2.0:
{
  "version": "2.0",
  "structs": [
    {
      "id": "1",
      "skip": false,
      "input": {
        "type": "FILE",
        "path": "vertex_person.csv",
        "file_filter": {
          "extensions": [
            "*"
          ]
        },
        "format": "CSV",
        "delimiter": ",",
        "date_format": "yyyy-MM-dd HH:mm:ss",
        "time_zone": "GMT+8",
        "skipped_line": {
          "regex": "(^#|^//).*|"
        },
        "compression": "NONE",
        "header": [
          "name",
          "age",
          "city"
        ],
        "charset": "UTF-8",
        "list_format": {
          "start_symbol": "[",
          "elem_delimiter": "|",
          "end_symbol": "]"
        }
      },
      "vertices": [
        {
          "label": "person",
          "skip": false,
          "id": null,
          "unfold": false,
          "field_mapping": {},
          "value_mapping": {},
          "selected": [],
          "ignored": [],
          "null_values": [
            ""
          ],
          "update_strategies": {}
        }
      ],
      "edges": []
    },
    {
      "id": "2",
      "skip": false,
      "input": {
        "type": "FILE",
        "path": "vertex_software.csv",
        "file_filter": {
          "extensions": [
            "*"
          ]
        },
        "format": "CSV",
        "delimiter": ",",
        "date_format": "yyyy-MM-dd HH:mm:ss",
        "time_zone": "GMT+8",
        "skipped_line": {
          "regex": "(^#|^//).*|"
        },
        "compression": "NONE",
        "header": null,
        "charset": "UTF-8",
        "list_format": {
          "start_symbol": "",
          "elem_delimiter": ",",
          "end_symbol": ""
        }
      },
      "vertices": [
        {
          "label": "software",
          "skip": false,
          "id": null,
          "unfold": false,
          "field_mapping": {},
          "value_mapping": {},
          "selected": [],
          "ignored": [],
          "null_values": [
            ""
          ],
          "update_strategies": {}
        }
      ],
      "edges": []
    },
    {
      "id": "3",
      "skip": false,
      "input": {
        "type": "FILE",
        "path": "edge_knows.json",
        "file_filter": {
          "extensions": [
            "*"
          ]
        },
        "format": "JSON",
        "delimiter": null,
        "date_format": "yyyy-MM-dd HH:mm:ss",
        "time_zone": "GMT+8",
        "skipped_line": {
          "regex": "(^#|^//).*|"
        },
        "compression": "NONE",
        "header": null,
        "charset": "UTF-8",
        "list_format": null
      },
      "vertices": [],
      "edges": [
        {
          "label": "knows",
          "skip": false,
          "source": [
            "source_name"
          ],
          "unfold_source": false,
          "target": [
            "target_name"
          ],
          "unfold_target": false,
          "field_mapping": {
            "source_name": "name",
            "target_name": "name"
          },
          "value_mapping": {},
          "selected": [],
          "ignored": [],
          "null_values": [
            ""
          ],
          "update_strategies": {}
        }
      ]
    },
    {
      "id": "4",
      "skip": false,
      "input": {
        "type": "FILE",
        "path": "edge_created.json",
        "file_filter": {
          "extensions": [
            "*"
          ]
        },
        "format": "JSON",
        "delimiter": null,
        "date_format": "yyyy-MM-dd HH:mm:ss",
        "time_zone": "GMT+8",
        "skipped_line": {
          "regex": "(^#|^//).*|"
        },
        "compression": "NONE",
        "header": null,
        "charset": "UTF-8",
        "list_format": null
      },
      "vertices": [],
      "edges": [
        {
          "label": "created",
          "skip": false,
          "source": [
            "source_name"
          ],
          "unfold_source": false,
          "target": [
            "target_name"
          ],
          "unfold_target": false,
          "field_mapping": {
            "source_name": "name",
            "target_name": "name"
          },
          "value_mapping": {},
          "selected": [],
          "ignored": [],
          "null_values": [
            ""
          ],
          "update_strategies": {}
        }
      ]
    }
  ]
}
Mapping file for version 1.0:
{
  "vertices": [
    {
      "label": "person",
      "input": {
        "type": "file",
        "path": "vertex_person.csv",
        "format": "CSV",
        "header": ["name", "age", "city"],
        "charset": "UTF-8"
      }
    },
    {
      "label": "software",
      "input": {
        "type": "file",
        "path": "vertex_software.csv",
        "format": "CSV"
      }
    }
  ],
  "edges": [
    {
      "label": "knows",
      "source": ["source_name"],
      "target": ["target_name"],
      "input": {
        "type": "file",
        "path": "edge_knows.json",
        "format": "JSON"
      },
      "field_mapping": {
        "source_name": "name",
        "target_name": "name"
      }
    },
    {
      "label": "created",
      "source": ["source_name"],
      "target": ["target_name"],
      "input": {
        "type": "file",
        "path": "edge_created.json",
        "format": "JSON"
      },
      "field_mapping": {
        "source_name": "name",
        "target_name": "name"
      }
    }
  ]
}
The 1.0 version of the mapping file is centered on vertices and edges, each of which sets its own input source; the 2.0 version is centered on the input source, which sets the vertex and edge mappings. Some input sources (such as a file) can generate both vertices and edges. In the 1.0 format, an input block must be written in both the vertex mapping block and the edge mapping block, and the two input blocks are exactly the same; in the 2.0 format, input only needs to be written once. Compared with version 1.0, version 2.0 therefore saves some repetitive writing of input.
In the bin directory of hugegraph-loader-{version}, there is a script tool mapping-convert.sh that can directly convert the mapping file of version 1.0 to version 2.0. The usage is as follows:
bin/mapping-convert.sh struct.json
A struct-v2.json will be generated in the same directory as struct.json.
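To make the relationship between the two formats concrete, the sketch below regroups a version 1.0 mapping by input source in Python. It only illustrates the idea of writing each input once; it is not the logic of mapping-convert.sh, which may normalize or add fields that this sketch ignores. The function name and the sample paths are assumptions.

```python
import json

def convert_v1_to_v2(v1):
    """Group v1 vertex/edge mappings that share an identical input block."""
    structs = []
    index_by_input = {}
    for kind in ("vertices", "edges"):
        for mapping in v1.get(kind, []):
            mapping = dict(mapping)          # copy so the caller's data is untouched
            input_block = mapping.pop("input")
            key = json.dumps(input_block, sort_keys=True)
            if key not in index_by_input:
                index_by_input[key] = len(structs)
                structs.append({
                    "id": str(len(structs) + 1),
                    "input": input_block,
                    "vertices": [],
                    "edges": [],
                })
            structs[index_by_input[key]][kind].append(mapping)
    return {"version": "2.0", "structs": structs}

shared_input = {"type": "file", "path": "people.csv", "format": "CSV"}
v1 = {
    "vertices": [{"label": "person", "input": shared_input}],
    "edges": [{"label": "knows", "source": ["a"], "target": ["b"], "input": shared_input}],
}
v2 = convert_v1_to_v2(v1)
print(len(v2["structs"]))  # 1
```

Because the vertex and the edge mapping read the same file, the result contains a single struct whose input is written once.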
3.3.2 Input Source
Input sources are currently divided into four categories: FILE, HDFS, JDBC and KAFKA, which are distinguished by the type node. We call them local file input sources, HDFS input sources, JDBC input sources, and KAFKA input sources, which are described below.
3.3.2.1 Local file input source
- id: the id of the input source. This field supports some internal functions. It is not required (it is generated automatically if omitted), but writing it is strongly recommended because it helps with debugging;
- skip: whether to skip the input source. Because JSON files cannot contain comments, set this to true to skip an input source during a particular import without deleting its configuration. The default is false, optional;
- input: input source mapping block, compound structure:
  - type: input source type, must be file or FILE, required;
  - path: the path of the local file or directory, either an absolute path or a path relative to the mapping file; an absolute path is recommended, required;
  - file_filter: filters the files under path that meet compound conditions, compound structure; currently only file extensions can be configured, through the child node extensions; the default is “*”, which keeps all files;
  - format: the format of the local file, the valid values are CSV, TEXT and JSON, which must be uppercase, required;
  - header: the name of each column of the file. If not specified, the first line of the data file is used as the header; when the file itself has a header and header is also specified, the first line of the file is treated as a normal data line. JSON files do not need a header, optional;
  - delimiter: the column delimiter of the file line, the default is a comma ",". JSON files do not need it, optional;
  - charset: the encoded character set of the file, the default is UTF-8, optional;
  - date_format: custom date format, the default value is yyyy-MM-dd HH:mm:ss, optional; if the date is given as a timestamp, this item must be written as timestamp (fixed value);
  - time_zone: the time zone of the date data, the default value is GMT+8, optional;
  - skipped_line: the lines to be skipped, compound structure; currently only a regular expression for the lines to be skipped can be configured, through the child node regex; no line is skipped by default, optional;
  - compression: the compression format of the file, the valid values are NONE, GZIP, BZ2, XZ, LZMA, SNAPPY_RAW, SNAPPY_FRAMED, Z, DEFLATE, LZ4_BLOCK, LZ4_FRAMED, ORC and PARQUET; the default is NONE, which means an uncompressed file, optional;
  - list_format: when a column of a (non-JSON) file is a collection structure (the Cardinality of the corresponding PropertyKey in the graph is Set or List), this item sets the start symbol, delimiter and end symbol of the column, compound structure:
    - start_symbol: the start symbol of the collection structure column (the default value is [; the JSON format does not currently support specifying it)
    - elem_delimiter: the delimiter of the collection structure column (the default value is |; the JSON format currently only supports the native , delimiter)
    - end_symbol: the end symbol of the collection structure column (the default value is ]; the JSON format does not currently support specifying it)
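The skipped_line and list_format nodes can be illustrated with a short Python sketch. It assumes the regex has to match the whole line (which would explain the trailing | in the sample regex "(^#|^//).*|" matching empty lines); that matching rule and both function names are assumptions, not confirmed loader behavior.

```python
import re

# The sample regex from the version 2.0 mapping file.
SKIP_PATTERN = re.compile(r"(^#|^//).*|")

def is_skipped(line):
    """True if the whole line matches the skipped_line regex (assumed full match)."""
    return SKIP_PATTERN.fullmatch(line) is not None

def parse_list_column(value, start_symbol="[", elem_delimiter="|", end_symbol="]"):
    """Split a collection column using the list_format symbols (defaults as documented)."""
    if start_symbol and value.startswith(start_symbol):
        value = value[len(start_symbol):]
    if end_symbol and value.endswith(end_symbol):
        value = value[:-len(end_symbol)]
    return value.split(elem_delimiter) if value else []

print(is_skipped("# a comment"))       # True
print(is_skipped(""))                  # True
print(is_skipped("marko,29,Beijing"))  # False
print(parse_list_column("[a|b|c]"))    # ['a', 'b', 'c']
```

With empty start and end symbols and "," as elem_delimiter, as in the second struct of the sample mapping file, parse_list_column("a,b", "", ",", "") yields ['a', 'b'].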
 
3.3.2.2 HDFS input source
The nodes and meanings of the above local file input source are basically applicable here. Only the different and unique nodes of the HDFS input source are listed below.
- type: input source type, must fill in hdfs or HDFS, required;
- path: the path of the HDFS file or directory, it must be the absolute path of HDFS, required;
- core_site_path: the path of the core-site.xml file of the HDFS cluster, the key point is to specify the address of the NameNode (fs.default.name) and the implementation of the file system (fs.hdfs.impl);
3.3.2.3 JDBC input source
As mentioned above, the loader supports multiple relational databases. Because their mapping structures are very similar, they are collectively referred to as JDBC input sources, and the vendor node is used to distinguish between databases.
- type: input source type, must fill in jdbc or JDBC, required;
- vendor: database type, the valid values are [MySQL, PostgreSQL, Oracle, SQLServer], case-insensitive, required;
- driver: the type of driver used by jdbc, required;
- url: the url of the database that jdbc wants to connect to, required;
- database: the name of the database to be connected, required;
- schema: The name of the schema to be connected, different databases have different requirements, and the details are explained below;
- table: the name of the table to be connected, at least one of table or custom_sql is required;
- custom_sql: custom SQL statement, at least one of table or custom_sql is required;
- username: username to connect to the database, required;
- password: password for connecting to the database, required;
- batch_size: The size of one page when obtaining table data by page, the default is 500, optional;
MYSQL
| Node | Fixed value or common value | 
|---|---|
| vendor | MYSQL | 
| driver | com.mysql.cj.jdbc.Driver | 
| url | jdbc:mysql://127.0.0.1:3306 | 
schema: nullable, if filled in, it must be the same as the value of database
POSTGRESQL
| Node | Fixed value or common value | 
|---|---|
| vendor | POSTGRESQL | 
| driver | org.postgresql.Driver | 
| url | jdbc:postgresql://127.0.0.1:5432 | 
schema: nullable, default is “public”
ORACLE
| Node | Fixed value or common value | 
|---|---|
| vendor | ORACLE | 
| driver | oracle.jdbc.driver.OracleDriver | 
| url | jdbc:oracle:thin:@127.0.0.1:1521 | 
schema: nullable, the default value is the same as the username
SQLSERVER
| Node | Fixed value or common value | 
|---|---|
| vendor | SQLSERVER | 
| driver | com.microsoft.sqlserver.jdbc.SQLServerDriver | 
| url | jdbc:sqlserver://127.0.0.1:1433 | 
schema: required
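The per-vendor values and schema rules above can be collected into a small helper that assembles a JDBC input block. This is a Python sketch for illustration: jdbc_input is a hypothetical function, the driver and url values are the common values from the tables (a real deployment would use its own address), and the checks only mirror the rules stated in this section.

```python
JDBC_COMMON = {
    "MYSQL": {"driver": "com.mysql.cj.jdbc.Driver", "url": "jdbc:mysql://127.0.0.1:3306"},
    "POSTGRESQL": {"driver": "org.postgresql.Driver", "url": "jdbc:postgresql://127.0.0.1:5432"},
    "ORACLE": {"driver": "oracle.jdbc.driver.OracleDriver", "url": "jdbc:oracle:thin:@127.0.0.1:1521"},
    "SQLSERVER": {"driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver", "url": "jdbc:sqlserver://127.0.0.1:1433"},
}

def jdbc_input(vendor, database, username, password,
               table=None, custom_sql=None, schema=None, batch_size=500):
    """Build the input block of a JDBC source as a plain dict."""
    vendor = vendor.upper()  # vendor is case-insensitive
    if vendor not in JDBC_COMMON:
        raise ValueError(f"unsupported vendor: {vendor}")
    if table is None and custom_sql is None:
        raise ValueError("at least one of table or custom_sql is required")
    if vendor == "SQLSERVER" and schema is None:
        raise ValueError("schema is required for SQLSERVER")
    if vendor == "MYSQL" and schema is not None and schema != database:
        raise ValueError("for MYSQL, schema must equal database when filled in")
    block = {"type": "JDBC", "vendor": vendor, **JDBC_COMMON[vendor],
             "database": database, "username": username,
             "password": password, "batch_size": batch_size}
    # Optional nodes are written only when a value was given.
    for name, value in (("schema", schema), ("table", table), ("custom_sql", custom_sql)):
        if value is not None:
            block[name] = value
    return block

block = jdbc_input("mysql", "shop", "root", "secret", table="person")
print(block["driver"])  # com.mysql.cj.jdbc.Driver
```

The database name, user and password in the usage line are placeholders.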
3.3.2.4 Kafka input source
- type: input source type, must be kafka or KAFKA, required;
- bootstrap_server: set the list of kafka bootstrap servers;
- topic: the topic to subscribe to;
- group: group of Kafka consumers;
- from_beginning: set whether to read from the beginning;
- format: format of the local file, options are CSV, TEXT and JSON, must be uppercase, required;
- header: column name of each column of the file, if not specified, the first line of the data file will be used as the header; when the file itself has a header and the header is specified, the first line of the file will be treated as an ordinary data line; JSON files do not need to specify the header, optional;
- delimiter: delimiter of the file line, default is comma “,” as delimiter, JSON files do not need to specify, optional;
- charset: encoding charset of the file, default is UTF-8, optional;
- date_format: customized date format, default value is yyyy-MM-dd HH:mm:ss, optional; if the date is presented in the form of timestamp, this item must be written as timestamp (fixed);
- extra_date_formats: a list of additional date formats, empty by default, optional; each item in the list is an alternative to the date format specified by date_format;
- time_zone: set which time zone the date data is in, default is GMT+8, optional;
- skipped_line: the line you want to skip, composite structure, currently can only configure the regular expression of the line to be skipped, described by the child node regex, the default is not to skip any line, optional;
- early_stop: whether to stop the task when the records pulled from the Kafka broker at a certain time are empty, default is false, only for debugging, optional;
3.3.3 Vertex and Edge Mapping
Vertex and edge mappings share many nodes (a node is a key in the JSON file). The shared nodes are introduced first, followed by the nodes unique to vertex mappings and to edge mappings.
Shared nodes
- label: the label to which the vertex/edge data to be imported belongs, required;
- field_mapping: Map the column name of the input source column to the attribute name of the vertex/edge, optional;
- value_mapping: map the data value of the input source to the attribute value of the vertex/edge, optional;
- selected: select some columns to insert, other unselected ones are not inserted, cannot exist at the same time as ignored, optional;
- ignored: ignore some columns so that they do not participate in insertion, cannot exist at the same time as selected, optional;
- null_values: strings that represent null values, such as “NULL”. If the vertex/edge attribute corresponding to the column is also a nullable attribute, the value of this attribute will not be set when constructing the vertex/edge, optional;
- update_strategies: If the data needs to be updated in batches in a specific way, you can specify a specific update strategy for each attribute (see below for details), optional;
- unfold: whether to unfold the column. Each unfolded value forms a row with the other columns, which is equivalent to unfolding one row into multiple rows. For example, if the value of a certain column (the id column) of the file is [1,2,3] and the values of the other columns are 18,Beijing, then with unfold set this row becomes 3 rows, namely 1,18,Beijing, 2,18,Beijing and 3,18,Beijing. Note that only the column selected as id is unfolded. Default false, optional;
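The unfold behavior described in the list above amounts to the following Python sketch. The column names id, age and city are assumed for the example row, and unfold_row is a hypothetical helper, not part of the loader.

```python
def unfold_row(row, id_column):
    """Expand a row whose id column holds a list into one row per id value."""
    values = row[id_column]
    if not isinstance(values, list):
        return [row]  # nothing to unfold
    # Every other column is repeated unchanged in each resulting row.
    return [{**row, id_column: value} for value in values]

rows = unfold_row({"id": [1, 2, 3], "age": 18, "city": "Beijing"}, "id")
for row in rows:
    print(row)
# {'id': 1, 'age': 18, 'city': 'Beijing'}
# {'id': 2, 'age': 18, 'city': 'Beijing'}
# {'id': 3, 'age': 18, 'city': 'Beijing'}
```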
The update strategy supports 8 types (all uppercase):
- Accumulate values: SUM
- Take the greater of two numbers/dates: BIGGER
- Take the smaller of two numbers/dates: SMALLER
- Take the union of Set attributes: UNION
- Take the intersection of Set attributes: INTERSECTION
- Append elements to a List attribute: APPEND
- Delete elements from a List/Set attribute: ELIMINATE
- Override an existing attribute: OVERRIDE
Note: If the newly imported attribute value is empty, the existing old data will be used instead of the empty value. For the effect, please refer to the following example
// The update strategy is specified in the JSON file as follows
{
  "vertices": [
    {
      "label": "person",
      "update_strategies": {
        "age": "SMALLER",
        "set": "UNION"
      },
      "input": {
        "type": "file",
        "path": "vertex_person.txt",
        "format": "TEXT",
        "header": ["name", "age", "set"]
      }
    }
  ]
}
// 1. Write a line of data with the OVERRIDE update strategy (null means empty here)
'a b null null'
// 2. Write another line
'null null c d'
// 3. Finally we can get
'a b c d'   
// If there is no update strategy, you will get
'null null c d'
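The eight strategies and the empty-value rule can be written down as a single Python function. This is a sketch of the semantics as the strategy names describe them, not the server's implementation; merge_property is a hypothetical name, and treating None and "" as "empty" is an assumption.

```python
def merge_property(strategy, old, new):
    """Combine an existing property value with a newly imported one."""
    if new is None or new == "":
        return old  # an empty new value keeps the existing data
    if old is None:
        return new  # nothing stored yet
    if strategy == "SUM":
        return old + new
    if strategy == "BIGGER":
        return max(old, new)
    if strategy == "SMALLER":
        return min(old, new)
    if strategy == "UNION":
        return set(old) | set(new)
    if strategy == "INTERSECTION":
        return set(old) & set(new)
    if strategy == "APPEND":
        return list(old) + list(new)
    if strategy == "ELIMINATE":
        # Keep the container type (list or set) of the existing value.
        return type(old)(item for item in old if item not in new)
    if strategy == "OVERRIDE":
        return new
    raise ValueError(f"unknown update strategy: {strategy}")

# The 'a b null null' + 'null null c d' example, with None standing for null.
first = ["a", "b", None, None]
second = [None, None, "c", "d"]
merged = [merge_property("OVERRIDE", old, new) for old, new in zip(first, second)]
print(merged)  # ['a', 'b', 'c', 'd']
```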
Note: After adopting a batch update strategy, the number of disk read requests increases significantly, and the import will be several times slower than pure overwriting (HDD [IOPS](https://en.wikipedia.org/wiki/IOPS) then becomes the bottleneck; SSDs are recommended for speed).
Unique Nodes for Vertex Maps
- id: specifies a column as the id column of the vertex. When the vertex id policy is CUSTOMIZE, it is required; when the id policy is PRIMARY_KEY, it must be empty;
Unique Nodes for Edge Maps
- source: selects certain columns of the input source as the id columns of the source vertex. When the id policy of the source vertex is CUSTOMIZE, one column must be specified as the id column of the vertex; when the id policy of the source vertex is PRIMARY_KEY, one or more columns must be specified, from which the vertex id is spliced. That is, this item is required whichever id strategy is used;
- target: specifies certain columns as the id columns of the target vertex, in the same way as source;
- unfold_source: whether to unfold the source column of the file; the effect is the same as unfold in the vertex mapping;
- unfold_target: whether to unfold the target column of the file; the effect is the same as unfold in the vertex mapping;
3.4 Execute command import
After preparing the graph model, data file, and input source mapping relationship file, the data file can be imported into the graph database.
The import process is controlled by commands submitted by the user, and the user can control the specific process of execution through different parameters.
3.4.1 Parameter description
| Parameter | Default value | Required or not | Description |
|---|---|---|---|
| -f or --file | | Y | path to configure script |
| -g or --graph | | Y | graph space name |
| -s or --schema | | Y | schema file path |
| -h or --host | localhost | | address of HugeGraphServer |
| -p or --port | 8080 | | port number of HugeGraphServer |
| --username | null | | When HugeGraphServer enables permission authentication, the username of the current graph |
| --token | null | | When HugeGraphServer has enabled authorization authentication, the token of the current graph |
| --protocol | http | | Protocol for sending requests to the server, optional http or https |
| --trust-store-file | | | When the request protocol is https, the client’s certificate file path |
| --trust-store-password | | | When the request protocol is https, the client certificate password |
| --clear-all-data | false | | Whether to clear the original data on the server before importing data |
| --clear-timeout | 240 | | Timeout for clearing the original data on the server before importing data |
| --incremental-mode | false | | Whether to use the breakpoint resume mode, only the input source is FILE and HDFS support this mode, enabling this mode can start the import from the place where the last import stopped |
| --failure-mode | false | | When the failure mode is true, the data that failed before will be imported. Generally speaking, the failed data file needs to be manually corrected and edited, and then imported again |
| --batch-insert-threads | CPUs | | Batch insert thread pool size (CPUs is the number of logical cores available to the current OS) |
| --single-insert-threads | 8 | | Size of single insert thread pool |
| --max-conn | 4 * CPUs | | The maximum number of HTTP connections between HugeClient and HugeGraphServer, it is recommended to adjust this when adjusting threads |
| --max-conn-per-route | 2 * CPUs | | The maximum number of HTTP connections for each route between HugeClient and HugeGraphServer, it is recommended to adjust this item at the same time when adjusting the thread |
| --batch-size | 500 | | The number of data items in each batch when importing data |
| --max-parse-errors | 1 | | The maximum number of lines of data parsing errors allowed, and the program exits when this value is reached |
| --max-insert-errors | 500 | | The maximum number of rows of data insertion errors allowed, and the program exits when this value is reached |
| --timeout | 60 | | Timeout (seconds) for inserting results to return |
| --shutdown-timeout | 10 | | Waiting time for multithreading to stop (seconds) |
| --retry-times | 0 | | Number of retries when a specific exception occurs |
| --retry-interval | 10 | | interval before retry (seconds) |
| --check-vertex | false | | Whether to check whether the vertex connected by the edge exists when inserting the edge |
| --print-progress | true | | Whether to print the number of imported items in the console in real time |
| --dry-run | false | | Turn on this mode, only parsing but not importing, usually used for testing |
| --help | false | | print help information |
3.4.2 Breakpoint Continuation Mode
Usually, a Loader task takes a long time to execute. If the import process exits for some reason and you want the next run to continue from the point of interruption, use breakpoint continuation mode.
The user sets the command line parameter --incremental-mode to true to enable breakpoint continuation mode. The key to breakpoint continuation is the progress file: when the import process exits, the import progress at the time of exit is recorded in the progress file. The progress file is located in the ${struct} directory and is named like load-progress ${date}, where ${struct} is the prefix of the mapping file and ${date} is the moment the import started. For example, for an import task started at 2019-10-10 12:30:30 with the mapping file struct-example.json, the progress file is struct-example/load-progress 2019-10-10 12:30:30, where the struct-example directory is a sibling of struct-example.json.
Note: Progress files are generated regardless of whether --incremental-mode is turned on; a progress file is generated at the end of each import.
If the data files are all valid and the import task is stopped by the user (CTRL + C or kill; kill -9 is not supported), that is, if there are no error records, the next import only needs to enable breakpoint continuation mode.
But if the limit of --max-parse-errors or --max-insert-errors is reached because too much data is invalid or the network is abnormal, Loader records the original rows that failed in failure files. After the user corrects the data lines in the failure files, set --failure-mode to true to import these failure files as input sources (this does not affect normal file import). If a corrected data line still has a problem, it is logged to the failure file again (don’t worry about duplicate lines).
Each vertex map or edge map will generate its own failure file when data insertion fails. The failure file is divided into a parsing failure file (suffix .parse-error) and an insertion failure file (suffix .insert-error).
They are stored in the ${struct}/current directory. For example, there is a vertex mapping person, and an edge mapping knows in the mapping file, each of which has some error lines. When the Loader exits, you will see the following files in the ${struct}/current directory:
- person-b4cd32ab.parse-error: data that the vertex mapping person failed to parse
- person-b4cd32ab.insert-error: data that the vertex mapping person failed to insert
- knows-eb6b2bac.parse-error: data that the edge mapping knows failed to parse
- knows-eb6b2bac.insert-error: data that the edge mapping knows failed to insert
.parse-error and .insert-error do not always exist together. Only lines with parsing errors will have .parse-error files, and only lines with insertion errors will have .insert-error files.
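The file locations described in this section can be derived from the mapping file path. The Python sketch below follows the naming rules given above; both helper names are hypothetical, and the directories are resolved relative to the mapping file as in the struct-example example.

```python
from pathlib import Path

def progress_file(mapping_file, start_time):
    """Path of the progress file for an import started at start_time."""
    mapping_path = Path(mapping_file)
    # ${struct} is the mapping file name without its extension.
    return mapping_path.parent / mapping_path.stem / f"load-progress {start_time}"

def failure_files(mapping_file):
    """Group the file names in ${struct}/current by failure type."""
    mapping_path = Path(mapping_file)
    current_dir = mapping_path.parent / mapping_path.stem / "current"
    result = {"parse-error": [], "insert-error": []}
    if current_dir.is_dir():
        for path in sorted(current_dir.iterdir()):
            suffix = path.suffix.lstrip(".")
            if suffix in result:
                result[suffix].append(path.name)
    return result

print(progress_file("struct-example.json", "2019-10-10 12:30:30"))
print(failure_files("struct-example.json"))
```

If the current directory does not exist, failure_files returns two empty lists, which corresponds to an import without error records.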
3.4.3 logs directory file description
The log and error data during program execution will be written into the hugegraph-loader.log file.
3.4.4 Execute command
Run bin/hugegraph-loader and pass in parameters
bin/hugegraph-loader -g {GRAPH_NAME} -f ${INPUT_DESC_FILE} -s ${SCHEMA_FILE} -h {HOST} -p {PORT}
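When the loader is started from another program, the argument list can be assembled programmatically. The Python sketch below uses only the script name from the command above and options from the parameter table; loader_command is a hypothetical helper, the graph name hugegraph is a placeholder, and passing boolean options as true/false follows the wording "set --incremental-mode to true" rather than a verified CLI contract.

```python
import shlex

def loader_command(graph, mapping_file, schema_file,
                   host="localhost", port=8080, **options):
    """Build the hugegraph-loader argument list from keyword options."""
    args = ["bin/hugegraph-loader",
            "-g", graph, "-f", mapping_file, "-s", schema_file,
            "-h", host, "-p", str(port)]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")  # batch_size -> --batch-size
        if isinstance(value, bool):
            value = str(value).lower()        # True -> "true"
        args += [flag, str(value)]
    return args

args = loader_command("hugegraph", "example/file/struct.json",
                      "example/file/schema.groovy",
                      incremental_mode=True, batch_size=500)
print(shlex.join(args))
```

The resulting list can be handed to subprocess.run(args) from the loader's installation directory.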
4 Complete example
Given below is an example in the example directory of the hugegraph-loader package. (GitHub address)
4.1 Prepare data
Vertex file: example/file/vertex_person.csv
marko,29,Beijing
vadas,27,Hongkong
josh,32,Beijing
peter,35,Shanghai
"li,nary",26,"Wu,han"
tom,null,NULL
Vertex file: example/file/vertex_software.txt
id|name|lang|price|ISBN
1|lop|java|328|ISBN978-7-107-18618-5
2|ripple|java|199|ISBN978-7-100-13678-5
Edge file: example/file/edge_knows.json
{"source_name": "marko", "target_name": "vadas", "date": "20160110", "weight": 0.5}
{"source_name": "marko", "target_name": "josh", "date": "20130220", "weight": 1.0}
Edge file: example/file/edge_created.json
{"aname": "marko", "bname": "lop", "date": "20171210", "weight": 0.4}
{"aname": "josh", "bname": "lop", "date": "20091111", "weight": 0.4}
{"aname": "josh", "bname": "ripple", "date": "20171210", "weight": 1.0}
{"aname": "peter", "bname": "lop", "date": "20170324", "weight": 0.2}
4.2 Write schema
Schema file: example/file/schema.groovy
schema.propertyKey("name").asText().ifNotExist().create();
schema.propertyKey("age").asInt().ifNotExist().create();
schema.propertyKey("city").asText().ifNotExist().create();
schema.propertyKey("weight").asDouble().ifNotExist().create();
schema.propertyKey("lang").asText().ifNotExist().create();
schema.propertyKey("date").asText().ifNotExist().create();
schema.propertyKey("price").asDouble().ifNotExist().create();
schema.vertexLabel("person").properties("name", "age", "city").primaryKeys("name").ifNotExist().create();
schema.vertexLabel("software").properties("name", "lang", "price").primaryKeys("name").ifNotExist().create();
schema.indexLabel("personByAge").onV("person").by("age").range().ifNotExist().create();
schema.indexLabel("personByCity").onV("person").by("city").secondary().ifNotExist().create();
schema.indexLabel("personByAgeAndCity").onV("person").by("age", "city").secondary().ifNotExist().create();
schema.indexLabel("softwareByPrice").onV("software").by("price").range().ifNotExist().create();
schema.edgeLabel("knows").sourceLabel("person").targetLabel("person").properties("date", "weight").ifNotExist().create();
schema.edgeLabel("created").sourceLabel("person").targetLabel("software").properties("date", "weight").ifNotExist().create();
schema.indexLabel("createdByDate").onE("created").by("date").secondary().ifNotExist().create();
schema.indexLabel("createdByWeight").onE("created").by("weight").range().ifNotExist().create();
schema.indexLabel("knowsByWeight").onE("knows").by("weight").range().ifNotExist().create();
4.3 Write the input source mapping file example/file/struct.json
{
  "vertices": [
    {
      "label": "person",
      "input": {
        "type": "file",
        "path": "example/file/vertex_person.csv",
        "format": "CSV",
        "header": ["name", "age", "city"],
        "charset": "UTF-8",
        "skipped_line": {
          "regex": "(^#|^//).*"
        }
      },
      "null_values": ["NULL", "null", ""]
    },
    {
      "label": "software",
      "input": {
        "type": "file",
        "path": "example/file/vertex_software.txt",
        "format": "TEXT",
        "delimiter": "|",
        "charset": "GBK"
      },
      "id": "id",
      "ignored": ["ISBN"]
    }
  ],
  "edges": [
    {
      "label": "knows",
      "source": ["source_name"],
      "target": ["target_name"],
      "input": {
        "type": "file",
        "path": "example/file/edge_knows.json",
        "format": "JSON",
        "date_format": "yyyyMMdd"
      },
      "field_mapping": {
        "source_name": "name",
        "target_name": "name"
      }
    },
    {
      "label": "created",
      "source": ["source_name"],
      "target": ["target_id"],
      "input": {
        "type": "file",
        "path": "example/file/edge_created.json",
        "format": "JSON",
        "date_format": "yyyy-MM-dd"
      },
      "field_mapping": {
        "source_name": "name"
      }
    }
  ]
}
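The person mapping above depends on three things: CSV quoting keeps "li,nary" as a single field, header names the columns because the file has no header row, and null_values lists the cell values treated as missing. The following is a minimal Python sketch that mimics these rules on the sample rows. It is an illustration only, not the loader's implementation, and the function name parse_person_rows is hypothetical.

```python
import csv
import io

# Sample rows copied from example/file/vertex_person.csv.
PERSON_CSV = '''marko,29,Beijing
vadas,27,Hongkong
josh,32,Beijing
peter,35,Shanghai
"li,nary",26,"Wu,han"
tom,null,NULL
'''

HEADER = ["name", "age", "city"]      # "header" in struct.json
NULL_VALUES = {"NULL", "null", ""}    # "null_values" in struct.json


def parse_person_rows(text):
    """Yield one property dict per CSV row, dropping null-valued cells."""
    for row in csv.reader(io.StringIO(text)):
        record = dict(zip(HEADER, row))
        yield {k: v for k, v in record.items() if v not in NULL_VALUES}


rows = list(parse_person_rows(PERSON_CSV))
for r in rows:
    print(r)
```

In this sketch the row for tom keeps only its name, which agrees with the tom vertex in the query result shown in 4.5.1.2, where it carries no age or city.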
4.4 Command to import
sh bin/hugegraph-loader.sh -g hugegraph -f example/file/struct.json -s example/file/schema.groovy
After the import is complete, statistics similar to the following will appear:
vertices/edges has been loaded this time : 8/6
--------------------------------------------------
count metrics
     input read success            : 14
     input read failure            : 0
     vertex parse success          : 8
     vertex parse failure          : 0
     vertex insert success         : 8
     vertex insert failure         : 0
     edge parse success            : 6
     edge parse failure            : 0
     edge insert success           : 6
     edge insert failure           : 0
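When the loader runs from a script, it can help to check the summary automatically. The sketch below assumes the summary has been captured as text in the "name : number" layout shown above; the helper names parse_count_metrics and failed_metrics are hypothetical and not part of hugegraph-loader.

```python
import re

# Count metrics copied from the sample summary above.
SUMMARY = """\
count metrics
     input read success            : 14
     input read failure            : 0
     vertex parse success          : 8
     vertex parse failure          : 0
     vertex insert success         : 8
     vertex insert failure         : 0
     edge parse success            : 6
     edge parse failure            : 0
     edge insert success           : 6
     edge insert failure           : 0
"""

METRIC_LINE = re.compile(r"^\s*(.+?)\s*:\s*(\d+)\s*$")


def parse_count_metrics(text):
    """Return {metric name: integer value} for every 'name : number' line."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_LINE.match(line)
        if match:
            metrics[match.group(1)] = int(match.group(2))
    return metrics


def failed_metrics(metrics):
    """Return the metrics whose name ends in 'failure' and whose value is non-zero."""
    return {k: v for k, v in metrics.items() if k.endswith("failure") and v > 0}


metrics = parse_count_metrics(SUMMARY)
print(metrics["vertex insert success"], metrics["edge insert success"])
print(failed_metrics(metrics))
```

In the sample, the 14 successfully read input lines correspond to the 8 vertices plus the 6 edges.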
4.5 Use Docker to load data
4.5.1 Use docker exec to load data directly
4.5.1.1 Prepare data
If you just want to try out the loader, you can import the built-in example dataset without needing to prepare additional data yourself.
If using custom data, before importing data with the loader, we need to copy the data into the container.
First, following the steps in 4.1–4.3, we can prepare the data and then use docker cp to copy the prepared data into the loader container.
Suppose we’ve prepared the corresponding dataset following the above steps, stored in the hugegraph-dataset folder with the following file structure:
tree -f hugegraph-dataset/
hugegraph-dataset
├── hugegraph-dataset/edge_created.json
├── hugegraph-dataset/edge_knows.json
├── hugegraph-dataset/schema.groovy
├── hugegraph-dataset/struct.json
├── hugegraph-dataset/vertex_person.csv
└── hugegraph-dataset/vertex_software.txt
Copy the files into the container.
docker cp hugegraph-dataset loader:/loader/dataset
docker exec -it loader ls /loader/dataset
edge_created.json  edge_knows.json  schema.groovy  struct.json  vertex_person.csv  vertex_software.txt
4.5.1.2 Data loading
Taking the built-in example dataset as an example, we can use the following command to load the data.
To import a custom dataset, change the paths passed to -f (the input source mapping file) and -s (the schema file).
You can refer to 3.4.1-Parameter description for the rest of the parameters.
docker exec -it loader bin/hugegraph-loader.sh -g hugegraph -f example/file/struct.json -s example/file/schema.groovy -h server -p 8080
If loading a custom dataset, following the previous example, you would use:
docker exec -it loader bin/hugegraph-loader.sh -g hugegraph -f /loader/dataset/struct.json -s /loader/dataset/schema.groovy -h server -p 8080
If loader and server are in the same Docker network, you can specify -h {server_container_name}; otherwise, you need to specify the IP of the server host (in our example, server_container_name is server).
Then we can see the result:
HugeGraphLoader worked in NORMAL MODE
vertices/edges loaded this time : 8/6
--------------------------------------------------
count metrics
    input read success            : 14                  
    input read failure            : 0                   
    vertex parse success          : 8                   
    vertex parse failure          : 0                   
    vertex insert success         : 8                   
    vertex insert failure         : 0                   
    edge parse success            : 6                   
    edge parse failure            : 0                   
    edge insert success           : 6                   
    edge insert failure           : 0                   
--------------------------------------------------
meter metrics
    total time                    : 0.199s              
    read time                     : 0.046s              
    load time                     : 0.153s              
    vertex load time              : 0.077s              
    vertex load rate(vertices/s)  : 103                 
    edge load time                : 0.112s              
    edge load rate(edges/s)       : 53   
You can also use curl or hubble to observe the import result. Here’s an example using curl:
> curl "http://localhost:8080/graphs/hugegraph/graph/vertices" | gunzip
{"vertices":[{"id":1,"label":"software","type":"vertex","properties":{"name":"lop","lang":"java","price":328.0}},{"id":2,"label":"software","type":"vertex","properties":{"name":"ripple","lang":"java","price":199.0}},{"id":"1:tom","label":"person","type":"vertex","properties":{"name":"tom"}},{"id":"1:josh","label":"person","type":"vertex","properties":{"name":"josh","age":32,"city":"Beijing"}},{"id":"1:marko","label":"person","type":"vertex","properties":{"name":"marko","age":29,"city":"Beijing"}},{"id":"1:peter","label":"person","type":"vertex","properties":{"name":"peter","age":35,"city":"Shanghai"}},{"id":"1:vadas","label":"person","type":"vertex","properties":{"name":"vadas","age":27,"city":"Hongkong"}},{"id":"1:li,nary","label":"person","type":"vertex","properties":{"name":"li,nary","age":26,"city":"Wu,han"}}]}
If you want to check the import result of edges, you can use curl "http://localhost:8080/graphs/hugegraph/graph/edges" | gunzip.
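The JSON returned by the vertices endpoint can also be summarised programmatically. The sketch below assumes the response shape shown above (a top-level "vertices" array whose items have id, label and properties) and works on a shortened copy of that response rather than a live server. The helper names are hypothetical.

```python
import json
from collections import Counter

# Shortened copy of the response shown above (three of the eight vertices).
RESPONSE = '''{"vertices":[
 {"id":1,"label":"software","type":"vertex","properties":{"name":"lop","lang":"java","price":328.0}},
 {"id":"1:tom","label":"person","type":"vertex","properties":{"name":"tom"}},
 {"id":"1:josh","label":"person","type":"vertex","properties":{"name":"josh","age":32,"city":"Beijing"}}
]}'''


def count_by_label(body):
    """Count vertices per label in a /graph/vertices response body."""
    vertices = json.loads(body)["vertices"]
    return Counter(v["label"] for v in vertices)


def missing_properties(body, label, expected):
    """Map vertex id -> expected property names absent from that vertex."""
    result = {}
    for v in json.loads(body)["vertices"]:
        if v["label"] != label:
            continue
        absent = [p for p in expected if p not in v["properties"]]
        if absent:
            result[v["id"]] = absent
    return result


counts = count_by_label(RESPONSE)
missing = missing_properties(RESPONSE, "person", ["name", "age", "city"])
print(dict(counts))
print(missing)
```

Here the tom vertex is reported as lacking age and city, which is what the null values in vertex_person.csv would lead one to expect.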
4.5.2 Enter the docker container to load data
Besides using docker exec directly for data import, we can also enter the container for data loading. The basic process is similar to 4.5.1.
Enter the container by docker exec -it loader bash and execute the command:
sh bin/hugegraph-loader.sh -g hugegraph -f example/file/struct.json -s example/file/schema.groovy -h server -p 8080
The results of the execution will be similar to those shown in 4.5.1.
4.6 Import data by spark-loader
Spark version: Spark 3+; other versions have not been tested.
HugeGraph Toolchain version: toolchain-1.0.0
The parameters of spark-loader fall into two groups. Note: because the abbreviated names in the two groups overlap, please use the full name of each parameter.
The two groups can be given in any order.
- hugegraph parameters (Reference: hugegraph-loader parameter description)
- Spark task submission parameters (Reference: Submitting Applications)
Example:
sh bin/hugegraph-spark-loader.sh --master yarn \
--deploy-mode cluster --name spark-hugegraph-loader --file ./hugegraph.json \
--username admin --token admin --host xx.xx.xx.xx --port 8093 \
--graph graph-test --num-executors 6 --executor-cores 16 --executor-memory 15g
2.3 - HugeGraph-Tools Quick Start
1 HugeGraph-Tools Overview
HugeGraph-Tools is an automated deployment, management and backup/restore component of HugeGraph.
Testing Guide: For running HugeGraph-Tools tests locally, please refer to HugeGraph Toolchain Local Testing Guide
2 Get HugeGraph-Tools
There are two ways to get HugeGraph-Tools:
- Download the compiled tarball
- Clone source code then compile and install
2.1 Download the compiled archive
Download the latest version of the HugeGraph-Toolchain package:
wget https://downloads.apache.org/incubator/hugegraph/1.0.0/apache-hugegraph-toolchain-incubating-1.0.0.tar.gz
tar zxf *hugegraph*.tar.gz
2.2 Clone source code to compile and install
Please ensure that the wget command is installed before compiling the source code
Download the latest version of the HugeGraph-Tools source package:
# 1. get from github
git clone https://github.com/apache/hugegraph-toolchain.git
# 2. or download the source package directly (1.0.0 is shown here; please choose the latest version)
wget https://downloads.apache.org/incubator/hugegraph/1.0.0/apache-hugegraph-toolchain-incubating-1.0.0-src.tar.gz
Compile and generate tar package:
cd hugegraph-tools
mvn package -DskipTests
This generates the tar package hugegraph-tools-${version}.tar.gz
3 How to use
3.1 Function overview
After decompression, enter the hugegraph-tools directory and run bin/hugegraph or bin/hugegraph help to view the usage information. The commands fall into the following categories:
- Graph management: graph-mode-set, graph-mode-get, graph-list, graph-get and graph-clear
- Asynchronous task management: task-list, task-get, task-delete, task-cancel and task-clear
- Gremlin: gremlin-execute and gremlin-schedule
- Backup/Restore: backup, restore, migrate, schedule-backup and dump
- Installation and deployment: deploy, clear, start-all and stop-all
Usage: hugegraph [options] [command] [command options]
3.2 [options] - Global Variables
[options] are the global variables of HugeGraph-Tools. They can be configured in hugegraph-tools/bin/hugegraph and include:
- --graph: the name of the graph that HugeGraph-Tools operates on; the default is hugegraph
- --url: the service address of HugeGraph-Server; the default is http://127.0.0.1:8080
- --user: the username, passed when authentication is enabled on HugeGraph-Server
- --password: the user’s password, passed when authentication is enabled on HugeGraph-Server
- --timeout: the timeout when connecting to HugeGraph-Server; the default is 30s
- --trust-store-file: the path of the truststore (certificate) file used by HugeGraph-Client when --url uses https; the default is empty, which means the built-in truststore file conf/hugegraph.truststore of hugegraph-tools is used
- --trust-store-password: the password of the truststore used by HugeGraph-Client when --url uses https; the default is empty, which means the password of the built-in truststore file of hugegraph-tools is used
The above global variables can also be set through environment variables. One way is to use export on the command line to set temporary environment variables, which remain valid until the command-line session is closed:
| Global Variable | Environment Variable | Example | 
|---|---|---|
| --url | HUGEGRAPH_URL | export HUGEGRAPH_URL=http://127.0.0.1:8080 |
| --graph | HUGEGRAPH_GRAPH | export HUGEGRAPH_GRAPH=hugegraph |
| --user | HUGEGRAPH_USERNAME | export HUGEGRAPH_USERNAME=admin |
| --password | HUGEGRAPH_PASSWORD | export HUGEGRAPH_PASSWORD=test |
| --timeout | HUGEGRAPH_TIMEOUT | export HUGEGRAPH_TIMEOUT=30 |
| --trust-store-file | HUGEGRAPH_TRUST_STORE_FILE | export HUGEGRAPH_TRUST_STORE_FILE=/tmp/trust-store |
| --trust-store-password | HUGEGRAPH_TRUST_STORE_PASSWORD | export HUGEGRAPH_TRUST_STORE_PASSWORD=xxxx |
Another way is to set the environment variable in the bin/hugegraph script:
#!/bin/bash
# Set environment here if needed
#export HUGEGRAPH_URL=
#export HUGEGRAPH_GRAPH=
#export HUGEGRAPH_USERNAME=
#export HUGEGRAPH_PASSWORD=
#export HUGEGRAPH_TIMEOUT=
#export HUGEGRAPH_TRUST_STORE_FILE=
#export HUGEGRAPH_TRUST_STORE_PASSWORD=
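When HugeGraph-Tools is driven from another program, the same environment variables can be prepared in code. The following Python sketch turns a dict of global options into an environment mapping using the option-to-variable table above. The function name tools_env is hypothetical, and how the tool prioritises environment variables against command-line options is not covered here.

```python
import os

# Option-to-variable mapping taken from the table above.
ENV_NAMES = {
    "url": "HUGEGRAPH_URL",
    "graph": "HUGEGRAPH_GRAPH",
    "user": "HUGEGRAPH_USERNAME",
    "password": "HUGEGRAPH_PASSWORD",
    "timeout": "HUGEGRAPH_TIMEOUT",
    "trust-store-file": "HUGEGRAPH_TRUST_STORE_FILE",
    "trust-store-password": "HUGEGRAPH_TRUST_STORE_PASSWORD",
}


def tools_env(options, base=None):
    """Return a copy of `base` (default: os.environ) with HUGEGRAPH_* variables set."""
    env = dict(os.environ if base is None else base)
    for name, value in options.items():
        if name not in ENV_NAMES:
            raise KeyError(f"unknown global option: {name}")
        env[ENV_NAMES[name]] = str(value)
    return env


env = tools_env(
    {"url": "http://127.0.0.1:8080", "graph": "hugegraph", "timeout": 30},
    base={},
)
print(env)
```

The resulting mapping could then be passed as the env argument of subprocess.run when invoking bin/hugegraph, so the variables apply to that call only.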
3.3 Graph management: graph-mode-set, graph-mode-get, graph-list, graph-get and graph-clear
- graph-mode-set: set the graph restore mode
  - --graph-mode or -m: required; specifies the mode to set; legal values are [NONE, RESTORING, MERGING, LOADING]
- graph-mode-get: get the graph restore mode
- graph-list: list all graphs in a HugeGraph-Server
- graph-get: get a graph and its storage backend type
- graph-clear: clear all schema and data of a graph
  - --confirm-message or -c: required; the deletion confirmation message, which must be typed manually as a second confirmation to prevent accidental deletion: “I’m sure to delete all data”, including the double quotes
When you need to restore a backup into a new graph, first set the graph mode to RESTORING; when you need to merge a backup into an existing graph, first set the graph mode to MERGING.
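The order of operations for a restore can be sketched as a short command plan. The Python sketch below only builds the argument lists and does not run them. The backup directory ./backup is a hypothetical example, and switching the graph back to NONE after the restore is an assumption that this section does not state.

```python
def hugegraph_cmd(url, graph, command, *command_options):
    """Build argv in the documented order: [options] [command] [command options]."""
    return ["./bin/hugegraph", "--url", url, "--graph", graph, command, *command_options]


def restore_plan(url, graph, directory, merge=False):
    """Return the commands for restoring a backup into a new or an existing graph."""
    mode = "MERGING" if merge else "RESTORING"
    return [
        # 1. Put the graph into the mode that matches the kind of restore.
        hugegraph_cmd(url, graph, "graph-mode-set", "-m", mode),
        # 2. Restore all six data types from the backup directory.
        hugegraph_cmd(url, graph, "restore", "-t", "all", "-d", directory),
        # 3. Assumption: return to normal mode once the restore is done.
        hugegraph_cmd(url, graph, "graph-mode-set", "-m", "NONE"),
    ]


plan = restore_plan("http://127.0.0.1:8080", "hugegraph", "./backup")
for argv in plan:
    print(" ".join(argv))
```

Each list can be handed to subprocess.run(argv, check=True), so that a failing step stops the sequence before the next one runs.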
3.4 Asynchronous task management: task-list, task-get, task-delete, task-cancel and task-clear
- task-list: list the asynchronous tasks in a graph, optionally filtered by task status
  - --status: optional; the task status to filter by
  - --limit: optional; the number of tasks to return; the default is -1, which returns all matching tasks
- task-get: get detailed information about an asynchronous task
  - --task-id: required; the ID of the asynchronous task
- task-delete: delete the information about an asynchronous task
  - --task-id: required; the ID of the asynchronous task
- task-cancel: cancel the execution of an asynchronous task
  - --task-id: required; the ID of the asynchronous task to cancel
- task-clear: clean up completed asynchronous tasks
  - --force: optional; when set, all asynchronous tasks are cleaned up: unfinished tasks are cancelled first and then all tasks are cleared. By default, only completed asynchronous tasks are cleaned up
3.5 Gremlin: gremlin-execute and gremlin-schedule
- gremlin-execute: send Gremlin statements to HugeGraph-Server to perform queries or modifications; executes synchronously and returns the result on completion
  - --file or -f: the script file to execute, UTF-8 encoded; mutually exclusive with --script
  - --script or -s: the script string to execute; mutually exclusive with --file
  - --aliases or -a: Gremlin alias settings, in the format key1=value1,key2=value2,…
  - --bindings or -b: Gremlin binding settings, in the format key1=value1,key2=value2,…
  - --language or -l: the language of the Gremlin script; the default is gremlin-groovy
  - --file and --script are mutually exclusive, and one of them must be set
- gremlin-schedule: send Gremlin statements to HugeGraph-Server to perform queries or modifications; executes asynchronously and returns the asynchronous task ID immediately after the task is submitted
  - --file or -f: the script file to execute, UTF-8 encoded; mutually exclusive with --script
  - --script or -s: the script string to execute; mutually exclusive with --file
  - --bindings or -b: Gremlin binding settings, in the format key1=value1,key2=value2,…
  - --language or -l: the language of the Gremlin script; the default is gremlin-groovy
  - --file and --script are mutually exclusive, and one of them must be set
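The key1=value1,key2=value2 format used by the aliases and bindings options, and the rule that exactly one of the file and script options is set, can be sketched in Python as follows. This is an illustration of the documented format only; it is not how hugegraph-tools parses its arguments. The helper names and the sample binding keys are hypothetical, and values that themselves contain commas are not handled.

```python
def parse_pairs(text):
    """Parse 'key1=value1,key2=value2' into a dict of strings."""
    pairs = {}
    if not text:
        return pairs
    for item in text.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def format_pairs(pairs):
    """Inverse of parse_pairs: build the string passed to --bindings or --aliases."""
    return ",".join(f"{k}={v}" for k, v in pairs.items())


def script_source(file=None, script=None):
    """Enforce the rule that exactly one of --file and --script is set."""
    if (file is None) == (script is None):
        raise ValueError("set exactly one of --file and --script")
    return ["--file", file] if file is not None else ["--script", script]


bindings = parse_pairs("name=marko,age=29")
print(bindings)
print(format_pairs(bindings))
print(script_source(script="g.V().count()"))
```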
3.6 Backup/Restore
- backup: back up the schema or data of a graph out of the HugeGraph system and store it as JSON on the local disk or HDFS
  - --format: the backup format; optional values are [json, text]; the default is json
  - --all-properties: whether to back up all properties of vertices/edges; only valid when --format is text; the default is false
  - --label: the label of the vertices/edges to back up; only valid when --format is text and when backing up vertices or edges
  - --properties: the properties of the vertices/edges to back up, separated by commas; only valid when --format is text and when backing up vertices or edges
  - --compress: whether to compress data during backup; the default is true
  - --directory or -d: the directory in which to store the schema or data; the default is ‘./{graphName}’ for a local directory and ‘{fs.default.name}/{graphName}’ for HDFS
  - --huge-types or -t: the data types to back up, separated by commas; the value is ‘all’ or a combination of one or more of [vertex, edge, vertex_label, edge_label, property_key, index_label]; ‘all’ represents all 6 types, namely vertices, edges and all schemas
  - --log or -l: the log directory; the default is ./logs
  - --retry: the number of retries on failure; the default is 3
  - --split-size or -s: the split size for vertices or edges when backing up; the default is 1048576
  - -D: dynamic parameters in the form -Dkey=value, used to specify HDFS configuration items when backing up data to HDFS, for example: -Dfs.default.name=hdfs://localhost:9000
- restore: restore schema or data stored in JSON format to a new graph (RESTORING mode) or merge it into an existing graph (MERGING mode)
  - --directory or -d: the directory in which the schema or data is stored; the default is ‘./{graphName}’ for a local directory and ‘{fs.default.name}/{graphName}’ for HDFS
  - --clean: whether to delete the directory specified by --directory after the graph has been restored; the default is false
  - --huge-types or -t: the data types to restore, separated by commas; the value is ‘all’ or a combination of one or more of [vertex, edge, vertex_label, edge_label, property_key, index_label]; ‘all’ represents all 6 types, namely vertices, edges and all schemas
  - --log or -l: the log directory; the default is ./logs
  - --retry: the number of retries on failure; the default is 3
  - -D: dynamic parameters in the form -Dkey=value, used to specify HDFS configuration items when restoring graphs from HDFS, for example: -Dfs.default.name=hdfs://localhost:9000
  - The restore command can only be used on backups that were created with --format set to json
- migrate: migrate the currently connected graph to another HugeGraphServer
  - --target-graph: the name of the target graph; the default is hugegraph
  - --target-url: the HugeGraphServer where the target graph is located; the default is http://127.0.0.1:8081
  - --target-user: the username for accessing the target graph
  - --target-password: the password for accessing the target graph
  - --target-timeout: the timeout for accessing the target graph
  - --target-trust-store-file: the truststore file used to access the target graph
  - --target-trust-store-password: the password of the truststore used to access the target graph
  - --directory or -d: the directory in which the schema or data of the source graph is stored during migration; the default is ‘./{graphName}’ for a local directory and ‘{fs.default.name}/{graphName}’ for HDFS
  - --huge-types or -t: the data types to migrate, separated by commas; the value is ‘all’ or a combination of one or more of [vertex, edge, vertex_label, edge_label, property_key, index_label]; ‘all’ represents all 6 types, namely vertices, edges and all schemas
  - --log or -l: the log directory; the default is ./logs
  - --retry: the number of retries on failure; the default is 3
  - --split-size or -s: the split size for vertices or edges when backing up the source graph during migration; the default is 1048576
  - -D: dynamic parameters in the form -Dkey=value, used to specify HDFS configuration items when the data is backed up to HDFS during migration, for example: -Dfs.default.name=hdfs://localhost:9000
  - --graph-mode or -m: the mode to set on the target graph when restoring the source graph into it; legal values are [RESTORING, MERGING]
  - --keep-local-data: whether to keep the local backup of the source graph that is generated during migration; the default is false, i.e. the backup is not kept after the migration ends
- schedule-backup: periodically back up the graph and keep a certain number of the latest backups (currently only local file systems are supported)
  - --directory or -d: required; the directory of the backup data
  - --backup-num: optional; the number of latest backups to keep; the default is 3
  - --interval: optional; the backup cycle, in the same format as Linux crontab
- dump: export all the vertices and edges of the entire graph; by default they are stored in JSON format as vertex vertex-edge1 vertex-edge2... Users can also customize the storage format: implement a class that inherits from Formatter in the hugegraph-tools/src/main/java/com/baidu/hugegraph/formatter directory, such as CustomFormatter, and specify this class as the formatter when running the command, for example bin/hugegraph dump -f CustomFormatter
  - --formatter or -f: the formatter to use; the default is JsonFormatter
  - --directory or -d: the directory where the schema or data is stored; the default is the current directory
  - --log or -l: the log directory; the default is ./logs
  - --retry: the number of retries on failure; the default is 3
  - --split-size or -s: the split size for vertices or edges when backing up; the default is 1048576
  - -D: dynamic parameters in the form -Dkey=value, used to specify HDFS configuration items when backing up data to HDFS, for example: -Dfs.default.name=hdfs://localhost:9000
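The interval option of schedule-backup takes a crontab-style expression with five fields. The following Python sketch checks such an expression against the field ranges given in the command help in 3.8 (minute 0-59, hour 0-23, day of month 1-31, month 1-12, day of week 0-6). It deliberately accepts only "*" and plain numbers; crontab lists, ranges and steps are outside this sketch, and the function name check_interval is hypothetical.

```python
# Field names and ranges as listed in the schedule-backup help text.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 6),
]


def check_interval(expr):
    """Validate an 'a b c d e' interval and return {field name: value or '*'}."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    parsed = {}
    for (name, low, high), part in zip(FIELDS, parts):
        if part == "*":
            parsed[name] = "*"
            continue
        if not part.isdigit() or not low <= int(part) <= high:
            raise ValueError(f"{name} must be '*' or {low}-{high}, got {part!r}")
        parsed[name] = int(part)
    return parsed


# "0 0 * * *" is the default listed in the help: minute 0 of hour 0, every day.
daily = check_interval("0 0 * * *")
print(daily)
```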
 
3.7 Installation and deployment
- deploy: download, install and start HugeGraph-Server and HugeGraph-Studio with one click
  - -v: required; the version number of HugeGraph-Server and HugeGraph-Studio to install; the latest is 0.9
  - -p: required; the directory in which HugeGraph-Server and HugeGraph-Studio are installed
  - -u: optional; the link from which to download the HugeGraph-Server and HugeGraph-Studio compressed packages
- clear: clean up the HugeGraph-Server and HugeGraph-Studio directories and tarballs
  - -p: required; the directory of HugeGraph-Server and HugeGraph-Studio to clean up
- start-all: start HugeGraph-Server and HugeGraph-Studio with one click and start monitoring, which automatically restarts a service when it dies
  - -v: required; the version number of HugeGraph-Server and HugeGraph-Studio to start; the latest is 0.9
  - -p: required; the directory where HugeGraph-Server and HugeGraph-Studio are installed
- stop-all: stop HugeGraph-Server and HugeGraph-Studio with one click
The deploy command has an optional parameter -u. When it is provided, the tar package is downloaded from the specified address instead of the default download address, and the address is written to the ~/hugegraph-download-url-prefix file. If -u is omitted later and ~/hugegraph-download-url-prefix exists, the tar package is downloaded from the address stored in that file; if neither -u nor ~/hugegraph-download-url-prefix is present, it is downloaded from the default download address.
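The lookup order for the download address can be written as a small function. The Python sketch below reflects one reading of the rule above (an explicit -u value first, then the saved prefix file, then the default) and is not the deploy script itself. DEFAULT_PREFIX is a hypothetical placeholder because the real default address is not given in this guide, and the demo uses a temporary directory instead of the home directory.

```python
import tempfile
from pathlib import Path

# Hypothetical placeholder; the real default address is not stated in this guide.
DEFAULT_PREFIX = "https://example.invalid/hugegraph"


def resolve_download_prefix(u_option, prefix_file):
    """Pick the download prefix: -u first, then the saved file, then the default."""
    prefix_file = Path(prefix_file)
    if u_option:
        # Remember the explicit address for later deploy runs.
        prefix_file.write_text(u_option + "\n")
        return u_option
    if prefix_file.exists():
        return prefix_file.read_text().strip()
    return DEFAULT_PREFIX


with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "hugegraph-download-url-prefix"
    first = resolve_download_prefix(None, saved)                          # nothing saved yet
    second = resolve_download_prefix("https://mirror.example/hg", saved)  # -u given
    third = resolve_download_prefix(None, saved)                          # reuses saved value

print(first, second, third)
```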
3.8 Specific command parameters
The specific parameters of each subcommand are as follows:
Usage: hugegraph [options] [command] [command options]
  Options:
    --graph
      Name of graph
      Default: hugegraph
    --password
      Password of user
    --timeout
      Connection timeout
      Default: 30
    --trust-store-file
      The path of client truststore file used when https protocol is enabled
    --trust-store-password
      The password of the client truststore file used when the https protocol 
      is enabled
    --url
      The URL of HugeGraph-Server
      Default: http://127.0.0.1:8080
    --user
      Name of user
  Commands:
    graph-list      List all graphs
      Usage: graph-list
    graph-get      Get graph info
      Usage: graph-get
    graph-clear      Clear graph schema and data
      Usage: graph-clear [options]
        Options:
        * --confirm-message, -c
            Confirm message of graph clear is "I'm sure to delete all data". 
            (Note: include "")
    graph-mode-set      Set graph mode
      Usage: graph-mode-set [options]
        Options:
        * --graph-mode, -m
            Graph mode, include: [NONE, RESTORING, MERGING]
            Possible Values: [NONE, RESTORING, MERGING, LOADING]
    graph-mode-get      Get graph mode
      Usage: graph-mode-get
    task-list      List tasks
      Usage: task-list [options]
        Options:
          --limit
            Limit number, no limit if not provided
            Default: -1
          --status
            Status of task
    task-get      Get task info
      Usage: task-get [options]
        Options:
        * --task-id
            Task id
            Default: 0
    task-delete      Delete task
      Usage: task-delete [options]
        Options:
        * --task-id
            Task id
            Default: 0
    task-cancel      Cancel task
      Usage: task-cancel [options]
        Options:
        * --task-id
            Task id
            Default: 0
    task-clear      Clear completed tasks
      Usage: task-clear [options]
        Options:
          --force
            Force to clear all tasks, cancel all uncompleted tasks firstly, 
            and delete all completed tasks
            Default: false
    gremlin-execute      Execute Gremlin statements
      Usage: gremlin-execute [options]
        Options:
          --aliases, -a
            Gremlin aliases, valid format is: 'key1=value1,key2=value2...'
            Default: {}
          --bindings, -b
            Gremlin bindings, valid format is: 'key1=value1,key2=value2...'
            Default: {}
          --file, -f
            Gremlin Script file to be executed, UTF-8 encoded, exclusive to 
            --script 
          --language, -l
            Gremlin script language
            Default: gremlin-groovy
          --script, -s
            Gremlin script to be executed, exclusive to --file
    gremlin-schedule      Execute Gremlin statements as asynchronous job
      Usage: gremlin-schedule [options]
        Options:
          --bindings, -b
            Gremlin bindings, valid format is: 'key1=value1,key2=value2...'
            Default: {}
          --file, -f
            Gremlin Script file to be executed, UTF-8 encoded, exclusive to 
            --script 
          --language, -l
            Gremlin script language
            Default: gremlin-groovy
          --script, -s
            Gremlin script to be executed, exclusive to --file
    backup      Backup graph schema/data. If directory is on HDFS, use -D to 
            set HDFS params. For exmaple:
            -Dfs.default.name=hdfs://localhost:9000 
      Usage: backup [options]
        Options:
          --all-properties
            All properties to be backup flag
            Default: false
          --compress
            compress flag
            Default: true
          --directory, -d
            Directory of graph schema/data, default is './{graphname}' in 
            local file system or '{fs.default.name}/{graphname}' in HDFS
          --format
            File format, valid is [json, text]
            Default: json
          --huge-types, -t
            Type of schema/data. Concat with ',' if more than one. 'all' means 
            all vertices, edges and schema, in other words, 'all' equals with 
            'vertex,edge,vertex_label,edge_label,property_key,index_label' 
            Default: [PROPERTY_KEY, VERTEX_LABEL, EDGE_LABEL, INDEX_LABEL, VERTEX, EDGE]
          --label
            Vertex or edge label, only valid when type is vertex or edge
          --log, -l
            Directory of log
            Default: ./logs
          --properties
            Vertex or edge properties to backup, only valid when type is
            vertex or edge
            Default: []
          --retry
            Retry times, default is 3
            Default: 3
          --split-size, -s
            Split size of shard
            Default: 1048576
          -D
            HDFS config parameters
            Syntax: -Dkey=value
            Default: {}
    schedule-backup      Schedule backup task
      Usage: schedule-backup [options]
        Options:
          --backup-num
            The number of latest backups to keep
            Default: 3
        * --directory, -d
            The directory of backups stored
          --interval
            The interval of backup, format is: "a b c d e". 'a' means minute 
            (0 - 59), 'b' means hour (0 - 23), 'c' means day of month (1 - 
            31), 'd' means month (1 - 12), 'e' means day of week (0 - 6) 
            (Sunday=0), "*" means all
            Default: "0 0 * * *"
    dump      Dump graph to files
      Usage: dump [options]
        Options:
          --directory, -d
            Directory of graph schema/data, default is './{graphname}' in 
            local file system or '{fs.default.name}/{graphname}' in HDFS
          --formatter, -f
            Formatter to customize format of vertex/edge
            Default: JsonFormatter
          --log, -l
            Directory of log
            Default: ./logs
          --retry
            Retry times, default is 3
            Default: 3
          --split-size, -s
            Split size of shard
            Default: 1048576
          -D
            HDFS config parameters
            Syntax: -Dkey=value
            Default: {}
    restore      Restore graph schema/data. If directory is on HDFS, use -D to 
            set HDFS params if needed. For 
            exmaple:-Dfs.default.name=hdfs://localhost:9000 
      Usage: restore [options]
        Options:
          --clean
            Whether to remove the directory of graph data after restored
            Default: false
          --directory, -d
            Directory of graph schema/data, default is './{graphname}' in 
            local file system or '{fs.default.name}/{graphname}' in HDFS
          --huge-types, -t
            Type of schema/data. Concat with ',' if more than one. 'all' means 
            all vertices, edges and schema, in other words, 'all' equals with 
            'vertex,edge,vertex_label,edge_label,property_key,index_label' 
            Default: [PROPERTY_KEY, VERTEX_LABEL, EDGE_LABEL, INDEX_LABEL, VERTEX, EDGE]
          --log, -l
            Directory of log
            Default: ./logs
          --retry
            Retry times, default is 3
            Default: 3
          -D
            HDFS config parameters
            Syntax: -Dkey=value
            Default: {}
    migrate      Migrate graph
      Usage: migrate [options]
        Options:
          --directory, -d
            Directory of graph schema/data, default is './{graphname}' in 
            local file system or '{fs.default.name}/{graphname}' in HDFS
          --graph-mode, -m
            Mode used when migrating to target graph, include: [RESTORING, 
            MERGING] 
            Default: RESTORING
            Possible Values: [NONE, RESTORING, MERGING, LOADING]
          --huge-types, -t
            Type of schema/data. Concat with ',' if more than one. 'all' means 
            all vertices, edges and schema, in other words, 'all' equals with 
            'vertex,edge,vertex_label,edge_label,property_key,index_label' 
            Default: [PROPERTY_KEY, VERTEX_LABEL, EDGE_LABEL, INDEX_LABEL, VERTEX, EDGE]
          --keep-local-data
            Whether to keep the local directory of graph data after restored
            Default: false
          --log, -l
            Directory of log
            Default: ./logs
          --retry
            Retry times, default is 3
            Default: 3
          --split-size, -s
            Split size of shard
            Default: 1048576
          --target-graph
            The name of target graph to migrate
            Default: hugegraph
          --target-password
            The password of target graph to migrate
          --target-timeout
            The timeout to connect target graph to migrate
            Default: 0
          --target-trust-store-file
            The trust store file of target graph to migrate
          --target-trust-store-password
            The trust store password of target graph to migrate
          --target-url
            The url of target graph to migrate
            Default: http://127.0.0.1:8081
          --target-user
            The username of target graph to migrate
          -D
            HDFS config parameters
            Syntax: -Dkey=value
            Default: {}
    deploy      Install HugeGraph-Server and HugeGraph-Studio
      Usage: deploy [options]
        Options:
        * -p
            Install path of HugeGraph-Server and HugeGraph-Studio
          -u
            Download url prefix path of HugeGraph-Server and HugeGraph-Studio
        * -v
            Version of HugeGraph-Server and HugeGraph-Studio
    start-all      Start HugeGraph-Server and HugeGraph-Studio
      Usage: start-all [options]
        Options:
        * -p
            Install path of HugeGraph-Server and HugeGraph-Studio
        * -v
            Version of HugeGraph-Server and HugeGraph-Studio
    clear      Clear HugeGraph-Server and HugeGraph-Studio
      Usage: clear [options]
        Options:
        * -p
            Install path of HugeGraph-Server and HugeGraph-Studio
    stop-all      Stop HugeGraph-Server and HugeGraph-Studio
      Usage: stop-all
    help      Print usage
      Usage: help
3.9 Specific command examples
1. gremlin statement
# Execute gremlin synchronously
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph gremlin-execute --script 'g.V().count()'
# Execute gremlin asynchronously
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph gremlin-schedule --script 'g.V().count()'
2. Show task status
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph task-list
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph task-list --limit 5
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph task-list --status success
3. Set and show graph mode
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-mode-set -m RESTORING MERGING NONE
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-mode-set -m RESTORING
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-mode-get
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-list
4. Cleanup Graph
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-clear -c "I'm sure to delete all data"
5. Backup Graph
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph backup -t all --directory ./backup-test
6. Periodic Backup Graph
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph --interval */2 * * * * schedule-backup -d ./backup-0.10.2
7. Restore Graph
# set graph mode to RESTORING
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-mode-set -m RESTORING
# restore the graph from the backup directory
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph restore -t all --directory ./backup-test
# reset graph mode to NONE
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-mode-set -m NONE
8. Graph Migration
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph migrate --target-url http://127.0.0.1:8090 --target-graph hugegraph
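The sequence in item 7 has to run in a fixed order (switch to RESTORING, restore, switch back to NONE), so it may be convenient to script it. Below is a minimal Python sketch that only builds and runs the command lines shown above; the helper names `hugegraph_cmd`, `restore_commands` and `run_restore` are hypothetical and not part of hugegraph-tools.

```python
import subprocess

BIN = "./bin/hugegraph"  # path used in the examples above


def hugegraph_cmd(url, graph, subcommand, *args):
    """Build one command line as an argv list (hypothetical helper)."""
    return [BIN, "--url", url, "--graph", graph, subcommand, *args]


def restore_commands(url, graph, directory):
    """Return the three commands of the sequence in item 7, in order."""
    return [
        hugegraph_cmd(url, graph, "graph-mode-set", "-m", "RESTORING"),
        hugegraph_cmd(url, graph, "restore", "-t", "all", "--directory", directory),
        hugegraph_cmd(url, graph, "graph-mode-set", "-m", "NONE"),
    ]


def run_restore(url, graph, directory):
    """Run the sequence; check=True stops at the first failing command."""
    for cmd in restore_commands(url, graph, directory):
        subprocess.run(cmd, check=True)
```

Note that if the restore step fails, this sketch leaves the graph in RESTORING mode, so the mode would still need to be reset by hand.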
3 - HugeGraph-AI
hugegraph-ai integrates HugeGraph with artificial intelligence capabilities, providing comprehensive support for developers to build AI-powered graph applications.
✨ Key Features
- GraphRAG: Build intelligent question-answering systems with graph-enhanced retrieval
- Knowledge Graph Construction: Automated graph building from text using LLMs
- Graph ML: Integration with 20+ graph learning algorithms (GCN, GAT, GraphSAGE, etc.)
- Python Client: Easy-to-use Python interface for HugeGraph operations
- AI Agents: Intelligent graph analysis and reasoning capabilities
🚀 Quick Start
Note: For a complete deployment guide and detailed examples, please refer to hugegraph-llm/README.md
Prerequisites
- Python 3.9+ (3.10+ recommended for hugegraph-llm)
- uv (recommended package manager)
- HugeGraph Server 1.3+ (1.5+ recommended)
- Docker (optional, for containerized deployment)
Option 1: Docker Deployment (Recommended)
# Clone the repository
git clone https://github.com/apache/incubator-hugegraph-ai.git
cd incubator-hugegraph-ai
# Set up environment and start services
cp docker/env.template docker/.env
# Edit docker/.env to set your PROJECT_PATH
cd docker
docker-compose -f docker-compose-network.yml up -d
# Access services:
# - HugeGraph Server: http://localhost:8080
# - RAG Service: http://localhost:8001
Option 2: Source Installation
# 1. Start HugeGraph Server
docker run -itd --name=server -p 8080:8080 hugegraph/hugegraph
# 2. Clone and set up the project
git clone https://github.com/apache/incubator-hugegraph-ai.git
cd incubator-hugegraph-ai/hugegraph-llm
# 3. Install dependencies
uv venv && source .venv/bin/activate
uv pip install -e .
# 4. Start the demo
python -m hugegraph_llm.demo.rag_demo.app
# Visit http://127.0.0.1:8001
Basic Usage Examples
GraphRAG - Question Answering
from hugegraph_llm.operators.graph_rag_task import RAGPipeline
# Initialize RAG pipeline
graph_rag = RAGPipeline()
# Ask questions about your graph
result = (graph_rag
    .extract_keywords(text="Tell me about Al Pacino.")
    .keywords_to_vid()
    .query_graphdb(max_deep=2, max_graph_items=30)
    .synthesize_answer()
    .run())
Knowledge Graph Construction
from hugegraph_llm.models.llms.init_llm import LLMs
from hugegraph_llm.operators.kg_construction_task import KgBuilder
# Build KG from text
TEXT = "Your text content here..."
builder = KgBuilder(LLMs().get_chat_llm())
(builder
    .import_schema(from_hugegraph="hugegraph")
    .chunk_split(TEXT)
    .extract_info(extract_type="property_graph")
    .commit_to_hugegraph()
    .run())
Graph Machine Learning
from pyhugegraph.client import PyHugeClient
# Connect to HugeGraph and run ML algorithms
# See hugegraph-ml documentation for detailed examples
📦 Modules
hugegraph-llm 
Large language model integration for graph applications:
- GraphRAG: Retrieval-augmented generation with graph data
- Knowledge Graph Construction: Build KGs from text automatically
- Natural Language Interface: Query graphs using natural language
- AI Agents: Intelligent graph analysis and reasoning
hugegraph-ml
Graph machine learning with 20+ implemented algorithms:
- Node Classification: GCN, GAT, GraphSAGE, APPNP, etc.
- Graph Classification: DiffPool, P-GNN, etc.
- Graph Embedding: DeepWalk, Node2Vec, GRACE, etc.
- Link Prediction: SEAL, GATNE, etc.
hugegraph-python-client
Python client for HugeGraph operations:
- Schema Management: Define vertex/edge labels and properties
- CRUD Operations: Create, read, update, delete graph data
- Gremlin Queries: Execute graph traversal queries
- REST API: Complete HugeGraph REST API coverage
🔗 Related Projects
- hugegraph - Core graph database
- hugegraph-toolchain - Development tools (Loader, Dashboard, etc.)
- hugegraph-computer - Graph computing system
🤝 Contributing
We welcome contributions! Please see our contribution guidelines for details.
Development Setup:
- Use GitHub Desktop for easier PR management
- Run ./style/code_format_and_analysis.sh before submitting PRs
- Check existing issues before reporting bugs
📄 License
hugegraph-ai is licensed under Apache 2.0 License.
📞 Contact Us
- GitHub Issues: Report bugs or request features (fastest response)
- Email: dev@hugegraph.apache.org (subscription required)
- WeChat: Follow “Apache HugeGraph” official account

3.1 - HugeGraph-LLM
Please refer to the AI repository README for the most up-to-date documentation; the official website is updated and synchronized with it regularly.
Bridge the gap between Graph Databases and Large Language Models
🎯 Overview
HugeGraph-LLM is a comprehensive toolkit that combines the power of graph databases with large language models. It enables seamless integration between HugeGraph and LLMs for building intelligent applications.
Key Features
- 🏗️ Knowledge Graph Construction - Build KGs automatically using LLMs + HugeGraph
- 🗣️ Natural Language Querying - Operate graph databases using natural language (Gremlin/Cypher)
- 🔍 Graph-Enhanced RAG - Leverage knowledge graphs to improve answer accuracy (GraphRAG & Graph Agent)
For detailed documentation of the source code, visit our DeepWiki page (recommended).
📋 Prerequisites
Important:
- Python: 3.10+ (not tested on 3.12)
- HugeGraph Server: 1.3+ (recommended: 1.5+)
- UV Package Manager: 0.7+
🚀 Quick Start
Choose your preferred deployment method:
Option 1: Docker Compose (Recommended)
The fastest way to get started with both HugeGraph Server and RAG Service:
# 1. Set up environment
cp docker/env.template docker/.env
# Edit docker/.env and set PROJECT_PATH to your actual project path
# 2. Deploy services
cd docker
docker-compose -f docker-compose-network.yml up -d
# 3. Verify deployment
docker-compose -f docker-compose-network.yml ps
# 4. Access services
# HugeGraph Server: http://localhost:8080
# RAG Service: http://localhost:8001
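After `docker-compose ... up -d` returns, the services may still need a moment before they accept connections. The sketch below waits until a TCP port is reachable, for example 8080 for HugeGraph Server or 8001 for the RAG service as listed above. It is an illustrative helper (the name `wait_for_port` is hypothetical) and only checks that the port is open, not that the service is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # connection refused or timed out: retry until the deadline passes
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A typical call would be `wait_for_port("localhost", 8080)` before opening the service URL.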
Option 2: Individual Docker Containers
For more control over individual components:
Available Images
- hugegraph/rag - Development image with source code access
- hugegraph/rag-bin - Production-optimized binary (compiled with Nuitka)
# 1. Create network
docker network create -d bridge hugegraph-net
# 2. Start HugeGraph Server
docker run -itd --name=server -p 8080:8080 --network hugegraph-net hugegraph/hugegraph
# 3. Start RAG Service
docker pull hugegraph/rag:latest
docker run -itd --name rag \
  -v /path/to/your/hugegraph-llm/.env:/home/work/hugegraph-llm/.env \
  -p 8001:8001 --network hugegraph-net hugegraph/rag
# 4. Monitor logs
docker logs -f rag
Option 3: Build from Source
For development and customization:
# 1. Start HugeGraph Server
docker run -itd --name=server -p 8080:8080 hugegraph/hugegraph
# 2. Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# 3. Clone and setup project
git clone https://github.com/apache/incubator-hugegraph-ai.git
cd incubator-hugegraph-ai/hugegraph-llm
# 4. Create virtual environment and install dependencies
uv venv && source .venv/bin/activate
uv pip install -e .
# 5. Launch RAG demo
python -m hugegraph_llm.demo.rag_demo.app
# Access at: http://127.0.0.1:8001
# 6. (Optional) Custom host/port
python -m hugegraph_llm.demo.rag_demo.app --host 127.0.0.1 --port 18001
Additional Setup (Optional)
# Download NLTK stopwords for better text processing
python ./hugegraph_llm/operators/common_op/nltk_helper.py
# Update configuration files
python -m hugegraph_llm.config.generate --update
Tip: Check our Quick Start Guide for detailed usage examples and query logic explanations.
💡 Usage Examples
Knowledge Graph Construction
Interactive Web Interface
Use the Gradio interface for visual knowledge graph building:
Input Options:
- Text: Direct text input for RAG index creation
- Files: Upload TXT or DOCX files (multiple selection supported)
Schema Configuration:
- Custom Schema: JSON format following our template
- HugeGraph Schema: Use existing graph instance schema (e.g., “hugegraph”)

Programmatic Construction
Build knowledge graphs with code using the KgBuilder class:
from hugegraph_llm.models.llms.init_llm import LLMs
from hugegraph_llm.operators.kg_construction_task import KgBuilder
# Initialize and chain operations
TEXT = "Your input text here..."
builder = KgBuilder(LLMs().get_chat_llm())
(
    builder
    .import_schema(from_hugegraph="talent_graph").print_result()
    .chunk_split(TEXT).print_result()
    .extract_info(extract_type="property_graph").print_result()
    .commit_to_hugegraph()
    .run()
)
Pipeline Workflow:
graph LR
    A[Import Schema] --> B[Chunk Split]
    B --> C[Extract Info]
    C --> D[Commit to HugeGraph]
    D --> E[Execute Pipeline]
    
    style A fill:#fff2cc
    style B fill:#d5e8d4
    style C fill:#dae8fc
    style D fill:#f8cecc
    style E fill:#e1d5e7
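The workflow above ends with an explicit "Execute Pipeline" step, which suggests that the chained calls queue work and `.run()` executes it. The toy class below sketches that deferred-execution pattern under this assumption. `ToyPipeline` and its two step functions are hypothetical and are not the `KgBuilder` implementation.

```python
class ToyPipeline:
    """Queue steps while chaining; execute them in order on run()."""

    def __init__(self):
        self._steps = []

    def add(self, name, func):
        self._steps.append((name, func))
        return self  # returning self is what makes chaining possible

    def run(self, context=None):
        context = dict(context or {})
        for name, func in self._steps:
            context = func(context)
            context.setdefault("trace", []).append(name)
        return context


def chunk_split(ctx):
    # naive sentence split, only for illustration
    ctx["chunks"] = [s.strip() for s in ctx["text"].split(".") if s.strip()]
    return ctx


def count_chunks(ctx):
    ctx["n_chunks"] = len(ctx["chunks"])
    return ctx


result = (
    ToyPipeline()
    .add("chunk_split", chunk_split)
    .add("count_chunks", count_chunks)
    .run({"text": "Al Pacino is an actor. He was born in 1940."})
)
```

Each step receives the context produced by the previous one, which is why the order of the chained calls matters.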
Graph-Enhanced RAG
Leverage HugeGraph for retrieval-augmented generation:
from hugegraph_llm.operators.graph_rag_task import RAGPipeline
# Initialize RAG pipeline
graph_rag = RAGPipeline()
# Execute RAG workflow
(
    graph_rag
    .extract_keywords(text="Tell me about Al Pacino.")
    .keywords_to_vid()
    .query_graphdb(max_deep=2, max_graph_items=30)
    .merge_dedup_rerank()
    .synthesize_answer(vector_only_answer=False, graph_only_answer=True)
    .run(verbose=True)
)
RAG Pipeline Flow:
graph TD
    A[User Query] --> B[Extract Keywords]
    B --> C[Match Graph Nodes]
    C --> D[Retrieve Graph Context]
    D --> E[Rerank Results]
    E --> F[Generate Answer]
    
    style A fill:#e3f2fd
    style B fill:#f3e5f5
    style C fill:#e8f5e8
    style D fill:#fff3e0
    style E fill:#fce4ec
    style F fill:#e0f2f1
🔧 Configuration
After running the demo, configuration files are automatically generated:
- Environment: hugegraph-llm/.env
- Prompts: hugegraph-llm/src/hugegraph_llm/resources/demo/config_prompt.yaml
Note: Configuration changes are automatically saved when using the web interface. For manual changes, simply refresh the page to load updates.
LLM Provider Support: This project uses LiteLLM for multi-provider LLM support.
📚 Additional Resources
- Graph Visualization: Use HugeGraph Hubble for data analysis and schema management
- API Documentation: Explore our REST API endpoints for integration
- Community: Join our discussions and contribute to the project
License: Apache License 2.0 | Community: Apache HugeGraph
3.2 - GraphRAG UI Details
This page follows up on the main doc and introduces the basic UI functions and details. Updates and improvements are welcome at any time.
1. Core Logic of the Project
Build RAG Index Responsibilities:
- Split and vectorize text
- Extract text into a graph (construct a knowledge graph) and vectorize the vertices
(Graph)RAG & User Functions Responsibilities:
- Retrieve relevant content from the constructed knowledge graph and vector database based on the query to supplement the prompt.
2. (Processing Flow) Build RAG Index
Construct a knowledge graph, chunk vectors, and graph vid vectors from the text.
graph TD;
    A[Raw Text] --> B[Text Segmentation]
    B --> C[Vectorization]
    C --> D[Store in Vector Database]
    A --> F[Text Segmentation]
    F --> G[LLM extracts graph based on schema \nand segmented text]
    G --> H[Store graph in Graph Database, \nautomatically vectorize vertices \nand store in Vector Database]
    
    I[Retrieve vertices from Graph Database] --> J[Vectorize vertices and store in Vector Database \nNote: Incremental update]
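The "Text Segmentation" step in the flow above can be pictured with a small sketch. The fixed-size, overlapping character windows used here are an assumption for illustration; this page does not state which splitting strategy or chunk size the project actually uses.

```python
def chunk_text(text, size=200, overlap=20):
    """Split text into windows of at most `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # stop once the remaining tail is already covered by the previous window
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would then be vectorized and stored, while the same chunks are passed to the LLM for graph extraction.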
Four Input Fields:
- Doc(s): Input text
- Schema: The schema of the graph, which can be provided as a JSON-formatted schema or as the graph name (if it exists in the database).
- Graph Extract Prompt Header: The header of the prompt
- Output: Display results
Buttons:
- Get RAG Info
  - Get Vector Index Info: Retrieve vector index information
  - Get Graph Index Info: Retrieve graph index information
- Clear RAG Data
  - Clear Chunks Vector Index: Clear chunk vector
  - Clear Graph Vid Vector Index: Clear graph vid vector
  - Clear Graph Data: Clear Graph Data
- Import into Vector: Convert the text in Doc(s) into vectors (requires chunking the text first and then converting the chunks into vectors)
- Extract Graph Data (1): Extract graph data from Doc(s) based on the Schema, using the Graph Extract Prompt Header and chunked content as the prompt
- Load into GraphDB (2): Store the extracted graph data into the database (automatically calls Update Vid Embedding to store vectors in the vector database)
- Update Vid Embedding: Convert graph vid into vectors
Execution Flow:
- Input text into the Doc(s) field.
- Click the Import into Vector button to split and vectorize the text, storing it in the vector database.
- Input the graph Schema into the Schema field.
- Click the Extract Graph Data (1) button to extract the text into a graph.
- Click the Load into GraphDB (2) button to store the extracted graph into the graph database (this automatically calls Update Vid Embedding to store the vectors in the vector database).
- Click the Update Vid Embedding button to vectorize the graph vertices and store them in the vector database.
3. (Processing Flow) (Graph)RAG & User Functions
The Import into Vector button in the previous module converts text (chunks) into vectors, and the Update Vid Embedding button converts graph vid into vectors. These vectors are stored separately to supplement the context for queries (answer generation) in this module. In other words, the previous module prepares the data for RAG (vectorization), while this module executes RAG.
This module consists of two parts:
- HugeGraph RAG Query
- (Batch) Back-testing
The first part handles single queries, while the second part handles multiple queries at once. Below is an explanation of the first part.
graph TD;
    A[Question] --> B[Vectorize the question and search \nfor the most similar chunk in the Vector Database (chunk)]
    A --> F[Extract keywords using LLM]
    F --> G[Match vertices precisely in Graph Database \nusing keywords; perform fuzzy matching in \nVector Database (graph vid)]
    G --> H[Generate Gremlin query using matched vertices and query with LLM]
    H --> I[Execute Gremlin query; if successful, finish; if failed, fallback to BFS]
    
    B --> J[Sort results]
    I --> J
    J --> K[Generate answer]
Input Fields:
- Question: Input the query
- Query Prompt: The prompt template used to ask the final question to the LLM
- Keywords Extraction Prompt: The prompt template for extracting keywords from the question
- Template Num: < 0 disables text2gql; = 0 uses no template (zero-shot); > 0 uses the specified number of templates
Query Scope Selection:
- Basic LLM Answer: Does not use RAG functionality
- Vector-only Answer: Uses only vector-based retrieval (queries chunk vectors in the vector database)
- Graph-only Answer: Uses only graph-based retrieval (queries graph vid vectors in the vector database and the graph database)
- Graph-Vector Answer: Uses both graph-based and vector-based retrieval
Execution Flow:
Graph-only Answer:
- Extract keywords from the question using the Keywords Extraction Prompt.
- Use the extracted keywords to:
  - First, perform an exact match in the graph database.
  - If no match is found, perform a fuzzy match in the vector database (graph vid vector) to retrieve relevant vertices.
- text2gql: Call the text2gql-related interface, using the matched vertices as entities to convert the question into a Gremlin query and execute it in the graph database. 
- BFS: If text2gql fails (LLM-generated queries might be invalid), fall back to executing a graph query using a predefined Gremlin query template (essentially a BFS traversal). 
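The two fallbacks in the Graph-only flow (exact match, then fuzzy match; text2gql, then BFS) can be sketched as follows. This is an illustration only: `difflib` stands in for the vector-similarity search over graph vid vectors, the fallback is applied per keyword (an assumption), and all function names are hypothetical.

```python
import difflib


def match_vertices(keywords, vertex_ids, cutoff=0.6):
    """Exact match first; fall back to a fuzzy match for keywords without an exact hit."""
    matched = []
    for kw in keywords:
        if kw in vertex_ids:
            matched.append(kw)
        else:
            # stand-in for the fuzzy lookup in the vector database
            matched.extend(difflib.get_close_matches(kw, vertex_ids, n=1, cutoff=cutoff))
    return matched


def query_graph(generate_gremlin, execute, bfs_query, vertices):
    """Try the LLM-generated query; fall back to the template (BFS) query if it fails."""
    try:
        return execute(generate_gremlin(vertices)), "text2gql"
    except Exception:
        return execute(bfs_query(vertices)), "bfs"
```

`generate_gremlin`, `execute` and `bfs_query` are passed in as callables so the sketch stays independent of any concrete LLM or graph client.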
Vector-only Answer:
- Convert the query into a vector. 
- Search for the most similar content in the chunk vector dataset in the vector database. 
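The vector-only lookup amounts to a nearest-neighbour search over the chunk vectors. A minimal sketch, assuming cosine similarity as the metric (this page does not name the metric) and plain Python lists as vectors:

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, k=3):
    """Return the k (chunk_id, score) pairs most similar to the query vector."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

In practice the query vector would come from the same embedding model that produced the chunk vectors.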
Sorting and Answer Generation:
- After executing the retrieval, sort the search (retrieval) results to construct the final prompt. 
- Generate answers based on different prompt configurations and display them in different output fields:
  - Basic LLM Answer
  - Vector-only Answer
  - Graph-only Answer
  - Graph-Vector Answer
4. (Processing Flow) Text2Gremlin
Converts natural language queries into Gremlin queries.
This module consists of two parts:
- Build Vector Template Index (Optional): Vectorizes query/gremlin pairs from sample files and stores them in the vector database for reference when generating Gremlin queries.
- Natural Language to Gremlin: Converts natural language queries into Gremlin queries.
The first part is straightforward, so the focus is on the second part.
graph TD;
    A[Gremlin Pairs File] --> C[Vectorize query]
    C --> D[Store in Vector Database]
    
    F[Natural Language Query] --> G[Search for the most similar query \nin the Vector Database \n(If no Gremlin pairs exist in the Vector Database, \ndefault files will be automatically vectorized) \nand retrieve the corresponding Gremlin]
    G --> H[Add the matched pair to the prompt \nand use LLM to generate the Gremlin \ncorresponding to the Natural Language Query]
Input Fields for the Second Part:
- Natural Language Query: Input the natural language text to be converted into Gremlin.
- Schema: Input the graph schema.
Execution Flow:
- Input the query (natural language) into the Natural Language Query field. 
- Input the graph schema into the Schema field. 
- Click the Text2Gremlin button, and the following execution logic applies:
  - Convert the query into a vector.
  - Construct the prompt:
    - Retrieve the graph schema.
    - Query the vector database for example vectors, retrieving query-gremlin pairs similar to the input query (if the vector database lacks examples, it automatically initializes with examples from the resources folder).
  - Generate the Gremlin query using the constructed prompt.
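The prompt construction described above (schema plus the most similar query/Gremlin pairs) can be sketched like this. The prompt layout and the function name are hypothetical, and `difflib` string similarity stands in for the vector search over the template index.

```python
import difflib


def build_text2gremlin_prompt(query, schema, examples, k=2):
    """Assemble a prompt from the schema and the k example pairs closest to the query."""
    ranked = sorted(
        examples,
        key=lambda pair: difflib.SequenceMatcher(None, query.lower(), pair[0].lower()).ratio(),
        reverse=True,
    )
    shots = "\n".join(f"Q: {q}\nGremlin: {g}" for q, g in ranked[:k])
    return f"Schema:\n{schema}\n\nExamples:\n{shots}\n\nQ: {query}\nGremlin:"
```

The returned string would then be sent to the LLM, whose completion is taken as the candidate Gremlin query.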
5. Graph Tools
Input Gremlin queries to execute corresponding operations.
4 - HugeGraph Computing (OLAP)
4.1 - HugeGraph-Vermeer Quick Start
1. Overview of Vermeer
1.1 Architecture
Vermeer is a high-performance, memory-first graph computing framework written in Go. It follows a start-once, run-any-task model and supports 15+ OLAP graph algorithms, with most tasks completing in seconds to minutes. The framework has two roles, master and worker: currently there is only one master (HA can be added), and there can be multiple workers.
The master is responsible for communication, forwarding, and aggregation, with minimal computation and resource usage. Workers are computation nodes used to store graph data and run computation tasks, consuming a large amount of memory and CPU. The grpc and rest modules handle internal communication and external calls, respectively.
The framework’s runtime configuration can be passed via command-line parameters or specified in configuration files located in the config/ directory. The --env parameter can specify which configuration file to use, e.g., --env=master specifies using master.ini. Note that the master needs to specify the listening port, and the worker needs to specify the listening port and the master’s ip:port.
1.2 Running Method
Enter the directory and input ./vermeer --env=master or ./vermeer --env=worker01.
2. Task Creation REST API
2.1 Introduction
This REST API provides all task creation functions, including reading graph data and various computation functions, offering both asynchronous and synchronous return interfaces. The returned content includes information about the created tasks.
The overall process of using Vermeer is to first create a task to read the graph data, and after the graph is read, create a computation task to execute the computation. The graph will not be automatically deleted; multiple computation tasks can be run on one graph without repeated reading. If deletion is needed, the delete graph interface can be used.
Task statuses can be divided into graph reading task status and computation task status. Generally, the client only needs to know four statuses: created, in progress, completed, and error. The graph status is the basis for determining whether the graph is available. If the graph is being read or the graph status is erroneous, the graph cannot be used to create computation tasks. The delete graph interface is only available when the graph is in the loaded or error status and has no computation tasks.
Available URLs are as follows:
- Asynchronous return interface: POST http://master_ip:port/tasks/create returns only whether the task creation is successful, and the task status needs to be actively queried to determine completion.
- Synchronous return interface: POST http://master_ip:port/tasks/create/sync returns after the task is completed.
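The two interfaces differ only in the URL path. The sketch below builds, but does not send, the corresponding POST request with the standard library. The helper name is hypothetical, and the `Content-Type` header is an assumption that this page does not confirm.

```python
import json
import urllib.request


def build_task_request(master, body, sync=False):
    """Build the POST request for the asynchronous or synchronous create interface."""
    path = "/tasks/create/sync" if sync else "/tasks/create"
    return urllib.request.Request(
        url=f"http://{master}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not stated in the docs
        method="POST",
    )
```

Sending is then a matter of `urllib.request.urlopen(req)`; with the asynchronous interface the task status still has to be queried afterwards, as described above.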
2.2 Loading Graph Data
Refer to the Vermeer parameter list document for specific parameters.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "load",
 "graph": "testdb",
 "params": {
 "load.parallel": "50",
 "load.type": "local",
 "load.vertex_files": "{\"localhost\":\"data/twitter-2010.v_[0,99]\"}",
 "load.edge_files": "{\"localhost\":\"data/twitter-2010.e_[0,99]\"}",
 "load.use_out_degree": "1",
 "load.use_outedge": "1"
 }
}
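In the request above, `load.vertex_files` and `load.edge_files` are JSON objects serialized into strings, and every parameter value is a string. A small sketch of building such a body programmatically is shown below. The function name is hypothetical, and only parameters that appear in the example are used.

```python
import json


def load_task_body(graph, vertex_files, edge_files, parallel=50):
    """Build a load-task body; the per-host file maps are serialized to JSON strings."""
    compact = {"separators": (",", ":")}  # match the compact form used in the example
    return {
        "task_type": "load",
        "graph": graph,
        "params": {
            "load.parallel": str(parallel),
            "load.type": "local",
            "load.vertex_files": json.dumps(vertex_files, **compact),
            "load.edge_files": json.dumps(edge_files, **compact),
        },
    }
```

Letting `json.dumps` produce the nested strings avoids hand-written escaping such as `{\"localhost\":...}`.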
2.3 Output Computation Results
All Vermeer computation tasks support multiple result output methods, which can be customized: local, hdfs, afs, or hugegraph. Add the corresponding parameters under the params parameter when sending the request to take effect. When output.need_statistics is set to 1, it supports outputting statistical information of the computation results, which will be written in the interface task information. The statistical mode operators currently support “count” and “modularity,” but only for community detection algorithms.
Refer to the Vermeer parameter list document for specific parameters.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "pagerank",
 "compute.parallel":"10",
 "compute.max_step":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/pagerank"
 }
}
3. Supported Algorithms
3.1 PageRank
The PageRank algorithm, also known as the web ranking algorithm, is a technique used by search engines to calculate the relevance and importance of web pages (nodes) based on their mutual hyperlinks.
- If a web page is linked to by many other web pages, it indicates that the web page is relatively important, and its PageRank value will be relatively high.
- If a web page with a high PageRank value links to other web pages, the PageRank value of the linked web pages will also increase accordingly.
The PageRank algorithm is suitable for scenarios such as web page ranking and identifying key figures in social networks.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "pagerank",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/pagerank",
 "compute.max_step":"10"
 }
}
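To make the two rules above concrete, here is a small single-machine PageRank sketch using power iteration. It is not Vermeer's implementation: the damping factor of 0.85 and the even redistribution of rank from vertices without out-links are conventional assumptions, and `max_step` merely mirrors the role of `compute.max_step`.

```python
def pagerank(out_links, damping=0.85, max_step=10):
    """Power-iteration PageRank on {node: [successors]}; returns {node: rank}."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_step):
        # rank held by vertices without out-links is spread evenly over all vertices
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in out_links.items():
            for w in targets:
                new_rank[w] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank
```

The ranks always sum to 1, and a vertex gains rank both from having many in-links and from being linked by highly ranked vertices.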
3.2 WCC (Weakly Connected Components)
The weakly connected components algorithm calculates all connected subgraphs in an undirected graph and outputs the weakly connected subgraph ID to which each vertex belongs, indicating the connectivity between points and distinguishing different connected communities.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "wcc",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/wcc",
 "compute.max_step":"10"
 }
}
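For intuition, weakly connected components can be computed on a single machine with a union-find structure. The sketch below is not Vermeer's distributed implementation; using the smallest vertex id as the component id is a choice made here for illustration.

```python
def weakly_connected_components(vertices, edges):
    """Return {vertex: component_id}; the id is the smallest vertex in the component."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for src, dst in edges:
        a, b = find(src), find(dst)
        if a != b:
            # edge direction is ignored; keep the smaller id as the root
            parent[max(a, b)] = min(a, b)
    return {v: find(v) for v in vertices}
```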
3.3 LPA (Label Propagation Algorithm)
The label propagation algorithm is a graph clustering algorithm commonly used in social networks to discover potential communities.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "lpa",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/lpa",
 "compute.max_step":"10"
 }
}
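The idea behind label propagation is that every vertex repeatedly adopts the label most common among its neighbours. A minimal sketch follows. It is deterministic (vertices are visited in sorted order and ties go to the smallest label), which is a simplification chosen here for reproducibility and not a description of Vermeer's behaviour.

```python
from collections import Counter


def label_propagation(adj, max_step=10):
    """Asynchronous LPA on an undirected adjacency dict; returns {vertex: label}."""
    labels = {v: v for v in adj}
    for _ in range(max_step):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated vertices keep their own label
            counts = Counter(labels[w] for w in adj[v])
            best = max(counts.values())
            new_label = min(label for label, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels
```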
3.4 Degree Centrality
The degree centrality algorithm calculates the degree centrality value of each node in the graph, supporting both undirected and directed graphs. Degree centrality is an important indicator of node importance: the more edges a node has with other nodes, the higher its degree centrality value and the more important the node is in the graph. In an undirected graph, the value is obtained by counting how many times a node appears in the edge information. In a directed graph, edges are first filtered by direction (in-edges or out-edges) before counting, which yields the in-degree or out-degree of the node.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "degree",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/degree",
 "degree.direction":"both"
 }
}
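The counting described above fits in a few lines. In this sketch the `direction` argument mirrors `degree.direction` from the request; the values "in" and "out" are assumptions, since only "both" appears in the example.

```python
from collections import Counter


def degree_centrality(edges, direction="both"):
    """Count edge endpoints per vertex; direction is 'in', 'out' or 'both'."""
    degree = Counter()
    for src, dst in edges:
        if direction in ("out", "both"):
            degree[src] += 1
        if direction in ("in", "both"):
            degree[dst] += 1
    return dict(degree)
```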
3.5 Closeness Centrality
Closeness centrality is used to calculate the inverse of the shortest distance from a node to all other reachable nodes, accumulating and normalizing the value. Closeness centrality can be used to measure the time it takes for information to be transmitted from the node to other nodes. The larger the closeness centrality of a node, the closer its position in the graph is to the center, suitable for scenarios such as identifying key nodes in social networks.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "closeness_centrality",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/closeness_centrality",
 "closeness_centrality.sample_rate":"0.01"
 }
}
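One way to read the description above is: sum the inverse shortest distances from a vertex to every reachable vertex, then normalize by the number of other vertices. The sketch below implements that reading for an unweighted graph with BFS. Both the exact formula and the normalization are assumptions, and it computes exact distances instead of sampling as `closeness_centrality.sample_rate` suggests.

```python
from collections import deque


def closeness_centrality(adj, source):
    """Sum of 1/distance from source to each reachable vertex, divided by (n - 1)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    n = len(adj)
    if n <= 1:
        return 0.0
    return sum(1.0 / d for d in dist.values() if d > 0) / (n - 1)
```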
3.6 Betweenness Centrality
The betweenness centrality algorithm measures how strongly a node acts as a “bridge”; the larger the value, the more likely the node lies on a necessary path between two points in the graph. Typical examples include mutual followers in social networks. It is suitable for measuring the degree of aggregation around a node in a community.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "betweenness_centrality",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/betweenness_centrality",
 "betweenness_centrality.sample_rate":"0.01"
 }
}
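For reference, exact betweenness centrality on a small unweighted, undirected graph can be computed with Brandes' algorithm, sketched below. This is a textbook single-machine version and not Vermeer's implementation, which judging by `betweenness_centrality.sample_rate` works on a sample.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm on an undirected adjacency dict; returns {vertex: score}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, dist = [], {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # every unordered pair was counted from both endpoints
    return {v: c / 2.0 for v, c in score.items()}
```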
3.7 Triangle Count
The triangle count algorithm calculates the number of triangles passing through each vertex, suitable for calculating the relationships between users and whether the associations form triangles. The more triangles, the higher the degree of association between nodes in the graph, and the tighter the organizational relationship. In social networks, triangles indicate cohesive communities, and identifying triangles helps understand clustering and interconnections among individuals or groups in the network. In financial or transaction networks, the presence of triangles may indicate suspicious or fraudulent activities, and triangle counting can help identify transaction patterns that may require further investigation.
The output result is the Triangle Count corresponding to each vertex, i.e., the number of triangles the vertex is part of.
Note: This algorithm is for undirected graphs and ignores edge directions.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "triangle_count",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/triangle_count"
 }
}
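The per-vertex output described above can be reproduced on a small graph with set intersections. This is a single-machine sketch for illustration; like the algorithm described here, it ignores edge direction.

```python
def triangle_count(edges):
    """Return {vertex: number of triangles through it}, ignoring edge direction."""
    adj = {}
    for u, v in edges:
        if u != v:  # self-loops cannot be part of a triangle
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    counts = {}
    for v, nbrs in adj.items():
        # each triangle through v is seen twice, once from each of its other two corners
        counts[v] = sum(len(nbrs & adj[u]) for u in nbrs) // 2
    return counts
```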
3.8 K-Core
The K-Core algorithm marks the vertices that belong to the K-core, i.e. the maximal subgraph in which every vertex has a degree of at least K. It is suitable for graph pruning and finding the core part of the graph.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "kcore",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/kcore",
 "kcore.degree_k":"5"
 }
}
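Under the standard definition of a k-core (the maximal subgraph in which every vertex has degree at least K), the core can be found by repeatedly peeling off low-degree vertices. The sketch below follows that definition; whether Vermeer's `kcore.degree_k` behaves identically is an assumption.

```python
def k_core(adj, k):
    """Return the set of vertices in the k-core of an undirected adjacency dict."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    queue = [v for v in alive if degree[v] < k]
    while queue:
        v = queue.pop()
        if v not in alive:
            continue  # already removed via an earlier queue entry
        alive.discard(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] < k:
                    queue.append(w)
    return alive
```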
3.9 SSSP (Single Source Shortest Path)
The single-source shortest path algorithm calculates the shortest distance from one point to all other points.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "sssp",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/sssp",
 "sssp.source":"tom"
 }
}
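To make the output concrete, the sketch below computes hop distances from a source with a breadth-first search. It assumes unweighted, directed edges, which this section does not state; the function name `sssp_hops` is hypothetical and the code is not Vermeer's implementation.

```python
from collections import deque


def sssp_hops(edges, source):
    """Return {vertex: hop distance from source} for every reachable vertex."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in adjacency.get(current, []):
            if nxt not in distance:  # first visit is the shortest in an unweighted graph
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return distance
```

Vertices that cannot be reached from the source are simply absent from the returned dictionary.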
3.10 KOUT
Starting from a given source vertex, the KOUT algorithm gets the k-layer neighbor vertices of that vertex.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "kout",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/kout",
 "kout.source":"tom",
 "compute.max_step":"6"
 }
}
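One way to read "k-layer nodes" is "vertices first reached after exactly k hops". The Python sketch below follows that reading and walks out-edges only; both choices are assumptions, and `k_out` is a hypothetical helper rather than part of Vermeer.

```python
def k_out(edges, source, k):
    """Return the vertices whose first appearance is exactly k hops from source."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)

    visited = {source}
    frontier = {source}
    for _ in range(k):
        next_frontier = set()
        for vertex in frontier:
            # Keep only vertices that no earlier layer has reached.
            next_frontier |= adjacency.get(vertex, set()) - visited
        visited |= next_frontier
        frontier = next_frontier
    return frontier
```

If the intended meaning is "all vertices within k hops", returning `visited - {source}` instead of `frontier` gives that variant.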
3.11 Louvain
The Louvain algorithm is a community detection algorithm based on modularity. The basic idea is that nodes in the network try to traverse all neighbor community labels and choose the community label that maximizes the modularity increment. After maximizing modularity, each community is regarded as a new node, and the process is repeated until the modularity no longer increases.
The distributed Louvain algorithm implemented on Vermeer is affected by factors such as node order and parallel computation. Due to the random traversal order of the Louvain algorithm, community compression also has a certain randomness, leading to different results in multiple executions. However, the overall trend will not change significantly.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "louvain",
 "compute.parallel":"10",
 "compute.max_step":"1000",
 "louvain.threshold":"0.0000001",
 "louvain.resolution":"1.0",
 "louvain.step":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/louvain"
  }
 }
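The quantity that Louvain tries to increase can be written down directly. The sketch below evaluates the modularity of a given partition of an undirected, unweighted graph. The `resolution` argument scales the expected-edges term, which is a common generalization; whether `louvain.resolution` is applied in exactly this way is an assumption, and `modularity` is a hypothetical helper.

```python
def modularity(edges, community, resolution=1.0):
    """Modularity Q of a partition; `community` maps each vertex to its label."""
    m = len(edges)
    if m == 0:
        return 0.0

    internal = {}  # number of edges with both endpoints in the community
    degree = {}    # sum of vertex degrees per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1

    return sum(
        internal.get(label, 0) / m - resolution * (total / (2 * m)) ** 2
        for label, total in degree.items()
    )
```

For two triangles joined by a single edge, putting each triangle in its own community scores higher than putting all six vertices in one community, which is the kind of improvement each Louvain pass looks for.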
3.12 Jaccard Similarity Coefficient
The Jaccard index, also known as the Jaccard similarity coefficient, is used to compare the similarity and diversity between finite sample sets. The larger the Jaccard coefficient value, the higher the similarity of the samples. This algorithm calculates the Jaccard similarity coefficient between a given source point and all other points in the graph.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "jaccard",
 "compute.parallel":"10",
 "compute.max_step":"2",
 "jaccard.source":"123",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/jaccard"
 }
}
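A small Python sketch of the coefficient follows. It assumes that the "sample set" of a vertex is its set of direct neighbors with edge direction ignored, which this section does not spell out; `jaccard_from_source` is a hypothetical name.

```python
def jaccard_from_source(edges, source):
    """Return {vertex: Jaccard similarity of its neighbor set with the source's}."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    source_set = neighbors.get(source, set())
    scores = {}
    for vertex, adjacent in neighbors.items():
        if vertex == source:
            continue
        union = source_set | adjacent
        # |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
        scores[vertex] = len(source_set & adjacent) / len(union) if union else 0.0
    return scores
```

Two vertices with the neighbor sets {a, b} and {a, b, c} share two of three distinct neighbors, so their coefficient is 2/3.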
3.13 Personalized PageRank
The goal of personalized PageRank is to calculate the relevance of all nodes relative to user u. Starting from the node corresponding to user u, at each node, there is a probability of 1-d to stop walking and start again from u, or a probability of d to continue walking, randomly selecting a node from the nodes pointed to by the current node to walk down. It is used to calculate the personalized PageRank score starting from a given starting point, suitable for scenarios such as social recommendations.
Since the calculation requires using out-degree, load.use_out_degree needs to be set to 1 when reading the graph.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "ppr",
 "compute.parallel":"100",
 "compute.max_step":"10",
 "ppr.source":"123",
 "ppr.damping":"0.85",
 "ppr.diff_threshold":"0.00001",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/ppr"
 }
}
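The random walk with restart described above can be approximated by power iteration. The Python sketch below is a single-machine illustration, not Vermeer's implementation. Its parameters are named after `ppr.damping`, `compute.max_step` and `ppr.diff_threshold` only for readability, and sending the mass of vertices without out-edges back to the source is an assumption of this example.

```python
def personalized_pagerank(edges, source, damping=0.85, max_step=10,
                          diff_threshold=1e-5):
    """Return {vertex: personalized PageRank score relative to `source`}."""
    out_links = {}
    vertices = {source}
    for u, v in edges:
        out_links.setdefault(u, []).append(v)
        vertices.update((u, v))

    rank = {v: 0.0 for v in vertices}
    rank[source] = 1.0
    for _ in range(max_step):
        nxt = {v: 0.0 for v in vertices}
        nxt[source] = 1.0 - damping  # probability 1-d: restart at the source
        for v in vertices:
            targets = out_links.get(v)
            if targets:
                share = damping * rank[v] / len(targets)  # probability d: keep walking
                for t in targets:
                    nxt[t] += share
            else:
                nxt[source] += damping * rank[v]  # dead end: go back to the source
        diff = sum(abs(nxt[v] - rank[v]) for v in vertices)
        rank = nxt
        if diff < diff_threshold:
            break
    return rank
```

The scores always sum to 1, and vertices close to the source tend to receive the largest share.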
3.14 Global Kout
Calculates the k-degree neighbors of every node in the graph (excluding the node itself and its 1 to k-1 degree neighbors). Because the global kout algorithm expands memory usage severely, k is currently limited to 1 and 2. The algorithm also supports filtering (for example, the parameter "compute.filter":"risk_level==1"); the filter condition is evaluated while the k-degree neighbors are calculated, so the final result set contains only the neighbors that satisfy it. The algorithm's final output is the number of neighbors that meet the condition.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "kout_all",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"10",
 "output.file_path":"result/kout",
 "compute.max_step":"2",
 "compute.filter":"risk_level==1"
 }
}
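The per-vertex output can be pictured with the following Python sketch, which counts, for every vertex, the exactly-k-hop neighbors whose properties pass a filter. Applying the filter only to the final layer and following out-edges are assumptions of this example; `kout_all`, `properties` and `keep` are hypothetical names and do not mirror Vermeer's filter syntax.

```python
def kout_all(edges, k, properties, keep):
    """Return {vertex: count of its exactly-k-hop neighbors accepted by keep()}."""
    adjacency = {}
    vertices = set()
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        vertices.update((u, v))

    result = {}
    for start in vertices:
        visited = {start}
        frontier = {start}
        for _ in range(k):
            # Next layer: out-neighbors of the frontier that are not yet visited.
            frontier = {n for v in frontier for n in adjacency.get(v, ())} - visited
            visited |= frontier
        result[start] = sum(1 for v in frontier if keep(properties.get(v, {})))
    return result
```

A filter such as risk_level==1 would be passed here as `lambda props: props.get("risk_level") == 1`.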
3.15 Clustering Coefficient
The clustering coefficient measures how tightly the neighbors of a node are connected to each other. In real networks, especially in specific kinds of networks, nodes tend to form tightly organized groups with a relatively high density of connections. The clustering coefficient algorithm (Cluster Coefficient) calculates the clustering degree of the nodes in the graph. This algorithm computes the local clustering coefficient, which measures the clustering degree around each individual node.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "clustering_coefficient",
 "compute.parallel":"100",
 "compute.max_step":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/cc"
 }
}
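The usual formula for the local clustering coefficient of a vertex with degree d is 2T / (d(d-1)), where T is the number of edges among its neighbors. The Python sketch below applies that formula on an undirected view of the graph; it is an illustration under that standard definition, not Vermeer's code, and `local_clustering` is a hypothetical name.

```python
from itertools import combinations


def local_clustering(edges):
    """Return {vertex: local clustering coefficient}, ignoring edge direction."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not contribute
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    result = {}
    for vertex, adjacent in neighbors.items():
        degree = len(adjacent)
        if degree < 2:
            result[vertex] = 0.0  # fewer than two neighbors: no pair to connect
            continue
        links = sum(1 for a, b in combinations(adjacent, 2) if b in neighbors[a])
        result[vertex] = 2.0 * links / (degree * (degree - 1))
    return result
```

A vertex whose neighbors are all connected to each other scores 1.0, and a vertex whose neighbors share no edges scores 0.0.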
3.16 SCC (Strongly Connected Components)
In the mathematical theory of directed graphs, a graph is said to be strongly connected if every vertex can be reached from every other vertex. The maximal strongly connected subgraphs of a directed graph are called its strongly connected components. The algorithm indicates the connectivity between points and distinguishes different connected communities.
Request example:
POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "scc",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/scc",
 "compute.max_step":"200"
 }
}
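For small graphs, the components can be found on one machine with Kosaraju's two-pass depth-first search, sketched below in Python. This is a different technique from a distributed, superstep-based implementation and is shown only to illustrate the definition; `strongly_connected_components` is a hypothetical name, and the recursion limits it to small inputs.

```python
def strongly_connected_components(edges):
    """Return a list of vertex sets, one per strongly connected component."""
    forward, backward = {}, {}
    vertices = []
    seen = set()
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)
        for x in (u, v):
            if x not in seen:
                seen.add(x)
                vertices.append(x)

    # Pass 1: record vertices in order of DFS completion on the original graph.
    order, visited = [], set()

    def visit(v):
        visited.add(v)
        for n in forward.get(v, []):
            if n not in visited:
                visit(n)
        order.append(v)

    for v in vertices:
        if v not in visited:
            visit(v)

    # Pass 2: walk the reversed graph in decreasing completion order.
    component = {}

    def assign(v, root):
        component[v] = root
        for n in backward.get(v, []):
            if n not in component:
                assign(n, root)

    for v in reversed(order):
        if v not in component:
            assign(v, v)

    groups = {}
    for v, root in component.items():
        groups.setdefault(root, set()).add(v)
    return list(groups.values())
```

For a cycle a→b→c→a with an edge c→d into a second cycle d→e→d, the result contains the two sets {a, b, c} and {d, e}.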
🚧 This document is still being updated and improved. Suggestions and feedback are welcome.
4.2 - HugeGraph-Computer Quick Start
1 HugeGraph-Computer Overview
HugeGraph-Computer is a distributed graph processing system for HugeGraph (OLAP). It is an implementation of Pregel and runs on the Kubernetes (K8s) framework. It focuses on supporting graph data volumes of hundreds of billions to trillions, using disk for sorting and acceleration, which is one of the biggest differences from Vermeer.
Features
- Supports distributed MPP graph computing and integrates with HugeGraph as graph input/output storage.
- Based on the BSP (Bulk Synchronous Parallel) model, an algorithm performs computing through multiple parallel iterations; every iteration is a superstep.
- Automatic memory management. The framework will never run out of memory (OOM), because it spills some data to disk when there is not enough memory to hold all of it.
- The edges or messages of a super node can be held partly in memory, so they are never lost.
- You can load the data from HDFS or HugeGraph, or any other system.
- You can output the results to HDFS or HugeGraph, or any other system.
- Easy to develop a new algorithm. You only need to focus on vertex-centric processing, just as on a single server, without worrying about message transfer or memory/storage management.
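The BSP and vertex-centric ideas in the list above can be sketched in a few lines of Python. This is a toy single-process model written for illustration only: `run_supersteps` and `make_pagerank` are hypothetical names and do not correspond to HugeGraph-Computer's Java API.

```python
def run_supersteps(out_edges, compute, initial_value, max_supersteps):
    """Toy BSP loop: every vertex computes, then messages are delivered at a barrier."""
    values = {v: initial_value for v in out_edges}
    inbox = {v: [] for v in out_edges}
    for superstep in range(max_supersteps):
        outbox = {v: [] for v in out_edges}
        for vertex in out_edges:  # conceptually, all vertices run in parallel
            new_value, message = compute(
                superstep, values[vertex], inbox[vertex], len(out_edges[vertex])
            )
            values[vertex] = new_value
            if message is not None:
                for target in out_edges[vertex]:
                    outbox[target].append(message)
        inbox = outbox  # barrier: messages become visible in the next superstep
    return values


def make_pagerank(vertex_count, damping=0.85):
    """Build a per-vertex PageRank compute function for run_supersteps."""
    def compute(superstep, value, messages, out_degree):
        if superstep > 0:
            value = (1 - damping) / vertex_count + damping * sum(messages)
        message = value / out_degree if out_degree else None
        return value, message
    return compute
```

The algorithm author writes only `compute`, which sees one vertex and its incoming messages; the loop plays the role of the framework that schedules supersteps and moves messages.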
2 Dependency for Building/Running
2.1 Install Java 11 (JDK 11)
Java 11 or later is required to run HugeGraph-Computer; install and configure it yourself.
Be sure to run the java -version command to check the JDK version before reading on.
3 Get Started
3.1 Run PageRank algorithm locally
To run the algorithm with HugeGraph-Computer, you need to install Java 11 or later versions.
You also need to deploy HugeGraph-Server and Etcd.
There are two ways to get HugeGraph-Computer:
- Download the compiled tarball
- Clone source code then compile and package
3.1.1 Download the compiled archive
Download the latest version of the HugeGraph-Computer release package:
wget https://downloads.apache.org/incubator/hugegraph/${version}/apache-hugegraph-computer-incubating-${version}.tar.gz
# tar -C needs an existing target directory
mkdir -p hugegraph-computer
tar zxvf apache-hugegraph-computer-incubating-${version}.tar.gz -C hugegraph-computer
3.1.2 Clone source code to compile and package
Clone the latest version of HugeGraph-Computer source package:
git clone https://github.com/apache/hugegraph-computer.git
Compile and generate tar package:
cd hugegraph-computer
mvn clean package -DskipTests
3.1.3 Start master node
You can use the -c parameter to specify the configuration file. For more computer configuration options, see: Computer Config Options
cd hugegraph-computer
bin/start-computer.sh -d local -r master
3.1.4 Start worker node
bin/start-computer.sh -d local -r worker
3.1.5 Query algorithm results
3.1.5.1 Enable OLAP index query for server
If the OLAP index is not enabled yet, enable it first. For details, see: modify-graphs-read-mode
PUT http://localhost:8080/graphs/hugegraph/graph_read_mode
"ALL"
3.1.5.2 Query page_rank property value:
curl "http://localhost:8080/graphs/hugegraph/graph/vertices?page&limit=3" | gunzip
3.2 Run PageRank algorithm in Kubernetes
To run an algorithm with HugeGraph-Computer, you need to deploy HugeGraph-Server first
3.2.1 Install HugeGraph-Computer CRD
# Kubernetes version >= v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml
# Kubernetes version < v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1beta1.yaml
3.2.2 Show CRD
kubectl get crd
NAME                                        CREATED AT
hugegraphcomputerjobs.hugegraph.apache.org   2021-09-16T08:01:08Z
3.2.3 Install hugegraph-computer-operator&etcd-server
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-operator.yaml
3.2.4 Wait for hugegraph-computer-operator&etcd-server deployment to complete
kubectl get pod -n hugegraph-computer-operator-system
NAME                                                              READY   STATUS    RESTARTS   AGE
hugegraph-computer-operator-controller-manager-58c5545949-jqvzl   1/1     Running   0          15h
hugegraph-computer-operator-etcd-28lm67jxk5                       1/1     Running   0          15h
3.2.5 Submit a job
For more details on the computer CRD, see: Computer CRD
For more computer configuration options, see: Computer Config Options
cat <<EOF | kubectl apply --filename -
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-operator-system
  name: &jobName pagerank-sample
spec:
  jobId: *jobName
  algorithmName: page_rank
  image: hugegraph/hugegraph-computer:latest # algorithm image url
  jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar # algorithm jar path
  pullPolicy: Always
  workerCpu: "4"
  workerMemory: "4Gi"
  workerInstances: 5
  computerConf:
    job.partitions_count: "20"
    algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
    hugegraph.url: http://${hugegraph-server-host}:${hugegraph-server-port} # hugegraph server url
    hugegraph.name: hugegraph # hugegraph graph name
EOF
3.2.6 Show job
kubectl get hcjob/pagerank-sample -n hugegraph-computer-operator-system
NAME               JOBID              JOBSTATUS
pagerank-sample    pagerank-sample    RUNNING
3.2.7 Show log of nodes
# Show the master log
kubectl logs -l component=pagerank-sample-master -n hugegraph-computer-operator-system
# Show the worker log
kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system
# Show diagnostic log of a job
# NOTE: the diagnostic log exists only when the job fails, and it is kept for only one hour.
kubectl get event --field-selector reason=ComputerJobFailed,involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
3.2.8 Show success event of a job
NOTE: the event is kept for only one hour.
kubectl get event --field-selector reason=ComputerJobSucceed,involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
3.2.9 Query algorithm results
If the results are output to HugeGraph-Server, query them in the same way as in the local run; if they are output to HDFS, check the result files under the /hugegraph-computer/results/{jobId} directory.
4 Built-In algorithms document
4.1 Supported algorithms list:
Centrality Algorithm:
- PageRank
- BetweennessCentrality
- ClosenessCentrality
- DegreeCentrality
Community Algorithm:
- ClusteringCoefficient
- Kcore
- Lpa
- TriangleCount
- Wcc
Path Algorithm:
- RingsDetection
- RingsDetectionWithFilter
For more algorithms, see: Built-In algorithms
4.2 Algorithm description
TODO
5 Algorithm development guide
TODO
6 Note
- If some classes under computer-k8s cannot be found, run mvn compile in advance to generate the corresponding classes.
5 - HugeGraph Client
5.1 - HugeGraph-Java-Client
1 Overview Of Hugegraph
HugeGraph-Client sends HTTP requests to HugeGraph-Server and parses the execution results that the Server returns. HugeGraph-Client is available for the Java, Go, and Python languages. You can use the Client-API to write code that operates on HugeGraph, such as adding, deleting, modifying, and querying schema and graph data, or executing Gremlin statements.
A HugeGraph client SDK based on the Go language is also provided (version >= 1.2.0).
2 What You Need
- Java 11 (also supports Java 8)
- Maven 3.5+
3 How To Use
The basic steps to use HugeGraph-Client are as follows:
- Create a new Maven project with IDEA or Eclipse
- Add the HugeGraph-Client dependency to the pom file
- Create an object to invoke the interfaces of HugeGraph-Client
See the complete example in the following section for details.
4 Complete Example
4.1 Build New Maven Project
Use IDEA or Eclipse to create the project.
4.2 Add Hugegraph-Client Dependency In POM
<dependencies>
    <dependency>
        <groupId>org.apache.hugegraph</groupId>
        <artifactId>hugegraph-client</artifactId>
        <!-- Update to the latest release version -->
        <version>1.5.0</version>
    </dependency>    
</dependencies>
Note: Keep the versions of all graph components consistent.
4.3 Example
4.3.1 SingleExample
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.apache.hugegraph.driver.GraphManager;
import org.apache.hugegraph.driver.GremlinManager;
import org.apache.hugegraph.driver.HugeClient;
import org.apache.hugegraph.driver.SchemaManager;
import org.apache.hugegraph.structure.constant.T;
import org.apache.hugegraph.structure.graph.Edge;
import org.apache.hugegraph.structure.graph.Path;
import org.apache.hugegraph.structure.graph.Vertex;
import org.apache.hugegraph.structure.gremlin.Result;
import org.apache.hugegraph.structure.gremlin.ResultSet;
public class SingleExample {
    public static void main(String[] args) throws IOException {
        // An exception is thrown if the connection fails.
        HugeClient hugeClient = HugeClient.builder("http://localhost:8080",
                                                   "hugegraph")
                                          .build();
        SchemaManager schema = hugeClient.schema();
        schema.propertyKey("name").asText().ifNotExist().create();
        schema.propertyKey("age").asInt().ifNotExist().create();
        schema.propertyKey("city").asText().ifNotExist().create();
        schema.propertyKey("weight").asDouble().ifNotExist().create();
        schema.propertyKey("lang").asText().ifNotExist().create();
        schema.propertyKey("date").asDate().ifNotExist().create();
        schema.propertyKey("price").asInt().ifNotExist().create();
        schema.vertexLabel("person")
              .properties("name", "age", "city")
              .primaryKeys("name")
              .ifNotExist()
              .create();
        schema.vertexLabel("software")
              .properties("name", "lang", "price")
              .primaryKeys("name")
              .ifNotExist()
              .create();
        schema.indexLabel("personByCity")
              .onV("person")
              .by("city")
              .secondary()
              .ifNotExist()
              .create();
        schema.indexLabel("personByAgeAndCity")
              .onV("person")
              .by("age", "city")
              .secondary()
              .ifNotExist()
              .create();
        schema.indexLabel("softwareByPrice")
              .onV("software")
              .by("price")
              .range()
              .ifNotExist()
              .create();
        schema.edgeLabel("knows")
              .sourceLabel("person")
              .targetLabel("person")
              .properties("date", "weight")
              .ifNotExist()
              .create();
        schema.edgeLabel("created")
              .sourceLabel("person").targetLabel("software")
              .properties("date", "weight")
              .ifNotExist()
              .create();
        schema.indexLabel("createdByDate")
              .onE("created")
              .by("date")
              .secondary()
              .ifNotExist()
              .create();
        schema.indexLabel("createdByWeight")
              .onE("created")
              .by("weight")
              .range()
              .ifNotExist()
              .create();
        schema.indexLabel("knowsByWeight")
              .onE("knows")
              .by("weight")
              .range()
              .ifNotExist()
              .create();
        GraphManager graph = hugeClient.graph();
        Vertex marko = graph.addVertex(T.LABEL, "person", "name", "marko",
                                       "age", 29, "city", "Beijing");
        Vertex vadas = graph.addVertex(T.LABEL, "person", "name", "vadas",
                                       "age", 27, "city", "Hongkong");
        Vertex lop = graph.addVertex(T.LABEL, "software", "name", "lop",
                                     "lang", "java", "price", 328);
        Vertex josh = graph.addVertex(T.LABEL, "person", "name", "josh",
                                      "age", 32, "city", "Beijing");
        Vertex ripple = graph.addVertex(T.LABEL, "software", "name", "ripple",
                                        "lang", "java", "price", 199);
        Vertex peter = graph.addVertex(T.LABEL, "person", "name", "peter",
                                       "age", 35, "city", "Shanghai");
        marko.addEdge("knows", vadas, "date", "2016-01-10", "weight", 0.5);
        marko.addEdge("knows", josh, "date", "2013-02-20", "weight", 1.0);
        marko.addEdge("created", lop, "date", "2017-12-10", "weight", 0.4);
        josh.addEdge("created", lop, "date", "2009-11-11", "weight", 0.4);
        josh.addEdge("created", ripple, "date", "2017-12-10", "weight", 1.0);
        peter.addEdge("created", lop, "date", "2017-03-24", "weight", 0.2);
        GremlinManager gremlin = hugeClient.gremlin();
        System.out.println("==== Path ====");
        ResultSet resultSet = gremlin.gremlin("g.V().outE().path()").execute();
        Iterator<Result> results = resultSet.iterator();
        results.forEachRemaining(result -> {
            System.out.println(result.getObject().getClass());
            Object object = result.getObject();
            if (object instanceof Vertex) {
                System.out.println(((Vertex) object).id());
            } else if (object instanceof Edge) {
                System.out.println(((Edge) object).id());
            } else if (object instanceof Path) {
                List<Object> elements = ((Path) object).objects();
                elements.forEach(element -> {
                    System.out.println(element.getClass());
                    System.out.println(element);
                });
            } else {
                System.out.println(object);
            }
        });
        hugeClient.close();
    }
}
4.3.2 BatchExample
import java.util.ArrayList;
import java.util.List;
import org.apache.hugegraph.driver.GraphManager;
import org.apache.hugegraph.driver.HugeClient;
import org.apache.hugegraph.driver.SchemaManager;
import org.apache.hugegraph.structure.graph.Edge;
import org.apache.hugegraph.structure.graph.Vertex;
public class BatchExample {
    public static void main(String[] args) {
        // An exception is thrown if the connection fails.
        HugeClient hugeClient = HugeClient.builder("http://localhost:8080",
                                                   "hugegraph")
                                          .build();
        SchemaManager schema = hugeClient.schema();
        schema.propertyKey("name").asText().ifNotExist().create();
        schema.propertyKey("age").asInt().ifNotExist().create();
        schema.propertyKey("lang").asText().ifNotExist().create();
        schema.propertyKey("date").asDate().ifNotExist().create();
        schema.propertyKey("price").asInt().ifNotExist().create();
        schema.vertexLabel("person")
              .properties("name", "age")
              .primaryKeys("name")
              .ifNotExist()
              .create();
        schema.vertexLabel("person")
              .properties("price")
              .nullableKeys("price")
              .append();
        schema.vertexLabel("software")
              .properties("name", "lang", "price")
              .primaryKeys("name")
              .ifNotExist()
              .create();
        schema.indexLabel("softwareByPrice")
              .onV("software").by("price")
              .range()
              .ifNotExist()
              .create();
        schema.edgeLabel("knows")
              .link("person", "person")
              .properties("date")
              .ifNotExist()
              .create();
        schema.edgeLabel("created")
              .link("person", "software")
              .properties("date")
              .ifNotExist()
              .create();
        schema.indexLabel("createdByDate")
              .onE("created").by("date")
              .secondary()
              .ifNotExist()
              .create();
        // get schema object by name
        System.out.println(schema.getPropertyKey("name"));
        System.out.println(schema.getVertexLabel("person"));
        System.out.println(schema.getEdgeLabel("knows"));
        System.out.println(schema.getIndexLabel("createdByDate"));
        // list all schema objects
        System.out.println(schema.getPropertyKeys());
        System.out.println(schema.getVertexLabels());
        System.out.println(schema.getEdgeLabels());
        System.out.println(schema.getIndexLabels());
        GraphManager graph = hugeClient.graph();
        Vertex marko = new Vertex("person").property("name", "marko")
                                           .property("age", 29);
        Vertex vadas = new Vertex("person").property("name", "vadas")
                                           .property("age", 27);
        Vertex lop = new Vertex("software").property("name", "lop")
                                           .property("lang", "java")
                                           .property("price", 328);
        Vertex josh = new Vertex("person").property("name", "josh")
                                          .property("age", 32);
        Vertex ripple = new Vertex("software").property("name", "ripple")
                                              .property("lang", "java")
                                              .property("price", 199);
        Vertex peter = new Vertex("person").property("name", "peter")
                                           .property("age", 35);
        Edge markoKnowsVadas = new Edge("knows").source(marko).target(vadas)
                                                .property("date", "2016-01-10");
        Edge markoKnowsJosh = new Edge("knows").source(marko).target(josh)
                                               .property("date", "2013-02-20");
        Edge markoCreateLop = new Edge("created").source(marko).target(lop)
                                                 .property("date",
                                                           "2017-12-10");
        Edge joshCreateRipple = new Edge("created").source(josh).target(ripple)
                                                   .property("date",
                                                             "2017-12-10");
        Edge joshCreateLop = new Edge("created").source(josh).target(lop)
                                                .property("date", "2009-11-11");
        Edge peterCreateLop = new Edge("created").source(peter).target(lop)
                                                 .property("date",
                                                           "2017-03-24");
        List<Vertex> vertices = new ArrayList<>();
        vertices.add(marko);
        vertices.add(vadas);
        vertices.add(lop);
        vertices.add(josh);
        vertices.add(ripple);
        vertices.add(peter);
        List<Edge> edges = new ArrayList<>();
        edges.add(markoKnowsVadas);
        edges.add(markoKnowsJosh);
        edges.add(markoCreateLop);
        edges.add(joshCreateRipple);
        edges.add(joshCreateLop);
        edges.add(peterCreateLop);
        vertices = graph.addVertices(vertices);
        vertices.forEach(vertex -> System.out.println(vertex));
        edges = graph.addEdges(edges, false);
        edges.forEach(edge -> System.out.println(edge));
        hugeClient.close();
    }
}
4.4 Run The Example
Before running the example, you need to start the Server. For the startup process, see HugeGraph-Server Quick Start.
4.5 More Information About Client-API
5.2 - HugeGraph Python Client Quick Start
The hugegraph-python-client is a Python client/SDK for HugeGraph Database.
It is used to define graph structures, perform CRUD operations on graph data, manage schemas, and execute Gremlin queries. Both the hugegraph-llm and hugegraph-ml modules depend on this foundational library.
Installation
Install the released package (Stable)
To install the hugegraph-python-client, you can use uv/pip or source code building:
# uv is optional, you can use pip directly
uv pip install hugegraph-python # Note: this may not be the latest version; installing from source is recommended
# WIP: we will use 'hugegraph-python-client' as the package name soon
Install from Source (Latest Code)
To install from the source, clone the repository and install the required dependencies:
git clone https://github.com/apache/incubator-hugegraph-ai.git
cd incubator-hugegraph-ai/hugegraph-python-client
# Normal install 
uv pip install .
# (Optional) install the devel version
uv pip install -e .
Usage
Defining Graph Structures
You can use the hugegraph-python-client to define graph structures. Below is an example of how to define a graph:
from pyhugegraph.client import PyHugeClient
# Initialize the client
# For HugeGraph API version ≥ v3: (Or enable graphspace function)  
# - The 'graphspace' parameter becomes relevant if graphspaces are enabled.(default name is 'DEFAULT')
# - Otherwise, the graphspace parameter is optional and can be ignored. 
client = PyHugeClient("127.0.0.1", "8080", user="admin", pwd="admin", graph="hugegraph", graphspace="DEFAULT")
'''
Note:
Refer to the official REST-API doc of your HugeGraph version for accurate details.
If an API does not behave as expected, please submit an issue or contact us.
'''
schema = client.schema()
schema.propertyKey("name").asText().ifNotExist().create()
schema.propertyKey("birthDate").asText().ifNotExist().create()
schema.vertexLabel("Person").properties("name", "birthDate").usePrimaryKeyId().primaryKeys("name").ifNotExist().create()
schema.vertexLabel("Movie").properties("name").usePrimaryKeyId().primaryKeys("name").ifNotExist().create()
schema.edgeLabel("ActedIn").sourceLabel("Person").targetLabel("Movie").ifNotExist().create()
print(schema.getVertexLabels())
print(schema.getEdgeLabels())
print(schema.getRelations())
# Init Graph
g = client.graph()
v_al_pacino = g.addVertex("Person", {"name": "Al Pacino", "birthDate": "1940-04-25"})
v_robert = g.addVertex("Person", {"name": "Robert De Niro", "birthDate": "1943-08-17"})
v_godfather = g.addVertex("Movie", {"name": "The Godfather"})
v_godfather2 = g.addVertex("Movie", {"name": "The Godfather Part II"})
v_godfather3 = g.addVertex("Movie", {"name": "The Godfather Coda The Death of Michael Corleone"})
g.addEdge("ActedIn", v_al_pacino.id, v_godfather.id, {})
g.addEdge("ActedIn", v_al_pacino.id, v_godfather2.id, {})
g.addEdge("ActedIn", v_al_pacino.id, v_godfather3.id, {})
g.addEdge("ActedIn", v_robert.id, v_godfather2.id, {})
res = g.getVertexById(v_al_pacino.id).label
print(res)
g.close()
Schema Management
The hugegraph-python-client provides comprehensive schema management capabilities.
Define Property Keys
# Define a property key
client.schema().propertyKey('name').dataType('STRING').cardinality('SINGLE').create()
Define Vertex Labels
# Define a vertex label
client.schema().vertexLabel('person').properties('name', 'age').primaryKeys('name').create()
Define Edge Labels
# Define an edge label
client.schema().edgeLabel('knows').sourceLabel('person').targetLabel('person').properties('since').create()
Define Index Labels
# Define an index label
client.schema().indexLabel('personByName').onV('person').by('name').secondary().create()
CRUD Operations
The client allows you to perform CRUD operations on the graph data. Below are examples of how to create, read, update, and delete vertices and edges:
Create Vertices and Edges
# Create vertices
v1 = client.graph().addVertex('person').property('name', 'John').property('age', 29).create()
v2 = client.graph().addVertex('person').property('name', 'Jane').property('age', 25).create()
# Create an edge
edge = client.graph().addEdge(v1, 'knows', v2).property('since', '2020').create()
Read Vertices and Edges
# Get a vertex by ID
vertex = client.graph().getVertexById(v1.id)
print(vertex)
# Get an edge by ID
edge = client.graph().getEdgeById(edge.id)
print(edge)
Update Vertices and Edges
# Update a vertex
client.graph().updateVertex(v1.id).property('age', 30).update()
# Update an edge
client.graph().updateEdge(edge.id).property('since', '2021').update()
Delete Vertices and Edges
# Delete a vertex
client.graph().deleteVertex(v1.id)
# Delete an edge
client.graph().deleteEdge(edge.id)
Execute Gremlin Queries
The client also supports executing Gremlin queries:
# Execute a Gremlin query
g = client.gremlin()
res = g.exec("g.V().limit(5)")
print(res)
Other documentation is still under construction 🚧 (contributions are welcome; users can refer to the java-client doc for similar usage).
Contributing
- Welcome to contribute to hugegraph-python-client. Please see the Guidelines for more information.
- Code format: Please run ./style/code_format_and_analysis.sh to format your code before submitting a PR.
Thank you to all the people who already contributed to hugegraph-python-client!
Contact Us
- GitHub Issues: Feedback on usage issues and functional requirements (quick response)
5.3 - HugeGraph Go Client Quick Start
A HugeGraph Client SDK tool based on the Go language.
Software Architecture
(Software architecture description)
Installation Tutorial
go get github.com/apache/incubator-hugegraph-toolchain/hugegraph-client-go
Implemented APIs
| API | Description | 
|---|---|
| schema | Get schema information | 
| version | Get version information | 
Usage Instructions
1. Initialize the Client
package main
import (
	"log"
	"os"
	"github.com/apache/incubator-hugegraph-toolchain/hugegraph-client-go"
	"github.com/apache/incubator-hugegraph-toolchain/hugegraph-client-go/hgtransport"
)
func main() {
	client, err := hugegraph.NewCommonClient(hugegraph.Config{
		Host:     "127.0.0.1",
		Port:     8080,
		Graph:    "hugegraph",
		Username: "", // Fill in the username according to the actual situation
		Password: "", // Fill in the password according to the actual situation
		Logger: &hgtransport.ColorLogger{
			Output:             os.Stdout,
			EnableRequestBody:  true,
			EnableResponseBody: true,
		},
	})
	if err != nil {
		log.Fatalf("Error creating the client: %s\n", err)
	}
	// Use the client for operations...
	_ = client // Avoid the "declared and not used" compile error
}
2. Get HugeGraph Version
Get Version Information Using SDK
package main
import (
	"fmt"
	"log"
	"os"
	"github.com/apache/incubator-hugegraph-toolchain/hugegraph-client-go"
	"github.com/apache/incubator-hugegraph-toolchain/hugegraph-client-go/hgtransport"
)
// initClient initializes and returns a HugeGraph client instance
func initClient() *hugegraph.CommonClient {
	client, err := hugegraph.NewCommonClient(hugegraph.Config{
		Host:     "127.0.0.1",
		Port:     8080,
		Graph:    "hugegraph",
		Username: "",
		Password: "",
		Logger: &hgtransport.ColorLogger{
			Output:             os.Stdout,
			EnableRequestBody:  true,
			EnableResponseBody: true,
		},
	})
	if err != nil {
		log.Fatalf("Error creating the client: %s\n", err)
	}
	return client
}
func getVersion() {
	client := initClient()
	_ = client // Unused while the call below is simulated; avoids the "declared and not used" compile error
	// Assume client has a Version method that returns version information and an error
	// res, err := client.Version() // Actual call
	// The response is simulated here because the client.Version() return type in the original README does not fully match this usage
	type VersionInfo struct {
		Versions struct {
			Version string `json:"version"`
			Core    string `json:"core"`
			Gremlin string `json:"gremlin"`
			API     string `json:"api"`
		} `json:"versions"`
		// Body io.ReadCloser // Assume there is a Body to close, adjust according to the actual SDK
	}
	// Simulate API call and return
	res := &VersionInfo{
		Versions: struct {
			Version string `json:"version"`
			Core    string `json:"core"`
			Gremlin string `json:"gremlin"`
			API     string `json:"api"`
		}{
			Version: "1.0.0", // Example version
			Core:    "1.0.0",
			Gremlin: "3.x.x",
			API:     "v1",
		},
	}
	// err := error(nil) // Assume no error
	// if err != nil {
	// 	log.Fatalf("Error getting the response: %s\n", err)
	// }
	// defer res.Body.Close() // If there is a Body, it needs to be closed
	fmt.Println(res.Versions)
	fmt.Println(res.Versions.Version)
}
func main() {
	getVersion()
}
Structure of the Return Value
package main
// VersionResponse defines the structure returned by the version API
type VersionResponse struct {
	Versions struct {
		Version string `json:"version"` // hugegraph version
		Core    string `json:"core"`    // hugegraph core version
		Gremlin string `json:"gremlin"` // hugegraph gremlin version
		API     string `json:"api"`     // hugegraph api version
	} `json:"versions"`
}
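To show how the JSON tags on this structure are used, the following minimal sketch decodes a JSON payload into `VersionResponse` with the standard `encoding/json` package. The `parseVersion` helper is hypothetical and the sample payload is illustrative only; it is not real server output, and the SDK may decode responses differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// VersionResponse mirrors the structure shown above.
type VersionResponse struct {
	Versions struct {
		Version string `json:"version"`
		Core    string `json:"core"`
		Gremlin string `json:"gremlin"`
		API     string `json:"api"`
	} `json:"versions"`
}

// parseVersion (hypothetical helper) decodes a raw JSON payload
// into a VersionResponse.
func parseVersion(data []byte) (*VersionResponse, error) {
	var resp VersionResponse
	if err := json.Unmarshal(data, &resp); err != nil {
		return nil, fmt.Errorf("decoding version response: %w", err)
	}
	return &resp, nil
}

func main() {
	// Illustrative payload with placeholder values.
	sample := []byte(`{"versions":{"version":"1.0.0","core":"1.0.0","gremlin":"3.x.x","api":"v1"}}`)

	resp, err := parseVersion(sample)
	if err != nil {
		log.Fatalf("Error parsing the response: %s\n", err)
	}
	fmt.Println(resp.Versions.Version)
	fmt.Println(resp.Versions.Gremlin)
}
```

Fields missing from the payload are left as empty strings, so callers that depend on a particular field may want to check for that case.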