Welcome to HugeGraph docs
Documentation
- 1: Introduction with HugeGraph
- 2: Download Apache HugeGraph (Incubating)
- 3: Quick Start
- 3.1: HugeGraph-Server Quick Start
- 3.2: HugeGraph-Loader Quick Start
- 3.3: HugeGraph-Hubble Quick Start
- 3.4: HugeGraph-Client Quick Start
- 3.5: HugeGraph-AI Quick Start
- 3.6: HugeGraph-Tools Quick Start
- 3.7: HugeGraph-Computer Quick Start
- 4: Config
- 4.1: HugeGraph configuration
- 4.2: HugeGraph Config Options
- 4.3: Built-in User Authentication and Authorization Configuration and Usage in HugeGraph
- 4.4: Configuring HugeGraphServer to Use HTTPS Protocol
- 4.5: HugeGraph-Computer Config
- 5: API
- 5.1: HugeGraph RESTful API
- 5.1.1: Schema API
- 5.1.2: PropertyKey API
- 5.1.3: VertexLabel API
- 5.1.4: EdgeLabel API
- 5.1.5: IndexLabel API
- 5.1.6: Rebuild API
- 5.1.7: Vertex API
- 5.1.8: Edge API
- 5.1.9: Traverser API
- 5.1.10: Rank API
- 5.1.11: Variable API
- 5.1.12: Graphs API
- 5.1.13: Task API
- 5.1.14: Gremlin API
- 5.1.15: Cypher API
- 5.1.16: Authentication API
- 5.1.17: Metrics API
- 5.1.18: Other API
- 5.2: HugeGraph Java Client
- 5.3: Gremlin-Console
- 6: GUIDES
- 6.1: HugeGraph Architecture Overview
- 6.2: HugeGraph Design Concepts
- 6.3: HugeGraph Plugin mechanism and plug-in extension process
- 6.4: Backup and Restore
- 6.5: FAQ
- 6.6: Security Report
- 7: QUERY LANGUAGE
- 7.1: HugeGraph Gremlin
- 7.2: HugeGraph Examples
- 8: PERFORMANCE
- 8.1: HugeGraph BenchMark Performance
- 8.2: HugeGraph-API Performance
- 8.2.1: v0.5.6 Stand-alone(RocksDB)
- 8.2.2: v0.5.6 Cluster(Cassandra)
- 8.3: HugeGraph-Loader Performance
- 9: Contribution Guidelines
- 9.1: How to Contribute to HugeGraph
- 9.2: Subscribe Mailing Lists
- 9.3: Validate Apache Release
- 9.4: Setup Server in IDEA (Dev)
- 9.5: Apache HugeGraph Committer Guide
- 10: CHANGELOGS
- 10.1: HugeGraph 1.0.0 Release Notes
- 10.2: HugeGraph 1.2.0 Release Notes
- 10.3: HugeGraph 1.3.0 Release Notes
- 10.4: HugeGraph 1.5.0 Release Notes
1 - Introduction with HugeGraph
Summary
Apache HugeGraph is an easy-to-use, efficient, general-purpose open source graph database system (Graph Database, GitHub project address). It implements the Apache TinkerPop3 framework and is fully compatible with the Gremlin query language. With complete toolchain components, it helps users easily build applications and products based on graph databases. HugeGraph supports fast import of more than 10 billion vertices and edges, provides millisecond-level relational query capability (OLTP), and supports large-scale distributed graph computing (OLAP).
Typical application scenarios of HugeGraph include deep relationship exploration, association analysis, path search, feature extraction, data clustering, community detection, knowledge graphs, etc. It is applicable to business fields such as network security, telecommunication fraud, financial risk control, advertising recommendation, social networks, and intelligent robots.
Features
HugeGraph supports graph operations in online and offline environments, supports batch import of data, supports efficient analysis of complex relationships, and can be seamlessly integrated with big data platforms. HugeGraph supports multi-user parallel operations. Users can enter Gremlin query statements and get graph query results in real time, or call the HugeGraph API in their own programs for graph analysis or query.
This system has the following features:
- Ease of use: HugeGraph supports the Gremlin graph query language and RESTful APIs, provides common interfaces for graph retrieval, and has a fully featured set of peripheral tools, making it easy to implement various graph-based query and analysis operations.
- Efficiency: HugeGraph is deeply optimized for graph storage and graph computing, and provides a variety of batch import tools that can easily complete the rapid import of tens of billions of records; optimized queries achieve millisecond-level response for graph retrieval. It supports simultaneous online real-time operations by thousands of users.
- Universal: HugeGraph supports the Apache Gremlin standard graph query language and the Property Graph standard graph modeling method, supports both graph-based OLTP and OLAP schemes, and integrates with the Apache Hadoop and Apache Spark big data platforms.
- Scalable: supports distributed storage, multiple data replicas, and horizontal scaling; comes with multiple built-in backend storage engines, and the backend storage engine can easily be extended through plug-ins.
- Open: HugeGraph's code is open source (Apache 2 License); users can modify and customize it independently and selectively give back to the open-source community.
The functions of this system include but are not limited to:
- Supports batch import of data from multiple data sources (including local files, HDFS files, MySQL databases, and other data sources), and supports import of multiple file formats (including TXT, CSV, JSON, and other formats)
- Provides a visual operation interface for operating, analyzing, and displaying graphs, lowering the barrier to entry for users
- Optimized graph interface: shortest path (Shortest Path), K-step connected subgraph (K-neighbor), K-step to reach the adjacent point (K-out), personalized recommendation algorithm PersonalRank, etc.
- Implemented based on the Apache TinkerPop3 framework, supports Gremlin graph query language
- Support attribute graph, attributes can be added to vertices and edges, and support rich attribute types
- Has independent schema metadata information, has powerful graph modeling capabilities, and facilitates third-party system integration
- Supports multiple vertex ID strategies: primary-key ID, automatic ID generation, user-defined string ID, and user-defined numeric ID
- The attributes of edges and vertices can be indexed to support precise query, range query, and full-text search
- The storage system adopts a plug-in method, supporting RocksDB (standalone/cluster), Cassandra, ScyllaDB, HBase, MySQL, PostgreSQL, Palo and Memory, etc.
- Integrated with big data systems such as HDFS, Spark/Flink, GraphX, etc., supports BulkLoad operation to import massive data.
- Supports HA(high availability), multiple data replicas, backup and recovery, monitoring, distributed Trace, etc.
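As a quick taste of the Gremlin and RESTful interfaces mentioned above, the sketch below sends a Gremlin statement to a locally running server over HTTP. It assumes the default graph name hugegraph and port 8080; the endpoint path may differ across versions, so treat it as illustrative:
# Query the first 3 vertices through the Gremlin API of a local server
curl -s -X POST "http://localhost:8080/gremlin" \
  -H "Content-Type: application/json" \
  -d '{"gremlin": "hugegraph.traversal().V().limit(3)"}'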
Modules
- HugeGraph-Server: HugeGraph-Server is the core part of the HugeGraph project, containing Core, Backend, API and other submodules;
- Core: Implements the graph engine, connects to the Backend module downwards, and supports the API module upwards;
- Backend: Implements the storage of graph data to the backend, supports backends including Memory, Cassandra, ScyllaDB, RocksDB, HBase, MySQL and PostgreSQL, users can choose one according to the actual situation;
- API: Built-in REST Server provides RESTful API to users and is fully compatible with Gremlin queries. (Supports distributed storage and computation pushdown)
- HugeGraph-Toolchain: (Toolchain)
- HugeGraph-Client: HugeGraph-Client provides a RESTful API client for connecting to HugeGraph-Server, currently only the Java version is implemented, users of other languages can implement it themselves;
- HugeGraph-Loader: HugeGraph-Loader is a data import tool based on HugeGraph-Client, which transforms ordinary text data into vertices and edges of the graph and inserts them into the graph database;
- HugeGraph-Hubble: HugeGraph-Hubble is HugeGraph’s Web visualization management platform, a one-stop visual analysis platform; it covers the whole process from data modeling, to fast data import, to online and offline analysis of data, and unified management of the graph;
- HugeGraph-Tools: HugeGraph-Tools is HugeGraph’s deployment and management tool, including graph management, backup/recovery, Gremlin execution and other functions.
- HugeGraph-Computer: HugeGraph-Computer is a distributed graph processing system (OLAP). It is an implementation of Pregel. It can run on clusters such as Kubernetes/Yarn, and supports large-scale graph computing.
- HugeGraph-AI: HugeGraph-AI is HugeGraph’s independent AI component, providing training and inference functions of graph neural networks, LLM/Graph RAG combination/Python-Client and other related components, continuously updating.
Contact Us
- GitHub Issues: Feedback on usage issues and functional requirements (quick response)
- Feedback Email: dev@hugegraph.apache.org (subscribers only)
- Security Email: security@hugegraph.apache.org (Report SEC problems)
- WeChat public account: Apache HugeGraph, welcome to scan this QR code to follow us.
2 - Download Apache HugeGraph (Incubating)
Instructions:
- It is recommended to use the latest version of the HugeGraph software package. Please use Java 11 as the runtime environment.
- To verify downloads, use the corresponding hash (SHA512), signature, and Project Signature Verification KEYS.
- Instructions for checking hash (SHA512) and signatures are on the Validate Release page, and you can also refer to ASF official instructions.
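For example, a minimal signature check might look like the sketch below. It assumes the standard ASF download layout with the .asc signature file next to the tarball; replace {version} with the actual version:
# Import the project's release KEYS, then verify the detached signature
wget https://downloads.apache.org/incubator/hugegraph/KEYS
gpg --import KEYS
gpg --verify apache-hugegraph-incubating-{version}.tar.gz.asc \
             apache-hugegraph-incubating-{version}.tar.gz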
Note: The version numbers of all HugeGraph components are kept consistent, and the Maven artifacts such as client/loader/hubble/common share the same version number. You can refer to the maven example for dependency references.
Latest Version 1.5.0
Note: Starting from version 1.5.0, a Java 11 runtime environment is required.
- Release Date: 2024-12-10
- Release Notes
Binary Packages
Server | Toolchain |
---|---|
[Binary] [Sign] [SHA512] | [Binary] [Sign] [SHA512] |
Source Packages
Please refer to build from source.
Server | Toolchain | AI | Computer |
---|---|---|---|
[Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] |
Archived Versions
Note: 1.3.0 is the last major version compatible with Java 8. Please switch or migrate to Java 11 as soon as possible (lower Java versions have more potential security risks and performance impacts).
1.3.0
- Release Date: 2024-04-01
- Release Notes
Binary Packages
Server | Toolchain |
---|---|
[Binary] [Sign] [SHA512] | [Binary] [Sign] [SHA512] |
Source Packages
Server | Toolchain | AI | Common |
---|---|---|---|
[Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] |
1.2.0
- Release Date: 2023-12-28
- Release Notes
Binary Packages
Server | Toolchain |
---|---|
[Binary] [Sign] [SHA512] | [Binary] [Sign] [SHA512] |
Source Packages
Server | Toolchain | Computer | Common |
---|---|---|---|
[Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] |
1.0.0
- Release Date: 2023-02-22
- Release Notes
Binary Packages
Server | Toolchain | Computer |
---|---|---|
[Binary] [Sign] [SHA512] | [Binary] [Sign] [SHA512] | [Binary] [Sign] [SHA512] |
Source Packages
Server | Toolchain | Computer | Common |
---|---|---|---|
[Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] | [Source] [Sign] [SHA512] |
3 - Quick Start
3.1 - HugeGraph-Server Quick Start
1 HugeGraph-Server Overview
HugeGraph-Server is the core part of the HugeGraph project, containing submodules such as Core, Backend, and API.
The Core module is an implementation of the TinkerPop interface; the Backend module is used to save the graph data to the data store, and currently supported backends include Memory, Cassandra, ScyllaDB, and RocksDB; the API module provides the HTTP Server, which converts the client's HTTP requests into calls to the Core module.
Two spellings, HugeGraph-Server and HugeGraphServer, appear in the document (other modules are similar). There is no big difference in meaning; they can be distinguished as follows: HugeGraph-Server refers to the code of the server-related components, while HugeGraphServer refers to the service process.
2 Dependency for Building/Running
2.1 Install Java 11 (JDK 11)
You need to use Java 11 to run HugeGraph-Server (versions before 1.5.0 are compatible with Java 8, but this is not recommended), and configure the environment yourself.
Be sure to execute the java -version command to check the JDK version before proceeding.
Note: Using Java 8 loses some security guarantees. We recommend using Java 11 in production or in environments exposed to the public network, with Auth authentication enabled.
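A quick sanity check; the exact banner varies by JDK distribution, so treat the output below as illustrative only:
java -version
# A Java 11 runtime typically prints something like:
#   openjdk version "11.0.x" ...
# If it reports version "1.8.x", you are still on Java 8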
3 Deploy
There are four ways to deploy HugeGraph-Server components:
- Method 1: Use Docker container (Convenient for Test/Dev)
- Method 2: Download the binary tarball
- Method 3: Source code compilation
- Method 4: One-click deployment
3.1 Use Docker container (Convenient for Test/Dev)
You can refer to Docker deployment guide.
We can use docker run -itd --name=graph -p 8080:8080 hugegraph/hugegraph:1.5.0 to quickly start a HugeGraph server with RocksDB in the background.
Optional:
- use docker exec -it graph bash to enter the container and perform operations.
- use docker run -itd --name=graph -p 8080:8080 -e PRELOAD="true" hugegraph/hugegraph:1.5.0 to start with a built-in example graph. We can use the RESTful API to verify the result; the detailed steps are described in 5.1.7.
- use -e PASSWORD=123456 to enable auth mode and set the password for admin. You can find more details in Config Authentication.
If you use Docker Desktop, you can set these options in the container settings.
Also, if we want to manage other HugeGraph-related instances in one file, we can use docker-compose to deploy, with the command docker-compose up -d (you can configure only server). Here is an example docker-compose.yml:
version: '3'
services:
server:
image: hugegraph/hugegraph:1.5.0
container_name: server
# environment:
# - PRELOAD=true
      # PRELOAD is an option to preload a built-in sample graph when initializing.
# - PASSWORD=123456
# PASSWORD is an option to enable auth mode with the password you set.
ports:
- 8080:8080
Note:
The docker image of hugegraph is a convenience release to start hugegraph quickly, but not an official distribution artifact. You can find more details in the ASF Release Distribution Policy.
We recommend using a release tag (like 1.5.0) for the stable version. Use the latest tag to experience the newest functions in development.
3.2 Download the binary tarball
You could download the binary tarball from the download page of ASF site like this:
# use the latest version, here is 1.5.0 for example
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}.tar.gz
tar zxf *hugegraph*.tar.gz
# (Optional) verify the integrity with SHA512 (recommended)
shasum -a 512 apache-hugegraph-incubating-{version}.tar.gz
curl https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}.tar.gz.sha512
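To actually compare the two values, a small sketch like the following can help. It assumes the published .sha512 file starts with the hex digest; published layouts can vary, so adjust the parsing if needed:
# Fail loudly if the locally computed digest differs from the published one
local_sum=$(shasum -a 512 apache-hugegraph-incubating-{version}.tar.gz | awk '{print $1}')
remote_sum=$(curl -s https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-{version}.tar.gz.sha512 | awk '{print $1}')
[ "$local_sum" = "$remote_sum" ] && echo "checksum OK" || echo "checksum MISMATCH"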
3.3 Source code compilation
Please ensure that the wget command is installed before compiling the source code
There are two ways to get the HugeGraph source code (the same applies to the other HugeGraph repos/modules):
- download the stable/release version from the ASF site
- clone the unstable/latest version by GitBox(ASF) or GitHub
# Way 1. download release package from the ASF site
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-src-{version}.tar.gz
tar zxf *hugegraph*.tar.gz
# (Optional) verify the integrity with SHA512 (recommended)
shasum -a 512 apache-hugegraph-incubating-src-{version}.tar.gz
curl https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-incubating-src-{version}.tar.gz.sha512
# Way 2: clone the latest code via git (e.g. from GitHub)
git clone https://github.com/apache/hugegraph.git
Compile and generate tarball
cd *hugegraph
# (Optional) use "-P stage" param if you build failed with the latest code(during pre-release period)
mvn package -DskipTests -ntp
The execution log is as follows:
......
[INFO] Reactor Summary for hugegraph 1.5.0:
[INFO]
[INFO] hugegraph .......................................... SUCCESS [ 2.405 s]
[INFO] hugegraph-core ..................................... SUCCESS [ 13.405 s]
[INFO] hugegraph-api ...................................... SUCCESS [ 25.943 s]
[INFO] hugegraph-cassandra ................................ SUCCESS [ 54.270 s]
[INFO] hugegraph-scylladb ................................. SUCCESS [ 1.032 s]
[INFO] hugegraph-rocksdb .................................. SUCCESS [ 34.752 s]
[INFO] hugegraph-mysql .................................... SUCCESS [ 1.778 s]
[INFO] hugegraph-palo ..................................... SUCCESS [ 1.070 s]
[INFO] hugegraph-hbase .................................... SUCCESS [ 32.124 s]
[INFO] hugegraph-postgresql ............................... SUCCESS [ 1.823 s]
[INFO] hugegraph-dist ..................................... SUCCESS [ 17.426 s]
[INFO] hugegraph-example .................................. SUCCESS [ 1.941 s]
[INFO] hugegraph-test ..................................... SUCCESS [01:01 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
......
After successful execution, *hugegraph-*.tar.gz files will be generated in the hugegraph directory; these are the tarballs produced by the compilation.
3.4 One-click deployment
HugeGraph-Tools provides a command-line tool for one-click deployment. Users can use this tool to quickly download, decompress, configure, and start HugeGraphServer and HugeGraph-Hubble with one click.
Of course, you should download the HugeGraph-Toolchain tarball first.
# download toolchain binary package, it includes loader + tool + hubble
# please check the latest version (e.g. here is 1.5.0)
wget https://downloads.apache.org/incubator/hugegraph/1.5.0/apache-hugegraph-toolchain-incubating-1.5.0.tar.gz
tar zxf *hugegraph-*.tar.gz
# enter the tool's package
cd *hugegraph*/*tool*
Note: ${version} is the version number. The latest version can be found on the Download page, or you can click the link there to download directly.
The general entry script for HugeGraph-Tools is bin/hugegraph. Users can use the help command to view its usage; here only the one-click deployment command is introduced.
bin/hugegraph deploy -v {hugegraph-version} -p {install-path} [-u {download-path-prefix}]
{hugegraph-version} indicates the version of HugeGraphServer and HugeGraphStudio to deploy (users can check the conf/version-mapping.yaml file for version information); {install-path} specifies the installation directory of HugeGraphServer and HugeGraphStudio; {download-path-prefix} is optional and specifies the download address of the HugeGraphServer and HugeGraphStudio tarballs (the default download URL is used if not provided). For example, to deploy HugeGraph-Server and HugeGraphStudio version 0.6, write the above command as bin/hugegraph deploy -v 0.6 -p services.
4 Config
If you just need to quickly start HugeGraph for testing, you only need to modify a few configuration items (see the next section). For a detailed configuration introduction, please refer to the configuration document and the introduction to configuration items.
5 Startup
5.1 Use a startup script to startup
The startup is divided into “first startup” and “subsequent startup.” This distinction exists because the backend database needs to be initialized before the first startup, after which the service can be started. After the service has been stopped manually, or needs to be started again for other reasons, the service can be started directly, since the backend database is persistent.
When HugeGraphServer starts, it will connect to the backend storage and try to check the version number of the backend storage. If the backend is not initialized or the backend has been initialized but the version does not match (old version data), HugeGraphServer will fail to start and give an error message.
If you need to access HugeGraphServer externally, please modify the restserver.url configuration item in rest-server.properties (default is http://127.0.0.1:8080), changing it to the machine name or IP address.
Since the configuration (hugegraph.properties) and startup steps required by various backends are slightly different, the following will introduce the configuration and startup of each backend one by one.
If you want to use HugeGraph authentication mode, you should follow the Server Authentication Configuration before you start Server later.
5.1.1 Memory
Click to expand/collapse Memory configuration and startup methods
Update hugegraph.properties
backend=memory
serializer=text
The data of the Memory backend is stored in memory and cannot be persisted. The backend does not need to be initialized; this is the only backend that requires no initialization.
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
The prompted url is the same as the restserver.url configured in rest-server.properties
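To double-check that the service is reachable, you can query the graphs endpoint printed above (a quick sketch; the exact response shape may vary by version):
curl http://127.0.0.1:8080/graphs
# A running server typically answers with the list of graphs, e.g. {"graphs": ["hugegraph"]}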
5.1.2 RocksDB
Click to expand/collapse RocksDB configuration and startup methods
RocksDB is an embedded database that does not require manual installation and deployment. GCC version >= 4.3.0 (GLIBCXX_3.4.10) is required. If not, GCC needs to be upgraded in advance
Update hugegraph.properties
backend=rocksdb
serializer=binary
rocksdb.data_path=.
rocksdb.wal_path=.
Initialize the database (required on first startup, or when a new configuration has been manually added under conf/graphs/)
cd *hugegraph-${version}
bin/init-store.sh
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
5.1.3 Cassandra
Click to expand/collapse Cassandra configuration and startup methods
Users need to install Cassandra themselves, requiring version 3.0 or above. Download link
Update hugegraph.properties
backend=cassandra
serializer=cassandra
# cassandra backend config
cassandra.host=localhost
cassandra.port=9042
cassandra.username=
cassandra.password=
#cassandra.connect_timeout=5
#cassandra.read_timeout=20
#cassandra.keyspace.strategy=SimpleStrategy
#cassandra.keyspace.replication=3
Initialize the database (required on first startup, or when a new configuration has been manually added under conf/graphs/)
cd *hugegraph-${version}
bin/init-store.sh
Initing HugeGraph Store...
2017-12-01 11:26:51 1424 [main] [INFO ] org.apache.hugegraph.HugeGraph [] - Opening backend store: 'cassandra'
2017-12-01 11:26:52 2389 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph, try init keyspace later
2017-12-01 11:26:52 2472 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph, try init keyspace later
2017-12-01 11:26:52 2557 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph, try init keyspace later
2017-12-01 11:26:53 2797 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_graph
2017-12-01 11:26:53 2945 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_schema
2017-12-01 11:26:53 3044 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_index
2017-12-01 11:26:53 3046 [pool-3-thread-1] [INFO ] org.apache.hugegraph.backend.Transaction [] - Clear cache on event 'store.init'
2017-12-01 11:26:59 9720 [main] [INFO ] org.apache.hugegraph.HugeGraph [] - Opening backend store: 'cassandra'
2017-12-01 11:27:00 9805 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph1, try init keyspace later
2017-12-01 11:27:00 9886 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph1, try init keyspace later
2017-12-01 11:27:00 9955 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Failed to connect keyspace: hugegraph1, try init keyspace later
2017-12-01 11:27:00 10175 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_graph
2017-12-01 11:27:00 10321 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_schema
2017-12-01 11:27:00 10413 [main] [INFO ] org.apache.hugegraph.backend.store.cassandra.CassandraStore [] - Store initialized: huge_index
2017-12-01 11:27:00 10413 [pool-3-thread-1] [INFO ] org.apache.hugegraph.backend.Transaction [] - Clear cache on event 'store.init'
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
5.1.4 ScyllaDB
Click to expand/collapse ScyllaDB configuration and startup methods
Users need to install ScyllaDB themselves; version 2.1 or above is recommended. Download link
Update hugegraph.properties
backend=scylladb
serializer=scylladb
# cassandra backend config
cassandra.host=localhost
cassandra.port=9042
cassandra.username=
cassandra.password=
#cassandra.connect_timeout=5
#cassandra.read_timeout=20
#cassandra.keyspace.strategy=SimpleStrategy
#cassandra.keyspace.replication=3
Since ScyllaDB itself is an “optimized version” of Cassandra, users who do not have ScyllaDB installed can also use Cassandra directly as the backend storage: just set the backend and serializer to scylladb, and point the host and port to the seeds and port of the Cassandra cluster. However, this is not recommended, as it does not take advantage of ScyllaDB itself.
Initialize the database (required on first startup, or when a new configuration has been manually added under conf/graphs/)
cd *hugegraph-${version}
bin/init-store.sh
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
5.1.5 HBase
Click to expand/collapse HBase configuration and startup methods
Users need to install HBase themselves, requiring version 2.0 or above. Download link
Update hugegraph.properties
backend=hbase
serializer=hbase
# hbase backend config
hbase.hosts=localhost
hbase.port=2181
# Note: recommend to modify the HBase partition number by the actual/env data amount & RS amount before init store
# it may influence the loading speed a lot
#hbase.enable_partition=true
#hbase.vertex_partitions=10
#hbase.edge_partitions=30
Initialize the database (required on first startup, or when a new configuration has been manually added under conf/graphs/)
cd *hugegraph-${version}
bin/init-store.sh
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
For other backend configurations, please refer to the introduction to configuration options
5.1.6 MySQL
Click to expand/collapse MySQL configuration and startup methods
Since MySQL is under the GPL license, which is not compatible with the Apache License, users need to install MySQL themselves. Download Link
Download MySQL’s driver package (https://repo1.maven.org/maven2/mysql/mysql-connector-java/), such as mysql-connector-java-8.0.30.jar, and put it into HugeGraph-Server’s lib directory.
Modify hugegraph.properties to configure the database URL, username, and password. store is the database name; if it does not exist, it will be created automatically.
backend=mysql
serializer=mysql
store=hugegraph
# mysql backend config
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://127.0.0.1:3306
jdbc.username=
jdbc.password=
jdbc.reconnect_max_times=3
jdbc.reconnect_interval=3
jdbc.ssl_mode=false
Initialize the database (required on first startup, or when a new configuration has been manually added under conf/graphs/)
cd *hugegraph-${version}
bin/init-store.sh
Start server
bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)....OK
5.1.7 Create an example graph at startup
Pass the -p true argument when starting the script, which indicates preload, to create a sample graph.
bin/start-hugegraph.sh -p true
Starting HugeGraphServer in daemon mode...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)......OK
Then use the RESTful API to request HugeGraphServer and get the following result:
> curl "http://localhost:8080/graphs/hugegraph/graph/vertices" | gunzip
{"vertices":[{"id":"2:lop","label":"software","type":"vertex","properties":{"name":"lop","lang":"java","price":328}},{"id":"1:josh","label":"person","type":"vertex","properties":{"name":"josh","age":32,"city":"Beijing"}},{"id":"1:marko","label":"person","type":"vertex","properties":{"name":"marko","age":29,"city":"Beijing"}},{"id":"1:peter","label":"person","type":"vertex","properties":{"name":"peter","age":35,"city":"Shanghai"}},{"id":"1:vadas","label":"person","type":"vertex","properties":{"name":"vadas","age":27,"city":"Hongkong"}},{"id":"2:ripple","label":"software","type":"vertex","properties":{"name":"ripple","lang":"java","price":199}}]}
This indicates the successful creation of the sample graph.
5.2 Use Docker to startup
In 3.1 Use Docker container, we introduced how to use docker to deploy hugegraph-server. The server can also preload an example graph by setting a parameter.
5.2.1 Use Cassandra as storage
Click to expand/collapse Cassandra configuration and startup methods
When using Docker, we can use Cassandra as the backend storage. We highly recommend using docker-compose directly to manage both the server and Cassandra.
The sample docker-compose.yml
can be obtained on GitHub, and you can start it with docker-compose up -d
. (If using Cassandra 4.0 as the backend storage, it takes approximately two minutes to initialize. Please be patient.)
version: "3"
services:
graph:
image: hugegraph/hugegraph
container_name: cas-server
ports:
- 8080:8080
environment:
hugegraph.backend: cassandra
hugegraph.serializer: cassandra
hugegraph.cassandra.host: cas-cassandra
hugegraph.cassandra.port: 9042
networks:
- ca-network
depends_on:
- cassandra
healthcheck:
test: ["CMD", "bin/gremlin-console.sh", "--" ,"-e", "scripts/remote-connect.groovy"]
interval: 10s
timeout: 30s
retries: 3
cassandra:
image: cassandra:4
container_name: cas-cassandra
ports:
- 7000:7000
- 9042:9042
security_opt:
- seccomp:unconfined
networks:
- ca-network
healthcheck:
test: ["CMD", "cqlsh", "--execute", "describe keyspaces;"]
interval: 10s
timeout: 30s
retries: 5
networks:
ca-network:
volumes:
hugegraph-data:
In this YAML file, configuration parameters related to Cassandra need to be passed as environment variables in the format hugegraph.<parameter_name>.
Specifically, the configuration file hugegraph.properties contains settings like backend=xxx and cassandra.host=xxx. To configure these settings via environment variables, we need to prepend hugegraph. to them, as in hugegraph.backend and hugegraph.cassandra.host.
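For reference, the same environment-variable convention also works with a plain docker run. The sketch below is a hypothetical standalone equivalent of the compose service above; the Cassandra address is an assumption and must point to a reachable cluster:
# cas-cassandra must resolve to your Cassandra host (e.g. join the same docker network)
docker run -itd --name=cas-server -p 8080:8080 \
  -e hugegraph.backend=cassandra \
  -e hugegraph.serializer=cassandra \
  -e hugegraph.cassandra.host=cas-cassandra \
  -e hugegraph.cassandra.port=9042 \
  hugegraph/hugegraph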
The rest of the configuration can be referenced under 4 Config.
5.2.2 Create example graph when starting server
Set the environment variable PRELOAD=true when starting Docker to load data during the execution of the startup script.
- Use docker run
  docker run -itd --name=server -p 8080:8080 -e PRELOAD=true hugegraph/hugegraph:1.5.0
- Use docker-compose
  Create docker-compose.yml as follows. We should set the environment variable PRELOAD=true. example.groovy is a predefined script to preload the sample data. If needed, we can mount a new example.groovy to change the preloaded data.
  version: '3'
  services:
    server:
      image: hugegraph/hugegraph:1.5.0
      container_name: server
      environment:
        - PRELOAD=true
      ports:
        - 8080:8080
  Use docker-compose up -d to start the container.
Then use the RESTful API to request HugeGraphServer and get the following result:
> curl "http://localhost:8080/graphs/hugegraph/graph/vertices" | gunzip
{"vertices":[{"id":"2:lop","label":"software","type":"vertex","properties":{"name":"lop","lang":"java","price":328}},{"id":"1:josh","label":"person","type":"vertex","properties":{"name":"josh","age":32,"city":"Beijing"}},{"id":"1:marko","label":"person","type":"vertex","properties":{"name":"marko","age":29,"city":"Beijing"}},{"id":"1:peter","label":"person","type":"vertex","properties":{"name":"peter","age":35,"city":"Shanghai"}},{"id":"1:vadas","label":"person","type":"vertex","properties":{"name":"vadas","age":27,"city":"Hongkong"}},{"id":"2:ripple","label":"software","type":"vertex","properties":{"name":"ripple","lang":"java","price":199}}]}
This indicates the successful creation of the sample graph.
6 Access server
6.1 Service startup status check
Use jps to check the service process:
jps
6475 HugeGraphServer
Use curl to request the RESTful API:
echo `curl -o /dev/null -s -w %{http_code} "http://localhost:8080/graphs/hugegraph/graph/vertices"`
A return value of 200 means the server has started normally.
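If you script the startup, a small polling loop like this sketch can wait until the server is ready (it assumes the default port and graph name):
# Poll the vertices endpoint until it returns HTTP 200
until [ "$(curl -o /dev/null -s -w '%{http_code}' "http://localhost:8080/graphs/hugegraph/graph/vertices")" = "200" ]; do
  sleep 1
done
echo "HugeGraphServer is up"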
6.2 Request Server
The RESTful API of HugeGraphServer includes various types of resources, typically including graph, schema, gremlin, traverser and task.
- graph contains vertices and edges
- schema contains vertexlabels, propertykeys, edgelabels, and indexlabels
- gremlin contains various Gremlin statements, such as g.V(), which can be executed synchronously or asynchronously
- traverser contains various advanced queries, including shortest paths, intersections, N-step reachable neighbors, etc.
- task contains query and delete operations for asynchronous tasks
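For example, the traverser resource can answer a shortest-path query in one call. The sketch below uses the vertex ids from the built-in sample graph; the ids and the max_depth value are illustrative, and the quotes around the ids are URL-encoded as %22:
# Shortest path between two sample vertices, searching at most 3 hops
curl "http://localhost:8080/graphs/hugegraph/traversers/shortestpath?source=%221:marko%22&target=%222:ripple%22&max_depth=3"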
6.2.1 Get vertices and their related properties in hugegraph
curl http://localhost:8080/graphs/hugegraph/graph/vertices
Explanation
Since there are many vertices and edges in the graph, for list-type requests (such as getting all vertices or all edges) the server compresses the data before returning it. So when you use curl, you get a bunch of garbled characters; you can redirect the output to gunzip for decompression. It is recommended to use the Chrome browser with the Restlet plugin to send HTTP requests for testing.
curl "http://localhost:8080/graphs/hugegraph/graph/vertices" | gunzip
The current default configuration of HugeGraphServer can only be accessed locally, and the configuration can be modified so that it can be accessed on other machines.
vim conf/rest-server.properties
restserver.url=http://0.0.0.0:8080
response body:
{
"vertices": [
{
"id": "2lop",
"label": "software",
"type": "vertex",
"properties": {
"price": [
{
"id": "price",
"value": 328
}
],
"name": [
{
"id": "name",
"value": "lop"
}
],
"lang": [
{
"id": "lang",
"value": "java"
}
]
}
},
{
"id": "1josh",
"label": "person",
"type": "vertex",
"properties": {
"name": [
{
"id": "name",
"value": "josh"
}
],
"age": [
{
"id": "age",
"value": 32
}
]
}
},
...
]
}
For the detailed API, please refer to RESTful-API.
You can also visit localhost:8080/swagger-ui/index.html to check the API.
When using Swagger UI to debug the API provided by HugeGraph, if HugeGraph Server turns on authentication mode, you can enter authentication information on the Swagger page.
Currently, HugeGraph supports setting authentication information in two forms: Basic and Bearer.
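Outside of Swagger, the same Basic credentials can be passed directly on the command line. A sketch; the admin password shown is an assumption, so use whatever you configured:
# Send Basic auth credentials with curl when auth mode is enabled
curl -u admin:123456 "http://localhost:8080/graphs/hugegraph/graph/vertices" | gunzip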
7 Stop Server
$ cd *hugegraph-${version}
$ bin/stop-hugegraph.sh
8 Debug Server with IntelliJ IDEA
Please refer to Setup Server in IDEA
3.2 - HugeGraph-Loader Quick Start
1 HugeGraph-Loader Overview
HugeGraph-Loader is the data import component of HugeGraph, which can convert data from various data sources into graph vertices and edges and import them into the graph database in batches.
Currently supported data sources include:
- Local disk file or directory, supports TEXT, CSV and JSON format files, supports compressed files
- HDFS file or directory supports compressed files
- Mainstream relational databases, such as MySQL, PostgreSQL, Oracle, SQL Server
Local disk files and HDFS files support resumable uploads.
It will be explained in detail below.
Note: HugeGraph-Loader requires HugeGraph Server service, please refer to HugeGraph-Server Quick Start to download and start Server
2 Get HugeGraph-Loader
There are two ways to get HugeGraph-Loader:
- Use docker image (Convenient for Test/Dev)
- Download the compiled tarball
- Clone source code then compile and install
2.1 Use Docker image (Convenient for Test/Dev)
We can deploy the loader service using docker run -itd --name loader hugegraph/loader:1.5.0. The data to be loaded can be copied into the loader container either by mounting -v /path/to/data/file:/loader/file or by using docker cp.
Alternatively, to start the loader using docker-compose, the command is docker-compose up -d
. An example of the docker-compose.yml is as follows:
version: '3'
services:
server:
image: hugegraph/hugegraph:1.3.0
container_name: server
ports:
- 8080:8080
loader:
image: hugegraph/loader:1.3.0
container_name: loader
# mount your own data here
# volumes:
# - /path/to/data/file:/loader/file
The specific data loading process can be referenced under 4.5 Use Docker to load data
Note:
The docker image of hugegraph-loader is a convenience release to start hugegraph-loader quickly, but not official distribution artifacts. You can find more details from ASF Release Distribution Policy.
We recommend using a release tag (like 1.5.0) for the stable version. Use the latest tag to experience the newest functions in development.
2.2 Download the compiled archive
Download the latest version of the HugeGraph-Toolchain release package:
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-toolchain-incubating-{version}.tar.gz
tar zxf *hugegraph*.tar.gz
2.3 Clone source code to compile and install
Clone the latest version of HugeGraph-Loader source package:
# 1. get from github
git clone https://github.com/apache/hugegraph-toolchain.git
# 2. get from direct (e.g. here is 1.0.0, please choose the latest version)
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-toolchain-incubating-{version}-src.tar.gz
Click to expand/collapse how to install ojdbc
Due to the license limitation of Oracle OJDBC, you need to manually install ojdbc into the local maven repository.
Visit the Oracle jdbc downloads page. Select Oracle Database 12c Release 2 (12.2.0.1) drivers, as shown in the following figure.
After opening the link, select “ojdbc8.jar”.
Install ojdbc8 into the local maven repository: enter the directory where ojdbc8.jar is located, and execute the following command.
mvn install:install-file -Dfile=./ojdbc8.jar -DgroupId=com.oracle -DartifactId=ojdbc8 -Dversion=12.2.0.1 -Dpackaging=jar
Compile and generate tar package:
cd hugegraph-loader
mvn clean package -DskipTests
3 How to use
The basic process of using HugeGraph-Loader is divided into the following steps:
- Write graph schema
- Prepare data files
- Write input source map files
- Execute the import command (a typical invocation is sketched right after this list)
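As a preview of the last step, a typical import invocation looks like the sketch below. The file names match the examples used later in this section, and -h/-p point at the HugeGraph-Server address; adjust them to your setup:
# Run the loader against a local server with a schema file and a mapping file
sh bin/hugegraph-loader.sh -g hugegraph \
   -f struct.json -s schema.groovy \
   -h 127.0.0.1 -p 8080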
3.1 Construct graph schema
This step is the modeling process. Users need to have a clear idea of their existing data and the graph model they want to create, and then write the schema to build the graph model.
For example, suppose you want to create a graph with two types of vertices and two types of edges: the vertices are “person” and “software”, and the edges are “person knows person” and “person created software”. These vertices and edges carry some attributes; for example, the vertex “person” has attributes such as “name” and “age”, the vertex “software” has attributes such as “name” and “price”, and the edge “knows” has a “date” attribute, and so on.
graph model example
After designing the graph model, we can use groovy to write the schema definition and save it to a file, here named schema.groovy.
// Create some properties
schema.propertyKey("name").asText().ifNotExist().create();
schema.propertyKey("age").asInt().ifNotExist().create();
schema.propertyKey("city").asText().ifNotExist().create();
schema.propertyKey("date").asText().ifNotExist().create();
schema.propertyKey("price").asDouble().ifNotExist().create();
// Create the person vertex type, which has three attributes: name, age, city, and the primary key is name
schema.vertexLabel("person").properties("name", "age", "city").primaryKeys("name").ifNotExist().create();
// Create a software vertex type, which has two properties: name, price, the primary key is name
schema.vertexLabel("software").properties("name", "price").primaryKeys("name").ifNotExist().create();
// Create the knows edge type, which goes from person to person
schema.edgeLabel("knows").sourceLabel("person").targetLabel("person").ifNotExist().create();
// Create the created edge type, which points from person to software
schema.edgeLabel("created").sourceLabel("person").targetLabel("software").ifNotExist().create();
Please refer to the corresponding section in hugegraph-client for the detailed description of the schema.
3.2 Prepare data
The data sources currently supported by HugeGraph-Loader include:
- local disk file or directory
- HDFS file or directory
- Partial relational database
- Kafka topic
3.2.1 Data source structure
3.2.1.1 Local disk file or directory
The user can specify a local disk file as the data source. If the data is scattered across multiple files, a directory can also be used as the data source, but multiple directories are not yet supported.
For example, if my data is scattered in multiple files part-0, part-1 … part-n, they must all be placed in one directory to perform the import. Then, in the loader’s mapping file, specify path as that directory.
Supported file formats include:
- TEXT
- CSV
- JSON
TEXT is a text file with custom delimiters. The first line is usually the header, recording the name of each column; a file without a header line is also allowed (in which case the column names are specified in the mapping file). Each remaining line represents a record, which will be converted into a vertex/edge; each column of the line corresponds to a field, which will be converted into the id, label, or an attribute of the vertex/edge.
An example is as follows:
id|name|lang|price|ISBN
1|lop|java|328|ISBN978-7-107-18618-5
2|ripple|java|199|ISBN978-7-100-13678-5
CSV is a TEXT file with the comma “,” as the delimiter. When a column value itself contains a comma, the value needs to be enclosed in double quotes, for example:
marko,29,Beijing
"li,nary",26,"Wu,han"
The JSON file requires that each line is a JSON string, and the format of each line needs to be consistent.
{"source_name": "marko", "target_name": "vadas", "date": "20160110", "weight": 0.5}
{"source_name": "marko", "target_name": "josh", "date": "20130220", "weight": 1.0}
3.2.1.2 HDFS file or directory
Users can also specify HDFS files or directories as data sources, all of the above requirements for local disk files or directories
apply here. In addition, since HDFS usually stores compressed files, loader also provides support for compressed files, and local disk file or directory
also supports compressed files.
Currently supported compressed file types include: GZIP, BZ2, XZ, LZMA, SNAPPY_RAW, SNAPPY_FRAMED, Z, DEFLATE, LZ4_BLOCK, LZ4_FRAMED, ORC, and PARQUET.
3.2.1.3 Mainstream relational database
The loader also supports some relational databases as data sources, and currently supports MySQL, PostgreSQL, Oracle, and SQL Server.
However, the requirements for the table structure are currently relatively strict: tables that require an association query during the import process are not allowed. An association query means that after reading a row of the table, the value of a certain column cannot be used directly (for example, a foreign key), and another query is needed to determine the column’s true value.
For example, suppose there are three tables: person, software, and created
// person schema
id | name | age | city
// software schema
id | name | lang | price
// created schema
id | p_id | s_id | date
If the id strategy of person or software is specified as PRIMARY_KEY when modeling the schema, with name chosen as the primary key (note: this is the vertex-label concept in hugegraph), then when importing edge data the ids of the source and target vertices must be spliced together, which requires looking up the corresponding name in the person/software table by p_id/s_id. The loader does not currently support schemas that require such additional queries. In this case, the following two methods can be used instead:
- The id strategy of person and software is still specified as PRIMARY_KEY, but the id column of the person table and software table is used as the primary key attribute of the vertex, so that the id can be generated by directly splicing p_id and s_id with the label of the vertex when importing an edge;
- Specify the id policy of person and software as CUSTOMIZE, and then directly use the id column of the person table and the software table as the vertex id, so that p_id and s_id can be used directly when importing edges;
The key point is to let the edge use p_id and s_id directly, without an extra lookup.
3.2.2 Prepare vertex and edge data
3.2.2.1 Vertex Data
The vertex data file consists of data line by line. Generally, each line is used as a vertex, and each column is used as a vertex attribute. The following description uses CSV format as an example.
- person vertex data (the data itself does not contain a header)
Tom,48,Beijing
Jerry,36,Shanghai
- software vertex data (the data itself contains the header)
name,price
Photoshop,999
Office,388
3.2.2.2 Edge data
The edge data file consists of data line by line. Generally, each line is used as an edge. Some columns are used as the IDs of the source and target vertices, and other columns are used as edge attributes. The following uses JSON format as an example.
- knows edge data
{"source_name": "Tom", "target_name": "Jerry", "date": "2008-12-12"}
- created edge data
{"source_name": "Tom", "target_name": "Photoshop"}
{"source_name": "Tom", "target_name": "Office"}
{"source_name": "Jerry", "target_name": "Office"}
3.3 Write data source mapping file
3.3.1 Mapping file overview
The mapping file of the input source is used to describe how to establish the mapping relationship between the input source data and the vertex/edge types of the graph. It is organized in JSON format and consists of multiple mapping blocks, each of which is responsible for mapping one input source to vertices and edges.
Specifically, each mapping block contains one input source block and multiple vertex mapping and edge mapping blocks. The input source block corresponds to a local disk file or directory, an HDFS file or directory, or a relational database, and describes the basic information of the data source, such as where the data is, its format, its delimiter, and so on. The vertex/edge mapping blocks are bound to the input source and describe which columns of the input source are selected, which columns are used as ids, which columns are used as attributes and which attribute each column maps to, which column values map to which attribute values, and so on.
In the simplest terms, each mapping block describes: where the file to be imported is, which type of vertices/edges each line of the file becomes, which columns of the file need to be imported, and which properties of the vertices/edges those columns correspond to, and so on.
Note: The mapping file format changed greatly between the versions before 0.11.0 and those after. For convenience of expression, the mapping file (format) before 0.11.0 is called version 1.0, and that from 0.11.0 onward is called version 2.0. Unless otherwise specified, “mapping file” refers to version 2.0.
Click to expand/collapse the skeleton of the map file for version 2.0
{
"version": "2.0",
"structs": [
{
"id": "1",
"input": {
},
"vertices": [
{},
{}
],
"edges": [
{},
{}
]
}
]
}
Two versions of the mapping file are given directly below (describing the graph model and data files above)
Click to expand/collapse the mapping file for version 2.0
{
"version": "2.0",
"structs": [
{
"id": "1",
"skip": false,
"input": {
"type": "FILE",
"path": "vertex_person.csv",
"file_filter": {
"extensions": [
"*"
]
},
"format": "CSV",
"delimiter": ",",
"date_format": "yyyy-MM-dd HH:mm:ss",
"time_zone": "GMT+8",
"skipped_line": {
"regex": "(^#|^//).*|"
},
"compression": "NONE",
"header": [
"name",
"age",
"city"
],
"charset": "UTF-8",
"list_format": {
"start_symbol": "[",
"elem_delimiter": "|",
"end_symbol": "]"
}
},
"vertices": [
{
"label": "person",
"skip": false,
"id": null,
"unfold": false,
"field_mapping": {},
"value_mapping": {},
"selected": [],
"ignored": [],
"null_values": [
""
],
"update_strategies": {}
}
],
"edges": []
},
{
"id": "2",
"skip": false,
"input": {
"type": "FILE",
"path": "vertex_software.csv",
"file_filter": {
"extensions": [
"*"
]
},
"format": "CSV",
"delimiter": ",",
"date_format": "yyyy-MM-dd HH:mm:ss",
"time_zone": "GMT+8",
"skipped_line": {
"regex": "(^#|^//).*|"
},
"compression": "NONE",
"header": null,
"charset": "UTF-8",
"list_format": {
"start_symbol": "",
"elem_delimiter": ",",
"end_symbol": ""
}
},
"vertices": [
{
"label": "software",
"skip": false,
"id": null,
"unfold": false,
"field_mapping": {},
"value_mapping": {},
"selected": [],
"ignored": [],
"null_values": [
""
],
"update_strategies": {}
}
],
"edges": []
},
{
"id": "3",
"skip": false,
"input": {
"type": "FILE",
"path": "edge_knows.json",
"file_filter": {
"extensions": [
"*"
]
},
"format": "JSON",
"delimiter": null,
"date_format": "yyyy-MM-dd HH:mm:ss",
"time_zone": "GMT+8",
"skipped_line": {
"regex": "(^#|^//).*|"
},
"compression": "NONE",
"header": null,
"charset": "UTF-8",
"list_format": null
},
"vertices": [],
"edges": [
{
"label": "knows",
"skip": false,
"source": [
"source_name"
],
"unfold_source": false,
"target": [
"target_name"
],
"unfold_target": false,
"field_mapping": {
"source_name": "name",
"target_name": "name"
},
"value_mapping": {},
"selected": [],
"ignored": [],
"null_values": [
""
],
"update_strategies": {}
}
]
},
{
"id": "4",
"skip": false,
"input": {
"type": "FILE",
"path": "edge_created.json",
"file_filter": {
"extensions": [
"*"
]
},
"format": "JSON",
"delimiter": null,
"date_format": "yyyy-MM-dd HH:mm:ss",
"time_zone": "GMT+8",
"skipped_line": {
"regex": "(^#|^//).*|"
},
"compression": "NONE",
"header": null,
"charset": "UTF-8",
"list_format": null
},
"vertices": [],
"edges": [
{
"label": "created",
"skip": false,
"source": [
"source_name"
],
"unfold_source": false,
"target": [
"target_name"
],
"unfold_target": false,
"field_mapping": {
"source_name": "name",
"target_name": "name"
},
"value_mapping": {},
"selected": [],
"ignored": [],
"null_values": [
""
],
"update_strategies": {}
}
]
}
]
}
Click to expand/collapse the mapping file for version 1.0
{
"vertices": [
{
"label": "person",
"input": {
"type": "file",
"path": "vertex_person.csv",
"format": "CSV",
"header": ["name", "age", "city"],
"charset": "UTF-8"
}
},
{
"label": "software",
"input": {
"type": "file",
"path": "vertex_software.csv",
"format": "CSV"
}
}
],
"edges": [
{
"label": "knows",
"source": ["source_name"],
"target": ["target_name"],
"input": {
"type": "file",
"path": "edge_knows.json",
"format": "JSON"
},
"field_mapping": {
"source_name": "name",
"target_name": "name"
}
},
{
"label": "created",
"source": ["source_name"],
"target": ["target_name"],
"input": {
"type": "file",
"path": "edge_created.json",
"format": "JSON"
},
"field_mapping": {
"source_name": "name",
"target_name": "name"
}
}
]
}
The 1.0 version of the mapping file is centered on the vertex and edge, and sets the input source; while the 2.0 version is centered on the input source, and sets the vertex and edge mapping. Some input sources (such as a file) can generate both vertices and edges. If you write in the 1.0 format, you need to write an input block in each of the vertex and edge mapping blocks. The two input blocks are exactly the same; and the 2.0 version only needs to write input once. Therefore, compared with version 1.0, version 2.0 can save some repetitive writing of input.
In the bin directory of hugegraph-loader-{version}, there is a script tool mapping-convert.sh that can directly convert a version 1.0 mapping file to version 2.0. Its usage is as follows:
bin/mapping-convert.sh struct.json
A struct-v2.json will be generated in the same directory as struct.json.
3.3.2 Input Source
Input sources are currently divided into four categories: FILE, HDFS, JDBC, and KAFKA, distinguished by the type node. We call them local file input sources, HDFS input sources, JDBC input sources, and KAFKA input sources, described below.
3.3.2.1 Local file input source
- id: The id of the input source. This field is used to support some internal functions. It is not required (it will be automatically generated if it is not filled in). It is strongly recommended to write it, which is very helpful for debugging;
- skip: whether to skip the input source, because the JSON file cannot add comments, if you do not want to import an input source during a certain import, but do not want to delete the configuration of the input source, you can set it to true to skip it, the default is false, not required;
- input: input source map block, composite structure
- type: an input source type, file or FILE must be filled;
- path: the path of the local file or directory, the absolute path or the relative path relative to the mapping file, it is recommended to use the absolute path, required;
- file_filter: filter files from path with compound conditions; composite structure; currently only the extensions configuration is supported, represented by the child node extensions; the default is “*”, which means keep all files;
- format: the format of the local file; the optional values are CSV, TEXT, and JSON, which must be uppercase; required;
- header: the column names of the file’s columns; if not specified, the first line of the data file is used as the header; when the file itself has a header and a header is also specified here, the first line of the file is treated as a normal data line; JSON files do not need to specify a header; optional;
- delimiter: the column delimiter of the file lines; the default is the comma “,”; JSON files do not need to specify it; optional;
- charset: the character set of the file; the default is UTF-8; optional;
- date_format: custom date format; the default value is yyyy-MM-dd HH:mm:ss; optional; if the date is presented as a timestamp, this item must be written as timestamp (fixed value);
- time_zone: the time zone of the date data; the default value is GMT+8; optional;
- skipped_line: the lines to skip; composite structure; currently only a regular expression for the lines to skip can be configured, described by the child node regex; by default no line is skipped; optional;
- compression: the compression format of the file; the optional values are NONE, GZIP, BZ2, XZ, LZMA, SNAPPY_RAW, SNAPPY_FRAMED, Z, DEFLATE, LZ4_BLOCK, LZ4_FRAMED, ORC and PARQUET; the default is NONE, which means an uncompressed file; optional;
- list_format: when a column of the file (non-JSON) is a collection structure (the Cardinality of the corresponding PropertyKey is Set or List), you can use this item to set the start symbol, element delimiter, and end symbol of the column; composite structure:
  - start_symbol: the start symbol of the collection-structure column (the default value is “[”; the JSON format does not currently support specifying it)
  - elem_delimiter: the element delimiter of the collection-structure column (the default value is “|”; the JSON format currently only supports the native “,” delimiter)
  - end_symbol: the end symbol of the collection-structure column (the default value is “]”; the JSON format does not currently support specifying it)
3.3.2.2 HDFS input source
The nodes and meanings of the local file input source above basically apply here as well. Only the nodes that differ or are unique to the HDFS input source are listed below.
- type: input source type, must fill in hdfs or HDFS, required;
- path: the path of the HDFS file or directory, it must be the absolute path of HDFS, required;
- core_site_path: the path of the core-site.xml file of the HDFS cluster; the key point is to specify the address of the NameNode (fs.default.name) and the implementation of the file system (fs.hdfs.impl);
3.3.2.3 JDBC input source
As mentioned above, multiple relational databases are supported, but because their mapping structures are very similar, they are collectively referred to as JDBC input sources, with the vendor node used to distinguish the different databases.
- type: input source type, must fill in jdbc or JDBC, required;
- vendor: database type, optional options are [MySQL, PostgreSQL, Oracle, SQLServer], case-insensitive, required;
- driver: the type of driver used by jdbc, required;
- url: the url of the database that jdbc wants to connect to, required;
- database: the name of the database to be connected, required;
- schema: The name of the schema to be connected, different databases have different requirements, and the details are explained below;
- table: the name of the table to be connected; at least one of table or custom_sql is required;
- custom_sql: a custom SQL statement; at least one of table or custom_sql is required;
- username: the username to connect to the database, required;
- password: the password to connect to the database, required;
- batch_size: the page size when obtaining table data page by page, the default is 500, optional;
MYSQL

Node | Fixed value or common value |
---|---|
vendor | MYSQL |
driver | com.mysql.cj.jdbc.Driver |
url | jdbc:mysql://127.0.0.1:3306 |

schema: nullable; if filled in, it must be the same as the value of database

POSTGRESQL

Node | Fixed value or common value |
---|---|
vendor | POSTGRESQL |
driver | org.postgresql.Driver |
url | jdbc:postgresql://127.0.0.1:5432 |

schema: nullable, the default is "public"

ORACLE

Node | Fixed value or common value |
---|---|
vendor | ORACLE |
driver | oracle.jdbc.driver.OracleDriver |
url | jdbc:oracle:thin:@127.0.0.1:1521 |

schema: nullable, the default value is the same as the username

SQLSERVER

Node | Fixed value or common value |
---|---|
vendor | SQLSERVER |
driver | com.microsoft.sqlserver.jdbc.SQLServerDriver |
url | jdbc:sqlserver://127.0.0.1:1433 |

schema: required
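As a sketch, a MySQL input source combining the nodes above might look like this (the database, table and credentials are placeholders):
{
  "type": "jdbc",
  "vendor": "MYSQL",
  "driver": "com.mysql.cj.jdbc.Driver",
  "url": "jdbc:mysql://127.0.0.1:3306",
  "database": "load_test",
  "table": "person",
  "username": "root",
  "password": "****",
  "batch_size": 500
}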
3.3.2.4 Kafka input source
- type: input source type, must fill in kafka or KAFKA, required;
- bootstrap_server: set the list of Kafka bootstrap servers;
- topic: the topic to subscribe to;
- group: the group of the Kafka consumers;
- from_beginning: set whether to read from the beginning;
- format: the format of the data, the options are CSV, TEXT and JSON, must be uppercase, required;
- header: the column names of the columns; if not specified, the first line of the data will be used as the header; when the data itself has a header and a header is also specified here, the first line will be treated as an ordinary data line; JSON data does not need to specify a header, optional;
- delimiter: the column delimiter of the lines, the default is comma ","; JSON data does not need to specify it, optional;
- charset: the character set encoding, the default is UTF-8, optional;
- date_format: custom date format, the default value is yyyy-MM-dd HH:mm:ss, optional; if the date is presented in the form of a timestamp, this item must be written as timestamp (fixed value);
- extra_date_formats: a customized list of additional date formats, empty by default, optional; each item in the list is an alternative to the date format specified by date_format;
- time_zone: the time zone of the date data, the default is GMT+8, optional;
- skipped_line: the lines to be skipped, compound structure; currently only a regular expression for the lines to skip can be configured, described by the child node regex; no line is skipped by default, optional;
- early_stop: if the records pulled from the Kafka broker at a certain moment are empty, stop the task; the default is false, only for debugging, optional;
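A Kafka input source sketch (the broker address, topic and group are placeholders) might look like:
{
  "type": "kafka",
  "bootstrap_server": "127.0.0.1:9092",
  "topic": "persons",
  "group": "hugegraph-loader",
  "from_beginning": true,
  "format": "CSV",
  "header": ["name", "age", "city"],
  "early_stop": false
}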
3.3.3 Vertex and Edge Mapping
The nodes of the vertex and edge mappings (keys in the JSON file) have a lot in common. The common nodes are introduced first, and then the nodes unique to the vertex map and the edge map are introduced respectively.
Common nodes
- label: the label to which the vertex/edge data to be imported belongs, required;
- field_mapping: map the column names of the input source to the property names of the vertex/edge, optional;
- value_mapping: map the data values of the input source to the property values of the vertex/edge, optional;
- selected: select some columns to insert; the unselected ones are not inserted; cannot exist at the same time as ignored, optional;
- ignored: ignore some columns so that they do not participate in insertion; cannot exist at the same time as selected, optional;
- null_values: you can specify some strings that represent null values, such as "NULL"; if the vertex/edge property corresponding to such a column is also a nullable property, the value of this property will not be set when constructing the vertex/edge, optional;
- update_strategies: if the data needs to be updated in batches in a specific way, you can specify a specific update strategy for each property (see below for details), optional;
- unfold: whether to unfold the column; each unfolded value forms a row together with the other columns, which is equivalent to unfolding into multiple rows. For example, if the value of the id column of a file is [1,2,3] and the values of the other columns are 18,Beijing, then when unfold is set this row becomes 3 rows, namely 1,18,Beijing, 2,18,Beijing and 3,18,Beijing. Note that this only unfolds the column selected as the id. The default is false, optional;
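For illustration, a vertex mapping combining several of the common nodes might look like the following sketch (the column names, property names and mapped values are hypothetical, and the nested form of value_mapping shown here is an assumption about the mapping-file layout):
{
  "label": "person",
  "input": {
    "type": "file",
    "path": "example/file/vertex_person.csv",
    "format": "CSV"
  },
  "field_mapping": {
    "person_name": "name",
    "person_city": "city"
  },
  "value_mapping": {
    "person_city": { "BJ": "Beijing" }
  },
  "ignored": ["remark"],
  "null_values": ["NULL", "null", ""]
}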
The update strategy supports 8 types (all must be uppercase):
- Value accumulation: SUM
- Take the greater of two numbers/dates: BIGGER
- Take the smaller of two numbers/dates: SMALLER
- Union of a Set property: UNION
- Intersection of a Set property: INTERSECTION
- Append elements to a List property: APPEND
- Remove elements from a List/Set property: ELIMINATE
- Override an existing property: OVERRIDE
Note: if the newly imported property value is empty, the existing old value will be used rather than the empty value. For the effect, please refer to the following example:
// The update strategy is specified in the JSON file as follows
{
"vertices": [
{
"label": "person",
"update_strategies": {
"age": "SMALLER",
"set": "UNION"
},
"input": {
"type": "file",
"path": "vertex_person.txt",
"format": "TEXT",
"header": ["name", "age", "set"]
}
}
]
}
// 1. First write a line of data (null means an empty value here)
'a b null null'
// 2. Write another line
'null null c d'
// 3. Finally we can get
'a b c d'
// If there is no update strategy, you will get
'null null c d'
Note: after adopting the batch update strategy, the number of disk read requests will increase significantly, and the import will be several times slower than a pure write-override import (in this case the HDD [IOPS](https://en.wikipedia.org/wiki/IOPS) will be the bottleneck; an SSD is recommended for speed).
Unique Nodes for Vertex Maps
- id: specify a column as the id column of the vertex. When the vertex id policy is CUSTOMIZE, it is required; when the id policy is PRIMARY_KEY, it must be empty;
Unique Nodes for Edge Maps
- source: select certain columns of the input source as the id columns of the source vertex. When the id policy of the source vertex is CUSTOMIZE, certain columns must be specified as the id columns of the vertex; when it is PRIMARY_KEY, one or more columns must be specified for splicing together the id of the generated vertex. In other words, no matter which id policy is used, this item is required;
- target: specify certain columns as the id columns of the target vertex, similar to source, and not repeated here;
- unfold_source: whether to unfold the source column of the file; the effect is similar to unfold in the vertex map, and not repeated here;
- unfold_target: whether to unfold the target column of the file; the effect is similar to unfold in the vertex map, and not repeated here;
3.4 Execute command import
After preparing the graph model, data file, and input source mapping relationship file, the data file can be imported into the graph database.
The import process is controlled by commands submitted by the user, and the user can control the specific process of execution through different parameters.
3.4.1 Parameter description
Parameter | Default value | Required or not | Description |
---|---|---|---|
-f or --file | | Y | path of the mapping configuration file |
-g or --graph | | Y | graph space name |
-s or --schema | | Y | schema file path |
-h or --host | localhost | | address of HugeGraphServer |
-p or --port | 8080 | | port number of HugeGraphServer |
--username | null | | when HugeGraphServer enables permission authentication, the username of the current graph |
--token | null | | when HugeGraphServer enables permission authentication, the token of the current graph |
--protocol | http | | protocol for sending requests to the server, optional http or https |
--trust-store-file | | | when the request protocol is https, the client's certificate file path |
--trust-store-password | | | when the request protocol is https, the client certificate password |
--clear-all-data | false | | whether to clear the original data on the server before importing data |
--clear-timeout | 240 | | timeout (seconds) for clearing the original data on the server before importing data |
--incremental-mode | false | | whether to use the breakpoint-resume mode; only the FILE and HDFS input sources support this mode; enabling it allows the import to start from the place where the last import stopped |
--failure-mode | false | | when the failure mode is true, the data that failed previously will be imported; generally, the failed data file needs to be manually corrected and edited first, and then imported again |
--batch-insert-threads | CPUs | | batch insert thread pool size (CPUs is the number of logical cores available to the current OS) |
--single-insert-threads | 8 | | size of the single insert thread pool |
--max-conn | 4 * CPUs | | the maximum number of HTTP connections between HugeClient and HugeGraphServer; it is recommended to adjust this when adjusting the number of threads |
--max-conn-per-route | 2 * CPUs | | the maximum number of HTTP connections for each route between HugeClient and HugeGraphServer; it is recommended to adjust this when adjusting the number of threads |
--batch-size | 500 | | the number of data items in each batch when importing data |
--max-parse-errors | 1 | | the maximum number of data lines with parse errors allowed; the program exits when this value is reached |
--max-insert-errors | 500 | | the maximum number of data lines with insert errors allowed; the program exits when this value is reached |
--timeout | 60 | | timeout (seconds) for inserted results to return |
--shutdown-timeout | 10 | | waiting time (seconds) for multithreading to stop |
--retry-times | 0 | | number of retries when a specific exception occurs |
--retry-interval | 10 | | interval (seconds) before retry |
--check-vertex | false | | whether to check that the vertices connected by an edge exist when inserting edges |
--print-progress | true | | whether to print the number of imported items in the console in real time |
--dry-run | false | | turn on this mode to only parse without importing; usually used for testing |
--help | false | | print help information |
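For example, a run that tunes a few of these options might look like the following sketch (the host, port and file paths are illustrative):
# sketch: tune the batch thread pool and error handling; values are illustrative
sh bin/hugegraph-loader.sh -g hugegraph \
    -f example/file/struct.json -s example/file/schema.groovy \
    -h 127.0.0.1 -p 8080 \
    --batch-insert-threads 16 --check-vertex true --retry-times 3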
3.4.2 Breakpoint Continuation Mode
Usually, a Loader task takes a long time to execute. If the import process exits for some reason and you want to continue the import from the interrupted point next time, that is the scenario for breakpoint continuation (resumable import).
The user sets the command line parameter --incremental-mode to true to enable the breakpoint-resume mode. The key to breakpoint continuation lies in the progress file: when the import process exits, the import progress at the time of exit is recorded in the progress file. The progress file is located in the ${struct} directory, and its name looks like load-progress ${date}, where ${struct} is the prefix of the mapping file and ${date} is the start time of the import. For example, for an import task started at 2019-10-10 12:30:30 using the mapping file struct-example.json, the path of the progress file is struct-example/load-progress 2019-10-10 12:30:30, a sibling of struct-example.json.
Note: the generation of progress files is independent of whether --incremental-mode is turned on; a progress file is generated at the end of every import.
If the data files are all legal in format and the import task was stopped by the user (CTRL + C or kill, but not kill -9), that is, if there are no error records, the next import only needs to enable breakpoint continuation.
But if the limit of --max-parse-errors or --max-insert-errors was reached because too much of the data is invalid or a network exception occurred, Loader will record the original rows that failed to insert into failure files. After the user corrects the data lines in the failure files, setting --failure-mode to true will import these "failure files" as the input source (this does not affect the import of the normal files). Of course, if there is still a problem with a corrected data line, it will be logged again into the failure file (there is no need to worry about duplicate lines).
Each vertex map or edge map will generate its own failure file when data insertion fails. The failure file is divided into a parsing failure file (suffix .parse-error) and an insertion failure file (suffix .insert-error).
They are stored in the ${struct}/current directory. For example, suppose the mapping file contains a vertex mapping person and an edge mapping knows, each of which has some error lines. When the Loader exits, you will see the following files in the ${struct}/current directory:
- person-b4cd32ab.parse-error: Vertex map person parses wrong data
- person-b4cd32ab.insert-error: Vertex map person inserts wrong data
- knows-eb6b2bac.parse-error: edge map knows parses wrong data
- knows-eb6b2bac.insert-error: edge map knows inserts wrong data
.parse-error and .insert-error do not always exist together. Only lines with parsing errors will have .parse-error files, and only lines with insertion errors will have .insert-error files.
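As a sketch of how these two modes are typically invoked (the graph name and file paths are illustrative):
# Resume from the last recorded progress (FILE and HDFS input sources only)
sh bin/hugegraph-loader.sh -g hugegraph -f struct-example.json -s schema.groovy --incremental-mode true
# Re-import the corrected lines from the failure files
sh bin/hugegraph-loader.sh -g hugegraph -f struct-example.json -s schema.groovy --failure-mode true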
3.4.3 logs directory file description
The log and error data during program execution will be written into the hugegraph-loader.log file.
3.4.4 Execute command
Run bin/hugegraph-loader and pass in the parameters:
bin/hugegraph-loader -g ${GRAPH_NAME} -f ${INPUT_DESC_FILE} -s ${SCHEMA_FILE} -h ${HOST} -p ${PORT}
4 Complete example
Given below is an example in the example directory of the hugegraph-loader package. (GitHub address)
4.1 Prepare data
Vertex file: example/file/vertex_person.csv
marko,29,Beijing
vadas,27,Hongkong
josh,32,Beijing
peter,35,Shanghai
"li,nary",26,"Wu,han"
tom,null,NULL
Vertex file: example/file/vertex_software.txt
id|name|lang|price|ISBN
1|lop|java|328|ISBN978-7-107-18618-5
2|ripple|java|199|ISBN978-7-100-13678-5
Edge file: example/file/edge_knows.json
{"source_name": "marko", "target_name": "vadas", "date": "20160110", "weight": 0.5}
{"source_name": "marko", "target_name": "josh", "date": "20130220", "weight": 1.0}
Edge file: example/file/edge_created.json
{"aname": "marko", "bname": "lop", "date": "20171210", "weight": 0.4}
{"aname": "josh", "bname": "lop", "date": "20091111", "weight": 0.4}
{"aname": "josh", "bname": "ripple", "date": "20171210", "weight": 1.0}
{"aname": "peter", "bname": "lop", "date": "20170324", "weight": 0.2}
4.2 Write schema
Schema file: example/file/schema.groovy
schema.propertyKey("name").asText().ifNotExist().create();
schema.propertyKey("age").asInt().ifNotExist().create();
schema.propertyKey("city").asText().ifNotExist().create();
schema.propertyKey("weight").asDouble().ifNotExist().create();
schema.propertyKey("lang").asText().ifNotExist().create();
schema.propertyKey("date").asText().ifNotExist().create();
schema.propertyKey("price").asDouble().ifNotExist().create();
schema.vertexLabel("person").properties("name", "age", "city").primaryKeys("name").ifNotExist().create();
schema.vertexLabel("software").properties("name", "lang", "price").primaryKeys("name").ifNotExist().create();
schema.indexLabel("personByAge").onV("person").by("age").range().ifNotExist().create();
schema.indexLabel("personByCity").onV("person").by("city").secondary().ifNotExist().create();
schema.indexLabel("personByAgeAndCity").onV("person").by("age", "city").secondary().ifNotExist().create();
schema.indexLabel("softwareByPrice").onV("software").by("price").range().ifNotExist().create();
schema.edgeLabel("knows").sourceLabel("person").targetLabel("person").properties("date", "weight").ifNotExist().create();
schema.edgeLabel("created").sourceLabel("person").targetLabel("software").properties("date", "weight").ifNotExist().create();
schema.indexLabel("createdByDate").onE("created").by("date").secondary().ifNotExist().create();
schema.indexLabel("createdByWeight").onE("created").by("weight").range().ifNotExist().create();
schema.indexLabel("knowsByWeight").onE("knows").by("weight").range().ifNotExist().create();
4.3 Write the input source mapping file example/file/struct.json
{
"vertices": [
{
"label": "person",
"input": {
"type": "file",
"path": "example/file/vertex_person.csv",
"format": "CSV",
"header": ["name", "age", "city"],
"charset": "UTF-8",
"skipped_line": {
"regex": "(^#|^//).*"
}
},
"null_values": ["NULL", "null", ""]
},
{
"label": "software",
"input": {
"type": "file",
"path": "example/file/vertex_software.txt",
"format": "TEXT",
"delimiter": "|",
"charset": "GBK"
},
"id": "id",
"ignored": ["ISBN"]
}
],
"edges": [
{
"label": "knows",
"source": ["source_name"],
"target": ["target_name"],
"input": {
"type": "file",
"path": "example/file/edge_knows.json",
"format": "JSON",
"date_format": "yyyyMMdd"
},
"field_mapping": {
"source_name": "name",
"target_name": "name"
}
},
{
"label": "created",
"source": ["source_name"],
"target": ["target_id"],
"input": {
"type": "file",
"path": "example/file/edge_created.json",
"format": "JSON",
"date_format": "yyyy-MM-dd"
},
"field_mapping": {
"source_name": "name"
}
}
]
}
4.4 Command to import
sh bin/hugegraph-loader.sh -g hugegraph -f example/file/struct.json -s example/file/schema.groovy
After the import is complete, statistics similar to the following will appear:
vertices/edges has been loaded this time : 8/6
--------------------------------------------------
count metrics
input read success : 14
input read failure : 0
vertex parse success : 8
vertex parse failure : 0
vertex insert success : 8
vertex insert failure : 0
edge parse success : 6
edge parse failure : 0
edge insert success : 6
edge insert failure : 0
4.5 Use Docker to load data
4.5.1 Use docker exec to load data directly
4.5.1.1 Prepare data
If you just want to try out the loader, you can import the built-in example dataset without needing to prepare additional data yourself.
If using custom data, before importing data with the loader, we need to copy the data into the container.
First, following the steps in 4.1–4.3, we can prepare the data, and then use docker cp to copy the prepared data into the loader container.
Suppose we've prepared the corresponding dataset following the above steps, stored in the hugegraph-dataset folder with the following file structure:
tree -f hugegraph-dataset/
hugegraph-dataset
├── hugegraph-dataset/edge_created.json
├── hugegraph-dataset/edge_knows.json
├── hugegraph-dataset/schema.groovy
├── hugegraph-dataset/struct.json
├── hugegraph-dataset/vertex_person.csv
└── hugegraph-dataset/vertex_software.txt
Copy the files into the container.
docker cp hugegraph-dataset loader:/loader/dataset
docker exec -it loader ls /loader/dataset
edge_created.json edge_knows.json schema.groovy struct.json vertex_person.csv vertex_software.txt
4.5.1.2 Data loading
Taking the built-in example dataset as an example, we can use the following command to load the data.
If you need to import your custom dataset, you need to modify the paths of the -f (mapping file) and -s (schema) configurations.
You can refer to 3.4.1-Parameter description for the rest of the parameters.
docker exec -it loader bin/hugegraph-loader.sh -g hugegraph -f example/file/struct.json -s example/file/schema.groovy -h server -p 8080
If loading a custom dataset, following the previous example, you would use:
docker exec -it loader bin/hugegraph-loader.sh -g hugegraph -f /loader/dataset/struct.json -s /loader/dataset/schema.groovy -h server -p 8080
If loader and server are in the same Docker network, you can specify -h {server_container_name}; otherwise, you need to specify the IP of the server host (in our example, server_container_name is server).
Then we can see the result:
HugeGraphLoader worked in NORMAL MODE
vertices/edges loaded this time : 8/6
--------------------------------------------------
count metrics
input read success : 14
input read failure : 0
vertex parse success : 8
vertex parse failure : 0
vertex insert success : 8
vertex insert failure : 0
edge parse success : 6
edge parse failure : 0
edge insert success : 6
edge insert failure : 0
--------------------------------------------------
meter metrics
total time : 0.199s
read time : 0.046s
load time : 0.153s
vertex load time : 0.077s
vertex load rate(vertices/s) : 103
edge load time : 0.112s
edge load rate(edges/s) : 53
You can also use curl or hubble to observe the import result. Here's an example using curl:
> curl "http://localhost:8080/graphs/hugegraph/graph/vertices" | gunzip
{"vertices":[{"id":1,"label":"software","type":"vertex","properties":{"name":"lop","lang":"java","price":328.0}},{"id":2,"label":"software","type":"vertex","properties":{"name":"ripple","lang":"java","price":199.0}},{"id":"1:tom","label":"person","type":"vertex","properties":{"name":"tom"}},{"id":"1:josh","label":"person","type":"vertex","properties":{"name":"josh","age":32,"city":"Beijing"}},{"id":"1:marko","label":"person","type":"vertex","properties":{"name":"marko","age":29,"city":"Beijing"}},{"id":"1:peter","label":"person","type":"vertex","properties":{"name":"peter","age":35,"city":"Shanghai"}},{"id":"1:vadas","label":"person","type":"vertex","properties":{"name":"vadas","age":27,"city":"Hongkong"}},{"id":"1:li,nary","label":"person","type":"vertex","properties":{"name":"li,nary","age":26,"city":"Wu,han"}}]}
If you want to check the import result of edges, you can use curl "http://localhost:8080/graphs/hugegraph/graph/edges" | gunzip.
4.5.2 Enter the docker container to load data
Besides using docker exec directly for data import, we can also enter the container to load data. The basic process is similar to 4.5.1.
Enter the container with docker exec -it loader bash and execute the command:
sh bin/hugegraph-loader.sh -g hugegraph -f example/file/struct.json -s example/file/schema.groovy -h server -p 8080
The results of the execution will be similar to those shown in 4.5.1.
4.6 Import data by spark-loader
Spark version: Spark 3+; other versions have not been tested.
HugeGraph Toolchain version: toolchain-1.0.0
The parameters of spark-loader are divided into two parts. Note: because the abbreviated names of the two kinds of parameters overlap, please use the full parameter names. The order of the two kinds of parameters does not matter.
- hugegraph parameters (Reference: hugegraph-loader parameter description )
- Spark task submission parameters (Reference: Submitting Applications)
Example:
sh bin/hugegraph-spark-loader.sh --master yarn \
--deploy-mode cluster --name spark-hugegraph-loader --file ./hugegraph.json \
--username admin --token admin --host xx.xx.xx.xx --port 8093 \
--graph graph-test --num-executors 6 --executor-cores 16 --executor-memory 15g
3.3 - HugeGraph-Hubble Quick Start
1 HugeGraph-Hubble Overview
Note: The current version of Hubble has not yet added Auth/Login related interfaces or standalone protection; these will be added in the next release version (> 1.5). Please be careful not to expose it in a public network environment or on untrusted networks, to avoid related security issues (you can also use an IP & port whitelist + HTTPS).
HugeGraph-Hubble is HugeGraph's one-stop visual analysis platform. The platform covers the whole process from data modeling, through efficient data import, to real-time and offline analysis of data, as well as unified management of graphs, realizing a whole-process wizard for graph applications. It is designed to improve the user's fluency of use, lower the barrier to entry, and provide a more efficient and easy-to-use experience.
The platform mainly includes the following modules:
Graph Management
The graph management module realizes the unified management of multiple graphs and graph access, editing, deletion, and query by creating graph and connecting the platform and graph data.
Metadata Modeling
The metadata modeling module realizes the construction and management of graph models by creating attribute libraries, vertex types, edge types, and index types. The platform provides two modes, list mode and graph mode, which can display the metadata model in real time, which is more intuitive. At the same time, it also provides a metadata reuse function across graphs, which saves the tedious and repetitive creation process of the same metadata, greatly improves modeling efficiency and enhances ease of use.
Graph Analysis
By entering statements in the graph traversal language Gremlin, high-performance general analysis of graph data can be performed, and functions such as customized multidimensional path queries of vertices are provided. Three display modes for graph results are offered (graph form, table form, and JSON form), and the multidimensional display of data meets the needs of various usage scenarios. It provides functions such as run records and a collection of common statements, making graph operations traceable and query inputs quick to reuse and share. It supports exporting graph data in JSON format.
Task Management
For Gremlin tasks that need to traverse the whole graph, index creation and reconstruction, and other time-consuming asynchronous tasks, the platform provides corresponding task management functions to achieve unified management and result viewing of asynchronous tasks.
Data Import
“Note: The data import function is currently suitable for preliminary use. For formal data import, please use hugegraph-loader, which has much better performance, stability, and functionality.”
Data import converts the user's business data into the vertices and edges of a graph and inserts them into the graph database. The platform provides a wizard-style visual import module: by creating import tasks, it realizes the management of import tasks and the parallel running of multiple import tasks, improving import performance. After entering an import task, you only need to follow the platform's step-by-step prompts, upload files, and fill in the content as needed to easily complete the import of graph data. It also supports resuming from breakpoints, an error retry mechanism, etc., which reduces import cost and improves efficiency.
2 Deploy
There are three ways to deploy hugegraph-hubble
- Use Docker (Convenient for Test/Dev)
- Download the Toolchain binary package
- Source code compilation
2.1 Use docker (Convenient for Test/Dev)
Special Note: If you are starting hubble with Docker, and hubble and the server are on the same host, then when configuring the hostname for the graph on the Hubble web page, please do not directly set it to localhost/127.0.0.1. That would refer to the hubble container internally rather than to the host machine, resulting in a connection failure to the server.
If hubble and server are in the same Docker network, we recommend using the container_name (in our example, it is server) as the hostname, and 8080 as the port. Otherwise, use the IP of the host as the hostname, with the port configured on the host for the server.
We can use docker run -itd --name=hubble -p 8088:8088 hugegraph/hubble:1.5.0 to quickly start hubble.
Alternatively, you can use Docker Compose to start hubble. Additionally, if hubble and the graph are in the same Docker network, you can access the graph using the container name of the graph, eliminating the need for the host machine's IP address.
Use docker-compose up -d; the docker-compose.yml is as follows:
version: '3'
services:
server:
image: hugegraph/hugegraph:1.5.0
container_name: server
ports:
- 8080:8080
hubble:
image: hugegraph/hubble:1.5.0
container_name: hubble
ports:
- 8088:8088
Note:
The Docker image of hugegraph-hubble is a convenience release to start hugegraph-hubble quickly, but it is not an official distribution artifact. You can find more details in the ASF Release Distribution Policy.
We recommend using a release tag (like 1.5.0) for the stable version; use the latest tag to experience the newest functions in development.
2.2 Download the Toolchain binary package
hubble is in the toolchain project. First, download the binary tarball:
wget https://downloads.apache.org/incubator/hugegraph/{version}/apache-hugegraph-toolchain-incubating-{version}.tar.gz
tar -xvf apache-hugegraph-toolchain-incubating-{version}.tar.gz
cd apache-hugegraph-toolchain-incubating-{version}/apache-hugegraph-hubble-incubating-{version}
Run hubble
bin/start-hubble.sh
Then, we can see:
starting HugeGraphHubble ..............timed out with http status 502
2023-08-30 20:38:34 [main] [INFO ] o.a.h.HugeGraphHubble [] - Starting HugeGraphHubble v1.0.0 on cpu05 with PID xxx (~/apache-hugegraph-toolchain-incubating-1.0.0/apache-hugegraph-hubble-incubating-1.0.0/lib/hubble-be-1.0.0.jar started by $USER in ~/apache-hugegraph-toolchain-incubating-1.0.0/apache-hugegraph-hubble-incubating-1.0.0)
...
2023-08-30 20:38:38 [main] [INFO ] c.z.h.HikariDataSource [] - hugegraph-hubble-HikariCP - Start completed.
2023-08-30 20:38:41 [main] [INFO ] o.a.c.h.Http11NioProtocol [] - Starting ProtocolHandler ["http-nio-0.0.0.0-8088"]
2023-08-30 20:38:41 [main] [INFO ] o.a.h.HugeGraphHubble [] - Started HugeGraphHubble in 7.379 seconds (JVM running for 8.499)
Then use a web browser to access ip:8088 and you can see the Hubble page. You can stop the service using bin/stop-hubble.sh.
2.3 Source code compilation
Note: The plugin frontend-maven-plugin has been added to hugegraph-hubble/hubble-be/pom.xml. To compile hubble, you do not need to install the Node.js V16.x and yarn environments locally in advance; you can directly execute the following steps.
Download the toolchain source code.
git clone https://github.com/apache/hugegraph-toolchain.git
Compile hubble. It depends on the loader and client, so you need to build these dependencies in advance during the compilation process (you can skip this step later).
cd hugegraph-toolchain
sudo pip install -r hugegraph-hubble/hubble-dist/assembly/travis/requirements.txt
mvn install -pl hugegraph-client,hugegraph-loader -am -Dmaven.javadoc.skip=true -DskipTests -ntp
cd hugegraph-hubble
mvn -e compile package -Dmaven.javadoc.skip=true -Dmaven.test.skip=true -ntp
cd apache-hugegraph-hubble-incubating*
Run hubble
bin/start-hubble.sh -d
3 Platform Workflows
The module usage process of the platform is as follows:
4 Platform Instructions
4.1 Graph Management
4.1.1 Graph creation
Under the graph management module, click [Create graph], and realize the connection of multiple graphs by filling in the graph ID, graph name, host name, port number, username, and password information.
Create graph by filling in the content as follows:
Special Note: If you are starting hubble with Docker, and hubble and the server are on the same host, then when configuring the hostname for the graph on the Hubble web page, please do not directly set it to localhost/127.0.0.1. If hubble and server are in the same Docker network, we recommend using the container_name (in our example, it is server) as the hostname, and 8080 as the port. Otherwise, use the IP of the host as the hostname, with the port configured on the host for the server.
4.1.2 Graph Access
Realize the information access to the graph space. After entering, you can perform operations such as multidimensional query analysis, metadata management, data import, and algorithm analysis of the graph.
4.1.3 Graph management
- Users can achieve unified management of graphs through overview, search, and information editing and deletion of single graphs.
- Search range: You can search for the graph name and ID.
4.2 Metadata Modeling (list + graph mode)
4.2.1 Module entry
Left navigation:
4.2.2 Property type
4.2.2.1 Create type
- Fill in or select the attribute name, data type, and cardinality to complete the creation of the attribute.
- Created attributes can be used as attributes of vertex type and edge type.
List mode:
Graph mode:
4.2.2.2 Reuse
- The platform provides the [Reuse] function, which can directly reuse the metadata of other graphs.
- Select the graph ID that needs to be reused, and continue to select the attributes that need to be reused. After that, the platform will check whether there is a conflict. After passing, the metadata can be reused.
Select reuse items:
Check reuse items:
4.2.2.3 Management
- You can delete a single item or delete it in batches in the attribute list.
4.2.3 Vertex type
4.2.3.1 Create type
- Fill in or select the vertex type name, ID strategy, associated properties, primary key properties, vertex style, the content displayed below the vertex in query results, and index information (including whether to create a type index and the specific content of the property index) to complete the creation of the vertex type.
List mode:
Graph mode:
4.2.3.2 Reuse
- The multiplexing of vertex types will reuse the attributes and attribute indexes associated with this type together.
- The reuse method is similar to property reuse, see 4.2.2.2.
4.2.3.3 Administration
Editing operations are available. The vertex style, association type, vertex display content, and attribute index can be edited, and the rest cannot be edited.
You can delete a single item or delete it in batches.
4.2.4 Edge Types
4.2.4.1 Create
- Fill in or select the edge type name, start point type, end point type, associated properties, whether to allow multiple connections, edge style, the content displayed below the edge in query results, and index information (including whether to create a type index and the specific content of the property index) to complete the creation of the edge type.
List mode:
Graph mode:
4.2.4.2 Reuse
- The reuse of the edge type will reuse the start point type, end point type, associated attribute and attribute index of this type.
- The reuse method is similar to property reuse, see 4.2.2.2.
4.2.4.3 Administration
- Editing operations are available. Edge styles, associated attributes, edge display content, and attribute indexes can be edited, and the rest cannot be edited, the same as the vertex type.
- You can delete a single item or delete it in batches.
4.2.5 Index Types
Displays vertex and edge indices for vertex types and edge types.
4.3 Data Import
Note: currently, we recommend using hugegraph-loader for formal data import; the built-in import of hubble is suitable for testing and getting started.
The usage process of data import is as follows:
4.3.1 Module entrance
Left navigation:
4.3.2 Create task
- Fill in the task name and remarks (optional) to create an import task.
- Multiple import tasks can be created and imported in parallel.
4.3.3 Uploading files
- Upload the files whose data will be used to compose the graph. The currently supported format is CSV; more formats will be supported in future updates.
- Multiple files can be uploaded at the same time.
4.3.4 Setting up data mapping
Set up data mapping for uploaded files, including file settings and type settings
File settings: check or fill in the settings of the file itself, such as whether it includes a header, the delimiter, and the encoding format; all have default values and do not need to be filled in manually
Type setting:
Vertex map and edge map:
【Vertex Type】: select the vertex type and map the column data of the uploaded file to its ID;
【Edge Type】: select the edge type and map the column data of the uploaded file to the ID columns of its start point type and end point type;
Mapping settings: map the column data of the uploaded file to the properties of the selected vertex type. If a property name is the same as the header name of the file, the mapping can be matched automatically, and there is no need to fill it in manually.
After completing the setting, the setting list will be displayed before proceeding to the next step. It supports the operations of adding, editing and deleting mappings.
Fill in the settings map:
Mapping list:
4.3.5 Import data
Before importing, you need to fill in the import setting parameters. After filling them in, you can start importing data into the graph.
- Import settings
- The import setting parameter items are as shown in the figure below, all set the default value, no need to fill in manually
- Import details
- Click Start Import to start the file import task
- The import details provide, for each uploaded file, the mapping type, import speed, import progress, elapsed time, and the specific status of the current task; each task can be paused, resumed, stopped, etc.
- If the import fails, you can view the specific reason
4.4 Data Analysis
4.4.1 Module entry
Left navigation:
4.4.2 Multi-graphs switching
By switching the entrance on the left, flexibly switch the operation space of multiple graphs
4.4.3 Graph Analysis and Processing
HugeGraph supports Gremlin, the graph traversal query language of Apache TinkerPop3. Gremlin is a general graph database query language. By entering Gremlin statements and clicking Execute, you can query and analyze graph data, create and delete vertices/edges, modify vertex/edge properties, and so on.
After a Gremlin query, the area below is the graph result display area, which provides 3 display modes for graph results: [Graph Mode], [Table Mode], and [Json Mode].
Support zoom, center, full screen, export and other operations.
【Graph Mode】
【Table mode】
【Json mode】
4.4.4 Data Details
Click a vertex/edge entity to view its data details, including the vertex/edge type, vertex ID, and properties with their values; this expands the information display dimension of the graph and improves usability.
4.4.5 Multidimensional Path Query of Graph Results
In addition to the global query, in-depth customized queries and hide operations can be performed on the vertices in the query result, enabling customized mining of graph results.
Right-click a vertex to bring up its menu, which offers expand, query, hide, and other operations.
- Expand: Click to display the vertices associated with the selected point.
- Query: By selecting the edge type and edge direction associated with the selected point, and then selecting its attributes and corresponding filtering rules under this condition, a customized path display can be realized.
- Hide: When clicked, hides the selected point and its associated edges.
Double-clicking a vertex also displays the vertex associated with the selected point.
4.4.6 Add vertex/edge
4.4.6.1 Added vertex
In the graph area, two entries can be used to dynamically add vertices, as follows:
- Click on the graph area panel, the Add Vertex entry appears
- Click the first icon in the action bar in the upper right corner
Complete the addition of vertices by selecting or filling in the vertex type, ID value, and attribute information.
The entry is as follows:
Add the vertex content as follows:
4.4.6.2 Add edge
Right-click a vertex in the graph result to add the outgoing or incoming edge of that point.
4.4.7 Execute the query of records and favorites
- Each query is recorded at the bottom of the graph area, including query time, execution type, content, status, and elapsed time, together with [favorite] and [load] operations, providing a complete, traceable record of graph executions and allowing execution content to be quickly loaded and reused
- Provides a statement-favorites function to collect frequently used statements, making high-frequency statements quick to call
4.5 Task Management
4.5.1 Module entry
Left navigation:
4.5.2 Task Management
- Provide unified management and result viewing of asynchronous tasks. There are 4 types of asynchronous tasks, namely:
- gremlin: Gremlin tasks
- algorithm: OLAP algorithm task
- remove_schema: remove metadata
- rebuild_index: rebuild the index
- The list displays the asynchronous task information of the current graph, including task ID, task name, task type, creation time, time-consuming, status, operation, and realizes the management of asynchronous tasks.
- Support filtering by task type and status
- Support searching for task ID and task name
- Asynchronous tasks can be deleted or deleted in batches
4.5.3 Gremlin asynchronous tasks
- Create a task
- The data analysis module currently supports two Gremlin operations: Gremlin query and Gremlin task. If the user switches to Gremlin task, an asynchronous task will be created in the asynchronous task center after clicking Execute;
- Task submission
- After the task is submitted successfully, the graph area returns the submission result and task ID
- Task details
- A [View] entry is provided; you can jump to the task details to view the specific execution of the current task. After jumping to the task center, the row of the currently executing task is displayed directly
Click to view the entry to jump to the task management list, as follows:
- View the results
- The results are displayed in the form of JSON
4.5.4 OLAP algorithm tasks
There is no visual OLAP algorithm execution on Hubble. You can call the RESTful API to perform OLAP algorithm tasks, find the corresponding tasks by ID in the task management, and view the progress and results.
4.5.5 Delete metadata, rebuild index
- Create a task
- In the metadata modeling module, when deleting metadata, an asynchronous task for deleting metadata can be created
- When editing an existing vertex/edge type operation, when adding an index, an asynchronous task of creating an index can be created
- Task details
- After confirming/saving, you can jump to the task center to view the details of the current task
3.4 - HugeGraph-Client Quick Start
1 Overview of HugeGraph-Client
HugeGraph-Client sends HTTP requests to HugeGraph-Server to obtain and parse the Server's execution results. HugeGraph-Client is available for the Java/Go/Python languages. You can use the Client API to write code that operates on HugeGraph, such as adding, deleting, modifying, and querying schema and graph data, or executing Gremlin statements.
A HugeGraph client SDK based on the Go language is also available (version >= 1.2.0).
2 What You Need
- Java 11 (also supports Java 8)
- Maven 3.5+
3 How To Use
The basic steps to use HugeGraph-Client are as follows:
- Build a new Maven project by IDEA or Eclipse
- Add HugeGraph-Client dependency in a pom file;
- Create an object to invoke the interface of HugeGraph-Client
See the complete example in the following sections for details.
4 Complete Example
4.1 Build New Maven Project
Using IDEA or Eclipse to create the project:
4.2 Add Hugegraph-Client Dependency In POM
<dependencies>
<dependency>
<groupId>org.apache.hugegraph</groupId>
<artifactId>hugegraph-client</artifactId>
<!-- Update to the latest release version -->
<version>1.5.0</version>
</dependency>
</dependencies>
Note: The versions of all graph components remain consistent
4.3 Example
4.3.1 SingleExample
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.apache.hugegraph.driver.GraphManager;
import org.apache.hugegraph.driver.GremlinManager;
import org.apache.hugegraph.driver.HugeClient;
import org.apache.hugegraph.driver.SchemaManager;
import org.apache.hugegraph.structure.constant.T;
import org.apache.hugegraph.structure.graph.Edge;
import org.apache.hugegraph.structure.graph.Path;
import org.apache.hugegraph.structure.graph.Vertex;
import org.apache.hugegraph.structure.gremlin.Result;
import org.apache.hugegraph.structure.gremlin.ResultSet;
public class SingleExample {
public static void main(String[] args) throws IOException {
// If the connection fails, an exception will be thrown.
HugeClient hugeClient = HugeClient.builder("http://localhost:8080",
"hugegraph")
.build();
SchemaManager schema = hugeClient.schema();
schema.propertyKey("name").asText().ifNotExist().create();
schema.propertyKey("age").asInt().ifNotExist().create();
schema.propertyKey("city").asText().ifNotExist().create();
schema.propertyKey("weight").asDouble().ifNotExist().create();
schema.propertyKey("lang").asText().ifNotExist().create();
schema.propertyKey("date").asDate().ifNotExist().create();
schema.propertyKey("price").asInt().ifNotExist().create();
schema.vertexLabel("person")
.properties("name", "age", "city")
.primaryKeys("name")
.ifNotExist()
.create();
schema.vertexLabel("software")
.properties("name", "lang", "price")
.primaryKeys("name")
.ifNotExist()
.create();
schema.indexLabel("personByCity")
.onV("person")
.by("city")
.secondary()
.ifNotExist()
.create();
schema.indexLabel("personByAgeAndCity")
.onV("person")
.by("age", "city")
.secondary()
.ifNotExist()
.create();
schema.indexLabel("softwareByPrice")
.onV("software")
.by("price")
.range()
.ifNotExist()
.create();
schema.edgeLabel("knows")
.sourceLabel("person")
.targetLabel("person")
.properties("date", "weight")
.ifNotExist()
.create();
schema.edgeLabel("created")
.sourceLabel("person").targetLabel("software")
.properties("date", "weight")
.ifNotExist()
.create();
schema.indexLabel("createdByDate")
.onE("created")
.by("date")
.secondary()
.ifNotExist()
.create();
schema.indexLabel("createdByWeight")
.onE("created")
.by("weight")
.range()
.ifNotExist()
.create();
schema.indexLabel("knowsByWeight")
.onE("knows")
.by("weight")
.range()
.ifNotExist()
.create();
GraphManager graph = hugeClient.graph();
Vertex marko = graph.addVertex(T.LABEL, "person", "name", "marko",
"age", 29, "city", "Beijing");
Vertex vadas = graph.addVertex(T.LABEL, "person", "name", "vadas",
"age", 27, "city", "Hongkong");
Vertex lop = graph.addVertex(T.LABEL, "software", "name", "lop",
"lang", "java", "price", 328);
Vertex josh = graph.addVertex(T.LABEL, "person", "name", "josh",
"age", 32, "city", "Beijing");
Vertex ripple = graph.addVertex(T.LABEL, "software", "name", "ripple",
"lang", "java", "price", 199);
Vertex peter = graph.addVertex(T.LABEL, "person", "name", "peter",
"age", 35, "city", "Shanghai");
marko.addEdge("knows", vadas, "date", "2016-01-10", "weight", 0.5);
marko.addEdge("knows", josh, "date", "2013-02-20", "weight", 1.0);
marko.addEdge("created", lop, "date", "2017-12-10", "weight", 0.4);
josh.addEdge("created", lop, "date", "2009-11-11", "weight", 0.4);
josh.addEdge("created", ripple, "date", "2017-12-10", "weight", 1.0);
peter.addEdge("created", lop, "date", "2017-03-24", "weight", 0.2);
GremlinManager gremlin = hugeClient.gremlin();
System.out.println("==== Path ====");
ResultSet resultSet = gremlin.gremlin("g.V().outE().path()").execute();
Iterator<Result> results = resultSet.iterator();
results.forEachRemaining(result -> {
System.out.println(result.getObject().getClass());
Object object = result.getObject();
if (object instanceof Vertex) {
System.out.println(((Vertex) object).id());
} else if (object instanceof Edge) {
System.out.println(((Edge) object).id());
} else if (object instanceof Path) {
List<Object> elements = ((Path) object).objects();
elements.forEach(element -> {
System.out.println(element.getClass());
System.out.println(element);
});
} else {
System.out.println(object);
}
});
hugeClient.close();
}
}
4.3.2 BatchExample
import java.util.ArrayList;
import java.util.List;
import org.apache.hugegraph.driver.GraphManager;
import org.apache.hugegraph.driver.HugeClient;
import org.apache.hugegraph.driver.SchemaManager;
import org.apache.hugegraph.structure.graph.Edge;
import org.apache.hugegraph.structure.graph.Vertex;
public class BatchExample {
public static void main(String[] args) {
// If the connection fails, an exception will be thrown.
HugeClient hugeClient = HugeClient.builder("http://localhost:8080",
"hugegraph")
.build();
SchemaManager schema = hugeClient.schema();
schema.propertyKey("name").asText().ifNotExist().create();
schema.propertyKey("age").asInt().ifNotExist().create();
schema.propertyKey("lang").asText().ifNotExist().create();
schema.propertyKey("date").asDate().ifNotExist().create();
schema.propertyKey("price").asInt().ifNotExist().create();
schema.vertexLabel("person")
.properties("name", "age")
.primaryKeys("name")
.ifNotExist()
.create();
schema.vertexLabel("person")
.properties("price")
.nullableKeys("price")
.append();
schema.vertexLabel("software")
.properties("name", "lang", "price")
.primaryKeys("name")
.ifNotExist()
.create();
schema.indexLabel("softwareByPrice")
.onV("software").by("price")
.range()
.ifNotExist()
.create();
schema.edgeLabel("knows")
.link("person", "person")
.properties("date")
.ifNotExist()
.create();
schema.edgeLabel("created")
.link("person", "software")
.properties("date")
.ifNotExist()
.create();
schema.indexLabel("createdByDate")
.onE("created").by("date")
.secondary()
.ifNotExist()
.create();
// get schema object by name
System.out.println(schema.getPropertyKey("name"));
System.out.println(schema.getVertexLabel("person"));
System.out.println(schema.getEdgeLabel("knows"));
System.out.println(schema.getIndexLabel("createdByDate"));
// list all schema objects
System.out.println(schema.getPropertyKeys());
System.out.println(schema.getVertexLabels());
System.out.println(schema.getEdgeLabels());
System.out.println(schema.getIndexLabels());
GraphManager graph = hugeClient.graph();
Vertex marko = new Vertex("person").property("name", "marko")
.property("age", 29);
Vertex vadas = new Vertex("person").property("name", "vadas")
.property("age", 27);
Vertex lop = new Vertex("software").property("name", "lop")
.property("lang", "java")
.property("price", 328);
Vertex josh = new Vertex("person").property("name", "josh")
.property("age", 32);
Vertex ripple = new Vertex("software").property("name", "ripple")
.property("lang", "java")
.property("price", 199);
Vertex peter = new Vertex("person").property("name", "peter")
.property("age", 35);
Edge markoKnowsVadas = new Edge("knows").source(marko).target(vadas)
.property("date", "2016-01-10");
Edge markoKnowsJosh = new Edge("knows").source(marko).target(josh)
.property("date", "2013-02-20");
Edge markoCreateLop = new Edge("created").source(marko).target(lop)
.property("date",
"2017-12-10");
Edge joshCreateRipple = new Edge("created").source(josh).target(ripple)
.property("date",
"2017-12-10");
Edge joshCreateLop = new Edge("created").source(josh).target(lop)
.property("date", "2009-11-11");
Edge peterCreateLop = new Edge("created").source(peter).target(lop)
.property("date",
"2017-03-24");
List<Vertex> vertices = new ArrayList<>();
vertices.add(marko);
vertices.add(vadas);
vertices.add(lop);
vertices.add(josh);
vertices.add(ripple);
vertices.add(peter);
List<Edge> edges = new ArrayList<>();
edges.add(markoKnowsVadas);
edges.add(markoKnowsJosh);
edges.add(markoCreateLop);
edges.add(joshCreateRipple);
edges.add(joshCreateLop);
edges.add(peterCreateLop);
vertices = graph.addVertices(vertices);
vertices.forEach(vertex -> System.out.println(vertex));
edges = graph.addEdges(edges, false);
edges.forEach(edge -> System.out.println(edge));
hugeClient.close();
}
}
4.4 Run The Example
Before running the example, you need to start the Server. For the startup process, see HugeGraph-Server Quick Start.
4.5 More Information About Client-API
3.5 - HugeGraph-AI Quick Start
1 HugeGraph-AI Overview
hugegraph-ai aims to explore the integration of HugeGraph and artificial intelligence (AI), including applications combined with large models, integration with graph machine learning components, etc., to provide comprehensive support for developers to use HugeGraph’s AI capabilities in projects.
2 Environment Requirements
- python 3.9+
- hugegraph-server 1.2+
3 Preparation
Start the HugeGraph database. You can run it via Docker or the binary package; refer to the detailed doc for more guidance.
Clone this project:
git clone https://github.com/apache/incubator-hugegraph-ai.git
Install hugegraph-python-client and hugegraph_llm
cd ./incubator-hugegraph-ai  # better to use a virtualenv (source venv/bin/activate)
pip install ./hugegraph-python-client
pip install -r ./hugegraph-llm/requirements.txt
Enter the project directory
cd ./hugegraph-llm/src
Start the gradio interactive demo of Graph RAG. You can run it with the following command, and open http://127.0.0.1:8001 after starting:
python3 -m hugegraph_llm.demo.rag_demo.app
The default host is 0.0.0.0 and the port is 8001. You can change them by passing the command line arguments --host and --port:
python3 -m hugegraph_llm.demo.rag_demo.app --host 127.0.0.1 --port 18001
Or start the gradio interactive demo of Text2Gremlin with the following command, and open http://127.0.0.1:8002 after starting. You can also change the default host 0.0.0.0 and port 8002 as above. (🚧ing)
python3 -m hugegraph_llm.demo.gremlin_generate_web_demo
After running the web demo, the config file .env will be automatically generated at the path hugegraph-llm/.env. Additionally, a prompt-related configuration file config_prompt.yaml will also be generated at the path hugegraph-llm/src/hugegraph_llm/resources/demo/config_prompt.yaml.
You can modify the content on the web page, and it will be automatically saved to the configuration file after the corresponding feature is triggered. You can also modify the file directly without restarting the web application; refresh the page to load your latest changes.
(Optional) To regenerate the config file, you can use config.generate with -u or --update:
python3 -m hugegraph_llm.config.generate --update
(Optional) You can use hugegraph-hubble to visit the graph data; you can run it via Docker/Docker-Compose following the guidance. (Hubble is a graph-analysis dashboard that includes data loading/schema management/graph traversal/display.)
(Optional) Offline download of NLTK stopwords:
python ./hugegraph_llm/operators/common_op/nltk_helper.py
4 Examples
4.1 Build a knowledge graph in HugeGraph through LLM
4.1.1 Build a knowledge graph through the gradio interactive interface
Parameter description:
- Docs:
- text: Build rag index from plain text
- file: Upload file(s) which should be TXT or .docx (Multiple files can be selected together)
- Schema: (accepts 2 types)
- User-defined Schema (JSON format, follow the template to modify it)
- Specify the name of the HugeGraph graph instance, it will automatically get the schema from it (like “hugegraph”)
- Graph extract head: The user-defined prompt of graph extracting
- If graph data already exists, you should click "Rebuild vid Index" to update the index
4.1.2 Build a knowledge graph through code
The KgBuilder class is used to construct a knowledge graph. Here is a brief usage guide:
- Initialization: the KgBuilder class is initialized with an instance of a language model, which can be obtained from the LLMs class. Initialize the LLMs instance, get the LLM, and then create the task instance KgBuilder for graph construction. KgBuilder defines multiple operators, and users can freely combine them according to their needs. (Tip: print_result() prints the result of each step in the console without affecting the overall execution logic.)

from hugegraph_llm.models.llms.init_llm import LLMs
from hugegraph_llm.operators.kg_construction_task import KgBuilder

TEXT = ""
builder = KgBuilder(LLMs().get_llm())
(
    builder
    .import_schema(from_hugegraph="talent_graph").print_result()
    .chunk_split(TEXT).print_result()
    .extract_info(extract_type="property_graph").print_result()
    .commit_to_hugegraph()
    .run()
)
- Import Schema: the import_schema method is used to import a schema from a source. The source can be a HugeGraph instance, a user-defined schema, or an extraction result. The method print_result can be chained to print the result.

# Import schema from a HugeGraph instance
builder.import_schema(from_hugegraph="xxx").print_result()
# Import schema from an extraction result
builder.import_schema(from_extraction="xxx").print_result()
# Import schema from a user-defined schema
builder.import_schema(from_user_defined="xxx").print_result()
- Chunk Split: the chunk_split method is used to split the input text into chunks. The text should be passed as a string argument to the method.

# Split the input text into documents
builder.chunk_split(TEXT, split_type="document").print_result()
# Split the input text into paragraphs
builder.chunk_split(TEXT, split_type="paragraph").print_result()
# Split the input text into sentences
builder.chunk_split(TEXT, split_type="sentence").print_result()
- Extract Info: the extract_info method is used to extract info from the text. The text should be passed as a string argument to the method.

TEXT = "Meet Sarah, a 30-year-old attorney, and her roommate, James, whom she's shared a home with since 2010."
# extract a property graph from the input text
builder.extract_info(extract_type="property_graph").print_result()
# extract triples from the input text
builder.extract_info(extract_type="triples").print_result()
- Commit to HugeGraph: The commit_to_hugegraph method is used to commit the constructed knowledge graph to a HugeGraph instance.
builder.commit_to_hugegraph().print_result()
- Run: The run method is used to execute the chained operations. The methods of the KgBuilder class can be chained together to perform a sequence of operations.
builder.run()
4.2 Retrieval augmented generation (RAG) based on HugeGraph
The RAGPipeline
class is used to integrate HugeGraph with large language models to provide retrieval-augmented generation capabilities.
Here is a brief usage guide:
- Extract Keyword: Extract keywords and expand synonyms.
from hugegraph_llm.operators.graph_rag_task import RAGPipeline

graph_rag = RAGPipeline()
graph_rag.extract_keywords(text="Tell me about Al Pacino.").print_result()
- Match Vid from Keywords: Match the nodes with the keywords in the graph.
graph_rag.keywords_to_vid().print_result()
- Query Graph for Rag: Retrieve the corresponding keywords and their multi-degree associated relationships from HugeGraph.
graph_rag.query_graphdb(max_deep=2, max_items=30).print_result()
- Rerank Searched Result: Rerank the searched results based on the similarity between the question and the results.
graph_rag.merge_dedup_rerank().print_result()
- Synthesize Answer: Summarize the results and organize the language to answer the question.
graph_rag.synthesize_answer(vector_only_answer=False, graph_only_answer=True).print_result()
- Run: The run method is used to execute the above operations.
graph_rag.run(verbose=True)
3.6 - HugeGraph-Tools Quick Start
1 HugeGraph-Tools Overview
HugeGraph-Tools is an automated deployment, management and backup/restore component of HugeGraph.
2 Get HugeGraph-Tools
There are two ways to get HugeGraph-Tools:
- Download the compiled tarball
- Clone source code then compile and install
2.1 Download the compiled archive
Download the latest version of the HugeGraph-Toolchain package:
wget https://downloads.apache.org/incubator/hugegraph/1.0.0/apache-hugegraph-toolchain-incubating-1.0.0.tar.gz
tar zxf *hugegraph*.tar.gz
2.2 Clone source code to compile and install
Please ensure that the wget command is installed before compiling the source code
Download the latest version of the HugeGraph-Tools source package:
# 1. get from github
git clone https://github.com/apache/hugegraph-toolchain.git
# 2. download the release tarball directly (e.g. here is 1.0.0, please choose the latest version)
wget https://downloads.apache.org/incubator/hugegraph/1.0.0/apache-hugegraph-toolchain-incubating-1.0.0-src.tar.gz
Compile and generate tar package:
cd hugegraph-tools
mvn package -DskipTests
Generate tar package hugegraph-tools-${version}.tar.gz
3 How to use
3.1 Function overview
After decompression, enter the hugegraph-tools directory. You can use bin/hugegraph or bin/hugegraph help to view the usage information. The commands are mainly divided into:
- Graph management type: graph-mode-set, graph-mode-get, graph-list, graph-get and graph-clear
- Asynchronous task management type: task-list, task-get, task-delete, task-cancel and task-clear
- Gremlin type: gremlin-execute and gremlin-schedule
- Backup/Restore type: backup, restore, migrate, schedule-backup and dump
- Installation and deployment type: deploy, clear, start-all and stop-all
Usage: hugegraph [options] [command] [command options]
3.2 [options]-Global Variable
options
is a global variable of HugeGraph-Tools, which can be configured in hugegraph-tools/bin/hugegraph, including:
- --graph, the name of the graph that HugeGraph-Tools operates on, the default value is hugegraph
- --url, the service address of HugeGraph-Server, the default is http://127.0.0.1:8080
- --user, the username to pass when HugeGraph-Server has authentication enabled
- --password, the user's password to pass when HugeGraph-Server has authentication enabled
- --timeout, timeout when connecting to HugeGraph-Server, the default is 30s
- --trust-store-file, the path of the certificate file; when --url uses https, the truststore file used by HugeGraph-Client, the default is empty, which means using the built-in truststore file conf/hugegraph.truststore of hugegraph-tools
- --trust-store-password, the password of the certificate file; when --url uses https, the password of the truststore used by HugeGraph-Client, the default is empty, representing the password of the built-in truststore file of hugegraph-tools
The above global variables can also be set through environment variables. One way is to use export on the command line to set temporary environment variables, which remain valid until the command-line session is closed; see the example after the table below.
Global Variable | Environment Variable | Example |
---|---|---|
--url | HUGEGRAPH_URL | export HUGEGRAPH_URL=http://127.0.0.1:8080 |
--graph | HUGEGRAPH_GRAPH | export HUGEGRAPH_GRAPH=hugegraph |
--user | HUGEGRAPH_USERNAME | export HUGEGRAPH_USERNAME=admin |
--password | HUGEGRAPH_PASSWORD | export HUGEGRAPH_PASSWORD=test |
--timeout | HUGEGRAPH_TIMEOUT | export HUGEGRAPH_TIMEOUT=30 |
--trust-store-file | HUGEGRAPH_TRUST_STORE_FILE | export HUGEGRAPH_TRUST_STORE_FILE=/tmp/trust-store |
--trust-store-password | HUGEGRAPH_TRUST_STORE_PASSWORD | export HUGEGRAPH_TRUST_STORE_PASSWORD=xxxx |
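For instance, a short session using temporary environment variables might look like the following (the URL and graph name are placeholder values for your own deployment):
# set temporary environment variables for this shell session
export HUGEGRAPH_URL=http://127.0.0.1:8080
export HUGEGRAPH_GRAPH=hugegraph
# the global options --url/--graph can now be omitted
./bin/hugegraph graph-list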
Another way is to set the environment variable in the bin/hugegraph script:
#!/bin/bash
# Set environment here if needed
#export HUGEGRAPH_URL=
#export HUGEGRAPH_GRAPH=
#export HUGEGRAPH_USERNAME=
#export HUGEGRAPH_PASSWORD=
#export HUGEGRAPH_TIMEOUT=
#export HUGEGRAPH_TRUST_STORE_FILE=
#export HUGEGRAPH_TRUST_STORE_PASSWORD=
3.3 Graph management type: graph-mode-set, graph-mode-get, graph-list, graph-get and graph-clear
- graph-mode-set, set graph restore mode
- --graph-mode or -m, required, specifies the mode to be set; legal values include [NONE, RESTORING, MERGING, LOADING]
- graph-mode-get, get graph restore mode
- graph-list, list all graphs in a HugeGraph-Server
- graph-get, get a graph and its storage backend type
- graph-clear, clear all schema and data of a graph
- --confirm-message or -c, required, the deletion confirmation message; it must be entered manually as a double confirmation to prevent accidental deletion: "I'm sure to delete all data", including the double quotes
When you need to restore a backup graph to a new graph, set the graph mode to RESTORING; when you need to merge a backup graph into an existing graph, first set the graph mode to MERGING.
3.4 Asynchronous task management type: task-list, task-get, task-delete, task-cancel and task-clear
- task-list, list the asynchronous tasks in a graph, which can be filtered by task status
- --status, optional, specifies the status of the tasks to view, i.e. filter tasks by status
- --limit, optional, specifies the number of tasks to retrieve, the default is -1, which means retrieving all matching tasks
- task-get, get detailed information about an asynchronous task
- --task-id, required, specifies the ID of the asynchronous task
- task-delete, delete information about an asynchronous task
- --task-id, required, specifies the ID of the asynchronous task
- task-cancel, cancel the execution of an asynchronous task
- --task-id, the ID of the asynchronous task to cancel
- task-clear, clean up completed asynchronous tasks
- --force, optional; when set, all asynchronous tasks are cleaned up: unfinished ones are canceled first, and then all asynchronous tasks are cleared. By default, only completed asynchronous tasks are cleaned up
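For example, a minimal task-management session might look like this (the task id 13 is a hypothetical value; use an id returned by task-list):
# list the latest 5 tasks
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph task-list --limit 5
# inspect a task, then cancel it if it is still running
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph task-get --task-id 13
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph task-cancel --task-id 13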
3.5 Gremlin type: gremlin-execute and gremlin-schedule
- gremlin-execute, send Gremlin statements to HugeGraph-Server to execute query or modification operations synchronously; results are returned after completion
- --file or -f, specifies the script file to execute, UTF-8 encoded, mutually exclusive with --script
- --script or -s, specifies the script string to execute, mutually exclusive with --file
- --aliases or -a, Gremlin alias settings, the format is: key1=value1,key2=value2,…
- --bindings or -b, Gremlin binding settings, the format is: key1=value1,key2=value2,…
- --language or -l, the language of the Gremlin script, the default is gremlin-groovy
--file and --script are mutually exclusive, one of them must be set
- gremlin-schedule, send Gremlin statements to HugeGraph-Server to perform query or modification operations asynchronously; the asynchronous task id is returned immediately after the task is submitted
- --file or -f, specifies the script file to execute, UTF-8 encoded, mutually exclusive with --script
- --script or -s, specifies the script string to execute, mutually exclusive with --file
- --bindings or -b, Gremlin binding settings, the format is: key1=value1,key2=value2,…
- --language or -l, the language of the Gremlin script, the default is gremlin-groovy
--file and --script are mutually exclusive, one of them must be set
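As a sketch of the script-file variant (./scripts/count.groovy is a hypothetical file you would create yourself):
# count.groovy might contain a single statement such as: g.V().count()
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph gremlin-execute -f ./scripts/count.groovy
# the same script can be submitted asynchronously; a task id is returned
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph gremlin-schedule -f ./scripts/count.groovy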
3.6 Backup/Restore Type
- backup, back up the schema or data of a graph out of the HugeGraph system, and store it on the local disk or HDFS in the form of JSON
- --format, the backup format, optional values include [json, text], the default is json
- --all-properties, whether to back up all properties of vertices/edges, only valid when --format is text, default false
- --label, the label of vertices/edges to be backed up, only valid when --format is text and when backing up vertices or edges
- --properties, the properties of vertices/edges to be backed up, separated by commas, only valid when --format is text and when backing up vertices or edges
- --compress, whether to compress data during backup, the default is true
- --directory or -d, the directory to store schema or data, the default is './{graphName}' for a local directory, and '{fs.default.name}/{graphName}' for HDFS
- --huge-types or -t, the data types to be backed up, separated by commas; the optional value is 'all' or a combination of one or more of [vertex, edge, vertex_label, edge_label, property_key, index_label]; 'all' represents all 6 types, namely vertices, edges and all schemas
- --log or -l, specifies the log directory, the default is the current directory
- --retry, specifies the number of failed retries, the default is 3
- --split-size or -s, specifies the split size of vertices or edges when backing up, the default is 1048576
- -D, use the form -Dkey=value to specify dynamic parameters, e.g. to specify HDFS configuration items when backing up data to HDFS: -Dfs.default.name=hdfs://localhost:9000
- restore, restore schema or data stored in JSON format to a new graph (RESTORING mode) or merge it into an existing graph (MERGING mode)
- --directory or -d, the directory storing the schema or data, the default is './{graphName}' for a local directory, and '{fs.default.name}/{graphName}' for HDFS
- --clean, whether to delete the directory specified by --directory after the restore is completed, the default is false
- --huge-types or -t, the data types to restore, separated by commas; the optional value is 'all' or a combination of one or more of [vertex, edge, vertex_label, edge_label, property_key, index_label]; 'all' represents all 6 types, namely vertices, edges and all schemas
- --log or -l, specifies the log directory, the default is the current directory
- --retry, specifies the number of failed retries, the default is 3
- -D, use the form -Dkey=value to specify dynamic parameters, e.g. to specify HDFS configuration items when restoring a graph from HDFS: -Dfs.default.name=hdfs://localhost:9000
The restore command can be used only if the backup was performed with --format json
- migrate, migrate the currently connected graph to another HugeGraphServer
- --target-graph, the name of the target graph, the default is hugegraph
- --target-url, the HugeGraphServer where the target graph is located, the default is http://127.0.0.1:8081
- --target-username, the username to access the target graph
- --target-password, the password to access the target graph
- --target-timeout, the timeout for accessing the target graph
- --target-trust-store-file, the truststore file used to access the target graph
- --target-trust-store-password, the password of the truststore used to access the target graph
- --directory or -d, the directory where the schema or data of the source graph is stored during migration; for a local directory, the default is './{graphName}'; for HDFS, the default is '{fs.default.name}/{graphName}'
- --huge-types or -t, the data types to be migrated, separated by commas; the optional value is 'all' or a combination of one or more of [vertex, edge, vertex_label, edge_label, property_key, index_label]; 'all' represents all 6 types, namely vertices, edges and all schemas
- --log or -l, specifies the log directory, the default is the current directory
- --retry, specifies the number of failed retries, the default is 3
- --split-size or -s, specifies the split size of vertices or edges when backing up the source graph during migration, the default is 1048576
- -D, use the form -Dkey=value to specify dynamic parameters, used to specify HDFS configuration items when the data needs to be backed up to HDFS during migration, for example: -Dfs.default.name=hdfs://localhost:9000
- --graph-mode or -m, the mode to set on the target graph when restoring the source graph to it, legal values include [RESTORING, MERGING]
- --keep-local-data, whether to keep the backup of the source graph generated during migration, the default is false, i.e. the backup of the source graph is not kept after the migration ends
- schedule-backup, periodically back up the graph and keep a certain number of the latest backups (currently only the local file system is supported)
- --directory or -d, required, specifies the directory of the backup data
- --backup-num, optional, specifies the number of latest backups to keep, defaults to 3
- --interval, optional, specifies the backup period, in the same format as Linux crontab
- dump, export all the vertices and edges of the entire graph; by default they are stored in JSON format as vertex vertex-edge1 vertex-edge2.... Users can also customize the storage format: implement a class inherited from Formatter in the hugegraph-tools/src/main/java/com/baidu/hugegraph/formatter directory, such as CustomFormatter, and specify this class as the formatter when using it, for example bin/hugegraph dump -f CustomFormatter
- --formatter or -f, specifies the formatter to use, the default is JsonFormatter
- --directory or -d, the directory where schema or data is stored, the default is the current directory
- --log or -l, specifies the log directory, the default is the current directory
- --retry, specifies the number of failed retries, the default is 3
- --split-size or -s, specifies the split size of vertices or edges when backing up, the default is 1048576
- -D, use the form -Dkey=value to specify dynamic parameters, e.g. to specify HDFS configuration items when backing up data to HDFS: -Dfs.default.name=hdfs://localhost:9000
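As a sketch of the HDFS and scheduled variants (the HDFS address and directories below are assumptions; adjust them to your environment):
# back up all data and schema to HDFS
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph backup -t all -d /hugegraph/backup -Dfs.default.name=hdfs://localhost:9000
# back up daily at 00:00, keeping the 5 latest backups
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph schedule-backup -d ./backups --backup-num 5 --interval "0 0 * * *"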
3.7 Installation and deployment type
- deploy, one-click download, install and start HugeGraph-Server and HugeGraph-Studio
- -v, required, specifies the version number of HugeGraph-Server and HugeGraph-Studio installed, the latest is 0.9
- -p, required, specifies the installed HugeGraph-Server and HugeGraph-Studio directories
- -u, optional, specifies the link to download the HugeGraph-Server and HugeGraph-Studio compressed packages
- clear, clean up HugeGraph-Server and HugeGraph-Studio directories and tarballs
- -p, required, specifies the directory of HugeGraph-Server and HugeGraph-Studio to be cleaned
- start-all, start HugeGraph-Server and HugeGraph-Studio with one click, and start a monitor that automatically restarts the services when they die
- -v, required, specifies the version number of HugeGraph-Server and HugeGraph-Studio to be started, the latest is 0.9
- -p, required, specifies the directory where HugeGraph-Server and HugeGraph-Studio are installed
- stop-all, close HugeGraph-Server and HugeGraph-Studio with one click
The deploy command has an optional parameter -u. When provided, the specified download address is used instead of the default one to download the tarballs, and the address is written to the ~/hugegraph-download-url-prefix file. If -u is not specified on later runs, the tarballs are downloaded from the address recorded in ~/hugegraph-download-url-prefix; if neither -u nor the ~/hugegraph-download-url-prefix file exists, they are downloaded from the default download address.
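A minimal deployment round-trip might look like this (the version number and install path are placeholder values):
# download, install and start Server + Studio under /opt/hugegraph
./bin/hugegraph deploy -v 0.9 -p /opt/hugegraph
# start both services again later (with the keep-alive monitor)
./bin/hugegraph start-all -v 0.9 -p /opt/hugegraph
# stop both services
./bin/hugegraph stop-all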
3.8 Specific command parameters
The specific parameters of each subcommand are as follows:
Usage: hugegraph [options] [command] [command options]
Options:
--graph
Name of graph
Default: hugegraph
--password
Password of user
--timeout
Connection timeout
Default: 30
--trust-store-file
The path of client truststore file used when https protocol is enabled
--trust-store-password
The password of the client truststore file used when the https protocol
is enabled
--url
The URL of HugeGraph-Server
Default: http://127.0.0.1:8080
--user
Name of user
Commands:
graph-list List all graphs
Usage: graph-list
graph-get Get graph info
Usage: graph-get
graph-clear Clear graph schema and data
Usage: graph-clear [options]
Options:
* --confirm-message, -c
Confirm message of graph clear is "I'm sure to delete all data".
(Note: include "")
graph-mode-set Set graph mode
Usage: graph-mode-set [options]
Options:
* --graph-mode, -m
Graph mode, include: [NONE, RESTORING, MERGING]
Possible Values: [NONE, RESTORING, MERGING, LOADING]
graph-mode-get Get graph mode
Usage: graph-mode-get
task-list List tasks
Usage: task-list [options]
Options:
--limit
Limit number, no limit if not provided
Default: -1
--status
Status of task
task-get Get task info
Usage: task-get [options]
Options:
* --task-id
Task id
Default: 0
task-delete Delete task
Usage: task-delete [options]
Options:
* --task-id
Task id
Default: 0
task-cancel Cancel task
Usage: task-cancel [options]
Options:
* --task-id
Task id
Default: 0
task-clear Clear completed tasks
Usage: task-clear [options]
Options:
--force
Force to clear all tasks, cancel all uncompleted tasks firstly,
and delete all completed tasks
Default: false
gremlin-execute Execute Gremlin statements
Usage: gremlin-execute [options]
Options:
--aliases, -a
Gremlin aliases, valid format is: 'key1=value1,key2=value2...'
Default: {}
--bindings, -b
Gremlin bindings, valid format is: 'key1=value1,key2=value2...'
Default: {}
--file, -f
Gremlin Script file to be executed, UTF-8 encoded, exclusive to
--script
--language, -l
Gremlin script language
Default: gremlin-groovy
--script, -s
Gremlin script to be executed, exclusive to --file
gremlin-schedule Execute Gremlin statements as asynchronous job
Usage: gremlin-schedule [options]
Options:
--bindings, -b
Gremlin bindings, valid format is: 'key1=value1,key2=value2...'
Default: {}
--file, -f
Gremlin Script file to be executed, UTF-8 encoded, exclusive to
--script
--language, -l
Gremlin script language
Default: gremlin-groovy
--script, -s
Gremlin script to be executed, exclusive to --file
backup Backup graph schema/data. If directory is on HDFS, use -D to
set HDFS params. For example:
-Dfs.default.name=hdfs://localhost:9000
Usage: backup [options]
Options:
--all-properties
All properties to be backup flag
Default: false
--compress
compress flag
Default: true
--directory, -d
Directory of graph schema/data, default is './{graphname}' in
local file system or '{fs.default.name}/{graphname}' in HDFS
--format
File format, valid is [json, text]
Default: json
--huge-types, -t
Type of schema/data. Concat with ',' if more than one. 'all' means
all vertices, edges and schema, in other words, 'all' equals with
'vertex,edge,vertex_label,edge_label,property_key,index_label'
Default: [PROPERTY_KEY, VERTEX_LABEL, EDGE_LABEL, INDEX_LABEL, VERTEX, EDGE]
--label
Vertex or edge label, only valid when type is vertex or edge
--log, -l
Directory of log
Default: ./logs
--properties
Vertex or edge properties to backup, only valid when type is
vertex or edge
Default: []
--retry
Retry times, default is 3
Default: 3
--split-size, -s
Split size of shard
Default: 1048576
-D
HDFS config parameters
Syntax: -Dkey=value
Default: {}
schedule-backup Schedule backup task
Usage: schedule-backup [options]
Options:
--backup-num
The number of latest backups to keep
Default: 3
* --directory, -d
The directory of backups stored
--interval
The interval of backup, format is: "a b c d e". 'a' means minute
(0 - 59), 'b' means hour (0 - 23), 'c' means day of month (1 -
31), 'd' means month (1 - 12), 'e' means day of week (0 - 6)
(Sunday=0), "*" means all
Default: "0 0 * * *"
dump Dump graph to files
Usage: dump [options]
Options:
--directory, -d
Directory of graph schema/data, default is './{graphname}' in
local file system or '{fs.default.name}/{graphname}' in HDFS
--formatter, -f
Formatter to customize format of vertex/edge
Default: JsonFormatter
--log, -l
Directory of log
Default: ./logs
--retry
Retry times, default is 3
Default: 3
--split-size, -s
Split size of shard
Default: 1048576
-D
HDFS config parameters
Syntax: -Dkey=value
Default: {}
restore Restore graph schema/data. If directory is on HDFS, use -D to
set HDFS params if needed. For
example: -Dfs.default.name=hdfs://localhost:9000
Usage: restore [options]
Options:
--clean
Whether to remove the directory of graph data after restored
Default: false
--directory, -d
Directory of graph schema/data, default is './{graphname}' in
local file system or '{fs.default.name}/{graphname}' in HDFS
--huge-types, -t
Type of schema/data. Concat with ',' if more than one. 'all' means
all vertices, edges and schema, in other words, 'all' equals with
'vertex,edge,vertex_label,edge_label,property_key,index_label'
Default: [PROPERTY_KEY, VERTEX_LABEL, EDGE_LABEL, INDEX_LABEL, VERTEX, EDGE]
--log, -l
Directory of log
Default: ./logs
--retry
Retry times, default is 3
Default: 3
-D
HDFS config parameters
Syntax: -Dkey=value
Default: {}
migrate Migrate graph
Usage: migrate [options]
Options:
--directory, -d
Directory of graph schema/data, default is './{graphname}' in
local file system or '{fs.default.name}/{graphname}' in HDFS
--graph-mode, -m
Mode used when migrating to target graph, include: [RESTORING,
MERGING]
Default: RESTORING
Possible Values: [NONE, RESTORING, MERGING, LOADING]
--huge-types, -t
Type of schema/data. Concat with ',' if more than one. 'all' means
all vertices, edges and schema, in other words, 'all' equals with
'vertex,edge,vertex_label,edge_label,property_key,index_label'
Default: [PROPERTY_KEY, VERTEX_LABEL, EDGE_LABEL, INDEX_LABEL, VERTEX, EDGE]
--keep-local-data
Whether to keep the local directory of graph data after restored
Default: false
--log, -l
Directory of log
Default: ./logs
--retry
Retry times, default is 3
Default: 3
--split-size, -s
Split size of shard
Default: 1048576
--target-graph
The name of target graph to migrate
Default: hugegraph
--target-password
The password of target graph to migrate
--target-timeout
The timeout to connect target graph to migrate
Default: 0
--target-trust-store-file
The trust store file of target graph to migrate
--target-trust-store-password
The trust store password of target graph to migrate
--target-url
The url of target graph to migrate
Default: http://127.0.0.1:8081
--target-user
The username of target graph to migrate
-D
HDFS config parameters
Syntax: -Dkey=value
Default: {}
deploy Install HugeGraph-Server and HugeGraph-Studio
Usage: deploy [options]
Options:
* -p
Install path of HugeGraph-Server and HugeGraph-Studio
-u
Download url prefix path of HugeGraph-Server and HugeGraph-Studio
* -v
Version of HugeGraph-Server and HugeGraph-Studio
start-all Start HugeGraph-Server and HugeGraph-Studio
Usage: start-all [options]
Options:
* -p
Install path of HugeGraph-Server and HugeGraph-Studio
* -v
Version of HugeGraph-Server and HugeGraph-Studio
clear Clear HugeGraph-Server and HugeGraph-Studio
Usage: clear [options]
Options:
* -p
Install path of HugeGraph-Server and HugeGraph-Studio
stop-all Stop HugeGraph-Server and HugeGraph-Studio
Usage: stop-all
help Print usage
Usage: help
3.9 Specific command example
1. gremlin statement
# Execute gremlin synchronously
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph gremlin-execute --script 'g.V().count()'
# Execute gremlin asynchronously
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph gremlin-schedule --script 'g.V().count()'
2. Show task status
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph task-list
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph task-list --limit 5
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph task-list --status success
3. Set and show graph mode
# legal values of -m include [NONE, RESTORING, MERGING, LOADING]; set one of them, e.g.:
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-mode-set -m RESTORING
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-mode-get
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-list
4. Cleanup Graph
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-clear -c "I'm sure to delete all data"
5. Backup Graph
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph backup -t all --directory ./backup-test
6. Periodic Backup Graph
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph schedule-backup -d ./backup-0.10.2 --interval "*/2 * * * *"
7. Recovery Graph
# set graph mode
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-mode-set -m RESTORING
# recovery graph
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph restore -t all --directory ./backup-test
# restore graph mode
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph graph-mode-set -m NONE
8. Graph Migration
./bin/hugegraph --url http://127.0.0.1:8080 --graph hugegraph migrate --target-url http://127.0.0.1:8090 --target-graph hugegraph
3.7 - HugeGraph-Computer Quick Start
1 HugeGraph-Computer Overview
The HugeGraph-Computer
is a distributed graph processing system for HugeGraph (OLAP). It is an implementation of Pregel. It runs on a Kubernetes framework.
Features
- Support distributed MPP graph computing, and integrates with HugeGraph as graph input/output storage.
- Based on BSP (Bulk Synchronous Parallel) model, an algorithm performs computing through multiple parallel iterations, every iteration is a superstep.
- Auto memory management. The framework will never run OOM (Out of Memory) since it spills some data to disk when there is not enough memory to hold everything.
- Only part of the edges or messages of a super node needs to be held in memory at a time, so they will never be lost.
- You can load the data from HDFS or HugeGraph, or any other system.
- You can output the results to HDFS or HugeGraph, or any other system.
- Easy to develop a new algorithm. You only need to focus on vertex-centric processing, just like on a single server, without worrying about message transfer and memory/storage management.
2 Dependency for Building/Running
2.1 Install Java 11 (JDK 11)
You must use Java 11 (≥ 11) to run Computer, and configure it yourself.
Be sure to execute the java -version command to check the JDK version before proceeding.
3 Get Started
3.1 Run PageRank algorithm locally
To run an algorithm with HugeGraph-Computer, you need to install Java 11 or a later version.
You also need to deploy HugeGraph-Server and Etcd.
There are two ways to get HugeGraph-Computer:
- Download the compiled tarball
- Clone source code then compile and package
3.1.1 Download the compiled archive
Download the latest version of the HugeGraph-Computer release package:
wget https://downloads.apache.org/incubator/hugegraph/${version}/apache-hugegraph-computer-incubating-${version}.tar.gz
tar zxvf apache-hugegraph-computer-incubating-${version}.tar.gz -C hugegraph-computer
3.1.2 Clone source code to compile and package
Clone the latest version of HugeGraph-Computer source package:
$ git clone https://github.com/apache/hugegraph-computer.git
Compile and generate tar package:
cd hugegraph-computer
mvn clean package -DskipTests
3.1.3 Start master node
You can use the -c parameter to specify the configuration file; for more computer config options, please see: Computer Config Options
cd hugegraph-computer
bin/start-computer.sh -d local -r master
3.1.4 Start worker node
bin/start-computer.sh -d local -r worker
3.1.5 Query algorithm results
3.1.5.1 Enable OLAP index query for server
If the OLAP index is not enabled, it needs to be enabled first. More reference: modify-graphs-read-mode
PUT http://localhost:8080/graphs/hugegraph/graph_read_mode
"ALL"
3.1.5.2 Query page_rank
property value:
curl "http://localhost:8080/graphs/hugegraph/graph/vertices?page&limit=3" | gunzip
3.2 Run PageRank algorithm in Kubernetes
To run algorithm with HugeGraph-Computer, you need to deploy HugeGraph-Server first
3.2.1 Install HugeGraph-Computer CRD
# Kubernetes version >= v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml
# Kubernetes version < v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1beta1.yaml
3.2.2 Show CRD
kubectl get crd
NAME CREATED AT
hugegraphcomputerjobs.hugegraph.apache.org 2021-09-16T08:01:08Z
3.2.3 Install hugegraph-computer-operator&etcd-server
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-operator.yaml
3.2.4 Wait for hugegraph-computer-operator&etcd-server deployment to complete
kubectl get pod -n hugegraph-computer-operator-system
NAME READY STATUS RESTARTS AGE
hugegraph-computer-operator-controller-manager-58c5545949-jqvzl 1/1 Running 0 15h
hugegraph-computer-operator-etcd-28lm67jxk5 1/1 Running 0 15h
3.2.5 Submit job
For more about the computer CRD, please see: Computer CRD
For more computer config options, please see: Computer Config Options
cat <<EOF | kubectl apply --filename -
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
namespace: hugegraph-computer-operator-system
name: &jobName pagerank-sample
spec:
jobId: *jobName
algorithmName: page_rank
image: hugegraph/hugegraph-computer:latest # algorithm image url
jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar # algorithm jar path
pullPolicy: Always
workerCpu: "4"
workerMemory: "4Gi"
workerInstances: 5
computerConf:
job.partitions_count: "20"
algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
hugegraph.url: http://${hugegraph-server-host}:${hugegraph-server-port} # hugegraph server url
hugegraph.name: hugegraph # hugegraph graph name
EOF
3.2.6 Show job
kubectl get hcjob/pagerank-sample -n hugegraph-computer-operator-system
NAME JOBID JOBSTATUS
pagerank-sample pagerank-sample RUNNING
3.2.7 Show log of nodes
# Show the master log
kubectl logs -l component=pagerank-sample-master -n hugegraph-computer-operator-system
# Show the worker log
kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system
# Show diagnostic log of a job
# NOTE: diagnostic logs exist only when the job fails, and they are only kept for one hour.
kubectl get event --field-selector reason=ComputerJobFailed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
3.2.8 Show success event of a job
NOTE: it will only be saved for one hour
kubectl get event --field-selector reason=ComputerJobSucceed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
3.2.9 Query algorithm results
If the results are output to HugeGraph-Server, query them in the same way as in the local mode above; if they are output to HDFS, please check the result files under the /hugegraph-computer/results/{jobId} directory.
4 Built-In algorithms document
4.1 Supported algorithms list:
Centrality Algorithm:
- PageRank
- BetweennessCentrality
- ClosenessCentrality
- DegreeCentrality
Community Algorithm:
- ClusteringCoefficient
- Kcore
- Lpa
- TriangleCount
- Wcc
Path Algorithm:
- RingsDetection
- RingsDetectionWithFilter
For more algorithms, please see: Built-In algorithms
4.2 Algorithm describe
TODO
5 Algorithm development guide
TODO
6 Note
- If some classes under computer-k8s cannot be found, you need to execute mvn compile in advance to generate the corresponding classes.
4 - Config
4.1 - HugeGraph configuration
1 Overview
The directory for the configuration files is hugegraph-release/conf
, and all the configurations related to the service and the graph itself are located in this directory.
The main configuration files include gremlin-server.yaml
, rest-server.properties
, and hugegraph.properties
.
The HugeGraphServer
integrates the GremlinServer
and RestServer
internally, and gremlin-server.yaml
and rest-server.properties
are used to configure these two servers.
- GremlinServer: GremlinServer accepts Gremlin statements from users, parses them, and then invokes the Core code.
- RestServer: It provides a RESTful API that, based on different HTTP requests, calls the corresponding Core API. If the user’s request body is a Gremlin statement, it will be forwarded to GremlinServer to perform operations on the graph data.
Now let’s introduce these three configuration files one by one.
2. gremlin-server.yaml
The default content of the gremlin-server.yaml
file is as follows:
# host and port of gremlin server, need to be consistent with host and port in rest-server.properties
#host: 127.0.0.1
#port: 8182
# timeout in ms of gremlin query
evaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
# don't set graph at here, this happens after support for dynamically adding graph
graphs: {
}
scriptEngines: {
gremlin-groovy: {
staticImports: [
org.opencypher.gremlin.process.traversal.CustomPredicates.*',
org.opencypher.gremlin.traversal.CustomFunctions.*
],
plugins: {
org.apache.hugegraph.plugin.HugeGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {
classImports: [
java.lang.Math,
org.apache.hugegraph.backend.id.IdGenerator,
org.apache.hugegraph.type.define.Directions,
org.apache.hugegraph.type.define.NodeRole,
org.apache.hugegraph.traversal.algorithm.CollectionPathsTraverser,
org.apache.hugegraph.traversal.algorithm.CountTraverser,
org.apache.hugegraph.traversal.algorithm.CustomizedCrosspointsTraverser,
org.apache.hugegraph.traversal.algorithm.CustomizePathsTraverser,
org.apache.hugegraph.traversal.algorithm.FusiformSimilarityTraverser,
org.apache.hugegraph.traversal.algorithm.HugeTraverser,
org.apache.hugegraph.traversal.algorithm.JaccardSimilarTraverser,
org.apache.hugegraph.traversal.algorithm.KneighborTraverser,
org.apache.hugegraph.traversal.algorithm.KoutTraverser,
org.apache.hugegraph.traversal.algorithm.MultiNodeShortestPathTraverser,
org.apache.hugegraph.traversal.algorithm.NeighborRankTraverser,
org.apache.hugegraph.traversal.algorithm.PathsTraverser,
org.apache.hugegraph.traversal.algorithm.PersonalRankTraverser,
org.apache.hugegraph.traversal.algorithm.SameNeighborTraverser,
org.apache.hugegraph.traversal.algorithm.ShortestPathTraverser,
org.apache.hugegraph.traversal.algorithm.SingleSourceShortestPathTraverser,
org.apache.hugegraph.traversal.algorithm.SubGraphTraverser,
org.apache.hugegraph.traversal.algorithm.TemplatePathsTraverser,
org.apache.hugegraph.traversal.algorithm.steps.EdgeStep,
org.apache.hugegraph.traversal.algorithm.steps.RepeatEdgeStep,
org.apache.hugegraph.traversal.algorithm.steps.WeightedEdgeStep,
org.apache.hugegraph.traversal.optimize.ConditionP,
org.apache.hugegraph.traversal.optimize.Text,
org.apache.hugegraph.traversal.optimize.TraversalUtil,
org.apache.hugegraph.util.DateUtil,
org.opencypher.gremlin.traversal.CustomFunctions,
org.opencypher.gremlin.traversal.CustomPredicate
],
methodImports: [
java.lang.Math#*,
org.opencypher.gremlin.traversal.CustomPredicate#*,
org.opencypher.gremlin.traversal.CustomFunctions#*
]
},
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {
files: [scripts/empty-sample.groovy]
}
}
}
}
serializers:
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1,
config: {
serializeResultToString: false,
ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
}
}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0,
config: {
serializeResultToString: false,
ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
}
}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0,
config: {
serializeResultToString: false,
ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
}
}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0,
config: {
serializeResultToString: false,
ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
}
}
metrics: {
consoleReporter: {enabled: false, interval: 180000},
csvReporter: {enabled: false, interval: 180000, fileName: ./metrics/gremlin-server-metrics.csv},
jmxReporter: {enabled: false},
slf4jReporter: {enabled: false, interval: 180000},
gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
graphiteReporter: {enabled: false, interval: 180000}
}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
enabled: false
}
There are many configuration options mentioned above, but for now, let’s focus on the following options: channelizer
and graphs
.
graphs
: This option specifies the graphs that need to be opened when the GremlinServer starts. It is a map structure where the key is the name of the graph and the value is the configuration file path for that graph.channelizer
: The GremlinServer supports two communication modes with clients: WebSocket and HTTP (default). If WebSocket is chosen, users can quickly experience the features of HugeGraph using Gremlin-Console, but it does not support importing large-scale data. It is recommended to use HTTP for communication, as all peripheral components of HugeGraph are implemented based on HTTP.
By default, the GremlinServer serves at localhost:8182
. If you need to modify it, configure the host
and port
settings.
host
: The hostname or IP address of the machine where the GremlinServer is deployed. Currently, HugeGraphServer does not support distributed deployment, and GremlinServer is not directly exposed to users.port
: The port number of the machine where the GremlinServer is deployed.
Additionally, you need to add the corresponding configuration gremlinserver.url=http://host:port in rest-server.properties.
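For reference, once both servers are running, a Gremlin query can be routed through the RestServer over HTTP; the following is a minimal sketch assuming the default ports and a graph named hugegraph (the alias __g_hugegraph follows the Gremlin API convention of prefixing the graph name with __g_):
curl -X POST -H "Content-Type: application/json" -d '{"gremlin": "g.V().limit(3)", "bindings": {}, "language": "gremlin-groovy", "aliases": {"graph": "hugegraph", "g": "__g_hugegraph"}}' http://127.0.0.1:8080/gremlin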
3. rest-server.properties
The default content of the rest-server.properties
file is as follows:
# bind url
# could use '0.0.0.0' or specified (real)IP to expose external network access
restserver.url=http://127.0.0.1:8080
#restserver.enable_graphspaces_filter=false
# gremlin server url, need to be consistent with host and port in gremlin-server.yaml
#gremlinserver.url=http://127.0.0.1:8182
graphs=./conf/graphs
# The maximum thread ratio for batch writing, only take effect if the batch.max_write_threads is 0
batch.max_write_ratio=80
batch.max_write_threads=0
# configuration of arthas
arthas.telnet_port=8562
arthas.http_port=8561
arthas.ip=127.0.0.1
arthas.disabled_commands=jad
# authentication configs
# choose 'org.apache.hugegraph.auth.StandardAuthenticator' or
# 'org.apache.hugegraph.auth.ConfigAuthenticator'
#auth.authenticator=
# for StandardAuthenticator mode
#auth.graph_store=hugegraph
# auth client config
#auth.remote_url=127.0.0.1:8899,127.0.0.1:8898,127.0.0.1:8897
# for ConfigAuthenticator mode
#auth.admin_token=
#auth.user_tokens=[]
# TODO: Deprecated & removed later (useless from version 1.5.0)
# rpc server configs for multi graph-servers or raft-servers
#rpc.server_host=127.0.0.1
#rpc.server_port=8091
#rpc.server_timeout=30
# rpc client configs (like enable to keep cache consistency)
#rpc.remote_url=127.0.0.1:8091,127.0.0.1:8092,127.0.0.1:8093
#rpc.client_connect_timeout=20
#rpc.client_reconnect_period=10
#rpc.client_read_timeout=40
#rpc.client_retries=3
#rpc.client_load_balancer=consistentHash
# raft group initial peers
#raft.group_peers=127.0.0.1:8091,127.0.0.1:8092,127.0.0.1:8093
# lightweight load balancing (beta)
server.id=server-1
server.role=master
# slow query log
log.slow_query_threshold=1000
# jvm(in-heap) memory usage monitor, set 1 to disable it
memory_monitor.threshold=0.85
memory_monitor.period=2000
restserver.url: The URL at which the RestServer provides its services. Modify it according to the actual environment. If you can't connect to the server from another IP address, try changing it to a specific IP, or change it to http://0.0.0.0 to listen on all network interfaces as a convenient solution, but be careful about which networks can then access the service.
graphs: The RestServer also needs to open graphs when it starts. This option is a map structure where the key is the name of the graph and the value is the configuration file path for that graph.
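After adjusting restserver.url, a quick sanity check is to list the graphs over HTTP (a sketch; replace the host with your server's actual address):
curl http://127.0.0.1:8080/graphs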
Note: Both
gremlin-server.yaml
andrest-server.properties
contain thegraphs
configuration option, and theinit-store
command initializes based on the graphs specified in thegraphs
section ofgremlin-server.yaml
.
The
gremlinserver.url
configuration option is the URL at which the GremlinServer provides services to the RestServer. By default, it is set tohttp://localhost:8182
. If you need to modify it, it should match thehost
andport
settings ingremlin-server.yaml
.
4. hugegraph.properties
hugegraph.properties is one kind of graph configuration file: if the system has multiple graphs, there will be one such file per graph. This file is used to configure parameters related to graph storage and querying. The default content of the file is as follows:
# gremlin entrence to create graph
gremlin.graph=org.apache.hugegraph.HugeFactory
# cache config
#schema.cache_capacity=100000
# vertex-cache default is 1000w, 10min expired
#vertex.cache_capacity=10000000
#vertex.cache_expire=600
# edge-cache default is 100w, 10min expired
#edge.cache_capacity=1000000
#edge.cache_expire=600
# schema illegal name template
#schema.illegal_name_regex=\s+|~.*
#vertex.default_label=vertex
backend=rocksdb
serializer=binary
store=hugegraph
raft.mode=false
raft.safe_read=false
raft.use_snapshot=false
raft.endpoint=127.0.0.1:8281
raft.group_peers=127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283
raft.path=./raft-log
raft.use_replicator_pipeline=true
raft.election_timeout=10000
raft.snapshot_interval=3600
raft.backend_threads=48
raft.read_index_threads=8
raft.queue_size=16384
raft.queue_publish_timeout=60
raft.apply_batch=1
raft.rpc_threads=80
raft.rpc_connect_timeout=5000
raft.rpc_timeout=60000
# if use 'ikanalyzer', need download jar from 'https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar' to lib directory
search.text_analyzer=jieba
search.text_analyzer_mode=INDEX
# rocksdb backend config
#rocksdb.data_path=/path/to/disk
#rocksdb.wal_path=/path/to/disk
# cassandra backend config
cassandra.host=localhost
cassandra.port=9042
cassandra.username=
cassandra.password=
#cassandra.connect_timeout=5
#cassandra.read_timeout=20
#cassandra.keyspace.strategy=SimpleStrategy
#cassandra.keyspace.replication=3
# hbase backend config
#hbase.hosts=localhost
#hbase.port=2181
#hbase.znode_parent=/hbase
#hbase.threads_max=64
# mysql backend config
#jdbc.driver=com.mysql.jdbc.Driver
#jdbc.url=jdbc:mysql://127.0.0.1:3306
#jdbc.username=root
#jdbc.password=
#jdbc.reconnect_max_times=3
#jdbc.reconnect_interval=3
#jdbc.ssl_mode=false
# postgresql & cockroachdb backend config
#jdbc.driver=org.postgresql.Driver
#jdbc.url=jdbc:postgresql://localhost:5432/
#jdbc.username=postgres
#jdbc.password=
# palo backend config
#palo.host=127.0.0.1
#palo.poll_interval=10
#palo.temp_dir=./palo-data
#palo.file_limit_size=32
Pay attention to the following uncommented items:
gremlin.graph
: The entry point for GremlinServer startup. Users should not modify this item.backend
: The backend storage used, with options includingmemory
,cassandra
,scylladb
,mysql
,hbase
,postgresql
, androcksdb
.serializer
: Mainly for internal use, used to serialize schema, vertices, and edges to the backend. The corresponding options aretext
,cassandra
,scylladb
, andbinary
(Note: Therocksdb
backend should have a value ofbinary
, while for other backends, the values ofbackend
andserializer
should remain consistent. For example, for thehbase
backend, the value should behbase
).store
: The name of the database used for storing the graph in the backend. In Cassandra and ScyllaDB, it corresponds to the keyspace name. The value of this item is unrelated to the graph name in GremlinServer and RestServer, but for clarity, it is recommended to use the same name.cassandra.host
: This item is only meaningful when the backend is set tocassandra
orscylladb
. It specifies the seeds of the Cassandra/ScyllaDB cluster.cassandra.port
: This item is only meaningful when the backend is set tocassandra
orscylladb
. It specifies the native port of the Cassandra/ScyllaDB cluster.rocksdb.data_path
: This item is only meaningful when the backend is set torocksdb
. It specifies the data directory for RocksDB.rocksdb.wal_path
: This item is only meaningful when the backend is set torocksdb
. It specifies the log directory for RocksDB.admin.token
: A token used to retrieve server configuration information. For example: http://localhost:8080/graphs/hugegraph/conf?token=162f7848-0b6d-4faf-b557-3a0797869c55
5. Multi-Graph Configuration
Our system can have multiple graphs, and the backend of each graph can be different, such as hugegraph_rocksdb
and hugegraph_mysql
, where hugegraph_rocksdb
uses RocksDB
as the backend, and hugegraph_mysql
uses MySQL
as a backend.
The configuration method is simple:
[Optional]: Modify rest-server.properties
You can modify the graph profile directory via the graphs option of rest-server.properties. The default configuration is graphs=./conf/graphs; if you want to change it to another directory, adjust the graphs option accordingly, e.g. to graphs=/etc/hugegraph/graphs. Example:
graphs=./conf/graphs
Create hugegraph_mysql_backend.properties and hugegraph_rocksdb_backend.properties based on hugegraph.properties under the conf/graphs path
The modified part of hugegraph_mysql_backend.properties
is as follows:
backend=mysql
serializer=mysql
store=hugegraph_mysql
# mysql backend config
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://127.0.0.1:3306
jdbc.username=root
jdbc.password=123456
jdbc.reconnect_max_times=3
jdbc.reconnect_interval=3
jdbc.ssl_mode=false
The modified part of hugegraph_rocksdb_backend.properties
is as follows:
backend=rocksdb
serializer=binary
store=hugegraph_rocksdb
Stop the server, execute init-store.sh
(to create a new database for the new graph), and restart the server.
$ ./bin/stop-hugegraph.sh
$ ./bin/init-store.sh
Initializing HugeGraph Store...
2023-06-11 14:16:14 [main] [INFO] o.a.h.u.ConfigUtil - Scanning option 'graphs' directory './conf/graphs'
2023-06-11 14:16:14 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_rocksdb_backend.properties
...
2023-06-11 14:16:15 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_rocksdb' has been initialized
2023-06-11 14:16:15 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_mysql_backend.properties
...
2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_mysql' has been initialized
2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Close graph standardhugegraph[hugegraph_rocksdb]
...
2023-06-11 14:16:16 [main] [INFO] o.a.h.HugeFactory - HugeFactory shutdown
2023-06-11 14:16:16 [hugegraph-shutdown] [INFO] o.a.h.HugeFactory - HugeGraph is shutting down
Initialization finished.
$ ./bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)...OK
Started [pid 21614]
Check out created graphs:
curl http://127.0.0.1:8080/graphs/
{"graphs":["hugegraph_rocksdb","hugegraph_mysql"]}
Get details of the graph
curl http://127.0.0.1:8080/graphs/hugegraph_mysql
{"name":"hugegraph_mysql","backend":"mysql"}
curl http://127.0.0.1:8080/graphs/hugegraph_rocksdb
{"name":"hugegraph_rocksdb","backend":"rocksdb"}
4.2 - HugeGraph Config Options
Gremlin Server Config Options
Corresponding configuration file gremlin-server.yaml
config option | default value | description |
---|---|---|
host | 127.0.0.1 | The host or ip of Gremlin Server. |
port | 8182 | The listening port of Gremlin Server. |
graphs | hugegraph: conf/hugegraph.properties | The map of graphs with name and config file path. |
scriptEvaluationTimeout | 30000 | The timeout for gremlin script execution(millisecond). |
channelizer | org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer | Indicates the protocol which the Gremlin Server provides service. |
authentication | authenticator: org.apache.hugegraph.auth.StandardAuthenticator, config: {tokens: conf/rest-server.properties} | The authenticator and config(contains tokens path) of authentication mechanism. |
Rest Server & API Config Options
Corresponding configuration file rest-server.properties
config option | default value | description |
---|---|---|
graphs | [hugegraph:conf/hugegraph.properties] | The map of graphs’ name and config file. |
server.id | server-1 | The id of rest server, used for license verification. |
server.role | master | The role of nodes in the cluster, available types are [master, worker, computer] |
restserver.url | http://127.0.0.1:8080 | The url for listening of rest server. |
ssl.keystore_file | server.keystore | The path of server keystore file used when https protocol is enabled. |
ssl.keystore_password | | The password of the server keystore file used when the https protocol is enabled. |
restserver.max_worker_threads | 2 * CPUs | The maximum worker threads of rest server. |
restserver.min_free_memory | 64 | The minimum free memory(MB) of rest server, requests will be rejected when the available memory of system is lower than this value. |
restserver.request_timeout | 30 | The time in seconds within which a request must complete, -1 means no timeout. |
restserver.connection_idle_timeout | 30 | The time in seconds to keep an inactive connection alive, -1 means no timeout. |
restserver.connection_max_requests | 256 | The max number of HTTP requests allowed to be processed on one keep-alive connection, -1 means unlimited. |
gremlinserver.url | http://127.0.0.1:8182 | The url of gremlin server. |
gremlinserver.max_route | 8 | The max route number for gremlin server. |
gremlinserver.timeout | 30 | The timeout in seconds of waiting for gremlin server. |
batch.max_edges_per_batch | 500 | The maximum number of edges submitted per batch. |
batch.max_vertices_per_batch | 500 | The maximum number of vertices submitted per batch. |
batch.max_write_ratio | 50 | The maximum thread ratio for batch writing, only take effect if the batch.max_write_threads is 0. |
batch.max_write_threads | 0 | The maximum threads for batch writing, if the value is 0, the actual value will be set to batch.max_write_ratio * restserver.max_worker_threads. |
auth.authenticator | | The class path of authenticator implementation, e.g. org.apache.hugegraph.auth.StandardAuthenticator or org.apache.hugegraph.auth.ConfigAuthenticator. |
auth.admin_token | 162f7848-0b6d-4faf-b557-3a0797869c55 | Token for administrator operations, only for org.apache.hugegraph.auth.ConfigAuthenticator. |
auth.graph_store | hugegraph | The name of graph used to store authentication information, like users, only for org.apache.hugegraph.auth.StandardAuthenticator. |
auth.user_tokens | [hugegraph:9fd95c9c-711b-415b-b85f-d4df46ba5c31] | The map of user tokens with name and password, only for org.apache.hugegraph.auth.ConfigAuthenticator. |
auth.audit_log_rate | 1000.0 | The max rate of audit log output per user, default value is 1000 records per second. |
auth.cache_capacity | 10240 | The max cache capacity of each auth cache item. |
auth.cache_expire | 600 | The expiration time in seconds of vertex cache. |
auth.remote_url | | If the address is empty, this node provides the auth service itself; otherwise it is an auth client and also provides the auth service through rpc forwarding. The remote url can be set to multiple addresses, concatenated by ','. |
auth.token_expire | 86400 | The expiration time in seconds after token created |
auth.token_secret | FXQXbJtbCLxODc6tGci732pkH1cyf8Qg | Secret key of HS256 algorithm. |
exception.allow_trace | false | Whether to allow exception trace stack. |
memory_monitor.threshold | 0.85 | The threshold of JVM(in-heap) memory usage monitoring, 1 means disabling this function. |
memory_monitor.period | 2000 | The period in ms of JVM(in-heap) memory usage monitoring. |
Basic Config Options
Basic Config Options and Backend Config Options correspond to configuration files:{graph-name}.properties, such as hugegraph.properties
config option | default value | description |
---|---|---|
gremlin.graph | org.apache.hugegraph.HugeFactory | Gremlin entrance to create graph. |
backend | rocksdb | The data store type, available values are [memory, rocksdb, cassandra, scylladb, hbase, mysql]. |
serializer | binary | The serializer for backend store, available values are [text, binary, cassandra, hbase, mysql]. |
store | hugegraph | The database name like Cassandra Keyspace. |
store.connection_detect_interval | 600 | The interval in seconds for detecting connections, if the idle time of a connection exceeds this value, detect it and reconnect if needed before using, value 0 means detecting every time. |
store.graph | g | The graph table name, which store vertex, edge and property. |
store.schema | m | The schema table name, which store meta data. |
store.system | s | The system table name, which store system data. |
schema.illegal_name_regex | .*\s+$|~.* | The regex specifying the illegal format for schema names. |
schema.cache_capacity | 10000 | The max cache size(items) of schema cache. |
vertex.cache_type | l2 | The type of vertex cache, allowed values are [l1, l2]. |
vertex.cache_capacity | 10000000 | The max cache size(items) of vertex cache. |
vertex.cache_expire | 600 | The expire time in seconds of vertex cache. |
vertex.check_customized_id_exist | false | Whether to check the vertices exist for those using customized id strategy. |
vertex.default_label | vertex | The default vertex label. |
vertex.tx_capacity | 10000 | The max size(items) of vertices(uncommitted) in transaction. |
vertex.check_adjacent_vertex_exist | false | Whether to check the adjacent vertices of edges exist. |
vertex.lazy_load_adjacent_vertex | true | Whether to lazy load adjacent vertices of edges. |
vertex.part_edge_commit_size | 5000 | Whether to enable the mode to commit part of edges of vertex, enabled if commit size > 0, 0 means disabled. |
vertex.encode_primary_key_number | true | Whether to encode number value of primary key in vertex id. |
vertex.remove_left_index_at_overwrite | false | Whether remove left index at overwrite. |
edge.cache_type | l2 | The type of edge cache, allowed values are [l1, l2]. |
edge.cache_capacity | 1000000 | The max cache size(items) of edge cache. |
edge.cache_expire | 600 | The expiration time in seconds of edge cache. |
edge.tx_capacity | 10000 | The max size(items) of edges(uncommitted) in transaction. |
query.page_size | 500 | The size of each page when querying by paging. |
query.batch_size | 1000 | The size of each batch when querying by batch. |
query.ignore_invalid_data | true | Whether to ignore invalid data of vertex or edge. |
query.index_intersect_threshold | 1000 | The maximum number of intermediate results to intersect indexes when querying by multiple single index properties. |
query.ramtable_edges_capacity | 20000000 | The maximum number of edges in ramtable, include OUT and IN edges. |
query.ramtable_enable | false | Whether to enable ramtable for query of adjacent edges. |
query.ramtable_vertices_capacity | 10000000 | The maximum number of vertices in ramtable, generally the largest vertex id is used as capacity. |
query.optimize_aggregate_by_index | false | Whether to optimize aggregate query(like count) by index. |
oltp.concurrent_depth | 10 | The min depth to enable concurrent oltp algorithm. |
oltp.concurrent_threads | 10 | Thread number to concurrently execute oltp algorithm. |
oltp.collection_type | EC | The implementation type of collections used in oltp algorithm. |
rate_limit.read | 0 | The max rate(times/s) to execute query of vertices/edges. |
rate_limit.write | 0 | The max rate(items/s) to add/update/delete vertices/edges. |
task.wait_timeout | 10 | Timeout in seconds for waiting for the task to complete, such as when truncating or clearing the backend. |
task.input_size_limit | 16777216 | The job input size limit in bytes. |
task.result_size_limit | 16777216 | The job result size limit in bytes. |
task.sync_deletion | false | Whether to delete schema or expired data synchronously. |
task.ttl_delete_batch | 1 | The batch size used to delete expired data. |
computer.config | /conf/computer.yaml | The config file path of computer job. |
search.text_analyzer | ikanalyzer | Choose a text analyzer for searching the vertex/edge properties, available types are [word, ansj, hanlp, smartcn, jieba, jcseg, mmseg4j, ikanalyzer]. If 'ikanalyzer' is used, you need to download the jar from https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar into the lib directory. |
search.text_analyzer_mode | smart | Specify the mode for the text analyzer, the available mode of analyzer are {word: [MaximumMatching, ReverseMaximumMatching, MinimumMatching, ReverseMinimumMatching, BidirectionalMaximumMatching, BidirectionalMinimumMatching, BidirectionalMaximumMinimumMatching, FullSegmentation, MinimalWordCount, MaxNgramScore, PureEnglish], ansj: [BaseAnalysis, IndexAnalysis, ToAnalysis, NlpAnalysis], hanlp: [standard, nlp, index, nShort, shortest, speed], smartcn: [], jieba: [SEARCH, INDEX], jcseg: [Simple, Complex], mmseg4j: [Simple, Complex, MaxWord], ikanalyzer: [smart, max_word]}. |
snowflake.datacenter_id | 0 | The datacenter id of snowflake id generator. |
snowflake.force_string | false | Whether to force the snowflake long id to be a string. |
snowflake.worker_id | 0 | The worker id of snowflake id generator. |
raft.mode | false | Whether the backend storage works in raft mode. |
raft.safe_read | false | Whether to use linearizable (linearly consistent) reads. |
raft.use_snapshot | false | Whether to use snapshot. |
raft.endpoint | 127.0.0.1:8281 | The peerid of current raft node. |
raft.group_peers | 127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283 | The peers of current raft group. |
raft.path | ./raft-log | The log path of current raft node. |
raft.use_replicator_pipeline | true | Whether to use the replicator pipeline; when enabled, multiple logs can be sent in parallel, and the next log doesn't have to wait for the ack message of the current log before being sent. |
raft.election_timeout | 10000 | Timeout in milliseconds to launch a round of election. |
raft.snapshot_interval | 3600 | The interval in seconds to trigger snapshot save. |
raft.backend_threads | current CPU v-cores | The thread number used to apply task to backend. |
raft.read_index_threads | 8 | The thread number used to execute reading index. |
raft.apply_batch | 1 | The apply batch size to trigger disruptor event handler. |
raft.queue_size | 16384 | The disruptor buffers size for jraft RaftNode, StateMachine and LogManager. |
raft.queue_publish_timeout | 60 | The timeout in seconds when publishing events into the disruptor. |
raft.rpc_threads | 80 | The rpc threads for jraft RPC layer. |
raft.rpc_connect_timeout | 5000 | The rpc connect timeout for jraft rpc. |
raft.rpc_timeout | 60000 | The rpc timeout for jraft rpc. |
raft.rpc_buf_low_water_mark | 10485760 | The ChannelOutboundBuffer's low water mark of netty; when the buffer size is less than this value, ChannelOutboundBuffer.isWritable() returns true, which indicates low downstream pressure or a good network. |
raft.rpc_buf_high_water_mark | 20971520 | The ChannelOutboundBuffer's high water mark of netty; only when the buffer size exceeds this value does ChannelOutboundBuffer.isWritable() return false, which indicates that the downstream pressure is too great to process requests or the network is congested, so the upstream needs to limit its rate. |
raft.read_strategy | ReadOnlyLeaseBased | The read strategy used for linearizable reads. |
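For reference, here is a minimal sketch of how a few of the options above fit together in a graph configuration file such as conf/hugegraph.properties; the values are illustrative, not recommendations:

gremlin.graph=org.apache.hugegraph.HugeFactory
backend=rocksdb
serializer=binary
store=hugegraph
# enlarge the vertex cache for read-heavy workloads
vertex.cache_capacity=20000000
# throttle writes to 1000 items/s (0 means no limit)
rate_limit.write=1000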
RPC server Config Options
config option | default value | description |
---|---|---|
rpc.client_connect_timeout | 20 | The timeout(in seconds) of rpc client connect to rpc server. |
rpc.client_load_balancer | consistentHash | The rpc client uses a load-balancing algorithm to access multiple rpc servers in one cluster. Default value is ‘consistentHash’, means forwarding by request parameters. |
rpc.client_read_timeout | 40 | The timeout(in seconds) of rpc client read from rpc server. |
rpc.client_reconnect_period | 10 | The period(in seconds) of rpc client reconnect to rpc server. |
rpc.client_retries | 3 | Failed retry number of rpc client calls to rpc server. |
rpc.config_order | 999 | Sofa rpc configuration file loading order; the larger the value, the later it is loaded. |
rpc.logger_impl | com.alipay.sofa.rpc.log.SLF4JLoggerImpl | Sofa rpc log implementation class. |
rpc.protocol | bolt | Rpc communication protocol, client and server need to be specified the same value. |
rpc.remote_url | The remote urls of rpc peers; it can be set to multiple addresses concatenated by ',', an empty value means not enabled. | |
rpc.server_adaptive_port | false | Whether the bound port is adaptive, if it’s enabled, when the port is in use, automatically +1 to detect the next available port. Note that this process is not atomic, so there may still be port conflicts. |
rpc.server_host | The hosts/ips bound by rpc server to provide services, empty value means not enabled. | |
rpc.server_port | 8090 | The port bound by rpc server to provide services. |
rpc.server_timeout | 30 | The timeout(in seconds) of rpc server execution. |
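As a hedged example, an rpc server and its clients might be configured like this (hosts and ports below are placeholders):

# server side
rpc.server_host=192.168.0.10
rpc.server_port=8091
# client side: remote rpc peers, concatenated by ','
rpc.remote_url=192.168.0.10:8091,192.168.0.11:8091
rpc.client_retries=3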
Cassandra Backend Config Options
config option | default value | description |
---|---|---|
backend | Must be set to cassandra. | |
serializer | Must be set to cassandra. | |
cassandra.host | localhost | The seeds hostname or ip address of cassandra cluster. |
cassandra.port | 9042 | The seeds port address of cassandra cluster. |
cassandra.connect_timeout | 5 | The cassandra driver connect server timeout(seconds). |
cassandra.read_timeout | 20 | The cassandra driver read from server timeout(seconds). |
cassandra.keyspace.strategy | SimpleStrategy | The replication strategy of keyspace, valid value is SimpleStrategy or NetworkTopologyStrategy. |
cassandra.keyspace.replication | [3] | The keyspace replication factor of SimpleStrategy, like '[3]'. Or replicas in each datacenter of NetworkTopologyStrategy, like '[dc1:2,dc2:1]'. |
cassandra.username | The username to use to login to cassandra cluster. | |
cassandra.password | The password corresponding to cassandra.username. | |
cassandra.compression_type | none | The compression algorithm of cassandra transport: none/snappy/lz4. |
cassandra.jmx_port | 7199 | The port of JMX API service for cassandra. |
cassandra.aggregation_timeout | 43200 | The timeout in seconds of waiting for aggregation. |
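A minimal sketch of a Cassandra backend configuration; the host, credentials and replication values below are placeholders:

backend=cassandra
serializer=cassandra
cassandra.host=cassandra-seed-1
cassandra.port=9042
cassandra.username=hugegraph
cassandra.password=******
# 2 replicas in dc1 and 1 in dc2
cassandra.keyspace.strategy=NetworkTopologyStrategy
cassandra.keyspace.replication=[dc1:2,dc2:1]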
ScyllaDB Backend Config Options
config option | default value | description |
---|---|---|
backend | Must be set to scylladb. | |
serializer | Must be set to scylladb. |
Other options are consistent with the Cassandra backend.
RocksDB Backend Config Options
config option | default value | description |
---|---|---|
backend | Must be set to rocksdb. | |
serializer | Must be set to binary. | |
rocksdb.data_disks | [] | The optimized disks for storing data of RocksDB. The format of each element: STORE/TABLE:/path/disk. Allowed keys are [g/vertex, g/edge_out, g/edge_in, g/vertex_label_index, g/edge_label_index, g/range_int_index, g/range_float_index, g/range_long_index, g/range_double_index, g/secondary_index, g/search_index, g/shard_index, g/unique_index, g/olap]. |
rocksdb.data_path | rocksdb-data/data | The path for storing data of RocksDB. |
rocksdb.wal_path | rocksdb-data/wal | The path for storing WAL of RocksDB. |
rocksdb.allow_mmap_reads | false | Allow the OS to mmap file for reading sst tables. |
rocksdb.allow_mmap_writes | false | Allow the OS to mmap file for writing. |
rocksdb.block_cache_capacity | 8388608 | The amount of block cache in bytes that will be used by RocksDB, 0 means no block cache. |
rocksdb.bloom_filter_bits_per_key | -1 | The bits per key in bloom filter, a good value is 10, which yields a filter with ~ 1% false positive rate, -1 means no bloom filter. |
rocksdb.bloom_filter_block_based_mode | false | Use block based filter rather than full filter. |
rocksdb.bloom_filter_whole_key_filtering | true | True to place whole keys in the bloom filter, otherwise place the prefix of keys. |
rocksdb.bottommost_compression | NO_COMPRESSION | The compression algorithm for the bottommost level of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd. |
rocksdb.bulkload_mode | false | Switch to the mode to bulk load data into RocksDB. |
rocksdb.cache_index_and_filter_blocks | false | Indicating if we’d put index/filter blocks to the block cache. |
rocksdb.compaction_style | LEVEL | Set compaction style for RocksDB: LEVEL/UNIVERSAL/FIFO. |
rocksdb.compression | SNAPPY_COMPRESSION | The compression algorithm for compressing blocks of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd. |
rocksdb.compression_per_level | [NO_COMPRESSION, NO_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION] | The compression algorithms for different levels of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd. |
rocksdb.delayed_write_rate | 16777216 | The rate limit in bytes/s of user write requests when need to slow down if the compaction gets behind. |
rocksdb.log_level | INFO | The info log level of RocksDB. |
rocksdb.max_background_jobs | 8 | Maximum number of concurrent background jobs, including flushes and compactions. |
rocksdb.level_compaction_dynamic_level_bytes | false | Whether to enable level_compaction_dynamic_level_bytes; if enabled, max_bytes_for_level_multiplier takes priority over max_bytes_for_level_base and the bytes of the base level are dynamic, giving a more predictable LSM tree, which is useful to limit worst-case space amplification. Turning this feature on/off for an existing DB can cause an unexpected LSM tree structure, so it's not recommended. |
rocksdb.max_bytes_for_level_base | 536870912 | The upper-bound of the total size of level-1 files in bytes. |
rocksdb.max_bytes_for_level_multiplier | 10.0 | The ratio between the total size of level (L+1) files and the total size of level L files for all L. |
rocksdb.max_open_files | -1 | The maximum number of open files that can be cached by RocksDB, -1 means no limit. |
rocksdb.max_subcompactions | 4 | The value represents the maximum number of threads per compaction job. |
rocksdb.max_write_buffer_number | 6 | The maximum number of write buffers that are built up in memory. |
rocksdb.max_write_buffer_number_to_maintain | 0 | The total maximum number of write buffers to maintain in memory. |
rocksdb.min_write_buffer_number_to_merge | 2 | The minimum number of write buffers that will be merged together. |
rocksdb.num_levels | 7 | Set the number of levels for this database. |
rocksdb.optimize_filters_for_hits | false | This flag allows us to not store filters for the last level. |
rocksdb.optimize_mode | true | Optimize for heavy workloads and big datasets. |
rocksdb.pin_l0_filter_and_index_blocks_in_cache | false | Indicating if we’d put index/filter blocks to the block cache. |
rocksdb.sst_path | The path for ingesting SST file into RocksDB. | |
rocksdb.target_file_size_base | 67108864 | The target file size for compaction in bytes. |
rocksdb.target_file_size_multiplier | 1 | The size ratio between a level L file and a level (L+1) file. |
rocksdb.use_direct_io_for_flush_and_compaction | false | Enable the OS to use direct read/writes in flush and compaction. |
rocksdb.use_direct_reads | false | Enable the OS to use direct I/O for reading sst tables. |
rocksdb.write_buffer_size | 134217728 | Amount of data in bytes to build up in memory. |
rocksdb.max_manifest_file_size | 104857600 | The max size of manifest file in bytes. |
rocksdb.skip_stats_update_on_db_open | false | Whether to skip statistics update when opening the database, setting this flag true allows us to not update statistics. |
rocksdb.max_file_opening_threads | 16 | The max number of threads used to open files. |
rocksdb.max_total_wal_size | 0 | Total size of WAL files in bytes. Once WALs exceed this size, we will start forcing the flush of column families related, 0 means no limit. |
rocksdb.db_write_buffer_size | 0 | Total size of write buffers in bytes across all column families, 0 means no limit. |
rocksdb.delete_obsolete_files_period | 21600 | The periodicity in seconds when obsolete files get deleted, 0 means always do full purge. |
rocksdb.hard_pending_compaction_bytes_limit | 274877906944 | The hard limit to impose on pending compaction in bytes. |
rocksdb.level0_file_num_compaction_trigger | 2 | Number of files to trigger level-0 compaction. |
rocksdb.level0_slowdown_writes_trigger | 20 | Soft limit on number of level-0 files for slowing down writes. |
rocksdb.level0_stop_writes_trigger | 36 | Hard limit on number of level-0 files for stopping writes. |
rocksdb.soft_pending_compaction_bytes_limit | 68719476736 | The soft limit to impose on pending compaction in bytes. |
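A minimal sketch of a RocksDB backend configuration; the disk paths below are placeholders for illustration:

backend=rocksdb
serializer=binary
rocksdb.data_path=rocksdb-data/data
rocksdb.wal_path=rocksdb-data/wal
# optionally dedicate disks to hot tables
rocksdb.data_disks=[g/vertex:/disk1/vertex, g/edge_out:/disk2/edge_out]
# 10 bits per key yields a bloom filter with ~1% false positive rate
rocksdb.bloom_filter_bits_per_key=10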
HBase Backend Config Options
config option | default value | description |
---|---|---|
backend | Must be set to hbase. | |
serializer | Must be set to hbase. | |
hbase.hosts | localhost | The hostnames or ip addresses of HBase zookeeper, separated with commas. |
hbase.port | 2181 | The port address of HBase zookeeper. |
hbase.threads_max | 64 | The max threads num of hbase connections. |
hbase.znode_parent | /hbase | The znode parent path of HBase zookeeper. |
hbase.zk_retry | 3 | The recovery retry times of HBase zookeeper. |
hbase.aggregation_timeout | 43200 | The timeout in seconds of waiting for aggregation. |
hbase.kerberos_enable | false | Whether Kerberos authentication is enabled for HBase. |
hbase.kerberos_keytab | The HBase’s key tab file for kerberos authentication. | |
hbase.kerberos_principal | The HBase’s principal for kerberos authentication. | |
hbase.krb5_conf | etc/krb5.conf | Kerberos configuration file, including KDC IP, default realm, etc. |
hbase.hbase_site | /etc/hbase/conf/hbase-site.xml | The HBase configuration file. |
hbase.enable_partition | true | Whether pre-split partitions are enabled for HBase. |
hbase.vertex_partitions | 10 | The number of partitions of the HBase vertex table. |
hbase.edge_partitions | 30 | The number of partitions of the HBase edge table. |
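A minimal sketch of an HBase backend configuration; the zookeeper hosts below are placeholders:

backend=hbase
serializer=hbase
hbase.hosts=zk1,zk2,zk3
hbase.port=2181
hbase.znode_parent=/hbase
# pre-split partitions to balance writes
hbase.enable_partition=true
hbase.vertex_partitions=10
hbase.edge_partitions=30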
MySQL & PostgreSQL Backend Config Options
config option | default value | description |
---|---|---|
backend | Must be set to mysql. | |
serializer | Must be set to mysql. | |
jdbc.driver | com.mysql.jdbc.Driver | The JDBC driver class to connect database. |
jdbc.url | jdbc:mysql://127.0.0.1:3306 | The url of database in JDBC format. |
jdbc.username | root | The username to login database. |
jdbc.password | ****** | The password corresponding to jdbc.username. |
jdbc.ssl_mode | false | The SSL mode of connections with database. |
jdbc.reconnect_interval | 3 | The interval(seconds) between reconnections when the database connection fails. |
jdbc.reconnect_max_times | 3 | The reconnect times when the database connection fails. |
jdbc.storage_engine | InnoDB | The storage engine of backend store database, like InnoDB/MyISAM/RocksDB for MySQL. |
jdbc.postgresql.connect_database | template1 | The database used to connect when init store, drop store or check store exist. |
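A minimal sketch of a MySQL backend configuration; the url and credentials below are placeholders:

backend=mysql
serializer=mysql
jdbc.driver=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://127.0.0.1:3306
jdbc.username=root
jdbc.password=******
jdbc.storage_engine=InnoDB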
PostgreSQL Backend Config Options
config option | default value | description |
---|---|---|
backend | Must be set to postgresql. | |
serializer | Must be set to postgresql. |
Other options are consistent with the MySQL backend.
The driver and url of the PostgreSQL backend should be set to:
jdbc.driver=org.postgresql.Driver
jdbc.url=jdbc:postgresql://localhost:5432/
4.3 - Built-in User Authentication and Authorization Configuration and Usage in HugeGraph
Overview
To facilitate authentication usage in different user scenarios, HugeGraph currently provides a built-in authorization mode, StandardAuthenticator, which supports multi-user authentication and fine-grained access control. It adopts a 4-layer design based on "User-UserGroup-Operation-Resource" to flexibly control user roles and permissions (supports multiple GraphServers).
Some key designs of the StandardAuthenticator mode include:
- During initialization, a super administrator (admin) user is created. Subsequently, other users can be created by the super administrator. Once newly created users are assigned sufficient permissions, they can create or manage more users.
- It supports dynamic creation of users, user groups, and resources, as well as dynamic allocation or revocation of permissions.
- Users can belong to one or multiple user groups. Each user group can have permissions to operate on any number of resources. The types of operations include read, write, delete, execute, and others.
- "Resource" describes the data in the graph database, such as vertices that meet certain criteria. Each resource consists of three elements: type, label, and properties. There are 18 types in total, with the ability to combine any label and properties. The internal conditions of a resource are an AND relationship, while the conditions between multiple resources are an OR relationship.
Here is an example to illustrate:
// Scenario: A user only has data read permission for the Beijing area
user(name=xx) -belong-> group(name=xx) -access(read)-> target(graph=graph1, resource={label: person, city: Beijing})
Configure User Authentication
By default, HugeGraph does not enable user authentication; it needs to be enabled by modifying the configuration file. (Note: If used in a production environment or over the internet, please use Java 11 and enable the auth system to avoid security risks.)
You need to modify the configuration file to enable this feature. HugeGraph provides the built-in authentication mode StandardAuthenticator, which supports multi-user authentication and fine-grained permission control. Additionally, developers can implement their own HugeAuthenticator interface to integrate with their existing authentication systems.
HugeGraph authentication modes adopt HTTP Basic Authentication. In simple terms, when sending an HTTP request, you need to set the Authorization header to Basic with the corresponding username and password. The corresponding HTTP plaintext format is as follows:
GET http://localhost:8080/graphs/hugegraph/schema/vertexlabels
Authorization: Basic admin xxxx
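For instance, the same request can be sent with curl, which encodes the username and password into the Authorization header (the credentials below are placeholders):

curl -u admin:xxxx "http://localhost:8080/graphs/hugegraph/schema/vertexlabels"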
Warning: Versions of HugeGraph-Server prior to 1.5.0 have a JWT-related security vulnerability in the Auth mode. Users are advised to update to a newer version or manually set the JWT token's secretKey. It can be set in the rest-server.properties file via the auth.token_secret option:
auth.token_secret=XXXX # should be a 32-char string, consisting of A-Z, a-z and 0-9
You can also generate it with the following command:
RANDOM_STRING=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 32)
echo "auth.token_secret=${RANDOM_STRING}" >> rest-server.properties
StandardAuthenticator Mode
The StandardAuthenticator mode supports user authentication and permission control by storing user information in the database backend. This implementation authenticates users based on their names and passwords (encrypted) stored in the database and controls user permissions based on their roles. Below is the specific configuration process (requires service restart):
Configure the authenticator and its rest-server file path in the gremlin-server.yaml configuration file:
authentication: {
authenticator: org.apache.hugegraph.auth.StandardAuthenticator,
authenticationHandler: org.apache.hugegraph.auth.WsAndHttpBasicAuthHandler,
config: {tokens: conf/rest-server.properties}
}
Configure the authenticator and graph_store information in the rest-server.properties configuration file:
auth.authenticator=org.apache.hugegraph.auth.StandardAuthenticator
auth.graph_store=hugegraph
# Auth Client Config
# If GraphServer and AuthServer are deployed separately, you also need to specify the following configuration. Fill in the IP:RPC port of AuthServer.
# auth.remote_url=127.0.0.1:8899,127.0.0.1:8898,127.0.0.1:8897
In the above configuration, the graph_store option specifies which graph to use for storing user information. If there are multiple graphs, you can choose any of them.
In the hugegraph{n}.properties configuration file, configure the gremlin.graph information:
gremlin.graph=org.apache.hugegraph.auth.HugeFactoryAuthProxy
For detailed API calls and explanations regarding permissions, please refer to the Authentication-API documentation.
Custom User Authentication System
If you need to support a more flexible user system, you can customize the authenticator for extension. Simply implement the org.apache.hugegraph.auth.HugeAuthenticator interface with your custom authenticator, and then modify the authenticator configuration item in the configuration file to point to your implementation.
Switching authentication mode
After the authentication configuration is completed, enter the admin password on the command line when executing init-store.sh for the first time (for non-Docker mode).
If deployed from a Docker image, or if HugeGraph has already been initialized and needs to be switched to authentication mode, the relevant graph data needs to be deleted and HugeGraph restarted. If there is already business data in the graph, it is not possible to directly switch to authentication mode (for versions <= 1.2.0).
Improvements for this feature are included in the latest release (available in the latest docker image); please refer to PR 2411. Seamless switching is now supported.
# stop the hugeGraph firstly
bin/stop-hugegraph.sh
# delete the store data (here we use the default path for rocksdb)
# there is no need to delete in the latest version (fixed in https://github.com/apache/incubator-hugegraph/pull/2411)
rm -rf rocksdb-data/
# init store again
bin/init-store.sh
# start hugeGraph again
bin/start-hugegraph.sh
Use docker to enable authentication mode
For versions of the hugegraph/hugegraph image equal to or greater than 1.2.0, you can enable authentication mode while starting the Docker image.
The steps are as follows:
1. Use docker run
To enable authentication mode, add the environment variable PASSWORD=123456 (you can freely set the password) in the docker run command:
docker run -itd -e PASSWORD=123456 --name=server -p 8080:8080 hugegraph/hugegraph:1.5.0
2. Use docker-compose
Use docker-compose and set the environment variable PASSWORD=123456:
version: '3'
services:
server:
image: hugegraph/hugegraph:1.5.0
container_name: server
ports:
- 8080:8080
environment:
- PASSWORD=123456
3. Enter the container to enable authentication mode
Enter the container first:
docker exec -it server bash
# Modify the config quickly; the modified files are saved in the conf-bak folder
bin/enable-auth.sh
Then follow the steps in Switching authentication mode above.
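Afterwards you can verify that authentication is in effect; for example, assuming the PASSWORD set above, an unauthenticated request should be rejected while an authenticated one succeeds:

# expect 401 Unauthorized
curl -i http://localhost:8080/graphs
# expect 200 OK
curl -i -u admin:123456 http://localhost:8080/graphs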
4.4 - Configuring HugeGraphServer to Use HTTPS Protocol
Overview
By default, HugeGraphServer uses the HTTP protocol. However, if you have security requirements for your requests, you can configure it to use HTTPS.
Server Configuration
Modify the conf/rest-server.properties configuration file and change the schema part of restserver.url to https.
# Set the protocol to HTTPS
restserver.url=https://127.0.0.1:8080
# Server keystore file path. This default value is automatically effective when using HTTPS, and you can modify it as needed.
ssl.keystore_file=conf/hugegraph-server.keystore
# Server keystore file password. This default value is automatically effective when using HTTPS, and you can modify it as needed.
ssl.keystore_password=******
The server's conf directory already includes a keystore file named hugegraph-server.keystore, and the password for this file is hugegraph. These are the default values when enabling the HTTPS protocol. Users can generate their own keystore file and password, and then modify the values of ssl.keystore_file and ssl.keystore_password.
Client Configuration
Using HTTPS in HugeGraph-Client
When constructing a HugeClient, pass the HTTPS-related configurations. Here’s an example in Java:
String url = "https://localhost:8080";
String graphName = "hugegraph";
HugeClientBuilder builder = HugeClient.builder(url, graphName);
// Client keystore file path
String trustStoreFilePath = "hugegraph.truststore";
// Client keystore password
String trustStorePassword = "******";
builder.configSSL(trustStoreFilePath, trustStorePassword);
HugeClient hugeClient = builder.build();
Note: Before version 1.9.0, HugeGraph-Client was created directly using the new keyword and did not support the HTTPS protocol. Starting from version 1.9.0, it changed to use the builder pattern and supports configuring the HTTPS protocol.
Using HTTPS in HugeGraph-Loader
When starting an import task, add the following options in the command line:
# HTTPS
--protocol https
# Client certificate file path. When specifying --protocol as https, the default value conf/hugegraph.truststore is automatically used, and you can modify it as needed.
--trust-store-file {file}
# Client certificate file password. When specifying --protocol as https, the default value hugegraph is automatically used, and you can modify it as needed.
--trust-store-password {password}
Under the conf directory of hugegraph-loader, there is already a default client certificate file named hugegraph.truststore, and its password is hugegraph.
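Putting it together, a full loader invocation over HTTPS might look like this (a hedged sketch; the graph name and input files are placeholders):

bin/hugegraph-loader -g hugegraph -f example/struct.json -s example/schema.groovy -h localhost -p 8080 --protocol https --trust-store-file conf/hugegraph.truststore --trust-store-password hugegraph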
Using HTTPS in HugeGraph-Tools
When executing commands, add the following options in the command line:
# Client certificate file path. When using the HTTPS protocol in the URL, the default value conf/hugegraph.truststore is automatically used, and you can modify it as needed.
--trust-store-file {file}
# Client certificate file password. When using the HTTPS protocol in the URL, the default value hugegraph is automatically used, and you can modify it as needed.
--trust-store-password {password}
# When executing migration commands and using the --target-url with the HTTPS protocol, the default value conf/hugegraph.truststore is automatically used, and you can modify it as needed.
--target-trust-store-file {target-file}
# When executing migration commands and using the --target-url with the HTTPS protocol, the default value hugegraph is automatically used, and you can modify it as needed.
--target-trust-store-password {target-password}
Under the conf directory of hugegraph-tools, there is already a default client certificate file named hugegraph.truststore, and its password is hugegraph.
How to Generate Certificate Files
This section provides an example of generating certificates. If the default certificate is sufficient or if you already know how to generate certificates, you can skip this section.
Server
- Generate the server's private key and import it into the server's keystore file. The server.keystore is for the server's use and contains its private key.
keytool -genkey -alias serverkey -keyalg RSA -keystore server.keystore
During the process, fill in the description information according to your requirements. The description information for the default certificate is as follows:
First and Last Name: hugegraph
Organizational Unit Name: hugegraph
Organization Name: hugegraph
City or Locality Name: BJ
State or Province Name: BJ
Country Code: CN
- Export the server certificate based on the server’s private key.
keytool -export -alias serverkey -keystore server.keystore -file server.crt
server.crt is the server's certificate.
Client
keytool -import -alias serverkey -file server.crt -keystore client.truststore
client.truststore is for the client's use and contains the trusted certificate.
4.5 - HugeGraph-Computer Config
Computer Config Options
config option | default value | description |
---|---|---|
algorithm.message_class | org.apache.hugegraph.computer.core.config.Null | The class of message passed when compute vertex. |
algorithm.params_class | org.apache.hugegraph.computer.core.config.Null | The class used to transfer algorithms' parameters before the algorithm is run. |
algorithm.result_class | org.apache.hugegraph.computer.core.config.Null | The class of vertex’s value, the instance is used to store computation result for the vertex. |
allocator.max_vertices_per_thread | 10000 | Maximum number of vertices per thread processed in each memory allocator |
bsp.etcd_endpoints | http://localhost:2379 | The end points to access etcd. |
bsp.log_interval | 30000 | The log interval(in ms) to print the log while waiting for a bsp event. |
bsp.max_super_step | 10 | The max super step of the algorithm. |
bsp.register_timeout | 300000 | The max timeout to wait for master and workers to register. |
bsp.wait_master_timeout | 86400000 | The max timeout(in ms) to wait for master bsp event. |
bsp.wait_workers_timeout | 86400000 | The max timeout to wait for workers bsp event. |
hgkv.max_data_block_size | 65536 | The max byte size of hgkv-file data block. |
hgkv.max_file_size | 2147483648 | The max number of bytes in each hgkv-file. |
hgkv.max_merge_files | 10 | The max number of files to merge at one time. |
hgkv.temp_file_dir | /tmp/hgkv | This folder is used to store temporary files, temporary files will be generated during the file merging process. |
hugegraph.name | hugegraph | The graph name to load data and write results back. |
hugegraph.url | http://127.0.0.1:8080 | The hugegraph url to load data and write results back. |
input.edge_direction | OUT | The direction of edges to load; when the value is BOTH, edges in both OUT and IN directions will be loaded. |
input.edge_freq | MULTIPLE | The frequency of edges that can exist between a pair of vertices, allowed values: [SINGLE, SINGLE_PER_LABEL, MULTIPLE]. SINGLE means that only one edge can exist between a pair of vertices, identified by sourceId + targetId; SINGLE_PER_LABEL means that one edge per edge label can exist between a pair of vertices, identified by sourceId + edgelabel + targetId; MULTIPLE means that many edges can exist between a pair of vertices, identified by sourceId + edgelabel + sortValues + targetId. |
input.filter_class | org.apache.hugegraph.computer.core.input.filter.DefaultInputFilter | The class to create the input-filter object; the input-filter is used to filter vertices and edges according to user needs. |
input.loader_schema_path | The schema path of loader input, only takes effect when the input.source_type=loader is enabled | |
input.loader_struct_path | The struct path of loader input, only takes effect when the input.source_type=loader is enabled | |
input.max_edges_in_one_vertex | 200 | The maximum number of adjacent edges allowed to be attached to a vertex, the adjacent edges will be stored and transferred together as a batch unit. |
input.source_type | hugegraph-server | The source type to load input data, allowed values: ['hugegraph-server', 'hugegraph-loader']; 'hugegraph-loader' means using hugegraph-loader to load data from HDFS or files, in which case please configure 'input.loader_struct_path' and 'input.loader_schema_path'. |
input.split_fetch_timeout | 300 | The timeout in seconds to fetch input splits |
input.split_max_splits | 10000000 | The maximum number of input splits |
input.split_page_size | 500 | The page size for streamed load input split data |
input.split_size | 1048576 | The input split size in bytes |
job.id | local_0001 | The job id on Yarn cluster or K8s cluster. |
job.partitions_count | 1 | The partitions count for computing one graph algorithm job. |
job.partitions_thread_nums | 4 | The number of threads for partition parallel compute. |
job.workers_count | 1 | The workers count for computing one graph algorithm job. |
master.computation_class | org.apache.hugegraph.computer.core.master.DefaultMasterComputation | Master-computation is computation that can determine whether to continue next superstep. It runs at the end of each superstep on master. |
output.batch_size | 500 | The batch size of output |
output.batch_threads | 1 | The threads number used to batch output |
output.hdfs_core_site_path | The hdfs core site path. | |
output.hdfs_delimiter | , | The delimiter of hdfs output. |
output.hdfs_kerberos_enable | false | Whether Kerberos authentication is enabled for Hdfs. |
output.hdfs_kerberos_keytab | The Hdfs’s key tab file for kerberos authentication. | |
output.hdfs_kerberos_principal | The Hdfs’s principal for kerberos authentication. | |
output.hdfs_krb5_conf | /etc/krb5.conf | Kerberos configuration file. |
output.hdfs_merge_partitions | true | Whether merge output files of multiple partitions. |
output.hdfs_path_prefix | /hugegraph-computer/results | The directory of hdfs output result. |
output.hdfs_replication | 3 | The replication number of hdfs. |
output.hdfs_site_path | The hdfs site path. | |
output.hdfs_url | hdfs://127.0.0.1:9000 | The hdfs url of output. |
output.hdfs_user | hadoop | The hdfs user of output. |
output.output_class | org.apache.hugegraph.computer.core.output.LogOutput | The class to output the computation result of each vertex. Be called after iteration computation. |
output.result_name | value | The value is assigned dynamically by #name() of instance created by WORKER_COMPUTATION_CLASS. |
output.result_write_type | OLAP_COMMON | The result write-type to output to hugegraph, allowed values are: [OLAP_COMMON, OLAP_SECONDARY, OLAP_RANGE]. |
output.retry_interval | 10 | The retry interval when output failed |
output.retry_times | 3 | The retry times when output failed |
output.single_threads | 1 | The threads number used to single output |
output.thread_pool_shutdown_timeout | 60 | The timeout seconds of output threads pool shutdown |
output.with_adjacent_edges | false | Output the adjacent edges of the vertex or not |
output.with_edge_properties | false | Output the properties of the edge or not |
output.with_vertex_properties | false | Output the properties of the vertex or not |
sort.thread_nums | 4 | The number of threads performing internal sorting. |
transport.client_connect_timeout | 3000 | The timeout(in ms) of client connect to server. |
transport.client_threads | 4 | The number of transport threads for client. |
transport.close_timeout | 10000 | The timeout(in ms) of close server or close client. |
transport.finish_session_timeout | 0 | The timeout(in ms) to finish session, 0 means using (transport.sync_request_timeout * transport.max_pending_requests). |
transport.heartbeat_interval | 20000 | The minimum interval(in ms) between heartbeats on client side. |
transport.io_mode | AUTO | The network IO mode, either 'NIO', 'EPOLL' or 'AUTO'; 'AUTO' means selecting the proper mode automatically. |
transport.max_pending_requests | 8 | The max number of client unreceived ack, it will trigger the sending unavailable if the number of unreceived ack >= max_pending_requests. |
transport.max_syn_backlog | 511 | The capacity of SYN queue on server side, 0 means using system default value. |
transport.max_timeout_heartbeat_count | 120 | The maximum times of timeout heartbeat on client side, if the number of timeouts waiting for heartbeat response continuously > max_heartbeat_timeouts the channel will be closed from client side. |
transport.min_ack_interval | 200 | The minimum interval(in ms) of server reply ack. |
transport.min_pending_requests | 6 | The minimum number of client unreceived ack, it will trigger the sending available if the number of unreceived ack < min_pending_requests. |
transport.network_retries | 3 | The number of retry attempts for network communication if the network is unstable. |
transport.provider_class | org.apache.hugegraph.computer.core.network.netty.NettyTransportProvider | The transport provider, currently only supports Netty. |
transport.receive_buffer_size | 0 | The size of socket receive-buffer in bytes, 0 means using system default value. |
transport.recv_file_mode | true | Whether to enable receive buffer-file mode; if enabled, received buffers are written to file from the socket by zero-copy. |
transport.send_buffer_size | 0 | The size of socket send-buffer in bytes, 0 means using system default value. |
transport.server_host | 127.0.0.1 | The server hostname or ip to listen on to transfer data. |
transport.server_idle_timeout | 360000 | The max timeout(in ms) of server idle. |
transport.server_port | 0 | The server port to listen on to transfer data. The system will assign a random port if it’s set to 0. |
transport.server_threads | 4 | The number of transport threads for server. |
transport.sync_request_timeout | 10000 | The timeout(in ms) to wait response after sending sync-request. |
transport.tcp_keep_alive | true | Whether enable TCP keep-alive. |
transport.transport_epoll_lt | false | Whether enable EPOLL level-trigger. |
transport.write_buffer_high_mark | 67108864 | The high water mark for write buffer in bytes, it will trigger the sending unavailable if the number of queued bytes > write_buffer_high_mark. |
transport.write_buffer_low_mark | 33554432 | The low water mark for write buffer in bytes, it will trigger the sending available if the number of queued bytes < write_buffer_low_mark. |
transport.write_socket_timeout | 3000 | The timeout(in ms) to write data to socket buffer. |
valuefile.max_segment_size | 1073741824 | The max number of bytes in each segment of value-file. |
worker.combiner_class | org.apache.hugegraph.computer.core.config.Null | Combiner can combine messages into one value for a vertex, for example page-rank algorithm can combine messages of a vertex to a sum value. |
worker.computation_class | org.apache.hugegraph.computer.core.config.Null | The class to create worker-computation object, worker-computation is used to compute each vertex in each superstep. |
worker.data_dirs | [jobs] | The directories separated by ‘,’ that received vertices and messages can persist into. |
worker.edge_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same edge into one properties at inputstep. |
worker.partitioner | org.apache.hugegraph.computer.core.graph.partition.HashPartitioner | The partitioner that decides which partition a vertex should be in, and which worker a partition should be in. |
worker.received_buffers_bytes_limit | 104857600 | The limit in bytes of buffers of received data; the total size of all buffers can't exceed this limit. If received buffers reach this limit, they will be merged into a file. |
worker.vertex_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same vertex into one properties at inputstep. |
worker.wait_finish_messages_timeout | 86400000 | The max timeout(in ms) message-handler wait for finish-message of all workers. |
worker.wait_sort_timeout | 600000 | The max timeout(in ms) message-handler wait for sort-thread to sort one batch of buffers. |
worker.write_buffer_capacity | 52428800 | The initial size of write buffer that used to store vertex or message. |
worker.write_buffer_threshold | 52428800 | The threshold of write buffer, exceeding it will trigger sorting, the write buffer is used to store vertex or message. |
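As an illustrative sketch, a job might override a handful of the options above in its computer config; all values below are placeholders:

job.id=pagerank_0001
job.workers_count=3
job.partitions_count=6
hugegraph.url=http://127.0.0.1:8080
hugegraph.name=hugegraph
bsp.etcd_endpoints=http://etcd:2379
output.output_class=org.apache.hugegraph.computer.core.output.LogOutput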
K8s Operator Config Options
NOTE: Option needs to be converted through environment variable settings, e.g. k8s.internal_etcd_url => INTERNAL_ETCD_URL
config option | default value | description |
---|---|---|
k8s.auto_destroy_pod | true | Whether to automatically destroy all pods when the job is completed or failed. |
k8s.close_reconciler_timeout | 120 | The max timeout(in ms) to close reconciler. |
k8s.internal_etcd_url | http://127.0.0.1:2379 | The internal etcd url for operator system. |
k8s.max_reconcile_retry | 3 | The max retry times of reconcile. |
k8s.probe_backlog | 50 | The maximum backlog for serving health probes. |
k8s.probe_port | 9892 | The value is the port that the controller bind to for serving health probes. |
k8s.ready_check_internal | 1000 | The time interval(in ms) between readiness checks. |
k8s.ready_timeout | 30000 | The max timeout(in ms) of readiness checks. |
k8s.reconciler_count | 10 | The max number of reconciler thread. |
k8s.resync_period | 600000 | The minimum frequency at which watched resources are reconciled. |
k8s.timezone | Asia/Shanghai | The timezone of computer job and operator. |
k8s.watch_namespace | hugegraph-computer-system | Watch custom resources in this namespace and ignore other namespaces; '*' means all namespaces will be watched. |
HugeGraph-Computer CRD
spec | default value | description | required |
---|---|---|---|
algorithmName | The name of algorithm. | true | |
jobId | The job id. | true | |
image | The image of algorithm. | true | |
computerConf | The map of computer config options. | true | |
workerInstances | The number of worker instances; it overrides the 'job.workers_count' option. | true |
pullPolicy | Always | The pull-policy of image, for details please refer to: https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy | false
pullSecrets | The pull-secrets of image, for details please refer to: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod | false | |
masterCpu | The cpu limit of master, the unit can be 'm' or omitted; for details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false | |
workerCpu | The cpu limit of worker, the unit can be 'm' or omitted; for details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false | |
masterMemory | The memory limit of master, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki; for details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false | |
workerMemory | The memory limit of worker, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki; for details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false | |
log4jXml | The content of log4j.xml for computer job. | false | |
jarFile | The jar path of computer algorithm. | false | |
remoteJarUri | The remote jar uri of computer algorithm; it will override the algorithm image. | false | |
jvmOptions | The java startup parameters of computer job. | false | |
envVars | please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/ | false | |
envFrom | please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/ | false | |
masterCommand | bin/start-computer.sh | The run command of master, equivalent to ‘Entrypoint’ field of Docker. | false |
masterArgs | ["-r master", “-d k8s”] | The run args of master, equivalent to ‘Cmd’ field of Docker. | false |
workerCommand | bin/start-computer.sh | The run command of worker, equivalent to ‘Entrypoint’ field of Docker. | false |
workerArgs | ["-r worker", “-d k8s”] | The run args of worker, equivalent to ‘Cmd’ field of Docker. | false |
volumes | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false | |
volumeMounts | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false | |
secretPaths | The map of k8s-secret name and mount path. | false | |
configMapPaths | The map of k8s-configmap name and mount path. | false | |
podTemplateSpec | Please refer to: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-template-v1/#PodTemplateSpec | false | |
securityContext | Please refer to: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | false |
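A hedged sketch of a job manifest using the CRD fields above; the apiVersion and kind shown here are assumptions for illustration, and the image, names and values are placeholders:

apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-system
  name: pagerank-sample
spec:
  jobId: pagerank-demo-001
  algorithmName: pageRank
  image: hugegraph/hugegraph-computer:latest
  workerInstances: 3
  computerConf:
    job.partitions_count: "6"
    bsp.etcd_endpoints: http://etcd.hugegraph-computer-system:2379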
KubeDriver Config Options
config option | default value | description |
---|---|---|
k8s.build_image_bash_path | The path of command used to build image. | |
k8s.enable_internal_algorithm | true | Whether enable internal algorithm. |
k8s.framework_image_url | hugegraph/hugegraph-computer:latest | The image url of computer framework. |
k8s.image_repository_password | The password for login image repository. | |
k8s.image_repository_registry | The address for login image repository. | |
k8s.image_repository_url | hugegraph/hugegraph-computer | The url of image repository. |
k8s.image_repository_username | The username for login image repository. | |
k8s.internal_algorithm | [pageRank] | The name list of all internal algorithms. |
k8s.internal_algorithm_image_url | hugegraph/hugegraph-computer:latest | The image url of internal algorithm. |
k8s.jar_file_dir | /cache/jars/ | The directory to which the algorithm jar is uploaded. |
k8s.kube_config | ~/.kube/config | The path of k8s config file. |
k8s.log4j_xml_path | The log4j.xml path for computer job. | |
k8s.namespace | hugegraph-computer-system | The namespace of hugegraph-computer system. |
k8s.pull_secret_names | [] | The names of pull-secret for pulling image. |
5 - API
5.1 - HugeGraph RESTful API
HugeGraph-Server provides interfaces for clients to operate on graphs based on the HTTP protocol through the HugeGraph-API. These interfaces primarily include the ability to add, delete, modify, and query metadata and graph data, perform traversal algorithms, handle variables, and perform other graph-related operations.
Apart from the docs below, you can also use swagger-ui to visit the RESTful API at localhost:8080/swagger-ui/index.html. Here is an example.
5.1.1 - Schema API
1.1 Schema
HugeGraph provides a single interface to get all Schema information of a graph, including: PropertyKey, VertexLabel, EdgeLabel and IndexLabel.
Method & Url
GET http://localhost:8080/graphs/{graph_name}/schema
e.g: GET http://localhost:8080/graphs/hugegraph/schema
Response Status
200
Response Body
{
"propertykeys": [
{
"id": 7,
"name": "price",
"data_type": "DOUBLE",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:05.316"
}
},
{
"id": 6,
"name": "date",
"data_type": "TEXT",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:05.309"
}
},
{
"id": 3,
"name": "city",
"data_type": "TEXT",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:05.287"
}
},
{
"id": 2,
"name": "age",
"data_type": "INT",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:05.280"
}
},
{
"id": 5,
"name": "lang",
"data_type": "TEXT",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:05.301"
}
},
{
"id": 4,
"name": "weight",
"data_type": "DOUBLE",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:05.294"
}
},
{
"id": 1,
"name": "name",
"data_type": "TEXT",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:05.250"
}
}
],
"vertexlabels": [
{
"id": 1,
"name": "person",
"id_strategy": "PRIMARY_KEY",
"primary_keys": [
"name"
],
"nullable_keys": [
"age",
"city"
],
"index_labels": [
"personByAge",
"personByCity",
"personByAgeAndCity"
],
"properties": [
"name",
"age",
"city"
],
"status": "CREATED",
"ttl": 0,
"enable_label_index": true,
"user_data": {
"~create_time": "2023-05-08 17:49:05.336"
}
},
{
"id": 2,
"name": "software",
"id_strategy": "CUSTOMIZE_NUMBER",
"primary_keys": [],
"nullable_keys": [],
"index_labels": [
"softwareByPrice"
],
"properties": [
"name",
"lang",
"price"
],
"status": "CREATED",
"ttl": 0,
"enable_label_index": true,
"user_data": {
"~create_time": "2023-05-08 17:49:05.347"
}
}
],
"edgelabels": [
{
"id": 1,
"name": "knows",
"source_label": "person",
"target_label": "person",
"frequency": "SINGLE",
"sort_keys": [],
"nullable_keys": [],
"index_labels": [
"knowsByWeight"
],
"properties": [
"weight",
"date"
],
"status": "CREATED",
"ttl": 0,
"enable_label_index": true,
"user_data": {
"~create_time": "2023-05-08 17:49:08.437"
}
},
{
"id": 2,
"name": "created",
"source_label": "person",
"target_label": "software",
"frequency": "SINGLE",
"sort_keys": [],
"nullable_keys": [],
"index_labels": [
"createdByDate",
"createdByWeight"
],
"properties": [
"weight",
"date"
],
"status": "CREATED",
"ttl": 0,
"enable_label_index": true,
"user_data": {
"~create_time": "2023-05-08 17:49:08.446"
}
}
],
"indexlabels": [
{
"id": 1,
"name": "personByAge",
"base_type": "VERTEX_LABEL",
"base_value": "person",
"index_type": "RANGE_INT",
"fields": [
"age"
],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:05.375"
}
},
{
"id": 2,
"name": "personByCity",
"base_type": "VERTEX_LABEL",
"base_value": "person",
"index_type": "SECONDARY",
"fields": [
"city"
],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:06.898"
}
},
{
"id": 3,
"name": "personByAgeAndCity",
"base_type": "VERTEX_LABEL",
"base_value": "person",
"index_type": "SECONDARY",
"fields": [
"age",
"city"
],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:07.407"
}
},
{
"id": 4,
"name": "softwareByPrice",
"base_type": "VERTEX_LABEL",
"base_value": "software",
"index_type": "RANGE_DOUBLE",
"fields": [
"price"
],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:07.916"
}
},
{
"id": 5,
"name": "createdByDate",
"base_type": "EDGE_LABEL",
"base_value": "created",
"index_type": "SECONDARY",
"fields": [
"date"
],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:08.454"
}
},
{
"id": 6,
"name": "createdByWeight",
"base_type": "EDGE_LABEL",
"base_value": "created",
"index_type": "RANGE_DOUBLE",
"fields": [
"weight"
],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:08.963"
}
},
{
"id": 7,
"name": "knowsByWeight",
"base_type": "EDGE_LABEL",
"base_value": "knows",
"index_type": "RANGE_DOUBLE",
"fields": [
"weight"
],
"status": "CREATED",
"user_data": {
"~create_time": "2023-05-08 17:49:09.473"
}
}
]
}
5.1.2 - PropertyKey API
1.2 PropertyKey
Params Description:
- name: The name of the property type, required.
- data_type: The data type of the property type, including: bool, byte, int, long, float, double, text, blob, date, uuid. The default data type is text (represented as a string type).
- cardinality: The cardinality of the property type, including: single, list, set. The default cardinality is single.
Request Body Field Description:
- id: The ID value of the property type.
- properties: The properties of the property type. For PropertyKeys, this field is empty.
- user_data: Setting the common information of the property type, such as setting the value range of the age property from 0 to 100. Currently, no validation is performed on this field, and it is only a reserved entry for future expansion.
1.2.1 Create a PropertyKey
Method & Url
POST http://localhost:8080/graphs/hugegraph/schema/propertykeys
Request Body
{
"name": "age",
"data_type": "INT",
"cardinality": "SINGLE"
}
Response Status
202
Response Body
{
"property_key": {
"id": 1,
"name": "age",
"data_type": "INT",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"~create_time": "2022-05-13 13:47:23.745"
}
},
"task_id": 0
}
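The same creation request can also be issued from the command line, for example with curl (assuming the server runs locally with authentication disabled):

curl -X POST -H "Content-Type: application/json" -d '{"name": "age", "data_type": "INT", "cardinality": "SINGLE"}' http://localhost:8080/graphs/hugegraph/schema/propertykeys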
1.2.2 Add or Remove userdata for an existing PropertyKey
Params
- action: Indicates whether the current action is to add or remove userdata. Possible values are append (add) and eliminate (remove).
Method & Url
PUT http://localhost:8080/graphs/hugegraph/schema/propertykeys/age?action=append
Request Body
{
"name": "age",
"user_data": {
"min": 0,
"max": 100
}
}
Response Status
202
Response Body
{
"property_key": {
"id": 1,
"name": "age",
"data_type": "INT",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"min": 0,
"max": 100,
"~create_time": "2022-05-13 13:47:23.745"
}
},
"task_id": 0
}
1.2.3 Get all PropertyKeys
Method & Url
GET http://localhost:8080/graphs/hugegraph/schema/propertykeys
Response Status
200
Response Body
{
"propertykeys": [
{
"id": 3,
"name": "city",
"data_type": "TEXT",
"cardinality": "SINGLE",
"properties": [],
"user_data": {}
},
{
"id": 2,
"name": "age",
"data_type": "INT",
"cardinality": "SINGLE",
"properties": [],
"user_data": {}
},
{
"id": 5,
"name": "lang",
"data_type": "TEXT",
"cardinality": "SINGLE",
"properties": [],
"user_data": {}
},
{
"id": 4,
"name": "weight",
"data_type": "DOUBLE",
"cardinality": "SINGLE",
"properties": [],
"user_data": {}
},
{
"id": 6,
"name": "date",
"data_type": "TEXT",
"cardinality": "SINGLE",
"properties": [],
"user_data": {}
},
{
"id": 1,
"name": "name",
"data_type": "TEXT",
"cardinality": "SINGLE",
"properties": [],
"user_data": {}
},
{
"id": 7,
"name": "price",
"data_type": "INT",
"cardinality": "SINGLE",
"properties": [],
"user_data": {}
}
]
}
1.2.4 Get PropertyKey according to name
Method & Url
GET http://localhost:8080/graphs/hugegraph/schema/propertykeys/age
Where age is the name of the PropertyKey to be retrieved.
Response Status
200
Response Body
{
"id": 1,
"name": "age",
"data_type": "INT",
"cardinality": "SINGLE",
"aggregate_type": "NONE",
"write_type": "OLTP",
"properties": [],
"status": "CREATED",
"user_data": {
"min": 0,
"max": 100,
"~create_time": "2022-05-13 13:47:23.745"
}
}
1.2.5 Delete PropertyKey according to name
Method & Url
DELETE http://localhost:8080/graphs/hugegraph/schema/propertykeys/age
Where age is the name of the PropertyKey to be deleted.
Response Status
202
Response Body
{
"task_id" : 0
}
5.1.3 - VertexLabel API
1.3 VertexLabel
Assuming that the PropertyKeys listed in section 1.2.3 have already been created.
Params Description:
- id: The ID value of the vertex type.
- name: The name of the vertex type, required.
- id_strategy: The ID strategy for the vertex type, including primary key ID, auto-generated, custom string, custom number, custom UUID. The default strategy is primary key ID.
- properties: The property types associated with the vertex type.
- primary_keys: The primary key properties. This field must have a value when the ID strategy is PRIMARY_KEY, and must be empty for other ID strategies.
- enable_label_index: Whether to enable label indexing. It is disabled by default.
- index_names: The indexes created for the vertex type. See details in section 3.4.
- nullable_keys: Nullable properties.
- user_data: Setting the common information of the vertex type, similar to the property type.
1.3.1 Create a VertexLabel
Method & Url
POST http://localhost:8080/graphs/hugegraph/schema/vertexlabels
Request Body
{
"name": "person",
"id_strategy": "DEFAULT",
"properties": [
"name",
"age"
],
"primary_keys": [
"name"
],
"nullable_keys": [],
"enable_label_index": true
}
Response Status
201
Response Body
{
"id": 1,
"primary_keys": [
"name"
],
"id_strategy": "PRIMARY_KEY",
"name": "person2",
"index_names": [
],
"properties": [
"name",
"age"
],
"nullable_keys": [
],
"enable_label_index": true,
"user_data": {}
}
Starting from version v0.11.2, hugegraph-server supports Time-to-Live (TTL) functionality for vertices. The TTL for vertices is set through VertexLabel. For example, if you want the vertices of type “person” to have a lifespan of one day, you need to set the TTL field to 86400000 (in milliseconds) when creating the “person” VertexLabel.
{
"name": "person",
"id_strategy": "DEFAULT",
"properties": [
"name",
"age"
],
"primary_keys": [
"name"
],
"nullable_keys": [],
"ttl": 86400000,
"enable_label_index": true
}
Additionally, if the vertex has a property called “createdTime” and you want to use it as the starting point for calculating the vertex’s lifespan, you can set the ttl_start_time field in the VertexLabel. For example, if the “person” VertexLabel has a property called “createdTime” of type Date, and you want the vertices of type “person” to live for one day starting from the creation time, the Request Body for creating the “person” VertexLabel would be as follows:
{
"name": "person",
"id_strategy": "DEFAULT",
"properties": [
"name",
"age",
"createdTime"
],
"primary_keys": [
"name"
],
"nullable_keys": [],
"ttl": 86400000,
"ttl_start_time": "createdTime",
"enable_label_index": true
}
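The same schema, including the TTL settings, can also be created through the HugeGraph Java client; the following is a minimal sketch, assuming the client's builder methods mirror the REST fields above and that the property keys already exist:

// obtain the schema manager from an already-built HugeClient
SchemaManager schema = hugeClient.schema();
schema.vertexLabel("person")
      .properties("name", "age", "createdTime")
      .primaryKeys("name")
      // vertices of this label expire one day after createdTime
      .ttl(86400000L)
      .ttlStartTime("createdTime")
      .enableLabelIndex(true)
      .create();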
1.3.2 Add properties or userdata to an existing VertexLabel, or remove userdata (removing properties is currently not supported)
Params
- action: Indicates whether the current action is to add or remove. Possible values are append (add) and eliminate (remove).
Method & Url
PUT http://localhost:8080/graphs/hugegraph/schema/vertexlabels/person?action=append
Request Body
{
"name": "person",
"properties": [
"city"
],
"nullable_keys": ["city"],
"user_data": {
"super": "animal"
}
}
Response Status
200
Response Body
{
"id": 1,
"primary_keys": [
"name"
],
"id_strategy": "PRIMARY_KEY",
"name": "person",
"index_names": [
],
"properties": [
"city",
"name",
"age"
],
"nullable_keys": [
"city"
],
"enable_label_index": true,
"user_data": {
"super": "animal"
}
}
1.3.3 Get all VertexLabels
Method & Url
GET http://localhost:8080/graphs/hugegraph/schema/vertexlabels
Response Status
200
Response Body
{
"vertexlabels": [
{
"id": 1,
"primary_keys": [
"name"
],
"id_strategy": "PRIMARY_KEY",
"name": "person",
"index_names": [
],
"properties": [
"city",
"name",
"age"
],
"nullable_keys": [
"city"
],
"enable_label_index": true,
"user_data": {
"super": "animal"
}
},
{
"id": 2,
"primary_keys": [
"name"
],
"id_strategy": "PRIMARY_KEY",
"name": "software",
"index_names": [
],
"properties": [
"price",
"name",
"lang"
],
"nullable_keys": [
"price"
],
"enable_label_index": false,
"user_data": {}
}
]
}
1.3.4 Get VertexLabel by name
Method & Url
GET http://localhost:8080/graphs/hugegraph/schema/vertexlabels/person
Response Status
200
Response Body
{
"id": 1,
"primary_keys": [
"name"
],
"id_strategy": "PRIMARY_KEY",
"name": "person",
"index_names": [
],
"properties": [
"city",
"name",
"age"
],
"nullable_keys": [
"city"
],
"enable_label_index": true,
"user_data": {
"super": "animal"
}
}
1.3.5 Delete VertexLabel by name
Deleting a VertexLabel will result in the removal of corresponding vertices and related index data. This operation will generate an asynchronous task.
Method & Url
DELETE http://localhost:8080/graphs/hugegraph/schema/vertexlabels/person
Response Status
202
Response Body
{
"task_id": 1
}
Note:
You can use
GET http://localhost:8080/graphs/hugegraph/tasks/1
(where “1” is the task_id) to query the execution status of the asynchronous task. For more information, refer to the Asynchronous Task RESTful API.
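For instance, a quick way to poll this task from the command line (a sketch assuming the default local server):
curl "http://localhost:8080/graphs/hugegraph/tasks/1"
Once the task has finished, the returned JSON should include a task_status field (e.g. success).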
5.1.4 - EdgeLabel API
1.4 EdgeLabel
Assuming the PropertyKeys from section 1.2.3 and the VertexLabels from section 1.3.3 have already been created.
Params Explanation
- name: Name of the edge type, required.
- source_label: Name of the source vertex type, required.
- target_label: Name of the target vertex type, required.
- frequency: Whether there can be multiple edges between two points, can have values SINGLE or MULTIPLE, optional (default value: SINGLE).
- properties: Property types associated with the edge type, optional.
- sort_keys: When frequency is MULTIPLE, specifies the list of property keys used to distinguish the multiple edges between the same two vertices.
- nullable_keys: Nullable properties, optional (default: nullable).
- enable_label_index: Whether to enable type indexing, disabled by default.
1.4.1 Create an EdgeLabel
Method & Url
POST http://localhost:8080/graphs/hugegraph/schema/edgelabels
Request Body
{
"name": "created",
"source_label": "person",
"target_label": "software",
"frequency": "SINGLE",
"properties": [
"date"
],
"sort_keys": [],
"nullable_keys": [],
"enable_label_index": true
}
Response Status
201
Response Body
{
"id": 1,
"sort_keys": [
],
"source_label": "person",
"name": "created",
"index_names": [
],
"properties": [
"date"
],
"target_label": "software",
"frequency": "SINGLE",
"nullable_keys": [
],
"enable_label_index": true,
"user_data": {}
}
Starting from version 0.11.2 of hugegraph-server, the TTL (Time to Live) feature for edges is supported. The TTL for edges is set through EdgeLabel. For example, if you want the “knows” type of edge to have a lifespan of one day, you need to set the TTL field to 86400000 when creating the “knows” EdgeLabel, where the unit is milliseconds.
{
"id": 1,
"sort_keys": [
],
"source_label": "person",
"name": "knows",
"index_names": [
],
"properties": [
"date",
"createdTime"
],
"target_label": "person",
"frequency": "SINGLE",
"nullable_keys": [
],
"enable_label_index": true,
"ttl": 86400000,
"user_data": {}
}
Additionally, when the edge has a property called “createdTime” and you want to use the “createdTime” property as the starting point for calculating the edge’s lifespan, you can set the ttl_start_time field in the EdgeLabel. For example, if the knows EdgeLabel has a property called “createdTime” which is of type Date, and you want the “knows” type of edge to live for one day from the time of creation, the Request Body for creating the knows EdgeLabel would be as follows:
{
"id": 1,
"sort_keys": [
],
"source_label": "person",
"name": "knows",
"index_names": [
],
"properties": [
"date",
"createdTime"
],
"target_label": "person",
"frequency": "SINGLE",
"nullable_keys": [
],
"enable_label_index": true,
"ttl": 86400000,
"ttl_start_time": "createdTime",
"user_data": {}
}
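Note that the id field shown above is assigned by the server and can be omitted when actually creating the label. A minimal curl sketch for this request (assuming a local server and the default hugegraph graph):
curl -X POST -H "Content-Type: application/json" -d '{"name":"knows","source_label":"person","target_label":"person","frequency":"SINGLE","properties":["date","createdTime"],"sort_keys":[],"nullable_keys":[],"enable_label_index":true,"ttl":86400000,"ttl_start_time":"createdTime"}' "http://localhost:8080/graphs/hugegraph/schema/edgelabels"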
1.4.2 Add properties or userdata to an existing EdgeLabel, or remove userdata (removing properties is currently not supported)
Params
- action: Indicates whether the current action is to add or remove. Possible values are append (add) and eliminate (remove).
Method & Url
PUT http://localhost:8080/graphs/hugegraph/schema/edgelabels/created?action=append
Request Body
{
"name": "created",
"properties": [
"weight"
],
"nullable_keys": [
"weight"
]
}
Response Status
200
Response Body
{
"id": 2,
"sort_keys": [
],
"source_label": "person",
"name": "created",
"index_names": [
],
"properties": [
"date",
"weight"
],
"target_label": "software",
"frequency": "SINGLE",
"nullable_keys": [
"weight"
],
"enable_label_index": true,
"user_data": {}
}
1.4.3 Get all EdgeLabels
Method & Url
GET http://localhost:8080/graphs/hugegraph/schema/edgelabels
Response Status
200
Response Body
{
"edgelabels": [
{
"id": 1,
"sort_keys": [
],
"source_label": "person",
"name": "created",
"index_names": [
],
"properties": [
"date",
"weight"
],
"target_label": "software",
"frequency": "SINGLE",
"nullable_keys": [
"weight"
],
"enable_label_index": true,
"user_data": {}
},
{
"id": 2,
"sort_keys": [
],
"source_label": "person",
"name": "knows",
"index_names": [
],
"properties": [
"date",
"weight"
],
"target_label": "person",
"frequency": "SINGLE",
"nullable_keys": [
],
"enable_label_index": false,
"user_data": {}
}
]
}
1.4.4 Get EdgeLabel by name
Method & Url
GET http://localhost:8080/graphs/hugegraph/schema/edgelabels/created
Response Status
200
Response Body
{
"id": 1,
"sort_keys": [
],
"source_label": "person",
"name": "created",
"index_names": [
],
"properties": [
"date",
"city",
"weight"
],
"target_label": "software",
"frequency": "SINGLE",
"nullable_keys": [
"city",
"weight"
],
"enable_label_index": true,
"user_data": {}
}
1.4.5 Delete EdgeLabel by name
Deleting an EdgeLabel will result in the deletion of corresponding edges and related index data. This operation will generate an asynchronous task.
Method & Url
DELETE http://localhost:8080/graphs/hugegraph/schema/edgelabels/created
Response Status
202
Response Body
{
"task_id": 1
}
Note:
You can query the execution status of an asynchronous task by using
GET http://localhost:8080/graphs/hugegraph/tasks/1
(where “1” is the task_id). For more information, refer to the Asynchronous Task RESTful API.
5.1.5 - IndexLabel API
1.5 IndexLabel
Assuming the PropertyKeys from section 1.2.3, the VertexLabels from section 1.3.3, and the EdgeLabels from section 1.4.3 have already been created.
1.5.1 Create an IndexLabel
Method & Url
POST http://localhost:8080/graphs/hugegraph/schema/indexlabels
Request Body
{
"name": "personByCity",
"base_type": "VERTEX_LABEL",
"base_value": "person",
"index_type": "SECONDARY",
"fields": [
"city"
]
}
Response Status
202
Response Body
{
"index_label": {
"id": 1,
"base_type": "VERTEX_LABEL",
"base_value": "person",
"name": "personByCity",
"fields": [
"city"
],
"index_type": "SECONDARY"
},
"task_id": 2
}
1.5.2 Get all IndexLabels
Method & Url
GET http://localhost:8080/graphs/hugegraph/schema/indexlabels
Response Status
200
Response Body
{
"indexlabels": [
{
"id": 3,
"base_type": "VERTEX_LABEL",
"base_value": "software",
"name": "softwareByPrice",
"fields": [
"price"
],
"index_type": "RANGE"
},
{
"id": 4,
"base_type": "EDGE_LABEL",
"base_value": "created",
"name": "createdByDate",
"fields": [
"date"
],
"index_type": "SECONDARY"
},
{
"id": 1,
"base_type": "VERTEX_LABEL",
"base_value": "person",
"name": "personByCity",
"fields": [
"city"
],
"index_type": "SECONDARY"
},
{
"id": 3,
"base_type": "VERTEX_LABEL",
"base_value": "person",
"name": "personByAgeAndCity",
"fields": [
"age",
"city"
],
"index_type": "SECONDARY"
}
]
}
1.5.3 Get IndexLabel by name
Method & Url
GET http://localhost:8080/graphs/hugegraph/schema/indexlabels/personByCity
Response Status
200
Response Body
{
"id": 1,
"base_type": "VERTEX_LABEL",
"base_value": "person",
"name": "personByCity",
"fields": [
"city"
],
"index_type": "SECONDARY"
}
1.5.4 Delete IndexLabel by name
Deleting an IndexLabel will result in the deletion of related index data. This operation will generate an asynchronous task.
Method & Url
DELETE http://localhost:8080/graphs/hugegraph/schema/indexlabels/personByCity
Response Status
202
Response Body
{
"task_id": 1
}
Note:
You can query the execution status of an asynchronous task by using
GET http://localhost:8080/graphs/hugegraph/tasks/1
(where “1” is the task_id). For more information, refer to the Asynchronous Task RESTful API.
5.1.6 - Rebuild API
1.6 Rebuild
1.6.1 Rebuild IndexLabel
Method & Url
PUT http://localhost:8080/graphs/hugegraph/jobs/rebuild/indexlabels/personByCity
Response Status
202
Response Body
{
"task_id": 1
}
Note:
You can get the asynchronous job status by
GET http://localhost:8080/graphs/hugegraph/tasks/${task_id}
(the task_id here should be 1). See the AsyncJob RESTful API for more details.
1.6.2 Rebuild all Indexes of VertexLabel
Method & Url
PUT http://localhost:8080/graphs/hugegraph/jobs/rebuild/vertexlabels/person
Response Status
202
Response Body
{
"task_id": 2
}
Note:
You can get the asynchronous job status by
GET http://localhost:8080/graphs/hugegraph/tasks/${task_id}
(the task_id here should be 2). See the AsyncJob RESTful API for more details.
1.6.3 Rebuild all Indexes of EdgeLabel
Method & Url
PUT http://localhost:8080/graphs/hugegraph/jobs/rebuild/edgelabels/created
Response Status
202
Response Body
{
"task_id": 3
}
Note:
You can get the asynchronous job status by
GET http://localhost:8080/graphs/hugegraph/tasks/${task_id}
(the task_id here should be 3). See the AsyncJob RESTful API for more details.
5.1.7 - Vertex API
2.1 Vertex
In vertex types, the Id strategy determines the type of the vertex Id, with the corresponding relationships as follows:
Id_Strategy | id type |
---|---|
AUTOMATIC | number |
PRIMARY_KEY | string |
CUSTOMIZE_STRING | string |
CUSTOMIZE_NUMBER | number |
CUSTOMIZE_UUID | uuid |
For the GET/PUT/DELETE API of a vertex, the id part in the URL should be passed as the id value with type information. This type information is indicated by whether the JSON string is enclosed in quotes, meaning:
- When the id type is number, the id in the URL is without quotes, for example: xxx/vertices/123456.
- When the id type is string, the id in the URL is enclosed in quotes, for example: xxx/vertices/"123456".
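For instance, a string-type id such as "1:marko" (created in the example below) keeps its quotes in the URL; in a curl sketch the quotes can be URL-encoded as %22 (any equivalent URL quoting works):
curl "http://localhost:8080/graphs/hugegraph/graph/vertices/%221:marko%22"
A number-type id is passed without quotes, e.g. xxx/vertices/123456.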
The next example requires first creating the graph schema from the following groovy script:
schema.propertyKey("name").asText().ifNotExist().create();
schema.propertyKey("age").asInt().ifNotExist().create();
schema.propertyKey("city").asText().ifNotExist().create();
schema.propertyKey("weight").asDouble().ifNotExist().create();
schema.propertyKey("lang").asText().ifNotExist().create();
schema.propertyKey("price").asDouble().ifNotExist().create();
schema.propertyKey("hobby").asText().valueList().ifNotExist().create();
schema.vertexLabel("person").properties("name", "age", "city", "weight", "hobby").primaryKeys("name").nullableKeys("age", "city", "weight", "hobby").ifNotExist().create();
schema.vertexLabel("software").properties("name", "lang", "price").primaryKeys("name").nullableKeys("lang", "price").ifNotExist().create();
schema.indexLabel("personByAge").onV("person").by("age").range().ifNotExist().create();
2.1.1 Create a vertex
Method & Url
POST http://localhost:8080/graphs/hugegraph/graph/vertices
Request Body
{
"label": "person",
"properties": {
"name": "marko",
"age": 29
}
}
Response Status
201
Response Body
{
"id": "1:marko",
"label": "person",
"type": "vertex",
"properties": {
"name": "marko",
"age": 29
}
}
2.1.2 Create multiple vertices
Method & Url
POST http://localhost:8080/graphs/hugegraph/graph/vertices/batch
Request Body
[
{
"label": "person",
"properties": {
"name": "marko",
"age": 29
}
},
{
"label": "software",
"properties": {
"name": "ripple",
"lang": "java",
"price": 199
}
}
]
Response Status
201
Response Body
[
"1:marko",
"2:ripple"
]
2.1.3 Update vertex properties
Method & Url
PUT http://127.0.0.1:8080/graphs/hugegraph/graph/vertices/"1:marko"?action=append
Request Body
{
"label": "person",
"properties": {
"age": 30,
"city": "Beijing"
}
}
Note: There are three categories for property values: single, set, and list. If it is single, it means adding or updating the property value. If it is set or list, it means appending the property value.
Response Status
200
Response Body
{
"id": "1:marko",
"label": "person",
"type": "vertex",
"properties": {
"name": "marko",
"age": 30,
"city": "Beijing"
}
}
2.1.4 Batch Update Vertex Properties
Function Description
Batch-update the properties of vertices, with support for various update strategies, including:
- SUM: Numeric accumulation
- BIGGER: Take the larger value between two numbers/dates
- SMALLER: Take the smaller value between two numbers/dates
- UNION: Take the union of set properties
- INTERSECTION: Take the intersection of set properties
- APPEND: Append elements to list properties
- ELIMINATE: Remove elements from list/set properties
- OVERRIDE: Override the existing property; if the new value is null, the old value is kept
Assuming the original vertex and properties are:
{
"vertices": [
{
"id": "2:lop",
"label": "software",
"type": "vertex",
"properties": {
"name": "lop",
"lang": "java",
"price": 328
}
},
{
"id": "1:josh",
"label": "person",
"type": "vertex",
"properties": {
"name": "josh",
"age": 32,
"city": "Beijing",
"weight": 0.1,
"hobby": [
"reading",
"football"
]
}
}
]
}
Add vertices with the following command:
curl -H "Content-Type: application/json" -d '[{"label":"person","properties":{"name":"josh","age":32,"city":"Beijing","weight":0.1,"hobby":["reading","football"]}},{"label":"software","properties":{"name":"lop","lang":"java","price":328}}]' http://127.0.0.1:8080/graphs/hugegraph/graph/vertices/batch
Method & Url
PUT http://127.0.0.1:8080/graphs/hugegraph/graph/vertices/batch
Request Body
{
"vertices": [
{
"label": "software",
"type": "vertex",
"properties": {
"name": "lop",
"lang": "c++",
"price": 299
}
},
{
"label": "person",
"type": "vertex",
"properties": {
"name": "josh",
"city": "Shanghai",
"weight": 0.2,
"hobby": [
"swimming"
]
}
}
],
"update_strategies": {
"price": "BIGGER",
"age": "OVERRIDE",
"city": "OVERRIDE",
"weight": "SUM",
"hobby": "UNION"
},
"create_if_not_exist": true
}
Response Status
200
Response Body
{
"vertices": [
{
"id": "2:lop",
"label": "software",
"type": "vertex",
"properties": {
"name": "lop",
"lang": "c++",
"price": 328
}
},
{
"id": "1:josh",
"label": "person",
"type": "vertex",
"properties": {
"name": "josh",
"age": 32,
"city": "Shanghai",
"weight": 0.3,
"hobby": [
"reading",
"football",
"swimming"
]
}
}
]
}
Result Analysis:
- The lang property does not specify an update strategy and is directly overwritten by the new value, regardless of whether the new value is null.
- The price property specifies the BIGGER update strategy. The old property value is 328, and the new property value is 299, so the old property value of 328 is retained.
- The age property specifies the OVERRIDE update strategy, but the new property value does not include age, which is equivalent to age being null. Therefore, the original property value of 32 is still retained.
- The city property also specifies the OVERRIDE update strategy, and the new property value is not null, so it overrides the old value.
- The weight property specifies the SUM update strategy. The old property value is 0.1, and the new property value is 0.2. The final value is 0.3.
- The hobby property (cardinality is Set) specifies the UNION update strategy, so the new value is taken as the union with the old value.
The usage of other update strategies can be inferred in a similar manner and will not be further elaborated.
2.1.5 Delete Vertex Properties
Method & Url
PUT http://127.0.0.1:8080/graphs/hugegraph/graph/vertices/"1:marko"?action=eliminate
Request Body
{
"label": "person",
"properties": {
"city": "Beijing"
}
}
Note: Here, the properties (keys and all values) will be directly deleted, regardless of whether the property values are single, set, or list.
Response Status
200
Response Body
{
"id": "1:marko",
"label": "person",
"type": "vertex",
"properties": {
"name": "marko",
"age": 30
}
}
2.1.6 Get Vertices that Meet the Criteria
Params
- label: Vertex type
- properties: Property key-value pairs (precondition: indexes are created for property queries)
- limit: Maximum number of results
- page: Page number
All of the above parameters are optional. If the page parameter is provided, the limit parameter must also be provided, and no other parameters are allowed. label, properties, and limit can be combined in any way.
Property key-value pairs consist of the property name and value in JSON format. Multiple property key-value pairs are allowed as query conditions. The property value supports exact matching, range matching, and fuzzy matching. For exact matching, use the format properties={"age":29}; for range matching, use the format properties={"age":"P.gt(29)"}; and for fuzzy matching, use the format properties={"city":"P.textcontains(\"ChengDu China\")"}. The following expressions are supported for range matching:
Expression | Explanation |
---|---|
P.eq(number) | Vertices with property value equal to number |
P.neq(number) | Vertices with property value not equal to number |
P.lt(number) | Vertices with property value less than number |
P.lte(number) | Vertices with property value less than or equal to number |
P.gt(number) | Vertices with property value greater than number |
P.gte(number) | Vertices with property value greater than or equal to number |
P.between(number1,number2) | Vertices with property value greater than or equal to number1 and less than number2 |
P.inside(number1,number2) | Vertices with property value greater than number1 and less than number2 |
P.outside(number1,number2) | Vertices with property value less than number1 and greater than number2 |
P.within(value1,value2,value3,…) | Vertices with property value equal to any of the given values |
Query all vertices with age 29 and label person
Method & Url
GET http://localhost:8080/graphs/hugegraph/graph/vertices?label=person&properties={"age":29}&limit=1
Response Status
200
Response Body
{
"vertices": [
{
"id": "1:marko",
"label": "person",
"type": "vertex",
"properties": {
"name": "marko",
"age": 30
}
}
]
}
Paginate through all vertices, retrieve the first page (page without parameter value), limited to 3 records
Add vertices with the following command:
curl -H "Content-Type: application/json" -d '[{"label":"person","properties":{"name":"peter","age":29,"city":"Shanghai"}},{"label":"person","properties":{"name":"vadas","age":27,"city":"Hongkong"}}]' http://localhost:8080/graphs/hugegraph/graph/vertices/batch
Method & Url
GET http://localhost:8080/graphs/hugegraph/graph/vertices?page&limit=3
Response Status
200
Response Body
{
"vertices": [
{
"id": "2:lop",
"label": "software",
"type": "vertex",
"properties": {
"name": "lop",
"lang": "c++",
"price": 328
}
},
{
"id": "1:josh",
"label": "person",
"type": "vertex",
"properties": {
"name": "josh",
"age": 32,
"city": "Shanghai",
"weight": 0.3,
"hobby": [
"reading",
"football",
"swimming"
]
}
},
{
"id": "1:marko",
"label": "person",
"type": "vertex",
"properties": {
"name": "marko",
"age": 30
}
}
],
"page": "CIYxOnBldGVyAAAAAAAAAAM="
}
The returned body contains information about the page number of the next page, "page": "CIYxOnBldGVyAAAAAAAAAAM=". When querying the next page, assign this value to the page parameter.
Paginate and retrieve all vertices, including the next page (passing the page value returned from the previous page), limited to 3 items.
Method & Url
GET http://localhost:8080/graphs/hugegraph/graph/vertices?page=CIYxOnBldGVyAAAAAAAAAAM=&limit=3
Response Status
200
Response Body
{
"vertices": [
{
"id": "1:peter",
"label": "person",
"type": "vertex",
"properties": {
"name": "peter",
"age": 29,
"city": "Shanghai"
}
},
{
"id": "1:vadas",
"label": "person",
"type": "vertex",
"properties": {
"name": "vadas",
"age": 27,
"city": "Hongkong"
}
},
{
"id": "2:ripple",
"label": "software",
"type": "vertex",
"properties": {
"name": "ripple",
"lang": "java",
"price": 199
}
}
],
"page": null
}
At this point, "page": null indicates that there are no more pages available. (Note: When using Cassandra as the backend, for performance reasons, if the returned page happens to be the last page, the page value may not be empty. When requesting the next page using that page value, it will return empty data and page = null. The same applies to other similar situations.)
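Putting this together, a complete scan can be driven by a small shell loop. The sketch below assumes curl and jq are available and the server runs locally; the page token is base64 and may contain = characters, so it is URL-encoded explicitly:
url="http://localhost:8080/graphs/hugegraph/graph/vertices"
resp=$(curl -s "${url}?page&limit=3")                      # first page
while true; do
  echo "$resp" | jq -r '.vertices[].id'                    # process this page's vertices
  page=$(echo "$resp" | jq -r '.page')
  [ "$page" = "null" ] && break                            # no more pages
  resp=$(curl -s -G "$url" --data-urlencode "page=$page" --data-urlencode "limit=3")
done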
2.1.7 Retrieve Vertex by ID
Method & Url
GET http://localhost:8080/graphs/hugegraph/graph/vertices/"1:marko"
Response Status
200
Response Body
{
"id": "1:marko",
"label": "person",
"type": "vertex",
"properties": {
"name": "marko",
"age": 30
}
}
2.1.8 Delete Vertex by ID
Params
- label: Vertex type, optional parameter
Delete the vertex based on ID only.
Method & Url
DELETE http://localhost:8080/graphs/hugegraph/graph/vertices/"1:marko"
Response Status
204
Delete Vertex by Label+ID
When deleting a vertex by specifying both the Label parameter and the ID, it generally offers better performance compared to deleting by ID alone.
Method & Url
DELETE http://localhost:8080/graphs/hugegraph/graph/vertices/"1:marko"?label=person
Response Status
204
5.1.8 - Edge API
2.2 Edge
The vertex ID format described above also carries over to edges: it affects the edge ID, as well as the formats of the source and target vertex IDs embedded in it.
The EdgeId is formed by concatenating src-vertex-id + direction + label + sort-values + tgt-vertex-id, but the vertex ID types are not distinguished by quotation marks here. Instead, they are distinguished by prefixes:
- When the ID type is number, the vertex ID in the EdgeId has a prefix L, like "L123456>1>>L987654".
- When the ID type is string, the vertex ID in the EdgeId has a prefix S, like "S1:peter>1>>S2:lop".
For instance, in the edge id S1:marko>2>>S2:lop used later in this section, S1:marko and S2:lop are the prefixed source and target vertex ids, joined by the edge label id (2) and an empty sort-values segment.
The following example requires creating a graph schema based on the following groovy script:
import org.apache.hugegraph.HugeFactory
import org.apache.tinkerpop.gremlin.structure.T
conf = "conf/graphs/hugegraph.properties"
graph = HugeFactory.open(conf)
schema = graph.schema()
schema.propertyKey("name").asText().ifNotExist().create()
schema.propertyKey("age").asInt().ifNotExist().create()
schema.propertyKey("city").asText().ifNotExist().create()
schema.propertyKey("weight").asDouble().ifNotExist().create()
schema.propertyKey("lang").asText().ifNotExist().create()
schema.propertyKey("date").asText().ifNotExist().create()
schema.propertyKey("price").asInt().ifNotExist().create()
schema.vertexLabel("person").properties("name", "age", "city").primaryKeys("name").ifNotExist().create()
schema.vertexLabel("software").properties("name", "lang", "price").primaryKeys("name").ifNotExist().create()
schema.indexLabel("personByCity").onV("person").by("city").secondary().ifNotExist().create()
schema.indexLabel("personByAgeAndCity").onV("person").by("age", "city").secondary().ifNotExist().create()
schema.indexLabel("softwareByPrice").onV("software").by("price").range().ifNotExist().create()
schema.edgeLabel("knows").sourceLabel("person").targetLabel("person").properties("date", "weight").ifNotExist().create()
schema.edgeLabel("created").sourceLabel("person").targetLabel("software").properties("date", "weight").ifNotExist().create()
schema.indexLabel("createdByDate").onE("created").by("date").secondary().ifNotExist().create()
schema.indexLabel("createdByWeight").onE("created").by("weight").range().ifNotExist().create()
schema.indexLabel("knowsByWeight").onE("knows").by("weight").range().ifNotExist().create()
marko = graph.addVertex(T.label, "person", "name", "marko", "age", 29, "city", "Beijing")
vadas = graph.addVertex(T.label, "person", "name", "vadas", "age", 27, "city", "Hongkong")
lop = graph.addVertex(T.label, "software", "name", "lop", "lang", "java", "price", 328)
josh = graph.addVertex(T.label, "person", "name", "josh", "age", 32, "city", "Beijing")
ripple = graph.addVertex(T.label, "software", "name", "ripple", "lang", "java", "price", 199)
peter = graph.addVertex(T.label, "person", "name", "peter", "age", 35, "city", "Shanghai")
graph.tx().commit()
g = graph.traversal()
2.2.1 Creating an Edge
Params
Path Parameter Description:
- graph: The graph to operate on
Request Body Description:
- label: The edge type name (required)
- outV: The source vertex id (required)
- inV: The target vertex id (required)
- outVLabel: The source vertex type (required)
- inVLabel: The target vertex type (required)
- properties: The properties associated with the edge. The internal structure of the object is as follows:
- name: The property name
- value: The property value
Method & Url
POST http://localhost:8080/graphs/hugegraph/graph/edges
Request Body
{
"label": "created",
"outV": "1:marko",
"inV": "2:lop",
"outVLabel": "person",
"inVLabel": "software",
"properties": {
"date": "20171210",
"weight": 0.4
}
}
Response Status
201
Response Body
{
"id": "S1:marko>2>>S2:lop",
"label": "created",
"type": "edge",
"outV": "1:marko",
"outVLabel": "person",
"inV": "2:lop",
"inVLabel": "software",
"properties": {
"weight": 0.4,
"date": "20171210"
}
}
2.2.2 Creating Multiple Edges
Params
Path Parameter Description:
- graph: The graph to operate on
Request Parameter Description:
- check_vertex: Whether to check the existence of vertices (true | false). When set to true, an error will be thrown if the source or target vertices of the edge to be inserted do not exist. Default is true.
Request Body Description:
- List of edge information
Method & Url
POST http://localhost:8080/graphs/hugegraph/graph/edges/batch
Request Body
[
{
"label": "knows",
"outV": "1:marko",
"inV": "1:vadas",
"outVLabel": "person",
"inVLabel": "person",
"properties": {
"date": "20160110",
"weight": 0.5
}
},
{
"label": "knows",
"outV": "1:marko",
"inV": "1:josh",
"outVLabel": "person",
"inVLabel": "person",
"properties": {
"date": "20130220",
"weight": 1.0
}
}
]
Response Status
201
Response Body
[
"S1:marko>1>>S1:vadas",
"S1:marko>1>>S1:josh"
]
2.2.3 Updating Edge Properties
Params
Path Parameter Description:
- graph: The graph to operate on
- id: The ID of the edge to be operated on
Request Parameter Description:
- action: The append action
Request Body Description:
- Edge information
Method & Url
PUT http://localhost:8080/graphs/hugegraph/graph/edges/S1:marko>2>>S2:lop?action=append
Request Body
{
"properties": {
"weight": 1.0
}
}
NOTE: There are three categories of property values: single, set, and list. If it is single, it means adding or updating the property value. If it is set or list, it means appending the property value.
Response Status
200
Response Body
{
"id": "S1:marko>2>>S2:lop",
"label": "created",
"type": "edge",
"outV": "1:marko",
"outVLabel": "person",
"inV": "2:lop",
"inVLabel": "software",
"properties": {
"weight": 1.0,
"date": "20171210"
}
}
2.2.4 Batch Updating Edge Properties
Params
Path Parameter Description:
- graph: The graph to operate on
Request Body Description:
- edges: List of edge information
- update_strategies: For each property, you can set its update strategy individually, including:
- SUM: Only supports number type
- BIGGER/SMALLER: Only supports date/number type
- UNION/INTERSECTION: Only supports set type
- APPEND/ELIMINATE: Only supports collection type
- OVERRIDE
- check_vertex: Whether to check the existence of vertices (true | false). When set to true, an error will be thrown if the source or target vertices of the edge to be inserted do not exist. Default is true.
- create_if_not_exist: Currently only supports setting to true
Method & Url
PUT http://127.0.0.1:8080/graphs/hugegraph/graph/edges/batch
Request Body
{
"edges": [
{
"label": "knows",
"outV": "1:marko",
"inV": "1:vadas",
"outVLabel": "person",
"inVLabel": "person",
"properties": {
"date": "20160111",
"weight": 1.0
}
},
{
"label": "knows",
"outV": "1:marko",
"inV": "1:josh",
"outVLabel": "person",
"inVLabel": "person",
"properties": {
"date": "20130221",
"weight": 0.5
}
}
],
"update_strategies": {
"weight": "SUM",
"date": "OVERRIDE"
},
"check_vertex": false,
"create_if_not_exist": true
}
Response Status
200
Response Body
{
"edges": [
{
"id": "S1:marko>1>>S1:vadas",
"label": "knows",
"type": "edge",
"outV": "1:marko",
"outVLabel": "person",
"inV": "1:vadas",
"inVLabel": "person",
"properties": {
"weight": 1.5,
"date": "20160111"
}
},
{
"id": "S1:marko>1>>S1:josh",
"label": "knows",
"type": "edge",
"outV": "1:marko",
"outVLabel": "person",
"inV": "1:josh",
"inVLabel": "person",
"properties": {
"weight": 1.5,
"date": "20130221"
}
}
]
}
2.2.5 Deleting Edge Properties
Params
Path Parameter Description:
- graph: The graph to operate on
- id: The ID of the edge to be operated on
Request Parameter Description:
- action: The eliminate action
Request Body Description:
- Edge information
Method & Url
PUT http://localhost:8080/graphs/hugegraph/graph/edges/S1:marko>2>>S2:lop?action=eliminate
Request Body
{
"properties": {
"weight": 1.0
}
}
NOTE: This will directly delete the properties (removing the key and all values), regardless of whether the property values are single, set, or list.
Response Status
400
Response Body
A property that has not been declared as nullable cannot be removed:
{
"exception": "class java.lang.IllegalArgumentException",
"message": "Can't remove non-null edge property 'p[weight->1.0]'",
"cause": ""
}
2.2.6 Fetching Edges that Match the Criteria
Params
Path Parameter:
- graph: The graph to operate on
Request Parameters:
- vertex_id: Vertex ID
- direction: Edge direction (OUT | IN | BOTH), default is BOTH
- label: Edge label
- properties: Key-value pairs of properties (requires pre-built indexes for property queries)
- keep_start_p: Default is false. When set to true, a range-matching expression in the input will not be automatically parsed as a predicate. For example, properties={"age":"P.gt(0.8)"} will then be interpreted as an exact match, i.e., the age property equals the string "P.gt(0.8)".
- offset: Offset, default is 0
- limit: Number of queries, default is 100
- page: Page number
Key-value pairs of properties consist of the property name and value in JSON format. Multiple key-value pairs are allowed as query conditions. Property values support exact matching and range matching. For exact matching, it is in the form properties={"weight":0.8}; for range matching, it is in the form properties={"age":"P.gt(0.8)"}. The expressions supported by range matching are as follows:
Expression | Description |
---|---|
P.eq(number) | Edges with property value equal to number |
P.neq(number) | Edges with property value not equal to number |
P.lt(number) | Edges with property value less than number |
P.lte(number) | Edges with property value less than or equal to number |
P.gt(number) | Edges with property value greater than number |
P.gte(number) | Edges with property value greater than or equal to number |
P.between(number1,number2) | Edges with property value greater than or equal to number1 and less than number2 |
P.inside(number1,number2) | Edges with property value greater than number1 and less than number2 |
P.outside(number1,number2) | Edges with property value less than number1 and greater than number2 |
P.within(value1,value2,value3,…) | Edges with property value equal to any of the given values |
P.textcontains(value) | Edges with property value containing the given value (string type) |
P.contains(value) | Edges with property value containing the given value (collection type) |
Edges connected to the vertex person:marko (vertex_id="1:marko") with label knows and date property equal to "20160111"
Method & Url
GET http://127.0.0.1:8080/graphs/hugegraph/graph/edges?vertex_id="1:marko"&label=knows&properties={"date":"P.within(\"20160111\")"}
Response Status
200
Response Body
{
"edges": [
{
"id": "S1:marko>1>>S1:vadas",
"label": "knows",
"type": "edge",
"outV": "1:marko",
"outVLabel": "person",
"inV": "1:vadas",
"inVLabel": "person",
"properties": {
"weight": 1.5,
"date": "20160111"
}
}
]
}
Paginate and retrieve all edges, get the first page (page without parameter value), limit to 2 entries
Method & Url
GET http://127.0.0.1:8080/graphs/hugegraph/graph/edges?page&limit=2
Response Status
200
Response Body
{
"edges": [
{
"id": "S1:marko>1>>S1:josh",
"label": "knows",
"type": "edge",
"outV": "1:marko",
"outVLabel": "person",
"inV": "1:josh",
"inVLabel": "person",
"properties": {
"weight": 1.5,
"date": "20130221"
}
},
{
"id": "S1:marko>1>>S1:vadas",
"label": "knows",
"type": "edge",
"outV": "1:marko",
"outVLabel": "person",
"inV": "1:vadas",
"inVLabel": "person",
"properties": {
"weight": 1.5,
"date": "20160111"
}
}
],
"page": "EoYxOm1hcmtvgggCAIQyOmxvcAAAAAAAAAAC"
}
The returned body contains the page number information for the next page, "page": "EoYxOm1hcmtvgggCAIQyOmxvcAAAAAAAAAAC". When querying the next page, assign this value to the page parameter.
Paginate and retrieve all edges, get the next page (include the page value returned from the previous page), limit to 2 entries
Method & Url
GET http://127.0.0.1:8080/graphs/hugegraph/graph/edges?page=EoYxOm1hcmtvgggCAIQyOmxvcAAAAAAAAAAC&limit=2
Response Status
200
Response Body
{
"edges": [
{
"id": "S1:marko>2>>S2:lop",
"label": "created",
"type": "edge",
"outV": "1:marko",
"outVLabel": "person",
"inV": "2:lop",
"inVLabel": "software",
"properties": {
"weight": 1.0,
"date": "20171210"
}
}
],
"page": null
}
When "page": null
is returned, it indicates that there are no more pages available.
NOTE: When the backend is Cassandra, for performance considerations, if the returned page happens to be the last page, the
page
value may not be empty. When requesting the next page data using thatpage
value, it will returnempty data
andpage = null
. Similar situations apply for other cases.
2.2.7 Fetching Edge by ID
Params
Path parameter description:
- graph: The graph to be operated on.
- id: The ID of the edge to be operated on.
Method & Url
GET http://localhost:8080/graphs/hugegraph/graph/edges/S1:marko>2>>S2:lop
Response Status
200
Response Body
{
"id": "S1:marko>2>>S2:lop",
"label": "created",
"type": "edge",
"outV": "1:marko",
"outVLabel": "person",
"inV": "2:lop",
"inVLabel": "software",
"properties": {
"weight": 1.0,
"date": "20171210"
}
}
2.2.8 Deleting Edge by ID
Params
Path parameter description:
- graph: The graph to be operated on.
- id: The ID of the edge to be operated on.
Request parameter description:
- label: The label of the edge.
Deleting Edge by ID only
Method & Url
DELETE http://localhost:8080/graphs/hugegraph/graph/edges/S1:marko>2>>S2:lop
Response Status
204
Deleting Edge by Label + ID
In general, specifying the Label parameter along with the ID to delete an edge will provide better performance compared to deleting by ID only.
Method & Url
DELETE http://localhost:8080/graphs/hugegraph/graph/edges/S1:marko>1>>S1:vadas?label=knows
Response Status
204
5.1.9 - Traverser API
3.1 Overview of Traverser API
HugeGraphServer provides a RESTful API interface for the HugeGraph graph database. In addition to the basic CRUD operations for vertices and edges, it also offers several traversal methods, which we refer to as the traverser API. These traversal methods implement various complex graph algorithms, making it convenient for users to analyze and explore the graph.
The Traverser API supported by HugeGraph includes:
- K-out API: It finds neighbors that are exactly N steps away from a given starting vertex (see the example sketch after this list). There are two versions:
- The basic version uses the GET method to find neighbors that are exactly N steps away from a given starting vertex.
- The advanced version uses the POST method to find neighbors that are exactly N steps away from a given starting vertex. The advanced version differs from the basic version in the following ways:
- Supports counting the number of neighbors only
- Supports filtering by edge and vertex properties
- Supports returning the shortest path to reach the neighbor
- K-neighbor API: It finds all neighbors that are within N steps of a given starting vertex. There are two versions:
- The basic version uses the GET method to find all neighbors that are within N steps of a given starting vertex.
- The advanced version uses the POST method to find all neighbors that are within N steps of a given starting vertex. The advanced version differs from the basic version in the following ways:
- Supports counting the number of neighbors only
- Supports filtering by edge and vertex properties
- Supports returning the shortest path to reach the neighbor
- Same Neighbors: It queries the common neighbors of two vertices.
- Jaccard Similarity API: It calculates the Jaccard similarity, which includes two types:
- One type uses the GET method to calculate the similarity (intersection over union) of neighbors between two vertices.
- The other type uses the POST method to find the top N vertices with the highest Jaccard similarity to a given starting vertex in the entire graph.
- Shortest Path API: It finds the shortest path between two vertices.
- All Shortest Paths: It finds all shortest paths between two vertices.
- Weighted Shortest Path: It finds the shortest weighted path from a starting vertex to a target vertex.
- Single Source Shortest Path: It finds the weighted shortest path from a single source vertex to all other vertices.
- Multi Node Shortest Path: It finds the shortest path between every pair of specified vertices.
- Paths API: It finds all paths between two vertices. There are two versions:
- The basic version uses the GET method to find all paths between a given starting vertex and an ending vertex.
- The advanced version uses the POST method to find all paths that meet certain conditions between a set of starting vertices and a set of ending vertices.
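For a first taste of these endpoints, the basic K-out version is a single GET (a sketch assuming the default local graph and the "1:marko" vertex from section 2.1; the quotes in the id are URL-encoded as %22):
curl "http://localhost:8080/graphs/hugegraph/traversers/kout?source=%221:marko%22&max_depth=2"
The response lists the ids of vertices exactly 2 steps away from the starting vertex.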
3.2 Detailed Explanation of Traverser API
In the following, we provide a detailed explanation of the Traverser API:
- Customized Paths API: It traverses all paths that pass through a batch of vertices according to a specific pattern.
- Template Path API: It specifies a starting point, an ending point, and the path information between them to find matching paths.
- Crosspoints API: It finds the intersection (common ancestors or common descendants) between two vertices.
- Customized Crosspoints API: It traverses multiple patterns starting from a batch of vertices and finds the intersections with the v