HugeGraph BenchMark Performance

1 Test environment

1.1 Hardware information

CPU	Memory	NIC	Disk
48 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz	128G	10000Mbps	750GB SSD

1.2 Software information

1.2.1 Test cases

Testing is done using the graphdb-benchmark, a benchmark suite for graph databases. This benchmark suite mainly consists of four types of tests:

Massive Insertion, which involves batch insertion of vertices and edges, with a certain number of vertices or edges being submitted at once.
Single Insertion, which involves the immediate insertion of each vertex or edge, one at a time.
Query, which mainly includes the basic query operations of the graph database:
- Find Neighbors, which queries the neighbors of all vertices.
- Find Adjacent Nodes, which queries the adjacent vertices of all edges.
- Find Shortest Path, which queries the shortest path from the first vertex to 100 random vertices.
Clustering, which is a community detection algorithm based on the Louvain Method.
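
The difference between the Massive Insertion and Single Insertion tests can be sketched as follows. This is a minimal illustration, not the graphdb-benchmark implementation: `CountingStore`, `single_insert`, `massive_insert`, and `batch_size` are hypothetical names, and the store is an in-memory list that only counts how many commits it receives.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


class CountingStore:
    """Hypothetical in-memory store that counts commits; not a HugeGraph API."""

    def __init__(self) -> None:
        self.edges: List[Edge] = []
        self.commits = 0

    def commit(self, batch: List[Edge]) -> None:
        # One commit stands in for one round trip to the database.
        self.edges.extend(batch)
        self.commits += 1


def single_insert(store: CountingStore, edges: Iterable[Edge]) -> None:
    # Single Insertion: every edge is committed on its own.
    for edge in edges:
        store.commit([edge])


def massive_insert(store: CountingStore, edges: Iterable[Edge], batch_size: int) -> None:
    # Massive Insertion: edges are buffered and committed batch_size at a time.
    batch: List[Edge] = []
    for edge in edges:
        batch.append(edge)
        if len(batch) == batch_size:
            store.commit(batch)
            batch = []
    if batch:  # flush the final partial batch
        store.commit(batch)
```

Both functions store the same edges; only the number of commits differs, which is the cost the two insertion tests are meant to separate.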

1.2.2 Test dataset

Tests are conducted using both synthetic and real data.

MIW, SIW, and QW (the massive insertion, single insertion, and query workloads) use SNAP datasets.
CW (the clustering workload) uses synthetic data generated by the LFR-Benchmark generator.

The sizes of the datasets used in this test are listed below.

Name	Number of Vertices	Number of Edges	File Size
email-enron.txt	36,691	367,661	4MB
com-youtube.ungraph.txt	1,157,806	2,987,624	38.7MB
amazon0601.txt	403,393	3,387,388	47.9MB
com-lj.ungraph.txt	3,997,961	34,681,189	479MB

1.3 Service configuration

HugeGraph version: 0.5.6, RestServer and Gremlin Server and backends are on the same server
- RocksDB version: rocksdbjni-5.8.6
Titan version: 0.5.4, using thrift+Cassandra mode
- Cassandra version: cassandra-3.10, commit-log and data use SSD together
Neo4j version: 2.0.1

graphdb-benchmark is adapted to Titan version 0.5.4.

2 Test results

2.1 Batch insertion performance

Backend	email-enron(300K)	amazon0601(3M)	com-youtube.ungraph(3M)	com-lj.ungraph(30M)
HugeGraph	0.629	5.711	5.243	67.033
Titan	10.15	108.569	150.266	1217.944
Neo4j	3.884	18.938	24.890	281.537

Explanation

The figure in parentheses “( )” in the table header is the data scale, in number of edges.
The data in the table is the time spent on batch insertion, in seconds.
For example, HugeGraph(RocksDB) took 5.711 seconds to insert the 3 million edges of the amazon0601 dataset.
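
To compare backends across datasets of different sizes, the insertion times can be converted into an approximate throughput. The sketch below assumes the rounded edge counts from the table header, so the results are estimates only; `EDGE_COUNTS`, `INSERT_SECONDS`, `edges_per_second`, and `time_ratio` are names introduced here for illustration.

```python
from typing import Dict, List

# Approximate edge counts per dataset, taken from the table header (rounded).
EDGE_COUNTS: Dict[str, int] = {
    "email-enron": 300_000,
    "amazon0601": 3_000_000,
    "com-youtube.ungraph": 3_000_000,
    "com-lj.ungraph": 30_000_000,
}

# Batch insertion times in seconds, copied from the table above.
INSERT_SECONDS: Dict[str, List[float]] = {
    "HugeGraph": [0.629, 5.711, 5.243, 67.033],
    "Titan": [10.15, 108.569, 150.266, 1217.944],
    "Neo4j": [3.884, 18.938, 24.890, 281.537],
}


def edges_per_second(backend: str) -> Dict[str, float]:
    """Approximate insertion throughput of one backend for each dataset."""
    times = INSERT_SECONDS[backend]
    return {
        name: count / secs
        for (name, count), secs in zip(EDGE_COUNTS.items(), times)
    }


def time_ratio(slower: str, faster: str) -> Dict[str, float]:
    """How many times longer `slower` took than `faster`, per dataset."""
    return {
        name: slow / fast
        for name, slow, fast in zip(
            EDGE_COUNTS, INSERT_SECONDS[slower], INSERT_SECONDS[faster]
        )
    }
```

For instance, `edges_per_second("HugeGraph")["amazon0601"]` divides 3,000,000 by 5.711, and `time_ratio("Titan", "HugeGraph")` gives the per-dataset factor by which Titan was slower.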

Conclusion

The performance of batch insertion: HugeGraph(RocksDB) > Neo4j > Titan(thrift+Cassandra)

2.2 Traversal performance

2.2.1 Explanation of terms

FN(Find Neighbor): Traverse all vertices, find the adjacent edges based on each vertex, and use the edges and vertices to find the other vertices adjacent to the original vertex.
FA(Find Adjacent): Traverse all edges, get the source vertex and target vertex based on each edge.
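
The two traversals can be sketched over an in-memory edge list. This is only an illustration of the access pattern, not the benchmark code: the helper names are hypothetical, and edges are treated as undirected for simplicity.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def build_incidence(edges: List[Edge]) -> Dict[int, List[int]]:
    """Map each vertex to the indexes of the edges that touch it."""
    incidence: Dict[int, List[int]] = defaultdict(list)
    for edge_id, (source, target) in enumerate(edges):
        incidence[source].append(edge_id)
        if target != source:  # avoid listing a self-loop twice
            incidence[target].append(edge_id)
    return dict(incidence)


def find_neighbors(edges: List[Edge], incidence: Dict[int, List[int]]) -> Dict[int, Set[int]]:
    """FN: for every vertex, follow its adjacent edges to the vertex at the other end."""
    result: Dict[int, Set[int]] = {}
    for vertex, edge_ids in incidence.items():
        others: Set[int] = set()
        for edge_id in edge_ids:
            source, target = edges[edge_id]
            others.add(target if source == vertex else source)
        result[vertex] = others
    return result


def find_adjacent(edges: List[Edge]) -> List[Tuple[int, int]]:
    """FA: for every edge, read its source vertex and target vertex."""
    return [(source, target) for source, target in edges]
```

FN does one lookup per vertex followed by one per adjacent edge, while FA does a single pass over the edges, which is why the two tests are scaled by vertices and by edges respectively.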

2.2.2 FN performance

Backend	email-enron(36K)	amazon0601(400K)	com-youtube.ungraph(1.2M)	com-lj.ungraph(4M)
HugeGraph	4.072	45.118	66.006	609.083
Titan	8.084	92.507	184.543	1099.371
Neo4j	2.424	10.537	11.609	106.919

Explanation

The figure in parentheses “( )” in the table header is the data scale, in number of vertices.
The data in the table is the time spent traversing vertices, in seconds.
For example, HugeGraph with the RocksDB backend takes 45.118 seconds in total to traverse all vertices in amazon0601 and, for each vertex, look up its adjacent edges and the vertices at their other ends.

2.2.3 FA performance

Backend	email-enron(300K)	amazon0601(3M)	com-youtube.ungraph(3M)	com-lj.ungraph(30M)
HugeGraph	1.540	10.764	11.243	151.271
Titan	7.361	93.344	169.218	1085.235
Neo4j	1.673	4.775	4.284	40.507

Explanation

The figure in parentheses “( )” in the table header is the data scale, in number of edges.
The data in the table is the time spent traversing edges, in seconds.
For example, HugeGraph with the RocksDB backend takes 10.764 seconds in total to traverse all edges in the amazon0601 dataset and look up the two vertices of each edge.

Conclusion

Traversal performance: Neo4j > HugeGraph(RocksDB) > Titan(thrift+Cassandra)

2.3 Performance of Common Graph Analysis Methods in HugeGraph

Terminology Explanation

FS (Find Shortest Path): finding the shortest path between two vertices
K-neighbor: all vertices that can be reached by traversing K hops (including 1, 2, 3…(K-1) hops) from the starting vertex
K-out: all vertices that can be reached by traversing exactly K out-edges from the starting vertex.
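
K-neighbor and K-out can both be sketched as a breadth-first search that keeps one set of vertices per depth. The sketch assumes that a vertex counts only at its shortest distance from the start, which is one possible reading of "exactly K out-edges"; the function names are illustrative and are not HugeGraph APIs.

```python
from typing import Dict, List, Set


def bfs_layers(out_edges: Dict[int, List[int]], start: int, depth: int) -> List[Set[int]]:
    """Return layers where layers[d] holds the vertices first reached after d hops."""
    layers: List[Set[int]] = [{start}]
    seen: Set[int] = {start}
    for _ in range(depth):
        frontier: Set[int] = set()
        for vertex in layers[-1]:
            for target in out_edges.get(vertex, []):
                if target not in seen:
                    seen.add(target)
                    frontier.add(target)
        layers.append(frontier)
    return layers


def k_out(out_edges: Dict[int, List[int]], start: int, k: int) -> Set[int]:
    """Vertices whose shortest distance from start is exactly k hops."""
    return bfs_layers(out_edges, start, k)[k]


def k_neighbor(out_edges: Dict[int, List[int]], start: int, k: int) -> Set[int]:
    """Vertices reachable from start within 1..k hops."""
    layers = bfs_layers(out_edges, start, k)
    return set().union(*layers[1:])
```

Because every layer is held in memory, the size of the result grows with the number of vertices at each depth, which is the quantity reported in the "Degree" rows of the K-out table.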

FS performance

Backend	email-enron(300K)	amazon0601(3M)	com-youtube.ungraph(3M)	com-lj.ungraph(30M)
HugeGraph	0.494	0.103	3.364	8.155
Titan	11.818	0.239	377.709	575.678
Neo4j	1.719	1.800	1.956	8.530

Explanation

The figure in parentheses “( )” in the table header is the data scale, in number of edges.
The data in the table is the time spent finding the shortest paths from the first vertex to 100 randomly selected vertices, in seconds.
For example, HugeGraph with the RocksDB backend takes 0.103 seconds in total to find the shortest paths from the first vertex to 100 randomly selected vertices in the amazon0601 graph.
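
The FS workload can be sketched as a breadth-first shortest-path search repeated for a random sample of targets. This is a minimal sketch under assumptions: "first vertex" is taken to mean the lowest vertex id, the sample is drawn with a fixed seed, and `shortest_path` and `fs_workload` are hypothetical names rather than benchmark or HugeGraph functions.

```python
import random
from collections import deque
from typing import Dict, List, Optional


def shortest_path(adjacency: Dict[int, List[int]], source: int, target: int) -> Optional[List[int]]:
    """Breadth-first search; returns one shortest path or None if unreachable."""
    parents: Dict[int, Optional[int]] = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            # Walk the parent pointers back to the source.
            path: List[int] = []
            current: Optional[int] = vertex
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in parents:
                parents[neighbor] = vertex
                queue.append(neighbor)
    return None


def fs_workload(adjacency: Dict[int, List[int]], sample_size: int = 100, seed: int = 0) -> Dict[int, Optional[List[int]]]:
    """Shortest paths from the first vertex to a random sample of vertices."""
    vertices = sorted(adjacency)
    first = vertices[0]  # assumption: "first vertex" means the lowest id
    rng = random.Random(seed)
    targets = rng.sample(vertices, min(sample_size, len(vertices)))
    return {target: shortest_path(adjacency, first, target) for target in targets}
```

The time reported in the table corresponds to running all 100 searches, not a single one.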

Conclusion

In scenarios with small data size or few vertex relationships, HugeGraph outperforms Neo4j and Titan.
As the data size increases and the degree of vertex association increases, the performance of HugeGraph and Neo4j tends to be similar, both far exceeding Titan.

K-neighbor Performance

Vertex	Depth	Degree 1	Degree 2	Degree 3	Degree 4	Degree 5	Degree 6
v1	Time	0.031s	0.033s	0.048s	0.500s	11.27s	OOM
v111	Time	0.027s	0.034s	0.115s	1.36s	OOM	–
v1111	Time	0.039s	0.027s	0.052s	0.511s	10.96s	OOM

Explanation

HugeGraph-Server’s JVM memory is set to 32GB and may experience OOM when the data is too large.

K-out performance

Vertex	Depth	1st Degree	2nd Degree	3rd Degree	4th Degree	5th Degree	6th Degree
v1	Time	0.054s	0.057s	0.109s	0.526s	3.77s	OOM
v1	Degree	10	133	2,453	50,830	1,128,688	–
v111	Time	0.032s	0.042s	0.136s	1.25s	20.62s	OOM
v111	Degree	10	211	4,944	113,150	2,629,970	–
v1111	Time	0.039s	0.045s	0.053s	1.10s	2.92s	OOM
v1111	Degree	10	140	2,555	50,825	1,070,230	–

Explanation

The JVM memory of HugeGraph-Server is set to 32GB, and OOM may occur when the data is too large.

Conclusion

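In the FS scenario, HugeGraph outperforms Neo4j and Titan.
In the K-neighbor and K-out scenarios, HugeGraph returns results within seconds for traversals of up to 5 degrees in most cases; the exceptions are the 5th-degree queries that take 10 to 21 seconds and the K-neighbor query from v111, which runs out of memory at degree 5.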

2.4 Comprehensive Performance Test - CW

Database	Size 1000	Size 5000	Size 10000	Size 20000
HugeGraph(core)	20.804	242.099	744.780	1700.547
Titan	45.790	820.633	2652.235	9568.623
Neo4j	5.913	50.267	142.354	460.880

Explanation

The size in the table header is the number of vertices.
The data in the table is the time required to complete community detection, in seconds. For example, HugeGraph with the RocksDB backend takes 744.780 seconds on the 10,000-vertex dataset to reach the point where the community assignments no longer change.
The CW test is a comprehensive evaluation of CRUD operations.
In this test, HugeGraph, like Titan, operated directly on the core rather than through the client.

Conclusion

Performance of community detection algorithm: Neo4j > HugeGraph > Titan

Last modified May 14, 2023: Update hugegraph-benchmark-0.5.6.md (#226) (2e5bf8c6)