HugeGraph BenchMark Performance
1 Test environment
1.1 Hardware information
|CPU||Memory||Network||Disk|
|48 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz||128G||10000Mbps||750GB SSD|
1.2 Software information
1.2.1 Test cases
Testing is done using the graphdb-benchmark, a benchmark suite for graph databases. This benchmark suite mainly consists of four types of tests:
- Massive Insertion, which involves batch insertion of vertices and edges, with a certain number of vertices or edges being submitted at once.
- Single Insertion, which involves the immediate insertion of each vertex or edge, one at a time.
- Query, which mainly includes the basic query operations of the graph database:
  - Find Neighbors, which queries the neighbors of all vertices.
  - Find Adjacent Nodes, which queries the adjacent vertices of all edges.
  - Find Shortest Path, which queries the shortest path from the first vertex to 100 random vertices.
- Clustering, which is a community detection algorithm based on the Louvain Method.
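The difference between Massive Insertion and Single Insertion comes down to how often a commit happens. The following is a minimal sketch of that distinction, not the graphdb-benchmark code itself: `ToyGraphStore`, `massive_insertion`, `single_insertion`, and the `batch_size` parameter are hypothetical names, and the store is a plain in-memory list rather than a real backend.

```python
class ToyGraphStore:
    """Hypothetical in-memory stand-in for a graph backend; counts commits."""

    def __init__(self):
        self.edges = []      # committed edges
        self.pending = []    # edges added but not yet committed
        self.commits = 0

    def add_edge(self, src, dst):
        self.pending.append((src, dst))

    def commit(self):
        self.edges.extend(self.pending)
        self.pending.clear()
        self.commits += 1


def massive_insertion(store, edges, batch_size):
    """Buffer edges and commit once per batch_size edges."""
    for i, (src, dst) in enumerate(edges, start=1):
        store.add_edge(src, dst)
        if i % batch_size == 0:
            store.commit()
    if store.pending:        # flush the final partial batch
        store.commit()


def single_insertion(store, edges):
    """Commit immediately after every edge."""
    for src, dst in edges:
        store.add_edge(src, dst)
        store.commit()


sample_edges = [(i, i + 1) for i in range(10)]
batch_store, single_store = ToyGraphStore(), ToyGraphStore()
massive_insertion(batch_store, sample_edges, batch_size=4)
single_insertion(single_store, sample_edges)
print(batch_store.commits, single_store.commits)
```

Both stores end up with the same edges; only the number of commits differs, which is the cost the two insertion tests are designed to expose.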
1.2.2 Test dataset
Tests are conducted using both synthetic and real data.
MIW, SIW, and QW (the Massive Insertion, Single Insertion, and Query workloads) use SNAP datasets.
CW (the Clustering workload) uses synthetic data generated by the LFR-Benchmark generator.
The sizes of the datasets used in this test are not mentioned.
|Name||Number of Vertices||Number of Edges||File Size|
1.3 Service configuration
HugeGraph version: 0.5.6; RestServer, Gremlin Server, and the backends all run on the same server
- RocksDB version: rocksdbjni-5.8.6
Titan version: 0.5.4, using thrift+Cassandra mode
- Cassandra version: cassandra-3.10, with the commit log and data sharing the same SSD
Neo4j version: 2.0.1
The Titan version that graphdb-benchmark is adapted to is 0.5.4.
2 Test results
2.1 Batch insertion performance
- The data scale in the table header is given in number of edges.
- The values in the table are batch insertion times, in seconds.
- For example, HugeGraph(RocksDB) took 5.711 seconds to insert the 3 million edges of the amazon0601 dataset.
- Batch insertion performance: HugeGraph(RocksDB) > Neo4j > Titan(thrift+Cassandra)
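To compare batch insertion times across datasets of different sizes, it can help to convert a time into a throughput. The sketch below does that arithmetic for the amazon0601 example, assuming the rounded figure of 3 million edges quoted above rather than an exact edge count; `edges_per_second` is a hypothetical helper name.

```python
def edges_per_second(edge_count, seconds):
    """Convert an insertion time into a throughput in edges per second."""
    return edge_count / seconds


# 3_000_000 is the rounded edge count from the text, not an exact figure.
rate = edges_per_second(3_000_000, 5.711)
print(f"{rate:,.0f} edges/s")
```

With these inputs the result is roughly 525,000 edges per second.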
2.2 Traversal performance
2.2.1 Explanation of terms
- FN(Find Neighbor): Traverse all vertices; for each vertex, find its adjacent edges, then follow those edges to find the vertices at the other end.
- FA(Find Adjacent): Traverse all edges; for each edge, get its source vertex and target vertex.
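The two traversals can be sketched on a small in-memory graph. This is an illustration of the definitions above only, not how HugeGraph, Neo4j, or Titan implement them; `build_adjacency`, `find_neighbors`, and `find_adjacent` are hypothetical names, and edges are assumed to be plain `(source, target)` tuples.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index every edge under both of its endpoints."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append((src, dst))
        adj[dst].append((src, dst))
    return adj


def find_neighbors(vertices, adj):
    """FN: for each vertex, follow its adjacent edges to the other endpoint."""
    return {
        v: {dst if src == v else src for src, dst in adj[v]}
        for v in vertices
    }


def find_adjacent(edges):
    """FA: for each edge, return its source vertex and target vertex."""
    return [(src, dst) for src, dst in edges]


sample_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
sample_vertices = ["a", "b", "c", "d"]
fn_result = find_neighbors(sample_vertices, build_adjacency(sample_edges))
fa_result = find_adjacent(sample_edges)
print(fn_result)
print(fa_result)
```

FN does one edge lookup per vertex and then resolves the far endpoint, while FA only reads the two endpoints already stored on each edge, so the two tests stress different access paths.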
2.2.2 FN performance
- The number in parentheses “( )” in the table header is the data scale, in vertices.
- The values in the table are the time spent traversing vertices, in seconds.
- For example, HugeGraph with the RocksDB backend takes a total of 45.118 seconds to traverse all vertices in amazon0601, looking up each vertex's adjacent edges and the vertex at the other end.
- Traversal performance: Neo4j > HugeGraph(RocksDB) > Titan(thrift+Cassandra)
2.3 Performance of Common Graph Analysis Methods in HugeGraph
- FS (Find Shortest Path): finding the shortest path between two vertices
- K-neighbor: all vertices that can be reached from the starting vertex within K hops (that is, in 1, 2, 3…(K-1), or K hops)
- K-out: all vertices that can be reached by traversing exactly K out-edges from the starting vertex.
- The number in parentheses “()” in the table header is the data scale, in edges.
- The values in the table are the time taken to find the shortest paths from the first vertex to 100 randomly selected vertices, in seconds.
- For example, HugeGraph with the RocksDB backend took a total of 0.103s to find the shortest paths from the first vertex to 100 randomly selected vertices in the amazon0601 graph.
- In scenarios with small data size or few vertex relationships, HugeGraph outperforms Neo4j and Titan.
- As the data size and the degree of vertex connectivity increase, HugeGraph and Neo4j perform similarly, and both far outperform Titan.
|Vertex||Depth||Degree 1||Degree 2||Degree 3||Degree 4||Degree 5||Degree 6|
- HugeGraph-Server’s JVM memory is set to 32GB and may experience OOM when the data is too large.
|Vertex||Depth||Degree 1||Degree 2||Degree 3||Degree 4||Degree 5||Degree 6|
- In the FS scenario, HugeGraph outperforms Neo4j and Titan.
- In the K-neighbor and K-out scenarios, HugeGraph returns results within seconds for traversals of up to 5 degrees.
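The FS, K-neighbor, and K-out operations defined at the start of this section can all be sketched with breadth-first search. This is a minimal illustration, not HugeGraph's implementation. It makes two assumptions that the text does not state: the graph is a plain dict of out-edges that both K-neighbor and K-out follow, and a vertex is counted at the depth where it is first reached. All function names are hypothetical.

```python
from collections import deque


def bfs_levels(adj, start, max_depth):
    """Return {vertex: depth} for every vertex within max_depth hops of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if depth[v] == max_depth:
            continue                      # do not expand past the depth limit
        for w in adj.get(v, ()):
            if w not in depth:
                depth[w] = depth[v] + 1
                queue.append(w)
    return depth


def shortest_path(adj, source, target):
    """FS: return one shortest path as a list of vertices, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:          # walk parent links back to source
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj.get(v, ()):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


def k_neighbor(adj, start, k):
    """Vertices first reached at depth 1..k (assumption: out-edges only)."""
    return {v for v, d in bfs_levels(adj, start, k).items() if 1 <= d <= k}


def k_out(adj, start, k):
    """Vertices first reached at depth exactly k."""
    return {v for v, d in bfs_levels(adj, start, k).items() if d == k}


sample_adj = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
print(shortest_path(sample_adj, 1, 5))
print(k_neighbor(sample_adj, 1, 2))
print(k_out(sample_adj, 1, 2))
```

Because `bfs_levels` holds every visited vertex in memory, the working set grows with each additional degree on a densely connected graph, which is consistent with the memory pressure noted above for large data.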
2.4 Comprehensive Performance Test - CW
|Database||Size 1000||Size 5000||Size 10000||Size 20000|
- “Size” in the table header is the number of vertices.
- The values in the table are the time required to complete community detection, in seconds. For example, with the RocksDB backend on the dataset of 10,000 vertices, HugeGraph takes 744.780 seconds to reach the point where the community assignments no longer change.
- The CW test is a comprehensive evaluation of CRUD operations.
- In this test, HugeGraph, like Titan, operated on the core directly rather than going through the client.
- Performance of community detection algorithm: Neo4j > HugeGraph > Titan
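The Louvain Method works by repeatedly moving vertices between communities to increase modularity, and it stops when no move improves it, which corresponds to the point where community assignments no longer change. The sketch below computes modularity for a given partition of an undirected, unweighted graph. It illustrates the quantity being optimized, not the benchmark's own clustering code, and `modularity` and the sample graph are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q = sum over communities of L_c/m - (d_c / 2m)^2.

    edges: list of undirected (u, v) pairs; community: {vertex: label}.
    L_c is the number of edges inside community c, d_c its total degree.
    """
    m = len(edges)
    internal = {}       # L_c per community
    total_degree = {}   # d_c per community
    for u, v in edges:
        for x in (u, v):
            c = community[x]
            total_degree[c] = total_degree.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )


# Two triangles joined by a single edge (2, 3).
sample_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {v: "A" for v in range(6)}
q_split = modularity(sample_edges, split)
q_merged = modularity(sample_edges, merged)
print(q_split, q_merged)
```

Splitting the sample graph into its two triangles scores higher than placing every vertex in one community, so a modularity-maximizing method would prefer the split.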