This is a ready-to-go application to convert the LDBC FinBench dataset to a Gradoop TPGM graph.

dbs-leipzig/FinBench_gradoop_importer

Apache License, Version 2.0, January 2004

Gradoop: FinBench Dataset Importer

This is a supporting application for Gradoop that imports FinBench datasets as TPGM graphs for further graph analysis and processing. The project is built with Apache Flink 1.9.3 and Gradoop 0.7.0-SNAPSHOT.

This project was developed during a Bachelor's thesis at Leipzig University.

Requirements:

Building

Build the project with Maven by running this in the project directory:

mvn package

This generates the JAR package FinBenchGradoopImporter-1.0.jar inside the target folder.

Execution

This application is executed with Apache Flink. An example command is:

/bin/flink run -p 128 -c org.gradoop.importer.finbench.FinBenchImporter FinBenchGradoopImporter-1.0.jar -i hdfs:///finbench/sf10 -o hdfs:///finbench/gradoop-parquet-protobuf -f protobuf

Configuration

| Parameter | Argument | Description | Required |
| --- | --- | --- | --- |
| -i | /path/to/finbench | The input path to a directory containing all FinBench CSV files. | Yes |
| -o | /path/out | The output path to which the Gradoop graph is written. | Yes |
| -f | csv, indexed, parquet or protobuf | The output format. CSV is the default; parquet and protobuf are the fastest. | Yes |
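
The options above can be illustrated with a small, standalone Java sketch that validates an argument list before a job is submitted. This is not the importer's actual parser: the class `ImporterArgsSketch` and its `parse` method are hypothetical names. It also assumes that `csv` is used when `-f` is omitted, which follows the "CSV is the default" remark rather than the "Required" column.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT the importer's real argument handling.
class ImporterArgsSketch {

    // Output formats listed in the Configuration table.
    static final List<String> FORMATS =
        Arrays.asList("csv", "indexed", "parquet", "protobuf");

    // Parses "-i <in> -o <out> [-f <format>]" into a map keyed by "i", "o", "f".
    static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("f", "csv"); // assumption: csv is used when -f is omitted

        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            if (!flag.equals("-i") && !flag.equals("-o") && !flag.equals("-f")) {
                throw new IllegalArgumentException("Unknown option: " + flag);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            opts.put(flag.substring(1), args[i + 1]);
        }

        // -i and -o are marked as required in the table.
        if (!opts.containsKey("i") || !opts.containsKey("o")) {
            throw new IllegalArgumentException("Both -i and -o are required");
        }
        if (!FORMATS.contains(opts.get("f"))) {
            throw new IllegalArgumentException("Unsupported format: " + opts.get("f"));
        }
        return opts;
    }

    public static void main(String[] args) {
        // Prints the validated options, or fails with a descriptive message.
        System.out.println(parse(args));
    }
}
```

Running the sketch with the arguments from the example command (`-i hdfs:///finbench/sf10 -o hdfs:///finbench/gradoop-parquet-protobuf -f protobuf`) passes validation, while omitting `-i` or `-o` or passing an unlisted format raises an `IllegalArgumentException`.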

Disclaimer

Apache®, Apache Accumulo®, Apache Flink, Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.