Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a sample configuartion for import .osm files to OrientDB via ETL . #7107

Closed
saeedtabrizi opened this issue Jan 22, 2017 · 4 comments
Closed
Assignees
Labels

Comments

@saeedtabrizi
Copy link
Contributor

Hi
I need a sample for ETL configuration to importing OSM data to OrientDB .
In my case , i'm using spatial module in orientdb and i need to store the node , ways , tags as Vertex and relationship as the Edge or something like this . i want to define the data version field and using data version in the runtime query too . after all i want to use routing functions like the astar function to find cheapest way (i don't like to use postgis routing) so i want to calculate the distance between node when ETL transfortms data .
i read ETL XML but i confused exactly .

i need help or more documentation to solve my problem .
Thanks

@lvca
Copy link
Member

lvca commented Jan 27, 2017

Hey @saeedtabrizi do you have a example of a XML snippet for OSM and how you want to import it in OrientDB?

@saeedtabrizi
Copy link
Contributor Author

Hi @lvca . I tried multiple ways to handle my xml OSM data to OrientDB by using ETL but it so complicated and hard in my case .
I decided to write a java api to handle data conversion , because OrientDB ETL does not suitable for me in my case.
Now i can convert data from .osm files to orientdb . but when i want to create some edges for connecting way and nodes , i can't transform easily and is so complicated .

I think the ETL feature has lack of documentation and current documentation does not have true guides or best practices . (TD;LR ETL must be add some facilities and documentation and best practice for ETL at the future) .

Anyway i put my ETL json files here that i have success transform data from .osm to OrientDB but it is not my favor ETL process .

ETL Process for OSM Data .

  1. osm data can take from OpenStreetMap as test.osm .
  2. create the osm_node_etl.json file as below content .
{
    "config": {
        "log": "debug"
    },
    "source": {
        "file": {
            "path": "/mnt/hdd2/osmdata/test.osm"
        }
    },
    "extractor": {
        "xml": {
            "rootNode": "osm.node"
        }
    },
    "transformers": [{
            "field": {
                "fieldName": "@class",
                "value": "Osm_Node"
            }
        }, {
            "vertex": {
                "class": "Osm_Node",
                "skipDuplicates": true
            }
        },
        {
            "field": {
                "fieldName": "changeset",
                "expression": "$input.changeset.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "id",
                "expression": "$input.id.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "lat",
                "expression": "$input.lat.asFloat()"
            }
        },
        {
            "field": {
                "fieldName": "lon",
                "expression": "$input.lon.asFloat()"
            }
        },
        {
            "field": {
                "fieldName": "timestamp",
                "expression": "$input.timestamp"
            }
        },
        {
            "field": {
                "fieldName": "version",
                "expression": "$input.version.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "visible",
                "expression": "$input.visible.asBoolean()"
            }
        },
        {
            "field": {
                "fieldName": "location",
                "expression": "'POINT('+ $input.lon + ' ' + $input.lat + ')' "
            }
        },
        {
            "field": {
                "fieldName": "location",
                "expression": "St_GeomFromText($input.location)"
            }
        },
        {

            "field": {
                "fieldName": "user",
                "operation": "remove"
            }


        },
        {
            "code": {
                "language": "Javascript",
                "code": " var sdf = new java.text.SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss'Z'\"); record.field('timestamp',sdf.parse(record.field('timestamp')))   ;var tags =  new java.util.HashMap();var tag = record.field('tag'); if(tag){ print('has tags ==> ' + tag) ; tag.forEach(function(ix){ var k = ix.field('k').replaceAll(':','_').replaceAll('-','__'); tags.put(k, ix.field('v')) ;print('has key ==>' + ix.field('k'));  } ); record.field('tags',tags,com.orientechnologies.orient.core.metadata.schema.OType.EMBEDDEDMAP) ; };  "
            }
        },
        {
            "field": {
                "fieldName": "tag",
                "operation": "remove"
            }
        }
    ],
    "loader": {
        "orientdb": {
            
            "dbURL": "plocal:./../databases/MapDB",
            "dbUser": "admin",
            "dbPassword": "admin",
            "dbType": "graph",
            "dbAutoCreate": true,
            "tx": false,
            "batchCommit": 1000,
            "wal": false,
            "classes": [{
                    "name": "Osm_Node",
                    "extends": "V"
                },
                {
                    "name": "Osm_Way",
                    "extends": "V"
                },
                {
                    "name": "HasWayTo",
                    "extends": "E"
                }
            ],
            "indexes": [{
                "class": "Osm_Node",
                "fields": ["id:long"],
                "type": "UNIQUE"
            }]
        }
    }
}
  1. add the osm_way_etl.json file as below content .

{
    "config": {
        "log": "debug"
    },
    "source": {
        "file": {
            "path": "/mnt/hdd2/osmdata/test.osm"
        }
    },
    "extractor": {
        "xml": {
            "rootNode": "osm.way"
        }
    },
    "transformers": [{
            "field": {
                "fieldName": "@class",
                "value": "Osm_Way"
            }
        }, {
            "vertex": {
                "class": "Osm_Way",
                "skipDuplicates": true
            }
        },
        {
            "field": {
                "fieldName": "changeset",
                "expression": "$input.changeset.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "id",
                "expression": "$input.id.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "timestamp",
                "expression": "$input.timestamp"
            }
        },
        {
            "field": {
                "fieldName": "version",
                "expression": "$input.version.asLong()"
            }
        },
        {
            "field": {
                "fieldName": "visible",
                "expression": "$input.visible.asBoolean()"
            }
        },
        {

            "field": {
                "fieldName": "user",
                "operation": "remove"
            }


        },
        {
            "code": {
                "language": "Javascript",
                "code": " var sdf = new java.text.SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss'Z'\"); record.field('timestamp',sdf.parse(record.field('timestamp')))   ;var tags =  new java.util.HashMap();var tag = record.field('tag'); if(tag){ print('has tags ==> ' + tag) ; tag.forEach(function(ix){ var k = ix.field('k').replaceAll(':','_').replaceAll(' ','_').replaceAll(',','_').replaceAll('-','__') ; tags.put(k, ix.field('v') ) ; print('has key ==>' + k);  } ); record.field('tags' , tags , com.orientechnologies.orient.core.metadata.schema.OType.EMBEDDEDMAP ) ; };  "
            }
        }
    ],
    "loader": {
        "orientdb": {
            
            "dbURL": "plocal:./../databases/MapDB",
            "dbUser": "admin",
            "dbPassword": "admin",
            "dbType": "graph",
            "dbAutoCreate": true,
            "tx": false,
            "batchCommit": 1000,
            "wal": false,
            "classes": [{
                    "name": "Osm_Node",
                    "extends": "V"
                },
                {
                    "name": "Osm_Way",
                    "extends": "V"
                },
                {
                    "name": "HasWayTo",
                    "extends": "E"
                }
            ],
            "indexes": [{
                "class": "Osm_Way",
                "fields": ["id:long"],
                "type": "UNIQUE"
            }]
        }
    }
}

For each nd element is the way element we must create and edge . so i need to access the vertex that i converted in step 2 .
This configuration works well and convert data from xml to orientdb now . but this is not my favor solution and its so complicated . i prefer to use a java api program to handle my problem now . may be i write a plugin for orientdb to handle OSM data conversion like the postgis (postgres) .

This issue can be closed .
Thanks .

@lvca
Copy link
Member

lvca commented Jan 27, 2017

@saeedtabrizi thanks for the JSON files, it could be useful to some users. If you can create something generic, we would more than happy to include it in the GeoSpatial module and maybe also in Studio!

@lvca lvca closed this as completed Jan 27, 2017
@saeedtabrizi
Copy link
Contributor Author

@lvca with the pleasure . i handle it asap . (may be in this week)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants