Skip to content

SocioPatterns/Sociopatterns-neo4j

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

#Sociopatterns-neo4j

##contactadder-neo4j.py

###class Neo4jContactAdder - Description

Neo4jContactAdder adds contacts (defined in sociopatterns.loader) in a neo4j graph database modeled like specified in neo4j-dynagraph extended with labels on frames (first_frame, last_frame, year, month, day, hour) and index on actors' name. It uses db drivers from py2neo 1.6.1, and the version of neo4j and cypher is 2.0.0.

The main concept is influenced by the need to pay attention to performance, so py2neo's batches to send a bulk of insertion to DB, reducing REST overhead, are necessary. Py2neo's batches provide nice methods that reflect the power of Cypher 2.0.0 new functionalities, like get_or_create_path (=> CREATE UNIQUE), but those do not permit local references in the same batch. To juggle the problem three couples of data structures are used to store useful references and KEYs to identify the elements that are going to be added:

  • actors_dict, interactions_dict (dict) : store actors and interactions references, are updated every new frame in the same batch and contain only real references

    • actors_dict: KEY, actor_id; VALUE, actor reference to DB
    • interactions_dict: KEY, tuple(actor_id1, actor_id2); VALUE, interaction reference to DB
  • actor_new, interaction_new (set): store the KEYs of the actors and the interactions, are used to know wich new actor/interaction have been encountered during the frame, so it is possible to create them in the database and store them into actors_dict/interactions_dict ready to be used for other purposes.

  • frame_actors, frame_interactions (dict): store all the actors/interactions present in the current frame, it's evident that it's independent from other two data structures

    • frame_actors: KEY, actor_id; VALUE, number of times in which an actor appears in the current frame
    • frame_interactions: KEY, tuple(actor_id1, actor_id2); VALUE, number of times in which an interaction appears in the current frame

So the idea is to store in *_new and frame_* data structures everything concerning to the current frame, at creation of a new frame DB is updated according to those referencing with *_dict structures: creating new frames *_new will be used to create new nodes stored in *_dict, then will be used frame_* to use *_dict and creating new non-temporal relationships.

Neo4jContactAdder also provide warm restart with some query to DB during initialization, so recreates *_dict structures and few other datas, the fact that a frame is created after a hot restart is notified by a property of frame nodes called session, incremented every hot restart.

Notes : batch commands are processed like transactions.

  • Parameters:
    • run_name (str): name to identify the run in the DB
    • start_time (int): the time when the capture begins expressed in seconds since the epoch
    • deltat (positive int) : time granularity of the capture
    • neo4j_rest (str): address of the neo4j db

  • Public methods to add contacts:

    • store_contact(self, contact) :

      It's intended for real use, when a script receives packets from network, parses it in a Contact object and passes it to store_contact function. It unpacks the Contact object and for each contact contained calls add_single_contact to add it to DB.

      • Parameter :
        • contact (Contact object as defined in sociopatterns.loader)
      • Returns :
        • None

    • add_single_contact(self, timestamp, actor_id1, actor_id2) :

      This method allows to add single real contacts, intended as two actors that met in a certain instant and no more like Contact objects as store_contact does. So it's useful also in a simulation when usually no metadatas and are needed unlike Contact objects.

      • Parameters:
        • timestamp (int): increasing timestamp, expressed in seconds since the epoch, identifying the instant of contact.
        • actor_id1, actor_id2 (__repr__able objects): usually integer 16 bit IDs for the nature of Contact objects.
      • Returns :
        • None.

Improvements

- Handle insertions with timestamp not subsequent, this problem is real usually around frames switch and depends on real network latency. Now all inserions between the first timestamp received of a frame and the first timestamp of the subsequent are stored in the current frame, no matter if the timestamp indicates that contact deals with previous frame. - Adding index support to speed up queries - Increase the speed of the hot restart, ideas: - Use indexes to retrieve actors IDs - Use two different threads to query the database and to mantain data structures

About

Neo4j backend for SocioPatterns

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages