Wednesday, February 8, 2017

Neo4j Insertion/Query Performance Optimization on large (1G+) datasets

A 12 dimensional hypercube in Neo4j - why?


Purpose:

Design for performance by first measuring a performance profile for the batched-insertion and paginated-query design patterns, then choosing batch size and thread count from that profile.
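The insertion code itself is not shown in this post, so here is only a minimal sketch of what "batched insertion" means in the measurements below: node ids are split into consecutive batches and each batch is written in one transaction. The class name and the commitBatch callback are hypothetical stand-ins for the real Bolt transaction code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: commitBatch is a hypothetical stand-in for "write these nodes in one transaction". */
final class BatchedInsertSketch {

    /** Splits ids 0..total-1 into consecutive batches, hands each to commitBatch, returns the batch count. */
    static int insertInBatches(int total, int batchSize, Consumer<List<Integer>> commitBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int batches = 0;
        // long loop variable so start + batchSize cannot overflow for large batch sizes
        for (long start = 0; start < total; start += batchSize) {
            int end = (int) Math.min(start + batchSize, total);
            List<Integer> batch = new ArrayList<>();
            for (int id = (int) start; id < end; id++) {
                batch.add(id);
            }
            commitBatch.accept(batch); // one transaction per batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // 2^16 nodes in batches of 2048 means 32 transactions
        int batches = insertInBatches(1 << 16, 2048, batch -> { /* write the nodes here */ });
        System.out.println(batches + " transactions");
    }
}
```

A batch size of 1 degenerates to one transaction per node, which is the first row of each series in the raw data.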

Raw performance data

Formula so far (in log2 form) is
log2(batch) = log2(data) - log2(threads) + 1 (example: 18 - 3 + 1 = 16, so 2^18 nodes on 2^3 = 8 threads gives a batch size of 2^16 = 65536)
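As a worked check of the formula, here is a small sketch that turns the exponents into a batch size. The class and method names are my own, and it assumes the thread count is a power of two, as in all of the runs below.

```java
/** Sketch of the "formula so far": log2(batch) = log2(data) - log2(threads) + 1. */
final class BatchFormula {

    /**
     * @param dataPower log2 of the dataset size (18 means 2^18 nodes)
     * @param threads   number of insertion threads, assumed to be a power of two
     * @return the batch size predicted by the formula
     */
    static long predictedBatchSize(int dataPower, int threads) {
        if (threads < 1 || Integer.bitCount(threads) != 1) {
            throw new IllegalArgumentException("threads must be a power of two: " + threads);
        }
        int threadPower = Integer.numberOfTrailingZeros(threads); // log2(threads)
        int batchPower = dataPower - threadPower + 1;
        if (batchPower < 0 || batchPower > 62) {
            throw new IllegalArgumentException("batch exponent out of range: " + batchPower);
        }
        return 1L << batchPower;
    }

    public static void main(String[] args) {
        System.out.println(predictedBatchSize(18, 8));  // 2^(18-3+1) = 65536
        System.out.println(predictedBatchSize(18, 16)); // 2^(18-4+1) = 32768
    }
}
```

For the 2^18 runs this matches the fastest measured rows for 8 threads (batch 65536) and 16 threads (batch 32768). The 32-thread rows do not fit: the formula predicts 16384, but batch sizes 2 and 32768 were both faster there, so treat it as a rule of thumb only.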



5820 (RHEL7.3 ulimit) (columns: elapsed ms, threads, elapsed sec, dataset size as power of 2, batch size)

508534,16,508,16,1
235979,16,235,16,2
291876,16,291,16,4
329377,16,329,16,8
354427,16,354,16,16
374336,16,374,16,32
384040,16,384,16,64
393977,16,393,16,128
387177,16,387,16,256
380892,16,380,16,512
383322,16,383,16,1024
363335,16,363,16,2048

i7-alu

3538939,8,3538,18,1
1775530,8,1775,18,2
3867348,8,3867,18,4
4978299,8,4978,18,8
5653734,8,5653,18,16
5999641,8,5999,18,32
6337678,8,6337,18,64
6468651,8,6468,18,128
6572826,8,6572,18,256
6762269,8,6762,18,512
6800382,8,6800,18,1024
7102821,8,7102,18,2048
7618263,8,7618,18,4096
8124558,8,8124,18,8192
6603800,8,6603,18,16384
9529125,8,9529,18,32768
1521072,8,1521,18,65536
4138558,8,4138,18,131072
13791585,8,13791,18,262144
58497594,8,58497,18,524288
1919401,16,1919,18,1
949259,16,949,18,2
3466038,16,3466,18,4
4789950,16,4789,18,8
5502976,16,5502,18,16
5887113,16,5887,18,32
6154832,16,6154,18,64
6330652,16,6330,18,128
6542268,16,6542,18,256
6651309,16,6651,18,512
6886699,16,6886,18,1024
6850014,16,6850,18,2048
6682343,16,6682,18,4096
6544843,16,6544,18,8192
5951906,16,5951,18,16384
752136,16,752,18,32768
1549619,16,1549,18,65536
4207831,16,4207,18,131072
15949986,16,15949,18,262144
59302815,16,59302,18,524288
1168163,32,1168,18,1
571298,32,571,18,2
3332109,32,3332,18,4
4764952,32,4764,18,8
5549885,32,5549,18,16
5970120,32,5970,18,32
6295001,32,6295,18,64
6496541,32,6496,18,128
6679320,32,6679,18,256
6824917,32,6824,18,512
6990629,32,6990,18,1024
6860142,32,6860,18,2048
6394366,32,6394,18,4096
5846679,32,5846,18,8192
3427860,32,3427,18,16384
756820,32,756,18,32768
1551643,32,1551,18,65536
4167513,32,4167,18,131072
15790413,32,15790,18,262144
59131487,32,59131,18,524288
^C
[1]+  Done                    nohup java -cp graph-0.0.4-SNAPSHOT.jar:neo4j-java-driver-1.1.1.jar org.obrienscience.graph.ForkJoinGraphServer 3 6 18 1 password 7687 127.0.0.1 1 19 1
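To read the raw rows above without eyeballing them, here is a small sketch that picks the fastest batch size per thread count for one dataset size. The class and method names are hypothetical. It assumes the five comma-separated columns shown above and treats the first column as the elapsed time (apparently milliseconds, since the third column is the same value divided by 1000).

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: find the fastest batch size per thread count in the raw CSV rows. */
final class FastestBatch {

    /**
     * @param rows      lines of "elapsed,threads,elapsedDiv1000,dataPower,batchSize"
     * @param dataPower only rows with this dataset exponent are considered
     * @return threads -> {batchSize, elapsed} for the row with the smallest elapsed value
     */
    static Map<Integer, long[]> fastestPerThreadCount(String rows, int dataPower) {
        Map<Integer, long[]> best = new TreeMap<>();
        for (String line : rows.split("\\R")) {
            String[] fields = line.trim().split(",");
            if (fields.length != 5) {
                continue; // blank lines, ^C, shell job messages
            }
            try {
                long elapsed = Long.parseLong(fields[0]);
                int threads = Integer.parseInt(fields[1]);
                int power = Integer.parseInt(fields[3]);
                long batch = Long.parseLong(fields[4]);
                if (power != dataPower) {
                    continue;
                }
                long[] current = best.get(threads);
                if (current == null || elapsed < current[1]) {
                    best.put(threads, new long[] {batch, elapsed});
                }
            } catch (NumberFormatException e) {
                // not a data row, ignore it
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "949259,16,949,18,2",
                "752136,16,752,18,32768",
                "1549619,16,1549,18,65536");
        fastestPerThreadCount(sample, 18).forEach((threads, row) ->
                System.out.println(threads + " threads: batch " + row[0] + ", elapsed " + row[1]));
        // prints "16 threads: batch 32768, elapsed 752136"
    }
}
```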


Pagination - Server mode


    // imports needed in the enclosing class (neo4j-java-driver 1.1.x):
    // import org.neo4j.driver.v1.*;
    // import static org.neo4j.driver.v1.Values.parameters;

    private void queryPage() {
        // driver, session and transaction are closed in reverse order by try-with-resources
        try (Driver driver = GraphDatabase.driver("bolt://" + ip + ":" + port, AuthTokens.basic(username, pass));
             Session session = driver.session();
             Transaction aTransaction = session.beginTransaction()) {
            StatementResult result = aTransaction.run(
                    //"MATCH (a:Node0 {name: {p0}})  return a",
                    //"MATCH (a) return a AS name ORDER BY name DESC LIMIT 10", // no
                    "MATCH (a) return a SKIP 10 LIMIT 10", // second page of 10, no ORDER BY so page order is not guaranteed
                    parameters("p0", 0)); // p0 is only used by the first (commented out) query
            if (result.hasNext()) {
                // keys() are the returned columns, so this is the column count, not the row count
                System.out.println("Size: " + result.keys().size());
                while (result.hasNext()) {
                    System.out.println(System.currentTimeMillis() + "," + result.next());
                }
            }
            // mark for commit only after the results have been consumed
            aTransaction.success();
        }
        System.out.println(System.currentTimeMillis() + ", end");
    }
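The query above hardcodes the second page (SKIP 10 LIMIT 10). A minimal sketch of the arithmetic for an arbitrary page follows. The class name, the {skip} and {limit} parameter names and the ORDER BY id(a) clause are my assumptions, not part of the code above; the ordering is only there so that consecutive pages do not overlap.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: SKIP/LIMIT parameters for a zero-based page index. */
final class PageQuery {

    // Assumed query text, using the Neo4j 3.1 {name} parameter syntax.
    static final String CYPHER = "MATCH (a) RETURN a ORDER BY id(a) SKIP {skip} LIMIT {limit}";

    /** Page 0 skips nothing, page 1 skips one page worth of rows, and so on. */
    static Map<String, Object> pageParameters(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize < 1) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize >= 1");
        }
        Map<String, Object> params = new HashMap<>();
        params.put("skip", (long) pageIndex * pageSize); // long arithmetic avoids int overflow
        params.put("limit", (long) pageSize);
        return params;
    }

    public static void main(String[] args) {
        // page index 1 with 10 rows per page is the same window as SKIP 10 LIMIT 10
        System.out.println(CYPHER);
        System.out.println(pageParameters(1, 10));
    }
}
```

The map should be usable as the second argument of the driver's run(String, Map<String, Object>) overload in place of parameters("p0", 0), but I have not measured that variant here.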

Pagination - Embedded mode



Notes

https://neo4j.com/docs/developer-manual/current/get-started/cypher/getting-the-results-you-want/

Friday, January 13, 2017

Neo4j 3.1 Embedded mode GraphDatabase browser available by wrapping with NeoServer

Issue: A Neo4j 3.1 embedded graph database does not expose the Jetty-based HTTP browser (port 7474) by default.

Solution: Wrap the embedded graph database with a NeoServer.
It turns out that, with a bit of help from the following post, we can still bring up a wrapping NeoServer around our embedded-mode Neo4j 3.1.0 database:
http://stackoverflow.com/questions/30074232/replacement-for-deprecated-wrappingneoserverbootstrapper

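We override the following method in our database factory (the ExtendedHighlyAvailableGraphDatabaseFactory referenced in the Spring configuration):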

    @Override
    protected GraphDatabaseBuilder.DatabaseCreator createDatabaseCreator(
            final File storeDir, final GraphDatabaseFactoryState state) {
        return new GraphDatabaseBuilder.DatabaseCreator() {
            @Override
            public GraphDatabaseService newDatabase(final Map<String, String> config) {
                EnterpriseBootstrapper neoServer = new EnterpriseBootstrapper();
                GraphDatabaseService graph = null;
                // convert all config (spring, conf, code) to vararg Pairs
                List<Pair<String, String>> pairs = new ArrayList<>();
                for (Entry<String, String> entry : config.entrySet()) {
                    pairs.add(Pair.of(entry.getKey(), entry.getValue()));
                }
                @SuppressWarnings("unchecked")
                Pair<String, String>[] pairArray = new Pair[pairs.size()];

                // will resolve to /dir/data/databases/graph.db
                // (named status so it does not hide the outer state parameter)
                int status = neoServer.start(storeDir, Optional.empty(), pairs.toArray(pairArray));
                // status is 0 for success, 1 will mean a null server
                if (status > 0) {
                    log.error("NeoServer.start() returned " + status
                            + " - no GraphDatabase available - check config settings");
                }
                // will be null if status == 1
                NeoServer server = neoServer.getServer();
                if (null != server) {
                    Database database = server.getDatabase();
                    if (null != database) {
                        graph = database.getGraph();
                        // set the paxos HA listener only when dbms.mode=HA
                        // Note: initial TO_MASTER callback during server.start above is missed
                        if (graph instanceof HighlyAvailableGraphDatabase) {
                            HighlyAvailableGraphDatabase haGraph = (HighlyAvailableGraphDatabase) graph;
                            haMonitor.setDb(haGraph);
                            HighAvailabilityMemberStateMachine memberStateMachine =
                                    haGraph.getDependencyResolver()
                                            .resolveDependency(HighAvailabilityMemberStateMachine.class);
                            if (memberStateMachine != null) {
                                memberStateMachine.addHighAvailabilityMemberListener(haMonitor);
                                System.out.println("register: " + haMonitor);
                                // rethrow isMaster callback from start
                                //haMonitor.getMasterListenerManager().masterChanged(haGraph.isMaster());
                                System.out.println("isMaster: " + haGraph.isMaster());
                            }
                        }
                    } else {
                        log.error("database null : check your http configuration settings");
                    }
                } else {
                    log.error("server null : check your http configuration settings");
                }
                return graph;
            }
        };
    }

Using the following spring configuration

   
    <context:annotation-config />
    <context:spring-configured />
    <!-- rest annotations -->
    <mvc:annotation-driven />
    
    
    <!--  Rest controllers -->
    <context:component-scan base-package="org.obrienlabs.nbi.graph.service" />
    <!-- in cases where the DAOs are in a separate jar - list them -->
    <bean id="daoFacade" class="org.obrienlabs.nbi.graph.service.ApplicationService"/>
    <bean id="IGraphDatabaseService" class="org.obrienlabs.nbi.graph.service.GraphDatabaseServiceImpl"/>
    <util:map id="config">
        <entry key="ha.allow_init_cluster" value="true"/>
        <entry key="unsupported.cluster_name" value="NSP"/>
        <entry key="ha.host.coordination" value="127.0.0.1:5001"/>
        <entry key="ha.pull_interval" value="10s"/>
        <entry key="ha.host.data" value=":6016"/>
        <entry key="ha.slave_only" value="false"/>
        <entry key="ha.tx_push_factor" value="1"/>
        <entry key="ha.tx_push_strategy" value="fixed_ascending"/>
        <entry key="ha.server_id" value="1"/>
        <entry key="ha.initial_hosts" value="127.0.0.1:5001"/>
        <entry key="dbms.mode" value="HA"/><!-- required to get the http browser -->
        <entry key="browser.allow_outgoing_connection" value="true" />
        <entry key="unsupported.dbms.ephemeral" value="false" />   
        <entry key="dbms.connector.bolt.enabled" value="true" />
        <entry key="dbms.connector.bolt.address" value="0.0.0.0:7688" />
        <entry key="dbms.connector.bolt.type" value="BOLT" />
        <entry key="dbms.connector.bolt.tls_level" value="DISABLED" />
        <entry key="dbms.connector.http.type" value="HTTP" />
        <!-- enabled=false is not supported in this configuration -->
        <entry key="dbms.connector.http.enabled" value="true" />
        <entry key="dbms.connector.http.address" value="0.0.0.0:7575" />
        <entry key="dbms.connector.http.encryption" value="NONE" />
        <entry key="dbms.security.auth_enabled" value="false"/>
        <entry key="dbms.logs.debug.level" value="DEBUG"/>
        <entry key="dbms.logs.http.enabled" value="true" />
        <entry key="dbms.logs.query.enabled" value="true"/> <!--  causes netty exceptions when ssl enabled -->
        <entry key="dbms.shell.enabled" value="true"/>          
    </util:map>
    
    <bean id="haMonitor" class="org.obrienlabs.nbi.graph.service.HaMonitor"/>  
    <bean id="graphDbFactory"  class="org.obrienlabs.nbi.graph.service.ExtendedHighlyAvailableGraphDatabaseFactory">
        <constructor-arg ref="haMonitor" />
    </bean>
    <bean id="graphDbBuilder" 
        factory-bean="graphDbFactory" 
        factory-method="newEmbeddedDatabaseBuilder">
        <constructor-arg value="/ec2-user"/>
    </bean>
    <bean id="graphDbBuilderFinal" 
        factory-bean="graphDbBuilder" 
        factory-method="setConfig">
        <constructor-arg ref="config"/>
    </bean>
    <!-- HighlyAvailableGraphDatabase wrapped by an EnterpriseBootstrapper NeoServer created in this constructor -->
    <bean id="graphDatabaseService" 
        factory-bean="graphDbBuilderFinal" 
        factory-method="newGraphDatabase" 
        destroy-method="shutdown" />



Details:

The code is in the following repository:
https://github.com/obrienlabs/nbi-neo4j-embedded-aws-war



Neo4j 3.1.0 HA (paxos) Embedded mode HighlyAvailableGraphDatabase wrapped with an EnterpriseBootstrapper NeoServer

In this code we get everything except causal clustering (which uses the raft protocol).

With a few modifications I was able to wrap a HighlyAvailableGraphDatabase with an EnterpriseBootstrapper.

There are some non-fatal exceptions around JMX reporting in debug.log, likely related to my Tomcat version (8.0.28), but the graph db running embedded in Tomcat is OK.

        2016-12-21 16:20:00.574+0000 INFO  Bolt enabled on 0.0.0.0:7688.
        2016-12-21 16:20:09.554+0000 INFO  Attempting to join cluster of [127.0.0.1:5001]
        2016-12-21 16:20:11.566+0000 INFO  Creating new cluster with name [neo4j.ha]...
        2016-12-21 16:20:11.607+0000 INFO  Instance 1 (this server)  entered the cluster
        2016-12-21 16:20:12.164+0000 INFO  I am 1, moving to master
        2016-12-21 16:20:12.293+0000 INFO  Instance 1 (this server)  was elected as coordinator
        2016-12-21 16:20:12.462+0000 INFO  I am 1, successfully moved to master
        2016-12-21 16:20:12.513+0000 INFO  Instance 1 (this server)  is available as master at ha://127.0.0.1:6001?serverId=1 with StoreId{creationTime=1482199697648, randomId=7800059877674392627, storeVersion=15531981201765894, upgradeTime=1482199697648, upgradeId=1}
        2016-12-21 16:20:14.495+0000 INFO  Database available for write transactions
        2016-12-21 16:20:31.917+0000 INFO  Mounted REST API at: /db/manage
        2016-12-21 16:20:53.264+0000 INFO  Remote interface available at http://localhost:7575/
        register: org.obrienlabs.nbi.graph.service.HaMonitor@1c0f80c9


Source:
https://github.com/obrienlabs/nbi-neo4j-embedded-aws-war

Diagnostics:
7575 http and 7688 bolt ports
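Whether both connectors are accepting connections can also be checked from Java, independent of netstat. A minimal sketch, assuming the database runs on localhost with the ports from the configuration above (the class and method names are my own):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Sketch: check that something accepts TCP connections on the HTTP and Bolt ports. */
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out or unreachable
        }
    }

    public static void main(String[] args) {
        int[] ports = {7575, 7688}; // http and bolt connectors from the spring config
        for (int port : ports) {
            System.out.println(port + " listening: " + isListening("127.0.0.1", port, 1000));
        }
    }
}
```

This only shows that some process accepts connections on the port; it does not prove that the listener is Neo4j.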
```
obrienlabs-mbp15:_deployment michaelobrien$ netstat -vatn | grep 7575
tcp46      0      0  *.7575                 *.*                    LISTEN      131072 131072  49013      0
tcp4       0      0  127.0.0.1.7575         127.0.0.1.60685        TIME_WAIT   407296 146988  49013      0
tcp6       0      0  ::1.7575               ::1.60699              TIME_WAIT   407284 146808  49013      0
tcp6       0      0  ::1.7575               ::1.60700              TIME_WAIT   407284 146808  49013      0
obrienlabs-mbp15:_deployment michaelobrien$ netstat -vatn | grep 7688
tcp6       0      0  ::1.7688               ::1.60704              ESTABLISHED 406582 146808  49013      0
tcp6       0      0  ::1.60704              ::1.7688               ESTABLISHED 398196 146808  48165      0
tcp6       0      0  ::1.7688               ::1.60702              ESTABLISHED 406570 146808  49013      0
tcp6       0      0  ::1.60702              ::1.7688               ESTABLISHED 398185 146808  48165      0
tcp6       0      0  ::1.7688               ::1.60701              ESTABLISHED 407255 146808  49013      0
tcp6       0      0  ::1.60701              ::1.7688               ESTABLISHED 407628 146808  48165      0
tcp46      0      0  *.7688                 *.*                    LISTEN      131072 131072  49013      0
obrienlabs-mbp15:_deployment michaelobrien$ netstat -vatn | grep 8080
tcp4       0      0  127.0.0.1.8080         127.0.0.1.60584        FIN_WAIT_2  408104 146988  49013      0
tcp4     994      0  127.0.0.1.60584        127.0.0.1.8080         CLOSE_WAIT  408128 146988  42992      0
tcp46      0      0  *.8080                 *.*                    LISTEN      131072 131072  49013      0
```
Links:
Deployed demo is on our Canadian data center at
(bolt enabled) http browser - user:neo4j pass:password

Add a node to the embedded graph db

Neo4j 3.1 Technology

https://neo4j.com/blog/neo4j-3-1-beta-release/?ref=home
- causal clustering (with read replicas and synced read/write clusters)
- bolt can drive load balancing (with read/write splitting)
- 4 security roles with list and terminate control
- node and id reallocation from deleted records (uuid reuse?)
- max query duration

A server-mode Neo4j 3.1.1 instance is running for development purposes on 
http://ec2-52-20-143-25.compute-1.amazonaws.com:7474/browser/

Troubleshooting

Problem: ./neo4j start does not work
Fix: make sure JAVA_HOME points to a Java 8 JDK, for example

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home

Links