I've been using relational databases of one sort or another for most of my career. Back when I was a derivatives trader at HSBC we used MS Access to hack VBA applications together on the fly for the trading desk (in fact, that was my first exposure to the world of software development!). Since then I've mainly used Postgres for client development work at Coolgarif Tech. I like its stability, and the psql command-line client is excellent!
However, as clients have started asking for more and more data-science-type projects, such as internal price comparison applications, I've become more interested in the NoSQL movement, and particularly in the subset that covers graph databases.
So when we decided to run a quick internal learning project at Coolgarif Tech to create a bespoke data visualisation (dataVis) for our website, my co-founder James and I reckoned it would be fun to build it on top of Neo4j.
This first post is just a warm-up, both for you as a reader and for me as a blogger. I have never blogged before, and it is generally agreed amongst my friends and family that I am horrendous at stringing words together into coherent sentences! So please go easy on me!
Before getting my hands dirty in the code, I decided to get my head around the theory. So while escaping the London winter for a couple of weeks of surfing in March (half a day surfing, half a day coding!! It's the way forward!), I got stuck into the early release version of Graph Databases by Robinson, Webber and Eifrem. I also started on Networks, Crowds, and Markets by David Easley and Jon Kleinberg, which is an excellent read, although I have to admit I haven't quite finished it yet!
Other books I've read in the past that are loosely related to this area, and might be worth a peek, are The Wisdom of Crowds by James Surowiecki and Linked: The New Science of Networks by Albert-László Barabási.
My development environment is Ubuntu 14.04 LTS 64-bit, which sits very nicely in a VirtualBox virtual machine on my MacBook Pro.
Now, there is an apt-get install command for Neo4j; however, it is important to recognise that with apt-get the package manager decides where the new software ends up, usually buried in the /var/lib/neo4j folder. I had a quick look on Google and didn't find anything that lets you specify the destination folder for apt-get.
That matters because, unlike Postgres, where you can have multiple databases within the same server instance, with Neo4j it worked out better to have a separate instance of the graph database for each individual graph. Doing it this way avoids any crossed wires between applications, particularly if you use the Python embedded bindings, which appear to be single-threaded (more about this in later posts!).
As an alternative, I simply downloaded the latest stable release as a tarball from the Neo4j website to my local machine. I then created a neo4j working directory at /home/userName/neo4j, where I keep all my graph database instances together in one place.
Excellent. So now you have a sandbox where you can set up multiple graph databases, one for each project you are working on. Simply extract the tarball into this directory and rename the newly created neo4j folder for each new project, which gives you a clean instance of the database to work with.
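If you create instances often, the extract-and-rename step can be scripted. This is a minimal sketch, assuming the release tarball is gzip-compressed, unpacks to a single top-level folder, and that instances live under ~/neo4j as above; `new_graph_instance` is a hypothetical helper name, not part of Neo4j.

```shell
#!/usr/bin/env bash
# Sketch: create a clean Neo4j instance for one project by unpacking the
# release tarball and renaming the extracted folder after the project.
# Assumes a gzip tarball with a single top-level directory, and instances
# kept under ~/neo4j (override with a third argument).

new_graph_instance() {
  local tarball=$1 project=$2 base=${3:-"$HOME/neo4j"}
  local target="$base/$project" scratch extracted

  # Refuse to overwrite an existing instance.
  if [ -e "$target" ]; then
    echo "instance '$project' already exists at $target" >&2
    return 1
  fi
  mkdir -p "$base" || return 1

  # Unpack into a scratch directory so we can find the top-level folder.
  scratch=$(mktemp -d "$base/.unpack.XXXXXX") || return 1
  if ! tar -xzf "$tarball" -C "$scratch"; then
    rm -rf "$scratch"
    return 1
  fi

  # Move the extracted folder into place under the project's name.
  extracted=$(find "$scratch" -mindepth 1 -maxdepth 1 -type d | head -n 1)
  if [ -z "$extracted" ] || ! mv "$extracted" "$target"; then
    echo "could not set up $target from $tarball" >&2
    rm -rf "$scratch"
    return 1
  fi
  rm -rf "$scratch"
  echo "$target"
}
```

Usage would be something like `new_graph_instance ~/Downloads/neo4j-community-<version>-unix.tar.gz dataVis`, substituting the name of the file you actually downloaded. Unpacking into a scratch folder first means the helper doesn't need to know the exact versioned folder name inside the tarball.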
Next come a few necessary configuration steps that we need to get out of the way (don't skip this bit!).
The first is to make sure JAVA_HOME is set correctly. As an aside, make sure you have the Java Development Kit (JDK) on your machine and not just the Java Runtime Environment (JRE) (more about this when we get to the post on server deployment).
Anyhow, there are two ways to set JAVA_HOME. The first is to set the Java environment each time you want to use Neo4j, by typing the following command:
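If you're not sure which directory JAVA_HOME should point to, it can be derived from the javac binary on your PATH. This sketch rests on one stated assumption: a JDK ships `bin/javac` while a JRE does not, so finding javac is treated as a rough sign that you have the JDK. The helper names are hypothetical, not part of Neo4j or Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: work out JAVA_HOME from the javac binary and check it is a JDK.
# Assumption: a JRE has bin/java but no bin/javac, so javac marks a JDK.

# Print the JDK root that a given javac (default: the one on PATH) lives in.
jdk_home_from_javac() {
  local javac_path=${1:-$(command -v javac)}
  if [ -z "$javac_path" ]; then
    echo "javac not found on PATH (only a JRE installed?)" >&2
    return 1
  fi
  # Follow symlinks such as /usr/bin/javac -> /etc/alternatives/javac -> ...
  javac_path=$(readlink -f "$javac_path") || return 1
  # Strip the trailing bin/javac to get the JDK root.
  dirname "$(dirname "$javac_path")"
}

# Succeed only if the directory looks like a JDK (has an executable bin/javac).
is_jdk_home() {
  [ -x "$1/bin/javac" ]
}
```

From inside the Neo4j folder, `JAVA_HOME=$(jdk_home_from_javac) && export JAVA_HOME && bin/neo4j start` sets it for the current shell only; with OpenJDK 7 on 64-bit Ubuntu 14.04 I'd expect something like /usr/lib/jvm/java-7-openjdk-amd64, but check what it prints on your machine. Because an exported variable disappears when the shell closes, this has to be repeated every session.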
However, I always forgot to do that each time I fired up the graph, so in my opinion the better alternative is to update the system-wide environment file, /etc/environment.
Open this file using the nano command-line text editor (it is owned by root, so you'll need sudo) and add the following lines:
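One detail worth knowing: /etc/environment is read at login by PAM's pam_env module rather than by a shell, so it takes plain KEY="value" lines, with no `export` keyword and no `$VARIABLE` expansion. If you'd rather script the change, here is a minimal sketch; `set_env_line` is a hypothetical helper, and the JDK path in the usage note is an assumption for OpenJDK 7 on 64-bit Ubuntu 14.04.

```shell
#!/usr/bin/env bash
# Sketch: add or update a KEY="value" line in an environment file such as
# /etc/environment. That file is parsed by pam_env, not a shell, so no
# "export" and no $VARIABLE expansion. Assumes the value has no "|" or "&".

set_env_line() {
  local file=$1 key=$2 value=$3
  local line="$key=\"$value\""
  touch "$file" || return 1
  if grep -q "^$key=" "$file"; then
    # Replace the existing assignment in place ("|" delimiter copes with paths).
    sed -i "s|^$key=.*|$line|" "$file"
  else
    echo "$line" >> "$file"
  fi
}
```

From a root shell (for example after `sudo -i`), `set_env_line /etc/environment JAVA_HOME /usr/lib/jvm/java-7-openjdk-amd64` would do it. Updating in place rather than always appending keeps the file free of duplicate JAVA_HOME lines. The new value is picked up at your next login.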
The final step is to amend the Neo4j config files, which live in the conf directory inside the Neo4j folder; in my case the path is
From the files in that folder, open 'neo4j-server.properties' and change the HTTP port from its default of 7474 to another number (I used 7475). This is required because if you have two different graphs running at the same time, they need to listen on different ports.
So this is the line in the file that needs to be amended:
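For reference, in the Neo4j 2.x releases I've seen, the HTTP port is set by `org.neo4j.server.webserver.port`, and HTTPS, if enabled, has its own `org.neo4j.server.webserver.https.port` (default 7473), which can also clash when two instances run at once. Treat those property names as assumptions and check your own file. This sketch sets the HTTP port with sed and lists the port each instance under ~/neo4j is configured for, which helps when picking a free one; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: set one instance's HTTP port and list the ports of all instances.
# Assumes the Neo4j 2.x property org.neo4j.server.webserver.port in
# <instance>/conf/neo4j-server.properties.

set_webserver_port() {
  local props=$1 port=$2
  local key='org.neo4j.server.webserver.port'
  local re='^org\.neo4j\.server\.webserver\.port='
  if ! grep -q "$re" "$props"; then
    echo "$key not found in $props" >&2
    return 1
  fi
  # Only the plain HTTP line matches; the https.port line is left alone.
  sed -i "s/$re.*/$key=$port/" "$props"
}

# Print "<instance> <port>" for every instance under the base directory.
list_instance_ports() {
  local base=${1:-"$HOME/neo4j"} props name
  for props in "$base"/*/conf/neo4j-server.properties; do
    [ -f "$props" ] || continue
    name=$(basename "$(dirname "$(dirname "$props")")")
    printf '%s %s\n' "$name" \
      "$(sed -n 's/^org\.neo4j\.server\.webserver\.port=//p' "$props")"
  done
}
```

For example, `set_webserver_port ~/neo4j/dataVis/conf/neo4j-server.properties 7475`, then `list_instance_ports` to confirm no two instances share a port.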
Then open 'neo4j-wrapper.conf' in the same folder and change the name of the service. I just appended the name of the application to the default setting. It should end up as follows:
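The service-name edit can also be scripted with sed. This sketch assumes the name is set by a `wrapper.name=` line, as I recall from the 2.x tarballs (check your own file), and `set_service_name` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: append the project name to the service name in neo4j-wrapper.conf.
# Assumes a single "wrapper.name=<name>" line; adjust if your release differs.

set_service_name() {
  local wrapper_conf=$1 project=$2 current
  current=$(sed -n 's/^wrapper\.name=//p' "$wrapper_conf")
  if [ -z "$current" ]; then
    echo "no wrapper.name line in $wrapper_conf" >&2
    return 1
  fi
  # Already suffixed with this project? Leave it alone (idempotent).
  case $current in
    *"-$project") return 0 ;;
  esac
  sed -i "s/^wrapper\.name=.*/wrapper.name=$current-$project/" "$wrapper_conf"
}
```

Running `set_service_name ~/neo4j/dataVis/conf/neo4j-wrapper.conf dataVis` would turn `wrapper.name=neo4j` into `wrapper.name=neo4j-dataVis`; running it a second time leaves the name unchanged rather than appending the project twice.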
Right. Now you are good to go.
Go up a level from the config folder to your main graph database folder using $ cd ..
Then start the server with the command
$ bin/neo4j start
Starting Neo4j Server... WARNING: not changing user
process ... waiting for server to be ready......... OK.
Go to http://localhost:7475/webadmin/ for administration interface.
You can then use your browser to navigate to the web console, which in this case is on port 7475 because we changed the port in the config setup above.
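If you script the start-up, it can help to wait until the server is actually accepting connections before opening the browser. Here is a minimal sketch using bash's built-in /dev/tcp redirection (so it needs bash, not sh); `wait_for_port` is a hypothetical helper and only checks that the port is open, not that Neo4j itself is healthy.

```shell
#!/usr/bin/env bash
# Sketch: wait until something accepts TCP connections on host:port.
# Relies on bash's /dev/tcp/<host>/<port> redirection.

wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} waited=0
  # Try to connect once per second until it works or the timeout passes.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "nothing listening on $host:$port after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: `wait_for_port localhost 7475 60 && xdg-open http://localhost:7475/webadmin/`.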
Finally, when you have finished using the console or just want to shut down the Neo4j server, you can use the following command:
$ bin/neo4j stop
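With several instances side by side, it gets tedious to cd into each one before starting or stopping it. Here is a small wrapper, assuming the ~/neo4j/<project> layout described earlier. `graph` and the `NEO4J_INSTANCES` override are hypothetical names for this sketch, and `status` is included on the assumption that your bin/neo4j script supports it (the 2.x script does, as far as I know).

```shell
#!/usr/bin/env bash
# Sketch: run bin/neo4j for a named instance from anywhere.
# Assumes instances live in $NEO4J_INSTANCES (default ~/neo4j)/<project>.

graph() {
  local project=$1 action=$2 base=${NEO4J_INSTANCES:-"$HOME/neo4j"}
  local script="$base/$project/bin/neo4j"
  case $action in
    start|stop|status) ;;
    *) echo "usage: graph <project> start|stop|status" >&2; return 2 ;;
  esac
  if [ ! -x "$script" ]; then
    echo "no instance at $base/$project" >&2
    return 1
  fi
  # Run from the instance folder, as in the manual steps above.
  (cd "$base/$project" && bin/neo4j "$action")
}
```

Then `graph dataVis start` and `graph dataVis stop` replace the cd-and-run routine. Running inside a subshell means your current directory is left where it was.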
So that's it. Next up is loading some data into the graph.