Graph database – Neo4j on AWS

For our new product, we need to build a social graph. I started looking for an appropriate data structure and database for the job and came across graph databases, the kind of technology behind Facebook’s Graph Search and Google’s Knowledge Graph. The most exciting part is that open source graph databases are available, so you can build something like Graph Search or the Knowledge Graph for your own app on top of them. Here is one example of using a graph database for a product recommendation system, and here is another recommendation engine.

The first challenge is to pick the graph database to use. After doing some research, I decided on Neo4j: http://neo4j.com/developer/graph-database/. The reasons being:

With that background, the rest of this post gives instructions for getting Neo4j up and running in the cloud (AWS).

Setup Environment in AWS
https://github.com/neo4j-contrib/ec2neo

  • Log onto the AWS CloudFormation console with your AWS account.
  • Click Create New Stack
  • Fill in the Stack Name field (whatever name you’d like)
  • Click the Provide a template URL radio button
  • Paste the Amazon Linux Template or the Ubuntu Template into the field next to the button
  • Click the Create button
  • Fill in the 3 parameters.
    • The SSHKeyName parameter is the name of your EC2 Key pair (we suggested NEO4J)
    • The Network Whitelist allows you to control access to your database. You can restrict it to your own IP address using the /32 suffix. The default (0.0.0.0/0) will allow connections from anywhere on the public Internet.
    • The Instance Type lets you choose the size of machine to use.
  • Click the Continue button.
  • You can optionally add tags to help identify your stacks. Click the Continue button.
  • You may review your options here. Click the Continue button.
  • Your stack is now being created. Click the Close button.
  • Click the Refresh button on the top right hand side of the CloudFormation view until your stack is complete. You should see the status CREATE_COMPLETE.
  • The running Neo4j server may not immediately be available.
  • The Output tab will show you the endpoint of the Neo4j server. Click on it, and when prompted for a password, enter the password that you chose in the prerequisites.
  • Configure your application to talk to the endpoint.
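To make the Network Whitelist step concrete, here is a minimal sketch of turning a single IPv4 address into the /32 form that the parameter expects. The function name to_single_host_cidr is my own invention and is not part of the ec2neo template, and 203.0.113.7 is a documentation-only address standing in for your real IP.

```shell
# Hypothetical helper (not part of the ec2neo template): turn one IPv4
# address into the single-host CIDR form used by the Network Whitelist.
to_single_host_cidr() {
  ip="$1"
  # Split the address on dots into positional parameters.
  old_ifs="$IFS"; IFS=.
  set -- $ip
  IFS="$old_ifs"
  # Require exactly four octets.
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    # Reject empty or non-numeric octets.
    case "$octet" in
      ''|*[!0-9]*) return 1 ;;
    esac
    # Reject octets above 255.
    [ "$octet" -le 255 ] || return 1
  done
  printf '%s/32\n' "$ip"
}

# 203.0.113.7 is a placeholder address; substitute your own.
to_single_host_cidr 203.0.113.7
```

Anything that is not four numeric octets in the range 0 to 255 makes the function return a non-zero status and print nothing, so a typo does not end up in the stack parameters.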
Confirm Neo4j is running locally
Wait for the instance to launch by monitoring it on the “Instances” dashboard. Once it is running, ssh onto the instance like so: ssh -i /path/to/keypair.pem ubuntu@ec2-???-??-???-???.compute-1.amazonaws.com
If you need to start or stop the server, note that the neo4j start/stop command was failing for me. One solution eventually worked. In case you have the same problem, you can try the following:
$ sudo -u neo4j service neo4j-service start
Make Neo4j accessible from the outside
http://www.neo4j.org/develop/ec2_manual
Only do this if it is necessary, for instance when the services accessing Neo4j run on a different host. Make sure to secure the instance by enabling SSL and adding authentication (like the authentication-extension).
1. Find and open: /path/to/neo4j/conf/neo4j-server.properties. On Ubuntu (at AWS), it’s /var/lib/neo4j/conf/neo4j-server.properties
2. Uncomment the line: #org.neo4j.server.webserver.address=0.0.0.0
3. Check that SSL access is enabled: org.neo4j.server.webserver.https.enabled=true
4. Restart the Neo4j server: sudo /etc/init.d/neo4j-server restart or sudo -u neo4j service neo4j-service restart
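If you would rather script steps 2 and 3 than edit the file by hand, something like the following sketch could work. The function name enable_external_access is hypothetical, and it assumes the two property lines appear exactly as quoted in the steps above. Back up the properties file before trying it.

```shell
# Hypothetical helper: applies steps 2 and 3 to the properties file in $1.
enable_external_access() {
  conf="$1"
  tmp="$conf.tmp.$$"
  # Step 2: drop the leading '#' from the webserver address line, if present.
  sed 's/^#[[:space:]]*\(org\.neo4j\.server\.webserver\.address=\)/\1/' "$conf" > "$tmp" || return 1
  # Write back through the original file so its owner and mode are kept.
  cat "$tmp" > "$conf" || return 1
  rm -f "$tmp"
  # Step 3: succeed only if HTTPS is enabled in the same file.
  grep -q '^org\.neo4j\.server\.webserver\.https\.enabled=true' "$conf"
}

# Example call from a shell that may write to the file (path from step 1):
# enable_external_access /var/lib/neo4j/conf/neo4j-server.properties
```

The function returns a non-zero status when the HTTPS line is missing or set to something other than true, which is a cue to fix that before restarting the server in step 4.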
Check if it is accessible from the outside: curl http://ec2-???-??-???-???.compute-1.amazonaws.com:7474
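Since the server may not answer right after a launch or restart, it can help to repeat the check a few times before concluding that something is wrong. Below is a minimal sketch of a generic retry loop; the function name retry, the attempt count and the delay are my own choices, not something the Neo4j or AWS tooling provides.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# after each failed attempt. Returns 0 as soon as the command succeeds.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while :; do
    "$@" && return 0
    # Give up once the allowed number of attempts has been used.
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example call, keeping the placeholder hostname from this post as-is:
# retry 10 15 curl --silent --fail --output /dev/null 'http://ec2-???-??-???-???.compute-1.amazonaws.com:7474'
```

With the example values, the check runs at most 10 times with 15 seconds between attempts. If it still fails after that, the problem is probably not startup time.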
If it fails (it failed for me at first with these steps), the likely cause is that the security group generated by the process did not open port 7474 to your IP. You need to allow access to port 7474 in your security group. Go to AWS Dashboard -> EC2 -> Security Groups. Select the security group that is used for your instance. Inbound -> Edit -> Add Rule -> Custom TCP Rule – TCP (Protocol) – 7474 (Port Range).
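The same inbound rule can also be added from the command line with the AWS CLI instead of the console. The sketch below is my own addition: the function name open_neo4j_port, the group ID sg-0123456789abcdef0 and the address 203.0.113.7/32 are all placeholders. By default it only prints the command so you can review it first.

```shell
# Hypothetical helper: builds the AWS CLI call that opens TCP port 7474 on a
# security group for one CIDR range. It prints the command by default;
# set DRY_RUN=0 to execute it (this needs a configured AWS CLI).
open_neo4j_port() {
  group_id="$1"; cidr="$2"
  set -- aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" --protocol tcp --port 7474 --cidr "$cidr"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Both arguments are placeholders: substitute your own group ID and IP.
open_neo4j_port sg-0123456789abcdef0 203.0.113.7/32
```

Passing your own address with a /32 suffix keeps the port closed to everyone else, in line with the whitelist advice in the setup steps.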
Hope these steps will get you up and running with Neo4j at AWS.
Happy graph-db’ing!
Naushad