For our new product, we need to build a social graph. I started looking for an appropriate data structure and database for the job and came across graph databases, the kind of technology used by Facebook’s new graph search and Google’s KnowledgeBase. The most exciting part is that open source graph databases are available, i.e. you can build something like graph search or KnowledgeBase for your own app on top of them. Here is one example of using a graph database for a product recommendation system, and here is another recommendation engine.
The first challenge is to pick the graph database to use. After doing some research, I decided on Neo4j: http://neo4j.com/developer/graph-database/. My reasons:
- very popular and well supported, e.g. instructions for deploying on AWS (details below), a Python wrapper (http://neo4j.com/developer/python/ – I am using py2neo), etc.
- very good documentation and tutorials. http://neo4j.com/graphacademy/online-course/
- very intuitive, which is probably also true for other graph databases.
With that background, the rest of this post gives instructions for getting Neo4j up and running in the cloud (AWS).
Setup Environment in AWS
- Log onto the AWS CloudFormation console with your AWS account.
- Click Create New Stack
- Fill in the Stack Name field (whatever name you’d like)
- Click the Provide a template URL radio button
- Paste the Amazon Linux Template or the Ubuntu Template into the field next to the button
- Click the Create button
- Fill in the 3 parameters.
- The SSHKeyName parameter is the name of your EC2 Key pair (we suggested NEO4J)
- The Network Whitelist allows you to control access to your database. You can restrict it to your own IP address using the /32 suffix. The default (0.0.0.0/0) will allow connections from anywhere on the public Internet.
- The Instance Type lets you choose the size of machine to use.
- Click the Continue button
- You can optionally add tags to help identify your stacks. Click the Continue button.
- You may review your options here. Click the Continue button.
- Your stack is now being created. Click the Close button.
- Click the Refresh button on the top right hand side of the CloudFormation view until your stack is complete. You should see the status CREATE_COMPLETE.
- The running Neo4j server may not immediately be available.
- The Output tab will show you the endpoint of the Neo4j server. Click on it, and when prompted for a password, enter the password that you chose in the prerequisites.
- Configure your application to talk to the endpoint.
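To make the Network Whitelist parameter from the steps above more concrete, here is a small sketch using Python's standard ipaddress module. It only illustrates what a CIDR value admits; it is not how AWS evaluates the rule. The helper name is_allowed and the addresses (taken from the ranges reserved for documentation) are assumptions for illustration.

```python
import ipaddress


def is_allowed(client_ip: str, whitelist_cidr: str) -> bool:
    """Return True if client_ip falls inside the whitelisted CIDR range."""
    network = ipaddress.ip_network(whitelist_cidr, strict=True)
    return ipaddress.ip_address(client_ip) in network


# Documentation-only address standing in for "your own IP address".
MY_IP = "203.0.113.7"

# A /32 suffix covers exactly one address: yours.
print(is_allowed(MY_IP, MY_IP + "/32"))
# A neighbouring address is outside the /32 range.
print(is_allowed("203.0.113.8", MY_IP + "/32"))
# The default 0.0.0.0/0 admits any IPv4 address.
print(is_allowed("198.51.100.20", "0.0.0.0/0"))
```

The first and third checks succeed and the second fails, which is why restricting the whitelist to your own address with /32 is the safer choice for a database endpoint.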
$ sudo -u neo4j service neo4j-service start
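Because the server may not be available right after the stack is created or the service is started, an application can poll the endpoint before connecting. The following is a minimal sketch using only the Python standard library; the function names, the retry count and the delay are assumptions for illustration, not part of Neo4j or py2neo.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example 401 because a password is
        # required), so it is up even though this request was refused.
        return True
    except OSError:
        # Connection refused, DNS failure or timeout: not reachable yet.
        return False


def wait_for_server(url, probe=http_probe, attempts=30, delay=10.0,
                    sleep=time.sleep):
    """Probe url up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

Calling wait_for_server with the endpoint URL shown in the Output tab returns True as soon as the server responds, or False once the attempts are used up. The probe and sleep arguments are injectable so the loop can be tested without a network.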
Only open Neo4j to remote connections if it is necessary, for instance when the services accessing Neo4j run on a different host. Make sure to secure the instance by enabling SSL and adding authentication (like the authentication-extension).
1. Find and open /path/to/neo4j/conf/neo4j-server.properties. On Ubuntu (at AWS), it is /var/lib/neo4j/conf/neo4j-server.properties
2. Uncomment the line: #org.neo4j.server.webserver.address=0.0.0.0
3. Check that SSL access is enabled: org.neo4j.server.webserver.https.enabled=true
4. Restart the Neo4j server:
$ sudo /etc/init.d/neo4j-server restart
$ sudo -u neo4j service neo4j-service restart
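The two property edits can also be checked or applied from a script. The sketch below works on the text of a properties file and uses only the Python standard library; the helper names uncomment_key and get_value are hypothetical, and the sample text is a made-up two-line excerpt, not the real file.

```python
import re

ADDRESS_KEY = "org.neo4j.server.webserver.address"
HTTPS_KEY = "org.neo4j.server.webserver.https.enabled"


def uncomment_key(text: str, key: str) -> str:
    """Remove the leading '#' from a commented-out 'key=value' line."""
    pattern = re.compile(
        r"^[ \t]*#[ \t]*(" + re.escape(key) + r"[ \t]*=.*)$", re.MULTILINE
    )
    return pattern.sub(r"\1", text)


def get_value(text: str, key: str):
    """Return the value of an active (uncommented) key, or None if absent."""
    pattern = re.compile(
        r"^[ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
    )
    match = pattern.search(text)
    return match.group(1) if match else None


# Made-up excerpt standing in for neo4j-server.properties.
sample = (
    "#org.neo4j.server.webserver.address=0.0.0.0\n"
    "org.neo4j.server.webserver.https.enabled=true\n"
)

updated = uncomment_key(sample, ADDRESS_KEY)
print(get_value(updated, ADDRESS_KEY))  # the address is now active
print(get_value(updated, HTTPS_KEY))    # confirm SSL is still enabled
```

To use this on the real file, read its contents into a string, keep a backup copy, and write the updated text back with sufficient privileges before restarting the server.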