Connecting to the Clinical Knowledge Graph database

To make use of the CKG database you just built, you need to connect to it and query it for data. The connection is established through Neo4j’s Python driver, neo4j <https://neo4j.com/docs/api/python-driver/current/api.html/>, a library for working with Neo4j from within Python applications. It should already be installed in your virtual environment.
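To see what the driver does on its own, outside CKG, here is a minimal sketch that opens a connection with the neo4j package and checks that the server is reachable. It assumes a local database listening on Bolt port 7687 with the default credentials used later in this section. The helper names `bolt_uri` and `open_driver` are hypothetical and are not part of CKG.

```python
from typing import Tuple


def bolt_uri(host: str, port: int) -> str:
    """Build a Bolt connection URI such as 'bolt://localhost:7687'."""
    return f"bolt://{host}:{port}"


def open_driver(host: str, port: int, auth: Tuple[str, str]):
    """Open a neo4j driver and fail early if the server cannot be reached.

    The neo4j import happens here so that bolt_uri() can be used
    without the driver package installed.
    """
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(bolt_uri(host, port), auth=auth)
    driver.verify_connectivity()  # raises if the server is unreachable
    return driver


# Example usage (requires a running database; values are assumptions):
#   driver = open_driver("localhost", 7687, ("neo4j", "NeO4J"))
#   with driver.session() as session:
#       print(session.run("RETURN 1 AS ok").single()["ok"])
#   driver.close()
```

The driver object is meant to be long-lived and shared, so it is opened once and closed when the application is done with the database.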

Another essential tool when working with Neo4j databases is the Cypher query language. We recommend becoming familiar with it so that you can follow the queries used in the different analyses.
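As a small illustration of Cypher, the sketch below reuses the Project and Subject labels and the HAS_ENROLLED relationship from the example query later in this section, but restricts the match to one project through a `$project_id` parameter instead of formatting the value into the query string. It assumes Subject nodes carry an `id` property, as Project nodes do in that example; the helper name `subjects_query` is hypothetical.

```python
from typing import Any, Dict, Tuple

# Cypher query with a named parameter ($project_id). The value is sent
# separately from the query text, which avoids quoting problems and
# Cypher injection. s.id is an assumed property name.
SUBJECTS_IN_PROJECT = (
    "MATCH (p:Project {id: $project_id})-[:HAS_ENROLLED]-(s:Subject) "
    "RETURN s.id AS subject_id"
)


def subjects_query(project_id: str) -> Tuple[str, Dict[str, Any]]:
    """Return the query text and its parameters as a pair."""
    return SUBJECTS_IN_PROJECT, {"project_id": project_id}
```

With a plain driver session the pair can be passed as `session.run(query, params)`; with the CKG connector it would map onto the `query` and `parameters` arguments of `getCursorData`.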

Neo4j connector

Note that this section is for illustration purposes only.

Within the CKG package, the graphdb_connector module connects the different parts of the Python code to the Neo4j database and allows them to interact with it.

In this module, the Graph class from Neo4j is used to represent the graph data storage space within the Neo4j database, and a YAML configuration file is parsed to retrieve the connection details. The configuration file, connector_config.yml, contains the database server host name, the server port, the user to authenticate as, and the password to use for authentication.
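The configuration step can be sketched as follows, assuming the file is parsed with PyYAML and that the connection uses the Bolt protocol. The key names match the connector_config.yml shown later in this section; the function names are hypothetical, and this is an illustration rather than CKG's actual implementation.

```python
from typing import Any, Dict, Tuple

REQUIRED_KEYS = ("db_url", "db_port", "db_user", "db_password")


def load_config(path: str) -> Dict[str, Any]:
    """Parse a YAML configuration file (requires PyYAML)."""
    import yaml  # imported lazily; the rest of the sketch needs only the stdlib

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def connection_settings(config: Dict[str, Any]) -> Tuple[str, Tuple[str, str]]:
    """Turn the parsed configuration into a Bolt URI and an auth tuple."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"connector_config.yml is missing: {', '.join(missing)}")
    uri = f"bolt://{config['db_url']}:{int(config['db_port'])}"
    return uri, (str(config["db_user"]), str(config["db_password"]))
```

Keeping file parsing separate from building the connection settings makes the second step easy to check without touching the file system.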

from ckg.graphdb_connector import connector
# Open a connection using the settings in connector_config.yml
driver = connector.getGraphDatabaseConnectionConfiguration()

Once the connection is established, the database can be queried. For example:

example_query = 'MATCH (p:Project)-[:HAS_ENROLLED]-(s:Subject) RETURN p.id as project_id, COUNT(s) as n_subjects'
results = connector.getCursorData(driver=driver, query=example_query, parameters={})

This query searches the database for all the available projects and counts how many subjects have been enrolled in each one, returning a pandas DataFrame with “project_id” and “n_subjects” as columns.
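A helper like getCursorData runs a query and tabulates the returned records. The sketch below shows that pattern with the plain neo4j driver. It is an assumption about how such a helper can be written, not CKG's code, and `run_query` and `to_frame` are hypothetical names.

```python
from typing import Any, Dict, List, Optional


def run_query(driver, query: str,
              parameters: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Run a Cypher query and return each record as a dict."""
    with driver.session() as session:
        result = session.run(query, parameters or {})
        # Result.data() must be called while the session is open; it
        # consumes the result and returns one dict per record.
        return result.data()


def to_frame(rows: List[Dict[str, Any]]):
    """Tabulate the records as a pandas DataFrame (pandas imported lazily)."""
    import pandas as pd

    return pd.DataFrame(rows)
```

For the project query above, `to_frame(run_query(driver, example_query))` would yield columns `project_id` and `n_subjects`, which can then be sorted with `sort_values("n_subjects", ascending=False)`.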

Changing/Updating database connection

The connection to the graph database requires credentials, which are stored in graphdb_connector/connector_config.yml. This file includes the following lines:

db_url: "0.0.0.0"
# db_port: 7688 #Production environment
db_port: 7687 #Test environment
db_user: "neo4j"
db_password: "NeO4J"

The initial password used when creating a new Neo4j database is NeO4J. If you would like to use a different password when creating the database, edit this file and replace NeO4J with a password of your choosing. Alternatively, you can change the password directly in the database: in the Neo4j Desktop window, open Manage, select the Administration tab, and set the new password. In either case, make sure that the password in graphdb_connector/connector_config.yml matches the one set in the Neo4j database.
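To script the edit of connector_config.yml instead of changing it by hand, you could use a minimal sketch like the one below. It assumes the password is written as a double-quoted value on a single `db_password:` line, as in the example above. It edits the text directly so that the comments in the file (such as the production and test port notes) are preserved, which re-dumping the file with a YAML library would not do. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches the db_password line, keeping its indentation and any trailing comment.
_PASSWORD_LINE = re.compile(r'^([ \t]*db_password:[ \t]*)"[^"]*"(.*)$', re.MULTILINE)


def set_password(config_text: str, new_password: str) -> str:
    """Return config_text with the quoted db_password value replaced."""
    if '"' in new_password or "\\" in new_password:
        # Quotes and backslashes would need YAML escaping; keep the sketch simple.
        raise ValueError("password must not contain quotes or backslashes")
    # A function replacement avoids re's handling of backslashes in templates.
    updated, count = _PASSWORD_LINE.subn(
        lambda m: f'{m.group(1)}"{new_password}"{m.group(2)}', config_text
    )
    if count != 1:
        raise ValueError("expected exactly one quoted db_password entry")
    return updated


def update_config_file(path: str, new_password: str) -> None:
    """Rewrite the password in the configuration file in place."""
    config = Path(path)
    text = config.read_text(encoding="utf-8")
    config.write_text(set_password(text, new_password), encoding="utf-8")
```

After running it, the new password still has to be set in the database itself (for example through Neo4j Desktop) so that both sides match.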