Migrate RI dataset to Neo4j 5.x
Overview
The Neo4j database that stores the RI dataset currently runs on version 4.4, the LTS release of Neo4j 4.x. Since Neo4j 5 has been available for some time, the dataset should be migrated to Neo4j 5.x at some point in the future.
The main challenge of this migration is adapting the import script resources/import-regf3.cypher, which is currently not compatible with Neo4j 5. Most likely, the required changes concern the apoc functions used in the script.
Currently, Neo4j 5 has no LTS release, but the final release of Neo4j 5 will be a 42-month LTS release. Since minor releases don't introduce breaking changes, it should be manageable to bump to the latest supported version whenever a new monthly release is published.
Working steps
Instead of performing an in-place migration of the database, we should start with an empty database on the latest Neo4j 5.x release. The goal is to adapt the import script to all changes in Neo4j and apoc so that it produces exactly the same database as in Neo4j 4.4. A good way to check the result is to compare the number of nodes per label and the number of relationships per type between the two databases.
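The count comparison could be sketched roughly as follows. This is only a minimal sketch: the two Cypher queries are assumed to be run manually against the 4.4 and the 5.x instance (for example in cypher-shell), and their results are pasted in as plain dictionaries. The helper name diff_counts and the label names in the example are hypothetical and not taken from the RI dataset.

```python
# Cypher queries to run on both instances; they use only syntax that is
# valid in Neo4j 4.4 and 5.x.
NODE_COUNT_QUERY = (
    "MATCH (n) UNWIND labels(n) AS name RETURN name, count(*) AS total"
)
REL_COUNT_QUERY = (
    "MATCH ()-[r]->() RETURN type(r) AS name, count(*) AS total"
)


def diff_counts(old, new):
    """Return {name: (old_count, new_count)} for every name whose counts differ.

    A name that is missing on one side is treated as having a count of 0.
    """
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name, 0), new.get(name, 0))
        for name in names
        if old.get(name, 0) != new.get(name, 0)
    }


# Hypothetical example values, NOT real counts from the RI dataset.
old_nodes = {"Regesta": 100, "Person": 40}
new_nodes = {"Regesta": 100, "Person": 39, "Place": 1}

for name, (before, after) in diff_counts(old_nodes, new_nodes).items():
    print(f"{name}: 4.4 has {before}, 5.x has {after}")
```

An empty result for both the node and the relationship counts is a necessary but not a sufficient sign that the import is equivalent; property values would need a separate spot check.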
Once the import script has been migrated successfully, the production configuration in the infrastructure repository should be adapted to any setting changes in Neo4j 5. Finally, the 4.4 instance and its data volume should be deleted so that a fresh import can be run on the production server. This includes temporarily making the database write-accessible with a password (it normally runs in read-only mode).
References:
- Migration reference for Neo4j 4.4 -> 5.x: https://neo4j.com/docs/upgrade-migration-guide/current/version-5/migration/reference/
- Migration guide for apoc 5: https://neo4j.com/docs/apoc/current/migration-guide/
- Neo4j 5 support model explanation: https://neo4j.com/blog/continuous-release-support-model-neo4j-5/
- Neo4j version support overview: https://neo4j.com/developer/kb/neo4j-supported-versions/