Welcome!

OpenMovieData is a project that aims to provide a comprehensive dataset of movies and their associated data. The dataset is intended to be used for data analysis and machine learning projects. Furthermore, these "raw" datasets will be integrated into a Neo4j Property Graph Database to provide a more structured and queryable dataset.

Installation/Usage

Requirements

  • A Neo4j database
  • Python 3.9+

Importing to Neo4j

If you have a Neo4j database and wish to import the dataset, you can use the provided script, `import_script.py`. Below are the steps to follow:

  1. Ensure that you have Neo4j running locally or have access to your remote Neo4j database.
  2. Open a terminal and navigate to the `graph` directory in the cloned repository.
  3. Create a new Python virtual environment using the command `python3 -m venv venv`. This will create a new directory called `venv` which will contain the Python interpreter and any packages you install.
  4. Activate the virtual environment. On Unix or macOS, use the command `source venv/bin/activate`. On Windows, use `venv\Scripts\activate`.
  5. Install the required packages with the command `pip install -r requirements.txt`.
  6. Set your Neo4j connection details (URI, username, and password) as environment variables, either in your terminal session or in a `.env` file if you use one.
  7. Run the script using Python with the command `python import_script.py`.
  8. The script will now connect to your Neo4j database and begin the import process. Please note that this may take some time depending on the size of the dataset.
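
Before running the import, it can help to confirm that the connection details from step 6 are set and that the database is reachable. The following is a minimal sketch, not part of the repository: the variable names `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` and the helpers `load_neo4j_settings` and `check_connection` are assumptions for illustration, so check `import_script.py` for the names it actually reads. The sketch also assumes the official `neo4j` Python driver is installed in the virtual environment.

```python
import os

# Hypothetical variable names (an assumption); import_script.py may use others.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_neo4j_settings(env=None):
    """Return the connection settings, or raise if any variable is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def check_connection(settings):
    """Open a driver and verify that the database is reachable."""
    # Imported lazily so the settings check works without the driver installed.
    from neo4j import GraphDatabase

    auth = (settings["NEO4J_USERNAME"], settings["NEO4J_PASSWORD"])
    with GraphDatabase.driver(settings["NEO4J_URI"], auth=auth) as driver:
        # Raises an exception if the server cannot be reached or rejects the login.
        driver.verify_connectivity()
```

Calling `check_connection(load_neo4j_settings())` raises an error early if a variable is missing or the server cannot be reached, which is cheaper than discovering the problem partway through a long import.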

This guide assumes that you have Python installed and are familiar with setting environment variables. If you encounter any issues, please open an issue on the GitHub repository.

Contributing

Any contributions to the project are welcome. If you have any suggestions or ideas, please open an issue. If you want to contribute to the code or data, please open a pull request.

Maintenance

This project will be updated annually, after the Oscars have been awarded. Each update adds the new winners and nominees, as well as the movies released in the past year.

License

Licensing information can be found on the Licence page.

Citations

If you use this dataset in your research, please cite this GitHub repository as follows:

@misc{OpenMovieData,
  author = {Luka van den Boogaard and Marlou Gielen and Sander Moonemans and Luc Siecker},
  title = {OpenMovieData},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://sandermoon.github.io/OpenMovieData/}}
}

Please also notify the maintainers of this project of your publication, so that we can add it to the list of publications that use this dataset in the `citations` folder. This allows us to track changes in the dataset and how the dataset is used.

Contact

Any questions or comments can be directed to Sander Moonemans.