Summary and Schedule
ATTENTION: This is an experimental test of The Carpentries Workbench lesson infrastructure. It was automatically converted from the source lesson using the lesson transition script.
If anything seems off, please contact Zhian Kamvar (zkamvar@carpentries.org).
The Managing Open and Reproducible Computational Projects training material covers best practices for managing and supervising computational projects in biology and related fields, spanning data science methods, analysis, interpretation and reporting. Through this training, researchers will deepen their understanding of rigorous and reproducible scientific methods and learn how to integrate them when designing reproducible, transparent and collaborative computational projects. In addition, the guidance on managing and supervising early career researchers who conduct computational (data-driven or data-informed) research will help ensure transparency and research integrity throughout project design, methodology, analysis, interpretation and reporting.
This training material is developed under the Data Science for Biomedical Scientists project. It draws extensively on chapters from The Turing Way and builds on The Carpentries and Open Life Science practices. The project is hosted by the Tools, practices and systems (TPS) research team, and all materials are shared under a CC-BY 4.0 License. Although the course is tailored to the biomedical sciences community, the materials are broadly transferable and directly relevant to data science projects in other domains. Anyone interested in collaborating on or improving this material is welcome to connect with the development team on GitHub (see the repository).
Funding Acknowledgement: The first iteration of this project was funded by The Alan Turing Institute - AI for Science and Government (ASG) Research Programme from October 2021 to March 2022.
Prerequisites
This resource is designed for experimental biologists, biomedical researchers and adjacent communities, with a focus on two key professional/career groups:
- Group leaders or lab managers without prior experience in data science or in managing computational projects
- Postdocs and lab scientists (the next generation of senior leaders) interested in integrating computational science into the biosciences
In defining the scope of this project for our target audience, we make some assumptions about the learner groups:
- Our learners have a good understanding of designing or contributing to a scientific project throughout its lifecycle.
- They have a computational project in mind for which funding and research ethics approval have been received.
- Their research team, whatever its size, is already partially or fully established.
This lesson has been developed alongside the Introduction to Data Science and AI for senior researchers lesson. We encourage learners to work through that lesson to learn about data science and AI/ML practices relevant to the life sciences, where the best practices for Managing Open and Reproducible Computational Projects can be applied in practice.
| Time | Episode | Questions |
|---|---|---|
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Introduction to this course | What is the purpose of this training? Who is the target audience? What will learners have learned by the end of this training? |
| 00h 10m | 2. Better and faster research! | How does this training relate to your work? What are the benefits of using data science skills? What are the challenges for teams and management? |
| 00h 45m | 3. What is special about data science projects? | Get an overview of the training material and understand how its different aspects relate to one another. |
| 01h 00m | 4. Reproducibility | How to build a reproducible analysis? How to deal with dependencies? |
| 01h 10m | 5. An introduction to version control | What is version control? Why use Git? How is version control relevant to biomedical research? |
| 01h 50m | 6. Setting up a computational project | How to set up a computational project? What are the main concerns and challenges, and how can they be addressed? How to create a project repository that supports sharing, collaboration and eventual release? |
| 02h 20m | 7. Implementing tools and methods during the project | How to manage and oversee tasks and track the progress of your projects? How do collaborative practices help ensure code quality, testing and reuse? What is literate programming and how does it help with early communication, testing and collaboration? |
| 02h 50m | 8. Research Data Management | What is considered research data? How to start building a research data management plan? What are the FAIR principles for data management? Why care about documentation and metadata standards? |
| 03h 40m | 9. Fostering documentation | |
| 03h 50m | 10. Scientific rigour with code | Is analysis with code more rigorous? What is p-hacking? |
| 04h 00m | 11. Coding basics | What is the role of data wrangling? What is literate programming? How to use data visualisation for insight and communication? |
| 04h 20m | 12. Code testing and Review | What are the main objectives and best practices for testing and reviewing code? How can continuous integration help? How can group leaders foster a collaborative environment for code review? |
| 04h 40m | 13. Code Modularity | |
| 04h 50m | 14. Publication and release | Why should I make my research objects available? What open source tools can be used to apply data science practices in bioscience? How to get your research work cited and invite more contributions to your project? |
| 05h 20m | 15. Open Science Practices | How to maintain a history of contributions and contributors? How to apply open science practices to work transparently and collaborate openly? |
| 05h 40m | 16. Data and code citation | Why should I make my research objects available? What open source tools can be used to apply data science practices in bioscience? How to get your research work cited and invite more contributions to your project? |
| 05h 50m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.