climate-service-center/git-tutorial

Git tutorial

Collaborate, communicate and publish code using git and github/gitlab.

This is a very basic guide on how to publish your code, software and data to make them available to a broader public or just to your colleagues next door (Read here why YOU should do it!). Please be aware: this guide is more or less a compilation of existing resources on this topic! It does not claim to be finished, the whole truth or perfect at all. If you have experience with writing scientific code, you might find this guide incomplete. If you usually write code only for yourself without sharing it, you might be overwhelmed. However, to cover some common ground, we want to raise awareness that your code is an integral part of your scientific work and publication.

Motivation

Most of our publications and products are based on data processing and analysis. Good scientific practice includes the reproducibility and reusability of code and data, which are central concerns of the Helmholtz Open Science Initiative. In fact, making your research code and data tidy, reproducible and reusable is nowadays much easier and more fun than most scientists think, and it can considerably improve your ability to cooperate with your peers and colleagues in a transparent and trustworthy environment.

Preparation

To prepare for the tutorial, you should check that you have git installed and can use it in a terminal. If you have no git installation, you can follow the instructions of the official documentation. You will also need a platform for sharing your code, e.g., make sure you have a github account you can access or use your gitlab account at the Helmholtz codebase. For the Helmholtz codebase, you can use any Helmholtz account to log in via the Helmholtz AAI. You will also need to be able to push to github/gitlab, for which you have to create an ssh key and register it with github/gitlab. For this, use an existing ssh key or create a new one on your local computer and store the public key, e.g., in your github profile settings and/or your gitlab account. For more detailed instructions, please also follow, e.g., the official github documentation.
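The preparation steps above can be sketched in a terminal as follows. This is a minimal sketch, not the only way to do it: `KEY_FILE` and `KEY_COMMENT` are hypothetical variable names introduced here for illustration, the e-mail address is a placeholder, and the key type (Ed25519) is an assumption; check the documentation of your platform for the key types it accepts.

```shell
# Check that git is installed and print its version.
git --version

# Assumption: KEY_FILE and KEY_COMMENT are hypothetical helper variables.
# The default path is the usual location of an Ed25519 key.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_ed25519}"
KEY_COMMENT="${KEY_COMMENT:-you@example.org}"   # placeholder, use your own e-mail

# Create the folder for the key if it does not exist yet.
mkdir -p "$(dirname "$KEY_FILE")"

# Generate a new key pair only if none exists, so an existing key is never overwritten.
if [ ! -f "$KEY_FILE" ]; then
    # -N "" creates the key without a passphrase; omit -N to be asked for one.
    ssh-keygen -t ed25519 -C "$KEY_COMMENT" -f "$KEY_FILE" -N ""
fi

# Print the public key. This is the part you paste into github/gitlab.
cat "$KEY_FILE.pub"
```

Only the content of the `.pub` file is shared; the private key file stays on your computer.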

If you are a Windows user with no experience in git or using the terminal, we recommend the following steps:

How to set up git on Windows

  • Install git on your Windows machine (if it is not installed already). You may invoke the following command in your Windows PowerShell (assuming your default location is U:):
    U:\> winget install --id Git.Git -e --source winget
    This should install all software of the git for Windows tool set.
  • Open "git GUI" and go to the menu "Help/show SSH key"
  • If no key is found, click on "Generate key" to generate a new ssh key. If you get an error concerning a missing .ssh folder, you might have to create the folder yourself, e.g., mkdir .ssh.
  • Copy the content of the public ssh key.
  • Go to github/gitlab in your web browser and open the "SSH and GPG keys" section of the settings menu. Click on "New ssh key" and paste your public ssh key.
  • Open a "git bash" terminal and test your git connection by invoking, e.g., the following command:
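One common way to test the connection is sketched below, assuming that you registered your key with github. The helper name `check_git_ssh` is hypothetical and introduced here only for illustration; the `ssh` call inside it is the actual test.

```shell
# Hypothetical helper (not part of git): test ssh authentication against a git host.
# Pass the host name as first argument; github.com is assumed if none is given.
check_git_ssh() {
    host="${1:-github.com}"
    # -T: do not request a terminal, since git hosts offer no interactive shell.
    # On success github prints a greeting with your user name but still exits
    # with status 1, so read the printed message instead of the exit status.
    # "Permission denied (publickey)" means the key is not registered.
    ssh -T -o ConnectTimeout=10 "git@${host}"
}
```

Call it as `check_git_ssh` for github, or as `check_git_ssh <your-gitlab-host>` with the host name of your gitlab instance. On the first connection, ssh asks you to confirm the host fingerprint.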

Planning

The publication of code and data should receive the same attention and planning as a classical scientific paper publication. It should be an important part of planning a project. It is best to start writing your code with this in mind and to ask yourself honestly:

  • Will somebody else be able to understand what I did?
  • Will they be able to run my code and reproduce my results/plots (without complicated explanations)?
  • How can I make life easier for them?

If you keep these questions in mind while writing your code, you are already on the right track.

Luckily, there are some very helpful tools and methods to help you get your code organized. The Helmholtz Open Science Seminar has presented some very helpful guidance and a factsheet to help researchers get their code on track for easier collaboration, reproducibility and fun! Here is an excerpt from the factsheet:

[Figure: excerpt from the factsheet]

Start with a project repository

Even when you have not written any code yet, you should start your project by creating a project repository, e.g., on github or the Helmholtz codebase. This can be a great landing page for your project. If you start by writing a comprehensive README.md, you can simply refer colleagues and collaborators to your project page, where they can find all necessary information without you having to explain it all over again. You can also structure your project more efficiently by using the repository's issue management. Finally, check out the github/gitlab pages feature, which enables you to create nice webpages easily from your project repository.
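Starting a repository with nothing but a README could look like the following sketch. The project name `my-project`, the README text, and the name and e-mail address are placeholders chosen for illustration; the remote address in the comments has to be replaced by the one of the empty repository you created on github/gitlab.

```shell
# Assumption: "my-project" is a hypothetical project name.
PROJECT_DIR="${PROJECT_DIR:-my-project}"

# Create the project folder and turn it into a git repository.
mkdir -p "$PROJECT_DIR"
cd "$PROJECT_DIR"
git init

# Write a minimal README.md that serves as the landing page.
cat > README.md <<'EOF'
# My project

What this project does, how to install it and how to reproduce the results.
EOF

# Record the README in the first commit.
# The -c options set name and e-mail for this one commit only (placeholders).
git add README.md
git -c user.name="Your Name" -c user.email="you@example.org" commit -m "Add README"

# To publish, connect the empty repository you created on github/gitlab
# (fill in the placeholders in angle brackets):
#   git remote add origin git@github.com:<user>/<repository>.git
#   git push -u origin HEAD
```

Alternatively, you can create the repository together with a README in the web interface and then clone it to your computer.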

Checklist for publication

If you publish your code, you should be aware of some basic technical requirements that should be checked. If you have followed some of the advice above, you should easily be ready to publish. However, the minimum requirements are:

  • A publicly accessible project repository.
  • A README, preferably in markdown format, that includes some information on your project, further links and basic technical documentation.
  • A license (check whether you have used GPL-licensed libraries!). See also here and here!
  • An environment.yml or requirements.txt file that defines the software dependencies.
  • A DOI, e.g., using zenodo, which works well with github.
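As an illustration of the dependency files in the checklist, the following sketch writes a small environment.yml and a requirements.txt. The environment name, the channel, and the listed packages and versions are placeholders and not taken from any real project; list the packages your own code actually imports.

```shell
# Assumption: name, channel, packages and versions below are placeholders.
cat > environment.yml <<'EOF'
name: my-project
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - xarray
EOF

# Colleagues can then recreate the environment with conda:
#   conda env create -f environment.yml

# For a pip-based project, a requirements.txt plays the same role.
cat > requirements.txt <<'EOF'
numpy
xarray
EOF

# It is installed with:
#   python -m pip install -r requirements.txt
```

Pinning versions (as done for python above) makes results easier to reproduce, at the cost of having to update the file from time to time.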

You can use howfairis to automatically check some basic requirements of FAIR principles for your project repository.

Examples

Here is a short list of publications by GERICS employees which might be good examples for code and data publication:

Further fun reading
