GitHub Forks, Collaborators, Watchers

What it is

GitHub is a software development platform that supports code sharing and reuse. It is most often used for collaborative software development, but files of any type can be stored. Because GitHub is built on Git, a version control tool, each repository keeps a detailed history of the changes made to its files. This change data is used to generate the metrics described here.

How it works

Users can create a “fork” of a repository (i.e., a project or group of files), which is a copy of the repository that they can modify or extend. “Activity” is the term used to describe changes, such as modifying existing files, creating new files, and deleting files. A “collaborator” is another GitHub user who is able to perform many actions on the files within the repository, including edits. Users can “watch” a repository to be notified of activity in it. Watching a repository is similar to following an RSS feed.
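As an illustration, fork and watcher counts for a public repository can be read from the GitHub REST API. The following is a minimal sketch, not an official client: the function names `summarize_repo` and `fetch_repo` are hypothetical, and the request is unauthenticated and therefore rate-limited. It assumes the repository payload carries the `forks_count` and `subscribers_count` fields; note that the `watchers_count` field in the same payload reflects stars rather than watchers. Collaborator lists generally require authenticated access with suitable permissions, so they are left out here.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def summarize_repo(payload):
    """Pick the fork and watcher counts out of a repository payload (a dict)."""
    return {
        "forks": payload.get("forks_count", 0),
        # "subscribers_count" is the number of users watching the repository;
        # "watchers_count" in the same payload mirrors the star count instead.
        "watchers": payload.get("subscribers_count", 0),
    }


def fetch_repo(owner, repo):
    """Fetch public repository metadata (unauthenticated, so rate-limited)."""
    request = urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical usage (requires network access):
#   print(summarize_repo(fetch_repo("octocat", "Hello-World")))
```

Keeping the parsing step separate from the network call makes it possible to apply the same summary to payloads saved earlier, for example when tracking counts over time.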

What to keep in mind

  • Forks and activity within those forked repositories, or copies, can indicate use or reuse of your work. For example, someone might fork your repository to suggest changes or as the basis for a new project.
  • The identities of your GitHub collaborators may offer evidence of your engagement with particular communities, or of broad interest in your work.
  • GitHub watchers can indicate interest in your repository or project.
  • Understanding who is interacting with your repositories may be difficult. Not all GitHub users provide detailed information about their role, affiliation, and identity. It may not be possible to connect GitHub users with authentic identities or academic affiliations.
  • Go beyond the numbers to understand the value of these metrics. Simple counts of forks, collaborators, and watchers do not provide sufficient context about how or by whom the repository files are being used. There is not yet enough research evidence to indicate the meaning and value of GitHub statistics. For example, users might watch a repository for many different reasons. Exploring the specific activities and individuals may be helpful for understanding who is engaging with and reusing the repository content, as well as what kinds of activities are taking place. Factors such as programming language, repository owner, and type of repository content (e.g., systems software, web libraries and frameworks, or applications) may affect attention and subsequent interaction (Borges et al., 2016).
  • Sources for these metrics are GitHub and PlumX (available through subscription only).
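One way to look past a raw fork count is to check which forks show activity of their own. The sketch below is a rough heuristic under stated assumptions: it takes fork records shaped like entries from the GitHub REST API forks listing, with `full_name`, `created_at`, and `pushed_at` fields, and treats a fork as active when its last push is later than its creation time. The function names and the sample records are hypothetical, and the check cannot say what was changed or by whom.

```python
from datetime import datetime

# Timestamp layout assumed for the API's UTC timestamps, e.g. "2021-03-01T10:00:00Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Convert a timestamp string into a datetime for comparison."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def forks_with_activity(forks):
    """Return the names of forks that were pushed to after they were created."""
    active = []
    for fork in forks:
        pushed = fork.get("pushed_at")
        created = fork.get("created_at")
        # Skip records with missing timestamps rather than guessing.
        if pushed and created and parse_timestamp(pushed) > parse_timestamp(created):
            active.append(fork["full_name"])
    return active


# Hypothetical sample records, not real repositories.
sample_forks = [
    {"full_name": "alice/project", "created_at": "2021-03-01T10:00:00Z",
     "pushed_at": "2021-06-15T08:30:00Z"},
    {"full_name": "bob/project", "created_at": "2021-04-01T12:00:00Z",
     "pushed_at": "2021-02-20T09:00:00Z"},
]
print(forks_with_activity(sample_forks))  # prints ['alice/project']
```

A list like this is only a starting point; looking at the individual forks and their owners is still needed to understand how the work is being reused.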

Learn more

  • Dozmorov, M. G. (2018). GitHub Statistics as a Measure of the Impact of Open-Source Bioinformatics Software. Frontiers in Bioengineering and Biotechnology, 6. https://doi.org/10.3389/fbioe.2018.00198
  • Li, K., Chen, P.-Y., & Yan, E. (2019). Challenges of measuring software impact through citations: An examination of the lme4 R package. Journal of Informetrics, 13(1), 449–461. https://doi.org/10.1016/j.joi.2019.02.007

Last updated April 2022