
Inside the CHAOSS Data Science Working Group

January 22, 2025
A bunch of white ones and zeros on a black background with a heart shape of ones and zeros in the middle

The Featured Image is a Photo by Alexander Sinn on Unsplash

We had our first CHAOSS Data Science Working Group meeting on August 16, 2023, and we’ve been very busy since then, so we thought it was about time we provided an update on what we’ve done and how you can participate!

The first thing we did within this new Data Science Working Group (WG) was to run a survey to better understand the challenges that CHAOSS users had with our tools and metrics. We published the results in October 2023. We found that installing our software continues to be the biggest challenge, and this feedback (along with detailed comments from the survey) was shared with the software teams. We also learned that finding data and drawing insights from the data are top challenges, which became a key focus area for our CHAOSS data science efforts.

To address this challenge, we created the Practitioner Guide Series, which is designed to help you understand how to interpret the data about an open source project and develop insights that can help you improve project health. These guides are all under an MIT license so that you can contribute to the guides or fork and modify them to meet the needs of your organization. So far, we’ve released the following guides:

But we aren’t stopping there! We have other guides currently in progress, proposed guides that someone could start writing, and we’re always looking for ideas for new guides! We have an issue template if you want to propose a new Practitioner Guide. You can propose a guide that you want to write, or a guide that you want to use but don’t have the expertise to write yourself, and we’ll try to find someone else to write it. You can learn more about the Practitioner Guide Series by reading the guides, watching our short video series (~15 min) about the guides, or listening to the audio podcasts we’ve recorded about the guides.

In addition to the Practitioner Guides, we’ve also kicked off a few data science projects. The goal of these projects is to use CHAOSS metrics and data to dive into the details of a topic that is of interest to the broader CHAOSS community and share what we’ve learned via research reports and papers. The one that we’ve made the most progress on so far is the Relicensing and Forks Case Study. We already produced one paper that was presented at the Open Forum Academy Symposium in November, but that was just the first phase of the research. We still have much more to do, and if you’d like to help, we have details about the project in our repository where you can learn more about participating. We’re also actively working to define some other data science projects to get more people working on projects with us.

All you need to join the Data Science WG is an interest in using data to understand the open source world around us. Most of our work is analysis of data, writing guides, and discussions about using metrics. You don’t need any special skills, and you don’t need to know any advanced statistics, machine learning, or AI. To learn more, visit our repository, join our meetings, or reach out to us in the #wg-data-science channel in CHAOSS Slack. We hope you’ll join us!
