
It’s been about four months since the last Data Science Working Group update, and we’ve made a lot of progress in that time!
Practitioner Guides
Last week, we launched two new Practitioner Guides: Getting Started with Sunsetting an Open Source Project and Getting Started with Building Diverse Leadership. A huge thank you to Peculiar C. Umeh for driving the work on the diverse leadership guide! These two new guides join the five existing guides: Introduction, Contributor Sustainability, Responsiveness, Organizational Participation, and Security.
The Practitioner Guides were also featured in the June edition of IEEE’s Computer magazine, in an article titled From Data to Action: Building Healthy and Sustainable Open Source Projects (PDF).
We also have other guides in progress, and we have an issue template if you want to propose a new guide. You can propose a guide that you want to write, or one that you want to use but don’t have the expertise to write yourself, and we’ll try to find someone to write it.
Research
We run several research projects from within the CHAOSS Data Science WG.
We’ve continued to gain traction on the project to better understand the dynamics of projects that are relicensed from open source to proprietary licenses, along with the forks that result from those events. The power dynamics that result from relicensing and forks were explored in more detail in a recent post on The New Stack: Clouds, Code, and Control: The New Open Source Power Struggle. We’ve also presented the data at several conferences:
- FOSDEM Panel
- State of Open Con: Meet the Forkers keynote and a panel
- Linux Foundation Member Summit: New Trajectories: Rug Pulls, Relicensing, and Hard Forks in OSS
- Monkigras: Power Dynamics, Rug Pulls, and Other Corporate Impacts on OSS Sustainability
We’re continuing to build on this research by looking into how relicensing and forks impact usage of these projects (using stars and social forks as a proxy / indicator of usage). Chan Voong and Peculiar C. Umeh are also starting to look at other project health metrics to understand the impact of relicensing and forks beyond the organizational affiliation and usage data that we’ve focused on so far.
We’re also doing some research on open source projects that move from private ownership into a foundation. So far, we’re in the data collection phase and are working to get the data into a format that we can use for more analysis. A huge thank you to Sal Kimmich and the CHAOSS Africa Researchers team who have been driving this work.
We also have a few other projects that are just getting started, covering archived projects and taxonomies.
Join Us!
Chan is running the CHAOSS Data Science Hackathon, which will be co-located with Open Source Summit North America and CHAOSScon in Denver, CO on June 26, 2025. You have until June 20th to register for the hackathon, and space is limited. The goal of this hackathon is to drive interest in CHAOSS data science projects while making progress on one or more of our projects. We plan to have a variety of activities so that everyone will have something to contribute whether they are new to data science or an experienced professional.
All you need to join the Data Science WG is an interest in using data to understand the open source world around us. Most of our work is analysis of data, writing guides, and discussions about using metrics. You don’t need any special skills, and you don’t need to know any advanced statistics, machine learning, or AI. To learn more, visit our repository, join our meetings, or reach out to us in the #wg-data-science channel in CHAOSS Slack. We hope you’ll join us!