Measuring the health of open source communities


Jono Bacon at DevXcon

Developer community consultant Jono Bacon spoke at DevXcon 2017 on how to measure the health of open source communities.

Transcript of Jono’s talk

So a bit of background. I led community at XPRIZE, GitHub, and Canonical. I’m really passionate about how we build powerful, engaged, productive communities. I wrote a book called “The Art of Community,” and I’ve run a couple of conferences, including one called the Community Leadership Summit, which just happened. And most recently I’ve been consulting. I kicked off a consultancy practice about a year ago where I work with companies to help them build communities, either internally or externally. And the reason I did this wasn’t really all that sweet, sweet consulting money. It’s mainly because I wanna expose myself to as much content and as many challenges and problems as possible so I can try and understand every nuance of how this stuff works.

And one of the things I’ve enjoyed (these are some of my clients) is that this is a really broad swath of people. It’s financial services, it’s big hardware companies, it’s small startups. And each of these organizations has got a distinctive set of challenges that are in some ways unique and in some ways fairly consistent. And the one thing that all of these clients, and every community I’ve worked with across my career, is interested in is how you build health, right? Now, there’s an irony here because I am probably the most unhealthy person in this room right now. I hate exercise, I like drinking, I love Taco Bell. I know some of you are looking at me as if to say, “How does he like Taco Bell?” You people are liars, you like Taco Bell as well.

But how do we build healthy communities? And I’m gonna share with you some of my perspectives. But I’m not an expert on this, right? These are just some ideas, so take them for what they’re worth, and hopefully some of this will be useful. I believe that the primary goal we’ve got in building any kind of community is to influence some desired behavioral patterns, right? We want people to be productive, to do great work. People wanna do work that has impact, that has meaning. We wanna build diverse and inclusive communities where people respect each other. So to me, what makes this whole area and body of work really interesting is that it provides an opportunity for us to understand that connective tissue between people and technology, and to shape it in such a way that spurs the right kind of behavioral patterns. That’s how I describe it. Another way of describing it is social engineering, which is also accurate. But we really want to create these kinds of positive behavioral outcomes.

Now, the negative thing when we’re thinking about metrics and thinking about health is that a lot of people try to build one of these. They build a huge dashboard full of numbers, and people judge it on how complex it is. “Wow, it’s really complicated. There’s probably…you know, a lot of good stuff in there.” So to me, we wanna avoid this, right? Dashboards and dashboards full of data are meaningless, right? I’m much more interested in focusing on a set of outcomes and how we derive data that helps us understand and influence those outcomes.

The other thing that in my mind is a goal for this kind of work is for us to differentiate between what I call the maze runners and the detectives. I have a peanut-sized brain, and as such, I need to try and understand the world in simple terms. I like to take a bird’s eye view and break it down into smaller pieces. And I think there are broadly two types of people in the world. There are the maze runners, who really need a lot of structure. They need workflow, they need process, and they like to operate based upon that structure and process. And then there are the detectives, who just kind of shoot from the hip and, you know, try and do what they can to understand how things are working. These are two very different psychologies, right? And our communities will be filled with both. What’s more interesting to me is that we want to build structure, process, and workflow that help influence those positive behavioral patterns, but we don’t wanna do it in a way that’s so rigid that it actually perturbs the people who like to be more creative or approach things in a slightly different way.

So what we wanna do here is use data as a guide, not data as, you know, a monorail that sticks us to the track as the only way in which we can operate. So the way I tend to think of this is that there are two types of information we need to assess when building and understanding health in our communities. The first is tangible data. This is stuff you can measure with a computer, right? Examples of this include pull requests, issues, and governance, like whether people participate in governance bodies. Testing: do people write tests? Do people run tests? Do people modify tests? Support: things like Q&A sites such as Stack Overflow or Askbot. These are all tangible things that we can pull data out of and understand, you know, what that tells us about the health of our community.

And we see tools such as Bitergia Analytics. This is a really awesome product that Bitergia, I think they’re based out of Spain, are building, which basically allows you to suck up a load of open source metrics and understand them. The thing about this approach is that it pulls in lots of different individual metrics. It pulls in, you know, pull requests. How many people are submitting pull requests? How many people are merging pull requests? How many issues are being created and contributed to? This provides a good way of seeing what kind of activity is going on, right? And I think this is a good starting point because with this we can start seeing some patterns emerge. So I think that is one approach.
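
The talk doesn’t go into how Bitergia Analytics works internally, so here is a minimal, hypothetical sketch of the individual-metrics idea: from a list of activity records, count how many events of each kind happened and how many distinct people were behind them. The `Event` record and the kind names (`pr_opened`, `pr_merged`, and so on) are illustrative assumptions, not Bitergia’s data model or any code host’s API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical activity record, e.g. exported from a code host."""
    actor: str
    kind: str  # e.g. "pr_opened", "pr_merged", "issue_opened"


def activity_summary(events):
    """Per event kind: how many events happened and how many distinct people did them."""
    counts = Counter()
    people = defaultdict(set)
    for event in events:
        counts[event.kind] += 1
        people[event.kind].add(event.actor)
    return {kind: {"events": counts[kind], "people": len(people[kind])} for kind in counts}


events = [
    Event("ana", "pr_opened"),
    Event("ana", "pr_opened"),
    Event("ben", "pr_opened"),
    Event("ben", "pr_merged"),
    Event("cy", "issue_opened"),
]
summary = activity_summary(events)
print(summary["pr_opened"])  # {'events': 3, 'people': 2}
```

Keeping event counts separate from the number of distinct people matters: three pull requests from one person and three from three different people look identical in a single total.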

The second way in which we can process tangible data is to look at the amalgamation of different data points to see what story they tell. How many of you here are familiar with something called Discourse? It’s a forum. Not Disqus, which is a crappy commenting system. So Discourse has got something built into it called the trust system, where basically, as people participate in the forum, this amalgamation of data points helps them go up a trust level. So trust level zero (this is actually from the forum that I run) has 458 people. These are basically people who are new, who haven’t reached that first base level. And then as they participate more and more, they go up the rankings. So, for example, here at trust level three, there are two people who are super active, right? And this provides a nice way in which we can reach out to those people and support them, and guide them, and help them to influence our community.

So Discourse was co-created by a guy called Jeff Atwood, who in my mind is one of the most talented people in the business at building these kinds of platforms. He also co-created Stack Overflow. And, as an example, going up to trust level one from zero means that you have to have entered at least five topics, read at least 30 posts, and spent a total of 10 minutes reading posts. So it’s not just one data point. It’s not just how many posts, or how many responses, or how many likes; it’s a mixture of these things. When we tune those metrics effectively, I think this creates a really interesting and easy way to process data so that we can determine the quality of contribution. But in a real way, not in a “how many coffee beans have you got on your forum” kind of way.
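
The trust level one rule described above can be sketched as a small check that only passes when every criterion holds at once. This is a hypothetical illustration using the thresholds quoted in the talk, not Discourse’s own code; in Discourse these thresholds are configurable, which is where the tuning comes in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadingStats:
    topics_entered: int
    posts_read: int
    minutes_reading: float


# Thresholds for trust level 0 -> 1 as quoted in the talk.
TL1_REQUIREMENTS = ReadingStats(topics_entered=5, posts_read=30, minutes_reading=10)


def qualifies_for_tl1(stats: ReadingStats, req: ReadingStats = TL1_REQUIREMENTS) -> bool:
    """Promote only when every criterion is met; no single number is enough."""
    return (
        stats.topics_entered >= req.topics_entered
        and stats.posts_read >= req.posts_read
        and stats.minutes_reading >= req.minutes_reading
    )


print(qualifies_for_tl1(ReadingStats(6, 40, 12)))   # True
print(qualifies_for_tl1(ReadingStats(50, 200, 4)))  # False: 200 posts skimmed in 4 minutes
```

The second user has far bigger raw numbers on two of the three criteria and still doesn’t qualify, which is the point of combining data points rather than ranking on any single one.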

So that’s the tangible side, and I break it into those two areas: individual data points and aggregated ones. When you think about the health of your community, look at the individual data points, which are useful, but also look at the overall user journey and experience. Because my view is that the way we build brilliant, productive, fun, and inclusive communities is to expect them to be a journey. People start out when they’re brand-new. They have no context, they have no relationships. It’s nerve-wracking. Everything happens out in the open. And then they grow and they grow and grow, and they go through different phases of how we engage with them.

The other type of data that we can use to assess the health of communities, in my mind, is intangible data, right? This is the hard bit. This is happiness, personal development, relationships. Is it a rewarding place to be? This is the human bit. And, you know, most of the companies I work with are very interested in the tangible stuff, like “How many pull requests am I getting?” But everybody in this room knows that the intangible is really where the importance lies, right? People don’t stick around in communities unless they feel satisfied, engaged, and participatory. And there’s such a fine line and a fine set of nuances between somebody feeling empowered and engaged in their community and feeling like they’re on a treadmill. So, to me, measuring this is really important. The challenge is that it’s really hard to measure. And I know I’m basically preaching to the choir here, because all of you have tried to measure this at some point in your careers and it’s really tough.

I think the easy way to do this, which a lot of people think of and I’ve certainly thought of in the past, is surveys, right? Every so often you put together a survey and you knock it out. And surveys suck. I mean, I’ve got nothing against…I don’t know if anyone here works on Google Forms or SurveyMonkey or anything like that. Your companies are brilliant, but surveys are not the best tool for this, in my mind. They are a good tool in some ways, because they’re a way of reaching out to people and getting some kind of feedback, but the problem is that it’s an artificial environment. What you’re doing is asking people to share their perception of their own opinions and their own emotions. And they will tune it to the audience. I love it when companies send out their anonymous surveys, right? Like that’s gonna get real feedback. And everybody is thinking, “Yeah, how anonymous is this?” right? And so people naturally self-select the information they’re gonna present. So to me, this, again, is a good way of gathering feedback, but it’s imperfect, and we need to use it as a data point, not as an explicit articulation of intent.

A classic example of this is events, right? How many people in this room have run an event before? I figured quite a few. You know, a lot of people send an event survey out the day after and say, “How did it go?” And people provide their feedback, and organizers use that to improve the next event. The problem with that is that the psychological state you’re in at the end of the event is very, very different from the one you’re in the following morning, when you’ve gone to bed, you’ve woken up, you’re thinking about what you’ve got to do during the day, and your cat’s already thrown up on the floor and you’ve cleaned that up, right? To me, a better way of determining the humanistic elements is what I call observed data points. In a similar way to the Discourse example, we look at all of these different data points and pull them together to get a determination of quality.

So, as an example, imagine you’ve got an event, a meetup. (This is just a random meetup picture from a Google Images search.) You could send that survey out. To me, a better way of determining the quality of the event, how happy people were, is to look at things like: How many people sat close to the front? How many people were on their laptops while they were watching the talk? How many people were tweeting to the hashtag while they were watching it? How many hands went up for questions? Look at the eye contact between the speaker and the audience. These are all things you can do just sitting at the side of the stage, making observations. This doesn’t have to be super data-driven, but it gives you a good determination of what’s going on, right? And I think when you pull these things together, you can then identify the topics that are interesting, how engaging the speaker was, how engaged the audience was, and things like that. So I’m a really big believer that tracking health in communities doesn’t have to be that scientific. What it has to be is realistic in the way we count things and realistic in the way we assess those things to make improvements.
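
To make the idea of pulling observations together concrete, here is a minimal sketch that blends a few side-of-stage observations into a rough 0 to 1 engagement score. The fields, the “five questions means lively” cap, and the equal weights are all hypothetical choices for illustration; the point is combining several signals, not these particular numbers.

```python
from dataclasses import dataclass


@dataclass
class TalkObservations:
    """Notes jotted down from the side of the stage (fields are hypothetical)."""
    attendees: int
    sat_near_front: int
    on_laptops: int
    questions_asked: int
    eye_contact: int  # observer's rough rating from 1 (none) to 5 (constant)


def engagement_score(obs: TalkObservations, lively_questions: int = 5) -> float:
    """Average four normalized signals into a rough 0-1 score (equal weights)."""
    front = obs.sat_near_front / obs.attendees
    off_laptops = 1 - obs.on_laptops / obs.attendees
    questions = min(obs.questions_asked / lively_questions, 1.0)
    contact = (obs.eye_contact - 1) / 4
    return round((front + off_laptops + questions + contact) / 4, 2)


meetup = TalkObservations(attendees=40, sat_near_front=20, on_laptops=10,
                          questions_asked=4, eye_contact=4)
print(engagement_score(meetup))  # 0.7
```

The absolute number means little on its own; it becomes useful when the same rough rubric is applied to several events and the results are compared.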

The other element I’d add here is that when we’re tracking this kind of stuff, it’s important to look at metrics from both an individual perspective and a community perspective, right? Track the experience of the individual, how happy and how empowered they are, but also look at the broader community: the overall experience of people across different segments in your community. There’s some really interesting behavioral economics in the relatedness of people. Think about cultures in companies: we all know the difference between sales, marketing, and engineering, right? These are very different cultures. But we have the same thing in our communities, with translators, developers, and whatever else. And I think looking at health through those cultural differences is also an important way to get some good data. So that’s how I tend to think about counting and what we count. Stay away from the big dashboard full of crap and focus on individual things that can actually help us answer questions.
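
Looking at both the individual and the community perspective can be as simple as averaging a per-person score within each segment, then spotting individuals who sit well below their own segment. In this hypothetical sketch the score could be anything you already track, and the segment labels and the 0.5 margin are assumptions.

```python
from collections import defaultdict
from statistics import mean


def segment_averages(rows):
    """Average the score of each segment from (person, segment, score) rows."""
    groups = defaultdict(list)
    for _person, segment, score in rows:
        groups[segment].append(score)
    return {segment: mean(scores) for segment, scores in groups.items()}


def below_segment_average(rows, margin=0.5):
    """People whose score is more than `margin` below their own segment's average."""
    averages = segment_averages(rows)
    return [person for person, segment, score in rows
            if score < averages[segment] - margin]


rows = [
    ("ana", "developers", 4.0),
    ("ben", "developers", 2.0),
    ("cy", "translators", 5.0),
]
print(segment_averages(rows))       # {'developers': 3.0, 'translators': 5.0}
print(below_segment_average(rows))  # ['ben']
```

Comparing people with their own segment rather than with the whole community avoids judging, say, translators by developers’ norms.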

Now I wanna talk a little bit about the blueprint of this. When we’re assessing individual metrics, individual pieces here, what should we be looking at? The way I tend to think of things, and again, this is by no means gospel, is that there are essentially two components in each thing we count. There is the activity, and then there is the validation of that activity. A lot of people really just focus on the activity. Like, what did someone do? What was the tangible outcome? But the validation is a really important piece, which I’ll get to in a second. First of all, let’s look at this activity component.

This is the kind of stuff you see here. It’s just stuff that people did: kicking off new issues, commenting on issues, closing issues, submitting new pull requests, reviewing pull requests, merging pull requests, visiting websites. It’s all these individual bits of data that we can assess. Now, I’ve been of the view for probably the last 10 years that the goal we’ve got here is to build significant and sustained contributions. We don’t want just one or the other. We don’t just want people building lots of stuff for a week and a half, because those people are very expensive, in time and sometimes money, to spin up. We want strong, significant periods of growth across a diverse range of areas from a diverse range of people. So I think that to get that retention, you need to be constantly counting the things that people are doing. So these are the things that we count, things like pull requests, as I mentioned.
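
One way to tell significant and sustained contribution apart from a week-and-a-half burst is to count both how much someone did and in how many distinct weeks they did it. The sketch below is a hypothetical illustration that takes one date per contribution; the thresholds of 10 contributions and 6 active weeks are assumptions you would tune for your own community.

```python
from datetime import date, timedelta


def active_weeks(days):
    """Distinct ISO (year, week) pairs in which someone contributed."""
    weeks = set()
    for day in days:
        iso = day.isocalendar()
        weeks.add((iso[0], iso[1]))
    return weeks


def classify(days, min_contributions=10, min_active_weeks=6):
    """Label a contributor 'light', 'burst', or 'sustained' (thresholds are hypothetical)."""
    if len(days) < min_contributions:
        return "light"
    if len(active_weeks(days)) >= min_active_weeks:
        return "sustained"
    return "burst"


start = date(2017, 1, 2)  # a Monday
burst = [start + timedelta(days=i) for i in range(10)]    # ten days in a row
steady = [start + timedelta(weeks=i) for i in range(12)]  # once a week for twelve weeks
print(classify(burst), classify(steady))  # burst sustained
```

Both contributors clear the volume bar; only the spread across weeks separates them.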

But one of the things we need is some kind of number that captures the longevity of contribution. I used to work with a company called HackerOne. Anyone heard of HackerOne in here? Oh, wow. Cool. So HackerOne has a product where you can submit reports about security issues. They basically pay people to go and hack stuff, which is quite cool. They break down contributions into three primary areas: reputation, signal, and impact. Within the context of tracking activity, I think reputation is the most interesting piece. So, essentially, each time you submit a report to HackerOne, if it’s approved and validated, you get seven points. If it’s not as good, you maybe get three points. If it’s bad, you get minus two points, things like that. So this shows you how long someone has been participating. Now, this is good. This gives us a sense of how long someone has been on the platform. What it doesn’t tell us is quality. There may be people who shotgun lots and lots of reports that are actually fairly poor quality. So it’s a good indicator, but unfortunately, some communities track only reputation, and that alone doesn’t give us a good indication of quality.
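
A running reputation total like the one described above can be sketched in a few lines. The point values are the ones quoted in the talk, and the outcome labels (`valid`, `partial`, `bad`) are hypothetical; HackerOne’s actual scoring rules may differ. The example shows why reputation alone rewards volume.

```python
# Point values as quoted in the talk; the outcome labels are hypothetical.
POINTS = {"valid": 7, "partial": 3, "bad": -2}


def reputation(outcomes):
    """Running total of points across every report someone has submitted."""
    return sum(POINTS[outcome] for outcome in outcomes)


careful = ["valid"] * 4                                    # few reports, all good
shotgun = ["valid"] * 10 + ["partial"] * 10 + ["bad"] * 5  # many reports, mixed
print(reputation(careful), reputation(shotgun))  # 28 90
```

The prolific reporter ends up far ahead even though a fifth of their reports were bad, which is exactly the gap a quality measure has to fill.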

Now, what is important in my mind, and HackerOne doesn’t currently do this, is that reputation should decay. If we only keep adding numbers on and don’t decay them when someone hasn’t contributed for a period of time, we’re creating an environment where newer people can never catch up with people who started earlier on. People who started before have a fundamental advantage that nobody else will ever be able to surpass. So I think decay, and the right level of decay, is important. I personally like saying, you know, you basically deduct X percent or X number of points each week, or something like that, to make that decay happen. So if you don’t do anything for a year, then your reputation goes down.
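
Both decay styles mentioned above, deducting a percentage or a fixed number of points each week, are easy to sketch. The 2% weekly rate and 1 point per week below are hypothetical values standing in for the talk’s “X”.

```python
def percentage_decay(score: float, idle_weeks: int, weekly_rate: float = 0.02) -> float:
    """Shrink the score by `weekly_rate` for every week without a contribution."""
    return score * (1 - weekly_rate) ** idle_weeks


def fixed_decay(score: float, idle_weeks: int, points_per_week: float = 1.0) -> float:
    """Deduct a flat number of points per idle week, never dropping below zero."""
    return max(0.0, score - points_per_week * idle_weeks)


# A year of inactivity on a 100-point reputation:
print(round(percentage_decay(100, 52), 1))  # 35.0
print(fixed_decay(100, 52))                 # 48.0
```

Percentage decay takes the most points away from the largest scores, so it narrows the lead of long-inactive early contributors fastest; fixed decay is simpler to explain to a community.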

Now, let’s look at validation. To me, the way we determine health or quality is how many of the actions somebody performed have been validated as high quality, right? So going back to the HackerOne stats, this is what signal is. It’s the average number of points per report submitted to the platform, right? So given that you can get seven points max for a report, Mr Hack’s 5.20 is a pretty damn good average. This shows that someone has been around for a long time and does amazing work, okay? There are loads of different ways we can think about validation. Take merged pull requests, right? A pull request is really cool, right? It’s a great contribution. But when it’s been merged, it means it’s been reviewed by the community. It’s been deemed high quality. It’s been accepted. So we know that’s kind of the higher end of things.
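
Signal, as described above, is the average number of points per report, and a merged-pull-request ratio works the same way for code. This hypothetical sketch reuses the talk’s point values and illustrative outcome labels; neither function is HackerOne’s or any code host’s API.

```python
# Point values as quoted in the talk; the outcome labels are hypothetical.
POINTS = {"valid": 7, "partial": 3, "bad": -2}


def signal(outcomes):
    """Average points per report: a quality measure rather than a volume measure."""
    if not outcomes:
        return 0.0
    return sum(POINTS[outcome] for outcome in outcomes) / len(outcomes)


def merge_rate(opened: int, merged: int) -> float:
    """Share of someone's pull requests that were reviewed and accepted."""
    return merged / opened if opened else 0.0


careful = ["valid"] * 4
shotgun = ["valid"] * 10 + ["partial"] * 10 + ["bad"] * 5
print(signal(careful), signal(shotgun))  # 7.0 3.6
print(merge_rate(opened=20, merged=15))  # 0.75
```

Read next to a running reputation total, the picture flips: the careful reporter has the maximum possible signal, while the prolific one averages only about half of it.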

And it doesn’t necessarily need to be super rigorous, either. You know, again, this is Discourse. You’ve got the notion of likes. Someone posts something to a forum and somebody else likes it. That is a form of validation. It’s a signal of quality. It might not stand up in court, but it is a good sign of quality. The same thing can apply to whole threads as well. How many likes did a particular thread get? That can give you insights into the kinds of topics that are interesting to people.

Now, there’s something to bear in mind when thinking about this. I don’t wanna go too off-topic here, but I think this is relevant to the conversation. There’s something called the Yerkes–Dodson law, right? So you’ve got this axis up here, which is the performance of somebody who is contributing, and then along here is the number of incentives, rewards, and other kinds of carrots and sticks that you can dangle in front of them. There is a perception, and I’ll talk a little bit about incentives in a bit, that if you keep giving people stuff, their performance will keep growing, right? I’m sure lots of you here have given out T-shirts and challenge coins, all kinds of stuff. But there is a point up here where people are getting so much stuff that they start thinking more about “Do I keep getting this stuff?” than about actually, objectively participating. And you reach a peak. There isn’t a huge amount of rigorous data on this and how we apply it to our communities, but to me, a big chunk of this work in developer relations and community management is taking notions such as this and using them as a lens to look at our communities. So there is a natural limit on how much you can reward people for participation. And as we generate data such as this, this intangible and tangible data, we need to think carefully about how we reward people with extrinsic and intrinsic rewards so we don’t hit this issue.

So, you know, I’m coming towards the end of my time, and I just wanted to talk a little bit about how we consume this information. So far we’ve identified that we don’t wanna build a dashboard full of stuff. We wanna track both the tangible and the intangible, and we can do that by tracking individual pieces of data, or we can identify the overall journey from aggregate pieces of data that are pulled together, whether that’s observational data from events or pulling it together in systems such as Discourse, whatever.

So how do we consume this? How do we take this and act on it? There are a few things that I would recommend. The first is to focus on specific outcomes. When you get this information, try and pull out some specific things that you want to act on. I’m a big believer in two things in the work that I do, and I’m sure you all share the same viewpoint. One is fast iteration: try something, see how it went, and then change it for the future based upon the results. The second is to embrace failure, right? We all screw up, we all make poor choices, so let’s use that as another data point to improve as well. So I’m a big believer in fast iteration on different things: try things out for a month, make a change, see if it moves the needle, and if it doesn’t, then so be it.

The other thing we can do here is assess onboarding complexity. When somebody goes from “I’m starting out on something” to “I’ve made my first valid contribution,” whether it’s writing an app or consuming an API, whatever it might be, there’s a whole set of things that need to happen for that person to do that successfully. We can plumb metrics into each of those steps to track them and determine how successful they are, so people don’t get “stuck” in the funnel.
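
The onboarding funnel can be tracked by recording how many people reach each step and computing the conversion from one step to the next; the step with the lowest conversion is where people are most likely getting stuck. The step names and counts below are hypothetical.

```python
def conversions(steps):
    """Conversion rate between each pair of consecutive (step, people_reached) entries."""
    rates = []
    for (prev_step, prev_count), (step, count) in zip(steps, steps[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_step, step, round(rate, 2)))
    return rates


def biggest_drop(steps):
    """The transition with the lowest conversion rate."""
    return min(conversions(steps), key=lambda row: row[2])


onboarding = [
    ("visited docs", 1000),
    ("created API key", 400),
    ("made first API call", 300),
    ("first valid contribution", 60),
]
print(biggest_drop(onboarding))  # ('made first API call', 'first valid contribution', 0.2)
```

Looking at step-to-step conversion rather than the overall 6% end-to-end rate points you at the specific step worth fixing first.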

The other thing is, yeah, failure states. I like to design for failure. When you’re building out systems and processes, look at the ways in which people are almost certainly gonna screw it up, or in which it’s gonna break, and track that data as well. People often don’t like tracking things that might go wrong, because of an insecurity that they might look bad. But I think it sends quite the opposite message.

The other thing we should be using this data for is community retention. This is a tricky one because communities are generational, right? There are communities that will have people who hang around for two or three years, and then they’ll take off, and then, you know, other people will join. It can be easy to assume that if someone leaves, it’s because they’re frustrated or there’s something wrong. That’s not always the case. Often generations kind of come and go, and people get interested in other things or whatever else. And the real thing I think we can assess here, when we gather the right kind of data, is stagnation. At what point does someone feel like they’ve stagnated, that they’re losing the sense of challenge, losing the sense of interest? This is one of the reasons why I personally like to segment communities into different groups and engage them in different ways. And I don’t mean groups based upon any demographic; I primarily mean how long people have been around. Are they new? Are they regulars? Are they core people in the community? Because then you can keep the romance alive in your community, essentially.
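
Segmenting by tenure and flagging possible stagnation can be sketched as below. The 90-day and two-year cut-offs and the 50% drop threshold are hypothetical, and treating tenure alone as the marker for “core” is a simplification. As the talk notes, a drop in activity is a reason to reach out, not proof that someone is frustrated.

```python
from datetime import date


def tenure_segment(first_seen: date, today: date) -> str:
    """Bucket someone by how long they've been around (cut-offs are hypothetical)."""
    days = (today - first_seen).days
    if days < 90:
        return "new"
    if days < 730:
        return "regular"
    return "core"


def looks_stagnant(previous_quarter: int, this_quarter: int, drop: float = 0.5) -> bool:
    """True if activity fell by more than `drop` compared with the previous quarter."""
    return previous_quarter > 0 and this_quarter < previous_quarter * (1 - drop)


today = date(2017, 6, 1)
print([tenure_segment(date(y, m, 1), today) for y, m in [(2017, 4), (2016, 1), (2014, 1)]])
# ['new', 'regular', 'core']
print(looks_stagnant(20, 6), looks_stagnant(20, 15))  # True False
```

Running the stagnation check separately within each segment lets you engage new people, regulars, and core members in different ways, as described above.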

Another element here is reporting. I’m not gonna delve into this too deeply, but with any kind of community that we’re building, when we get this kind of information, we often have to report it to different audiences, right? Executives, middle management, and our community members. One thing I have learned over my career is that people at these different levels of an organization consume information very, very differently, right? If you send an email to a CEO that has more than five bullet points, then they’re not reading your email. It’s quite the opposite with community members. A lot of community members like depth; they like the content, they like material. So I think it’s important that when we think about these data points and these recommendations, we really tune them to our audience carefully.

I already touched on incentives a little bit earlier on, but I think there’s a tremendous opportunity, once we have this information, to use it to design a set of incentives. I actually just gave a talk at OSCON about this a couple of weeks ago. I think of two types of incentives. One is what I call a submarine incentive, which is where someone gets a reward that seemingly comes out of the blue but was actually designed all along; they just didn’t know about it. The second type is the stated incentive, where you say, “If you do this thing, you’ll get this thing.” The risk of that, of course, is that it can be gamed. Speaking of which: gamification. People think of gamification as primarily just badges, and it’s way more than that. Again, I think that Jeff Atwood, the co-creator of Discourse and Stack Overflow, is one of the greatest minds in gamification. He describes what he does as building multiplayer games for people, games that just happen to get people being productive. So I’d recommend you check out one of his talks to get a sense of it. The thing about gamification is that badges, from my experience and from the work that I’ve seen, generally work best with people who are new, because once you get to 20 badges, you tend to get bored and move on. The exception to that is video games, because in video games it’s a long experience. It’s not necessarily the same thing over and over again.
