Juggling the dual life of an enterprise open source project

Rhys Arkins
DevRelCon Earth 2020
30th to 10th June 2020
Online

What happens when a key component of an enterprise solution is also a community open source project with a life of its own?

Rhys Arkins discusses how his team manages the sometimes competing directions of enterprise software, which wants to be predictable and well-documented, versus community open source contributions which are more chaotic.

Transcript

Rhys Arkins: Alright. So, today, I'm talking about juggling the dual life of an enterprise open source project. And actually, Steve's final point there was a good one. He talked about not burning yourself out. And actually, I'm mostly gonna be talking about that aspect.

It's about how to have an open source project that is sort of company sponsored or company authored, but how to manage it and how to keep it manageable. Alright. So in general, open source is great. Well, Steve actually introduced it really well, but it's great for companies and community alike. I mean, it can have pitfalls.

Popular projects can get pulled in many directions. So once you open source something, assuming it's not the single source like Steve talked about, people in the community can want a project to go in multiple directions. And another challenge with enterprise open source is that, you know, enterprise customers do have different expectations about release frequency and stability. And then the final point I'll talk about is that it's important to automate to maintain quality.

So just one point: I called this, like, enterprise open source. I'm using that term. I kind of made that up, or just picked it arbitrarily. So just to give you an idea of what I mean by that, I mean sort of an open source project that originates from a company rather than an individual, and which has continued, like, steering and usage by the original company.

So the type of open source project I'm talking about, because it's hard to really capture everything, as Steve mentioned, there's lots of different approaches. So one that's enterprise originated and managed, one that is widely adopted. I mean, there's no point in open sourcing things if people aren't going to use them. Importantly, I'm gonna cover the use case where it has an active community of contributors, maybe even maintainers. So, you know, the non-single-source approach.

It contains features the managing enterprise doesn't use. Right? So, you know, there may be things that are not of interest to the company who originally open sourced it, but they are good for the project itself, good for the community, so they get added. And then the final important point of this kind of case study is that the project is used as an integral internal component of some other enterprise software. Alright.

So, yeah. So in terms of what it's not really about, just to clarify, it's not really about the type of projects where it's, like, so big it has a dedicated team or, you know, where the enterprise retains complete control of development. Alright. So moving on to this case study. So the open source project is Renovate Bot.

It's hosted on GitHub, as most open source projects these days are. It's not actually 19 years old, as GitHub thinks. Somehow there's a year 2000 commit there. But you'll see it has, you know, 7,000 commits, 4,000 releases, 27 pull requests. It's pretty active. Now on the flip side of that, at WhiteSource, we have an enterprise software product called WhiteSource Remediate.

Now Remediate is not like the paid version of Renovate, really. This is not the type where we withhold things to get, you know, premium features, that type of stuff. It's more that we found that we could open source 90% of what Remediate does as this project called Renovate and have a really great community project that still allows us to continue with our kinda enterprise products, which in this case is called WhiteSource Remediate. So Renovate is a project for automatically updating dependencies, whereas Remediate has these capabilities to be able to do, you know, real time monitoring for vulnerabilities and patching vulnerabilities and things like that. Very similar concepts.

A closer look at Renovate. So the graph that you see at the top left, I realized they're both orange. This is weekly commits. So you can see that there are somewhere between 20 and 40 weekly commits, which is pretty high. I mean, there are obviously many projects that have higher, but this is a pretty regularly updated project.

We have 250 contributors, some of which are bots and there might be some duplicates. But essentially, we've got 200 plus people that have contributed code, nearly all of whom are not WhiteSource employees. And if we look at the middle and at the bottom, in the last month, you can see some more stats there. So, you know, for example, 47 closed issues, 53 new issues. So, you know, around 100 a month active issues, 162 active pull requests.

So there's quite a lot going on here. And this is the open source side. On the closed source side, we'll say, is WhiteSource Remediate. So this is commercial software, and it's part of a wider piece we call WhiteSource for Developers. It's adapted from the open source Renovate, adding the vulnerability awareness and remediation.

And we typically release this in two week sprints, and we use agile methodology. So this is quite different. This is, you know, a more traditional enterprise approach to software as compared to, you know, open source. And one of the other key things is that when it comes to enterprise software, customers also expect a lot of stability. So, anyway, let's talk about juggling that dual life between a piece of open source, which is, well, free as in beer and speech, but is also developed quite freely, versus our enterprise software, where we use it as a part and where we want that to be kinda stable, predictable, etcetera.

So, as I said, Steve gave a great overview of different types of open source. So here's our open source approach at the high level. We welcome contributions which would benefit the community even if they're not important to WhiteSource directly. So we do not say no to an idea or a feature just because we wouldn't use it ourselves. We only say no if we think that it, like, doesn't benefit the project or the wider community.

We have three project maintainers of the open source project, one of which is me, but two of which are unrelated to WhiteSource. We attempt to review submitted pull requests as quickly as we can, you know, same day a lot of the time, and we release continuously. We have continuous delivery. So per feature, per fix, we push out new releases. Something that's important to note is that, as I said, it's not like an upsell.

Remediate is a different product to Renovate. Renovate is a very important dependency for Remediate. It's not like just a regular dependency that you can, you know, leave non-updated for a while. Because we at WhiteSource are in the business of open source dependencies, we can't ship our software with our own dependencies out of date. That would just not look good and would not benefit our customers.

Also, package managers and platforms like GitHub and GitLab, they're rapidly progressing, so the software does need to keep up. Basically, in short, we need to update Renovate regularly as part of our commercial offering. So in terms of juggling that dual life, the heavy lifting really is done in open source. In terms of how we can manage that, not burn ourselves out, and find the balance, the key things are automated tests with full coverage.

This certainly helps us a lot. We use bots for checking things like pull request titles and the contributor license agreement, the CLA. And also a work in progress bot. We have a bot for automating dependency updates, which is Renovate itself.

We automatically release using semantic release. We automatically publish to npm and Docker Hub, and we also automate our documentation, both, like, testing during pull requests as well as generation. And this is the way that we find that we're able to kind of support the community very well, not burn ourselves out, while also allowing the project to kinda go as quickly as it would like to go. Alright. So the first point is continuous integration and testing.

We're on GitHub. We're using GitHub Actions for our tests. So you can see here maybe half of that is the GitHub Actions. Below that, you'll find the other ones I mentioned. So for example, because we release every commit, we need pull requests to be correctly formatted in their title.

So is it a feature? Is it a fix? And is it a breaking change? So the semantic pull request bot automates that. The pull request doesn't pass unless the pull request has been named with one of those semantic prefixes.
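As a rough illustration of the title check described above, a minimal sketch in Python might look like the following. The list of allowed prefixes, the `!` marker for breaking changes, and the function name `check_title` are assumptions for illustration, not the actual bot's configuration.

```python
import re
from typing import Optional

# Hypothetical prefix list: the talk only names features, fixes, and breaking changes.
ALLOWED_TYPES = ("feat", "fix", "docs", "chore", "refactor", "test")

# type, optional (scope), optional "!" for a breaking change, then ": subject"
TITLE_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def check_title(title: str) -> Optional[dict]:
    """Return the parsed parts of a semantic title, or None if the title is rejected."""
    match = TITLE_PATTERN.match(title)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") == "!",
        "subject": match.group("subject"),
    }
```

A check like this would fail a title such as "Update readme" and pass "fix: handle empty lockfile", which is what lets a later release step decide what kind of release to cut without a human in the loop.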

We have a work in progress bot which lets people mark a pull request as a work in progress so it doesn't get accidentally merged. We check code coverage and enforce that coverage is always 100%. And importantly for enterprise open source, we also have a contributor license agreement bot that automatically checks that all pull requests have had the contributor agreement agreed to, just so that we don't get ourselves into any problems with rights to use. In terms of releasing, once a merge request has, you know, passed all the tests and passed our manual reviews, one of my opinions is that if you're an open source project and you need to decide when and what to release, in many cases that's an unnecessary burden. Right?

So if we're talking about how not to burn out, how to make it manageable, you know, just releasing every time actually works extremely well for us. Because, you know, often you get, like, contributors, for example, contribute a feature, contribute a fix. And, of course, the second you merge it, they ask, well, when will this be released? I'd love to use it. So the answer is, basically, you know, within minutes, there is a release.

And certainly within an hour or so, we get a build that's pushed to npm and also pushed to Docker Hub. So we found that this works really well for us. In the early days of the project, one of the first issues, maybe not one of the first issues, one of the first outside contributor issues, maybe about two years ago, was called "too many releases". And someone just said, oh, you release too much. And again, that was probably someone with what we call the enterprise approach to releasing, where they would like something every two weeks or a month instead of, you know, five times a day.
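To make the "release on every merge" idea concrete, here is a minimal sketch of how a version bump could be derived from semantic titles. It assumes semantic versioning and the convention that a breaking change is marked with `!`, a feature with `feat`, and a fix with `fix`. The function names are hypothetical, and this is not the actual semantic release implementation.

```python
from typing import List, Optional


def bump_for(title: str) -> Optional[str]:
    """Map one semantic title to 'major', 'minor', 'patch', or None (no release)."""
    if ":" not in title:
        return None
    prefix = title.split(":", 1)[0]
    if prefix.endswith("!"):  # assumed marker for a breaking change
        return "major"
    kind = prefix.split("(", 1)[0]  # drop an optional "(scope)"
    if kind == "feat":
        return "minor"
    if kind == "fix":
        return "patch"
    return None


def next_version(current: str, titles: List[str]) -> Optional[str]:
    """Return the next version for the merged titles, or None if nothing is releasable."""
    major, minor, patch = (int(part) for part in current.split("."))
    bumps = {bump_for(title) for title in titles}
    if "major" in bumps:
        return f"{major + 1}.0.0"
    if "minor" in bumps:
        return f"{major}.{minor + 1}.0"
    if "patch" in bumps:
        return f"{major}.{minor}.{patch + 1}"
    return None
```

With one merge per release, `titles` usually holds a single entry, so every fix or feature gets its own version number.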

Now the other thing where releasing makes things good for us is it also makes pinpointing faults much easier. So when people are using the product, and they may use it in many ways that we don't need to use it, you know, this being community use, if they say, well, something broke, you know, that's the worst type of bug. It's very nice that we can say to people, if you can work out which was the last working release, or at least tell us when was it working for you and when you upgraded, what's the version. This makes it a lot easier to pinpoint faults. Because otherwise, we would have to do, you know, a lot more checking ourselves, whereas now we can kinda lean on the community more and say, okay.

Well, we're really sorry we broke something or someone else broke something. But, like, if you can work out what commit it was, you know, or what release it was, I should say, then normally that narrows it down to, like, one or two commits at most. So basically, automated releasing works very well for us in keeping the burden low. Alright. So documentation.
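The narrowing-down described above can be sketched as a search over an ordered list of releases. This is an illustrative sketch only: the function names, the version numbers in the usage note, and the `is_broken` callback (for example, a user re-running their setup against a given release) are all assumptions.

```python
from typing import Callable, List


def suspects(releases: List[str], last_good: str, first_bad: str) -> List[str]:
    """Releases that could have introduced the fault: after last_good, up to first_bad."""
    start = releases.index(last_good) + 1
    end = releases.index(first_bad) + 1
    return releases[start:end]


def first_broken(candidates: List[str], is_broken: Callable[[str], bool]) -> str:
    """Binary search for the first broken release.

    Assumes the last candidate is broken and that every release after the
    faulty one is also broken.
    """
    low, high = 0, len(candidates) - 1
    while low < high:
        mid = (low + high) // 2
        if is_broken(candidates[mid]):
            high = mid  # fault is at mid or earlier
        else:
            low = mid + 1  # fault is after mid
    return candidates[low]
```

For example, with hypothetical releases `["19.0.0", "19.0.1", "19.1.0", "19.1.1", "19.2.0"]`, a user who reports that 19.0.1 worked and 19.2.0 did not leaves three suspect releases. When each release contains one or two commits, finding the first broken one points almost directly at the change responsible.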

Steve, again, talked a lot about this, about how sometimes documentation is used as the upsell. So in this case, it's kind of the opposite. I mean, obviously, good documentation benefits both open source and enterprise. It is very easy to under-document in open source projects, you know, not as a business model, but just in general. Right?

Because, you know, people write a feature, they submit a fix, and they don't document it. Right? And so, you know, you may have the choice between, like, well, do we accept it as is? And just say, yeah, we'll document it next, or could you document it soon? The reality is that if you do it that way, you get behind in your documentation.

And the less documented something is, the harder it is to support open source. So assuming that, you know, documentation is not part of your business model, having great documentation in open source really lowers the effort required. So what we do is that we ensure that, first of all, every configuration option in the open source tool has a description field. So you can't create, like, a new feature without having a description. And then we also have a free form markdown document, which is like an extended description, and we have a test that makes sure that every feature also appears in that document as a heading, even if the content is empty.
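The two documentation checks described above can be sketched as a single test helper. This is a minimal sketch under stated assumptions: the option records are plain dictionaries with `name` and `description` keys, the extended document uses `#`-style markdown headings, and the function name `missing_documentation` is hypothetical. The real project's test may be structured quite differently.

```python
import re
from typing import Dict, List


def missing_documentation(options: List[Dict[str, str]], markdown: str) -> List[str]:
    """List every option that lacks a description or a heading in the markdown document."""
    # Collect the text of every markdown heading (levels 1 to 6).
    headings = set(re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE))
    problems = []
    for option in options:
        name = option["name"]
        if not option.get("description", "").strip():
            problems.append(f"{name}: missing description")
        if name not in headings:
            problems.append(f"{name}: missing heading in docs")
    return problems
```

A test in continuous integration could then simply assert that this list is empty, so a pull request that adds an option without documenting it fails automatically instead of relying on a maintainer to notice.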

And what this means is that instead of us having to kind of push people and lean on people, there's an automated test that basically says missing documentation. And this has worked very well. We quite rarely have to actually remind people. You know, it's quite clear as part of the project that if you write a feature, then you add documentation for it as part of the project. So this is sort of on the open source side. Excuse me.

Alright. I'm gonna assume that you can see. No, I'm bringing it back. Thank you for your patience. Alright.

Stacy, can you reshare that? Thank you. Sorry, everybody. Okay. So documentation.

Good documentation, yeah, I talked about it. As I said, adoption in commercial releases. So we usually upgrade it every sprint. We don't always take the very latest.

I mean, sometimes that's impossible. Like, if we've made major changes, we don't want to push that into our commercial product. But you know what? Open source users are often waiting for a feature or fix. I mean, you generally don't add a feature or fix unless someone needs it.

And so what that means is that the person who needs it, who's benefiting from it, does get to test it first, and that helps us a lot. As much as we attempt to, you know, test it, people test it in real applications when they need it. We also have a hosted version. So we actually run it sort of as a free service, and we have, like, over 150,000 repositories there. So we generally deploy there before we deploy into our enterprise release, and this helps a lot.

Ultimately, also with our commercial release, we lean on the open source documentation, and we don't duplicate it. So having great documentation also benefits us there. Support. Alright. So open source renovate is supported on GitHub.

That's one of the key things. This is how we keep our support manageable for the open source. So the open source project is supported on GitHub using issues, like most open source projects. We're actually in GitHub all the time. So questions are usually answered in minutes or hours.

At WhiteSource, though, we naturally have a commercial support process via emails and portals. So it creates a little bit of friction at times. So one thing is, obviously, we don't advertise WhiteSource support for Renovate. It's not an upsell for us. So we don't have any information about it.

But some people do look it up. So some people go to the effort of looking up how to contact WhiteSource support. The volume of this, though, is not very high. Maybe less than 5% of questions try to go to our commercial support. And generally, if we can't answer them in a sentence, we will politely refer them back to the GitHub repo and ask them to post there.

So this is how we keep the separation between, like, commercial support and open source support. Alright. In terms of, like, local developer relations, I mean, all open source can be developer relations, but how do we deal with developers? Initially, we had a Gitter channel, but I actually closed that down. It turned out to be problematic because it did become overwhelmed by what I'll call lazy support questions.

So people would, you know, look at the different ways to talk to people in the project, and they'd see that, like, you're online and just jump in. And instead of troubleshooting things first, they would want to have, like, real time support. So that actually didn't work for us, as much as I would love to have that. What we do now is we have a private Slack channel. And so, basically, if anybody is developing, we have a note that says, don't develop on your own.

Email us to get a link. And we add people there so that when people need help developing, then they can talk to us. I would say that the friction where people have to ask for an invite is unfortunate. Like, I would love if there wasn't friction, but it has cut the support load, the real time support load to almost zero because people respect that that's a development channel, not support. So, this is something I'd love there to be a better solution for, but it works pretty well for now.

Alright. So just wrapping up, you know, the benefits of open sourcing components, and I wanted to be clear that this is, you know, sort of about when a company says, hey, we've got this great tool we built, and a lot of people could benefit from it, and it doesn't, you know, threaten our core business in any way to open source this. That could be a great benefit to the community and to you, but you do need to plan for how to handle the success of it. I mean, there's no point open sourcing parts if no one uses them.

But at the same time, if people do, it produces pressure on things like, you know, reviewing pull requests, releasing, and support. The biggest thing I can say is that automation is key. So when you automate tests, when you automate releases, and then in our case, you automate documentation. I forgot to mention that documentation is actually automatically built as well. So it's built using GitHub Actions as well, and it's published using GitHub Pages.

And this means that, like, again, we don't have to worry about people chasing us for when the documentation is up to date. So doing all these things means not only that it's good for the community, not only that it keeps our workload low, but also that the project is in very good shape for us to adopt into our own commercial software without major pains. Because, like, if we were releasing sloppily, untested, undocumented, and things like that, and, like, lots of features in one release, that would actually make it hard on ourselves. So this automated testing, release, and documentation is really, I'd say, key. I mean, to any open source project, but especially if you're open sourcing a component of your full solution that you need to keep using yourself, so that it doesn't, like, get out of control.