Making a codebase more welcoming

May 28, 2021 at 11:09pm
Hi,

I play a very small part, making occasional small contributions to a piece of software that has existed for more than 20 years.

I only started coding in it relatively recently, when I went to make a game with it and noticed that some features I wanted were missing. I notice, though, that the project attracts very few "passerby" contributors, at least compared with other projects of roughly the same size. I also don't know how desirable such contributions are, since there's also work in explaining and guiding people to make their code comply with the project, but I don't know much about this.

If I posted the GitHub repository here, would you give me ideas on how to make it more welcoming to code contributions? For example, how to make its relatively large codebase, parts of which are legacy, easier to understand.
Last edited on May 28, 2021 at 11:31pm
May 29, 2021 at 12:40am
You "posted the repository here" where?
May 29, 2021 at 1:21am
I didn't. I was asking hypothetically because I didn't know if it was allowed. Lots of forums frown on people posting links, and I am not familiar with the community here yet. :)
May 29, 2021 at 2:02am
if it's spammy, we will remove it for you.
post away.
that said, the way to get people to work on a project is awareness.
they can be aware of it because they use the software, and wish it could be better.

understanding screwy code requires comments and documentation. Without it, you need constant access to the authors who can tell you what it does and why and how. At some point, once the authors are gone, if it isn't well coded and documented, and you lack an expert or two who knows how it works, it dies.
May 29, 2021 at 2:34am
Sorry, I guess I misread your post.

The question you should first be asking is whether the project gets few contributions because it's unwelcoming, or for some other reason. E.g. it's unpopular, people are happy with it the way it is, the problem the program tries to solve is inherently difficult or understood by few, etc.

It would help a lot to understand the situation better if you gave more details. What sort of project is it? Is it a program or a library?
Looking back, the only times I've contributed to other people's projects it was because I was working on something that used their code in some way and I discovered a bug or something that could be improved with a small and obvious (to me) change. For example:
https://github.com/grimfang4/sdl-gpu/pull/16
https://github.com/pret/pokered/pull/162
https://github.com/icsharpcode/SharpZipLib/pull/465
https://github.com/lexbor/lexbor/pull/92
If it turns out that I need to expend much more energy than that to accomplish my task, it's more likely that either I'll look somewhere else or that I won't bother trying to integrate my solution into the external project and I'll keep it in my own code. Large patches are more likely to be rejected anyway, for various reasons.
May 29, 2021 at 3:38am
understanding screwy code requires comments and documentation


I keep wondering what good docs for a codebase would look like. Something like diagrams explaining things with blocks and what the concepts mean? I think I could work with that if that's the case. I am not very experienced at documenting a codebase - I have documented applications for users, but that's a different case.

It would help a lot to understand the situation better if you gave more details. What sort of project is it? Is it a program or a library?
Looking back, the only times I've contributed to other people's projects it was because I was working on something that used their code in some way and I discovered a bug or something that could be improved with a small and obvious change.


It's a program; people "consume" it in binary form, meaning they run the software and use it, but don't have to deal with its insides. Looking at your PRs and the above I think it makes sense that libraries get more contributions since the users are people that have a closer "relationship" with the code.

Uhm, a lot of food for thought. I will sleep on these.
May 29, 2021 at 4:25am
Looking at your PRs and the above I think it makes sense that libraries get more contributions since the users are people that have a closer "relationship" with the code.
Yes. As a user, I would have to be really invested in a program or I would have to already have a clear mental picture of its internals to decide to make some change to it.

There's no single way to document an application, but in broad terms you should draw an overall diagram of the major modules and how data flows between them. Something that would allow an outsider to get their bearings on how to find the code responsible for implementing some functionality. It doesn't have to be super-detailed, and if necessary you can add a little more detail with text beneath. Primarily it should be clear, comprehensible, and accurate. If to get there you need to omit information then that's preferable.
Then you need the reverse trip. If I'm looking at a piece of code, I need to know what its responsibility is. Something like Doxygen can help here, but one way or another it's likely going to involve a lot of writing. Some projects eschew explicit documentation in favor of unit tests for individual components. To some extent that can help to understand the implicit contracts for said components, although IMO it's insufficient.
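To make the "reverse trip" a bit more concrete, here is a minimal sketch of a Doxygen-style comment paired with a tiny test that pins down the same contract. Everything in it is hypothetical: the function `clampVolume`, the 0..100 range, and the idea of a mixer are made up for illustration and are not taken from any real project.

```cpp
#include <algorithm>
#include <cassert>

/**
 * @brief Clamps a requested audio volume to the range the mixer accepts.
 *
 * Hypothetical example: the name, the 0..100 range and the mixer itself
 * are assumptions made up for this sketch.
 *
 * @param requested Volume asked for by the caller; any int is allowed.
 * @return The nearest value inside [0, 100].
 */
inline int clampVolume(int requested) {
    return std::max(0, std::min(requested, 100));
}

/// A tiny test that doubles as documentation of the contract above.
inline void testClampVolumeContract() {
    assert(clampVolume(-5) == 0);    // below the range snaps to the minimum
    assert(clampVolume(42) == 42);   // in-range values pass through unchanged
    assert(clampVolume(250) == 100); // above the range snaps to the maximum
}
```

The comment tells a reader what the function is responsible for; the test shows the edge cases, but on its own it would not say why the range exists, which is roughly why I consider tests alone insufficient.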
May 29, 2021 at 9:40am
but in broad terms you should draw an overall diagram of the major modules and how data flows between them.


Thanks! Following the data around is something concrete I can do and document. We have a small wiki we use for storing this stuff, so I will try to write about it there.

Something that would allow an outsider to get their bearings on how to find the code responsible for implementing some functionality


Yeah, I think this helps someone who is primarily a user and wants to modify the program. I think I can at least group the concepts, then point generally to where they belong in blocks, and then point to where these blocks are implemented in code.

Some projects eschew explicit documentation in favor of unit tests for individual components.


We have tests for some things and not for others; overall, only a really small part of the codebase currently has active tests. This is also something that needs to be worked on.

Thanks for the information, I will think about it. I plan to start writing things soon! As I mentioned, we have a small wiki that needs information like this.

Because it was asked before, the project is here: https://github.com/adventuregamestudio/ags
May 29, 2021 at 11:30am
Oh, AGS! I learned about it from Yahtzee's Let's Drown Out series, where he talked about and showcased some games he made on it, like Rob Blanc I-III and Adventures in the Land of Fantabulous Wonderment.

One more thing I forgot to mention was documentation of file formats. This is primarily important for interoperability. If someone wanted to write a program* that consumes those files or manipulates them in some way, they shouldn't need to read the source to understand the format. Communications protocols (although I assume AGS doesn't have those), as well as any other channel through which the application exchanges data with the outside world, are for the purposes of documentation requirements analogous to file formats.
IMO the ideal situation for file formats is if the project is nicely modularized and it has an internal file parsing library that reads files and returns a computer-friendly data structure, which other programs can link to. In general modularization is great for comprehensibility, because it puts bounds on how much code someone needs to load into their brain before they can modify functionality.


* Note that just because it's a program separate from the main executable doesn't mean it has to be a separate project. It could be a tool that's included in the same repository, such as some kind of IDE or debugger, or an art editor.
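As a rough sketch of what I mean by an internal parsing library with the format documented next to it, consider something like the following. The format here is invented purely for illustration (it is not an AGS format), and `DemoHeader`, `readU16LE` and `parseDemoHeader` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical on-disk layout, documented right next to the parser:
//   offset 0, 4 bytes: magic, the ASCII characters "DEMO"
//   offset 4, 2 bytes: format version, unsigned little-endian
//   offset 6, 2 bytes: number of entries, unsigned little-endian
struct DemoHeader {
    std::uint16_t version;
    std::uint16_t entryCount;
};

// Reads an unsigned 16-bit little-endian value starting at `offset`.
inline std::uint16_t readU16LE(const std::vector<std::uint8_t>& bytes,
                               std::size_t offset) {
    assert(offset + 2 <= bytes.size()); // callers must check the length first
    return static_cast<std::uint16_t>(bytes[offset] | (bytes[offset + 1] << 8));
}

// Parses the header into a plain struct. Returns false, leaving `out`
// untouched, if the buffer is too short or the magic does not match.
inline bool parseDemoHeader(const std::vector<std::uint8_t>& bytes,
                            DemoHeader& out) {
    static const char kMagic[4] = {'D', 'E', 'M', 'O'};
    if (bytes.size() < 8) return false;
    if (std::memcmp(bytes.data(), kMagic, 4) != 0) return false;
    out.version = readU16LE(bytes, 4);
    out.entryCount = readU16LE(bytes, 6);
    return true;
}
```

A separate tool could link against something like this and get a `DemoHeader` back without ever touching the rest of the engine, and the layout comment gives outsiders the format without reading the parsing code.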
May 29, 2021 at 11:50am
I keep wondering what good docs for a codebase would look like. Something like diagrams explaining things with blocks and what the concepts mean?


think high level to start. this is what the program does. this is how it is designed, there are these top level objects, foo and bar. Foo handles the graphics and bar handles the logic. Foo uses the 2slow engine from studio... work it down from the top to cover the key concepts and areas.
then you refer to the detailed docs that get into the ugly stuff, but they need a map just to find what area to start working in. That is really a good way to think of it... a map... I want to fix the mouse interface so a controller works too.. where is that? I want to improve the graphics FPS, where is that? I want to add a new gun to the items, where would that be? etc.

you can generate a lot of crap automatically with some tools as to who calls what where and how (inherited, direct call, aggregate, whatever). It's still nonsense to the guy that just wandered by though, if you do not have a STARTING PLACE. So the very first thing to build is a starting place for the major blocks of code.

not much you can do in hindsight but I am a huge fan of a standalone code base. That is, if you pull out a .cpp file and a .h file, they should be able to compile and run in a new program with minimal 'oh, now I need x.cpp and x.h, and x.cpp needs y.cpp and y.h, which needs...' there is always a little of that, it cannot be avoided, but the key is to minimize it so that a small # of files can be grouped together and tested/modified/developed without having to compile the whole big thing every time you change a couple of lines.
Last edited on May 29, 2021 at 11:54am
May 29, 2021 at 12:40pm
We gave up on separating into .h and .cpp compilation units a while ago. With the speed advances in compilers and increased use of templated code, we've gone down the standard library route, with all our code in .hpp files that are then included in the main compilation unit. Any .hpp file compiles on its own, as it has the required #includes. Every standalone function is declared inline, so if it is used in multiple compilation units, there are no link issues. Member functions are all defined inside the class definition - no separation of class definition and function bodies.

Works well for us.
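For anyone who hasn't seen the style, a minimal sketch of such a header might look like this. The file name, namespace and contents are hypothetical and only illustrate the pattern; it is not our actual code.

```cpp
// demo_geometry.hpp (hypothetical file name)
#ifndef DEMO_GEOMETRY_HPP
#define DEMO_GEOMETRY_HPP

// The header pulls in everything it needs, so it compiles on its own.
#include <cassert>
#include <cmath>

namespace demo {

// Standalone functions are declared inline, so including this header from
// several translation units does not cause duplicate-symbol link errors.
inline double squared(double x) {
    return x * x;
}

// Member functions are written inside the class definition, which makes
// them implicitly inline; there is no matching .cpp file.
class Point {
public:
    Point(double x, double y) : x_(x), y_(y) {}

    double distanceTo(const Point& other) const {
        const double sum = squared(x_ - other.x_) + squared(y_ - other.y_);
        assert(sum >= 0.0); // sanity check: holds for any finite coordinates
        return std::sqrt(sum);
    }

private:
    double x_;
    double y_;
};

} // namespace demo

#endif // DEMO_GEOMETRY_HPP
```

The include guard and the `inline` keyword are what make it safe to include the same file from more than one place.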
May 29, 2021 at 3:06pm
How do you deal with classes or functions in separate files that depend on each other, though? Also, what's the line count of the preprocessed file, excluding external headers?
I don't know, I don't think that's such a great trade-off in the general case. If the code is extremely templatey, fine, but otherwise you're giving up all possible build parallelization. I'm reminded of an anecdote of an acquaintance of mine who said their project took over ten minutes to link a release (I'd guess LTCG, i.e. link-time code generation, took the bulk of that). With a single translation unit, compiling would take just as long no matter how small the change.
May 30, 2021 at 2:06pm
Hey, I started writing the text. It's been a constant game of write things, cut things, write things, cut things... But it's going. My working sketches are a short text and graph that serve as an architectural description, a quick write-up of what's where in the codebase, and a glossary of concepts/words that have a specific meaning in this code/app in particular.

I was browsing other repositories looking for something similar to a bird's-eye view of the codebase and its concepts, but I couldn't find any. Godot was the only one where I found at least a short explanation of things. Are there other repos or codebases that do this well? I'd like to look around for inspiration. Thanks for the good feedback and information!
Last edited on May 30, 2021 at 2:07pm
May 30, 2021 at 5:39pm
May 30, 2021 at 6:33pm