TLDR: the future of bcachefs in the kernel is uncertain, and lots of things aren't looking good.

Linus has said he isn't accepting my 6.13 pull request, per "an open issue with the CoC board", and at this point I have no idea what's going on with the CoC board. I, for my part, have felt for quite some time that there are issues about our culture and the way we do work that need to be raised, and that hasn't been going anywhere - hence this post.

What follows will be an account of some (not atypical) LKML drama, along with some analysis of where things went wrong - cultural issues, poor processes.

Note: I don't want anyone to be getting hate mail because of this; particularly the devs involved. There are some things that should change, particularly, in my opinion, with the CoC process, but let's try to keep it productive.

Most of the people in the community are wonderful: I've been a Linux kernel engineer for over 15 years, and I've met many wonderful people; most of the people I come into contact with are great to work with. But there are subsystems where I have done fundamental work and am no longer able to get work done, because of just these sorts of issues - and with it now happening again, it's time to speak out.

And I believe these sorts of things do have an impact on the quality of the kernel as a whole, as well.

I've seen it get harder and harder to get bugs addressed in other subsystems over the years, and while bcachefs itself has been rapidly stabilizing I and my users are running into more and more heisenbugs in core subsystems that should not have them (block, mm), and are not being addressed.

And I think the issues I've been running into are indicative of process issues that are tied into all of that.

Memory allocation profiling:

There's a feature the kernel now has called memory allocation profiling: if you enable it a new file shows up in /proc/ - /proc/allocinfo - which lists by callsite the total amount of memory allocated. And it's low overhead: cheaper than memcg, cheap enough for distribution kernels to enable by default. Which is great: I really love debugging features that are cheap enough to leave on all the time, so that users can poke around and discover things (some of which will be of interest to the developers!), and so they're available for debugging things that only show up in the wild, in production.

So: cool stuff. Memory allocation profiling is a feature we haven't had before in the kernel, and even in userspace there aren't good options (the best option I know of in userspace is to run your program against a special (and much slower) version of tcmalloc, so not terribly practical). People have been using it to discover and fix real issues.

This was conceived of by myself many years ago, based on a trick that the "dynamic printk" kernel facility uses (which lets you turn on and off individual pr_debug() statements at runtime, via some nifty linker tricks), and co-developed with Suren Baghdasaryan at Google (who at this point has certainly done more work on it than I have, and has taken over maintaining it. Props to Suren!).

The main trick to how memory allocation profiling and dynamic debug work is interesting - how they add per-callsite introspection:

- First, replace your function calls with a macro wrapper. The macro wrapper will declare a 'static struct' - i.e. a singleton object - and pass it wherever it's needed. This gets us the per-callsite state.

- Next, we need to be able to find all these state or control objects at runtime. This is where the linker magic comes in, because native C does not have constructors (gcc has them as an extension, but it's not something you should rely on). If we had constructors, the constructor could add them to a global list, but we need another option.

The trick here is to put all these state objects in a special ELF section, and then have the linker provide start and stop symbols for that section - then it's just an array of objects we can walk!

This is even better, in terms of per-object overhead, than the constructor approach - but it is some deep voodoo. Fun stuff.

Getting it merged was not smooth sailing, however. Besides the usual "I haven't had my coffee yet, can't we just do this with tracing?" (oof, no), there was one mm maintainer in particular who very nearly derailed the entire project.

This mm maintainer was very, very concerned about "maintenance overhead", and wanted the entire project redone a certain way. You see, memory allocation profiling requires adding macro wrappers to memory allocation calls, and renaming memory allocation functions to `_noprof()` for the non-hooked entry points, and he had some concerns about adding all these macro wrappers. Understandable enough, so far.

But things went off the rails when he kept axe grinding about "maintenance overhead", without really adding anything else to the discussion or digging in to understand why this design was necessary, and he was very, very concerned and very insistent that we redo the entire project a different way.

It came to a bit of a head with him and another high profile maintainer crashing our LSF presentation - a Q&A format is typical, but taking up the majority of a slot's time when you're not co-presenting is atypical - with ideas they'd come up with that morning (!).

https://www.youtube.com/watch?v=vwtjvJ8iuYo&list=PLbzoR-pLrL6rlmdpJ3-oMgU_zxc1wAhjS&index=69

(I wasn't in the room that year because at the moment of that talk I was sick in bed with COVID).

So, let me outline the alternative ideas. There were two, and the goal of both was to get rid of the function renaming and the macro hooks. One was:

- Trampolines, similar to how function tracing injects its hooks

- Nonexistent compiler magic, i.e. asking the compiler people to add a feature to transparently inject our hooks at the source code level.

But neither would have worked, for one fundamental reason: allocation functions that wrap other allocation functions are very common, and we have to be able to annotate at the source code level at which level the profiling happens. It can't simply be the outermost one, because mm code internally does its own allocations that should not be charged to the outer callsite.

IOW, the source code changes they were complaining about were fundamentally necessary. Besides that, tracing would have added function call overhead that we really wanted to avoid (cheap enough for distro kernels to flip on!), and I very much doubt the compiler people would have been able to make any sense of the feature they wanted.

But these two maintainers (and the mm maintainer in particular) were so insistent that they nevertheless derailed the project for nearly a year while the rest of the team explored other options (mainly the compiler feature). These two people never contributed code, and if they had really tried to contribute to the project and think things through, I think they would have come to the same conclusion.

Alas.

It took nearly a year before I was finally able to put my foot down and say "these other options aren't going to work, what we have does work, we need to go with what we've got and I'll send the pull request directly to Linus if the mm people are going to be roadblocks".

By the way, here's the "maintenance overhead" he was concerned about:

void * __must_check krealloc_noprof(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);

#define krealloc(...) alloc_hooks(krealloc_noprof(__VA_ARGS__))

And then switching calls from wrappers to the `_noprof` version where appropriate.

PF_MEMALLOC_NORECLAIM

Now we come to the next chapter in the story.

First, a tour of memory allocation APIs in the kernel. This isn't userspace, where you can just call malloc(); in the kernel, things are never that simple :)

Historically the major divide has been "do I use the slab allocator, or the page allocator - or vmalloc?". The slab allocator handles sub page allocations, the page allocator handles page or greater allocations, and vmalloc() is for allocations that are so big that we likely won't be able to find physically contiguous memory - and thus we need to allocate fragmented memory and construct a virtual mapping.

That's gradually been getting consolidated: for quite some time the slab allocator has been able to transparently use the page allocator for large allocations, and now we also have kvmalloc(), which automatically falls back to vmalloc() when required.
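The fallback logic is simple to sketch; here's an illustrative userspace version (the helper names and the size threshold are made up for the example - the real kvmalloc() first tries the physically contiguous path with flags that forbid aggressive retrying, then falls back to vmalloc()):

```c
#include <stdlib.h>

#define CONTIG_MAX	(4 * 4096)	/* illustrative threshold, not the kernel's */

/* Stand-in for kmalloc(): pretend physically contiguous memory is only
 * available up to CONTIG_MAX. */
static void *contig_alloc(size_t size)
{
	return size <= CONTIG_MAX ? malloc(size) : NULL;
}

/* Stand-in for vmalloc(): the slower path that stitches together
 * fragmented pages behind a virtual mapping. */
static void *virt_alloc(size_t size)
{
	return malloc(size);
}

/* The kvmalloc() pattern: try cheap and contiguous first, fall back. */
static void *kvmalloc_sim(size_t size)
{
	void *p = contig_alloc(size);

	return p ? p : virt_alloc(size);
}
```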

Additionally, there's the gfp flags argument. When you allocate memory you don't just specify how much you want, you also have to specify information about what context you're in.

  • GFP_KERNEL: I'm in a normal safe, sleepable context, the allocator can do anything it needs

  • GFP_NOFS, GFP_NOIO: I'm in either filesystem code or the block layer, so memory reclaim is a bit restricted on what it can do in order to avoid recursion.

  • GFP_NOWAIT (or its more common cousin, GFP_ATOMIC): I'm not allowed to sleep at all here (perhaps in interrupt context), fail immediately if there's no memory available.

And, the one that will become an issue later:

  • GFP_NOFAIL: I'm in a context where I can't handle errors (perhaps deep in the guts of filesystem journalling code where I can't unwind), try forever if you have to.

For many years there's been an understanding that having to know your context and specify it at each callsite is not strictly ideal: what defines a context is often something like "taking a lock that filesystem reclaim also uses", so we'd like to be able to specify the memory allocation context in those places.

Thus there are some parallel APIs, scoped versions of the GFP flags:

  • memalloc_nofs_save(), memalloc_nofs_restore()

  • memalloc_noio_save(), memalloc_noio_restore()

But memalloc_noreclaim_save(), for "I'm not allowed to sleep", was missing.
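Here's a userspace sketch of how the scoped annotations work (the flag values are simplified for illustration; the real mechanism is a flags word in `current->flags` that the allocator consults, via `current_gfp_context()`, on entry):

```c
/* Simplified gfp and task flags, for illustration only. */
#define __GFP_FS		(1u << 0)
#define __GFP_IO		(1u << 1)
#define GFP_KERNEL		(__GFP_FS | __GFP_IO)

#define PF_MEMALLOC_NOFS	(1u << 0)

/* Stand-in for current->flags. */
static _Thread_local unsigned int current_flags;

/* Enter a "no filesystem reclaim" scope; returns the old state so
 * scopes nest correctly. */
static unsigned int memalloc_nofs_save(void)
{
	unsigned int old = current_flags & PF_MEMALLOC_NOFS;

	current_flags |= PF_MEMALLOC_NOFS;
	return old;
}

static void memalloc_nofs_restore(unsigned int old)
{
	current_flags = (current_flags & ~PF_MEMALLOC_NOFS) | old;
}

/* What the allocator does on entry: code inside the scope can pass
 * plain GFP_KERNEL, and __GFP_FS is masked off for it. */
static unsigned int current_gfp_context(unsigned int gfp)
{
	if (current_flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}
```

The payoff is that library code in the middle doesn't need a gfp_t argument at all - it just allocates with GFP_KERNEL, and the scope set up by the caller does the right thing.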

Converting to the scoped APIs is important for a couple of reasons, besides the unwieldiness of the gfp flags approach.

  • Not all interfaces in the kernel that allocate memory expose gfp flags. A big one is page table allocation, and per Linus (many years ago) that one will never be fixed because pte allocation is arch specific - it's not worth the code churn when we have a better way to do it.

  • The big one is Rust: we really need to be exposing a malloc() that can be safely used anywhere (i.e. `kmalloc(..., GFP_KERNEL)`, with proper scoped annotations), because the alternative means forcing the Rust folks to do a lot of cut and paste changes of, for example, all the standard library container code just to support the kernel.

And considering the Rust folks already changed their entire core library to support fallible allocations, I wouldn't be happy if I were in their shoes having to make this change; this is what's going on now.

This also completely kills any hopes for easy interoperability of generic crates between userspace and the kernel.

Interoperability of library code is something that we'd really like in C land, but so far it hasn't happened. The kernel has a lot of nice library code (e.g. rhashtable, a completely lockless, dynamically resizable hash table) that could be used in userspace unchanged with only a bit of API standardization.

Moving in this direction would get a lot of this library code more users, and more developers finding and fixing bugs, and enable more cross-pollination. But right now, kernel land is largely its own world, separate from userspace code: it is possible to use a lot of this code in userspace (I do, in bcachefs), but only in a somewhat hacky, unsupported way.

With Rust, we'd had hopes of that starting to change. Rust's better module system and better safety mean that there are a lot of nice crates that we could use in the kernel unchanged, and perhaps our kernel code could start to see wider usage too.

Better portability is wonderful, and besides the "let's just make code better" aspirations, there's a real need for this since increasingly people do need to port kernel code to userspace - sometimes just for test harnesses, sometimes for more interesting things.

But back to the `PF_MEMALLOC_*` APIs: there's one little roadblock, which is that `memalloc_noreclaim_save()` - the equivalent to `GFP_NORECLAIM` - is missing. Cue the next fracas.

The best of plans gone awry:

So, I ran into a (relatively uninteresting) bug in bcachefs - deadlock due to memory reclaim recursion, where we need to be passing the correct gfp flags to `alloc_inode()`, a VFS level interface, which didn't support GFP flags. So, there were two options, both of which required changing core code:

  • Add a gfp flags argument to alloc_inode()

  • Add the missing PF_MEMALLOC_NORECLAIM

Since there'd been a long standing understanding that scoped annotations were the way of the future, I went with option #2. Patch went out to the mm mailing list, got some feedback (applied) and reviewed-bys, everything seemed to be going along fine.

Until the last minute, when the very same mm maintainer replied with a nack and a vague comment about unsafety.

And, since the nack was from someone who'd long been consistently axe grinding, and the reasoning seemed to be specious and at odds with our previously understood goals, and it was a bug fix and I couldn't wait around all day - I sent the fix anyways, including `PF_MEMALLOC_NORECLAIM`.

Some time later, I see a revert and additional patches in my mailbox (that didn't compile...), along with a still difficult to understand reasoning involving GFP_NOFAIL.

More technical backstory:

GFP_NOFAIL was introduced for filesystems that couldn't deal with memory allocation failures in certain contexts, i.e. deep in the journal transaction layer. It's always been a "please really avoid using this" thing, the standard expectation within the kernel is that you deal with errors, including memory allocation failures, gracefully, and if necessary you pre-allocate memory - that's what the block layer does. Even within a filesystem it's not strictly necessary, bcachefs doesn't use GFP_NOFAIL except in two small places that I'm going to get rid of (one day, I swear, when I get around to it).

The places where GFP_NOFAIL is used tend to be places deep in the guts of IO paths where unwinding is impractical, but preallocating is a much better solution, because GFP_NOFAIL may wait on the allocator for an unbounded amount of time. If the system is thrashing, that's a great way to get multiplying latencies and really bad performance cliffs. So it's not something we really want to have; it was added because originally certain filesystems had open coded `"while (!(p = kmalloc(...)));"` infinite loops, which was even worse.
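The preallocation alternative looks something like this sketch (the structure and function names are hypothetical - the point is that the allocation happens in a phase that can still return an error, before entering the section that can't unwind):

```c
#include <stdlib.h>

struct journal_entry {
	int data;
};

struct txn {
	struct journal_entry *reserved;
};

/* Fallible setup phase: we haven't taken any locks yet, so an
 * allocation failure can simply be returned to the caller. */
static int txn_prealloc(struct txn *t)
{
	t->reserved = malloc(sizeof(*t->reserved));
	return t->reserved ? 0 : -1;	/* would be -ENOMEM in the kernel */
}

/* Commit phase: can't fail and can't unwind - but it doesn't need to,
 * because it only consumes what was reserved up front. */
static void txn_commit(struct txn *t, int data)
{
	t->reserved->data = data;	/* guaranteed non-NULL here */
}
```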

But some people in filesystem land understandably don't want to see it go away, because the rearchitecting that would be necessary is just not practical.

PF_MEMALLOC_NORECLAIM meets GFP_NOFAIL

But if GFP_NOFAIL means "we're not allowed to return errors", and PF_MEMALLOC_NORECLAIM means we're not allowed to sleep, and thus must sometimes return an error, what do we do?

It's a surprisingly deep question, and this is the unsafety the mm maintainer was concerned about: potentially returning errors to codepaths that don't handle them.

If preserving GFP_NOFAIL semantics is paramount, then not allowing PF_MEMALLOC_NORECLAIM seems, at first glance, to be the correct approach: perhaps we just shouldn't create an API that allows us to be put in an impossible situation.

But that argument turns out not to work, for a couple of reasons:

  • PF_MEMALLOC_NORECLAIM is an inherent property of a given codepath, not a property created by an API: there are many places in the kernel where we literally aren't allowed to sleep. The question isn't whether we're allowing unsafety, it's about whether we have a way of telling the allocator about said potential unsafety.

If all non-sleepable codepaths were annotated with PF_MEMALLOC_NORECLAIM, then the allocator could at least attempt to do something sensible (e.g. emit a warning and a backtrace) when `GFP_NOFAIL` is used from an invalid context. Today, that would be a "scheduling while atomic" bug, something that will only show up with certain debugging options turned on.

  • There are other reasons GFP_NOFAIL allocations actually can fail - primarily if the requested size is too large.

This means that unless you are 100% sure that your GFP_NOFAIL use is safe: constant or at least bounded allocation size, and safe context, it must have an error path anyways.

And it turns out, around half of the GFP_NOFAIL allocations in the kernel actually do have error paths.
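Even a "can't fail" allocator has to fail sometimes; here's a sketch of why (the cap is illustrative - the kernel similarly refuses GFP_NOFAIL requests above the maximum supported allocation size, and the loop mirrors the old open-coded pattern mentioned above):

```c
#include <stdlib.h>

#define ALLOC_MAX	(1ul << 22)	/* illustrative cap, not the kernel's */

/* Sketch of a "nofail" allocation: loop until memory is available -
 * except that a request beyond the supported maximum can never succeed,
 * so it must still return NULL, and callers need an error path anyway. */
static void *alloc_nofail(size_t size)
{
	void *p;

	if (size > ALLOC_MAX)
		return NULL;	/* no amount of waiting will help */

	while (!(p = malloc(size)))
		;		/* the "try forever" part */

	return p;
}
```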

The LKML discussion, shall we say, did not go well.

Besides dramatic accusations of breakage - theoretical in this case, from a developer with a history of introducing silent data corruption bugs into code I've written - the rest of the discussion can be summarized as follows:

  • The filesystem people started out very much against `PF_MEMALLOC_NORECLAIM`, but after further discussion about what the actual impact would be, and some looking into actual GFP_NOFAIL usage and error paths, they seemed to be coming around.

  • But the mm maintainer pushing the revert wasn't having any of it, saying essentially that the matter had already been decided (in backroom discussions, apparently) and wasn't engaging in technical discussion otherwise; when asked for the rationale from those discussions, he provided a spreadsheet with what had been decided but not why.

Things really went off the rails (and I lost my cool, and earned the ire of the CoC committee) when the mm maintainer opined that he wanted to just kill processes that used GFP_NOFAIL incorrectly, rather than have them return to a missing error path.

This would be really bad, because as previously noted, many GFP_NOFAIL uses (the ones that matter, presumably) do have error paths, and a bad GFP_NOFAIL allocation where the size of the allocation is something userspace can control is a real possibility.

Given that GFP_NOFAIL is used in places where we can't unwind, and the reason we can't unwind is almost always because we're holding locks, if you just kill the process those locks will never be released and the system grinds to a halt.

And that's how you introduce a CVE.

And given how "you must handle errors" is beaten into every new kernel developer, this is pretty basic; it's really quite concerning to have a senior maintainer pushing this point of view.

One final note.

Please, please please, if you're someone outside the community don't go looking up the mm maintainer in question on my account; I don't want anyone getting hate mail, and keep in mind you've only heard my side of the story, not his.

And I don't hold that any of this was malicious or meant personally. Many people, myself included, have gotten locked into technical arguments and so focused on winning and advancing our point of view that we forget to take a step back and look at the wider picture.

It happens! We're all only human, after all.

It's not like there's any guide book on "how to be a good senior engineer" out there - perhaps someone should write one? There's a real learning curve to learning how to give guidance and lead teams without being overly demanding or pushy; recognizing that you have experience to share and direction to give, while remembering that you don't know everything and you need to give other engineers room to explore their own ideas.

The aftermath:

I got emails from multiple people, including from Linus, to the effect of "trust me, you don't want to be known as an asshole - you should probably send him an apology". You'll find what I sent, and the response, at the bottom, now that the rest of this is public.

Linus is a genuinely good guy: I know a lot of people reading this will have also seen our pull request arguments, so I specifically wanted to say that here: I think he and I do get under each other's skin, but those arguments are the kind of arguments you get between people who care deeply about their work and simply have different perspectives on the situation. Those arguments are good arguments to have, because there's always common ground and a way to move forward, as well as things to learn. Even if they have gotten a touch dramatic.

The emails I got (there were several) also all made ominous mentions of the CoC committee - you'd almost think they were talking about the boogeyman. The CoC's approach is that if something comes to their attention - determined by anonymous complaints and private proceedings - they'll demand that someone make a public apology, "or else".

But, my response was to say "no" to a public apology, for a variety of reasons: because this was the result of an ongoing situation that had now impacted two different teams and projects, and I think that issue needs attention - and I think there's broader issues at stake here, regarding the CoC board.

But mostly, because that kind of thing feels like it ought to be kept personal.

Interactions with the CoC:

To start with, I was approached at Plumbers by one notable CoC member and stable kernel maintainer, and in that conversation, while pressuring me to follow the CoC's "process", he:

  • Spoke quite a bit about how this was important for our community's "image"

  • Made repeated mentions about how it would be a "shame if I wasn't around anymore"

It's hard to read the talk of "image" as anything other than perhaps "corporate friendly", and I bring that up because that is another sentiment I've heard - at Plumbers from another high level kernel maintainer, and elsewhere - that Linux is "for the big tech companies now", in those exact words.

I can't agree. Linux got started without the help of the big tech companies - I was getting my education as an engineer from the mailing lists 25 years ago - and tech companies come and go, it will outlive them. They're just visiting.

Our duties are to the community, to our users, to fostering and preserving a working engineering culture that the world looks to and relies on.

And, needless to say, threatening someone's career to get them to comply is not a great approach.

And it didn't help matters any that before our "talk", in casual conversation with others right outside the conference, the very same CoC member managed to call every single filesystem community member who came up by name an asshole. Needless to say, such conduct is not the norm at conferences, and is no more acceptable there than on the lists. Rules for me and not for thee?

And more than that: I was there for every situation he was referring to, it was uncalled for, and those people are heroes for the work they've done.

Matthew Wilcox is an absolute legend for the work he's done on folios. He's gone above and beyond, and done a far better job of it than anyone else (including myself) could have done, and I saw what he had to push through to get it in - it was a shitshow. Pre folios, 4k page overhead was absolutely killing us: the buffered IO paths were not remotely able to keep up with modern NVMe devices, and that work was absolutely necessary for Linux's continued relevance in filesystem land.

Ted Ts'o was instrumental in making sure Linux had a filesystem that users could depend on. Whatever I may say elsewhere about the ext3/4 codebase, ext3 was a triumph of pragmatism at the time, and Ted has always kept the focus on reliability and robustness: e2fsck is as rock solid as it is largely thanks to him. He's done a ton of other behind-the-scenes maintaining of random subsystems that isn't as well known as it should be.

Wedson Almeida Filho is one of the sharpest and nicest engineers I've met, and he bent over backwards trying to get the Rust filesystem interfaces in; I do not blame him for saying "enough" after getting yanked around for a year on that. That was a loss we should all feel, and it really should be cause to reflect and ask ourselves "how can we better shepherd important work in, and not drive away the people doing brilliant work?".

This may come as a surprise to people who just see the spicy drama on the list, but the filesystem community is fairly close knit; those are all people I work with and respect. Dave and I yell at each other all the time when we're having technical discussions, but you wouldn't know that to see us in person. Filesystem work is a stressful job.

We need leaders that can lead by example, and what I saw at Plumbers was not that.

Later interactions over email became even more absurd, with Shuah at first talking about having a conversation, then later making it clear that conversation would only be about getting me to write my public apology, with zero room for discussing anything else - then proceeding to turn into more and more of a broken record (did I break his brain?)

Shuah: "Ok, but you'll have to take this up with the community"

Me: "Yes, I think I will have to"

Shuah: "Ok, but I don't think you'll get support"

Me: "Ok, I'm writing it up"

Linus: "Uhh, you have an open CoC issue..."

Me, to CoC: "Ok, here's what I'm getting ready to post"

Shuah: "...Could you please just say publically that you worked this out privately?"

Me: "...I suppose that would be acceptable"

Shuah: "Ok, but I still want the public apology"

Me: "Err, are you going to stick to what you said last night?"

To compound this, Shuah is the person who both authored and is implementing our policies, and defends them on the list - but when implementing them claims to be "only following the process" with no room for deviation, and conversation on any other subject seems to be out of bounds.

To author and implement policy, and then claim to be hamstrung by it, is quite the act of doublespeak.

How things could be different:

Being too heavy handed is bad because it discourages people from engaging, and it encourages a culture of dismissiveness - because that's safer than engaging in a discussion that may turn into a heated argument.

This has had real effects; I've had it repeatedly brought to me by corporate entities now funding bcachefs that it's gotten significantly harder to get work done on the lists, and that the only way seems to be to show up at conferences - which is expensive and time consuming and not available to everyone. I don't want to see Linux turn into a club for the chosen few, I want us to stay a place where anyone can participate and get work done.

More than that, I've found that when intelligent engineers are stuck at loggerheads, frustrated and butting heads, there's usually something technically interesting and it's worth getting to the bottom of, and it really helps if a neutral third party can take an interest and figure out what's going on. When people get frustrated, they get tunnel vision, and they often forget the wider picture: there's usually plenty of common ground and the details might not be as important as they were thought to be.

Simply telling people to knock it off is really the wrong approach when this happens: it's our literal job, as engineers, to solve the hard problems, not avoid them.

This isn't theoretical to me, I've had real involvement with these issues in the past.

Back during the process of merging folios, the first pull request was nacked by an mm developer, and things got quite heated. More than that, the issue was dividing the filesystem and mm communities: I would go to a meeting of filesystem people, and everyone was looking glum, pissed off at the mm people, saying "I don't know what the mm people are on about, we're fully behind Matthew's work and what he's saying." And then I'd go to a meeting with the mm people, and it was the exact same thing: "We're pissed off at the filesystem people, this is crazy, Johannes represents us."

As I knew both of the engineers who were the public face of the dispute, I jumped in, and it wasn't easy considering neither of them were even speaking to each other at that point. It took two full weeks, on calls with one and then the other, talking and listening and exploring their ideas to get it resolved. A rather stressful two weeks; I was not so used to doing something so highly visible then.

It turned out they'd both independently come up with nearly identical long term visions for memory allocator internals in the kernel - specifically, what to do with struct page - and the immediate issue was entirely inconsequential in the long run. That long term vision was memdescs, which is still the long term plan that we're slowly working towards: drastically reducing struct page overhead by reducing the page descriptor down to 16 bytes and separately allocating the structs that describe entire groups of pages (folios in some cases, but really working towards proper types and a type hierarchy for different use cases). Once we had a design doc where they could see both their ideas represented, the rest was easy.

So, this dispute with the CoC takes on a personal element for me, as someone who is community minded and takes pride in my work - and hates to see the same work done badly. I'd like a better process that isn't so heavy handed for dealing with situations where tensions rise and communications break down.

As for that process: just talk to people.

It's amazing what you can learn and what you can accomplish by just talking with people, and listening, with an open mind. People's frustrations are worth listening to and worth addressing, and sometimes there's complex situations that need to be addressed.

Maintainers can indeed be too demanding; it's difficult coordinating people with their own interests, who often come to the project just wanting to land their feature, when the long term needs of the codebase need to be considered and resources devoted. Better communication can help: "I can take this, but to balance it out we also need to consider x: could you take on some more of this too?". If you tell people the goals and give them a relatively free hand, they'll often be glad to help out.

Sometimes newer people get frustrated with the sheer amount of process we have and it can help to have someone disinterested gently tell them: "You're on the right track, this is nothing atypical, stick with it and show you can take direction and your patches will get in".

Usually it's nothing so dramatic as the situation with folios, and just having someone show up with a helpful attitude, instead of just as an enforcer, can really bring out the best in people.

The CoC response, and others:

It seems I provoked a response, just not the one I was hoping for. Previously, it was commonly understood that the CoC's response would be to eject you from Linux Foundation conferences. That's changed. After my email discussion with the CoC, some new patches showed up on the mailing list and went in, outlining the new response.

https://lore.kernel.org/lkml/20241114205649.44179-1-skhan@linuxfoundation.org/

Now, it's a full ban from participating in any way. It seems some people must have seen what recently happened in the Python community and read it not as a cautionary tale, but instead said "Hey, why don't we do that too!".

One of the most substantive comments in the email thread for the new policy was from someone in the freedesktop community who felt the new approach was quite heavy handed, as well:

https://lore.kernel.org/linux-doc/ZzJkAJEjKidV8Fiz@phenom.ffwll.local/

To summarize what recently happened in the Python community:

  • Someone uploaded a package named "slut"

  • Their CoC-equivalent immediately removed it, bypassing all normal process

  • And then a long standing and well liked developer of many years was banned, fully (even Guido van Rossum wasn't allowed to speak with him!) for three months, for having the temerity to argue with the wrong people.

Regarding the Python situation, my personal opinion is that CoCs for interactions between developers are one thing, but when providing a common platform - i.e., the python package repository - censorship becomes a more questionable thing, and bypassing process to remove it was a clear overreach; and I find the ban of the developer in question appalling.

But the broader points here are about transparency and accountability in our power structures, and how those power structures affect our community.

There's been zero transparency or public announcement from the CoC on this matter - simply a private note from Linus that per the CoC my pull request wouldn't be going in. (He claimed not to know anything about the matter in question; it seems this is all on them.) Is a full ban from the mailing list next?

And: we're a community. We're not interchangeable cogs to be kicked out and replaced when someone is "causing a problem", we should be watching out for each other. I don't care for big tech - I got out of that world many years ago, and now I write code because I enjoy it, and because I have a wonderful community. I don't enjoy having that mentality show up here.

Personal note, re: bcachefs:

The bcachefs community really has been great. I get thank yous with my bug reports! The people I work with every day on IRC, who have been dedicated to testing my code and giving me good feedback - I couldn't do this without them, and seeing more people show up with actual patches is amazing.

The one thing that (mildly :) concerns me, particularly from the people actually working on my code, is that there haven't been any complaints - not one! And I know my code isn't perfect. I like to hear the complaints, when they're well founded: people always have frustrations, and I'd like to know what they are so I can try to do something about them.

So:

The first person who has a good rant about my code (and it has to be a good one - a good rant has real information content and perhaps something to teach) will get a plate of cookies, hand baked by me and mailed to wherever you are. A good rant is priceless! (Just don't make a habit out of it, it burns people out when you do it too much :)

Also, I'm going to keep writing code no matter what. Actually getting the code may become more of a hassle, but people who want to keep running bcachefs will always be able to (that's the beauty of open source: we can always fork), and I will keep supporting my users.

Some broader perspectives:

Anthropologists study, among other things, the genesis and evolution of power structures in human societies, and I think that's relevant here.

Almost universally, any time there's a power vacuum (whether in the first larger-than-tribes societies, or in societies where state power has become weak), the first authority figures that fill the power vacuum are dicks - overly violent, because that's how they come to power. Even from the start they do generally serve useful functions (one example anthropologists use is the Mafia in New York in the 70s: besides their more well known occupations, they were also making sure the garbage was hauled away). It takes time to "domesticate" authority figures, to teach them to be accountable and responsible, but it generally happens.

I wonder if that explains some of what we've been seeing in free software communities as a whole, as CoC boards have been springing up, wielding real power, in ways that feel quite uncomfortable. I don't think there's any going back to the way things used to be, but perhaps with some awareness we can get this pointed in a better direction.

Another point I was raising with the CoC is that a culture of dismissiveness, of finding ways to avoid the technical discussions we're supposed to be having, really is toxic - more so than mere flamewars.

Couples therapists say they can tell within a few minutes if a couple is worth working with or not: if it's anger they're displaying, then that's something that can be worked through. If it's dismissiveness, all hope is lost.

It's a good thought for engineers to have as well: we really do need to be engaging properly with each other in order to do our work well.

Vannevar Bush (one of the most accomplished engineers of the 20th century) said that all he did was get the army and the navy to talk to each other.

Food for thought.