Last year, Linux’s source code came to 27.8 million lines of code. It has only gotten bigger since then. Like any 30-year-old software project, Linux has picked up its fair share of cruft over the years. Now, after months of work, senior Linux kernel developer Ingo Molnar is releasing his first attempt to clean it up at a fundamental level with his “Fast Kernel Headers” project.
The object? Nothing less than a comprehensive cleanup and rework of the Linux kernel’s header hierarchy and header dependencies. Linux contains many header (.h) files. To be exact, there are around 10,000 main .h headers in the Linux kernel’s include/ and arch/*/include/ hierarchies. As Molnar explained, “Over the past 30 years they have grown into a complicated and painful set of cross-dependencies that we affectionately call ‘Dependency Hell.’”
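To make the cross-dependency problem concrete, here is a minimal sketch that counts how many headers a single include pulls in once indirect includes are followed. The header names and the include graph are hypothetical, invented purely for illustration; they are not taken from the kernel tree.

```python
# Hypothetical include graph: each header maps to the headers it includes
# directly. Names are made up for illustration, not real kernel headers.
INCLUDES = {
    "driver.h": ["task.h"],
    "task.h": ["memory.h", "locks.h", "lists.h"],
    "memory.h": ["locks.h", "lists.h", "basic_types.h"],
    "locks.h": ["basic_types.h"],
    "lists.h": ["basic_types.h"],
    "basic_types.h": [],
}


def transitive_includes(header, graph):
    """Return every header pulled in, directly or indirectly, by `header`."""
    seen = set()
    stack = list(graph.get(header, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against include cycles
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


pulled_in = transitive_includes("driver.h", INCLUDES)
print(len(pulled_in))  # prints 5, although driver.h names only one header
```

In this toy graph, a file that includes only driver.h still makes the compiler process five further headers. Spread across thousands of headers, this is the kind of fan-out the cleanup is aimed at.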
To bring rhyme and reason to all of this, Molnar proposes 2,200 commits to the code. That’s a lot of commits! Why so many? Well, Molnar continued, it turns out there was a lot more mess in this code than he expected when he started his cleanup project at the end of 2020. As he put it:
“When I started this project in late 2020, I expected there to be maybe 50-100 patches. I made some rough measurements which suggested that about a 20% improvement in build speed could be achieved by reducing header dependencies, without having a substantial runtime effect on the kernel. Seemed substantial enough to warrant 50-100 commits.
“But as the number of patches increased, I saw only limited performance increases. By mid-2021 I hit over 500 commits in this tree and had to abandon my second attempt (!). The first two approaches just didn’t scale, weren’t maintainable and barely offered a 4% build speedup, not worth the churn of 500 patches and not even worth announcing.
“With the third attempt, I introduced the per_task() machinery, which provided the flexibility to greatly reduce dependencies, and it was a type-clean approach which improved maintainability. But even at 1,000 commits, I barely got a 10% build speed improvement. Again, this wasn’t something I felt comfortable pushing upstream or even announcing. :-/
“But the figures were quite clear: performance gains of 20% were quite possible. So I continued to develop this tree, and most of the speedup started arriving after over 1,500 commits, in the fall of 2021. I was very surprised when it went beyond a 20% speedup and then reached the current 78% with my reference configuration. There is a clear superlinear improvement property of kernel build overhead, once the number of dependencies is kept to a minimum.”
So today, his cleaned-up fast-headers tree provides a +50-80% improvement in absolute kernel build performance on supported architectures, depending on the configuration. This is a major step forward for the efficiency and build performance of the Linux kernel.
A 50-80% improvement is well worth the time and effort. These savings come from reducing the size of the default headers, which in the fast-headers tree mostly contain type definitions, by one to two orders of magnitude.
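It can help to translate these percentages into wall-clock terms. The sketch below assumes the figures describe build throughput (kernels built per unit of time) rather than a direct cut in elapsed time; the article does not say which, so treat that reading, and the function name, as assumptions.

```python
def elapsed_time_fraction(throughput_gain_percent):
    """Fraction of the original build time that remains after a throughput gain.

    Assumes the gain is a throughput improvement: building 78% more kernels
    per hour means each build takes 1 / 1.78 of the time it used to.
    """
    return 1.0 / (1.0 + throughput_gain_percent / 100.0)


for gain in (50, 78, 80):
    remaining = elapsed_time_fraction(gain)
    print(f"+{gain}% throughput: build takes {remaining:.0%} of the original time")
```

Under that assumption, a +78% gain corresponds to a build that takes roughly 56% of the original time, about a 44% reduction in waiting rather than a 78% one.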
But wait, those 2,200 commits are just the tip of the iceberg. These changes will touch almost all of the Linux kernel’s code. Overall, Molnar estimates that “in addition to the aforementioned 25 subtrees and 2,200 commits, the fast header tree modifies over half of all existing kernel source files.” This will change 25,288 files with 178,024 insertions and 74,720 deletions. In Molnar’s words, “Yeah, so this is probably the largest single feature announcement in LKML’s [Linux Kernel Mailing List] history. Not by choice! :-/”
On top of that, making these changes feasible will require aggressive decoupling of the high-level headers; type and API header decoupling; automated addition of dependencies to .h and .c files; and header optimization. This will not be easy. So before pulling the trigger on these changes, Molnar is gathering feedback from his fellow maintainers. In particular, he would like to hear from “Linus [Torvalds] & Andrew [Morton] and other maintainers of the larger subsystems affected by these changes.”
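The idea behind separating type definitions from API headers can be sketched with a toy cost model: a source file that only needs a struct definition should not pay for everything the full API header drags in. All header names and line counts below are invented for illustration and say nothing about actual kernel numbers.

```python
# Hypothetical headers with made-up line counts (not kernel data).
LINES = {
    "task.h": 2000,        # combined header: types plus inline API
    "task_types.h": 300,   # after the split: struct definitions only
    "task_api.h": 1700,    # after the split: inline functions and macros
    "memory.h": 1500,
    "locks.h": 800,
    "basic_types.h": 100,
}

# Include graph before the split: one big header pulls in everything.
BEFORE = {
    "task.h": ["memory.h", "locks.h", "basic_types.h"],
    "memory.h": ["locks.h", "basic_types.h"],
    "locks.h": ["basic_types.h"],
    "basic_types.h": [],
}

# Include graph after the split: the types header needs only basic types.
AFTER = {
    "task_types.h": ["basic_types.h"],
    "task_api.h": ["task_types.h", "memory.h", "locks.h"],
    "memory.h": ["locks.h", "basic_types.h"],
    "locks.h": ["basic_types.h"],
    "basic_types.h": [],
}


def preprocessed_lines(header, graph, lines):
    """Total lines the compiler sees when a source file includes `header`."""
    seen = set()

    def visit(name):
        if name in seen:
            return 0  # include guards: each header is counted once
        seen.add(name)
        return lines[name] + sum(visit(dep) for dep in graph.get(name, []))

    return visit(header)


before = preprocessed_lines("task.h", BEFORE, LINES)
after = preprocessed_lines("task_types.h", AFTER, LINES)
print(before, after)  # prints 4400 400
```

In this made-up example, a file that only needs the type goes from 4,400 preprocessed lines to 400, while a file that genuinely needs the API still pays the full 4,400 through task_api.h. The gain comes entirely from the files that never needed the API in the first place.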
Greg Kroah-Hartman, the maintainer of the Linux stable branch, asked, “This is ‘interesting’, but how are you going to keep the kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task definition in sync?” In short, who is going to bell the cat and keep all these changes maintained?
Molnar replied that he is willing to take on this job and doesn’t think it will be much trouble. Kroah-Hartman then gave his blessing to Molnar’s efforts and remarked, “I’ll leave all of that to the scheduler developers, but it still feels odd to me. The mess we create by trying to work around problems in C :)”
He’s not wrong. This is one of the reasons why efforts are underway to make Rust Linux’s second language.
If the changes are accepted, users won’t see any real difference. But developers of the Linux kernel and of Linux distributions will be able to compile Linux faster than ever. That should make it easier and quicker to improve, fix, and add functionality to Linux.