
HOWTO Write Code That Doesn't Suck

Using a high-level language? Does your code suck?
Learn what's going on under the hood and you will code like a superhero.

Michael David Crawford


Monday, July 9, 2012

Copyright © 2012 Michael David Crawford. All Rights Reserved.

On Sun, Jul 8, 2012 at 11:06 AM, Nick Rogers <roger_s1@mac.com> posted to cocoa-dev@lists.apple.com:
> I have seen that even when this condition is not met and no memory
> is allocated, the VM keeps growing steadily as seen in Activity Monitor.
> But the app is GC enabled (required), so the VM size should reset to
> some level every few seconds, but doesn't.

Quite likely the single most successful scam that anyone ever pulled on the public - not just on us coders - was when the Public Relations people at Sun Microsystems managed to sell the idea that Garbage Collection prevents memory leaks.

Garbage Collection does nothing of the sort.

Neither does Reference Counting, Automatic or otherwise.

It happens that even as we speak I am regressing a massive leak in Firefox that runs my box completely out of swap space. It doesn't just make my Mac unresponsive; eventually my entire filesystem fills up, with the result that none of my processes are able to allocate any more virtual memory, so most of them hang.

Neither killing a few processes, quitting the ones that aren't hung, nor closing windows will bring anything back to life. Once OS X says that *any* of my Apps are "Paused", I have no choice but to kill their processes.

I'm not really sure yet, as I have dozens of tabs open, but my guess is that the culprit is some totally half-baked 1990s JavaScript that Firefox is "too modern" to survive.

If you hold a reference to an object you don't really need, you are preventing it from being Garbage Collected, and so you are either leaking memory or at least consuming memory that you don't really require.
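The same trap exists with plain reference counting. Below is a minimal sketch in C with a hand-rolled retain/release (the names `rc_new`, `remember` and so on are hypothetical and do not come from any real framework). As long as a forgotten long-lived pointer, here a one-slot "just in case" cache, still holds a reference, the count never reaches zero, so the object is never freed. Whether the memory manager counts references or traces them makes no difference.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object, for illustration only. */
typedef struct {
    int refcount;
    char payload[4096];          /* stands in for a big image, document, etc. */
} rc_obj;

static int live_objects = 0;     /* how many rc_obj are currently allocated */
static rc_obj *cache_slot = NULL;/* a long-lived reference nobody remembers to clear */

rc_obj *rc_new(void) {
    rc_obj *o = calloc(1, sizeof *o);
    if (o) { o->refcount = 1; live_objects++; }
    return o;
}

rc_obj *rc_retain(rc_obj *o) {
    if (o) o->refcount++;
    return o;
}

void rc_release(rc_obj *o) {
    if (o && --o->refcount == 0) { free(o); live_objects--; }
}

/* Keep a retained reference "in case we need it again". Retain the new
   object before releasing the old one, so storing the same object twice is safe. */
void remember(rc_obj *o) {
    rc_obj *old = cache_slot;
    cache_slot = rc_retain(o);
    rc_release(old);
}

/* Dropping the cached reference is what finally lets the object go. */
void forget(void) {
    rc_release(cache_slot);
    cache_slot = NULL;
}

int live_count(void) { return live_objects; }
```

After the caller releases its own reference, `live_count()` still reports the object as alive until `forget()` runs. That is a leak in every practical sense, even though no pointer has been lost.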

I do not wish to denigrate the incredible work of Apple's tools people who created Automatic Reference Counting, but we all know that computers don't do what we want them to do; computers do what we tell them to do.

The Objective-C compiler implements Automatic Reference Counting by figuring out, completely on its own, when it can safely decrement a reference count, and then doing so at its very first opportunity. But ARC might not know that releasing your object is safe until quite a long time after you stop needing it; the best that ARC can do is decrement reference counts when it can be certain that doing so won't cause a double deallocation.

If you could rewrite your source in such a way that ARC would decrement your count earlier than it presently does, then your code as it stands now suffers from poor memory management. There is absolutely nothing that ARC can do to help you; you must instead learn to take responsibility for managing your memory yourself.

I don't claim you must stop using ARC. What I am telling you is to help ARC by enabling it to decrement reference counts sooner than it otherwise would with naive implementations.
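In manual terms, "helping ARC" means shortening the time each reference is held. The sketch below is not ARC and not Objective-C. It uses hand-written retain/release in C to compare peak memory in two cases: every temporary is kept until the end of the work, or each one is released right after use. The type and function names are hypothetical. In Objective-C the analogous moves are narrowing variable scopes, setting strong references to nil once they are no longer needed, and wrapping loop bodies in `@autoreleasepool`.

```c
#include <stdlib.h>

/* Hypothetical reference-counted temporary; we track how many are alive at once. */
typedef struct { int refcount; double data[512]; } temp_obj;

static int live = 0, peak = 0;

static temp_obj *temp_new(void) {
    temp_obj *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->refcount = 1;
    if (++live > peak) peak = live;   /* record the high-water mark */
    return t;
}

static void temp_release(temp_obj *t) {
    if (t && --t->refcount == 0) { free(t); live--; }
}

/* Naive: every temporary stays referenced until all the work is done, the way
   an outer-scope variable or collection keeps objects alive. Returns -1 if the
   bookkeeping array cannot be allocated. */
int peak_when_released_late(int n) {
    live = peak = 0;
    temp_obj **all = malloc((size_t)n * sizeof *all);
    if (!all) return -1;
    for (int i = 0; i < n; i++) all[i] = temp_new();
    for (int i = 0; i < n; i++) temp_release(all[i]);
    free(all);
    return peak;
}

/* Better: drop each temporary as soon as it has been used. */
int peak_when_released_early(int n) {
    live = peak = 0;
    for (int i = 0; i < n; i++) {
        temp_obj *t = temp_new();
        /* ... use t here ... */
        temp_release(t);
    }
    return peak;
}
```

Both functions do the same work and free everything in the end. With n = 100 temporaries, however, the first one needs room for 100 of them at once, while the second never needs more than one.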

If Garbage Collection is so cool, why did I have to advise ClickRebates to reboot their very expensive Solaris UltraSparc server every night? Otherwise their entire server would crash, as ClickRebates' Java Enhydra eCommerce application was running their box completely out of swap space.

If Garbage Collection is so cool, why did Intuitive Systems' Optimize It! Java memory leak detector sell like hotcakes, so much so that Borland paid metric boatloads to acquire it and then sell it as their own product?

I wrote a bunch of Smalltalk GUI code for an Electronic Medical Records database back in the day. A friend congratulated me on being offered the position at KnowMed Systems - later iKnowMed, so as to take advantage of the Dot-Com Boom :-/ - by saying "Smalltalk is the way Object-Oriented Programming should be".

You may be far, far more familiar with Smalltalk than you think you are: Objective-C is just C with Smalltalk bolted on, you see. Because Garbage Collection is so difficult to do well in C, Cocoa and Cocoa Touch's Foundation Framework, rather than Objective-C itself, have traditionally relied on Reference Counting instead.

KnowMed's development environment had this really cool feature that it could save its entire state whenever I pleased, so that more or less I'd Dump Core at the end of each workday, then the next morning I'd Reanimate The Undead Corpse Of My Integrated Development Environment, only to find all my text editors, debug sessions and so on right where I left them the night before.

But after my first week of truly productive work, I discovered my entire box had become completely unresponsive. It was just like writing code while swimming in Cold Molasses.

"Check in all your source," advised my boss, "Trash your saved session document, set up a fresh, new development environment, then finally check all your source back out."

Worked like a charm.

It was not so much that my own Smalltalk source was leaking so much memory, as I had not yet written that much source. It was that my whole development environment was not only leaking, but was saving snapshots of all the leaks to be deserialized back into RAM the very next day!

On Sun, Jul 8, 2012 at 11:06 AM, Nick Rogers <roger_s1@mac.com> posted to cocoa-dev@lists.apple.com:
> I have a auto release pool drained at the end of every iteration in
> a loop in some other method but that doesn't seem to have much effect.

The libraries that your own code depends on may be using other kinds of pools besides autorelease pools, and draining your autorelease pool never touches those.
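To see why draining one pool may not help, consider a generic arena or "bump pointer" pool, sketched here in C. Individual allocations are never freed; memory comes back only when the whole pool is reset. If a library allocates from a pool of its own and resets it only at some point your loop never reaches, its footprint keeps growing no matter how often you drain your autorelease pool. The `pool_*` names are hypothetical and are not any particular library's API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump-pointer pool: allocations are carved off one big block
   and released only all at once, by pool_reset(). */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} pool;

int pool_init(pool *p, size_t capacity) {
    p->base = malloc(capacity);
    p->capacity = p->base ? capacity : 0;
    p->used = 0;
    return p->base != NULL;
}

void *pool_alloc(pool *p, size_t n) {
    size_t aligned = (n + 15) & ~(size_t)15;        /* round up to 16 bytes */
    if (aligned > p->capacity - p->used) return NULL; /* pool exhausted */
    void *ptr = p->base + p->used;
    p->used += aligned;                             /* nothing is ever given back here */
    return ptr;
}

/* "Draining" the pool: every allocation made from it becomes free at once. */
void pool_reset(pool *p) { p->used = 0; }

void pool_destroy(pool *p) {
    free(p->base);
    p->base = NULL;
    p->capacity = p->used = 0;
}
```

Each 20-byte request is rounded up to 32 bytes, so ten of them leave 320 bytes in use until someone calls `pool_reset()`.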

Is your application for iOS or Mac OS X? If it's for Mac OS X, you can diagnose your leak with Valgrind.

Because Valgrind also validates library and system calls, it does not yet support iOS, not even in the Simulator. There's no reason it couldn't be made to do so, but someone would have to do all the heavy lifting of validating all those APIs.

(In that regard Valgrind is much like the old Classic Mac OS Trap Discipline - which was in development for aeons without ever actually shipping - as well as Onyx Technology's QC, which validated 680x0 A-Traps.)

The easiest way to install Valgrind on OS X is to install its MacPorts package.

After installing MacPorts, execute:

   $ sudo port selfupdate
   $ sudo port upgrade outdated
   $ sudo port install valgrind

To run TextEdit under valgrind, one would execute the following:

   $ valgrind /Applications/TextEdit.app/Contents/MacOS/TextEdit

If your code runs on iOS, perhaps you could port just the relevant components to OS X. It wouldn't even have to be a Cocoa App; an Objective-C command-line program with no UI that exhibited the leak would do just fine.
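Such a command-line port can also check itself, separately from Valgrind. Drive the suspect component in a loop and watch whether the number of outstanding allocations grows with the iteration count. The sketch is plain C rather than Objective-C. `counted_malloc`, `counted_free` and `component_step` are hypothetical stand-ins, and `component_step` leaks on purpose when asked to. If you add a `main()` that calls `outstanding_after()`, you can also run the same binary as `valgrind --leak-check=full ./harness`, and Memcheck will report each leaked block along with the stack that allocated it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping: blocks allocated but not yet freed. */
static long outstanding_blocks = 0;

static void *counted_malloc(size_t n) {
    void *p = malloc(n);
    if (p) outstanding_blocks++;
    return p;
}

static void counted_free(void *p) {
    if (p) { outstanding_blocks--; free(p); }
}

/* Hypothetical component under test: it makes a scratch copy and, like the
   bug being hunted, forgets to free it when `leaky` is nonzero. */
static void component_step(int leaky) {
    char *copy = counted_malloc(64);
    if (!copy) return;
    strcpy(copy, "payload");
    if (!leaky) counted_free(copy);
}

/* Run the component many times; a leak shows up as a count that grows
   in proportion to the number of iterations. */
long outstanding_after(int iterations, int leaky) {
    outstanding_blocks = 0;   /* count only this run's allocations */
    for (int i = 0; i < iterations; i++) component_step(leaky);
    return outstanding_blocks;
}
```

A well-behaved component returns 0 however many iterations you run. A leaky one returns a number that scales with the loop count, which is the same steady growth Nick is seeing in Activity Monitor.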

It is for just this reason that I do my very, very best to keep the core parts of my own code completely portable to PowerPC on Mac OS 9. That's so I can use Onyx Technology's Spotlight to diagnose memory leaks, buffer overruns and buffer underruns.

Spotlight works by patching Classic Mac OS Code Fragment Manager PowerPC binaries so that every memory reference is validated. Despite my copy of Spotlight 1.0 having been purchased in 2000, Spotlight still puts Valgrind completely to shame, but only to the extent that I can get my code to work under Classic PowerPC CFM.

I don't recall just now whether Spotlight works under Classic, but I don't see why it wouldn't. Maybe you could turn up a license on eBay.

On Refactoring C++ Code relates my experience with Spotlight on quite a large combination vector/bitmap graphic editor.

While that piece is specific to C++, it applies to Objective-C in a conceptual way.

I was a "White Badge" Senior Engineer on Apple's Traditional OS Integration Team in 1995 and 1996. "The Team Formerly Known As The Blue Meanies" was tasked with assembling all the disparate parts of the Classic Mac OS into a single build that would result in a single installation CD image. After we'd worked out the kinks in that build, we'd hand it off to Apple's Source Code Management, who would build and issue the CD images that were actually tested by Apple Software Quality Assurance, as well as the builds that were delivered to end-users.

I was one of the Debug Meisters. I eventually requested an internal transfer to PowerBooks because I realized that despite working like a Mad Man to fix the most arcane bugs for months on end, I never got to write any actual code. We Debug Meisters had so much on our plates that the very best we could do was narrow down just the subroutine where a bug lay, or maybe just the source file, then assign the bug to whoever owned that code.

But I did write some code that shipped in the Classic Mac OS - System 7.5.3 and later for sure; I don't recall just when I did this work, so my one contribution to Apple's codebase may have been in 7.5.2 as well:

An upcoming hardware product had a serious design flaw: Apple's hardware people had selected the original PowerPC 603 chip, whose eight kilobytes of L1 Cache were simply not enough to enable System 7.5.2 to run fast enough to be successful in the marketplace:

The 603 is notable due to its very low cost and power consumption. This was a deliberate design goal on Motorola's part, who used the 603 project to build the basic core for all future generations of PPC chips. Apple tried to use the 603 in a new laptop design but was unable to due to the small 8 KiB level 1 cache. The 68000 emulator in the Mac OS could not fit in 8 KiB and thus slowed the computer drastically. The 603e solved this problem by having a 16 KiB L1 cache which allowed the emulator to run efficiently.
-- PowerPC Implementations, from Wikipedia

Recall that at the time, while 7.5.2 and later Systems were mostly PowerPC-Native, there were many emulated 68k binaries in the System, not just GUI applications, but device drivers, interrupt handlers, exception handlers, all manner of stuff that was critical to the overall system performance.

Some cluebot made the mistake of not simulating the performance of a CPU with such a limited cache before Apple's management signed off on their choice of the part.

I served on a team that was given the mission of coming up with anything, anything at all that would enable that box to run fast enough that it would sell. While Apple's hardware people could have redesigned its motherboard to use a more-capable processor, the respin would have delayed the product's launch so much that we would have lost lots of market share to The Borg.

At first I did not even know how to proceed.

But eventually I figured out that, given our problem was that we did not have enough cache, I'd just pick out an important part of the Mac OS largely at random, investigate its cache utilization, then find some way to recode it so that it thrashed the cache less.

I figured the Resource Manager would be a good start. One's first GetResource would hit the filesystem; HFS' source code was far beyond my limited mental capacity, so I considered just the case of calling GetResource for the second and subsequent times, *after* the payload data was in main memory.

It did not take long to find all manner of ways to optimize that code path:

It was very painful, tedious and labor-intensive for Apple to port the Mac OS from 68k to PowerPC, because most of the original Mac OS was written in tightly hand-optimized 680x0 Assembly Code! Just to get the Native Mac OS to work at all, the first few revisions consisted of native-compiled C source that was a direct, literal translation of the 68k Assembly Source.

Even If That Literal Translation Was As Dumb As A Box Of Hair.

So I just re-coded a bunch of the Resource Manager so that it used much less of both the Data and Instruction caches.

Mac OS 9 was *so* much faster than the Mac OS 7.5.2 and 7.5.3 that I worked on, not just because it had so much less emulated code, but because so much more of the native code was written with sophistication, rather than implementing The Simplest Thing That Could Possibly Work.

Eventually a build of the Mac OS shipped with a release note that rather obliquely mentioned that the Resource Manager had been modified so as to make the entire Mac OS a little bit faster, but Apple never did say what exactly "Apple" - uh, Yours Truly - had done to make it faster.

The chances are pretty good that my patches are still present in Mountain Lion's Carbon support. OS X still supports Carbon, it just doesn't support 64-Bit Carbon GUI. There are portions of Carbon that are 64-Bit though. I don't know whether the Resource Manager is 64-Bit, but for sure 32-Bit Resources are still in there.

I had such success with this that I distributed a White Paper among Apple's developers that emphasized the importance of treating memory as an extremely limited, preciously expensive resource. The proper way to think of memory is not that one has dozens or even hundreds of megabytes, as one did at the time, but that one has just 512 Cache Lines of 32 Bytes each: the 16 KB data cache of the PowerPC 604 in the Power Macintosh 8500 and 9500.

If you read or write so much as one single byte of your memory space, you're going to use up at least 32 Bytes of cache (64 Bytes on some architectures), taking up cache that might have been put to more productive use by some other code or data. My White Paper included a diagram just like this:
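Below is a small simulation of what those 512 dots mean for the order in which you touch memory. It models a toy direct-mapped cache of 512 lines of 32 bytes and counts misses while reading every byte of a 256 x 256 byte array twice over: first row by row, then column by column. This is a teaching model under stated assumptions. The real 604 data cache is set-associative, and real hardware has prefetching, so actual miss counts would differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u   /* bytes per cache line, as on the PowerPC 604 */
#define NUM_LINES 512u  /* 512 lines x 32 bytes = 16 KB */

/* Toy direct-mapped cache: each slot remembers which memory line it holds. */
typedef struct {
    uint64_t line_in_slot[NUM_LINES];
    unsigned char valid[NUM_LINES];
    unsigned long misses;
} toy_cache;

/* Touch one byte; on a miss, its whole 32-byte line is (notionally) loaded,
   evicting whatever line previously occupied that slot. */
static void touch(toy_cache *c, uint64_t addr) {
    uint64_t line = addr / LINE_SIZE;
    size_t slot = (size_t)(line % NUM_LINES);
    if (!c->valid[slot] || c->line_in_slot[slot] != line) {
        c->misses++;
        c->valid[slot] = 1;
        c->line_in_slot[slot] = line;
    }
}

/* Read every byte of a rows x cols byte array, stored row by row, in row order. */
unsigned long misses_row_by_row(size_t rows, size_t cols) {
    toy_cache c;
    memset(&c, 0, sizeof c);
    for (size_t r = 0; r < rows; r++)
        for (size_t k = 0; k < cols; k++)
            touch(&c, (uint64_t)(r * cols + k));
    return c.misses;
}

/* Same array, same bytes, but visited column by column. */
unsigned long misses_column_by_column(size_t rows, size_t cols) {
    toy_cache c;
    memset(&c, 0, sizeof c);
    for (size_t k = 0; k < cols; k++)
        for (size_t r = 0; r < rows; r++)
            touch(&c, (uint64_t)(r * cols + k));
    return c.misses;
}
```

Both loops read exactly the same 65,536 bytes. Row order misses once per line, 2,048 times in all. In column order, with a 256-byte row pitch, four rows compete for every slot, so each line is evicted before the next column could reuse it and every one of the 65,536 reads misses.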

   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 

If you want your product to work well on modern hardware architectures, you must stop thinking about gigabytes of physical memory with hundreds of gigabytes of backing store.


You must think about all the machine code that you care about fitting within the limit of those five hundred twelve dots, and similarly for all the data you care about.
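Data layout is one concrete way to shrink your data's share of those dots. The sketch counts how many 32-byte lines you must pull in to read a single 4-byte field from 1,000 records in two layouts. In the first, the field is buried in a 64-byte struct (an "array of structs"). In the second, the hot field is kept in its own packed array. `struct particle` and `lines_touched` are hypothetical, and the count assumes the records start at a line boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: only x is read in the hot loop; the rest is cold. */
struct particle { float x, y, z; char name[52]; };   /* 64 bytes */

/* Count the distinct cache lines touched when reading a field of `field_size`
   bytes (>= 1) from each of `count` elements placed `stride` bytes apart,
   starting at address 0. Assumes stride >= field_size, so addresses only move
   forward and a line can only be shared with the previous element. */
size_t lines_touched(size_t count, size_t stride, size_t field_size, size_t line_size) {
    size_t total = 0;
    size_t last_counted = SIZE_MAX;
    for (size_t i = 0; i < count; i++) {
        size_t first = (i * stride) / line_size;
        size_t last = (i * stride + field_size - 1) / line_size;
        for (size_t l = first; l <= last; l++) {
            if (l != last_counted) {
                total++;
                last_counted = l;
            }
        }
    }
    return total;
}
```

Reading `x` from 1,000 `struct particle` records touches 1,000 lines, nearly twice what the whole 512-line cache holds. The same 1,000 values stored as a plain `float` array fit in 125 lines.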

Heed my advice, and your executable will accelerate by orders of magnitude.

It will run really well on much less capable and so cheaper hardware to boot, thereby opening your product up to markets you would not otherwise have been able to penetrate.

That's why my own iOS App is called Warp Life, as it is named after the DiLithium Crystal-powered Starship engines:

It's an implementation of the Cellular Automaton known as Conway's Game of Life. There are not one but two kinds of Life fans: there are the freaks like Paul Tooke who devote their entire lives to coming up with Patterns - what I refer to as "Animals" - such as his 3-engine Cordership

... itself built from three of Charles Corderman's Switch Engines.
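For readers who have never met the rules themselves, here is a textbook one-generation step of Conway's Game of Life in C. It is deliberately the naive version, with one byte per cell on a small fixed grid where cells beyond the edge count as dead. It is not Warp Life's code.

```c
#define ROWS 8
#define COLS 8

/* Count live neighbours of (r, c). Cells beyond the edge are treated as dead,
   which is exactly the kind of "edge effect" an unbounded Life grid avoids. */
static int live_neighbours(unsigned char g[ROWS][COLS], int r, int c) {
    int n = 0;
    for (int dr = -1; dr <= 1; dr++)
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0) continue;
            int rr = r + dr, cc = c + dc;
            if (rr >= 0 && rr < ROWS && cc >= 0 && cc < COLS) n += g[rr][cc];
        }
    return n;
}

/* One generation of Conway's rules (B3/S23): a dead cell with exactly three
   live neighbours is born; a live cell with two or three survives; all
   others die or stay dead. Writes the new generation into `next`. */
void life_step(unsigned char cur[ROWS][COLS], unsigned char next[ROWS][COLS]) {
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++) {
            int n = live_neighbours(cur, r, c);
            next[r][c] = (unsigned char)(n == 3 || (cur[r][c] && n == 2));
        }
}
```

A horizontal row of three live cells (the "blinker") turns vertical after one step and horizontal again after the next.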

I have not the first clue how those guys manage to think up stuff like that.

You see, I myself am the kind of freak who devotes his entire life to coming up with ways to make Life run faster, bigger and better. When I discovered Conway's Game of Life in 1972 at the age of eight in Martin Gardner's Mathematical Games column in the October 1970 Scientific American, I was quite a long way from writing my first computer program, so instead I played it on a checkerboard.

It's not at all obvious from Martin Gardner's First Post that Princeton University Mathematics Professor John Horton Conway's purpose in proposing The Game of Life at all was to contribute to Artificial Intelligence Research. Conway himself later proposed that there was a Turing-Complete Computer somewhere in the Life Universe. I think he did prove such an Animal existed, but was unable to produce a working example. Since then not one but two Turing-Complete Animals have been discovered:

Some people have far too much free time on their hands.

My hope with Warp Life is not to make millions with App Store sales, but to bring about The End of Time by causing The Singularity when my Algorithm is run on parallel-processing supercomputers. :-D

The very fastest known Algorithm for Conway's Game of Life and related Cellular Automata is known as Hash Life.

My Warp Life Algorithm - as opposed to my iOS App also called Warp Life - is conceptually similar to Hash Life, but designed specifically to run on hardware that caches both code and data:

The vast majority of "Textbook" Algorithms that are studied by the Computer Science community completely ignore the effect that poor memory access patterns have on cache utilization. While the N Log N Average Run Time of QuickSort is the best one can hope for if and only if all data and memory accesses are created equal, the straightforward implementation of QuickSort is quite a lot slower than what it would be if it were implemented in such a way as to maximize cache utilization, or equivalently, to minimize cache thrashing.
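The Warp Life Algorithm itself is not published in this article, so the following is not it. It is only a generic, well-known trick in the same spirit: store Life cells one bit each instead of one byte each, so that the same cache lines hold eight times as many cells. The names are hypothetical, and the grid is fixed at 64 x 64 so that each row fits in one 64-bit word.

```c
#include <stddef.h>
#include <stdint.h>

#define GRID_ROWS 64

/* Each row of 64 cells lives in a single 64-bit word: bit c is column c. */
typedef struct { uint64_t row[GRID_ROWS]; } packed_grid;

static inline int cell_get(const packed_grid *g, int r, int c) {
    return (int)((g->row[r] >> c) & 1u);
}

static inline void cell_set(packed_grid *g, int r, int c, int alive) {
    uint64_t bit = (uint64_t)1 << c;
    if (alive) g->row[r] |= bit;
    else       g->row[r] &= ~bit;
}

/* Storage for an n x n grid: one byte per cell versus one bit per cell. */
size_t grid_bytes_one_byte_per_cell(size_t n) { return n * n; }
size_t grid_bytes_one_bit_per_cell(size_t n)  { return (n * n + 7) / 8; }
```

A 64 x 64 grid shrinks from 4,096 bytes (128 lines of 32 bytes) to 512 bytes (16 lines). Bit packing also lets a whole row be shifted and combined with a few word-wide instructions instead of 64 separate byte loads.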

Puzzling over that question led to my discovery of what I refer to as The Holy Grail late on the night of July 27, 2006, when I was working out of the home where my ex-Wife Bonita Hatcher and I lived in Truro, Nova Scotia.

My Life-Altering Insight was at first an Algorithm, an associated File Format, a File Parser for reading that format, an Encoder for writing it as well as an Editor.

Such Algorithms, Formats, Parsers, Encoders and Editors have been in widespread use for decades, so I've been keeping a lid on what The Holy Grail really is lest a better-funded competitor get the drop on me.

My original plan was to Crush Bill Gates Like A Bug. But that's just not the kind of guy I am.

I enrolled at the California Institute of Technology in the Fall of 1982 with the aim of someday winning the Nobel Prize in Physics through original research in Optical Astronomy.

(Hi Jens! Greetings from an Old Scurve.)

Upon discovering that what I like about Astronomy is making telescopes, not looking through them, I was advised by my classmate Mike Roberts to major in Physics instead. Astronomers never make their own scopes, they always hire opticians for that, as Astronomical Telescope Mirror Fabrication is very specialized work. But every Physics experiment is quite different from every other one, so it is quite common for Physicists to design and build their own instruments for each individual experiment.

All of us Physics and Astronomy Majors referred to the Institute's Computer Science majors as "Prostitutes", as many of them aimed to sell their minds for money. And in fact during my time there, a Silicon Valley game startup called Imagic advertised in The California Tech offering good pay to CS Majors who dropped out of school to develop games instead.

While I'm not quite ready to tell you what The Holy Grail actually is, I will tell you that my key insight concerns Efficiency.

The Physical Definition of "Work" is the product of Force and the Distance moved by an object while that Force is impressed upon it. If a stone rests on the ground, then even though the Force of Gravity bears upon the stone and the Force of the stone's weight bears upon the ground, neither the stone nor the ground moves, and so no Work is performed.

But if that stone moves while Force is applied, it will accelerate, thereby gaining Kinetic Energy. Transferring Energy of any form to the Kinetic Energy of an object by applying Force as it moves is what constitutes Work.

Work is equivalent to Energy. Or rather, Energy is the capacity to perform Work.

The rate at which one either produces or consumes Energy is called Power.
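In symbols, for a constant Force acting along the direction of motion, the definitions above reduce to the following. The lifting example is a simple illustration added here, assuming g of about 9.8 m/s².

```latex
\begin{aligned}
W &= F\,d, \qquad P = \frac{dE}{dt} \approx \frac{W}{t} \\
W_{\text{stone at rest}} &= F \cdot 0 = 0 \\
W_{\text{lift 1 kg by 1 m}} &= m g d \approx (1\,\mathrm{kg})(9.8\,\mathrm{m/s^2})(1\,\mathrm{m}) = 9.8\,\mathrm{J} \\
P_{\text{same lift in 2 s}} &\approx \frac{9.8\,\mathrm{J}}{2\,\mathrm{s}} = 4.9\,\mathrm{W}
\end{aligned}
```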

There are many forms of Power. Of most concern not just to Hardware Engineers but to us Software Engineers as well is Electrical Power, not just the Electricity required to power our mobile devices, but also our servers:

Greenpeace and Apple recently got in a spat over Greenpeace's claim that Apple's new Data Center consumes One Hundred Megawatts of Electricity. Apple cried foul, then pointed out that in reality, it only consumes Twenty Megawatts.

My friends, if you're blowing Twenty Megawatts just to serve a website, you're doing something very, very wrong.

Steve Furber explains in ARM System-on-Chip Architecture that CMOS digital logic consumes very little power while in a fixed state; its power is consumed almost entirely by logic transitions.

Power is consumed when a conductor is switched from a low voltage state to a high voltage state - from Logical 0 to Logical 1 - as one has to push charge into that conductor. When that same conductor is switched back to low voltage - from Logical 1 to Logical 0 - its stored charge is drained away to ground and its energy is lost as heat, so the next time it goes from 0 to 1 it will have to be charged up all over again.
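The standard first-order model of this for CMOS logic is shown below. It is a textbook approximation, not a figure quoted from Furber's book. Here C is the switched capacitance, V the supply voltage, f the clock frequency, and α the activity factor, the fraction of clock cycles in which a node goes from 0 to 1.

```latex
\begin{aligned}
E_{\text{drawn per } 0 \to 1} &= C V^2 \quad \left(\tfrac{1}{2} C V^2 \text{ stored}, \ \tfrac{1}{2} C V^2 \text{ dissipated as heat}\right) \\
E_{\text{dissipated per } 1 \to 0} &= \tfrac{1}{2} C V^2 \\
P_{\text{dynamic}} &\approx \alpha\, C\, V^2 f
\end{aligned}
```

Software can't change C or V, but it does influence α. Fewer memory accesses, fewer cache misses and fewer bus transactions mean fewer bits flipping, and so less heat for the data center to carry away.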

Both kinds of transitions produce heat due to the non-zero resistance of the conductors. I was told by Luke Crawford, the proprietor of the Prgmr.com hosting service, that his single greatest cost center is neither bandwidth nor rackspace in his data center, but the power consumption of his servers.

Modern computers don't consume that much power in themselves. What is costly is to keep Luke's servers cool by transferring their waste heat out of the facility so they don't all melt into puddles of liquid solder.

Most storage accesses energize a powerful electromagnet to seek the disk drive heads. If you make your code smaller, make your data files smaller, use less memory so that less paging takes place, open fewer files or search fewer directories, then there will be fewer seeks.

The electrical consumption of data center servers scales much like the Launch to Orbit cost of spacecraft: if you add just a little bit of "weight" in the form of server power consumption, that server's cooling fans will have to spin faster and more often, while the data center's air conditioners will have to work harder.

If much of your code is inefficiently written, or your servers improperly configured, the data center must pay more for higher capacity thermal solutions. It must also pay the power company not just for more electricity, but also for higher-capacity cabling from the electrical grid, larger transformers, and larger backup generators in the event of an outage.

The power company in turn must then spend more money to build more generating plants, by damming more rivers and by producing more radioactive waste that we must spend more to store and to guard so that future generations don't get Teh Cancer. We must also pay more for oil, which the countries we buy it from must pay more to drill out of the ground, thereby burning through their own children's inheritances by depleting their national oil reserves much faster.

Not just yours and mine, but all of Humanity's failure to consider what happens when you move a charged particle by applying Force to it ultimately led to the following specific example of what I refer to metaphorically as:

The Mental Software Problem

The Mental Software Problem can be understood by the fact that British Petroleum didn't even apply for an ecological drilling permit before it blew a smoking crater into the floor of the Gulf of Mexico in the most ignorant kind of way.

If you think British Petroleum just fixed the Louisiana Shrimp Industry but good, just wait until you see what the United States Justice Department is about to do to British Petroleum's Stock Market Capitalization.

The Mental Software Problem is commonly understood to be severely symptomatic mental illness, but abnormal psychology is only a part of the problem.

The medieval Catholic theologians identified the Seven Deadly Sins, of which Pride is the worst. The Mental Software Problem arises from each of the Seven Deadly Sins:

Fix those problems, and they'll be paving the public highways with ingots of pure gold.

You will surely agree that the Technology Industry suffers The Deadly Sin of Greed.

While Greed is indeed a problem, far more serious is that we suffer The Deadly Sin of Sloth:

32-Bit Integer Overflow in Image Capture from Mac OS X Snow Leopard


I am very sorry to have to bear the bad news to my dear friends at Apple that I won't be filing any more Radar bugs. Instead, all my reports of what are properly referred to as "defects" and not "bugs" in Apple products will be lucidly detailed somewhere within:

There are a whole bunch of Conway's Game of Life products in the App Store; there are some in Google Play as well. Most are free; the most expensive iOS App is just $2.99.

Try a few out. You will readily agree they all blow goats.

After two and a half years of development, while my ultimate goal remains far off, I am confident that the Warp Life Algorithm will blow Hash Life completely out of the water. As it stands, my App is dramatically faster than every last one of its competitors for every mobile platform - not just iOS - and can represent a much larger Cell Grid.

Presently I store a fixed-size, rectangular array of Cells, but Version 1.0's grid will be almost but not quite unlimited, without any "Edge Effects", as the Rules of Conway's Life require.

I decided a while back to GPL Warp Life's source so it can serve as an example to other iOS coders that There Is A Better Way™. I have a little work to do yet, but Warp Life's first source drop will coincide with its Beta 2 release in a week or so, at which point I will post the tarball as well as enable Anonymous Subversion Checkout.

The current directory will hold each individual source file, along with some documentation. All the tarballs will go in the releases directory.

As a result of all this, I was offered an internal transfer to a Performance Engineer position on the Copland Team. Our "Traditional OS Integration" was "Traditional" because Copland was the modern operating system.

I thought all of Copland's people were just wonderful, and would have been As Happy As A Pig In Shit with that job, but in the end declined their offer because, for no reason I have ever been able to figure out, I was filled with the sense that Copland would never ship.


Maybe, just maybe, had I accepted that Performance Engineer internal transfer to Copland, none of us would be coding for Mac OS X.

Maybe we'd all be coding for Copland.

