Building HyperDisk from scratch:
a 9-month code odyssey
Ahhhh, fresh air. The past 9 months were a programming marathon, now concluded with the birth of a new app called HyperDisk. It’s exhilarating to finally be ramping down the programming and switching modes to a less techie, more human-oriented mental state. To that end, I’ll be blogging and making content about HyperDisk. That starts today with this post chronicling the development of HyperDisk while it’s still fresh in my mind.
The origins of HyperDisk
For years, I’ve been kicking around the idea of a 3D hard disk visualizer. I thought about a grid of rectangles where bigger files occupied more space (like the venerable GrandPerspective), but with a 3-dimensional stacking. The vertical axis would allow you to visualize the structure of enclosing folders on your disk, with root-level folders at the bottom. In March 2023 I had some time to devote to new projects, so I figured it was time to finally start prototyping my disk visualizer. How hard could it be, right?
The “how hard could it be?” question is the genesis of so many of these projects, and it’s something of a double-edged sword. Almost invariably the project contains a thousand tiny pitfalls which you didn’t anticipate, but if you did anticipate every one of them, you might not have had the courage to start the project. I've learned to embrace the naiveté of “how hard could it be,” but also to expect the unexpected.
Building the 3D Disk Viewer
Choosing a 3D graphics library was one of the first decisions to be made. I considered writing the graphics natively in Metal as I’ve done for other apps, but using a library could save a lot of code. I ended up picking Apple’s SceneKit library. The fact that it’s Swift-native and available on every Mac made it a no-brainer. I love the Unreal Engine but it would have been massive overkill for this app, and even Unity is a bit much for a single-view 3D application. Using Metal directly would have been the most efficient, but with a lot more heavy lifting. SceneKit was the Goldilocks graphics API for HyperDisk — just right.
I wasn’t sure how well the 3D disk view would work, simply because I’ve never seen one before. Logically it makes sense, but I had no way of knowing how it would “feel” until I built one. I’m happy to say that it really does work well, and I’m now convinced that a 3D disk viewer is an outstanding way to visualize a lot of files and folders. Since our brains are adapted to seeing and parsing 3D visual information, this view mode gives us an efficient pipeline for communicating information.
One cool bonus feature of drawing the disk in 3D is that you get the 2D disk view for free. By putting the camera directly above the stack of rectangles, you get a 2D tree map view of your disk. HyperDisk lets you drag the view around to see it from a different perspective, and then it snaps back to 2D if you view it from overhead. Pretty cool.
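To make the "2D for free" idea concrete, here is a minimal sketch of a treemap layout, written in Python for brevity. It is not HyperDisk's actual code: the `Node` type, the `layout` function and the simple slice-and-dice splitting rule are all assumptions chosen for illustration. Each box gets a footprint proportional to its size plus a `depth`, which a 3D view could use as the stacking level.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    size: int = 0                      # bytes; only meaningful for files
    children: list = field(default_factory=list)

    def total(self):
        # A folder's size is the sum of everything inside it.
        if not self.children:
            return self.size
        return sum(c.total() for c in self.children)

def layout(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice treemap: returns (name, x, y, w, h, depth) boxes."""
    if out is None:
        out = []
    out.append((node.name, x, y, w, h, depth))
    total = node.total()
    if not node.children or total == 0:
        return out
    offset = 0.0
    for child in node.children:
        share = child.total() / total  # fraction of the parent's area
        if depth % 2 == 0:             # even levels: split along x
            layout(child, x + offset * w, y, w * share, h, depth + 1, out)
        else:                          # odd levels: split along y
            layout(child, x, y + offset * h, w, h * share, depth + 1, out)
        offset += share
    return out

root = Node("/", children=[
    Node("Users", children=[Node("a.mov", 600), Node("b.jpg", 200)]),
    Node("Apps", children=[Node("x.app", 200)]),
])
for box in layout(root, 0, 0, 100, 100):
    print(box)
```

Drawing each box as a slab at height `depth` gives the stacked 3D view. Ignoring `depth` and looking straight down leaves an ordinary 2D tree map.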
But wouldn’t it be great if…
In building up the disk viewer, I wrote a lot of code for indexing, cataloging and understanding your hard drive. This got me thinking about what else I could do with it. I started to think about making HyperDisk into an all-in-one disk manager. I thought about the key features of other disk cleaner apps, and asked myself the question “What if HyperDisk did all of that, but better?” What if it were, in fact... ≪fanfare≫ The Ultimate Disk Cleaner for Mac.
Duplicate Files
Duplicate files are the low-hanging fruit of hard disk cleanup. There are sometimes reasons for having multiple copies of a file, but often it’s just wasteful. Identifying these duplicate files is all about efficiency and optimization. It would be easy to write a deathly slow duplicate file finder, but writing a fast one takes some work and creativity.
As with most of the app, I wrote the Duplicate File finder in Swift. To squeeze maximum performance out of modern computers, you need to write concurrent code. That means your code runs on multiple threads at once, keeping all the cores of the chip busy. Swift has concurrency features built into the language itself, so it helped me write code that uses as much of the computer’s power as possible.
To optimize code, it often helps to think about how a human might solve a problem efficiently in the physical world. Finding duplicate files on a computer can be compared to finding duplicate books on the shelves of a library.
In the library situation, you’d probably start thinking about how to minimize your workload. The worst approach would be to compare two books word for word, move on to the next pair if they don't match, and repeat for all the books in the library. Even with the performance of modern computers, an approach like this would be unusably slow for scanning your entire hard drive.
In the real world, you might first sort the books by their number of pages. (There’s no point in comparing a 300 page book with a 400 page book because they’re not going to be identical.) Once you have your candidates, you might compare a few words from the books. Maybe the first few words, the middle few, and the last few. If those match, you’ve got a likely candidate so it might be time to check the books word-for-word.
By mirroring this same kind of selection refinement in code, we reap huge performance benefits. I experimented with a number of different approaches for streamlining the duplicate files finder. Through a combination of selection refinement, data hashing, and concurrency, I’ve got HyperDisk’s dupes finder running nice and fast.
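The library analogy maps onto code fairly directly. The following is a rough sketch in Python, not HyperDisk's Swift implementation: the function names, the 4 KB sample size, the choice of SHA-256 and the use of a thread pool are all assumptions made for illustration. Files are first grouped by size (page count), then by a hash of a few small slices (a few words from the start, middle and end), and only the survivors get a full-content hash, which stands in here for the final word-for-word comparison.

```python
import hashlib
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

SAMPLE = 4096  # bytes read at the start, middle and end (arbitrary choice)

def sample_hash(path):
    """Cheap fingerprint: hash three small slices instead of the whole file."""
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for offset in (0, max(0, size // 2 - SAMPLE // 2), max(0, size - SAMPLE)):
            f.seek(offset)
            h.update(f.read(SAMPLE))
    return h.hexdigest()

def full_hash(path):
    """Expensive fingerprint: hash every byte, one megabyte at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def refine(groups, key_func):
    """Split each candidate group by key_func; keep subgroups of 2+ files."""
    paths = [p for g in groups for p in g]
    with ThreadPoolExecutor() as pool:           # compute keys concurrently
        keys = dict(zip(paths, pool.map(key_func, paths)))
    refined = []
    for group in groups:
        buckets = defaultdict(list)
        for p in group:
            buckets[keys[p]].append(p)
        refined.extend(b for b in buckets.values() if len(b) > 1)
    return refined

def find_duplicates(paths):
    """Run the cheapest test first so the costly one sees few files."""
    groups = [list(paths)]
    for key_func in (os.path.getsize, sample_hash, full_hash):
        groups = refine(groups, key_func)
    return groups
```

Each pass only looks at files that survived the previous one, so the full read of a file happens only when its size and sampled slices already match another file.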
Similar Images
Pictures take up a lot of space, and they’re often one of the main culprits for disk space usage. Often you’ll have multiple images which look the same to the eye, but with very different underlying file data. A JPEG image and an HEIC image might look almost identical, but to the computer the data is vastly different. To find similar-looking images, a visual approach is needed.
From a pure programming standpoint, the Similar Image finder was definitely the most challenging, nitty-gritty, Computer Science-y part of HyperDisk. Churning through thousands or millions of photos and comparing their pixels in a performant way is really, really tricky. Conceptually, there’s a lot of filtering and summary that needs to take place. Instead of comparing all the pixels of an image, it sometimes makes more sense to compare just a few. And instead of comparing images one by one, it’s less work for the computer to tile images onto a gridded “atlas” and compare chunks of those.
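One well-known way to "compare just a few pixels" is an average hash: shrink each image to a tiny grid, record which cells are brighter than the grid's mean, and count how many cells differ between two images. The sketch below shows that technique in plain Python on grayscale pixel grids. It is a stand-in for illustration only, not the method HyperDisk uses, and the 8x8 grid and the threshold of 5 are arbitrary assumptions.

```python
def shrink(pixels, size=8):
    """Downsample a grayscale image (list of rows, values 0-255) to
    size x size by averaging the pixels that fall into each cell."""
    h, w = len(pixels), len(pixels[0])
    grid = []
    for gy in range(size):
        y0 = gy * h // size
        y1 = max((gy + 1) * h // size, y0 + 1)   # at least one source row
        row = []
        for gx in range(size):
            x0 = gx * w // size
            x1 = max((gx + 1) * w // size, x0 + 1)
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        grid.append(row)
    return grid

def fingerprint(pixels, size=8):
    """Average hash: one bit per cell, set when the cell is brighter
    than the mean of all cells."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    return [v > mean for v in cells]

def distance(fp_a, fp_b):
    """Hamming distance: number of cells where the fingerprints disagree."""
    return sum(a != b for a, b in zip(fp_a, fp_b))

def looks_similar(img_a, img_b, max_distance=5):
    return distance(fingerprint(img_a), fingerprint(img_b)) <= max_distance

# A left-to-right gradient at two resolutions, plus its mirror image.
big = [[x * 4 for x in range(64)] for _ in range(64)]
small = [[x * 8 for x in range(32)] for _ in range(32)]
mirrored = [list(reversed(row)) for row in big]
print(looks_similar(big, small), looks_similar(big, mirrored))
```

Because the fingerprint depends only on relative brightness in a coarse grid, the same picture saved at a different resolution or in a different file format tends to land on a nearby fingerprint, while its raw bytes look nothing alike.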
It so happens that modern computers have a specialized processor geared toward exactly this type of work: the GPU, or Graphics Processing Unit. On new Macs the GPU is built into the main chip (e.g. the M1 or M3). Writing code for the GPU is not regular programming, however. You can’t write GPU code in Swift or even plain C. You need to use a library like Unreal or SceneKit, or roll your own graphics with a native GPU API like Metal or OpenGL, each of which has its own shader language.
HyperDisk’s similar image finder is written in a mix of Swift, C and Metal. Swift handles the overall architecture, concurrency and running of the similar image finder. Certain parts of the image atlas rendering are written in C because it’s the native language of Apple’s CoreGraphics library. And finally Metal is used for on-GPU comparison. Rewriting the similar image finder with Metal boosted speed by almost 10x on some scans.
Auto Clean
Auto cleaners are a bit of a divisive topic. Some people love the convenience of a one-click cleanup. Others are distrustful of these things, and they won’t use them. Put me in the second camp. I’ll use HyperDisk because I wrote it, but I don’t blame anybody who doesn’t want to trust a third party app to make changes to their hard disk.
It was important for me to make HyperDisk primarily a do-it-yourself kind of cleaner. The Auto Clean is an added little feature for those who want it, and I made sure to be very conservative in what the app cleans. There’s nothing more unsettling than an app deleting things off your drive without really letting you know what it’s doing and why. HyperDisk’s auto cleaner tries to be very communicative, and it always gives users a chance to cancel any proposed changes.
The auto cleaner is centered around a slider control which ranges from Light Clean to Deep Clean. You can choose how deeply you want to clean up your drive. A Light Clean will delete things like cache files, which exist to improve load times but can be deleted at will. As you move the slider toward Deep Clean, HyperDisk will start selecting large, old files from your Downloads folder, as this is a common source of hard disk cruft. As you move the slider deeper, it will begin selecting old, rarely used apps which can be re-downloaded from the App Store. Nothing is actually deleted until you confirm.
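As a rough illustration of how a slider like this could drive selection, here is a small Python sketch. Everything in it is hypothetical: the `Candidate` type, the slider positions at which each category switches on, and the size and age thresholds are invented for the example and are not HyperDisk's real rules.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    path: str
    kind: str        # "cache", "download" or "app"
    size: int        # bytes
    idle_days: int   # days since the item was last used

# Hypothetical rules: (minimum slider position, kind, min size, min idle days).
RULES = [
    (0.00, "cache",    0,           0),
    (0.35, "download", 100_000_000, 180),
    (0.70, "app",      0,           365),
]

def propose(candidates, slider):
    """Return the items a clean at this slider position (0.0-1.0) would select."""
    active = [r for r in RULES if slider >= r[0]]
    return [c for c in candidates
            if any(c.kind == kind and c.size >= min_size and c.idle_days >= min_idle
                   for _, kind, min_size, min_idle in active)]

def clean(candidates, slider, confirm, delete):
    """Delete the proposal only if confirm(proposal) returns True."""
    proposal = propose(candidates, slider)
    if proposal and confirm(proposal):
        for c in proposal:
            delete(c.path)
        return proposal
    return []
```

Keeping `propose` separate from `clean` means the app can always show the full list of proposed changes before anything is touched, and a cancelled confirmation leaves the disk exactly as it was.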
If you opt for a very deep clean, HyperDisk will find large duplicate files and link them together. It doesn’t actually delete these files. Instead, it mirrors them so that they represent the same file. The caveat here is that modifications to either linked file will lead to both files being modified. HyperDisk achieves this mirroring through hard links.
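For readers who haven't met hard links before, this short Python sketch shows the general mechanism using the standard `os.link` call. It is an illustration rather than HyperDisk's code: the `link_duplicate` helper and the `.linktmp` suffix are made up for the example, and it assumes both files live on the same volume.

```python
import os

def link_duplicate(keep, duplicate):
    """Replace `duplicate` with a hard link to `keep` (same volume required)."""
    if os.path.samefile(keep, duplicate):
        return                          # already the same file on disk
    tmp = duplicate + ".linktmp"        # hypothetical temporary name
    os.link(keep, tmp)                  # second directory entry for keep's data
    os.replace(tmp, duplicate)          # swap it in place of the duplicate

def share_storage(path_a, path_b):
    """True when both paths point at the same underlying file."""
    return os.path.samefile(path_a, path_b)
```

After linking, both names still appear in their original folders, but the data is stored only once. That is also where the caveat comes from: a program that writes into one of the files in place is writing into the shared data, so the change shows up under both names.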
The App Icon
I thought about whether the app icon should represent something real, like a hard disk, or be abstract. Looking through the icons for other apps, I found myself gravitating toward the more abstract ones. There was also the fact that hard disks just aren’t the prettiest things in the world. So we decided to go more toward the abstract.
HyperDisk’s icon is a sphere, representing your stuff, with two little orbits around it, representing the comprehensive scans that HyperDisk does. One thing I like about Mac app icons is the way things can poke out beyond the frame. You see this in apps like TextEdit and Xcode. We decided to have the orbits going around the outside of the frame to give the icon more depth.
We started by making a 3D model of the icon in Blender. It was dead-simple, with a sphere, a bezel and two rings. After getting the camera position just right, we rendered it. There’s a funny thing about 3D renderings… they don’t usually look right as app icons. All the best app icons are hand-drawn using tools like Illustrator and Photoshop. Shadows and the like are brushed in by mouse or pen tablet. So we used the 3D rendering as a template for positioning, and drew on top of it in Illustrator. After getting the shapes right in Illustrator we brought it into Photoshop and started adding layers for things like noise, gradients, airbrushed shadows. It seems like the more layers you add, the better the icon looks. Even when you shrink it way down, the small details are somehow captured and conveyed.
There could still be more refinement. I like the idea of refreshing the icon when releasing a major app update. When HyperDisk 2 comes out, maybe it’ll have an updated icon.
A note about apps, trust, privacy and sandboxing
There are good disk cleaner apps out there. There are also questionable ones. And yes, there are downright malware ones too. (You might have experience with one that advertised everywhere for a number of years.) Leave that kind of stuff to the PC world, I say. HyperDisk is a Mac app through and through, and it has no interest in collecting your data, auto-launching itself, or installing its trappings all over your computer.
Because disk cleaners scan your whole hard drive, it’s essential to make sure they’re not sharing your data. In HyperDisk’s case, you can verify that it doesn’t. Because HyperDisk is an App Store-only app, it’s walled off by Apple’s sandbox environment. HyperDisk lacks the necessary entitlement for using the Internet, so it couldn’t contact the Internet even if it tried: macOS itself prevents it from making network connections.
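One way to check a claim like this for any sandboxed app is to read its entitlements, which are stored as a property list inside the app's code signature and can be displayed with Apple's codesign tool. The Python sketch below parses such a plist and looks for the standard network entitlement keys. The `SAMPLE` plist is a made-up example for demonstration, not HyperDisk's actual entitlements, and the helper names are hypothetical.

```python
import plistlib

NETWORK_KEYS = (
    "com.apple.security.network.client",   # outgoing connections
    "com.apple.security.network.server",   # incoming connections
)

def is_sandboxed(entitlements_xml: bytes) -> bool:
    """True when the App Sandbox entitlement is switched on."""
    ent = plistlib.loads(entitlements_xml)
    return ent.get("com.apple.security.app-sandbox") is True

def network_access(entitlements_xml: bytes) -> list:
    """Return the network entitlements that are switched on in the plist."""
    ent = plistlib.loads(entitlements_xml)
    return [k for k in NETWORK_KEYS if ent.get(k) is True]

# Made-up entitlements for a sandboxed app with no network access.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key><true/>
    <key>com.apple.security.files.user-selected.read-write</key><true/>
</dict>
</plist>"""

print(is_sandboxed(SAMPLE), network_access(SAMPLE))
```

A sandboxed app whose entitlements list neither network key has no permission to open network connections, which is the property described above.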
For HyperDisk, I went back to an old development methodology which I used for my most successful software release: NetShade 5. In a nutshell, it’s “don’t release the app until every single thing, no matter how tiny, is finished and polished to perfection.” This is the methodology Blizzard used on their legendary games like World of Warcraft and StarCraft. I think it produces the best software, and I think that’s reflected in HyperDisk. I hope you agree. Now let’s scan that disk!
Everything on HyperDisk’s website was written by a human.