What is Rubberduck?

Rubberduck is an open-source COM add-in that enhances the VBA Editor, the glorious VBE… which was last updated well over 20 years ago.

We are working around the rather limited VBIDE Extensibility API, and one way or another, we’ll bring the VBE into this century.

From refactoring tools to code inspections, from the member-level Code Explorer and its ability to organize modules into folders, to the ability to quickly locate and navigate to literally anything that has a name, or the 64-bit compatible Smart Indenter port, or, or, or… Rubberduck’s scope is huge – and that’s only a part of what makes this project so awesome: this duck is ambitious, and there’s work for everyone, at every skill level.

This blog contains articles about feature announcements, information about upcoming releases, but also detailed tutorials on various VBA topics – often geared towards promoting clean, testable, object-oriented code. Rubberduck was started on Code Review Stack Exchange after all.

700 issues?

The project’s GitHub repository currently has over 700 open “issues”… but 500 of those are all the wonderful things you/me/we would like Rubberduck to be able to do: we carefully label each one of them as they’re opened, which makes these 700-some “issues” more like “tickets” or “work items”, sometimes even “brainstorming records”. We love feedback and ideas, and we’re usually quite fast to respond to new issues, too!

That said yes, there is a metric ton of things that need tweaking. Here’s the more important stuff that’s going to be relevant throughout the v2.x cycle:

Problem: Access violation (crash) on shutdown

Tearing down a managed (.net) VBE add-in is hard. There are so many important things we easily take for granted when writing .net code, that fly out the window when the .net runtime itself is spawned from your DLL being loaded by an unmanaged COM process. We found the hard way that the Runtime Callable Wrappers .net is helpfully creating, appear to assume the COM objects are owned by .net: if a destructor didn’t get to run, or if garbage collection (which is undeterministic) didn’t complete before the VBE decides it’s over, it’s over: either there’s an access violation, or the host application closes, but the host process remains running, stuck, still consuming all the leaked memory.

Solution: Rubberduck solved a good 95% of the teardown issues since the release of v2.0, thanks to the impressive efforts of the project’s awesome contributors. COM objects and RCWs are now almost under complete control, window subclassing is centralized; we’re all careful to dispose the COM wrappers we create, and we review each others’ pull requests so as to increase the chances of picking up any oversight before the changes are merged.

What’s left is on the managed (.net) side of things: we still need to clear up the Code Explorer‘s ViewModel, it’s holding on to the parser state, which ultimately prevents its proper garbage collection. I haven’t had any problems exiting Excel in months at this point, but Access doesn’t seem to be as smooth; we need to sort out these object references and dependencies to get a 100% clean exit in all hosts. It’s in sight.

Problem: Member attributes are lost on rewrite

VBA module members can have a number of VB_Attribute values, that the VBE “conveniently” doesn’t display in the code panes. Because Rubberduck needs to know the exact position of every token in the code panes, our parse trees don’t include the member attributes. As a consequence, whenever Rubberduck modifies a module’s code, any member attributes in that module are annoyingly lost.

Solution: this isn’t an easy one. The current idea is that we’re going to have to essentially hijack the VBE’s code panes, and inject our own – one that does include these member attributes. And since we can’t just inject a TextBox and lose IntelliSense, this boils down to re-implementing pretty much the entire design-time editor (hooking the debugger is a whole other problem).

Problem: Memory (RAM) consumption

Rubberduck parses an average-size module in under a second, and then traverses its parse tree and proceeds to resolve all identifier references in sub-second time as well. This amazing performance has a price: we cache every single publicly accessible declaration in every single referenced type library, and then expose them to our resolver through various dictionaries, each using different types of keys – an empty Excel VBA project might have well over 30K declarations in memory, cached a dozen times each so that the resolver and static code analysis can access any one of them pretty much instantly. As a consequence, an average Excel VBA project might consume ~400MB RAM, as opposed to ~30MB without Rubberduck.

Solution: since the beginning of the project, the primary concern had been to get something that works, then to get something that works fast… memory footprint has constantly been left behind – any program is a trade-off between performance and memory. We “just” need to find a way to give memory footprint more priority. The most realistically implementable solution we thought of, that wouldn’t impact performance much, is to “simply” store the cache in an embedded SQLite database, and use actual SQL queries to lookup declarations. This demands a rather major overhaul of every feature that needs this data though.

Problem: Inspection false positives

The parser now handles literally every legal bit of VBA (and VB6) code we can think of throwing at it. The resolver, which is responsible for resolving the identifiers we find in the parse trees to a declaration that’s in scope in the context of the code being processed, is more complicated. See parsing is a mere grammatical analysis; resolving pretty much literally involves implementing the language specifications, and this means knowing everything there is to know about the internals of the type libraries being referenced – and then implementing the clever hacks VBA does with some of them, too.

If the resolver (and its dependencies, e.g. the part responsible for scanning the referenced type libraries) isn’t 100% perfect, then the code that requires this insight (refactorings, inspections, etc.) can’t work as intended either. Barring actual bugs in the inspections themselves, resolver issues are responsible for most false positives.

Sometimes it’s a matter of visibility, too. Rubberduck is mostly host-agnostic, meaning it doesn’t care whether it’s running in Excel, Access, PowerPoint, Outlook, AutoCAD, CorelDRAW, Sage 300, or any other one of the 200+ possible VBA host applications; we’re not scanning the “document modules” (e.g. worksheets) through the host object model. This means if you have a button on a worksheet that invokes DoSomething, and no other VBA code is calling that procedure, then Rubberduck can’t know that the procedure is actually in use.

Solution: barring bugs we do need to fix with a number of inspections, that one’s on you, Rubberduck user. No static code analysis tool ever written is completely free of false positives, or covers everything there is to cover in a given code base. If Rubberduck is saying DoSomething isn’t used, don’t just blindly remove it! When you know an inspection result should be ignored, use the IgnoreOnce quick-fix to have Rubberduck insert an '@Ignore annotation. Recommendations aren’t Gospel either: even recommendations made by AAA-grade static code analysis tools sold at premium price can be iffy, if not flat-out wrong. If Rubberduck is flagging something and it doesn’t look right, use your judgement first.