How we solved the most annoying COM Interop issue

I first noticed this strange behavior when I refactored Rubberduck’s menu system last summer: although I was 100% certain that there only ever was a single instance of a given menu item, in the Click event handler the sender object’s GetHashCode method returned a different value every time the handler ran.

GetHashCode, in the “normal” .net world, is tightly related to the Equals implementation – whenever you override Equals, you must also override GetHashCode. The value returned by this method is used by data structures like Dictionary and HashSet to determine some kind of “object ID” – the rules are simple:

  • if two things are equal (Equals(...) == true) then they must return the same value for GetHashCode()
  • if the GetHashCode() is equal, it is not necessary for them to be the same; this is a collision, and Equals will be called to see if it is a real equality or not.

(source)

In the “normal” .net world, it’s usually safe to assume that an object’s hash code doesn’t change throughout the lifetime of the object – because a correct implementation relies on immutable data.

Apparently COM Interop has different rules.

When Rubberduck parses the projects in the VBE, it generates a plethora of Declaration objects – thousands of them. There’s a Declaration object not only for every declared variable, procedure, property, parameter, constant or enum member, but also one for every VBComponent, and one for each VBProject – anything that has an identifier that can appear in code has a Declaration object for it. Declarations are related to each others, so any given Declaration instance knows which Declaration is its “parent”. For example, a module-level variable has the declaration for the module as its parent, and the declaration for that module has the declaration for the project as its parent.

On the first pass, there’s no problem: we’re just collecting new data.

Problems start when a module was modified, and is now being re-parsed. The parser state already has hundreds of declarations for that module, and they need to be replaced, because they’re immutable. And to be replaced, they need to be identified as declarations that belong under the module we’re re-parsing.

A module’s name must be unique within a project – we can’t just say “remove all existing declarations under Module1”, because “Module1” in and by itself isn’t enough to unambiguously qualify the identifier. We can’t even say “remove all existing declarations under Project1.Module1”, because the VBE has no problem at all with having two distinct projects with the same name.

In Rubberduck 1.x we used object equality at project level: if a declaration’s Project was the same reference as the VBProject instance we were processing, then it had to be the same object. Right? Wrong.

And this is how we got stumped.

We couldn’t use a project’s FileName, because merely accessing the property when the project is unsaved, throws a beautiful COMException – and we could be facing 5 different unsaved projects with the same name, and we needed a way to tell which project that modified “Sheet1” module belonged under. We couldn’t use a project’s hash code, because it was now known to be unreliable. We couldn’t use… we couldn’t use anything.

This COM Interop issue was threatening the entire Rubberduck project, and shattered our hopes of one day coming up with an efficient way of mapping a parse tree and a bunch of Declaration objects to a VBComponent instance: we were condemmned to constantly scrap everything we knew hadn’t changed since the last parse, and re-process everything, just to be 100% sure that the internal representation of the code matched with what the IDE actually had.

Enter @Comintern and his pure awesomeness:

What about hi-jacking one of the R/W properties that are never going to end up user facing? Like .HelpFile? Just copy the original hashcode to the property and then search for it.

Genius!

Rubberduck 2.0 will hijack the VBProject.HelpFile property, and use it to store a ProjectId value that uniquely identifies every project in the IDE.

Problem solved! Nobody ever writes anything to that property, right?

Stay tuned, we’re just about to announce something very, very cool =)

Advertisements

1 thought on “How we solved the most annoying COM Interop issue”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s