VBA is often said to be an event-driven language: a lot of worksheet automation involves executing code in response to such or such workbook or worksheet event. ActiveX controls such as MSForms.CommandButton are trivially double-clicked, and code is written in some CommandButton1_Click procedure.
But how does it all work, and can we leverage this event-driven paradigm in our own code too? But first, what is this paradigm all about?
In a procedural paradigm, code executes one statement at a time, top to bottom, in sequence: procedure A gets invoked, procedure A calls procedure B, procedure B completes, execution returns to procedure A, procedure A completes, execution ends. That’s a paradigm you’re likely already familiar with.
In an event-driven paradigm, code still executes one statement at a time, in the very same way – except now procedures are being invoked by an external object, and there isn’t always a way to tell at compile-time what the run-time sequence will be. If you’re writing the code-behind for a UserForm module with Button1 and Button2 controls, there is no way to know whether Button1_Click will run before Button2_Click, or if either are even going to run at all: what code gets to run, is driven by what events get raised – hence, event-driven.
Event-driven code is asynchronous, meaning you could be in the middle of a loop, and then a DoEvents statement is encountered, and suddenly you’re not in the loop body anymore, but (as an example) in some worksheet event handler that gets invoked when the selected cell changes. And when that handler completes, execution resumes in the loop body, right where it left off.
This mechanism is extremely useful, especially in an object-oriented project, since only objects (class modules) are allowed to raise and handle events. It is being put to extensive use in the OOP BattleShip project (see GridViewAdapter and WorksheetView classes for examples of event forwarding and how to define an interface that exposes events), which I’m hoping makes a good advanced-level study on the matter.
But let’s start at the beginning.
Host Document & Other Built-In Events
Whether you’re barely a week into your journey to learn VBA, or several years into it, unless all you ever did was record a macro, you’ve been exposed to VBA events.
VBA code lives in the host document, waiting to be executed: any standard module public procedure that can be invoked without parameters can be an entry point to begin code execution – these are listed in the “macros” list, and you can attach them to e.g. some Shape and have the host application invoke that VBA code when the user clicks it.
But macros aren’t the only possible entry points: the host document often provides “hooks” that you can use to execute VBA code when the host application is performing some specific operation – like opening a new document, saving it, modifying it, etc.: different hosts allow for various degrees of granularity on “hooking” a varying amount of operations. These “hooks” are the events exposed by the host application’s object model, and the procedures that are executed when these events are raised are event handler procedures.
In Excel the host document is represented by a Workbook module, named ThisWorkbook; every worksheet in this workbook is represented by a Worksheet module. These document modules are a special type of class module in that they inherit a base class: the ThisWorkbook class is aWorkbook class; the Sheet1 class is aWorksheet class – and when classes relate to each other with an “is-a” relationship, we’re looking at class inheritance (“has-a” being composition). So document modules are a very special kind of module, first because their instantiation is in the hands of the host application (you can’t do Set foo = New Worksheet to create a new sheet), and second because like UserForm modules, they are inheriting the members of a base class – that’s how you can type Me. in a procedure inside an otherwise empty document module, and get plenty of members to pick from: if you’re in the Sheet1 module, you have access to Me.Range, and that Range property is inherited from the Worksheet “base class”. Or Me.Controls in the UserForm1 module, inherited from the UserForm class.
Wait I thought VBA didn’t support inheritance?
Indeed, VBA user code doesn’t have any mechanism in the language to support this: there’s no Inherits keyword in VBA. But VBA creates and consumes COM types, and these types can very well be pictured as having an inheritance hierarchy.
Or something like it. Picturing the ThisWorkbook : Workbook relationship as though there was a hidden Private WithEvents Workbook As Workbook field in the ThisWorkbook module, i.e. more like composition than inheritance, wouldn’t be inaccurate either.
Fair enough. So what does this have to do with events?
Take any old Sheet1 module: because it “inherits” the Worksheet class, it has access to the events defined in that class. You can easily see what events are available in any class, using the Object Browser (F2) – all events are members represented with a lightning bolt icon:
So when you’re in a Worksheet module, you can implement event handlers for any events fired by the base Worksheet class. Because of how events work under the hood, in order for an event handler to “hook” the event it means to handle, it must have the correct signature or at least, a compatible one: the name of the procedure, the number, order, and type of its parameters must match exactly with the signature/definition of the event to be handled… and ensuring that isn’t as complicated as it sounds:
Notice the VBE generates Private procedures: there is no reason whatsoever to ever make an event handler public. Event handler procedures are meant to handle events, i.e. they’re callbacks whose intent is to be invoked by the VBA runtime, not by user code! That’s why I recommend limiting the amount of logic that you put into an event handler procedure, and having the bulk of the work into a separate, dedicated procedure that can be made public if it needs to be invoked from elsewhere. This is especially important for UserForm modules, which tend to be accessed from outside the form’s code-behind module.
Event handler procedures are always named in a very specific way, just like interface implementations:
Private Sub EventProvider_EventName()
Note the underscore: it matters, on a syntactical level. This is why you should avoid underscores in procedure names, and name all procedures in PascalCase. Adhering to this naming standard will spare you many headaches later, when you start defining and impementing your own interfaces (spoiler: your project will refuse to compile if you try implementing an interface that has members with an underscore in their name).
Any VBA class module can define its own events, and events may only be defined in a class module (remember: document and userform modules are classes). Defining events is done using the (drumroll) Event keyword:
Public Event BeforeSomething(ByRef Cancel As Boolean)
Public Event AfterSomething()
You’ll want the events Public, so they can be handled in other classes. Now all that’s left to do is to raise these events. That’s done using the RaiseEvent keyword:
Public Sub DoSomething()
Dim cancelling As Boolean
If Not cancelling Then
Here are a few guidelines (that word is chosen) for sane event design:
DO define Before/After event pairs that are raised before and after a given operation. This leaves the handlers the flexibility to execute preparatory/cleanup code around that operation.
DO provide a ByRef Cancel As Boolean parameter in Before events. This lets the handlers determine whether an operation should be cancelled or not.
CONSIDER using ByVal Cancel As MSForms.ReturnBoolean if the MSForms type library is referenced. Being a simple object encapsulating the cancel state, it can be passed by value, and the handler code can treat it as a Boolean if it wants to, because the object’s Value is the class’ default member.
CONSIDER exposing a public On[EventName] procedure with the same signature as the event, whose purpose is simply to raise said event; events can only be raised by the class they are defined in, so such methods are very useful for making an object raise an event, notably for testing purposes.
DO use past tense to indicate that an event occurs after a certain operation has completed, when there is no need for an event to occur before. For example: Changed instead of Change.
DO use future tense to indicate that an event occurs before a certain operation has started, when there is no need for an event to occur after. For example: WillConnect.
DO NOT use present tense (be it indicative or progressive/continuous), it’s ambiguous and unclear exactly when the event is raised in the operation. For example, a lot of standard library events use this naming scheme, and it’s easy to second-guess whether the event is fired before or after said Change or Changing has actually happened.
DO NOT use Before or After without also exposing a corresponding After/Before event.
DO NOT mistake guidelines for gospel, what really matters is consistency.
The class that defines an event is the provider; a class that handles that event is a client. The client code needs to declare a WithEvents field – these fields must be early-bound, and the only types available for the As clause are event providers, i.e. classes that expose at least one event.
Private WithEvents foo As Something
Private Sub foo_BeforeDoSomething(ByRef Cancel As Boolean)
Every WithEvents field adds a new item in the left-hand code pane dropdown, and that class’ events are displayed in the right-hand code pane dropdown – exactly like any plain old workbook or worksheet event!
Last time I went over the startup and initialization of Rubberduck, and I said I was going to follow it up with how the parser and resolver work.
Just so happens that Max Dörner, who has pretty much owned the parser and resolver parts of Rubberduck since he joined the project (that’s right – jumped head-first into some of the toughest, most complicated code in the project!), has nicely documented the highlights of how parsing and resolving works.
So yeah, all I did here was type up an intro. Buckle up, you’re in for a ride!
Part 2: Parsing & Resolving
Rubberduck processes the code in all unprotected modules in a five-step process. First, in the parser state Pending, the projects and modules to parse are determined. Then, in the parser state LoadingReferences, the references currently used by the projects, e.g. the Excel object model, and some built-in declarations are loaded into Rubberduck. Following this, the actual processing of the code begins. Between the parser states Parsing and Parsed the code gets parsed into parse trees with the help of Antlr4. Following this, between the states ResolvingDeclarations and ResolvedDeclarations the module, method and variable declarations are generated based on the parse tree. Finally, between the states ResolvingReferences and Ready the parse trees are walked a second time to determine the references to the declarations within the code.
At each state change, an event is fired which can be handled by any feature subscribing to it, e.g. the CodeExplorer, which listens for the state change to ResolvedDeclarations.
A More Detailed Story
The entry point for the parsing process is the ParseCoordinator inside the Rubberduck.Parsingassembly. It coordinates the parsing process and is responsible for triggering the appropriate state changes at the right time, for which it uses a IParserStateManager passed to it. To trigger the different stages of the parsing process, the ParseCoordinator uses a IParsingStageService. This is a facade passed to it providing a unified interface for calling the individual stages, which are all implemented in an individual set of classes. Each has a concurrent version for production and a synchronous one for testing. The latter was needed because of concurrency problems of the mocking framework.
Every parsing run gets executed in fresh background task. Moreover, to always be in a consistent state, we allow only one parsing run to execute at a time. This is achieved by acquiring a lock in a top level method. This top level method is also the point at which any cancellation or unexpected exception will be caught and logged.
The first step of the actual parsing process is to set the overall parser state to Pending. This signals to all components of Rubberduck that we left a fully usable state. Afterwards, we refresh the projects cache on the RubberduckParserState asking the VBE for the loaded projects and then acquire a collection of the modules currenlty present.
After setting the overall parser state to LoadingReferences, the declarations for the project references, i.e. the references selected in Tools –> References... , get loaded into Rubberduck. This is done using the ReferencedDeclarationsCollector in the Rubberduck.Parsing.ComReflectionnamespace, which reads the appropriate type libraries and generates the corresponding declarations.
Note that the order in the References dialog determines what procedure or field an identifier resolves to in VBA if two or more references define a procedure or field of the same name. This prioritization is taken into account when loading the references.
Unfortunately, we are currently not able to load all built-in declarations from the type libraries: there are some hidden members of the MSForms library, some special syntax declarations like LBound and everything related to Debug, and aliases for built-in functions like Left, where Leftis the alias for the actual hidden function defined in the VBA type library. These get loaded as a set of hand-crafted declarations defined in the Rubberduck.Parsing.Symbols.DeclarationLoadersnamespace.
Parsing the Code
At the start of the processing of the actual code, the parser state is set to Parsing. However, this time this is achieved by setting the individual modules states of the modules to be parsed and then evaluating the overall state.
Each module gets parsed separately using an individual ComponentParseTask from the Rubberduck.Parsing.VBA namespace, which is powered by the Antlr4 parser generator. The end result is a pair of two parse trees providing a structured representation of the code one time as seen in the VBE and one time as exported to file.
The general process using Antlr is to provide the code to a lexer that turns the code into a stream of tokens based on lexer rules. (The lexer rules used in Rubberduck can be found in the file VBALexer.g4 in the Rubberduck.Parsing.Grammar namespace.) Then this token stream gets processed by a parser that generates a parse tree based on the stream and a set of parser rules describing the syntactic rules of the language. (The VBA parser rules used in Rubberduck can be found in the file VBAParser.g4 in the Rubberduck.Parsing.Grammar namespace. However, there are more specialized rules in the project). The parse tree then consists of nodes of various types corresponding to the rules in the parser rules.
Even when counting the Antlr workflow described above as one step, the actual parsing process in the ComponentParseTask is a multi stage process in itself. This has two reasons: there are precompiler directives in VBA and some information regarding modules is hidden from the user inside the VBE, namely attributes.
The precompiler directives in VBA allow to conditionally select which code is alive. This allows to write code that would only be legal VBA after evaluating the conditional compilation directives. Accordingly, this has to be done before the code reaches the parser. To achieve this, we parse each module first with a specialized grammar for the precompiler directives and then hide all tokens that are dead after the evaluation from the VBA parser, including the precompiler directives themselves, by sending the tokens to a hidden channel in the tokenstream. Afterwards, the dead code is still part of the text representation of the tokenstream by disregarded by the parser.
To cover both the attributes, which are only present in the exported modules, and provide meaningful line numbers in inspection results, errors and the command bar, we parse both the attributes and the code as seen in the VBE code pane into a separate parse tree and save both on the ModuleState belonging to the module on the RubberduckParserState.
One thing of note is that Antlr provides two different kinds of parsers: the LL parser that basically parses all valid input for every not indirectly left-recursive grammar (our VBA grammar satisfies this) and the SLL parser, which is considerably faster but cannot necessarily parse all valid input for all such grammars. Both parsers are guaranteed to yield the same result whenever the parse succeeds at all. Since the SLL parser works for next to all commonly encountered code, we first parse using it and fall back to the LL parser if there is a parser error.
Following the parse, the state of the module is set to Parsed on a successful parse and to ParserError, otherwise. After all modules have finished parsing, the overall parser state is evaluated. If there has been any parser error, the parsing process ends here.
After parsing the code into parse trees, it is time to generate the declarations for the procedures, functions, properties, variables and arguments in the code.
First, the state of all modules gets set to ResolvingDeclarations, analogous to the start of parsing the code. Then the tree walker and listener infrastructure of Antlr is used to traverse the parse trees and generate declarations whenever the appropriate grammar constructs are encountered. This is done inside the implementations of IDeclarationResolveRunner in the Rubberduck.Parsing.VBAnamespace.
Note that there is still some information missing on the declarations at this point that cannot be determined in this first pass over the parse trees. E.g. the supertypes of classes implementing the interface of another class are not known yet and, although the name of the type of each declaration is already known, the actual type might not be known yet. For both cases we first have to know all declarations.
After the parse trees of all modules have been walked, the overall parser state gets set to ResolvedDeclarations, unless there has been an error, which would result in the state ResolverError and an immediate stop of the parsing run.
After all declarations are known, it is possible to resolve all references to these declarations within the code, beit as types, supertypes or in expressions. This is done using the implementations of IReferenceResolveRunner in the Rubberduck.Parsing.VBA namespace.
First, the state of the modules for which to resolve the references gets set to ResolvingReferencesand the overall state gets evaluated. Then the CompilationPasses run. In these the type names found when resolving the declarations get resolved to the actual types. Moreover, the type hierarchy gets determined, i.e. super- and and subtypes get added to the declarations based on the implements statements in the code.
After that, the parse trees get walked again to find all references to the declarations. This is a slightly complicated process because of the various language constructs in VBA. As a side effect, the variables not resolving to any declaration get collected. Based on these, new declarations get created, which get marked as undeclared. These form the basis for the inspection for undeclared variables.
After all references in a module got resolved, the module state gets set to Ready. If there is some error, the module state gets set to ResolverError. Finally, the overall state gets evaluated and the parsing run ends.
Handling State Changes
On each change of the overall state, an event is raised to which other features can subscribe. Examples are the CodeExplorer, which refreshes on the change to ResolvedDeclarations, and the inspections, which run on the change to Ready.
Handling any state change but the two above is discouraged, except maybe for the change to Pending or the error states if done to disable things. The problem with the other states is that they may never be encountered during a parsing run due to optimizations. Moreover, Rubberduck is generally not in a stable state between Pending and ResolvedDeclarations. Features requiring access to references should generally only handle the Ready state.
Events also get raised for changes of individual module states. However, it should be preferred to handle overall state changes because module states change a lot, especially in large projects.
IMPORTANT: Never request a parse from a state change handler! That will cancel the current parse right after the handlers for this state in favor of the newly requested one.
Doing Only What Is Necessary
When parsing again after a successful parsing run, the easiest way to proceed is to throw away all information you got from the last parsing run and start from scratch. However, this is quite wasteful since typically only a few modules change between parsing runs. So, we try to reuse as much information as possible from prior parsing runs. Since our VBA grammar is build for parsing entire modules the smallest unit of reuse of information we can work with is a module.
We only reparse modules that satisfy one of three conditions: they are new, modified, or not in the state Ready. For the first two conditions it should be obvious why we have to reparse such modules. The question is rather how we evaluate these conditions.
To be able to determine whether a module has changed, we save a hash of the code contained in the module whenever the module gets parsed successfully. At the start of the parsing run, we compare the saved hash with the hash of the corresponding freshly loaded component to find those modules with modified content. In addition we save a flag on the module telling us whether the content hash has ever been saved. If this is not the case, the module is regarded as new.
For the third condition the question is rather why we also reparse such modules. The reason is that such modules might be in an invalid state although the content hash had been written in the last parsing run. E.g. they might have encountered a resolver error or they got parsed successfully in the last parsing run, but the parsing run got cancelled before the declarations got resolved. In these cases the content hash has already been saved so that the module is neither considered to be new nor modified. Consequently, it would not be considered for parsing and resolving if only modules satisfying one of the first two conditions were considered. Because of the possibility of such problems, we rather err on the save side and reparse every module that has not reached the success state Ready.
Since reparsing makes all information we previously acquired about the module invalid, we have to resolve the declarations anew for the modules we reparse. Fortunately, the base characteristics of a declaration only depend on the module it is defined in. So, we only have to resolve declarations for those modules that get reparsed. For references the situation is more complicated.
Since all declarations from the modules we reparse get replaced with new ones, all references to them, all super- and subtypes involving the reparsed modules and all variable and method types involving the reparsed modules are invalid. So, we have to re-resolve the references for all modules that reference the reparsed modules. To allow us to know which modules these are we save the information which module references which other modules in an implementation of IModuleToModuleReferenceManager accessed in the ParseCoordinator via the IParsingCacheService facade. This information gets saved whenever the references for all modules have been resolved successfully, even before evaluating the overall parser state.
In addition to the modules that reference modules that got reparsed, we also re-resolve those modules that referenced modules or project references having just been removed. This is necessary because the references might now point to different declarations. In particular, a renamed module is treated as unrelated to the old one. This means that renaming a module looks to Rubberduck like the removal of the old module and the addition of a new module with a new name.
The final optimization in place on a reparse is that we do not reload the referenced type libraries or the special built-in declarations every time. We just reload those we have not loaded before.
Caching and Cache Invalidation
If you have read the previous paragraph, you might have already realized that the additional speed due to only doing what is necessary comes at a cost: various types of cached data get invalid after parsing and resolving only some modules. So we have to remove the data at a suitable place in the parsing process. To achieve this the ParseCoordinator primarily calls different methods from the IParsingCacheService facade handed to it.
In the next sections we will work our way up from cache data for which you would probably seldom realize that we forgot to remove it to data for which forgetting to remove it sends the parser down in flames. After that, we will finish with a few words about refreshing the DeclarationFinder on the RubberduckParserState.
Invalid Type Declarations
The kind of cache invalidation problem you would probably not realize is that the type as which a variable is defined has to be replaced in case it is a user defined class and the class module gets reparsed; it now has a different declaration. This would probably just cause some issues with some inspections because the actual IdentifierReference tying the identifier to the class declaration is not related to the type declaration we save. Fortunately, the TypeAnnotationPass works by replacing the type declaration anyway. So, we just have to do that for all modules for which we resolve references.
Invalid Super- and Subtypes
As mentioned in the section about resolving references, we run a TypeHierarchyPass to determine the super- and subtypes of each class module (and built-in library). After reparsing a module, we have to re-resolve its supertypes. However, we also have to remove the old declaration of the module itself from the supertypes of its subtypes and from the subtypes of its supertypes, which has some further data invalidation consequences. Otherwise, the “Find all Implementations” dialog or the rename refactoring might produce …interesting results for the affected modules.
The removal of the super- and subtypes is performed via an implementation of ISupertypeCleareron all modules we re-resolve, including the modules we reparse, before clearing the state of the modules to be reparsed. Here, a removal of the supertypes is sufficient because everything is wired up such that manipulating the supertypes automatically triggers the corresponding change on the subtypes.
Invalid Module-To-Module References
As with all other reference caches, part of our cache saving which module references which other modules can become invalid when we re-resolve a module; it might just be that the the part referencing another module is gone. Fortunately, the way these references are saved does not depend on the actual declarations. So reparsing alone does not cause problems. This allows us to defer the removal of the module-to-module references to the reference resolver.
Being able to postpone the removal until we resolve references is fortunate because of potential problems with cancellations. We use the module-to-module references to determine which modules need to be re-resolved. If they got removed and the parsing run got cancelled before they got filled again in the reference resolver, we would potentially miss modules we have to re-resolve. Then the user would need to modify the affected modules in order to force Rubberduck to re-resolve them.
To handle this problem, the reference resolver itself has a cache of the modules to resolve, which is only cleared at the very end of its work. This is safe because the reference resolver only ever processes modules for which it can find a parse tree on the RupperduckParserState.
Invalid IndentifierReferences to declarations from previous parsing runs can cause any number of strange behaviors. This can range from selections referring to references that have once been at that line and column but having been removed in the meantime to refactorings changing things they really should not change.
It is rather clear that the references from all modules to be re-resolved should be removed. However, this is not as straightforward as it seems. The problem is that the references live in a collection on the referenced declaration and not in a collection attached to the module whose code is referencing the declaration. In particular, this makes it easy to forget to remove references from built-in declarations. To avoid such issues, we extracted the logic for removing references by a module into implementations of IReferenceRemover, which is hidden behind the IParsingCacheService facade.
Modules And Projects That No Longer Exist
Now we come to the piece where everything falls to pieces if we are not doing our job, modules and projects that get removed from the VBE. The problem is that some functionality like the CodeExplorer has to query information from the components in the VBE via COM Interop. If a component does no longer exist when the information gets queried, the parsing run will die with a COMException and there is little we can do about that. So we have to be careful to remove all declarations for no longer existing components right at the start of the parsing run.
To find out which modules no longer exist, we simply collect all the modules on the declarations we have cached and compare these to the modules we get from the VBE. More precisely, we compare the identifiers we use for modules, the QualifiedModuleNames. This will also find modules that got renamed. Projects are bit more tricky since they are usually treated as equal if their ProjectIds are the same; we save these in the project help file. Thus, we have to take special care for renamed projects. Knowing the removed projects, their modules get added to the removed modules as well.
Removing the data for removed modules and projects is a bit more complicated than for modules that still exist. After their declarations got removed, there is no sign anymore that they ever existed. So, we have to take special care to remove everything in the right order to guarantee that all information is gone already when we erase the declaration; after each step, the parsing run might be cancelled.
The final effect of removing modules is that the modules referencing the removed modules need to be re-resolved. Intuitively one might think that this will always result in a resolver error. However, keep in mind that renaming is handled as removing a module and adding another. Then the references will simply point to the new renamed module. Because of possible cancellations on the way to resolving the references, we immediately set the state of the modules to be re-resolved to ResolvingReferences. This has the effect that they will be reparsed in case of a cancellation.
Note that basically the same procedure is also necessary whenever we reload project references. Accordingly, we do this right after unloading the references, without allowing cancellations in between.
Refreshing the DeclarationFinder
Since declarations and references change in nearly all steps of the parsing process, we have to refresh our primary cached source of declarations, the DeclarationFinder, quite regularly when parsing. Unfortunately, this is a rather computation intensive thing to do; a lot of dictionaries get populated. So, we refresh only if we need to. E.g. we do not refresh after loading and unloading project references in case nothing changed. However, there are two points in each parsing run where we always have to refresh it: before setting the state to ResolvedDeclarations and before evaluating the overall state at the end of the parsing run, which results in the Ready state in the success path.
Refreshing before the change to ResolvedDeclarations is necessary to ensure that removed modules vanish from the DeclarationFinder before the handlers of this state change event run, including the CodeExplorer. We have to refresh again at the end because, from inside the ParseCoordinator, we can never be sure that the reference resolver did not do anything; it has its own cache of modules that need to be resolved.
One optimization done in the DeclarationFinder itself is that some collections are populated lazily, in particular those dealing only with built-in declarations. This saves the time to rebuild the collections multiple times on each parsing run. However, there is a price to pay. The primary users of the DeclarationFinder are the reference resolver and the inspections, both of which are parallelized. Accordingly, it can happen that multiple threads race to populate the collections. This is bad for the performance of the corresponding features. So, we make compromises by immediately populating the most commonly used collections.
When numbering versions, incrementing the “major” digit is reserved for breaking changes – and that’s exactly what Rubberduck 2.0 will introduce.
I have these changes in my own personal fork at the moment, not yet PR’d into the main repository.. but as more and more people fork the main repo I feel a need to go over some of the changes that are about to happen to the code base.
If you’re wondering, it’s becoming clearer now, that Rubberduck 2.0 will not be released until another couple of months – at this rate we’re looking at something like the end of winter 2016… but it’s going to be worth the wait.
Parsing in Rubberduck 1.x was relatively simple:
User clicks on a command that requires a fresh parse tree;
Parser knows which modules have been modified since the last parse, so only the modified modules are processed by the ANTLR parser;
Once we have a parse tree and a set of Declaration objects for everything (modules, procedures, variables, etc.), we resolve the identifier usages we encounter as we walk the parse tree again, to one of these declarations;
Once identifier resolution is completed, the command can run.
The parse results were cached, so that if the Code Explorer processed the entire code base to show up, and then the user wanted to run code inspections or one of the refactor commands, they could be reused as long as none of the modules were modified.
Parsing in Rubberduck 2.0 flips this around and completely centralizes the parser state, which means the commands that require a fresh parse tree can be disabled until a fresh parse tree is available.
We’ve implemented a key hook that tells the parser whenever the user has pressed a key that’s changed the content of the active code pane. When the 2.0 parser receives this message, it cancels the parse task (wherever it’s at) for that module, and starts it over; anytime there’s a “ready” parse tree for all modules, the expensive identifier resolution step begins in the background – and once that step completes, the parser sends a message to whoever is listening, essentially saying “If you ever need to analyze some code, I have everything you need right here”.
Sounds great! So… What does it mean?
It means the Code Explorer and Find Symbol features no longer need to trigger a parse, and no longer need to even wait for identifier resolution to complete before they can do their thing.
It means no feature ever needs to trigger a parse anymore, and Rubberduck will be able to disable the relevant menu commands until parser state is ready to handle what you want to do, like refactor/rename,find all references or go to implementation.
It means despite the VBE not having a status bar, we can (read: will) use a command bar to display the current parser state in real-time (as you type!), and let you click that parser state command button to expand the parser/resolver progress and see exactly what little ducky’s working on in the background.