Frequently Asked Questions
- Q: What does EMF-IncQuery call the 'qualified name' of a graph pattern?
- Q: What is the complexity of pattern matching?
- Q: How and when are the match set caches initialized?
- Q: Does IncQuery need to load all the models into memory, or only the necessary ones like EMF Model Query 2?
- Q: What is included in the query results if the matcher is attached to an EMF model root that is not the entire ResourceSet, just some Resource or containment subtree?
- Q: What is included in the query results if the scope of the matcher is the entire ResourceSet?
- Q: What else is included in the query results? Why do I see the contents of some external EPackages, such as Ecore?
- Q: How does one use attributes?
- Q: How are null-valued features represented in the query language?
- Q: How does the generic pattern matcher API work (as opposed to the generated ones)?
- Q: What are MatcherFactory classes for?
- Q: What is the delta monitor, and how to use it?
- Q: Can patterns be defined recursively?
Q: What does EMF-IncQuery call the 'qualified name' of a graph pattern?
It is actually the fully qualified name: the qualified name of the containing package, a separator dot, and the simple name of the pattern (the one you give it after the pattern keyword). The qualified name of the package has dot-separated segments, as in Java. For instance, a hypothetical pattern myPattern declared in a package org.example would have the fully qualified name org.example.myPattern.
Q: What is the complexity of pattern matching?
In our typical use cases, patterns are relatively small and their constraints are fairly restrictive, so the size of the match set does not grow combinatorially, as the theoretical worst case would predict.
The total memory usage (of the EMF application plus IncQuery) will be the size of the model + the size of the match set, and actually slightly more than that because some intermediate results are also cached and incrementally maintained. As of now, we do not carry out any deep query optimization, so in some corner cases these "intermediate results" can grow much bigger than the match set, although usually they don't. Good advice here is to look for small, connected parts of patterns that occur more than once, and refactor them into a separate helper pattern that is called from all the original occurrences using the 'find' keyword. We have summarized best practices on how to avoid problems on the performance page.
Query evaluation time is pretty much instantaneous. At a low level, the result set has to be copied, but you will usually want to iterate over it anyway, so this won't be the dominant cost. Note that getOneMatch() can be much faster than getAllMatches(). If you have some bound input parameters (as opposed to retrieving all matches globally), then the restricted result set is even smaller, and is accessed with a cheap hash-based lookup.
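The difference between retrieving all matches and querying with a bound parameter can be pictured with a minimal, self-contained sketch. It does not use the IncQuery API: the class MatchIndexSketch, its methods, and the (house, owner) match shape are hypothetical, and the engine's actual internal indexing is assumed rather than shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the EMF-IncQuery API.
final class MatchIndexSketch {
    // The cached match set: each match is a (house, owner) pair.
    private final List<String[]> allMatches = new ArrayList<>();
    // Index from the value of the first parameter to the matches containing it.
    private final Map<String, List<String[]>> byHouse = new HashMap<>();

    void addMatch(String house, String owner) {
        String[] match = {house, owner};
        allMatches.add(match);
        byHouse.computeIfAbsent(house, k -> new ArrayList<>()).add(match);
    }

    // No bound parameters: the whole cached result set is copied.
    List<String[]> getAllMatches() {
        return new ArrayList<>(allMatches);
    }

    // First parameter bound: one hash lookup, then a copy of a smaller set.
    List<String[]> getAllMatches(String boundHouse) {
        return new ArrayList<>(byHouse.getOrDefault(boundHouse, Collections.emptyList()));
    }

    public static void main(String[] args) {
        MatchIndexSketch index = new MatchIndexSketch();
        index.addMatch("house1", "alice");
        index.addMatch("house1", "bob");
        index.addMatch("house2", "carol");
        System.out.println(index.getAllMatches().size());         // 3
        System.out.println(index.getAllMatches("house1").size()); // 2
    }
}
```

In this sketch the cost of answering a query is dominated by copying the result, which is why the bound variant, returning fewer matches, is the cheaper of the two.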
Update time (i.e. model manipulation overhead) is related to how many new matches appear or old ones disappear due to the modification, and also the amount of change in the internal caches and indices. Most typically, a single change in the model only makes a very limited amount of change in the match set and the internal caches, and is therefore cheap.
Initialization time is composed of reading the model and then filling up the caches and indices accordingly. The latter one is basically the update overhead on model element creation times the size of the model, as the mechanism is almost the same as the one that will maintain the cache afterwards upon model updates. As for model reading, the current version (see below) traverses the entire model once, when the first matcher is constructed for the given pattern and EMF model. Depending on pattern contents, a re-traversal may be required for newly registered patterns -- this behavior will change in the near future for a more flexible approach, whereby the developer will be able to identify batches of patterns that should be initialized together (see also the next question).
For further information, see our performance page.
Q: How and when are the match set caches initialized?
You can attach a pattern matcher engine on an EMF model root (preferably EMF ResourceSet or Resource, but potentially any containment subtree) at any time (that is, even before any contents have been loaded). In the current version, at most one pattern matcher engine is built for each of these EMF "roots" (this is true per IncQueryEngine; you can create separate "unmanaged" engines that will not share pattern matcher caches with the default "managed" engine). It is constructed when you first initialize a pattern matcher on that root, and the next time you instantiate a pattern matcher (for the same or a different pattern) on the same root, it will reuse the underlying engine. However, if the second pattern uses model features (e.g. element types) that were irrelevant for the first pattern, then the engine must re-traverse the model to gather additional information.
With optimum performance in mind, consider the following:
- If you initialize several patterns using various model elements on a (large) model that has already been loaded, there could be many repeated "model read" traversals over the entire ResourceSet (depending on the contents of the patterns). In exchange, memory is allocated by IncQuery only gradually, as you initialize the matchers step-by-step.
- In order to avoid the model (re)traversal for initialization, another option is to initialize all IncQuery matchers your application will use on a Resource(Set) before its contents have been loaded. This way, pattern matcher initialization incurs no additional traversal overhead; however, all memory is allocated at once.
- Alternatively, in wildcard mode, the IncQuery engine will cache all EObject and reference types during the first model traversal, which means that independently of the contents of your patterns, no further retraversal will be necessary. This mode is on by default at development time, with an option to opt out in case you need to work with large models. If you want to use wildcard mode at runtime, refer to the API Javadoc.
- IncQuery can now handle groups (batches) of patterns together: a new API feature allows you to initialize a (freely defined) group of patterns together in one go, without needlessly traversing the model several times. The development environment will treat patterns residing in the same .eiq file as a group. At runtime, you can compose a PatternGroup however you like, with built-in support for the group of registered (generated) patterns declared within a single package. The code generator also generates a group from each .eiq source file.
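The trade-off between initializing patterns one by one and initializing them as a group can be sketched as a simple traversal counter. This is a toy model under stated assumptions, not the engine's implementation: the class TraversalCountSketch and its methods are hypothetical, each pattern is reduced to the set of model element types it needs, and the model is assumed to be already loaded.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the EMF-IncQuery API.
final class TraversalCountSketch {
    private final Set<String> indexedTypes = new HashSet<>();
    private int traversals = 0;

    // Initializing one pattern: traverse the model only if the pattern
    // needs element types that have not been indexed yet.
    void initPattern(Set<String> requiredTypes) {
        if (!indexedTypes.containsAll(requiredTypes)) {
            indexedTypes.addAll(requiredTypes);
            traversals++;
        }
    }

    // Initializing a group: collect the types needed by all patterns first,
    // so that at most one traversal is required.
    void initGroup(List<Set<String>> patterns) {
        Set<String> union = new HashSet<>();
        for (Set<String> pattern : patterns) {
            union.addAll(pattern);
        }
        initPattern(union);
    }

    int traversals() {
        return traversals;
    }

    public static void main(String[] args) {
        Set<String> houses = new HashSet<>(Arrays.asList("House"));
        Set<String> people = new HashSet<>(Arrays.asList("Person"));

        TraversalCountSketch oneByOne = new TraversalCountSketch();
        oneByOne.initPattern(houses);
        oneByOne.initPattern(people);
        System.out.println(oneByOne.traversals()); // 2

        TraversalCountSketch grouped = new TraversalCountSketch();
        grouped.initGroup(Arrays.asList(houses, people));
        System.out.println(grouped.traversals());  // 1
    }
}
```

In the same toy model, wildcard mode would correspond to indexing every type during the first traversal, so that no later pattern triggers another one.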
Q: Does IncQuery need to load all the models into memory, or only the necessary ones like EMF Model Query 2?
IncQuery was primarily designed to define and execute queries against models that are already in the memory. If you initialize a "matcher" on a ResourceSet that has already been filled with the contents of the model, as addressed in the previous question, IncQuery will perform an exhaustive model traversal, and while doing so, it will trigger the loading of any referenced external resources. If the model is changed at a later time to refer to additional external resources, they will be loaded into the ResourceSet as well.
In the development environment, the Query Explorer initializes the pattern matchers for the entire ResourceSet (used by the host editor from which the model is being used). In order to better support working with fragmented models, the developer can restrict matcher initialization to only the main Resource of the host editor by selecting an alternative action for the green button of the Query Explorer. (Note that the API even supports setting the matcher scope to the containment subtree below any EObject.)
In summary, IncQuery does not currently concern itself with querying "workspace" models (i.e. models that are not loaded into memory inside some editor, for instance). In this sense, it is complementary to Model Query 2. However, Model Query 2 could be extended to use IncQuery-based model indexers, to speed things up considerably. This is a feature that we plan to implement sometime in the future.
Q: What is included in the query results if the matcher is attached to an EMF model root that is not the entire ResourceSet, just some Resource or containment subtree?
Every EObject in the containment subtree below the selected EMF model root will be considered for pattern matching, as well as their attributes and the EReferences interconnecting them. EReferences pointing outward from the subtree, as well as the elements they directly point to, currently may or may not be considered (depending on internal details), so do not assume either case. Nothing else will be considered.
Q: What is included in the query results if the scope of the matcher is the entire ResourceSet?
You do not have to worry about any of the above if the scope is given as the entire ResourceSet. In this case, all EObjects and their attributes and references will be considered, regardless of which Resource of the set they reside in. If the initially loaded Resources contain references to external Resources, they too will be resolved and loaded within the ResourceSet.
However, there are a few other exceptions discussed in the next point.
Q: What else is included in the query results? Why do I see the contents of some external EPackages, such as Ecore?
If the EMF model root is a ResourceSet, the scope of query evaluation will also include external resources that are referred to from the ResourceSet but not attached to any ResourceSet. (This does not happen normally, as ResourceSets are typically closed w.r.t. references.) A frequent occurrence of this phenomenon is nsURI-based references to metamodel elements in registered EPackages, e.g. from .ecore models. This is why you might see referenced EPackages appearing in the results when you run queries against an .ecore model. In fact, you might see some duplicate EPackages as well, if there are nsURI-based and "platform:"-prefixed references alongside each other; this is the correct result, as these will be separate objects.
Q: How does one use attributes?
Use the EAttributes as path expressions in the pattern definition to navigate from the EObject to its attribute value. For example, the constraint House.owner.name(ThisHouse, OwnerName); binds the variable OwnerName to an attribute value (of type java.lang.String, more precisely EString). Afterwards, the raw value can be directly used in a check() condition, or in any other pattern constraint, or even as a parameter variable. The equality of two attribute values can be asserted by a '==' constraint between the two variables, such as MyAge == YourAge; or even by using the same variable in both path expressions. For inequality, the operator '!=' is provided; for more complex attribute checks, use a check() expression.
Q: How are null-valued features represented in the query language?
Unset or null-valued attributes (or references) simply won't match, as there is no referenced EObject or attribute value to substitute in the target pattern variable. If you are specifically looking for these, use a negative application condition (neg find hasXYZ(...)).
Q: How does the generic pattern matcher API work (as opposed to the generated ones)?
There is a "generic" matcher with a corresponding generic match class; they are not as easy to use as the pattern-specific generated classes, but they conform to the same reflective interfaces. You can get an instance merely by providing the Pattern object directly. See GenericPatternMatcher in the API Javadoc.
Q: What are MatcherFactory classes for?
MatcherFactory classes can create the associated (generated / generic) Matchers with some type-safe Java generics magic. This might be useful in a trigger engine or other very generic system, as you can collect a large number of matcher factories that contain all knowledge about the pattern, and parametrize them later to specify the actual EMF model to match against. Using Java generics, you can have a method that handles match objects in a type-safe way for each matcher created from the collection of factories.
Instead of calling the constructor of a generated Matcher, you can also use its static factory() to attach a matcher to an EMF model. There is also a GenericMatcherFactory for the generic matcher. You can use the MatcherFactoryRegistry to obtain a matcher factory for a Pattern.
Q: What is the delta monitor, and how to use it?
It is a device that can be attached to the pattern matcher. From that time on, it will keep track of newly appeared matches, as well as previously existing but disappeared ones. So by default it monitors the difference between the current state and the time it was created. You can remove these matches from the delta monitor yourself to 'acknowledge' them; for that particular match, the difference (appearance / disappearance) from that time on will be displayed instead.
See the source code of the EMF-IncQuery validation framework for example usage.
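The bookkeeping described above can be pictured with a small, self-contained sketch. It is not the IncQuery delta monitor class: DeltaMonitorSketch and its method names are hypothetical, matches are modeled as plain strings, and the assumption that an appearance followed by a disappearance cancels out is an interpretation of "the difference between the current state and the time it was created".

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not the EMF-IncQuery delta monitor API.
final class DeltaMonitorSketch {
    private final Set<String> appeared = new LinkedHashSet<>();
    private final Set<String> disappeared = new LinkedHashSet<>();

    // Called when the matcher reports a new match.
    void matchAppeared(String match) {
        // A match that was recorded as lost and comes back is no net change.
        if (!disappeared.remove(match)) {
            appeared.add(match);
        }
    }

    // Called when the matcher reports that a match no longer holds.
    void matchDisappeared(String match) {
        // A match that appeared and vanished again is no net change.
        if (!appeared.remove(match)) {
            disappeared.add(match);
        }
    }

    // Acknowledge a match: later changes are tracked relative to this moment.
    void acknowledge(String match) {
        appeared.remove(match);
        disappeared.remove(match);
    }

    Set<String> appeared() {
        return Collections.unmodifiableSet(appeared);
    }

    Set<String> disappeared() {
        return Collections.unmodifiableSet(disappeared);
    }

    public static void main(String[] args) {
        DeltaMonitorSketch monitor = new DeltaMonitorSketch();
        monitor.matchAppeared("m1");
        monitor.acknowledge("m1");      // m1 has been processed
        monitor.matchDisappeared("m1"); // now reported as a disappearance
        System.out.println(monitor.appeared().isEmpty());          // true
        System.out.println(monitor.disappeared().contains("m1")); // true
    }
}
```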
Q: Can patterns be defined recursively?
Theory: the language does not forbid the usage of recursive patterns, however great care should be taken when using this feature. If the recursion is not well-founded, i.e. matches can circularly support each other, then the result may be different from what you expect (technically, the matcher does not observe minimal fixpoint semantics, the fixpoint it stabilizes upon may be non-minimal).
Practice: most of the time, people want to write recursive patterns to evaluate some kind of transitive closure. If you can, just use the built-in transitive closure operator (find myPattern+(A,B)), and then you have nothing to worry about. If your use case is too complex, you can experiment with recursive patterns at your own risk; if the model graph itself is a DAG (acyclic) w.r.t. the edges that your pattern traverses, you should be fine.
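The difference between a minimal and a non-minimal fixpoint can be illustrated at the level of plain sets. The sketch below is self-contained and does not model the matcher itself: the class RecursionFixpointSketch, the string encoding of pairs, and the recursive rule "reach(A,B) holds if edge(A,B), or edge(A,C) and reach(C,B)" are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not how the EMF-IncQuery matcher is implemented.
final class RecursionFixpointSketch {
    // Edges and derived pairs are encoded as "source->target" strings.
    static String pair(String a, String b) {
        return a + "->" + b;
    }

    // One application of the recursive rule:
    // reach(A,B) holds if edge(A,B), or edge(A,C) and reach(C,B).
    static Set<String> step(Set<String> edges, Set<String> reach) {
        Set<String> result = new HashSet<>(edges);
        for (String edge : edges) {
            String[] ac = edge.split("->");
            for (String r : reach) {
                String[] cb = r.split("->");
                if (ac[1].equals(cb[0])) {
                    result.add(pair(ac[0], cb[1]));
                }
            }
        }
        return result;
    }

    // A match set is a fixpoint if applying the rule does not change it.
    static boolean isFixpoint(Set<String> edges, Set<String> reach) {
        return step(edges, reach).equals(reach);
    }

    // The minimal fixpoint, computed by iterating the rule from the empty set.
    static Set<String> leastFixpoint(Set<String> edges) {
        Set<String> reach = new HashSet<>();
        while (true) {
            Set<String> next = step(edges, reach);
            if (next.equals(reach)) {
                return reach;
            }
            reach = next;
        }
    }

    public static void main(String[] args) {
        // Cyclic graph: a single self-loop on a.
        Set<String> cyclic = new HashSet<>(Arrays.asList("a->a"));
        // The pair a->b supports itself through the cycle.
        Set<String> selfSupported = new HashSet<>(Arrays.asList("a->a", "a->b"));
        System.out.println(leastFixpoint(cyclic));             // [a->a]
        System.out.println(isFixpoint(cyclic, selfSupported)); // true

        // Acyclic graph: the same kind of extra pair is not stable.
        Set<String> dag = new HashSet<>(Arrays.asList("a->b"));
        Set<String> extra = new HashSet<>(Arrays.asList("a->b", "a->c"));
        System.out.println(isFixpoint(dag, extra));            // false
    }
}
```

On the cyclic graph, the set containing a->b is stable under the rule even though it is larger than the minimal fixpoint; on the acyclic graph no such self-supporting set exists, which is why DAG-shaped models are the safe case.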