Caching in PSI Elements  Custom Language Development

Caching in PSI Elements

This post explains how to cache lazily initialized data in your implementation of a PsiElement.

Introduction

Implementing the PSI classes for your custom language is complicated. One of the stumbling blocks is caching in your custom PSI elements.

IntelliJ is heavily multi-threaded. A PSI element can be accessed by multiple threads at the same time. For example, highlighting and inspections could be executed at the same time in different threads. Therefore you need to make sure that your PsiElements are thread-safe.

IntelliJ’s threading model uses a multiple-readers / single-writer model. com.intellij.openapi.application.Application declares runWriteAction(...) and runReadAction(...). These are used to run code with read-only or read/write access to PsiElements, among other things.

If you read PsiElements in your implementation then you’re fine as long as you don’t cache the result. If you cache values then you have to ensure that this is done in a thread-safe way.

Check out the General threading rules in IntelliJ before you continue here.

Reasoning about concurrency is easily wrong, take this with a grain of salt.

Caching data

Don’t cache data unless it’s really necessary!

A file is parsed and stored as a tree of nodes, i.e. a hierarchy of ASTNodes. These nodes are wrapped in PsiElements. This layer of PSI elements adds further functionality on top of the basic ASTNodes. Both trees can be manipulated programmatically.

If your cached data depends on the tree then it must be updated whenever something in the the tree was added, removed or modified. IntelliJ offers a method called subtreeChanged() to let you know when this happens. You have to implement it to stay up-to-date after modifications.

In most cases your solution is like this:

  • update your cache values in a thread-safe way, e.g. by using double-checked locking
  • override subtreeChanged() in your PsiElement and drop your cached values in a thread-safe way

You’ll be in trouble, if:

  • you call code which uses further locks. This may result in a dead-lock because you can’t control the order in which the locks are acquired.
    This could happen, for example, if you access a stub index while calculating the new value. As far as I can tell, you shouldn’t access a stub index in your PSI element. I don’t know of any documentation about this, though, and I happened to do this in BashSupport which resulted in a dead-lock.
  • you access PsiElements which are not sub-elements of your current element. IntelliJ calls subtreeChanged() only for modifications of sub-elements and not for sibling or parent elements.

There are two declarations of subtreeChanged():

  1. In ASTDelegatePsiElement. This is the base class of all PsiElements which wrap ASTNodes.
  2. In PsiFile. This class contains an additional method and will be covered after the general rules for subtreeChanged() are explained.

Caching in a PSI element

Override subtreeChanged() in your custom PsiElement and drop your cached data there. Always call the superclass’s implementation in addition to your own code.

You must only use the PSI element’s children elements. subtreeChanged() is only called for updates to a PsiElement’s children. If you need to access other elements you have to refactor your code and move the cache to the parent element which contains all accessed elements. Of course, you could also add it to your PsiFile.

IntelliJ implementation details

Changes to ASTNodes trigger the subtreeChanged() method of PsiElements, see CompositeElement.java for details. The method is always guarded by the central lock com.intellij.psi.PsiLock.LOCK, i.e. it will be only called once at a time. You shouldn’t use PsiLock in your own classes, though. Therefore you have to add thread-safey on your own.

As far as I can tell subtreeChanged() is always wrapped in a write action.

Sample code

This sample code shows a simplified implementation of a PSI element. The internal state consists of referencedName and nameTextRange.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
// we assume that this class extends ASTDelegatePsiElement
public class MyPsiVarImpl {
    // lock to guard the internal state
    private final Object stateLock = new Object();

    //fields which represent the internal state
    private volatile String referencedName;
    private volatile TextRange nameTextRange;

    @Override
    public void subtreeChanged() {
        super.subtreeChanged();

        //drop the cached values
        synchronized (stateLock) {
            this.referencedName = null;
            this.nameTextRange = null;
        }
    }

    /** Returns the variable name without prefix markers like $ */
    public String getReferencedName() {
        if (referencedName == null) {
            synchronized(stateLock) {
                if (referencedName == null) {
                    referencedName = getNameTextRange().substring(getText());
                }
            }
        }

        return referencedName;
    }

    /** Returns the text range inside of this PsiElement which represents the referenced name */
    protected TextRange getNameTextRange() {
        if (nameTextRange == null) {
            synchronized (stateLock) {
                if (nameTextRange == null) {
                    nameTextRange = TextRange.create(getPrefixLength(), getTextLength());
                }
            }
        }

        return nameTextRange;
    }
}

You may have noticed that the first method calls the other one to compute its result. A slighly improved version updates both values at the same time:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
// we assume that this class extends ASTDelegatePsiElement
public class MyPsiVarImpl {
    // lock to guard the internal state
    private final Object stateLock = new Object();

    //fields which represent the internal state
    private volatile String referencedName;
    private volatile TextRange nameTextRange;

    @Override
    public void subtreeChanged() {
        super.subtreeChanged();

        //drop the cached values
        synchronized (stateLock) {
            this.referencedName = null;
            this.nameTextRange = null;
        }
    }

    private void updateNameCache() {
        if (referencedName == null) {
            synchronized (stateLock) {
                if (referencedName == null) {
                    nameTextRange = TextRange.create(getPrefixLength(), getTextLength());
                    referencedName = nameTextRange.substring(getText());
                }
            }
        }
    }

    /** Returns the variable name without prefix markers like $ */
    public String getReferencedName() {
        updateNameCache();

        return referencedName;
    }

    /** Returns the text range inside of this PsiElement which represents the referenced name */
    protected TextRange getNameTextRange() {
        updateNameCache();

        return nameTextRange;
    }
}

Caching in a PsiFile

PsiFile.java adds the additional method clearCaches(). Override this method instead of subtreeChanged(). There’s no need to override subtreeChanged() in your sub-class of PsiFile because PsiFileImpl already calls clearCaches() in subtreeChannged().

Caching references

References which remember the result of a resolve call are implemented differently. Inherit from com.intellij.psi.impl.source.resolve.reference.impl.CachingReference to do this. CachingReference drops its cached value when the PSI changes.

Why double-checked locking can be used in PSI elements

Double-checked locking is used to initialize a value at first access. The value is then kept and not modified again.

A PSI element has an additional method which resets the value. If a reset may occur at any time then the usual double-checked locking is not working as expected:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// double-checked locking with a reset() method which may called at any time
@BadCode
public class LazyInit {
    private final Object stateLock = new Object();
    private volatile String data;

    public void reset() {
        synchronized (stateLock) {
            data = null;
        }
    }

    public String getData() {
        if (data == null) {
            synchronized (stateLock) {
                if (data == null) {
                    data = calculateData();
                }
            }
        }

        // reset() may be called at this point in another thread
        // and reset data again before we return it

        return data;
    }
}

Double-checked locking may still be used in your PsiElements, because IntelliJ’s threading model guarantees that reading data, i.e. calls to getData(), doesn’t happen at the same time as writing data, i.e. reset().

subtreeChanged() is called in a write action, i.e. at a time when no read action is active. The methods which use the cached data are all wrapped in a read action and will not be run with an active write action.

General threading rules in IntelliJ Must-read (jetbrains.org)
The SDK DevGuide here explains the general threading rules. It doesn’t cover PSI, though.
ASTDelegatePsiElement.java (github.com)
The base class of most PSI elements, it declares the subtreeChanged() method. There’s currently no documentation attached to the method, though.
PsiFile.java
The base class for PsiFiles, it declares the subtreeChanged() and clearCaches() methods.
PsiFileImpl.java
Base class for your PsiFile implementation. It already implements subtreeChanged() to call the more general method clearCaches().
Post in the support forum
This is a post by JetBrains’s Peter Gromov about this topic.