Scalameta

SemanticDB Guide

SemanticDB is a data model for semantic information such as symbols and types about programs in Scala and other languages. SemanticDB decouples production and consumption of semantic information, establishing documented means for communication between tools.

In this document, we introduce practical aspects of working with SemanticDB. We describe the tools that can be used to produce SemanticDB payloads, the tools that can be used to consume them, and useful tips and tricks for working with SemanticDB. If you're looking for a comprehensive reference of SemanticDB features, check out the specification.

Installation

This guide covers several non-standard command-line tools: metac, metacp, metap. To install these tools on your computer, you can do the following:

  1. Install the coursier command-line tool by following the instructions here. Make sure you are using the latest coursier version (1.1.0-M6 or newer).
  2. Add the following aliases to your shell:

Maven Central

alias metac="coursier launch org.scalameta:metac_2.12.6:4.0.0 -- -cp $(coursier fetch -p org.scala-lang:scala-library:2.12.6)"
alias metacp='coursier launch org.scalameta:metacp_2.12:4.0.0 -- --dependency-classpath $(echo $JAVA_HOME/jre/lib/rt.jar):$(coursier fetch org.scala-lang:scala-library:2.12.6 -p)'
alias metap="coursier launch org.scalameta:metap_2.11:4.0.0 --"

NOTE. These installation instructions are for the current unstable master branch. We recommend viewing this document at the latest git tag instead of master.

(Optional) Instead of running metap on the JVM, you can build a native binary on macOS or Linux. Thanks to Scala Native, native Metap works much faster than regular Metap (on a personal laptop of one of the authors of this guide, a simple Metap invocation takes 500+ ms on JVM and 10 ms on native).

  1. Install the coursier command-line tool, version 1.1.0-M6 or later.
  2. Set up the development environment for Scala Native.
  3. Link a native metap binary.
coursier bootstrap org.scalameta:metap_native0.3_2.11:4.0.0 -o metap -f --native --main scala.meta.cli.Metap

Example

Let's generate SemanticDB for a simple Scala program. (At the moment, our SemanticDB producers provide full Scala support and partial Java support. Theoretically, the SemanticDB protobuf schema can accommodate other languages as well, but we haven't attempted to do that yet.)

object Test {
  def main(args: Array[String]): Unit = {
    println("hello world")
  }
}

In order to obtain a SemanticDB corresponding to this program, let's use the Metac command-line tool. For more information on other tools that can produce SemanticDB, see below.

$ metac Test.scala

metac is a thin wrapper over the Scala compiler. It supports the same command-line arguments as scalac, but instead of generating .class files it generates .semanticdb files. Newer versions of Metac may also generate an accompanying .semanticidx file, but it's an experimental feature, so we won't be discussing it in this document.

$ tree
.
├── META-INF
│   └── semanticdb
│       └── Test.scala.semanticdb
└── Test.scala
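
The layout above suggests a simple convention: the payload for a source file lands at META-INF/semanticdb/<relative source path>.semanticdb under the output directory. Here is a minimal shell sketch of that mapping, inferred from the tree output. The helper name semanticdb_path is hypothetical and not part of Metac.

```shell
# Hypothetical helper (not part of Metac): maps a source path, relative to
# the source root, to the expected location of its .semanticdb file.
semanticdb_path() {
  # $1 = relative source path, $2 = target root (defaults to ".")
  printf '%s/META-INF/semanticdb/%s.semanticdb\n' "${2:-.}" "$1"
}

semanticdb_path Test.scala
# prints ./META-INF/semanticdb/Test.scala.semanticdb
```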

If we take a look inside Test.scala.semanticdb, we'll see a weird mix of legible-looking text and special characters. That's because .semanticdb files store protobuf payloads.

$ xxd META-INF/semanticdb/Test.scala.semanticdb
00000000: 0aaa 0408 0412 0a54 6573 742e 7363 616c  .......Test.scal
00000010: 612a 580a 1a5f 656d 7074 795f 2f54 6573  a*X.._empty_/Tes
00000020: 742e 6d61 696e 2829 2e28 6172 6773 2918  t.main().(args).
00000030: 082a 0461 7267 7380 0101 8a01 2e22 2c0a  .*.args......",.
00000040: 2a12 2812 0c73 6361 6c61 2f41 7272 6179  *.(..scala/Array
00000050: 231a 1812 1612 1473 6361 6c61 2f50 7265  #......scala/Pre
00000060: 6465 662e 5374 7269 6e67 232a 530a 0d5f  def.String#*S.._
00000070: 656d 7074 795f 2f54 6573 742e 180a 2008  empty_/Test... .
00000080: 2a04 5465 7374 8001 018a 012f 0a2d 0a00  *.Test...../.-..
00000090: 1211 120f 120d 7363 616c 612f 416e 7952  ......scala/AnyR
000000a0: 6566 2322 160a 145f 656d 7074 795f 2f54  ef#"..._empty_/T
000000b0: 6573 742e 6d61 696e 2829 2e92 0102 3a00  est.main()....:.
000000c0: 2a5c 0a14 5f65 6d70 7479 5f2f 5465 7374  *\.._empty_/Test
...
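
To see that this really is protobuf wire format, we can decode the first three bytes of the dump by hand. The sketch below relies only on the general protobuf encoding rules (a tag byte holds the field number and wire type, and lengths are base-128 varints). The helper names decode_tag and decode_varint are hypothetical.

```shell
# Decodes a protobuf base-128 varint given as hex bytes
# (least significant 7-bit group first).
decode_varint() (
  value=0
  shift_by=0
  for byte in "$@"; do
    b=$((0x$byte))
    value=$(( value | ((b & 0x7f) << shift_by) ))
    shift_by=$((shift_by + 7))
    # A clear high bit marks the last byte of the varint.
    if [ $((b & 0x80)) -eq 0 ]; then break; fi
  done
  echo "$value"
)

# Splits a tag byte into "<field number> <wire type>".
decode_tag() {
  echo "$(( 0x$1 >> 3 )) $(( 0x$1 & 7 ))"
}

decode_tag 0a         # prints "1 2": field 1, wire type 2 (length-delimited)
decode_varint aa 04   # prints 554: byte length of the embedded message
```

In other words, the file starts with one length-delimited field that wraps a 554-byte message, and the readable strings such as Test.scala sit inside that message.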

In order to make sense of .semanticdb files, we can use the Metap command-line tool. For more information on other tools that can consume SemanticDB, see below.

$ metap .
Test.scala
----------

Summary:
Schema => SemanticDB v4
Uri => Test.scala
Text => empty
Language => Scala
Symbols => 3 entries
Occurrences => 7 entries

Symbols:
_empty_/Test. => final object Test extends AnyRef { +1 decls }
_empty_/Test.main(). => method main(args: Array[String]): Unit
_empty_/Test.main().(args) => param args: Array[String]

Occurrences:
[0:7..0:11) <= _empty_/Test.
[1:6..1:10) <= _empty_/Test.main().
[1:11..1:15) <= _empty_/Test.main().(args)
[1:17..1:22) => scala/Array#
[1:23..1:29) => scala/Predef.String#
[1:33..1:37) => scala/Unit#
[2:4..2:11) => scala/Predef.println(+1).

Metap prettyprints various parts of the SemanticDB payload in correspondence with the SemanticDB specification. Here are the most important parts:

  • Uri stores the URI of the source file relative to the directory where the SemanticDB producer was invoked.

  • Symbols contains information about definitions in the source file, including modifiers, signatures, etc.

    For example, _empty_/Test.main(). => method main(args: Array[String]): Unit says that main is a method with one parameter of type Array[String].

  • Occurrences contains a list of identifiers from the source file with their line/column-based positions and unique identifiers pointing to corresponding definitions resolved by the compiler.

    For example, [2:4..2:11) => scala/Predef.println(+1). says that the identifier println on line 3 of the source file (positions are zero-based, so it is reported as line 2) refers to the second overload of println from scala/Predef.
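
Because Metap ranges are zero-based and end-exclusive, they need a small conversion before they match what most editors display. The following is a minimal sketch of that conversion. The helper name occurrence_to_editor and its output format are assumptions made for illustration.

```shell
# Hypothetical helper: converts a Metap range such as "[2:4..2:11)"
# (zero-based, end-exclusive) into one-based, end-inclusive
# "line:col-line:col" coordinates.
occurrence_to_editor() (
  range=${1#"["}
  range=${range%")"}
  start=${range%%..*}   # e.g. "2:4"
  end=${range##*..}     # e.g. "2:11"
  # Lines and the start column shift by one; the exclusive end column
  # already equals the one-based inclusive end column.
  printf '%d:%d-%d:%d\n' \
    $(( ${start%%:*} + 1 )) $(( ${start##*:} + 1 )) \
    $(( ${end%%:*} + 1 )) "${end##*:}"
)

occurrence_to_editor "[2:4..2:11)"
# prints 3:5-3:11
```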

What is SemanticDB good for?

SemanticDB decouples producers and consumers of semantic information about programs and establishes a rigorous specification of the interchange format.

Thanks to that, SemanticDB-based tools like Scalafix, Metadoc and Metals don't need to know about compiler internals and can work with any compiler that supports SemanticDB. This demonstrably improves developer experience, portability and scalability. Next-generation semantic tools at Twitter are based on SemanticDB.

For more information about the SemanticDB vision, check out our talks:

  • Semantic Tooling at Twitter (June 2017) by Eugene Burmako & Stu Hood.
  • SemanticDB for Scala developer tools (April 2018) by Ólafur Páll Geirsson.
  • How We Built Tools That Scale to Millions of Lines of Code (June 2018) by Eugene Burmako.

Producing SemanticDB

Scalac compiler plugin

The semanticdb-scalac compiler plugin injects itself immediately after the typer phase of the Scala compiler and then harvests and dumps semantic information from Scalac in SemanticDB format.

scalac -Xplugin:path/to.jar -Yrangepos [<pluginOption> ...] [<scalacOption> ...] [<sourceFile> ...]

The compiler plugin supports the following options that can be passed through Scalac in the form of -P:semanticdb:<option>:<value>

| Option | Value | Explanation | Default |
| --- | --- | --- | --- |
| -P:semanticdb:failures:<value> | error, warning, info, ignore | The level at which the Scala compiler should report crashes that may happen during SemanticDB generation. | warning |
| -P:semanticdb:profiling:<value> | on, off | Controls basic profiling functionality that computes the overhead of SemanticDB generation relative to regular compilation time (on for dumping profiling information to console, off for disabling profiling). | off |
| -P:semanticdb:include:<value> | Java regex | Which source files to include in SemanticDB generation? | .* |
| -P:semanticdb:exclude:<value> | Java regex | Which source files to exclude from SemanticDB generation? | ^$ |
| -P:semanticdb:sourceroot:<value> | Absolute or relative path | Used to relativize source file paths into TextDocument.uri. | Current working directory |
| -P:semanticdb:targetroot:<value> | Absolute or relative path | The output directory to produce META-INF/semanticdb/**/*.semanticdb files. | The compiler output directory, matches the sbt setting key classDirectory and scalac command-line option -d. |
| -P:semanticdb:text:<value> | on, off | Specifies whether to save source code in TextDocument.text (on for yes, off for no). | off |
| -P:semanticdb:md5:<value> | on, off | Specifies whether to save a hexadecimal formatted MD5 fingerprint of the source file contents in TextDocument.md5 (on for yes, off for no). | on |
| -P:semanticdb:symbols:<value> | all, local-only, none | Specifies which symbol information to save in TextDocument.symbols (all for both local and global symbols, local-only for only local symbols and none for no symbols). | all |
| -P:semanticdb:diagnostics:<value> | on, off | Specifies whether to save compiler messages in TextDocument.diagnostics (on for yes, off for no). | on |
| -P:semanticdb:synthetics:<value> | on, off | Specifies whether to save some of the compiler-synthesized code in the currently unspecified TextDocument.synthetics section (on for yes, off for no). | on |
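
Every plugin option follows the same -P:semanticdb:<option>:<value> shape, so a command line can be assembled mechanically. The sketch below prints a scalac invocation instead of running it. The helper name semanticdb_opts is hypothetical, and path/to.jar is the same placeholder used in the synopsis above.

```shell
# Hypothetical helper: turns key=value pairs into
# -P:semanticdb:<option>:<value> flags.
semanticdb_opts() (
  opts=""
  for kv in "$@"; do
    opts="$opts -P:semanticdb:${kv%%=*}:${kv#*=}"
  done
  printf '%s\n' "${opts# }"
)

# Print (rather than run) a scalac invocation that stores source text
# and keeps only local symbols.
echo scalac -Xplugin:path/to.jar -Yrangepos \
  $(semanticdb_opts text=on symbols=local-only) Test.scala
```

Values that contain shell metacharacters, such as the regexes accepted by include and exclude, would need extra quoting that this sketch does not attempt.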

semanticdb-scalac can be hooked into Scala builds in a number of ways. Read below for more information on command-line tools as well as integration into Scala build tools.

Metac

Metac is a command-line tool that serves as a drop-in replacement for scalac and produces *.semanticdb files instead of *.class files. It supports the same command-line arguments as scalac, including the compiler plugin options described above.

metac [<pluginOption> ...] [<scalacOption> ...] [<sourceFile> ...]

With metac, it is not necessary to provide the flags -Xplugin:/path/to.jar and -Yrangepos, which makes it ideal for quick experiments with SemanticDB. For an example of using Metac, check out Example.

sbt

In order to enable semanticdb-scalac for your sbt project, add the following to your build. Note that the compiler plugin requires the -Yrangepos compiler option to be enabled.

addCompilerPlugin("org.scalameta" % "semanticdb-scalac" % "4.0.0" cross CrossVersion.full)
scalacOptions += "-Yrangepos"

Javac compiler plugin

The semanticdb-javac compiler plugin collects and dumps SemanticDB information after the analyze phase of the Java compiler. It currently only produces symbol information, not occurrences.

To use it, follow these instructions:

  1. Install the plugin by adding a dependency to org.scalameta:semanticdb-javac in your build, or by running:

    $ coursier fetch --intransitive org.scalameta:semanticdb-javac_2.12:4.0.0

  2. Add the plugin to your build's compile classpath. If you invoke javac directly, add it as one of the listed -cp entries. Otherwise, adding the plugin as a library dependency in your build tool should be enough.

  3. Add the following javac option:

    "-Xplugin:semanticdb <target-dir> --sourceroot <source-root>"

    Note: Quoting this option is necessary on the command line, because that is how javac can tell which arguments belong to the plugin. If you are constructing the javac command programmatically as a sequence of string arguments, pass it as a single string without quotes.

    Replace <target-dir> with whatever directory you want the generated SemanticDB to live in, and replace <source-root> with the root you want source file URIs to be relative to. If <source-root> is omitted, it defaults to the current working directory.

For example, a full javac invocation using the plugin would look like:

javac "-Xplugin:semanticdb java-project/target/semanticdb --sourceroot java-project/" \
  -cp <classpath>:<path-to-semanticdb-javac.jar> \
  -d java-project/target/classes \
  java-project/src/main/File1.java java-project/src/main/File2.java
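
The effect of the quotes can be checked without javac. In the sketch below, the hypothetical count_args function stands in for javac's argument parsing and simply reports how many arguments it received.

```shell
# Counts the arguments it receives; a stand-in for javac's argument parsing.
count_args() {
  echo "$#"
}

plugin_opt="-Xplugin:semanticdb java-project/target/semanticdb --sourceroot java-project/"

count_args "$plugin_opt"   # prints 1: the whole option arrives as one argument
count_args $plugin_opt     # prints 4: the shell split the option on spaces
```

Only the quoted form delivers the target directory and --sourceroot to the plugin as part of the -Xplugin option.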

Metacp

Metacp is a command-line tool that takes a classpath, generates SemanticDB files for all classfiles and returns a new classpath that contains the SemanticDB files. Advanced command-line options control caching, parallelization and interaction with some quirks of the Scala standard library.

metacp [options] <classpath>

| Option | Value | Explanation | Default |
| --- | --- | --- | --- |
| <classpath> | Java classpath | Specifies classpath to be converted to SemanticDB. | |
| --dependency-classpath <value> | Java classpath | The classpath for library dependencies to compute external library references. For example, should include the JDK and scala-library if those are not part of <classpath>. The difference between <classpath> and --dependency-classpath is that entries in --dependency-classpath will not be processed for --out. | Empty. |
| --out <value> | Absolute or relative path | Says where Metacp should output conversion results. | See below |
| --exclude-scala-library-synthetics, --include-scala-library-synthetics | | Specifies whether the output classpath should include a jar that contains SemanticDB files corresponding to definitions missing from scala-library.jar, e.g. scala.Any, scala.AnyRef and others. | --exclude-scala-library-synthetics |
| --par, --no-par | | Toggles parallel processing. If enabled, classpath entries will be converted in parallel. NOTE: Some of our users have reported deadlocks supposedly caused by enabling --par. Proceed at your own risk. | --no-par |
| --verbose | | Toggles periodic progress printouts that help gauge Metacp's progress for long-running invocations. | |
| --usejavacp | | Attempts to autodetect the locations of JDK libraries and the Scala library based on the Metacp's classloader, so that these libraries don't have to be specified in --dependency-classpath. | |
| --stub-broken-signatures | | Catches exceptions that arise during generation of `Signature` payloads and stubs these payloads with `NoSignature` values instead of failing Metacp invocations. May be useful for dealing with missing symbol errors arising from missing optional dependencies. | |
| --log-broken-signatures | | Logs exceptions that arise during generation of `Signature` payloads. May be useful in combination with --stub-broken-signatures to have a sense of what exactly is being stubbed. | |

Metacp understands classfiles produced by both the Scala and Java compilers. Since Metacp is a standalone application independent from the Scala compiler, the compiler plugin is not required when using Metacp.
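
Both <classpath> and --dependency-classpath take Java classpaths, that is, entries joined with a colon on macOS and Linux. The sketch below assembles such a value and prints a Metacp invocation instead of running it. The helper name join_classpath and the jar paths are hypothetical placeholders.

```shell
# Hypothetical helper: joins its arguments with ":" to form a classpath.
join_classpath() (
  IFS=:
  printf '%s\n' "$*"
)

# Placeholder paths; substitute the real locations of the JDK and
# scala-library jars on your machine.
deps=$(join_classpath lib/rt.jar lib/scala-library.jar)

# Print (rather than run) the resulting command.
echo metacp --dependency-classpath "$deps" --out out target/classes
```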

Because Metacp only works with classfiles and not sources, SemanticDB files that it produces only contain the Symbols section. Neither Occurrences nor Diagnostics sections are present, because they both require source information. For more information about the SemanticDB format, check out the specification.

As an example of using Metacp, let's compile Test.scala from Example using Scalac and then convert the resulting classfiles to SemanticDB. Note that newer versions of Metac may also generate an accompanying .semanticidx file, but it's an experimental feature, so we won't be discussing it in this document.

$ scalac Test.scala
<success>

$ tree
.
├── Test$.class
├── Test.class
└── Test.scala

$ metacp .
{
  "status": {
    "/Users/ollie/dev/scalameta/target/.": "/Users/ollie/dev/scalameta/target/out/target"
  },
  "scalaLibrarySynthetics": ""
}
$ tree out
out
└── target
    └── META-INF
        └── semanticdb
            └── Test.class.semanticdb

$ metap out/target
Test.class
----------

Summary:
Schema => SemanticDB v4
Uri => Test.class
Text => empty
Language => Scala
Symbols => 3 entries

Symbols:
_empty_/Test. => final object Test extends AnyRef { +1 decls }
_empty_/Test.main(). => method main(args: Array[String]): Unit
_empty_/Test.main().(args) => param args: Array[String]
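
The symbol strings on the left-hand side encode the kind of definition in their trailing characters. The sketch below classifies a symbol by its suffix, based on our reading of the symbol syntax in the SemanticDB specification. The helper name symbol_kind is hypothetical, and the specification remains the authoritative description of the format.

```shell
# Hypothetical helper: guesses the kind of a SemanticDB symbol from its
# final descriptor. Order matters: "()." must be tested before ".".
symbol_kind() {
  case "$1" in
    */)     echo package ;;
    *"#")   echo type ;;
    *").")  echo method ;;
    *")")   echo parameter ;;
    *"]")   echo type-parameter ;;
    *.)     echo term ;;
    local*) echo local ;;
    *)      echo unknown ;;
  esac
}

symbol_kind "_empty_/Test."                # prints term
symbol_kind "_empty_/Test.main()."         # prints method
symbol_kind "_empty_/Test.main().(args)"   # prints parameter
```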

Consuming SemanticDB

Scala bindings

The semanticdb library contains ScalaPB bindings to the SemanticDB protobuf schema. Using this library, one can model SemanticDB entities as Scala case classes and serialize/deserialize them into bytes and streams.

libraryDependencies += "org.scalameta" %% "semanticdb" % "4.0.0"

semanticdb is available for all supported Scala platforms - JVM, Scala.js and Scala Native. For more information, check out autogenerated documentation for Scala 2.11 and Scala 2.12.

Caveats:

  • At the moment, there are no compatibility guarantees for Scala bindings to the SemanticDB schema. The current package of the schema (scala.meta.internal.semanticdb) is considered internal, so we do not provide any guarantees about compatibility across different versions of the semanticdb library. We are planning to improve the situation in the future.
  • At the moment, SemanticDB-based tools are responsible for implementing discovery of SemanticDB payloads on their own. For example, the non-trivial logic that Metap uses to traverse its inputs and detect SemanticDB files must be reproduced by Scalafix, Metadoc and others. We are planning to improve the situation in the future.
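
As a starting point for such discovery logic, the directory case can be sketched in a few lines of shell. This is a simplification and not Metap's actual implementation: it only walks directories and ignores .jar files. The helper name find_semanticdb is hypothetical.

```shell
# Hypothetical helper: lists every .semanticdb file under a directory.
find_semanticdb() {
  find "$1" -type f -name '*.semanticdb' | sort
}

# Build a throwaway directory that mimics Metac's output layout.
demo=$(mktemp -d)
mkdir -p "$demo/META-INF/semanticdb"
: > "$demo/META-INF/semanticdb/Test.scala.semanticdb"
: > "$demo/Test.scala"

# Lists only the .semanticdb file, not Test.scala.
find_semanticdb "$demo"
rm -rf "$demo"
```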

Metap

Metap is a command-line tool that takes a list of paths and then prettyprints all .semanticdb files that it finds in these paths. Advanced options control prettyprinting format.

metap [options] <classpath>

| Option | Value | Explanation | Default |
| --- | --- | --- | --- |
| <classpath> | Pseudo classpath | Supported classpath entries: .semanticdb files (prettyprinted directly); directories (traversed recursively, all found .semanticdb files prettyprinted); .jar files (traversed recursively, all found .semanticdb files uncompressed and prettyprinted). | |
| --compact, --detailed, --proto | | Specifies prettyprinting format, which can be either --compact (prints the most important parts of the payload in a condensed fashion), --detailed (more detailed than --compact, but still pretty condensed), or --proto (prints the same output as protoc would print, see below). | --compact |

For an example of using Metap, check out Example.

Protoc

The Protocol Compiler tool (protoc) can inspect protobuf payloads in --decode (takes a schema) and --decode_raw (doesn't need a schema) modes. For reference, here's the SemanticDB protobuf schema.

$ tree
.
├── META-INF
│   └── semanticdb
│       └── Test.scala.semanticdb
└── Test.scala

$ protoc --proto_path <directory with the .proto file> \
  --decode scala.meta.internal.semanticdb.TextDocuments \
  semanticdb.proto < META-INF/semanticdb/Test.scala.semanticdb

documents {
  schema: SEMANTICDB4
  uri: "Test.scala"
  symbols {
    symbol: "_empty_/Test.main().(args)"
    kind: PARAMETER
    name: "args"
    access {
      publicAccess {
      }
    }
    language: SCALA
    signature {
      valueSignature {
        tpe {
          typeRef {
            symbol: "scala/Array#"
            type_arguments {
              typeRef {
                symbol: "scala/Predef.String#"
              }
            }
          }
        }
      }
    }
  }
  symbols {
    symbol: "_empty_/Test."
    kind: OBJECT
    properties: 8
...
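
The schema-less mode mentioned above works the same way, except that fields are printed by number instead of by name. Here is a small wrapper sketch around protoc --decode_raw. The function name decode_raw and its error handling are our own additions for illustration.

```shell
# Hypothetical wrapper: dumps a .semanticdb file without a schema.
decode_raw() {
  if ! command -v protoc >/dev/null 2>&1; then
    echo "protoc not found on PATH" >&2
    return 127
  fi
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  # --decode_raw reads the payload from standard input.
  protoc --decode_raw < "$1"
}

# Usage, assuming the file produced in the Example section exists:
#   decode_raw META-INF/semanticdb/Test.scala.semanticdb
```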

protoc was useful for getting things done in the early days of SemanticDB, but nowadays it's a bit too low-level. It is recommended to use metap instead of protoc.

SemanticDB-based tools

Scalafix

Scalafix is a rewrite and linting tool for Scala developed at the Scala Center with the goal of helping automate migration between different Scala compiler and library versions.

Scalafix provides syntactic and semantic APIs that tool developers can use to write custom rewrite and linting rules. Syntactic information is obtained from the Scalameta parser, and semantic information is loaded from SemanticDB files produced by the Scalac compiler plugin and Metacp.

Thanks to SemanticDB, Scalafix is:

  • Accessible: Scalafix enables novices to implement advanced rules without learning compiler internals.
  • Portable: Scalafix is not tied to compiler internals, which means that it can seamlessly work with any compiler / compiler version that supports the SemanticDB compiler plugin.
  • Scalable: Scalafix does not need a running Scala compiler, so it can perform rewrites and lints in parallel. (Unlike compiler plugin-based linters that are limited by the single-threaded architecture of Scalac).

Metadoc

Metadoc is an experiment with SemanticDB to build an online code browser with IDE-like features. Check out the demo for more information.

Metadoc takes Scala sources and corresponding SemanticDB files generated by the Scalac compiler plugin. It then generates a static site that can be served via GitHub Pages, supporting jump to definition, find usages and search by symbol.

Thanks to SemanticDB, Metadoc is:

  • Cross-platform: Scala bindings to SemanticDB are cross-compiled to JVM and Scala.js, which means that the site generator and the online code browser can reuse the same logic to work with SemanticDB payloads.
  • Portable: Just like Scalafix, Metadoc is not tied to compiler internals, which means that it can seamlessly work with any compiler / compiler version that supports the SemanticDB compiler plugin.

Metals

Metals is an experiment to implement a language server for Scala using Scalameta projects such as Scalafmt, Scalafix and SemanticDB. Check out the presentation for more information.

Metals uses SemanticDB to:

  • Index project dependencies for intelligent jump to definition and quick lookups of project symbols.
  • Communicate with the Scala compiler regarding semantic information about the opened project.
  • Feed semantic information into Scalafix-based refactorings.

Thanks to SemanticDB, Metals is:

  • Mostly portable: Unlike Scalafix and Metadoc, Metals has modules that interface directly with compiler internals, but the majority of its functionality is based on SemanticDB, so it can work with any compiler / compiler version that supports the SemanticDB compiler plugin.
  • Surprisingly fast: A well-defined schema for semantic information that can come from multiple locations (dependency classpath, uncompiled files, compiled files, etc) allows for a robust implementation of indexing, which reliably speeds up operations like jump to definition and find usages. We have even experimented with a relational index for SemanticDB data, which further improves performance characteristics.
  • Resilient: Reification of semantic information makes it possible to consult results of previous typechecks and accommodate certain edits by simply shifting offsets in old SemanticDB snapshots. This technique is surprisingly effective for supporting minor edits that result in temporarily invalid code.
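
To make the offset-shifting idea concrete, here is a minimal sketch that adjusts Metap-style occurrence lines after whole lines have been inserted into a source file. It is an illustration under simplifying assumptions (only line insertions, no column changes) and not Metals' actual implementation. The helper name shift_occurrences is hypothetical.

```shell
# Hypothetical helper: reads occurrence lines such as
#   [2:4..2:11) => scala/Predef.println(+1).
# on standard input and shifts every range that starts at or after the
# zero-based line $1 down by $2 lines.
shift_occurrences() {
  awk -v at="$1" -v delta="$2" '
    {
      close_idx = index($0, ")")               # end of the range
      range = substr($0, 2, close_idx - 2)     # e.g. 2:4..2:11
      rest = substr($0, close_idx + 1)         # arrow and symbol
      split(range, ends, /\.\./)
      split(ends[1], s, ":")
      split(ends[2], e, ":")
      if (s[1] >= at) { s[1] += delta; e[1] += delta }
      printf "[%d:%d..%d:%d)%s\n", s[1], s[2], e[1], e[2], rest
    }'
}

# Two lines were inserted before (zero-based) line 1 of Test.scala.
printf '%s\n' \
  '[0:7..0:11) <= _empty_/Test.' \
  '[2:4..2:11) => scala/Predef.println(+1).' | shift_occurrences 1 2
# prints:
# [0:7..0:11) <= _empty_/Test.
# [4:4..4:11) => scala/Predef.println(+1).
```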
Copyright © 2018 Scalameta