SemanticDB Guide
SemanticDB is a data model for semantic information such as symbols and types about programs in Scala and other languages. SemanticDB decouples production and consumption of semantic information, establishing documented means for communication between tools.
In this document, we introduce practical aspects of working with SemanticDB. We describe the tools that can be used to produce SemanticDB payloads, the tools that can be used to consume SemanticDB payloads, and useful tips & tricks for working with SemanticDB. If you're looking for a comprehensive reference of SemanticDB features, check out the specification.
Installation
This guide covers several non-standard command-line tools: metac, metacp and metap. To install these tools on your computer, you can do the following:
- Install the coursier command-line tool by following the instructions here. Make sure you are using a recent coursier version (1.1.0-M6 or newer).
- Add the following aliases to your shell:
alias metac="coursier launch org.scalameta:metac_2.12.6:4.0.0 -- -cp $(coursier fetch -p org.scala-lang:scala-library:2.12.6)"
alias metacp='coursier launch org.scalameta:metacp_2.12:4.0.0 -- --dependency-classpath $(echo $JAVA_HOME/jre/lib/rt.jar):$(coursier fetch org.scala-lang:scala-library:2.12.6 -p)'
alias metap="coursier launch org.scalameta:metap_2.11:4.0.0 --"
NOTE. These installation instructions are for the current unstable master branch. It's recommended to view this document at the latest git tag instead of master.
(Optional) Instead of running metap on the JVM, you can build a native binary on macOS or Linux. Thanks to Scala Native, native Metap works much faster than regular Metap (on a personal laptop of one of the authors of this guide, a simple Metap invocation takes 500+ ms on the JVM and 10 ms natively).
- Install the coursier command-line tool, version 1.1.0-M6 or later.
- Set up the development environment for Scala Native.
- Link a native metap binary:
coursier bootstrap org.scalameta:metap_native0.3_2.11:4.0.0 -o metap -f --native --main scala.meta.cli.Metap
Example
Let's generate SemanticDB for a simple Scala program. (At the moment, our SemanticDB producers provide full Scala support and partial Java support. In theory, the SemanticDB protobuf schema can accommodate other languages as well, but we haven't attempted to do that yet.)
object Test {
  def main(args: Array[String]): Unit = {
    println("hello world")
  }
}
In order to obtain a SemanticDB corresponding to this program, let's use the Metac command-line tool. For more information on other tools that can produce SemanticDB, see below.
$ metac Test.scala
metac is a thin wrapper over the Scala compiler. It supports the same command-line arguments as scalac, but instead of generating .class files it generates .semanticdb files. Newer versions of Metac may also generate an accompanying .semanticidx file, but it's an experimental feature, so we won't be discussing it in this document.
$ tree
.
├── META-INF
│   └── semanticdb
│       └── Test.scala.semanticdb
└── Test.scala
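The location of the payload follows a simple convention: the file's URI (its path relative to the sourceroot) plus a .semanticdb suffix, placed under META-INF/semanticdb in the output directory (the targetroot). Here is a minimal sketch of that mapping using java.nio; payloadPath is a hypothetical helper written for illustration, not a Scalameta API.

```scala
import java.nio.file.{Path, Paths}

object SemanticdbPaths {
  // Hypothetical helper: where a producer would write the payload for `source`,
  // given the sourceroot (used to relativize the URI) and the targetroot (output dir).
  def payloadPath(sourceroot: Path, targetroot: Path, source: Path): Path = {
    // TextDocument.uri is the source path relative to the sourceroot.
    val uri = sourceroot.relativize(source)
    // The payload lives at META-INF/semanticdb/<uri>.semanticdb under the targetroot.
    targetroot.resolve("META-INF").resolve("semanticdb").resolve(uri.toString + ".semanticdb")
  }

  def main(args: Array[String]): Unit = {
    val root = Paths.get("/work")
    // On Unix-like systems: /work/META-INF/semanticdb/Test.scala.semanticdb
    println(payloadPath(root, root, root.resolve("Test.scala")))
  }
}
```

Nested sources keep their directory structure, so src/A.scala ends up as META-INF/semanticdb/src/A.scala.semanticdb.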
If we take a look inside Test.scala.semanticdb, we'll see a weird mix of legible-looking text and special characters. That's because .semanticdb files store protobuf payloads.
$ xxd META-INF/semanticdb/Test.scala.semanticdb
00000000: 0aaa 0408 0412 0a54 6573 742e 7363 616c .......Test.scal
00000010: 612a 580a 1a5f 656d 7074 795f 2f54 6573 a*X.._empty_/Tes
00000020: 742e 6d61 696e 2829 2e28 6172 6773 2918 t.main().(args).
00000030: 082a 0461 7267 7380 0101 8a01 2e22 2c0a .*.args......",.
00000040: 2a12 2812 0c73 6361 6c61 2f41 7272 6179 *.(..scala/Array
00000050: 231a 1812 1612 1473 6361 6c61 2f50 7265 #......scala/Pre
00000060: 6465 662e 5374 7269 6e67 232a 530a 0d5f def.String#*S.._
00000070: 656d 7074 795f 2f54 6573 742e 180a 2008 empty_/Test... .
00000080: 2a04 5465 7374 8001 018a 012f 0a2d 0a00 *.Test...../.-..
00000090: 1211 120f 120d 7363 616c 612f 416e 7952 ......scala/AnyR
000000a0: 6566 2322 160a 145f 656d 7074 795f 2f54 ef#"..._empty_/T
000000b0: 6573 742e 6d61 696e 2829 2e92 0102 3a00 est.main()....:.
000000c0: 2a5c 0a14 5f65 6d70 7479 5f2f 5465 7374 *\.._empty_/Test
...
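To see that these bytes are an ordinary protobuf encoding, we can decode the first few by hand. The sketch below is plain Scala with no dependencies (the object and method names are hypothetical). It reads varints and field keys, where a key is (field number << 3) | wire type, and checks the field numbers from the SemanticDB schema: documents = 1 in TextDocuments, schema = 1 and uri = 2 in TextDocument. In practice you would use Metap or the Scala bindings instead.

```scala
object RawProtobufPeek {
  // The first 17 bytes of Test.scala.semanticdb, copied from the xxd dump above.
  val sample: Array[Byte] = Array(
    0x0a, 0xaa, 0x04, 0x08, 0x04, 0x12, 0x0a,
    0x54, 0x65, 0x73, 0x74, 0x2e, 0x73, 0x63, 0x61, 0x6c, 0x61
  ).map(_.toByte)

  // Reads a base-128 varint starting at `pos`; returns (value, position after it).
  def readVarint(bytes: Array[Byte], pos: Int): (Long, Int) = {
    var result = 0L
    var shift = 0
    var i = pos
    var more = true
    while (more) {
      val b = bytes(i) & 0xff
      result |= (b & 0x7fL) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    (result, i)
  }

  // Splits a field key into (field number, wire type, position after the key).
  def readKey(bytes: Array[Byte], pos: Int): (Int, Int, Int) = {
    val (key, next) = readVarint(bytes, pos)
    ((key >>> 3).toInt, (key & 7).toInt, next)
  }

  // Returns (length of the first TextDocument, schema, uri).
  def peekHeader(bytes: Array[Byte]): (Long, Long, String) = {
    val (docField, docWire, p1) = readKey(bytes, 0)
    require(docField == 1 && docWire == 2, "expected TextDocuments.documents")
    val (docLength, p2) = readVarint(bytes, p1)
    val (schemaField, _, p3) = readKey(bytes, p2)
    require(schemaField == 1, "expected TextDocument.schema")
    val (schema, p4) = readVarint(bytes, p3)
    val (uriField, _, p5) = readKey(bytes, p4)
    require(uriField == 2, "expected TextDocument.uri")
    val (uriLength, p6) = readVarint(bytes, p5)
    (docLength, schema, new String(bytes, p6, uriLength.toInt, "UTF-8"))
  }

  def main(args: Array[String]): Unit = println(peekHeader(sample))
}
```

Running it prints (554,4,Test.scala): a 554-byte document whose schema is 4 (SemanticDB v4) and whose URI is Test.scala.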
In order to make sense of .semanticdb files, we can use the Metap command-line tool. For more information on other tools that can consume SemanticDB, see below.
$ metap .
Test.scala
----------
Summary:
Schema => SemanticDB v4
Uri => Test.scala
Text => empty
Language => Scala
Symbols => 3 entries
Occurrences => 7 entries
Symbols:
_empty_/Test. => final object Test extends AnyRef { +1 decls }
_empty_/Test.main(). => method main(args: Array[String]): Unit
_empty_/Test.main().(args) => param args: Array[String]
Occurrences:
[0:7..0:11) <= _empty_/Test.
[1:6..1:10) <= _empty_/Test.main().
[1:11..1:15) <= _empty_/Test.main().(args)
[1:17..1:22) => scala/Array#
[1:23..1:29) => scala/Predef.String#
[1:33..1:37) => scala/Unit#
[2:4..2:11) => scala/Predef.println(+1).
Metap prettyprints various parts of the SemanticDB payload in correspondence with the SemanticDB specification. Here are the most important parts:
- Uri stores the URI of the source file relative to the directory where the SemanticDB producer was invoked (more precisely, relative to the sourceroot, which defaults to that directory).
- Symbols contains information about definitions in the source file, including modifiers, signatures, etc. For example,
  _empty_/Test.main(). => method main(args: Array[String]): Unit
  says that main is a method with one parameter of type Array[String] that returns Unit.
- Occurrences contains a list of identifiers from the source file with their line/column-based positions and unique identifiers pointing to the corresponding definitions resolved by the compiler. For example,
  [2:4..2:11) => scala/Predef.println(+1).
  says that the identifier println on line 3 (zero-based numbering scheme!) refers to the second overload of println from scala/Predef.
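Occurrence ranges printed by Metap are zero-based and half-open: [2:4..2:11) starts at line 2, character 4 and ends just before character 11 on the same line. The following self-contained sketch (plain Scala, not part of the semanticdb library; all names are hypothetical) parses this notation and slices the covered text out of Test.scala.

```scala
object OccurrenceRanges {
  // Zero-based, half-open range as printed by Metap: [startLine:startChar..endLine:endChar)
  final case class Range(startLine: Int, startChar: Int, endLine: Int, endChar: Int) {
    // Editors usually number lines and columns from 1.
    def humanStart: (Int, Int) = (startLine + 1, startChar + 1)
  }

  private val Pattern = """\[(\d+):(\d+)\.\.(\d+):(\d+)\)""".r

  def parse(s: String): Option[Range] = s match {
    case Pattern(sl, sc, el, ec) => Some(Range(sl.toInt, sc.toInt, el.toInt, ec.toInt))
    case _                       => None
  }

  // Extracts the covered text; only single-line ranges are handled in this sketch.
  def slice(source: String, r: Range): String = {
    require(r.startLine == r.endLine, "multi-line ranges are not handled here")
    source.split("\n", -1)(r.startLine).substring(r.startChar, r.endChar)
  }

  def main(args: Array[String]): Unit = {
    val source =
      "object Test {\n  def main(args: Array[String]): Unit = {\n    println(\"hello world\")\n  }\n}\n"
    for (s <- Seq("[0:7..0:11)", "[1:17..1:22)", "[2:4..2:11)"); r <- parse(s))
      println(s"$s covers '${slice(source, r)}' starting at line:column ${r.humanStart}")
  }
}
```

For [2:4..2:11) this yields println, starting at line 3, column 5 in one-based terms.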
What is SemanticDB good for?
SemanticDB decouples producers and consumers of semantic information about programs and establishes a rigorous specification of the interchange format.
Thanks to that, SemanticDB-based tools like Scalafix, Metadoc and Metals don't need to know about compiler internals and can work with any compiler that supports SemanticDB. This demonstrably improves developer experience, portability and scalability. Next-generation semantic tools at Twitter are based on SemanticDB.
For more information about the SemanticDB vision, check out our talks:
- Semantic Tooling at Twitter (June 2017) by Eugene Burmako & Stu Hood.
- SemanticDB for Scala developer tools (April 2018) by Ólafur Páll Geirsson.
- How We Built Tools That Scale to Millions of Lines of Code (June 2018) by Eugene Burmako.
Producing SemanticDB
Scalac compiler plugin
The semanticdb-scalac compiler plugin injects itself immediately after the typer phase of the Scala compiler and then harvests and dumps semantic information from Scalac in SemanticDB format.
scalac -Xplugin:path/to.jar -Yrangepos [<pluginOption> ...] [<scalacOption> ...] [<sourceFile> ...]
The compiler plugin supports the following options, which can be passed through Scalac in the form -P:semanticdb:<option>:<value>.
Option | Value | Explanation | Default
-P:semanticdb:failures:<value> | error, warning, info, ignore | The level at which the Scala compiler should report crashes that may happen during SemanticDB generation. | warning
-P:semanticdb:profiling:<value> | on, off | Controls basic profiling functionality that computes the overhead of SemanticDB generation relative to regular compilation time (on for dumping profiling information to console, off for disabling profiling). | off
-P:semanticdb:include:<value> | Java regex | Which source files to include in SemanticDB generation. | .*
-P:semanticdb:exclude:<value> | Java regex | Which source files to exclude from SemanticDB generation. | ^$
-P:semanticdb:sourceroot:<value> | Absolute or relative path | Used to relativize source file paths into TextDocument.uri. | Current working directory
-P:semanticdb:targetroot:<value> | Absolute or relative path | The output directory in which to produce META-INF/semanticdb/**/*.semanticdb files. | The compiler output directory (matches the sbt setting key classDirectory and the scalac command-line option -d)
-P:semanticdb:text:<value> | on, off | Specifies whether to save source code in TextDocument.text (on for yes, off for no). | off
-P:semanticdb:md5:<value> | on, off | Specifies whether to save a hexadecimal formatted MD5 fingerprint of the source file contents in TextDocument.md5 (on for yes, off for no). | on
-P:semanticdb:symbols:<value> | all, local-only, none | Specifies which symbol information to save in TextDocument.symbols (all for both local and global symbols, local-only for only local symbols and none for no symbols). | all
-P:semanticdb:diagnostics:<value> | on, off | Specifies whether to save compiler messages in TextDocument.diagnostics (on for yes, off for no). | on
-P:semanticdb:synthetics:<value> | on, off | Specifies whether to save some of the compiler-synthesized code in the currently unspecified TextDocument.synthetics section (on for yes, off for no). | on
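The include/exclude defaults are easy to misread: ^$ only matches the empty string, so by default nothing is excluded. The sketch below models the selection under an assumption we have not confirmed here: that a file is processed when the include regex finds a match somewhere in its path and the exclude regex does not. Filter and accepts are hypothetical names.

```scala
import scala.util.matching.Regex

object SourceFilter {
  // ASSUMPTION: unanchored search of each regex against the source path.
  final case class Filter(include: Regex, exclude: Regex) {
    def accepts(path: String): Boolean =
      include.findFirstIn(path).isDefined && exclude.findFirstIn(path).isEmpty
  }

  // Defaults from the table above: include ".*", exclude "^$" (matches only "").
  val defaults: Filter = Filter(".*".r, "^$".r)

  def main(args: Array[String]): Unit = {
    // e.g. the equivalent of -P:semanticdb:exclude:/test/
    val skipTests = Filter(".*".r, "/test/".r)
    for (p <- Seq("src/main/scala/Foo.scala", "src/test/scala/FooSuite.scala"))
      println(s"$p -> defaults: ${defaults.accepts(p)}, skipTests: ${skipTests.accepts(p)}")
  }
}
```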
semanticdb-scalac can be hooked into Scala builds in a number of ways. Read below for more information on command-line tools as well as integration into Scala build tools.
Metac
Metac is a command-line tool that serves as a drop-in replacement for scalac and produces *.semanticdb files instead of *.class files. It supports the same command-line arguments as scalac, including the compiler plugin options described above.
metac [<pluginOption> ...] [<scalacOption> ...] [<sourceFile> ...]
With metac, it is not necessary to provide the flags -Xplugin:/path/to.jar and -Yrangepos, which makes it ideal for quick experiments with SemanticDB. For an example of using Metac, check out Example.
sbt
In order to enable semanticdb-scalac for your sbt project, add the following to your build. Note that the compiler plugin requires the -Yrangepos compiler option to be enabled.
addCompilerPlugin("org.scalameta" % "semanticdb-scalac" % "4.0.0" cross CrossVersion.full)
scalacOptions += "-Yrangepos"
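The plugin options from the table above are passed through scalacOptions like any other compiler flag. A minimal build.sbt sketch, assuming you want source text stored in TextDocument.text and URIs relativized against the build root rather than the working directory (these particular choices are ours, for illustration):

```scala
// build.sbt (sketch): extra semanticdb-scalac options, added alongside the two lines above.
scalacOptions ++= Seq(
  // Also store the source code in TextDocument.text (off by default).
  "-P:semanticdb:text:on",
  // Relativize TextDocument.uri against the build root instead of the working directory.
  s"-P:semanticdb:sourceroot:${(baseDirectory in ThisBuild).value}"
)
```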
Javac compiler plugin
The semanticdb-javac compiler plugin collects and dumps SemanticDB information after the analyze phase of the Java compiler. It currently only produces symbol information, not occurrences.
To use it, follow these instructions:
- Install the plugin by adding a dependency on org.scalameta:semanticdb-javac in your build, or by running:
  $ coursier fetch --intransitive org.scalameta:semanticdb-javac_2.12:4.0.0
- Add the plugin to your build's compile classpath. If invoking javac directly, add it as one of the entries listed in -cp. Otherwise, adding the plugin as a library dependency in your build tool should be enough.
- Add the following javac option:
  "-Xplugin:semanticdb <target-dir> --sourceroot <source-root>"
  Note: Quoting this option is necessary on the command line, as that is how javac tells which arguments belong to the plugin. If you are constructing the javac command programmatically as a sequence of string arguments, then this should be a single string without quotes.
Replace <target-dir> with whatever directory you want the generated SemanticDB to live in, and replace <source-root> with the root you want source file URIs to be relative to. If <source-root> is omitted, it defaults to the current working directory.
For example, a full javac invocation using the plugin would look like:
javac "-Xplugin:semanticdb java-project/target/semanticdb --sourceroot java-project/" \
-cp <classpath>:<path-to-semanticdb-javac.jar> \
-d java-project/target/classes \
java-project/src/main/File1.java java-project/src/main/File2.java
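When javac is launched programmatically, the plugin option has to be one argument with no shell quotes. A minimal Scala sketch of assembling such an argument list; javacArgs is a hypothetical helper and the paths mirror the invocation above.

```scala
object JavacArgs {
  // Hypothetical helper: javac arguments for semanticdb-javac as a Seq[String].
  // The whole plugin option is ONE element and carries no quote characters.
  def javacArgs(targetDir: String, sourceroot: String, classpath: String,
                outDir: String, sources: Seq[String]): Seq[String] =
    Seq(
      s"-Xplugin:semanticdb $targetDir --sourceroot $sourceroot",
      "-cp", classpath, // must include the semanticdb-javac jar
      "-d", outDir
    ) ++ sources

  def main(args: Array[String]): Unit = {
    // java.io.File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
    val cp = Seq("lib/deps.jar", "lib/semanticdb-javac.jar").mkString(java.io.File.pathSeparator)
    val cmd = javacArgs("java-project/target/semanticdb", "java-project/", cp,
      "java-project/target/classes", Seq("java-project/src/main/File1.java"))
    // To actually run it: scala.sys.process.Process("javac" +: cmd).!
    cmd.foreach(println)
  }
}
```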
Metacp
Metacp is a command-line tool that takes a classpath, generates SemanticDB files for all classfiles and returns a new classpath that contains the SemanticDB files. Advanced command-line options control caching, parallelization and interaction with some quirks of the Scala standard library.
metacp [options] <classpath>
Option | Value | Explanation | Default
<classpath> | Java classpath | Specifies the classpath to be converted to SemanticDB. |
--dependency-classpath <value> | Java classpath | The classpath for library dependencies, used to compute external library references. For example, it should include the JDK and scala-library if those are not part of <classpath>. The difference between <classpath> and --dependency-classpath is that entries in --dependency-classpath will not be processed for --out. | Empty
--out <value> | Absolute or relative path | Says where Metacp should output conversion results. | See below
--exclude-scala-library-synthetics, --include-scala-library-synthetics | | Specifies whether the output classpath should include a jar that contains SemanticDB files corresponding to definitions missing from scala-library.jar, e.g. scala.Any, scala.AnyRef and others. | --exclude-scala-library-synthetics
--par, --no-par | | Toggles parallel processing. If enabled, classpath entries will be converted in parallel. NOTE: Some of our users have reported deadlocks supposedly caused by enabling --par. Proceed at your own risk. | --no-par
--verbose | | Toggles periodic progress printouts that help gauge Metacp's progress for long-running invocations. |
--usejavacp | | Attempts to autodetect the locations of JDK libraries and the Scala library based on Metacp's classloader, so that these libraries don't have to be specified in --dependency-classpath. |
--stub-broken-signatures | | Catches exceptions that arise during generation of `Signature` payloads and stubs these payloads with `NoSignature` values instead of failing Metacp invocations. May be useful for dealing with missing symbol errors arising from missing optional dependencies. |
--log-broken-signatures | | Logs exceptions that arise during generation of `Signature` payloads. May be useful in combination with --stub-broken-signatures to get a sense of what exactly is being stubbed. |
Metacp understands classfiles produced by both the Scala and Java compilers. Since Metacp is a standalone application independent from the Scala compiler, the compiler plugin is not required when using Metacp.
Because Metacp only works with classfiles and not sources, the SemanticDB files that it produces only contain the Symbols section. Neither the Occurrences nor the Diagnostics sections are present, because they both require source information. For more information about the SemanticDB format, check out the specification.
As an example of using Metacp, let's compile Test.scala from Example using Scalac and then convert the resulting classfiles to SemanticDB. Note that newer versions of Metacp may also generate an accompanying .semanticidx file, but it's an experimental feature, so we won't be discussing it in this document.
$ scalac Test.scala
<success>
$ tree
.
├── Test$.class
├── Test.class
└── Test.scala
$ metacp .
{
"status": {
"/Users/ollie/dev/scalameta/target/.": "/Users/ollie/dev/scalameta/target/out/target"
},
"scalaLibrarySynthetics": ""
}
$ tree out
out
└── target
    └── META-INF
        └── semanticdb
            └── Test.class.semanticdb
$ metap out/target
Test.class
----------
Summary:
Schema => SemanticDB v4
Uri => Test.class
Text => empty
Language => Scala
Symbols => 3 entries
Symbols:
_empty_/Test. => final object Test extends AnyRef { +1 decls }
_empty_/Test.main(). => method main(args: Array[String]): Unit
_empty_/Test.main().(args) => param args: Array[String]
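Each symbol string in this output encodes the kind of definition it names in its last descriptor: / for packages, # for types, . for terms such as objects, (...). for methods, (name) for parameters and [name] for type parameters, while local symbols look like local0. The sketch below classifies symbols by that suffix. It is a rough approximation based on our reading of the symbol format in the specification (it ignores backtick-escaped names, which may contain these characters), and descriptorKind is a hypothetical name.

```scala
object SymbolKinds {
  // Rough classification by the final descriptor suffix; order of checks matters
  // because a method descriptor "(...)." also ends with ".".
  def descriptorKind(symbol: String): String =
    if (symbol.matches("local\\d+")) "local"          // e.g. local0
    else if (symbol.endsWith(").")) "method"          // e.g. _empty_/Test.main().
    else if (symbol.endsWith(")")) "parameter"        // e.g. _empty_/Test.main().(args)
    else if (symbol.endsWith("]")) "type parameter"   // e.g. scala/Array#[T]
    else if (symbol.endsWith("#")) "type"             // e.g. scala/Array#
    else if (symbol.endsWith(".")) "term"             // e.g. _empty_/Test.
    else if (symbol.endsWith("/")) "package"          // e.g. scala/
    else "unknown"

  def main(args: Array[String]): Unit =
    Seq("_empty_/Test.", "_empty_/Test.main().", "_empty_/Test.main().(args)", "scala/Array#")
      .foreach(s => println(s"$s => ${descriptorKind(s)}"))
}
```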
Consuming SemanticDB
Scala bindings
The semanticdb library contains ScalaPB bindings to the SemanticDB protobuf schema. Using this library, one can model SemanticDB entities as Scala case classes and serialize/deserialize them into bytes and streams.
libraryDependencies += "org.scalameta" %% "semanticdb" % "4.0.0"
semanticdb is available for all supported Scala platforms: JVM, Scala.js and Scala Native. For more information, check out the autogenerated documentation for Scala 2.11 and Scala 2.12.
Caveats:
- At the moment, there are no compatibility guarantees for Scala bindings to the SemanticDB schema. The current package of the schema (scala.meta.internal.semanticdb) is considered internal, so we do not provide any guarantees about compatibility across different versions of the semanticdb library. We are planning to improve the situation in the future.
- At the moment, SemanticDB-based tools are responsible for implementing discovery of SemanticDB payloads on their own. For example, the non-trivial logic that Metap uses to traverse its inputs and detect SemanticDB files must be reproduced by Scalafix, Metadoc and others. We are planning to improve the situation in the future.
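For illustration, a minimal version of such discovery might look like the sketch below. It is plain Scala using java.nio, the names are hypothetical, and it deliberately handles only directories: unlike Metap, it does not look inside jar files or handle other kinds of classpath entries.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.collection.mutable.ListBuffer

object SemanticdbDiscovery {
  // Sketch: recursively collects *.semanticdb files under a directory, sorted by path.
  def semanticdbFiles(root: Path): List[Path] = {
    val found = ListBuffer.empty[Path]
    val stream = Files.walk(root)
    try {
      val it = stream.iterator()
      while (it.hasNext) {
        val p = it.next()
        if (Files.isRegularFile(p) && p.toString.endsWith(".semanticdb")) found += p
      }
    } finally stream.close() // Files.walk holds open directory handles
    found.toList.sortBy(_.toString)
  }

  def main(args: Array[String]): Unit = {
    val root = Paths.get(if (args.nonEmpty) args(0) else ".")
    semanticdbFiles(root).foreach(println)
  }
}
```

Each path found this way could then be handed to the Scala bindings for deserialization.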
Metap
Metap is a command-line tool that takes a list of paths and then prettyprints all .semanticdb files that it finds in these paths. Advanced options control prettyprinting format.
metap [options] <classpath>
Option | Value | Explanation | Default
<classpath> | Pseudo classpath | Supported classpath entries: |
--compact, --detailed, --proto | | Specifies prettyprinting format, which can be either --compact (prints the most important parts of the payload in a condensed fashion), --detailed (more detailed than --compact, but still pretty condensed), or --proto (prints the same output as protoc would print, see below). | --compact
For an example of using Metap, check out Example.
Protoc
The Protocol Compiler tool (protoc) can inspect protobuf payloads in --decode (takes a schema) and --decode_raw (doesn't need a schema) modes.
For reference, here's the SemanticDB protobuf schema.
$ tree
.
├── META-INF
│   └── semanticdb
│       └── Test.scala.semanticdb
└── Test.scala
$ protoc --proto_path <directory with the .proto file> \
    --decode scala.meta.internal.semanticdb.TextDocuments \
    semanticdb.proto < META-INF/semanticdb/Test.scala.semanticdb
documents {
  schema: SEMANTICDB4
  uri: "Test.scala"
  symbols {
    symbol: "_empty_/Test.main().(args)"
    kind: PARAMETER
    name: "args"
    access {
      publicAccess {
      }
    }
    language: SCALA
    signature {
      valueSignature {
        tpe {
          typeRef {
            symbol: "scala/Array#"
            type_arguments {
              typeRef {
                symbol: "scala/Predef.String#"
              }
            }
          }
        }
      }
    }
  }
  symbols {
    symbol: "_empty_/Test."
    kind: OBJECT
    properties: 8
...
protoc was useful for getting things done in the early days of SemanticDB, but nowadays it's a bit too low-level. It is recommended to use metap instead of protoc.
SemanticDB-based tools
Scalafix
Scalafix is a rewrite and linting tool for Scala developed at the Scala Center with the goal of helping automate migration between different Scala compiler and library versions.
Scalafix provides syntactic and semantic APIs that tool developers can use to write custom rewrite and linting rules. Syntactic information is obtained from the Scalameta parser, and semantic information is loaded from SemanticDB files produced by the Scalac compiler plugin and Metacp.
Thanks to SemanticDB, Scalafix is:
- Accessible: Scalafix enables novices to implement advanced rules without learning compiler internals.
- Portable: Scalafix is not tied to compiler internals, which means that it can seamlessly work with any compiler / compiler version that supports the SemanticDB compiler plugin.
- Scalable: Scalafix does not need a running Scala compiler, so it can perform rewrites and lints in parallel, unlike compiler plugin-based linters, which are limited by the single-threaded architecture of Scalac.
Metadoc
Metadoc is an experiment with SemanticDB to build an online code browser with IDE-like features. Check out the demo for more information.
Metadoc takes Scala sources and corresponding SemanticDB files generated by the Scalac compiler plugin. It then generates a static site that can be served via GitHub Pages, supporting jump to definition, find usages and search by symbol.
Thanks to SemanticDB, Metadoc is:
- Cross-platform: Scala bindings to SemanticDB are cross-compiled to JVM and Scala.js, which means that the site generator and the online code browser can reuse the same logic to work with SemanticDB payloads.
- Portable: Just like Scalafix, Metadoc is not tied to compiler internals, which means that it can seamlessly work with any compiler / compiler version that supports the SemanticDB compiler plugin.
Metals
Metals is an experiment to implement a language server for Scala using Scalameta projects such as Scalafmt, Scalafix and SemanticDB. Check out the presentation for more information.
Metals uses SemanticDB to:
- Index project dependencies for intelligent jump to definition and quick lookups of project symbols.
- Communicate with the Scala compiler regarding semantic information about the opened project.
- Feed semantic information into Scalafix-based refactorings.
Thanks to SemanticDB, Metals is:
- Mostly portable: Unlike Scalafix and Metadoc, Metals has modules that interface directly with compiler internals, but the majority of its functionality is based on SemanticDB, so it can work with any compiler / compiler version that supports the SemanticDB compiler plugin.
- Surprisingly fast: A well-defined schema for semantic information that can come from multiple locations (dependency classpath, uncompiled files, compiled files, etc.) allows for a robust implementation of indexing, which reliably speeds up operations like jump to definition and find usages. We have even experimented with a relational index for SemanticDB data, which further improves performance characteristics.
- Resilient: Reification of semantic information makes it possible to consult results of previous typechecks and accommodate certain edits by simply shifting offsets in old SemanticDB snapshots. This technique is surprisingly effective for supporting minor edits that result in temporarily invalid code.
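The offset-shifting idea from the last point can be sketched in a few lines. This is a simplified illustration under our own assumptions (whole lines inserted at a single point, zero-based half-open ranges as printed by Metap), not Metals' actual implementation; all names are hypothetical.

```scala
object ShiftRanges {
  // Zero-based, half-open range, mirroring Metap's [startLine:startChar..endLine:endChar).
  final case class Range(startLine: Int, startChar: Int, endLine: Int, endChar: Int)

  // Sketch: `count` whole lines were inserted in front of line `line` of the edited buffer.
  // Ranges that start at or after that line move down by `count`; a range spanning the
  // insertion point only has its end moved; ranges entirely before it are unchanged.
  def shiftForInsertedLines(ranges: Seq[Range], line: Int, count: Int): Seq[Range] =
    ranges.map { r =>
      if (r.startLine >= line) r.copy(startLine = r.startLine + count, endLine = r.endLine + count)
      else if (r.endLine >= line) r.copy(endLine = r.endLine + count)
      else r
    }

  def main(args: Array[String]): Unit = {
    // Occurrences of Test and println from the Example, after inserting two lines before line 1.
    val old = Seq(Range(0, 7, 0, 11), Range(2, 4, 2, 11))
    shiftForInsertedLines(old, line = 1, count = 2).foreach(println)
  }
}
```

The println occurrence moves from [2:4..2:11) to [4:4..4:11), so a lookup at the new position can still be answered from the old snapshot.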