Leveraging your data files with Sublime Text
I bet your application creates some kind of log file that you inspect manually. Or it uses a data file with a well-known structure.
Have you ever thought “Oh dear, this wall of text is unreadable! If only I had some highlighting”? Beg no more, because we’re making your dearest dreams come true!
The scenario
Let’s say this is a screenshot of your typical log file when you open it in Sublime Text 3:
I’d say it is hard to get an idea of what’s really going on unless you are so into the format that you can see blonde, brunette, redhead. What if we could improve it so we highlight some interesting parts?
Okay, maybe you don’t like the colors or maybe you would’ve highlighted other stuff. We’ll be learning how to achieve this result so you can roll your own!
Sublime Text 3
First of all, you need to get yourself a copy of Sublime Text 3. If you haven’t heard of this awesome text editor, head over to their home page and learn some of its interesting features.
Syntax files
Sublime Text lets you create your own syntax definitions, and their highlighting, through its own data files. Each syntax definition mainly consists of 2+1 files:
- .sublime-syntax file: defines the structure of the syntax you are targeting.
- .tmTheme file: defines the styling for each match you declared in the previous file.
- .sublime-settings file: lets you define settings that apply whenever the syntax is active.
We’ll walk through all of them as we progress through the following section.
Case study: unreadable log files
Okay, so this is the sample log file we have:
As we can see, there’s a common pattern going on:
Let’s start with a simple syntax definition and work from there.
Bootstrap
First of all, we’re going to create the 2+1 files we mentioned before.
Navigate to Sublime Text’s Packages folder (%appdata%\Sublime Text 3\Packages on Windows, ~/.config/sublime-text-3/Packages on Linux) and create a folder called AwesomeCodingScarsLog. Let’s now add the barebones files.
AwesomeCodingScarsLog.sublime-syntax
Create it and paste this code:
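A barebones definition could look like the sketch below. The text.acsl scope name, the log file extension and the placeholder entry in main are assumptions made for illustration, not necessarily what this post originally used.

```yaml
%YAML 1.2
---
# Minimal sketch of a .sublime-syntax file; the names below are assumptions.
name: AwesomeCodingScarsLog
file_extensions:
  - log             # assumed extension of the sample log file
scope: text.acsl    # assumed base scope for the whole syntax

contexts:
  main:
    # Placeholder entry: matches any non-empty text without tagging it yet.
    - match: '.+'
```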
Here we’ll define our syntax rules.
AwesomeCodingScarsLog.tmTheme
Create it and paste this code:
This is the base file for the Monokai color scheme. Here we’ll add our styling.
AwesomeCodingScarsLog.sublime-settings
Create it and paste this code:
This defines settings that override the current ones when this syntax is selected. It tells Sublime Text to use our .tmTheme automatically when we select the syntax, so the styling is kept separate in that file.
Set up sample log file
Open the sample log file we mentioned before in Sublime Text and select the syntax we’re going to define. You can either press Ctrl+Shift+P and type Syntax AwesomeLog in the field that appears, or go to the bottom right corner and select the syntax manually from the list.
When you can read AwesomeCodingScarsLog in the bottom right corner of Sublime Text while the focus is on our sample log file, you are ready to continue.
First step: understand the format
Open AwesomeCodingScarsLog.sublime-syntax. Let’s check what we pasted previously:
The first lines declare that it’s a YAML file. They are mandatory for the syntax to be parsed.
The scope property defines a name that’s assigned to a match when applying styling. In this case, it’s the base styling for the syntax. A scope can be nested, specifying scopes from least to most specific and applying them in a cascading fashion. If you want to know more, check Sublime Text’s official docs on this feature.
After the global scope definition we find the contexts definition. Each context, in turn, defines a list of match entries whose regular expressions are tested against the lines in your file. When a match is found, we can modify the context stack or apply styling.
So, let’s add the first one!
Everything is unexpected
This step is a temporary one that we’ll use to ensure we’re on the right track as we go.
Modify the only entry in the main context so it is:
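Going by the .+ pattern and the acsl.unexpected scope discussed later in this post, the entry would look roughly like this sketch:

```yaml
# Inside contexts > main: tag every non-empty line as unexpected for now.
- match: '.+'
  scope: acsl.unexpected
```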
This means we’re tagging everything with the acsl.unexpected scope. However, there’s no style defined for it yet. Let’s fix that!
Open AwesomeCodingScarsLog.tmTheme and add this definition in the place where we had a comment:
With this, now we’ve got this lovely file:
This is our starting point: a handy way of spotting anything our rules don’t match yet.
Log levels
Now it’s time for us to match something: the log levels. We know the format is [log_level] and that they are either D (debug), I (info), W (warning), E (error) or F (fatal).
In the .sublime-syntax file we’re going to define matches for these, so inside the main context but before the acsl.unexpected match, insert the following code:
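A sketch of what that entry could look like is shown below. The acsl.debug and acsl.line scopes come from this post, while acsl.log is a hypothetical name for the whole-match scope.

```yaml
# Goes in the main context, before the acsl.unexpected entry.
- match: '^(\[D\])(.+)$'
  scope: acsl.log      # hypothetical name for the whole-match scope
  captures:
    1: acsl.debug      # the [D] tag
    2: acsl.line       # the rest of the line
```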
We’re capturing the [D] log level and the rest of the line. The scope property is applied to the whole match and the captures list defines specific scopes for each capture group in the regular expression. This way, the [D] tag will have the acsl.debug scope and the rest of the line will have the acsl.line one.
This will yield this highlighting:
Repeat this match for the rest of the levels (using acsl. + info, warning, …) and we’ll have the following file:
Nice! Now everything in the file is expected, but there’s no styling yet!
Styling log levels
Let’s start with the acsl.debug scope. In the .tmTheme file, where we left the comment, paste this code:
Do it again for each of the other log levels, with the following colors:
- acsl.info: #00B764
- acsl.warning: #EDD436
- acsl.error: #A50101
- acsl.fatal: #FF0000
You’ll now have this style:
Great job, it’s starting to take shape! What if we extend the style in the log levels to the rest of the tags before the real log line?
Styling tag and timestamp
Back in the .sublime-syntax file, find the debug match and update it like so:
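One possible version, built on the pattern quoted later in this post, is sketched below. Giving the tag and the timestamp the same acsl.debug scope as the level is an assumption, and acsl.log is a hypothetical name for the whole-match scope.

```yaml
- match: '^(\[D\])(\[.+\])(\[.+\])(.+)$'
  scope: acsl.log      # hypothetical name for the whole-match scope
  captures:
    1: acsl.debug      # the [D] level
    2: acsl.debug      # the tag, styled like the level (assumed)
    3: acsl.debug      # the timestamp, styled like the level (assumed)
    4: acsl.line       # the real log message
```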
Update all other matches to account for the new match and captures additions and you’ll have this highlighting:
Nice! Isn’t it easier to see which kind of messages you’re having in the file?
Styling important data
Before we call it a day, we’d like to highlight everything that’s between single quotes because, for us, those values are important and deserve attention. Let’s add this match after all of our log level match definitions:
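A sketch of that entry, using the '([^']+)' pattern quoted below. The acsl.important scope name is hypothetical, and so is the choice of scoping only the capture group rather than the whole match.

```yaml
# Double quotes keep the single quotes in the pattern literal in YAML.
- match: "'([^']+)'"
  captures:
    1: acsl.important    # hypothetical scope for the text between the quotes
```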
And this style:
We save and… Nothing changes. Why is that?
When Sublime Text processes a line, it tests all the matches in the current context. Several of them may match, at different positions. In our scenario, the ^(\[D\])(\[.+\])(\[.+\])(.+)$ pattern matches at the start of the line, while the '([^']+)' pattern matches somewhere in the middle. Sublime Text then uses the match with the leftmost start or, in case of a draw, the one that was defined first. Since the debug pattern consumes the whole line, nothing is left for the single quotes pattern.
So, first of all, let’s modify the debug match to be like this:
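Following the description below (no $ anchor, no captures list), the shortened entry might look like this sketch. Dropping the capture groups is an assumption; they could equally be kept in the pattern.

```yaml
# Only the level, tag and timestamp are matched; the message is left alone.
- match: '^\[D\]\[.+\]\[.+\]'
  scope: acsl.debug
```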
This way we only match the tags up to the timestamp. Notice how we’ve dropped the $ symbol and how we’ve ditched the captures list altogether: everything in the match will have the same style. Since the debug pattern now stops right after the timestamp, the rest of the line is left for the '([^']+)' pattern to match. You can modify the other matches so they have these changes too.
So, we save again and we see this:
Oh, no! Didn’t we fix it?
It’s the same problem, but this time between the .+ pattern and the '([^']+)' one. The former matches everywhere! In fact, if it weren’t the last entry (i.e. if it came before the fatal definition) it would be selected instead of the log level ones!
Enter several contexts
Okay, so we know we’ve matched the start of each line, and those patterns will be preferred over the unexpected one because of definition order. What if we could say “Okay, this is a log line, it has these tags, and after the timestamp there’s the real log data, which we’ll style separately”? That’s what we’ll achieve by manipulating the context stack.
Modify the debug match (and the other levels’) to be like this:
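As a sketch, assuming the pattern without capture groups from the previous step, the entry gains a push key:

```yaml
- match: '^\[D\]\[.+\]\[.+\]'
  scope: acsl.debug
  push: log_line    # the rest of the line is handled by the log_line context
```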
It says: when you match this pattern, apply the acsl.debug scope to the match and then push the log_line context onto the stack. And where’s the log_line context, you say?
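A minimal sketch of how that context could be declared, based on the end-of-line pop explained in this section:

```yaml
# Sibling of main inside the contexts mapping.
log_line:
  # At the end of the line, leave this context and return to the previous one.
  - match: '$'
    pop: true
```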
It’s defined as an entry in the contexts mapping. While it’s on top of the stack, only this context will be processed until we modify the stack. So, we need to stop using it at some point or we won’t use the main one again!
That’s what the match: '$' entry does. When we get to the end of the line (because each of our log entries takes a single line), we pop the context so we go back to the previous one (the main context, in this case).
Now, move the single quotes match into the log_line context and remove it from the main one. You will have this:
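Putting the pieces together, the whole file might end up looking like the sketch below. The text.acsl base scope and the acsl.important scope are hypothetical names, and the patterns assume the level, tag and timestamp layout described in this post.

```yaml
%YAML 1.2
---
name: AwesomeCodingScarsLog
scope: text.acsl            # assumed base scope

contexts:
  main:
    # One entry per log level: style the leading tags, then switch context.
    - match: '^\[D\]\[.+\]\[.+\]'
      scope: acsl.debug
      push: log_line
    - match: '^\[I\]\[.+\]\[.+\]'
      scope: acsl.info
      push: log_line
    - match: '^\[W\]\[.+\]\[.+\]'
      scope: acsl.warning
      push: log_line
    - match: '^\[E\]\[.+\]\[.+\]'
      scope: acsl.error
      push: log_line
    - match: '^\[F\]\[.+\]\[.+\]'
      scope: acsl.fatal
      push: log_line
    # Anything that did not start with a known level is flagged.
    - match: '.+'
      scope: acsl.unexpected

  log_line:
    # Highlight whatever sits between single quotes.
    - match: "'([^']+)'"
      captures:
        1: acsl.important   # hypothetical scope name
    # End of line: go back to the main context.
    - match: '$'
      pop: true
```

Keeping the catch-all entry last in main matters: it starts at the same position as the level entries, so it only wins when none of them match.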
Now, we’d see this:
Yay! Congratulations, now you know Kung-Fu! :)
Bonus: Sahkab dialog files
Back in 2012, some friends and I started a prototype for a videogame called Sahkab. It was a top-down adventure set in a sci-fi universe.
Because we were eager to learn, we built our custom scripting language (aimed at the programmers) and our custom dialog file format (aimed at the writer).
This is a sample screenshot of one of the dialog files, properly highlighted:
I wish we’d had it when we were working on the prototype, as I can tell you it was a bit less intuitive to write them in plain white text :)
I hope this post motivates you to build your own syntax definitions to help yourself and your team!
You can find the code we’ve been writing here.
Thanks for reading!