Figwheel, Reloaded
I’m working on an Om Next app with a Pedestal backend. Figwheel is a great way to do ClojureScript development, and the Reloaded workflow is really nice for working on Clojure servers. However, reloading Clojure code in the JVM still has corner cases which are difficult to solve. When you run into one, it’s not always obvious what’s causing the problem. Stuff just starts behaving weirdly. And I ran into just such a bug the other day.
Background
Here’s a description of what I was working on when I found the issue and how it manifested itself. If you’re just interested in the bug and the solution, feel free to skip ahead.
Requests and responses between the Om Next client and Pedestal server are Transit-encoded EDN. When the client sends an update request (called a mutate request in Om parlance) to create a new object, it includes a temporary id for the object. The client also uses this temporary id to reference the item in the app itself until it knows its permanent id. The server response includes a map of any temporary ids and the permanent ids associated with the objects. For example, a request to create a new todo object with the :item/text value I’m a new item! might look like this:
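Pieced together from the breakdown that follows, the request is EDN along these lines (the layout is an approximation; the values are the ones discussed in the text):

```clojure
;; A vector holding a single mutate request, which is a list of the
;; mutation symbol and its parameter map.
[(todo/new-item {:item/id   #om/id["6b542daa-6d03-418e-a008-34505dad905a"]
                 :item/text "I’m a new item!"})]
```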
Here’s a breakdown of what that means:
- [...] You can send multiple requests at a time to the remote, and they’re grouped in a vector. This request has a single mutate request, which is a list (...).
- todo/new-item is the type of mutation. It tells the server how to interpret the rest of the request.
- {:item/id #om/id[...], :item/text "..."} is the data associated with the todo/new-item request. This is a map with two keys, :item/id and :item/text.
- #om/id["6b542daa-6d03-418e-a008-34505dad905a"] is the temporary id that the client associated with the new object. The special #om/id[...] syntax is called a tagged literal. This is read and interpreted as the string literal representation of an om.tempid.TempId instance [1] with a value of the UUID in the enclosed string. In Clojure, [2] the print-method multimethod has been extended for TempId to write the literal with that syntax. Similarly, this same method is extended for java.util.Date to print a tagged literal like #inst "1985-04-12T23:20:50.52Z".
- "I’m a new item!" is the text value associated with :item/text.
The server parses this mutate request and perhaps writes it to a PostgreSQL database. For example, we might have a table like
And in response to the request, the server issues a SQL statement like
The database server returns 852154481843896390 for the item_id, and our Om remote server will return a response to the client:
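Going by the description that follows, the response is EDN shaped roughly like this (a reconstruction from the surrounding text, assuming the key is written as the keyword :tempids):

```clojure
;; A map keyed by the mutation symbol; :tempids maps the client's
;; temporary id to the permanent id returned by the database.
{todo/new-item
 {:tempids {#om/id["6b542daa-6d03-418e-a008-34505dad905a"] 852154481843896390}}}
```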
The response is a map ({...}), not a vector ([...]) like the request. This map has a single key, todo/new-item, which corresponds to the type of the mutation. The value associated with todo/new-item is also a map with a single key, :tempids, whose value is in turn a map associating the temporary id the client assigned in the request with the permanent id the database generated.
Both the request and response data that I’ve shown above are EDN. The actual payloads are Transit encoded. Transit plays a similar role to JSON in providing a way to transfer data between applications. Indeed, Transit can be transferred as JSON (and also MessagePack). It includes more types than JSON, such as 64-bit integers, bytes, points in time, URIs, sets, lists, and maps with composite keys. Transit also includes a way to extend the meaning of the encoded data. This is what allows both the front end and the back end to understand that #om/id, an Om-specific extension to Transit, is to be read as an om.tempid.TempId value.
When Transit-encoded in JSON, the above response should be
It’s just a nested JavaScript array with a bunch of string values. The Transit reader in the client knows how to convert this back into an EDN value.
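To see what that encoding looks like for a simple value, here is a minimal sketch using the plain transit-clj library (cognitect.transit), assuming it is on the classpath. The namespace and the ->transit-json helper are made up for illustration, and this uses the stock Transit writer, not Om's, so it knows nothing about #om/id.

```clojure
(ns example.transit-demo
  (:require [cognitect.transit :as transit])
  (:import (java.io ByteArrayOutputStream)))

(defn ->transit-json
  "Encode x as a Transit JSON string (hypothetical helper)."
  [x]
  (let [out (ByteArrayOutputStream. 4096)]
    (transit/write (transit/writer out :json) x)
    (.toString out "UTF-8")))

;; A one-entry map comes out as a JSON array that starts with the
;; map marker "^ ", followed by the key and the value.
(->transit-json {:item/text "hi"})
;; => "[\"^ \",\"~:item/text\",\"hi\"]"
```

Keywords are written as strings with a "~:" prefix, which is why the encoded form is mostly string values.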
The Bug
I’ve been using the Untangled client library which provides some useful conventions for writing Om apps. Most of the remote server examples use Ring. The Untangled server uses HTTP Kit, which is largely compatible with Ring.
I’m using Pedestal, and there aren’t a lot of examples out there for using Pedestal with Om. The Om library includes an Om Transit writer which knows how to write om.tempid.TempId values. I just needed to figure out how to wire the Om Transit writer into the interceptor chain. I came across a gist by Andre R that provided an example. I plugged it into my Pedestal app and it worked, most of the time.
Oh, my. I initially noticed something was wrong because the client app wasn’t always updating as it should from the server responses. I logged the requests and responses to the JavaScript console and saw that sometimes the responses from the server didn’t include the #om/id tag, and that these correlated with the instances when the app wasn’t updating. Here’s an example of what that looked like:
The "~#om/id" value is missing. In its place are the two elements "^ " and "~:id".
But I also saw times when the responses did include the #om/id tag. And it seemed like it was breaking when I was updating code that had nothing to do with the API responses. That said, it seemed to only happen after I updated code and ran reset. In my notes I wrote:
This has something to do with the Reloaded workflow. On (go), works fine. On (reset), it’s messed up, no longer handling #om/ids
At this point I was at a loss. I knew that there were corner cases where code reloading would cause odd bugs, which is one of the motivations behind Stuart Sierra’s Component library and his suggested guidelines for avoiding these situations. Where had I run afoul of these guidelines? The app is too big and I’m still too new to the libraries to know for sure which areas of the code I can discount, especially as the error seemed to pop up regardless of which sections of the code I was updating.
Building a test case
Time to find a minimal test case. I started with a fresh Pedestal service app. I added a single interceptor to do the Om Transit encoding. As there was no indication in the main project that there was any problem with the server interpreting or processing client requests, I just hard-coded a response value so I could easily test at the command line with curl. Prior to being encoded, the responses looked just fine in the logs. And why test through the browser if I didn’t have to?
And guess what? No luck. Everything worked fine. I couldn’t get the server to fail. So I branched my project app and implemented a similar hard-coded handler, and confirmed I could still see the error. I started ripping out code, trying to work down to the point where the error went away. Rip, (reset), curl, repeat. I ran lein clean, thinking maybe there was some stale code that was causing the issue. And at one point it started working with my command line requests! I couldn’t get it to fail!
I started up Figwheel to confirm it worked from the browser. Yes! No error! I updated the server code to remove some logging I was using for debugging. Reload via (reset), test. And now it’s broken? What is going on? What had I done? Certainly removing logging lines shouldn’t break the Transit writer! But I had also reloaded the code. Could that be it?
I thought back to Stuart Halloway’s Debugging with the Scientific Method talk at Clojure/conj in 2015 and remembered he said something about writing down everything you were doing. So at this point I wrote down the steps that replicated the bug.
- terminal 1: lein clean
- emacs: cider-restart (restarts repl)
- repl: (go)
- terminal 2: curl localhost:8081/om (still works)
- repl: (reset)
- terminal 1: rlwrap lein run -m clojure.main script/figwheel.clj
- repl: (reset)
- terminal 2: curl localhost:8081/om (still works)
- editor: whitespace change in server.clj
- repl: (reset)
- terminal 2: curl localhost:8081/om (still works)
- editor: whitespace change in system.clj
- repl: (reset)
- terminal 2: curl localhost:8081/om (still works)
- emacs: cider-restart
- repl: (go)
- terminal 2: curl localhost:8081/om (still works)
- repl: (reset)
- terminal 2: curl localhost:8081/om (borked! yay!)
Wow. Nineteen laborious, time-consuming steps in three different windows. I did it twice to confirm that these steps replicated the bug, and I was relieved that it did. Finally I had a way to at least reproduce it.
But which steps were necessary? I eventually worked out a set of 8 steps that reliably demonstrated the bug.
- Make sure Figwheel and server repls aren’t running
- terminal 1: lein clean
- terminal 1: rlwrap lein run -m clojure.main script/figwheel.clj
- emacs: cider-jack-in (starts repl)
- repl: (go)
- terminal 2: curl localhost:8081/om
- repl: (reset)
- terminal 2: curl localhost:8081/om
And this explains why I couldn’t get my minimal test case project to fail. I didn’t have a client app, so I wasn’t running Figwheel. When I added a bare-bones Om client and ran Figwheel, the test case project failed just as consistently with the same steps.
What to do? I knew that the issue was with the Reloaded workflow, and I didn’t need to use that. I could use the lein run-dev script included in the Pedestal service template, which also picks up changes when I load buffers from CIDER into the repl. So this wasn’t a blocker to continued development. It was just blocking development using my preferred workflow. But I did want to use Reloaded.
I polished the code in the test case project to the clearest, smallest test case I could think of. I knew at this point I was going to have to ask for help and wanted to make it as easy as possible for someone to examine and understand what was going on. I pushed it to GitHub, including in the README as much information as I could to explain the issue and how to replicate the bug.
Now it was time to reach out to the community, which would be the mailing lists or Clojurians Slack. But which mailing lists? Which channel? Pedestal? Om? Figwheel? Component? Clojure? ClojureScript? I certainly didn’t want to spam them all.
I decided to try the #clojure Slack channel. I typed up a concise description of the issue and pasted it into the message window. Robert Stuttaford (@robert-stuttaford) responded within a minute and shared that he had encountered a similar issue, and not surprisingly, it entails conflicts between the Figwheel and the Reloaded workflow compilation methods.
First a bit of background on Clojure file types. Clojure files which target only the JVM use the .clj extension. ClojureScript files, targeting JavaScript, use the .cljs extension. Clojure also has a portable Clojure file type with the extension .cljc. Portable Clojure files can target multiple platforms. So when targeting the JVM, you can use .clj and .cljc files. When targeting JavaScript, you can use .cljs and .cljc files.
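What makes a .cljc file portable is the reader conditional, which picks a branch per platform at read time. As a small sketch that can be tried at a JVM repl (the portable-source string and its platform function are invented for illustration):

```clojure
;; Source text as it might appear in a hypothetical .cljc file.
(def portable-source
  "(defn platform [] #?(:clj :jvm :cljs :javascript))")

;; Reading with conditionals allowed selects the :clj branch on the JVM.
(read-string {:read-cond :allow} portable-source)
;; => (defn platform [] :jvm)
```

The ClojureScript compiler reads the same file and takes the :cljs branch instead, so a single .cljc file is valid input for both the JVM and JavaScript toolchains.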
When working on the server, starting the repl compiles the code required by the server. When reloading code using (reset), clojure.tools.namespace.repl/refresh reloads all Clojure files suitable for the JVM that are on the classpath, not just those required by the server.
When Figwheel compiles the front end code, it copies required .cljs and .cljc files into the resource directory. The resource directory is included on the classpath, so clojure.tools.namespace.repl/refresh, used during the Reloaded workflow, picks up any .cljc files Figwheel has copied there, and these can conflict with the server code already loaded into the JVM.
I decided to look into ways of updating the classpath used by repl/refresh. Knowing that the code was written by Stuart Sierra, I would be surprised if there wasn’t a way. Looking at the source code for repl/refresh, I saw in the documentation:
The directories to be scanned are controlled by ‘set-refresh-dirs’; defaults to all directories on the Java classpath.
Yes! After some messing around at the repl, I came up with the following function which returns all of the directories repl/refresh would look at by default, sans the resource directory:
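A minimal sketch of such a function might look like the following. It is an approximation and not necessarily the original code: it assumes the resource directory is named resources, and the exclusions set and excluded? helper are names chosen here for illustration. clojure.java.classpath is a dependency of tools.namespace, so it should already be available.

```clojure
(ns user
  (:require [clojure.java.classpath :as cp])
  (:import (java.io File)))

;; Assumption: Figwheel's output lives under a classpath directory named
;; "resources". Adjust this set to match the project layout.
(def exclusions #{"resources"})

(defn excluded?
  "True when the final path segment of dir is in the exclusions set."
  [^File dir]
  (contains? exclusions (.getName dir)))

(defn refresh-dirs
  "Every directory on the classpath except the excluded ones."
  []
  (remove excluded? (cp/classpath-directories)))
```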
Hard-coding the resource-path like that feels a bit hacky. What if there are multiple resource paths that happen to include public? Should I factor out the exclusions set so that can be defined elsewhere? However, I can tackle those issues if and when I encounter them. Right now this is a pragmatic solution, and it’s not completely ugly. There’s a single place I need to update if I need to change which paths are included. And the dev/server/user.clj file is generally a per-project file anyway.
I updated reset to call repl/set-refresh-dirs with the directories returned by refresh-dirs.
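As a rough sketch of that wiring, assuming tools.namespace is on the classpath: the go and stop functions below are placeholders for the project's real lifecycle functions, and refresh-dirs is a simplified stand-in for the function described above that assumes the directory to skip is named resources.

```clojure
(ns user
  (:require [clojure.java.classpath :as cp]
            [clojure.tools.namespace.repl :as repl]))

;; Placeholders standing in for the project's real start/stop functions.
(defn go [] :started)
(defn stop [] :stopped)

(defn refresh-dirs
  "Classpath directories minus any directory named resources (assumed name)."
  []
  (remove #(= "resources" (.getName ^java.io.File %))
          (cp/classpath-directories)))

(defn reset []
  (stop)
  ;; set-refresh-dirs is variadic, hence apply.
  (apply repl/set-refresh-dirs (refresh-dirs))
  (repl/refresh :after 'user/go))
```

Calling set-refresh-dirs on every reset is redundant once the directories are set, but it keeps the configuration in one place.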
I don’t know whether refresh-dirs needs to be a function or whether I need to call repl/set-refresh-dirs on every reset call, but it’s not a performance bottleneck and it works.
Note that this doesn’t work with clojure.tools.namespace 0.2.11, the stable release as of this writing, as repl/refresh-dirs is marked private. I’m using 0.3.0-alpha3 where it’s now public.
I have a general solution, but I was still curious which namespace was causing the issue. Here is the list of .cljc files Figwheel was copying into resources:
> find ./resources/public/js -name "*.cljc"
./resources/public/js/cljs/stacktrace.cljc
./resources/public/js/om/next/impl/parser.cljc
./resources/public/js/om/next/protocols.cljc
./resources/public/js/om/tempid.cljc
./resources/public/js/om/transit.cljc
./resources/public/js/om/util.cljc
The culprit is om/transit.cljc. I ran Figwheel, deleted om/transit.cljc, and then ran (reset) and confirmed that the code worked as expected. This makes sense, given the affected code uses the om.transit namespace. As expected, deleting a different file, such as om/tempid.cljc, did not fix the bug. However, deleting files from resources isn’t a very convenient or robust solution.
[1] In Clojure (on the JVM), it’s a record. In ClojureScript, it’s a type. Records and types are very similar in Clojure the language. I’m not sure why the implementations are different on the JVM and JavaScript.
[2] In ClojureScript, rather than use a multimethod, the TempId type implements the IPrintWithWriter protocol.