Got Parameters? Just Use Docopt

Written by J. David Smith
Published on 7 September 2017

It's one of those days where I am totally unmotivated to accomplish anything (despite the fact that I technically already have – the first draft of my qual survey is done!). So, here's a brief aside that's been in the back of my mind for a few months now.

It is extremely common for the simulations in my line of work (or our, hi fellow student!) to have a large set of parameters. The way that this is handled varies from person to person, and at this point I feel as though I've seen everything: I've seen simple getopt usage, I've seen home-grown command-line parsers, I've seen compile-time #defines used to switch models! fml. [Fig 1: Me, reacting to #ifdef PARAM modelA #else modelB #endif] Worse, proper documentation on what the parameters mean and what valid inputs are is as inconsistent as the implementations themselves. Enough. There is a better way.

Docopt is a library, available in basically any language you care about, that parses a documentation string for your command line interface and automatically builds a parser from it. (This includes C, C++, Python, Rust, R, and even Shell! Language is not an excuse for skipping on this.) Take, for example, this CLI that I used for a re-implementation of my work on Socialbots (see here for context on what the parameters mean, aside from ζ, which has never actually been used):

Simulation for <conference>.

Usage:
    recon <graph> <inst> <k> (--etc | --hmnm | --zeta <zeta> | --etc-zeta <zeta>) [options]
    recon (-h | --help)

Options:
    -h --help                   Show this screen.
    --etc                       Expected triadic closure acceptance.
    --etc-zeta <zeta>           Expected triadic closure acceptance with ζ.
    --zeta <zeta>               HM + ζ acceptance.
    --hmnm                      Non-Monotone HM acceptance.
    --degree-incentive          Enable degree incentive in acceptance function.
    --wi                        Use the WI delta function.
    --fof-scale <scale>         Set B_fof(u) = <scale> B_f(u). [default: 0.5]
    --log <log>                 Log to write output to.

This isn't a simple set of parameters, but it is far from the most complex I've worked with. Just in this example, we have positional arguments (<graph> <inst> <k>) followed by mutually-exclusive settings (--etc | --hmnm | ...) followed by optional parameters ([options]). Here is how you'd parse this with the Rust version of Docopt:

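use docopt::Docopt;
use serde::{Deserialize, Serialize};

const USAGE: &str = ""; // the docstring above

#[derive(Serialize, Deserialize)]
struct Args {
    // parameter types, e.g.
    arg_graph: String,
    arg_k: usize,
    flag_wi: bool,
    // ...
}

fn main() {
    // Build the parser from the docstring, deserialize into Args,
    // and exit with a message on any error.
    let args: Args = Docopt::new(USAGE)
                            .and_then(|d| d.deserialize())
                            .unwrap_or_else(|e| e.exit());
}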

This brief incantation:

  1. Parses the documentation string, making sure it can be interpreted.
  2. Correctly handles using recon -h and recon --help to print the docstring.
  3. Automatically deserializes every given parameter.
  4. Exits with a descriptive (if sometimes esoteric, in this implementation) error message if a parameter is missing or of the wrong type.

The same thing, but in C++ is:

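#include <map>
#include <string>

#include "docopt.h"

static const char USAGE[] = R"()"; // the docstring above goes inside R"( ... )"

int main(int argc, char* argv[]) {
    std::map<std::string, docopt::value> args
        = docopt::docopt(USAGE,
                         {argv + 1, argv + argc},
                         true,            // show help if requested
                         "Version 0.1");  // version string
}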

Although in this version type validation must be done manually (e.g. if you expect a number but the user provides a string, you must check that the given value can be converted to a number), this is still dramatically simpler than any parsing code I've seen in the wild. Even better: your docstring is always up to date with the parameters that you actually take. (Of course, certain amounts of bitrot are always possible. For example, you could add a parameter but never implement handling for it. However, you can't accidentally add or rename a flag and then never add it to the docstring, which is far more common in my experience.) So – for your sanity and mine – please just use Docopt (or another CLI-parsing library) to read your parameters. These libraries are easy to statically link into your code (to avoid .dll/.so not found issues), and so your code remains easy to move from machine to machine in compiled form. Please. You won't regret it.

Rusting My (Academic) Code

Written by J David Smith
Published on 23 February 2017

A couple of weeks ago I wrote about Verification of Computer Science Research. In particular, I mentioned several methods that I'd been (trying) to use to help improve the verifiability of my research. This is the first in an (eventual) series on the subject. Today's topic? Rust.

One of the reasons it took me so long to write this post is the difficulties inherent in describing why I chose Rust over another, more traditional language like Java or C/C++. Each previous time, I motivated the selection by first covering the problems of those languages. I have since come to realize that that is not a productive approach – each iteration invariably devolved into opining about the pros and cons of various trade-offs made in language design. In this post, I am instead going to describe the factors that led to me choosing Rust, and what practical gains it has given me over the past two projects I've worked on.

In the Beginning...

First, some background on who I am, what I do, and what constraints I have on my work. I am a graduate student at the University of Florida studying Optimization/Security on Online Social Networks under Dr. My T. Thai. Most problems I work on are framed as (stochastic) graph optimization problems, which are almost universally NP-hard. We typically employ approximation algorithms to address this, but even then the resulting algorithm can still be rather slow to experiment with due to a combination of difficult problems, large datasets and the number of repetitions needed to establish actual performance.

This leads to two often-conflicting constraints: the implementations must be performant to allow us to both meet publication deadlines and compete with previous implementations, and the implementations must also be something we can validate as correct. (Well, ideally. Often, validating code is only done by the authors and code is not released.) Most code in my field is in either C or C++, with the occasional outlier in Java when performance is less of a concern, in order to satisfy that first constraint. However, after repeatedly having to work in others' rushed C/C++ codebases, I got fed up. "Enough of this!" I thought. "There must be something better!"

I set about re-implementing the state-of-the-art approximation algorithm (SSA) for the Influence Maximization problem. (Hung Nguyen, Thang Dinh, My Thai. “Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-Scale Networks.” In the Proceedings of SIGMOD 2016.) This method has an absurdly-optimized C++ implementation, using all sorts of tricks to eke out every possible bit of performance. Solving this problem – even approximately – on a network with 40+ million nodes is no small feat, and the implementation shows the effort they put into getting there. This implementation formed my baseline both for performance and correctness: every language I rewrote it in should produce (roughly) identical solutions, and every feasible language would have to get in the same ballpark in terms of memory usage and performance. (The algorithm uses non-deterministic sampling, so there is room for some error after all the big contributors have been found.)

In my spare time over the space of a couple of months, I implemented SSA in Chez Scheme, Clojure, Haskell, OCaml and, eventually, Rust. My bent towards functional programming shows clearly in this choice of languages. I'm a lisp weenie at heart, but unfortunately no Scheme implementation I tried nor Clojure could remotely compete with the C++ implementation. Haskell and OCaml shared the problem that graph processing in purely-functional languages is just an enormous pain in the ass, in addition to not hitting my performance goals – though they got much closer. (Some of this is due to my unfamiliarity with optimizing in these languages. However, I didn't want to get into that particular black magic just to use a language that was already painful for use with graphs.) Then I tried Rust. Holy. Shit.

The difference was immediately apparent. Right off the bat, I got performance in the ballpark of the baseline (once I remembered to compile in release mode). It was perhaps 50% slower than the C++ implementation. After a few rounds of optimization, I got that down to about 20-25%. (Unfortunately, I no longer have the timing info. As that isn't the focus of this post, I also haven't re-run these experiments.) Had I been willing to break more safety guarantees, I could have applied several of the optimizations from the baseline to make further improvement. Some of this performance is due to the ability to selectively use mutability to speed up critical loops, and some is due to the fact that certain graph operations are simply easier to express in an imperative fashion, leading me to write less wasteful code. What's more: it also has a similar type system – with its strong guarantees – to the system of ML-family languages that I adore, combined with a modern build/packaging tool and a thriving ecosystem. It seemed that I'd found a winner. Now, for the real test.

The First Paper: Batched Stochastic Optimization

I was responsible for writing the implementation for the next paper I worked on. (The code – and paper – aren't out yet. I'll hear back in early March as to whether it was accepted.) We were extending a previous work (Xiang Li, J David Smith, Thang Dinh, My T. Thai. “Privacy Issues in Light of Reconnaissance Attacks with Incomplete Information.” In the Proceedings of WI 2016.) to support batched updates. As the problem was again NP-hard with dependencies on the ordering of choices made, this was no small task. Our batching method ended up being exponential in complexity, but with a strict upper bound on the size of the search space so that it remained feasible. This was my first testbed for Rust as a language for my work.

Immediately, it paid dividends. I was able to take advantage of a number of wonderful libraries that allowed me to jump straight from starting work to implementing our method. Performance was excellent, parallelism was easy, and I was able to easily log in a goddamn parseable format. It was wonderful. Even the time I spent fighting the borrow checker (a phase I will note was very temporary; I rarely run into it anymore) failed to outrun the benefits gained by being able to build on others' work. I had a real testing framework, which caught numerous bugs in my batching code. Even parallelism, which generally isn't much of a problem for such small codebases, was no harder than applying OpenMP. Indeed, since I needed read-write locks it was in fact easier as OpenMP lacks that feature (you have to use pthread instead). It was glorious, and I loved it.

Still do, to be honest.

The Second Work: Generic Analysis

The next paper I worked on was theory driven. I showed a new method to estimate a lower bound on approximation quality for a previously unstudied class of problems. (Not exactly new, but general bounds for this class of problem had not really been considered. As a result, the greedy approximation algorithm hadn't seen much use on it.) The general idea is that we're maximizing an objective function f, subject to some constraints where we know that all maximal solutions have the same size. I needed to be able to transparently plug in different values of f, operating on different kinds of data, with different constraints. Further, for efficiency's sake I also needed a way to represent the set of elements that would need updating after each step.

Rust gave me the tools to do this. I defined a trait Objective to abstract over the different possible values of f. Each implementor could handle building their own internal representation, and with associated types I could easily allow each kind to operate on their own kind of data.

It worked. Really well, actually.

I wrote a completely generic greedy algorithm in terms of this trait, along with some pretty fancy analysis. Everything just...worked, and with static dispatch I paid very little at runtime for this level of indirection. At least, as soon as it compiled.
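To make the shape of this concrete, here is a minimal sketch of a trait with an associated element type and a greedy loop written only against that trait. It is not the code from the paper: the method names (elements, gain), the Coverage toy objective, and the fixed solution size k are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the `Objective` trait described above.
trait Objective {
    /// Each implementor chooses the kind of data it operates on.
    type Element: Copy + PartialEq;

    /// All candidate elements.
    fn elements(&self) -> Vec<Self::Element>;

    /// Marginal gain in f from adding `e` to `solution`.
    fn gain(&self, solution: &[Self::Element], e: Self::Element) -> f64;
}

/// Generic greedy: repeatedly add the element with the largest marginal gain.
/// Monomorphized per objective, so dispatch is static.
fn greedy<O: Objective>(obj: &O, k: usize) -> Vec<O::Element> {
    let candidates = obj.elements();
    let mut solution = Vec::with_capacity(k);
    for _ in 0..k {
        let mut best: Option<(O::Element, f64)> = None;
        for &e in &candidates {
            if solution.contains(&e) {
                continue;
            }
            let g = obj.gain(&solution, e);
            // Strict comparison: ties go to the earliest candidate.
            if best.map_or(true, |(_, best_gain)| g > best_gain) {
                best = Some((e, g));
            }
        }
        match best {
            Some((e, _)) => solution.push(e),
            None => break, // ran out of candidates
        }
    }
    solution
}

/// Toy objective (an assumption, not from the paper): pick sets by index
/// to cover as many distinct items as possible.
struct Coverage {
    sets: Vec<Vec<u32>>,
}

impl Objective for Coverage {
    type Element = usize;

    fn elements(&self) -> Vec<usize> {
        (0..self.sets.len()).collect()
    }

    fn gain(&self, solution: &[usize], e: usize) -> f64 {
        let covered: HashSet<u32> = solution
            .iter()
            .flat_map(|&i| self.sets[i].iter().copied())
            .collect();
        self.sets[e].iter().filter(|x| !covered.contains(*x)).count() as f64
    }
}

fn main() {
    let cov = Coverage {
        sets: vec![vec![1, 2, 3], vec![3, 4], vec![4, 5, 6, 7], vec![1]],
    };
    println!("{:?}", greedy(&cov, 2));
}
```

Swapping in a different f means writing another impl block; the greedy function itself never changes.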

Not All Fun & Games

As great as my time with Rust has been, there are still a few flaws I feel compelled to point out.

Borrow Checking is Sometimes Extremely Painful

While in most cases the borrow checker became something I dealt with subconsciously, in a few it was still an extraordinary pain in the ass. The first, and what seems to be the most common, is the case of having callbacks on structs. Some of this comes down to confusion over the correct syntax (e.g. Fn(...) -> X or fn(...) -> X), but I mentally recoil at the thought of trying to make a struct with callbacks on it. This was not a pleasant experience.

The second arose in writing the constructor for an InfMax objective using the SSA sampling method. There are multiple different diffusion models for this problem, each represented as a struct implementing Iterator. I wanted my InfMax struct to own the Graph object it operated on, and pass a reference of this to the sampling iterator. The borrow checker refused.

To this day, I still don't know how to get that to work. I ultimately caved and had InfMax store a reference with lifetime 'a, giving the sampler a reference with a lifetime 'b such that 'a: 'b (that is, 'a is at least as long as 'b). While this workaround took only a moderate amount of time to find, I still lost quite a bit trying to get it to work as I wanted. Giving InfMax a reference never caused any issues, but I still find it annoying.
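For reference, the reference-holding workaround looks roughly like the sketch below. Graph, DegreeSampler, and the sampler method are toy stand-ins invented for this example (the real samplers implement diffusion models); only the lifetime relationship 'a: 'b is the point.

```rust
/// Toy graph: adjacency lists only (an assumption for this sketch).
struct Graph {
    adj: Vec<Vec<usize>>,
}

/// Toy "sampler" that borrows the graph for 'b and yields each node's out-degree.
struct DegreeSampler<'b> {
    graph: &'b Graph,
    node: usize,
}

impl<'b> Iterator for DegreeSampler<'b> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // `?` ends iteration once we run past the last node.
        let degree = self.graph.adj.get(self.node)?.len();
        self.node += 1;
        Some(degree)
    }
}

/// InfMax stores a reference with lifetime 'a instead of owning the Graph.
struct InfMax<'a> {
    graph: &'a Graph,
}

impl<'a> InfMax<'a> {
    fn new(graph: &'a Graph) -> Self {
        InfMax { graph }
    }

    /// The sampler gets a reference with lifetime 'b, where 'a outlives 'b.
    fn sampler<'b>(&'b self) -> DegreeSampler<'b>
    where
        'a: 'b,
    {
        DegreeSampler { graph: self.graph, node: 0 }
    }
}

fn main() {
    let graph = Graph { adj: vec![vec![1, 2], vec![2], vec![]] };
    let inf_max = InfMax::new(&graph);
    let degrees: Vec<usize> = inf_max.sampler().collect();
    println!("{:?}", degrees);
}
```

The caller has to keep the Graph alive for as long as InfMax exists, which is exactly the ownership responsibility I had wanted InfMax to take on itself.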

Trait Coherence

In order to use the bit-set library, I wanted to force the Element associated type of each Objective to satisfy Into<usize>. The NodeIndex type from petgraph does not, but has a method `fn index(self) -> usize`. Due to the rules about when you can implement traits (namely, that you can't implement traits when both the trait and the type are non-local), this was impossible without forking or getting a PR into the petgraph repository.

Forking was my ultimate solution, and hopefully a temporary one. There isn't a clear way to work around this, and yet there also isn't a clear way to make this work without allowing crates to break each other. (Implementing a helper trait was frustrating because you can't implement both From<T: Into<usize>> for Helper and From<NodeIndex> for Helper, presumably because at a future point NodeIndex could implement From/Into<usize>?)
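For completeness, one commonly used way to stay inside the coherence rules is a local newtype: because the wrapper is defined in your own crate, implementing From for it is allowed. This is not the route taken above (that was forking), and it has a real cost, since every element has to be wrapped and unwrapped. In the sketch below the foreign module only imitates an external crate such as petgraph, and Idx and bit_position are hypothetical names.

```rust
// Stand-in for an external crate. Inside a single file the type is not truly
// foreign, so the orphan rule is illustrated here rather than enforced.
mod foreign {
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct NodeIndex(u32);

    impl NodeIndex {
        pub fn new(i: usize) -> Self {
            NodeIndex(i as u32)
        }

        pub fn index(self) -> usize {
            self.0 as usize
        }
    }
}

use foreign::NodeIndex;

/// Local newtype around the foreign index type (hypothetical name).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Idx(NodeIndex);

/// Allowed by the coherence rules because `Idx` is local to this crate.
/// The standard library then provides `Into<usize> for Idx` automatically.
impl From<Idx> for usize {
    fn from(i: Idx) -> usize {
        i.0.index()
    }
}

/// Stand-in for code that needs `Element: Into<usize>`, e.g. to index a bit set.
fn bit_position<E: Into<usize>>(e: E) -> usize {
    e.into()
}

fn main() {
    let node = Idx(NodeIndex::new(7));
    println!("{}", bit_position(node));
}
```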

Fortran FFI & Absent Documentation

I almost feel bad complaining about this, because it is so, so niche. But I'm going to. I had to interface with a bit (read: nearly 4000 lines) of Fortran code. In theory, since the Fortran ABI is sufficiently similar to C's, I ought to be able to just apply the same techniques as I use for C FFI. As it happens, this is mostly correct. Mostly.

Fortran FFI is poorly documented for C, so as one might imagine instructions for interfacing from Rust were nearly absent. It turns out that, in Fortran, everything is a pointer. Even elementary types like int. Of course, I also had the added complexity that matrices are stored in column-major order (as opposed to the row-major order used everywhere else). This led to a lengthy period of confusion as to whether I was simply encoding my matrices wrong, wasn't passing the pointers correctly, or was failing in some other way (like passing f32s instead of f64s).

It turns out that you simply need to flatten all the matrices, pass everything by pointer, and ensure that no call is made to functions with unspecified-dimensional matrices. (Calling functions with matrices of the form REAL(N, M) works provided N and M are either variables in scope or parameters of the function.)
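The flattening step is the part that is easy to get subtly wrong, so here is a small sketch of it. The function name to_column_major is made up for this example, and it assumes a rectangular matrix stored as a slice of rows; the actual FFI call is left out because it depends on the Fortran routine being wrapped.

```rust
/// Flatten a matrix stored as rows into one column-major buffer,
/// which is the layout Fortran expects.
/// Assumes every row has the same length (not checked here).
fn to_column_major(rows: &[Vec<f64>]) -> Vec<f64> {
    let n_rows = rows.len();
    let n_cols = rows.first().map_or(0, |r| r.len());
    let mut flat = Vec::with_capacity(n_rows * n_cols);
    // Walk column by column, so each column's entries end up contiguous.
    for c in 0..n_cols {
        for row in rows {
            flat.push(row[c]);
        }
    }
    flat
}

fn main() {
    let m = vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]];
    let flat = to_column_major(&m);
    // `flat.as_ptr()` is what would be handed to the Fortran side,
    // alongside pointers to the dimensions.
    println!("{:?}", flat);
}
```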

Numeric-Cast-Hell

The inability to multiply numbers of different types together without as is immensely frustrating. I know why this is – I've been bitten by a / b doing integer division before – but formulae littered with x as f64 / y as f64 are simply difficult to read. Once you've figured out the types everything needs to be, this can be largely remedied by ensuring everything coming into the function has correct type and pre-casting everything else. This helps dramatically, though in the end I often found myself just making everything f32 to save myself the trouble (as that was what most values already were). The inverted notation for things like pow and log (written x.pow(y) and x.log(y) in Rust) further hinders readability.
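As a small illustration of the pre-casting approach, the sketch below converts once at the top of the function so the formula itself stays free of casts. The function mean_degree and its parameters are hypothetical, chosen only to show the pattern.

```rust
/// Average number of edges per node.
/// The casts happen once, up front, instead of inside the formula.
fn mean_degree(edges: usize, nodes: usize) -> f64 {
    let (edges, nodes) = (edges as f64, nodes as f64);
    edges / nodes
}

fn main() {
    // Plain integer division truncates, which is the trap the casts avoid.
    println!("{}", 7 / 2);
    println!("{}", mean_degree(7, 2));
}
```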

Verifiability & The Big Wins for Academic Use

I began this post by referencing the verifiability of simulations (and other code). Through all this, I have focused more on the usability wins rather than the verifiability. Why is that? Fundamentally, it is because code is only as verifiable as it is written to be. The choice of language only impacts this in how well it constrains authors to “clearly correct” territory. In comparison to C/C++, Rust has clear advantages. There is no risk of accidental integer division, very low risk of parallelism bugs due to the excellent standard primitives, and nearly-free documentation (some commenting required). Accidentally indexing outside of bounds (a problem I've run into before) is immediately detected, and error handling is rather sane.

Further, the structure of Rust encourages modularization and allows formerly-internal modules to be pulled out into their own crates, which can be independently verified for correctness and re-used with ease – even without uploading them to crates.io. A concrete example of this is my Reverse Influence Sampling Iterators, which were originally (and still are, unfortunately) a local module from my most recent paper and have subsequently found use in the (re-)implementation of several recent influence maximization algorithms.

This modularity is, I think, the biggest win. While reviewing and verifying a several-thousand-line codebase associated with a paper is unlikely to ever be practical, a research community building a set of common libraries could not only reduce the need to continually re-invent the wheel, but also limit the scope of any individual implementation to only the novel portion. This may reduce codebase sizes to the point that review could become practical.

Case in Point: the original implementation of the TipTop algorithm was nearly 1500 lines of C++. My more recent Rust implementation is 253. (Admittedly, this is without the weighted sampling allowed in the C++ implementation. However, that isn't hard to implement. Please forgive my lack of comments, that was a tight paper deadline.) This gain is due to the fact that I didn't have to include a graph representation or parsing, sampling, logging (the logs are machine-readable!), or command-line parsing.

In Conclusion...

For a while, I was worried that my exploration of alternate languages would be fruitless, that I'd wasted my time and would be stuck using C++ for the remainder of grad school. Words cannot describe how happy I am that this is not the case. Despite its flaws, using Rust has been an immeasurable improvement over C++ both in terms of productivity and sanity. Further, while verifiability is very much secondary in my choice of language, I believe the tools and safety that Rust provides are clearly advantageous in this area.

Verifiability of CS Research

Written by J David Smith
Published on 21 December 2016

It is (thankfully) becoming increasingly common for researchers in Computing to publish their code along with the associated paper. This does wonders for the reproducibility of the research, but recently it has become clear that this is not enough. For a concrete example of this, consider Errol by Andrysco, Jhala, and Lerner. The researchers working on this project had reported a 2x speed-up over the previous state-of-the-art (Grisu3), a number which was reproduced by the POPL Artifact Evaluation Committee when they ran the build scripts and benchmarks included in the Errol artifact. An author of Grisu3 thought the results suspicious, tested them, and informed the authors of Errol that they'd found it to be 2x slower than their own work. As it turns out, this was correct: Grisu3 had been erroneously compiled without optimizations enabled due to the Errol authors' unfamiliarity with SCons. What should one take away from this story? Not to use better build systems? (SCons, while having its own problems, is IMO better than make.) Not to include build scripts? Worse: not to publish code & artifacts? In my view, it is simple: reproducible work is insufficient for computer science.

Consider the usage of experimental reproduction in the experimental sciences. (Of which I have never been a member, and so this is purely the viewpoint of an outsider looking in.) The objective of reproducing experiments is to verify the results of the experiment. However, consider that in e.g. experimental physics, the experiments would be reproduced independently using separate lab equipment. This introduces independence in the results, as no two people using two distinct sets of equipment will perform the experiment identically. In Computer Science, on the other hand, many of our experiments come down to running a bundle of code. Absent defects in the machines used to run the code, every pair of computers will produce more-or-less identical results. (Benchmark timing is its own beast. While in theory every two machines would produce the same relative performance, in practice that can vary based on distinguishing features of the machines, e.g. memory speed & size.) Therefore, the simple reproduction of results cannot give us the same effect as in the experimental sciences.

However, organizations like the POPL Artifact Evaluation Committee don't merely engage in reproduction. They additionally seek to verify that the results in the paper match the artifact, and that the artifact seems to legitimately work. In this case, it was a simple miss in the build script. More, however, can be done to aid this verification. On the most recent project I've worked on, I've attempted to do just that.

The most obvious way to aid verification is to have good documentation, especially for how to build and run your code. Surfacing this documentation is also important. The use of a README works well for build/run docs, but for future researchers working on extensions deeper insight into the operation of a codebase is necessary. The use of documentation tools like Doxygen for C++ or Rustdoc for Rust can surface the documentation in a readable way that is useful both for the original authors returning after work on other projects, and for future researchers.

While I'm still exploring ways to improve, I've found several tools that have helped with this in my most recent project. Over the next couple of weeks, I'm going to be writing about them in more detail. In particular, I want to examine how each improves the ability to verify my work, and how each falls short. Through this, I aim to get some sense of the direction I ought to head in for future improvement.

Adding New Capabilities to Kiibohd

Written by J David Smith
Published on 13 January 2016

One of the reasons I bought my ErgoDox was because I'd be able to hack on it. Initially, I stuck to changing the layout to Colemak and adding bindings for media keys. Doing this with the Kiibohd firmware is reasonably straightforward: clone the repository, change or add some .kll files, and then recompile and reflash using the provided shell scripts. (KLL itself is pretty straightforward, although the remapping rules at times are cumbersome. They follow the semantics of vim remap rather than the more-sane-for-mapping-the-entire-keyboard noremap.)

This week, I decided to finally add a new capability to my board: LCD status control. One thing that has irked me about the Infinity ErgoDox is that the LCD backlights remain on even when the computer is off. As my computer is in my bedroom, this means that I have two bright nightlights unless I unplug the keyboard before going to bed.

Fortunately, the Kiibohd firmware and KLL language support adding capabilities, which are C functions conforming to a couple of simple rules, and exposing those capabilities for keybinding. This is how the stock Infinity ErgoDox LCD and LED control is implemented, and how I planned to implement my extension. However, the process is rather poorly documented and presented some unexpected hurdles. Ultimately, I got it working and wanted to document the process for posterity here. (Once I get a better understanding of the process, I will contribute this information back to the GitHub wiki.) The rest of this post will cover in detail how to add a new capability LCDStatus(status) that controls the LCD status. LCDStatus(0/1/2) will turn off/turn on/toggle the LCD.

Background & Setting Up

Before attempting to add a capability, make sure you can compile the stock firmware and flash it to a keyboard successfully. The instructions on the Kiibohd repository are solid, so I won't reproduce them here.

An important note before beginning is that it is possible to connect to the keyboard via a serial port. On Linux, this is typically /dev/ttyACM0. Running the command screen /dev/ttyACM0 as root will allow one to connect, issue commands, and view debug messages during development.

This post is specifically concerned with implementing an LCDStatus capability. If you don't have an LCD to control the status of, then this obviously will be nonsense. However, much of the material (e.g. on states and state types) may still be of use.

The Skeleton of a Capability

Capabilities in Kiibohd are simply C functions that conform to an API: void functions with three parameters: state, stateType, and args. At the absolute minimum, a capability will look like this:

void my_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
}

The combinations of state and stateType describe the keyboard state (this information is buried in a comment in Macro/PartialMap/kll.h):

stateType        state          meaning
0x00 (Normal)    0x00           key depressed
0x00 (Normal)    0x01           key pressed
0x00 (Normal)    0x02           key held
0x00 (Normal)    0x03           key released
0x01 (LED)       0x00           off
0x01 (LED)       0x01           on
0x02 (Analog)    0x00           key depressed
0x02 (Analog)    0x01           key released
0x02 (Analog)    0x10 - 0xFF    Light Press - Max Press
0x03-0xFE                       Reserved
0xFF (Debug)     0xFF           Print capability signature

Every capability should implement support for the debug state. Without this, the capability will not show up in the capList debug command.

void my_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("my_capability(arg1, arg2)");
        return;
    }
}

Within this skeleton, you can do whatever you want! The full power of C is at your disposal, commander.

Turning Out the Lights

The LCD backlights have three channels corresponding to the usual red, green, and blue. These are unintuitively named FTM0_C0V, FTM0_C1V, and FTM0_C2V. (These names refer to the documentation for the LCD itself, so in that context they make sense.) To turn them off, we simply zero them out:

void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("LCD_status_capability()");
        return;
    }
    FTM0_C0V = 0;
    FTM0_C1V = 0;
    FTM0_C2V = 0;
}

With this addition, we have a capability that adds new functionality! I began by adding this function to Scan/STLcd/lcd_scan.c, because I didn't and still don't want to mess with adding new sources to CMake. Now we can expose this simple capability in KLL:

LCDStatus => LCD_status_capability();

It can be bound to a key just like the built-in capabilities:

U"Delete": LCDStatus();

If you were to compile and flash this firmware, then pressing Delete would now turn off the LCD instead of deleting. On the master half. We will get to communication later.

Adding Some Arguments

The next step in our quest is to add the status argument to the capability. This is pretty straightforward. First, we will update the KLL to reflect the argument we want:

LCDStatus => LCD_status_capability( status : 1 );

The status : 1 in the signature defines the name and size of the argument in bytes. The name isn't used for anything, but should be named something reasonable for all the usual reasons.

Then, our binding becomes:

U"Delete": LCDStatus( 0 );

Processing the arguments in C is, unfortunately, a bit annoying. The third parameter to our function (*args) is an array of uint8_t. Since we only have one argument, we can just dereference it to get the value. However, there are examples of more complicated arguments in lcd_scan.c illustrating how not-nice it can be.

void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("LCD_status_capability(status)");
        return; // debug query only: do not touch the LCD
    }

    uint8_t status = *args;
    if ( status == 0 ) {
        FTM0_C0V = 0;
        FTM0_C1V = 0;
        FTM0_C2V = 0;
    }
}

Figuring out how to restore the LCD to a reasonable state is less straightforward. What I chose to do for my implementation was to grab the last state stored by LCD_layerStackExact_capability and use that capability to restore it. In practice, it doesn't matter if you even can restore it: any key that changes the color of the backlight also changes its magnitude. The default ErgoDox setup has colors for each partial map, and I'd imagine most people would put a function like this off of the main typing map because of its infrequent utility. As a result, the mere act of pressing the modifier to activate this capability will turn the backlight back on. However, I implemented it anyway just in case. layerStackExact uses two variables to track its state:

uint16_t LCD_layerStackExact[4];
uint8_t LCD_layerStackExact_size = 0;

It also defines a struct which it uses to typecast the *args parameter.

typedef struct LCD_layerStackExact_args {
	uint8_t numArgs;
	uint16_t layers[4];
} LCD_layerStackExact_args;

We can turn the LCD back on by calling the capability with the stored state. Note that I copied the array, just to be safe. I'm not sure if it is necessary but I didn't want to have to try to debug corrupted memory.

void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("LCD_status_capability(status)");
        return; // debug query only: do not touch the LCD
    }

    uint8_t status = *args;
    if ( status == 0 ) {
        FTM0_C0V = 0;
        FTM0_C1V = 0;
        FTM0_C2V = 0;
    } else if ( status == 1 ) {
        LCD_layerStackExact_args stack_args;
        stack_args.numArgs = LCD_layerStackExact_size;
        memcpy(stack_args.layers, LCD_layerStackExact, sizeof(LCD_layerStackExact));
        LCD_layerStackExact_capability( state, stateType, (uint8_t*)&stack_args );
    }
}

Now binding a key to LCDStatus(1) would turn on the LCDs.

Creating Some State

Like most mostly-functional programmers, I abhor state. Don't like it. Don't want it. Don't want to deal with it. However, if we want to implement a toggle that's exactly what we'll need. We simply create a global variable (ewww, I know! But we can deal) LCD_status and set it to the appropriate values. Then toggling is as simple as making a recursive call with !LCD_status.

uint8_t LCD_status = 1; // default on
void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    if ( state == 0xFF && stateType == 0xFF ) {
        print("LCD_status_capability(status)");
        return; // debug query only: do not touch the LCD
    }

    uint8_t status = *args;
    if ( status == 0 ) {
        FTM0_C0V = 0;
        FTM0_C1V = 0;
        FTM0_C2V = 0;
        LCD_status = 0;
    } else if ( status == 1 ) {
        LCD_layerStackExact_args stack_args;
        stack_args.numArgs = LCD_layerStackExact_size;
        memcpy(stack_args.layers, LCD_layerStackExact, sizeof(LCD_layerStackExact));
        LCD_layerStackExact_capability( state, stateType, (uint8_t*)&stack_args );
        LCD_status = 1;
    } else if ( status == 2 ) {
        status = !LCD_status;
        LCD_status_capability( state, stateType, &status );
    }
}

Binding a key to LCDStatus(2) will now...do nothing (probably). Why? The problem is that the capability will continuously fire while the key is held down, and the microcontroller is plenty fast enough to fire arbitrarily many times during even a quick tap. So, we will guard the toggle with an additional condition (the other two options move the keyboard to a fixed state and thus don't need to be protected):

else if ( status == 2 && stateType == 0 && state == 0x03 ) {
    // ...
}

Release (0x03) is the only state that fires only once, so we check for that. Alas, even after fixing this, we still only have one LCD bent to our will! What about the other?
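To see why the release guard matters, here is a minimal, self-contained sketch that replays one key tap against a guarded and an unguarded toggle. It is not firmware code: every demo_ name is hypothetical, and the press and hold values (0x01 and 0x02) are assumptions for illustration; only release (0x03) comes from the text above.

```c
#include <stdint.h>

/* Only the release value (0x03) is taken from the text above; the press and
 * hold values are assumptions made for this sketch. */
#define DEMO_STATE_PRESS   0x01
#define DEMO_STATE_HOLD    0x02
#define DEMO_STATE_RELEASE 0x03

static uint8_t demo_lcd_status = 1; /* 1 = on, 0 = off */

/* Unguarded toggle: flips on every event it receives. */
void demo_toggle_unguarded(uint8_t state) {
    (void)state;
    demo_lcd_status = !demo_lcd_status;
}

/* Guarded toggle: flips only on the single release event. */
void demo_toggle_guarded(uint8_t state) {
    if (state == DEMO_STATE_RELEASE) {
        demo_lcd_status = !demo_lcd_status;
    }
}

/* Replay one tap (press, `holds` hold events, release) against a handler,
 * starting from "on", and return the resulting LCD status. */
uint8_t demo_tap(void (*handler)(uint8_t), unsigned holds) {
    demo_lcd_status = 1;
    handler(DEMO_STATE_PRESS);
    for (unsigned i = 0; i < holds; i++) {
        handler(DEMO_STATE_HOLD);
    }
    handler(DEMO_STATE_RELEASE);
    return demo_lcd_status;
}
```

With the guard, demo_tap(demo_toggle_guarded, n) ends up off for any n. Without it, the result depends on whether the number of events in the tap happens to be odd or even, which is why an unguarded toggle looks like it does nothing half the time.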

Inter-Keyboard Communication

The two halves of an Infinity ErgoDox are actually completely independent and may be used independently of one another. However, if the two halves are connected then they can communicate by sending messages back and forth. (If both halves are separately plugged into the computer, then they can't communicate.) I haven't delved into the network code, but I assume it is probably serial like the debug communication.

Very Important Note: You must flash both halves of the keyboard to have matching implementations of the capability when using communication.

The code to communicate changes relatively little from case to case but is rather long to reconstruct by hand. Therefore, I basically just copied it from LCD_layerStackExact_capability, changed the function it referred to, and called it a day. Wonderfully, that worked! Well, sort of. It turns out that not guarding against recursion caused weird issues where it would work with the right-hand being master, but not the left. It took a long time to debug because the error was unrelated to the fix (guarding against the recursive case).

#if defined(ConnectEnabled_define)
  // Only deal with the interconnect if it has been compiled in
  if ( status == 0 || status == 1 ) {
     // skip in the recursive case

    if ( Connect_master )
      {
        // generatedKeymap.h
        extern const Capability CapabilitiesList[];

        // Broadcast LCD_status remote capability (0xFF is the broadcast id)
        Connect_send_RemoteCapability(
            0xFF,
            LCD_status_capability_index,
            state,
            stateType,
            CapabilitiesList[ LCD_status_capability_index ].argCount,
            &status);
      }
  }
#endif

The magic constant LCD_status_capability_index ends up available through some build magic that I haven't delved into yet.
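A minimal sketch of one way such an index could work, assuming the build generates a table of capabilities and a matching set of index constants together. This is not the generated kiibohd code: every Demo/demo_ name below is hypothetical, and the struct only mirrors the two things the snippet above relies on (a handler and an argCount).

```c
#include <stdint.h>

/* Hypothetical stand-in for the firmware's Capability type: a handler plus
 * the number of argument bytes it expects. */
typedef struct DemoCapability {
    void (*func)(uint8_t state, uint8_t stateType, uint8_t *args);
    uint8_t argCount;
} DemoCapability;

uint8_t demo_last_status = 0xFF; /* records the last status the handler saw */

void demo_noop_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    (void)state; (void)stateType; (void)args;
}

void demo_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
    (void)state; (void)stateType;
    demo_last_status = *args;
}

/* Index constants and table are produced together, so they stay in sync. */
enum {
    demo_noop_capability_index,
    demo_status_capability_index,
    demo_capability_count
};

const DemoCapability DemoCapabilitiesList[] = {
    [demo_noop_capability_index]   = { demo_noop_capability,   0 },
    [demo_status_capability_index] = { demo_status_capability, 1 },
};

/* What a receiving half could do with an (index, args) message.
 * Returns 0 on success, -1 if the index is out of range. */
int demo_dispatch(uint8_t index, uint8_t state, uint8_t stateType, uint8_t *args) {
    if (index >= demo_capability_count) {
        return -1;
    }
    DemoCapabilitiesList[index].func(state, stateType, args);
    return 0;
}
```

The point of sending an index rather than a function pointer is that both halves can look it up in their own table, which is also why both halves need to be flashed with matching implementations.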

The Final Result

Putting all of that code together, we get:

uint8_t LCD_status = 1; // default on
void LCD_status_capability(uint8_t state, uint8_t stateType, uint8_t *args) {
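    if ( state == 0xFF && stateType == 0xFF ) {
        // Name query: print the capability's signature, then bail out
        // before touching the LCD
        print("LCD_status_capability(status)");
        return;
    }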

    uint8_t status = *args;
    if ( status == 0 ) {
        FTM0_C0V = 0;
        FTM0_C1V = 0;
        FTM0_C2V = 0;
        LCD_status = 0;
    } else if ( status == 1 ) {
        LCD_layerStackExact_args stack_args;
        stack_args.numArgs = LCD_layerStackExact_size;
        memcpy(stack_args.layers, LCD_layerStackExact, sizeof(LCD_layerStackExact));
        LCD_layerStackExact_capability( state, stateType, (uint8_t*)&stack_args );
        LCD_status = 1;
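    } else if ( status == 2 && stateType == 0 && state == 0x03 ) {
        // Toggle: recurse with the opposite state. Using a separate variable
        // leaves status == 2 here, so the interconnect block below is skipped
        // in this outer call and only the inner call broadcasts.
        uint8_t toggled = !LCD_status;
        LCD_status_capability( state, stateType, &toggled );
    }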

#if defined(ConnectEnabled_define)
    // Only deal with the interconnect if it has been compiled in
    if ( status == 0 || status == 1 ) {
       // skip in the recursive case

      if ( Connect_master )
        {
          // generatedKeymap.h
          extern const Capability CapabilitiesList[];

          // Broadcast LCD_status remote capability (0xFF is the broadcast id)
          Connect_send_RemoteCapability(
              0xFF,
              LCD_status_capability_index,
              state,
              stateType,
              CapabilitiesList[ LCD_status_capability_index ].argCount,
              &status);
        }
    }
#endif
}

I have a working implementation of this on my fork of kiibohd. I'm looking forward to adding more capabilities to my keyboard now that I've gotten over the initial learning curve. I've already got a couple in mind. Generic Mod-key lock, anyone?

Thoughts on XCOM: Enemy Within and XCOM: Long War

Written by J David Smith
Published on 5 January 2016

At this point, I consider XCOM: Enemy Unknown/Within to be one of my favorite games of the past few years, if not an all-time favorite. I'm not going to talk much about why XCOM is a good game; that has been covered more than adequately. I am rather disappointed that it took til 2015 for me to discover it, but I am immensely glad that I did. I have over 100 hours on Steam at this point, far surpassing any other recent single-player game. (Prior to XCOM, the most recent game I'd put this much time into was Dragon Age: Origins.) I beat the game on Classic Ironman recently, and after a few... several... many attempts at Impossible Ironman, I was in the mood for something new. Still XCOM, mind you, but new. (XCOM has a variety of Second Wave options that tweak the gameplay, such as reducing the accuracy of injured soldiers ("Red Fog") or gradually increasing accuracy as you approach a complete flanking ("Aiming Angles"). I am looking forward to unlocking item-loss on death ("Total Loss"; an omission that surprised me until I learned about the Second Wave options) when I get around to beating Impossible.) After looking a bit at the Second Wave options, I decided to take a look at that mod I kept hearing about: Long War.

Ho. Lee. Shit.

More Like Long Changelog

To say that the number of changes made by Long War is many does a great disservice to the word. The changes in Long War are legion. They are multitudes. This isn't merely a set of tweaks and additions. Rather, it is very nearly a total conversion.

Once I recovered from my shock at the number of changes, my first fleeting thought was one of concern. Was this just a kitchen sink mod? A realization of some long-time fan's laundry-list of changes to make XCOM more like its predecessor? An in-my-opinion misguided attempt to make a game about saving Earth from space aliens more realistic? Further inspection only increased these concerns. Why have Assault Rifles and Battle Rifles and Carbines? What did adding 4 more classes bring to the table? Why require 10 corpses for an autopsy instead of 1? (Not that it matters: I have so fucking many corpses and wrecks that they could ask for 50 and I could still do most of the autopsies.) These thoughts made me hold off on diving into it. I started (and failed) another Impossible Ironman campaign first, then I downloaded and installed Long War.

They strongly recommend that you start back on Normal, because as I mentioned above: the changes are significant. So I did. I learned about the new systems, and steamrolled mission after mission with the new classes. That isn't to say they're imbalanced, just that if you play enough Impossible then Normal becomes pretty straightforward, even if it is a bit harder than Enemy Within Normal. One thing that struck me very early on is the quality of several of the UI/UX changes made by the Long War team. Scanning for contacts now stops right before a mission expires, giving you the chance to wait for a new tech to finish or a soldier to get their ass out of bed. When a unit (alien or human) enters Overwatch, it is now shown next to their healthbar, which means that the player is no longer out of luck if they happen to look away during the alien turn. They also added a "Bronzeman" mode that strikes a good medium between savescum and Ironman modes. It behaves very much like the default in Fire Emblem: you are able to restart the mission, but not re-load in-mission saves. I would love to have these changes by themselves in the base game, and I do believe that some (like the Overwatch change) made it into XCOM 2. The changes made to the actual gameplay don't fundamentally alter the way it plays at a tactical level, although they do change the squad compositions that are effective. However, there are long-term ramifications to some of the changes that notably impacted my enjoyment of the game.

To Know Your Face

Far and away the most damaging change to the game is Fatigue. (Not the only bad change, though. I could probably write an entire post on why Steady Weapon is awful design.) In XCOM, when a soldier is injured in a mission they must take some time off after to heal up. As a result, it is prudent to have at least a B-list of soldiers that you can sub in for that Major Sniper that you barely saved from bleeding out. In Long War, when a soldier is not injured they still must take 3-5 days off (more for psionic soldiers actually wanting to use their psychic powers). This means that not only do you need a well-prepared B-list, but also a C-list. On the surface, this seems kind of cool. (I would phrase that more subtly, but I already gave my thesis away.) It means that you have to try more strategies, with more combinations of units as they rotate through various states of wounded, gravely wounded, and fatigued. However, it also means that you need a lot more soldiers. I never had more than 20 at a time in Enemy Unknown or Enemy Within. In Long War, you start with something like 40.

This unintentionally changes something that I very much liked about XCOM: the impractical, unsustainable, and outright damaging attachment I had to my soldiers. (Other changes, like increased squad size and muddy class identities, also play into this. However, fatigue is far and away the biggest cause, so I'm focusing on that.) I don't always know their names (I'm really very bad with names), but I know their faces. I remember The Volunteer from my first successful (Ironman) campaign. I remember the struggles, the near misses. She was the sole survivor of the tutorial mission, which I'd forgotten to disable, and one of the only women I recruited in the entire campaign. Yet she turned out to be psychic and, despite numerous barely-stopped bleed-out timers, she managed to survive the entire campaign. (When an XCOM soldier is reduced to 0 HP, they have a chance to instead bleed out. They become effectively dead as far as your tactics are concerned, but unless you get to them with a Medkit and stabilize them within 3 turns, they become actually dead.)

After ten hours with Long War, I didn't know any of my soldiers. I lost a Corporal (rank 3 in LW) and two Lance Corporals (rank 2) in a single mission (two in a single turn) due to lucky shots from Thin Men, and didn't feel the urge to restart it. It wasn't even that I had replacements for them all, as I didn't have any medics at all once my Medic Corporal died. I was utterly detached from them. I didn't know any of them, not like before. This ultimately seemed to neuter the tension of each and every mission, turning a game whose bog-standard abduction missions I could play for hours into one where ten hours over two sessions felt like a slog. The moment-to-moment tension of one missed shot dooming a soldier is lost when you no longer care particularly about any of your soldiers. I uninstalled the mod after the second session – and then immediately spent several hours longer than I'd intended playing a new Classic Second Wave Ironman campaign.

This, of course, does not make Long War bad. As I mentioned above, it nears the level of being a total conversion. In fact, I think it is most aptly called exactly that. The Polygon quote on the project page is quite telling:

"Turns XCOM: Enemy Within into nothing short of a serviceable turn-based military alien invasion strategy wargaming simulator." - Polygon

I did not enjoy Long War because I was looking for more of what I liked about XCOM: Enemy Within. I wanted more of the XCOM that was almost Fire Emblem with guns and aliens, not a "military alien invasion strategy wargaming simulator". All told, I think Long War is one of the best mods I've ever seen. However, this is not the mod I was looking for.

The Actual Point

But that's not what I wanted to write about. All of this is just the backdrop. You see, XCOM 2 is coming out soon, and internet comment sections – being the cesspits that they are – are full of a specific breed of comment that gets under my skin. Not the comments recommending that fans try Long War. No, those are fine, at least in principle if not in practice. Good, even, as the mod they are pushing is in fact rather good.

My problem is the legion of comments that follow one of a few varieties (paraphrased, because this scrublord didn't bother screenshotting or bookmarking the comments when they were first seen, and digging through internet comments for another couple of hours doesn't seem particularly appetizing; nobody needs screenshots of commenters being shitlords anyway):

All have the same underlying assumption: that people have the same tastes as the commenter. This self-projection is unfortunately endemic online, especially within the gaming community (where I've seen more "you're wrong because you like a thing that I don't" fights than anywhere else by a large margin).

Empathize, for a moment, with a mythical person that is considering picking up XCOM. Saving the Earth from aliens sounds cool, and they like strategy and tactical games, so it seems like a natural fit. Maybe this person would agree with these comments; they would find Long War to be generally superior to the base game and might skip the sequel in favor of the Long War team's game. (Although from the sounds of it, the new kid on the block is just getting started. I would honestly be surprised if most Long War fans didn't pick up both XCOM 2 and the Long War team's product.) But maybe – maybe – they wouldn't. This isn't merely hypothetical: if I had installed Long War immediately, I never would have gotten the hundred-plus hours of gameplay out of XCOM that I did. I barely lasted ten hours in Long War. The moment I knew it was over for me is when I began spending more time wondering when LW was going to get interesting than thinking about optimal strategies for filling aliens with holes. Again, I'm not saying that Long War is bad, merely that it isn't what I'm looking for.

The moral here is that internet commenters need to stop being shitlords and consider the fact that not all players – even only considering those that like a particular game – like games for the same reasons. So don't say "Just install Long War. You'll thank me later." Instead, consider "If you like XCOM, try the Long War mod. It's bloody fantastic." Don't imply that the intersection of people who like XCOM and people who like Long War is total. It isn't – I am proof of that – and it could drive people away from experiencing a pretty fantastic game.

Looking Back on 2015

Written by J David Smith
Published on 1 January 2016

It seems that every year of my life is more eventful than the last. 2015 was a big year in a lot of ways. I graduated from the University of Kentucky, moved away from home for good, and started grad school at the University of Florida. I worked 3 different jobs in 3 different states. I finally got my driver's license, and over the course of the year bought 2 cars and wrecked 1.

My Last Semester in Kentucky

I could've had an easy last semester at UK, but of course I opted not to. I continued learning German (something that I have let lapse, unfortunately), working in Dr. Jacobs' lab, and did multiple projects for classes. The project of note was learn2play, which was an attempt at learning to play Hearthstone by watching the screen. Despite all of the changes that have happened to that game this year, my project actually should still work. This pleases me. The skills I gained in this semester have been invaluable. I learned how to use scikit-learn, which has paid more dividends than any other library I've ever used save numpy.

I also made a point of overcoming my aversion to lists during this semester. I was given a notebook for my birthday, and began using it for, well, everything. In particular, whenever I had a set of tasks to do, I'd write down the list, with blank checkboxes next to each item. This immediately helped me stay on top of the many non-school tasks I had, and is probably the reason that I managed to make it to Boston without forgetting anything.

I've Never Been to Boston in the Fall

After the semester ended, I started a second but very different internship at IBM. Instead of being in the ExtremeBlue program, I was working with the AppScan Source team in Littleton, MA near Boston. AppScan Source is a security-oriented static analysis tool. Going in, I had some idea of what I'd be doing (machine learning to reduce the false positive alerts given to users by the tool), which turned out to be entirely wrong. I actually ended up working on a somewhat blue-skies project. (Blue-skies in that the form of the resulting visualization was unknown, as was the set of inputs to build it. We knew that it would be a visualization, and hoped that it'd be helpful for understanding the product's reports.)

This project is probably my all-time favorite. Although I had no idea going in, it turns out that I really like working on data visualization. (I got a couple of Edward Tufte's data visualization books for Christmas and have already gotten most of the way through one of them, The Visual Display of Quantitative Information.) I got to do a ton of experimentation on not only different ways of viewing the data, but also different ways of constructing it. I spent 4 days one week writing finite-domain Prolog code. It was glorious, although the result was impractical. The final version of the visualization ended up being beautiful, and I wish I could put an image here. Once it gets deployed in the product or I get some other indication that I'm legally allowed to, I'm going to see about printing a poster-sized visualization of something.

This internship was great not only because of the project, but also because of the team. Kris Duer was my mentor for the project, and was great to work with. I wasn't his only intern over the summer, and people constantly joked about him building an army. The team as a whole was great to work with, and when I pass through the Boston area again, I'm definitely going to try and see them.

The summer wasn't all work, though. I got to hang out with my good friend John Bellessa, who was one of my teammates from the previous summer. Doing the Freedom Trail with him and his fiancée Lorraine was one of the highlights of my summer.

During this time I also began learning Brazilian Jiu-Jitsu at Fenix in Lowell. I had no idea what I was getting into, but I'm glad that I did. BJJ is much different than the martial arts I'd done in the past because it focuses almost exclusively on ground work. (When rolling, which is BJJ-ese for sparring, we'd start on the ground and stay there 99% of the time.) The instructor at Fenix is great, and if you're in the area I'd highly recommend checking out his gym. I've continued practicing BJJ now that I'm in Gainesville, and plan to do so in the future.

Of course, I have to mention that I totaled my first car while I was in the Boston area. By rear-ending someone at a red light. Oops. I really liked that car, and am really disappointed that I only got to use it for about two months. (It was a 2003 Honda Civic Hybrid. I drove from Boston to just south of DC (~500mi) without stopping for gas. On the trip back I took the scenic route. I got about 10mpg better on the way down.) Having to buy another in short order was a stressful experience that I hope to never have to repeat. I have had a lot more trouble with the car I bought to replace it, which overheated multiple times on the trip from Boston to Florida. Being stuck on the side of the road many hours' drive from anyone you know is by far my least favorite part of the year.

Florida

Once I actually got to Florida, things looked up. I didn't have any more car trouble. I ended up renting a room in a house from a fellow graduate student, Elaine. She's an older student who is finishing up her PhD in Journalism/Anthropology this semester. Having someone to talk to that was familiar with the area was invaluable. It was also nice to get to talk to someone whose research area is so far removed from my own.

Classes at UF haven't been particularly remarkable. I took grad-level classes at UK, so I wasn't surprised at all by the level of difficulty. I did have to do two projects this semester, both of which turned out reasonably well. For one, I used Markov Chains to show that the performance of attacks on Mix networks changes when multiple adversaries act independently. For the other, I showed that by replacing words with synonyms, authorship attribution on Twitter can be defeated. In the latter project, I constructed a visualization that I quite liked. [Figure: Relative Confusion Matrices] This grid of confusion matrices shows the results of different combinations of machine learning features and evasion methods. This construction shows how a single user evading classification impacts the overall classification accuracy. The top row is present merely to provide a reference to a common feature set for authorship attribution, which does not perform well on tweets.

I found an advisor, Dr. My Thai, relatively quickly. She works on social network-related projects, which is what I'm really interested in. I started working in her lab in late October. I'm pretty happy with my decision thus far, but it is very early in my career. I like my lab-mates and Dr. Thai seems to be understanding and in general nice to work under.

Applying for the NSF Fellowship

One thing that UF has that UK didn't is a course dedicated to helping students write their NSF Fellowship applications. Run by Dr. Mazyck (who definitely has a strong personality), the course focused on helping us avoid common and not-so-common mistakes in the application process by starting early and constantly revising our essays. I spent so. much. time. on those essays over the first ¾ of the semester, it isn't even funny. I'm hopeful that it will pay off, but I won't find out for another few months. I based my application on detecting throwaway harassment accounts without compromising user privacy, which made it easy to cover broader impacts but more difficult to detail the intellectual merits of my proposed work. The biggest thing I got from this was from the personal essay. In writing it, I discovered that much of the work that I've done is actually relatively well tied together – and that it rather clearly points in the direction I'm heading now.

Progress on Goals for 2015

The goals I set for 2015 were simple: get over my list-phobia and become more consistent. I can't say that I really succeeded at the second, but I definitely succeeded at the first. I pretty routinely write out lists in my notebook to organize my thoughts. As for consistency and self-discipline? I still don't have a decent daily routine, so that one goes down as a partial failure. However, I do at least have a pretty reliable weekly routine, which will be getting upset next week by my class schedule change.

My overarching 'Otherness' goal is still just that: a goal. Thinking back on the year, I feel that there has been very little interpersonal conflict. The majority of difficulties I faced this year were from events such as wrecking my car, not people. However, I do still catch myself snapping at people. It is rare, but it happens. Mostly to my baby brother when he gets really talkative in the middle of me trying not to die to Zed mid-lane. Not an excuse, but context is important. I am making an effort to better hold my tongue.

Looking Forward to 2016

Somehow, I doubt this year will be crazier than last year. Only time will tell, but I wouldn't really mind either way. I do have a few goals for 2016.

My 'career' goals are to publish at least one paper first author, and to finish my quals (which are now an extensive lit review for your proposal at UF). Pretty self-explanatory. I simply want to make progress on my PhD. I would also like to look into teaching a class myself. I TA'd this past semester, which was an experience I enjoyed, and will be TAing this coming semester. I'd like to take the logical next step and teach a class myself. It is unlikely that I will get to do so this year, but I'd like to make progress on getting to do so.

My only other goal this year will seem a bit odd to people that know me, as I'm really a rather indoors-y person: I'd like to go backpacking. Not like backpacking on a mountainside for a day, but more like backpacking from one state to another. I've not done much looking into this yet, but I'm hoping to take a couple of weeks this summer and go somewhere (the Pacific Northwest? Europe?) to do this.

The past year has left me very hopeful for the future. I managed to survive the general insanity of moving cross-country twice in a year, and have acclimated reasonably well to first-year-grad-student life. The future is bright, and full of potential. Here's hoping that I won't play too much XCOM to take advantage of that.

Representative Computer Science Courses

Written by J David Smith
Published on 12 November 2015

One problem: the coursework is not at all representative of what one actually encounters in a job in the technology industry.

Representative?

The course I am TAing is the second programming course at UF. It's a weird class, and probably best described as a C++ course with a very wide variety of projects. For reference: the first course is in Java. Students are expected to know programming fundamentals a priori, and expect to learn how to program in C++ and some more advanced programming techniques. The projects they've done in this course so far have been computing magic squares, mimicking memory allocation using custom-built linked lists, and a lexer. Students are also working on a group project concurrently with the third (and upcoming fourth) project.

My first thought on seeing the project list was of the vi learning curve. The first and second projects are appropriately difficult, but the lexer is on another level entirely. They "just" have to implement one without any formal language definition or any knowledge of the theory behind how one would typically construct a lexer. (My knowledge on this is sketchy at best, given that I haven't actually taken a compilers course. I believe the typical tool is a finite automaton, generally generated automatically from regular expressions; pushdown automata, generated from a grammar such as an EBNF, are what parsers use.) I would consider this project if not difficult then overly time-consuming for me, which indicates that it is almost certainly not appropriately scaled for the second programming course.
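For a sense of what "implementing a lexer" involves even in the simplest case, here is a minimal hand-written sketch in C (the students' course is in C++, but the idea carries over). It is an illustration under assumptions, not the course project: the three token kinds and all Demo/demo_ names are made up, since the assignment's actual language definition isn't given here.

```c
#include <ctype.h>
#include <stddef.h>

/* Made-up token kinds for illustration only. */
typedef enum { TOK_NUMBER, TOK_IDENT, TOK_SYMBOL, TOK_END } DemoTokenKind;

typedef struct {
    DemoTokenKind kind;
    const char *start; /* first character of the token */
    size_t length;     /* number of characters in the token */
} DemoToken;

/* Scan one token starting at *cursor and advance *cursor past it.
 * Each branch below is effectively one state of a small finite automaton. */
DemoToken demo_next_token(const char **cursor) {
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) {
        p++; /* skip leading whitespace */
    }

    DemoToken tok = { TOK_END, p, 0 };
    if (*p == '\0') {
        /* end of input: leave a zero-length TOK_END */
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else {
        tok.kind = TOK_SYMBOL; /* any other single character */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}

/* Count the tokens in a string, not including the end marker. */
size_t demo_count_tokens(const char *text) {
    size_t n = 0;
    while (demo_next_token(&text).kind != TOK_END) {
        n++;
    }
    return n;
}
```

Even this toy version has to decide where tokens start and end, and it handles none of the things that make real lexers tedious (string literals, comments, multi-character operators, error reporting), all of which the students have to work out without a formal definition to lean on.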

That isn't what I came here to talk about, though. My problem with the course setup is that students are in their second year and have absolutely no idea how to do anything but read from and write to a terminal. And to top that off: they don't know anything but Java and C++, which are two of the worst languages in existence. (Excuse my hyperbole, but I would seriously turn down any job offer if they said I'd be stuck writing Java. C++ would be negotiable if I got to do cool HPC stuff with it, but I have a great dislike for the language.) The experiences they have in these classes are not only unrepresentative of work in industry, but are so far removed as to be mis-representing it!

Nobody, outside of very specialized areas, implements their own data structures. These are important, but we have a data structures class. There is no need for entry-level students to rebuild the wheel from messy blueprints when they could be doing something interesting. Very few people write command line programs for a living. They are common in OSS because libraries make them simple to write. But students don't get to use libraries outside of <iostream>, so they don't see the benefits. Not only that, but working in the terminal is easily the least intuitive part of our field. It is for power users, but command-line programs all but necessitate that style of work. (You could use an IDE, but consider the bugs these can introduce. Literally today, one student's program was hanging. We tried it from cmd.exe and on the lab machines, and it worked perfectly! The IDE simply didn't handle the program's exit properly.)

I had an epiphany today when the student said that. I have wanted to understand why students leave computer science, and here I found a reason I hadn't considered. It wasn't difficulty (the student is doing well in the course and is switching to a major that will probably be harder), it was dislike. They didn't like what they were shown! And I am left to wonder: would they like what programming is actually like?

We Can Do Better

This changes my perspective. There are a lot of topics that are front-loaded but don't need to be. There is something to be said for having weed out classes, but oughtn't those be based on difficulty, not liking something? The command line, Java, C++, esoteric projects, library restrictions: all of these get in the way. What do we gain from them? The first three give marketable skills, which are important, and may warrant some early coverage for those students wishing to have an internship between their sophomore and junior years. However, are these the best skills for the jobs students want? And the latter two: are they even contributing?

My alma mater's sequence was 1 semester of Python, followed by 2 of C++. At least that covers a not-shit language, and if memory serves the projects were actually interesting. There was one lecturer in particular that stands out, although I did not get to take their course. Dave Brown taught the second C++ course, and the students worked individually on one project over the course of the semester. They implemented a roguelike game in C++, complete with a test suite. There were milestones for each stage of the project, and every student I spoke with actually enjoyed it. (I was a tutor at UK and so got to talk to a lot of fellow students about what classes they liked/didn't like.)

Emallson & I

Written by J David Smith
Published on 13 September 2015

Recently I commented on Facebook (and Twitter, see below) that I was tempted to just have people call me Emallson. It occurred to me afterward that most people that have me on FB (1) have never seen the name Emallson in association with me, and/or (2) have no idea what the background on this name is, and why I use it. So that's what this post is for: to clarify that I am Emallson. (This isn't my only alt-name either. The other common (public) one is Atlanis, which this domain is named for!)

I got the name Emallson more than a decade ago in the least exciting of ways: a random name generator. I was just starting to play Anarchy Online, my 2nd MMO, and couldn't seem to come up with a name that wasn't already taken. So I hit random. 3 or 4 tries later, Emallson popped up, and the game let me finish character creation and enter the world. Not very exciting, but in some sense it was a transformative event for me. I have a lot of good memories from AO, almost all of them as Emallson. (AO has this interesting XP-debt system. When you die, your unsaved XP is added to a pool and you earn 1.5X XP until the pool is empty. At one point the pool was so full that the XP in it could have taken me from level 40 to 80 by itself. I never got to a very high level.) I played AO heavily for years, and very quickly became attached to that name. The reactions I get to Emallson (it's in my email address) in the real world are about what I'd expect. Weird glances, confused faces, requests to spell it out one more time. One thing that struck me as odd was how many people responded the same way in-game. Neither of these deterred my use of it. Each time I explained it to someone, I got a little more attached to it.

When I first installed Ubuntu, I had to pick a username. I'll let you guess what I chose. I've been using that name as my Linux username ever since. I picked up the email emallson@archlinux.us. My official university emails are emallson (at UK) and emallson (at UF). My login to the UK lab machines is emallson. (Still working on getting that changed at UF.) Almost everywhere except in direct contact with other human beings, my name is emallson. In my mind, it doesn't seem strange to be called emallson: people I know are the only ones that don't. Funny how that works.

There is another bit of history in this name. Y'see, when I became involved in as many different communities as I did, I had a lot of different names. Most of them are dead now, but emallson (and to a lesser extent, Atlanis) live on. (Capitalization of emallson is strictly optional.) The way I approached these community identities was very much like one might approach a tabletop RPG character: compartmentalization of what each identity knew and how each behaved. (First person to cry schizo gets slapped.) I can't say for sure why I did this; I know some part of it was that it entertained me. Emallson, being such an early arrival, didn't suffer from this. Emallson was always me. More than that: it was a blank slate. I knew of nothing in any way related to anything that remotely sounded like Emallson. Emallson was connected to literally nothing, save me and my AO character. Even my real name has more connection than that.

My real name is Johnathan David Smith. All three terms are crowded with others claiming the same names. You can't google that name without coming up with millions of hits that are entirely unrelated to me. You wanna know what you get when you google Emallson? Go look; I'll wait. This ultimately was one of the core reasons that Emallson stuck to me, I think. It was empty and I could fill it, versus a name that was already full that I needed to force my way into. "'Force my way into'?", one might ask. Yes. On my soccer team in high school, I was Smithy, because there were other Davids and Johns. In my internship this past Summer, I was Agent Smith, because I sat next to John Peyton and John Butler, and again there were other Davids. The name of the Tau Beta Pi chapter president my junior year was David Smith – and it wasn't me.

Quite a few people on campus have already conversed with me via email. Typically, they'll have talked with me via my University address, which reports my name as Johnathan D Smith. (Trying to get that changed, too.) The conversation typically goes like this:

"Hello; Johnathan, right?" "Yea, but call me David." "David?" "Yea, middle name." "Why?"

It's been this way for my entire life, and I'm kind of frustrated with it. That was the instigation for that comment. I'm tired of explaining myself, tired of being mistaken for others. One thing I'm not quite tired of is being able to easily hide. As names go, David Smith is easy to go unnoticed with. Emallson, on the other hand, is still a name, but it is mine and none other's. I can make a statement about Emallson that I can't about David: I am Emallson, and Emallson is me.

New Blog (Again)

Written by J David Smith
Published on 8 September 2015

Why?

My old blog engine worked well for the most part. The major wrench in the works was actually the dependency on Emacs as an exporter. It prevented me from doing a lot of fancier things with the content because templating by string concatenation is a pain in the ass. It also would routinely break with updates to org-mode. (To be frank, my most significant gripe with Emacs these days is how difficult it is to maintain a statically versioned config. Once installed, things don't update; but whenever I re-run the setup (which happens more often than I want to admit when I'm hopping back and forth between machines), things will inexplicably break because a MELPA package updates.)

I really liked the Clojure piece: that bit was very pleasant to work with. I didn't take the time to understand how some of the pieces *cough*optimus*cough* worked, but it still did exactly what I wanted.

Ditching org-mode

I ultimately ditched org-mode entirely, which was rather disappointing. The problem ultimately was that nothing supported it, and I got sick of rolling my own workarounds when Markdown covers 99% of my use case and the edge cases are covered by inline HTML. (Yes, I know ox-md exists and will let me export to Markdown, but there isn't much point in exporting from org when it is giving me relatively little.) The amount of fiddling required for the more advanced org features to render the way I want them to is too much for me.

I really like org-mode, so this was a tough sell for me. I even went through and created an org AST parser in Clojure (using the output of org-element as input) using Enlive to transform it to HTML, but it was finicky as hell and I knew that I would not want to update it when the output of org-element changes next.

Really, I could have plugged in markdown instead of org as the parser for my existing blog and gotten away with it. But no. That adventure is over for now; I have other adventures that are consuming what was formerly fiddling-with-blog-engine time.

Tufte CSS

I fell in love with Tufte CSS as soon as I saw it. I don't know if it is actually a great choice for my blog, but I'm gonna give it a shot! The highlights of it are that it has excellent font design, is incredibly simple, and has this lovely concept of margin notes. Margin notes are really simple in concept but I've never seen them on a blog before. I am rather fond of asides and frequently littered my posts with parentheticals containing them. I believe that margin notes are better suited for this.

No More Comments

I never really had issues with my Disqus comments, but I also never had much use for them either. Nobody commented. They provided no analytics and I doubt that I'd have used them anyway. If people want to comment on a blog post, they can email me, or tweet at @emallson.

Why Hexo?

I could describe some of the things I like about it, but honestly: it was the first batteries-included static blog engine for Node that I came across. It is doing everything I want for right now, so I'm unlikely to change it for the moment.

In Conclusion...

This is one part of my effort to update my site as a whole. Updating the style of my blog is an important piece. I have updated my main page as well, and am debating whether I should stick with Bootstrap or go with Tufte. I feel like I could accomplish a lot with that margin to give more info and character to specific events, but we will see. We will see.

How To Set Up an Encrypted, Compressed Filesystem in Arch Linux

Written by J David Smith
Published on 15 August 2015

The best example I have of this is a large dataset I'm downloading from a REST API as we speak. The current uncompressed size is 25G. The amount of space used on this partition has only increased by about 5G so far. (The size is reported by du -hs, which does not report compressed size on a btrfs-compressed partition.)

This document is intended to be a guide on how to set up a disk (especially an SSD, which will best take advantage of the features) to use both encryption and compression. Please read the entire thing at least once before attempting installation. In particular, in step 4+ there are gaps in the process where 'normal' installation continues (and for which I have not duplicated the normal instructions). While none of them are irreversible, it will be easier if you understand everything before diving in.

The Goal

One frustration I've always had with FS setup guides is that they often don't start with what they intend to give you. I will not make that mistake. The ultimate result of this guide should be a fresh Arch Linux installation with:

    **WARNING:** It is **very** important that you do not use a
    swapfile on btrfs! *It will not work!* You have been warned!

Note: Much of the LVM-on-LUKS material is now covered on the Arch Wiki, which I did not realize when beginning to write. The material used to be much more scattered. I pieced together much of the contents of this post from reading various blogs and the dm-crypt wiki page.

Step 0: Pre-Setup

BACK UP YOUR DATA!

Unless you are working with a brand-new drive, do a double check to confirm that you have all the data you need. Unlike normal formatting, where blocks are typically touched in an ordered fashion, encrypted data will be spread across the drive. Thus, the chance to retrieve data will very quickly vanish!

With that said, grab the latest Arch CD and burn it to a disc. Boot from it. (Remember to pull up this document on a phone or other computer, or to print it off!)

Step 1: Initial Partitioning

Using your favorite partition editor (I personally am a fan of parted), create 2 partitions:

  1. /boot (See this page for UEFI systems)
  2. A blank partition consuming the rest of the drive (or some portion of it. Your choice)

For simplicity, I will use sda1/2 to refer to these partitions. In the real world, it is best to use their UUIDs to reference them.

Step 2: Encryption

Setting up disk encryption is surprisingly easy with cryptsetup. (Again, I make no promises about the security of your data! The default cryptsetup settings are pretty solid, but not necessarily optimal!)

  1. # cryptsetup luksFormat /dev/sda2
     This command sets up encryption on `/dev/sda2`. It should prompt
     you for a passphrase. *Please* remember this! (You can replace it
     with a key on a flash drive or some other setup later. Setting
     LUKS up to use anything other than the default passphrase setup
     is outside the scope of this guide.)
  2. # cryptsetup open --type luks /dev/sda2 vg
     This command sets up a mapping from `/dev/mapper/vg` to the
     (decrypted) contents of the drive.

Step 3: LVM

I use LVM here because – last I knew – swap partitions can't be on BTRFS sub-volumes. Since LVM is already needed, there isn't much point in adding yet another layer of indirection with BTRFS sub-volumes on top of LVM volumes. To create a set of LVM volumes:

  1. # pvcreate /dev/mapper/vg
     This command creates an LVM *physical volume*. See the man page
     for more details on what that actually means.
  2. # vgcreate vg /dev/mapper/vg
     This command creates a *volume group* on the *physical volume* at
     `/dev/mapper/vg`.
  3. # lvcreate -L <N>G vg -n swap
     `lvcreate` creates a *logical volume* in a *volume group*. Again,
     see the man page for more details on the actual meaning of the
     terminology.
     Replace `<N>` with the amount of RAM you have, in GB. So if you
     had 4GB, it'd be `-L 4G`.
  4. # lvcreate -L 30G vg -n root
     This volume will be used for `/`. I like having a fairly large
     amount of space, especially as some dev kits (looking at you,
     Android) clock in at rather heinous sizes.
  5. # lvcreate -l +100%FREE vg -n home
     Finally, use the rest of the space for home.
  6. # mkfs.btrfs /dev/vg/root
     # mkfs.btrfs /dev/vg/home
     # mkswap /dev/vg/swap
     Create the filesystems on each of the volumes. Compression is
     set after creation.
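The swap-sizing rule above (`<N>` = amount of RAM) can be computed instead of remembered. The following is a minimal sketch, not part of the original procedure: the function names `swap_gib_from_kib` and `mem_total_kib` are made up for illustration, and it assumes a Linux system where `/proc/meminfo` reports `MemTotal` in kB. It rounds up, because the kernel reports slightly less than the installed RAM.

```shell
# Convert a memory size in KiB to whole GiB, rounding up.
# (Hypothetical helper; 1 GiB = 1048576 KiB.)
swap_gib_from_kib() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# Read MemTotal (in kB) from /proc/meminfo on the running system.
mem_total_kib() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Example: print (rather than run) the matching lvcreate command.
# echo "lvcreate -L $(swap_gib_from_kib "$(mem_total_kib)")G vg -n swap"
```

Printing the command first, rather than running it, leaves room to double-check the size before anything touches the volume group.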

Step 4: Compression

Continue with the normal installation with two exceptions:

When mounting either btrfs volume, use the -o compress=lzo option to mount. This will enable compression of newly-written data. (In fact, existing btrfs partitions can be compressed on the fly simply by setting compress=lzo or compress=zlib in /etc/fstab.)

When generating the /etc/fstab file, add the compress=lzo option to the 4th column. If you are using an SSD, adding noatime,discard,ssd is also recommended. (Note that enabling discard has security ramifications! Discard will remove any chance of claiming plausible deniability and will reveal some of the usage patterns of the disk. Discard will not reveal any data. In my case, I find it worthwhile to make this tradeoff in order to extend the life of the drive.) When labeling the drives in /etc/fstab, the command lsblk -o NAME,LABEL,UUID can be used to locate the LABELs or UUIDs of your volumes. *It is strongly recommended that you use those instead of the dev-path format!*
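To make the fstab advice concrete, here is a small sketch that prints one entry with the mount options in the 4th column. It is illustrative only: `btrfs_fstab_line` is a hypothetical helper, the UUID in the example is a placeholder, and the `rw` option and the trailing `0 0` (dump and fsck fields) are assumptions rather than something this guide specifies.

```shell
# Print one /etc/fstab entry for a btrfs volume.
#   $1 = filesystem UUID (as reported by lsblk -o NAME,LABEL,UUID)
#   $2 = mount point
#   $3 = "ssd" to append the SSD options (optional)
# The body runs in a subshell so `opts` does not leak into the caller.
btrfs_fstab_line() (
    opts="rw,compress=lzo"
    if [ "${3:-}" = "ssd" ]; then
        opts="$opts,noatime,discard,ssd"
    fi
    printf 'UUID=%s\t%s\tbtrfs\t%s\t0 0\n' "$1" "$2" "$opts"
)

# Example with a placeholder UUID:
# btrfs_fstab_line 0000-placeholder /home ssd
```

Leave off the third argument if you would rather not enable discard.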

Step 5: Bootloader

Continue with normal installation until you are setting up the boot loader. (If this is your first time setting up a boot loader on UEFI, it may seem as if the world has suddenly become a confusing and dangerous place. I recommend using systemd-boot (formerly known as gummiboot). Any feelings about systemd aside, it is really simple and easy to use. See the Arch Wiki for more info.)

Step 5.1: Configure mkinitcpio

Two hooks need to be added to mkinitcpio: encrypt and lvm2. Add them – in that order – to the HOOKS line of /etc/mkinitcpio.conf after the keyboard hook and before the filesystems hook. If you also want to set up hibernation, add the resume hook just before the filesystems hook. If you are using an alternate keymap (like colemak or dvorak), add the keymap hook immediately before the keyboard hook.

The placement of the hooks is important! They are run in the order they are listed. This ordering makes sure that the keyboard is enabled before decryption is attempted – otherwise no passphrase could be entered – and that decryption occurs before filesystems are mounted.

Run mkinitcpio -p linux to rebuild the initramfs.
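Since the ordering is the part that is easy to get wrong, a quick sanity check before rebuilding can help. This is a minimal sketch and not part of mkinitcpio: `hooks_in_order` is a hypothetical helper, and it assumes you pass it the contents of your HOOKS line as one space-separated string. Note that the stock config spells the last hook `filesystems`.

```shell
# Succeed only if keyboard, encrypt, lvm2 and filesystems all appear in
# the given HOOKS string, in that relative order. Other hooks may sit
# anywhere in between. Runs in a subshell so no variables leak out.
hooks_in_order() (
    expected="keyboard encrypt lvm2 filesystems"
    for hook in $1; do
        # The next required hook we are still waiting to see.
        next=${expected%% *}
        if [ "$hook" = "$next" ]; then
            if [ "$expected" = "$next" ]; then
                expected=""          # that was the last required hook
                break
            fi
            expected=${expected#* }  # drop it and wait for the following one
        fi
    done
    [ -z "$expected" ]
)

# Example:
# hooks_in_order "base udev autodetect modconf block keyboard encrypt lvm2 resume filesystems fsck" \
#     && echo "hook order looks fine"
```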

Step 5.2: Configure the Kernel Parameters

Any bootloader you use should provide a way to configure kernel parameters; see the relevant wiki page for details on how to do it for your specific bootloader. There are three parameters that are important:

  1. cryptdevice
     This parameter should come *first* in the options line. It sets
     up decryption and the device mapping. In my case I have this set
     as `cryptdevice=/dev/sda2:vg:allow-discards`. Make sure that you
     use the correct device (preferably via UUID, which is not a thing
     I am doing right now). If you do *not* have `discard` enabled,
     then leave off the `:allow-discards` part.
  2. root
     This parameter controls which partition is mounted as `/`. This
     should be set as `root=/dev/vg/root` unless you chose names other
     than `vg` or `root` for your root volume.
  3. resume
     This parameter is optional but recommended for enabling
     hibernation. It should point to your encrypted swap partition:
     `resume=/dev/vg/swap`

My entire (working!) kernel parameter line is:

cryptdevice=/dev/sda2:vg:allow-discards root=/dev/vg/root quiet rw resume=/dev/vg/swap
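If you picked different names or want to reference the device by UUID, the line can be assembled from its parts. The sketch below is illustrative only: `kernel_params` is a hypothetical helper, it assumes (as this guide does) that the mapping name and the volume group share one name and that the logical volumes are called `root` and `swap`, and it leaves out `quiet`.

```shell
# Build the kernel parameter line described above.
#   $1 = encrypted device, e.g. /dev/sda2 or UUID=<uuid of that partition>
#   $2 = name used for both the dm-crypt mapping and the volume group
#   $3 = "discard" to append :allow-discards (optional)
# Runs in a subshell so `crypt` does not leak into the caller.
kernel_params() (
    crypt="cryptdevice=$1:$2"
    if [ "${3:-}" = "discard" ]; then
        crypt="$crypt:allow-discards"
    fi
    # cryptdevice goes first, then root, then the optional resume target.
    echo "$crypt root=/dev/$2/root rw resume=/dev/$2/swap"
)

# Example:
# kernel_params /dev/sda2 vg discard
```

Passing `UUID=...` as the first argument gives the UUID-based form recommended earlier; drop the `resume=` part by hand if you are not setting up hibernation.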

Step 6: Finish & Enjoy!

Everything should be in order, so finish the installation process and reboot. If you have set things up correctly, then after booting you should be greeted with a prompt for your passphrase.

That's it! Your / and /home partitions are both transparently compressed and encrypted (in that order), and your swap partition is encrypted! (Additionally: if you followed the instructions to enable hibernation, then `systemctl hibernate` should work and rebooting should prompt for your passphrase before resuming.)

On my laptop, compressing /home has gotten me 15-30% more storage (depending on what I have on home at any given time – large text files like JSON data compress better than small text files or binary data like videos). If I were using zlib instead of lzo or used the compress-force mount option, it'd be even more. A 15% storage gain may not seem like much, but that's an extra 30GB of space on my 200GB /home partition. Given that SSDs are typically smaller than their magnetic-platter siblings, every additional byte helps.