Inertial Balance

What to do when it's late at night and your high schooler says his table wasn't able to complete their physics lab today because they were missing equipment, and the teacher said, maybe half jokingly, that they could complete the lab at home if they didn't finish it in class? That's right, you build experimental equipment in your garage.

This is the Interial Balance we built from scratch using two hacksaw blades. It took about about 20 minutes to build and then another 10 to actually run the experiment.

I hope we don't have to "junkyard wars" all of his labs, but this was fun and quick to build.


GOP Climate Change Denial Timeline

Building on The Republican race: Five degrees of climate denial, extended to the full seven stages:

Stage 1: Denial
Pre 2010 - The climate is not changing.
Stage 2: Ignorance
2010 - The climate might be changing, or it might not, we just don't know.
Stage 3: GAIA Bashing
2014 - Climate change is real, but it’s natural.
Stage 4: We so tiny
2016 - Climate change is real, but humans aren't the primary cause.
Stage 5: We so poor
2018 - OK, humans are the primary cause, but we can't afford to do anything about it.
Stage 6: Acceptance
2020 - This is awful, why didn't you tell us it would be this bad!?!?
Stage 7: Revert to Form
2024 - We would have fixed the climate if it wasn't for Obama.


An update on httplib2

It's March 2016 and that means I've been building and maintaining httplib2 for over 10 years. As with most of my best software, httplib2 was initially a rage-based project, fed by my disgust on the state of HTTP client libraries in 2006.

In the past 10 years it's gathered quite a following, with 0.9.2 current being downloaded from PyPI:

   37297 downloads in the last day
   210596 downloads in the last week
   830573 downloads in the last month

I'm done.

I've been programming exclusively in Go for the past four years, and except for the occasional security bug to fix in httplib2 I've avoided using Python like the plague.

So I could go on being a bad package maintainer, or I can hand over the project to people that are more invested in keeping it working. If you are interested in being one of those people please ping me, via email or twitter and I will add you to That’s a new github organization, but have done nothing beyond that, I'll leave it up to the new maintainers to decide about forking, importing bugs, etc. from the original project.


Piccolo Now Sports LaTeX Support

In the first of two enhancements to piccolo, LaTeX support has been added. Just put the LaTeX in a latex-pic element and it will be converted to a PNG, with the alt text set to the LaTeX code.

<latex-pic>E = mc^2</latex-pic>

Transforms into:

E = mc^2

This is mostly driven by me trying to learn Geometric Algebra and wanting to talk about things like:

e_{23} = e_2 \wedge e_3

Next up is adding inline Julia.


Halloween Dragon

We do large displays for Halloween, usually involving converting the garage into a haunted house. They're major undertakings, with four or more actors, so we only do them every other year. I also try to roll a "build" into every Halloween. For example, 6 years ago I built this as part of the haunted house:

And two years ago the same skeleton got taken out of the jail and laid out on a table, this time animated with pneumatics. Neat tip on this one, those sprinkler valves for watering systes work just as well for controlling air.

This year Lynne requested that we do something less intensive. Now I agreed that the haunted houses were a lot of work, but I still wanted to do something big, or at least give the appearance of something big. That's when I hit on the idea of doing a dragon in the garage, but only opening the garage door a bit, so you could only see the nose and the tail of the dragon. Let the kid's imaginations fill in the rest of the beast.

The head was paper-mache, with a hot-glued cardboard frame, and two nostrils that could be hooked up to a fog machine. Here's photos of the build. And here's a shot of the fog test:

Next came the build for the articulated tail, with a wooden frame and bicycle brake cabling for the controls. Here are some photos from the build. And here's the tail in action, first w/o the scales, and then with the scales.

And finally putting it all together, with a "Beware of Dragon" sign, and a scattering of bones across the driveway, and the Dragon Sound Effects Panel I posted yesterday:

Now it doesn't look too scary during the day, but on Halloween night, only lit by a strobe light, with the candy bowl between the nose and the tail, and the proper "story" it was quite effective. Here's Lynne's description:

The kids were told the dragon was sleeping (cue - snoring noise) but could smell the kids when they got close (cue - sniffing noise) and "probably" couldn't get out of the garage, was "probably" full from all the kids he ate earlier, and "probably" wouldn't eat them if they were careful. They had to walk to candy bowl between head and tail. As they did the dragon awoke, started to move, and then huge banging and movement of garage door as he tried to get to them. They would run and scream and then bravely try again! It was precious.

I think we had over 100 people see the dragon, some coming back for multiple attempts. It was a blast to run and just as fun to build, as the three youngest kids all participated in the build. I can't wait for 2017, I'm thinking a giant pneumatic octopus...


Dragon Sound Effects Web Audio API

A sound effects board with the sounds of a dragon built using the Web Audio API. Because Halloween.

Chains: [Source]

Roar: [Source]

Sniff: [Source]

Snore: [Source]





Six Places

One of the questions that comes up regularly when talking about zero frameworks is how can you expect to stitch together an application without a framework? The short answer is "the same way you stitch together native elements," but I think it's interesting and instructional to look at those ways of stitching elements together individually.

There are six surfaces, or points of contact, between elements, that you can use when stitching elements together, whether they are native or custom elements.

Before we go further a couple notes on terminology and scope. For scope, realize that we are only talking about DOM, we aren't talking about composing JS modules or strategies for composing CSS. For the terminology clarification, when talking about DOM I'm referring to the DOM Interface for an element, not the element markup. Note that there is a subtle difference between the markup element and the DOM Interface to such an element.

For example, <img data-foo="5" src=""/> may be the markup for an image. The corresponding DOM Interface has an attribute of src with a value of "", but the corresponding DOM Interface doesn't have a "data-foo" attribute, instead all data-* attributes are available via the dataset attribute on the DOM Interface. In the terminology of the WhatWG Living Standard, this is the distinction between content attributes vs IDL attributes, and I'll only be referring to IDL attributes.

With the preliminaries out of the way let's get into the six surfaces that can be used to stitch together an application.

Attributes and Methods

The first two surfaces, and probably the most obvious, are attributes and methods. If you are interacting with an element it's usually either reading and writing attribute values:


or calling element methods:


Technically these are the same thing, as they are both just properties with different types. Native elements have their set of defined attributes and methods, and depending on which element a custom element is derived from it will also have that base element's attributes and methods along with the custom ones it defines.


The next two surface are events. Events are actually two surfaces because an element can listen for events,

ele.addEventListener(‘some-event’, function(e) { /* */ });

and an element can dispatch its own events:

var e = new CustomEvent(‘some-event’, {details: details});

DOM Position

The final two surfaces are position in the DOM tree, and again I'm counting this as two surfaces because each element has a parent and can be a parent to another element. Yeah, an element has siblings too, but that would bring the total count of surfaces to seven and ruin my nice round even six.

  <img src="">

Combinations are powerful

Let's look at a relatively simple but powerful example, the 'sort-stuff' element. This is a custom element that allows the user to sort elements. All children of 'sort-stuff' with an attribute of 'data-key' are used for sorting the children of the element pointed to by the sort-stuff's 'target' attribute. See below for an example usage:

<sort-stuff target="#sortable">
   <button data-key=one>Sort on One</button>
   <button data-key=two>Sort on Two</button>
 <ul id=sortable>
   <li data-one=c data-two=x>Item 3</li>
   <li data-one=a data-two=z>Item 1</li>
   <li data-one=d data-two=w>Item 4</li>
   <li data-one=b data-two=y>Item 2</li>
   <li data-one=e data-two=v>Item 5</li>

If the user presses the "Sort on One" button then the children of #sortable are sorted in alphabetical order of their data-one attributes. If the user presses the "Sort on Two" button then the children of #sortable are sorted in alphabetical order of their data-two attributes.

Here is the definition of the 'sort-stuff' element:

    function Q(query) {
          function(e) { return e; });

    var SortStuffProto = Object.create(HTMLElement.prototype);

    SortStuffProto.createdCallback = function() {
      Q('[data-key]').forEach(function(ele) {
        ele.addEventListener('click', this.clickHandler.bind(this));

    SortStuffProto.clickHandler = function(e) {
      var target = Q(this.getAttribute('target'))[0];
      var elements = [];
      var children = target.children;
      for (var i=0; i<children.length; i++) {
        var ele = children[i];
        var value = ele.dataset[];
          value: value,
          node: ele
      elements.sort(function(x, y) {
        return (x.value == y.value ? 0 : (x.value > y.value ? 1 : -1));
      elements.forEach(function(i) {

    document.registerElement('sort-stuff', {prototype: SortStuffProto});

And here is a running example of the code above:

Note the surfaces that were used in constructing this functionality:

  1. sort-stuff has an attribute 'target' that selects the element to sort.
  2. The target children have data attributes that elements are sorted on.
  3. sort-stuff registers for 'click' events from its children.
  4. sort-stuff children have data attributes that determine how the target children will be sorted.

In addition you could imagine adding a custom event 'sorted' that 'sort-stuff' could generate each time it sorts.

So there's your six surfaces that you can use when composing elements into your application. And why the insistence on making the number of surfaces equal six? Because while history may not repeat itself, it does rhyme.



Today Google launched gRPC, a new HTTP/2 and Protocol Buffer based system for building APIs. This is Google's third system for web APIs.

The first system was Google Data, which was based on the Atom Publishing Protocol [RFC 5023]. It was an XML protocol over HTTP. The serving system for that grew, but started to hit scalability issues at around 50 APIs. The scaling issues weren't in the realm of serving QPS, but more in the management of that many APIs, such as rolling out new features across all APIs and all clients.

Those growing pains and lessons learned led to the next generation of APIs that launched in 2010. In addition to writing a whole new serving infrastructure to make launching APIs easier, it was also a time to shed XML and build the protocol on JSON. This Google I/O video contains good overview of the system:

Now, five years later, a third generation API system has been developed, and the team took the opportunity to make another leap, moving to HTTP/2 and Protocol Buffers. This is the first web API system from Google that I haven't been involved in, but I'm glad to see them continuing to push the envelope on web APIs.


Detecting Benchmark Regression

Subtitle if this were an academic paper, which it’s not: A k-means clustering derived point statistic highly correlated with regressions in commit-series data with applications to automatic anomaly detection in large sets of benchmark results.

TL;DR: To detect regressions in benchmark data over a series of commits use k-means clustering on mean and variance normed commit-series. For each of the clusters find the best fitting step function to each cluster’s centroid. The metric |step/fit| is highly correlated with regressions, where step is the height of the step function, and fit is the mean square error of the fit of the step function to the centroid.

Below is the description of how we detect performance regression for the Skia graphics library. I’m writing this up because after much searching I haven’t found anyone describe the method we came up with for detecting performance regressions and maybe this writeup will be useful to other people.

Problem Statement

Skia is an open source cross platform 2D graphics library. In Skia, like many other software projects, we have large number of performance tests, aka benchmarks, and we run those benchmarks every time we change the code. Just having a large number of benchmarks isn’t a problem, but being cross platform means running those tests across many different platforms; Linux, Mac, Windows, Android, ChromeOS, on different GPUs, etc. which leads to a combinatorial explosion in benchmark results. For every commit to Skia today the testing infrastructure generates approximately 40,000 benchmark measurements. That number of results tends to change frequently as tests, platforms, and configurations are added and removed regularly. The number of results has been over 70,000 per commit in the past several months.


To make the following discussion easier let’s define some terms.

A Trace is single benchmark and configuration tracked over a series of commits. Note that this isn’t exactly a time series since the measurements aren’t taken at equidistant times, but are spaced by commits to the codebase. Also note that for each benchmark there may be multiple traces, for example, one for Windows 8, one for Linux, and one for Android.

Fig 1 - Trace

A “performance regression” is a significant change in either direction of a metric. Now a metric that drops may actually be a good performance increase, but could also be an indication of a test that is broken, or has stopped working entirely. So regardless of the benchmark, we are looking for step-like changes in either direction.

The issue with tens of thousands of traces is that you just can’t look at the raw numbers, or even plot all the data, and figure out when you’ve had a regression. At first we tried setting performance ranges for each trace, i.e. an upper and lower bound for each trace. If a later commit caused the benchmark results to move outside those bounds then that would trigger an error. There are many drawbacks to monitoring benchmarks by manually placing bounds on each trace:

  1. The most important drawback is that in such a system a single test result can trigger an alert. You know old phrase, “the plural of anecdote isn't data”, a single benchmark measurement is virtually meaningless as any number of anomalies could actually be responsible for that benchmark result changing. For example, a machine could overheat forcing a move to frequency scaling, or other processes on the machine may starve the test of CPU cycles. You can work to mitigate these eventualities, but they never completely go away. SPC systems such as the Western Electric rules might be applicable in this case, but we’ve never tested them.
  2. Constant manual editing of trace bounds is time consuming and error prone.
  3. Constantly adding manual trace bounds for new benchmarks is time consuming. Add one test and you may have to add many trace bounds, one for each member of that combinatorial explosion.
  4. Forgetting to add new ranges for new benchmarks another source of error.

Even if you automate the placing of trace bounds, you still have the issue of transient behavior that looks like a regression, and you also have to take pains that the automatic calculation of trace bounds doesn’t mask a true regression.

Fig 2- Is this a regression or an anomaly?

So we needed a better system than setting trace bounds. The next section explains the system we implemented and have successfully run for several months now.

Before we go further let’s define a few more terms.

Normalized Traces
Normalization is the process of modifying each Trace so that it has a mean of zero and a standard deviation of 1.0. Note that if the standard deviation of a trace is too small, then blowing that up to a standard deviation of 1.0 would introduce nothing but noise, so there’s a lower limit for the standard deviation of a trace, and below that we don’t normalize the standard deviation of the trace. The idea is to extract just the shape of the trace, so that all the normalized traces are comparable using a sum of squares distance metric. The smaller the sum of squares error is between two normalized trace, the more similar their shapes.
k-means clustering
I’m not going to explain k-means clustering in one paragraph, you should go look it up on Wikipedia or any of the fine explanations available on the web. The important point is that we are using k-means clustering to group normalized traces together based on their shape. The idea is that many traces will move together in the same direction from commit to commit. For example, if I speed up rectangle drawing then all tests that use rectangles should get faster, if not in the same proportion.
The centroid is the center point at the center of a cluster. In this case the mean of the normalized traces in a cluster, which acts as a prototype shape for the members of the cluster.
Regression Factor

For each cluster of normalized traces we find the best fitting step function to the centroid. From that best fitting step function we calculate Fit and Step, where Fit is the sum of squares error between the step function and the centroid, and Step is the height of the step function.

From there we calculate the Regression Factor:

R = Step / Fit

A smaller Fit values gives you a larger R, which means that the more a centroid looks like step function the larger R gets. Similarly the larger Step gets the larger R gets, which is a measure of how big of a change the centroid represents.

Putting it all together.

So finally, with all the preliminaries set up, we can get to the core of the system.

Here’s a view of the system at work, finding regressions in performance. Note that out of 40,000 benchmarks the first cluster contains 1336 traces and has a Regression Factor of -4.08e+3.

Screenshot 2014-11-23 at 1.14.19 PM.png

Continuous Analysis

The above system works for finding regressions once. But what happens if you want to check for regressions continuously as new commits land and new benchmark results arrive? One last definition:

A cluster is considered interesting if it’s Regression Factor is over 150. This is only a rule of thumb based on observing the system and may be relevant only to the Skia benchmarks, while a different cutoff may be appropriate for other datasets. The important point in that as |R| grows so does the likelihood of that cluster being a regression.

To continuously monitor for Interesting clusters, start by running the above analysis once and find interesting clusters. If there are any then triage them as either really needing attention, such as a CL needs to be rolled back, or ignorable, say in the case where a true performance increase was seen.  Then on a periodic basis run the analysis again when new data arrives. What should be done with the new set of interesting clusters produced from the analysis? The existing interesting clusters have already been triaged, and those same clusters may appear in the output of the analysis, and new interesting clusters may appear. The process of only raising up new interesting clusters for triaging while folding existing clusters with similar clusters that appear in the analysis results is called cluster coalescing.

Cluster coalescing currently works by looking at all the new interesting clusters and if they have the same traces as the 20 best traces in an existing cluster then they are considered the same cluster. Note that ‘best’ means the 20 traces that are closest to the centroid of a cluster. Note that this is an area of active work and we are still experimenting regularly with new cluster coalescing schemes.

Wrap Up

I hope that was useful. Please shoot me any questions on Twitter @bitworking. The code for the software that does the above analysis, and much more, is open sourced here.


Observations on hg and git

Having recently moved to using Git from Mercurial here are my observations:

Git just works

No matter what I try to do, there's a short and simple git command that does it. Need to copy a single file from one branch to my current branch, need to roll back the last two commits and place their changes into the index, need to push or pull from a local branch to a remote and differently named branch, there are all ways to do those things. More importantly, Git does them natively, I don't have to turn on plugins to get a particular piece of functionality.

Turning on plugins is a hurdle

The fact that what I consider to be core functionality is hidden away in plugins and need to be turned on manually is an issue. For example, look at this section of the docs for the Google API Python Client:

A big thing that trips up contributors is that "--rebase" is in a plugin (and I keep forgetting to update the docs to explain that).

Git is fast

So Git is fast, not just "ooh that was fast", but fast as in, "there must have been a problem because there's no way it could have worked that fast". That's a feature.


Git branches are much smoother and integrated than MQ. Maybe this is just because I got stuck on MQ and never learned another way to use hg, but the branch and merge workflow is a lot better than MQ.


In Git ssh: URIs just work for me. Maybe I just got lucky, or was previously unlucky, but I never seemed to be able to pull or push to a remote repository via ssh with hg, and it just worked as advertised with Git.


Git is helpful. Git is filled with helpful messages, many of the form "it looks like you are trying to do blah, here's the exact command line for that", or "you seem to be in 'weird state foo', here's a couple different command lines you might use to rectify the situation". Obviously those are paraphrasing, but the general idea of providing long, helpful messages with actual commands in them is done well throughout Git.


I'm not writing this to cast aspersions on the Mercurial developers, and I've already passed this information along to developers that work on Mercurial. I am hoping that if you're building command line tools that you can incorporate some of the items here, such as helpful error messages, speed, and robust out-of-the-box capabilities.