Two Approaches to Optimizing JavaScript

This is a really nice exchange of posts on two different approaches to optimizing JavaScript:

Nick Fitzgerald writes about speeding up source maps with Rust and WebAssembly, Vyacheslav Egorov (mraleph) responds with a masterclass in using profiling and expertise to get similar speedups in pure JavaScript, and Nick caps off the exchange by porting some of mraleph's speedups back to Rust and talking about trade-offs.

Civil, fair-minded, technically interesting, and we're 300% faster than when we started.  All programming debates should happen like this.

ZestCash is not Good

A couple of days ago, TechCrunch featured a favorable story about a new startup called ZestCash, which provides an online lending alternative to existing payday loans (I'm not going to link to them directly, you can get to them on your own easily enough). The story regurgitates ZestCash's copy about the evils of the existing payday loan industry, including numbers highlighting just how usurious the sector is. What it fails to mention is ZestCash's own rates, which run between 242% and 462% APR at the time that I'm writing this.

To put that into perspective, consumer advocates regularly warn against the abusive nature of the ~30% APRs charged by many credit cards. The Center for Responsible Lending, which is frequently mentioned on ZestCash's website at the time of this writing, supports a 36% annual interest rate cap. To make that point absolutely clear: ZestCash repeatedly cites a consumer advocacy group in making the case that they're a responsible lender, and then turns around and charges rates up to more than 12x those advocated for by that same group.

Beyond the ridiculously high rates, the entire site is filled with disingenuous copy that seems designed to make unsavvy consumers feel smart and responsible for using ZestCash. They claim in big letters on the front page that ZestCash is "up to 50% cheaper than a payday loan", but you have to click two links deep to find the explanation for where that number comes from, at which point you learn that 50% is over a payday loan that has been "rolled over" 7 times. They have an entire page dedicated to trying to convince you that APR doesn't really matter. They make a big deal about the fact that they don't have a lot of extra fees, but the fees they do have are massive: a 30% 'origination fee' on every loan, and a $35 late fee per missed payment on top of whatever overdraft fees your bank charges. They make a big deal of the fact that they clearly disclose their terms, but they're required to do so by federal law. Almost every sentence on their website makes me tremble with rage for one reason or another.

The worst part about all of this is that their marketing message seems to have worked, at least this early on. I learned about them from a tweet by an entrepreneur I admire, which said he liked how ZestCash was trying to do payday loans in a "don't be evil" way (he seemed to back off this when their rates were pointed out). A twitter search right now shows an overwhelmingly positive reaction, and the coverage of the service from major tech and business sites has been mostly positive as well. What gives? Do people really trust the press release they get from a company that much? Do they not go to the front page of the service and click around at least a little bit? Are Douglas Merrill and Shawn Budde big enough players that nobody's willing to criticize them? Are their investors? I was similarly confused by the positive reaction to Betterment, a service that launched a little while ago that appears to try to convince consumers that ETFs are just savings accounts with higher returns, but this really takes it to a whole new level.

(ed. note: Betterment is no longer pushing the marketing approach mentioned above as aggressively as when they launched. Thanks, KW)

Jason Fager
Introducing jdiscript

Charles Nutter's recent post on browsing memory with JRuby and the Java Debugger Interface reminded me of my own little project for the JDI, jdiscript ("helping you write elegant scripts for a more civilized debugger"... I'll work on that). I've spent some spare time over the last week or so dusting it off to get it from the simple handler class I've used for one-off tasks in the past to something more generally useful and enjoyable. It still has a ways to go, but I thought I would go ahead and share the basic features and show a quick script that hints at how powerful JDI scripting can be.

What you get from the JDI is a set of tools for working with VMs. A JDI program can start a new VM, attach to a running VM, be attached to by a VM, or be spawned from a VM when an exception occurs. Once you have a hook on one or more running VMs, you can then begin inspecting and controlling them. Inspection occurs via Mirrors representing the remote VM's live objects, classes, threads, and the VM itself, while control can be exerted by manipulating values via a Mirror or by requesting notification of particular events and handling them how you'd like.

The JDI is very powerful, but it can be a bit clunky for writing short, quick scripts, which is where jdiscript comes in. Launching or attaching to a VM is mostly boilerplate, so jdiscript provides utility classes that take care of that for you. While the JDI allows you to easily request event notifications, the process of matching those notifications to code that will do something useful in response is left to the programmer, so jdiscript provides an event loop and a standard way to dispatch events to handlers. There are also some nice minor features in jdiscript, like redefining JDI's EventRequest classes to be chainable. Most of this functionality is exposed through the top-level JDIScript class, which allows you to cut out a bit of code to access common operations. And of course, all of this is intended for use from a higher-level JVM language such as Groovy or JRuby.

So let's get started with an example. One of the more kickass events that we can track with the JDI is the MonitorContendedEnterEvent, which fires whenever a thread tries to enter a monitor that's already held by another thread. Uncontended locks in Java are basically free these days, but contention can still be a performance killer in multithreaded apps, so it's nice to be able to find these spots without a lot of headache so we can eliminate them as far as possible.

Here's a groovy script that prints out a stack trace for any contending threads, and prints another notice when the thread ends up entering the monitor:

package org.jdiscript

import org.jdiscript.handlers. import org.jdiscript.util.VMSocketAttacher import com.sun.jdi.

VirtualMachine vm = new VMSocketAttacher(12345).attach() JDIScript j = new JDIScript(vm)

j.monitorContendedEnterRequest({ long timestamp = System.currentTimeMillis() println "${timestamp}: Contended enter for ${it.monitor()} by ${it.thread()}" it.thread().frames().each { println " " + it } } as OnMonitorContendedEnter).enable()

j.monitorContendedEnteredRequest({ long timestamp = System.currentTimeMillis() println "${timestamp}: Contended entered for ${it.monitor()} by ${it.thread()}" } as OnMonitorContendedEntered).enable(){ println "Got StartEvent" } as OnVMStart)

println "Shutting down"

Let's fire this up against a Cassandra instance to see it in action. I picked Cassandra because it's a production-quality high-concurrency project that's still dirt easy to get up and running from source, and it already includes a python script for stress testing (you can grab a copy of the 0.6.4 release, the version used for this post, to try it for yourself).

To start, copy bin/cassandra to bin/cassandra-dbg and add '-agentlib:jdwp=transport=dt_socket,server=y,address=12345,suspend=y' to the exec command that starts the server, so that the JVM will wait for the debugging script to attach before it starts. Once it's up and waiting, fire off the groovy script. You should see the start event print out, followed by a bunch of stack traces. Most of these are just ReferenceQueue activity, and any others we don't really care about right now (it's just startup, so as long as there's no deadlocking, we're probably fine with a bit more contention).

Once Cassandra is up and stable, fire up contrib/py_stress/ (you'll need to have thrift installed and the Cassandra python bindings generated). Here's what comes up for my run:

1281044297880: Contended enter for instance of by instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784) in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784) in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784) in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784) in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784) in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.utils.FBUtilities:239 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.utils.FBUtilities:229 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.dht.RandomPartitioner:118 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.dht.RandomPartitioner:44 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.db.Memtable:124 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.db.Memtable:116 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.db.ColumnFamilyStore:434 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.db.Table:407 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.db.RowMutation:200 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.service.StorageProxy$2:138 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   org.apache.cassandra.utils.WrappedRunnable:30 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   java.util.concurrent.ThreadPoolExecutor$Worker:886 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   java.util.concurrent.ThreadPoolExecutor$Worker:908 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)
   java.lang.Thread:637 in thread instance of java.lang.Thread(name='ROW-MUTATION-STAGE:11', id=1784)

(Subsequent identical traces elided)

1281044297896: Contended enter for instance of by instance of java.lang.Thread(name='ROW-MUTATION-STAGE:13', id=1785) 1281044297903: Contended enter for instance of by instance of java.lang.Thread(name='ROW-MUTATION-STAGE:12', id=1786) 1281044297912: Contended enter for instance of by instance of java.lang.Thread(name='ROW-MUTATION-STAGE:15', id=1787) 1281044297927: Contended enter for instance of by instance of java.lang.Thread(name='ROW-MUTATION-STAGE:16', id=1788) 1281044297935: Contended enter for instance of by instance of java.lang.Thread(name='ROW-MUTATION-STAGE:17', id=1789) 1281044297940: Contended enter for instance of by instance of java.lang.Thread(name='ROW-MUTATION-STAGE:14', id=1790) 1281044297945: Contended enter for instance of by instance of java.lang.Thread(name='ROW-MUTATION-STAGE:18', id=1791)

Pretty quickly, 8 threads back up waiting on a synchronized method deep down in the creation of a new MessageDigest which will be used to get an md5 hash of the row key. This is pretty easy to "fix" (though honestly it probably never caused anyone a problem): just preallocate some md5 MessageDigest instances and stuff them in threadlocals.

So there we go. A few lines of code, and we found an avoidable contention point without opening a profiler or even knowing anything ahead of time about the underlying codebase. Awesome.

I'll post more on troubleshooting with JDI in the future, including more examples. In the meantime, I'd love to get some feedback on jdiscript; feel free to email me (jfager at gmail), create an issue on github, or join jdiscript's new group. I've also thrown up a wiki page for collecting jdiscripts, so if you put together something useful, please share it.

Jason Fager
Lexi-Sortable Number Strings
When working with key-value databases, it's common practice to structure key strings to take advantage of database-specific features. With Cassandra, for instance, you can perform range queries over lexicographically sorted row keys. As an example of how this feature might be used, imagine an index consisting of rowkeys representing attributes like "population" or "region" that map to cities:
  "population/0389625": "Tulsa"       
  "population/0815358": "San Francisco"       
  "population/2851268": "Chicago"     
  "population/3831868": "Los Angeles"     
  "population/8391881": "New York"     
  "region/central": "Chicago", "Tulsa"     
  "region/east": "New York"     
  "region/west": "Los Angeles", "San Francisco" 
To find cities with at least 1 million people, we would issue a range query from "population/1000000" to "population/9999999". We had to pad out our numbers with leading zeroes to get a fixed-width, but that's easy enough, and this approach works fine for positive integers. But how do we deal with negative integers, or floating point numbers? We'll focus on Java longs and doubles, but the code should be trivially modifiable to work with ints and floats, and the ideas should translate easily to most languages. First up, negative numbers. We can't simply concatenate a negative number to a string that's supposed to sort lexicographically, because larger negative numbers sort as smaller strings:
    "-4"     "-7"     "04"     "07" 
In Java, longs are specified as signed two's complement 64 bit integers, meaning that if we treat a long's underlying bits as a string, we would solve this problem at the cost of storing negatives after positives (using 4 bits for demo purposes):
    0100 (4)     0111 (7)     1001 (-7)     1100 (-4) 
But this is easy to fix: just flip the sign bit:
    0001 (-7)     0100 (-4)     1100 (4)     1111 (7) 
When we use this method in production, we'll want to do two more things: represent the number in hex to avoid ridiculously long keys, and prepend a type token to remind us that a given hex string is a long. Here's the code:
What about floating point? The most common representation of floating point values, and the one that you're almost certainly using, is standardized by IEEE 754. Conveniently enough, IEEE 754 is designed such that:
if two floating-point numbers in the same format are ordered (say x < y), then they are ordered the same way when their bits are reinterpreted as Sign-Magnitude integers.
Said another way, just like with longs, if we just treat the underlying bits of IEEE 754 floating point numbers as strings, we'll get proper sorting with only minor complications. What are the new minor complications? A sign-magnitude integer uses its highest bit to signal signedness, and the remainder of its bits to signal the magnitude of the number:
     0100 (4)     0111 (7)     1100 (-4)     1111 (-7) 
We see that positive numbers sort correctly, but negative numbers have some problems - they sort high-to-low, and they sort after positive numbers. To solve the latter problem, we'll again flip the sign bit for all numbers:
    0100 (-4)     0111 (-7)     1100 (4)     1111 (7) 
And to make negative numbers sort low-to-high, we'll flip all their other bits, as well:
    0000 (-7)     0011 (-4)     1100 (4)     1111 (7) 
Here's the code, building on the utility methods defined earlier:
You can also check out the full code snippet, which includes tests. Now that we can store floating point attributes, our city index can look like:
  "area/dC06759999999999A": "Tulsa"     
  "area/dC06CFD70A3D70A3D": "San Francisco"
  "area/dC06D400000000000": "Chicago"     
  "area/dC07D4E6666666666": "New York"     
  "area/dC07F24CCCCCCCCCD": "Los Angeles" 
  "population/l800000000005F1F9": "Tulsa"
  "population/l80000000000C70FE": "San Francisco"       
  "population/l80000000002B81C4": "Chicago"     
  "population/l80000000003A783C": "Los Angeles"     
  "population/l8000000000800CC9": "New York"     
  "region/central": "Chicago", "Tulsa"     
  "region/east": "New York"     
  "region/west": "Los Angeles", "San Francisco" 
There are a couple of areas left for improvement. First, hex was chosen because it's easy to work with, but it produces strings that are longer than necessary. If the encoding were changed to Base 64, only 10 characters per key would be needed, and if the keys in the system we're using can be given as raw byte arrays, we can get away with 8 bytes + n bits (where n varies based on how many types you want to handle). Second, this doesn't even pretend to work with arbitrary-precision integers or decimals. After typing this post up, I found an implementation of these ideas that does extend to BigDecimal values, available as part of Lily CMS, as described by Bruno Dumon. You can also take a look at Lucene's NumericUtils class, though you should take care to note that the long returned from the doubleToSortableLong method is not suitable for direct translation to a lexicographically sortable string, as it does not alter the sign bit.
Jason Fager