A 30 year sorting algorithm saga: from 250k to 14GB in one minute

Microsoft Research busted through the world record for sorting an unprecedented amount of data in under one minute, with a new sorting technique called MinuteSort (with flat datacenter storage). The Microsoft Research team sorted the equivalent of two 100-byte data records for every human being on the planet.

That’s a rough equivalent of a short email message, for example, this series of 1’s represents 200 bytes:

1111111111111111111111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111111111111111111111

This is 200 bytes of a plain text message you might send in an email.

How much wood could a woodchuck chuck if a woodchuck could chuck wood?
Good woodchucks chuck much.

Multiply that by the population of planet Earth, currently at around 7 BILLION, and the result is 1,400,000,000,000 bytes, or over 14GB in one minute.

That’s a lot of data to wade through, and sorting hasn’t always been so fast and sexy. Take a look at this scan of an ancient TI-99 programming magazine article from 1983 on five popular sorting algorithms, some still used today.

  • Bubble Sort
  • Shell Sort
  • Selection Sort
  • Heap Sort
  • Quick Sort

Here’s a sampling of the article, page 1 & 2 (Download the complete 5 pages)

 

Ti-1_2ti-1a_2

 

Turn to the last page or the article and you can see that in 1983 the Quick Sort, wins the honor of sorting roughly 250k in about 1 minute and the Bubble sort can run equivalent of the 200 byte sample sample above in about 6 minutes (ouch).

image_12

How times have changed.

Sorting basics from Carnegie Mellon http://www.cs.cmu.edu/~adityaa/211/Lecture12AG.pdf [pdf]

Download the complete article

Big congrats to the awesome team at Microsoft Research! 100 points and a Geritol if you are old enough to have used a TI-99 or have read this magazine.

Common JavaScript mistakes and pitfalls you can avoid

Having solid JavaScript skills is more important than ever in today’s web applications, and will continue to grow in importance as more browsers, IDEs, etc…, boast of HTML5 feature support. New HTML5 elements such as <canvas> , and features like Web Storage, Web Sockets, or Geolocation depend heavily on JavaScript. Despite tons of freely available, open source, or 3rd party libraries such as jQuery, Prototype, or Script.aculo.us, we still need to keep in mind that they’re just JavaScript layered on top of JavaScript – a level of abstraction. Knowing the under the hood workings and “gotchas” of JavaScript will certainly give you a boost with writing better code.

Developers that use strongly typed languages tend to treat JavaScript as a restricted rather than an expressive language, as they’re used to working with many compiler restrictions. On top of that, JavaScript’s reputation as a “toy, browser-only, language” precedes itself; many developers don’t bother to look deeply at some of its powerful features, as well as some language quirks that can really cause trouble.

The truth is out there: Dealing with truthy/falsy values

Dealing with comparisons and equality is usually straightforward, but JavaScript has a few caveats. Because of the way JavaScript deals with expressions, it has two sets of equality operators and some complicated rules around one of them (==/!==). The first set…

The “Equals” operator  ( = = )  and The “Does-not-equal” Operator ( ! = )

And the second set…

The “Strict Equals” operator ( = = = ) and The “Strict Does-not-equal” Operator ( ! = = )

If you’re wondering why JavaScript has two sets of equality(ish) operators, it’s because JavaScript works on the premise of truthy and falsy values.  The == / != operators will work as you might expect if the operands are of the same type (boolean), but if they are not, the == and != operators try to coerce the values, using a rather large and drawn out ritual to determine the result. Truthy/falsy values are a means for JavaScript to deal with non-Boolean expressions, by pretending as if they are Boolean anyway.

As a rule of thumb, JavaScript evaluates the values in the list below as false, and everything else as true:

  • false
  • null
  • empty string (“”)
  • 0
  • NaN

Since values like null, “”, and NaN, aren’t boolean values but are still treated as such, that makes them falsy. In short, when you compare expressions that aren’t actual boolean values, JavaScript can produce results that are inconsistent with what you might expect. For example, if you compare the value 0 to false, this is what you get:

(0 == false)  // true 
(0 === false) // false (correct)

When expressions accept anything, including NaN, undefined, etc…, there’s a lot of room for mistakes, even by experienced developers. Amplify that with the many rules involved to mentally juggle, and it too easy to see how == and != disrupt your thought processes while coding.

The take-away: the strict === and !== operators are the best way to go, but don’t shrug off the equals operator just yet, as you’ll likely run across them daily in the JavaScript world, making this valuable information for the maintenance coder.

What exactly is an isNaN(NaN); anyway?

NaN, or Not-A-Number, is a construct of JavaScript that’s useful only in very specific edge cases. Why then, is this important to know? Being a mischievous little code gremlin, NaN pops up everywhere. It’s a property of the Number object, (Number.NaN), and default value of NaN is NaN, yet it’s a falsy value. When trying use NaN in an expression, NaN returns a false every time. Since NaN is no help in determining what is actually non-numeric, you’ll need an alternative. Fortunately there is a very reliable isNaN() function you can use. Notice when the isNaN() function evaluates an alphanumeric string, the result is true, and a false for numeric values. To further investigate how isNaN() behaves, enter a few of the following statements in the Console tab of any browser’s dev tools and browse the results. (IE F12 dev tools featured below).

SNAGHTML6fc7755

isNaN() works as expected with numbers and alphanumerical data.

The conculsion: use isNaN().

Is eval() evil or just misunderstood?

eval()is the little function that wants to do everything for you, including dynamically creating and executing JavaScript.  Want to parse JSON? eval(). Want to clump a bunch of stuff together? eval(). Want to make up code to run on the fly? eval(). eval() finds ways to do something, anything, with what you give it, making it easy for developers to do both awesome and horrible things with it.

In an attempt to keep eval() innocent, many suggest limiting eval() usage to the following scenarios:

1) You need to allow script to run dynamically.

This opens up your page to potential security issues, so only use this when it’s necessary, as there are a few different ways to execute code dynamically in JavaScript (a quick Bing search shows many different techniques). That leaves…

2) Parsing JSON

You can use eval() for JSON parsing; however, you are still going to run into the same security issues. On top of that, the JSON.parse() method works much better as it’s meant just for JSON handling.

In general, stay away from eval() where you can, it has the potential to get evil quickly, and there’s usually a decent alternative.

Avoid function faux pas.

Functions live a first class life In JavaScript, so they’re not just functions, they’re objects too. It all depends very much on how you them. Take, for example, the following simple function to calculate two numbers:

var result = function (a, b) { return a + b;};

In many languages, the result passed back from the function will contain the sum of the values of the arguments, a, and b. In JavaScript; however, there is more to the story. Opening any browser’s developer tools and inspecting the code, shows the type as an “Object (Function)” and its value as the function itself.

SNAGHTML45e09f_1

You don’t need to, but you can, define and name functions. If you don’t provide a name when creating an inline function, it’s an anonymous function.

Having the knowledge that functions are objects allows you to better understand and work with Open Source or 3rd Party libraries, such as jQuery or Script.aculo.us. Consider jQuery’s chaining feature, that allows you to “tack on”, i.e., chain, other methods to the end of the previous method, in a single statement. Since the function is also an object, you can expect to call methods on it using dot notation, as the following simple jQuery chaining example demonstrates:

$("#inputBox").removeClass("normal").addClass("error");

In addition to function behavior, it’s necessary to know how to write good, clean, functions. Since JavaScript throws everything into a global namespace, you’ll want to encapsulate your functions into namespaces with classes and members (see below). And watch out for globals…

A variable of global proportions.

The authors of JavaScript intended it to be a language with a very low barrier for entry for the internet pioneers circa 1995. Because of this, they made the decision to allow, and even encourage, globally scoped variables. One or two global variables in a very small program can be useful, even manageable, and was certainly not a big issue at the time since HTML had only a few tags and the DOM a handful of nodes. However, JavaScript has grown into arguably the most popular language in the world, and has powered a spectrum of applications from the smallest to the largest. Anything other than the simplest of sites can contain nasty bugs because of the global space.

When possible, avoid global variables, and here’s why:

  • It’s too easy to have variable naming clashes in large applications.
  • It’s too easy to overwrite the wrong variable, or the right variable at the wrong time, when there are many variables accessible from anywhere.
  • Not needing to declare variables means there are often “floaters” hanging around in globally-scoped memory, but doing nothing.

A good workaround to this language gotcha is to create your own namespace and classes in it, and store any would-be global variables as properties of that object instead. Though they’re not real namespaces, they are objects that can behave like namespaces, making it easier for you to organize your code. The code below provides an outline of how you can create and access your own namespace:

var SampleNamespace = new function () { };
SampleNamespace.SampleGlobalObject = new function () {
 
    // Private function 
    var privateFunction = function ()
    { }
 
    // Public function
    this.publicFunction = function ()
    { }
}
// Calling the public function 
var publicFunctionTest = function () {
    SampleNamespace.SampleGlobalObject.publicFunction();
}

Global variables aren’t the only problematic variables in JavaScript. JavaScript doesn’t perform block scope, so variables you define can inadvertently become as error prone as global variables. The only scope smaller than the global scope is function scope. This means that you can access variables where you might expect otherwise, such as outside an if/else or for loop.

Although JavaScript is a dynamic, expressive, language, you need to organize it into namespaces, classes, and other units for a solid maintenance experience.

Summary

Watch out for some of the most common pitfalls that developers encounter, even while using Open Source libraries such as jQuery. There are, of course, many more language quirks and features that can also cause trouble, as with any language. If you want to go deeper into JavaScript, here are two highly recommended books.

JavaScript: The Good Parts by

JavaScript Patterns by Stoyan Stefanov

A few tips for being a more effective solo developer

There’s tons of developers that are self employed, freelance or consultant; some who take long term contracts on teams and others that have a few key, stable customers that they routinely work for.  If you’re solo in the sense that you’re the primary start-to-finish person for most of your projects, there’s not a lot of resources out there targeted your way. Most training, talks, books, blog posts and other content is geared toward the enterprise developer that works on a team.  Since that’s the case, I wanted to throw out a few things you can do as a solo developer to take some of the pain away from not having the resources that teams generally have (e.g., hardware, CI server, costly software developer tools, etc…) to help boost your productivity.

Use Source Code Control

Many solo developers have managed with their own way of using “copy & paste” source control.  While you might have a routine down when doing so, you can still benefit from even some of the basic features of source control – with actual source control being the first reason.  There’s lots of room for mistake when something’s done manually, particularly when that something is maintaining versions of sets of files.

Source control systems offer basic tracking of files and project assets that you’ll label with a version, and most integrate with familiar tools like Visual Studio (often including express versions).  SCC software also allows you to freely explore code you’d like to write without you having to mentally keep track of changes when changing code, you can work on multiple files and roll back any or all of them to their previous state, without worry.  It’s safe to say that most developers have experienced this pain at least once in their career – that is, chasing an idea with many changes to the code and when the developer’s ready to ditch the changes, s/he can’t recall what’s been changed.

Download  TFS Basic for Visual Studio 2010 for individual source code control, and you can also use other popular SCC programs like Tortoise SVN.

Try out some Agile practices

Although you can’t do 100% Scrum (an Agile subset) as an individual developer you can do some of it.  Obviously it’s silly to do daily standups as an individual, but you can still incorporate many parts of Agile and/or Scrum into your projects.

The Agile manifesto states the principles that Agile values…

“Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on
the right, we value the items on the left more.”

These principles are easy to follow as an individual developer or as a team, it works for everyone.

One of the most important things to remember is to keep the feedback loop with your client open, and have frequent discussions with the client about the progress of the app.  Respond to change by allowing the feedback from the customer to drive the next cycle of coding, which should be short (1-2 weeks). As a solo developer, you can create and use backlogs, use burn down charts and hold retrospectives with the customer. Those practices give a nice reality check as to the health of the project.

JD Meier’s blog with living Agile

Unit Testing – It’s for you too.

You may be thinking unit testing is overkill for the type of projects you work on particularly if they’re small projects, or that it’d cost too much to ramp up to speed with a testing framework and tools.  If so, you’re not alone, as those are some common misconceptions that solo developers make (perhaps because they’re not getting out to mingle with their professional peers and aren’t getting up to date info; see below 😉 ).

If you’re looking for a low friction way to get started with unit testing and you’re already using Visual Studio 2010, then try the built in MS Test tools.  After you get a solid feel for creating unit tests; the knowledge is there to compare other tools & testing frameworks such as nUnit or xUnit (xUnit was built by Brad Wilson on the ASP.NET team).

One of the primary goals of ASP.NET MVC is to promote a concept called “convention over configuration”.  This means when writing MVC applications, developers are guided to use test projects when without having to download separate utilities, as well as guided to use other best practices.  By using the conventions, your code will be cleaner and better, especially because you’re now testing.

Get out and mingle with your professional peers

Don’t get out to user groups? Code Camps? It’s a great place to network and learn new things, and keep your skills sharper than they’d be without.  Get on social networking like Twitter, LinkedIn and Facebook.  These are all great ways to keep in touch with other developers, many of whom are also solo devs.  Having an issue?  Post the question on forums such as http://www.asp.net or MSDN Forums (among many others). Send out a tweet on Twitter and watch the replies come in, or post questions on a blog, many of who are on the various product teams here at Microsoft. Getting an answer from the source or a trusted MVP or community members is a great way to expand your knowledge base, stay up to date, and see what’s important out there to other developers.