Friday 30 September 2011

Strata NY 2011 [Day 2]: Man and Machine

strataMakerFair.jpg [ Strata Mini Maker Faire -- photo by Pinar Ozger for O'Reilly Media ]

This post was written by Mimi Rojanasakul. She is an artist and designer based in New York, currently pursuing her MFA in Communications Design at Pratt Institute. Say hello or follow her @mimiosity.

While my selective experience of the conference left terms like Hadoop and MapReduce just as mysterious to me as they were before Strata, I still gleaned a general sense of the technologies at work within this growing data deluge.

But first, we get a welcome sense of perspective from day 2 keynote speaker Mark Madsen of Third Nature. He reminds us, with a somewhat unexpected and engaging history lesson, that our present "information overload" certainly is not the first of its kind. Everything from the printing press to the Dewey Decimal System could be considered a precursor to the processing and organizational systems for big data developing today -- interesting thoughts to keep in mind while encountering the rest of the conference's pitches and promises.

Arnab Gupta's more forward-looking treatise on "Man + Machine" taps into an age-old discussion about our relationship with technology, which thus far has been a "history of innovation substituting human labor". We usually like to extrapolate Skynet and malicious robots in a dystopian future from there, but Gupta argues that the next evolution will be towards the enhancement of humanity.

It took 3 weeks for online gamers to crack the code of the AIDS enzyme model, a feat that no computer or researchers had been able to accomplish in more than a decade. A mortgage default map of California could have helped predict the 2008 housing bubble burst, if someone using the tool had the curiosity to test what would change if unemployment went up. The fact remains that humans are really brilliant at things machines are not, and vice versa. While data is the key word, it is the process of making sense of it and distilling meaningful signals that matters more and more.

strataEnzymeModel.jpg [ AIDS enzyme model cracked by online gamers using FoldIt ]

Meanwhile, new tools for data visualization stirred up their own microcosm of hype within the conference.

Hjalmar Gislason recognized the need for compelling visuals to support the goals of his company, DataMarket -- a global exchange (or "app store") for structured data. He walked us through the choice between visualization libraries Protovis and D3. Both are free, open-source, and use JavaScript and SVG to provide a selection of customizable templates that publish easily to the web (excluding older versions of Internet Explorer, of course). D3 is still under active development and has a few more interactive features.

strataProtovis.jpg [ Protovis template styles, including the famous Florence Nightingale chart and Napoleon's march information visualization ]


strataD3diagram.jpg
strataD3force2.jpg [ Static images of a few of D3's interactive visualization options: Voronoi diagram above, and force-directed graph below ]

Oftentimes, it's not picking the machines or software, but putting together the right people that becomes crucial to success. The Guardian's Alastair Dant takes us through the process of building a new interactive piece from the ground up, and how it is made possible by the variety of skills in his team -- the usefulness of a motion graphics person to whip up a video to present ideas to managers and editors, development and Flash experts that can creatively work around unusual problems, and having someone with legal experience never hurts. Technology is only as good as the talent that wields it, and it can have as much influence on the pace and shape of the creative process as it has on the end product.

strataGuardianBalls.jpg[ Guardian interactive piece that visualized tweets during the 2010 World Cup ]

Presented with such powerful and seductive communications software, I worry how easy it is to use it as a crutch or lean towards novelty rather than make critical decisions that best articulate your message. It would be interesting to compare the work from template-ready Tableau and D3, versus more general purpose "creative coding" libraries, like Processing. With the flexibility comes more power, though building a piece from scratch (or something closer to it) certainly is not the pragmatic choice for everyone. In any case, by understanding the constraints and opportunities around each tool, even the simplest ones, an elegant solution can be reached.

Recreating Old Visualizations with New Technology: Census Statistics

d3_census.jpg
Programmer and data analyst Jim Vallandingham has just 're'-created a particular data visualization technique and visual style from a diagram attributed to Henry Gannett, back in 1903. The original lithographed chart depicts the proportion of foreign born of each leading nationality, in the largest cities in the U.S., and appeared in the publication following the twelfth census of the United States.

For the three examples he has featured online under the apt title "Old Visualizations Made New Again" [vallandingham.me], Jim did not use pencils or an old-school drawing board, but a combination of D3.js and CoffeeScript programming skills.

The results look amazing and, though screen-based, remain relatively Tufte-ian, while the source code is open for all to learn from and adapt. More detailed information is available here.

Thursday 29 September 2011

Strata NY 2011 [Day 2]: Data for Social Change

strataDWB.jpg[ Drew Conway and Jake Porway of Data Without Borders. Designated the "James Bond and Harry Potter of data" by Alistair Croll -- photo by Pinar Ozger for O'Reilly Media ]

This post was written by Mimi Rojanasakul. She is an artist and designer based in New York, currently pursuing her MFA in Communications Design at Pratt Institute. Say hello or follow her @mimiosity.

The first keynote of day 2 at the Strata Conference leads with a rather pointed question: with the data revolution upon us and unprecedented analytical power at hand, are the results of our pursuits personally fulfilling? Even programmers and data scientists with the best intentions often seem to end up developing tools that "make already comfortable lives ever so slightly more comfortable." Meanwhile, NGOs and nonprofits with decades of first-hand experience (not the kind you can google) are clamoring for technical expertise to make use of their information resources. Essentially, what Jake Porway and Drew Conway of Data Without Borders have done is put two and two together.

This type of collaborative model is already in play for the United Nations team that built Hunchworks, a platform to post and verify hypotheses about potential crises emerging worldwide (no affiliation to DWB). Chris van der Walt handles communications strategy, Sara Farmer manages the systems and technology, and Dane Peterson brings in essential UX ethnography skills and hand-drawn PowerPoint charm. Both initiatives make a smart decision about minimizing participation costs -- Hunchworks through crowdsourcing, while DWB harnesses good will and free time to participate in weekend-long "data dives," which place computer programmers, data analysts, and researchers together at the same table.

strataHunchInterface.jpg [ Hunchworks interface ]

For those few whose hearts are not moved by pure motivation to do public good without compensation, perhaps the announcement of the Heritage Provider Network $3 million prize caught their eye. Within 3 years, the goal is to develop a predictive algorithm that could identify patients who will be admitted to a hospital within the next year. The size of the reward should be telling of the seeming impossibility of solving this entrenched social, biological, and densely bureaucratic problem, which makes me think this is exactly the kind of challenge to set.

The winner of the Tableau data visualization prize, Steve Wexler, made his own public health contribution with a rather cheeky interactive dataset showing STD cases in Texas, proud purveyor of abstinence-only sex education:

strataTexasMap.jpg
strataTexasChart.jpg [ Stills of Steve Wexler's winning data visualization ]


The Tableau contest's crowd favorite, Jon Boeckenstedt, gave form to some interesting trends in higher education:
strataCostAdmit.jpg[ Cost of attendance (x) and admittance rate (y); orange = public; blue = private; circle size corresponds to percent of undergraduates receiving institutional aid. ]

At a more intimate level, Anne Wright, of BodyTrack, shares her engineer's approach to self-diagnosis and better health when all the conventional tests fail. Her experiences with explorable data for projects at NASA translated into experimentation with several monitoring devices to self-track her daily behaviors and identify what was making her feel unwell. Although some of the tools and graphic models were rough around the edges at first, her story shows the power of collecting your own observations and modeling them to explore trends in order to find the stories to tell yourself.

Enlightening as the speakers were, the spaces in-between formal presentations were often an even more captivating scene: business cards changing hands, a conviviality possible only around like-minded people, and a few "what recession?" jokes cracked in good humor. Big sponsors and private businesses held a dominating presence, but a number of people I spoke to represented nonprofits and academic institutions: a representative from UNESCO trying to better engage audiences and communicate education data for the Millennium Development Goals, or a history professor from UC Berkeley who wants to help the social sciences catch up with big business statistical analysis.

Having seen a slice of what "big data" can do for the business community, I look forward to watching these public interest groups adopt and grow with the same tools.

Wednesday 28 September 2011

The Fortune 500: How Company Rankings Have Changed over Time

fortune500.jpg
The Fortune 500 [fathom.info] by Ben Fry's Fathom Design maps the rankings of the top 500 companies between 1955 and 2010.

Based on a publicly available data file of the listings that was originally discovered on Wikipedia, the graph includes more than 84,000 unique data entries to show each company's ranking over time, its revenue, or its profit. The width of each line is the company name, but drawn in a very tiny way.

While the views are beautiful, and the interaction and view changes are extremely smooth, this visualization was actually meant as a simple sketch, so unfortunately a few quirks and details have not been resolved.

Or put differently, if this is the sketch, then what is the real deal?

UPDATE: There is also an alternative version available, made by Gregor Aisch.

Strata NY 2011 [Day 1]: The Human Scale of Big Data

strata911Memorial.jpg

This post was written by Mimi Rojanasakul. She is an artist and designer based in New York, currently pursuing her MFA in Communications Design at Pratt Institute. Say hello or follow her @mimiosity.


The 2011 Strata Conference in New York City kicked off on Thursday with a brief introduction by O'Reilly's own Edd Dumbill. He ventures a bold assessment of the present social condition and how data science plays into it: our growing networks, government, and information feel as if they are slipping out of our control, evolving like a living organism. Despite this, Dumbill is optimistic, placing the hope to navigate this new "synthetic world" on the emerging role of the data scientist. And so the stage is set for the speakers to follow.

The first keynote comes from Rachel Sterne, New York City's first Chief Digital Officer and a who's who in the digital media world since her early twenties. Though there was some of the expected bureaucratic language, examples of what was being done with the city's open data showed very real progress being made in making parts of government more accessible and allowing the public to engage more directly in their community. New York City is uniquely situated for a project of this nature, and the individual citizens are a key factor -- densely packed in and cheerfully tagging, tweeting, and looking for someone to share their thoughts with (or perhaps gripe to). Through NYC Digital's app-building competitions, hackathons, and more accessible web presence, New Yorkers are able to compose their own useful narratives or tools -- from finding parking to restaurants on the verge of closing from health code violations. By the people and for the people -- or at least an encouraging start.

strataNYCMap.jpg[ New York City evacuation zone map was shared with other parties to protect against heavy internet traffic taking down any individual site ]

On matters of a completely different spatial scale, we turn to Jon Jenkins of the SETI Institute, a Co-Investigator on NASA's Kepler mission. The Kepler spacecraft, launched in March of 2009, boasts a 95-megapixel camera that checks for tiny planets dimming a star's light across more than 145,000 stars in its fixed gaze, snapping a photo every 30 minutes with bated breath for potential candidates. As of February 2011, over 1200 planetary candidates were identified. Despite the cosmic scale of Kepler's investigations, Jenkins communicates with a Carl-Sagan-like sense of wonder that is difficult not to get swept up in. Video renderings of distant solar system fly-bys show worlds not unlike our own, a reminder that the motives for some of our greatest accomplishments come from an innate, irrepressible curiosity.

strataKeplerFOV.jpg[ Photo and graphic representation of Kepler's field of vision ]
strataKeplerSuns.jpg[ Recently discovered planet with two suns ]
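For the curious, here is a rough back-of-the-envelope sketch of why those planets count as "tiny" from Kepler's point of view. It is not the Kepler team's pipeline, just the standard geometric approximation that a transiting planet blocks a fraction of starlight roughly equal to the ratio of the two disk areas. The function names, the radii constants, and the toy brightness series are my own assumptions for illustration.

```python
# Illustrative sketch only -- not the Kepler pipeline.
# Approximate radii in kilometres (assumed round values).
R_SUN_KM = 695_700.0
R_EARTH_KM = 6_371.0
R_JUPITER_KM = 69_911.0


def transit_depth(planet_radius_km, star_radius_km=R_SUN_KM):
    """Fraction of starlight blocked while the planet's disk crosses the star's disk."""
    return (planet_radius_km / star_radius_km) ** 2


def find_dips(brightness, threshold):
    """Return indices of samples dimmer than the median by more than `threshold` (a fraction)."""
    ordered = sorted(brightness)
    median = ordered[len(ordered) // 2]
    return [i for i, b in enumerate(brightness) if (median - b) / median > threshold]


if __name__ == "__main__":
    print("Earth-size planet, Sun-like star:", transit_depth(R_EARTH_KM))
    print("Jupiter-size planet, Sun-like star:", transit_depth(R_JUPITER_KM))
    # Hypothetical brightness samples with one dimmed reading.
    print("Dip at samples:", find_dips([1.0, 1.0, 0.99, 1.0, 1.0], 0.005))
```

An Earth-size planet dims a Sun-like star by less than a hundredth of a percent, which gives some sense of the measurement precision involved.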

Amazon's John Rauser begins his own talk with a different story about staring at the sky. It's 1750, Germany, and Tobias Mayer is about to discover the libration (wobble) in the Moon. Rauser argues that it was Mayer's combination of "engineering sense" and mathematical abilities that allowed him to make the first baby steps toward establishing what we now know as data science. While an earlier presenter, Randy Lea of Teradata, focused mostly on the technological advancements made in the field of big data analytics, Rauser emphasized the human characteristics demanded for this career. Along with the more obvious need for programming fluency and applied math, he cites writing and communication as the first major difference between mediocrity and excellence, along with a strong, self-critical skepticism and passionate curiosity. These last three virtues could just as easily be transplanted into any other field, and judging from the applause and approving tweets, the relevance clearly struck a nerve with the crowd.

From a design perspective, the obvious continuation to so many of these presentations was the successful visual communication of all this data. My aesthetic cravings immediately subside when Jer Thorp, current Data Artist in Residence at the New York Times, takes the stage. His presentation walks us through a commission to design an algorithm for Michael Arad's 9/11 memorial that would place names according to the victims' relationships to one another. Though clustering the 2900 names and 1400 adjacency requests was at first an issue of optimization-by-algorithm, manual typographic layout and human judgement were still necessary to achieve the aesthetic perfection needed. Thorp also made a great point about visualizations not only being an end-product, but a valuable part of the creative process earlier on.

strata911RelationshipViz.jpg[ Early visualization of density of relationships ]

WTC Names Arrangement Tool from blprnt on Vimeo.

[ Processing tool built to arrange the name clusters by algorithm and by hand ]
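To make the idea of "adjacency requests" a little more concrete, here is a minimal sketch of one possible first step: grouping names into clusters whenever a chain of requests links them. This is not Thorp's actual algorithm (his tool was built in Processing and also had to solve the physical layout); the function name and the sample names below are hypothetical, and the grouping technique used here is a plain union-find.

```python
# Illustrative sketch only -- not the memorial's actual layout algorithm.
def group_by_adjacency(names, requests):
    """Group names into clusters linked, directly or indirectly, by adjacency requests."""
    parent = {name: name for name in names}

    def find(name):
        # Walk up to the cluster representative, shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in requests:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first; ties keep their original order.
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    names = ["A", "B", "C", "D", "E"]          # hypothetical placeholder names
    requests = [("A", "B"), ("B", "C")]        # "place A next to B", "place B next to C"
    print(group_by_adjacency(names, requests))
```

Grouping is the easy part; fitting those clusters onto physical panels while honoring each request is where, as Thorp described, human judgement had to step in.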

To be honest, I was skeptical at first of the decision to cluster the names by association rather than simple alphabetization -- an unnecessary gimmick for what should be an uncomplicated, moving experience. Part of the power of the Vietnam Memorial was its expression of the enormous magnitude of human casualties with simple typography, while its logical organization provided map and key for those purposefully looking for one name. But as Thorp explained these adjacencies in context, the beauty of the reasoning began to unfold. First, it is a matter of new ways of understanding. We do not browse, we search. And collecting and visualizing our identity based on our social networks has become second nature. It has the potential to tell stories about each individual's life that go beyond the individual experience, creating a physical and imagined space to extend this unifying connectivity.

Overall, it was a humanizing first experience with professional "big data." Coming from a background in art and design, you could say I had some apprehensions about my ability to understand the myriad of technical disciplines represented at Strata. Despite this, the experience so far has been of unexpected delights -- a keenly curated look at where we are with data today.

I admit this first post was low on data visualizations, but there were plenty of interface and graphics talks in the afternoon sessions to share in the next posts. Stay tuned!