It’s the filtering, stupid

I’ve been struggling for a while with a very big concept – how to bring scientific publishing to the modern web world of social, collaborative creation. Yes, the current science publishing system is indeed a social, collaborative effort, but I think it is quite weak at using current online tools. So I’ve been spending a lot of time thinking of all the places where the current system could be helped by the tools I use daily in the social, living, collaborative Web.

Ugh. Big Gulp, indeed. So, the best I can do is break it down into smaller chunks. Partly because different parts of the find-navigate-recombine-contribute cycle of scientific publishing are at different levels of Web-savviness. Partly because it’s easier for me to digest. And partly because some parts are likely to change sooner than others.

Finding and navigating

If you take the progression of content being generated on the Web, it just seems to be getting easier to publish and more fine-grained. We went from big publishing houses, putting out digital representations of physical units, such as The Article, or The Paper, or The Book. Search and index sites like Google and Yahoo stepped in to help us find and navigate all the Stuff. And the letter to the Webmaster became the feedback channel.

Things quickly got pared down to the blog post size and a democratization of tools causing an explosion of all sorts of info on the Web. RSS feed readers and personal home pages stepped in to help us manage these morsels of information. Comments and new posts became the feedback channel.

The latest push in data generation has been nano-sized grains of info, flooding us through Facebook, Twitter, and all sorts of status update services. Used to keeping up with things by reading everything, we have become stuck just keeping up with what others might be saying. And our tools to follow this are just not keeping up.

I was banging my head trying to summarize what this was. At Le Web, I found myself hearing things related to “filtering”. I realize now that this thought might have been triggered by a good talk by Clay Shirky (which I only discovered recently through my tweeps) on filter breakdown – if you can’t keep up with the stream then your filter is broken.

That gave me the word I was missing to describe the first, and what I see as the largest, issue with the future of science publishing. Indeed, I see filtering as a problem that is relevant to personal social Web use and even to business use of the social Web.

Linearity

The current filter tools we use, such as Google or Technorati, are too linear (an earlier rant of mine on this topic). You need to go through each item in turn, and the hierarchy is linear. There is very little by way of discovering new things or understanding the conceptual relationships between items other than their order in a list. While I am at it, blogs are linear too. Once a post has more than five comments, the conversation breaks down: it becomes an exchange between the poster and a single commenter rather than a conversation, since keeping up with everything gets difficult.

Personal home pages, such as NetVibes, are not the solution, since they are still set up by the user and still require the user to read things. No help from the tool except managing the multiple streams. Even sites like Alltop seem to be curated pages for multiple feeds. No help from the tool, once more.

I have been watching as multi-dimensional search engines for particular streams have appeared. Since I use Twitter a lot, I have been keen to see a new tool to follow Twitter (and was happy that Twitter bought Summize). Indeed, for work, I find Twitter useless due to the volume of the data stream and my desire to follow and participate in that stream. There are no tools that do this. The tools I have seen are simple word counters (Twitstat, twitt(url)y, Twitscoop, twitrratr) and can have serious failures (for example in this pic, see how a negative reaction was misconstrued).
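To make that failure concrete, here is a minimal sketch of the word-counting approach. It is only an illustration: the word lists and the `naive_sentiment` function are made up for this example, and I am not claiming this is how any of the tools named above actually work.

```python
import re

# Hypothetical word lists, invented for this illustration only.
POSITIVE = {"good", "great", "love", "happy", "useful"}
NEGATIVE = {"bad", "awful", "hate", "useless", "broken"}

def naive_sentiment(text):
    """Label text by counting positive and negative words, ignoring context."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# A plainly positive message is labelled as expected.
print(naive_sentiment("I love this tool"))

# A negated message still counts "good" and "happy" as positive words,
# so a negative reaction gets misread.
print(naive_sentiment("This is not good, and I am not happy"))
```

Because the counter never looks at the "not", the second message comes out as positive. Catching that takes some notion of meaning rather than a tally of words.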

Semantics anyone?

Folks have been talking for a long time about a semantic web, where “meaning” added to information makes that information in some way richer. There are a ton of tools out there based on semantics, and plenty of folks thinking about and working on it. And there are some interesting search engines for the sciences, such as DeepDyve, NextBio, and Knewco, all of which layer some form of multi-dimensional interface on top of search data.

In the social Web space, there is one company that I have been talking with a bit, Crimson Hexagon (hopefully, more on them later). They semantically analyze feeds of data for sentiment analysis.

But, many of these seem like librarian jobs, where much of the semantics is hard-coded in the data as it is classified and created or by data-mining static sets of data. I’d like to see semantics arising out of the use and creation of the data, much like people tagging their photos have added a layer of semantics in Flickr, rather than some librarian in the company data-mining all the time.

The closest analogy I have to explain user-generated semantics versus librarian-style categorization is the difference between Yahoo in 1996, with its cadres of employees manually cataloguing the Web, and Delicious, where the users do it as part of their regular, personal use of the service. Another analogy I like to use is how paths on a commons can be designed: don’t put down paths at first; instead, observe where the grass is worn down, indicating optimal user paths.

Water water everywhere

I think it’s great that there are so many folks working on this. The Semantic Web has been expected for a long time, but we’ve been too busy being geeky rather than applying it to something useful. The services above are all going in a good direction, though, and all of them are trying to take all that stuff on the web and filter it.

I feel that this year someone will come out with a whiz-bang search tool that throws in some form of semantics (part librarian, a priori, and part user-generated) and simple but powerful visualization and navigation of relationships between results. I think there’s still a hole for a tool that allows individuals or corporations to navigate streams of data. The companies above are all trying it in their own particular way.

Is there a winner in any of them? Or will one arise that takes the most useful features of each of these?

Redefining the concept of organism

A while back, I stumbled upon an article by Freeman Dyson on Carl Woese. Woese is a longtime scientist who studies the origins of life and who revolutionized thinking around early life, microbiology, and phylogeny.

Freeman Dyson's article is a great overview of the discussions around pre- and post-Darwinian evolution*. Darwinian evolution is what we are used to, a standard "fight for survival" of non-interbreeding species that slowly evolve their fitness to the challenges in the environment.

What really flipped me was Dyson discussing pre-Darwinian evolution, an idea postulated by Carl Woese back in 2004 in an article titled "A New Biology for a New Century." The thought is that early life was a time of promiscuous gene swapping: as I see it, an organism was a sack of molecules and genes that worked together to propagate a collection of "features." Then, at some point (Woese suggests this happened at least three times), an organism stopped, found a good set of genes, and brought this lateral gene transfer down to a trickle. And there you have a "species."

The three times Woese mentions were the times that gave rise to Archaebacteria, Bacteria, and everyone else. This free mix and match with a sudden stop makes sense of why there are three large groups of cell structures, and yet why they are related in some way.

This idea really hit home for me when listening to Penny Chisholm, a microbiologist, talking about Prochlorococcus on Science Friday. This small cyanobacterium might be the most abundant photosynthetic organism, but Chisholm and colleagues only discovered it in the 80s.

What was interesting was her answer about different species of Prochlorococcus: she called them "genomic variants."

This ties back to what Woese was implying about lateral gene transfer and pre-Darwinian evolution and sacks of organisms with a collection of genes. If all organisms are in the possibility-space of all arrangements of genetic elements, then a particular strain of organism would be a peak of variation in that particular area of arrangement of genetic elements (but still part of a continuum of possibility-space).

Micro-organisms still do a lot of gene transfer (witness the spread of antibiotic resistance across species). But I suppose at some level they mix up everything and can have a large amount of variation across a single species. Hence Chisholm's observation that Prochlorococcus species are better viewed as variants than as distinct species. Promiscuous lateral gene transfer across Prochlorococcus "species" deflates the definition of species as non-interbreeding organisms.

As microbial biology has a renaissance due to the rise of synthetic biology (humans effecting lateral gene transfer in micro-organisms at a scale we haven't done before), understanding "speciation" in terms of "genetic variants" will go a long way in understanding how and what genes are to be used.

*Dyson also casts the way cultures transfer laterally as the post-Darwinian era.

Image from If you dream it…

Funny story about common chord progressions

I read this funny article about how the ‘Sensitive Female Chord Progression’ has infiltrated so many songs.

Link: Striking a chord – Boston Globe

Even though Beyoncé’s “If I Were a Boy” hit radio too late to be the
song of the summer, there’s still a case to be made that it’s the
perfect song to cap off the year. It’s not because of the empathetic
lyrics, or B’s heartrending, disappointed vocals. No, it has everything
to do with the four chords that underpin the song’s verse, circling
from yearning to triumph and back again, four chords that were
inescapable in 2008.

While some of the comments were snooty, there was a great link to the video below – a comedian talking about how Pachelbel’s Canon in D has hounded him his whole life.

Help NPR plan their social media activities for the US presidential inauguration

Cool. Via Perry Hewitt.

Link: NPR: Help NPR Plan Our Social Media Activities for the Inauguration:

Help NPR Plan Our Social Media Activities for the Inauguration

The presidential inauguration is less than a month away and the NPR social media desk is kicking it into high-gear to figure out how we can get all of you involved in our inauguration coverage. We're also looking for some techies who can help make it happen.