Google, wikipedia, amazon, data

This is a bit of a follow-up to some commercialisation thoughts, prompted by the following four points:

Google visitors hitting a fairly unknown, research-oriented, let's-try-something website of mine.
A Google search for some building terms (door etc.) that landed me on the Dutch Wikipedia website. I've looked at Wikipedia before and I'm impressed by it. Just browse it for a bit (try the English version) and keep in mind that all this has been made by volunteers. And you can correct errors or add new pages yourself! Yes, you.
James Tauber about Amazon recommendations and self-hosting: he likes the power of Amazon's proprietary database and the good book recommendations they manage to wring from your and others' buying decisions and book reviews. Great. But: "But as you know, I'm also interested in hosting my own data (see aggregation versus hosting). I'm always on the lookout for ways I can take back my data, host it on jtauber.com and provide it to aggregators rather than have to host it with them."
James links to DataLibre. I recommend reading their "DataLibre Mission - Own Your Data, Write Once - Read Everywhere". Their preference is to have as much data as possible accessible "in the wild" instead of in company-internal databases: "While we don't wish these companies any harm we certainly don't feel they should be holding all the cards when it comes to the semantic web."

I sympathise with the idea of hosting your own data. Of course, for a lot of companies it'll be much handier to have some website do it for them - for a small fee. The internet makes this technically feasible; what is left is a bit of structuring and a bit of organisation. So that's three things: own your data, structuring and organisation.

Own your data. Take a manufacturer in the construction industry as an example. You've got your own marketing materials and your own catalog, perhaps even a website. But "everybody uses" the local specification system, so you pay to be included. In their closed database, in their format. And "everybody uses" that big publishing outfit's "construction documentation". So you pay them to include you. And you pay for inclusion in a few portals. All in their format and in their closed database. That is a steaming pile of money, burning a hole right through your quarterly figures.

I can't help but make the comparison with scientific journals. Scientists write the articles (which costs money), scientists referee the articles (which costs money), scientists buy the journals with articles (which costs money). The publishers of the journals do a useful job, but what they pay (approximately zero) is way out of proportion with what they do (the scientists themselves do most of the work) and what they get paid (I don't want to know).

Back to the construction industry: paying money for advertisements and a bit of promotion and a bit of professional binding etc. of catalogs: that's OK. But the amount of effort and money needed to get your data that you own into somebody else's closed database...

Structuring. Let's assume we want to make our data available for our customers and our colleagues and the "aggregators" (catalog overviews, specification systems, etc.). Just chucking that full-color PDF catalog on the net won't cut it. The others are not going to type that over... So we need some structuring. This structuring does not need to end in a big standardisation swamp, as I describe halfway through The Internet operating system - for the building industry:
Let's start with getting the individual acts together. Make the info in the drawing system available over the Internet in some documented format. Likewise for the specification system. Likewise for the accounting system. Likewise... Put behind passwords what should be kept within the project.

Figure out your own simple data format if you have to, document it and notify your partners, customers, whatever. Probably it'll be handier to have somebody else do the hard work and use his webserver to put your data online. But make sure you really own the data and that you can get it back :-) (A system like del.icio.us allows you to download a full dump of your data, which is the most trustworthy of promises that you own your own data).
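To make "figure out your own simple data format" a bit more tangible, here is a minimal sketch in Python of what such a format and its full dump could look like. Everything in it is an assumption made up for illustration: the format name `simple-catalog/0.1`, the field names, the example manufacturer and the functions `dump_catalog` and `load_catalog` are hypothetical, not an existing standard or anybody's real system.

```python
import json

# Hypothetical format identifier; documenting it is what makes the data reusable.
FORMAT = "simple-catalog/0.1"


def dump_catalog(owner, products):
    """Serialise the complete catalog as one JSON document (the 'full dump')."""
    document = {
        "format": FORMAT,
        "owner": owner,
        "products": products,
    }
    return json.dumps(document, indent=2, sort_keys=True)


def load_catalog(text):
    """Read a dump back in, refusing formats we do not know."""
    document = json.loads(text)
    if document.get("format") != FORMAT:
        raise ValueError("unknown catalog format: %r" % document.get("format"))
    return document["owner"], document["products"]


# Made-up example data for an imaginary door manufacturer.
products = [
    {"id": "door-001", "name": "Interior door", "width_mm": 930, "height_mm": 2315},
    {"id": "door-002", "name": "Fire door", "width_mm": 930, "height_mm": 2315,
     "fire_rating_min": 30},
]

dump = dump_catalog("Example Doors Ltd", products)
owner, restored = load_catalog(dump)

# If the dump really contains everything, reading it back loses nothing.
assert restored == products
```

The round trip at the end is the point: whoever hosts the data for you should be able to hand back a dump from which you can rebuild everything yourself.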
Organisation. "How is this going to work: that's impossible without a big commercial entity behind it?!?". Well, the internet works quite all right without a single commercial entity, thank you. I honestly don't expect the construction industry to come up with the one single standard to end all standards. The industry is very diverse and has no really big companies that can force the rest along.
The alternative? Local initiatives, small initiatives, cooperation, some research results, etc. Wikipedia works surprisingly well, so...

To finish up, some thoughts from my side with a potential startup in the back of my mind... The fact that I like the "own your data" idea almost makes it impossible for me to want to hoard other people's information. I'm not a grumpy dragon sitting on a hoard of information, ready to test the fire resistance and, if inadequate, the tastiness of everybody that ventures too close.

This is not bad news, though. Hoarding information means that I (were I to start that startup) would have to compete on the same terms as existing information providers. On their ground. OUCH. Better to do something on a terrain where they can't follow: making as much as possible of the information free.

Not that I'm planning to start a 40-man company. The smaller the better, if I'm going through with it. Which means the company has to travel light, without 30 metric tons of data chained to the leg.

Still can't really understand why I'm seriously thinking about a startup. I was scared stiff of it a year ago. No way I was going to do that. But it is good for polishing my research-thinkwork. And I'm slowly coming into the mood. There's serious money in this. I believe I've mentioned one or two business models halfway through this article... Hm.

And I kinda like posting these thoughts out here in the open. The market is big enough when it is open. It is only when you want to make a quick land-grab that you really have to be secretive. And that's not something in my planning.