31 October 2010

The project failure & cloud connection

Around 2000, I worked on a project with a co-worker – he was maybe fifty and had been in the IT industry, initially at IBM and then contracting and consulting, for nearly thirty years. At the time, he was heading up a web-development and creative team for a small consultancy firm, which he saw as a new opportunity. He had a good reputation, people liked him and he had a vast amount of experience from being involved in literally hundreds of projects – developing, managing, business development and so on.

One day he mentioned in conversation – as an aside really – that he had never worked on a project that was delivered. Every project he had worked on had been cancelled, merged into another programme, had failed to meet its financial goals (as with infrastructure consolidation) or his involvement had ceased before it ever went live (if any of them ever did – he didn't know). He said it in a resigned manner, almost as a badge of honour: he had got through all of that with no job satisfaction, in terms of delivery payoff, whatsoever.

I have seen this scenario repeated many times in the decade since, although perhaps none quite so extreme. I have myself worked on hundreds of projects in many roles and can count the number of successful systems deliveries on one hand. People naturally want to associate themselves with successes rather than failures, so it is perhaps understandable that this is not a topic consultants raise often.

A whitepaper released late last year attempts to put a figure on the worldwide cost of IT project failures. This turns out to be $6.2 trillion, and it doesn't look like sensationalism. The US in particular is apparently losing almost as much money per year to IT failure as it did to the financial meltdown (with no end in sight). The paper makes an attempt to factor in what it calls indirect costs: basically the lost opportunity cost of the time wasted on failed/abandoned projects. It does not, however, take into consideration the wider indirect costs of people training for careers that are not actually delivering, IT staff disillusionment (turnover), operational failures of delivered IT systems (one in five businesses lose £10,000/hour through systems downtime) and associated security failures.

The paper has received some condemnation (due to its base assumption of a 65% IT project failure rate) but there is a dearth of analysis in this area and quoting this figure is about as good as it gets right now. Figures of 50-80% have been quoted in one form or another for decades. CIO thinks rates are actually rising due to the recession. People have the choice of either working with a figure (challenging its assumptions/statistics etc.) or burying their heads in the sand.

We will never know the exact figure for IT project failure. Similarly, we will never know whether the efficiency/functional benefits of those systems that have worked have paid for the failures – i.e. what has IT really given us? We are simply reliant upon these systems going forward. What can we do to reduce future project failure rates?

Although there are superficial similarities between the scientific/experimental and IT/project communities, the "accept defeat" approach of the former is rooted in constant learning, whereas IT is surely about delivering benefit now. Learning is only the priority for the largest, most stable, lowest-turnover organisations; that is to say – next to none of them. The scientific regimen of independent assessment, however, is invaluable for IT projects. Tool-based PM consultants such as Bestoutcome are probably as good as the big management consultants for this purpose though. Techniques from the engineering community (when introduced to IT) have not had a huge effect.

The paper ends with a call to arms to simplify – IT/business communications, project goals etc.

I agree in principle, but this is an oversimplification when your realm of influence is the organisation you are in. I have seen many projects that, although conceptually simple and with genuine IT/business agreement, start to fail the moment integration with other organisations – vendors, hosting providers, recruiters, sub-contractors, data sources – is required. Despite solutions being simple and manageable inside your organisation, just a few touch-points with others (basically anything worth doing) make them complex and therefore unpredictable and at risk, no matter what your collective capabilities. There is even a case for blinkered simplification/procedures actually contributing to project failure: complexity at least brings an element of flexibility, allowing you to react if the project starts going bad.

Better project managers, SOA, ramped-up co-worker involvement, Facebook-like "hackathons", daily IT/business meetings, PRINCE certifications, extranets, more rigorous cost control and mathematical complexity models within your organisation will have minimal effect on the success of your multi-touch-point projects – for they are already in the realm of chaos. Even improved PM collaboration through tools such as Asana will have a minimal effect on success rates. The role of the "good project manager" is perhaps the most scrutinized, personality-driven, divisive and misunderstood of all IT positions. Radical open enterprise models such as BetterMeans, which effectively remove Project Managers in favour of automation and decentralized decision making, will similarly be ineffective. In the case of the open enterprise model, I do agree that it will ultimately prevail (crowd-sourcing, creativity enabling and ultimately efficiency/cost) but not for decades, because the failure rate needs to be attacked directly first.

Although there are certainly sizeable gains to be made if you are experienced in the particular technology, the IT project failure rate will only consistently and materially fall once there are flexible cloud services from which organisations can get 80% of everything they need simply by subscribing.

Integration then ceases to be the bottleneck. The other 20% is "secret sauce" value-add developed in-house; probably mainly algorithms that, by definition, do not require integration with other organisations or heavy project governance. The rest of that 20% would be device-specific exploitation code – essentially building the so-called App-Internet model (rather than full cloud). Organisations already recognise the economic justification for cloud computing, so it is perhaps inevitable that project failure rates will eventually fall by default.

Cloud transition will take organisations years yet, however. Both InfoWorld and Gartner have arrived at 2013 as broadly the point when the majority of organisations will run in the cloud. This may be optimistic. In the interim (three more years or so of high project failure rates?), delivery is simply better served by being built upon cloud solutions now – building partnerships with cloud providers where organisations can, and leveraging their buying power where they cannot. Also, interim/limited-functionality cloud solutions must be considered in preference to bespoke development/on-premises deployment. In a real sense, the project (that they would otherwise have completed themselves) becomes a creativity-driven architectural investment and commercial partnership instead. Both Project Management and Enterprise Architect roles will need to shift accordingly.

It would be interesting to see any future IT project failure analysis split between organisations that have implemented virtualisation, those that have implemented private cloud and those that have implemented public cloud. The project failure and cloud connection is not well documented.

27 October 2010

Why are we still talking about search?

Google is obviously pretty successful, due in large part to its search service. Despite Google not having the best social networking track record, search is an activity that maps well to the social world. We search for things – our keys, a present, a holiday, our next partner, a restaurant to take a friend to, our next role, ourselves. Google doesn't give us exactly what we want but that's OK, because we don't know exactly what we want either. It's useful to be given a few social curve-balls.

Despite us all mainly using Google or Bing (Yahoo, Ask etc. all gone now), new search engines e.g. Blekko still regularly surface with varying differentiations and significant funding.

What if, rather than vague searches (inevitably) generating prioritized yet potentially vague results (M:M in terms of question:answer), we could ask a specific question and get a generally accepted, precise result (or, if that question had recently been addressed, a link to it) that we – and others – could refer to later to justify our logic? What if we wanted an actual answer (1:1)?

Consider "Which is the largest State in the US?" The question is vague (by land or by population?) and Google is mainly matching keywords rather than actually understanding the question (and of course we get many results, so the full exchange is M:M). Google has actually been delivering factual answers to some queries for five years, but it only works for some (it doesn't work in this case). Regardless, Google's first result correctly tells us Alaska, and additionally gives us a comparison with other States and the square mileage too. We aren't guaranteed to get the same result when we query in future (so accountability goes out the window) but the answer is OK for most purposes. If we make the question "Which is the largest Republican State in the US?" we still get an answer, but it takes us a few minutes to manually cross-reference the links. If we ask "Which is the largest, happiest Republican State in the US?" we simply don't get a usable answer.

Someone living in Alaska is worried about perceived rising Russian immigration and wants to know whether it has risen in their adult lifetime. There are no obvious figures on the Web. They could tweet/email the Governor and, if enough others have asked broadly the same question, he might post a response on his blog (if he has one) once his research team has arrived at some defensible statistics for him to refer to (assuming they can find any).

This simple/current scenario has challenges and inconsistencies on both question and answer sides. Here are a few:

1) People have to feel so strongly about the issue that they have to find time to interact – search for any similar questions/answers, write their email, monitor the Governor’s twitter-stream/blog for an answer. They feel as though they don’t want to put themselves forward unless others feel the same. Conversely, enough people have to use similar phrasing of the question in order to kick-start the answer process.
2) The answer (if it is given) is not explicitly tied to the question. It could be made on his blog, a successive TV appearance, in a press release or personal speech.
3) There is no disambiguation or prioritization of questions – meaning that there is just so much noise that the Governor or his team simply don’t have the time to answer.
4) The answer could be presented in different ways – perhaps to achieve political ends or simply because they are underused by Government e.g. taking Russian immigration over a twenty year time-frame may show no year-on-year increase but taken over a two-year time-frame it may show a very different story.
5) There is little opportunity to to-and-fro with the question and answer thread – blending anecdotal evidence, redefining terms.
6) It's a naive voter with a question, asking a team of well-resourced researchers. Odds are there will be a soft-soap answer, as independent experts are not involved in the process to moderate.
7) Statistics could come from Federal or State government, commercial, charity or other bodies. There are also unlikely to be any checks and balances, e.g. does the US figure for people arriving match the Russian figure for people leaving?
8) Any presented figures will be difficult to understand (unless an intuitive and consistent graphical approach is used).
9) Lawyers may need to be consulted to define the term “immigrant”.
10) There is no accepted “end” to either the question or answer.

In business, we need to commit resources to a business plan, to indemnify ourselves against litigious customers, to make our point in meetings and to provide defensible input to decision making. Consumers largely search for things (M:M) while companies have to query things (1:1), although, as with social networking, there's some overlap (M:1) in the middle. Companies lose out as they maintain a blinkered view, missing critical information not captured through their own processes. Consumers lose out since they spend so much time searching for things (and the Internet is so rich) that, as with travelling, the journey becomes the destination.

Google Search Appliance has done very well in the enterprise. There is a need for a one-ring-to-rule-them-all "get things" function. Our work and home lives are also merging, and when the answer you present is (close enough to be) unanimously accepted and can be referenced in any future analysis of your decision, it will always stand. Once you don't have to think about the logistics of getting answers, you can concentrate on actually doing things with them (for both consumers and the enterprise). To do this, quite simply, we need a single answer – a 1:1 model that works for everyone, like search: a Q&A service.

We have "I'm Feeling Lucky" of course (M:1) but this is an indulgent lucky dip that actually loses Google revenue. The other way around (1:M) doesn't work either, since it entails definitively stating the question, e.g. using something like QBE or NLQ (both of which are hard to do right), but still getting a range of results. What other services are currently available? Facebook released Facebook Questions earlier this year. It is slated to be integrated into Community Pages (which already include Wikipedia content); it is developing but is still very social/free-text. Wikipedia is an encyclopaedia rather than a Q&A service. Yahoo Answers has more of a wiki/free-text approach; it is probably the most widely used but is also known for being random and open to abuse. PeerPong is expert-focussed. Off-topic a little, for product reviews there are Hunch, DooYoo, Epinions and Mahalo (whose Answers service is a bit wider focussed than just products, but still very social). There are mobile Q&A services such as Mosio and ChaCha. Quora (founded by ex-Facebook CTO Adam D'Angelo) appears to stack up against Facebook Questions and again is very social. Social Q&A in general is big. StackExchange is a platform for developing your own social Q&A service. In the enterprise, Opzi is attempting to be a corporate Quora. MSFT shut down their offering last year. Qhub runs a niche hosted Q&A service. Some social ones such as Hunch have repositioned themselves as recommendation services. Many just cater to niches, e.g. StackOverflow for development. OSQA is a free open-source one. Several dot-com bubble casualties have refocused on discussion-based Q&A, e.g. Ask has refocused on a mobile Q&A service, as has Startups.com. There are other start-ups.

All of these services are in some way niche, most require an account (which will put off many) and most provide multiple answers (1:M), although collectively they will probably end reliance on reference libraries (and, on the product side, magazines like "Which?"). They helpfully hand you a piece of the jigsaw rather than solving the jigsaw for you. As has previously been said, "None of these sites are Google-killers. In fact they make Google stronger because the questions and answers often will be indexed - extending Google's reach into the tail". Quora in particular has recently been heralded as a killer Q&A service, but asking it "What are the most effective ways to engage news audiences?" receives a bunch of people's opinions; whereas we could have a single list, in order of effectiveness, produced from actual operational data and curated by either a journalist or a statistician. If there were any dissension around the term "effectiveness" then that could be hammered out through social interaction. There wouldn't be a need for a discussion. As with all Quora questions, you need to read all responses to get a full answer, and even then it's just the subset of people that responded. Another example – is RockMelt better than Flock?

If the industry is framing Web 2.0 as the Social web and Web 3.0 as the Semantic web, it seems churlish not to leverage the powers of both in the Q&A service. The benefits of great data integration using common terms (Semantic) and crowd-sourcing (Social) in a Q&A service are obvious. By the same token, some answers will always require a (semi-)professional element to them (Expert), e.g. someone asking a question about Alaska's closeness to Russia (surprisingly, 2.5 miles) might get a quick answer, but it will take an expert to plan how to get from one to the other through that particular route. Expert knowledge also comes into play on the presentation side – knowing which facts to present to most effectively answer questions, e.g. the largest State question above, given to another search engine, also gives Alaska but with a significantly smaller square mileage. A Geographer knows this is because the second result does not include water. There are other components but, in the main, both questions and answers have social, semantic and expert elements.

Back to the Alaska question (“Which is the largest, happiest Republican State in the US?”). This requires both a semantic element (to determine the largest State) and a social element (who is happy?). If we make this “Which is the largest, happiest Republican State in the US liable to switch to Democratic?” then there is an expert element too.
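
To make the semantic element a little more concrete, here is a minimal sketch using Jena's ARQ API. Everything specific in it is an assumption made for illustration – the ex: vocabulary, the class/property names and the endpoint URL are made up – but it shows how, once data is indexed against common terms, "Which is the largest State in the US?" collapses to a precise, repeatable query rather than a keyword match:

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class LargestState {
    public static void main(String[] args) {
        // The ex: vocabulary and the endpoint URL are made-up placeholders, not real terms.
        String query =
            "PREFIX ex: <http://example.org/terms#>  \n" +
            "SELECT ?state ?area WHERE {             \n" +
            "  ?state a ex:USState ;                 \n" +
            "         ex:totalArea ?area .           \n" + // land plus water; a land-only property would give the smaller figure
            "} ORDER BY DESC(?area) LIMIT 1";

        // Jena ARQ sends the query to a (hypothetical) SPARQL endpoint.
        QueryExecution qe = QueryExecutionFactory.sparqlService("http://example.org/sparql", query);
        try {
            ResultSet results = qe.execSelect();
            if (results.hasNext()) {
                QuerySolution row = results.nextSolution();
                // One precise, repeatable answer that can be cited later.
                System.out.println(row.getResource("state") + " : " + row.getLiteral("area"));
            }
        } finally {
            qe.close();
        }
    }
}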

Onto the proposed Q&A service - it needs to check a few boxes right off the bat:

1) Social. It would have to be based upon a popular, real-time social system. Despite perceived issues with investing in their ecosystem, Twitter is arguably better than Facebook (or any other) for this purpose, but it also needs to amplify and expand the reach of questions: presenting persistent questions that have undergone a process of disambiguation (both automated, using semantic algorithms, and manual, through a public process of "backing" existing questions). Questions also need to be prioritized based on the number of backers. Something like Kommons, Replyz or maybe Formspring or Open Media could work here. Amplify is maybe too discussion-focussed. The focus here is on directly (directly to the tweeter) quantifying the importance of a question and the validity of an answer. Social media monitoring tools like Crimson Hexagon could offer supporting services, but they typically rely upon sentiment analysis whereas we really need direct engagement through voting or other mechanisms.
2) Semantic. We all need to be sure, at the very least, that we are talking about the same data. In short, it will need an index. Google have Rich Snippet functionality but this is not an approach adopted by most site owners. Google also acquired Metaweb earlier this year for this purpose, but something more open like Sindice would really work, providing organisations have an incentive to cooperate (integration/ontologies/micro-formats etc.). Providing and articulating that incentive is the toughest piece of this by far. Then we want an interface on the results (produced by the index) that allows us to de-select terms we are not interested in – so that the system knows for next time – but also to publish the results of our tinkering and link them to the threads in the question or answer. Something like Sig.ma would do this.
3) Expert. We need data sets that are both selected and curated by subject matter experts. Curated answers can often be the most useful. Google acquired Aardvark earlier this year for this purpose but something like Wolfram Alpha or Qwiki would work. Involving Wolfram Alpha would add other benefits too – computational power (through Mathematica) to aggregate on-the-fly, leading NLQ support to translate questions, updatable widgets to support the presentation side and the semi-celebrity name of Stephen Wolfram attracting academic curators. All Wolfram Alpha content would need to be indexed and we would probably need to extend the publication side to be able to deselect returned facts as required (doesn’t appear to currently be available).

We are talking about potentially lots of social and semantic data needing to be agreed upon and presented. We'll need simple data visualization with a basic level of interactivity so that, with the example above, the user can slide the time scale to represent their adult lifetime. Something like Many Eyes, Hohli, StatPlanet or an improved Google Charts should do here. Also, we will want to store all our questions and answers forever so that we can reference them; a very large EMC-like network storage partner will be required.

The resulting composite solution would be the most advanced Q&A service in the world. Q&A is so important and multi-faceted that it needs multiple solutions to work together. Its influence, if accepted, would be profound.

The service would drive out real answers (reference and topical) through a combination of social, expert and semantic approaches, and would actually improve itself referentially – bringing social attention to the data would force organisations to improve that data. Usability-wise, it wouldn't hamper ad-hoc users with a scary QBE-type interface or blind them with spreadsheets of data. Conversely, it wouldn't be so simple that it misses the point of what people want. It would be a natural extension to tweet activity (retweet, favourite, reply etc.). Opportunities for monetization would be rife: not least advertising, with the solution's wider collaborative user base, increased opportunities for interactivity and greater relevancy. New forms of credibility and relevancy would also become available. Journalism would shift to sourcing stories from datasets.

The traditional search space e.g. "Alaska Jobs" is already being eroded by location-based services. Google is certainly well placed to build a killer Q&A service as is Facebook with its obvious social advantages and semantic/graph plans. Google is confusing though - on the one hand, they have the right investments and infrastructure but on the other they still appear search/data/click driven - even imperiously so ("...designers...need to learn how to adapt their intuition"). They also have issues with innovation.

The big search players generally recognise that current search solutions do not meet the needs of the user, but they don't have obvious solutions. There is an opportunity for innovation and for weaning people off this style of information seeking. There are smaller players that, if they cooperated now, could produce a more potent, compelling, flexible and lasting solution than any of them could alone. Their biggest problem would be being creative about incentives for organisational involvement on the semantic side. Let's hurry though – it's hard going searching for things. We've got work to do now. We have questions.

18 August 2010

Statistical thinking & the ability to read and write

Statistics are hugely underused, and even when they are used, they are often used improperly. Much of the bolstering of weak arguments, miscommunication and ad-hoc ideology creation performed by individuals and organisations from the industrial revolution onwards is down to poor statistical use.

Entire industries are devoted to generating statistics, duplicating monstrous amounts of effort, and when they are referred to – e.g. in sales pitches, organisational reports, building/product tolerances, war crimes tribunals, political and environmental manifestos, news articles and business plans – they are so heavily caveated as to be nearly meaningless. This is in contrast to the relatively deterministic, transparent and auditable approach organisations use to produce BI to support their own business decisions. Coupling this BI with governmental statistics is often necessary for the best decision-making support, so commercial BI is itself hamstrung by poor public statistics.

There are two key reasons for this:

1) Statistics are hard to find. If you are looking for, for example, the number of people that currently work in London (a simple enough request that many service organisations would need to be aware of), you will find this close to impossible. The UK Office for National Statistics (ONS) does not have it on their main site. Nor does the Greater London Authority or the newly released UK government linked data site. After you have wasted maybe twenty minutes, you will be reduced to searching for "how many people work in London" and then trawling through answers others have given when the same question has been asked before. You will receive answers, but many will be from small organisations or individuals that do not quote their sources. In the worst case, you may not even find these – instead using an unofficial figure for the whole of the UK which you have had to factor down to make sense just for London. If you search hard enough you will find what you are looking for on an ONS micro-site (at a completely different URL from the ONS) but the data is over five years old.
2) Statistics have a poor image. The blame for this may in part be attributed to the famous quote attributed to Disraeli – "Lies, damned lies and statistics" – which set generations of professionals into thinking they were akin to a practical yet modish Victorian politician by disregarding statistics and cocking a snook at the establishment in favour of their own experience. Showing you are practical with maverick tendencies, while (by overtly disregarding information that may cast doubt on your decision-making) shoring yourself up against failure, are powerful incentives. By contrast, other famous statistical quotes have been forgotten ("Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write" – H.G. Wells). Short-sighted governmental data integration, hugely delayed/over-budget government data-centric projects such as the UK National Health Service's Records System, confusion over key statistics (e.g. the number of asylum seekers) and high-profile data losses haven't helped matters since.

There is also a bit of a myth that statistical interpretation is an art, that the general public can be confused – even sent entirely the wrong message – by engaging with statistics, and that only statisticians can work with this data. Certainly there is that side to statistical analysis (basically anything involving probability, subsets, distributions and meta-statistics) but, for the most part, both the general public and organisations are crying out for basic statistical information (the answer to one question, without qualifiers such as where/if) that is quite simply: on a web-site (we can all just about manage that now, thanks), produced or sponsored by the Government (we need a basic trust level), with a creation date (we need to know if it's old). If statistics are estimates, we need to know that, and any proportions need to indicate the sample size – we are now sophisticated enough to know that "8/10 owners said their cats preferred it" carries less weight when the dataset is ten cats rather than 10K cats (though arguably we don't even need the proportion, since we're capable of working that out ourselves).

We don't want graphs, since the scale can be manipulated. We don't want averages, since they are similarly open to abuse (mean, median or mode?). If we make a mistake and relate subsets incorrectly then the people we are communicating with may identify this, and that in itself becomes part of the informational mix (perhaps we were ill-prepared and they should treat everything else we say with care). We don't actually need sophisticated Natural Language Processing (NLP), BI or Semantic Web techniques to do this. It would be nice if it were linked data, but concentrate on sourcing it first. We really are not that bothered about accuracy either (since it's unlikely we are budgeting or running up accounts on governmental statistics).

Mostly we are making decisions on this information and we are happy rounding to the nearest ten percent. Are we against further immigration? Is there enough footfall traffic to open a flower shop? Do renters prefer furnished or unfurnished properties in London? Which party has the record for the least taxation? What are the major industries for a given area? We just need all the governmental data to be gathered and kept current (on at least a yearly basis) on one site with a moderately well thought-out Query By Example (QBE)-based interface. That’s it.

Reading and writing have been fundamental human rights in developed countries for decades. Broadband Internet access is fast becoming one too. Surely we need to see access to consistent, underwritten government statistics in this vein as well. Where other political parties dispute the figures, they should be able to launch an inquiry into them – though too many inquiries will themselves become a statistic, open to interpretation. It is absolutely in the interest of organisations and of individuals who follow current affairs.

06 August 2010

Say hello, wave goodbye

Google Wave is/was an interesting product. It is nothing less than an attempt to oust email as our primary communications medium, and therein lies its story. It is basically Instant Messaging with two additional functions: the ability to automate a response so that the user doesn't really see a difference between a human and a program (or Robot), and support for mini-applications (or Gadgets) in a similar way to iGoogle and Facebook, blurring the lines between conversations and documents. Each individual exchange (or Wave) is logged and can be added to at any time. It is better than email because it is real-time and richer exchanges can be made. It is worse than email because using this functionality is confusing for all but the tech-savvy (having multiple Robots, Gadgets and humans all involved in the same Wave is rife with issues of ownership, progression and timing). Also, the success of email is due to its ubiquity – everyone can use it. Hardly anyone knows what a Wave is or what to do with it.

It was unceremoniously released to the general public a couple of months ago (although some developers have been using it by invitation since its tech-celebrated debut last summer) and pronounced dead by Google this week.

In Google's own words - "Wave has not seen the user adoption we would have liked". After less than fifty working days? For such a new and cart-upsetting product? With hardly any permeation of Wave concepts to the general public, next to no marketing and no specific commercial targeting? Of course! It was inevitable that this would be the outcome if an organisation were to review a key product launch after so short a time frame with so little support. This has to fall in the space of Google testing out new concepts/obtaining user feedback, with little fanfare (and so little possibility of failure) with a view to releasing the (inevitable) email replacement again at some point in the future.

Rich real-time communication supported by an easy, interactive interface to a computer (rather than a simple search dialogue) is most definitely the way forward. At the very least, collaborative programming environments benefit from this type of interface, as do trading and social networking proper. Google Wave's federation protocol is built on the Extensible Messaging and Presence Protocol (XMPP), which enables efficient near-real-time communication between servers. These needs are not going away and neither will the constituent parts of Google Wave (in some form).

Wave Robots are perhaps the most interesting component right now. In operation, they are a bit like a Turing Test or a Twitter-bot in that they facilitate a conversational and collaborative approach to establishing (and getting) what you want. Moving past the simple search/response model we have now, it is inevitable that there will need to be some interaction, some toing-and-froing and narrowing-down, for something to understand (unambiguously) what it is that you actually want. It happens in real life and so probably needs to be modelled in virtual life.

Wave Robots are coded relatively straightforwardly in Java using a Wave API. This element of Wave remains significantly undersold. As an example, I developed one to allow collaborative SPARQL queries to be made against any open linked data. A few working SPARQL queries have been uploaded to give you an idea.

Queries can be collaboratively changed in real time and results can be sent out to a named Google Docs account as a spreadsheet. Once in Google Docs, there are several charting options available to make the data more accessible. It's like a hugely more powerful Google search (for those of a technical persuasion). You can add other Robots to the Wave to allow syntax highlighting of the code, e.g. Kasyntaxy (although at the moment this doesn't seem to specifically support SPARQL). As it's all Robot-based, it's server-side, so you can use it on your mobile device – whatever that may be. Just add querytheweb@appspot.com to your Wave contacts to use it. Type "cycle" to switch between one of two endpoints. An endpoint is basically a SPARQL query engine. The two used are both generic, meaning that they are not tied to a particular data set; your queries will therefore need to use the FROM clause to identify which data you are querying (by URI). Type "help" to get a list of other options.
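
For the technically curious, the query-execution step at the heart of a Robot like this is little more than a SPARQL request over HTTP (the SPARQL protocol's standard query parameter). The sketch below is not the Robot's actual code – the Wave plumbing, the Google Docs hand-off and the endpoint URL are omitted or assumed – but it shows the general shape of that step using only the standard Java library:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class SparqlFetch {
    // Placeholder for whichever generic endpoint the Robot has "cycled" to.
    private static final String ENDPOINT = "http://example.org/sparql";

    /** Sends a SPARQL query (which must carry its own FROM clause) and returns the raw results. */
    public static String run(String sparql) throws Exception {
        String url = ENDPOINT + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        // Ask for the standard SPARQL XML results format; the Robot reshapes this
        // before appending it to the Wave or handing it on to Google Docs.
        conn.setRequestProperty("Accept", "application/sparql-results+xml");
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        StringBuilder out = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            out.append(line).append('\n');
        }
        in.close();
        return out.toString();
    }
}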

This Wave Robot took just a day to develop in Java and was deployed to Google App Engine. It is basic but even as it stands, it is still probably the best way to access and present open linked data currently available.

However long Google maintains/runs/supports Wave, its constituent parts will, at some point, be mainstream. The demise of Google Wave is not the demise of the email replacement concept. Get ahead of the game and develop some Wave Robots now. Get used to the concepts and the working environment Google has provided until the end of this year. Your work will be able to re-surface in some new product next year.

UPDATE: Interesting Scoble commentary on Google Wave ending here.

03 August 2010

Starbucks, here’s how you get into the outsourcing business

Hello Starbucks. You have made an enormous success of the past fifteen years and have become an integral part of the 21st century's global cultural fabric. You have a store on the Great Wall of China, you introduce new words to us like Yirgacheffe, and you are a bit like Viagra and The Simpsons (you aren't technically the best but you're easier to find and we have so much fun with you that we don't care). We'd come to you eight days a week if we could. You are to be applauded.

You have reached somewhat of an impasse though. You aren’t growing much; many of your stores are busy only at lunchtime and your brand doesn’t make us think – “that’s progressive!” anymore. Free Wifi/Foursquare deals, exclusive album sales and instant coffee will not get you those 40,000 stores you wanted a few years back. Your mantra of “A Starbucks on every corner” remains a good one. We know it’s tough out there but you just need to stick with your plan; maybe be a little bolder. Here’s what you do:

1) Recognise that you have to diversify. Your huge rent bill will surely eventually cripple you. You either need to dramatically cut costs (how? [given your locations]), increase demand for coffee (how? [given everyone drinks it anyway]) or expand into new markets.
a. In 2008, the FM market was approximately $846BN, with approximately half ($426BN) apportioned to internal services meaning that the outsourced FM market in 2008 was worth around $420BN. It is surprisingly difficult to obtain free global branded coffee shop market sizes but for the UK at least, this is a $2.5BN market (2009). Let’s assume the UK is 5% of the global market (pretty standard), leaving a total market of $50BN. This means the FM market is roughly ten-times the size of the branded coffee-shop market.
2) Build more stores. Very roughly and taking the UK as a case in point; you have 750 stores and there are 30M people employed in the UK. With a few (contentious) assumptions, if you increased the number of stores tenfold (7,500), each store would need to accommodate just 200 people (1.5M/7,500). Obviously, it would need to take place over some years. A burgeoning senior citizen population, increased contract working and home-working will reduce the market, making the figure more manageable longer term.
a. Assume half actually work in an office (the rest being retail staff, hospital workers, lumberjacks, machinists in plants, plumbers, nurses etc.). This takes the potential market down to 15M.
b. Assume half work for big name organisations that will want to maintain their own premises (taking it down to 7M - basically the SME market).
c. Assume half of those actually work in an office at any given time (the rest visiting clients, training, sick/vacation, travelling, WFH etc.) taking it down to 3M).
d. Assume you lose half the remaining people to other coffee-shops as the market is quite fragmented at the lower-end (taking it down to 1.5M).
3) Use your Starbucks card. At the moment this is used as a store card (arguably faster than paying otherwise). Put an RFID chip in it and use it to track people's employing organisations and access times, and automatically bill the organisations accordingly. You can hugely undercut existing FM services if you open up this new revenue stream. You also expand your coffee market.
4) Build meeting rooms. Organisations need secure ad-hoc meeting rooms (HR, competitive, strategic discussions etc.). Let’s assume all new stores have one. These would need to be empty by default i.e. not having coffee drinkers in them and controlled by an online booking system. Let’s also put sophisticated video conferencing facilities in each one. Of course meetings are going to overrun and the people outside waiting for the next slot are going to have to either play nice/assertively claim their room but this happens in offices already. You might want to partner with others for larger, scheduled meetings.
5) Deploy IT Infrastructure. Cyber-cafes may be on the wane as cheap mobile devices rise but you would need to pop Internet terminals in your stores to mop up those without laptops at any given point. The shift to cloud-based computing means organisations won’t need development/file/application servers because they won’t have IT departments. Each store is also going to need a couple of wireless printer/scanner/copiers.
6) Go stealth. To avoid monstrously over-selling your brand, you are going to need to expand your stealth experiments on a wider scale. Focus individual stores on the areas they are in (creative/business/education etc.). Maybe change the decor to fit in locally – murals on the walls, say. It may be healthy to engender some competition between stores. There would clearly need to be more variety in (interior and exterior) store design.
7) Forget the Baristas. Everyone knows this isn't a skilled job. Stop pretending it is. It's not like they spend years learning the correct Frappuccino to pair with the chocolate Starbucks coin customers eat. They're a bit like your Starbucks cards – over-engineered. Do give them training, but make it basic IT services in addition to working the coffee machine. They'll need to know how to reboot the router, connect to it and to any of the various wireless devices you have in your store from most portable devices, reset passwords, create accounts and escalate issues – that sort of thing. Ultimately, they'll thank you for it. Future employers will place much more emphasis on IT service skills.
8) Culture shift slightly.
a. "Third-place". This internal marketing needs to go. Yes, there's a place for a safe haven, a "third place" – that place outside of work and home where you know you will be greeted with a smile and some respect. This is more than a coffee shop though, and it is now a hackneyed term anyway: it was used at the PlayStation 2 launch and is employed by countless gyms over the world. Is it really harder to create another market than to get a good chunk of one (or both) of the existing ones?
b. Seat-saving. This needs to go. Someone cannot come in, sit down on one of your sofas and then "save" the seats around them, dissuading potential users because their "friends are coming". This prevents people from using stores for more than a quick coffee, i.e. to work. Hot-desks are essential. It has to be first-come-first-served. Subtle advertising cues should be able to make seat-saving culturally frowned upon so it ceases to be an inhibitor.
c. Table Service. Your service isn’t great at lunchtimes. Queues can be large. People on laptops are dissuaded from leaving their laptop but they still want a coffee. Your new Trenta sizes may address this issue (slightly) but your smaller competitors offer table service for the same price.
d. Enhance security. You cannot have hoodies/Hells Angels/gypsies/beggars etc. associating with Senior Executives (can you?). You are likely going to need a security guard in most stores to gently dissuade them. Can’t they all do double-duty as Baristas too though? Security Barista? IT Barista? Table-service Barista? They can be more Pokémon than Borg.
e. Get out of food. You are not known for your food. Stay with chilled things that go well with hot coffee e.g. muffins, cakes, chocolates, biscotti etc. The hot breakfast sandwiches, wraps and salads all need to go. They take too long, are odorous, other brands do them better, people don’t want them in their office and will also want a break from you (their workplace) to go get them anyway. Get a food partner if you must and link it to your Starbucks card. We can work it out.

You have the cultural and economic reach to become our workplace. This isn't something you can do quickly; it's a goal for the next fifteen years. You can choose to move up from being an escape to being a destination. That journey will mean taking a leap and recognising that you're big enough right now, and that you'll have missed service elements along the way (but others will fill in and contribute to the new eco-system). It may also mean you concentrate on the back-office, lose a bit of your élan, put your brand on the back-burner and cancel that order for corporation T-shirts.

A little like those faceless East India-type holding companies that keep going for hundreds of years. That's OK though. You have certainly let your face grow long of late but, to paraphrase The Beatles further: you are the coffee man. They are the coffee men. You are the water-cooler.

Why you should outsource your office to Starbucks

Office rules have relaxed a lot over the last ten years. Many office workers now work from home regularly and, when we are in the office, we are all comfortable having water-cooler discussions in a coffee-shop and taking our laptops in to work. Even Government recognises the benefit. The days of rigorously putting in a nine-to-five every day in a shirt and tie are all but gone. Why not go further and do most or all of our work there, dispensing with the need for our current physical offices? Yes, it's a little out-there, but idly run with it for a while.

Yes, we are talking facilities management (FM) but – perhaps a narrower definition of it with all value-add services being done by partners. Could it actually be done? There is a definite market for professional, ad-hoc and casual working environments e.g. The Hub.

What would the benefits be?

1) Cost savings. FM has become an important industry/profession, responsible for approximately 5% of GDP in the most developed countries. After HR, FM is typically an organisation's greatest expenditure, at around 20% of total organisational spend. Significantly reducing this figure would allow smaller organisations to compete and would stimulate growth.
2) High street utilization. High streets (or Main streets in the US) have lots of empty shop fronts. They could be re-commissioned (as Starbucks stores), bringing much-welcomed new life and trade opportunities. This would add to our existing spaces – our homes (city/suburban), our work (downtown/business park) and the mall – while maintaining a space distinct from these: a common (the high street/main street). It's ultimately about variety, possibilities, culture and escape.
3) Cultural cross-pollination. Organisations generally benefit from finding out about other organisations' ideas/challenges. Some will consider this a drawback since they fear dilution of the organisational "special sauce". But what is this really? People, process and IP – assets that aren't going anywhere. It's just the physical environment shifting.
4) Flexibility. Working in the same office is dull. Work in whichever Starbucks-office you like.

What would drawbacks be?

1) Noise. In the long-term, once Starbucks is recognised as more than a coffee shop, people will act differently there and noise will become no more of an issue than it currently is in offices. Headsets will help in the short-term.
2) Loss of status and image. If you are Swiss Re and you have spent $1BN on your gherkin building, you care about prestige, internal branding and providing a great environment for your workers. If there is a great environment elsewhere though – for free – are prestige and internal branding worth it? They are for the big-name organisations, the multi-nationals. For everyone else – no.
3) No physical storage. People keep things (coats, umbrellas etc.) at their place of work; they will need to keep them elsewhere. A small number of lockers could be made available. HR and accounting would need to digitize all physical files. Is this a real issue? It shouldn't be. For every filing cabinet, there's a good reason why its contents should be in the cloud.
4) Team accommodation. If you are working solo or there are just a few of you then you can usually find seats together. There would be a problem accommodating project-based teams (3-10 people) in this way. This on-demand physical accommodation of teams is the biggest drawback to office-Starbucks. There would need to be a responsive real-time system capable of identifying empty seats together and placing a reservation on them.

Next up there will be an open letter to Starbucks asking them to consider our audacious plan.

13 June 2010

Semantic Web - Part 2 (Where is it?)

The enterprise in general has barely given these technologies a second thought to date and the consumer has little idea about them (short of a vague idea that Web 3.0 will make the Web more intelligent). The case for storing more data in a graph/RDF format remains disputed. MSFT, for example (as a populist bridge between the two), could be said to have a less-than-enthusiastic approach (all the main APIs are in Java, with .NET versions managed only by enthusiasts). Few MSFT products use RDF internally (Media Management does). None (including SQL Server) use OWL/SPARQL. Google have their recent Rich Snippets initiative, but their Chart API currently only works with spreadsheets (rather than RDF). Facebook are actively pursuing the graph format – it's even on a secret logo inside Mark Zuckerberg's hoodie. Twitter have recently announced annotations – a way to add meta-data to tweets (which could be used semantically in future). Some emerging sites use RDF, e.g. Glue, Drupal and TripIt, but there are no killer apps yet.

The world awaits an application that inherently makes a tool of the Semantic Web. This will likely be focussed around disambiguation since wholesale data integration is a tougher nut to crack.

Reasons follow (descending order of importance to-date):

1) Openness. There are few clear reasons for the enterprise (the people who manage the vast majority of data) to be more open with data. Especially their raw data (as opposed to their massaged/reporting data). Government has a remit of transparency so they have more data in this format.
2) Federation. Of business processes. There are a host of facts (and rule-based logic linking them together) required to make the above scenario (and anything like it) function, all working over several different organisations with different remits, each taking a revenue share. Building federated applications using other people's data is also rife with issues of SLAs and legalities.
3) Performance. Storing data as facts (triple or graph) results, in most cases, in a poor-performing back-end. Relational databases are widely recognised as the most efficient option for most scenarios and are therefore what most organisations use. Tests indicate triple queries are on average 2-20 times slower than an equivalent relational query (see the sketch after this list for a feel of why). This alone instantly rules out OWL/RDF and SPARQL for widespread use in the enterprise. There are also huge uncertainties over how distributed SPARQL queries and reasoners will work in practice – at Internet scale.
4) Ambiguity. Many believe that the real world is just too complex to be mapped:
a. It is just not able to be boxed and named in terms of categorisation and relationship (or ontology as it is called). This is essentially the same as the schema or Entity Relationship Diagram (ERD) for relational databases.
b. Linking ontologies together accurately (differing owners, world-views, drivers) is impracticable at Internet scale. Related to this is the almost philosophic issue around the use of the URI to identify a resource. It is hard to give a crisp definition of what 'representation' (of a resource) means in order to justify an assertion e.g. an image of Texas does not 'represent' Texas.
c. The recombination of facts necessary for inference is too simplistic, e.g. adding drunks (subject) frequent (predicate) bars (object) to our scenario might allow our agent to infer that the CEO is a drunk. This may or may not be true, but given the known facts it is a rash statement to make (especially if you are considering networking with him). You might not even know that the agent considers the CEO to be a drunk (it will just be one of the many factors it uses to suggest actions for you), which makes the situation much worse, since bad decisions are then difficult to debug/improve.
5) Validation. Graph data is less rigidly structured than a relational format, which makes validating data integrity challenging. Data quality is a huge issue for the enterprise: many CIOs will look to the ceiling when you ask them about their data quality, and meta-data (which the Semantic Web needs in order to function) takes a back seat in priority. Existing relational databases have much better tools and processes.
6) Semantics. The Semantic Web community have not helped themselves by casually using unclear terms, bolting on reasoning/inference to the Semantic Web definition and generally setting-up camps about well - semantics. Running parallel to Semantic Web development has been Natural Language Processing (NLP) development which, by contrast, has a clearer mission statement, can achieve some of the same goals and is actually more about human language semantics than the Semantic Web.
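
To give a feel for the performance point (3) above, here is a hedged sketch – the table, column and ex: property names are all made up – of the same simple question ("the names and drinks of CEOs currently in a given bar") expressed relationally and as triples. The relational engine answers it with one join over conventionally indexed tables; a naive triple store effectively performs a self-join on its single statements table for every triple pattern, which is a large part of why gaps of that order appear:

public class QueryShapes {
    // Relational: two conventionally indexed tables and one join.
    static final String SQL =
        "SELECT p.name, p.drink " +
        "FROM person p JOIN checkin c ON c.person_id = p.id " +
        "WHERE p.role = 'CEO' AND c.bar = 'OMalleys'";

    // Triples: in a naive store, every pattern below is another self-join
    // on one giant (subject, predicate, object) statements table.
    static final String SPARQL =
        "PREFIX ex: <http://example.org/terms#> " +
        "SELECT ?name ?drink WHERE { " +
        "  ?p ex:role 'CEO' . " +
        "  ?p ex:name ?name . " +
        "  ?p ex:drink ?drink . " +
        "  ?p ex:checkedInAt ex:OMalleys . " +
        "}";
}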

The first two above are related. There is simply an absence of reason for the enterprise to be more open with its data and link its transactions and business processes with other enterprises right now. However, it is fair to say there is a small revolution going on in respect of consumer personal openness or privacy. Consumers are seriously considering the benefits of publishing their purchases and sharing their screens for mass consumption. At the same time, Facebook is criticized for being too open and free-wheeling with personal data. What this tells us is that – the consumer wants to be in control of their data. This desire to control their data, all of it, all the time is essentially a form of self-actualisation; managing their on-line presence – facts, hopes, desires, history and image. Self-actualisation as popularised by Maslow is an established end-game for individuals and society in general. This will happen initially due to consumer demand. The enterprise will then be dragged into this process by more powerful forces of expectation and socialization (Web 2.0) – market forces. They will have little choice but to serve up their data and integrate with others. This could happen surprisingly quickly – within a year: Six-months of observed consumer demand, three-months to get an enterprise pilot going and another three for everyone to assess their new situation (slightly faster than Web 2.0 enterprise adoption) and bang – Web 3.0 is a competitive differentiator.

It is tempting to suggest that performance will become a non-issue due to better future technology (and network is more of a bottleneck than database for many applications) but the reality is the information explosion is so potent that it will broadly compete with Moore’s law to keep performance in play. Poor performance simply blocks enterprise adoption right now. There are creative solutions mooted e.g. swarm based reasoners. Similar high concept solutions are likely necessary since the performance gap is huge.

Ambiguity removal is essentially what the Semantic Web is all about. Objections around whether the world can be generally mapped or not are valid concerns; they can certainly be well illustrated by examples showing literally stupid inferences that can be made. Such examples are on miniscule datasets though. With the real live Internet (as with all complex systems) - outliers even-out at scale.

It is easy to imagine a massively connected system refining ontologies in real time based on user input (although certainly people would need some carrot to do this). It is less easy to imagine this happening for reasoning but a great algorithm could well emerge that will simply not be questioned (or even understood by the majority of the population) as it just works most of the time and anyway; when it doesn’t, you can’t really tell e.g. PageRank.

Tool interest will be stimulated once the inhibitors above start to be addressed. Tangential to this, a general consumer focus on their personal data will mean organisations are compelled to improve their customer data in order to meet new consumer driven activity (personal profile, reputation management).

The semantics point is ultimately an artefact of Semantic Web technologies not being commercialized. Once the enterprise gets involved, efficiency will drive out any hint of academic ownership. NLP is actually complementary to the Semantic Web since it is a front-end to allow access to the semantic data.

None of the inhibitors above are outright Semantic Web deal breakers. It is not conceptually flawed. If RDF/OWL or SPARQL have implementation requirements (Grouping, BI/aggregation, lineage etc.), they can change; they are still evolving. Collectively though, the inhibitors are assuredly deal breakers for widespread adoption in the five to ten year range. Before that time, as it seeps gradually into collective consciousness, performance and reasoning visibility will likely become the main inhibitors. It is not an all-or-nothing idea though. A little semantics goes a long way as they say. Next post will explore how elements of the Semantic Web can be utilised now.

12 June 2010

Semantic Web - Part 1 (What is it?)

The science fiction future that futurologists love predicting is some way away because meaningful data is currently not well integrated. You cannot have anything like the rampant “Imagine...” scenarios (typically using “Agents” and RFIDs) that futurologists speak to on a wide scale until (in terms of data) a Single Version of Truth (SVOT) is agreed upon for the data we are looking at (or disambiguation as it is called) and it is integrated wholesale.

Many of these scenarios will happen, since they portray a more efficient, opportunity-filled or simply fun lifestyle; one enabled by information. Someone in the future will find a way to monetize it (maybe in a Minority Report/advertising kind of way) because market forces always apply.

Let us try one out in the next paragraph to illustrate:

Imagine you are in a bar. A distributed agent *knows* your location through your mobile. It also *knows* that the CEO of a large company in your industry is in the same bar and that that company is hiring for a role one level above your current one. Of course, it *knows* that you have been in your current role for a while. It does not *know* whether the CEO has any people with him at the moment (or is waiting for them) as he has restricted this information by his personal privacy settings. The agent suggests though (by texting or calling you) that it is worth you going up to the CEO and introducing yourself but not before it has already emailed him your resume and informed you that his drink of choice is a dry Martini.

This scenario is by turns fantastically cool, moderately disturbing, highly efficient, opportunistically enabling and culturally changing. Some version of it will likely happen. What it is not is artificially intelligent. The technology exists to do it all right now (mine GPS location data in real time to determine matches against pre-set scenarios, e.g. connecting people for jobs, then check social encounter rules and privacy settings) and the rule-based logic involved is straightforward (if people are in the same location and the same industry and a networking opportunity exists then...). What will prevent our vista from happening is that all the data required to fulfil this scenario is in different formats and, in any case, secured (since there is little reason for the owners to share it). A lesser inhibitor is the rule-based logic itself: straightforward certainly, but the types of scenarios we are talking about require a lot of rules and it is unclear who will maintain them.
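
As a rough illustration of how little intelligence that rule actually needs, here is a minimal sketch in Python. All of the names (Person, networking_opportunity and so on) are hypothetical, and the sketch assumes the hard part, clean and disambiguated data, has already been solved.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    industry: str
    location: str          # assumes GPS has already been resolved to a named venue
    role_level: int
    hiring_level: int = 0  # 0 means not currently hiring

def networking_opportunity(you: Person, other: Person) -> bool:
    """Plain rule-based logic: same venue, same industry, and the other
    party is hiring one level above your current role."""
    return (
        you.location == other.location
        and you.industry == other.industry
        and other.hiring_level == you.role_level + 1
    )

you = Person("You", "retail", "O'Malley's bar", role_level=3)
ceo = Person("CEO", "retail", "O'Malley's bar", role_level=9, hiring_level=4)

if networking_opportunity(you, ceo):
    print("Suggest an introduction (and email the resume first).")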

The future agent does not *know* anything; it has simply traversed the Internet (using look-up tables or schemas) to find an SVOT (because data is well integrated, your location is stored unambiguously) and acted upon it as directed by predefined rule-based logic. Basically, it has acted like any program around today (but on better data).

To fully integrate data you need senders and receivers to agree on a standard for both storage and communication (storing data in one format and communicating it in another defeats the purpose of data integration). This standard needs to be simple (since we also want to exchange data with mobile and embedded devices and generally want rapid and broad diffusion of the format) and must not restrict others from building niche and more complex standards on top. The simplest standard is a fact (sales, personal etc.). Facts, of course, condense down to: something that the fact is about (the subject), something about the subject (the predicate) and something that is related to the subject through the predicate (the object). Examples are:

You (subject) located in (predicate) bar (object)
bar (subject) place of (predicate) socializing (object)

You cannot decompose facts any further than this; otherwise they would not tell us anything. It is conceptually akin to storing data at a physical level as 0s and 1s. Any type of information in the world can ultimately be stored as a list of linked facts in this way.

I (subject) purchase (predicate) Jack Daniels (object)

What is missing here, you might ask, is the timestamp and location; don’t we have to add them as columns four and five? Surely our future scenario needs that information? No – the idea is that we stick with the simple triple representation and it becomes:

I (subject) purchase (predicate) Jack Daniels (object)
My purchase (subject) was timed at (predicate) 1430HRS (object)
My purchase (subject) was located at (predicate) O'Malley's bar (object)
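
To make the representation concrete, here is a minimal sketch of the same facts held as plain triples in Python. It is purely illustrative (a real system would use an RDF store) and the wording of the terms is made up.

# A fact is just a (subject, predicate, object) tuple; a dataset is a set of them.
facts = {
    ("I", "purchase", "Jack Daniels"),
    ("My purchase", "was timed at", "1430HRS"),
    ("My purchase", "was located at", "O'Malley's bar"),
}

# Answering "where was the purchase made?" is a simple pattern match;
# no extra columns are required.
where = {o for (s, p, o) in facts
         if s == "My purchase" and p == "was located at"}
print(where)  # {"O'Malley's bar"}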

While it is certainly true that much rule-based logic will always be required to fulfil the type of scenarios above, the amount of it is significantly reduced by the ability to make inferences using facts. Consider our facts:

You (subject) located in (predicate) bar (object)

This fact is generated automatically: your phone broadcasts its GPS location, which is matched to a commercial premises and finally cross-referenced against that premises' business type.

bar (subject) place of (predicate) socializing (object)

This is a core fact that never changes and was created by some internationally accepted community effort. Because we have a like-term (bar), we can now infer that:

You (subject) are currently (predicate) socializing (object).

You have not specifically told anyone that you are socializing. It has not been encoded anywhere. Indeed, it may be the middle of the afternoon so, in lieu of further information, anyone might otherwise have assumed you were at work. We could have built in rule-based logic to achieve the same result (if you are in a bar then you are socialising) but we have been saved the trouble by inference. Performance has been maintained because the inference was made in memory. This type of logic, the syllogism, has been around since at least the Ancient Greeks. The implicit knowledge that both you and the CEO are physically in the same informal situation at the same time allows an opportunistic suggestion to be made; opening and closing a loop in real time without a rule ever having been written for it.
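
Here is a minimal sketch of that inference step in Python, assuming the two facts above are already in our triple set; the predicate wording is just the phrasing used in this post.

facts = {
    ("You", "located in", "bar"),        # generated from the GPS look-up
    ("bar", "place of", "socializing"),  # core, community-maintained fact
}

# One syllogism-style rule: if X is located in P and P is a place of A,
# then X is currently doing A.
inferred = {
    (x, "are currently", activity)
    for (x, p1, place) in facts if p1 == "located in"
    for (place2, p2, activity) in facts
    if p2 == "place of" and place2 == place
}
print(inferred)  # {('You', 'are currently', 'socializing')}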

If everyone used the fact format (and its inferencing - managed by reasoners) for data storage and communication then we should all be able to resign from our jobs and hang-out in bars; secure in the knowledge that we have reached a technological plateau and an agent will at some point fix us up with a new role. Imagine.

The existing Internet is still very page focussed. You go to a job search site and have to trawl through pages of job descriptions, applying your own experience to decide which ones are interesting, e.g. is a “Sales Executive” the same as a “Business Development Executive”? Does that term differ by industry category? If so, should I follow these links instead? You have to do a lot of work to find the things you want; so much so that you either give up or end up with things that you don’t want. Using the fact format at Internet scale with disambiguation removes the necessity for humans to contextualise the data and so enables machines to better process it, which in turn leads to more pervasive automation and those Imagine scenarios. This is what is meant by the Semantic Web (Web 3.0/SemWeb).
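
As a toy illustration of what that disambiguation buys you, here is a sketch in Python where different job titles are mapped to a single shared concept (a stand-in for an owl:sameAs style link); the titles and the mapping are entirely invented.

# Hypothetical mapping of raw job titles to one canonical concept; the kind
# of link a disambiguated, fact-based web would carry for us.
same_as = {
    "Sales Executive": "sales_role",
    "Business Development Executive": "sales_role",
    "Account Manager": "sales_role",
    "Software Engineer": "engineering_role",
}

postings = ["Business Development Executive", "Software Engineer",
            "Sales Executive"]

# A machine can now match on the concept rather than the wording, so the
# human no longer has to trawl pages deciding what is equivalent.
matches = [p for p in postings if same_as.get(p) == "sales_role"]
print(matches)  # ['Business Development Executive', 'Sales Executive']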

Is the Semantic Web inevitable? Opinion is divided, to say the least. The Imagine scenarios (or variations of them) are inevitable. They absolutely require disambiguation and wholesale data integration. This in turn necessitates a standard for the storage and communication of facts. The Semantic Web is an attempt (the only one in town) to deliver that fact standard. Inferencing should be considered an optional component of the Semantic Web. It may uncover previously unknown information or simply be required to make things work at scale. It may also require too much effort to be practicable for many, due to its current reliance on the Open World Assumption (OWA).

The core promise of the Semantic Web is disambiguation and wholesale data integration (Linked Data). It is the primary enabler for data mash-ups. There are certain parallels with the early days of the Object-Oriented Programming (OOP) movement. The Semantic Web is inevitable but it won't be called such. It will still be the Web.

There is an established fact format right now – RDF (Resource Description Framework). Much of the supporting ecosystem is also in place, e.g. query and ontology languages (SPARQL and OWL respectively). These have been quietly developed over the last eight years or so and all focus around the core premise of the simple fact (or triple as it is known [subject/predicate/object]). Next post will explore why we have yet to see widespread adoption of these technologies.
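
For a flavour of what those technologies look like in practice, here is a small sketch using the Python rdflib library (assuming it is installed); the example namespace and terms are invented for illustration.

from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# The bar facts from earlier, stored as RDF triples.
g.add((EX.You, EX.locatedIn, EX.Bar))
g.add((EX.Bar, EX.placeOf, EX.Socializing))

# A SPARQL query joining the two triples to answer
# "what is 'You' currently doing?"
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?activity WHERE {
        ex:You ex:locatedIn ?place .
        ?place ex:placeOf ?activity .
    }
""")
for row in results:
    print(row.activity)  # http://example.org/Socializing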

UPDATE: Great source of further reading links here.

05 June 2010

Beware the IDEs? Not so much

Whether to run an Integrated Development Environment (IDE) in a browser or not can be a surprisingly emotive subject. The majority of developers, if they do not already have one, want a well-appointed workstation running Visual Studio (for .NET), Eclipse (for Java), Dreamweaver (for JavaScript/HTML/CSS) or similar. They baulk at the idea of running a browser-based IDE despite building browser-based applications themselves and despite clear advantages to working this way. It is worth looking at the situation logically and dispassionately.

Key advantages are:

1) Portability. Developers can work from anywhere with a web connection. Does this really happen? Offshore developers will typically have a work desktop and maybe a personal laptop. They may also come onshore for a short period and use a client desktop. If they work through an outsourcer/consultancy they may have another. This counts, but it is a Dropbox-like aspect of portability. Its main advantage is in supporting those lifestyle situations when you were not scheduled to develop but, thinking about it, you can: either to get ahead or to react to real-time issues, even when you are on vacation, travelling or visiting friends. There are also humanitarian reasons for being able to learn a trade and contribute without having to own even a $100 laptop, but let us save that for a later post.
2) Collaboration. Developers can let others debug their code by sharing it via a unique URL. Anyone navigating to that link will receive a separate, fully modifiable and executable version of the code. That means no API version inconsistencies come compile time. Real-time collaborative coding is also easier.
3) Efficiency. Hours (and on larger projects, days) are wasted setting up workstations and supporting environments, e.g. source control/configuration management, at the start of each project. Even if it has been done several times before, at some point it will fail because there are just too many variables on a desktop. This just goes away.
4) Cost. Older workstations can be used since compilation and anything else heavy is performed on the server. Cost is also improved by the collaboration and efficiency gains (2 and 3 above).

Key disadvantages are:

1) Usability. The browser is not perceived as being rich enough to accommodate a responsive editor, class/library management and debugging. It is also seen as impracticable for designing and testing a GUI due to the greater drag/drop precision required.
2) Connectivity. You need to be connected to the Internet in order to run your IDE and therefore develop. This limits your portability (1 above).

There are other points on both sides but above are the key ones. Let us hold those two disadvantages up to the tiniest bit of scrutiny:

1) Usability. Large text file editing in a responsive (next to no latency) way, the main criticism, can now be achieved using HTML5 Canvas/JavaScript. See Bespin and also Kodingen (extending Bespin and integrating with other services) for examples of this using Python/PHP/ROR. See CodeRun for an example of full-on code management using .NET/JavaScript. These tools are free, quick and have clean, efficient interfaces. Bespin is more of a work-in-progress and does not yet support all browsers though. GUI design is admittedly more of an issue right now but:
a. People already successfully use graphical editors in a browser e.g. Splashup, SUMO Paint and the recently Google-acquired Picnik.
b. HTML5 adoption is affording more options here.
c. In both consumer and enterprise spaces, we are moving toward a widget-based UX making designing GUIs from scratch less common.
2) Connectivity. By this, offline development is meant, i.e. those circumstances where the developer is using a laptop (if they were using a desktop, surely it would be connected to a network?) and is in an area with no Wi-Fi coverage (since otherwise they would have network access). Granted, this is a situation that occurs, but consider further that it also means:
a. There are no collaboration or research possibilities available (no IM/no Google). If you get stuck when developing or need to clarify a technical point – you’re on your own.
b. You need a single professional and automated solution that synchronises all code, images, configuration files, media (that has previously been unit and integration tested and checked in) and also synchronises test data and potentially business rules (since it is good practice to keep these out of code). Either it will synchronise actual data/business rules in which case you need high grade encryption on your laptop (as you are likely using customer data) or your solution needs to de-sensitize the data/business rules somehow (and you need to have agreed this process with any customer). What kind of developers will be happy with these two restrictions? Only sole developers working on their own project.

Staying with the logical analysis, there are four decent reasons in favour of widespread use of browser-based IDEs and two against. There is also enough mitigation to mostly address the two negatives, so there is clearly a significant net gain to be made. Side points such as “developers won’t stand for it”, “development is an art (it’s not!)” or “you just do not understand” are emotional and really do not have a place in the decision. They are understandable (in a carpenter-cherishing-his-chisel kind of way) but this noise is a real contributory factor as to why browser-based IDEs have not made more of an impact to date.

When their new OS comes out later this year, are Google really going to say: buy Chrome laptops, they can do everything your regular laptop can, unless you are developing? Given their long record of developer-friendliness, this would appear a peculiar move. Unless, of course, it is precisely because they are developer-friendly that they will pander to populist developer belief and treat developers as artisans needing powerful, magical workstations. If they do this though, they risk confusing consumers and certainly the non-developing IT community as to their strategy, at a time when those groups are already perplexed by what is happening with Chrome OS, Chrome and Android as a run-time environment.

Google have a new programming language, Go, which currently needs OS X or Linux. Like the majority of languages today, it is C-based. It has been out for nearly a year but has not received a great deal of press. It will need a differentiator other than speed to compete (how many web applications really have a processor bottleneck these days?). Surely there is an opportunity to build a browser-based IDE for Go and enable a new generation of more casual (but also more open) developers around the world?

04 June 2010

Where did those mash-up tools go?

Just eighteen months ago, mash-up tools were big. The promise of building applications quickly, with minimal development and with context directly reflected within the application (since they are made by SMEs rather than IT resources) remains appealing. They looked to be the perfect tool for civic activists and knowledge workers alike. They were high in Gartner’s top ten technologies to watch. The enterprise was starting to take them seriously as a mechanism to reduce crippling data integration challenges, and consumers, bored on a diet of pushing links, thought they would be fun and/or a showcase for themselves in much the same way as blogs have been.

Now the wind has shifted and MSFT’s Popfly and Google’s Mash-up Editor are both gone. Other niche vendors, e.g. Sprout Builder, have similarly disappeared. Of the big players, Intel Mash-up Maker and Yahoo Pipes continue. All of them attempt (or have attempted) to straddle the void between the consumer and enterprise spaces. This is an important distinction since mash-ups, even within the enterprise, rely upon an ad-hoc, passionate approach rather than formal development. They are typically built at home by passionate non-programmers who want to invest time in a single non-niche tool so that whatever they learn is portable (work, social, other organisations etc.). Even more so than with blogging (which also uses one tool across both domains), a single tool is required because more learning investment is needed. With the exception of SAP’s Visual Composer (if you are an SAP shop), none of the tools mentioned have been particularly successful in either space, let alone both. Why is this?

1) No UX standards. For both enterprise and consumer, there are no standards for widgets (or gadgets or web-parts or whatever else you call discrete, self-contained UX functions). There are standards for business cards (vCard) – why not widgets?
2) Slow linked open data adoption. More of a consumer inhibitor at the moment. Linked Data is a core component of the Semantic Web vision that uses a specific set of current technologies. It provides a way to readily mix-and-match data in a meaningful way and so is a key enabling technology for mash-ups. Sig.ma is a simple mash-up tool for RDF data. Unlike other data-based mashups which tend to be query-based, Sig.ma is search-based. You enter a search term, the search engine gets your data, you remove the bits that are not relevant and (if you like) re-publish the data again as RDF (or other formats). This is perhaps too simplistic for users right now but it is evolving and could become a potent research tool. Any mash-ups that rely upon open linked data (ideally the best data of all) suffer from a lack of it; although this is changing as Government initiatives in particular publish RDF data for transparency reasons.
3) Insular data integration. Although the various flavours (SOA/ETL/EII/EAI) have been core CIO agenda topics for over five years, they have been mainly confined to the particular enterprise itself; especially in the narrow form of web services and have been of limited success even there. Extranet take-up, where common data is shared between parties in the supply chain, has been leisurely and this is precisely where mash-ups are needed. Very few organisations treat meta-data with the same focus as data. This means, mash-ups have trouble vouching for data currency and lineage which detracts from user take-on. It is possible that the solution to data integration will be the Semantic Web and a greater openness of organisations to share data. If so, we will be waiting some years yet.
4) Industry standards. These have been slow to be adopted. A notable exception here is XBRL for common reporting.
5) SSO. This is not much of an issue for consumer mash-ups (assuming you are using open linked data) but it is still a huge inhibitor to data integration for many organisations.
6) Blogging comparison. Although superficially similar to blogging, mashing (if you are going to do it properly) requires a thorough data understanding and a lot more effort than committing stream-of-consciousness thoughts before they float away into the ether (or linking to other people's work). Blogging is simply an easier way to achieve microcelebrity and also, because the majority of posts are written in the first person (I think...), they can be defended (if need be) by the simple statement: "These are my opinions". This is a segue into a whole minefield of philosophy, politics and culture that is best left alone. People mainly do. Only a small proportion will directly challenge someone's written thoughts. Publishing a mash-up however, where you are vouching for the legitimacy of the data, opens you up to direct challenge (people may have provably better data) and so people resist it. Only when the number of single versions of truth in the world becomes smaller and more consolidated will this situation change.

Popfly showed early promise as a learning tool but never really got past being a Silverlight showcase. Its focus was on looking good (geo mash-ups and slick drag/drop) rather than data integration. It did not use RDF at all. MSFT have been slow to utilise semantic web approaches in general. Some of their media management technologies use RDF in the background but their main focus has been around semantic search through the Semantic Engine. This initiative uses recently acquired Powerset technologies and will be released through SQL Server. PowerPivot has been significantly downsized from its original Project Gemini remit which would have provided not just a potent reporting mash-up environment but the management and support processes and infrastructure to QA and promote the mash-ups throughout the enterprise. This latter point is a key inhibitor to mash-up growth in general.

There are still signs of life in the mash-up tool space. Dapper is advertising focused. NetVibes is portal focussed. Alchemy API takes a content management/annotation approach (similar to Intel Mash-up Maker). Birst takes an analytic portal approach. Jackbe looks interesting; it appears to take a sales analytics approach. Snaplogic is not exactly a mash-up tool but it certainly takes a non-technical approach to data integration. None of them really play in that sweet-spot between enterprise and consumer though.

The parallel economic downturn has influenced mash-up take-on. The enterprise essentially stopped unproven development and consumers have yet to be sold on the concept but, let’s face it, the focus on one or two drop-downs for configuration and Google Maps didn’t help either. The future of mash-ups is secure because it is the future of building useful applications quickly by SMEs and that will always be desirable. Five back-end data sources (database, RSS etc.) linked to five middleware components (aggregation, integration etc.) and five front-end components (analytics, data entry etc.) generate 125 possible combinations of application straight off the bat. Adding tailoring through filtering, personalisation and general configuration takes it into the thousands. This simple logic guarantees a future at least in the enterprise.
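
The back-of-envelope arithmetic, as a quick sketch; the component counts come from the paragraph above, while the tailoring multiplier is purely an invented figure for illustration.

back_end, middleware, front_end = 5, 5, 5
base_combinations = back_end * middleware * front_end
print(base_combinations)  # 125

# Assume, purely for illustration, twenty meaningful filter/personalisation/
# configuration variants per base combination.
tailored = base_combinations * 20
print(tailored)  # 2500, i.e. "into the thousands"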

Whether the name “mash-up” has been tainted by its recent hiatus and, like its raison d'être, will need to resurface as part of something new remains to be seen. We can be sure though that the next generation of mash-up activity needs to be three things in order to stick around: interactive, data-focused and usable in both enterprise and consumer spaces.

09 May 2010

We're all journalists. Can't we be designers too?

Why do web sites still look so dull? Ten years ago, when broadband adoption and the content management tools we take for granted today were still evolving, this was understandable. Now though, anyone, anywhere can knock together a free site with federated content, streaming media, real-time integration, lots of storage, payment options, embedded BI, bolt-on UX functions (gadgets/web-parts) and graphically engaging templates, and critically keep it up to date, all without writing any code (maybe some CSS if you really get into it); that opportunity does not afford the same excuses.

So why are organisational sites and blogs just as dull-looking now as they were a decade ago? We have real-time integration and video but, from a graphical design perspective, they haven’t really moved on. Of course the focus always needs to be on content (form follows function, after all) but can we not just have a little graphical integrity on top?

Here are the main reasons why sites are dull:

1) Trend. There is still a Googly/Web 2.0 trend for basic, almost amateurish-looking text with minimal graphics and an informal, almost chummy way of addressing the consumer, even for large organisations. This is partially a marketing approach to engendering trust in the consumer (you are dealing with a friendly colleague-kid/folksy dad/generally laid-back individual who does not just want to take your money - he has a cause). This trend extends to the name (something snappy/abstract; typically with an ‘r’ at the end) and certain marketing approaches. As with all trends, people will tire of them and, given that this one has been running for a good decade, we are due a new one. There is an embryonic trend among career designers for overlaying text on web sites. Previously this was considered anathema for usability reasons. It has been used for ages in print media, but there all you need to do is read; when things become interactive, this technique needs to be handled with care. It can make sites more organic and stimulating though.
2) Tool. Existing free tools do not go far enough in supporting design. The Blogger Template Designer released earlier this year does a great job of allowing the user to customise fonts and backgrounds (and has a good-quality variety of templates to start with). It is highly functional. It stops though (as with most tools in this space) at providing support for the creative process itself. In a sense it provides too much freedom and not enough creative support. It is not a stretch to imagine core graphic design principles – proximity (are you sure you want to put that there?), contrast (your pink on cyan scheme?), alignment and repetition – being supported through a tool. Also, Google has a ready-to-go tool in “Find similar images” (potentially driven by Image Swirl) that could help in building composite background images. The underutilized grid system could be implemented within a tool. At the very least, let’s have a way to get text anti-aliased.
3) Fear. Most people are not designers. Everyone is a critic on the Internet. People are worried that their design will be wrong, weird or somehow not good enough. These are the same worries people went through in the mid-nineties when they started blogging. Blogs have become complementary to the established newspaper industry. No one (blogging in their first language at least) now feels threatened by journalists criticising their use of semantics, grammar, structure etc. Their enthusiasm more than makes up for any lack of storytelling narrative.
4) Advertising. It is straightforward for organisations and bloggers in particular to advertise through their sites. Over half of all bloggers do it. Typically, this entails embedding banners on a page in corporate colours/graphics/typeface (and so unable to be changed), which can break a design. There needs to be less restriction on what can be done to these banners (Photoshop etc.) so that they can be adapted to a particular design; then more designers will eventually incorporate them. Bloggers should also consider whether the pennies they receive in advertising revenue are worth corrupting a design for. If a blog starts getting serious hits (around 10K unique visitors/month) then by all means advertise, but advertising is generally not compatible with graphical design right now.

This post has been about why sites are dull. The reasons are by no means insurmountable and a movement towards amateur web design is to be welcomed. Why? Because it will complement and ultimately improve the graphical design integrity of the Internet. Is this (non-tangible) result worth the effort? That is too large a question for the tail end of this post. This audience will likely be (fairly equally) split: some will maintain that content is so much of the WWW equation that presentation is barely even worth discussing; others will argue, philosophically, that life without art is impossible. Let us just leave with the knowledge that graphic design services are a $12BN/year industry in the US alone and attractive sites (like attractive people) will bring others back to them.