Monthly Archives: January 2015

Privacy of Local Clouds

Unfortunately, local clouds do not automatically protect data; they are just an enabling conceptual element.  This inevitably leads to a game of measures and counter-measures: for each measure we implement, spies will implement new counter-measures, and for every counter-measure we need to respond with yet another measure.  This is a game we have to play.

Security technology has a role to play.  For example, I think there are interesting possibilities in using “taint tracking” to track different kinds of information (personal information, GPS location, physical addresses, email addresses, message content, …) as it flows through a program.  I recently read Wikipedia’s Taint checking article and TaintDroid: An Information Flow Tracking System for Real-Time Privacy Monitoring on Smartphones.  Taint tracking lets programs compute what they want, but their output values are “tainted” with all the types of input that went into those outputs.  Users then set up policies to control which taint types are allowed to leave their local cloud.  This is a conservative approach, in that using ANY piece of information in a calculation (say, even just the number of characters in my email address) taints the calculation’s output (even though the length of my email address says almost nothing about the address itself).  Still, taint tracking offers much finer control of information than the all-or-nothing question of whether a program can access information X at all.
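
To make the idea concrete, here is a minimal toy sketch of taint tracking in Python.  This is my own illustration, not TaintDroid’s API: values carry a set of taint labels, any computation unions the labels of its inputs, and a user policy decides which labels may leave the local cloud.

```python
class Tainted:
    def __init__(self, value, taints=frozenset()):
        self.value = value
        self.taints = frozenset(taints)

    def combine(self, other, result):
        """Any computation that uses both operands unions their taint sets."""
        return Tainted(result, self.taints | other.taints)

email = Tainted("alice@example.com", {"EMAIL"})
gps = Tainted((40.0, -105.3), {"GPS"})

# Even a harmless-looking derived value stays tainted (the conservative part).
email_len = Tainted(len(email.value), email.taints)

# Anything computed from both inputs carries both taints.
profile = email.combine(gps, (email.value, gps.value))

ALLOWED_TO_LEAVE = {"GPS"}   # user policy: GPS data may leave, email data may not

def can_export(item):
    """True only if every taint on the value is allowed to leave the local cloud."""
    return item.taints <= ALLOWED_TO_LEAVE

print(can_export(gps))        # True
print(can_export(email_len))  # False: it carries the EMAIL taint
print(can_export(profile))    # False: it carries EMAIL and GPS
```

A real system such as TaintDroid enforces this below the application, in the platform itself, so applications cannot simply strip the labels off.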

Of course, it is still possible that a “spy node” infects your local cloud and nefariously sends data up to some data-sucking giga-cloud in the sky (perhaps called “SkyNet”).  This possibility must also be taken seriously, and then we have to consider various possible counter-measures.

The overall security of a giga-cloud will be better than the security of a local deca-cloud.  But, and this is a big but, a giga-cloud holds a hundred million times more data, and therefore a breach of it is a hundred million times more valuable.  Sure, some local clouds will be hacked, but hacking them is not very profitable (unless you’re a celebrity who likes to take nude selfies).

 

Examples of Local Clouds

When does it actually make sense to use a local cloud?  One great way to characterize some types of local clouds is by their scope:  a single idea that groups all the nodes in the cloud.

Personal local cloud (also called a “personal area network”):  There are potentially as many personal deca-clouds as there are people.  The cell phone is the leading candidate for the compute node in this network, but I can envision us carrying another, more powerful and probably somewhat bulkier, compute node that lives out of sight in a backpack or purse.  Other nodes in this cloud would be watches, heart-rate monitors, etc.  Importantly, these personal clouds could be isolated and independent from all other clouds (based on personal preferences).  The bulk of the data never needs to leave this cloud.  Hacking this cloud gives the hacker only one person’s data, which dramatically limits its value to the hacker; getting data on millions of people requires hacking millions of independent personal clouds.

Car cloud:  This is a small deca-cloud (10 nodes) or hecto-cloud (100 nodes) that is local to a single car or vehicle.  The master compute node is contained somewhere in the car chassis, and the cloud also contains nodes for the dashboard display and all kinds of other cool features.  Existing cars already have many (sometimes hundreds of) on-board computers that could be connected to the car’s cloud.  The most recent CES conference showed lots of interesting new automobile functions, and all of these should be included in the car’s local cloud.  Again, if we localize most of the car cloud’s data to the car, privacy is enhanced.

Traffic intersection cloud:  There is value in having adjacent cars in limited communication with each other, and organizing a cloud around a traffic intersection is one concrete example.  Here, all the cars that enter the geographic neighborhood of a particular traffic signal join the cloud and share data about input and output directions, exact arrival times, urgency, etc.  The traffic intersection cloud would use this to adjust (physical or virtual) traffic light timings, along the lines of the sketch below.  The primary compute node is physically located at that intersection, and cars leave the local cloud when they leave the intersection.
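
As a rough illustration, here is what the shared data and the timing decision might look like.  This is purely hypothetical; the field names and the naive scheduling policy are my own invention, not a real vehicle-to-infrastructure protocol.

```python
from dataclasses import dataclass

@dataclass
class ApproachReport:
    car_id: str            # anonymous, per-intersection identifier
    approach: str          # e.g. "northbound"
    exit_direction: str    # e.g. "left", "straight", "right"
    eta_seconds: float     # estimated arrival at the stop line
    urgency: int           # 0 = normal; higher values for, say, emergency vehicles

def pick_green_phase(reports):
    """Naive policy: give the green to the approach with the most imminent demand."""
    demand = {}
    for r in reports:
        # Sooner arrivals and higher urgency contribute more demand.
        demand[r.approach] = demand.get(r.approach, 0.0) + (1 + r.urgency) / max(r.eta_seconds, 1.0)
    return max(demand, key=demand.get)

reports = [
    ApproachReport("a1", "northbound", "straight", eta_seconds=5, urgency=0),
    ApproachReport("b7", "eastbound", "left", eta_seconds=20, urgency=0),
]
print(pick_green_phase(reports))   # northbound
```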

A different example is forming local clouds around groups of adjacent cars traveling down an interstate.  The cars intercommunicate to smooth and optimize local traffic flow.  There are no physical nodes dedicated to this kind of traveling cloud; each car provides compute resources and interconnects with other cars.

House cloud:  Houses and apartments provide another great example of local clouds.  There could be one or more compute nodes in the house that interconnect with refrigerators, other heavy appliances, security systems, entertainment systems, Christmas lights, etc.  All kinds of businesses will want to get their hands on pieces of this data, but why should we let them?

Neighborhood cloud:  A group of adjacent houses could get together to form a neighborhood cloud.  Many neighborhood clouds will have no nodes dedicated to the neighborhood; all nodes are associated with one of the member houses.  But wealthy neighborhoods might have dedicated security cameras and gate controllers.

Traveling car cloud:  This is a short-lived variation of the neighborhood cloud.  All cars within, say, 100 meters of each other dynamically form a cloud, and lead cars transmit traffic and road conditions to following cars.  Which corporations REALLY need to get their hands on the data in this cloud?  What would they do with it that benefits the cars in the group?

In all of these examples, there is no reason to send all of this data to some single, giant, centralized giga-cloud.  No reason other than to allow the giga-cloud owner to mine all that data and sell your information to advertisers and other buyers for who knows what purposes.    

I realize there is not a black-and-white distinction between local and global clouds; the situation is gray because clouds can and will be interconnected.  But the local vs. global terminology emphasizes that we don’t need or want massive global clouds for many, many kinds of data.  Data, and the clouds that contain it, should be as local as possible.

We Must Not Lose Control of Artificial Intelligence

There have been lots of science fiction stories in which a scientist creates a technology with the best of intentions, but then something unforeseen happens and the technology gets away from its creator.  The book Frankenstein was probably the first.  The movie Transcendence is a recent example in which an AI project goes horribly wrong.  There are many other examples.

I really love AI because it truly can change our world for the better.  Such techniques will allow us to do all kinds of things that are unimagined today.  But there is also a real possibility that such powerful technologies can be used against us by evil people, and, yes, even the possibility that they turn into evil autonomous agents.  It is up to us to be careful and prudent about such possibilities.

The Future of Life Institute published an open letter urging additional research into ensuring that we don’t lose control of AI’s tremendous capabilities.  The letter is short, but says, in part:

We recommend expanded research aimed at ensuring that increasingly capable AI systems are robust and beneficial: our AI systems must do what we want them to do. 

I encourage you to read this brief letter.  And, if this concerns you as it concerns me, join me and sign the open letter.

Humans Need Not Apply

Under the category of “technology is neither good nor bad, and it is seldom neutral”, I just watched a very interesting and well-done video about the impact of intelligent machine technology on our jobs.

In part, it compares horses and people.  When the automobile started entering our economy, horses might have said, “This will make horse life easier, and we horses can move to more interesting and easier jobs.”  But that didn’t happen; the horse population peaked in 1915 and has been declining ever since.  I’m sure we all agree that intelligent and cognitive applications will certainly replace some jobs.  The question is: will there be enough new jobs to keep humans fully employed?  Might unemployment rise to 45%, as the video suggests?  How many future job descriptions will contain the phrase “Humans Need Not Apply”?

What the video fails to discuss is how massive unemployment might be averted.  I’d like to see even some proposals or suggestions.  Do you have any ideas?

I would also like to think that I—a high-tech, machine-learning, cognitive-app, AI technologist— would be immune to these kinds of changes.  But I’m less certain after watching this video.    You should definitely check it out.

Characterizing Clouds

In a previous post, I made the distinction between global clouds (most aggregated) and local clouds (least aggregated).  But there can be clouds that aggregate at multiple levels.  What terminology should we use to describe such clouds?

Ownership:  Perhaps the most important attribute of any cloud is its owner.  I believe many, if not most, people lose track of this idea.  Sure, they use the Facebook cloud, but they often forget that Facebook, the company, owns all that data and uses it for THEIR benefit.

Scope:  We can refer to clouds by the entity that aggregates them.  For example, we could talk about personal clouds, which are clouds scoped to a single person.  We could also have home clouds, neighborhood clouds, city clouds, state clouds, national clouds, and finally global clouds.  More about this in a future post.

Relative Size:  I originally considered using relative size terms, for example defining a micro-cloud as a cloud a million times smaller than the currently largest cloud.  But as the biggest clouds get bigger, all the terms would have to change, so relative size is a bad idea.

Absolute Size:  Rather than relative size, we should refer to clouds by their absolute size.  A cloud that aggregates roughly 10 total (atomic + compute) nodes is called a deca-cloud (the “deca” prefix means 10).  A cloud that aggregates roughly 1000 nodes is called a kilo-cloud.  The largest clouds on the planet (Google, Facebook, etc.) are giga-clouds or peta-clouds.
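
For concreteness, here is one way the naming could be mechanized.  The prefix table simply follows the SI prefixes; anything beyond the deca/kilo/giga/peta names used above is my own extrapolation of the scheme.

```python
# SI prefixes keyed by power of ten.
PREFIXES = {1: "deca", 2: "hecto", 3: "kilo", 6: "mega", 9: "giga", 12: "tera", 15: "peta"}

def cloud_size_name(node_count):
    """Map a total node count (atomic + compute) to an absolute-size cloud name."""
    if node_count < 10:
        return "sub-deca cloud"
    exponent = len(str(node_count)) - 1              # order of magnitude of the count
    best = max(e for e in PREFIXES if e <= exponent)  # nearest named prefix at or below it
    return PREFIXES[best] + "-cloud"

print(cloud_size_name(12))              # deca-cloud
print(cloud_size_name(1_000))           # kilo-cloud
print(cloud_size_name(2_000_000_000))   # giga-cloud
```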

Network:  Network cost and complexity grow faster than linearly (super-linearly) with the number of nodes interconnected.  The network needs of a million isolated kilo-clouds are radically less than the network needs of a single giga-cloud, even though both contain the same total number of nodes.  Raw network cost (in dollars) captures the idea, but it is not as abstract as I’d like.
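
A back-of-the-envelope comparison makes the point.  Assuming, purely for illustration, that network complexity scales like the number of potential node-to-node links (roughly n squared), splitting a giga-cloud into isolated kilo-clouds cuts that complexity by about a factor of a million:

```python
def potential_links(n):
    """Potential node-to-node links in a fully connected network of n nodes."""
    return n * (n - 1) // 2

one_giga_cloud = potential_links(10**9)                 # one cloud of a billion nodes
million_kilo_clouds = 10**6 * potential_links(10**3)    # a million isolated kilo-clouds

print(f"single giga-cloud:         {one_giga_cloud:.2e} potential links")
print(f"10^6 isolated kilo-clouds: {million_kilo_clouds:.2e} potential links")
print(f"ratio: about {one_giga_cloud / million_kilo_clouds:,.0f}x")
```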

Bandwidth:  We can also characterize local and global clouds by their bandwidth (bytes sent per second) needs.  A larger cloud will naturally require more total bandwidth than a smaller cloud, so bandwidth per node is a better measure.  But all hub-and-spoke networks (a single hop between spoke nodes and the hub node) that perform the same computation have the same bandwidth per node; by that measure there is no distinction between a deca-cloud and a giga-cloud.

Local Clouds

“Cloud” is one of the industry’s current buzzwords.  And while there are many good reasons to implement and use global clouds, we shouldn’t thoughtlessly push everything to such clouds.  Let me explain why I say this, and some of its implications.  In particular, I am going to argue that local clouds are a better solution for some problems.

A big reason for global clouds is the desire to aggregate all the data into a single cloud (OK, it may just be virtually aggregated into a single, distributed cloud infrastructure, but all the points below still apply).  This use case is the poster child for “big data”: all the aggregated data can be analyzed by the global cloud, this way and that, correlations and patterns can be found, predictive models can be constructed and validated, and so on.

But a big reason against global clouds, the one that pushed me down this line of thinking, is privacy.  There is a saying: “If you don’t pay for a product, you ARE the product.”  How can so many large cloud infrastructures be free to users?  Because cloud providers harvest and sell their users and their users’ data.  I don’t like paying for services any more than the next guy, but I really don’t like being sold.  Me and my data are MINE!

Another huge industry buzzword is IoT (the Internet of Things).  The IoT movement starts at the opposite end of the aggregation spectrum, with atomic nodes on each individual device.  Each atomic node has to support a minimal set of sensors and actuators, plus enough networking to enable other, more capable nodes to read and manipulate it.  Most atomic nodes do not need to also be “compute nodes”, which contain a general-purpose processor and (relatively) large amounts of storage.  Most atomic nodes only need to communicate (directly or indirectly) with a compute node.

A common view of IoT is that the compute node should be in the global cloud.  This makes some sense, because ALL of the data from ALL of the atomic nodes is then available for global analysis in the global cloud.  But it also gives a tremendous (unfair?) advantage to the cloud owner.  Typically the cloud owner claims ownership of all this data, and they certainly have a legal claim to all the derived data they construct from the individual data.  Plus, they can reliably infer a lot of personal information from it:  personal habits, when you are active, what you like, which topics interest you, and so on.  I see this as a huge loss of personal privacy.

But do we really need to push massive oceans of data from all the atomic IoT nodes up to global clouds?  Are global clouds the only viable location for compute nodes?  I say, “No”.

I believe a better way, at least for some applications, is to put local compute nodes near the atomic nodes.  The local compute node(s) act as a local cloud, but none of this data is ever pushed up to a global cloud.  This achieves many of our objectives (atomic nodes remain simple, a single compute node can service multiple atomic nodes) but does not violate privacy.
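
As a minimal sketch of what such a local compute node might look like, assume it simply ingests readings from nearby atomic nodes and answers queries from inside the local cloud, with no upload path at all.  The node names and readings below are invented for illustration.

```python
from statistics import mean

class LocalComputeNode:
    """Hypothetical hub for a house or personal cloud; data never leaves it."""

    def __init__(self):
        self._readings = {}                     # stored locally, on this node only

    def ingest(self, atomic_node_id, value):
        """Accept a reading pushed (or polled) from a nearby atomic node."""
        self._readings.setdefault(atomic_node_id, []).append(value)

    def local_summary(self, atomic_node_id):
        """Answer queries from inside the local cloud; there is no upload path."""
        return mean(self._readings[atomic_node_id])

hub = LocalComputeNode()
hub.ingest("thermostat-livingroom", 20.5)
hub.ingest("thermostat-livingroom", 21.0)
print(hub.local_summary("thermostat-livingroom"))   # 20.75
```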

I’ll have more to say about “local clouds” in upcoming posts.

2015 is the Mole-of-Bits Year

Take a look at this graphic from IDC.  It estimates the total number of bits in the world over time.  There are many things going on in this graphic.  It shows that enterprise data is certainly growing, but it does not comprise the majority of data; sensor data and social media data far outstrip it.

It also shows that a huge fraction of all data contains uncertainty.  This has dramatic implications for old-school programmers: programming absolutely must continue to adopt new approaches to handling uncertain input data, particularly for emerging cognitive applications.  The traditional excuse of classifying ANY input error as “garbage in” just won’t cut it anymore.
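
One tiny illustration of what such an approach could look like (my own toy example, not a specific product or library): carry a confidence alongside every input value and combine values weighted by confidence, instead of discarding imperfect input outright.

```python
from dataclasses import dataclass

@dataclass
class UncertainReading:
    value: float
    confidence: float    # 0.0 = no trust, 1.0 = certain

def weighted_estimate(readings):
    """Combine noisy readings, weighting each by its confidence,
    instead of rejecting imperfect input as garbage."""
    total = sum(r.confidence for r in readings)
    return sum(r.value * r.confidence for r in readings) / total

readings = [UncertainReading(21.0, confidence=0.9), UncertainReading(25.0, confidence=0.2)]
print(weighted_estimate(readings))    # about 21.7, pulled only slightly by the low-confidence value
```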

But my favorite part of this graphic is the axes; forget the curves (how often does that happen?).  The x-axis shows time, with 2015 on the far right.  The y-axis shows the number of bits in the world.  For the chemists among you, 10 to the 23rd is essentially Avogadro’s number (6.02 × 10^23), the number of molecules in a “mole”.  What does this mean?  Imagine you’re holding a tablespoon filled with water.  You’re holding roughly a mole of water molecules.  The chart above implies that this year, 2015, there will be one bit of data for EVERY molecule of H2O in that tablespoon.  To me, that is nothing short of INCREDIBLE and AWESOME.  When I was growing up, I remember trying to imagine how we would ever have such a gi-nor-mous number of macroscopic things.  Well here we are, and in my lifetime.  I’m moved.
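
For those who like to check the arithmetic, here is the rough comparison; the tablespoon size and the chart reading are my own approximations.

```python
AVOGADRO = 6.02e23          # molecules per mole
TABLESPOON_ML = 15          # one tablespoon is roughly 15 milliliters
WATER_G_PER_MOL = 18        # molar mass of water, grams per mole

grams = TABLESPOON_ML * 1.0            # water is about 1 gram per milliliter
moles = grams / WATER_G_PER_MOL        # roughly 0.8 mole
molecules = moles * AVOGADRO           # roughly 5e23 molecules

bits_in_2015 = 1e23                    # my rough reading of the chart's 2015 value

print(f"water molecules in a tablespoon: {molecules:.1e}")
print(f"bits estimated for 2015:         {bits_in_2015:.1e}")
# Within an order of magnitude of each other: roughly one bit per water molecule.
```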

So I hereby officially declare

2015 as the “mole of bits” year