
Privacy of Local Clouds

Unfortunately, local clouds do not automatically protect data; they are just an enabling conceptual element.  This reality leads to the usual game of measures and counter-measures:  for each measure we implement, spies will implement new counter-measures, and for every counter-measure we must respond with yet another measure.  This is a game we have to play.

Security technology has a role to play.  For example, I think there are interesting possibilities in using “taint tracking” to track different kinds of information (personal information, GPS location, physical addresses, email addresses, message content, …) as it flows through a program.  I recently read Wikipedia’s Taint checking article and TaintDroid: An Information Flow Tracking System for Real-Time Privacy Monitoring on Smartphones.  Taint tracking lets programs compute whatever they want, but their output values are “tainted” with all the types of input that went into those outputs.  Users then set up policies to control which taint types are allowed to leave their local cloud.   This is a conservative approach: using ANY piece of information in a calculation (say, even just the number of characters in my email address) taints the calculation’s output, even though the length of my email address says almost nothing about the address itself.  Still, taint tracking offers much finer control of information than “can this program access information X at all”.
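To make the idea concrete, here is a rough sketch of taint propagation in Python.  This is my own toy illustration, not TaintDroid’s implementation: the taint labels, class, and policy function are all hypothetical.  Every value carries the set of taints of all inputs used to compute it, and a policy gates what may leave the local cloud.

```python
class Tainted:
    """A value plus the set of taint labels of everything used to compute it."""

    def __init__(self, value, taints=frozenset()):
        self.value = value
        self.taints = frozenset(taints)

    def __add__(self, other):
        # Any operation on tainted values merges the taints of its operands.
        o_val = other.value if isinstance(other, Tainted) else other
        o_taints = other.taints if isinstance(other, Tainted) else frozenset()
        return Tainted(self.value + o_val, self.taints | o_taints)

def may_leave_cloud(result, allowed):
    """Policy check: a result may leave the local cloud only if every taint is allowed."""
    return result.taints <= allowed

email = Tainted("alice@example.com", {"EMAIL"})
gps = Tainted(47.6, {"GPS"})

# Even using just the *length* of the email taints the result (conservative).
length = Tainted(len(email.value), email.taints)
combined = length + gps

print(sorted(combined.taints))                         # ['EMAIL', 'GPS']
print(may_leave_cloud(combined, allowed=frozenset()))  # False
```

Note how the length computation illustrates the conservatism: the output reveals almost nothing about the email address, yet it still carries the EMAIL taint and is blocked by an empty policy.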

Of course, it is still possible that a “spy node” infects your local cloud and nefariously sends data up to a data-sucking giga-cloud in the sky (perhaps called “SkyNet”).  This possibility must also be taken seriously, and then we have to consider various possible counter-measures.

The overall security of a giga-cloud will be better than the security of a local deca-cloud.  But, and this is a big but, there is a hundred million times more data in a giga-cloud, and therefore its breach is a hundred million times more valuable.  Sure, some local clouds will be hacked, but hacking them isn’t very profitable (unless you’re a celebrity who likes to take nude selfies).


Examples of Local Clouds

When does it actually make sense to use a local cloud?  One great way to characterize some types of local clouds is by their scope:  a single idea that groups all the nodes in the cloud.

Personal local cloud: (also called a “personal area network”) There are potentially as many personal deca-clouds as there are people.  The cell phone is the leading candidate for the compute node in this network.  But I can envision that we carry another, more powerful and probably somewhat bulkier, compute node device that lives out of sight in a backpack or purse.  Other nodes in this cloud would be watches, heart-rate monitors, etc.  Importantly, these personal clouds could be isolated and independent from all other clouds (based on personal preferences).  The bulk of the data never needs to leave this cloud.  Hacking this cloud gives the hacker only one person’s data, which dramatically limits the value to the hacker.  Getting data on millions of people requires hacking millions of independent personal clouds.

Car cloud:  This is a small deca-cloud (10 nodes) or hecto-cloud (100 nodes) that is local to a single car or vehicle.  The master compute node is contained somewhere in the car chassis.  The cloud also contains nodes for the dashboard display and all kinds of other cool features.  Existing cars already have hundreds of on-board computers that could be connected to the car’s cloud.  The most recent CES conference showcased lots of interesting new automobile functions, and all of these should be included in the car’s local cloud.  Again, if we localize most of the car cloud’s data to the car, privacy is enhanced.

Traffic intersection cloud:  There is value in having adjacent cars in limited communication with each other.  Organizing a cloud around a traffic intersection is one concrete example.  Here all the cars that enter the geographic neighborhood of a particular traffic signal join the cloud and share data about input and output directions, exact arrival times, urgency, etc.  The traffic intersection cloud would use this to adjust (physical or virtual) traffic light timings.  The primary compute node is physically located in that intersection.  Cars leave the local cloud when they leave the intersection.
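As a toy sketch of what the intersection’s compute node might do with the shared data (all names and the timing rule here are my own invention, not a real traffic algorithm): allocate green time to each approach in proportion to the number of member cars waiting on it.

```python
from collections import Counter

def green_times(cars, cycle_seconds=60, min_green=5):
    """cars: list of (car_id, approach) pairs for cars currently in the cloud.

    Returns seconds of green per approach, proportional to waiting cars,
    with a minimum green time per approach.
    """
    counts = Counter(approach for _, approach in cars)
    total = sum(counts.values())
    return {a: max(min_green, round(cycle_seconds * n / total))
            for a, n in counts.items()}

# Three cars arriving from the north, one from the east.
members = [("car1", "N"), ("car2", "N"), ("car3", "N"), ("car4", "E")]
print(green_times(members))  # {'N': 45, 'E': 15}
```

All of this computation happens in the intersection’s own compute node; nothing about individual cars ever needs to leave the local cloud.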

A different example is forming local clouds around groups of adjacent cars traveling down an interstate.  The cars intercommunicate to smooth and optimize local traffic flow.  There are no physical nodes dedicated to this kind of traveling cloud; each car provides compute resources and interconnects with other cars.

House cloud:  Houses and apartments provide another great example of local clouds.  There could be one or more compute nodes in the house that interconnect with refrigerators, other heavy appliances, security systems, entertainment systems, Christmas lights, etc.  All kinds of businesses will want to get their hands on pieces of this data, but why should we let them?

Neighborhood cloud:  A group of adjacent houses could get together to form a neighborhood cloud.  Many neighborhood clouds will have no nodes dedicated to the neighborhood; all nodes are associated with one of the member houses.  But wealthy neighborhoods might have dedicated security cameras and gate controllers.

Traveling car cloud:  This is a short-lived variation of the neighborhood cloud.  All cars within, say, 100 meters of each other dynamically form a cloud.  Lead cars transmit traffic and road conditions to following cars.  Which corporations REALLY need to get their hands on data in this cloud?  What would they do with it that benefits the cars in the group?

In all of these examples, there is no reason to send all of this data to some single, giant, centralized giga-cloud.  No reason other than to allow the giga-cloud owner to mine all that data and sell your information to advertisers and other buyers for who knows what purposes.    

I realize there is not a black-and-white distinction between local and global clouds; the situation is gray because clouds can and will be interconnected.  But the local vs. global terminology emphasizes that we don’t need or want massive global clouds for many, many kinds of data.  Data, and the clouds that contain it, should be as local as possible.

Characterizing Clouds

In a previous post, I made the distinction between global clouds (most aggregated) and local clouds (least aggregated).  But there can be clouds that aggregate at multiple levels.  What terminology should we use to describe such clouds?

Ownership:  Perhaps the most important attribute of any cloud is its owner.  I believe many (most?) people lose track of this idea.  Sure, they use the Facebook cloud, but they often forget that Facebook, the company, owns all that data and uses it for THEIR benefit.

Scope:  We can refer to clouds by the entity that aggregates them.  For example, we could talk about personal clouds, which are clouds scoped to a single person.  We could also have home clouds, neighborhood clouds, city clouds, state clouds, national clouds, and finally global clouds.  More about this in a future post.

Relative Size:  I originally considered using relative size terms, defining, say, a micro-cloud as a cloud a million times smaller than the currently largest cloud.  But as the biggest clouds get bigger, all the terms would have to change, so relative size is a bad idea.

Absolute Size:  Rather than relative size, we should refer to clouds by their absolute size.  A cloud that aggregates roughly 10 total (atomic + compute) nodes is called a deca-cloud (the “deca” prefix means 10).  A cloud that aggregates roughly 1000 nodes is called a kilo-cloud.  The largest clouds on the planet (Google, Facebook, etc.) are giga-clouds or peta-clouds.
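The naming scheme above is just the SI prefixes applied to node counts, which makes it trivial to compute.  A small illustrative helper (the function and table are mine, not part of any standard):

```python
import math

# SI prefixes keyed by power of ten, as used in the naming scheme above.
PREFIXES = {1: "deca", 2: "hecto", 3: "kilo", 6: "mega",
            9: "giga", 12: "tera", 15: "peta"}

def cloud_name(node_count):
    """Map an approximate node count to its absolute-size cloud name."""
    exp = round(math.log10(node_count))
    # Snap down to the nearest named prefix.
    best = max((e for e in PREFIXES if e <= exp), default=1)
    return f"{PREFIXES[best]}-cloud"

print(cloud_name(12))             # deca-cloud
print(cloud_name(1_000))          # kilo-cloud
print(cloud_name(2_000_000_000))  # giga-cloud
```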

Network:  Network cost and complexity grow super-linearly with the number of nodes interconnected.  The network needs of 1000 isolated kilo-clouds are radically less than the network needs of a single mega-cloud, even though both contain the same number of nodes (one million).  Raw network cost (in dollars) captures the idea, but is not as abstract as I’d like.
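A quick back-of-the-envelope illustration of the super-linear claim, using full-mesh link count, n·(n−1)/2, as the worst case (real topologies fall somewhere between linear and quadratic, so the exact ratio varies, but the direction holds):

```python
def mesh_links(n):
    # Worst case: a full mesh needs n*(n-1)/2 pairwise links.
    return n * (n - 1) // 2

isolated = 1000 * mesh_links(1000)  # 1000 separate, isolated kilo-clouds
merged = mesh_links(1000 * 1000)    # one big cloud with the same million nodes

print(isolated)            # 499500000
print(merged)              # 499999500000
print(merged // isolated)  # 1001  (about a thousand times more links)
```

Under this worst-case model, merging a thousand kilo-clouds into one cloud multiplies the link count by roughly a thousand, even though the node count is unchanged.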

Bandwidth:  We can also characterize local and global clouds by their bandwidth needs (bytes sent per second).  A larger cloud will naturally require more total bandwidth than a smaller cloud, so bandwidth per node is a better measure.  But all hub-and-spoke networks (a single hop between each spoke node and the hub node) that perform the same computation have the same bandwidth per node; by this measure there is no distinction between a deca-cloud and a giga-cloud.
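The hub-and-spoke arithmetic behind that last point is simple enough to spell out (a trivial sketch; the rate numbers are made up):

```python
def bandwidth_per_node(n_spokes, rate_per_spoke):
    """Each spoke sends rate_per_spoke bytes/s one hop to the hub."""
    total = n_spokes * rate_per_spoke  # total grows linearly with cloud size...
    return total / n_spokes            # ...so per-node bandwidth is constant

print(bandwidth_per_node(10, 100.0))             # 100.0  (deca-cloud)
print(bandwidth_per_node(1_000_000_000, 100.0))  # 100.0  (giga-cloud)
```

Per-node bandwidth is identical at every scale, which is exactly why it fails to distinguish local from global clouds.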

Local Clouds

“Cloud” is one of the industry’s current buzzwords.  And while there are many good reasons to implement and use global clouds,  we shouldn’t thoughtlessly push everything to such clouds.  Let me explain why I say this, and some of its implications.  In particular, I am going to argue that local clouds are a better solution for some problems.

A big reason for global clouds is the desire to aggregate all the data into a single cloud (OK, it may just be virtually aggregated into a single, distributed cloud infrastructure, but all the points below still apply).  This use case is the poster child for “big data”.  All the aggregated data can be analyzed by the global cloud, this way and that way;  correlations and patterns can be found, predictive models can be constructed and validated,  etc.

But a big reason against global clouds, the one that pushed me down this line of thinking, is privacy.  There is a saying: “If you don’t pay for a product, you ARE the product”.  How can so many large cloud infrastructures be free to users?  Because cloud providers harvest and sell their users and their users’ data.  I don’t like paying for services any more than the next guy, but I really don’t like being sold.  Me and my data are MINE!

Another huge industry buzzword is IoT (Internet of Things).  The IoT movement starts at the opposite end of the aggregation spectrum, with atomic nodes on each individual device.  Each atomic node has to support a minimal set of sensors and actuators plus enough networking to enable other, more capable, nodes to read and manipulate the atomic node.  Most atomic nodes do not need to also be “compute nodes”, which contain a general-purpose processor and (relatively) large amounts of storage.  Most atomic nodes only need to communicate (directly or indirectly) with a compute node.
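To make the atomic/compute split concrete, here is a sketch of the minimal interface an atomic node might expose to a compute node.  The message format, names, and device are all hypothetical; the point is only that the device-side logic stays tiny: read sensors, set actuators, nothing more.

```python
import json

class AtomicNode:
    """A minimal IoT device: a sensor, an actuator, and a tiny protocol."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.sensors = {"temp_c": 21.5}
        self.actuators = {"led": False}

    def handle(self, request: str) -> str:
        """Process one JSON request from a (local or remote) compute node."""
        msg = json.loads(request)
        if msg["op"] == "read":
            return json.dumps({"id": self.node_id, "sensors": self.sensors})
        if msg["op"] == "set":
            self.actuators[msg["name"]] = msg["value"]
            return json.dumps({"ok": True})
        return json.dumps({"error": "unknown op"})

node = AtomicNode("thermo-1")
print(node.handle('{"op": "read"}'))
print(node.handle('{"op": "set", "name": "led", "value": true}'))
```

Whether the compute node that calls `handle` sits in a global cloud or in the same room is exactly the design choice this post is about.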

A common view of IoT is that the compute node should be in the global cloud.  This makes some sense, because ALL of the data from ALL of the atomic nodes is available for global analysis in the global cloud.  But it also gives tremendous (unfair?) advantage to the cloud owner.  Typically the cloud owner claims ownership of all this data, and they certainly have legal claim to all derived data they construct from the individual data.  Plus, they can reliably infer a lot of personal information from it:  personal habits, when you are active, what you like, which topics interest you, etc., etc., etc.  I see this as a huge loss of personal privacy.

But do we really need to push massive oceans of data from all the atomic IoT nodes up to global clouds?  Are global clouds the only viable location for compute nodes?  I say, “No”.

I believe a better way—at least for some applications—is to build local compute nodes, near the atomic nodes.  The local compute node(s) acts like a local cloud.  But none of this data is ever pushed up to a global cloud.  This achieves many of our objectives (atomic nodes remain simple, a single compute node can service multiple atomic nodes) but does not violate privacy.

I’ll have more to say about “local clouds” in upcoming posts.