Private Information Network

11 Aug 2003

The eternal computer problem: how to access my documents from anywhere?

I have been accessing my mail either via Outlook or the Outlook Web Access for the last 2 years, and don't see myself moving back to a POP model where the data is kept on the client. On the other hand keeping data on a server raises one important problem: who owns and maintains the server?
If a company owns it, there certainly is a cost and/or limitations attached to it: you may have to pay, the storage is limited, the future of the service is uncertain. If you host your own server, then the maintenance cost is actually your personal time, to patch the software, do backups (if you are very motivated) and maybe even code some tools.
Even if we set the cost question aside, the software is just not always available or good enough to do the job.

Email is one example, but there are other pieces of information that I usually want to access from multiple machines: music and video files, bookmarks, notes (typed or handwritten), contact list, code and lately RSS feeds.
Tim O'Reilly recently argued in his blog that all software should be network aware. I would add that software should also be "personal network aware": my information should be safely available from any computer I own.
For example, as I recently mentioned in my SharpReader wishlist, I want to have the same feeds and same read/unread entries from the office and from home.

Microsoft approaches this need with the "anywhere on any device" slogan, but except for a couple of products (like Exchange with its web access and Passport), I haven't seen much personal data consolidation between multiple computers.
HailStorm/MyServices was a server-based approach to this: it not only allowed me to keep some personal data in a place available from any networked computer, it also allowed me to share some of that information with selected partners. The problems with HailStorm were that it gave Microsoft quite some control over your data, and that its consent and data frameworks were too general and ambitious to stay simple and performant.
The next generation Windows filesystem (WinFS) seems to tackle the issue some more in Longhorn, by removing "the distinction between local and remote storage" and including "some built-in replication/offline capability" (sources: The Register and MSDN Online Chats: .NET Strategy).

In the meantime, not many applications handle this multi-machine need, except for web applications, obviously.

Scenarios
Bookmarks
The author of "The FurryGoat experience" mentioned in this blog that he uses emails to store bookmarks. Who has never done that ;-)

There is now a bunch of software that tackles the issues of keeping lots of bookmarks and keeping them on multiple machines.
Netscape used to support storing your bookmarks (and contacts) on a central repository, via LDAP, although I never used it.
More recently, Synchit is an application that monitors your bookmarks and keeps a web-based copy synchronized. You can then access your bookmarks from another computer, using the same application or using a browser.
Personally, I use eBookmarks, which is a web-based bookmarks repository. Although it is very primitive (no sorting or organizing in categories), I find it very useful.

Notes
How do you keep track of ideas and reflections? Some people would once more rely on email.
I use eNotes which is a custom web-based blog-like system, except that the notes are private and only available to their author.
But what about Ink notes? I don't know of any solution for these yet.

Files
Of course ftp, web and scp access are very useful, but that obviously doesn't pass the NAT/firewall test. All of my desktop boxes are firewalled and I like them that way.
You can also shuttle files between work and home using email, but you can only push files (not pull), you run into size restrictions and you have to keep track of versions and merge changes by yourself.
In terms of P2P solutions to enable personal file-sharing networks, I tried BadBlue, but it crashed on me after 10 minutes, so I can't say whether it is a useful tool.
File access should allow both directory browsing and searching.

Streams
It'd be pretty cool to be able to stream video or sound between your personal boxes. This way you could keep all your mp3 on one box, have only one set of speakers for a couple desktop boxes and play videos remotely without waiting for the download to complete.

Summary
These scenarios fall into at least three categories: remote file access, application synchronization, and streaming. A variant of file access could be delayed transfer (à la BITS).
Also, another dimension can be added by attempting to share these with other people. Although this would be very useful, it also adds quite some complexity in terms of access control.
Multiple problems can be separated: the communication (clients need to be able to find each other and communicate securely), the synchronization (data needs to be synchronized efficiently) and the access.

Communication framework
The first problem is to allow the machines in the group to know which group they belong to and communicate with each other.
So far I have only looked at JXTA's documentation; I plan on looking at the MS P2P SDK too. JXTA appears to have some really nice features: it allows firewall piercing, stream-specialized pipes, pipe relaying and "rendezvous" hosts (that you can use as first contact in the network). A couple of languages have JXTA implementations, but I couldn't find a C# port. Also, I am not sure whether it supports having a peer behind a NAT accessible via a public IP (with port forwarding). (Update: JXTA supports this configuration)
I haven't looked at the security features in JXTA yet, but it seems to allow good access control with membership and access services.

Because a client has to understand the JXTA protocol, it seems that the "rendezvous" hosts could be used as http gateways for some operations (file transfer, access control management) from "dumb" client machines.

Synchronization framework
This is probably a more difficult problem: data needs to be duplicated to n machines with no master copy; network traffic should be minimized; machines are not always online; you may have conflicting changes; applications may try to access data concurrently with the synchronization process.
Here are a couple papers on data synchronization:
Principles of synchronization by Alan Schmitt.
XML synchronization and the Harmony project, on Benjamin Pierce's page.

Given that in our case, there shouldn't be that many conflicts (one user accessing his personal data), it may be possible to simplify the data model and get rid of conflicts. In some cases, changes (renaming a bookmark, moving a directory,...) could be considered a delete followed by an add.
Also, for larger data sets (email), to avoid large storage and bandwidth problems, some replicas might not need to have the complete data. Some could be pulled on demand instead.
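
To make the "delete followed by an add" simplification more concrete, here is a minimal Python sketch. It is only one possible reading of the idea, not a worked-out design: the Replica class, its method names and the example.org URLs are all made up for illustration. Each replica keeps the set of entries ever added and the set of ids ever deleted, and merging two replicas is just a union of both sets, so the order in which machines sync does not matter.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One machine's copy of a bookmark list (hypothetical data model)."""
    added: dict = field(default_factory=dict)   # entry id -> (title, url)
    removed: set = field(default_factory=set)   # ids of deleted entries

    def add(self, title, url):
        entry_id = uuid.uuid4().hex             # unique across machines
        self.added[entry_id] = (title, url)
        return entry_id

    def delete(self, entry_id):
        self.removed.add(entry_id)

    def rename(self, entry_id, new_title):
        # A rename is recorded as a delete followed by an add.
        _, url = self.added[entry_id]
        self.delete(entry_id)
        return self.add(new_title, url)

    def merge(self, other):
        # Plain unions: merging is commutative and can be repeated safely.
        self.added.update(other.added)
        self.removed |= other.removed

    def bookmarks(self):
        return sorted(v for k, v in self.added.items() if k not in self.removed)


home, office = Replica(), Replica()
first = home.add("JXTA", "http://example.org/jxta")
office.merge(home)
office.rename(first, "Project JXTA")            # change made at the office
home.add("Harmony", "http://example.org/harmony")  # change made at home
home.merge(office)
office.merge(home)
print(home.bookmarks() == office.bookmarks())
```

The price of this simplification is that two machines renaming the same bookmark while offline end up with two entries instead of a conflict, and the list of deleted ids only ever grows.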

Access control
One of the questions is what level of granularity is needed for the permissions: per machine, per app/machine, per user,... ?

A finer access control is needed as soon as you want to run clients on boxes that you don't completely own: running a file transfer client from somebody else's box, allowing a friend to connect to your private network,...
The finer the granularity, the more complex the permission management becomes. It also raises the question of where this data should reside and how it should be accessed and maintained.
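
As a rough illustration of the granularity question, the following Python sketch looks up a permission from the finest level (machine, app, user) down to the coarsest (machine only), and denies by default. The rule table, the machine and user names and the allowed function are all assumptions made up for this example, not an existing system.

```python
# Hypothetical rule table. None means "any" at that level.
RULES = {
    ("laptop", None, None): {"read", "write"},
    ("friend-pc", None, None): set(),
    ("friend-pc", "filetransfer", None): {"read"},
    ("friend-pc", "filetransfer", "alice"): {"read", "write"},
}


def allowed(machine, app, user, action, rules=RULES):
    """Return True if the most specific matching rule grants the action."""
    candidates = (
        (machine, app, user),   # per user
        (machine, app, None),   # per app/machine
        (machine, None, None),  # per machine
    )
    for key in candidates:
        if key in rules:
            return action in rules[key]
    return False                # unknown machines get nothing


print(allowed("laptop", "notes", "me", "write"))
print(allowed("friend-pc", "filetransfer", "bob", "write"))
```

Even this toy table shows the cost: every extra level multiplies the number of rules somebody has to maintain.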

A possible approach to managing the identity problem is to have tokens that bundle an identity with a specific private network. When you want to give a friend some access to your network, generate such a token and send it to him. The token will get added to his network's personal distributed keyring and to your network's distributed access manager. You can use your access manager to control and revoke his access. One problem is how to prevent your friend from giving a copy of the token to one of his own friends.
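
Here is a minimal sketch of what such a token and access manager could look like, using only Python's standard library. The AccessManager class, the token layout (network name, identity and a random id, signed with an HMAC) and the method names are assumptions for illustration; a real distributed access manager would also have to replicate its state between machines, which is not shown.

```python
import hashlib
import hmac
import secrets


class AccessManager:
    """Hypothetical access manager for one private network."""

    def __init__(self, network_name):
        if "|" in network_name:
            raise ValueError("'|' is reserved as the token separator")
        self.network_name = network_name
        self._key = secrets.token_bytes(32)  # secret known only to this network
        self._active = {}                    # token id -> identity

    def _sign(self, payload):
        return hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, identity):
        if "|" in identity:
            raise ValueError("'|' is reserved as the token separator")
        token_id = secrets.token_hex(8)
        payload = f"{self.network_name}|{identity}|{token_id}"
        self._active[token_id] = identity
        return f"{payload}|{self._sign(payload)}"

    def verify(self, token):
        """Return the identity if the token is genuine and not revoked."""
        payload, _, signature = token.rpartition("|")
        if not hmac.compare_digest(signature.encode(), self._sign(payload).encode()):
            return None
        network, identity, token_id = payload.split("|")
        if network != self.network_name or self._active.get(token_id) != identity:
            return None
        return identity

    def revoke(self, identity):
        self._active = {t: i for t, i in self._active.items() if i != identity}


manager = AccessManager("home-net")
token = manager.issue("bob")
print(manager.verify(token))   # identity while the token is active
manager.revoke("bob")
print(manager.verify(token))   # None once revoked
```

Note that this is a plain bearer token: anybody holding a copy can present it, so the sketch does nothing about the friend-of-a-friend problem. It only makes the token unforgeable and revocable.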

Conclusion
I tried to make a case that private information networks could be a great user-oriented development in personal computing, and tried to outline the key problems to be solved for this type of infrastructure. Although they aren't mentioned, user experience and UI design problems also need to be tackled, to allow a user-friendly access to one's private network.

Security is probably one of the problems that needs more thinking, as a hacker gaining access to your private network automatically gains access to resources from multiple machines. I would think a solution is to only allow non-sensitive data onto this network (no financial records, social security information, encrypted emails, ...). In any case, sandboxing (chroot, ...) and other mitigation techniques (don't mix data and scripts/commands like Word's macros) should be used to limit the impact of one peer being compromised.

Some scenarios can be handled by stand-alone applications, while some others (media streaming, application specific data synchronization) need to be integrated into existing applications. The basic file transfer seems to be a good choice to begin experimenting with this type of infrastructure, as it can be handled in a stand-alone application.

Comments are welcome. Please let me know if the "private information network" would make sense to you and also if you know of any project that already tries to solve this.

Update:
I have just tried Nullsoft's WASTE personal file-sharing program. It only covers the file-sharing part of the private information network, but it is very convenient.

Update:
I ran into this video presentation of Chandler by Mitchell Kapor (hosted by MURL). It definitely seems related to the ideas discussed above, with a focus on email, IM and calendar information and both ubiquitous access and information sharing.

Julien Couvreur's programming blog and more