Friday, August 29, 2014

It's yer data! - how Google secured its future, and everyone else's

Dear Google,

This is a love letter and a call to action.

I believe we stand at a place where there is a unique opportunity in managing personal data.

There is a limited range of data types in the universe, and practically speaking, the vast majority of software works with a particularly tiny fraction of them.

People, for example. We know things about them.

Names, pictures of, people known, statements made, etc.

Tons of web applications conceive of these objects. Maybe not all, but probably most have some crossover. For many of the most trafficked apps, this personal data represents a very central currency. But unfortunately, up until now we've more or less been content with each app having its own currency that is not recognized elsewhere.

You can change that. You can establish a central, independent bank of data, owned by users and lent to applications in exchange for functionality. The format of the data itself will be defined and evolved by an independent agency of some sort.

There are two core things this will accomplish.

1) It will open up a whole new world of application development free from ties to you, Facebook, Twitter, etc.

2) It will give people back ownership of their data. They will be able to establish and evolve an online identity that carries forward as they change what applications they use.

Both of these have a dramatic impact on Google, as they allow you to do what you do best, building applications that work with large datasets, while at the same time freeing you from concerns that you are monopolizing people's data.

A new application world

When developing a new application, you start with an idea, and then you spend a lot of time defining a data model and the logic required to implement that idea on that data model. If you have any success with your application, you will need to invest further in your data model, fleshing it out, and implementing search, caching, and other optimizations.

In this new world, all you would do is include a library and point it at an existing data model. For the small fraction of data that was unique to your application, you could extend the existing model. For example:
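from new_world import Model, Field, String

# Point at the shared, versioned user model.
BaseUser = Model("https://new_world.org/users/1.0")

# Extend it with the one field that is unique to our application.
class OurUser(BaseUser):
    our_field = Field("our_field", type=String)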

That's it. No persistence (though you could set args somewhere to define how to synchronize), no search, no caching. Now you can get to actually building what makes your application great.
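To make the snippet above a little more concrete, here is one way a library like this might behave, as a minimal sketch. Everything in it is an assumption for illustration: `new_world` is not a real package, the `SHARED_SCHEMAS` dict stands in for a remote, independently governed registry, and the `name` and `picture_url` fields are made up.

```python
# Hypothetical sketch: none of these names come from a real library.


class Field:
    """A named, typed attribute of a model."""

    def __init__(self, name, type=str):
        self.name = name
        self.type = type


# Stand-in for the shared registry; a real one would live on a remote service.
SHARED_SCHEMAS = {
    "https://new_world.org/users/1.0": (
        Field("name", str),
        Field("picture_url", str),
    ),
}


class _Record:
    """Base behaviour shared by every model class."""

    schema_url = None
    shared_fields = ()

    @classmethod
    def fields(cls):
        """Return shared fields plus any Field declared on the class or its bases."""
        found = {f.name: f for f in cls.shared_fields}
        for klass in reversed(cls.__mro__):
            for value in vars(klass).values():
                if isinstance(value, Field):
                    found[value.name] = value
        return found

    def __init__(self, **values):
        declared = self.fields()
        self.data = {}
        for key, value in values.items():
            if key not in declared:
                raise AttributeError(f"unknown field: {key}")
            if not isinstance(value, declared[key].type):
                raise TypeError(f"{key} must be {declared[key].type.__name__}")
            self.data[key] = value


def Model(schema_url):
    """Build a base class bound to one version of a shared schema."""
    attrs = {"schema_url": schema_url, "shared_fields": SHARED_SCHEMAS[schema_url]}
    return type("SharedModel", (_Record,), attrs)


BaseUser = Model("https://new_world.org/users/1.0")


class OurUser(BaseUser):
    our_field = Field("our_field", type=str)


user = OurUser(name="Ada", our_field="app-specific value")
```

The application only declares `our_field`; the shared fields arrive with the base model, which is the whole point of the proposal.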

Conceivably, you can do it all in JavaScript, other than identifying the application uniquely to the data store.

And you can be guaranteed data interoperability with Facebook, Google, etc. So if you make a photo editing app, you can edit photos uploaded with any of those, and they can display the photos that are edited.

Securing our future

People have good reason to be suspicious of Google, Facebook, or any other organization that is able to derive value through the "ownership" of their data. Regardless of the intent of the organization today, history has shown that profit is a very powerful motivator for bad behaviour, and these caches of personal data represent a store of potential profit that we all expect will at some point prove too tempting to avoid abusing.

Providing explicit ownership and license of said data via a third party won't take away the temptation to abuse the data, but will make it more difficult in a number of ways:

  • Clear ownership will make unfair use claims much more cut-and-dried
  • A common data format will make it much easier to abandon rogue applications
  • Reduced application development overhead will increase the competitive pressure, lowering the chance of a single application monopolizing a market and needing to grow through exploitation of its users' data

A gooder, more-productive, Google

By putting people's data back in their hands, and merely borrowing it from them for specific applications, the opportunities for evil are dramatically reduced.

But what I think is even more compelling for Google here is that it will make you more productive. Internally, I believe you already operate similarly to how I've described here, but you constantly bump up against limitations imposed by trying not to be evil. Without having to worry about the perceptions of how you are using people's data, what could you accomplish?

Conclusion

Google wants to do no evil. Facebook is perhaps less explicit, but from what I know of its culture, I believe it aspires to be competent enough that there's no need to exploit users' data. The future will bring new leadership and changes in culture to both companies, but if they act soon, they can secure their moral aspirations and provide a great gift to the world.

(Interesting aside, Amazon's recently announced Cognito appears to be in some ways a relative of this idea, at least as a developer looking to build things. Check it out.)

Thursday, April 24, 2014

PyCon 2014

I've now been back from PyCon for a week, and I've got some thoughts to share.

Scope

It was huge.

I usually try to memorize everyone's names, and I have some habits that help me with that. But there were so many people, I think that may have fallen apart. :)

A lot of hero worship, as I met, or at least observed from a distance, many people who helped shape my views on software (+Tres Seaver in particular).

Conversely, I managed to avoid running into those attending from my employers (I'm looking at you, +Kenneth Reitz, Sandy Walsh, and probably someone from RIM/BlackBerry).

Diversity

All the promotion of diversity was terrific. While it's great to be part of a movement that is markedly more female-friendly than the tech community at large, Jessica McKellar made it clear that we have so much farther to go. As the father of two girls, it's very important to me that we change the culture around technology to emphasize that there's no particular skillset or aptitude that's required for entry.

Software is our world, and we can empower EVERYONE to play a part in shaping it.

Content Overview

I enjoyed the talks that I went to, but I did skip more than I was intending to. I had trouble letting go of work, and there was a lot of content that was relatively beginner focused, or represented tutorials that I knew had high-quality online counterparts, should I need them. I feel like this was a deficiency of my own, and one I hope I handle better if I come back next year.

Meta programming

I've been flirting with creating my own language for a while now, and if I were to do so, it would probably be on top of Python. Thanks to talks by +Allison Kaptur and +Paul Tagliamonte, I feel much more prepared to do so.

Allison provided a brilliant guide to implementing the import mechanism from scratch. Having read +Brett Cannon's blog when he created importlib, I knew there was a huge amount of work that went into getting it right, so it was an intimidating area. Yet in 20 minutes Allison walked us through getting something functional.
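For a sense of how small a functional starting point can be, here is a minimal sketch of a hand-rolled import. It is not the code from Allison's talk and it ignores most of what importlib handles (packages, bytecode caching, loaders, relative imports); `simple_import` and its `search_path` argument are names I made up for illustration.

```python
import sys
import types
from pathlib import Path


def simple_import(name, search_path):
    """Find <name>.py in search_path, execute it, and return the module."""
    # Reuse an already imported module, as the real import system does.
    if name in sys.modules:
        return sys.modules[name]

    for directory in search_path:
        candidate = Path(directory) / f"{name}.py"
        if not candidate.is_file():
            continue
        module = types.ModuleType(name)
        module.__file__ = str(candidate)
        # Register before executing so circular imports can see the module.
        sys.modules[name] = module
        try:
            code = compile(candidate.read_text(), str(candidate), "exec")
            exec(code, module.__dict__)
        except BaseException:
            # Do not leave a half-initialized module behind.
            del sys.modules[name]
            raise
        return module

    raise ImportError(f"No module named {name!r}")
```

Calling `simple_import("mymodule", ["."])` would load `./mymodule.py` if it exists, and a second call returns the same cached module object.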

Paul's talk on Hy was not quite so accessible, but perhaps even more inspiring. The relative ease with which Hy and Python can co-exist within the same project is just awesome, though mucking around with ASTs remains a bit of a scary idea.
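Mucking around with ASTs is less scary in small doses. The sketch below uses only the standard library `ast` module to rewrite an expression before running it. It has nothing to do with Hy's actual implementation; `SwapAddForMult` and `run_transformed` are hypothetical names for a toy transformation.

```python
import ast


class SwapAddForMult(ast.NodeTransformer):
    """Rewrite every `a + b` in the tree into `a * b`."""

    def visit_BinOp(self, node):
        # Transform nested operations first.
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


def run_transformed(source):
    """Parse an expression, rewrite its AST, then compile and evaluate it."""
    tree = ast.parse(source, mode="eval")
    tree = SwapAddForMult().visit(tree)
    # New nodes lack line numbers; fill them in so compile() accepts the tree.
    ast.fix_missing_locations(tree)
    return eval(compile(tree, "<ast>", "eval"))
```

With this in place, `run_transformed("2 + 3")` evaluates `2 * 3`. The same parse, transform, compile pipeline is what lets a different surface syntax share a runtime with ordinary Python code.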

Sprints

While I was skipping talks, I consoled myself with the thought that I would really engage during the Sprints (I had a day and a half scheduled for these). But I didn't, and while I think that had more to do with me (once again, I worked), I'll share what I think could have been done better, in case anyone else felt the same way.

Technically Sprints started Sunday evening, but I get the feeling that no one was actually interested in running them Sunday evening (or maybe my timing was off). There were a handful of people there, but no clear organization or plan about what was to be worked on.

Monday morning, it was certainly better attended, but it still wasn't inviting. There was a central chart of what rooms contained what projects, but within the rooms there was no indication of who was working on what. From my limited time being involved in or running other short coding sessions, I was also surprised not to see much use of flipcharts or whiteboards.

I guess how I would do it, if I ran it next year (I'm not volunteering, yet), is provide signs for each project to put at their table, and encourage each of them to write out a list of goals for the sprint in a place where it can be publicly examined and crossed off. Perhaps also provide special shirts/hats/badges to the "leader" of each sprint. The experience I would like is for someone to be able to wander in, examine what each project is doing without actually bothering anybody, and then if they find something they think could fit them, to know who to ask.

Misc.

  • Ansible is what we're using at SFX, and while I've had some experience with it, I have a much more robust appreciation for it, thanks to +Michael DeHaan
  • Peep should be included in the standard library. Seriously.
  • Asyncio makes things most people do easier. Bravo!
  • IPython Notebook is cool, and +Catherine Devlin's talk about executable documentation has me itching to try it out.
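As a small illustration of the asyncio point in the list above, here is a sketch of waiting on two slow operations at once. It uses the `async`/`await` syntax from later Python versions rather than the `yield from` style that was current at the time of the conference, and `fetch` is a made-up stand-in for real network I/O.

```python
import asyncio


async def fetch(name, delay):
    """Pretend to do slow I/O, then return a result."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Both coroutines wait concurrently; gather returns results in call order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The total wait is roughly the longest delay, not the sum, without any threads or callbacks in sight.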

Conclusion

As someone who has been around the block but doesn't find much time to actually code anymore, I may not be the core audience for PyCon. But I'm still delighted to have finally made it to one, and I'm really tempted to make it a family trip next year.