Friday Blast #30
Differential privacy for dummies (2016) - differential privacy is a way of working with data that doesn’t reveal information about any one person or item in the data set. A neat example is asking people whether they are criminals, drug users etc. There is a protocol for adding noise to the data, which each person can apply in turn as they answer, so that (1) you can’t tell whether any one person is actually in the target class and (2) you can still compute population queries such as the total number of people in the class, the proportion of people in the class etc. There are a bunch more examples there, as well as a brief theoretical overview. Interesting stuff for the age of big data and big surveillance.
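To make the noise-adding idea concrete, here is a minimal sketch of the classic coin-flip "randomized response" scheme. I'm assuming this is the kind of protocol the article means; the function names and the 50/50 coin probabilities are my own choices for illustration, not taken from the article.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Answer a sensitive yes/no question with plausible deniability.

    First coin: heads -> answer truthfully.
    Tails -> flip a second coin and report that instead.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_proportion(responses) -> float:
    """Recover the population proportion from the noisy answers.

    P(yes) = 0.5 * p + 0.25, so p = 2 * (P(yes) - 0.25).
    """
    observed = sum(responses) / len(responses)
    return 2 * (observed - 0.25)


def simulate(true_proportion=0.3, population=100_000, seed=0):
    """Simulated survey (hypothetical numbers, no real data)."""
    rng = random.Random(seed)
    truths = [rng.random() < true_proportion for _ in range(population)]
    responses = [randomized_response(t, rng) for t in truths]
    actual = sum(truths) / population
    return actual, estimate_proportion(responses)


if __name__ == "__main__":
    actual, estimated = simulate()
    print(f"actual: {actual:.3f}, estimated: {estimated:.3f}")
```

Any single "yes" could have come from the second coin, so it says little about the individual, yet the estimate over the whole population should land close to the real proportion once the sample is large enough.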
SSTable and log structured Storage: LevelDB (2012) - we’ve covered log-structured merge-trees here before, but mostly in the context of their use in Big Data storage engines like BigTable, HBase, Cassandra etc. LevelDB is a storage engine built with the same methods, but aimed at “small data”. It’s an alternative to SQLite or flat files. The article goes into some detail about SSTables, which are the underlying data structure in many database engines (of the NoSQL variety at least).
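For a rough feel of the SSTable idea, here is a toy in-memory sketch: writes go to a small mutable table, which gets frozen into a sorted, immutable run once it fills up, and reads check the newest data first. The class names (`SSTable`, `TinyLSM`) and the `memtable_limit` parameter are made up for illustration. This is not LevelDB's API or on-disk format, and it leaves out deletes, compaction and persistence entirely.

```python
import bisect


class SSTable:
    """Immutable, sorted run of key/value pairs (kept in memory here)."""

    def __init__(self, items: dict):
        pairs = sorted(items.items())
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key):
        # Keys are sorted, so a binary search finds the slot.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


class TinyLSM:
    """Toy store: a mutable memtable plus a list of frozen SSTables."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}
        self.sstables = []  # oldest first, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable into a sorted run.
            self.sstables.append(SSTable(self.memtable))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer tables shadow older ones, so search newest first.
        for table in reversed(self.sstables):
            value = table.get(key)
            if value is not None:
                return value
        return None
```

A real engine writes each run to disk sequentially and merges runs in the background; the point of the sketch is only that sorted immutable runs keep writes cheap while lookups stay a binary search per run.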
Protocol buffers, Avro, Thrift & Message Pack (2011) - just a bit of commentary on the various serialization and RPC mechanisms.
Measuring & optimizing I/O performance (2009) - written in the age of disks, there’s still a bit of interesting stuff here. Mainly around the use of
Terraforming Stack Overflow Enterprise in AWS (2018) - the story of how Palantir “productionized” their install of Stack Overflow Enterprise. A nice overview of the SO architecture is included as well. If you’re a big company looking for a knowledge management solution, do know that SOE exists.
Evaluating options for Amazon’s HQ2 using Stack Overflow data (2018) - a data journalism article from our data team’s Julia Silge. Amazon is looking for a place to set up its second headquarters in the US and is running a sort of contest for cities. Much like the Olympics, it seems like a losing proposition for cities, but they can’t afford not to participate. Julia looked at the technical characteristics of the workforce in the 20 finalist cities to see how well they’d match the things Amazon would need.