Friday, August 1, 2014
Daily Tag 08/02/2014
Making Pinterest — Powering big data at Pinterest
tags: pinterest data hive truth qubole
- We orchestrate all our jobs (whether Hive, Cascading, Hadoop Streaming or otherwise) in such a way that they keep the Hive Metastore consistent with what data exists on disk. This makes it possible to update data on disk across multiple clusters and workflows without having to worry about any consumer getting partial data.
- To balance flexibility, speed and isolation, we created an isolated working directory for each developer on S3.
- As we scaled to a few hundred nodes, EMR became less stable and we started running into limitations of EMR’s proprietary versions of Hive.
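
One way to picture the "no partial data" property from the first note is a write-then-register pattern: a job writes its output somewhere consumers do not look, and only registers the partition in the metastore once the data is complete. The sketch below is a guess at the general idea, not Pinterest's implementation. `ToyMetastore`, `publish_partition` and `read_partition` are hypothetical names, the metastore is an in-memory dict rather than a real Hive Metastore, and a local directory stands in for S3.

```python
import json
import os
import tempfile


class ToyMetastore:
    """Hypothetical stand-in for a metastore: maps (table, partition) to a data path."""

    def __init__(self):
        self._partitions = {}

    def add_partition(self, table, partition, path):
        self._partitions[(table, partition)] = path

    def get_partition_path(self, table, partition):
        return self._partitions.get((table, partition))


def publish_partition(metastore, root, table, partition, rows):
    """Write rows to a staging directory, then register the finished partition."""
    final_dir = os.path.join(root, table, partition)
    os.makedirs(os.path.dirname(final_dir), exist_ok=True)

    # Step 1: write everything into a staging directory that consumers never read.
    staging_dir = tempfile.mkdtemp(prefix=".staging-", dir=os.path.join(root, table))
    with open(os.path.join(staging_dir, "part-00000.json"), "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

    # Step 2: move the completed output to its final location.
    os.rename(staging_dir, final_dir)

    # Step 3: only now make the partition visible through the metastore.
    metastore.add_partition(table, partition, final_dir)
    return final_dir


def read_partition(metastore, table, partition):
    """Consumers resolve data through the metastore, never by listing directories."""
    path = metastore.get_partition_path(table, partition)
    if path is None:
        return None  # partition not published yet
    rows = []
    for name in sorted(os.listdir(path)):
        with open(os.path.join(path, name)) as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    return rows
```

Because readers go through the metastore, the registration in step 3 acts as the commit point. That matters on S3 in particular, where a rename is a copy followed by a delete rather than an atomic operation, so the move in step 2 cannot be relied on by itself to hide partial output.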