Making Pinterest — Powering big data at Pinterest
- We orchestrate all our jobs (whether Hive, Cascading, Hadoop Streaming, or otherwise) in such a way that they keep the Hive Metastore consistent with what data exists on disk. This makes it possible to update data on disk across multiple clusters and workflows without having to worry about any consumer getting partial data.
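A minimal sketch of the write-then-publish pattern that keeps the metastore consistent with data on disk. The `Metastore` class below is an in-memory stand-in, not the real Hive Metastore API, and the S3 paths are hypothetical; the point is that consumers resolve data through the metastore, so a partition location only flips after its data is fully written.

```python
# Toy metastore: maps (table, partition) -> data location.
# Readers always resolve through it, never list directories directly.
class Metastore:
    def __init__(self):
        self.partitions = {}

    def publish(self, table, partition, location):
        # Single pointer swap: readers see either the old complete
        # location or the new complete one, never a half-written dir.
        self.partitions[(table, partition)] = location

    def resolve(self, table, partition):
        return self.partitions.get((table, partition))


def rewrite_partition(metastore, storage, table, partition, rows):
    # 1. Write new data to a fresh, unpublished location (path is made up).
    new_location = f"s3://warehouse/{table}/{partition}/run-2"
    storage[new_location] = list(rows)  # fully materialized first
    # 2. Only then flip the metastore pointer.
    metastore.publish(table, partition, new_location)


# Simulated S3 plus an already-published partition.
storage = {"s3://warehouse/events/dt=2013-01-01/run-1": ["old"]}
ms = Metastore()
ms.publish("events", "dt=2013-01-01",
           "s3://warehouse/events/dt=2013-01-01/run-1")

rewrite_partition(ms, storage, "events", "dt=2013-01-01", ["new-a", "new-b"])
loc = ms.resolve("events", "dt=2013-01-01")
print(storage[loc])  # consumers read the complete new data
```

Because every consumer goes through the metastore pointer, a job on another cluster that resolves the partition mid-rewrite still reads the old, complete copy.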
- To balance flexibility, speed and isolation, we created an isolated working directory for each developer on S3.
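One way the per-developer isolation could look: each developer gets a private S3 prefix so experimental job output never collides with production data or with another developer's runs. The bucket name and layout below are assumptions for illustration, not Pinterest's actual scheme.

```python
def dev_workdir(bucket, username, workflow):
    # Private prefix per developer per workflow (hypothetical layout).
    return f"s3://{bucket}/dev/{username}/{workflow}/"


print(dev_workdir("pinterest-scratch", "alice", "daily_signups"))
# s3://pinterest-scratch/dev/alice/daily_signups/
```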
- As we scaled to a few hundred nodes, EMR became less stable and we started running into limitations of EMR’s proprietary versions of Hive.