-
Streaming API track approx. 5000 hashtags, best approach to get results? | Twitter Developers
-
Request Rate and Performance Considerations - Amazon Simple Storage Service
- If you must use sequential numbers or date and time patterns in key names, add a random prefix to the key name.
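A minimal sketch of the key-prefix idea above, assuming a short hash of the key stands in for the "random" prefix so that the full key can be recomputed later from the original name. The helper name `prefixed_key` and the 4-character prefix length are hypothetical choices for illustration, not anything taken from the S3 documentation.

```python
import hashlib


def prefixed_key(key: str, prefix_len: int = 4) -> str:
    """Return the key with a short hex prefix derived from its MD5 digest.

    The prefix spreads sequential names (timestamps, counters) across the
    key space, and because it is derived from the key it can be recomputed.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


if __name__ == "__main__":
    # Hypothetical date-patterned key names.
    for i in range(3):
        print(prefixed_key(f"2014-03-28-15-00-{i:02d}/photo.jpg"))
```

A truly random prefix would spread keys just as well, but the writer would then need to store the prefix somewhere in order to find the object again.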
-
-
HanWorks Research: Down the Rabbit Hole with Kafka
- What is stored in ZooKeeper regarding your Kafka broker?
Basically the mapping from broker id to (host, port) and the mapping from (topic, broker id) to number of partitions.
- The big difference here is that the high-level API handles broker discovery and consumer rebalancing and keeps track of state (i.e. offsets) in ZooKeeper, while the low-level API does not.
- The benefit of using the high-level API is that a consumer will not be starved if a broker fails in a given cluster of Kafka brokers. When the failed broker is restored, messages will then be consumed from that broker.
-
Friday, March 28, 2014
Daily Tag 03/29/2014
Thursday, March 27, 2014
Wednesday, March 26, 2014
Daily Tag 03/27/2014
-
Going Go Programming: Installing Go, Gocode, GDB and LiteIDE
"LiteIDE"
-
Java 8: From PermGen to Metaspace ~ Java EE Support Patterns
-
Java 8 Released! — Lambdas Tutorial | Javalobby
"Lambdas"
Tuesday, March 25, 2014
Monday, March 24, 2014
Sunday, March 23, 2014
Saturday, March 22, 2014
Friday, March 21, 2014
Daily Tag 03/22/2014
-
Scalable real time state update with Storm groupBy / persistentAggregate / IBackingMap | Svend
-
-
- We upped the number of file descriptors since we have lots of topics and lots of connections.
- We upped the max socket buffer size to enable high-performance data transfer between data centers.
- We generally feel that the guarantees provided by replication are stronger than sync to local disk; however, the paranoid may still prefer having both, and application-level fsync policies are still supported.
- JMX and the 4 letter commands
-
-
tags: DSRG MIT distributed reading paper
-
tags: netflix data architecture
-
Stream Processing and Mining just got more interesting - Data
Thursday, March 20, 2014
Daily Tag 03/21/2014
-
- This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group.
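A rough sketch of the assignment described above, assuming a simple range-style split: partitions are sorted, consumers are sorted, and each consumer receives one contiguous slice. The function name `assign_partitions` and the consumer ids are made up for illustration. This is not Kafka's rebalancing code, only a way to see why each partition ends up with exactly one consumer in the group.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer in the group.

    Consumers earlier in sorted order receive one extra partition when the
    partition count does not divide evenly.
    """
    consumers = sorted(consumers)
    partitions = sorted(partitions)
    base, extra = divmod(len(partitions), len(consumers))

    assignment = {}
    start = 0
    for index, consumer in enumerate(consumers):
        count = base + (1 if index < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    print(assign_partitions(range(5), ["consumer-b", "consumer-a"]))
```

If there are more consumers than partitions, some consumers get an empty list and sit idle, which follows directly from the "exactly one consumer per partition" rule.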
-
-
Running and debugging Clojure code with Intellij IDEA ~ Tomek Lipski's blog
-
David, a dependency management tool for Node.js projects
tags: david dependency npm node
Wednesday, March 19, 2014
Daily Tag 03/20/2014
-
- You might be wondering – how do you do something like a "windowed join", where tuples from one side of the join are joined against the last hour of tuples from the other side of the join?
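To make the "windowed join" idea concrete, here is a small in-memory sketch. It is not how Trident implements it; it only illustrates the concept. It assumes timestamps are in seconds and arrive in non-decreasing order, and the class and method names (`WindowedJoin`, `add_right`, `join_left`) are invented for this example.

```python
from collections import deque


class WindowedJoin:
    """Join each left tuple against right tuples from the last N seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.buffer = deque()  # (timestamp, key, value), oldest first

    def add_right(self, ts, key, value):
        """Remember a tuple from the right side of the join."""
        self.buffer.append((ts, key, value))

    def join_left(self, ts, key, value):
        """Return (left_value, right_value) pairs for matching keys in the window."""
        # Drop right-side tuples that have fallen out of the window.
        while self.buffer and self.buffer[0][0] < ts - self.window:
            self.buffer.popleft()
        return [(value, rv) for (_, rk, rv) in self.buffer if rk == key]


if __name__ == "__main__":
    join = WindowedJoin(window_seconds=3600)
    join.add_right(0, "user-1", "click-a")
    join.add_right(3000, "user-1", "click-b")
    print(join.join_left(4000, "user-1", "purchase"))
```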
-
-
Trident tutorial · nathanmarz/storm Wiki
tags: transaction id atomic storm
- store the transaction id with the count in the database as an atomic value
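A sketch of why storing the transaction id next to the count helps, assuming that a replayed batch carries the same transaction id and the same contents as the original attempt, and that batches are applied in order. The dictionary stands in for the database, and the names `TransactionalCounter` and `apply_batch` are hypothetical.

```python
class TransactionalCounter:
    """Keep (txid, count) together so a replayed batch is applied only once."""

    def __init__(self):
        self.store = {}  # key -> (last_txid, count); stands in for the database

    def apply_batch(self, txid, key, partial_count):
        last_txid, count = self.store.get(key, (None, 0))
        if last_txid == txid:
            # This batch was already applied; a replay must not count twice.
            return count
        new_count = count + partial_count
        # Written as one value so the txid and the count can never disagree.
        self.store[key] = (txid, new_count)
        return new_count


if __name__ == "__main__":
    counter = TransactionalCounter()
    print(counter.apply_batch(1, "man", 2))
    print(counter.apply_batch(1, "man", 2))  # replay of the same batch
    print(counter.apply_batch(2, "man", 3))
```

In a real database the read, compare, and write would need to happen as one atomic operation; the sketch skips that detail.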
-
-
tags: windowed aggregation scidb
-
Concepts · nathanmarz/storm Wiki
- transforming a stream of tweets into a stream of trending images requires at least two steps: a bolt to do a rolling count of retweets for each image, and one or more bolts to stream out the top X images (you can do this particular stream transformation in a more scalable way with three bolts than with two).
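The two steps in that example can be sketched outside Storm as a rolling counter feeding a top-X selection. This is only an illustration of the logic, not bolt code. It assumes timestamps in seconds arriving in order, and the names `RollingCount` and `top_x` are invented here.

```python
from collections import Counter, deque


class RollingCount:
    """Step 1: count retweets per image over a sliding time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, image_id), oldest first
        self.counts = Counter()

    def record(self, ts, image_id):
        self.events.append((ts, image_id))
        self.counts[image_id] += 1
        self._expire(ts)

    def _expire(self, now):
        # Remove events that are older than the window and undo their counts.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_image = self.events.popleft()
            self.counts[old_image] -= 1
            if self.counts[old_image] == 0:
                del self.counts[old_image]


def top_x(counts, x):
    """Step 2: pick the X images with the highest rolling counts."""
    return counts.most_common(x)


if __name__ == "__main__":
    rolling = RollingCount(window_seconds=60)
    for ts, image in [(0, "cat.png"), (10, "dog.png"), (20, "cat.png")]:
        rolling.record(ts, image)
    print(top_x(rolling.counts, 1))
```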
-
-
Guaranteeing message processing · nathanmarz/storm Wiki
- The second value is a 64 bit number called the "ack val". The ack val is a representation of the state of the entire tuple tree, no matter how big or how small. It is simply the xor of all tuple ids that have been created and/or acked in the tree.
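The xor trick is easy to see in a few lines. In this sketch every tuple id is xored into the ack val once when the tuple is created and once when it is acked, so the value returns to zero exactly when every created tuple has been acked. The class name `AckTracker` is made up, and the random 64-bit ids are an assumption for the demo.

```python
import random


class AckTracker:
    """Track a whole tuple tree with a single 64-bit value."""

    def __init__(self):
        self.ack_val = 0

    def update(self, tuple_id):
        """Xor in a tuple id (on create or on ack); True when the tree is done."""
        self.ack_val ^= tuple_id
        return self.ack_val == 0


if __name__ == "__main__":
    tracker = AckTracker()
    ids = [random.getrandbits(64) for _ in range(3)]
    for tuple_id in ids:      # tuples created in the tree
        tracker.update(tuple_id)
    for tuple_id in ids:      # the same tuples acked, in any order
        done = tracker.update(tuple_id)
    print("tree fully processed:", done)
```

Because xor is commutative and each id cancels itself, the order of creates and acks does not matter, and memory use stays constant no matter how large the tree grows.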
-
Tuesday, March 18, 2014
Monday, March 17, 2014
Friday, March 14, 2014
Thursday, March 13, 2014
Wednesday, March 12, 2014
Tuesday, March 11, 2014
Friday, March 7, 2014
Daily Tag 03/08/2014
-
High Scalability - The WhatsApp Architecture Facebook Bought For $19 Billion
tags: whatsapp architecture erlang facebook scalability
- It’s for the 450 million active users, with a user base growing at one million users a day, with a potential for a billion users. Facebook needs WhatsApp for its next billion users. Certainly that must be part of it. And a cost of about $40 a user doesn’t seem unreasonable, especially with the bulk paid out in stock. Facebook acquired Instagram for about $30 per user. A Twitter user is worth $110.
- Interesting to note Facebook Chat was written in Erlang in 2009, but they went away from it because it was hard to find qualified programmers.
- How does the registration process work internally in WhatsApp? WhatsApp used to create a username/password based on the phone IMEI number. This was changed recently. WhatsApp now uses a general request from the app to send a unique 5 digit PIN. WhatsApp will then send an SMS to the indicated phone number (this means the WhatsApp client no longer needs to run on the same phone). Based on the PIN, the app then requests a unique key from WhatsApp. This key is used as the "password" for all future calls (this "permanent" key is stored on the device). This also means that registering a new device will invalidate the key on the old device.
- Initially ran once a minute. As the systems were driven harder, one-second polling resolution was required because events that happened in the space of a minute were invisible. Really fine-grained stats to see how everything is performing.
- Keep server count low. Constantly work to keep server counts as low as possible while leaving enough headroom for events that create short-term spikes in usage. Analyze and optimize until the point of diminishing returns is hit on those efforts and then deploy more hardware.
-
-
Centralized Logging - Jason Wilder's Blog
tags: centralized logging
-
Analytics/Kraken/Logging Solutions Recommendation - Wikitech
-
Using elasticsearch, logstash & kibana to create realtime dashboards // Speaker Deck
Thursday, March 6, 2014
Wednesday, March 5, 2014
Daily Tag 03/06/2014
Monday, March 3, 2014
Sunday, March 2, 2014
Saturday, March 1, 2014
Daily Tag 03/02/2014
-
Immediate Opening: Quantitative Research Analysts (Data Scientists) | LinkedIn
- Computer proficiency should include familiarity with UNIX, shell scripting, distributed/parallel computing, a scripting language such as Python or Perl, fluency with regular expressions, and one of: Haskell, Julia, R, or MATLAB. Demonstrated proficiency in a wide range of programmer’s tools such as sed, awk, xargs, etc. Familiarity with scikit-learn, HLearn, Orange, Weka, or another machine learning library.
-