Is there a story for the Hadoop Storage Stack (HDFS+HBase) on Solid State Drives (SSDs)? This is a question that I have been asked by quite a few people in the last two days, mostly by people at ...
http://hadoopblog.blogspot.com/2012/05/hadoop-and-solid-state-drives.html
Recently, I was asked to write up my vision of a BigData Benchmark. That raises the question: What is BigData? Does it refer to a dataset that is large in size, and if so, what is large? Does...
http://hadoopblog.blogspot.com/2012/02/salient-features-for-bigdata-benchmark.html
I blogged earlier about why Facebook is starting to use Apache Hadoop technologies to serve realtime workloads. We presented the paper at the SIGMOD 2011 conference and it was very well re...
http://hadoopblog.blogspot.com/2011/07/realtime-hadoop-usage-at-facebook.html
This is the second part of our SIGMOD-2011 paper that describes our use case for Apache Hadoop and Apache HBase in realtime workloads. You can find the first part here. We describe why Hadoop...
http://hadoopblog.blogspot.com/2011/05/realtime-hadoop-usage-at-facebook-part_28.html
Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. It uses HDFS and HBase as core technologies for this solution. Since th...
http://hadoopblog.blogspot.com/2011/05/realtime-hadoop-usage-at-facebook-part.html
Many people have asked me to describe the best practices that we have adopted to run a multi-PB data warehouse using Hadoop. Most of the details were described in a paper that we presented at SI...
http://hadoopblog.blogspot.com/2011/04/data-warehousing-at-facebook.html
Recently, I visited a few premier educational institutes in India, e.g. Indian Institute of Technology (IIT) at Delhi and Guwahati. Most of the undergraduate students at these two institutes are ...
http://hadoopblog.blogspot.com/2010/11/hadoop-research-topics.html
HANDLING FAILURES IN HADOOP I have been asked many many questions about the failure rates of machines in our Hadoop cluster. These questions vary from the innocuous how much time do you spend e...
http://hadoopblog.blogspot.com/2010/06/americas-most-wanted-metric-to-detect.html
IT IS NOT A SECRET ANYMORE! The Datawarehouse Hadoop cluster at Facebook has become the largest known Hadoop storage cluster in the world. Here are some of the details about this single HDFS cl...
http://hadoopblog.blogspot.com/2010/05/facebook-has-worlds-largest-hadoop.html
INTRODUCTION HDFS is designed to be a highly scalable storage system, and sites at Facebook and Yahoo have 20 PB file systems in production deployments. The HDFS NameNode is the master ...
http://hadoopblog.blogspot.com/2010/04/curse-of-singletons-vertical.html
OUR USE-CASE The Hadoop Distributed File System's (HDFS) NameNode is a single point of failure. This has been a major stumbling block in using HDFS for a 24x7 type of deployment. It has been a t...
http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
I have encountered plenty of questions about the single point of failure for the HDFS NameNode. The most common concern is that if the NameNode dies, then the whole cluster is unavailable. Thi...
http://hadoopblog.blogspot.com/2009/11/hdfs-high-availability.html
I was invited to present a talk about Hadoop File System Architecture at Microsoft Research in Seattle. This is a research group focused on long-term research, so it is no surprise that t...
http://hadoopblog.blogspot.com/2009/10/hadoop-discussions-at-microsoft.html
I presented a set of slides that describe Hadoop development at Facebook at the HadoopWorld conference in New York today. It was well received by more than 100 people. I have presented at ...
http://hadoopblog.blogspot.com/2009/10/i-presented-set-of-slides-that.html
Most Hadoop administrators set the default replication factor for their files to be three. The main assumption here is that if you keep three copies of the data, your data is safe. I have observe...
http://hadoopblog.blogspot.com/2009/09/hdfs-block-replica-placement-in-your.html
The Hadoop Distributed File System has been great in providing a cloud-type file system. It is robust (when administered correctly :-)) and highly scalable. However, one of the main drawbacks of ...
http://hadoopblog.blogspot.com/2009/08/hdfs-and-erasure-codes-hdfs-raid.html
My graduate work in the mid-nineties at the University of Wisconsin focused on Condor. Condor has an amazing way to do process checkpointing and migrating processes from one machine to another ...
http://hadoopblog.blogspot.com/2009/07/hadoop-and-condor.html
Netflix is interested in using Hadoop/Hive to process click logs from the users of their website. Here is what I presented to them in a meeting that was well attended by about 50 engineers. Follo...
http://hadoopblog.blogspot.com/2009/06/hadoop-at-netflix.html
It is finally here: you can configure the open source log-aggregator, scribe, to log data directly into the Hadoop distributed file system. Many Web 2.0 companies have to deploy a bunch of cos...
http://hadoopblog.blogspot.com/2009/06/hdfs-scribe-integration.html
I attended the UC Berkeley RAD Lab Spring Retreat, held in Santa Cruz. The lab has about 30 PhD students, and the quality of their work really impressed me. Most of their work is base...
http://hadoopblog.blogspot.com/2009/05/report-from-my-visit-to-berkeley-rad.html
For quite a while, I have been thinking about blogging on Hadoop in general and the Hadoop distributed file system (HDFS) in particular. Why, you may ask? Firstly, I have been contacted by students ...
http://hadoopblog.blogspot.com/2009/05/better-late-than-never.html