Retail is one of the most important business domains for data science and data mining applications because of its prolific data and numerous optimization problems such as optimal prices, discount...
https://highlyscalable.wordpress.com/2015/03/10/data-mining-problems-in-retail/
The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-str...
https://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
Scalability is one of the main drivers of the NoSQL movement. As such, it encompasses distributed system coordination, failover, resource management and many other capabilities. It sounds like a ...
https://highlyscalable.wordpress.com/2012/09/18/distributed-algorithms-in-nosql-databases/
We recently worked with one of the Hadoop vendors on the continuous integration system for Hadoop core and other Hadoop-related projects like Pig, Hive, and HBase. One of the challenges we faced was ...
Intersection of sorted lists is a cornerstone operation in many applications including search engines and databases because indexes are often implemented using different types of sorted structu...
https://highlyscalable.wordpress.com/2012/06/05/fast-intersection-sorted-lists-sse/
Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets o...
https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
Some time ago I participated in the design of a backend for a large online retailer. From a business logic point of view, this was a pretty typical eCommerce service for hierarchical and ...
https://highlyscalable.wordpress.com/2012/04/02/architecture-of-high-performance-ecommerce-backend/
NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This aspect of NoSQL is well-studied both in practice and theory because ...
https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
Java was initially designed as a safe, managed environment. Nevertheless, the Java HotSpot VM contains a “backdoor” that provides a number of low-level operations to manipulate memory and threads...
https://highlyscalable.wordpress.com/2012/02/02/direct-memory-access-in-java/
In this article I digested a number of MapReduce patterns and algorithms to give a systematic view of the different techniques that can be found on the web or in scientific articles. Several pract...
https://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/