Category Archives: big-data

Big Data Ecosystem

Tools in the big data ecosystem, summarized from Quora. MapReduce is the Google paper that started it all (MapReduce paper). It’s a paradigm for writing distributed code, inspired by some elements of functional programming. The Google internal implementation is … Continue reading
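To make the functional-programming connection concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using word counting as the example. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration and do not come from the paper or from any framework; a real system would spread these steps across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Fold all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    # Map every document independently, then group by word and reduce each group.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data tools", "big data ecosystem"])
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the calls could in principle run on different machines, which is the property the paradigm relies on.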

Posted in big-data

Big-data introduction series – Apache Spark

What is Apache Spark? Apache Spark is an open-source data processing framework for big data analytics. It is a unified, parallel data processing framework, designed to cover a wide range of big data workloads such as batch processing, real-time processing, … Continue reading

Posted in big-data

Data Mining

ML algorithms are an evolution of ordinary algorithms. They make your programs “smarter” by allowing them to learn automatically from the data you provide. https://www.quora.com/How-do-you-explain-Machine-Learning-and-Data-Mining-to-non-Computer-Science-people The top 10 algorithms used in data mining. This paper presents the top 10 data … Continue reading

Posted in big-data

Introduction to Hadoop and Map Reduce

Big Data – Introduction to Hadoop. Hadoop is a MapReduce framework for processing large datasets in parallel on clusters of commodity hardware. This is cheaper, as it’s an open-source solution that can run on commodity hardware. It’s … Continue reading

Posted in big-data