Skip to content

Latest commit

 

History

History
40 lines (16 loc) · 988 Bytes

Big Data.md

File metadata and controls

40 lines (16 loc) · 988 Bytes

Big Data

s

Big Data involves the processing and analysis of large and complex data sets using tools like Hadoop and Spark. As a software engineer, you'll need to be familiar with concepts like data warehousing, data mining, and data visualization, as well as tools like SQL and Tableau. Here's an example of a simple Spark program written in Scala that counts the number of occurrences of each word in a text file:

import org.apache.spark.SparkContext

import org.apache.spark.SparkConf

object WordCount {

def main(args: Array[String]) {

val conf = new SparkConf().setAppName("Word Count")

val sc = new SparkContext(conf)



val textFile = sc.textFile("hdfs://input.txt")

val wordCounts = textFile.flatMap(line => line.split(" "))

  .map(word => (word, 1))

  .reduceByKey(_ + _)



wordCounts.saveAsTextFile("hdfs://output")

}

}