Add Spark-related dependencies and packaging plugins (Part 6)


Table of contents

Add Spark-related dependencies and packaging plugins
Step 1: Open pom.xml, add the dependencies, and enable auto-import
Step 2: Create the package cn.itcast
Step 3: Create WordCount.scala for word-frequency statistics
Possible problems and solutions


Add Spark-related dependencies and packaging plugins

Step 1: Open pom.xml, add the following dependencies, and click Enable Auto-Import in the lower-right corner to download them automatically.

<!-- Set the dependency version numbers -->
<properties>
  <scala.version>2.11.8</scala.version>
  <hadoop.version>2.7.1</hadoop.version>
  <spark.version>2.0.0</spark.version>
</properties>

<dependencies>
  <!-- Scala -->
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <!-- Spark -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <!-- Hadoop -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>

Select auto-import when prompted after adding the dependencies.
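The section heading also mentions packaging plugins, but the original post only shows the dependencies. A typical <build> section for a Maven/Scala Spark project looks like the sketch below; the plugin versions are assumptions and should be checked against your environment.

```xml
<build>
  <plugins>
    <!-- Compiles the Scala sources during the Maven build -->
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>3.2.2</version>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <!-- Packages the application together with its dependencies into one jar -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
  </plugins>
</build>
```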

Step 2: Right-click the scala folder under main, create a package, and name it cn.itcast.

Step 3: Create a WordCount.scala file for word-frequency statistics (use Alt+Enter to import missing packages).

Problem: there is no option to create a Scala file.

Solution: make sure the Scala SDK has been added to the project (see "Possible problems" below), then create the file.

Note: the words.txt file under the word folder must be created on the D drive in advance; it is best not to use Chinese characters in the path.
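The original screenshot of the file contents is not preserved, so the sample below is hypothetical; any whitespace-separated text works, and these lines are chosen to match the word counts shown later (hello ×3, itcast/Hadoop/spark ×1 each). As a sketch, the file can also be created programmatically:

```scala
import java.nio.file.{Files, Path, Paths}
import java.nio.charset.StandardCharsets

object CreateWordsFile {
  // Creates words.txt inside dirPath with hypothetical sample contents
  def writeSample(dirPath: String): Path = {
    val dir = Paths.get(dirPath)
    Files.createDirectories(dir) // create the folder if it does not exist
    val sample = "hello itcast\nhello Hadoop\nhello spark\n" // hypothetical contents
    Files.write(dir.resolve("words.txt"), sample.getBytes(StandardCharsets.UTF_8))
  }
}
```

On Windows this would be called as `CreateWordsFile.writeSample("D:\\word")`; again, avoid Chinese characters in the path.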

package cn.itcast

// import packages
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // 1. Create a SparkConf object and set the appName and master address
    val sparkconf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    // 2. Create a SparkContext object, the source of all task computation;
    //    it creates the DAGScheduler and TaskScheduler
    val sparkContext = new SparkContext(sparkconf)
    // Set the log level
    //sparkContext.setLogLevel("WARN")
    // 3. Read the data file; an RDD can be understood simply as a collection whose elements are Strings
    val data: RDD[String] = sparkContext.textFile("D:\\word\\words.txt")
    // 4. Split each line to get all the words
    val words: RDD[String] = data.flatMap(_.split(" "))
    // 5. Mark each word with a 1, converting it to (word, 1)
    val wordAndOne: RDD[(String, Int)] = words.map(x => (x, 1))
    // 6. Sum up identical words; the first underscore is the accumulated value, the second the new value
    val result: RDD[(String, Int)] = wordAndOne.reduceByKey(_ + _)
    // 7. Collect and print the result data
    val finalResult: Array[(String, Int)] = result.collect()
    println(finalResult.toBuffer)
    // 8. Stop the SparkContext
    sparkContext.stop()
  }
}
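The flatMap → map → reduceByKey pipeline above can be checked without Spark at all: plain Scala collections support the same transformations, with `groupBy` plus a sum standing in for `reduceByKey`. A minimal sketch (the object name and sample input are made up for illustration):

```scala
object WordCountSketch {
  // Same logic as the Spark job, on an ordinary Scala collection
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                                    // split every line into words
      .map(word => (word, 1))                                   // mark each word with a 1
      .groupBy(_._1)                                            // group the (word, 1) pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the 1s per word

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello itcast", "hello Hadoop", "hello spark") // hypothetical input
    println(countWords(lines)) // e.g. Map(hello -> 3, itcast -> 1, Hadoop -> 1, spark -> 1)
  }
}
```

This makes it easy to see why `reduceByKey(_ + _)` yields the counts: it merges all the 1s attached to the same word.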

You can see the calculated word frequencies: (itcast,1), (Hadoop,1), (spark,1), (hello,3).

Possible problems:

If the run fails with an error or produces no result, the scala-sdk-2.11.8 library may not have been added to the project.

Solution:

If it is missing, add it manually (in IntelliJ IDEA: File → Project Structure → Global Libraries).

After resolving the above problem, the program runs and produces the following result.

You can see the calculated word frequencies: (itcast,1), (Hadoop,1), (spark,1), (hello,3).
