Commit dd72b10a authored by Christiam Camacho's avatar Christiam Camacho Committed by Sean Owen

Fix Java SimpleApp spark application

## What changes were proposed in this pull request?

Add missing import and missing parentheses to invoke `SparkSession::text()`.

## How was this patch tested?

Built the code for this application and ran jekyll locally per docs/README.md.

Author: Christiam Camacho <camacho@ncbi.nlm.nih.gov>

Closes #18795 from christiam/master.
parent bb7afb4e
@@ -297,12 +297,13 @@ We'll create a very simple Spark application, `SimpleApp.java`:
 {% highlight java %}
 /* SimpleApp.java */
 import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.Dataset;
 public class SimpleApp {
   public static void main(String[] args) {
     String logFile = "YOUR_SPARK_HOME/README.md"; // Should be some file on your system
     SparkSession spark = SparkSession.builder().appName("Simple Application").getOrCreate();
-    Dataset<String> logData = spark.read.textFile(logFile).cache();
+    Dataset<String> logData = spark.read().textFile(logFile).cache();
     long numAs = logData.filter(s -> s.contains("a")).count();
     long numBs = logData.filter(s -> s.contains("b")).count();
......
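The `filter(s -> s.contains(...)).count()` calls in the corrected snippet are ordinary Java 8 lambdas, so their logic can be exercised without a Spark cluster. A minimal sketch using `java.util.stream` (the class name `SimpleAppLogic` and the sample lines are illustrative, not part of the patch):

```java
import java.util.Arrays;
import java.util.List;

public class SimpleAppLogic {
    // Counts lines containing the given substring, mirroring the
    // logData.filter(s -> s.contains(...)).count() calls in SimpleApp.java.
    public static long countLinesContaining(List<String> lines, String needle) {
        return lines.stream().filter(s -> s.contains(needle)).count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apache spark", "big data", "banana");
        System.out.println("lines with 'a': " + countLinesContaining(lines, "a"));
        System.out.println("lines with 'b': " + countLinesContaining(lines, "b"));
    }
}
```

The same predicate runs distributed when applied to a `Dataset<String>` instead of a local stream.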
@@ -1041,8 +1041,8 @@ streamingDf.join(staticDf, "type", "right_join") // right outer join with a static DF
 <div data-lang="java" markdown="1">
 {% highlight java %}
-Dataset<Row> staticDf = spark.read. ...;
-Dataset<Row> streamingDf = spark.readStream. ...;
+Dataset<Row> staticDf = spark.read(). ...;
+Dataset<Row> streamingDf = spark.readStream(). ...;
 streamingDf.join(staticDf, "type"); // inner equi-join with a static DF
 streamingDf.join(staticDf, "type", "right_join"); // right outer join with a static DF
 {% endhighlight %}
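For readers following along, the inner equi-join on `"type"` shown above can be approximated on bounded, in-memory data by hashing one side on the join key. This is only an illustrative sketch (the class name, row layout, and output format are hypothetical), not Spark's streaming join implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EquiJoinSketch {
    // Inner equi-join of two row lists on the "type" column (index 0),
    // mirroring streamingDf.join(staticDf, "type") on bounded data.
    // Each matching (left, right) pair yields "type:leftVal,rightVal".
    public static List<String> innerJoinOnType(List<String[]> left, List<String[]> right) {
        Map<String, List<String[]>> byType = new HashMap<>();
        for (String[] r : right) {
            byType.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        List<String> joined = new ArrayList<>();
        for (String[] l : left) {
            for (String[] r : byType.getOrDefault(l[0], Collections.emptyList())) {
                joined.add(l[0] + ":" + l[1] + "," + r[1]);
            }
        }
        return joined;
    }
}
```

In the streaming case Spark performs this match incrementally as rows arrive; the hash-build-then-probe structure is the same idea.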
@@ -1087,7 +1087,7 @@ streamingDf
 <div data-lang="java" markdown="1">
 {% highlight java %}
-Dataset<Row> streamingDf = spark.readStream. ...; // columns: guid, eventTime, ...
+Dataset<Row> streamingDf = spark.readStream(). ...; // columns: guid, eventTime, ...
 // Without watermark using guid column
 streamingDf.dropDuplicates("guid");
......
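`dropDuplicates("guid")` keeps the first row observed for each `guid`. On a bounded collection that behavior can be sketched in plain Java with a `HashSet` of seen keys (the class name and row layout are hypothetical, for illustration only):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DropDuplicatesSketch {
    // Keeps the first row seen for each guid (column index 0),
    // mirroring streamingDf.dropDuplicates("guid") on bounded data.
    public static List<String[]> dropDuplicatesByGuid(List<String[]> rows) {
        Set<String> seen = new HashSet<>();
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            if (seen.add(row[0])) { // add() returns false if the guid was already seen
                out.add(row);
            }
        }
        return out;
    }
}
```

Without a watermark, the streaming equivalent must retain every guid it has ever seen, which is why the guide pairs this operator with watermarking in practice.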