  1. Jul 23, 2016
    • [SPARK-16690][TEST] rename SQLTestUtils.withTempTable to withTempView · 86c27520
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      After https://github.com/apache/spark/pull/12945, we renamed `registerTempTable` to `createTempView`, since it actually creates a view. This PR renames `SQLTestUtils.withTempTable` to `withTempView` to reflect this change.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #14318 from cloud-fan/minor4.
      86c27520
    • [SPARK-16662][PYSPARK][SQL] fix HiveContext warning bug · ab6e4aea
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Move the `HiveContext` deprecation warning into the `HiveContext` constructor,
      so that the warning appears only when we actually use `HiveContext`;
      otherwise the warning would always appear whenever the pyspark.ml.context code file is merely referenced.
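      
      For illustration, the same idea sketched in Scala (the actual change is in PySpark; the class name and message below are purely illustrative, not Spark code): emit the warning from the deprecated class's constructor so it fires only on instantiation, not when the file defining it is loaded.
      
      ```scala
      // Illustrative only -- the real change lives in PySpark's HiveContext.
      class HiveContextLike {
        // Emitted at construction time rather than at module-load time.
        Console.err.println(
          "HiveContext is deprecated. Please use SparkSession.builder.enableHiveSupport() instead.")
      }
      
      // Merely loading the file defining HiveContextLike prints nothing;
      // only `new HiveContextLike()` triggers the warning.
      ```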
      
      ## How was this patch tested?
      
      Manual.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14301 from WeichenXu123/hiveContext_python_warning_update.
      ab6e4aea
    • [SPARK-16561][MLLIB] fix multivarOnlineSummary min/max bug · 25db5167
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Rename variables to make the code clearer:
      nnz => weightSum
      weightSum => totalWeightSum
      
      Also add a new member vector `nnz` (not the `nnz` of the previous code, which was renamed to `weightSum`) that counts the number of non-zero values in each dimension.
      Use this new `nnz` instead of `weightSum` when calculating min/max, which fixes several numerical errors in some extreme cases.
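      
      A minimal sketch of why an exact per-dimension count matters for min/max (this is not the actual MLlib online summarizer code; the class and the sample-count check below are illustrative):
      
      ```scala
      // Toy online summarizer tracking per-dimension min/max and non-zero counts.
      class ToySummarizer(n: Int) {
        private val currMin = Array.fill(n)(Double.MaxValue)
        private val currMax = Array.fill(n)(Double.MinValue)
        private val nnz = Array.fill(n)(0L) // exact non-zero count per dimension
        private var totalCount = 0L
      
        def add(values: Array[Double]): this.type = {
          var i = 0
          while (i < n) {
            val v = values(i)
            if (v != 0.0) {
              if (v < currMin(i)) currMin(i) = v
              if (v > currMax(i)) currMax(i) = v
              nnz(i) += 1L
            }
            i += 1
          }
          totalCount += 1L
          this
        }
      
        // If a dimension saw any zero entries, zero must compete in min/max. Comparing the
        // exact integer count nnz(i) against totalCount avoids the floating-point round-off
        // that comparing sums of weights can accumulate in extreme cases.
        def max: Array[Double] =
          Array.tabulate(n)(i => if (nnz(i) < totalCount && currMax(i) < 0.0) 0.0 else currMax(i))
        def min: Array[Double] =
          Array.tabulate(n)(i => if (nnz(i) < totalCount && currMin(i) > 0.0) 0.0 else currMin(i))
      }
      ```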
      
      ## How was this patch tested?
      
      A new testcase added.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14216 from WeichenXu123/multivarOnlineSummary.
      25db5167
  2. Jul 22, 2016
    • [SPARK-16622][SQL] Fix NullPointerException when the returned value of the... · e10b8741
      Liang-Chi Hsieh authored
      [SPARK-16622][SQL] Fix NullPointerException when the returned value of the called method in Invoke is null
      
      ## What changes were proposed in this pull request?
      
      Currently we don't check the value returned by the called method in `Invoke`. When the returned value is null and is assigned to a variable of a primitive type, a `NullPointerException` will be thrown.
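      
      A small illustration of the failure mode in plain Scala (this is not Catalyst's generated code; the null check is only analogous to the one added in `Invoke`):
      
      ```scala
      object InvokeNullExample {
        // Simulates a called method whose boxed return value may be null.
        def lookup(key: String): java.lang.Integer =
          if (key == "known") Integer.valueOf(1) else null
      
        def main(args: Array[String]): Unit = {
          val boxed = lookup("unknown")
          // val unchecked: Int = boxed        // unboxing null throws NullPointerException
          val checked: Int = if (boxed == null) 0 else boxed.intValue()
          println(checked)                     // prints 0 instead of throwing
        }
      }
      ```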
      
      ## How was this patch tested?
      
      Jenkins tests.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #14259 from viirya/agg-empty-ds.
      e10b8741
    • [SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description more... · 47f5b88d
      Dongjoon Hyun authored
      [SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description more consistent with Scala API
      
      ## What changes were proposed in this pull request?
      
      `withColumnRenamed` and `drop` are no-ops if the given column name does not exist. The Python documentation already describes that, but this PR adds a more explicit line, consistent with the Scala API, to reduce ambiguity.
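      
      For reference, a quick Scala illustration of the no-op behavior being documented (assuming a `SparkSession` named `spark` is available):
      
      ```scala
      val df = spark.range(3).toDF("id")
      
      // Neither call fails when the column does not exist; the DataFrame is returned unchanged.
      df.withColumnRenamed("no_such_col", "renamed").columns   // Array(id)
      df.drop("no_such_col").columns                           // Array(id)
      ```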
      
      ## How was this patch tested?
      
      It's about docs.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #14288 from dongjoon-hyun/SPARK-16651.
      47f5b88d
    • [SPARK-16650] Improve documentation of spark.task.maxFailures · 6c56fff1
      Tom Graves authored
      Clarify documentation on spark.task.maxFailures
      
      No tests run, as it is a documentation-only change.
      
      Author: Tom Graves <tgraves@yahoo-inc.com>
      
      Closes #14287 from tgravescs/SPARK-16650.
      6c56fff1
    • [GIT] add pydev & Rstudio project file to gitignore list · b4e16bd5
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Add PyDev & RStudio project files to the gitignore list; I think these two IDEs are used by many developers,
      so this avoids the need for a personal gitignore_global config.
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14293 from WeichenXu123/update_gitignore.
      b4e16bd5
    • [SPARK-16487][STREAMING] Fix some batches might not get marked as fully processed in JobGenerator · 2c72a443
      Ahmed Mahran authored
      ## What changes were proposed in this pull request?
      
      In `JobGenerator`, the code reads as if some batches might not get marked as fully processed. In the following flowchart, the batch should get marked as fully processed before endpoint C; however, it is not. Currently, this does not actually cause an issue, because the condition `(time - zeroTime) is multiple of checkpoint duration?` always evaluates to `true`, since the `checkpoint duration` is always set equal to the `batch duration`.
      
      ![Flowchart](https://s31.postimg.org/udy9lti2j/spark_streaming_job_generator.png)
      
      This PR fixes this so as to improve code readability and to avoid potential problems in case a future change sets the checkpoint duration to differ from the batch duration.
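      
      A small sketch of why the quoted condition currently always holds (the values are illustrative; durations in milliseconds):
      
      ```scala
      // Batch times are generated as zeroTime + k * batchDuration, and the checkpoint
      // duration is currently always equal to the batch duration, so the modulo check
      // is trivially true for every batch.
      val zeroTime = 0L
      val batchDuration = 1000L
      val checkpointDuration = batchDuration
      
      def shouldCheckpoint(batchIndex: Long): Boolean = {
        val time = zeroTime + batchIndex * batchDuration
        (time - zeroTime) % checkpointDuration == 0
      }
      
      (0L until 5L).map(shouldCheckpoint)   // Vector(true, true, true, true, true)
      ```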
      
      Author: Ahmed Mahran <ahmed.mahran@mashin.io>
      
      Closes #14145 from ahmed-mahran/b-mark-batch-fully-processed.
      2c72a443
    • [SPARK-16287][HOTFIX][BUILD][SQL] Fix annotation argument needs to be a constant · e1bd70f4
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Build fix for [SPARK-16287][SQL] Implement str_to_map SQL function, which introduced this compilation error:
      
      ```
      /Users/jacek/dev/oss/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala:402: error: annotation argument needs to be a constant; found: "_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text ".+("into key/value pairs using delimiters. ").+("Default delimiters are \',\' for pairDelim and \':\' for keyValueDelim.")
          "into key/value pairs using delimiters. " +
                                                    ^
      ```
      
      ## How was this patch tested?
      
      Local build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #14315 from jaceklaskowski/build-fix-complexTypeCreator.
      e1bd70f4
    • [SPARK-16556][SPARK-16559][SQL] Fix Two Bugs in Bucket Specification · 94f14b52
      gatorsmile authored
      ### What changes were proposed in this pull request?
      
      **Issue 1: The Bucket Specification Is Silently Ignored When Creating a Table Using Schema Inference**
      
      When creating a data source table without explicit specification of schema or SELECT clause, we silently ignore the bucket specification (CLUSTERED BY... SORTED BY...) in [the code](https://github.com/apache/spark/blob/ce3b98bae28af72299722f56e4e4ef831f471ec0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L339-L354).
      
      For example,
      ```SQL
      CREATE TABLE jsonTable
      USING org.apache.spark.sql.json
      OPTIONS (
        path '${tempDir.getCanonicalPath}'
      )
      CLUSTERED BY (inexistentColumnA) SORTED BY (inexistentColumnB) INTO 2 BUCKETS
      ```
      
      This PR captures it and issues an error message.
      
      **Issue 2: A run-time `java.lang.ArithmeticException` when the number of buckets is set to zero.**
      
      For example,
      ```SQL
      CREATE TABLE t USING PARQUET
      OPTIONS (PATH '${path.toString}')
      CLUSTERED BY (a) SORTED BY (b) INTO 0 BUCKETS
      AS SELECT 1 AS a, 2 AS b
      ```
      The exception we got is
      ```
      ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 1.0 (TID 2)
      java.lang.ArithmeticException: / by zero
      ```
      
      This PR captures the misuse and issues an appropriate error message.
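      
      A minimal sketch of the kind of validation this PR adds (the case class mirrors Spark's bucketing spec, but the checks and messages below are illustrative rather than the exact internals):
      
      ```scala
      case class BucketSpec(numBuckets: Int, bucketColumnNames: Seq[String], sortColumnNames: Seq[String])
      
      def validateBucketSpec(spec: Option[BucketSpec], hasUserSpecifiedSchema: Boolean): Unit =
        spec.foreach { s =>
          if (!hasUserSpecifiedSchema) {
            // Issue 1: fail instead of silently dropping CLUSTERED BY ... SORTED BY ...
            throw new IllegalArgumentException(
              "Cannot specify bucketing information when the table schema is not given and will be inferred.")
          }
          if (s.numBuckets <= 0) {
            // Issue 2: reject INTO 0 BUCKETS up front instead of dividing by zero at run time.
            throw new IllegalArgumentException(s"Expected a positive number of buckets, but got ${s.numBuckets}.")
          }
        }
      ```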
      
      ### How was this patch tested?
      Added a test case in DDLSuite
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #14210 from gatorsmile/createTableWithoutSchema.
      94f14b52
  3. Jul 21, 2016
  4. Jul 20, 2016
    • [SPARK-16644][SQL] Aggregate should not propagate constraints containing aggregate expressions · cfa5ae84
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Aggregate expressions can only be executed inside `Aggregate`. If we propagate them upwards as constraints, the parent operator cannot execute them and will fail at runtime.
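      
      A toy model of the rule (this is not Catalyst code; the tiny expression ADT below is purely illustrative):
      
      ```scala
      sealed trait Expr
      case class Attr(name: String) extends Expr
      case class AggExpr(fn: String, child: Expr) extends Expr       // e.g. max(a)
      case class GreaterThan(left: Expr, right: Expr) extends Expr
      
      def containsAggregate(e: Expr): Boolean = e match {
        case AggExpr(_, _)     => true
        case GreaterThan(l, r) => containsAggregate(l) || containsAggregate(r)
        case Attr(_)           => false
      }
      
      // Only aggregate-free predicates may be propagated to the parent operator;
      // something like `max(a) > 0` must stay inside the Aggregate that computes it.
      def propagatedConstraints(constraints: Set[Expr]): Set[Expr] =
        constraints.filterNot(containsAggregate)
      ```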
      
      ## How was this patch tested?
      
      new test in SQLQuerySuite
      
      Author: Wenchen Fan <wenchen@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #14281 from cloud-fan/bug.
      cfa5ae84
    • [SPARK-16272][CORE] Allow config values to reference conf, env, system props. · 75a06aa2
      Marcelo Vanzin authored
      This allows configuration to be more flexible, for example, when the cluster does
      not have a homogeneous configuration (e.g. packages are installed on different
      paths in different nodes). By allowing one to reference the environment from
      the conf, it becomes possible to work around those in certain cases.
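      
      An illustrative use of the new reference support (the `${env:...}` and `${system:...}` syntax shown here is an assumption about the reference format, not something stated in this commit message):
      
      ```scala
      import org.apache.spark.SparkConf
      
      // Point config values at an environment variable and a system property, so the same
      // configuration works on nodes where packages are installed in different paths.
      val conf = new SparkConf()
        .set("spark.sql.hive.metastore.jars", "${env:HIVE_HOME}/lib/*")
        .set("spark.driver.extraJavaOptions", "-Dlog.dir=${system:user.dir}/logs")
      ```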
      
      As part of the implementation, ConfigEntry now keeps track of all "known" configs
      (i.e. those created through the use of ConfigBuilder), since that list is used
      by the resolution code. This duplicates some code in SQLConf, which could potentially
      be merged with this now. It will also make it simpler to implement some missing
      features such as filtering which configs show up in the UI or in event logs - which
      are not part of this change.
      
      Another change is in the way ConfigEntry reads config data; it now takes a string
      map and a function that reads env variables, so that it can be called both from
      SparkConf and SQLConf. This makes it so both places follow the same read path,
      instead of having to replicate certain logic in SQLConf. There are still a
      couple of methods in SQLConf that peek into fields of ConfigEntry directly,
      though.
      
      Tested via unit tests, and by using the new variable expansion functionality
      in a shell session with a custom spark.sql.hive.metastore.jars value.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #14022 from vanzin/SPARK-16272.
      75a06aa2
    • [SPARK-16344][SQL] Decoding Parquet array of struct with a single field named "element" · e651900b
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      Due to backward-compatibility reasons, the following Parquet schema is ambiguous:
      
      ```
      optional group f (LIST) {
        repeated group list {
          optional group element {
            optional int32 element;
          }
        }
      }
      ```
      
      According to the parquet-format spec, when interpreted as a standard 3-level layout, this type is equivalent to the following SQL type:
      
      ```
      ARRAY<STRUCT<element: INT>>
      ```
      
      However, when interpreted as a legacy 2-level layout, it's equivalent to
      
      ```
      ARRAY<STRUCT<element: STRUCT<element: INT>>>
      ```
      
      Historically, to disambiguate these cases, we employed two methods:
      
      - `ParquetSchemaConverter.isElementType()`
      
        Used to disambiguate the above cases while converting Parquet types to Spark types.
      
      - `ParquetRowConverter.isElementType()`
      
        Used to disambiguate the above cases while instantiating row converters that convert Parquet records to Spark rows.
      
      Unfortunately, these two methods make different decisions about the above problematic Parquet type, which caused SPARK-16344.
      
      `ParquetRowConverter.isElementType()` is necessary for Spark 1.4 and earlier versions because Parquet requested schemata are directly converted from Spark schemata in these versions. The converted Parquet schemata may be incompatible with actual schemata of the underlying physical files when the files are written by a system/library that uses a schema conversion scheme that is different from Spark when writing Parquet LIST and MAP fields.
      
      In Spark 1.5, Parquet requested schemata are always properly tailored from the schemata of the physical files to be read. Thus `ParquetRowConverter.isElementType()` is no longer necessary. This PR replaces this method with a simple yet accurate scheme: whenever an ambiguous Parquet type is hit, convert the type in question back to a Spark type using `ParquetSchemaConverter` and check whether it matches the corresponding Spark type.
      
      ## How was this patch tested?
      
      New test cases added in `ParquetHiveCompatibilitySuite` and `ParquetQuerySuite`.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #14014 from liancheng/spark-16344-for-master-and-2.0.
      e651900b
    • [SPARK-16634][SQL] Workaround JVM bug by moving some code out of ctor. · e3cd5b30
      Marcelo Vanzin authored
      Some 1.7 JVMs have a bug that is triggered by certain Scala-generated
      bytecode. GenericArrayData suffers from that and fails to load in certain
      JVMs.
      
      Moving the offending code out of the constructor and into a helper method
      avoids the issue.
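      
      The general shape of the workaround, sketched with an illustrative class (this is not the actual `GenericArrayData` code):
      
      ```scala
      // Keep the auxiliary constructor trivial and move the collection-handling code
      // (whose Scala-generated bytecode trips the JVM bug) into a companion helper.
      class ArrayWrapper(val values: Array[Any]) {
        def this(seq: Seq[Any]) = this(ArrayWrapper.toArray(seq))
      }
      
      object ArrayWrapper {
        private def toArray(seq: Seq[Any]): Array[Any] = seq.toArray
      }
      ```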
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #14271 from vanzin/SPARK-16634.
      e3cd5b30
    • [SPARK-15951] Change Executors Page to use datatables to support sorting columns and searching · b9bab4dc
      Kishor Patil authored
      1. Create the executorspage-template.html for displaying application information in DataTables.
      2. Added a REST API endpoint "allexecutors" to be able to see all executors created for a particular job.
      3. The executorspage.js uses jQuery to access the data from the /api/v1/applications/appid/allexecutors REST API, and uses DataTables to display executors for the application. It also generates a summary of dead/live and total executors created during the life of the application.
      4. Similar changes are applicable to the Executors Page on the history server for a given application.
      
      Snapshots of how it looks now:
      <img width="938" alt="screen shot 2016-06-14 at 2 45 44 pm" src="https://cloud.githubusercontent.com/assets/6090397/16060092/ad1de03a-324b-11e6-8469-9eaa3f2548b5.png">
      
      New Executors Page screenshot looks like this:
      <img width="1436" alt="screen shot 2016-06-15 at 10 12 01 am" src="https://cloud.githubusercontent.com/assets/6090397/16085514/ee7004f0-32e1-11e6-9340-33d91e407f2b.png">
      
      Author: Kishor Patil <kpatil@yahoo-inc.com>
      
      Closes #13670 from kishorvpatil/execTemplates.
      b9bab4dc
    • [SPARK-16613][CORE] RDD.pipe returns values for empty partitions · 4b079dc3
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Document RDD.pipe semantics; don't execute process for empty input partitions.
      
      Note this includes the fix in https://github.com/apache/spark/pull/14256 because it's necessary to even test this. One or the other will merge the fix.
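      
      A toy sketch of the per-partition decision (this is not the `PipedRDD` implementation; command handling is simplified):
      
      ```scala
      import java.io.ByteArrayInputStream
      import scala.sys.process._
      
      // For an empty partition, return no output and never launch the external process;
      // previously a command such as `wc -l` would still run and emit a line even with no input.
      def pipePartition(lines: Iterator[String], command: Seq[String]): Iterator[String] =
        if (lines.isEmpty) Iterator.empty
        else {
          val input = new ByteArrayInputStream((lines.mkString("\n") + "\n").getBytes("UTF-8"))
          (Process(command) #< input).lineStream.iterator
        }
      ```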
      
      ## How was this patch tested?
      
      Jenkins tests including new test.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14260 from srowen/SPARK-16613.
      4b079dc3
    • [SPARK-15923][YARN] Spark Application rest api returns 'no such app: … · 95abbe53
      Weiqing Yang authored
      [SPARK-15923][YARN] Spark Application rest api returns 'no such app: <appId>'
      
      ## What changes were proposed in this pull request?
      Update monitoring.md.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14163 from Sherry302/master.
      95abbe53
    • [SPARK-16440][MLLIB] Destroy broadcasted variables even on driver · 0dc79ffd
      Anthony Truchet authored
      ## What changes were proposed in this pull request?
      Broadcast variables that had been left around were unpersisted in a previous PR (#14153). This PR turns those `unpersist()` calls into `destroy()` so that memory is freed even on the driver.
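      
      A quick illustration of the difference (a fragment, assuming a `Broadcast` variable `bc` obtained from `sc.broadcast(...)`):
      
      ```scala
      // unpersist() only removes the copies cached on executors; driver-side memory is kept
      // and the value can be re-broadcast later.
      bc.unpersist(blocking = false)
      
      // destroy() removes all data and metadata for the broadcast, including on the driver,
      // so the memory is actually freed -- but the variable can no longer be used afterwards.
      bc.destroy()
      ```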
      
      ## How was this patch tested?
      Unit Tests in Word2VecSuite were run locally.
      
      This contribution is done on behalf of Criteo, according to the
      terms of the Apache license 2.0.
      
      Author: Anthony Truchet <a.truchet@criteo.com>
      
      Closes #14268 from AnthonyTruchet/SPARK-16440.
      0dc79ffd
    • [SPARK-16632][SQL] Respect Hive schema when merging parquet schema. · 75146be6
      Marcelo Vanzin authored
      When Hive (or at least certain versions of Hive) creates parquet files
      containing tinyint or smallint columns, it stores them as int32, but
      doesn't annotate the parquet field as containing the corresponding
      int8 / int16 data. When Spark reads those files using the vectorized
      reader, it follows the parquet schema for these fields, but when
      actually reading the data it tries to use the type fetched from
      the metastore, and then fails because data has been loaded into the
      wrong fields in OnHeapColumnVector.
      
      So instead of blindly trusting the parquet schema, check whether the
      Catalyst-provided schema disagrees with it, and adjust the types so
      that the necessary metadata is present when loading the data into
      the ColumnVector instance.
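      
      A toy sketch of the reconciliation idea using public Spark SQL types (not the actual schema-merging code):
      
      ```scala
      import org.apache.spark.sql.types._
      
      // When Parquet only records INT32 but the Hive metastore says tinyint/smallint,
      // prefer the metastore (Catalyst) type so the data lands in the right ColumnVector field.
      def reconcile(parquetType: DataType, catalystType: DataType): DataType =
        (parquetType, catalystType) match {
          case (IntegerType, ByteType)  => ByteType   // Hive tinyint stored as int32
          case (IntegerType, ShortType) => ShortType  // Hive smallint stored as int32
          case _                        => parquetType
        }
      ```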
      
      Tested with unit tests and with tests that create byte / short columns
      in Hive and try to read them from Spark.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #14272 from vanzin/SPARK-16632.
      75146be6
  5. Jul 19, 2016
    • [SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to SparkSubmitSuite · fc232636
      Shivaram Venkataraman authored
      ## What changes were proposed in this pull request?
      
      This change moves the include jar test from R to SparkSubmitSuite and uses a dynamically compiled jar. This helps us remove the binary jar from the R package and solves both the CRAN warnings and the lack of source being available for this jar.
      
      ## How was this patch tested?
      SparkR unit tests, SparkSubmitSuite, check-cran.sh
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #14243 from shivaram/sparkr-jar-move.
      fc232636
    • [SPARK-16568][SQL][DOCUMENTATION] update sql programming guide refreshTable API in python code · 9674af6f
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Update the `refreshTable` API in the Python code of the sql-programming-guide.
      
      This API was added in SPARK-15820.
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14220 from WeichenXu123/update_sql_doc_catalog.
      9674af6f
    • [SPARK-14702] Make environment of SparkLauncher launched process more configurable · 004e29cb
      Andrew Duffy authored
      ## What changes were proposed in this pull request?
      
      Adds a few public methods to `SparkLauncher` to allow configuring some extra features of the `ProcessBuilder`, including the working directory, output and error stream redirection.
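      
      A hypothetical usage sketch (the method names `directory`, `redirectOutput` and `redirectError` are guesses at the new `SparkLauncher` surface described above, not confirmed by this commit message):
      
      ```scala
      import java.io.File
      import java.lang.ProcessBuilder.Redirect
      import org.apache.spark.launcher.SparkLauncher
      
      val process = new SparkLauncher()
        .setAppResource("/path/to/app.jar")
        .setMainClass("com.example.Main")
        .directory(new File("/tmp/spark-work"))                       // working directory of the child process
        .redirectOutput(Redirect.appendTo(new File("/tmp/out.log")))  // stdout redirection
        .redirectError(Redirect.appendTo(new File("/tmp/err.log")))   // stderr redirection
        .launch()
      ```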
      
      ## How was this patch tested?
      
      Unit testing + simple Spark driver programs
      
      Author: Andrew Duffy <root@aduffy.org>
      
      Closes #14201 from andreweduffy/feature/launcher.
      004e29cb
    • [SPARK-15705][SQL] Change the default value of spark.sql.hive.convertMetastoreOrc to false. · 2ae7b88a
      Yin Huai authored
      ## What changes were proposed in this pull request?
      In 2.0, we added new logic to convert HiveTableScan on ORC tables to Spark's native code path. However, during this conversion, we drop the original metastore schema (https://issues.apache.org/jira/browse/SPARK-15705). Because of this regression, I am changing the default value of `spark.sql.hive.convertMetastoreOrc` to false.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #14267 from yhuai/SPARK-15705-changeDefaultValue.
      2ae7b88a
    • [SPARK-16602][SQL] `Nvl` function should support numeric-string cases · 162d04a3
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      The `Nvl` function should support numeric-string cases like Hive/Spark 1.6. Currently, `Nvl` finds the tightest common type among numeric types. This PR extends that to consider the `String` type, too.
      
      ```scala
      - TypeCoercion.findTightestCommonTypeOfTwo(left.dataType, right.dataType).map { dtype =>
      + TypeCoercion.findTightestCommonTypeToString(left.dataType, right.dataType).map { dtype =>
      ```
      
      **Before**
      ```scala
      scala> sql("select nvl('0', 1)").collect()
      org.apache.spark.sql.AnalysisException: cannot resolve `nvl("0", 1)` due to data type mismatch:
      input to function coalesce should all be the same type, but it's [string, int]; line 1 pos 7
      ```
      
      **After**
      ```scala
      scala> sql("select nvl('0', 1)").collect()
      res0: Array[org.apache.spark.sql.Row] = Array([0])
      ```
      
      ## How was this patch tested?
      
      Pass the Jenkins tests.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #14251 from dongjoon-hyun/SPARK-16602.
      162d04a3
    • [SPARK-16620][CORE] Add back the tokenization process in `RDD.pipe(command: String)` · 0bd76e87
      Liwei Lin authored
      ## What changes were proposed in this pull request?
      
      Currently `RDD.pipe(command: String)`:
      - works only when the command is specified without any options, such as `RDD.pipe("wc")`
      - does NOT work when the command is specified with some options, such as `RDD.pipe("wc -l")`
      
      This is a regression from Spark 1.6.
      
      This patch adds back the tokenization process in `RDD.pipe(command: String)` to fix this regression.
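      
      A minimal sketch of the kind of tokenization being restored (I believe `PipedRDD` uses a `java.util.StringTokenizer` for this; the helper below is only illustrative):
      
      ```scala
      import java.util.StringTokenizer
      import scala.collection.mutable.ArrayBuffer
      
      def tokenize(command: String): Seq[String] = {
        val buf = new ArrayBuffer[String]
        val tok = new StringTokenizer(command)
        while (tok.hasMoreTokens) buf += tok.nextToken()
        buf
      }
      
      // tokenize("wc -l") == Seq("wc", "-l"), so RDD.pipe("wc -l") can again delegate to the
      // Seq[String] overload instead of treating "wc -l" as a single executable name.
      ```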
      
      ## How was this patch tested?
      Added a test which:
      - would pass in `1.6`
      - _[prior to this patch]_ would fail in `master`
      - _[after this patch]_ would pass in `master`
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #14256 from lw-lin/rdd-pipe.
      0bd76e87
    • [SPARK-16494][ML] Upgrade breeze version to 0.12 · 67089149
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      breeze 0.12 has been released for more than half a year, and it brings lots of new features, performance improvements and bug fixes.
      One of the biggest features is ```LBFGS-B```, which is an implementation of ```LBFGS``` with box constraints and is much faster for some special cases.
      We would like to implement the Huber loss function for ```LinearRegression``` ([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)) and it requires ```LBFGS-B``` as the optimization solver. So we should bump up the dependent breeze version to 0.12.
      For more features, improvements and bug fixes of breeze 0.12, you can refer to the following link:
      https://groups.google.com/forum/#!topic/scala-breeze/nEeRi_DcY5c
      
      ## How was this patch tested?
      No new tests, should pass the existing ones.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14150 from yanboliang/spark-16494.
      67089149
    • [SPARK-16478] graphX (added graph caching in strongly connected components) · 5d92326b
      Michał Wesołowski authored
      ## What changes were proposed in this pull request?
      
      I added caching in every iteration for the sccGraph that is returned by strongly connected components. Without this cache, strongly connected components returned a graph that had to be recomputed from scratch once some intermediate caches no longer existed.
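      
      A sketch of the per-iteration caching pattern (assuming GraphX is on the classpath; this is not the actual strongly-connected-components code):
      
      ```scala
      import org.apache.spark.graphx.Graph
      
      // Cache and materialize the working graph every round so that later rounds (and the
      // caller) do not replay the whole lineage once earlier caches have been evicted.
      def iterate[VD, ED](initial: Graph[VD, ED], rounds: Int)(step: Graph[VD, ED] => Graph[VD, ED]): Graph[VD, ED] = {
        var g = initial.cache()
        for (_ <- 0 until rounds) {
          val next = step(g).cache()
          next.vertices.count()            // force materialization under the new cache
          next.edges.count()
          g.unpersist(blocking = false)    // drop the previous iteration's cache
          g = next
        }
        g
      }
      ```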
      
      ## How was this patch tested?
      I tested it by running code similar to the one [on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4889410027417133/3634650767364730/3117184429335832/latest.html). Basically, I generated a large graph and computed strongly connected components with the changed code, then simply ran count on the vertices and edges. The count after this update takes a few seconds instead of 20 minutes.
      
      # statement
      contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: Michał Wesołowski <michal.wesolowski@bzwbk.pl>
      
      Closes #14137 from wesolowskim/SPARK-16478.
      5d92326b
    • [SPARK-16395][STREAMING] Fail if too many CheckpointWriteHandlers are queued... · 6c4b9f4b
      Sean Owen authored
      [SPARK-16395][STREAMING] Fail if too many CheckpointWriteHandlers are queued up in the fixed thread pool
      
      ## What changes were proposed in this pull request?
      
      Begin failing if checkpoint writes are unlikely to keep up with storage's ability to write them, in order to fail fast instead of slowly filling memory.
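      
      A toy sketch of the fail-fast mechanism (the pool size and queue capacity are illustrative, not Spark's actual values):
      
      ```scala
      import java.util.concurrent.{ArrayBlockingQueue, ThreadPoolExecutor, TimeUnit}
      
      // A single-threaded executor with a bounded queue; once the queue is full, further
      // submissions are rejected immediately instead of accumulating in memory.
      val checkpointWriter = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue[Runnable](1000),
        new ThreadPoolExecutor.AbortPolicy())   // throws RejectedExecutionException when saturated
      ```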
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14152 from srowen/SPARK-16395.
      6c4b9f4b
    • [SPARK-16600][MLLIB] fix some latex formula syntax error · 8310c074
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      `\partial\x` ==> `\partial x`
      `har{x_i}` ==> `hat{x_i}`
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14246 from WeichenXu123/fix_formular_err.
      8310c074
    • [MINOR][SQL][STREAMING][DOCS] Fix minor typos, punctuations and grammar · 6caa2205
      Ahmed Mahran authored
      ## What changes were proposed in this pull request?
      
      Minor fixes correcting some typos, punctuation, and grammar.
      Adding more anchors for easy navigation.
      Fixing minor issues with code snippets.
      
      ## How was this patch tested?
      
      `jekyll serve`
      
      Author: Ahmed Mahran <ahmed.mahran@mashin.io>
      
      Closes #14234 from ahmed-mahran/b-struct-streaming-docs.
      6caa2205
    • [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition... · 21a6dd2a
      Xin Ren authored
      [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition and inherited from the parent
      
      https://issues.apache.org/jira/browse/SPARK-16535
      
      ## What changes were proposed in this pull request?
      
      When I scanned through the pom.xml of the sub-projects, I found the warning below (screenshot attached):
      ```
      Definition of groupId is redundant, because it's inherited from the parent
      ```
      ![screen shot 2016-07-13 at 3 13 11 pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png)
      
      I've tried to remove some of the lines with groupId definition, and the build on my local machine is still ok.
      ```
      <groupId>org.apache.spark</groupId>
      ```
      As I just found, `<maven.version>3.3.9</maven.version>` is being used in Spark 2.x, and Maven 3 supports versionless parent elements: Maven 3 will remove the need to specify the parent version in sub-modules. This is great (in Maven 3.1).
      
      ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762
      
      ## How was this patch tested?
      
      I've tested by re-building the project, and the build succeeded.
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #14189 from keypointt/SPARK-16535.
      21a6dd2a