  1. May 17, 2015
    • [SPARK-6514] [SPARK-5960] [SPARK-6656] [SPARK-7679] [STREAMING] [KINESIS]... · ca4257ae
      Tathagata Das authored
      [SPARK-6514] [SPARK-5960] [SPARK-6656] [SPARK-7679] [STREAMING] [KINESIS] Updates to the Kinesis API
      
      SPARK-6514 - Use correct region
      SPARK-5960 - Allow AWS Credentials to be directly passed
      SPARK-6656 - Specify kinesis application name explicitly
      SPARK-7679 - Upgrade to latest KCL and AWS SDK.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6147 from tdas/kinesis-api-update and squashes the following commits:
      
      f23ea77 [Tathagata Das] Updated versions and updated APIs
      373b201 [Tathagata Das] Updated Kinesis API
    • [SPARK-7491] [SQL] Allow configuration of classloader isolation for hive · 2ca60ace
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6167 from marmbrus/configureIsolation and squashes the following commits:
      
      6147cbe [Michael Armbrust] filter other conf
      22cc3bc7 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into configureIsolation
      07476ee [Michael Armbrust] filter empty prefixes
      dfdf19c [Michael Armbrust] [SPARK-6906][SQL] Allow configuration of classloader isolation for hive
    • [SPARK-7686] [SQL] DescribeCommand is assigned wrong output attributes in SparkStrategies · 56456287
      Josh Rosen authored
      In `SparkStrategies`, `RunnableDescribeCommand` is called with the output attributes of the table being described rather than the attributes for the `describe` command's output.  I discovered this issue because it caused type conversion errors in some UnsafeRow conversion code that I'm writing.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6217 from JoshRosen/SPARK-7686 and squashes the following commits:
      
      953a344 [Josh Rosen] Fix SPARK-7686 with a simple change in SparkStrategies.
      a4eec9f [Josh Rosen] Add failing regression test for SPARK-7686
    • [SPARK-7660] Wrap SnappyOutputStream to work around snappy-java bug · f2cc6b5b
      Josh Rosen authored
      This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for https://github.com/xerial/snappy-java/issues/107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.
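      The general pattern behind this workaround can be sketched as follows. This is a minimal illustration, not Spark's actual wrapper class (the class name and details here are hypothetical): a guard flag makes `close()` delegate exactly once and turns write-after-close into an immediate error instead of silent stream corruption.

      ```scala
      import java.io.{ByteArrayOutputStream, OutputStream}

      // Hypothetical wrapper illustrating the idempotent-close pattern:
      // the first close() delegates to the underlying stream; later close()
      // calls are no-ops, and writes after close fail fast.
      class IdempotentCloseOutputStream(underlying: OutputStream) extends OutputStream {
        private[this] var closed = false

        override def write(b: Int): Unit = {
          if (closed) throw new IllegalStateException("Stream is closed")
          underlying.write(b)
        }

        override def flush(): Unit = if (!closed) underlying.flush()

        override def close(): Unit = if (!closed) {
          closed = true
          underlying.close() // delegated exactly once
        }
      }
      ```

      Wrapping a buggy stream this way is backportable because it changes no public API; only the construction site of the compressed stream needs to change.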
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:
      
      8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660
    • [SPARK-7669] Builds against Hadoop 2.6+ get inconsistent curator depend… · 50217667
      Steve Loughran authored
      This adds a new profile, `hadoop-2.6`, copying over the hadoop-2.4 properties, updating ZK to 3.4.6 and making the curator version a configurable option. That keeps the curator-recipes JAR in sync with that used in hadoop.
      
      There's one more option to consider: making the full curator-client version explicit with its own dependency version. This will pin down the version pulled in via the hadoop and hive imports.
      
      Author: Steve Loughran <stevel@hortonworks.com>
      
      Closes #6191 from steveloughran/stevel/SPARK-7669-hadoop-2.6 and squashes the following commits:
      
      e3e281a [Steve Loughran] SPARK-7669 declare the version of curator-client and curator-framework JARs
      2901ea9 [Steve Loughran] SPARK-7669 Builds against Hadoop 2.6+ get inconsistent curator dependencies
    • [SPARK-7447] [SQL] Don't re-merge Parquet schema when the relation is deserialized · 33990557
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7447
      
      `MetadataCache` in `ParquetRelation2` is annotated as `transient`. When `ParquetRelation2` is deserialized, we ask `MetadataCache` to refresh and perform schema merging again. This is time-consuming, especially with very many Parquet files.
      
      With the new `FSBasedParquetRelation`, although `MetadataCache` is not `transient` now, `MetadataCache.refresh()` still performs schema merging again when the relation is deserialized.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6012 from viirya/without_remerge_schema and squashes the following commits:
      
      2663957 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into without_remerge_schema
      6ac7d93 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into without_remerge_schema
      b0fc09b [Liang-Chi Hsieh] Don't generate and merge parquetSchema multiple times.
    • [SQL] [MINOR] Skip unresolved expression for InConversion · edf09ea1
      scwf authored
      Author: scwf <wangfei1@huawei.com>
      
      Closes #6145 from scwf/InConversion and squashes the following commits:
      
      5c8ac6b [scwf] minir fix for InConversion
    • [MINOR] Add 1.3, 1.3.1 to master branch EC2 scripts · 1a7b9ce8
      Shivaram Venkataraman authored
      cc pwendell
      
      P.S.: I can't believe this was outdated all along?
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6215 from shivaram/update-ec2-map and squashes the following commits:
      
      ae3937a [Shivaram Venkataraman] Add 1.3, 1.3.1 to master branch EC2 scripts
    • [MINOR] [SQL] Removes an unreachable case clause · ba4f8ca0
      Cheng Lian authored
      This case clause is already covered by the one above, and generates a compilation warning.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6214 from liancheng/remove-unreachable-code and squashes the following commits:
      
      c38ca7c [Cheng Lian] Removes an unreachable case clause
    • [SPARK-7654][SQL] Move JDBC into DataFrame's reader/writer interface. · 517eb37a
      Reynold Xin authored
      Also moved all the deprecated functions into one place for SQLContext and DataFrame, and updated tests to use the new API.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6210 from rxin/df-writer-reader-jdbc and squashes the following commits:
      
      7465c2c [Reynold Xin] Fixed unit test.
      118e609 [Reynold Xin] Updated tests.
      3441b57 [Reynold Xin] Updated javadoc.
      13cdd1c [Reynold Xin] [SPARK-7654][SQL] Move JDBC into DataFrame's reader/writer interface.
  2. May 16, 2015
    • [SPARK-7655][Core] Deserializing value should not hold the TaskSchedulerImpl lock · 3b6ef2c5
      zsxwing authored
      We should not call `DirectTaskResult.value` when holding the `TaskSchedulerImpl` lock. It may cost dozens of seconds to deserialize a large object.
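      The shape of the fix can be sketched like this. The names below are illustrative stand-ins, not the actual `TaskSchedulerImpl` code: the point is simply that the expensive deserialization moves *before* the `synchronized` block, so only cheap bookkeeping runs while the lock is held.

      ```scala
      // Sketch (hypothetical names): deserialize outside the lock so a large
      // result does not block every other thread contending for the lock.
      object ResultHandler {
        private val lock = new Object
        private var results = List.empty[String]

        // Stand-in for DirectTaskResult.value, which may take seconds
        // for a large serialized object.
        private def deserialize(bytes: Array[Byte]): String =
          new String(bytes, "UTF-8")

        def handleSuccessfulTask(bytes: Array[Byte]): Unit = {
          val value = deserialize(bytes) // outside the lock: others proceed
          lock.synchronized {
            results = value :: results // only cheap bookkeeping is locked
          }
        }

        def snapshot: List[String] = lock.synchronized(results)
      }
      ```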
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6195 from zsxwing/SPARK-7655 and squashes the following commits:
      
      21f502e [zsxwing] Add more comments
      e25fa88 [zsxwing] Add comments
      15010b5 [zsxwing] Deserialize value should not hold the TaskSchedulerImpl lock
    • [SPARK-7654][MLlib] Migrate MLlib to the DataFrame reader/writer API. · 161d0b4a
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6211 from rxin/mllib-reader and squashes the following commits:
      
      79a2cb9 [Reynold Xin] [SPARK-7654][MLlib] Migrate MLlib to the DataFrame reader/writer API.
    • [BUILD] update jblas dependency version to 1.2.4 · 1b4e710e
      Matthew Brandyberry authored
      jblas 1.2.4 includes native library support for PPC64LE.
      
      Author: Matthew Brandyberry <mbrandy@us.ibm.com>
      
      Closes #6199 from mtbrandy/jblas-1.2.4 and squashes the following commits:
      
      9df9301 [Matthew Brandyberry] [BUILD] update jblas dependency version to 1.2.4
    • [HOTFIX] [SQL] Fixes DataFrameWriter.mode(String) · ce639129
      Cheng Lian authored
      We forgot an assignment there.
      
      /cc rxin
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6212 from liancheng/fix-df-writer and squashes the following commits:
      
      711fbb0 [Cheng Lian] Adds a test case
      3b72d78 [Cheng Lian] Fixes DataFrameWriter.mode(String)
    • [SPARK-7655][Core][SQL] Remove... · 47e7ffe3
      zsxwing authored
      [SPARK-7655][Core][SQL] Remove 'scala.concurrent.ExecutionContext.Implicits.global' in 'ask' and 'BroadcastHashJoin'
      
      Both `AkkaRpcEndpointRef.ask` and `BroadcastHashJoin` use `scala.concurrent.ExecutionContext.Implicits.global`. However, the tasks in `BroadcastHashJoin` are usually long-running and can occupy all threads in `global`, leaving `ask` no chance to process the replies.
      
      For `ask`, the tasks are very simple, so we can use `MoreExecutors.sameThreadExecutor()`. For `BroadcastHashJoin`, it is better to use `ThreadUtils.newDaemonCachedThreadPool`.
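      The isolation idea can be sketched as below. This is a minimal sketch, not the actual `ThreadUtils` API (the helper name and thread-name prefix are hypothetical): long-running join work gets its own daemon cached pool, so it cannot starve short-lived futures that would otherwise share `Implicits.global`.

      ```scala
      import java.util.concurrent.{Executors, ThreadFactory}
      import scala.concurrent.duration._
      import scala.concurrent.{Await, ExecutionContext, Future}

      // Hypothetical helper: build a dedicated ExecutionContext backed by a
      // cached pool of daemon threads (daemon so they never block JVM exit).
      def newDaemonCachedExecutionContext(prefix: String): ExecutionContext = {
        val factory = new ThreadFactory {
          override def newThread(r: Runnable): Thread = {
            val t = new Thread(r, prefix)
            t.setDaemon(true)
            t
          }
        }
        ExecutionContext.fromExecutorService(Executors.newCachedThreadPool(factory))
      }

      val joinPool = newDaemonCachedExecutionContext("broadcast-hash-join")

      // Long-running work runs on its own pool, not Implicits.global.
      val broadcastFuture = Future { 21 * 2 }(joinPool)
      ```

      Passing the `ExecutionContext` explicitly (second argument list) makes the choice of pool visible at every call site, which is the design point of the change.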
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6200 from zsxwing/SPARK-7655-2 and squashes the following commits:
      
      cfdc605 [zsxwing] Remove redundant imort and minor doc fix
      cf83153 [zsxwing] Add "sameThread" and "newDaemonCachedThreadPool with maxThreadNumber" to ThreadUtils
      08ad0ee [zsxwing] Remove 'scala.concurrent.ExecutionContext.Implicits.global' in 'ask' and 'BroadcastHashJoin'
    • [SPARK-7672] [CORE] Use int conversion in translating kryoserializer.buffer.mb... · 0ac8b01a
      Nishkam Ravi authored
      [SPARK-7672] [CORE] Use int conversion in translating kryoserializer.buffer.mb to kryoserializer.buffer
      
      In translating spark.kryoserializer.buffer.mb to spark.kryoserializer.buffer, the use of toDouble leads to a "Fractional values not supported" error even when spark.kryoserializer.buffer.mb is an integer.
      ilganeli, andrewor14
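      The translation issue can be sketched with a hypothetical helper (this is not Spark's actual deprecated-config code, just an illustration of why the integer conversion matters): the deprecated key held a whole number of megabytes, and the new key's byte-size parser accepts only integral values.

      ```scala
      // Hypothetical translation from the deprecated "...buffer.mb" value
      // (e.g. "64") to the new "...buffer" value (e.g. "64m").
      def translateKryoBufferMb(mbValue: String): String = {
        // Using mbValue.toDouble here would produce "64.0m", which a
        // size parser that forbids fractional values rejects; toInt keeps
        // the value integral.
        s"${mbValue.toInt}m"
      }
      ```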
      
      Author: Nishkam Ravi <nravi@cloudera.com>
      Author: nishkamravi2 <nishkamravi@gmail.com>
      Author: nravi <nravi@c1704.halxg.cloudera.com>
      
      Closes #6198 from nishkamravi2/master_nravi and squashes the following commits:
      
      171a53c [nishkamravi2] Update SparkConfSuite.scala
      5261bf6 [Nishkam Ravi] Add a test for deprecated config spark.kryoserializer.buffer.mb
      5190f79 [Nishkam Ravi] In translating from deprecated spark.kryoserializer.buffer.mb to spark.kryoserializer.buffer use int conversion since fractions are not permissible
      059ce82 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      eaa13b5 [nishkamravi2] Update Client.scala
      981afd2 [Nishkam Ravi] Check for read permission before initiating copy
      1b81383 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      0f1abd0 [nishkamravi2] Update Utils.scala
      474e3bf [nishkamravi2] Update DiskBlockManager.scala
      97c383e [nishkamravi2] Update Utils.scala
      8691e0c [Nishkam Ravi] Add a try/catch block around Utils.removeShutdownHook
      2be1e76 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      1c13b79 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      bad4349 [nishkamravi2] Update Main.java
      36a6f87 [Nishkam Ravi] Minor changes and bug fixes
      b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
      d9658d6 [Nishkam Ravi] Changes for SPARK-6406
      ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
      345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ac58975 [Nishkam Ravi] spark-class changes
      06bfeb0 [nishkamravi2] Update spark-class
      35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
      4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
      746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
      bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      d453197 [nishkamravi2] Update NewHadoopRDD.scala
      6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
      0ce2c32 [nishkamravi2] Update HadoopRDD.scala
      f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
      71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      494d8c0 [nishkamravi2] Update DiskBlockManager.scala
      3c5ddba [nishkamravi2] Update DiskBlockManager.scala
      f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
      79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
      535295a [nishkamravi2] Update TaskSetManager.scala
      3e1b616 [Nishkam Ravi] Modify test for maxResultSize
      9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
      5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      636a9ff [nishkamravi2] Update YarnAllocator.scala
      8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
      35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
      5ac2ec1 [Nishkam Ravi] Remove out
      dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
      42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
      362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
      c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
      1cf2d1e [nishkamravi2] Update YarnAllocator.scala
      ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
      2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
      2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
      3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
      5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
      eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
      df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
      6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
      5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
      681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
    • [SPARK-4556] [BUILD] binary distribution assembly can't run in local mode · 1fd33815
      Sean Owen authored
      Add note on building a runnable distribution with make-distribution.sh
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6186 from srowen/SPARK-4556 and squashes the following commits:
      
      4002966 [Sean Owen] Add pointer to --help flag
      9fa7883 [Sean Owen] Add note on building a runnable distribution with make-distribution.sh
    • [SPARK-7671] Fix wrong URLs in MLlib Data Types Documentation · d41ae434
      FavioVazquez authored
      There is a mistake in the URL of Matrices in the MLlib Data Types documentation (Local matrix Scala section): the URL points to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices, which is incorrect, since Matrices is an object that implements factory methods for Matrix and does not have a companion class. The correct link should point to https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices$
      
      There is another mistake, in the Local Vector sections for Scala, Java and Python.
      
      In the Scala section the URL of Vectors points to the trait Vector (https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and not to the factory methods implemented in Vectors.
      
      The correct link should be: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$
      
      In the Java section, the URL of Vectors points to the interface Vector (https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html) and not to the class Vectors.
      
      The correct link should be:
      https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vectors.html
      
      In the Python section, the URL of Vectors points to the class Vector (https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vector) and not to the class Vectors.
      
      The correct link should be:
      https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors
      
      Author: FavioVazquez <favio.vazquezp@gmail.com>
      
      Closes #6196 from FavioVazquez/fix-typo-matrices-mllib-datatypes and squashes the following commits:
      
      3e9efd5 [FavioVazquez] - Fixed wrong URLs in the MLlib Data Types Documentation
      9af7074 [FavioVazquez] Merge remote-tracking branch 'upstream/master'
      edab1ef [FavioVazquez] Merge remote-tracking branch 'upstream/master'
      b2e2f8c [FavioVazquez] Merge remote-tracking branch 'upstream/master'
    • [SPARK-7654][SQL] DataFrameReader and DataFrameWriter for input/output API · 578bfeef
      Reynold Xin authored
      This patch introduces DataFrameWriter and DataFrameReader.
      
      DataFrameReader interface, accessible through SQLContext.read, contains methods that create DataFrames. These methods used to reside in SQLContext. Example usage:
      ```scala
      sqlContext.read.json("...")
      sqlContext.read.parquet("...")
      ```
      
      DataFrameWriter interface, accessible through DataFrame.write, implements a builder pattern to avoid the proliferation of options when writing a DataFrame out. It currently implements:
      - mode
      - format (e.g. "parquet", "json")
      - options (generic options passed down into data sources)
      - partitionBy (partitioning columns)
      Example usage:
      ```scala
      df.write.mode("append").format("json").partitionBy("date").saveAsTable("myJsonTable")
      ```
      
      TODO:
      
      - [ ] Documentation update
      - [ ] Move JDBC into reader / writer?
      - [ ] Deprecate the old interfaces
      - [ ] Move the generic load interface into reader.
      - [ ] Update example code and documentation
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6175 from rxin/reader-writer and squashes the following commits:
      
      b146c95 [Reynold Xin] Deprecation of old APIs.
      bd8abdf [Reynold Xin] Fixed merge conflict.
      26abea2 [Reynold Xin] Added general load methods.
      244fbec [Reynold Xin] Added equivalent to example.
      4f15d92 [Reynold Xin] Added documentation for partitionBy.
      7e91611 [Reynold Xin] [SPARK-7654][SQL] DataFrameReader and DataFrameWriter for input/output API.
  3. May 15, 2015
    • [SPARK-7473] [MLLIB] Add reservoir sample in RandomForest · deb41133
      AiHe authored
      Reservoir feature sampling using the existing API.
      
      Author: AiHe <ai.he@ussuning.com>
      
      Closes #5988 from AiHe/reservoir and squashes the following commits:
      
      e7a41ac [AiHe] remove non-robust testing case
      28ffb9a [AiHe] set seed as rng.nextLong
      37459e1 [AiHe] set fixed seed
      1e98a4c [AiHe] [MLLIB][tree] Add reservoir sample in RandomForest
    • [SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files · d7b69946
      Davies Liu authored
      dataframe.py is split into column.py, group.py and dataframe.py:
      ```
         360 column.py
        1223 dataframe.py
         183 group.py
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6201 from davies/split_df and squashes the following commits:
      
      fc8f5ab [Davies Liu] split dataframe.py into multiple files
    • [SPARK-7073] [SQL] [PySpark] Clean up SQL data type hierarchy in Python · adfd3668
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6206 from davies/sql_type and squashes the following commits:
      
      33d6860 [Davies Liu] [SPARK-7073] [SQL] [PySpark] Clean up SQL data type hierarchy in Python
    • [SPARK-7575] [ML] [DOC] Example code for OneVsRest · cc12a86f
      Ram Sriharsha authored
      Java and Scala examples for OneVsRest. Fixes the base classifier to be Logistic Regression and accepts the configuration parameters of the base classifier.
      
      Author: Ram Sriharsha <rsriharsha@hw11853.local>
      
      Closes #6115 from harsha2010/SPARK-7575 and squashes the following commits:
      
      87ad3c7 [Ram Sriharsha] extra line
      f5d9891 [Ram Sriharsha] Merge branch 'master' into SPARK-7575
      7076084 [Ram Sriharsha] cleanup
      dfd660c [Ram Sriharsha] cleanup
      8703e4f [Ram Sriharsha] update doc
      cb23995 [Ram Sriharsha] fix commandline options for JavaOneVsRestExample
      69e91f8 [Ram Sriharsha] cleanup
      7f4e127 [Ram Sriharsha] cleanup
      d4c40d0 [Ram Sriharsha] Code Review fixes
      461eb38 [Ram Sriharsha] cleanup
      e0106d9 [Ram Sriharsha] Fix typo
      935cf56 [Ram Sriharsha] Try to match Java and Scala Example Commandline options
      5323ff9 [Ram Sriharsha] cleanup
      196a59a [Ram Sriharsha] cleanup
      6adfa0c [Ram Sriharsha] Style Fix
      8cfc5d5 [Ram Sriharsha] [SPARK-7575] Example code for OneVsRest
    • [SPARK-7563] OutputCommitCoordinator.stop() should only run on the driver · 2c04c8a1
      Josh Rosen authored
      This fixes a bug where an executor that exits can cause the driver's OutputCommitCoordinator to stop. To fix this, we use an `isDriver` flag and check it in `stop()`.
      
      See https://issues.apache.org/jira/browse/SPARK-7563 for more details.
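      The guard can be sketched minimally as follows. The class shape here is illustrative, not the real OutputCommitCoordinator: the only point is that `stop()` checks an `isDriver` flag set at construction time, so an exiting executor can no longer tear down the driver's coordinator.

      ```scala
      // Hypothetical minimal coordinator: only the driver-side instance
      // actually performs teardown; executor-side stop() is a no-op.
      class CommitCoordinator(isDriver: Boolean) {
        private var stopped = false

        def stop(): Unit = {
          if (isDriver) { // executors silently ignore stop()
            stopped = true
          }
        }

        def isStopped: Boolean = stopped
      }
      ```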
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6197 from JoshRosen/SPARK-7563 and squashes the following commits:
      
      04b2cc5 [Josh Rosen] [SPARK-7563] OutputCommitCoordinator.stop() should only be executed on the driver
    • [SPARK-7676] Bug fix and cleanup of stage timeline view · e7454564
      Kay Ousterhout authored
      cc pwendell sarutak
      
      This commit cleans up some unnecessary code, eliminates the feature where mousing over a box in the timeline highlights the corresponding task in the table (that feature is only useful in the rare case of very few tasks, where the mapping is easy to see anyway), and fixes a bug where nothing shows up when you try to visualize a stage with only one task.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #6202 from kayousterhout/SPARK-7676 and squashes the following commits:
      
      dfd29d4 [Kay Ousterhout] [SPARK-7676] Bug fix and cleanup of stage timeline view
    • [SPARK-7556] [ML] [DOC] Add user guide for spark.ml Binarizer, including... · c8696337
      Liang-Chi Hsieh authored
      [SPARK-7556] [ML] [DOC] Add user guide for spark.ml Binarizer, including Scala, Java and Python examples
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-7556
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6116 from viirya/binarizer_doc and squashes the following commits:
      
      40cb677 [Liang-Chi Hsieh] Better print out.
      5b7ef1d [Liang-Chi Hsieh] Make examples more clear.
      1bf9c09 [Liang-Chi Hsieh] For comments.
      6cf8cba [Liang-Chi Hsieh] Add user guide for Binarizer.
    • [SPARK-7677] [STREAMING] Add Kafka modules to the 2.11 build. · 6e77105e
      Iulian Dragos authored
      This is somewhat related to [SPARK-6154](https://issues.apache.org/jira/browse/SPARK-6154), though it only touches Kafka, not the jline dependency for thriftserver.
      
      I tested this locally on 2.11 (./run-tests) and everything looked good (I had to disable mima, because `MimaBuild` hardcodes 2.10 for the previous version -- that's another PR).
      
      Author: Iulian Dragos <jaguarul@gmail.com>
      
      Closes #6149 from dragos/issue/spark-2.11-kafka and squashes the following commits:
      
      aa15d99 [Iulian Dragos] Add Kafka modules to the 2.11 build.
    • [SPARK-7226] [SPARKR] Support math functions in R DataFrame · 50da9e89
      qhuang authored
      Author: qhuang <qian.huang@intel.com>
      
      Closes #6170 from hqzizania/master and squashes the following commits:
      
      f20c39f [qhuang] add tests units and fixes
      2a7d121 [qhuang] use a function name more familiar to R users
      07aa72e [qhuang] Support math functions in R DataFrame
    • [SPARK-7296] Add timeline visualization for stages in the UI. · 9b6cf285
      Kousuke Saruta authored
      This PR builds on #2342 by adding a timeline view for the Stage page,
      showing how tasks spend their time.
      
      With this timeline, we can understand the following things about a stage:
      
      * When/where each task ran
      * Total duration of each task
      * Proportion of the time each task spends
      
      Also, this timeline view is scrollable and zoomable.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #5843 from sarutak/stage-page-timeline and squashes the following commits:
      
      4ba9604 [Kousuke Saruta] Fixed the order of legends
      16bb552 [Kousuke Saruta] Removed border of legend area
      2e5d605 [Kousuke Saruta] Modified warning message
      16cb2e6 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into stage-page-timeline
      7ae328f [Kousuke Saruta] Modified code style
      d5f794a [Kousuke Saruta] Fixed performance issues more
      64e6642 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into stage-page-timeline
      e4a3354 [Kousuke Saruta] minor code style change
      878e3b8 [Kousuke Saruta] Fixed a bug that tooltip remains
      b9d8f1b [Kousuke Saruta] Fixed performance issue
      ac8842b [Kousuke Saruta] Fixed layout
      2319739 [Kousuke Saruta] Modified appearances more
      81903ab [Kousuke Saruta] Modified appearances
      a79dcc3 [Kousuke Saruta] Modified appearance
      55a390c [Kousuke Saruta] Ignored scalastyle for a line-comment
      29eae3e [Kousuke Saruta] limited to longest 1000 tasks
      2a9e376 [Kousuke Saruta] Minor cleanup
      385b6d2 [Kousuke Saruta] Added link feature
      ba1ac3e [Kousuke Saruta] Fixed style
      2ae8520 [Kousuke Saruta] Updated bootstrap-tooltip.js from 2.2.2 to 2.3.2
      af430f1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into stage-page-timeline
      e694b8e [Kousuke Saruta] Added timeline view to StagePage
      8f6610c [Kousuke Saruta] Fixed conflict
      b587cf2 [Kousuke Saruta] initial commit
      11fe67d [Kousuke Saruta] Fixed conflict
      79ac03d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      a91abd3 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
      ef34a5b [Kousuke Saruta] Implement tooltip using bootstrap
      b09d0c5 [Kousuke Saruta] Move `stroke` and `fill` attribute of rect elements to css
      d3c63c8 [Kousuke Saruta] Fixed a little bit bugs
      a36291b [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
      28714b6 [Kousuke Saruta] Fixed highlight issue
      0dc4278 [Kousuke Saruta] Addressed most of Patrics's feedbacks
      8110acf [Kousuke Saruta] Added scroll limit to Job timeline
      974a64a [Kousuke Saruta] Removed unused function
      ee7a7f0 [Kousuke Saruta] Refactored
      6a91872 [Kousuke Saruta] Temporary commit
      6693f34 [Kousuke Saruta] Added link to job/stage box in the timeline in order to move to corresponding row when we click
      8f88222 [Kousuke Saruta] Added job/stage description
      aeed4b1 [Kousuke Saruta] Removed stage timeline
      fc1696c [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
      999ccd4 [Kousuke Saruta] Improved scalability
      0fc6a31 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      19815ae [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      68b7540 [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
      52b5f0b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      dec85db [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      fcdab7d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      dab7cc1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      09cce97 [Kousuke Saruta] Cleanuped
      16f82cf [Kousuke Saruta] Cleanuped
      9fb522e [Kousuke Saruta] Cleanuped
      d05f2c2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      e85e9aa [Kousuke Saruta] Cleanup: Added TimelineViewUtils.scala
      a76e569 [Kousuke Saruta] Removed unused setting in timeline-view.css
      5ce1b21 [Kousuke Saruta] Added vis.min.js, vis.min.css and vis.map to .rat-exclude
      082f709 [Kousuke Saruta] Added Timeline-View feature for Applications, Jobs and Stages
      9b6cf285
    • ehnalis's avatar
      [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode · 8e3822a0
      ehnalis authored
Added a simple check for SparkContext.
Also added two rational null checks at the AM object.
      
      Author: ehnalis <zoltan.zvara@gmail.com>
      
      Closes #6083 from ehnalis/cluster and squashes the following commits:
      
      926bd96 [ehnalis] Moved check to SparkContext.
      7c89b6e [ehnalis] Remove false line.
      ea2a5fe [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      4924e01 [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      39e4fa3 [ehnalis] SPARK-7504 [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      9f287c5 [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      8e3822a0
    • Kousuke Saruta's avatar
      [SPARK-7664] [WEBUI] DAG visualization: Fix incorrect link paths of DAG. · ad92af9d
      Kousuke Saruta authored
In JobPage, we can jump to a StagePage by clicking the corresponding box of the DAG viz, but the link path is incorrect.
      
When we click a box as follows ...
      ![screenshot_from_2015-05-15 19 24 25](https://cloud.githubusercontent.com/assets/4736016/7651528/5f7ef824-fb3c-11e4-9518-8c9ade2dff7a.png)
      
We jump to the index page instead.
      ![screenshot_from_2015-05-15 19 24 45](https://cloud.githubusercontent.com/assets/4736016/7651534/6d666274-fb3c-11e4-971c-c3f2dc2b1da2.png)
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #6184 from sarutak/fix-link-path-of-dag-viz and squashes the following commits:
      
      faba3ba [Kousuke Saruta] Fix a incorrect link
      ad92af9d
    • Sean Owen's avatar
      [SPARK-5412] [DEPLOY] Cannot bind Master to a specific hostname as per the documentation · 8ab1450d
      Sean Owen authored
Pass args to start-master.sh through to start-daemon.sh, as other scripts do, so that options like --host take effect on start-master.sh, as the docs describe.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6185 from srowen/SPARK-5412 and squashes the following commits:
      
      b3ce9da [Sean Owen] Pass args to start-master.sh through to start-daemon.sh, as other scripts do, so that things like --host have effect on start-master.sh as per docs
      8ab1450d
    • Tim Ellison's avatar
      [CORE] Protect additional test vars from early GC · 270d4b51
      Tim Ellison authored
      Fix more places in which some test variables could be collected early by aggressive JVM optimization.
      Added a couple of comments to note where existing references are sufficient in the same test pattern.
      
      Author: Tim Ellison <t.p.ellison@gmail.com>
      
      Closes #6187 from tellison/DefeatEarlyGC and squashes the following commits:
      
      27329d9 [Tim Ellison] [CORE] Protect additional test vars from early GC
      270d4b51
    • Oleksii Kostyliev's avatar
      [SPARK-7233] [CORE] Detect REPL mode once · b1b9d580
      Oleksii Kostyliev authored
      <h3>Description</h3>
Detect REPL mode once per JVM lifespan.
The previous behavior was to check for the presence of interpreter mode every time a job was submitted. When executing many short-lived jobs, this caused heavy mutual blocking between submission threads.
      
      For more details please refer to https://issues.apache.org/jira/browse/SPARK-7233.
      
      <h3>Notes</h3>
      * I inverted the return value in case of catching an exception from `true` to `false`. It seems more logical to assume that if the REPL class is not found, we aren't in the interpreter mode.
* I would personally call `classForName` with just the Spark classloader (`org.apache.spark.util.Utils#getSparkClassLoader`), but `org.apache.spark.util.Utils#getContextOrSparkClassLoader` is said to be preferable.
* I struggled to come up with a concise, readable and clear unit test. Suggestions are welcome if you think one is necessary.
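The once-per-process detection described above can be sketched as a lazily cached probe. This is a hypothetical Python sketch (the actual fix is a Scala `lazy val` in `org.apache.spark.util.Utils`); the module name `scala_repl_stub` is a made-up stand-in for the REPL class being probed:

```python
# Memoize "are we in the interpreter?" so the probe runs once per process,
# instead of on every job submission.
from functools import lru_cache
import importlib

@lru_cache(maxsize=1)
def in_interpreter(repl_module="scala_repl_stub"):
    """Return True if the REPL module can be loaded; computed only once."""
    try:
        importlib.import_module(repl_module)
        return True
    except ImportError:
        # If the REPL class is absent, assume we are NOT in interpreter mode
        # (the inverted return value mentioned in the notes above).
        return False

# Repeated calls hit the cache instead of re-probing the classpath.
in_interpreter()
in_interpreter()
```

With the cache in place, concurrent submitters no longer serialize on the class-lookup, which is the contention the PR removes.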
      
      Author: Oleksii Kostyliev <etander@gmail.com>
      Author: Oleksii Kostyliev <okostyliev@thunderhead.com>
      
      Closes #5835 from preeze/SPARK-7233 and squashes the following commits:
      
      69bb9e4 [Oleksii Kostyliev] SPARK-7527: fixed explanatory comment to meet style-checker requirements
      26dcc24 [Oleksii Kostyliev] SPARK-7527: fixed explanatory comment to meet style-checker requirements
      c6f9685 [Oleksii Kostyliev] Merge remote-tracking branch 'remotes/upstream/master' into SPARK-7233
      b78a983 [Oleksii Kostyliev] SPARK-7527: revert the fix and let it be addressed separately at a later stage
      b64d441 [Oleksii Kostyliev] SPARK-7233: inline inInterpreter parameter into instantiateClass
      86e2606 [Oleksii Kostyliev] SPARK-7233, SPARK-7527: Handle interpreter mode properly.
      c7ee69c [Oleksii Kostyliev] Merge remote-tracking branch 'upstream/master' into SPARK-7233
      d6c07fc [Oleksii Kostyliev] SPARK-7233: properly handle the inverted meaning of isInInterpreter
      c319039 [Oleksii Kostyliev] SPARK-7233: move inInterpreter to Utils and make it lazy
      b1b9d580
    • FlytxtRnD's avatar
      [SPARK-7651] [MLLIB] [PYSPARK] GMM predict, predictSoft should raise error on bad input · 8f4aaba0
      FlytxtRnD authored
In the Python API for the Gaussian Mixture Model, the predict() and predictSoft() methods should raise an error when the input argument is not an RDD.
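The validation described above can be sketched as an `isinstance` guard. This is a hypothetical sketch, not the actual PySpark code; the minimal `RDD` class here is a stand-in for `pyspark.rdd.RDD`:

```python
# Reject non-RDD input with a clear TypeError instead of failing later
# with a confusing error deep inside the computation.
class RDD:  # stand-in for pyspark.rdd.RDD
    def __init__(self, data):
        self.data = data

    def map(self, f):
        return RDD([f(x) for x in self.data])

class GaussianMixtureModel:
    def predict(self, x):
        if isinstance(x, RDD):
            return x.map(lambda point: 0)  # placeholder cluster index
        raise TypeError("x should be represented by an RDD, got %s" % type(x))

model = GaussianMixtureModel()
model.predict(RDD([1.0, 2.0]))   # fine
# model.predict([1.0, 2.0])      # would raise TypeError
```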
      
      Author: FlytxtRnD <meethu.mathew@flytxt.com>
      
      Closes #6180 from FlytxtRnD/GmmPredictException and squashes the following commits:
      
      4b6aa11 [FlytxtRnD] Raise error if the input to predict()/predictSoft() is not an RDD
      8f4aaba0
    • Liang-Chi Hsieh's avatar
      [SPARK-7668] [MLLIB] Preserve isTransposed property for Matrix after calling map function · f96b85ab
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7668
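The bug and fix can be sketched as follows: `map` must copy the transposed flag to the result rather than resetting it. A hypothetical Python sketch (the real fix is in MLlib's Scala `Matrix` classes; names here only loosely mirror `DenseMatrix`):

```python
# A map() that drops is_transposed would silently reinterpret the values
# array in the wrong order; the fix is to carry the flag through.
class DenseMatrix:
    def __init__(self, rows, cols, values, is_transposed=False):
        self.rows, self.cols = rows, cols
        self.values = values
        self.is_transposed = is_transposed

    def map(self, f):
        # Preserve is_transposed rather than defaulting it to False.
        return DenseMatrix(self.rows, self.cols,
                           [f(v) for v in self.values],
                           is_transposed=self.is_transposed)

m = DenseMatrix(2, 2, [1, 2, 3, 4], is_transposed=True)
doubled = m.map(lambda v: v * 2)
# doubled.is_transposed is still True; the values keep their storage order.
```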
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6188 from viirya/fix_matrix_map and squashes the following commits:
      
      2a7cc97 [Liang-Chi Hsieh] Preserve isTransposed property for Matrix after calling map function.
      f96b85ab
    • Kousuke Saruta's avatar
      [SPARK-7503] [YARN] Resources in .sparkStaging directory can't be cleaned up on error · c64ff803
      Kousuke Saruta authored
When we run applications on YARN in cluster mode, resources uploaded to the .sparkStaging directory can't be cleaned up if uploading local resources fails.
      
You can see this issue by running the following command.
      ```
      bin/spark-submit --master yarn --deploy-mode cluster --class <someClassName> <non-existing-jar>
      ```
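The shape of the fix is to wrap submission in a try/except that deletes the staging directory on failure. A hypothetical Python sketch (the real change wraps `Client#submitApplication` in Scala; `submit_application` and `bad_upload` are made-up names):

```python
# On upload failure, remove the freshly created staging dir before
# re-raising, so failed submissions leave nothing behind.
import os
import shutil
import tempfile

def submit_application(upload):
    staging_dir = tempfile.mkdtemp(prefix=".sparkStaging-")
    try:
        upload(staging_dir)        # may fail, e.g. for a non-existent jar
        return staging_dir
    except Exception:
        shutil.rmtree(staging_dir, ignore_errors=True)  # clean up on error
        raise

recorded = {}

def bad_upload(staging_dir):
    recorded["dir"] = staging_dir
    raise IOError("no such jar")

try:
    submit_application(bad_upload)
except IOError:
    pass
# recorded["dir"] no longer exists on disk: the staging dir was removed.
```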
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #6026 from sarutak/delete-uploaded-resources-on-error and squashes the following commits:
      
      caef9f4 [Kousuke Saruta] Fixed style
      882f921 [Kousuke Saruta] Wrapped Client#submitApplication with try/catch blocks in order to delete resources on error
      1786ca4 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into delete-uploaded-resources-on-error
      f61071b [Kousuke Saruta] Fixed cleanup problem
      c64ff803
    • Cheng Lian's avatar
      [SPARK-7591] [SQL] Partitioning support API tweaks · fdf5bba3
      Cheng Lian authored
      Please see [SPARK-7591] [1] for the details.
      
      /cc rxin marmbrus yhuai
      
      [1]: https://issues.apache.org/jira/browse/SPARK-7591
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6150 from liancheng/spark-7591 and squashes the following commits:
      
      af422e7 [Cheng Lian] Addresses @rxin's comments
      37d1738 [Cheng Lian] Fixes HadoopFsRelation partition columns initialization
      2fc680a [Cheng Lian] Fixes Scala style issue
      189ad23 [Cheng Lian] Removes HadoopFsRelation constructor arguments
      522c24e [Cheng Lian] Adds OutputWriterFactory
      047d40d [Cheng Lian] Renames FSBased* to HadoopFs*, also renamed FSBasedParquetRelation back to ParquetRelation2
      fdf5bba3
    • Yanbo Liang's avatar
      [SPARK-6258] [MLLIB] GaussianMixture Python API parity check · 94761485
      Yanbo Liang authored
Implement Python APIs for the major disparities of the GaussianMixture clustering algorithm between Scala & Python:
      ```scala
      GaussianMixture
          setInitialModel
      GaussianMixtureModel
          k
      ```
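The two additions listed above can be sketched on the Python side as a builder-style setter and a `k` accessor. A hypothetical sketch (names only mirror the Scala API quoted above, not the actual `pyspark.mllib` code):

```python
# Python-side parity sketch: GaussianMixtureModel.k and
# GaussianMixture.setInitialModel, matching the Scala names above.
class GaussianMixtureModel:
    def __init__(self, weights, gaussians):
        self.weights = weights
        self.gaussians = gaussians

    @property
    def k(self):
        """Number of Gaussian components, mirroring the Scala `k` accessor."""
        return len(self.weights)

class GaussianMixture:
    def __init__(self, k=2):
        self._k = k
        self._initial_model = None

    def setInitialModel(self, model):
        """Seed training from an existing model, as in the Scala API."""
        if model.k != self._k:
            raise ValueError("mismatched number of components")
        self._initial_model = model
        return self  # builder-style chaining, as in the Scala setter

model = GaussianMixtureModel([0.5, 0.5], ["g1", "g2"])
gm = GaussianMixture(k=2).setInitialModel(model)
```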
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6087 from yanboliang/spark-6258 and squashes the following commits:
      
      b3af21c [Yanbo Liang] fix typo
      2b645c1 [Yanbo Liang] fix doc
      638b4b7 [Yanbo Liang] address comments
      b5bcade [Yanbo Liang] GaussianMixture Python API parity check
      94761485
    • zsxwing's avatar
      [SPARK-7650] [STREAMING] [WEBUI] Move streaming css and js files to the streaming project · cf842d42
      zsxwing authored
      cc tdas
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6160 from zsxwing/SPARK-7650 and squashes the following commits:
      
      fe6ae15 [zsxwing] Fix the import order
      a4ffd99 [zsxwing] Merge branch 'master' into SPARK-7650
      dc402b6 [zsxwing] Move streaming css and js files to the streaming project
      cf842d42