  1. Jan 10, 2017
    • [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all identified tests failed due to path and resource-not-closed problems on Windows · 4e27578f
      hyukjinkwon authored
      
      ## What changes were proposed in this pull request?
      
      This PR proposes to fix all the test failures identified by testing with AppVeyor.
      
      **Scala - aborted tests**
      
      ```
      WindowQuerySuite:
        Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.execution.WindowQuerySuite *** ABORTED *** (156 milliseconds)
         org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: C:projectssparksqlhive   argetscala-2.11   est-classesdatafilespart_tiny.txt;
      
      OrcSourceSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.orc.OrcSourceSuite *** ABORTED *** (62 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
      ParquetMetastoreSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.ParquetMetastoreSuite *** ABORTED *** (4 seconds, 703 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
      ParquetSourceSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.ParquetSourceSuite *** ABORTED *** (3 seconds, 907 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark  arget mpspark-581a6575-454f-4f21-a516-a07f95266143;
      
      KafkaRDDSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.KafkaRDDSuite *** ABORTED *** (5 seconds, 212 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-4722304d-213e-4296-b556-951df1a46807
      
      DirectKafkaStreamSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite *** ABORTED *** (7 seconds, 127 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-d0d3eba7-4215-4e10-b40e-bb797e89338e
         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
      
      ReliableKafkaStreamSuite
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite *** ABORTED *** (5 seconds, 498 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-d33e45a0-287e-4bed-acae-ca809a89d888
      
      KafkaStreamSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.KafkaStreamSuite *** ABORTED *** (2 seconds, 892 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-59c9d169-5a56-4519-9ef0-cefdbd3f2e6c
      
      KafkaClusterSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.KafkaClusterSuite *** ABORTED *** (1 second, 690 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-3ef402b0-8689-4a60-85ae-e41e274f179d
      
      DirectKafkaStreamSuite:
       Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite *** ABORTED *** (59 seconds, 626 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-426107da-68cf-4d94-b0d6-1f428f1c53f6
      
      KafkaRDDSuite:
      Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka010.KafkaRDDSuite *** ABORTED *** (2 minutes, 6 seconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-b9ce7929-5dae-46ab-a0c4-9ef6f58fbc2
      ```
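
A recurring symptom in the aborted tests above is paths like `C:projectssparksqlhive  argetscala-2.11`: the Windows backslash separators in `C:\projects\spark\target\...` were consumed as escape sequences somewhere along the way, so `\p` and `\s` collapsed to bare letters while `\t` became a literal tab. A minimal sketch of that effect (the `processEscapes` helper is hypothetical, standing in for whatever escape-processing layer the path passed through):

```java
public class EscapeDemo {
    // Simulate an escape-processing layer (e.g. a SQL string parser) applied
    // to a Windows path: \p and \s collapse, \t becomes a tab character.
    static String processEscapes(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                out.append(next == 't' ? '\t' : next == 'n' ? '\n' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "C:projectsspark<TAB>arget<TAB>mp", matching the mangled logs
        System.out.println(processEscapes("C:\\projects\\spark\\target\\tmp"));
    }
}
```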
      
      **Java - failed tests**
      
      ```
      Test org.apache.spark.streaming.kafka.JavaKafkaRDDSuite.testKafkaRDD failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-1cee32f4-4390-4321-82c9-e8616b3f0fb0, took 9.61 sec
      
      Test org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-f42695dd-242e-4b07-847c-f299b8e4676e, took 11.797 sec
      
      Test org.apache.spark.streaming.kafka.JavaDirectKafkaStreamSuite.testKafkaStream failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-85c0d062-78cf-459c-a2dd-7973572101ce, took 1.581 sec
      
      Test org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite.testKafkaRDD failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-49eb6b5c-8366-47a6-83f2-80c443c48280, took 17.895 sec
      
      org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite.testKafkaStream failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-898cf826-d636-4b1c-a61a-c12a364c02e7, took 8.858 sec
      ```
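
Most of the `Failed to delete` errors are the resource-not-closed half of the problem: unlike POSIX filesystems, Windows refuses to delete a file that still has an open handle, so a stream left unclosed by a test makes the later recursive cleanup throw. A hedged illustration of the general fix pattern (not the PR's exact code), closing the handle via try-with-resources before deleting:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CleanupDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spark-test");
        Path file = dir.resolve("data.txt");

        // try-with-resources guarantees the handle is closed; on Windows,
        // a still-open OutputStream would make the deletes below fail.
        try (OutputStream out = Files.newOutputStream(file)) {
            out.write("hello".getBytes());
        }

        // Cleanup (analogous to Utils.deleteRecursively) now succeeds.
        Files.delete(file);
        Files.delete(dir);
        System.out.println("deleted: " + !Files.exists(dir));
    }
}
```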
      
      **Scala - failed tests**
      
      ```
      PartitionProviderCompatibilitySuite:
       - insert overwrite partition of new datasource table overwrites just partition *** FAILED *** (828 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-bb6337b9-4f99-45ab-ad2c-a787ab965c09
      
       - SPARK-18635 special chars in partition values - partition management true *** FAILED *** (5 seconds, 360 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - SPARK-18635 special chars in partition values - partition management false *** FAILED *** (141 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      ```
      
      ```
      UtilsSuite:
       - reading offset bytes of a file (compressed) *** FAILED *** (0 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-ecb2b7d5-db8b-43a7-b268-1bf242b5a491
      
       - reading offset bytes across multiple files (compressed) *** FAILED *** (0 milliseconds)
         java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-25cc47a8-1faa-4da5-8862-cf174df63ce0
      ```
      
      ```
      StatisticsSuite:
       - MetastoreRelations fallback to HDFS for size estimation *** FAILED *** (110 milliseconds)
         org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'csv_table' not found in database 'default';
      ```
      
      ```
      SQLQuerySuite:
       - permanent UDTF *** FAILED *** (125 milliseconds)
         org.apache.spark.sql.AnalysisException: Undefined function: 'udtf_count_temp'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 24
      
       - describe functions - user defined functions *** FAILED *** (125 milliseconds)
         org.apache.spark.sql.AnalysisException: Undefined function: 'udtf_count'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
      
       - CTAS without serde with location *** FAILED *** (16 milliseconds)
         java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:projectsspark%09arget%09mpspark-ed673d73-edfc-404e-829e-2e2b9725d94e/c1
      
       - derived from Hive query file: drop_database_removes_partition_dirs.q *** FAILED *** (47 milliseconds)
         java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:projectsspark%09arget%09mpspark-d2ddf08e-699e-45be-9ebd-3dfe619680fe/drop_database_removes_partition_dirs_table
      
       - derived from Hive query file: drop_table_removes_partition_dirs.q *** FAILED *** (0 milliseconds)
         java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:projectsspark%09arget%09mpspark-d2ddf08e-699e-45be-9ebd-3dfe619680fe/drop_table_removes_partition_dirs_table2
      
       - SPARK-17796 Support wildcard character in filename for LOAD DATA LOCAL INPATH *** FAILED *** (109 milliseconds)
         java.nio.file.InvalidPathException: Illegal char <:> at index 2: /C:/projects/spark/sql/hive/projectsspark	arget	mpspark-1a122f8c-dfb3-46c4-bab1-f30764baee0e/*part-r*
      ```
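
The `Relative path in absolute URI: file:C:...` failures come from gluing `"file:"` onto a raw Windows path: without a leading slash the result is an opaque URI, and the drive-letter segment is rejected as a relative path inside an absolute URI. Converting through `File.toURI()` produces a well-formed `file:/...` URI instead; a small illustrative sketch (not the PR's exact change):

```java
import java.io.File;
import java.net.URI;

public class FileUriDemo {
    public static void main(String[] args) {
        // File.toURI() normalizes separators and adds the leading slash,
        // yielding an absolute, hierarchical file:/... URI.
        File f = new File(System.getProperty("java.io.tmpdir"), "spark-demo");
        URI good = f.toURI();
        System.out.println(good.isAbsolute());  // true

        // By contrast, naive concatenation yields an *opaque* URI whose
        // scheme-specific part Hadoop's Path constructor then rejects.
        URI bad = URI.create("file:C:projectsspark");
        System.out.println(bad.isOpaque());     // true -- the symptom above
    }
}
```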
      
      ```
      HiveDDLSuite:
       - drop external tables in default database *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - add/drop partitions - external table *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - create/drop database - location without pre-created directory *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - create/drop database - location with pre-created directory *** FAILED *** (32 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - drop database containing tables - CASCADE *** FAILED *** (94 milliseconds)
         CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
      
       - drop an empty database - CASCADE *** FAILED *** (63 milliseconds)
         CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
      
       - drop database containing tables - RESTRICT *** FAILED *** (47 milliseconds)
         CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
      
       - drop an empty database - RESTRICT *** FAILED *** (47 milliseconds)
         CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
      
       - CREATE TABLE LIKE an external data source table *** FAILED *** (140 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-c5eba16d-07ae-4186-95bb-21c5811cf888;
      
       - CREATE TABLE LIKE an external Hive serde table *** FAILED *** (16 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - desc table for data source table - no user-defined schema *** FAILED *** (125 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-e8bf5bf5-721a-4cbe-9d6	at scala.collection.immutable.List.foreach(List.scala:381)d-5543a8301c1d;
      ```
      
      ```
      MetastoreDataSourcesSuite
       - CTAS: persisted bucketed data source table *** FAILED *** (16 milliseconds)
         java.lang.IllegalArgumentException: Can not create a Path from an empty string
      ```
      
      ```
      ShowCreateTableSuite:
       - simple external hive table *** FAILED *** (0 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      ```
      
      ```
      PartitionedTablePerfStatsSuite:
       - hive table: partitioned pruned table reports only selected files *** FAILED *** (313 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: partitioned pruned table reports only selected files *** FAILED *** (219 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-311f45f8-d064-4023-a4bb-e28235bff64d;
      
       - hive table: lazy partition pruning reads only necessary partition data *** FAILED *** (203 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: lazy partition pruning reads only necessary partition data *** FAILED *** (187 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-fde874ca-66bd-4d0b-a40f-a043b65bf957;
      
       - hive table: lazy partition pruning with file status caching enabled *** FAILED *** (188 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: lazy partition pruning with file status caching enabled *** FAILED *** (187 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-e6d20183-dd68-4145-acbe-4a509849accd;
      
       - hive table: file status caching respects refresh table and refreshByPath *** FAILED *** (172 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: file status caching respects refresh table and refreshByPath *** FAILED *** (203 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-8b2c9651-2adf-4d58-874f-659007e21463;
      
       - hive table: file status cache respects size limit *** FAILED *** (219 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: file status cache respects size limit *** FAILED *** (171 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-7835ab57-cb48-4d2c-bb1d-b46d5a4c47e4;
      
       - datasource table: table setup does not scan filesystem *** FAILED *** (266 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-20598d76-c004-42a7-8061-6c56f0eda5e2;
      
       - hive table: table setup does not scan filesystem *** FAILED *** (266 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - hive table: num hive client calls does not scale with partition count *** FAILED *** (2 seconds, 281 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: num hive client calls does not scale with partition count *** FAILED *** (2 seconds, 422 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-4cfed321-4d1d-4b48-8d34-5c169afff383;
      
       - hive table: files read and cached when filesource partition management is off *** FAILED *** (234 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      
       - datasource table: all partition data cached in memory when partition management is off *** FAILED *** (203 milliseconds)
         org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-4bcc0398-15c9-4f6a-811e-12d40f3eec12;
      
       - SPARK-18700: table loaded only once even when resolved concurrently *** FAILED *** (1 second, 266 milliseconds)
         org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
      ```
      
      ```
      HiveSparkSubmitSuite:
       - temporary Hive UDF: define a UDF and use it *** FAILED *** (2 seconds, 94 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - permanent Hive UDF: define a UDF and use it *** FAILED *** (281 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - permanent Hive UDF: use a already defined permanent function *** FAILED *** (718 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-8368: includes jars passed in through --jars *** FAILED *** (3 seconds, 521 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-8020: set sql conf in spark conf *** FAILED *** (0 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-8489: MissingRequirementError during reflection *** FAILED *** (94 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-9757 Persist Parquet relation with decimal column *** FAILED *** (16 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-11009 fix wrong result of Window function in cluster mode *** FAILED *** (16 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-14244 fix window partition size attribute binding failure *** FAILED *** (78 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - set spark.sql.warehouse.dir *** FAILED *** (16 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - set hive.metastore.warehouse.dir *** FAILED *** (15 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-16901: set javax.jdo.option.ConnectionURL *** FAILED *** (16 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      
       - SPARK-18360: default table path of tables in default database should depend on the location of default database *** FAILED *** (15 milliseconds)
         java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
      ```
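
The `Cannot run program "./bin/spark-submit"` failures are a process-launch issue: on Windows, `CreateProcess` cannot execute a POSIX shell script invoked via a relative Unix-style path. One illustrative way to pick a platform-appropriate launcher (the `.cmd` name and helper below are assumptions for this sketch, not the PR's code):

```java
import java.io.File;

public class SubmitLauncherDemo {
    // Select the launcher by platform; the bare shell script cannot be
    // started by CreateProcess on Windows.
    static String submitCommand(String sparkHome, boolean isWindows) {
        String script = isWindows ? "spark-submit.cmd" : "spark-submit";
        return new File(new File(sparkHome, "bin"), script).getPath();
    }

    public static void main(String[] args) {
        boolean win = System.getProperty("os.name").toLowerCase().contains("windows");
        System.out.println(submitCommand(win ? "C:\\projects\\spark" : "/opt/spark", win));
    }
}
```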
      
      ```
      UtilsSuite:
       - resolveURIs with multiple paths *** FAILED *** (0 milliseconds)
         ".../jar3,file:/C:/pi.py[%23]py.pi,file:/C:/path%..." did not equal ".../jar3,file:/C:/pi.py[#]py.pi,file:/C:/path%..." (UtilsSuite.scala:468)
      ```
      
      ```
      CheckpointSuite:
       - recovery with file input stream *** FAILED *** (10 seconds, 205 milliseconds)
         The code passed to eventually never returned normally. Attempted 660 times over 10.014272499999999 seconds. Last failure message: Unexpected internal error near index 1
         \
          ^. (CheckpointSuite.scala:680)
      ```
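
The `Unexpected internal error near index 1` message in CheckpointSuite is a `PatternSyntaxException`: a Windows path containing backslashes was handed to the regex engine, where `\` introduces an escape. Quoting the path with `Pattern.quote` is the usual remedy; a small sketch (illustrative, not the PR's exact fix):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexPathDemo {
    public static void main(String[] args) {
        String windowsDir = "C:\\projects\\spark\\target\\tmp\\";
        try {
            // The backslash escapes make this an invalid regex.
            Pattern.compile(windowsDir);
        } catch (PatternSyntaxException e) {
            System.out.println("regex failed: " + e.getDescription());
        }
        // Pattern.quote treats the whole path as a literal, so it compiles
        // and matches the path itself.
        Pattern safe = Pattern.compile(Pattern.quote(windowsDir));
        System.out.println(safe.matcher(windowsDir).matches());
    }
}
```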
      
      ## How was this patch tested?
      
      Manually via AppVeyor as below:
      
      **Scala - aborted tests**
      
      ```
      WindowQuerySuite - all passed
      OrcSourceSuite:
      - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (4 seconds, 417 milliseconds)
        org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:625)
      ParquetMetastoreSuite - all passed
      ParquetSourceSuite - all passed
      KafkaRDDSuite - all passed
      DirectKafkaStreamSuite - all passed
      ReliableKafkaStreamSuite - all passed
      KafkaStreamSuite - all passed
      KafkaClusterSuite - all passed
      DirectKafkaStreamSuite - all passed
      KafkaRDDSuite - all passed
      ```
      
      **Java - failed tests**
      
      ```
      org.apache.spark.streaming.kafka.JavaKafkaRDDSuite - all passed
      org.apache.spark.streaming.kafka.JavaDirectKafkaStreamSuite - all passed
      org.apache.spark.streaming.kafka.JavaKafkaStreamSuite - all passed
      org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite - all passed
      org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite - all passed
      ```
      
      **Scala - failed tests**
      
      ```
      PartitionProviderCompatibilitySuite:
      - insert overwrite partition of new datasource table overwrites just partition (1 second, 953 milliseconds)
      - SPARK-18635 special chars in partition values - partition management true (6 seconds, 31 milliseconds)
      - SPARK-18635 special chars in partition values - partition management false (4 seconds, 578 milliseconds)
      ```
      
      ```
      UtilsSuite:
      - reading offset bytes of a file (compressed) (203 milliseconds)
      - reading offset bytes across multiple files (compressed) (0 milliseconds)
      ```
      
      ```
      StatisticsSuite:
      - MetastoreRelations fallback to HDFS for size estimation (94 milliseconds)
      ```
      
      ```
      SQLQuerySuite:
       - permanent UDTF (407 milliseconds)
       - describe functions - user defined functions (441 milliseconds)
       - CTAS without serde with location (2 seconds, 831 milliseconds)
       - derived from Hive query file: drop_database_removes_partition_dirs.q (734 milliseconds)
       - derived from Hive query file: drop_table_removes_partition_dirs.q (563 milliseconds)
       - SPARK-17796 Support wildcard character in filename for LOAD DATA LOCAL INPATH (453 milliseconds)
      ```
      
      ```
      HiveDDLSuite:
       - drop external tables in default database (3 seconds, 5 milliseconds)
       - add/drop partitions - external table (2 seconds, 750 milliseconds)
       - create/drop database - location without pre-created directory (500 milliseconds)
       - create/drop database - location with pre-created directory (407 milliseconds)
       - drop database containing tables - CASCADE (453 milliseconds)
       - drop an empty database - CASCADE (375 milliseconds)
       - drop database containing tables - RESTRICT (328 milliseconds)
       - drop an empty database - RESTRICT (391 milliseconds)
       - CREATE TABLE LIKE an external data source table (953 milliseconds)
       - CREATE TABLE LIKE an external Hive serde table (3 seconds, 782 milliseconds)
       - desc table for data source table - no user-defined schema (1 second, 150 milliseconds)
      ```
      
      ```
      MetastoreDataSourcesSuite
       - CTAS: persisted bucketed data source table (875 milliseconds)
      ```
      
      ```
      ShowCreateTableSuite:
       - simple external hive table (78 milliseconds)
      ```
      
      ```
      PartitionedTablePerfStatsSuite:
       - hive table: partitioned pruned table reports only selected files (1 second, 109 milliseconds)
      - datasource table: partitioned pruned table reports only selected files (860 milliseconds)
       - hive table: lazy partition pruning reads only necessary partition data (859 milliseconds)
       - datasource table: lazy partition pruning reads only necessary partition data (1 second, 219 milliseconds)
       - hive table: lazy partition pruning with file status caching enabled (875 milliseconds)
       - datasource table: lazy partition pruning with file status caching enabled (890 milliseconds)
       - hive table: file status caching respects refresh table and refreshByPath (922 milliseconds)
       - datasource table: file status caching respects refresh table and refreshByPath (640 milliseconds)
       - hive table: file status cache respects size limit (469 milliseconds)
       - datasource table: file status cache respects size limit (453 milliseconds)
       - datasource table: table setup does not scan filesystem (328 milliseconds)
       - hive table: table setup does not scan filesystem (313 milliseconds)
       - hive table: num hive client calls does not scale with partition count (5 seconds, 431 milliseconds)
       - datasource table: num hive client calls does not scale with partition count (4 seconds, 79 milliseconds)
       - hive table: files read and cached when filesource partition management is off (656 milliseconds)
       - datasource table: all partition data cached in memory when partition management is off (484 milliseconds)
       - SPARK-18700: table loaded only once even when resolved concurrently (2 seconds, 578 milliseconds)
      ```
      
      ```
      HiveSparkSubmitSuite:
       - temporary Hive UDF: define a UDF and use it (1 second, 745 milliseconds)
       - permanent Hive UDF: define a UDF and use it (406 milliseconds)
       - permanent Hive UDF: use a already defined permanent function (375 milliseconds)
       - SPARK-8368: includes jars passed in through --jars (391 milliseconds)
       - SPARK-8020: set sql conf in spark conf (156 milliseconds)
       - SPARK-8489: MissingRequirementError during reflection (187 milliseconds)
       - SPARK-9757 Persist Parquet relation with decimal column (157 milliseconds)
       - SPARK-11009 fix wrong result of Window function in cluster mode (156 milliseconds)
       - SPARK-14244 fix window partition size attribute binding failure (156 milliseconds)
       - set spark.sql.warehouse.dir (172 milliseconds)
       - set hive.metastore.warehouse.dir (156 milliseconds)
       - SPARK-16901: set javax.jdo.option.ConnectionURL (157 milliseconds)
       - SPARK-18360: default table path of tables in default database should depend on the location of default database (172 milliseconds)
      ```
      
      ```
      UtilsSuite:
       - resolveURIs with multiple paths (0 milliseconds)
      ```
      
      ```
      CheckpointSuite:
       - recovery with file input stream (4 seconds, 452 milliseconds)
      ```
      
      Note: after resolving the aborted tests, there is a test failure identified as below:
      
      ```
      OrcSourceSuite:
      - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (4 seconds, 417 milliseconds)
        org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:625)
      ```
      
This failure does not appear to be caused by the same problems, so this PR does not fix it here.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16451 from HyukjinKwon/all-path-resource-fixes.
  2. Jan 04, 2017
    • [MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo · a1e40b1f
      Niranjan Padmanabhan authored
      ## What changes were proposed in this pull request?
      There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words.
      
      ## How was this patch tested?
      N/A since only docs or comments were updated.
      
      Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com>
      
      Closes #16455 from neurons/np.structure_streaming_doc.
  3. Dec 22, 2016
    • [SPARK-18537][WEB UI] Add a REST api to serve spark streaming information · ce99f51d
      saturday_s authored
      ## What changes were proposed in this pull request?
      
This PR carries over #16000 and completes #15904.
      
      **Description**
      
      - Augment the `org.apache.spark.status.api.v1` package for serving streaming information.
      - Retrieve the streaming information through StreamingJobProgressListener.
      
> this API should cover exactly the same information as you can get from the web interface
> the implementation is based on the current REST implementation of spark-core
      > and will be available for running applications only
      >
      > https://issues.apache.org/jira/browse/SPARK-18537
      
      ## How was this patch tested?
      
      Local test.
      
      Author: saturday_s <shi.indetail@gmail.com>
      Author: Chan Chor Pang <ChorPang.Chan@access-company.com>
      Author: peterCPChan <universknight@gmail.com>
      
      Closes #16253 from saturday-shi/SPARK-18537.
      ce99f51d
  4. Dec 21, 2016
  5. Dec 02, 2016
  6. Dec 01, 2016
    • Shixiong Zhu's avatar
      [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite.... · 086b0c8f
      Shixiong Zhu authored
      [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly
      
      ## What changes were proposed in this pull request?
      
      Avoid creating multiple threads to stop the StreamingContext. Otherwise, the latch added in #16091 can be released too early.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16105 from zsxwing/SPARK-18617-2.
      086b0c8f
  7. Nov 30, 2016
    • Shixiong Zhu's avatar
      [SPARK-18617][SPARK-18560][TEST] Fix flaky test: StreamingContextSuite.... · 0a811210
      Shixiong Zhu authored
      [SPARK-18617][SPARK-18560][TEST] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly
      
      ## What changes were proposed in this pull request?
      
      Fixed the potential SparkContext leak in `StreamingContextSuite.SPARK-18560 Receiver data should be deserialized properly` which was added in #16052. I also removed FakeByteArrayReceiver and used TestReceiver directly.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16091 from zsxwing/SPARK-18617-follow-up.
      0a811210
    • uncleGen's avatar
      [SPARK-18617][CORE][STREAMING] Close "kryo auto pick" feature for Spark Streaming · 56c82eda
      uncleGen authored
      ## What changes were proposed in this pull request?
      
      #15992 provided a solution to fix the bug, i.e. **receiver data can not be deserialized properly**. As zsxwing said, it is a critical bug, but we should not break APIs between maintenance releases. It may be a reasonable choice to disable the automatic Kryo serializer pick for Spark Streaming as a first step. I will continue #15992 to optimize the solution.
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #16052 from uncleGen/SPARK-18617.
      56c82eda
  8. Nov 29, 2016
  9. Nov 19, 2016
    • hyukjinkwon's avatar
      [SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note... · d5b1d5fc
      hyukjinkwon authored
      [SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that`/`'''Note:'''` across Scala/Java API documentation
      
      ## What changes were proposed in this pull request?
      
      It seems that in Scala/Java, notes appear in several inconsistent forms:
      
      - `Note:`
      - `NOTE:`
      - `Note that`
      - `'''Note:'''`
      - `note`
      
      This PR proposes to normalize all of these to `note` for consistency.
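      A minimal before/after sketch of the doc-comment change (the method itself is illustrative, not from Spark):

      ```scala
      // Before: a free-text "Note:" marker, which Scaladoc and genjavadoc
      // render inconsistently.
      /**
       * Computes a value.
       *
       * Note: this is slow for large inputs.
       */
      def before(n: Int): Int = n * 2

      // After: the `note` tag, rendered uniformly in the generated API docs.
      /**
       * Computes a value.
       *
       * @note this is slow for large inputs.
       */
      def after(n: Int): Int = n * 2

      assert(before(3) == after(3))
      ```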
      
      **Before**
      
      - Scala
        ![2016-11-17 6 16 39](https://cloud.githubusercontent.com/assets/6477701/20383180/1a7aed8c-acf2-11e6-9611-5eaf6d52c2e0.png)
      
      - Java
        ![2016-11-17 6 14 41](https://cloud.githubusercontent.com/assets/6477701/20383096/c8ffc680-acf1-11e6-914a-33460bf1401d.png)
      
      **After**
      
      - Scala
        ![2016-11-17 6 16 44](https://cloud.githubusercontent.com/assets/6477701/20383167/09940490-acf2-11e6-937a-0d5e1dc2cadf.png)
      
      - Java
        ![2016-11-17 6 13 39](https://cloud.githubusercontent.com/assets/6477701/20383132/e7c2a57e-acf1-11e6-9c47-b849674d4d88.png)
      
      ## How was this patch tested?
      
      The notes were found via
      
      ```bash
      grep -r "NOTE: " . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// NOTE: " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages that appear in API documentation
      -e 'org.apache.spark.api.java.function' \ # note that this is a regular expression. So actual matches were mostly `org/apache/spark/api/java/functions ...`
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "Note that " . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note that " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages that appear in API documentation
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "Note: " . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note: " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages that appear in API documentation
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "'''Note:'''" . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// '''Note:''' " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages that appear in API documentation
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      And then fixed one by one comparing with API documentation/access modifiers.
      
      After that, manually tested via `jekyll build`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #15889 from HyukjinKwon/SPARK-18437.
      d5b1d5fc
  10. Nov 15, 2016
  11. Nov 10, 2016
  12. Nov 08, 2016
    • jiangxingbo's avatar
      [SPARK-18191][CORE] Port RDD API to use commit protocol · 9c419698
      jiangxingbo authored
      ## What changes were proposed in this pull request?
      
      This PR ports the RDD API to use the commit protocol. The changes made here:
      1. Add a new internal helper class named `SparkNewHadoopWriter` that saves an RDD using a Hadoop OutputFormat. It is similar to `SparkHadoopWriter` but uses the commit protocol, and it supports the newer `mapreduce` API instead of the old `mapred` API supported by `SparkHadoopWriter`;
      2. Rewrite the `PairRDDFunctions.saveAsNewAPIHadoopDataset` function so that it now uses the commit protocol.
      
      ## How was this patch tested?
      Existing test cases.
      
      Author: jiangxingbo <jiangxb1987@gmail.com>
      
      Closes #15769 from jiangxb1987/rdd-commit.
      9c419698
  13. Nov 07, 2016
    • Hyukjin Kwon's avatar
      [SPARK-14914][CORE] Fix Resource not closed after using, mostly for unit tests · 8f0ea011
      Hyukjin Kwon authored
      ## What changes were proposed in this pull request?
      
      Close `FileStream`s, `ZipFile`s, etc. to release the resources after use. Not closing these resources causes an IOException to be raised while deleting temp files.
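      The pattern the fix applies, sketched with a plain temp file (the real changes touch many suites; this is just the shape):

      ```scala
      import java.io.{File, FileOutputStream}

      // Write to a temp file and close the stream in `finally`, so the file
      // handle is released even if the body throws. On Windows, an open handle
      // makes the later delete() fail with an IO error.
      val tmp = File.createTempFile("spark-test", ".bin")
      val out = new FileOutputStream(tmp)
      try {
        out.write(Array[Byte](1, 2, 3))
      } finally {
        out.close()  // without this, tmp.delete() can fail on Windows
      }
      assert(tmp.delete())
      ```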
      ## How was this patch tested?
      
      Existing tests
      
      Author: U-FAREAST\tl <tl@microsoft.com>
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Tao LI <tl@microsoft.com>
      
      Closes #15618 from HyukjinKwon/SPARK-14914-1.
      8f0ea011
  14. Nov 02, 2016
  15. Sep 22, 2016
    • Shixiong Zhu's avatar
      [SPARK-17638][STREAMING] Stop JVM StreamingContext when the Python process is dead · 3cdae0ff
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      When the Python process is dead, the JVM StreamingContext is still running. Hence we will see a lot of Py4jException before the JVM process exits. It's better to stop the JVM StreamingContext to avoid those annoying logs.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #15201 from zsxwing/stop-jvm-ssc.
      3cdae0ff
    • Dhruve Ashar's avatar
      [SPARK-17365][CORE] Remove/Kill multiple executors together to reduce RPC call time. · 17b72d31
      Dhruve Ashar authored
      ## What changes were proposed in this pull request?
      We kill multiple executors in a single batched request instead of iterating over expensive RPC calls that kill a single executor each.
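      The idea can be sketched as follows; `killExecutorsRpc` is a stand-in counter, not Spark's actual client API:

      ```scala
      // Count RPC round trips; one batched call replaces N per-executor calls.
      var rpcCalls = 0
      def killExecutorsRpc(executorIds: Seq[String]): Unit = rpcCalls += 1

      val toKill = Seq("1", "2", "3", "4", "5")

      // Before: toKill.foreach(id => killExecutorsRpc(Seq(id)))  -- 5 round trips
      // After: a single batched request.
      killExecutorsRpc(toKill)

      assert(rpcCalls == 1)
      ```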
      
      ## How was this patch tested?
      Executed sample spark job to observe executors being killed/removed with dynamic allocation enabled.
      
      Author: Dhruve Ashar <dashar@yahoo-inc.com>
      Author: Dhruve Ashar <dhruveashar@gmail.com>
      
      Closes #15152 from dhruve/impr/SPARK-17365.
      17b72d31
  16. Sep 21, 2016
    • Marcelo Vanzin's avatar
      [SPARK-4563][CORE] Allow driver to advertise a different network address. · 2cd1bfa4
      Marcelo Vanzin authored
      The goal of this feature is to allow the Spark driver to run in an
      isolated environment, such as a docker container, and be able to use
      the host's port forwarding mechanism to be able to accept connections
      from the outside world.
      
      The change is restricted to the driver: there is no support for achieving
      the same thing on executors (or the YARN AM for that matter). Those still
      need full access to the outside world so that, for example, connections
      can be made to an executor's block manager.
      
      The core of the change is simple: add a new configuration that specifies
      the address the driver should bind to, which can be different from the address
      it advertises to executors (spark.driver.host). Everything else is plumbing
      the new configuration where it's needed.
      
      To use the feature, the host starting the container needs to set up the
      driver's port range to fall into a range that is being forwarded; this
      required a special block manager port configuration just for the
      driver, which falls back to the existing spark.blockManager.port when
      not set. This way, users can modify the driver settings without affecting
      the executors; it would theoretically be nice to also have different
      retry counts for driver and executors, but given that docker (at least)
      allows forwarding port ranges, we can probably live without that for now.
      
      Because of the nature of the feature it's kinda hard to add unit tests;
      I just added a simple one to make sure the configuration works.
      
      This was tested with a docker image running spark-shell with the following
      command:
      
       docker blah blah blah \
         -p 38000-38100:38000-38100 \
         [image] \
         spark-shell \
           --num-executors 3 \
           --conf spark.shuffle.service.enabled=false \
           --conf spark.dynamicAllocation.enabled=false \
           --conf spark.driver.host=[host's address] \
           --conf spark.driver.port=38000 \
           --conf spark.driver.blockManager.port=38020 \
           --conf spark.ui.port=38040
      
      Running on YARN; verified the driver works, executors start up and listen
      on ephemeral ports (instead of using the driver's config), and that caching
      and shuffling (without the shuffle service) works. Clicked through the UI
      to make sure all pages (including executor thread dumps) worked. Also tested
      apps without docker, and ran unit tests.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #15120 from vanzin/SPARK-4563.
      2cd1bfa4
  17. Sep 07, 2016
    • Liwei Lin's avatar
      [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of... · 3ce3a282
      Liwei Lin authored
      [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of ArrayBuffer.append(A) in performance critical paths
      
      ## What changes were proposed in this pull request?
      
      We should generally use `ArrayBuffer.+=(A)` rather than `ArrayBuffer.append(A)`, because `append(A)` would involve extra boxing / unboxing.
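      A sketch of the difference; in the Scala 2.11/2.12 collections Spark used at the time, `append` is declared with a varargs parameter (`append(elems: A*)`), so every call wraps its arguments in a `Seq`:

      ```scala
      import scala.collection.mutable.ArrayBuffer

      val buf = ArrayBuffer[Int]()

      buf += 1        // takes a single element directly, no wrapper allocation
      buf.append(2)   // varargs in Scala 2.11/2.12: allocates a Seq per call

      assert(buf == ArrayBuffer(1, 2))
      ```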
      
      ## How was this patch tested?
      
      N/A
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #14914 from lw-lin/append_to_plus_eq_v2.
      3ce3a282
  18. Sep 06, 2016
    • Josh Rosen's avatar
      [SPARK-17110] Fix StreamCorruptionException in BlockManager.getRemoteValues() · 29cfab3f
      Josh Rosen authored
      ## What changes were proposed in this pull request?
      
      This patch fixes a `java.io.StreamCorruptedException` error affecting remote reads of cached values when certain data types are used. The problem stems from #11801 / SPARK-13990, a patch to have Spark automatically pick the "best" serializer when caching RDDs. If PySpark cached a PythonRDD, then this would be cached as an `RDD[Array[Byte]]` and the automatic serializer selection would pick KryoSerializer for replication and block transfer. However, the `getRemoteValues()` / `getRemoteBytes()` code path did not pass proper class tags in order to enable the same serializer to be used during deserialization, causing Java to be inappropriately used instead of Kryo, leading to the StreamCorruptedException.
      
      We already fixed a similar bug in #14311, which dealt with similar issues in block replication. Prior to that patch, it seems that we had no tests to ensure that block replication actually succeeded. Similarly, prior to this bug fix patch it looks like we had no tests to perform remote reads of cached data, which is why this bug was able to remain latent for so long.
      
      This patch addresses the bug by modifying `BlockManager`'s `get()` and  `getRemoteValues()` methods to accept ClassTags, allowing the proper class tag to be threaded in the `getOrElseUpdate` code path (which is used by `rdd.iterator`)
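      The shape of the fix can be sketched like this; the method names mirror the description above, not `BlockManager`'s real signatures:

      ```scala
      import scala.reflect.ClassTag

      // The serializer choice depends on the element type, so the read path
      // must see the same ClassTag the write path saw.
      def pickSerializer[T](implicit ct: ClassTag[T]): String =
        if (ct.runtimeClass == classOf[Array[Byte]]) "kryo" else "java"

      // Threading the tag through the remote-read path picks the same
      // serializer that was used when the block was written.
      def getRemoteValues[T: ClassTag](blockId: String): String =
        pickSerializer[T]

      assert(getRemoteValues[Array[Byte]]("rdd_0_0") == "kryo")
      assert(getRemoteValues[String]("rdd_0_1") == "java")
      ```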
      
      ## How was this patch tested?
      
      Extended the caching tests in `DistributedSuite` to exercise the `getRemoteValues` path, plus manual testing to verify that the PySpark bug reproduction in SPARK-17110 is fixed.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #14952 from JoshRosen/SPARK-17110.
      29cfab3f
  19. Sep 04, 2016
    • Shivansh's avatar
      [SPARK-17308] Improved the spark core code by replacing all pattern match on... · e75c162e
      Shivansh authored
      [SPARK-17308] Improved the Spark core code by replacing all pattern matches on boolean values with if/else blocks.
      
      ## What changes were proposed in this pull request?
      Improved the code quality of Spark by replacing all pattern matches on boolean values with if/else blocks.
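      The style change, in miniature:

      ```scala
      def describe(flag: Boolean): String = {
        // Before: a pattern match on a Boolean
        //   flag match {
        //     case true  => "enabled"
        //     case false => "disabled"
        //   }
        // After: a plain if/else reads more directly
        if (flag) "enabled" else "disabled"
      }

      assert(describe(true) == "enabled")
      assert(describe(false) == "disabled")
      ```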
      
      ## How was this patch tested?
      
      By running the tests
      
      Author: Shivansh <shiv4nsh@gmail.com>
      
      Closes #14873 from shiv4nsh/SPARK-17308.
      e75c162e
  20. Aug 17, 2016
    • Xin Ren's avatar
      [SPARK-17038][STREAMING] fix metrics retrieval source of 'lastReceivedBatch' · e6bef7d5
      Xin Ren authored
      https://issues.apache.org/jira/browse/SPARK-17038
      
      ## What changes were proposed in this pull request?
      
      StreamingSource's lastReceivedBatch_submissionTime, lastReceivedBatch_processingTimeStart, and lastReceivedBatch_processingTimeEnd all use data from lastCompletedBatch instead of lastReceivedBatch.
      
      In particular, this makes it impossible to match lastReceivedBatch_records with a batchID/submission time.
      
      This is apparent when looking at StreamingSource.scala, lines 89-94.
      
      ## How was this patch tested?
      
      Manually running unit tests on local laptop
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #14681 from keypointt/SPARK-17038.
      e6bef7d5
    • Steve Loughran's avatar
      [SPARK-16736][CORE][SQL] purge superfluous fs calls · cc97ea18
      Steve Loughran authored
      A review of the code, working back from Hadoop's `FileSystem.exists()` and `FileSystem.isDirectory()` code, then removing uses of the calls when superfluous.
      
      1. delete is harmless if called on a nonexistent path, so don't do any checks before deletes
      1. any `FileSystem.exists()` check before `getFileStatus()` or `open()` is superfluous, as the operation itself does the check. Instead, the `FileNotFoundException` is caught and triggers the downgraded path. Where a `FileNotFoundException` was thrown before, the code still creates a new FNFE with the error messages, though now the inner exceptions are nested for easier diagnostics.
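      The shape of point 2, sketched with `java.io` as a stand-in for Hadoop's `FileSystem` (on HDFS the saved `exists()` call is a whole NameNode round trip):

      ```scala
      import java.io.{FileInputStream, FileNotFoundException}

      // Instead of exists() + open() (two filesystem calls), just open() and
      // treat FileNotFoundException as the "missing" case.
      def openOrNone(path: String): Option[FileInputStream] =
        try {
          Some(new FileInputStream(path))
        } catch {
          case _: FileNotFoundException => None  // downgraded path, one call
        }

      assert(openOrNone("/no/such/file/anywhere").isEmpty)
      ```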
      
      Initially, relying on Jenkins test runs.
      
      One trouble spot here is that some of the code paths are clearly error situations; it's not clear that they have test coverage anyway. Creating the failure conditions in tests would be ideal, but it will also be hard.
      
      Author: Steve Loughran <stevel@apache.org>
      
      Closes #14371 from steveloughran/cloud/SPARK-16736-superfluous-fs-calls.
      cc97ea18
  21. Aug 08, 2016
    • Holden Karau's avatar
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add... · 9216901d
      Holden Karau authored
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting
      
      ## What changes were proposed in this pull request?
      
      Avoid using a postfix operator for command execution in SQLQuerySuite, where it wasn't whitelisted, and audit existing whitelistings, removing postfix operators from most places. Some notable places where postfix operators remain are in XML parsing & time units (seconds, millis, etc.), where they arguably can improve readability.
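      The two forms, side by side; only the postfix one needs the `postfixOps` language feature enabled:

      ```scala
      import scala.concurrent.duration._
      import scala.language.postfixOps

      val a = 10 seconds   // postfix: needs the feature import, fragile at line ends
      val b = 10.seconds   // dotted form: no language flag required

      assert(a == b)
      ```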
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14407 from holdenk/SPARK-16779.
      9216901d
  22. Aug 01, 2016
  23. Jul 30, 2016
    • Sean Owen's avatar
      [SPARK-16694][CORE] Use for/foreach rather than map for Unit expressions whose... · 0dc4310b
      Sean Owen authored
      [SPARK-16694][CORE] Use for/foreach rather than map for Unit expressions whose side effects are required
      
      ## What changes were proposed in this pull request?
      
      Use foreach/for instead of map where the operation requires execution of the body for its side effects, rather than actually defining a transformation.
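      In miniature:

      ```scala
      val sb = new StringBuilder

      // Before: map used purely for side effects builds a useless Seq[Unit].
      //   Seq("a", "b").map(s => sb.append(s))
      // After: foreach states the intent and allocates no result collection.
      Seq("a", "b").foreach(s => sb.append(s))

      assert(sb.toString == "ab")
      ```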
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14332 from srowen/SPARK-16694.
      0dc4310b
  24. Jul 26, 2016
    • Dhruve Ashar's avatar
      [SPARK-15703][SCHEDULER][CORE][WEBUI] Make ListenerBus event queue size configurable · 0b71d9ae
      Dhruve Ashar authored
      ## What changes were proposed in this pull request?
      This change adds a new configuration entry to specify the size of the Spark listener bus event queue. The value for this config ("spark.scheduler.listenerbus.eventqueue.size") defaults to 10000.
      
      Note:
      I haven't documented the configuration entry yet. We can decide whether it would be appropriate to make it a public configuration or keep it undocumented. Refer to the JIRA for more details.
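      The idea, sketched with a plain bounded queue (the config plumbing here is simplified; only the config key name comes from the PR):

      ```scala
      import java.util.concurrent.LinkedBlockingQueue

      // Capacity comes from configuration, defaulting to 10000 as in the PR.
      val capacity =
        sys.props.getOrElse("spark.scheduler.listenerbus.eventqueue.size", "10000").toInt
      val eventQueue = new LinkedBlockingQueue[String](capacity)

      assert(eventQueue.offer("SparkListenerJobStart"))  // accepted: under capacity
      assert(eventQueue.remainingCapacity == capacity - 1)
      ```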
      
      ## How was this patch tested?
      Ran existing jobs and verified the event queue size with debug logs and from the Spark WebUI Environment tab.
      
      Author: Dhruve Ashar <dhruveashar@gmail.com>
      
      Closes #14269 from dhruve/bug/SPARK-15703.
      0b71d9ae
  25. Jul 25, 2016
    • Shixiong Zhu's avatar
      [SPARK-16722][TESTS] Fix a StreamingContext leak in StreamingContextSuite when eventually fails · e164a04b
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR moves `ssc.stop()` into a `finally` block in `StreamingContextSuite.createValidCheckpoint` to avoid leaking a StreamingContext, since a leaked StreamingContext fails a lot of tests and makes it hard to find the real failing one.
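      The shape of the fix; `FakeContext` here stands in for StreamingContext:

      ```scala
      final class FakeContext {
        var stopped = false
        def stop(): Unit = stopped = true
      }

      val ssc = new FakeContext
      try {
        // ... create the checkpoint and wait with eventually(...), which may throw
      } finally {
        ssc.stop()  // always runs, so a failing wait cannot leak the context
      }

      assert(ssc.stopped)
      ```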
      
      ## How was this patch tested?
      
      Jenkins unit tests
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #14354 from zsxwing/ssc-leak.
      e164a04b
  26. Jul 24, 2016
    • Mikael Ståldal's avatar
      [SPARK-16416][CORE] force eager creation of loggers to avoid shutdown hook conflicts · 23e047f4
      Mikael Ståldal authored
      ## What changes were proposed in this pull request?
      
      Force eager creation of loggers to avoid shutdown hook conflicts.
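      Illustrative only (the names are made up): a lazily created logger can be initialized for the first time inside a shutdown hook, racing the logging framework's own shutdown; touching it eagerly at construction time avoids the race:

      ```scala
      var initializedAt = "never"

      final class Component {
        // lazy: created on first use, which could otherwise happen as late
        // as JVM shutdown
        lazy val log: String = { initializedAt = "construction"; "logger" }
        log  // force eager creation now, before any shutdown hook can run
      }

      new Component
      assert(initializedAt == "construction")
      ```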
      
      ## How was this patch tested?
      
      Manually tested with a project using Log4j 2, verified that the shutdown hook conflict issue was solved.
      
      Author: Mikael Ståldal <mikael.staldal@magine.com>
      
      Closes #14320 from mikaelstaldal/shutdown-hook-logging.
      23e047f4
  27. Jul 22, 2016
    • Ahmed Mahran's avatar
      [SPARK-16487][STREAMING] Fix some batches might not get marked as fully processed in JobGenerator · 2c72a443
      Ahmed Mahran authored
      ## What changes were proposed in this pull request?
      
      In `JobGenerator`, the code reads as though some batches might not get marked as fully processed. In the following flowchart, the batch should get marked as fully processed before endpoint C; however, it is not. Currently, this does not actually cause an issue, because the condition `(time - zeroTime) is multiple of checkpoint duration?` always evaluates to `true`, as the `checkpoint duration` is always set equal to the `batch duration`.
      
      ![Flowchart](https://s31.postimg.org/udy9lti2j/spark_streaming_job_generator.png)
      
      This PR fixes this issue to improve code readability and to avoid any potential issue in case a future change sets the checkpoint duration different from the batch duration.
      
      Author: Ahmed Mahran <ahmed.mahran@mashin.io>
      
      Closes #14145 from ahmed-mahran/b-mark-batch-fully-processed.
      2c72a443
  28. Jul 19, 2016
  29. Jul 11, 2016
    • Reynold Xin's avatar
      [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
      ffcb6e05
  30. Jun 24, 2016