  1. Dec 08, 2015
  2. Dec 07, 2015
    • Joseph K. Bradley's avatar
      [SPARK-12160][MLLIB] Use SQLContext.getOrCreate in MLlib · 3e7e05f5
      Joseph K. Bradley authored
      Switched from using SQLContext constructor to using getOrCreate, mainly in model save/load methods.
      
      This covers all instances in spark.mllib.  There were no uses of the constructor in spark.ml.
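The getOrCreate idiom this commit switches to can be sketched in a few lines. The class below is a stdlib-only stand-in for Spark's real SQLContext, used only to illustrate the pattern, not its actual implementation:

```python
class SQLContext:
    """Stand-in for Spark's SQLContext, used only to illustrate getOrCreate."""
    _active = None

    def __init__(self, spark_context):
        self.spark_context = spark_context

    @classmethod
    def get_or_create(cls, spark_context):
        # Reuse the active context instead of constructing a fresh one,
        # which is what model save/load paths should do.
        if cls._active is None:
            cls._active = cls(spark_context)
        return cls._active

first = SQLContext.get_or_create("sc")
second = SQLContext.get_or_create("sc")
assert first is second  # the same instance is reused
```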
      
      CC: mengxr yhuai
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #10161 from jkbradley/mllib-sqlcontext-fix.
      3e7e05f5
    • Andrew Ray's avatar
[SPARK-12184][PYTHON] Make python api doc for pivot consistent with scala doc · 36282f78
      Andrew Ray authored
In SPARK-11946 the API for pivot was changed a bit and its doc was updated, but the doc changes were not made for the Python API. This PR updates the Python doc to be consistent.
      
      Author: Andrew Ray <ray.andrew@gmail.com>
      
      Closes #10176 from aray/sql-pivot-python-doc.
      36282f78
    • tedyu's avatar
      [SPARK-11884] Drop multiple columns in the DataFrame API · 84b80944
      tedyu authored
      See the thread Ben started:
      http://search-hadoop.com/m/q3RTtveEuhjsr7g/
      
This PR adds a drop() method to DataFrame which accepts multiple column names.
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #9862 from ted-yu/master.
      84b80944
    • Xusen Yin's avatar
      [SPARK-11963][DOC] Add docs for QuantileDiscretizer · 871e85d9
      Xusen Yin authored
      https://issues.apache.org/jira/browse/SPARK-11963
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #9962 from yinxusen/SPARK-11963.
      871e85d9
    • Shixiong Zhu's avatar
      [SPARK-12060][CORE] Avoid memory copy in JavaSerializerInstance.serialize · 3f4efb5c
      Shixiong Zhu authored
      Merged #10051 again since #10083 is resolved.
      
      This reverts commit 328b757d.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10167 from zsxwing/merge-SPARK-12060.
      3f4efb5c
    • Tathagata Das's avatar
      [SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present · 5d80d8c6
      Tathagata Das authored
The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovering from DStream checkpoints, the RDDs restored from RDD checkpoints do not have a partitioner attached to them. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004).
      
      While #9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there may be a non-zero chance that the saving and recovery fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected.
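The fallback can be sketched stdlib-only; the function name and data shapes below are illustrative, not Spark's API:

```python
def ensure_partitioned(records, partitioner, num_partitions):
    """If no partitioner survived checkpoint recovery, fall back to a hash
    partitioner so records with the same key land in the same partition."""
    if partitioner is None:
        partitioner = lambda key: hash(key) % num_partitions
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partitioner(key)].append((key, value))
    return partitions

parts = ensure_partitioned([("a", 1), ("b", 2), ("a", 3)], None, 4)
assert sum(len(p) for p in parts) == 3  # no records lost
```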
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #9988 from tdas/SPARK-11932.
      5d80d8c6
    • Davies Liu's avatar
      [SPARK-12132] [PYSPARK] raise KeyboardInterrupt inside SIGINT handler · ef3f047c
      Davies Liu authored
Currently, the current line is not cleared by Ctrl-C.
      
      After this patch
      ```
      >>> asdfasdf^C
      Traceback (most recent call last):
        File "~/spark/python/pyspark/context.py", line 225, in signal_handler
          raise KeyboardInterrupt()
      KeyboardInterrupt
      ```
      
      It's still worse than 1.5 (and before).
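The mechanism is plain Python signal handling; a self-contained sketch of the idea (not the pyspark code itself):

```python
import os
import signal

def signal_handler(signum, frame):
    # Raising KeyboardInterrupt from the SIGINT handler lets the REPL
    # unwind the current line instead of leaving it half-typed.
    raise KeyboardInterrupt()

signal.signal(signal.SIGINT, signal_handler)

try:
    os.kill(os.getpid(), signal.SIGINT)  # simulate pressing Ctrl-C
except KeyboardInterrupt:
    print("interrupted")
```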
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10134 from davies/fix_cltrc.
      ef3f047c
    • Sun Rui's avatar
      [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. · 39d677c8
      Sun Rui authored
      This PR:
      1. Suppress all known warnings.
      2. Cleanup test cases and fix some errors in test cases.
      3. Fix errors in HiveContext related test cases. These test cases are actually not run previously due to a bug of creating TestHiveContext.
      4. Support 'testthat' package version 0.11.0 which prefers that test cases be under 'tests/testthat'
      5. Make sure the default Hadoop file system is local when running test cases.
6. Turn warnings into errors.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10030 from sun-rui/SPARK-12034.
      39d677c8
    • Davies Liu's avatar
      [SPARK-12032] [SQL] Re-order inner joins to do join with conditions first · 9cde7d5f
      Davies Liu authored
Currently, the order of joins is exactly the same as in the SQL query, so some conditions may not be pushed down to the correct join; those joins then become cross products and are extremely slow.

This patch tries to re-order the inner joins (which are common in SQL queries), picking the joins that have self-contained conditions first and delaying those that do not.

After this patch, the TPCDS queries Q64/65 can run hundreds of times faster.
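A toy sketch of the idea (not Catalyst's actual rule): tables that appear in some join condition are joined first, and condition-free tables, which would produce cross products, are pushed to the end.

```python
def reorder_inner_joins(tables, conditions):
    """Toy version of the reordering: `conditions` is a list of
    (left_table, right_table) pairs linked by a join predicate."""
    constrained = {t for pair in conditions for t in pair}
    with_cond = [t for t in tables if t in constrained]
    without_cond = [t for t in tables if t not in constrained]
    return with_cond + without_cond

order = reorder_inner_joins(["a", "b", "c"], [("a", "c")])
assert order == ["a", "c", "b"]  # "b" has no condition, so it is delayed
```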
      
      cc marmbrus nongli
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10073 from davies/reorder_joins.
      9cde7d5f
    • Burak Yavuz's avatar
      [SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when Jenkins load is high · 6fd9e70e
      Burak Yavuz authored
      We need to make sure that the last entry is indeed the last entry in the queue.
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #10110 from brkyvz/batch-wal-test-fix.
      6fd9e70e
  3. Dec 06, 2015
    • Josh Rosen's avatar
      [SPARK-12152][PROJECT-INFRA] Speed up Scalastyle checks by only invoking SBT once · 80a824d3
      Josh Rosen authored
      Currently, `dev/scalastyle` invokes SBT four times, but these invocations can be replaced with a single invocation, saving about one minute of build time.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10151 from JoshRosen/speed-up-scalastyle.
      80a824d3
    • gatorsmile's avatar
      [SPARK-12138][SQL] Escape \u in the generated comments of codegen · 49efd03b
      gatorsmile authored
When \u appears in a comment block (i.e. in /* */), codegen will break, because javac parses \u as a unicode escape even inside comments. So, in Expression and CodegenFallback, we escape \u to \\u.
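The escaping step amounts to a single string replacement; a minimal sketch (the helper name is illustrative, the real fix lives in Expression/CodegenFallback):

```python
def escape_unicode_marker(comment: str) -> str:
    """Escape the \\u marker so javac does not treat it as the start of a
    unicode escape inside a generated /* ... */ comment."""
    return comment.replace("\\u", "\\\\u")

assert escape_unicode_marker(r"comment: \u0041") == r"comment: \\u0041"
assert escape_unicode_marker("no marker") == "no marker"
```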
      
      yhuai Please review it. I did reproduce it and it works after the fix. Thanks!
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #10155 from gatorsmile/escapeU.
      49efd03b
    • gcc's avatar
      [SPARK-12048][SQL] Prevent to close JDBC resources twice · 04b67999
      gcc authored
      Author: gcc <spark-src@condor.rhaag.ip>
      
      Closes #10101 from rh99/master.
      04b67999
    • Yanbo Liang's avatar
      [SPARK-12044][SPARKR] Fix usage of isnan, isNaN · b6e8e63a
      Yanbo Liang authored
1. Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have three related functions: ```isNaN, isNull, isNotNull```.
2. Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` on the SparkR side, because ```DataFrame.isNaN``` has been deprecated and will be removed in Spark 2.0.
<del>3. Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should have two related functions: ```isnan, isnull```.</del>
      
      cc shivaram sun-rui felixcheung
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10037 from yanboliang/spark-12044.
      b6e8e63a
  4. Dec 05, 2015
  5. Dec 04, 2015
    • Shixiong Zhu's avatar
      [SPARK-12084][CORE] Fix codes that uses ByteBuffer.array incorrectly · 3af53e61
      Shixiong Zhu authored
`ByteBuffer` doesn't guarantee that all contents in `ByteBuffer.array` are valid. E.g., a `ByteBuffer` returned by `ByteBuffer.slice`. We should not use the whole content of a `ByteBuffer` unless we know it is valid.
      
      This patch fixed all places that use `ByteBuffer.array` incorrectly.
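Python's memoryview gives a close analogy: a slice exposes only the valid region, while the backing object still holds everything, just as `ByteBuffer.array` returns the whole backing array of a sliced buffer.

```python
# Analogy in Python: a memoryview slice is like ByteBuffer.slice().
backing = bytearray(b"HEADERpayload")
view = memoryview(backing)[6:]  # only "payload" is valid through this view

assert bytes(view) == b"payload"            # correct: respect the view's bounds
assert bytes(view.obj) == b"HEADERpayload"  # like ByteBuffer.array: the whole
                                            # backing store, including bytes
                                            # outside the valid region
```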
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10083 from zsxwing/bytebuffer-array.
      3af53e61
    • rotems's avatar
      [SPARK-12080][CORE] Kryo - Support multiple user registrators · f30373f5
      rotems authored
      Author: rotems <roter>
      
      Closes #10078 from Botnaim/KryoMultipleCustomRegistrators.
      f30373f5
    • meiyoula's avatar
      [SPARK-12142][CORE]Reply false when container allocator is not ready and reset target · bbfc16ec
      meiyoula authored
When using the Dynamic Allocation feature, as a new AM is starting, ExecutorAllocationManager sends a RequestExecutor message to the AM. If the container allocator is not ready, the whole app will hang.
      
      Author: meiyoula <1039320815@qq.com>
      
      Closes #10138 from XuTingjun/patch-1.
      bbfc16ec
    • Josh Rosen's avatar
      [SPARK-12112][BUILD] Upgrade to SBT 0.13.9 · b7204e1d
      Josh Rosen authored
      We should upgrade to SBT 0.13.9, since this is a requirement in order to use SBT's new Maven-style resolution features (which will be done in a separate patch, because it's blocked by some binary compatibility issues in the POM reader plugin).
      
      I also upgraded Scalastyle to version 0.8.0, which was necessary in order to fix a Scala 2.10.5 compatibility issue (see https://github.com/scalastyle/scalastyle/issues/156). The newer Scalastyle is slightly stricter about whitespace surrounding tokens, so I fixed the new style violations.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10112 from JoshRosen/upgrade-to-sbt-0.13.9.
      b7204e1d
    • Marcelo Vanzin's avatar
      [SPARK-11314][BUILD][HOTFIX] Add exclusion for moved YARN classes. · d64806b3
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #10147 from vanzin/SPARK-11314.
      d64806b3
    • Burak Yavuz's avatar
      [SPARK-12058][STREAMING][KINESIS][TESTS] fix Kinesis python tests · 302d68de
      Burak Yavuz authored
Python tests require access to the `KinesisTestUtils` file. When this file exists under src/test, Python can't access it, since it is not available in the assembly jar.
      
      However, if we move KinesisTestUtils to src/main, we need to add the KinesisProducerLibrary as a dependency. In order to avoid this, I moved KinesisTestUtils to src/main, and extended it with ExtendedKinesisTestUtils which is under src/test that adds support for the KPL.
      
      cc zsxwing tdas
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #10050 from brkyvz/kinesis-py.
      302d68de
    • Dmitry Erastov's avatar
      [SPARK-6990][BUILD] Add Java linting script; fix minor warnings · d0d82227
      Dmitry Erastov authored
      This replaces https://github.com/apache/spark/pull/9696
      
      Invoke Checkstyle and print any errors to the console, failing the step.
      Use Google's style rules modified according to
      https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
      Some important checks are disabled (see TODOs in `checkstyle.xml`) due to
      multiple violations being present in the codebase.
      
I suggest fixing those TODOs in separate PRs.
      
      More on Checkstyle can be found on the [official website](http://checkstyle.sourceforge.net/).
      
      Sample output (from [build 46345](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46345/consoleFull)) (duplicated because I run the build twice with different profiles):
      
      > Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
      > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
      > [error] running /home/jenkins/workspace/SparkPullRequestBuilder2/dev/lint-java ; received return code 1
      
      Also fix some of the minor violations that didn't require sweeping changes.
      
      Apologies for the previous botched PRs - I finally figured out the issue.
      
      cr: JoshRosen, pwendell
      
      > I state that the contribution is my original work, and I license the work to the project under the project's open source license.
      
      Author: Dmitry Erastov <derastov@gmail.com>
      
      Closes #9867 from dskrvk/master.
      d0d82227
    • Nong's avatar
      [SPARK-12089] [SQL] Fix memory corrupt due to freeing a page being referenced · 95296d9b
      Nong authored
      When the spillable sort iterator was spilled, it was mistakenly keeping
      the last page in memory rather than the current page. This causes the
      current record to get corrupted.
      
      Author: Nong <nong@cloudera.com>
      
      Closes #10142 from nongli/spark-12089.
      95296d9b
    • kaklakariada's avatar
      Add links howto to setup IDEs for developing spark · 17e4e021
      kaklakariada authored
      These links make it easier for new developers to work with Spark in their IDE.
      
      Author: kaklakariada <kaklakariada@users.noreply.github.com>
      
      Closes #10104 from kaklakariada/readme-developing-ide-gettting-started.
      17e4e021
    • Tathagata Das's avatar
      [SPARK-12122][STREAMING] Prevent batches from being submitted twice after... · 4106d80f
      Tathagata Das authored
      [SPARK-12122][STREAMING] Prevent batches from being submitted twice after recovering StreamingContext from checkpoint
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #10127 from tdas/SPARK-12122.
      4106d80f
  6. Dec 03, 2015
    • Sun Rui's avatar
      [SPARK-12104][SPARKR] collect() does not handle multiple columns with same name. · 5011f264
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10118 from sun-rui/SPARK-12104.
      5011f264
    • Carson Wang's avatar
      [SPARK-11206] Support SQL UI on the history server (resubmit) · b6e9963e
      Carson Wang authored
      Resubmit #9297 and #9991
      On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution.
      
      To support SQL UI on the history server:
      1. I added an onOtherEvent method to the SparkListener trait and post all SQL related events to the same event bus.
      2. Two SQL events SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd are defined in the sql module.
      3. The new SQL events are written to event log using Jackson.
      4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader.
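The listener extension in step 1 can be sketched with Python stand-ins for the Scala trait (names mirror the description above, not the exact API):

```python
class SparkListener:
    """Stand-in for the extended listener trait: events the core does not
    recognize are routed through a catch-all hook instead of being dropped."""
    def on_other_event(self, event):
        pass

class SQLHistoryListener(SparkListener):
    """Collects the SQL execution events replayed from the event log."""
    def __init__(self):
        self.executions = []

    def on_other_event(self, event):
        if event.get("type", "").startswith("SparkListenerSQLExecution"):
            self.executions.append(event)

listener = SQLHistoryListener()
listener.on_other_event({"type": "SparkListenerSQLExecutionStart", "id": 0})
listener.on_other_event({"type": "SparkListenerStageCompleted"})
assert len(listener.executions) == 1  # only SQL events are retained
```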
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #10061 from carsonwang/SqlHistoryUI.
      b6e9963e
    • Anderson de Andrade's avatar
      [SPARK-12056][CORE] Create a TaskAttemptContext only after calling setConf. · f434f36d
      Anderson de Andrade authored
TaskAttemptContext's constructor clones the configuration instead of referencing it. Calling setConf after creating a TaskAttemptContext therefore makes any changes made inside setConf invisible to RecordReader instances.
      
As an example, Titan's InputFormat changes conf when setConf is called. It wraps Cassandra's ColumnFamilyInputFormat and appends Cassandra's configuration. This change fixes the following error when using Titan's CassandraInputFormat with Spark:
      
      *java.lang.RuntimeException: org.apache.thrift.protocol.TProtocolException: Required field 'keyspace' was not present! Struct: set_key space_args(keyspace:null)*
      
      There's a discussion of this error here: https://groups.google.com/forum/#!topic/aureliusgraphs/4zpwyrYbGAE
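The ordering bug can be reproduced with a stdlib-only model of the cloning constructor (a dict stands in for a Hadoop Configuration; the config key is made up for illustration):

```python
class TaskAttemptContext:
    """Model of the cloning constructor: like Hadoop's TaskAttemptContextImpl,
    it copies the configuration rather than keeping a reference."""
    def __init__(self, conf):
        self.conf = dict(conf)  # clone, not a reference

conf = {}

# Buggy order: the context is built before setConf mutates the conf.
ctx = TaskAttemptContext(conf)
conf["cassandra.input.keyspace"] = "titan"  # what a setConf call might add
assert "cassandra.input.keyspace" not in ctx.conf  # the change is lost

# Fixed order: call setConf first, then build the context.
ctx = TaskAttemptContext(conf)
assert ctx.conf["cassandra.input.keyspace"] == "titan"
```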
      
      Author: Anderson de Andrade <adeandrade@verticalscope.com>
      
      Closes #10046 from adeandrade/newhadooprdd-fix.
      f434f36d
    • felixcheung's avatar
      [SPARK-12019][SPARKR] Support character vector for sparkR.init(), check param and fix doc · 2213441e
      felixcheung authored
      and add tests.
Spark submit expects a comma-separated list.
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10034 from felixcheung/sparkrinitdoc.
      2213441e