  1. Jul 18, 2014
    • SPARK-2553. Fix compile error · 30b8d369
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1479 from sryza/sandy-spark-2553 and squashes the following commits:
      
      2cb5ed8 [Sandy Ryza] SPARK-2553. Fix compile error
      30b8d369
    • SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency per key · e52b8719
      Sandy Ryza authored
      
My humble opinion is that avoiding allocations in this performance-critical section is worth the extra code (see the sketch after this entry).
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1461 from sryza/sandy-spark-2553 and squashes the following commits:
      
      7eaf7f2 [Sandy Ryza] SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency per key
      e52b8719
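
A hedged, Spark-free sketch of the allocation-avoidance idea described above; `CoGroupSketch`, `newCombiner`, and `mergeValue` are illustrative names, not the actual CoGroupedRDD code:

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch only: instead of wrapping each value in a short-lived
// (dependencyIndex, value) Tuple2 for every key, values are appended directly
// into the slot of a pre-sized per-key array indexed by the dependency.
object CoGroupSketch {
  def newCombiner(numDeps: Int): Array[ArrayBuffer[Any]] =
    Array.fill(numDeps)(new ArrayBuffer[Any])

  def mergeValue(combiner: Array[ArrayBuffer[Any]], depIndex: Int, value: Any): Unit =
    combiner(depIndex) += value  // no intermediate pair object is allocated here

  def main(args: Array[String]): Unit = {
    val combiner = newCombiner(numDeps = 2)
    mergeValue(combiner, 0, "a")
    mergeValue(combiner, 1, "b")
    println(combiner.map(_.mkString(",")).mkString(" | "))  // prints: a | b
  }
}
```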
    • [SPARK-2570] [SQL] Fix the bug of ClassCastException · 29809a6d
      Cheng Hao authored
An exception is thrown when running the HiveFromSpark example (a standalone reproduction sketch follows this entry):
      Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
      	at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
      	at org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(Row.scala:145)
      	at org.apache.spark.examples.sql.hive.HiveFromSpark$.main(HiveFromSpark.scala:45)
      	at org.apache.spark.examples.sql.hive.HiveFromSpark.main(HiveFromSpark.scala)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
      	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
      	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #1475 from chenghao-intel/hive_from_spark and squashes the following commits:
      
      d4c0500 [Cheng Hao] Fix the bug of ClassCastException
      29809a6d
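
A hedged, Spark-free reproduction of the failure mode above; the real fix is in the HiveFromSpark example, this only shows the unboxing mismatch:

```scala
// Sketch only: a boxed java.lang.Long cannot be unboxed to Int, which is exactly
// the BoxesRunTime.unboxToInt failure in the stack trace above. Reading the value
// at the width it actually has (Long) avoids the ClassCastException.
object ClassCastSketch {
  def main(args: Array[String]): Unit = {
    val countResult: Any = 42L                 // e.g. a COUNT(*) value, which is a Long
    // println(countResult.asInstanceOf[Int])  // throws: java.lang.Long cannot be cast to java.lang.Integer
    println(countResult.asInstanceOf[Long])    // prints 42
  }
}
```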
  2. Jul 17, 2014
    • [SPARK-2411] Add a history-not-found page to standalone Master · 6afca2d1
      Andrew Or authored
      **Problem.** Right now, if you click on an application after it has finished, it simply refreshes the page if there are no event logs for the application. This is not super intuitive especially because event logging is not enabled by default. We should direct the user to enable this if they attempt to view a SparkUI after the fact without event logs.
      
      **Fix.** The new page conveys different messages in each of the following scenarios:
      (1) Application did not enable event logging,
      (2) Event logs are not found in the specified directory, and
      (3) Exception is thrown while replaying the logs
      
      Here are screenshots of what the page looks like in each of the above scenarios:
      
      (1)
      <img src="https://issues.apache.org/jira/secure/attachment/12656204/Event%20logging%20not%20enabled.png" width="75%">
      
      (2)
      <img src="https://issues.apache.org/jira/secure/attachment/12656203/Application%20history%20not%20found.png">
      
      (3)
      <img src="https://issues.apache.org/jira/secure/attachment/12656202/Application%20history%20load%20error.png" width="95%">
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1336 from andrewor14/master-link and squashes the following commits:
      
      2f06206 [Andrew Or] Merge branch 'master' of github.com:apache/spark into master-link
      97cddc0 [Andrew Or] Add different severity levels
      832b687 [Andrew Or] Mention spark.eventLog.dir in error message
      51980c3 [Andrew Or] Merge branch 'master' of github.com:apache/spark into master-link
      ded208c [Andrew Or] Merge branch 'master' of github.com:apache/spark into master-link
      89d6405 [Andrew Or] Reword message
      e7df7ed [Andrew Or] Add a history not found page to standalone Master
      6afca2d1
    • [SPARK-2299] Consolidate various stageIdTo* hash maps in JobProgressListener · 72e9021e
      Reynold Xin authored
      This should reduce memory usage for the web ui as well as slightly increase its speed in draining the UI event queue.
      
      @andrewor14
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1262 from rxin/ui-consolidate-hashtables and squashes the following commits:
      
      1ac3f97 [Reynold Xin] Oops. Properly handle description.
      f5736ad [Reynold Xin] Code review comments.
      b8828dc [Reynold Xin] Merge branch 'master' into ui-consolidate-hashtables
      7a7b6c4 [Reynold Xin] Revert css change.
      f959bb8 [Reynold Xin] [SPARK-2299] Consolidate various stageIdTo* hash maps in JobProgressListener to speed it up.
      63256f5 [Reynold Xin] [SPARK-2320] Reduce <pre> block font size.
      72e9021e
    • SPARK-1215 [MLLIB]: Clustering: Index out of bounds error (2) · 935fe65f
      Joseph K. Bradley authored
Added a check to LocalKMeans.scala's kMeansPlusPlus initialization to handle the case with fewer distinct data points than clusters k. Added two related unit tests to KMeansSuite. (Re-submitting PR after tangling commits in PR 1407 https://github.com/apache/spark/pull/1407 )
      
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      
      Closes #1468 from jkbradley/kmeans-fix and squashes the following commits:
      
      4e9bd1e [Joseph K. Bradley] Updated PR per comments from mengxr
      6c7a2ec [Joseph K. Bradley] Added check to LocalKMeans.scala: kMeansPlusPlus initialization to handle case with fewer distinct data points than clusters k.  Added two related unit tests to KMeansSuite.
      935fe65f
    • SPARK-1478.2 Fix incorrect NioServerSocketChannelFactory constructor call · 1fcd5dcd
      Sean Owen authored
The line break inadvertently caused this to be interpreted as a call to the no-arg constructor, which doesn't even exist in older versions of Netty. (Also fixed a val name typo.) A standalone sketch of the parsing pitfall follows this entry.
      
      Author: Sean Owen <srowen@gmail.com>
      
      Closes #1466 from srowen/SPARK-1478.2 and squashes the following commits:
      
      59c3501 [Sean Owen] Line break caused Scala to interpret NioServerSocketChannelFactory constructor as the no-arg version, which is not even present in some versions of Netty
      1fcd5dcd
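
A hedged sketch of the parsing pitfall, with a stand-in `Factory` class instead of Netty's NioServerSocketChannelFactory:

```scala
// Sketch only: Scala's newline handling ends the statement at the line break, so
// the no-arg constructor is called and the argument list on the next line becomes
// a discarded tuple expression. Keeping the arguments on the same line fixes it.
class Factory(a: Int, b: Int) {
  def this() = this(-1, -1)        // the "no-arg constructor" that ends up being called
  override def toString = s"Factory($a, $b)"
}

object LineBreakSketch {
  def main(args: Array[String]): Unit = {
    val broken = new Factory       // statement ends here: no-arg construction
    (1, 2)                         // ...and this line is just an unused tuple
    val fixed = new Factory(1, 2)  // arguments stay attached to the constructor call
    println(broken)                // Factory(-1, -1)
    println(fixed)                 // Factory(1, 2)
  }
}
```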
    • [SPARK-2534] Avoid pulling in the entire RDD in various operators · d988d345
      Reynold Xin authored
This should go into both master and branch-1.0. (A Spark-free sketch of the closure-capture pitfall follows this entry.)
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1450 from rxin/agg-closure and squashes the following commits:
      
      e40f363 [Reynold Xin] Mima check excludes.
      9186364 [Reynold Xin] Define the return type more explicitly.
      38e348b [Reynold Xin] Fixed the cases in RDD.scala.
      ea6b34d [Reynold Xin] Blah
      89b9c43 [Reynold Xin] Fix other instances of accidentally pulling in extra stuff in closures.
      73b2783 [Reynold Xin] [SPARK-2534] Avoid pulling in the entire RDD in groupByKey.
      d988d345
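
A hedged, Spark-free sketch of the closure-capture pitfall this kind of change addresses; `Adder` and its fields are illustrative, not the actual RDD.scala code:

```scala
// Sketch only: a closure that refers to a field captures the whole enclosing
// object (including any large state), while copying the needed value into a
// local first keeps the captured closure small.
class Adder(val increment: Int) extends Serializable {
  val hugeState: Array[Byte] = new Array[Byte](1 << 20)  // stands in for "the entire RDD"

  def addCapturingThis(nums: Seq[Int]): Seq[Int] =
    nums.map(_ + increment)          // references this.increment, so `this` is captured

  def addCapturingLocal(nums: Seq[Int]): Seq[Int] = {
    val inc = increment              // copy just the value the closure needs
    nums.map(_ + inc)                // now only `inc` is captured
  }
}

object ClosureCaptureSketch {
  def main(args: Array[String]): Unit = {
    val adder = new Adder(10)
    println(adder.addCapturingThis(Seq(1, 2, 3)))   // List(11, 12, 13)
    println(adder.addCapturingLocal(Seq(1, 2, 3)))  // List(11, 12, 13)
  }
}
```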
    • [SPARK-2423] Clean up SparkSubmit for readability · 9c73822a
      Andrew Or authored
      It is currently non-trivial to trace through how different combinations of cluster managers (e.g. yarn) and deploy modes (e.g. cluster) are processed in SparkSubmit. Moving forward, it will be easier to extend SparkSubmit if we first re-organize the code by grouping related logic together.
      
      This is a precursor to fixing standalone-cluster mode, which is currently broken (SPARK-2260).
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1349 from andrewor14/submit-cleanup and squashes the following commits:
      
      8f99200 [Andrew Or] script -> program (minor)
      30f2e65 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-cleanup
      fe484a1 [Andrew Or] Move deploy mode checks after yarn code
      7167824 [Andrew Or] Re-order config options and update comments
      0b01ff8 [Andrew Or] Clean up SparkSubmit for readability
      9c73822a
    • SPARK-2526: Simplify options in make-distribution.sh · d0ea4968
      Patrick Wendell authored
      Right now we have a bunch of parallel logic in make-distribution.sh
      that's just extra work to maintain. We should just pass through
      Maven profiles in this case and keep the script simple. See
      the JIRA for more details.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #1445 from pwendell/make-distribution.sh and squashes the following commits:
      
      f1294ea [Patrick Wendell] Simplify options in make-distribution.sh.
      d0ea4968
    • [SPARK-2412] CoalescedRDD throws exception with certain pref locs · 7c23c0dc
      Aaron Davidson authored
      If the first pass of CoalescedRDD does not find the target number of locations AND the second pass finds new locations, an exception is thrown, as "groupHash.get(nxt_replica).get" is not valid.
      
The fix is just to add an ArrayBuffer to groupHash for that replica if one didn't already exist (see the getOrElseUpdate sketch after this entry).
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #1337 from aarondav/2412 and squashes the following commits:
      
      f587b5d [Aaron Davidson] getOrElseUpdate
      3ad8a3c [Aaron Davidson] [SPARK-2412] CoalescedRDD throws exception with certain pref locs
      7c23c0dc
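
A hedged sketch of the getOrElseUpdate pattern from the commit above; the map name mirrors the description, not necessarily the real CoalescedRDD field:

```scala
import scala.collection.mutable

// Sketch only: groupHash.get(replica).get throws when the key is absent, whereas
// getOrElseUpdate inserts an empty buffer on first access and returns it.
object GroupHashSketch {
  def main(args: Array[String]): Unit = {
    val groupHash = mutable.Map.empty[String, mutable.ArrayBuffer[Int]]
    val group = groupHash.getOrElseUpdate("replica-1", mutable.ArrayBuffer.empty[Int])
    group += 7
    println(groupHash)  // Map(replica-1 -> ArrayBuffer(7))
  }
}
```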
  3. Jul 16, 2014
    • [SPARK-2154] Schedule next Driver when one completes (standalone mode) · 9c249743
      Aaron Davidson authored
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #1405 from aarondav/2154 and squashes the following commits:
      
      24e9ef9 [Aaron Davidson] [SPARK-2154] Schedule next Driver when one completes (standalone mode)
      9c249743
    • SPARK-1097: Do not introduce deadlock while fixing concurrency bug · 8867cd0b
      Aaron Davidson authored
      We recently added this lock on 'conf' in order to prevent concurrent creation. However, it turns out that this can introduce a deadlock because Hadoop also synchronizes on the Configuration objects when creating new Configurations (and they do so via a static REGISTRY which contains all created Configurations).
      
This fix forces all Spark initialization of Configuration objects to occur serially, by using a static lock that we control, and thus avoids introducing the deadlock (a minimal sketch of the pattern follows this entry).
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #1409 from aarondav/1054 and squashes the following commits:
      
      7d1b769 [Aaron Davidson] SPARK-1097: Do not introduce deadlock while fixing concurrency bug
      8867cd0b
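
A hedged, Hadoop-free sketch of the static-lock approach described above; the names are illustrative:

```scala
// Sketch only: all of our own construction is serialized on a lock object that we
// control, instead of synchronizing on instances that another library (Hadoop's
// Configuration registry, in the real case) may also lock, which risks deadlock.
object ConfCreationLock

class ExpensiveConf {
  // construction work that must not race with other constructions
}

object ConfFactory {
  def newConf(): ExpensiveConf = ConfCreationLock.synchronized {
    new ExpensiveConf()  // only one thread constructs at a time
  }
}
```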
    • [SPARK-2317] Improve task logging. · 7c8d1232
      Reynold Xin authored
We use the TID to identify tasks in logging. However, the TID itself does not capture the stage or retries, making it harder to correlate with the application itself. This pull request changes all task logging messages to include the TID along with the stage id, stage attempt, task id, and task attempt. I've consulted various people, but unfortunately this is a really hard task.
      
      Driver log looks like:
      
      ```
      14/06/28 18:53:29 INFO DAGScheduler: Submitting 10 missing tasks from Stage 0 (MappedRDD[1] at map at <console>:13)
      14/06/28 18:53:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
      14/06/28 18:53:29 INFO TaskSetManager: Re-computing pending task lists.
      14/07/15 19:44:40 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 1855 bytes)
      14/07/15 19:44:40 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1855 bytes)
      14/07/15 19:44:40 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 2, localhost, PROCESS_LOCAL, 1855 bytes)
      14/07/15 19:44:40 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 1855 bytes)
      14/07/15 19:44:40 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 4, localhost, PROCESS_LOCAL, 1855 bytes)
      14/07/15 19:44:40 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 5, localhost, PROCESS_LOCAL, 1855 bytes)
      14/07/15 19:44:40 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 6, localhost, PROCESS_LOCAL, 1855 bytes)
      ...
      14/07/15 19:44:40 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 64 ms on localhost (4/10)
      14/07/15 19:44:40 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 4) in 63 ms on localhost (5/10)
      14/07/15 19:44:40 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 2) in 63 ms on localhost (6/10)
      14/07/15 19:44:40 INFO TaskSetManager: Finished task 7.0 in stage 1.0 (TID 7) in 62 ms on localhost (7/10)
      14/07/15 19:44:40 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 6) in 63 ms on localhost (8/10)
      14/07/15 19:44:40 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 9) in 8 ms on localhost (9/10)
      14/07/15 19:44:40 INFO TaskSetManager: Finished task 8.0 in stage 1.0 (TID 8) in 9 ms on localhost (10/10)
      
      ```
      
      Executor log looks like
      ```
      14/07/15 19:44:40 INFO Executor: Running task 0.0 in stage 1.0 (TID 0)
      14/07/15 19:44:40 INFO Executor: Running task 3.0 in stage 1.0 (TID 3)
      14/07/15 19:44:40 INFO Executor: Running task 1.0 in stage 1.0 (TID 1)
      14/07/15 19:44:40 INFO Executor: Running task 4.0 in stage 1.0 (TID 4)
      14/07/15 19:44:40 INFO Executor: Running task 2.0 in stage 1.0 (TID 2)
      14/07/15 19:44:40 INFO Executor: Running task 5.0 in stage 1.0 (TID 5)
      14/07/15 19:44:40 INFO Executor: Running task 6.0 in stage 1.0 (TID 6)
      14/07/15 19:44:40 INFO Executor: Running task 7.0 in stage 1.0 (TID 7)
      14/07/15 19:44:40 INFO Executor: Finished task 3.0 in stage 1.0 (TID 3). 847 bytes result sent to driver
      14/07/15 19:44:40 INFO Executor: Finished task 2.0 in stage 1.0 (TID 2). 847 bytes result sent to driver
      14/07/15 19:44:40 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 847 bytes result sent to driver
      14/07/15 19:44:40 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 847 bytes result sent to driver
      14/07/15 19:44:40 INFO Executor: Finished task 5.0 in stage 1.0 (TID 5). 847 bytes result sent to driver
      14/07/15 19:44:40 INFO Executor: Finished task 4.0 in stage 1.0 (TID 4). 847 bytes result sent to driver
      14/07/15 19:44:40 INFO Executor: Finished task 6.0 in stage 1.0 (TID 6). 847 bytes result sent to driver
      14/07/15 19:44:40 INFO Executor: Finished task 7.0 in stage 1.0 (TID 7). 847 bytes result sent to driver
      ```
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1259 from rxin/betterTaskLogging and squashes the following commits:
      
      c28ada1 [Reynold Xin] Fix unit test failure.
      987d043 [Reynold Xin] Updated log messages.
      c6cfd46 [Reynold Xin] Merge branch 'master' into betterTaskLogging
      b7b1bcc [Reynold Xin] Fixed a typo.
      f9aba3c [Reynold Xin] Made it compile.
      f8a5c06 [Reynold Xin] Merge branch 'master' into betterTaskLogging
      07264e6 [Reynold Xin] Defensive check against unknown TaskEndReason.
      76bbd18 [Reynold Xin] FailureSuite not serializable reporting.
      4659b20 [Reynold Xin] Remove unused variable.
      53888e3 [Reynold Xin] [SPARK-2317] Improve task logging.
      7c8d1232
    • fix compile error of streaming project · caa163f0
      James Z.M. Gao authored
      explicit return type for implicit function
      
      Author: James Z.M. Gao <gaozhm@mediav.com>
      
      Closes #153 from gzm55/work/streaming-compile and squashes the following commits:
      
      11e9c8d [James Z.M. Gao] fix style error
      fe88109 [James Z.M. Gao] fix compile error of streaming project
      caa163f0
    • [SPARK-2522] set default broadcast factory to torrent · 96f28c97
      Xiangrui Meng authored
HttpBroadcastFactory is the current default broadcast factory. It sends the broadcast data to each worker one by one, which is slow when the cluster is big. TorrentBroadcastFactory scales much better than HTTP. Maybe we should make torrent the default broadcast method (a configuration sketch follows this entry).
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1437 from mengxr/bt-broadcast and squashes the following commits:
      
      ed492fe [Xiangrui Meng] set default broadcast factory to torrent
      96f28c97
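
A hedged configuration sketch: before this change an application could opt into torrent broadcast explicitly (key and class name as documented for Spark 1.x); after it, torrent is the default and the setting becomes unnecessary:

```scala
import org.apache.spark.SparkConf

// Sketch only: explicitly selecting the torrent broadcast factory via SparkConf.
val conf = new SparkConf()
  .setAppName("torrent-broadcast-example")
  .set("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
```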
    • [SPARK-2517] Remove some compiler warnings. · ef48222c
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1433 from rxin/compile-warning and squashes the following commits:
      
      8d0b890 [Reynold Xin] Remove some compiler warnings.
      ef48222c
    • [SPARK-2518][SQL] Fix foldability of Substring expression. · cc965eea
      Takuya UESHIN authored
      This is a follow-up of #1428.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1432 from ueshin/issues/SPARK-2518 and squashes the following commits:
      
      37d1ace [Takuya UESHIN] Fix foldability of Substring expression.
      cc965eea
    • SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical aggregation code · fc7edc9e
      Sandy Ryza authored
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1435 from sryza/sandy-spark-2519 and squashes the following commits:
      
      640706a [Sandy Ryza] SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical aggregation code
      fc7edc9e
    • [SQL] Cleaned up ConstantFolding slightly. · 1c5739f6
      Reynold Xin authored
Moved a couple of rules out of NullPropagation and added more comments.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1430 from rxin/sql-folding-rule and squashes the following commits:
      
      7f9a197 [Reynold Xin] Updated documentation for ConstantFolding.
      7f8cf61 [Reynold Xin] [SQL] Cleaned up ConstantFolding slightly.
      1c5739f6
    • [SPARK-2525][SQL] Remove as many compilation warning messages as possible in Spark SQL · df95d82d
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-2525.
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1444 from yhuai/SPARK-2517 and squashes the following commits:
      
      edbac3f [Yin Huai] Removed some compiler type erasure warnings.
      df95d82d
    • Tightening visibility for various Broadcast related classes. · efe2a8b1
      Reynold Xin authored
      In preparation for SPARK-2521.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1438 from rxin/broadcast and squashes the following commits:
      
      432f1cc [Reynold Xin] Tightening visibility for various Broadcast related classes.
      efe2a8b1
    • SPARK-2277: make TaskScheduler track hosts on rack · 33e64eca
      Rui Li authored
      Hi mateiz, I've created [SPARK-2277](https://issues.apache.org/jira/browse/SPARK-2277) to make TaskScheduler track hosts on each rack. Please help to review, thanks.
      
      Author: Rui Li <rui.li@intel.com>
      
      Closes #1212 from lirui-intel/trackHostOnRack and squashes the following commits:
      
      2b4bd0f [Rui Li] SPARK-2277: refine UT
      fbde838 [Rui Li] SPARK-2277: add UT
      7bbe658 [Rui Li] SPARK-2277: rename the method
      5e4ef62 [Rui Li] SPARK-2277: remove unnecessary import
      79ac750 [Rui Li] SPARK-2277: make TaskScheduler track hosts on rack
      33e64eca
    • [SPARK-2119][SQL] Improved Parquet performance when reading off S3 · efc452a1
      Cheng Lian authored
      JIRA issue: [SPARK-2119](https://issues.apache.org/jira/browse/SPARK-2119)
      
      Essentially this PR fixed three issues to gain much better performance when reading large Parquet file off S3.
      
      1. When reading the schema, fetching Parquet metadata from a part-file rather than the `_metadata` file
      
The `_metadata` file contains metadata for all row groups, and can be very large if there are many row groups. Since the schema information and row group metadata are coupled within a single Thrift object, we have to read the whole `_metadata` file to fetch the schema. On the other hand, the schema is replicated in the footers of all part-files, which are fairly small.
      
2. Only add the root directory of the Parquet file rather than all the part-files to input paths
      
The HDFS API automatically filters out hidden files and underscore files (`_SUCCESS` & `_metadata`), so there's no need to filter out the part-files and add them individually to the input paths. What makes it much worse is that `FileInputFormat.listStatus()` calls `FileSystem.globStatus()` on each individual input path sequentially, and each call results in a blocking remote S3 HTTP request (a small sketch follows this entry).
      
3. Worked around [PARQUET-16](https://issues.apache.org/jira/browse/PARQUET-16)
      
Essentially PARQUET-16 is similar to the above issue, and results in lots of sequential `FileSystem.getFileStatus()` calls, which in turn translate into a bunch of remote S3 HTTP requests.
      
         `FilteringParquetRowInputFormat` should be cleaned up once PARQUET-16 is fixed.
      
Below are the micro-benchmark results. The dataset used is an S3 Parquet file consisting of 3,793 partitions, about 110MB per partition on average. The benchmark was done with a 9-node AWS cluster.
      
      - Creating a Parquet `SchemaRDD` (Parquet schema is fetched)
      
        ```scala
        val tweets = parquetFile(uri)
        ```
      
        - Before: 17.80s
        - After: 8.61s
      
      - Fetching partition information
      
        ```scala
        tweets.getPartitions
        ```
      
        - Before: 700.87s
        - After: 21.47s
      
      - Counting the whole file (both steps above are executed altogether)
      
        ```scala
        parquetFile(uri).count()
        ```
      
  - Before: ??? (haven't tested yet)
        - After: 53.26s
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1370 from liancheng/faster-parquet and squashes the following commits:
      
      94a2821 [Cheng Lian] Added comments about schema consistency
      d2c4417 [Cheng Lian] Worked around PARQUET-16 to improve Parquet performance
      1c0d1b9 [Cheng Lian] Accelerated Parquet schema retrieving
      5bd3d29 [Cheng Lian] Fixed Parquet log level
      efc452a1
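
A hedged sketch of point 2 above, using the plain Hadoop 2.x API rather than the actual ParquetRelation code; the path is a placeholder:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

// Sketch only: registering just the table's root directory as the input path lets
// the FileSystem listing discover the part-files in one call, instead of issuing
// one globStatus() round trip per individually-added part-file (on S3, each such
// call is a blocking HTTP request).
val job = Job.getInstance()
FileInputFormat.setInputPaths(job, new Path("/data/parquet/table"))  // root dir only
```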
    • [SPARK-2504][SQL] Fix nullability of Substring expression. · 632fb3d9
      Takuya UESHIN authored
      This is a follow-up of #1359 with nullability narrowing.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1426 from ueshin/issues/SPARK-2504 and squashes the following commits:
      
      5157832 [Takuya UESHIN] Remove unnecessary white spaces.
      80958ac [Takuya UESHIN] Fix nullability of Substring expression.
      632fb3d9
    • [SPARK-2509][SQL] Add optimization for Substring. · 9b38b7c7
      Takuya UESHIN authored
`Substring` expressions involving `null` literals could be handled by adding the corresponding cases to `NullPropagation`.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1428 from ueshin/issues/SPARK-2509 and squashes the following commits:
      
      d9eb85f [Takuya UESHIN] Add Substring cases to NullPropagation.
      9b38b7c7
  4. Jul 15, 2014
    • [SPARK-2314][SQL] Override collect and take in JavaSchemaRDD, forwarding to SchemaRDD implementations. · 90ca532a
      Aaron Staple authored
      
      Author: Aaron Staple <aaron.staple@gmail.com>
      
      Closes #1421 from staple/SPARK-2314 and squashes the following commits:
      
      73e04dc [Aaron Staple] [SPARK-2314] Override collect and take in JavaSchemaRDD, forwarding to SchemaRDD implementations.
      90ca532a
    • follow pep8 None should be compared using is or is not · 563acf5e
      Ken Takagiwa authored
      http://legacy.python.org/dev/peps/pep-0008/
      ## Programming Recommendations
      - Comparisons to singletons like None should always be done with is or is not, never the equality operators.
      
      Author: Ken Takagiwa <ken@Kens-MacBook-Pro.local>
      
      Closes #1422 from giwa/apache_master and squashes the following commits:
      
      7b361f3 [Ken Takagiwa] follow pep8 None should be checked using is or is not
      563acf5e
    • [SPARK-2500] Move the logInfo for registering BlockManager to BlockManagerMasterActor.register method · 9c12de50
      Henry Saputra authored
      
      PR for SPARK-2500
      
Move the logInfo call for registering a BlockManager to BlockManagerMasterActor.register instead of the BlockManagerInfo constructor.

Previously, the logInfo call for registering a BlockManager happened in the BlockManagerInfo constructor. This is confusing because the code could call "new BlockManagerInfo" without actually registering a BlockManager, which is misleading when reading the log files (see the sketch after this entry).
      
      Author: Henry Saputra <henry.saputra@gmail.com>
      
      Closes #1424 from hsaputra/move_registerblockmanager_log_to_registration_method and squashes the following commits:
      
      3370b4a [Henry Saputra] Move the loginfo for BlockManager to BlockManagerMasterActor.register instead of BlockManagerInfo constructor.
      9c12de50
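
A hedged sketch of the change in logging placement; class and method names are simplified stand-ins for the BlockManagerMasterActor code:

```scala
import scala.collection.mutable

// Sketch only: the "registering" message is emitted by the register method, where
// registration actually happens, rather than by the Info constructor, which can be
// invoked without any registration taking place.
class ManagerInfo(val id: String)  // constructor no longer logs anything

object MasterSketch {
  private val registry = mutable.Map.empty[String, ManagerInfo]

  def register(id: String): Unit = {
    registry(id) = new ManagerInfo(id)
    println(s"Registering block manager $id")  // log at the registration site
  }

  def main(args: Array[String]): Unit = register("executor-1")
}
```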
    • [SPARK-2469] Use Snappy (instead of LZF) for default shuffle compression codec · 4576d80a
      Reynold Xin authored
      This reduces shuffle compression memory usage by 3x.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1415 from rxin/snappy and squashes the following commits:
      
      06c1a01 [Reynold Xin] SPARK-2469: Use Snappy (instead of LZF) for default shuffle compression codec.
      4576d80a
    • [SPARK-2498] [SQL] Synchronize on a lock when using scala reflection inside data type objects. · c2048a51
      Zongheng Yang authored
JIRA ticket: https://issues.apache.org/jira/browse/SPARK-2498 (a minimal sketch of the locking pattern follows this entry)
      
      Author: Zongheng Yang <zongheng.y@gmail.com>
      
      Closes #1423 from concretevitamin/scala-ref-catalyst and squashes the following commits:
      
      325a149 [Zongheng Yang] Synchronize on a lock when initializing data type objects in Catalyst.
      c2048a51
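
A hedged sketch of the locking pattern, not the Catalyst code; `Person` and `fieldNames` are illustrative:

```scala
import scala.reflect.runtime.universe._

// Sketch only: Scala 2.10 runtime reflection is not thread-safe, so any reflective
// inspection that may run from multiple threads is guarded by one lock object.
object ReflectionLock

case class Person(name: String, age: Int)

object SchemaSketch {
  def fieldNames[T: TypeTag]: Seq[String] = ReflectionLock.synchronized {
    typeOf[T].members.collect {
      case m: MethodSymbol if m.isCaseAccessor => m.name.toString
    }.toSeq
  }

  def main(args: Array[String]): Unit =
    println(fieldNames[Person])  // e.g. List(age, name)
}
```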
    • [SQL] Attribute equality comparisons should be done by exprId. · 502f9078
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1414 from marmbrus/exprIdResolution and squashes the following commits:
      
      97b47bc [Michael Armbrust] Attribute equality comparisons should be done by exprId.
      502f9078
    • SPARK-2407: Added internal implementation of SQL SUBSTR() · 61de65bc
      William Benton authored
      This replaces the Hive UDF for SUBSTR(ING) with an implementation in Catalyst
      and adds tests to verify correct operation.
      
      Author: William Benton <willb@redhat.com>
      
      Closes #1359 from willb/internalSqlSubstring and squashes the following commits:
      
      ccedc47 [William Benton] Fixed too-long line.
      a30a037 [William Benton] replace view bounds with implicit parameters
      ec35c80 [William Benton] Adds fixes from review:
      4f3bfdb [William Benton] Added internal implementation of SQL SUBSTR()
      61de65bc
    • [SPARK-2474][SQL] For a registered table in OverrideCatalog, the Analyzer failed to resolve references in the format of "tableName.fieldName" · 8af46d58
      Yin Huai authored
      
      Please refer to JIRA (https://issues.apache.org/jira/browse/SPARK-2474) for how to reproduce the problem and my understanding of the root cause.
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1406 from yhuai/SPARK-2474 and squashes the following commits:
      
      96b1627 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2474
      af36d65 [Yin Huai] Fix comment.
      be86ba9 [Yin Huai] Correct SQL console settings.
      c43ad00 [Yin Huai] Wrap the relation in a Subquery named by the table name in OverrideCatalog.lookupRelation.
      a5c2145 [Yin Huai] Support sql/console.
      8af46d58
    • [SQL] Whitelist more Hive tests. · bcd0c30c
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1396 from marmbrus/moreTests and squashes the following commits:
      
      6660b60 [Michael Armbrust] Blacklist a test that requires DFS command.
      8b6001c [Michael Armbrust] Add golden files.
      ccd8f97 [Michael Armbrust] Whitelist more tests.
      bcd0c30c
    • [SPARK-2483][SQL] Fix parsing of repeated, nested data access. · 0f98ef1a
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1411 from marmbrus/nestedRepeated and squashes the following commits:
      
      044fa09 [Michael Armbrust] Fix parsing of repeated, nested data access.
      0f98ef1a
    • [SPARK-2471] remove runtime scope for jets3t · a21f9a75
      Xiangrui Meng authored
      The assembly jar (built by sbt) doesn't include jets3t if we set it to runtime only, but I don't know whether it was set this way for a particular reason.
      
      CC: srowen ScrapCodes
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1402 from mengxr/jets3t and squashes the following commits:
      
      bfa2d17 [Xiangrui Meng] remove runtime scope for jets3t
      a21f9a75
    • Added LZ4 to compression codec in configuration page. · e7ec815d
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1417 from rxin/lz4 and squashes the following commits:
      
      472f6a1 [Reynold Xin] Set the proper default.
      9cf0b2f [Reynold Xin] Added LZ4 to compression codec in configuration page.
      e7ec815d
    • SPARK-1291: Link the spark UI to RM ui in yarn-client mode · 72ea56da
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #1112 from witgo/SPARK-1291 and squashes the following commits:
      
      6022bcd [witgo] review commit
      1fbb925 [witgo] add addAmIpFilter to yarn alpha
      210299c [witgo] review commit
      1b92a07 [witgo] review commit
      6896586 [witgo] Add comments to addWebUIFilter
      3e9630b [witgo] review commit
      142ee29 [witgo] review commit
      1fe7710 [witgo] Link the spark UI to RM ui in yarn-client mode
      72ea56da
    • SPARK-2480: Resolve sbt warnings "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag" · 9dd635eb
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #1404 from witgo/run-tests and squashes the following commits:
      
      f703aee [witgo] fix Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
      2944f51 [witgo] Remove "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag"
      ef59c70 [witgo] fix Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
      6cefee5 [witgo] Remove "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag"
      9dd635eb