  1. Jan 20, 2015
• [SPARK-5287][SQL] Add defaultSizeOf to every data type. · bc20a52b
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5287
      
This PR only adds `defaultSizeOf` to data types and makes those internal type classes `protected[sql]`. I will use another PR to clean up the type hierarchy of data types.
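For flavor, a minimal sketch of the idea behind a per-type default size, using toy classes rather than Spark's real type hierarchy:

```scala
// Toy sketch, not Spark's actual classes: give every data type a default
// size in bytes so the planner can estimate sizes it cannot measure.
sealed trait DataType { def defaultSize: Int }
case object IntegerType extends DataType { def defaultSize = 4 }
case object LongType    extends DataType { def defaultSize = 8 }
case object DoubleType  extends DataType { def defaultSize = 8 }
case object StringType  extends DataType { def defaultSize = 4096 } // guess for variable-length data
case class ArrayType(elementType: DataType) extends DataType {
  def defaultSize = 100 * elementType.defaultSize // assume ~100 elements by default
}

// ArrayType(IntegerType).defaultSize == 400
```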
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4081 from yhuai/SPARK-5287 and squashes the following commits:
      
      90cec75 [Yin Huai] Update unit test.
      e1c600c [Yin Huai] Make internal classes protected[sql].
      7eaba68 [Yin Huai] Add `defaultSize` method to data types.
      fd425e0 [Yin Huai] Add all native types to NativeType.defaultSizeOf.
      bc20a52b
• SPARK-5019 [MLlib] - GaussianMixtureModel exposes instances of MultivariateGauss... · 23e25543
      Travis Galoppo authored
This PR modifies GaussianMixtureModel to expose instances of MultivariateGaussian rather than separate mean and covariance arrays.
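A sketch of what consuming the reworked API can look like; `weights`, `gaussians`, `mu`, and `sigma` follow the names in this PR's description, but treat the snippet as illustrative:

```scala
import org.apache.spark.mllib.clustering.GaussianMixtureModel

// Assuming `model: GaussianMixtureModel` has already been fit.
def describe(model: GaussianMixtureModel): Unit =
  model.weights.zip(model.gaussians).foreach { case (w, g) =>
    // Each MultivariateGaussian carries a mean vector and a covariance matrix.
    println(s"weight=$w mu=${g.mu} sigma=${g.sigma}")
  }
```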
      
      Author: Travis Galoppo <tjg2107@columbia.edu>
      
      Closes #4088 from tgaloppo/spark-5019 and squashes the following commits:
      
      3ef6c7f [Travis Galoppo] In GaussianMixtureModel: Changed name of weight, gaussian to weights, gaussians.  Other sources modified accordingly.
      091e8da [Travis Galoppo] SPARK-5019 - GaussianMixtureModel exposes instances of MultivariateGaussian rather than mean/covariance matrices
      23e25543
• [SPARK-5329][WebUI] UIWorkloadGenerator should stop SparkContext. · 769aced9
      Kousuke Saruta authored
UIWorkloadGenerator doesn't stop SparkContext. I ran UIWorkloadGenerator and tried to watch the result in the WebUI, but the jobs were marked as finished.
It's because SparkContext is not stopped.
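The essence of the fix is the standard stop-in-finally pattern; a minimal sketch, not the generator's actual code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("UIWorkloadGenerator"))
try {
  sc.parallelize(1 to 1000).count() // stand-in for the generated workload
} finally {
  sc.stop() // without this, the application never reports completion
}
```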
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #4112 from sarutak/SPARK-5329 and squashes the following commits:
      
bcc0fa9 [Kousuke Saruta] Disabled scalastyle for a block comment
      86a3b95 [Kousuke Saruta] Fixed UIWorkloadGenerator to stop SparkContext in it
      769aced9
• SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840... · c93a57f0
      Jacek Lewandowski authored
      ... by Piotr Kolaczkowski)
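The usual pattern behind this class of fix is deserializing against an explicitly supplied class loader; a hedged sketch, not the PR's exact code:

```scala
import java.io.{InputStream, ObjectInputStream, ObjectStreamClass}

// Resolve classes against an explicitly supplied loader (e.g. the context
// class loader that can see dynamically added jars) instead of the default.
class LoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
    extends ObjectInputStream(in) {
  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    Class.forName(desc.getName, false, loader)
}
```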
      
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      
      Closes #4113 from jacek-lewandowski/SPARK-4660-master and squashes the following commits:
      
      a5e84ca [Jacek Lewandowski] SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840 by Piotr Kolaczkowski)
      c93a57f0
• [SQL][Minor] Refactors deeply nested FP style code in BooleanSimplification · 81408027
      Cheng Lian authored
      This is a follow-up of #4090. The original deeply nested `reduceOption` code is hard to grasp.
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4091 from liancheng/refactor-boolean-simplification and squashes the following commits:
      
      cd8860b [Cheng Lian] Improves `compareConditions` to handle more subtle cases
      1bf3258 [Cheng Lian] Avoids converting predicate sets to lists
      e833ca4 [Cheng Lian] Refactors deeply nested FP style code
      81408027
• [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException · 9d9294ae
      Jongyoul Lee authored
      - Rewind ByteBuffer before making ByteString
      
      (This fixes a bug introduced in #3849 / SPARK-4014)
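A minimal sketch of the rewind-before-copy pattern, assuming protobuf's `ByteString` (which Mesos uses):

```scala
import java.nio.ByteBuffer
import com.google.protobuf.ByteString

def toByteString(buf: ByteBuffer): ByteString = {
  buf.rewind()             // reset position; a previously read buffer may have no bytes remaining
  ByteString.copyFrom(buf) // copies from the current position to the limit
}
```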
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #4119 from jongyoul/SPARK-5333 and squashes the following commits:
      
      c6693a8 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - changed logDebug location
      4141f58 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Added license information
      2190606 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Adjusted imported libraries
      b7f5517 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Rewind ByteBuffer before making ByteString
      9d9294ae
• [SPARK-4803] [streaming] Remove duplicate RegisterReceiver message · 4afad9c7
      Ilayaperumal Gopinathan authored
  - The ReceiverTracker receives `RegisterReceiver` messages two times:
     1) When the actor at `ReceiverSupervisorImpl`'s preStart is invoked
     2) After the receiver is started at the executor, in `onReceiverStart()` at `ReceiverSupervisorImpl`

Though the RegisterReceiver message uses the same streamId and the receiverInfo gets updated every time
the message is processed at the `ReceiverTracker`, it makes sense to register the receiver only after the
receiver is started.
      
      Author: Ilayaperumal Gopinathan <igopinathan@pivotal.io>
      
      Closes #3648 from ilayaperumalg/RTActor-remove-prestart and squashes the following commits:
      
      868efab [Ilayaperumal Gopinathan] Increase receiverInfo collector timeout to 2 secs
      3118e5e [Ilayaperumal Gopinathan] Fix StreamingListenerSuite's startedReceiverStreamIds size
      634abde [Ilayaperumal Gopinathan] Remove duplicate RegisterReceiver message
      4afad9c7
• [SQL][minor] Add a log4j file for catalyst test. · debc0319
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4117 from rxin/catalyst-test-log4j and squashes the following commits:
      
      8ad610b [Reynold Xin] [SQL][minor] Add a log4j file for catalyst test.
      debc0319
• SPARK-5270 [CORE] Provide isEmpty() function in RDD API · 306ff187
      Sean Owen authored
      Pretty minor, but submitted for consideration -- this would at least help people make this check in the most efficient way I know.
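The efficient check amounts to asking for at most one element; a sketch of the idea as a standalone helper (the PR adds it as a method on RDD itself):

```scala
import org.apache.spark.rdd.RDD

// Cheaper than rdd.count() == 0: looks at no more than one element, and
// skips launching a job entirely when the RDD has no partitions.
def isEmpty[T](rdd: RDD[T]): Boolean =
  rdd.partitions.length == 0 || rdd.take(1).length == 0
```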
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4074 from srowen/SPARK-5270 and squashes the following commits:
      
      66885b8 [Sean Owen] Add note that JavaRDDLike should not be implemented by user code
      2e9b490 [Sean Owen] More tests, and Mima-exclude the new isEmpty method in JavaRDDLike
      28395ff [Sean Owen] Add isEmpty to Java, Python
      7dd04b7 [Sean Owen] Add efficient RDD.isEmpty()
      306ff187
  2. Jan 19, 2015
• [SPARK-5214][Core] Add EventLoop and change DAGScheduler to an EventLoop · e69fb8c7
      zsxwing authored
This PR adds a simple `EventLoop` and uses it to replace the Actor in DAGScheduler. `EventLoop` is a general class that supports posting events from multiple threads and handling them in a single event thread.
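A minimal sketch of such an event loop, built only from the description above (Spark's actual class will differ in details):

```scala
import java.util.concurrent.LinkedBlockingQueue

// Minimal event loop: any thread may post; one thread handles, in order.
abstract class SimpleEventLoop[E](name: String) {
  private val queue = new LinkedBlockingQueue[E]()
  @volatile private var stopped = false

  private val thread = new Thread(name) {
    override def run(): Unit =
      try {
        while (!stopped) onReceive(queue.take())
      } catch {
        case _: InterruptedException => // interrupted by stop()
      }
  }
  thread.setDaemon(true)

  def start(): Unit = thread.start()
  def stop(): Unit = { stopped = true; thread.interrupt() }
  def post(event: E): Unit = queue.put(event)

  protected def onReceive(event: E): Unit
}
```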
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #4016 from zsxwing/event-loop and squashes the following commits:
      
      aefa1ce [zsxwing] Add protected to on*** methods
      5cfac83 [zsxwing] Remove null check of eventProcessLoop
      dba35b2 [zsxwing] Add a test that onReceive swallows InterruptException
      460f7b3 [zsxwing] Use volatile instead of Atomic things in unit tests
      227bf33 [zsxwing] Add a stop flag and some tests
      37f79c6 [zsxwing] Fix docs
      55fb6f6 [zsxwing] Add private[spark] to EventLoop
      1f73eac [zsxwing] Fix the import order
      3b2e59c [zsxwing] Add EventLoop and change DAGScheduler to an EventLoop
      e69fb8c7
• [SPARK-4504][Examples] fix run-example failure if multiple assembly jars exist · 74de94ea
      Venkata Ramana Gollamudi authored
Fix the run-example script to fail fast with a useful error message if multiple
example assembly JARs are present.
      
      Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
      
      Closes #3377 from gvramana/run-example_fails and squashes the following commits:
      
      fa7f481 [Venkata Ramana Gollamudi] Fixed review comments, avoiding ls output scanning.
      6aa1ab7 [Venkata Ramana Gollamudi] Fix run-examples script error during multiple jars
      74de94ea
• [SPARK-5286][SQL] Fail to drop an invalid table when using the data source API · 2604bc35
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5286
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4076 from yhuai/SPARK-5286 and squashes the following commits:
      
      6b69ed1 [Yin Huai] Catch all exception when we try to uncache a query.
      2604bc35
• [SPARK-5284][SQL] Insert into Hive throws NPE when an inner complex type field has a null value · cd5da428
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5284
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4077 from yhuai/SPARK-5284 and squashes the following commits:
      
      fceacd6 [Yin Huai] Check if a value is null when the field has a complex type.
      cd5da428
• [SPARK-5282][mllib]: RowMatrix easily gets int overflow in the memory size warning · 4432568a
      Yuhao Yang authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5282
      
Fix the possible int overflow in the memory computation warning.
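The bug class here is Int arithmetic wrapping before any widening happens; an illustrative sketch:

```scala
val rows = 100000
val cols = 30000

// Broken: the Int multiplication wraps long before the value is reported.
val badBytes: Int = rows * cols * 8

// Fixed: widen to Long first, and report an MB-based number.
val goodMB: Long = rows.toLong * cols * 8L / (1024L * 1024L)
```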
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #4069 from hhbyyh/addscStop and squashes the following commits:
      
      e54e5c8 [Yuhao Yang] change to MB based number
      7afac23 [Yuhao Yang] 5282: fix int overflow in the warning
      4432568a
• MAINTENANCE: Automated closing of pull requests. · 1ac1c1dc
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #3584 (close requested by 'pwendell')
      Closes #2433 (close requested by 'pwendell')
      Closes #1697 (close requested by 'pwendell')
      Closes #4042 (close requested by 'pwendell')
      Closes #3723 (close requested by 'pwendell')
      Closes #1560 (close requested by 'pwendell')
      Closes #3515 (close requested by 'pwendell')
      Closes #1386 (close requested by 'pwendell')
      1ac1c1dc
• [SPARK-5088] Use spark-class for running executors directly · 4a4f9ccb
      Jongyoul Lee authored
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #3897 from jongyoul/SPARK-5088 and squashes the following commits:
      
      8232aa8 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Added a listenerBus for fixing test cases
      932289f [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Rebased from master
      613cb47 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Fixed code if spark.executor.uri doesn't have any value - Added test cases
      ff57bda [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Adjusted orders of import
      97e4bd4 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Changed command for using spark-class directly - Delete sbin/spark-executor and moved some codes into spark-class' case statement
      4a4f9ccb
• [SPARK-3288] All fields in TaskMetrics should be private and use getters/setters · 3453d578
      Ilya Ganelin authored
      I've updated the fields and all usages of these fields in the Spark code. I've verified that this did not break anything on my local repo.
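The shape of the change, sketched on a toy class (the package line is only there so `private[spark]` compiles; it is not Spark's real TaskMetrics):

```scala
package org.apache.spark

class ToyMetrics {
  private var _bytesRead: Long = 0L

  // Public read-only accessor.
  def bytesRead: Long = _bytesRead

  // Mutation is restricted to Spark-internal code.
  private[spark] def incBytesRead(v: Long): Unit = { _bytesRead += v }
}
```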
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #4020 from ilganeli/SPARK-3288 and squashes the following commits:
      
      39f3810 [Ilya Ganelin] resolved merge issues
      e446287 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288
      b8c05cb [Ilya Ganelin] Missed making a variable private
      6444391 [Ilya Ganelin] Made inc/dec functions private[spark]
      1149e78 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288
      26b312b [Ilya Ganelin] Debugging tests
      17146c2 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288
      5525c20 [Ilya Ganelin] Completed refactoring to make vars in TaskMetrics class private
      c64da4f [Ilya Ganelin] Partially updated task metrics to make some vars private
      3453d578
• SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage. · 851b6a9b
      Prashant Sharma authored
      ![screenshot from 2015-01-16 13 43 25](https://cloud.githubusercontent.com/assets/992952/5773256/d61df300-9d85-11e4-9b5a-6730058839fa.png)
      
      This is a first step towards having time remaining estimates for queued and running jobs. See SPARK-5216
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #4043 from ScrapCodes/SPARK-5216/5217-show-waiting-stages and squashes the following commits:
      
      3b11803 [Prashant Sharma] Review feedback.
      0992842 [Prashant Sharma] Switched to Linked hashmap, changed the order to active->pending->completed->failed. And changed pending stages to not reverse sort.
      c19d82a [Prashant Sharma] SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage.
      851b6a9b
• [SQL] fix typo in class description · 7dbf1fdb
      Jacky Li authored
      Author: Jacky Li <jacky.likun@gmail.com>
      
      Closes #4100 from jackylk/patch-9 and squashes the following commits:
      
      b13b9d6 [Jacky Li] Update SQLConf.scala
      4d3f83d [Jacky Li] Update SQLConf.scala
      fcc8c85 [Jacky Li] [SQL] fix typo in class description
      7dbf1fdb
  3. Jan 18, 2015
• [SQL][minor] Put DataTypes.java in java dir. · 19556454
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4097 from rxin/javarename and squashes the following commits:
      
      c5ce96a [Reynold Xin] [SQL][minor] Put DataTypes.java in java dir.
      19556454
• [SQL][Minor] Update sql doc according to data type APIs changes · 1a200a3e
      scwf authored
      Follow up of #3925
      /cc rxin
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #4095 from scwf/sql-doc and squashes the following commits:
      
      97e311b [scwf] update sql doc since now expose only one version of the data type APIs
      1a200a3e
• [SPARK-5279][SQL] Use java.math.BigDecimal as the exposed Decimal type. · 1727e084
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4092 from rxin/bigdecimal and squashes the following commits:
      
      27b08c9 [Reynold Xin] Fixed test.
      10cb496 [Reynold Xin] [SPARK-5279][SQL] Use java.math.BigDecimal as the exposed Decimal type.
      1727e084
• [HOTFIX]: Minor clean up regarding skipped artifacts in build files. · ad16da1b
      Patrick Wendell authored
There are two relevant 'skip' configurations in the build: the first
is for "mvn install" and the second is for "mvn deploy". As of 1.2,
we actually use "mvn install" to generate our deployed artifacts,
because we have some customization of the nexus upload due to having
to cross-compile for Scala 2.10 and 2.11.

There is no reason to have different settings for these values, so
this patch simply cleans this up for the repl/ and yarn/
projects.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4080 from pwendell/master and squashes the following commits:
      
      e21b78b [Patrick Wendell] [HOTFIX]: Minor clean up regarding skipped artifacts in build files.
      ad16da1b
  4. Jan 17, 2015
  5. Jan 16, 2015
• [SPARK-5193][SQL] Remove Spark SQL Java-specific API. · 61b427d4
      Reynold Xin authored
      After the following patches, the main (Scala) API is now usable for Java users directly.
      
      https://github.com/apache/spark/pull/4056
      https://github.com/apache/spark/pull/4054
      https://github.com/apache/spark/pull/4049
      https://github.com/apache/spark/pull/4030
      https://github.com/apache/spark/pull/3965
      https://github.com/apache/spark/pull/3958
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4065 from rxin/sql-java-api and squashes the following commits:
      
      b1fd860 [Reynold Xin] Fix Mima
      6d86578 [Reynold Xin] Ok one more attempt in fixing Python...
      e8f1455 [Reynold Xin] Fix Python again...
      3e53f91 [Reynold Xin] Fixed Python.
      83735da [Reynold Xin] Fix BigDecimal test.
      e9f1de3 [Reynold Xin] Use scala BigDecimal.
      500d2c4 [Reynold Xin] Fix Decimal.
      ba3bfa2 [Reynold Xin] Updated javadoc for RowFactory.
      c4ae1c5 [Reynold Xin] [SPARK-5193][SQL] Remove Spark SQL Java-specific API.
      61b427d4
• [SPARK-4937][SQL] Adding optimization to simplify the And, Or condition in spark sql · ee1c1f3a
      scwf authored
Adding optimization to simplify the And/Or conditions in Spark SQL.

There are two kinds of optimization:

1. Numeric condition optimization, such as:
a < 3 && a > 5 ---- False
a < 1 || a > 0 ---- True
a > 3 && a > 5 => a > 5
(a < 2 || b > 5) && a < 2 => a < 2

2. Optimizing some queries from a cartesian product into an equi-join, such as this SQL (one of hive-testbench):
      ```
      select
      sum(l_extendedprice* (1 - l_discount)) as revenue
      from
      lineitem,
      part
      where
      (
      p_partkey = l_partkey
      and p_brand = 'Brand#32'
      and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
      and l_quantity >= 7 and l_quantity <= 7 + 10
      and p_size between 1 and 5
      and l_shipmode in ('AIR', 'AIR REG')
      and l_shipinstruct = 'DELIVER IN PERSON'
      )
      or
      (
      p_partkey = l_partkey
      and p_brand = 'Brand#35'
      and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
      and l_quantity >= 15 and l_quantity <= 15 + 10
      and p_size between 1 and 10
      and l_shipmode in ('AIR', 'AIR REG')
      and l_shipinstruct = 'DELIVER IN PERSON'
      )
      or
      (
      p_partkey = l_partkey
      and p_brand = 'Brand#24'
      and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
      and l_quantity >= 26 and l_quantity <= 26 + 10
      and p_size between 1 and 15
      and l_shipmode in ('AIR', 'AIR REG')
      and l_shipinstruct = 'DELIVER IN PERSON'
      )
      ```
It has a repeated expression in the Or, so we can optimize it by ```(a && b) || (a && c) = a && (b || c)```.
Before optimization, this SQL hung in my local test, and the physical plan is:
![image](https://cloud.githubusercontent.com/assets/7018048/5539175/31cf38e8-8af9-11e4-95e3-336f9b3da4a4.png)

After optimization, this SQL runs successfully in 20+ seconds, and its physical plan is:
![image](https://cloud.githubusercontent.com/assets/7018048/5539176/39a558e0-8af9-11e4-912b-93de94b20075.png)

This PR focuses on the second optimization and some simple cases of the first. For complex numeric condition optimization, I will make a follow-up PR.
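As a toy illustration of the factoring rule (not Catalyst's actual types or rule), one can pull conjuncts common to both sides out of an Or:

```scala
sealed trait Pred
case class Atom(s: String) extends Pred
case class And(l: Pred, r: Pred) extends Pred
case class Or(l: Pred, r: Pred) extends Pred

def conjuncts(p: Pred): Seq[Pred] = p match {
  case And(l, r) => conjuncts(l) ++ conjuncts(r)
  case other     => Seq(other)
}

def andAll(ps: Seq[Pred]): Pred = ps.reduce(And(_, _))

// (a && b) || (a && c)  ==>  a && (b || c)
def factorOr(p: Pred): Pred = p match {
  case Or(l, r) =>
    val (lc, rc) = (conjuncts(l), conjuncts(r))
    val common = lc.intersect(rc)
    if (common.isEmpty) p
    else {
      val lRest = lc.diff(common)
      val rRest = rc.diff(common)
      if (lRest.isEmpty || rRest.isEmpty) andAll(common) // one side implies the Or
      else And(andAll(common), Or(andAll(lRest), andAll(rRest)))
    }
  case other => other
}
```

For example, `factorOr(Or(And(Atom("a"), Atom("b")), And(Atom("a"), Atom("c"))))` returns `And(Atom("a"), Or(Atom("b"), Atom("c")))`.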
      
      Author: scwf <wangfei1@huawei.com>
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #3778 from scwf/filter1 and squashes the following commits:
      
      58bcbc2 [scwf] minor format fix
      9570211 [scwf] conflicts fix
      527e6ce [scwf] minor comment improvements
      5c6f134 [scwf] remove numeric optimizations and move to BooleanSimplification
      546a82b [wangfei] style fix
      825fa69 [wangfei] adding more tests
      a001e8c [wangfei] revert pom changes
      32a595b [scwf] improvement and test fix
      e99a26c [wangfei] refactory And/Or optimization to make it more readable and clean
      ee1c1f3a
• [SPARK-733] Add documentation on use of accumulators in lazy transformation · fd3a8a1d
      Ilya Ganelin authored
I've added documentation addressing the point of confusion highlighted in the relevant JIRA, along with code examples to make the explanation concrete.
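The pitfall being documented, in one illustrative snippet (assuming an existing SparkContext `sc` and the 1.x-era accumulator API):

```scala
val acc = sc.accumulator(0)
val data = sc.parallelize(1 to 10).map { x => acc += x; x }

println(acc.value) // 0: map is lazy, nothing has executed yet

data.count()       // the action forces the map, applying the updates
println(acc.value) // 55
```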
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #4022 from ilganeli/SPARK-733 and squashes the following commits:
      
      587def5 [Ilya Ganelin] Updated to clarify verbage
      df3afd7 [Ilya Ganelin] Revert "Partially updated task metrics to make some vars private"
      3f6c512 [Ilya Ganelin] Revert "Completed refactoring to make vars in TaskMetrics class private"
      58034fb [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-733
      4dc2cdb [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-733
      3a38db1 [Ilya Ganelin] Verified documentation update by building via jekyll
      33b5a2d [Ilya Ganelin] Added code examples for java and python
      1fd59b2 [Ilya Ganelin] Updated documentation for accumulators to highlight lazy evaluation issue
      5525c20 [Ilya Ganelin] Completed refactoring to make vars in TaskMetrics class private
      c64da4f [Ilya Ganelin] Partially updated task metrics to make some vars private
      fd3a8a1d
• [SPARK-4923][REPL] Add Developer API to REPL to allow re-publishing the REPL jar · d05c9ee6
      Chip Senkbeil authored
As requested in [SPARK-4923](https://issues.apache.org/jira/browse/SPARK-4923), I've provided a rough DeveloperApi for the repl. I've only done this for Scala 2.10 because the Scala 2.11 support does not appear to be implemented yet. The Scala 2.11 repl still has the old `scala.tools.nsc` package and its SparkIMain does not appear to have the class server needed for shipping code over (unless this functionality has been moved elsewhere?). I also left alone the `ExecutorClassLoader` and `ConstructorCleaner` as I have no experience working with those classes.
      
      This marks the majority of methods in `SparkIMain` as _private_ with a few special cases being _private[repl]_ as other classes within the same package access them. Any public method has been marked with `DeveloperApi` as suggested by pwendell and I took the liberty of writing up a Scaladoc for each one to further elaborate their usage.
      
As the Scala 2.11 REPL [conforms](https://github.com/scala/scala/pull/2206) to [JSR-223](http://docs.oracle.com/javase/8/docs/technotes/guides/scripting/), the [Spark Kernel](https://github.com/ibm-et/spark-kernel) uses the SparkIMain of Scala 2.10 in the same manner. So, I've taken care to expose methods predominantly related to the functionality a JSR-223 scripting engine implementation needs:
      
      1. The ability to _get_ variables from the interpreter (and other information like class/symbol/type)
      2. The ability to _put_ variables into the interpreter
      3. The ability to _compile_ code
      4. The ability to _execute_ code
      5. The ability to get contextual information regarding the scripting environment
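For flavor, a sketch of driving an interpreter through abilities 1, 2, and 4, using Scala 2.10's plain `IMain` (`interpret`, `valueOfTerm`, `bind`), which SparkIMain builds on; the Spark-side signatures themselves are not shown here and may differ:

```scala
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

val settings = new Settings
settings.usejavacp.value = true // interpret against the JVM classpath
val intp = new IMain(settings)

intp.interpret("val x = 21 * 2") // execute code
println(intp.valueOfTerm("x"))   // get a variable: Some(42)
intp.bind("y", "Int", 10)        // put a variable into scope
intp.interpret("val z = x + y")  // and use both
```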
      
      Additional functionality that I marked as exposed included the following:
      
      1. The blocking initialization method (needed to actually start SparkIMain instance)
      2. The class server uri (needed to set the _spark.repl.class.uri_ property after initialization), reduced from the entire class server
      3. The class output directory (beneficial for tools like ours that need to inspect and use the directory where class files are served)
      4. Suppression (quiet/silence) mechanics for output
      5. Ability to add a jar to the compile/runtime classpath
      6. The reset/close functionality
      7. Metric information (last variable assignment, "needed" for extracting results from last execution, real variable name for better debugging)
      8. Execution wrapper (useful to have, but debatable)
      
      Aside from `SparkIMain`, I updated other classes/traits and their methods in the _repl_ package to be private/package protected where possible. A few odd cases (like the SparkHelper being in the scala.tools.nsc package to expose a private variable) still exist, but I did my best at labelling them.
      
      `SparkCommandLine` has proven useful to extract settings and `SparkJLineCompletion` has proven to be useful in implementing auto-completion in the [Spark Kernel](https://github.com/ibm-et/spark-kernel) project. Other than those - and `SparkIMain` - my experience has yielded that other classes/methods are not necessary for interactive applications taking advantage of the REPL API.
      
      Tested via the following:
      
          $ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
          $ mvn -Phadoop-2.3 -DskipTests clean package && mvn -Phadoop-2.3 test
      
      Also did a quick verification that I could start the shell and execute some code:
      
          $ ./bin/spark-shell
          ...
      
          scala> val x = 3
          x: Int = 3
      
          scala> sc.parallelize(1 to 10).reduce(_+_)
          ...
          res1: Int = 55
      
      Author: Chip Senkbeil <rcsenkbe@us.ibm.com>
      Author: Chip Senkbeil <chip.senkbeil@gmail.com>
      
      Closes #4034 from rcsenkbeil/AddDeveloperApiToRepl and squashes the following commits:
      
      053ca75 [Chip Senkbeil] Fixed failed build by adding missing DeveloperApi import
      c1b88aa [Chip Senkbeil] Added DeveloperApi to public classes in repl
      6dc1ee2 [Chip Senkbeil] Added missing method to expose error reporting flag
      26fd286 [Chip Senkbeil] Refactored other Scala 2.10 classes and methods to be private/package protected where possible
      925c112 [Chip Senkbeil] Added DeveloperApi and Scaladocs to SparkIMain for Scala 2.10
      d05c9ee6
• [WebUI] Fix collapse of WebUI layout · ecf943d3
      Kousuke Saruta authored
When we decrease the width of the browser, the header of the WebUI wraps and collapses, as in the following image.
      
      ![2015-01-11 19 49 37](https://cloud.githubusercontent.com/assets/4736016/5698887/b0b9aeee-99cd-11e4-9020-08f3f0014de0.png)
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3995 from sarutak/fixed-collapse-webui-layout and squashes the following commits:
      
      3e60b5b [Kousuke Saruta] Modified line-height property in webui.css
      7bfb5fb [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fixed-collapse-webui-layout
      5d83e18 [Kousuke Saruta] Fixed collapse of WebUI layout
      ecf943d3
• [SPARK-5231][WebUI] History Server shows wrong job submission time. · e8422c52
      Kousuke Saruta authored
History Server doesn't show the correct job submission time.
It's because `JobProgressListener` updates the job submission time every time the `onJobStart` method is invoked from `ReplayListenerBus`.
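The underlying principle, sketched with toy types rather than Spark's: a replay-driven listener must take timestamps from the event payload, never from its own wall clock:

```scala
// Toy stand-in for the Spark event, carrying its own timestamp.
case class JobStart(jobId: Int, submissionTime: Long)

class JobTimes {
  private val submitted = scala.collection.mutable.Map.empty[Int, Long]

  def onJobStart(event: JobStart): Unit =
    // Use the time recorded when the job actually started, never
    // System.currentTimeMillis() at (replay) processing time.
    submitted(event.jobId) = event.submissionTime
}
```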
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #4029 from sarutak/SPARK-5231 and squashes the following commits:
      
      0af9e22 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5231
da8bd14 [Kousuke Saruta] Made submissionTime in SparkListenerJobStart and completionTime in SparkListenerJobEnd regular Longs
      0412a6a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5231
      26b9b99 [Kousuke Saruta] Fixed the test cases
2d47bd3 [Kousuke Saruta] Fixed to record job submission time and completion time correctly
      e8422c52
• [DOCS] Fix typo in return type of cogroup · f6b852aa
      Sean Owen authored
      This fixes a simple typo in the cogroup docs noted in http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAMAsSdJ8_24evMAMg7fOZCQjwimisbYWa9v8BN6Rc3JCauja6wmail.gmail.com%3E
      
      I didn't bother with a JIRA
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4072 from srowen/CogroupDocFix and squashes the following commits:
      
      43c850b [Sean Owen] Fix typo in return type of cogroup
      f6b852aa
• [SPARK-5201][CORE] deal with int overflow in the ParallelCollectionRDD.slice method · e200ac8e
      Ye Xianjin authored
There is an int overflow in the ParallelCollectionRDD.slice method, originally reported by SaintBacchus.
      ```
      sc.makeRDD(1 to (Int.MaxValue)).count       // result = 0
      sc.makeRDD(1 to (Int.MaxValue - 1)).count   // result = 2147483646 = Int.MaxValue - 1
      sc.makeRDD(1 until (Int.MaxValue)).count    // result = 2147483646 = Int.MaxValue - 1
      ```
See https://github.com/apache/spark/pull/2874 for more details.
This PR tries to fix the overflow. However, there's another issue I don't address.
      ```
      val largeRange = Int.MinValue to Int.MaxValue
      largeRange.length // throws java.lang.IllegalArgumentException: -2147483648 to 2147483647 by 1: seqs cannot contain more than Int.MaxValue elements.
      ```
      
So, the range we feed to sc.makeRDD cannot contain more than Int.MaxValue elements. This is a limitation of Scala. However, I think we may want to support that kind of range, but the fix is beyond this PR.
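A sketch of overflow-safe slicing, close in spirit to the fix: compute slice boundaries in Long so the intermediate product cannot wrap:

```scala
// Split [0, length) into numSlices contiguous ranges, doing the boundary
// arithmetic in Long so that i * length cannot wrap around Int.MaxValue.
def positions(length: Long, numSlices: Int): Iterator[(Int, Int)] =
  (0 until numSlices).iterator.map { i =>
    val start = ((i * length) / numSlices).toInt
    val end = (((i + 1) * length) / numSlices).toInt
    (start, end)
  }
```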
      
srowen andrewor14, would you mind taking a look at this PR?
      
      Author: Ye Xianjin <advancedxy@gmail.com>
      
      Closes #4002 from advancedxy/SPARk-5201 and squashes the following commits:
      
      96265a1 [Ye Xianjin] Update slice method comment and some responding docs.
      e143d7a [Ye Xianjin] Update inclusive range check for splitting inclusive range.
      b3f5577 [Ye Xianjin] We can include the last element in the last slice in general for inclusive range, hence eliminate the need to check Int.MaxValue or Int.MinValue.
      7d39b9e [Ye Xianjin] Convert the two cases pattern matching to one case.
      651c959 [Ye Xianjin] rename sign to needsInclusiveRange. add some comments
      196f8a8 [Ye Xianjin] Add test cases for ranges end with Int.MaxValue or Int.MinValue
      e66e60a [Ye Xianjin] Deal with inclusive and exclusive ranges in one case. If the range is inclusive and the end of the range is (Int.MaxValue or Int.MinValue), we should use inclusive range instead of exclusive
      e200ac8e
• [SPARK-1507][YARN]specify # cores for ApplicationMaster · 2be82b1e
      WangTaoTheTonic authored
      Based on top of changes in https://github.com/apache/spark/pull/3806.
      
      https://issues.apache.org/jira/browse/SPARK-1507
      
      `--driver-cores` and `spark.driver.cores` for all cluster modes and `spark.yarn.am.cores` for yarn client mode.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #4018 from WangTaoTheTonic/SPARK-1507 and squashes the following commits:
      
      01419d3 [WangTaoTheTonic] amend the args name
      b255795 [WangTaoTheTonic] indet thing
      d86557c [WangTaoTheTonic] some comments amend
      43c9392 [WangTao] fix compile error
      b39a100 [WangTao] specify # cores for ApplicationMaster
      2be82b1e
  6. Jan 15, 2015
• [SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds · a79a9f92
      Kostas Sakellis authored
When calculating the input metrics, there was an assumption that one task only reads from one block; this is not true for some operations, including coalesce. This patch simply increments the task's input metrics if previous ones of the same read method existed.

A limitation of this patch is that if a task reads from two different blocks with different read methods, one will override the other.
      
      Author: Kostas Sakellis <kostas@cloudera.com>
      
      Closes #3120 from ksakellis/kostas-spark-4092 and squashes the following commits:
      
      54e6658 [Kostas Sakellis] Drops metrics if conflicting read methods exist
      f0e0cc5 [Kostas Sakellis] Add bytesReadCallback to InputMetrics
      a2a36d4 [Kostas Sakellis] CR feedback
      5a0c770 [Kostas Sakellis] [SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds
      a79a9f92
• [SPARK-4857] [CORE] Adds Executor membership events to SparkListener · 96c2c714
      Kostas Sakellis authored
      Adds onExecutorAdded and onExecutorRemoved events to the SparkListener. This will allow a client to get notified when an executor has been added/removed and provide additional information such as how many vcores it is consuming.
      
In addition, this commit adds a SparkListenerAdapter to the Java API that provides default implementations for the SparkListener methods. This is to get around the fact that default implementations for traits don't work in Java. Having Java clients extend SparkListenerAdapter going forward will prevent breakage in Java when we add new events to SparkListener.
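A sketch of subscribing to the new events from Scala; the event and field names follow this PR's description, though the exact case-class shapes may have shifted in later releases:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

class ExecutorLogger extends SparkListener {
  override def onExecutorAdded(e: SparkListenerExecutorAdded): Unit =
    println(s"executor ${e.executorId} added: ${e.executorInfo.totalCores} cores")
  override def onExecutorRemoved(e: SparkListenerExecutorRemoved): Unit =
    println(s"executor ${e.executorId} removed")
}

// Register with an existing SparkContext: sc.addSparkListener(new ExecutorLogger)
```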
      
      Author: Kostas Sakellis <kostas@cloudera.com>
      
      Closes #3711 from ksakellis/kostas-spark-4857 and squashes the following commits:
      
      946d2c5 [Kostas Sakellis] Added executorAdded/Removed events to MesosSchedulerBackend
      b1d054a [Kostas Sakellis] Remove executorInfo from ExecutorRemoved event
      1727b38 [Kostas Sakellis] Renamed ExecutorDetails back to ExecutorInfo and other CR feedback
      14fe78d [Kostas Sakellis] Added executor added/removed events to json protocol
      93d087b [Kostas Sakellis] [SPARK-4857] [CORE] Adds Executor membership events to SparkListener
      96c2c714
• [Minor] Fix tiny typo in BlockManager · 65858ba5
      Kousuke Saruta authored
In BlockManager, there is a word `BlockTranserService` but I think it's a typo for `BlockTransferService`.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #4046 from sarutak/fix-tiny-typo and squashes the following commits:
      
      a3e2a2f [Kousuke Saruta] Fixed tiny typo in BlockManager
      65858ba5