  1. Apr 28, 2015
      [SPARK-7208] [ML] [PYTHON] Added Matrix, SparseMatrix to __all__ list in linalg.py · a8aeadb7
      Joseph K. Bradley authored
      Added Matrix, SparseMatrix to __all__ list in linalg.py
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #5759 from jkbradley/SPARK-7208 and squashes the following commits:
      
      deb51a2 [Joseph K. Bradley] Added Matrix, SparseMatrix to __all__ list in linalg.py
      [SPARK-7138] [STREAMING] Add method to BlockGenerator to add multiple records... · 5c8f4bd5
      Tathagata Das authored
      [SPARK-7138] [STREAMING] Add method to BlockGenerator to add multiple records to BlockGenerator with single callback
      
      This is to ensure that receivers that receive data in small batches (like Kinesis) can add a batch of records in one call while having the callback function invoked only once per batch. This is for internal use only, for an improvement to the Kinesis receiver that we are planning to do.
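      For illustration, a minimal Scala sketch of the batching idea (the class and method shapes here are stand-ins, not Spark's actual BlockGenerator API):
      
      ```Scala
      import scala.collection.mutable.ArrayBuffer
      
      // Illustrative only: buffer a whole batch of records under one lock and
      // invoke the callback once per batch instead of once per record.
      class BatchingGenerator(onAdded: Any => Unit) {
        private val currentBuffer = new ArrayBuffer[Any]
      
        // Existing pattern: one record, one callback.
        def addDataWithCallback(data: Any, metadata: Any): Unit = synchronized {
          currentBuffer += data
          onAdded(metadata)
        }
      
        // New pattern enabled by this change: many records, one callback.
        def addMultipleDataWithCallback(data: Iterator[Any], metadata: Any): Unit = synchronized {
          data.foreach(currentBuffer += _)
          onAdded(metadata)
        }
      }
      ```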
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5695 from tdas/SPARK-7138 and squashes the following commits:
      
      a35cf7d [Tathagata Das] Fixed style.
      a7a4cb9 [Tathagata Das] Added extra method to BlockGenerator.
      [SPARK-6965] [MLLIB] StringIndexer handles numeric input. · d36e6735
      Xiangrui Meng authored
      Cast numeric types to String for indexing. Boolean type is not handled in this PR. jkbradley
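      For context, typical StringIndexer usage; after this change the input column may be numeric as well as string (`df` here is an assumed existing DataFrame):
      
      ```Scala
      import org.apache.spark.ml.feature.StringIndexer
      
      // The input column may now be numeric (it is cast to String before indexing).
      val indexer = new StringIndexer()
        .setInputCol("category")      // e.g. IntegerType or DoubleType, not just StringType
        .setOutputCol("categoryIndex")
      val indexed = indexer.fit(df).transform(df)
      ```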
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5753 from mengxr/SPARK-6965 and squashes the following commits:
      
      2e34f3c [Xiangrui Meng] add actual type in the error message
      ad938bf [Xiangrui Meng] StringIndexer handles numeric input.
      Closes #4807 · 555213eb
      Xiangrui Meng authored
      Closes #5055
      Closes #3583
      [SPARK-7201] [MLLIB] move Identifiable to ml.util · f0a1f90f
      Xiangrui Meng authored
      It shouldn't live directly under `spark.ml`.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5749 from mengxr/SPARK-7201 and squashes the following commits:
      
      53847f9 [Xiangrui Meng] move Identifiable to ml.util
      [MINOR] [CORE] Warn users who try to cache RDDs with dynamic allocation on. · 28b1af74
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5751 from vanzin/cached-rdd-warning and squashes the following commits:
      
      554cc07 [Marcelo Vanzin] Change message.
      9efb9da [Marcelo Vanzin] [minor] [core] Warn users who try to cache RDDs with dynamic allocation on.
      [SPARK-5338] [MESOS] Add cluster mode support for Mesos · 53befacc
      Timothy Chen authored
      This patch adds the support for cluster mode to run on Mesos.
      It introduces a new Mesos framework dedicated to launching new apps/drivers, which can be used through the spark-submit script by pointing the --master flag at the cluster-mode REST interface instead of the Mesos master.
      
      Example:
      ./bin/spark-submit --deploy-mode cluster --class org.apache.spark.examples.SparkPi --master mesos://10.0.0.206:8077 --executor-memory 1G --total-executor-cores 100 examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar 30
      
      Part of this patch is also to abstract the StandaloneRestServer so it can have different implementations of the REST endpoints.
      
      Features of the cluster mode in this PR:
      - Supports supervise mode, where the scheduler will keep trying to reschedule an exited job.
      - Adds a new UI for the cluster mode scheduler showing all running jobs, finished jobs, and supervised jobs waiting to be retried.
      - Supports state persistence to ZK, so when the cluster scheduler fails over it can pick up all the queued and running jobs.
      
      Author: Timothy Chen <tnachen@gmail.com>
      Author: Luc Bourlier <luc.bourlier@typesafe.com>
      
      Closes #5144 from tnachen/mesos_cluster_mode and squashes the following commits:
      
      069e946 [Timothy Chen] Fix rebase.
      e24b512 [Timothy Chen] Persist submitted driver.
      390c491 [Timothy Chen] Fix zk conf key for mesos zk engine.
      e324ac1 [Timothy Chen] Fix merge.
      fd5259d [Timothy Chen] Address review comments.
      1553230 [Timothy Chen] Address review comments.
      c6c6b73 [Timothy Chen] Pass spark properties to mesos cluster tasks.
      f7d8046 [Timothy Chen] Change app name to spark cluster.
      17f93a2 [Timothy Chen] Fix head of line blocking in scheduling drivers.
      6ff8e5c [Timothy Chen] Address comments and add logging.
      df355cd [Timothy Chen] Add metrics to mesos cluster scheduler.
      20f7284 [Timothy Chen] Address review comments
      7252612 [Timothy Chen] Fix tests.
      a46ad66 [Timothy Chen] Allow zk cli param override.
      920fc4b [Timothy Chen] Fix scala style issues.
      862b5b5 [Timothy Chen] Support asking driver status when it's retrying.
      7f214c2 [Timothy Chen] Fix RetryState visibility
      e0f33f7 [Timothy Chen] Add supervise support and persist retries.
      371ce65 [Timothy Chen] Handle cluster mode recovery and state persistence.
      3d4dfa1 [Luc Bourlier] Adds support to kill submissions
      febfaba [Timothy Chen] Bound the finished drivers in memory
      543a98d [Timothy Chen] Schedule multiple jobs
      6887e5e [Timothy Chen] Support looking at SPARK_EXECUTOR_URI env variable in schedulers
      8ec76bc [Timothy Chen] Fix Mesos dispatcher UI.
      d57d77d [Timothy Chen] Add documentation
      825afa0 [Luc Bourlier] Supports more spark-submit parameters
      b8e7181 [Luc Bourlier] Adds a shutdown latch to keep the daemon running
      0fa7780 [Luc Bourlier] Launch task through the mesos scheduler
      5b7a12b [Timothy Chen] WIP: Making a cluster mode a mesos framework.
      4b2f5ef [Timothy Chen] Specify user jar in command to be replaced with local.
      e775001 [Timothy Chen] Support fetching remote uris in driver runner.
      7179495 [Timothy Chen] Change Driver page output and add logging
      880bc27 [Timothy Chen] Add Mesos Cluster UI to display driver results
      9986731 [Timothy Chen] Kill drivers when shutdown
      67cbc18 [Timothy Chen] Rename StandaloneRestClient to RestClient and add sbin scripts
      e3facdd [Timothy Chen] Add Mesos Cluster dispatcher
      [SPARK-6314] [CORE] handle JsonParseException for history server · 80098109
      Zhang, Liye authored
      This is handled in the same way as [SPARK-6197](https://issues.apache.org/jira/browse/SPARK-6197). With this PR, the exception shown in the history server log is replaced by a warning, and applications with incomplete history log files are listed on the history server web UI.
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #5736 from liyezhang556520/SPARK-6314 and squashes the following commits:
      
      b8d2d88 [Zhang, Liye] handle JsonParseException for history server
      [SPARK-5932] [CORE] Use consistent naming for size properties · 2d222fb3
      Ilya Ganelin authored
      I've added an interface to JavaUtils to do byte conversion and added hooks within Utils.scala to handle conversion within Spark code (like for time strings). I've added matching tests for size conversion, and then updated all deprecated configs and documentation as per SPARK-5933.
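      For illustration, the kind of size-string conversion this interface provides (helpers as they exist in Spark's JavaUtils; treat the exact values as examples):
      
      ```Scala
      import org.apache.spark.network.util.JavaUtils
      
      // Size strings parse with binary units (kibi/mebi/gibi), mirroring the
      // existing time-string handling.
      val asBytes = JavaUtils.byteStringAsBytes("1g") // 1 GiB = 1073741824 bytes
      val asMb    = JavaUtils.byteStringAsMb("1g")    // 1024 MiB
      ```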
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #5574 from ilganeli/SPARK-5932 and squashes the following commits:
      
      11f6999 [Ilya Ganelin] Nit fixes
      49a8720 [Ilya Ganelin] Whitespace fix
      2ab886b [Ilya Ganelin] Scala style
      fc85733 [Ilya Ganelin] Got rid of floating point math
      852a407 [Ilya Ganelin] [SPARK-5932] Added much improved overflow handling. Can now handle sizes up to Long.MAX_VALUE Petabytes instead of being capped at Long.MAX_VALUE Bytes
      9ee779c [Ilya Ganelin] Simplified fraction matches
      22413b1 [Ilya Ganelin] Made MAX private
      3dfae96 [Ilya Ganelin] Fixed some nits. Added automatic conversion of the old parameter for kryoserializer.mb to new values.
      e428049 [Ilya Ganelin] resolving merge conflict
      8b43748 [Ilya Ganelin] Fixed error in pattern matching for doubles
      84a2581 [Ilya Ganelin] Added smoother handling of fractional values for size parameters. This now throws an exception and added a warning for old spark.kryoserializer.buffer
      d3d09b6 [Ilya Ganelin] [SPARK-5932] Fixing error in KryoSerializer
      fe286b4 [Ilya Ganelin] Resolved merge conflict
      c7803cd [Ilya Ganelin] Empty lines
      54b78b4 [Ilya Ganelin] Simplified byteUnit class
      69e2f20 [Ilya Ganelin] Updates to code
      f32bc01 [Ilya Ganelin] [SPARK-5932] Fixed error in API in SparkConf.scala where Kb conversion wasn't being done properly (was Mb). Added test cases for both timeUnit and ByteUnit conversion
      f15f209 [Ilya Ganelin] Fixed conversion of kryo buffer size
      0f4443e [Ilya Ganelin]     Merge remote-tracking branch 'upstream/master' into SPARK-5932
      35a7fa7 [Ilya Ganelin] Minor formatting
      928469e [Ilya Ganelin] [SPARK-5932] Converted some longs to ints
      5d29f90 [Ilya Ganelin] [SPARK-5932] Finished documentation updates
      7a6c847 [Ilya Ganelin] [SPARK-5932] Updated spark.shuffle.file.buffer
      afc9a38 [Ilya Ganelin] [SPARK-5932] Updated spark.broadcast.blockSize and spark.storage.memoryMapThreshold
      ae7e9f6 [Ilya Ganelin] [SPARK-5932] Updated spark.io.compression.snappy.block.size
      2d15681 [Ilya Ganelin] [SPARK-5932] Updated spark.executor.logs.rolling.size.maxBytes
      1fbd435 [Ilya Ganelin] [SPARK-5932] Updated spark.broadcast.blockSize
      eba4de6 [Ilya Ganelin] [SPARK-5932] Updated spark.shuffle.file.buffer.kb
      b809a78 [Ilya Ganelin] [SPARK-5932] Updated spark.kryoserializer.buffer.max
      0cdff35 [Ilya Ganelin] [SPARK-5932] Updated to use bibibytes in method names. Updated spark.kryoserializer.buffer.mb and spark.reducer.maxMbInFlight
      475370a [Ilya Ganelin] [SPARK-5932] Simplified ByteUnit code, switched to using longs. Updated docs to clarify that we use kibi, mebi etc instead of kilo, mega
      851d691 [Ilya Ganelin] [SPARK-5932] Updated memoryStringToMb to use new interfaces
      a9f4fcf [Ilya Ganelin] [SPARK-5932] Added unit tests for unit conversion
      747393a [Ilya Ganelin] [SPARK-5932] Added unit tests for ByteString conversion
      09ea450 [Ilya Ganelin] [SPARK-5932] Added byte string conversion to Java utils
      5390fd9 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5932
      db9a963 [Ilya Ganelin] Closing second spark context
      1dc0444 [Ilya Ganelin] Added ref equality check
      8c884fa [Ilya Ganelin] Made getOrCreate synchronized
      cb0c6b7 [Ilya Ganelin] Doc updates and code cleanup
      270cfe3 [Ilya Ganelin] [SPARK-6703] Documentation fixes
      15e8dea [Ilya Ganelin] Updated comments and added MiMa Exclude
      0e1567c [Ilya Ganelin] Got rid of unnecessary option for AtomicReference
      dfec4da [Ilya Ganelin] Changed activeContext to AtomicReference
      733ec9f [Ilya Ganelin] Fixed some bugs in test code
      8be2f83 [Ilya Ganelin] Replaced match with if
      e92caf7 [Ilya Ganelin] [SPARK-6703] Added test to ensure that getOrCreate both allows creation, retrieval, and a second context if desired
      a99032f [Ilya Ganelin] Spacing fix
      d7a06b8 [Ilya Ganelin] Updated SparkConf class to add getOrCreate method. Started test suite implementation
      [SPARK-4286] Add an external shuffle service that can be run as a daemon. · 8aab94d8
      Iulian Dragos authored
      This allows Mesos deployments to use the shuffle service (and implicitly dynamic allocation). It does so by adding a new "main" class and two corresponding scripts in `sbin`:
      
      - `sbin/start-shuffle-service.sh`
      - `sbin/stop-shuffle-service.sh`
      
      Specific options can be passed in `SPARK_SHUFFLE_OPTS`.
      
      This is picking up work from #3861 /cc tnachen
      
      Author: Iulian Dragos <jaguarul@gmail.com>
      
      Closes #4990 from dragos/feature/external-shuffle-service and squashes the following commits:
      
      6c2b148 [Iulian Dragos] Import order and wrong name fixup.
      07804ad [Iulian Dragos] Moved ExternalShuffleService to the `deploy` package + other minor tweaks.
      4dc1f91 [Iulian Dragos] Reviewer’s comments:
      8145429 [Iulian Dragos] Add an external shuffle service that can be run as a daemon.
      [Core][test][minor] replace try finally block with tryWithSafeFinally · 52ccf1d3
      Zhang, Liye authored
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #5739 from liyezhang556520/trySafeFinally and squashes the following commits:
      
      55683e5 [Zhang, Liye] replace try finally block with tryWithSafeFinally
      [SPARK-7140] [MLLIB] only scan the first 16 entries in Vector.hashCode · b14cd236
      Xiangrui Meng authored
      The Python SerDe calls `Object.hashCode`, which is very expensive for Vectors. It is not necessary to scan the whole vector, especially for large ones. In this PR, we only scan the first 16 nonzeros. srowen
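      A hedged sketch of the idea (not the exact MLlib code): fold at most the first 16 nonzero entries into the hash so the cost is bounded for large vectors.
      
      ```Scala
      // Sketch only: hash at most the first 16 nonzeros of a sparse vector.
      def boundedHashCode(indices: Array[Int], values: Array[Double]): Int = {
        var result = 31 + 7919
        var nnz = 0
        var i = 0
        while (i < values.length && nnz < 16) {
          if (values(i) != 0.0) {
            val bits = java.lang.Double.doubleToLongBits(values(i))
            result = 31 * result + indices(i)
            result = 31 * result + (bits ^ (bits >>> 32)).toInt
            nnz += 1
          }
          i += 1
        }
        result
      }
      ```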
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5697 from mengxr/SPARK-7140 and squashes the following commits:
      
      2abc86d [Xiangrui Meng] typo
      8fb7d74 [Xiangrui Meng] update impl
      1ebad60 [Xiangrui Meng] only scan the first 16 nonzeros in Vector.hashCode
      [SPARK-5253] [ML] LinearRegression with L1/L2 (ElasticNet) using OWLQN · 6a827d5d
      DB Tsai authored
      Author: DB Tsai <dbt@netflix.com>
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #4259 from dbtsai/lir and squashes the following commits:
      
      a81c201 [DB Tsai] add import org.apache.spark.util.Utils back
      9fc48ed [DB Tsai] rebase
      2178b63 [DB Tsai] add comments
      9988ca8 [DB Tsai] addressed feedback and fixed a bug. TODO: documentation and build another synthetic dataset which can catch the bug fixed in this commit.
      fcbaefe [DB Tsai] Refactoring
      4eb078d [DB Tsai] first commit
      [SPARK-6435] spark-shell --jars option does not add all jars to classpath · 268c419f
      Masayoshi TSUZUKI authored
      Modified to accept double-quoted args properly in spark-shell.cmd.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #5227 from tsudukim/feature/SPARK-6435-2 and squashes the following commits:
      
      ac55787 [Masayoshi TSUZUKI] removed unnecessary argument.
      60789a7 [Masayoshi TSUZUKI] Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6435-2
      1fee420 [Masayoshi TSUZUKI] fixed test code for escaping '='.
      0d4dc41 [Masayoshi TSUZUKI] - escaped comma and semicolon in CommandBuilderUtils.java - added random string to the temporary filename - double-quotation followed by `cmd /c` did not work properly - no need to escape `=` by `^` - if a double-quoted string ended with `\` like a classpath, the last `\` was parsed as the escape character and the closing `"` did not work properly
      2a332e5 [Masayoshi TSUZUKI] Merge branch 'master' into feature/SPARK-6435-2
      04f4291 [Masayoshi TSUZUKI] [SPARK-6435] spark-shell --jars option does not add all jars to classpath
      [SPARK-7100] [MLLIB] Fix persisted RDD leak in GradientBoostTrees · 75905c57
      Jim Carroll authored
      This fixes a leak of a persisted RDD: GradientBoostTrees can call persist but never unpersists it.
      
      Jira: https://issues.apache.org/jira/browse/SPARK-7100
      
      Discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/GradientBoostTrees-leaks-a-persisted-RDD-td11750.html
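      The general shape of the fix, as a hedged sketch (the helper name is hypothetical): persist only if the caller hasn't, and unpersist before dropping the reference.
      
      ```Scala
      import org.apache.spark.rdd.RDD
      import org.apache.spark.storage.StorageLevel
      
      // Hypothetical helper illustrating the persist/unpersist pairing.
      def withCaching[T, R](data: RDD[T])(body: RDD[T] => R): R = {
        val persistedHere = data.getStorageLevel == StorageLevel.NONE
        if (persistedHere) data.persist(StorageLevel.MEMORY_AND_DISK)
        try {
          body(data)
        } finally {
          if (persistedHere) data.unpersist()
        }
      }
      ```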
      
      Author: Jim Carroll <jim@dontcallme.com>
      
      Closes #5669 from jimfcarroll/gb-unpersist-fix and squashes the following commits:
      
      45f4b03 [Jim Carroll] [SPARK-7100][MLLib] Fix persisted RDD leak in GradientBoostTrees
      [SPARK-7168] [BUILD] Update plugin versions in Maven build and centralize versions · 7f3b3b7e
      Sean Owen authored
      Update Maven build plugin versions and centralize plugin version management
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5720 from srowen/SPARK-7168 and squashes the following commits:
      
      98a8947 [Sean Owen] Make install, deploy plugin versions explicit
      4ecf3b2 [Sean Owen] Update Maven build plugin versions and centralize plugin version management
      [SPARK-6352] [SQL] Custom parquet output committer · e13cd865
      Pei-Lun Lee authored
      Add a new config "spark.sql.parquet.output.committer.class" to allow a custom Parquet output committer, plus an output committer class intended for use on S3.
      Fix compilation error introduced by https://github.com/apache/spark/pull/5042.
      Respect ParquetOutputFormat.ENABLE_JOB_SUMMARY flag.
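      One plausible way to select a custom committer via the new key (`sqlContext` is an assumed existing SQLContext, and the committer class name is hypothetical):
      
      ```Scala
      // Point Spark SQL's Parquet writer at a custom output committer.
      sqlContext.setConf(
        "spark.sql.parquet.output.committer.class",
        "com.example.S3ParquetOutputCommitter") // hypothetical committer class
      ```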
      
      Author: Pei-Lun Lee <pllee@appier.com>
      
      Closes #5525 from ypcat/spark-6352 and squashes the following commits:
      
      54c6b15 [Pei-Lun Lee] error handling
      472870e [Pei-Lun Lee] add back custom parquet output committer
      ddd0f69 [Pei-Lun Lee] Merge branch 'master' of https://github.com/apache/spark into spark-6352
      9ece5c5 [Pei-Lun Lee] compatibility with hadoop 1.x
      8413fcd [Pei-Lun Lee] Merge branch 'master' of https://github.com/apache/spark into spark-6352
      fe65915 [Pei-Lun Lee] add support for parquet config parquet.enable.summary-metadata
      e17bf47 [Pei-Lun Lee] Merge branch 'master' of https://github.com/apache/spark into spark-6352
      9ae7545 [Pei-Lun Lee] [SPARK-6352] [SQL] Change to allow custom parquet output committer.
      0d540b9 [Pei-Lun Lee] [SPARK-6352] [SQL] add license
      c42468c [Pei-Lun Lee] [SPARK-6352] [SQL] add test case
      0fc03ca [Pei-Lun Lee] [SPARK-6532] [SQL] hide class DirectParquetOutputCommitter
      769bd67 [Pei-Lun Lee] DirectParquetOutputCommitter
      f75e261 [Pei-Lun Lee] DirectParquetOutputCommitter
      [SPARK-7135][SQL] DataFrame expression for monotonically increasing IDs. · d94cd1a7
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5709 from rxin/inc-id and squashes the following commits:
      
      7853611 [Reynold Xin] private sql.
      a9fda0d [Reynold Xin] Missed a few numbers.
      343d896 [Reynold Xin] Self review feedback.
      a7136cb [Reynold Xin] [SPARK-7135][SQL] DataFrame expression for monotonically increasing IDs.
      [SPARK-7187] SerializationDebugger should not crash user code · bf35edd9
      Andrew Or authored
      rxin
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #5734 from andrewor14/ser-deb and squashes the following commits:
      
      e8aad6c [Andrew Or] NonFatal
      57d0ef4 [Andrew Or] try catch improveException
      [SPARK-5946] [STREAMING] Add Python API for direct Kafka stream · 9e4e82b7
      jerryshao authored
      Currently this only adds the `createDirectStream` API; I'm not sure if `createRDD` is also needed, since some Java objects need to be wrapped in Python. Please help to review, thanks a lot.
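      For reference, the Scala API that the new Python `createDirectStream` mirrors (`ssc` is an assumed existing StreamingContext; the broker address and topic name are examples):
      
      ```Scala
      import kafka.serializer.StringDecoder
      import org.apache.spark.streaming.kafka.KafkaUtils
      
      // Direct (receiver-less) Kafka stream, one Kafka partition per RDD partition.
      val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
      val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set("my-topic"))
      ```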
      
      Author: jerryshao <saisai.shao@intel.com>
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #4723 from jerryshao/direct-kafka-python-api and squashes the following commits:
      
      a1fe97c [jerryshao] Fix rebase issue
      eebf333 [jerryshao] Address the comments
      da40f4e [jerryshao] Fix Python 2.6 Syntax error issue
      5c0ee85 [jerryshao] Style fix
      4aeac18 [jerryshao] Fix bug in example code
      7146d86 [jerryshao] Add unit test
      bf3bdd6 [jerryshao] Add more APIs and address the comments
      f5b3801 [jerryshao] Small style fix
      8641835 [Saisai Shao] Rebase and update the code
      589c05b [Saisai Shao] Fix the style
      d6fcb6a [Saisai Shao] Address the comments
      dfda902 [Saisai Shao] Style fix
      0f7d168 [Saisai Shao] Add the doc and fix some style issues
      67e6880 [Saisai Shao] Fix test bug
      917b0db [Saisai Shao] Add Python createRDD API for Kafka direct stream
      c3fc11d [jerryshao] Modify the docs
      2c00936 [Saisai Shao] address the comments
      3360f44 [jerryshao] Fix code style
      e0e0f0d [jerryshao] Code clean and bug fix
      338c41f [Saisai Shao] Add python API and example for direct kafka stream
      [SPARK-6829] Added math functions for DataFrames · 29576e78
      Burak Yavuz authored
      Implemented almost all math functions found in scala.math (max, min and abs were already present).
      
      cc mengxr marmbrus
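      Typical usage of such functions, assuming a DataFrame `df` with numeric columns "x" and "y" (shown via the `org.apache.spark.sql.functions` entry point used by later Spark versions):
      
      ```Scala
      import org.apache.spark.sql.functions._
      
      val transformed = df.select(
        sqrt(df("x")),          // square root
        pow(df("x"), 2.0),      // power
        atan2(df("y"), df("x")) // two-argument arctangent
      )
      ```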
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #5616 from brkyvz/math-udfs and squashes the following commits:
      
      fb27153 [Burak Yavuz] reverted exception message
      836a098 [Burak Yavuz] fixed test and addressed small comment
      e5f0d13 [Burak Yavuz] addressed code review v2.2
      b26c5fb [Burak Yavuz] addressed review v2.1
      2761f08 [Burak Yavuz] addressed review v2
      6588a5b [Burak Yavuz] fixed merge conflicts
      b084e10 [Burak Yavuz] Addressed code review
      029e739 [Burak Yavuz] fixed atan2 test
      534cc11 [Burak Yavuz] added more tests, addressed comments
      fa68dbe [Burak Yavuz] added double specific test data
      937d5a5 [Burak Yavuz] use doubles instead of ints
      8e28fff [Burak Yavuz] Added apache header
      7ec8f7f [Burak Yavuz] Added math functions for DataFrames
  2. Apr 27, 2015
      [SPARK-7174][Core] Move calling `TaskScheduler.executorHeartbeatReceived` to another thread · 874a2ca9
      zsxwing authored
      `HeartbeatReceiver` will call `TaskScheduler.executorHeartbeatReceived`, which is a blocking operation because `TaskScheduler.executorHeartbeatReceived` will eventually call
      
      ```Scala
          blockManagerMaster.driverEndpoint.askWithReply[Boolean](
            BlockManagerHeartbeat(blockManagerId), 600 seconds)
      ```
      
      Even if it asks a local Actor, the call may block the current Akka thread: e.g., the reply may be dispatched to the same thread that issued the ask, so the reply can never be processed. An extreme case is setting the Akka dispatch thread pool size to 1.
      
      jstack log:
      
      ```
      "sparkDriver-akka.actor.default-dispatcher-14" daemon prio=10 tid=0x00007f2a8c02d000 nid=0x725 waiting on condition [0x00007f2b1d6d0000]
         java.lang.Thread.State: TIMED_WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x00000006197a0868> (a scala.concurrent.impl.Promise$CompletionLatch)
      	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
      	at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
      	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
      	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
      	at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
      	at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:169)
      	at scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)
      	at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:167)
      	at scala.concurrent.Await$.result(package.scala:107)
      	at org.apache.spark.rpc.RpcEndpointRef.askWithReply(RpcEnv.scala:355)
      	at org.apache.spark.scheduler.DAGScheduler.executorHeartbeatReceived(DAGScheduler.scala:169)
      	at org.apache.spark.scheduler.TaskSchedulerImpl.executorHeartbeatReceived(TaskSchedulerImpl.scala:367)
      	at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1.applyOrElse(HeartbeatReceiver.scala:103)
      	at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:182)
      	at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:128)
      	at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:203)
      	at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:127)
      	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
      	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
      	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
      	at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
      	at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
      	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
      	at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
      	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
      	at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:94)
      	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
      	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
      	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
      	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
      	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
      	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
      ```
      
      This PR moves this blocking operation to a separate thread.
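      A minimal sketch of the pattern (names are stand-ins, not the actual HeartbeatReceiver code): hand the blocking reply off to a dedicated single-threaded executor so the Akka dispatcher thread returns immediately.
      
      ```Scala
      import java.util.concurrent.Executors
      
      // Sketch only: run the blocking heartbeat handling off the dispatcher thread.
      val heartbeatExecutor = Executors.newSingleThreadExecutor()
      
      def onHeartbeat(report: () => Boolean)(reply: Boolean => Unit): Unit = {
        heartbeatExecutor.submit(new Runnable {
          override def run(): Unit = reply(report()) // may block, but not on an Akka thread
        })
      }
      ```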
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5723 from zsxwing/SPARK-7174 and squashes the following commits:
      
      98bfe48 [zsxwing] Use a single thread for checking timeout and reporting executorHeartbeatReceived
      5b3b545 [zsxwing] Move calling `TaskScheduler.executorHeartbeatReceived` to another thread to avoid blocking the Akka thread pool
      [SPARK-7090] [MLLIB] Introduce LDAOptimizer to LDA to further improve extensibility · 4d9e560b
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-7090
      
      LDA was implemented with extensibility in mind, and with the development of online LDA and Gibbs sampling we are collecting more detailed requirements from different algorithms.
      As Joseph Bradley jkbradley proposed in https://github.com/apache/spark/pull/4807, and after some further discussion, we'd like to adjust the code structure a little to present the common interface and extension point clearly.
      Basically, class LDA is the common entry point for LDA computation, and each LDA object refers to an LDAOptimizer for the concrete algorithm implementation. Users can customize an LDAOptimizer with specific parameters and assign it to LDA (see the sketch after the change list below).
      
      Concrete changes:
      
      1. Add a trait `LDAOptimizer`, which defines the common interface for concrete implementations. Each subclass is a wrapper for a specific LDA algorithm.
      
      2. Move EMOptimizer into the LDAOptimizer file, make it inherit from LDAOptimizer, and rename it to EMLDAOptimizer (in case a more generic EMOptimizer comes along in the future).
              - Adjust the constructor of EMOptimizer, since all the parameters should be passed in through the initialState method. This avoids unwanted confusion or overwrites.
              - Move the code from LDA.initialState into initialState of EMLDAOptimizer.
      
      3. Add property ldaOptimizer to LDA and its getter/setter, and EMLDAOptimizer is the default Optimizer.
      
      4. Change the return type of LDA.run from DistributedLDAModel to LDAModel.
      
      Further work:
      add OnlineLDAOptimizer and other possible Optimizers once ready.
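      A sketch of the resulting usage, assuming an existing `corpus: RDD[(Long, Vector)]`:
      
      ```Scala
      import org.apache.spark.mllib.clustering.{EMLDAOptimizer, LDA}
      
      // LDA is the common entry point; the optimizer is a pluggable strategy
      // (EMLDAOptimizer is the default).
      val lda = new LDA()
        .setK(10)
        .setOptimizer(new EMLDAOptimizer)
      val model = lda.run(corpus) // now returns LDAModel rather than DistributedLDAModel
      ```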
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #5661 from hhbyyh/ldaRefactor and squashes the following commits:
      
      0e2e006 [Yuhao Yang] respond to review comments
      08a45da [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into ldaRefactor
      e756ce4 [Yuhao Yang] solve mima exception
      d74fd8f [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into ldaRefactor
      0bb8400 [Yuhao Yang] refactor LDA with Optimizer
      ec2f857 [Yuhao Yang] prototype for discussion
      [SPARK-7162] [YARN] Launcher error in yarn-client · 62888a4d
      GuoQiang Li authored
      jira: https://issues.apache.org/jira/browse/SPARK-7162
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #5716 from witgo/SPARK-7162 and squashes the following commits:
      
      b64564c [GuoQiang Li] Launcher error in yarn-client
      [SPARK-7145] [CORE] commons-lang (2.x) classes used instead of commons-lang3... · ab5adb7a
      Sean Owen authored
      [SPARK-7145] [CORE] commons-lang (2.x) classes used instead of commons-lang3 (3.x); commons-io used without dependency
      
      Remove use of commons-lang in favor of commons-lang3 classes; remove commons-io use in favor of Guava
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5703 from srowen/SPARK-7145 and squashes the following commits:
      
      21fbe03 [Sean Owen] Remove use of commons-lang in favor of commons-lang3 classes; remove commons-io use in favor of Guava
      [SPARK-3090] [CORE] Stop SparkContext if user forgets to. · 5d45e1f6
      Marcelo Vanzin authored
      Set up a shutdown hook to try to stop the Spark context in
      case the user forgets to do it. The main effect is that any
      open log files are flushed and closed, which is particularly
      interesting for event logs.
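      A minimal sketch of the idea, using a plain JVM shutdown hook rather than Spark's internal hook machinery (`sc` is an assumed running SparkContext; stop() is safe to call even if the user already stopped the context):
      
      ```Scala
      // Sketch only: stop the context on JVM exit so open log files are flushed and closed.
      Runtime.getRuntime.addShutdownHook(new Thread("stop-spark-context") {
        override def run(): Unit = sc.stop()
      })
      ```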
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5696 from vanzin/SPARK-3090 and squashes the following commits:
      
      3b554b5 [Marcelo Vanzin] [SPARK-3090] [core] Stop SparkContext if user forgets to.
      [SPARK-6738] [CORE] Improve estimate the size of a large array · 8e1c00db
      Hong Shen authored
      Currently, SizeEstimator.visitArray is not correct in the following case:
      ```
      array size > 200,
      elements share the same object
      ```
      
      When I add a debug log in SizeTracker.scala:
      ```
       System.err.println(s"numUpdates:$numUpdates, size:$ts, bytesPerUpdate:$bytesPerUpdate, cost time:$b")
      ```
      I get the following log:
      ```
       numUpdates:1, size:262448, bytesPerUpdate:0.0, cost time:35
       numUpdates:2, size:420698, bytesPerUpdate:158250.0, cost time:35
       numUpdates:4, size:420754, bytesPerUpdate:28.0, cost time:32
       numUpdates:7, size:420754, bytesPerUpdate:0.0, cost time:27
       numUpdates:12, size:420754, bytesPerUpdate:0.0, cost time:28
       numUpdates:20, size:420754, bytesPerUpdate:0.0, cost time:25
       numUpdates:32, size:420754, bytesPerUpdate:0.0, cost time:21
       numUpdates:52, size:420754, bytesPerUpdate:0.0, cost time:20
       numUpdates:84, size:420754, bytesPerUpdate:0.0, cost time:20
       numUpdates:135, size:420754, bytesPerUpdate:0.0, cost time:20
       numUpdates:216, size:420754, bytesPerUpdate:0.0, cost time:11
       numUpdates:346, size:420754, bytesPerUpdate:0.0, cost time:6
       numUpdates:554, size:488911, bytesPerUpdate:327.67788461538464, cost time:8
       numUpdates:887, size:2312259426, bytesPerUpdate:6942253.798798799, cost time:198
      15/04/21 14:27:26 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling in-memory map of 3.0 GB to disk (1 time so far)
      15/04/21 14:27:26 INFO collection.ExternalAppendOnlyMap: /data11/yarnenv/local/usercache/spark/appcache/application_1426746631567_11745/spark-local-20150421142719-c001/30/temp_local_066af981-c2fc-4b70-a00e-110e23006fbc
      ```
      But in fact the file size is only 162K:
      ```
      $ ll -h /data11/yarnenv/local/usercache/spark/appcache/application_1426746631567_11745/spark-local-20150421142719-c001/30/temp_local_066af981-c2fc-4b70-a00e-110e23006fbc
      -rw-r----- 1 spark users 162K Apr 21 14:27 /data11/yarnenv/local/usercache/spark/appcache/application_1426746631567_11745/spark-local-20150421142719-c001/30/temp_local_066af981-c2fc-4b70-a00e-110e23006fbc
      ```
      
      To test this case, I changed visitArray to:
      ```
      var size = 0L
      for (i <- 0 until length) {
        val obj = JArray.get(array, i)
        size += SizeEstimator.estimate(obj, state.visited).toLong
      }
      state.size += size
      ```
      I get the following log:
      ```
      ...
      14895 277016088 566.9046118590662 time:8470
      23832 281840544 552.3308270676691 time:8031
      38132 289891824 539.8294729775092 time:7897
      61012 302803640 563.0265734265735 time:13044
      97620 322904416 564.3276223776223 time:13554
      15/04/14 11:46:43 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling in-memory map of 314.5 MB to disk (1 time so far)
      15/04/14 11:46:43 INFO collection.ExternalAppendOnlyMap: /data1/yarnenv/local/usercache/spark/appcache/application_1426746631567_8477/spark-local-20150414114020-2fcb/14/temp_local_5b6b98d5-5bfa-47e2-8216-059482ccbda0
      ```
      The file size is 85M:
      ```
      $ ll -h /data1/yarnenv/local/usercache/spark/appcache/application_1426746631567_8477/spark-local-20150414114020-2fcb/14/
      total 85M
      -rw-r----- 1 spark users 85M Apr 14 11:46 temp_local_5b6b98d5-5bfa-47e2-8216-059482ccbda0
      ```
      
      The following log is from a run with this patch:
      ```
      ....
      numUpdates:32, size:365484, bytesPerUpdate:0.0, cost time:7
      numUpdates:52, size:365484, bytesPerUpdate:0.0, cost time:5
      numUpdates:84, size:365484, bytesPerUpdate:0.0, cost time:5
      numUpdates:135, size:372208, bytesPerUpdate:131.84313725490196, cost time:86
      numUpdates:216, size:379020, bytesPerUpdate:84.09876543209876, cost time:21
      numUpdates:346, size:1865208, bytesPerUpdate:11432.215384615385, cost time:23
      numUpdates:554, size:2052380, bytesPerUpdate:899.8653846153846, cost time:16
      numUpdates:887, size:2142820, bytesPerUpdate:271.59159159159157, cost time:15
      ..
      numUpdates:14895, size:251675500, bytesPerUpdate:438.5263157894737, cost time:13
      numUpdates:23832, size:257010268, bytesPerUpdate:596.9305135951662, cost time:14
      numUpdates:38132, size:263922396, bytesPerUpdate:483.3655944055944, cost time:15
      numUpdates:61012, size:268962596, bytesPerUpdate:220.28846153846155, cost time:24
      numUpdates:97620, size:286980644, bytesPerUpdate:492.1888111888112, cost time:22
      15/04/21 14:45:12 INFO collection.ExternalAppendOnlyMap: Thread 53 spilling in-memory map of 328.7 MB to disk (1 time so far)
      15/04/21 14:45:12 INFO collection.ExternalAppendOnlyMap: /data4/yarnenv/local/usercache/spark/appcache/application_1426746631567_11758/spark-local-20150421144456-a2a5/2a/temp_local_9c109510-af16-4468-8f23-48cad04da88f
      ```
      The file size is 88M:
      ```
      $ ll -h /data4/yarnenv/local/usercache/spark/appcache/application_1426746631567_11758/spark-local-20150421144456-a2a5/2a/
      total 88M
      -rw-r----- 1 spark users 88M Apr 21 14:45 temp_local_9c109510-af16-4468-8f23-48cad04da88f
      ```
      
      Author: Hong Shen <hongshen@tencent.com>
      
      Closes #5608 from shenh062326/my_change5 and squashes the following commits:
      
      5506bae [Hong Shen] Fix compile error
      c275dd3 [Hong Shen] Alter code style
      fe202a2 [Hong Shen] Change the code style and add documentation.
      a9fca84 [Hong Shen] Add test case for SizeEstimator
      4877eee [Hong Shen] Improve estimate the size of a large array
      a2ea7ac [Hong Shen] Alter code style
      4c28e36 [Hong Shen] Improve estimate the size of a large array
      [SPARK-7103] Fix crash with SparkContext.union when RDD has no partitioner · b9de9e04
      Steven She authored
      Added a check to the SparkContext.union method to check that a partitioner is defined on all RDDs when instantiating a PartitionerAwareUnionRDD.
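      A hedged sketch of the check (PartitionerAwareUnionRDD is internal to Spark, so this mirrors the logic rather than being a drop-in):
      
      ```Scala
      import scala.reflect.ClassTag
      import org.apache.spark.SparkContext
      import org.apache.spark.rdd.{PartitionerAwareUnionRDD, RDD, UnionRDD}
      
      // Only take the partitioner-aware path when every input RDD has the same
      // defined partitioner; otherwise fall back to a plain UnionRDD.
      def safeUnion[T: ClassTag](sc: SparkContext, rdds: Seq[RDD[T]]): RDD[T] = {
        val partitioners = rdds.flatMap(_.partitioner).toSet
        if (rdds.forall(_.partitioner.isDefined) && partitioners.size == 1) {
          new PartitionerAwareUnionRDD(sc, rdds)
        } else {
          new UnionRDD(sc, rdds)
        }
      }
      ```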
      
      Author: Steven She <steven@canopylabs.com>
      
      Closes #5679 from stevencanopy/SPARK-7103 and squashes the following commits:
      
      5a3d846 [Steven She] SPARK-7103: Fix crash with SparkContext.union when at least one RDD has no partitioner
      [SPARK-6991] [SPARKR] Adds support for zipPartitions. · ca9f4ebb
      hlin09 authored
      Author: hlin09 <hlin09pu@gmail.com>
      
      Closes #5568 from hlin09/zipPartitions and squashes the following commits:
      
      12c08a5 [hlin09] Fix comments
      d2d32db [hlin09] Merge branch 'master' into zipPartitions
      ec56d2f [hlin09] Fix test.
      27655d3 [hlin09] Adds support for zipPartitions.
      SPARK-7107 Add parameter for zookeeper.znode.parent to hbase_inputformat.py · ef82bddc
      tedyu authored
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #5673 from tedyu/master and squashes the following commits:
      
      ab7c72b [tedyu] SPARK-7107 Adjust indentation to pass Python style tests
      6e25939 [tedyu] Adjust line length to be shorter than 100 characters
      18d172a [tedyu] SPARK-7107 Add parameter for zookeeper.znode.parent to hbase_inputformat.py
      [SPARK-6856] [R] Make RDD information more useful in SparkR · 7078f602
      Jeff Harrison authored
      Author: Jeff Harrison <jeffrharrison@gmail.com>
      
      Closes #5667 from His-name-is-Joof/joofspark and squashes the following commits:
      
      f8814a6 [Jeff Harrison] newline added after RDD show() output
      4d9d972 [Jeff Harrison] Merge branch 'master' into joofspark
      9d2295e [Jeff Harrison] parallelize with 1:10
      878b830 [Jeff Harrison] Merge branch 'master' into joofspark
      c8c0b80 [Jeff Harrison] add test for RDD function show()
      123be65 [Jeff Harrison] SPARK-6856
      [SPARK-4925] Publish Spark SQL hive-thriftserver maven artifact · 998aac21
      Misha Chernetsov authored
      Turned on the hive-thriftserver profile in the release script.
      
      Author: Misha Chernetsov <chernetsov@gmail.com>
      
      Closes #5429 from chernetsov/master and squashes the following commits:
      
      9cc36af [Misha Chernetsov] [SPARK-4925] Publish Spark SQL hive-thriftserver maven artifact turned on hive-thriftserver profile in release script for scala 2.10
      [SPARK-6505] [SQL] Remove the reflection call in HiveFunctionWrapper · 82bb7fd4
      baishuo authored
      Following liancheng's comment in https://issues.apache.org/jira/browse/SPARK-6505, this patch removes the reflection call in HiveFunctionWrapper and implements the functions "deserializeObjectByKryo" and "serializeObjectByKryo" after the functions of the same name in
      org.apache.hadoop.hive.ql.exec.Utilities.java.
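      A minimal sketch of such direct implementations, modeled on the Hive utility methods of the same names (plain Kryo API usage, not the exact Spark code):
      
      ```Scala
      import java.io.{InputStream, OutputStream}
      import com.esotericsoftware.kryo.Kryo
      import com.esotericsoftware.kryo.io.{Input, Output}
      
      def serializeObjectByKryo(kryo: Kryo, obj: AnyRef, out: OutputStream): Unit = {
        val output = new Output(out)
        kryo.writeObject(output, obj)
        output.close()
      }
      
      def deserializeObjectByKryo[T](kryo: Kryo, in: InputStream, clazz: Class[T]): T = {
        val input = new Input(in)
        val obj = kryo.readObject(input, clazz)
        input.close()
        obj
      }
      ```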
      
      Author: baishuo <vc_java@hotmail.com>
      
      Closes #5660 from baishuo/SPARK-6505-20150423 and squashes the following commits:
      
      ae61ec4 [baishuo] modify code style
      78d9fa3 [baishuo] modify code style
      0b522a7 [baishuo] modify code style
      a5ff9c7 [baishuo] Remove the reflection call in HiveFunctionWrapper
  3. Apr 26, 2015
      [SQL][Minor] rename DataTypeParser.apply to DataTypeParser.parse · d188b8ba
      wangfei authored
      Rename DataTypeParser.apply to DataTypeParser.parse to make it clearer and more readable.
      /cc rxin
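      After the rename, call sites read naturally (the parser is internal to Spark SQL, so this is illustrative):
      
      ```Scala
      import org.apache.spark.sql.types.DataTypeParser
      
      val t = DataTypeParser.parse("array<struct<name:string,age:int>>")
      ```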
      
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #5710 from scwf/apply and squashes the following commits:
      
      c319977 [wangfei] rename apply to parse
      [SPARK-7152][SQL] Add a Column expression for partition ID. · ca55dc95
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5705 from rxin/df-pid and squashes the following commits:
      
      401018f [Reynold Xin] [SPARK-7152][SQL] Add a Column expression for partition ID.
      [MINOR] [MLLIB] Refactor toString method in MLLIB · 9a5bbe05
      Alain authored
      1. predict(predict.toString) already outputs the prefix "predict", so printing ", predict = " again is redundant.
      2. There are some extra spaces.
      
      Author: Alain <aihe@usc.edu>
      
      Closes #5687 from AiHe/tree-node-issue-2 and squashes the following commits:
      
      9862b9a [Alain] Pass scala coding style checking
      44ba947 [Alain] Minor][MLLIB] Format toString method in MLLIB
      bdc402f [Alain] [Minor][MLLIB] Fix a formatting bug in toString method in Node
      426eee7 [Alain] [Minor][MLLIB] Fix a formatting bug in toString method in Node.scala
  4. Apr 25, 2015
      [SPARK-6014] [CORE] [HOTFIX] Add try-catch block around ShutDownHook · f5473c2b
      Nishkam Ravi authored
      Add a try/catch block around removeShutdownHook; otherwise an IllegalStateException is thrown in YARN cluster mode (see https://github.com/apache/spark/pull/4690).
      
      cc andrewor14, srowen
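      The shape of the hotfix, as a self-contained sketch (removing a JVM shutdown hook while shutdown is already in progress throws IllegalStateException):
      
      ```Scala
      def removeShutdownHookSafely(hook: Thread): Unit = {
        try {
          Runtime.getRuntime.removeShutdownHook(hook)
        } catch {
          // Thrown if the JVM is already shutting down when we try to deregister.
          case e: IllegalStateException =>
            System.err.println(s"Could not remove shutdown hook: ${e.getMessage}")
        }
      }
      ```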
      
      Author: Nishkam Ravi <nravi@cloudera.com>
      Author: nishkamravi2 <nishkamravi@gmail.com>
      Author: nravi <nravi@c1704.halxg.cloudera.com>
      
      Closes #5672 from nishkamravi2/master_nravi and squashes the following commits:
      
      0f1abd0 [nishkamravi2] Update Utils.scala
      474e3bf [nishkamravi2] Update DiskBlockManager.scala
      97c383e [nishkamravi2] Update Utils.scala
      8691e0c [Nishkam Ravi] Add a try/catch block around Utils.removeShutdownHook
      2be1e76 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      1c13b79 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      bad4349 [nishkamravi2] Update Main.java
      36a6f87 [Nishkam Ravi] Minor changes and bug fixes
      b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
      d9658d6 [Nishkam Ravi] Changes for SPARK-6406
      ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
      345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ac58975 [Nishkam Ravi] spark-class changes
      06bfeb0 [nishkamravi2] Update spark-class
      35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
      4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
      746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
      bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      d453197 [nishkamravi2] Update NewHadoopRDD.scala
      6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
      0ce2c32 [nishkamravi2] Update HadoopRDD.scala
      f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
      71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      494d8c0 [nishkamravi2] Update DiskBlockManager.scala
      3c5ddba [nishkamravi2] Update DiskBlockManager.scala
      f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
      79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
      535295a [nishkamravi2] Update TaskSetManager.scala
      3e1b616 [Nishkam Ravi] Modify test for maxResultSize
      9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
      5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      636a9ff [nishkamravi2] Update YarnAllocator.scala
      8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
      35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
      5ac2ec1 [Nishkam Ravi] Remove out
      dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
      42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
      362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
      c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
      1cf2d1e [nishkamravi2] Update YarnAllocator.scala
      ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
      2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
      2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
      3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
      5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
      eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
      df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
      6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
      5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
      681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
      [SPARK-7092] Update spark scala version to 2.11.6 · a11c8683
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #5662 from ScrapCodes/SPARK-7092/scala-update-2.11.6 and squashes the following commits:
      
      58cf4f9 [Prashant Sharma] [SPARK-7092] Update spark scala version to 2.11.6
      [SQL] Update SQL readme to include instructions on generating golden answer... · aa6966ff
      Yin Huai authored
      [SQL] Update SQL readme to include instructions on generating golden answer files based on Hive 0.13.1.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #5702 from yhuai/howToGenerateGoldenFiles and squashes the following commits:
      
      9c4a7f8 [Yin Huai] Update readme to include instructions on generating golden answer files based on Hive 0.13.1.
      [SPARK-6113] [ML] Tree ensembles for Pipelines API · a7160c4e
      Joseph K. Bradley authored
      This is a continuation of [https://github.com/apache/spark/pull/5530] (which was for Decision Trees), but for ensembles: Random Forests and Gradient-Boosted Trees.  Please refer to the JIRA [https://issues.apache.org/jira/browse/SPARK-6113], the design doc linked from the JIRA, and the previous PR linked above for design discussions.
      
      This PR follows the example set by the previous PR for Decision Trees.  It includes a few cleanups to Decision Trees.
      
      Note: There is one issue which will be addressed in a separate PR: Ensembles' component Models have no parent or fittingParamMap.  I plan to submit a separate PR which makes those values in Model be Options.  It does not matter much which PR gets merged first.
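      A brief sketch of the new Pipelines-API ensembles in use, assuming a DataFrame `training` with "label" and "features" columns:
      
      ```Scala
      import org.apache.spark.ml.{Pipeline, PipelineStage}
      import org.apache.spark.ml.classification.RandomForestClassifier
      
      val rf = new RandomForestClassifier()
        .setNumTrees(20)
        .setMaxDepth(5)
      val model = new Pipeline().setStages(Array[PipelineStage](rf)).fit(training)
      ```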
      
      CC: mengxr manishamde codedeft chouqin
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #5626 from jkbradley/dt-api-ensembles and squashes the following commits:
      
      729167a [Joseph K. Bradley] small cleanups based on code review
      bbae2a2 [Joseph K. Bradley] Updated per all comments in code review
      855aa9a [Joseph K. Bradley] scala style fix
      ea3d901 [Joseph K. Bradley] Added GBT to spark.ml, with tests and examples
      c0f30c1 [Joseph K. Bradley] Added random forests and test suites to spark.ml.  Not tested yet.  Need to add example as well
      d045ebd [Joseph K. Bradley] some more updates, but far from done
      ee1a10b [Joseph K. Bradley] Added files from old PR and did some initial updates.