  1. Oct 20, 2014
    • [SPARK-3940][SQL] Avoid console printing error messages three times · 0fe1c093
      wangxiaojing authored
      For an invalid SQL statement, the console currently prints the error message three times instead of once.
      e.g.:
      <pre>
      spark-sql> show tabless;
      show tabless;
      14/10/13 21:03:48 INFO ParseDriver: Parsing command: show tabless
      ............
      	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:274)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
      	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:209)
      	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
      Caused by: org.apache.hadoop.hive.ql.parse.ParseException: line 1:5 cannot recognize input near 'show' 'tabless' '<EOF>' in ddl statement
      
      	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:193)
      	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:161)
      	at org.apache.spark.sql.hive.HiveQl$.getAst(HiveQl.scala:218)
      	at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:226)
      	... 47 more
      Time taken: 4.35 seconds
      14/10/13 21:03:51 INFO CliDriver: Time taken: 4.35 seconds
      </pre>
      
      Author: wangxiaojing <u9jing@gmail.com>
      
      Closes #2790 from wangxiaojing/spark-3940 and squashes the following commits:
      
      e2e5c14 [wangxiaojing] sql Print the error code three times
      0fe1c093
    • [SPARK-3969][SQL] Optimizer should have a super class as an interface. · 7586e2e6
      Takuya UESHIN authored
      Some developers want to replace `Optimizer` to fit their projects but can't do so because currently `Optimizer` is an `object`.
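      As a hedged sketch of the refactoring's shape (types simplified to stay self-contained; in Catalyst the base is `RuleExecutor[LogicalPlan]`, and the `DefaultOptimizer` name follows the renames in the squashed commits below):
      <pre>
      // Minimal model of the change: Optimizer goes from `object` to an
      // abstract class, so projects can substitute their own instance.
      trait Rule[T] { def apply(plan: T): T }

      abstract class RuleExecutor[T] {
        protected def rules: Seq[Rule[T]]
        def execute(plan: T): T = rules.foldLeft(plan)((p, r) => r(p))
      }

      abstract class Optimizer extends RuleExecutor[String]

      object DefaultOptimizer extends Optimizer {
        protected def rules: Seq[Rule[String]] = Seq.empty // standard batches elided
      }

      // A downstream project can now plug in a custom optimizer:
      object MyProjectOptimizer extends Optimizer {
        protected def rules = Seq(new Rule[String] {
          def apply(plan: String): String = plan.trim // toy rule
        })
      }
      </pre>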
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #2825 from ueshin/issues/SPARK-3969 and squashes the following commits:
      
      abbc53c [Takuya UESHIN] Re-rename Optimizer object.
      4d2e1bc [Takuya UESHIN] Rename Optimizer object.
      9547a23 [Takuya UESHIN] Extract abstract class from Optimizer for developers to be able to replace Optimizer.
      7586e2e6
    • [SPARK-3945] Properties of hive-site.xml are not applied when running the Thrift JDBC server · fce1d416
      luogankun authored
      Write the properties of hive-site.xml to HiveContext when initializing the session state in SparkSQLEnv.scala.
      
      The method SparkSQLEnv.init() in HiveThriftServer2.scala cannot write the properties of hive-site.xml to HiveContext, so a configuration property such as spark.sql.shuffle.partitions added to hive-site.xml has no effect.
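      A minimal sketch of the fix's intent, assuming the session's hive-site.xml properties are available as a java.util.Properties (`setConf` is the real SQLContext/HiveContext setter of that era; the helper itself is hypothetical):
      <pre>
      import scala.collection.JavaConverters._

      // Illustrative: push every hive-site.xml property into the HiveContext's
      // SQLConf during SparkSQLEnv.init, instead of dropping it on the floor.
      def propagate(hiveContext: org.apache.spark.sql.hive.HiveContext,
                    hiveSiteProps: java.util.Properties): Unit = {
        hiveSiteProps.asScala.foreach { case (key, value) =>
          hiveContext.setConf(key, value)
        }
      }
      </pre>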
      
      Author: luogankun <luogankun@gmail.com>
      
      Closes #2800 from luogankun/SPARK-3945 and squashes the following commits:
      
      3679efc [luogankun] [SPARK-3945]Write properties of hive-site.xml to HiveContext when initilize session state In SparkSQLEnv.scala
      fce1d416
    • [SPARK-3966][SQL] Fix nullabilities of Cast related to DateType. · 364d52b7
      Takuya UESHIN authored
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #2820 from ueshin/issues/SPARK-3966 and squashes the following commits:
      
      ca4a745 [Takuya UESHIN] Fix nullabilities of Cast related to DateType.
      364d52b7
    • [SPARK-3800][SQL] Clean aliases from grouping expressions · e9c1afa8
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2658 from marmbrus/nestedAggs and squashes the following commits:
      
      862b763 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into nestedAggs
      3234521 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into nestedAggs
      8b06fdc [Michael Armbrust] possible fix for grouping on nested fields
      e9c1afa8
    • [SPARK-3906][SQL] Adds multiple join support for SQLContext · 1b3ce61c
      Cheng Lian authored
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2767 from liancheng/multi-join and squashes the following commits:
      
      9dc0d18 [Cheng Lian] Adds multiple join support for SQLContext
      1b3ce61c
    • [SPARK-3207][MLLIB]Choose splits for continuous features in DecisionTree more adaptively · eadc4c59
      Qiping Li authored
      DecisionTree splits on continuous features by choosing an array of values from a subsample of the data.
      Currently, it does not check for identical values in the subsample, so it could end up with multiple copies of the same split. In this PR, we choose splits for a continuous feature in 3 steps:
      
      1. Sort sample values for this feature
      2. Get number of occurrence of each distinct value
      3. Iterate over the value-count array computed in step 2 to choose splits.
      
      After the splits are found, `numSplits` and `numBins` in the metadata are updated.
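      A self-contained sketch of the three steps (illustrative, not the actual MLlib code; the evenly spaced target count between splits is an assumption):
      <pre>
      // Choose up to numSplits thresholds for one continuous feature so that
      // duplicated sample values can never produce duplicated splits.
      def chooseSplits(sample: Array[Double], numSplits: Int): Array[Double] = {
        // Step 1: sort the sampled values.
        val sorted = sample.sorted
        // Step 2: count occurrences of each distinct value.
        val counts = scala.collection.mutable.ArrayBuffer.empty[(Double, Int)]
        for (v <- sorted) {
          if (counts.nonEmpty && counts.last._1 == v)
            counts(counts.length - 1) = (v, counts.last._2 + 1)
          else
            counts += ((v, 1))
        }
        // Step 3: walk the value-count array, emitting a split whenever the
        // cumulative count crosses an evenly spaced target.
        val stride = sample.length.toDouble / (numSplits + 1)
        var cumulative = 0
        var target = stride
        val splits = scala.collection.mutable.ArrayBuffer.empty[Double]
        for ((v, c) <- counts.dropRight(1)) { // never split above the max value
          cumulative += c
          if (cumulative >= target) { splits += v; target += stride }
        }
        splits.toArray // fewer than numSplits when there are few distinct values
      }
      </pre>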
      
      CC: mengxr manishamde jkbradley, please help me review this, thanks.
      
      Author: Qiping Li <liqiping1991@gmail.com>
      Author: chouqin <liqiping1991@gmail.com>
      Author: liqi <liqiping1991@gmail.com>
      Author: qiping.lqp <qiping.lqp@alibaba-inc.com>
      
      Closes #2780 from chouqin/dt-findsplits and squashes the following commits:
      
      18d0301 [Qiping Li] check explicitly findsplits return distinct splits
      8dc28ab [chouqin] remove blank lines
      ffc920f [chouqin] adjust code based on comments and add more test cases
      9857039 [chouqin] Merge branch 'master' of https://github.com/apache/spark into dt-findsplits
      d353596 [qiping.lqp] fix pyspark doc test
      9e64699 [Qiping Li] fix random forest unit test
      3c72913 [Qiping Li] fix random forest unit test
      092efcb [Qiping Li] fix bug
      f69f47f [Qiping Li] fix bug
      ab303a4 [Qiping Li] fix bug
      af6dc97 [Qiping Li] fix bug
      2a8267a [Qiping Li] fix bug
      c339a61 [Qiping Li] fix bug
      369f812 [Qiping Li] fix style
      8f46af6 [Qiping Li] add comments and unit test
      9e7138e [Qiping Li] Merge branch 'dt-findsplits' of https://github.com/chouqin/spark into dt-findsplits
      1b25a35 [Qiping Li] Merge branch 'master' of https://github.com/apache/spark into dt-findsplits
      0cd744a [liqi] fix bug
      3652823 [Qiping Li] fix bug
      af7cb79 [Qiping Li] Choose splits for continuous features in DecisionTree more adaptively
      eadc4c59
    • [SPARK-3736] Workers reconnect when disassociated from the master. · 4afe9a48
      mcheah authored
      Before, if the master node is killed and restarted, the worker nodes
      would not attempt to reconnect to the Master. Therefore, when the Master
      node was restarted, the worker nodes needed to be restarted as well.
      
      Now, when the Master node is disconnected, the worker nodes will
      continuously ping the master node in attempts to reconnect to it. Once
      the master node restarts, it will detect one of the registration
      requests from its former workers. The result is that the cluster
      re-enters a healthy state.
      
      In addition, a worker was removed when the master stopped receiving
      its heartbeats, but the master used to ignore any heartbeat that
      worker sent afterwards. Now, a master that receives a heartbeat from
      a worker that had been disconnected will ask the worker to re-attempt
      the registration process, at which point the worker will send a
      RegisterWorker request and be re-connected accordingly.
      
      Re-connection attempts per worker are submitted every N seconds, where N
      is configured by the property spark.worker.reconnect.interval - this has
      a default of 60 seconds right now.
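      A schematic of that retry loop (names illustrative; per the squashed commits, the real logic lives in the Worker actor's registerWithMaster):
      <pre>
      import java.util.concurrent.{Executors, TimeUnit}

      // Schematic: after a disassociation, re-send RegisterWorker every
      // `intervalSeconds` until the (restarted) master acknowledges us.
      def retryRegistration(intervalSeconds: Long)(register: () => Boolean): Unit = {
        val scheduler = Executors.newSingleThreadScheduledExecutor()
        val attempt = new Runnable {
          // register() sends RegisterWorker; true means the master acked.
          def run(): Unit = if (register()) scheduler.shutdown()
        }
        scheduler.scheduleAtFixedRate(attempt, 0L, intervalSeconds, TimeUnit.SECONDS)
      }
      </pre>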
      
      Author: mcheah <mcheah@palantir.com>
      
      Closes #2828 from mccheah/reconnect-dead-workers and squashes the following commits:
      
      83f8bc9 [mcheah] [SPARK-3736] More informative log message, and fixing some indentation.
      fe0e02f [mcheah] [SPARK-3736] Moving reconnection logic to registerWithMaster().
      94ddeca [mcheah] [SPARK-3736] Changing a log warning to a log info.
      a698e35 [mcheah] [SPARK-3736] Addressing PR comment to make some defs private.
      b9a3077 [mcheah] [SPARK-3736] Addressing PR comments related to reconnection.
      2ad5ed5 [mcheah] [SPARK-3736] Cancel attempts to reconnect if the master changes.
      b5b34af [mcheah] [SPARK-3736] Workers reconnect when disassociated from the master.
      4afe9a48
    • [SPARK-3986][SQL] Fix package names to fit their directory names. · ea054e1f
      Takuya UESHIN authored
      The package names of two test suites differ from their directory names.
      - `GeneratedEvaluationSuite`
      - `GeneratedMutableEvaluationSuite`
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #2835 from ueshin/issues/SPARK-3986 and squashes the following commits:
      
      fa2cc05 [Takuya UESHIN] Fix package names to fit their directory names.
      ea054e1f
    • [SPARK-4010][Web UI]Spark UI returns 500 in yarn-client mode · 51afde9d
      GuoQiang Li authored
      The problem was caused by #1966.
      CC YanTangZhai andrewor14
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2858 from witgo/SPARK-4010 and squashes the following commits:
      
      9866fbf [GuoQiang Li] Spark UI returns 500 in yarn-client mode
      51afde9d
    • [SPARK-3948][Shuffle]Fix stream corruption bug in sort-based shuffle · c7aeecd0
      jerryshao authored
      A kernel 2.6.32 bug leads to unexpected behavior of transferTo in copyStream, which can corrupt the shuffle output file in sort-based shuffle and later surface as PARSING_ERROR(2), deserialization errors, or offset-out-of-range errors. This is fixed by opening the output file with the append flag and adding some position-checking code. Details can be seen in [SPARK-3948](https://issues.apache.org/jira/browse/SPARK-3948).
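      A simplified sketch of the guarded copy (the real change touches Utils.copyStream and the sorter; the helper below is illustrative):
      <pre>
      import java.io.{File, FileInputStream, FileOutputStream}

      // Sketch: open the destination with the append flag rather than trusting
      // transferTo to honor the channel position (broken on kernel 2.6.32),
      // and verify afterwards that exactly the expected bytes were appended.
      def appendChecked(src: File, dst: File): Long = {
        val in = new FileInputStream(src).getChannel
        val out = new FileOutputStream(dst, true).getChannel // append = true
        try {
          val startPos = out.position()
          val size = in.size()
          var copied = 0L
          while (copied < size) {
            copied += in.transferTo(copied, size - copied, out)
          }
          val advanced = out.position() - startPos
          require(advanced == size,
            s"Expected to append $size bytes but position advanced $advanced")
          copied
        } finally {
          in.close()
          out.close()
        }
      }
      </pre>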
      
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #2824 from jerryshao/SPARK-3948 and squashes the following commits:
      
      be0533a [jerryshao] Address the comments
      a82b184 [jerryshao] add configuration to control the NIO way of copying stream
      e17ada2 [jerryshao] Fix kernel 2.6.32 bug led unexpected behavior of transferTo
      c7aeecd0
  2. Oct 19, 2014
    • [SPARK-3902] [SPARK-3590] Stabilize AsyncRDDActions and add Java API · d1966f3a
      Josh Rosen authored
      This PR adds a Java API for AsyncRDDActions and promotes the API from `Experimental` to stable.
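      For reference, a spark-shell-style usage sketch of the now-stable Scala side (countAsync and FutureAction are real API; the Java API added here mirrors these methods on JavaRDD):
      <pre>
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.SparkContext._ // brings in AsyncRDDActions (1.x)
      import scala.concurrent.ExecutionContext.Implicits.global
      import scala.util.{Failure, Success}

      // Async actions return a FutureAction: a Scala Future that can also
      // cancel the underlying Spark job.
      val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("async"))
      val future = sc.parallelize(1 to 1000).countAsync()
      future.onComplete {
        case Success(n) => println(s"count = $n")
        case Failure(e) => e.printStackTrace()
      }
      // future.cancel() would cancel the job instead of waiting for it.
      </pre>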
      
      Author: Josh Rosen <joshrosen@apache.org>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #2760 from JoshRosen/async-rdd-actions-in-java and squashes the following commits:
      
      0d45fbc [Josh Rosen] Whitespace fix.
      ad3ae53 [Josh Rosen] Merge remote-tracking branch 'origin/master' into async-rdd-actions-in-java
      c0153a5 [Josh Rosen] Remove unused variable.
      e8e2867 [Josh Rosen] Updates based on Marcelo's review feedback
      7a1417f [Josh Rosen] Removed unnecessary java.util import.
      6f8f6ac [Josh Rosen] Fix import ordering.
      ff28e49 [Josh Rosen] Add MiMa excludes and fix a scalastyle error.
      346e46e [Josh Rosen] [SPARK-3902] Stabilize AsyncRDDActions; add Java API.
      d1966f3a
    • [SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport) · 7e63bb49
      Josh Rosen authored
      
      This patch attempts to fix SPARK-2546 in `branch-1.0` and `branch-1.1`.  The underlying problem is that thread-safety issues in Hadoop Configuration objects may cause Spark tasks to get stuck in infinite loops.  The approach taken here is to clone a new copy of the JobConf for each task rather than sharing a single copy between tasks.  Note that there are still Configuration thread-safety issues that may affect the driver, but these seem much less likely to occur in practice and will be more complex to fix (see discussion on the SPARK-2546 ticket).
      
      This cloning is guarded by a new configuration option (`spark.hadoop.cloneConf`) and is disabled by default in order to avoid unexpected performance regressions for workloads that are unaffected by the Configuration thread-safety issues.
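      A sketch of that guard (simplified; the real code lives in HadoopRDD and synchronizes the copy because the Configuration copy constructor itself is not thread-safe, which is the root cause of SPARK-2546):
      <pre>
      import org.apache.hadoop.mapred.JobConf

      // Only pay the per-task cloning cost when spark.hadoop.cloneConf is set;
      // otherwise keep the old behavior of sharing one JobConf across tasks.
      def jobConfForTask(shared: JobConf, cloneConf: Boolean): JobConf =
        if (cloneConf) {
          shared.synchronized { new JobConf(shared) } // guarded copy per task
        } else {
          shared
        }
      </pre>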
      
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #2684 from JoshRosen/jobconf-fix-backport and squashes the following commits:
      
      f14f259 [Josh Rosen] Add configuration option to control cloning of Hadoop JobConf.
      b562451 [Josh Rosen] Remove unused jobConfCacheKey field.
      dd25697 [Josh Rosen] [SPARK-2546] [1.0 / 1.1 backport] Clone JobConf for each task.
      
      (cherry picked from commit 2cd40db2)
      Signed-off-by: Josh Rosen <joshrosen@databricks.com>
      
      Conflicts:
      	core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
      7e63bb49
  3. Oct 18, 2014
    • [SPARK-3952] [Streaming] [PySpark] add Python examples in Streaming Programming Guide · 05db2da7
      Davies Liu authored
      Add Python examples to the Streaming Programming Guide.
      
      Also add RecoverableNetworkWordCount example.
      
      Author: Davies Liu <davies.liu@gmail.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #2808 from davies/pyguide and squashes the following commits:
      
      8d4bec4 [Davies Liu] update readme
      26a7e37 [Davies Liu] fix format
      3821c4d [Davies Liu] address comments, add missing file
      7e4bb8a [Davies Liu] add Python examples in Streaming Programming Guide
      05db2da7
    • SPARK-3926 [CORE] Result of JavaRDD.collectAsMap() is not Serializable · f406a839
      Sean Owen authored
      Make JavaPairRDD.collectAsMap result Serializable since Java Maps generally are
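      The gist in a hedged sketch (the actual patch adds a serializable wrapper in JavaUtils rather than copying; a copy into java.util.HashMap is the simplest serializable stand-in, and the helper name is hypothetical):
      <pre>
      import java.{util => ju}
      import scala.collection.JavaConverters._

      // The default Scala-to-Java map wrapper is not java.io.Serializable,
      // which is what broke collectAsMap() results. Returning a map type
      // that is Serializable (HashMap here) fixes the symptom.
      def asSerializableJavaMap[K, V](m: scala.collection.Map[K, V]): ju.Map[K, V] =
        new ju.HashMap[K, V](m.asJava)
      </pre>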
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2805 from srowen/SPARK-3926 and squashes the following commits:
      
      ecb78ee [Sean Owen] Fix conflict between java.io.Serializable and use of Scala's Serializable
      f4717f9 [Sean Owen] Oops, fix compile problem
      ae1b36f [Sean Owen] Expand to cover Maps returned from other Java API methods as well
      51c26c2 [Sean Owen] Make JavaPairRDD.collectAsMap result Serializable since Java Maps generally are
      f406a839
  4. Oct 17, 2014
    • [SPARK-3934] [SPARK-3918] [mllib] Bug fixes for RandomForest, DecisionTree · 477c6481
      Joseph K. Bradley authored
      SPARK-3934: When run with a mix of unordered categorical and continuous features, on multiclass classification, RandomForest fails. The bug is in the sanity checks in getFeatureOffset and getLeftRightFeatureOffsets, which use the wrong indices for checking whether features are unordered.
      Fix: Remove the sanity checks since they are not really needed, and since they would require DTStatsAggregator to keep track of an extra set of indices (for the feature subset).
      
      Added test to RandomForestSuite which failed with old version but now works.
      
      SPARK-3918: Added baggedInput.unpersist at end of training.
      
      Also:
      * I removed DTStatsAggregator.isUnordered since it is no longer used.
      * DecisionTreeMetadata: Added logWarning when maxBins is automatically reduced.
      * Updated DecisionTreeRunner to explicitly fix the test data to have the same number of features as the training data.  This is a temporary fix which should eventually be replaced by pre-indexing both datasets.
      * RandomForestModel: Updated toString to print total number of nodes in forest.
      * Changed Predict class to be public DeveloperApi.  This was necessary to allow users to create their own trees by hand (for testing).
      
      CC: mengxr  manishamde chouqin codedeft  Just notifying you of these small bug fixes.
      
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      
      Closes #2785 from jkbradley/dtrunner-update and squashes the following commits:
      
      9132321 [Joseph K. Bradley] merged with master, fixed imports
      9dbd000 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dtrunner-update
      e116473 [Joseph K. Bradley] Changed Predict class to be public DeveloperApi.
      f502e65 [Joseph K. Bradley] bug fix for SPARK-3934
      7f3d60f [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dtrunner-update
      ba567ab [Joseph K. Bradley] Changed DTRunner to load test data using same number of features as in training data.
      4e88c1f [Joseph K. Bradley] changed RF toString to print total number of nodes
      477c6481
    • [SPARK-3985] [Examples] fix file path using os.path.join · 23f6171d
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #2834 from adrian-wang/sqlpypath and squashes the following commits:
      
      da7aa95 [Daoyuan Wang] fix file path using path.join
      23f6171d
    • [SPARK-3855][SQL] Preserve the result attribute of python UDFs through transformations · adcb7d33
      Michael Armbrust authored
      In the current implementation it was possible for the reference to change after analysis.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #2717 from marmbrus/pythonUdfResults and squashes the following commits:
      
      da14879 [Michael Armbrust] Fix test
      6343bcb [Michael Armbrust] add test
      9533286 [Michael Armbrust] Correctly preserve the result attribute of python UDFs though transformations
      adcb7d33
    • [SPARK-3979] [yarn] Use fs's default replication. · 803e7f08
      Marcelo Vanzin authored
      This avoids issues when HDFS is configured in a way that would not
      allow the hardcoded default replication of "3".
      
      Note: getDefaultReplication(Path) was added in 0.23.3, and the oldest
      one available on Maven Central is 0.23.7, so I chose to not add code
      to access that method via reflection.
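      The change in miniature (the Path-taking overload is the real Hadoop FileSystem API; the helper itself is hypothetical):
      <pre>
      import org.apache.hadoop.fs.{FileSystem, Path}

      // Ask the destination filesystem for its default replication instead of
      // hardcoding 3, so restrictive HDFS configurations still work.
      def replicationFor(fs: FileSystem, dst: Path): Short =
        fs.getDefaultReplication(dst) // Path overload, Hadoop >= 0.23.3
      </pre>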
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2831 from vanzin/SPARK-3979 and squashes the following commits:
      
      b0e3a97 [Marcelo Vanzin] [SPARK-3979] [yarn] Use fs's default replication.
      803e7f08
    • [SPARK-3935][Core] log the number of records that have been written · c3518620
      likun authored
      There is an unused variable (count) in saveAsHadoopDataset in PairRDDFunctions.scala. It was apparently intended to count records, so this adds a log statement reporting the number of records that have been written to the writer.
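      A sketch of the resulting write loop (simplified from saveAsHadoopDataset; the logging callback stands in for Spark's own logging):
      <pre>
      // Count records while writing the partition, then report the total,
      // putting the previously unused counter to work.
      def writePartition[T](records: Iterator[T])(write: T => Unit)
                           (log: String => Unit): Unit = {
        var recordsWritten = 0L
        while (records.hasNext) {
          write(records.next())
          recordsWritten += 1
        }
        log(s"Wrote $recordsWritten records to the Hadoop writer")
      }
      </pre>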
      
      Author: likun <jacky.likun@huawei.com>
      Author: jackylk <jacky.likun@huawei.com>
      
      Closes #2791 from jackylk/SPARK-3935 and squashes the following commits:
      
      a874047 [jackylk] removing the unused variable in PairRddFunctions.scala
      3bf43c7 [likun] log the number of records has been written
      c3518620
  5. Oct 16, 2014
    • [SPARK-3973] Print call site information for broadcasts · e678b9f0
      Shivaram Venkataraman authored
      It's hard to debug which broadcast variables refer to what in a big codebase. Printing call site information helps in debugging.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #2829 from shivaram/spark-broadcast-print and squashes the following commits:
      
      cd6dbdf [Shivaram Venkataraman] Print call site information for broadcasts
      e678b9f0
    • [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes · dedace83
      yantangzhai authored
      JobProgressPage sometimes fails to show the Fair Scheduler Pools section.
      SparkContext starts the web UI and then calls postEnvironmentUpdate. If JobProgressPage is accessed between the web UI starting and postEnvironmentUpdate, the lazy val isFairScheduler evaluates to false and is cached, so the Fair Scheduler Pools section never displays again.
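      The pitfall in miniature (class and field names illustrative): a lazy val caches its first evaluation, so an early page render freezes the wrong answer.
      <pre>
      class PoolsSection {
        // Populated later by postEnvironmentUpdate.
        @volatile var schedulerMode: Option[String] = None

        // Buggy: evaluated at the first page render, possibly before
        // schedulerMode is set, and then cached as false forever.
        lazy val isFairSchedulerLazy: Boolean = schedulerMode.contains("FAIR")

        // Fixed: recomputed on every render.
        def isFairScheduler: Boolean = schedulerMode.contains("FAIR")
      }
      </pre>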
      
      Author: yantangzhai <tyz0303@163.com>
      Author: YanTangZhai <hakeemzhai@tencent.com>
      
      Closes #1966 from YanTangZhai/SPARK-3067 and squashes the following commits:
      
      d4323f8 [yantangzhai] update [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
      8a00106 [YanTangZhai] Merge pull request #6 from apache/master
      b6391cc [yantangzhai] revert [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
      d2226cd [yantangzhai] [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
      cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
      aac7f7b [yantangzhai] [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
      cdef539 [YanTangZhai] Merge pull request #1 from apache/master
      dedace83
    • [SPARK-3741] Add afterExecute for handleConnectExecutor · 56fd34af
      zsxwing authored
      Sorry. I found that I forgot to add `afterExecute` for `handleConnectExecutor` in #2593.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #2794 from zsxwing/SPARK-3741 and squashes the following commits:
      
      a0bc4dd [zsxwing] Add afterExecute for handleConnectExecutor
      56fd34af
    • [SPARK-3890][Docs]remove redundant spark.executor.memory in doc · e7f4ea8a
      WangTaoTheTonic authored
      Introduced in https://github.com/pwendell/spark/commit/f7e79bc42c1635686c3af01eef147dae92de2529, I'm not sure why we need two spark.executor.memory here.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #2745 from WangTaoTheTonic/redundantconfig and squashes the following commits:
      
      e7564dc [WangTao] too long line
      fdbdb1f [WangTaoTheTonic] trivial workaround
      d06b6e5 [WangTaoTheTonic] remove redundant spark.executor.memory in doc
      e7f4ea8a
    • [SPARK-3941][CORE] _remainingMem should not increase twice on updateBlockInfo · 642b246b
      Zhang, Liye authored
      In BlockManagerMasterActor, _remainingMem was increased by memSize twice in updateBlockInfo when the new storage level is invalid and the old storage level uses memory. Also, _remainingMem should be increased by the original memory size instead of the new memSize.
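      The corrected accounting as a hedged sketch (names simplified from BlockManagerMasterActor):
      <pre>
      // Credit _remainingMem exactly once per update, and with the size the
      // block was originally charged, not the incoming memSize.
      class MemoryLedger(var remainingMem: Long) {
        private val chargedMem = scala.collection.mutable.Map.empty[String, Long]

        def updateBlockInfo(blockId: String, usesMemory: Boolean, memSize: Long): Unit = {
          // Release the previous charge first, using the original size.
          chargedMem.remove(blockId).foreach(original => remainingMem += original)
          // Then charge the new state, if the block still occupies memory.
          if (usesMemory) {
            remainingMem -= memSize
            chargedMem(blockId) = memSize
          }
        }
      }
      </pre>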
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #2792 from liyezhang556520/spark-3941-remainMem and squashes the following commits:
      
      3d487cc [Zhang, Liye] make the code concise
      0380a32 [Zhang, Liye] [SPARK-3941][CORE] _remainingmem should not increase twice when updateBlockInfo
      642b246b
    • [SQL]typo in HiveFromSpark · be2ec4a9
      Kun Li authored
      Author: Kun Li <jacky.likun@gmail.com>
      
      Closes #2809 from jackylk/patch-1 and squashes the following commits:
      
      46c926b [Kun Li] typo in HiveFromSpark
      be2ec4a9
    • [SPARK-3923] Increase Akka heartbeat pause above heartbeat interval · 7f7b50ed
      Aaron Davidson authored
      Something about the 2.3.4 upgrade seems to have made the issue manifest where all the services disconnect from each other after exactly 1000 seconds (which is the heartbeat interval). [This post](https://groups.google.com/forum/#!topic/akka-user/X3xzpTCbEFs) suggests that heartbeat pause should be greater than heartbeat interval, and increasing the pause from 600s to 6000s seems to have rectified the issue. My current cluster has now exceeded 1400s of uptime without failure!
      
      I do not know why this fixed it, because the threshold we have set for the failure detector is the exponent of a timeout, and 300 is extremely large. Perhaps the default failure detector changed in 2.3.4 and now ignores threshold.
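      In Spark 1.x configuration terms (property names as documented for that release line; values are in seconds), the change amounts to something like:
      <pre>
      import org.apache.spark.SparkConf

      // Keep the interval, raise the acceptable pause well above it so the
      // failure detector no longer fires at exactly one heartbeat interval.
      val conf = new SparkConf()
        .set("spark.akka.heartbeat.interval", "1000") // unchanged
        .set("spark.akka.heartbeat.pauses", "6000")   // raised from 600
      </pre>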
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #2784 from aarondav/fix-timeout and squashes the following commits:
      
      bd1151a [Aaron Davidson] Increase pause, don't decrease interval
      9cb0372 [Aaron Davidson] [SPARK-3923] Decrease Akka heartbeat interval below heartbeat pause
      7f7b50ed
    • SPARK-3874: Provide stable TaskContext API · 2fe0ba95
      Prashant Sharma authored
      This is a small number of clean-up changes on top of #2782. Closes #2782.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #2803 from pwendell/pr-2782 and squashes the following commits:
      
      56d5b7a [Patrick Wendell] Minor clean-up
      44089ec [Patrick Wendell] Clean-up the TaskContext API.
      ed551ce [Prashant Sharma] Fixed a typo
      df261d0 [Prashant Sharma] Josh's suggestion
      facf3b1 [Prashant Sharma] Fixed the mima issue.
      7ecc2fe [Prashant Sharma] CR, Moved implementations to TaskContextImpl
      bbd9e05 [Prashant Sharma] adding missed out files to git.
      ef633f5 [Prashant Sharma] SPARK-3874, Provide stable TaskContext API
      2fe0ba95
    • [SQL] Fixes the race condition that may cause test failure · 99e416b6
      Cheng Lian authored
      The removed `Future` was used to end the test case as soon as the Spark SQL CLI process exits. When the process exits prematurely, this mechanism prevents the test case from waiting until timeout. But it also creates a race condition: when `foundAllExpectedAnswers.tryFailure` is called, there is a chance that the last expected output line of the CLI process hasn't been caught by the main logic of the test code, thus failing the test case.
      
      Removing this `Future` doesn't affect correctness.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #2823 from liancheng/clean-clisuite and squashes the following commits:
      
      489a97c [Cheng Lian] Fixes the race condition that may cause test failure
      99e416b6
    • [SPARK-3971] [MLLib] [PySpark] hotfix: Customized pickler should work in cluster mode · 091d32c5
      Davies Liu authored
      Customized picklers should be registered before unpickling, but on executors there is no way to register them before running the tasks.
      
      So we need to register the picklers in the tasks themselves: duplicate javaToPython() and pythonToJava() in MLlib, and call SerDe.initialize() before pickling or unpickling.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2830 from davies/fix_pickle and squashes the following commits:
      
      0c85fb9 [Davies Liu] revert the privacy change
      6b94e15 [Davies Liu] use JavaConverters instead of JavaConversions
      0f02050 [Davies Liu] hotfix: Customized pickler does not work in cluster
      091d32c5
    • [SPARK-3944][Core] Code re-factored as suggested · 4c589cac
      Shiti authored
      Author: Shiti <ssaxena.ece@gmail.com>
      
      Closes #2810 from Shiti/master and squashes the following commits:
      
      051d82f [Shiti] setting the default value of uri scheme to "file"  where matching "file" or None yields the same result
      4c589cac
    • [Core] Upgrading ScalaStyle version to 0.5 and removing SparkSpaceAfterCommentStartChecker. · 044583a2
      prudhvi authored
      Author: prudhvi <prudhvi953@gmail.com>
      
      Closes #2799 from prudhvije/ScalaStyle/space-after-comment-start and squashes the following commits:
      
      fc263a1 [prudhvi] [Core] Using scalastyle to check the space after comment start
      044583a2
  6. Oct 15, 2014
    • [SPARK-2098] All Spark processes should support spark-defaults.conf, config file · 293a0b5d
      GuoQiang Li authored
      This is another implementation of #1256.
      cc andrewor14 vanzin
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2379 from witgo/SPARK-2098-new and squashes the following commits:
      
      4ef1cbd [GuoQiang Li] review commit
      49ef70e [GuoQiang Li] Refactor getDefaultPropertiesFile
      c45d20c [GuoQiang Li] All Spark processes should support spark-defaults.conf, config file
      293a0b5d
  7. Oct 14, 2014
    • SPARK-1307 [DOCS] Don't use term 'standalone' to refer to a Spark Application · 18ab6bd7
      Sean Owen authored
      HT to Diana, just proposing an implementation of her suggestion, which I rather agreed with. Is there a second/third for the motion?
      
      Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2787 from srowen/SPARK-1307 and squashes the following commits:
      
      b5b82e2 [Sean Owen] Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
      18ab6bd7
    • [SPARK-3943] Some bin\*.cmd scripts pollute environment variables on Windows · 66af8e25
      Masayoshi TSUZUKI authored
      Modified the scripts so that they do not pollute environment variables.
      The main logic is moved from `XXX.cmd` into `XXX2.cmd`, and `XXX.cmd` calls `XXX2.cmd` with the cmd command.
      `pyspark.cmd` and `spark-class.cmd` already work this way, but `spark-shell.cmd`, `spark-submit.cmd`, and `/python/docs/make.bat` do not.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2797 from tsudukim/feature/SPARK-3943 and squashes the following commits:
      
      b397a7d [Masayoshi TSUZUKI] [SPARK-3943] Some scripts bin\*.cmd pollutes environment variables in Windows
      66af8e25
    • [SPARK-3869] ./bin/spark-class misses the Java version with _JAVA_OPTIONS set · 7b4f39f6
      cocoatomo authored
      When the _JAVA_OPTIONS environment variable is set, the command "java -version" first outputs a message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8".
      ./bin/spark-class determines the Java version from the first line of the "java -version" output, so it misreads the version when _JAVA_OPTIONS is set.
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2725 from cocoatomo/issues/3869-mistake-java-version and squashes the following commits:
      
      f894ebd [cocoatomo] [SPARK-3869] ./bin/spark-class miss Java version with _JAVA_OPTIONS set
      7b4f39f6
    • SPARK-3803 [MLLIB] ArrayIndexOutOfBoundsException found in executing computePrincipalComponents · 56096dba
      Sean Owen authored
      Avoid overflow in computing n*(n+1)/2 as much as possible; throw an explicit error when the Gramian computation would fail due to a negative array size; also warn about the large result when computing the Gramian.
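      The arithmetic guard in a sketch (function name hypothetical): do the multiplication in Long, where the halving stays exact because one of n and n+1 is always even, and fail fast when the triangular size cannot be represented.
      <pre>
      // Size of the upper-triangular packed Gramian for n columns, guarded
      // against Int overflow in n * (n + 1) / 2.
      def triangleSize(n: Int): Int = {
        require(n >= 0, s"n must be nonnegative: $n")
        val size = n.toLong * (n + 1L) / 2L // exact: n or n+1 is even
        require(size <= Int.MaxValue,
          s"Cannot allocate Gramian of $size elements for n = $n")
        size.toInt
      }
      </pre>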
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2801 from srowen/SPARK-3803 and squashes the following commits:
      
      b4e6d92 [Sean Owen] Avoid overflow in computing n*(n+1)/2 as much as possible; throw explicit error when Gramian computation will fail due to negative array size; warn about large result when computing Gramian too
      56096dba
    • [SPARK-3944][Core] Using Option[String] where value of String can be null · 24b818b9
      shitis authored
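      The idiom this patch applies, as a tiny sketch (the URI example is illustrative; the "file" default matches the follow-up commit above):
      <pre>
      import java.net.URI

      // Option(x) maps a possibly-null String to Some(x) or None, so callers
      // can supply a default instead of risking a NullPointerException.
      val uri = new URI("/no/scheme/here")                  // getScheme is null
      val scheme: String = Option(uri.getScheme).getOrElse("file")
      </pre>
      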
      Author: shitis <ssaxena.ece@gmail.com>
      
      Closes #2795 from Shiti/master and squashes the following commits:
      
      46897d7 [shitis] Using Option Wrapper to convert String with value null to None
      24b818b9
    • [SPARK-3946] gitignore in /python includes wrong directory · 7ced88b0
      Masayoshi TSUZUKI authored
      Modified to ignore not the docs/ directory but only docs/_build/, which is the output directory of the Sphinx build.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2796 from tsudukim/feature/SPARK-3946 and squashes the following commits:
      
      2bea6a9 [Masayoshi TSUZUKI] [SPARK-3946] gitignore in /python includes wrong directory
      7ced88b0
    • SPARK-3178 setting SPARK_WORKER_MEMORY to a value without a label (m or g)... · 9b6de6fb
      Bill Bejeck authored
      SPARK-3178  setting SPARK_WORKER_MEMORY to a value without a label (m or g) sets the worker memory limit to zero
      
      Validate that the memory is greater than zero when it is set from the SPARK_WORKER_MEMORY environment variable or on the command line without a g or m label; if memory is 0, an IllegalStateException is thrown. Added unit tests, which mock environment variables by subclassing SparkConf (tip provided by Josh Rosen). Updated WorkerArguments to use SparkConf.getenv instead of System.getenv for reading the SPARK_WORKER_MEMORY environment variable.
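      A sketch of the added guard (the real check lives in WorkerArguments; the signature below is assumed, with the parsed megabyte value passed in):
      <pre>
      // Fail fast instead of starting a worker with a 0 MB memory limit, e.g.
      // when SPARK_WORKER_MEMORY=2 is given without an m or g suffix.
      def checkWorkerMemory(memoryMb: Int): Unit = {
        if (memoryMb <= 0) {
          throw new IllegalStateException(
            "memory must be greater than 0: did you omit the m/g label on SPARK_WORKER_MEMORY?")
        }
      }
      </pre>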
      
      Author: Bill Bejeck <bbejeck@gmail.com>
      
      Closes #2309 from bbejeck/spark-memory-worker and squashes the following commits:
      
      51cf915 [Bill Bejeck] SPARK-3178 - Validate the memory is greater than zero when set from the SPARK_WORKER_MEMORY environment variable or command line without a g or m label.  Added unit tests. If memory is 0 an IllegalStateException is thrown. Updated unit tests to mock environment variables by subclassing SparkConf (tip provided by Josh Rosen).   Updated WorkerArguments to use SparkConf.getenv instead of System.getenv for reading the SPARK_WORKER_MEMORY environment variable.
      9b6de6fb