  1. May 16, 2015
  2. May 15, 2015
    • [SPARK-7473] [MLLIB] Add reservoir sample in RandomForest · deb41133
      AiHe authored
      Reservoir feature sampling by using the existing API.
      
      Author: AiHe <ai.he@ussuning.com>
      
      Closes #5988 from AiHe/reservoir and squashes the following commits:
      
      e7a41ac [AiHe] remove non-robust testing case
      28ffb9a [AiHe] set seed as rng.nextLong
      37459e1 [AiHe] set fixed seed
      1e98a4c [AiHe] [MLLIB][tree] Add reservoir sample in RandomForest
      deb41133
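The commit body is terse; for illustration only, the reservoir-sampling technique it refers to (Algorithm R) can be sketched in plain Python. All names here are hypothetical, and the real change reuses an existing Scala sampling utility in MLlib:

```python
import random

def reservoir_sample(items, k, seed=None):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep each later item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Note how the commit log mentions seeding (`rng.nextLong`): a seed makes the sample reproducible across test runs.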
    • [SPARK-7543] [SQL] [PySpark] split dataframe.py into multiple files · d7b69946
      Davies Liu authored
      dataframe.py is split into column.py, group.py and dataframe.py:
      ```
         360 column.py
        1223 dataframe.py
         183 group.py
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6201 from davies/split_df and squashes the following commits:
      
      fc8f5ab [Davies Liu] split dataframe.py into multiple files
      d7b69946
    • [SPARK-7073] [SQL] [PySpark] Clean up SQL data type hierarchy in Python · adfd3668
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6206 from davies/sql_type and squashes the following commits:
      
      33d6860 [Davies Liu] [SPARK-7073] [SQL] [PySpark] Clean up SQL data type hierarchy in Python
      adfd3668
    • [SPARK-7575] [ML] [DOC] Example code for OneVsRest · cc12a86f
      Ram Sriharsha authored
      Java and Scala examples for OneVsRest. Fixes the base classifier to be Logistic Regression and accepts the configuration parameters of the base classifier.
      
      Author: Ram Sriharsha <rsriharsha@hw11853.local>
      
      Closes #6115 from harsha2010/SPARK-7575 and squashes the following commits:
      
      87ad3c7 [Ram Sriharsha] extra line
      f5d9891 [Ram Sriharsha] Merge branch 'master' into SPARK-7575
      7076084 [Ram Sriharsha] cleanup
      dfd660c [Ram Sriharsha] cleanup
      8703e4f [Ram Sriharsha] update doc
      cb23995 [Ram Sriharsha] fix commandline options for JavaOneVsRestExample
      69e91f8 [Ram Sriharsha] cleanup
      7f4e127 [Ram Sriharsha] cleanup
      d4c40d0 [Ram Sriharsha] Code Review fixes
      461eb38 [Ram Sriharsha] cleanup
      e0106d9 [Ram Sriharsha] Fix typo
      935cf56 [Ram Sriharsha] Try to match Java and Scala Example Commandline options
      5323ff9 [Ram Sriharsha] cleanup
      196a59a [Ram Sriharsha] cleanup
      6adfa0c [Ram Sriharsha] Style Fix
      8cfc5d5 [Ram Sriharsha] [SPARK-7575] Example code for OneVsRest
      cc12a86f
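One-vs-rest itself is simple to sketch. The following plain-Python sketch (hypothetical names, not the spark.ml API) fits one binary model per class and predicts by taking the highest-scoring class:

```python
def one_vs_rest_fit(X, y, classes, fit_binary):
    """Fit one binary model per class: positives are that class, negatives the rest."""
    return {c: fit_binary(X, [1 if label == c else 0 for label in y]) for c in classes}

def one_vs_rest_predict(models, x):
    """Predict the class whose binary model scores x highest."""
    return max(models, key=lambda c: models[c](x))
```

`fit_binary` is the configurable base classifier, mirroring how the example fixes Logistic Regression as the base learner while exposing its parameters.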
    • [SPARK-7563] OutputCommitCoordinator.stop() should only run on the driver · 2c04c8a1
      Josh Rosen authored
      This fixes a bug where an executor that exits can cause the driver's OutputCommitCoordinator to stop. To fix this, we use an `isDriver` flag and check it in `stop()`.
      
      See https://issues.apache.org/jira/browse/SPARK-7563 for more details.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6197 from JoshRosen/SPARK-7563 and squashes the following commits:
      
      04b2cc5 [Josh Rosen] [SPARK-7563] OutputCommitCoordinator.stop() should only be executed on the driver
      2c04c8a1
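The fix described above amounts to a guard flag. A minimal Python sketch of the pattern (the real class is Scala, and these names are illustrative):

```python
class OutputCommitCoordinator:
    """Sketch of the fix: only the driver instance may perform a real stop."""

    def __init__(self, is_driver):
        self.is_driver = is_driver
        self.stopped = False

    def stop(self):
        # Executors share the coordinator endpoint; an exiting executor must
        # not be able to shut down the driver-side coordinator.
        if not self.is_driver:
            return
        self.stopped = True
```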
    • [SPARK-7676] Bug fix and cleanup of stage timeline view · e7454564
      Kay Ousterhout authored
      cc pwendell sarutak
      
      This commit cleans up some unnecessary code, removes the feature that highlighted the corresponding task in the table when you moused over a box in the timeline (it was only useful in the rare case of a very small number of tasks, where the mapping is easy to see anyway), and fixes a bug where nothing showed up when visualizing a stage with only one task.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #6202 from kayousterhout/SPARK-7676 and squashes the following commits:
      
      dfd29d4 [Kay Ousterhout] [SPARK-7676] Bug fix and cleanup of stage timeline view
      e7454564
    • [SPARK-7556] [ML] [DOC] Add user guide for spark.ml Binarizer, including... · c8696337
      Liang-Chi Hsieh authored
      [SPARK-7556] [ML] [DOC] Add user guide for spark.ml Binarizer, including Scala, Java and Python examples
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-7556
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6116 from viirya/binarizer_doc and squashes the following commits:
      
      40cb677 [Liang-Chi Hsieh] Better print out.
      5b7ef1d [Liang-Chi Hsieh] Make examples more clear.
      1bf9c09 [Liang-Chi Hsieh] For comments.
      6cf8cba [Liang-Chi Hsieh] Add user guide for Binarizer.
      c8696337
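The Binarizer the guide documents is a one-line transformation. A plain-Python sketch of its semantics (assuming the default threshold of 0.0; not the spark.ml API):

```python
def binarize(values, threshold=0.0):
    """Binarizer: map each continuous value to 1.0 if above the threshold, else 0.0."""
    return [1.0 if v > threshold else 0.0 for v in values]
```

Values exactly equal to the threshold map to 0.0, since the comparison is strict.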
    • [SPARK-7677] [STREAMING] Add Kafka modules to the 2.11 build. · 6e77105e
      Iulian Dragos authored
      This is somewhat related to [SPARK-6154](https://issues.apache.org/jira/browse/SPARK-6154), though it only touches Kafka, not the jline dependency for thriftserver.
      
      I tested this locally on 2.11 (./run-tests) and everything looked good (I had to disable mima, because `MimaBuild` hardcodes 2.10 for the previous version -- that's another PR).
      
      Author: Iulian Dragos <jaguarul@gmail.com>
      
      Closes #6149 from dragos/issue/spark-2.11-kafka and squashes the following commits:
      
      aa15d99 [Iulian Dragos] Add Kafka modules to the 2.11 build.
      6e77105e
    • [SPARK-7226] [SPARKR] Support math functions in R DataFrame · 50da9e89
      qhuang authored
      Author: qhuang <qian.huang@intel.com>
      
      Closes #6170 from hqzizania/master and squashes the following commits:
      
      f20c39f [qhuang] add tests units and fixes
      2a7d121 [qhuang] use a function name more familiar to R users
      07aa72e [qhuang] Support math functions in R DataFrame
      50da9e89
    • [SPARK-7296] Add timeline visualization for stages in the UI. · 9b6cf285
      Kousuke Saruta authored
      This PR builds on #2342 by adding a timeline view for the Stage page,
      showing how tasks spend their time.
      
      With this timeline, we can understand the following things about a Stage:
      
      * When and where each task ran
      * The total duration of each task
      * How each task's time was spent
      
      The timeline view is also scrollable and zoomable.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #5843 from sarutak/stage-page-timeline and squashes the following commits:
      
      4ba9604 [Kousuke Saruta] Fixed the order of legends
      16bb552 [Kousuke Saruta] Removed border of legend area
      2e5d605 [Kousuke Saruta] Modified warning message
      16cb2e6 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into stage-page-timeline
      7ae328f [Kousuke Saruta] Modified code style
      d5f794a [Kousuke Saruta] Fixed performance issues more
      64e6642 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into stage-page-timeline
      e4a3354 [Kousuke Saruta] minor code style change
      878e3b8 [Kousuke Saruta] Fixed a bug that tooltip remains
      b9d8f1b [Kousuke Saruta] Fixed performance issue
      ac8842b [Kousuke Saruta] Fixed layout
      2319739 [Kousuke Saruta] Modified appearances more
      81903ab [Kousuke Saruta] Modified appearances
      a79dcc3 [Kousuke Saruta] Modified appearance
      55a390c [Kousuke Saruta] Ignored scalastyle for a line-comment
      29eae3e [Kousuke Saruta] limited to longest 1000 tasks
      2a9e376 [Kousuke Saruta] Minor cleanup
      385b6d2 [Kousuke Saruta] Added link feature
      ba1ac3e [Kousuke Saruta] Fixed style
      2ae8520 [Kousuke Saruta] Updated bootstrap-tooltip.js from 2.2.2 to 2.3.2
      af430f1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into stage-page-timeline
      e694b8e [Kousuke Saruta] Added timeline view to StagePage
      8f6610c [Kousuke Saruta] Fixed conflict
      b587cf2 [Kousuke Saruta] initial commit
      11fe67d [Kousuke Saruta] Fixed conflict
      79ac03d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      a91abd3 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
      ef34a5b [Kousuke Saruta] Implement tooltip using bootstrap
      b09d0c5 [Kousuke Saruta] Move `stroke` and `fill` attribute of rect elements to css
      d3c63c8 [Kousuke Saruta] Fixed a little bit bugs
      a36291b [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
      28714b6 [Kousuke Saruta] Fixed highlight issue
      0dc4278 [Kousuke Saruta] Addressed most of Patrics's feedbacks
      8110acf [Kousuke Saruta] Added scroll limit to Job timeline
      974a64a [Kousuke Saruta] Removed unused function
      ee7a7f0 [Kousuke Saruta] Refactored
      6a91872 [Kousuke Saruta] Temporary commit
      6693f34 [Kousuke Saruta] Added link to job/stage box in the timeline in order to move to corresponding row when we click
      8f88222 [Kousuke Saruta] Added job/stage description
      aeed4b1 [Kousuke Saruta] Removed stage timeline
      fc1696c [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
      999ccd4 [Kousuke Saruta] Improved scalability
      0fc6a31 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      19815ae [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      68b7540 [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
      52b5f0b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      dec85db [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      fcdab7d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      dab7cc1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      09cce97 [Kousuke Saruta] Cleanuped
      16f82cf [Kousuke Saruta] Cleanuped
      9fb522e [Kousuke Saruta] Cleanuped
      d05f2c2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      e85e9aa [Kousuke Saruta] Cleanup: Added TimelineViewUtils.scala
      a76e569 [Kousuke Saruta] Removed unused setting in timeline-view.css
      5ce1b21 [Kousuke Saruta] Added vis.min.js, vis.min.css and vis.map to .rat-exclude
      082f709 [Kousuke Saruta] Added Timeline-View feature for Applications, Jobs and Stages
      9b6cf285
    • [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode · 8e3822a0
      ehnalis authored
      Added a simple check for SparkContext.
      Also added two null checks on the AM object.
      
      Author: ehnalis <zoltan.zvara@gmail.com>
      
      Closes #6083 from ehnalis/cluster and squashes the following commits:
      
      926bd96 [ehnalis] Moved check to SparkContext.
      7c89b6e [ehnalis] Remove false line.
      ea2a5fe [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      4924e01 [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      39e4fa3 [ehnalis] SPARK-7504 [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      9f287c5 [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      8e3822a0
    • [SPARK-7664] [WEBUI] DAG visualization: Fix incorrect link paths of DAG. · ad92af9d
      Kousuke Saruta authored
      In JobPage, we can jump to a StagePage by clicking the corresponding box in the DAG viz, but the link path is incorrect.
      
      When we click a box like the following ...
      ![screenshot_from_2015-05-15 19 24 25](https://cloud.githubusercontent.com/assets/4736016/7651528/5f7ef824-fb3c-11e4-9518-8c9ade2dff7a.png)
      
      ... we jump to the index page.
      ![screenshot_from_2015-05-15 19 24 45](https://cloud.githubusercontent.com/assets/4736016/7651534/6d666274-fb3c-11e4-971c-c3f2dc2b1da2.png)
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #6184 from sarutak/fix-link-path-of-dag-viz and squashes the following commits:
      
      faba3ba [Kousuke Saruta] Fix a incorrect link
      ad92af9d
    • [SPARK-5412] [DEPLOY] Cannot bind Master to a specific hostname as per the documentation · 8ab1450d
      Sean Owen authored
      Pass args to start-master.sh through to start-daemon.sh, as other scripts do, so that options like --host take effect on start-master.sh, as per the docs.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6185 from srowen/SPARK-5412 and squashes the following commits:
      
      b3ce9da [Sean Owen] Pass args to start-master.sh through to start-daemon.sh, as other scripts do, so that things like --host have effect on start-master.sh as per docs
      8ab1450d
    • [CORE] Protect additional test vars from early GC · 270d4b51
      Tim Ellison authored
      Fix more places where some test variables could be collected early by aggressive JVM optimization.
      Added a couple of comments to note where existing references are sufficient in the same test pattern.
      
      Author: Tim Ellison <t.p.ellison@gmail.com>
      
      Closes #6187 from tellison/DefeatEarlyGC and squashes the following commits:
      
      27329d9 [Tim Ellison] [CORE] Protect additional test vars from early GC
      270d4b51
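The Python analogue of this "early GC" pattern can be demonstrated with weak references: an object stays reachable only while a strong reference to it is held, which is exactly what the test fix ensures on the JVM. A sketch (illustrative only):

```python
import gc
import weakref

class Payload:
    """Stand-in for a test variable that the runtime may collect early."""
    pass

def observe():
    # Holding a strong reference (the analogue of the fix) keeps the object
    # reachable; dropping it lets the collector reclaim it mid-test.
    obj = Payload()
    ref = weakref.ref(obj)
    return obj, ref
```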
    • [SPARK-7233] [CORE] Detect REPL mode once · b1b9d580
      Oleksii Kostyliev authored
      <h3>Description</h3>
      Detect REPL mode once per JVM lifespan.
      Previously, the presence of interpreter mode was checked every time a job was submitted. When executing many short-lived jobs, this caused massive mutual blocking between submission threads.
      
      For more details please refer to https://issues.apache.org/jira/browse/SPARK-7233.
      
      <h3>Notes</h3>
      * I inverted the return value in case of catching an exception from `true` to `false`. It seems more logical to assume that if the REPL class is not found, we aren't in the interpreter mode.
      * I'd personally call `classForName` with just the Spark classloader (`org.apache.spark.util.Utils#getSparkClassLoader`), but `org.apache.spark.util.Utils#getContextOrSparkClassLoader` is said to be preferable.
      * I struggled to come up with a concise, readable and clear unit test. Suggestions are welcome if you feel necessary.
      
      Author: Oleksii Kostyliev <etander@gmail.com>
      Author: Oleksii Kostyliev <okostyliev@thunderhead.com>
      
      Closes #5835 from preeze/SPARK-7233 and squashes the following commits:
      
      69bb9e4 [Oleksii Kostyliev] SPARK-7527: fixed explanatory comment to meet style-checker requirements
      26dcc24 [Oleksii Kostyliev] SPARK-7527: fixed explanatory comment to meet style-checker requirements
      c6f9685 [Oleksii Kostyliev] Merge remote-tracking branch 'remotes/upstream/master' into SPARK-7233
      b78a983 [Oleksii Kostyliev] SPARK-7527: revert the fix and let it be addressed separately at a later stage
      b64d441 [Oleksii Kostyliev] SPARK-7233: inline inInterpreter parameter into instantiateClass
      86e2606 [Oleksii Kostyliev] SPARK-7233, SPARK-7527: Handle interpreter mode properly.
      c7ee69c [Oleksii Kostyliev] Merge remote-tracking branch 'upstream/master' into SPARK-7233
      d6c07fc [Oleksii Kostyliev] SPARK-7233: properly handle the inverted meaning of isInInterpreter
      c319039 [Oleksii Kostyliev] SPARK-7233: move inInterpreter to Utils and make it lazy
      b1b9d580
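Memoizing the REPL check, as this commit does on the JVM, can be sketched in Python with `functools.lru_cache`; the module name probed below is a stand-in for the Scala REPL class, not a real importable Python module:

```python
import functools
import importlib

@functools.lru_cache(maxsize=None)
def in_interpreter():
    """Detect REPL mode once per process rather than on every job submission."""
    try:
        # Stand-in for Utils.classForName on the Scala REPL's main class.
        importlib.import_module("org.apache.spark.repl.Main")
        return True
    except ImportError:
        # If the REPL class is absent, assume we are not in interpreter mode
        # (the inverted return value the commit notes describe).
        return False
```

Only the first call pays the lookup cost; subsequent calls hit the cache, so submission threads no longer block on each other.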
    • [SPARK-7651] [MLLIB] [PYSPARK] GMM predict, predictSoft should raise error on bad input · 8f4aaba0
      FlytxtRnD authored
      In the Python API for Gaussian Mixture Model, predict() and predictSoft() methods should raise an error when the input argument is not an RDD.
      
      Author: FlytxtRnD <meethu.mathew@flytxt.com>
      
      Closes #6180 from FlytxtRnD/GmmPredictException and squashes the following commits:
      
      4b6aa11 [FlytxtRnD] Raise error if the input to predict()/predictSoft() is not an RDD
      8f4aaba0
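The fix is an input-type guard. A plain-Python sketch (a list stands in for an RDD; the class name is hypothetical):

```python
class GaussianMixtureModelSketch:
    """Illustrative stand-in: validate predict() input instead of failing obscurely."""

    def predict(self, x):
        # The real fix checks isinstance(x, RDD); a list stands in for an RDD here.
        if not isinstance(x, list):
            raise TypeError("x should be represented by an RDD, got %s" % type(x).__name__)
        return [0 for _ in x]
```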
    • [SPARK-7668] [MLLIB] Preserve isTransposed property for Matrix after calling map function · f96b85ab
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7668
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6188 from viirya/fix_matrix_map and squashes the following commits:
      
      2a7cc97 [Liang-Chi Hsieh] Preserve isTransposed property for Matrix after calling map function.
      f96b85ab
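The bug pattern is easy to reproduce in miniature: a wrapper type whose `map` rebuilds the object but drops a flag. A toy Python sketch (not the MLlib Matrix API):

```python
class MatrixSketch:
    """Toy matrix carrying an is_transposed flag; map must preserve it."""

    def __init__(self, values, is_transposed=False):
        self.values = values
        self.is_transposed = is_transposed

    def map(self, f):
        # The bug was constructing the result without passing the flag through,
        # silently resetting is_transposed to its default.
        return MatrixSketch([f(v) for v in self.values], self.is_transposed)
```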
    • [SPARK-7503] [YARN] Resources in .sparkStaging directory can't be cleaned up on error · c64ff803
      Kousuke Saruta authored
      When we run applications on YARN in cluster mode, uploaded resources in the .sparkStaging directory can't be cleaned up if uploading the local resources fails.
      
      You can see this issue by running the following command.
      ```
      bin/spark-submit --master yarn --deploy-mode cluster --class <someClassName> <non-existing-jar>
      ```
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #6026 from sarutak/delete-uploaded-resources-on-error and squashes the following commits:
      
      caef9f4 [Kousuke Saruta] Fixed style
      882f921 [Kousuke Saruta] Wrapped Client#submitApplication with try/catch blocks in order to delete resources on error
      1786ca4 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into delete-uploaded-resources-on-error
      f61071b [Kousuke Saruta] Fixed cleanup problem
      c64ff803
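The fix wraps submission so that staging resources are deleted when the upload throws. A Python sketch of the same try/catch pattern (hypothetical function, with a temp directory standing in for .sparkStaging):

```python
import os
import shutil
import tempfile

def submit_application(upload):
    """Wrap the resource upload so the staging directory is deleted on failure."""
    staging_dir = tempfile.mkdtemp(prefix=".sparkStaging-")
    try:
        upload(staging_dir)
        return staging_dir
    except Exception:
        # Mirror of the try/catch the commit adds around Client#submitApplication.
        shutil.rmtree(staging_dir, ignore_errors=True)
        raise
```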
    • [SPARK-7591] [SQL] Partitioning support API tweaks · fdf5bba3
      Cheng Lian authored
      Please see [SPARK-7591] [1] for the details.
      
      /cc rxin marmbrus yhuai
      
      [1]: https://issues.apache.org/jira/browse/SPARK-7591
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6150 from liancheng/spark-7591 and squashes the following commits:
      
      af422e7 [Cheng Lian] Addresses @rxin's comments
      37d1738 [Cheng Lian] Fixes HadoopFsRelation partition columns initialization
      2fc680a [Cheng Lian] Fixes Scala style issue
      189ad23 [Cheng Lian] Removes HadoopFsRelation constructor arguments
      522c24e [Cheng Lian] Adds OutputWriterFactory
      047d40d [Cheng Lian] Renames FSBased* to HadoopFs*, also renamed FSBasedParquetRelation back to ParquetRelation2
      fdf5bba3
    • [SPARK-6258] [MLLIB] GaussianMixture Python API parity check · 94761485
      Yanbo Liang authored
      Implement the Python API for the major disparities in the GaussianMixture clustering algorithm between Scala and Python:
      ```scala
      GaussianMixture
          setInitialModel
      GaussianMixtureModel
          k
      ```
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #6087 from yanboliang/spark-6258 and squashes the following commits:
      
      b3af21c [Yanbo Liang] fix typo
      2b645c1 [Yanbo Liang] fix doc
      638b4b7 [Yanbo Liang] address comments
      b5bcade [Yanbo Liang] GaussianMixture Python API parity check
      94761485
    • [SPARK-7650] [STREAMING] [WEBUI] Move streaming css and js files to the streaming project · cf842d42
      zsxwing authored
      cc tdas
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6160 from zsxwing/SPARK-7650 and squashes the following commits:
      
      fe6ae15 [zsxwing] Fix the import order
      a4ffd99 [zsxwing] Merge branch 'master' into SPARK-7650
      dc402b6 [zsxwing] Move streaming css and js files to the streaming project
      cf842d42
    • [CORE] Remove unreachable Heartbeat message from Worker · daf4ae72
      Kan Zhang authored
      It doesn't look to me like Heartbeat is sent to the Worker by anyone.
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #6163 from kanzhang/deadwood and squashes the following commits:
      
      56be118 [Kan Zhang] [core] Remove unreachable Heartbeat message from Worker
      daf4ae72
  3. May 14, 2015
    • [SQL] When creating partitioned table scan, explicitly create UnionRDD. · e8f0e016
      Yin Huai authored
      Otherwise, it will cause a stack overflow when there are many partitions.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6162 from yhuai/partitionUnionedRDD and squashes the following commits:
      
      fa016d8 [Yin Huai] Explicitly create UnionRDD.
      e8f0e016
    • [SPARK-7098][SQL] Make the WHERE clause with timestamp show consistent result · f9705d46
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7098
      
      The WHERE clause with timestamp shows inconsistent results. This PR fixes it.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #5682 from viirya/consistent_timestamp and squashes the following commits:
      
      171445a [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into consistent_timestamp
      4e98520 [Liang-Chi Hsieh] Make the WHERE clause with timestamp show consistent result.
      f9705d46
    • [SPARK-7548] [SQL] Add explode function for DataFrames · 6d0633e3
      Michael Armbrust authored
      Add an `explode` function for DataFrames and modify the analyzer so that single table generating functions can be present in a select clause along with other expressions. There are currently the following restrictions:
       - only top level TGFs are allowed (i.e. no `select(explode('list) + 1)`)
       - only one may be present in a single select to avoid potentially confusing implicit Cartesian products.
      
      TODO:
       - [ ] Python
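Ignoring Catalyst entirely, the row-level semantics of `explode` can be sketched in plain Python (rows as dicts; illustrative only):

```python
def explode(rows, col):
    """Plain-Python sketch of explode: emit one output row per element of row[col]."""
    out = []
    for row in rows:
        for value in row[col]:
            new_row = dict(row)
            new_row[col] = value
            out.append(new_row)
    return out
```

The restriction on multiple generators per select exists because two explodes in one clause would implicitly cross-join their outputs.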
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #6107 from marmbrus/explodeFunction and squashes the following commits:
      
      7ee2c87 [Michael Armbrust] whitespace
      6f80ba3 [Michael Armbrust] Update dataframe.py
      c176c89 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      81b5da3 [Michael Armbrust] style
      d3faa05 [Michael Armbrust] fix self join case
      f9e1e3e [Michael Armbrust] fix python, add since
      4f0d0a9 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
      e710fe4 [Michael Armbrust] add java and python
      52ca0dc [Michael Armbrust] [SPARK-7548][SQL] Add explode function for dataframes.
      6d0633e3
    • [SPARK-7619] [PYTHON] fix docstring signature · 48fc38f5
      Xiangrui Meng authored
      Just realized that we need `\` at the end of the docstring. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6161 from mengxr/SPARK-7619 and squashes the following commits:
      
      e44495f [Xiangrui Meng] fix docstring signature
      48fc38f5
    • [SPARK-7648] [MLLIB] Add weights and intercept to GLM wrappers in spark.ml · 723853ed
      Xiangrui Meng authored
      Otherwise, users can only use `transform` on the models. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6156 from mengxr/SPARK-7647 and squashes the following commits:
      
      1ae3d2d [Xiangrui Meng] add weights and intercept to LogisticRegression in Python
      f49eb46 [Xiangrui Meng] add weights and intercept to LinearRegressionModel
      723853ed
    • [SPARK-7645] [STREAMING] [WEBUI] Show milliseconds in the UI if the batch interval < 1 second · b208f998
      zsxwing authored
      I also updated the summary of the Streaming page.
      
      ![screen shot 2015-05-14 at 11 52 59 am](https://cloud.githubusercontent.com/assets/1000778/7640103/13cdf68e-fa36-11e4-84ec-e2a3954f4319.png)
      ![screen shot 2015-05-14 at 12 39 33 pm](https://cloud.githubusercontent.com/assets/1000778/7640151/4cc066ac-fa36-11e4-8494-2821d6a6f17c.png)
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6154 from zsxwing/SPARK-7645 and squashes the following commits:
      
      5db6ca1 [zsxwing] Add UIUtils.formatBatchTime
      e4802df [zsxwing] Show milliseconds in the UI if the batch interval < 1 second
      b208f998
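The formatting rule is simple: append milliseconds only when the batch interval is below one second. A Python sketch of that rule (hypothetical helper, not the actual `UIUtils.formatBatchTime` signature):

```python
import time

def format_batch_time(batch_time_ms, batch_interval_ms):
    """Show milliseconds in the formatted time only when the interval < 1 second."""
    base = time.strftime("%Y/%m/%d %H:%M:%S", time.localtime(batch_time_ms // 1000))
    if batch_interval_ms < 1000:
        # Sub-second batches need the millisecond part to be distinguishable.
        return "%s.%03d" % (base, batch_time_ms % 1000)
    return base
```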
    • [SPARK-7649] [STREAMING] [WEBUI] Use window.localStorage to store the status rather than the url · 0a317c12
      zsxwing authored
      Use window.localStorage to store the status rather than the url so that the url won't be changed.
      
      cc tdas
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6158 from zsxwing/SPARK-7649 and squashes the following commits:
      
      3c56fef [zsxwing] Use window.localStorage to store the status rather than the url
      0a317c12
    • [SPARK-7643] [UI] use the correct size in RDDPage for storage info and partitions · 57ed16cf
      Xiangrui Meng authored
      `dataDistribution` and `partitions` are `Option[Seq[_]]`. andrewor14 squito
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6157 from mengxr/SPARK-7643 and squashes the following commits:
      
      99fe8a4 [Xiangrui Meng] use the correct size in RDDPage for storage info and partitions
      57ed16cf
    • [SPARK-7598] [DEPLOY] Add aliveWorkers metrics in Master · 93dbb3ad
      Rex Xiong authored
      In a Spark Standalone setup, when some workers are DEAD, they stay in the master's worker list for a while.
      The master.workers metric only shows the total number of workers; we need to monitor how many workers are actually ALIVE to ensure the cluster is healthy.
      
      Author: Rex Xiong <pengx@microsoft.com>
      
      Closes #6117 from twilightgod/add-aliveWorker-metrics and squashes the following commits:
      
      6be69a5 [Rex Xiong] Fix comment for aliveWorkers metrics
      a882f39 [Rex Xiong] Fix style for aliveWorkers metrics
      38ce955 [Rex Xiong] Add aliveWorkers metrics in Master
      93dbb3ad
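The new metric is a filtered count rather than a list length. A minimal Python sketch (states as plain strings; illustrative only):

```python
def alive_workers(worker_states):
    """aliveWorkers metric sketch: count only workers whose state is ALIVE."""
    return sum(1 for state in worker_states if state == "ALIVE")
```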
    • Make SPARK prefix a variable · 11a1a135
      tedyu authored
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #6153 from ted-yu/master and squashes the following commits:
      
      4e0bac5 [tedyu] Use JIRA_PROJECT_NAME as variable name
      ab982aa [tedyu] Make SPARK prefix a variable
      11a1a135
    • [SPARK-7278] [PySpark] DateType should find datetime.datetime acceptable · 5d7d4f88
      ksonj authored
      DateType should not be restricted to `datetime.date` but accept `datetime.datetime` objects as well. Could someone with a little more insight verify this?
      
      Author: ksonj <kson@siberie.de>
      
      Closes #6057 from ksonj/dates and squashes the following commits:
      
      68a158e [ksonj] DateType should find datetime.datetime acceptable too
      5d7d4f88
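The relaxed acceptance rule can be sketched in Python; note that `datetime.datetime` subclasses `datetime.date`, so the order of the checks matters (illustrative, not PySpark's actual converter):

```python
import datetime

def to_date(value):
    """Relaxed DateType check: accept datetime.datetime as well as datetime.date."""
    # datetime.datetime is a subclass of datetime.date, so test it first
    # and truncate the time component.
    if isinstance(value, datetime.datetime):
        return value.date()
    if isinstance(value, datetime.date):
        return value
    raise TypeError("DateType cannot accept an object of type %s" % type(value).__name__)
```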
    • [SQL][minor] rename apply for QueryPlanner · f2cd00be
      Wenchen Fan authored
      A follow-up of https://github.com/apache/spark/pull/5624
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #6142 from cloud-fan/tmp and squashes the following commits:
      
      971a92b [Wenchen Fan] use plan instead of execute
      24c5ffe [Wenchen Fan] rename apply
      f2cd00be
    • [SPARK-7249] Updated Hadoop dependencies due to inconsistency in the versions · 7fb715de
      FavioVazquez authored
      Updated the Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards-compatibility reasons.
      
      These changes were proposed by vanzin following the previous pull request https://github.com/apache/spark/pull/5783, which did not fix the problem correctly.
      
      Please let me know if this is the correct way of doing this; vanzin's comments are in the pull request mentioned above.
      
      Author: FavioVazquez <favio.vazquezp@gmail.com>
      
      Closes #5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits:
      
      11670e5 [FavioVazquez] - Added missing instance of -Phadoop-2.2 in create-release.sh
      379f50d [FavioVazquez] - Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md - Reconstructed docs to not ask users to rely on default behavior
      3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies
      31bdafa [FavioVazquez] - Added missing instances in -Phadoop-1 in create-release.sh, run-tests and in the building-spark documentation
      cbb93e8 [FavioVazquez] - Added comment related to SPARK-3710 about  hadoop-yarn-server-tests in Hadoop 2.2 that fails to pull some needed dependencies
      83dc332 [FavioVazquez] - Cleaned up the main POM concerning the yarn profile - Erased hadoop-2.2 profile from yarn/pom.xml and its content was integrated into yarn/pom.xml
      93f7624 [FavioVazquez] - Deleted unnecessary comments and <activation> tag on the YARN profile in the main POM
      668d126 [FavioVazquez] - Moved <dependencies> <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM - Erased unnecessary hadoop-2.2 profile from the YARN POM
      fda6a51 [FavioVazquez] - Updated hadoop1 releases in create-release.sh  due to changes in the default hadoop version set - Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh - Prettify comment in yarn/pom.xml
      0470587 [FavioVazquez] - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh - Updated how the releases are made in the create-release.sh no that the default hadoop version is the 2.2.0 - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests - Better example given in the hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0
      a650779 [FavioVazquez] - Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml - Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml
      199f40b [FavioVazquez] - Erased unnecessary CDH5-specific note in docs/building-spark.md - Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md - Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default .Added comment in the yarn/pom.xml to specify that.
      88a8b88 [FavioVazquez] - Simplified Hadoop profiles due to new setting of global properties in the pom.xml file - Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file - Erased hadoop-2.2 from related hadoop profiles now that is a no-op in the make-distribution.sh file
      70b8344 [FavioVazquez] - Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles
      287fa2f [FavioVazquez] - Updated documentation about specifying the hadoop version in building-spark. Now is clear that Spark will build against Hadoop 2.2.0 by default. - Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc.
      1354292 [FavioVazquez] - Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation
      6b4bfaf [FavioVazquez] - Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff.
      7e9955d [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      660decc [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
      ec91ce3 [FavioVazquez] - Updated protobuf-java version of the com.google.protobuf dependency to fix a blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for the 2.5.0-cdh5.3.3 version)
      7fb715de
    • DB Tsai's avatar
      [SPARK-7568] [ML] ml.LogisticRegression doesn't output the right prediction · c1080b6f
      DB Tsai authored
      The difference is because we previously didn't fit the intercept in Spark 1.3. Here, we change the input `String` so that the probability of instance 6 can be classified as `1.0` without any ambiguity.
      
      With lambda = 0.001 in the current LOR implementation, the prediction is
      ```
      (4, spark i j k) --> prob=[0.1596407738787411,0.8403592261212589], prediction=1.0
      (5, l m n) --> prob=[0.8378325685476612,0.16216743145233883], prediction=0.0
      (6, spark hadoop spark) --> prob=[0.0692663313297627,0.9307336686702373], prediction=1.0
      (7, apache hadoop) --> prob=[0.9821575333444208,0.01784246665557917], prediction=0.0
      ```
      and the predictions on the training data are
      ```
      (0, a b c d e spark) --> prob=[0.0021342419881406746,0.9978657580118594], prediction=1.0
      (1, b d) --> prob=[0.9959176174854043,0.004082382514595685], prediction=0.0
      (2, spark f g h) --> prob=[0.0014541569986711233,0.9985458430013289], prediction=1.0
      (3, hadoop mapreduce) --> prob=[0.9982978367343561,0.0017021632656438518], prediction=0.0
      ```
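      The two-class probabilities printed above come from the logistic (sigmoid) function applied to the model's margin. A minimal self-contained sketch of that relationship (plain Python, not the actual spark.ml implementation; `predict_proba` and `predict` are illustrative names):

      ```python
      import math

      def predict_proba(margin):
          """Two-class probabilities from a logistic-regression margin
          (w.x + intercept). P(class 1) is sigmoid(margin)."""
          p1 = 1.0 / (1.0 + math.exp(-margin))
          return [1.0 - p1, p1]

      def predict(margin, threshold=0.5):
          """Predicted label: 1.0 when P(class 1) exceeds the threshold."""
          return 1.0 if predict_proba(margin)[1] > threshold else 0.0

      # A margin of 0 is maximally ambiguous: prob=[0.5, 0.5].
      # A clearly positive margin classifies as 1.0 without ambiguity,
      # which is the behavior this change targets for instance 6 above.
      ```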
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #6109 from dbtsai/lor-example and squashes the following commits:
      
      ac63ce4 [DB Tsai] first commit
      c1080b6f
    • Xiangrui Meng's avatar
      [SPARK-7407] [MLLIB] use uid + name to identify parameters · 1b8625f4
      Xiangrui Meng authored
      A param instance is strongly attached to a parent in the current implementation. So if we make a copy of an estimator or a transformer in pipelines and other meta-algorithms, it becomes error-prone to copy the params to the copied instances. In this PR, a param is identified by its parent's UID and the param name. So it becomes loosely attached to its parent and all its derivatives. The UID is preserved during copying or fitting. All components now have a default constructor and a constructor that takes a UID as input. I keep the constructors for Param in this PR to reduce the amount of diff and moved `parent` as a mutable field.
      
      This PR still needs some clean-ups, and there are several spark.ml PRs pending. I'll try to get them merged first and then update this PR.
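      The uid + name identification scheme can be illustrated with a small sketch (plain Python with hypothetical names, not the actual spark.ml API): a param compares equal by its parent's UID plus its own name, so a copy that preserves the UID still owns the same params.

      ```python
      import uuid

      class Param:
          """A param identified by its parent's UID and its own name."""
          def __init__(self, parent_uid, name):
              self.parent = parent_uid
              self.name = name

          def __eq__(self, other):
              # Loose attachment: identity is (parent UID, name), not object identity.
              return (self.parent, self.name) == (other.parent, other.name)

          def __hash__(self):
              return hash((self.parent, self.name))

      class Estimator:
          def __init__(self, uid=None):
              # The UID is generated once and preserved when copying.
              self.uid = uid if uid is not None else "est_" + uuid.uuid4().hex[:8]
              self.maxIter = Param(self.uid, "maxIter")

          def copy(self):
              # The copy keeps the same UID, so its params still match the original's.
              return Estimator(uid=self.uid)

      est = Estimator()
      cloned = est.copy()
      assert cloned.maxIter == est.maxIter  # uid + name match across copies
      ```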
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6019 from mengxr/SPARK-7407 and squashes the following commits:
      
      c4c8120 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7407
      520f0a2 [Xiangrui Meng] address comments
      2569168 [Xiangrui Meng] fix tests
      873caca [Xiangrui Meng] fix tests in OneVsRest; fix a race condition in shouldOwn
      409ea08 [Xiangrui Meng] minor updates
      83a163c [Xiangrui Meng] update JavaDeveloperApiExample
      5db5325 [Xiangrui Meng] update OneVsRest
      7bde7ae [Xiangrui Meng] merge master
      697fdf9 [Xiangrui Meng] update Bucketizer
      7b4f6c2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7407
      629d402 [Xiangrui Meng] fix LRSuite
      154516f [Xiangrui Meng] merge master
      aa4a611 [Xiangrui Meng] fix examples/compile
      a4794dd [Xiangrui Meng] change Param to use  to reduce the size of diff
      fdbc415 [Xiangrui Meng] all tests passed
      c255f17 [Xiangrui Meng] fix tests in ParamsSuite
      818e1db [Xiangrui Meng] merge master
      e1160cf [Xiangrui Meng] fix tests
      fbc39f0 [Xiangrui Meng] pass test:compile
      108937e [Xiangrui Meng] pass compile
      8726d39 [Xiangrui Meng] use parent uid in Param
      eaeed35 [Xiangrui Meng] update Identifiable
      1b8625f4