  1. Sep 09, 2015
    • Jeff Zhang's avatar
      [SPARK-10481] [YARN] SPARK_PREPEND_CLASSES make spark-yarn related jar could n… · c0052d8d
      Jeff Zhang authored
      Throw a more readable exception. Please help review. Thanks
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #8649 from zjffdu/SPARK-10481.
      c0052d8d
    • lewuathe's avatar
      [SPARK-10117] [MLLIB] Implement SQL data source API for reading LIBSVM data · 2ddeb631
      lewuathe authored
It is convenient to implement a data source API for the LIBSVM format to get better integration with DataFrames and the ML pipeline API.
      
Two options are implemented (see the usage sketch below):
* `numFeatures`: Specify the dimension of the features vector.
* `featuresType`: Specify the type of the output vector; `sparse` is the default.
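
A minimal usage sketch, assuming a `sqlContext` in scope (e.g., in spark-shell), that the source is registered under the short name `libsvm`, and that the option names follow the PR description above; the path and feature count are illustrative:

```scala
// Read LIBSVM-formatted data through the SQL data source API.
val df = sqlContext.read
  .format("libsvm")
  .option("numFeatures", "780")      // dimension of the features vector
  .option("featuresType", "sparse")  // "sparse" (default) or "dense"
  .load("data/mllib/sample_libsvm_data.txt")

df.printSchema()  // expect a label column and a features vector column
```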
      
      Author: lewuathe <lewuathe@me.com>
      
      Closes #8537 from Lewuathe/SPARK-10117 and squashes the following commits:
      
      986999d [lewuathe] Change unit test phrase
      11d513f [lewuathe] Fix some reviews
      21600a4 [lewuathe] Merge branch 'master' into SPARK-10117
      9ce63c7 [lewuathe] Rewrite service loader file
      1fdd2df [lewuathe] Merge branch 'SPARK-10117' of github.com:Lewuathe/spark into SPARK-10117
      ba3657c [lewuathe] Merge branch 'master' into SPARK-10117
      0ea1c1c [lewuathe] LibSVMRelation is registered into META-INF
      4f40891 [lewuathe] Improve test suites
      5ab62ab [lewuathe] Merge branch 'master' into SPARK-10117
      8660d0e [lewuathe] Fix Java unit test
      b56a948 [lewuathe] Merge branch 'master' into SPARK-10117
      2c12894 [lewuathe] Remove unnecessary tag
      7d693c2 [lewuathe] Resolv conflict
      62010af [lewuathe] Merge branch 'master' into SPARK-10117
      a97ee97 [lewuathe] Fix some points
      aef9564 [lewuathe] Fix
      70ee4dd [lewuathe] Add Java test
      3fd8dce [lewuathe] [SPARK-10117] Implement SQL data source API for reading LIBSVM data
      40d3027 [lewuathe] Add Java test
      7056d4a [lewuathe] Merge branch 'master' into SPARK-10117
      99accaa [lewuathe] [SPARK-10117] Implement SQL data source API for reading LIBSVM data
      2ddeb631
    • Luc Bourlier's avatar
      [SPARK-10227] fatal warnings with sbt on Scala 2.11 · c1bc4f43
      Luc Bourlier authored
The bulk of the changes concern the `transient` annotation on class parameters. Often the compiler doesn't generate a field for these parameters, so the transient annotation would be unnecessary.
But if a class parameter is used in methods, a field is created, so it is safer to keep the annotations.
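
For illustration, a minimal sketch of that behaviour (hypothetical class, not Spark code):

```scala
// `settings` is a plain constructor parameter, not a val. It only gets a backing
// field (and therefore something @transient can apply to) because `describe`
// uses it outside the constructor; without that usage the compiler generates no
// field and warns that the annotation has no valid target.
class Component(@transient settings: Map[String, String]) extends Serializable {
  def describe: String = s"component with ${settings.size} settings"
}
```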
      
The remainder of the changes address some potential bugs and deprecated syntax.
      
      Author: Luc Bourlier <luc.bourlier@typesafe.com>
      
      Closes #8433 from skyluc/issue/sbt-2.11.
      c1bc4f43
    • Yuhao Yang's avatar
      [SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User Guide · 91a577d2
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-10249
      
Update the user guide, since Python support was added.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #8620 from hhbyyh/swPyDocExample.
      91a577d2
    • Holden Karau's avatar
      [SPARK-9654] [ML] [PYSPARK] Add IndexToString to PySpark · 2f6fd525
      Holden Karau authored
      Adds IndexToString to PySpark.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #7976 from holdenk/SPARK-9654-add-string-indexer-inverse-in-pyspark.
      2f6fd525
  2. Sep 08, 2015
    • noelsmith's avatar
      [SPARK-10094] Pyspark ML Feature transformers marked as experimental · 0e2f2163
      noelsmith authored
      Modified class-level docstrings to mark all feature transformers in pyspark.ml as experimental.
      
      Author: noelsmith <mail@noelsmith.com>
      
      Closes #8623 from noel-smith/SPARK-10094-mark-pyspark-ml-trans-exp.
      0e2f2163
    • Davies Liu's avatar
      [SPARK-10373] [PYSPARK] move @since into pyspark from sql · 3a11e50e
      Davies Liu authored
      cc mengxr
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8657 from davies/move_since.
      3a11e50e
    • Yanbo Liang's avatar
      [SPARK-10464] [MLLIB] Add WeibullGenerator for RandomDataGenerator · a1573489
      Yanbo Liang authored
      Add WeibullGenerator for RandomDataGenerator.
#8611 needs to use WeibullGenerator to generate random data based on the Weibull distribution.
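
For context, a standalone sketch of drawing Weibull samples via inverse-transform sampling; the function name and parameterization are illustrative, not the actual RandomDataGenerator API:

```scala
import scala.util.Random

// Inverse-transform sampling: if U ~ Uniform(0, 1), then
// beta * (-ln(1 - U))^(1 / alpha) follows Weibull(shape = alpha, scale = beta).
def weibullSample(alpha: Double, beta: Double, rng: Random = new Random()): Double =
  beta * math.pow(-math.log(1.0 - rng.nextDouble()), 1.0 / alpha)
```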
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8622 from yanboliang/spark-10464.
      a1573489
    • Xiangrui Meng's avatar
      [SPARK-9834] [MLLIB] implement weighted least squares via normal equation · 52fe32f6
      Xiangrui Meng authored
      The goal of this PR is to have a weighted least squares implementation that takes the normal equation approach, and hence to be able to provide R-like summary statistics and support IRLS (used by GLMs). The tests match R's lm and glmnet.
      
There are a couple of TODOs that can be addressed in future PRs:
      * consolidate summary statistics aggregators
      * move `dspr` to `BLAS`
      * etc
      
It would be nice to have this merged first because it blocks a couple of other features.
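
For reference (background math, not taken from the PR), the unregularized weighted least squares problem and its normal-equation solution are:

```
\min_{\beta} \sum_{i=1}^{n} w_i \left( x_i^\top \beta - y_i \right)^2
\quad\Longrightarrow\quad
\hat{\beta} = \left( X^\top W X \right)^{-1} X^\top W y,
\qquad W = \operatorname{diag}(w_1, \dots, w_n)
```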
      
      dbtsai
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8588 from mengxr/SPARK-9834.
      52fe32f6
    • zsxwing's avatar
      [SPARK-10071] [STREAMING] Output a warning when writing QueueInputDStream and... · 820913f5
      zsxwing authored
      [SPARK-10071] [STREAMING] Output a warning when writing QueueInputDStream and throw a better exception when reading QueueInputDStream
      
Output a warning when serializing QueueInputDStream rather than throwing an exception, to allow unit tests to use it. Moreover, this PR also throws a better exception when deserializing QueueInputDStream, so the user can find the problem easily. The previous exception is hard to understand: https://issues.apache.org/jira/browse/SPARK-8553
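
A self-contained sketch of the serialization pattern described (illustrative class and message, not the actual QueueInputDStream code): warn on write, fail with a readable message on read.

```scala
import java.io.{NotSerializableException, ObjectInputStream, ObjectOutputStream}

class QueueBackedStream extends Serializable {
  @transient private val queue = scala.collection.mutable.Queue[Int]()

  // Writing only logs a warning, so tests that serialize this incidentally still run.
  private def writeObject(oos: ObjectOutputStream): Unit = {
    Console.err.println("WARN: queueStream doesn't support checkpointing")
    oos.defaultWriteObject()
  }

  // Reading fails with an explicit, readable message instead of an obscure error later.
  private def readObject(ois: ObjectInputStream): Unit =
    throw new NotSerializableException(
      "queueStream doesn't support checkpointing; don't use it when checkpointing is enabled.")
}
```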
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8624 from zsxwing/SPARK-10071 and squashes the following commits:
      
      847cfa8 [zsxwing] Output a warning when writing QueueInputDStream and throw a better exception when reading QueueInputDStream
      820913f5
    • Reynold Xin's avatar
      [RELEASE] Add more contributors & only show names in release notes. · ae74c3fa
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8660 from rxin/contrib.
      ae74c3fa
    • Michael Armbrust's avatar
      [HOTFIX] Fix build break caused by #8494 · 2143d592
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #8659 from marmbrus/testBuildBreak.
      2143d592
    • Cheng Hao's avatar
      [SPARK-10327] [SQL] Cache Table is not working while subquery has alias in its project list · d637a666
      Cheng Hao authored
      ```scala
          import org.apache.spark.sql.hive.execution.HiveTableScan
          sql("select key, value, key + 1 from src").registerTempTable("abc")
          cacheTable("abc")
      
          val sparkPlan = sql(
            """select a.key, b.key, c.key from
              |abc a join abc b on a.key=b.key
              |join abc c on a.key=c.key""".stripMargin).queryExecution.sparkPlan
      
          assert(sparkPlan.collect { case e: InMemoryColumnarTableScan => e }.size === 3) // failed
          assert(sparkPlan.collect { case e: HiveTableScan => e }.size === 0) // failed
      ```
      
      The actual plan is:
      
      ```
      == Parsed Logical Plan ==
      'Project [unresolvedalias('a.key),unresolvedalias('b.key),unresolvedalias('c.key)]
       'Join Inner, Some(('a.key = 'c.key))
        'Join Inner, Some(('a.key = 'b.key))
         'UnresolvedRelation [abc], Some(a)
         'UnresolvedRelation [abc], Some(b)
        'UnresolvedRelation [abc], Some(c)
      
      == Analyzed Logical Plan ==
      key: int, key: int, key: int
      Project [key#14,key#61,key#66]
       Join Inner, Some((key#14 = key#66))
        Join Inner, Some((key#14 = key#61))
         Subquery a
          Subquery abc
           Project [key#14,value#15,(key#14 + 1) AS _c2#16]
            MetastoreRelation default, src, None
         Subquery b
          Subquery abc
           Project [key#61,value#62,(key#61 + 1) AS _c2#58]
            MetastoreRelation default, src, None
        Subquery c
         Subquery abc
          Project [key#66,value#67,(key#66 + 1) AS _c2#63]
           MetastoreRelation default, src, None
      
      == Optimized Logical Plan ==
      Project [key#14,key#61,key#66]
       Join Inner, Some((key#14 = key#66))
        Project [key#14,key#61]
         Join Inner, Some((key#14 = key#61))
          Project [key#14]
           InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc)
          Project [key#61]
           MetastoreRelation default, src, None
        Project [key#66]
         MetastoreRelation default, src, None
      
      == Physical Plan ==
      TungstenProject [key#14,key#61,key#66]
       BroadcastHashJoin [key#14], [key#66], BuildRight
        TungstenProject [key#14,key#61]
         BroadcastHashJoin [key#14], [key#61], BuildRight
          ConvertToUnsafe
           InMemoryColumnarTableScan [key#14], (InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc))
          ConvertToUnsafe
           HiveTableScan [key#61], (MetastoreRelation default, src, None)
        ConvertToUnsafe
         HiveTableScan [key#66], (MetastoreRelation default, src, None)
      ```
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #8494 from chenghao-intel/weird_cache.
      d637a666
    • Tathagata Das's avatar
      [SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about... · 52b24a60
      Tathagata Das authored
      [SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure
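
A minimal sketch of the kind of settings the documentation covers (the values are illustrative; `spark.streaming.backpressure.enabled` turns on backpressure, `spark.streaming.receiver.maxRate` is the static rate limit):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true") // let Spark adapt the ingestion rate
  .set("spark.streaming.receiver.maxRate", "10000")    // static cap on records/sec per receiver
```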
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #8656 from tdas/SPARK-10492 and squashes the following commits:
      
      986cdd6 [Tathagata Das] Added information on backpressure
      52b24a60
    • Vinod K C's avatar
      [SPARK-10468] [ MLLIB ] Verify schema before Dataframe select API call · e6f8d368
      Vinod K C authored
Loader.checkSchema was called to verify the schema after dataframe.select(...); schema verification should be done before dataframe.select(...).
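
A hedged sketch of that ordering (generic helper; the helper and column handling are illustrative, not the actual GaussianMixtureModel loader code):

```scala
import org.apache.spark.sql.DataFrame

// Check that the required columns exist before calling select, so a malformed
// file fails with a clear message rather than an error from select itself.
def selectChecked(df: DataFrame, required: Seq[String]): DataFrame = {
  val missing = required.filterNot(c => df.columns.contains(c))
  require(missing.isEmpty, s"Expected columns ${required.mkString(", ")}; missing: ${missing.mkString(", ")}")
  df.select(required.head, required.tail: _*)
}
```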
      
      Author: Vinod K C <vinod.kc@huawei.com>
      
      Closes #8636 from vinodkc/fix_GaussianMixtureModel_load_verification.
      e6f8d368
    • Yin Huai's avatar
      [SPARK-10441] [SQL] Save data correctly to json. · 7a9dcbc9
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-10441
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8597 from yhuai/timestampJson.
      7a9dcbc9
    • Yanbo Liang's avatar
      [SPARK-10470] [ML] ml.IsotonicRegressionModel.copy should set parent · f7b55dbf
      Yanbo Liang authored
A copied model must have the same parent, but ml.IsotonicRegressionModel.copy did not set the parent.
This fixes it and adds a test case.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8637 from yanboliang/spark-10470.
      f7b55dbf
    • Wenchen Fan's avatar
      [SPARK-10316] [SQL] respect nondeterministic expressions in PhysicalOperation · 5fd57955
      Wenchen Fan authored
We did a lot of special handling for non-deterministic expressions in `Optimizer`. However, `PhysicalOperation` just collects all Projects and Filters and messes up the order. We should respect the operator ordering imposed by non-deterministic expressions in `PhysicalOperation`.
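
A small, hypothetical illustration of a plan where the order matters (not taken from the PR): if the filter below were collected and pushed beneath the project, `rand()` would be re-evaluated and the filter would see different values than those returned.

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.rand

// The filter must stay above the project that computes the non-deterministic column.
def example(sqlContext: SQLContext) =
  sqlContext.range(0, 100)
    .select(rand().as("r"))
    .filter("r > 0.5")
```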
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8486 from cloud-fan/fix.
      5fd57955
    • Yanbo Liang's avatar
      [SPARK-10480] [ML] Fix ML.LinearRegressionModel.copy() · 5b2192e8
      Yanbo Liang authored
This PR fixes two model `copy()`-related issues:
[SPARK-10480](https://issues.apache.org/jira/browse/SPARK-10480)
`ML.LinearRegressionModel.copy()` ignored the argument `extra`, so it did not take effect when users set this parameter.
[SPARK-10479](https://issues.apache.org/jira/browse/SPARK-10479)
`ML.LogisticRegressionModel.copy()` should copy the model summary if available.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8641 from yanboliang/linear-regression-copy.
      5b2192e8
    • Liang-Chi Hsieh's avatar
      [SPARK-9170] [SQL] Use OrcStructInspector to be case preserving when writing ORC files · 990c9f79
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9170
      
`StandardStructObjectInspector` will implicitly lowercase column names, but I don't think the ORC format has such a requirement. In fact, there is an `OrcStructInspector` specified for the ORC format. We should use it when serializing rows to ORC files, since it preserves case when writing.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7520 from viirya/use_orcstruct.
      990c9f79
    • Jacek Laskowski's avatar
      Docs small fixes · 6ceed852
      Jacek Laskowski authored
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #8629 from jaceklaskowski/docs-fixes.
      6ceed852
    • Stephen Hopper's avatar
      [DOC] Added R to the list of languages with "high-level API" support in the… · 9d8e838d
      Stephen Hopper authored
      … main README.
      
      Author: Stephen Hopper <shopper@shopper-osx.local>
      
      Closes #8646 from enragedginger/master.
      9d8e838d
  3. Sep 07, 2015
    • Reynold Xin's avatar
      [SPARK-9767] Remove ConnectionManager. · 5ffe752b
      Reynold Xin authored
We introduced the Netty network module for shuffle in Spark 1.2, and it has been on by default for 3 releases. The old ConnectionManager is difficult to maintain. If we merge the patch now, by the time it is released, ConnectionManager will have been off by default for a year. It's time to remove it.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8161 from rxin/SPARK-9767.
      5ffe752b
  4. Sep 05, 2015
  5. Sep 04, 2015
    • Yin Huai's avatar
      [SPARK-9925] [SQL] [TESTS] Set SQLConf.SHUFFLE_PARTITIONS.key correctly for tests · 47058ca5
      Yin Huai authored
This PR fixes the failed test and the conflict for #8155.
      
      https://issues.apache.org/jira/browse/SPARK-9925
      
      Closes #8155
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8602 from davies/shuffle_partitions.
      47058ca5
    • Holden Karau's avatar
      [SPARK-10402] [DOCS] [ML] Add defaults to the scaladoc for params in ml/ · 22eab706
      Holden Karau authored
We should make sure the scaladoc for params includes their default values for the models in ml/.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8591 from holdenk/SPARK-10402-add-scaladoc-for-default-values-of-params-in-ml.
      22eab706
    • xutingjun's avatar
      [SPARK-10311] [STREAMING] Reload appId and attemptId when app starts with... · eafe3723
      xutingjun authored
      [SPARK-10311] [STREAMING] Reload appId and attemptId when app starts with checkpoint file in cluster mode
      
      Author: xutingjun <xutingjun@huawei.com>
      
      Closes #8477 from XuTingjun/streaming-attempt.
      eafe3723
    • robbins's avatar
      [SPARK-10454] [SPARK CORE] wait for empty event queue · 2e1c1755
      robbins authored
      Author: robbins <robbins@uk.ibm.com>
      
      Closes #8605 from robbinspg/DAGSchedulerSuite-fix.
      2e1c1755
    • Timothy Chen's avatar
      [SPARK-9669] [MESOS] Support PySpark on Mesos cluster mode. · b087d23e
      Timothy Chen authored
      Support running pyspark with cluster mode on Mesos!
This doesn't upload any scripts, so running against a remote Mesos cluster requires the user to specify the script from an available URI.
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #8349 from tnachen/mesos_python.
      b087d23e
    • Andrew Or's avatar
      [SPARK-10450] [SQL] Minor improvements to readability / style / typos etc. · 3339e6f6
      Andrew Or authored
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8603 from andrewor14/minor-sql-changes.
      3339e6f6
    • Wenchen Fan's avatar
      [SPARK-10176] [SQL] Show partially analyzed plans when checkAnswer fails to analyze · c3c0e431
      Wenchen Fan authored
      This PR takes over https://github.com/apache/spark/pull/8389.
      
      This PR improves `checkAnswer` to print the partially analyzed plan in addition to the user friendly error message, in order to aid debugging failing tests.
      
In doing so, I ran into a conflict with the various ways that we bring a SQLContext into the tests. Depending on the trait, we refer to the current context as `sqlContext`, `_sqlContext`, `ctx`, or `hiveContext`, with access modifiers `public`, `protected`, and `private` depending on the defining class.
      
      I propose we refactor as follows:
      
      1. All tests should only refer to a `protected sqlContext` when testing general features, and `protected hiveContext` when it is a method that only exists on a `HiveContext`.
      2. All tests should only import `testImplicits._` (i.e., don't import `TestHive.implicits._`)
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8584 from cloud-fan/cleanupTests.
      c3c0e431
    • Michael Armbrust's avatar
      MAINTENANCE: Automated closing of pull requests. · 804a0126
      Michael Armbrust authored
      This commit exists to close the following pull requests on Github:
      
      Closes #1890 (requested by andrewor14, JoshRosen)
      Closes #3558 (requested by JoshRosen, marmbrus)
      Closes #3890 (requested by marmbrus)
      Closes #3895 (requested by andrewor14, marmbrus)
      Closes #4055 (requested by andrewor14)
      Closes #4105 (requested by andrewor14)
      Closes #4812 (requested by marmbrus)
      Closes #5109 (requested by andrewor14)
      Closes #5178 (requested by andrewor14)
      Closes #5298 (requested by marmbrus)
      Closes #5393 (requested by marmbrus)
      Closes #5449 (requested by andrewor14)
      Closes #5468 (requested by marmbrus)
      Closes #5715 (requested by marmbrus)
      Closes #6192 (requested by marmbrus)
      Closes #6319 (requested by marmbrus)
      Closes #6326 (requested by marmbrus)
      Closes #6349 (requested by marmbrus)
      Closes #6380 (requested by andrewor14)
      Closes #6554 (requested by marmbrus)
      Closes #6696 (requested by marmbrus)
      Closes #6868 (requested by marmbrus)
      Closes #6951 (requested by marmbrus)
      Closes #7129 (requested by marmbrus)
      Closes #7188 (requested by marmbrus)
      Closes #7358 (requested by marmbrus)
      Closes #7379 (requested by marmbrus)
      Closes #7628 (requested by marmbrus)
      Closes #7715 (requested by marmbrus)
      Closes #7782 (requested by marmbrus)
      Closes #7914 (requested by andrewor14)
      Closes #8051 (requested by andrewor14)
      Closes #8269 (requested by andrewor14)
      Closes #8448 (requested by andrewor14)
      Closes #8576 (requested by andrewor14)
      804a0126
    • Shivaram Venkataraman's avatar
      [MINOR] Minor style fix in SparkR · 143e521d
      Shivaram Venkataraman authored
      `dev/lintr-r` passes on my machine now
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #8601 from shivaram/sparkr-style-fix.
      143e521d
  6. Sep 03, 2015
    • Andrew Or's avatar
      [SPARK-10003] Improve readability of DAGScheduler · cf421386
      Andrew Or authored
      Note: this is not intended to be in Spark 1.5!
      
      This patch rewrites some code in the `DAGScheduler` to make it more readable. In particular
- there were blocks of code that were unnecessary and have been removed for simplicity
- there were abstractions that were unnecessary and made the code hard to navigate
      - other minor changes
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8217 from andrewor14/dag-scheduler-readability and squashes the following commits:
      
      57abca3 [Andrew Or] Move comment back into if case
      574fb1e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-scheduler-readability
      64a9ed2 [Andrew Or] Remove unnecessary code + minor code rewrites
      cf421386
    • Marcelo Vanzin's avatar
      [SPARK-10421] [BUILD] Exclude curator artifacts from tachyon dependencies. · 208fbca1
      Marcelo Vanzin authored
This avoids them being mistakenly pulled in instead of the newer ones that
Spark actually uses. Spark only depends on these artifacts transitively,
so sometimes Maven just decides to pick Tachyon's version of the
dependency for whatever reason.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8577 from vanzin/SPARK-10421.
      208fbca1
    • Andrew Or's avatar
      [SPARK-10435] Spark submit should fail fast for Mesos cluster mode with R · 08b07509
      Andrew Or authored
It's not supported yet, so we should error out with a clear message.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8590 from andrewor14/mesos-cluster-r-guard.
      08b07509
    • jeanlyn's avatar
      [SPARK-9591] [CORE] Job may fail for exception during getting remote block · db4c130f
      jeanlyn authored
      [SPARK-9591](https://issues.apache.org/jira/browse/SPARK-9591)
When getting a broadcast variable, we can fetch the block from several locations, but currently connecting to a lost block manager (e.g., one that was idle long enough to be removed by the driver when using dynamic resource allocation) will cause the task to fail, and in the worst case the job to fail.
      
      Author: jeanlyn <jeanlyn92@gmail.com>
      
      Closes #7927 from jeanlyn/catch_exception.
      db4c130f