  1. Jul 02, 2015
    • [SPARK-8758] [MLLIB] Add Python user guide for PowerIterationClustering · 0a468a46
      Yanbo Liang authored
      Add Python user guide for PowerIterationClustering
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #7155 from yanboliang/spark-8758 and squashes the following commits:
      
      18d803b [Yanbo Liang] address comments
      dd29577 [Yanbo Liang] Add Python user guide for PowerIterationClustering
    • [SPARK-8647] [MLLIB] Potential issue with constant hashCode · 99c40cd0
      Alok Singh authored
I added the code,
  // see [SPARK-8647], this achieves the needed constant hash code without a hard-coded constant
  override def hashCode(): Int = this.getClass.getName.hashCode()

which yields the constant hash code required by the JIRA.
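
      For illustration, a minimal self-contained sketch of this pattern (the `PointUDT` class is hypothetical, standing in for `MatrixUDT`):

      ```scala
      // Minimal sketch of the pattern from the squashed commits; PointUDT is a hypothetical
      // stand-in for MatrixUDT. Pinning the hash to one class literal keeps the value
      // constant even for subclasses, avoiding the class-derivation issue noted below.
      class PointUDT {
        override def equals(other: Any): Boolean = other.isInstanceOf[PointUDT]
        override def hashCode(): Int = classOf[PointUDT].getName.hashCode()
      }
      ```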
      
      Author: Alok  Singh <singhal@Aloks-MacBook-Pro.local>
      
      Closes #7146 from aloknsingh/aloknsingh_SPARK-8647 and squashes the following commits:
      
      e58bccf [Alok  Singh] [SPARK-8647][MLlib] to avoid the class derivation issues, change the constant hashCode to override def hashCode(): Int = classOf[MatrixUDT].getName.hashCode()
      43cdb89 [Alok  Singh] [SPARK-8647][MLlib] Potential issue with constant hashCode
    • [SPARK-8690] [SQL] Add a setting to disable SparkSQL parquet schema merge by using datasource API · 246265f2
      Wisely Chen authored
      The detailed problem description is in https://issues.apache.org/jira/browse/SPARK-8690

      Generally speaking, I added a config spark.sql.parquet.mergeSchema to support sqlContext.load("parquet", Map("path" -> "...", "mergeSchema" -> "false")).

      It is a simple flag without any side effects.
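
      As a hedged, spark-shell-style illustration (the path and app name are placeholders, not from the patch), the per-load option and the new flag might be used like this:

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.SQLContext

      // Sketch: disable Parquet schema merging for a single load via the data source
      // option described above; the path is a placeholder.
      val sc = new SparkContext(new SparkConf().setAppName("merge-schema-example").setMaster("local[*]"))
      val sqlContext = new SQLContext(sc)

      val df = sqlContext.load("parquet", Map("path" -> "/tmp/events", "mergeSchema" -> "false"))

      // The global default could also be controlled through the new SQL config flag.
      sqlContext.setConf("spark.sql.parquet.mergeSchema", "false")
      ```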
      
      Author: Wisely Chen <wiselychen@appier.com>
      
      Closes #7070 from thegiive/SPARK8690 and squashes the following commits:
      
      c6f3e86 [Wisely Chen] Refactor some code style and merge the test case to ParquetSchemaMergeConfigSuite
      94c9307 [Wisely Chen] Remove some style problem
      db8ef1b [Wisely Chen] Change config to SQLConf and add test case
      b6806fb [Wisely Chen] remove text
      c0edb8c [Wisely Chen] [SPARK-8690] add a config spark.sql.parquet.mergeSchema to disable datasource API schema merge feature.
    • [SPARK-8746] [SQL] update download link for Hive 0.13.1 · 1bbdf9ea
      Christian Kadner authored
      updated the [Hive 0.13.1](https://archive.apache.org/dist/hive/hive-0.13.1) download link in `sql/README.md`
      
      Author: Christian Kadner <ckadner@us.ibm.com>
      
      Closes #7144 from ckadner/SPARK-8746 and squashes the following commits:
      
      65d80f7 [Christian Kadner] [SPARK-8746][SQL] update download link for Hive 0.13.1
    • [SPARK-8787] [SQL] Changed parameter order of @deprecated in package object sql · c572e256
      Vinod K C authored
      The parameter order of the @deprecated annotation in package object sql is wrong:
      deprecated("1.3.0", "use DataFrame").

      This has to be changed to deprecated("use DataFrame", "1.3.0").
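
      For reference, Scala declares the annotation as `class deprecated(message: String = "", since: String = "")`, so the message comes first; a small sketch with hypothetical API names:

      ```scala
      object Example {
        // Correct order: deprecation message first, version second.
        @deprecated("use newApi instead", "1.3.0")
        def oldApi(): Int = newApi()

        def newApi(): Int = 42
      }
      ```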
      
      Author: Vinod K C <vinod.kc@huawei.com>
      
      Closes #7183 from vinodkc/fix_deprecated_param_order and squashes the following commits:
      
      1cbdbe8 [Vinod K C] Modified the message
      700911c [Vinod K C] Changed order of parameters
    • [DOCS] Fix minor wrong lambda expression example. · 41588365
      Kousuke Saruta authored
      It's a really minor issue, but there is an example with wrong lambda-expression usage in `SQLContext.scala`, as follows.
      
      ```
      sqlContext.udf().register("myUDF",
             (Integer arg1, String arg2) -> arg2 + arg1),  <- We have an extra `)` here.
             DataTypes.StringType);
      ```
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #7187 from sarutak/fix-minor-wrong-lambda-expression and squashes the following commits:
      
      a13196d [Kousuke Saruta] Fixed minor wrong lambda expression example.
    • [SPARK-8687] [YARN] Fix bug: Executor can't fetch the new set configuration in yarn-client · 1b0c8e61
      huangzhaowei authored
      Spark initializes the properties in CoarseGrainedSchedulerBackend.start:
      ```scala
          // TODO (prashant) send conf instead of properties
          driverEndpoint = rpcEnv.setupEndpoint(
            CoarseGrainedSchedulerBackend.ENDPOINT_NAME, new DriverEndpoint(rpcEnv, properties))
      ```
      Then the YARN logic sets some configuration values, but they are not updated in this `properties`,
      so the `Executor` never receives them.
      
      [Jira](https://issues.apache.org/jira/browse/SPARK-8687)
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #7066 from SaintBacchus/SPARK-8687 and squashes the following commits:
      
      1de4f48 [huangzhaowei] Ensure all necessary properties have already been set before startup ExecutorLaucher
    • [SPARK-3071] Increase default driver memory · 3697232b
      Ilya Ganelin authored
      I've updated default values in comments, documentation, and in the command line builder to be 1g based on comments in the JIRA. I've also updated most usages to point at a single variable defined in the Utils.scala and JavaUtils.java files. This wasn't possible in all cases (R, shell scripts etc.) but usage in most code is now pointing at the same place.
      
      Please let me know if I've missed anything.
      
      Will the spark-shell use the value within the command line builder during instantiation?
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #7132 from ilganeli/SPARK-3071 and squashes the following commits:
      
      4074164 [Ilya Ganelin] String fix
      271610b [Ilya Ganelin] Merge branch 'SPARK-3071' of github.com:ilganeli/spark into SPARK-3071
      273b6e9 [Ilya Ganelin] Test fix
      fd67721 [Ilya Ganelin] Update JavaUtils.java
      26cc177 [Ilya Ganelin] test fix
      e5db35d [Ilya Ganelin] Fixed test failure
      39732a1 [Ilya Ganelin] merge fix
      a6f7deb [Ilya Ganelin] Created default value for DRIVER MEM in Utils that's now used in almost all locations instead of setting manually in each
      09ad698 [Ilya Ganelin] Update SubmitRestProtocolSuite.scala
      19b6f25 [Ilya Ganelin] Missed one doc update
      2698a3d [Ilya Ganelin] Updated default value for driver memory
    • [SPARK-8740] [PROJECT INFRA] Support GitHub OAuth tokens in dev/merge_spark_pr.py · 377ff4c9
      Josh Rosen authored
      This commit allows `dev/merge_spark_pr.py` to use personal GitHub OAuth tokens in order to make authenticated requests. This is necessary to work around per-IP rate limiting issues.
      
      To use a token, just set the `GITHUB_OAUTH_KEY` environment variable.  You can create a personal token at https://github.com/settings/tokens; we only require `public_repo` scope.
      
      If the script fails due to a rate-limit issue, it now logs a useful message directing the user to the OAuth token instructions.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7136 from JoshRosen/pr-merge-script-oauth-authentication and squashes the following commits:
      
      4d011bd [Josh Rosen] Fix error message
      23d92ff [Josh Rosen] Support GitHub OAuth tokens in dev/merge_spark_pr.py
    • [SPARK-8769] [TRIVIAL] [DOCS] toLocalIterator should mention it results in many jobs · 15d41cc5
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #7171 from holdenk/SPARK-8769-toLocalIterator-documentation-improvement and squashes the following commits:
      
      97ddd99 [Holden Karau] Add note
    • [SPARK-8771] [TRIVIAL] Add a version to the deprecated annotation for the actorSystem · d14338ea
      Holden Karau authored
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #7172 from holdenk/SPARK-8771-actor-system-deprecation-tag-uses-deprecated-deprecation-tag and squashes the following commits:
      
      7f1455b [Holden Karau] Add .0s to the versions for the derpecated anotations in SparkEnv.scala
      ca13c9d [Holden Karau] Add a version to the deprecated annotation for the actorSystem in SparkEnv
    • [SPARK-8688] [YARN] Bug fix: disable the cache fs to gain the HDFS connection. · 646366b5
      huangzhaowei authored
      If `fs.hdfs.impl.disable.cache` is `false` (the default), `FileSystem` will use the cached `DFSClient`, which uses the old token.
      [AMDelegationTokenRenewer](https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala#L196)
      ```scala
          val credentials = UserGroupInformation.getCurrentUser.getCredentials
          credentials.writeTokenStorageFile(tempTokenPath, discachedConfiguration)
      ```
      Although the `credentials` contain the new token, the cached client still uses the old one.
      So it's better to set `fs.hdfs.impl.disable.cache` to `true` to avoid token expiry.
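
      A hedged, shell-style sketch (Hadoop `Configuration` API; the path is a placeholder) of how disabling the cache yields a fresh `FileSystem` that carries the renewed token:

      ```scala
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      // With the cache disabled, FileSystem.get returns a fresh instance whose DFSClient
      // picks up the current delegation tokens instead of the stale cached ones.
      val hadoopConf = new Configuration()
      hadoopConf.setBoolean("fs.hdfs.impl.disable.cache", true)

      val fs = FileSystem.get(hadoopConf)
      val tempTokenPath = new Path("/tmp/credentials.tmp") // placeholder path
      // ... write the refreshed token storage file through this uncached FileSystem ...
      ```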
      
      [Jira](https://issues.apache.org/jira/browse/SPARK-8688)
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #7069 from SaintBacchus/SPARK-8688 and squashes the following commits:
      
      f94cd0b [huangzhaowei] modify function parameter
      8fb9eb9 [huangzhaowei] explicit  the comment
      0cd55c9 [huangzhaowei] Rename function name to be an accurate one
      cf776a1 [huangzhaowei] [SPARK-8688][YARN]Bug fix: disable the cache fs to gain the HDFS connection.
    • [SPARK-8754] [YARN] YarnClientSchedulerBackend doesn't stop gracefully in failure conditions · 792fcd80
      Devaraj K authored
      In YarnClientSchedulerBackend.stop(), added a check for monitorThread.
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #7153 from devaraj-kavali/master and squashes the following commits:
      
      66be9ad [Devaraj K] https://issues.apache.org/jira/browse/SPARK-8754 YarnClientSchedulerBackend doesn't stop gracefully in failure conditions
    • [SPARK-8227] [SQL] Add function unhex · b285ac5b
      zhichao.li authored
      cc chenghao-intel  adrian-wang
      
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #7113 from zhichao-li/unhex and squashes the following commits:
      
      379356e [zhichao.li] remove exception checking
      a4ae6dc [zhichao.li] add udf_unhex to whitelist
      fe5c14a [zhichao.li] add todigit
      607d7a3 [zhichao.li] use checkInputTypes
      bffd37f [zhichao.li] change to use Hex in apache common package
      cde73f5 [zhichao.li] update to use AutoCastInputTypes
      11945c7 [zhichao.li] style
      c852d46 [zhichao.li] Add function unhex
  2. Jul 01, 2015
    • [SPARK-8660] [MLLIB] removed > symbols from comments in... · 4e4f74b5
      Rosstin authored
      [SPARK-8660] [MLLIB] removed > symbols from comments in LogisticRegressionSuite.scala for ease of copypaste
      
      '>' symbols removed from comments in LogisticRegressionSuite.scala, for ease of copypaste
      
      also single-lined the multiline commands (is this desirable, or does it violate style?)
      
      Author: Rosstin <asterazul@gmail.com>
      
      Closes #7167 from Rosstin/SPARK-8660-2 and squashes the following commits:
      
      f4b9bc8 [Rosstin] SPARK-8660 restored character limit on multiline comments in LogisticRegressionSuite.scala
      fe6b112 [Rosstin] SPARK-8660 > symbols removed from LogisticRegressionSuite.scala for easy of copypaste
      39ddd50 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8661
      5a05dee [Rosstin] SPARK-8661 for LinearRegressionSuite.scala, changed javadoc-style comments to regular multiline comments to make it easier to copy-paste the R code.
      bb9a4b1 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8660
      242aedd [Rosstin] SPARK-8660, changed comment style from JavaDoc style to normal multiline comment in order to make copypaste into R easier, in file classification/LogisticRegressionSuite.scala
      2cd2985 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
      21ac1e5 [Rosstin] Merge branch 'master' of github.com:apache/spark into SPARK-8639
      6c18058 [Rosstin] fixed minor typos in docs/README.md and docs/api.md
    • [SPARK-8770][SQL] Create BinaryOperator abstract class. · 9fd13d56
      Reynold Xin authored
      Our current BinaryExpression abstract class is not meant for generic binary expressions, i.e. it requires the left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression.

      This patch creates a new BinaryOperator abstract class and updates the analyzer to apply the type casting rule only there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.
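
      A rough, self-contained sketch of the class shapes described above (names and members are illustrative, not the exact Spark definitions):

      ```scala
      // Illustrative only: a generic binary expression versus an operator that additionally
      // constrains both children to one input type and carries a user-facing symbol.
      trait Expression {
        def children: Seq[Expression]
        def prettyName: String = getClass.getSimpleName.toLowerCase
      }

      abstract class BinaryExpression extends Expression {
        def left: Expression
        def right: Expression
        override def children: Seq[Expression] = Seq(left, right)
      }

      // Operators such as '+' or '&&' require left and right to share a type; the analyzer's
      // type-casting rule would target only this subclass.
      abstract class BinaryOperator extends BinaryExpression {
        def inputType: String // stand-in for Spark's AbstractDataType
        def symbol: String
        override def prettyName: String = symbol
      }
      ```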
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7174 from rxin/binary-opterator and squashes the following commits:
      
      f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
      fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into binary-opterator
      d8518cf [Reynold Xin] Updated Python tests.
    • Revert "[SPARK-8770][SQL] Create BinaryOperator abstract class." · 3a342ded
      Reynold Xin authored
      This reverts commit 27277899.
    • [SPARK-8770][SQL] Create BinaryOperator abstract class. · 27277899
      Reynold Xin authored
      Our current BinaryExpression abstract class is not meant for generic binary expressions, i.e. it requires the left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression.

      This patch creates a new BinaryOperator abstract class and updates the analyzer to apply the type casting rule only there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7170 from rxin/binaryoperator and squashes the following commits:
      
      51264a5 [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
    • [SPARK-8766] support non-ascii character in column names · f958f27e
      Davies Liu authored
      Use UTF-8 to encode column names in Python 2, or they may fail to encode with the default encoding ('ascii').

      This PR also fixes a bug that occurs when a Java exception has no error message.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7165 from davies/non_ascii and squashes the following commits:
      
      02cb61a [Davies Liu] fix tests
      3b09d31 [Davies Liu] add encoding in header
      867754a [Davies Liu] support non-ascii character in column names
    • [SPARK-3444] [CORE] Restore INFO level after log4j test. · 1ce64289
      Marcelo Vanzin authored
      Otherwise other tests don't log anything useful...
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7140 from vanzin/SPARK-3444 and squashes the following commits:
      
      de14836 [Marcelo Vanzin] Better fix.
      6cff13a [Marcelo Vanzin] [SPARK-3444] [core] Restore INFO level after log4j test.
    • [QUICKFIX] [SQL] fix copy of generated row · 3083e176
      Davies Liu authored
      copy() of generated Row doesn't check nullability of columns
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7163 from davies/fix_copy and squashes the following commits:
      
      661a206 [Davies Liu] fix copy of generated row
    • [SPARK-7820] [BUILD] Fix Java8-tests suite compile and test error under sbt · 9f7db348
      jerryshao authored
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #7120 from jerryshao/SPARK-7820 and squashes the following commits:
      
      6902439 [jerryshao] fix Java8-tests suite compile error under sbt
    • [SPARK-8378] [STREAMING] Add the Python API for Flume · 75b9fe4c
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6830 from zsxwing/flume-python and squashes the following commits:
      
      78dfdac [zsxwing] Fix the compile error in the test code
      f1bf3c0 [zsxwing] Address TD's comments
      0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly
      e93736b [zsxwing] Fix the test case for determine_modules_to_test
      9d5821e [zsxwing] Fix pyspark_core dependencies
      f9ee681 [zsxwing] Merge branch 'master' into flume-python
      7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py
      b96b0de [zsxwing] Merge branch 'master' into flume-python
      ce85e83 [zsxwing] Fix incompatible issues for Python 3
      01cbb3d [zsxwing] Add import sys
      152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3
      14ba0ff [zsxwing] Add flume-assembly for sbt building
      b8d5551 [zsxwing] Merge branch 'master' into flume-python
      4762c34 [zsxwing] Fix the doc
      0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API
      9f33873 [zsxwing] Add the Python API for Flume
    • [SPARK-8765] [MLLIB] [PYTHON] removed flaky python PIC test · b8faa328
      Joseph K. Bradley authored
      See failure: [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36133/console]
      
      CC yanboliang  mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #7164 from jkbradley/pic-python-test and squashes the following commits:
      
      156d55b [Joseph K. Bradley] removed flaky python PIC test
    • [SPARK-8308] [MLLIB] add missing save load for python example · 20129133
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-8308
      
      1. add some missing save/load calls to the Python examples: LogisticRegression, LinearRegression and NaiveBayes
      2. tune down the iterations for MatrixFactorization, since the current number triggers a StackOverflowError with the default Java configuration (>1M)
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #6760 from hhbyyh/docUpdate and squashes the following commits:
      
      9bd3383 [Yuhao Yang] update scala example
      8a44692 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate
      077cbb8 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate
      3e948dc [Yuhao Yang] add missing save load for python example
    • [SPARK-6263] [MLLIB] Python MLlib API missing items: Utils · 184de91d
      lewuathe authored
      Implement missing API in pyspark.
      
      MLUtils
      * appendBias
      * loadVectors
      
      `kFold` is also missing; however, I am not sure whether `ClassTag` can be passed or restored through Python.
      
      Author: lewuathe <lewuathe@me.com>
      
      Closes #5707 from Lewuathe/SPARK-6263 and squashes the following commits:
      
      16863ea [lewuathe] Merge master
      3fc27e7 [lewuathe] Merge branch 'master' into SPARK-6263
      6084e9c [lewuathe] Resolv conflict
      d2aa2a0 [lewuathe] Resolv conflict
      9c329d8 [lewuathe] Fix efficiency
      3a12a2d [lewuathe] Merge branch 'master' into SPARK-6263
      1d4714b [lewuathe] Fix style
      b29e2bc [lewuathe] Remove scipy dependencies
      e32eb40 [lewuathe] Merge branch 'master' into SPARK-6263
      25d3c9d [lewuathe] Remove unnecessary imports
      7ec04db [lewuathe] Resolv conflict
      1502d13 [lewuathe] Resolv conflict
      d6bd416 [lewuathe] Check existence of scipy.sparse
      5d555b1 [lewuathe] Construct scipy.sparse matrix
      c345a44 [lewuathe] Merge branch 'master' into SPARK-6263
      b8b5ef7 [lewuathe] Fix unnecessary sort method
      d254be7 [lewuathe] Merge branch 'master' into SPARK-6263
      62a9c7e [lewuathe] Fix appendBias return type
      454c73d [lewuathe] Merge branch 'master' into SPARK-6263
      a353354 [lewuathe] Remove unnecessary appendBias implementation
      44295c2 [lewuathe] Merge branch 'master' into SPARK-6263
      64f72ad [lewuathe] Merge branch 'master' into SPARK-6263
      c728046 [lewuathe] Fix style
      2980569 [lewuathe] [SPARK-6263] Python MLlib API missing items: Utils
    • [SPARK-8621] [SQL] support empty string as column name · 31b4a3d7
      Wenchen Fan authored
      Improve the empty check in `parseAttributeName` so that we can allow an empty string as a column name.
      Closes https://github.com/apache/spark/pull/7117
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7149 from cloud-fan/8621 and squashes the following commits:
      
      efa9e3e [Wenchen Fan] support empty string
    • [SPARK-8752][SQL] Add ExpectsInputTypes trait for defining expected input types. · 4137f769
      Reynold Xin authored
      This patch doesn't actually introduce any code that uses the new ExpectsInputTypes. It just adds the trait so others can use it. Also renamed the old expectsInputTypes function to just inputTypes.
      
      We should also add implicit type casting in the future.
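
      A hedged sketch of what such a trait could look like (types simplified; not the exact Spark signatures):

      ```scala
      // Simplified stand-ins for Spark's expression and data-type hierarchy.
      trait DataType
      case object IntegerType extends DataType
      case object StringType extends DataType

      trait Expression {
        def dataType: DataType
      }

      // An expression mixing this in declares the type it expects for each child, so the
      // analyzer can check (and, in the future, implicitly cast) inputs in one place.
      trait ExpectsInputTypes { self: Expression =>
        def inputTypes: Seq[DataType]
      }
      ```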
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7151 from rxin/expects-input-types and squashes the following commits:
      
      16cf07b [Reynold Xin] [SPARK-8752][SQL] Add ExpectsInputTypes trait for defining expected input types.
    • [SPARK-7714] [SPARKR] SparkR tests should use more specific expectations than expect_true · 69c5dee2
      Sun Rui authored
      1. Update the pattern 'expect_true(a == b)' to 'expect_equal(a, b)'.
      2. Update the pattern 'expect_true(inherits(a, b))' to 'expect_is(a, b)'.
      3. Update the pattern 'expect_true(identical(a, b))' to 'expect_identical(a, b)'.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #7152 from sun-rui/SPARK-7714 and squashes the following commits:
      
      8ad2440 [Sun Rui] Fix test case errors.
      8fe9f0c [Sun Rui] Update the pattern 'expect_true(identical(a, b))' to 'expect_identical(a, b)'.
      f1b8005 [Sun Rui] Update the pattern 'expect_true(inherits(a, b))' to 'expect_is(a, b)'.
      f631e94 [Sun Rui] Update the pattern 'expect_true(a == b)' to 'expect_equal(a, b)'.
    • [SPARK-8763] [PYSPARK] executing run-tests.py with Python 2.6 fails with... · fdcad6ef
      cocoatomo authored
      [SPARK-8763] [PYSPARK] executing run-tests.py with Python 2.6 fails with absence of subprocess.check_output function
      
      Running run-tests.py with Python 2.6 causes the following error:
      
      ```
      Running PySpark tests. Output is in python//Users/tomohiko/.jenkins/jobs/pyspark_test/workspace/python/unit-tests.log
      Will test against the following Python executables: ['python2.6', 'python3.4', 'pypy']
      Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
      Traceback (most recent call last):
        File "./python/run-tests.py", line 196, in <module>
          main()
        File "./python/run-tests.py", line 159, in main
          python_implementation = subprocess.check_output(
      AttributeError: 'module' object has no attribute 'check_output'
      ...
      ```
      
      The cause of this error is the use of the subprocess.check_output function, which only exists since Python 2.7.
      (ref. https://docs.python.org/2.7/library/subprocess.html#subprocess.check_output)
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #7161 from cocoatomo/issues/8763-test-fails-py26 and squashes the following commits:
      
      cf4f901 [cocoatomo] [SPARK-8763] backport process.check_output function from Python 2.7
    • [SPARK-8750][SQL] Remove the closure in functions.callUdf. · 97652416
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7148 from rxin/calludf-closure and squashes the following commits:
      
      00df372 [Reynold Xin] Fixed index out of bound exception.
      4beba76 [Reynold Xin] [SPARK-8750][SQL] Remove the closure in functions.callUdf.
    • [SQL] [MINOR] remove internalRowRDD in DataFrame · 0eee0615
      Wenchen Fan authored
      Developers are already familiar with `queryExecution.toRDD` as the internal row RDD, and we should not add a new concept.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7116 from cloud-fan/internal-rdd and squashes the following commits:
      
      24756ca [Wenchen Fan] remove internalRowRDD
    • [SPARK-8749][SQL] Remove HiveTypeCoercion trait. · fc3a6fe6
      Reynold Xin authored
      Moved all the rules into the companion object.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7147 from rxin/SPARK-8749 and squashes the following commits:
      
      c1c6dc0 [Reynold Xin] [SPARK-8749][SQL] Remove HiveTypeCoercion trait.
    • [SPARK-8748][SQL] Move castability test out from Cast case class into Cast object. · 365c1405
      Reynold Xin authored
      This patch moves the resolve function from the Cast case class into the companion object and renames it canCast. We can then use it in the analyzer without a Cast expression.
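
      A minimal sketch of the companion-object pattern described above (simplified types and casting rules, purely illustrative):

      ```scala
      sealed trait DataType
      case object IntegerType extends DataType
      case object DoubleType extends DataType
      case object StringType extends DataType

      // The castability check lives on the companion object, so the analyzer can ask
      // "can `from` cast to `to`?" without constructing a Cast expression first.
      object Cast {
        def canCast(from: DataType, to: DataType): Boolean = (from, to) match {
          case (a, b) if a == b          => true
          case (IntegerType, DoubleType) => true
          case (_, StringType)           => true
          case _                         => false
        }
      }

      case class Cast(childType: DataType, to: DataType) {
        require(Cast.canCast(childType, to), s"cannot cast $childType to $to")
      }
      ```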
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7145 from rxin/cast and squashes the following commits:
      
      cd086a9 [Reynold Xin] Whitespace changes.
      4d2d989 [Reynold Xin] [SPARK-8748][SQL] Move castability test out from Cast case class into Cast object.
  3. Jun 30, 2015
    • [SPARK-6602][Core]Remove unnecessary synchronized · 64c14618
      zsxwing authored
      A follow-up pr to address https://github.com/apache/spark/pull/5392#discussion_r33627528
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #7141 from zsxwing/pr5392-follow-up and squashes the following commits:
      
      fcf7b50 [zsxwing] Remove unnecessary synchronized
    • [SPARK-8535] [PYSPARK] PySpark : Can't create DataFrame from Pandas dataframe... · b6e76edf
      x1- authored
      [SPARK-8535] [PYSPARK] PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name
      
      Because the implicit names of `pandas.columns` are Int, but the `StructField` JSON expects `String`,
      I think `pandas.columns` should be converted to `String`.
      
      ### issue
      
      * [SPARK-8535 PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name](https://issues.apache.org/jira/browse/SPARK-8535)
      
      Author: x1- <viva008@gmail.com>
      
      Closes #7124 from x1-/SPARK-8535 and squashes the following commits:
      
      d68fd38 [x1-] modify unit-test using pandas.
      ea1897d [x1-] For implicit name of pandas.columns are Int, so should be convert to String.
    • [SPARK-8471] [ML] Rename DiscreteCosineTransformer to DCT · f4575698
      Feynman Liang authored
      Rename DiscreteCosineTransformer and related classes to DCT.
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #7138 from feynmanliang/dct-features and squashes the following commits:
      
      e547b3e [Feynman Liang] Fix renaming bug
      9d5c9e4 [Feynman Liang] Lowercase JavaDCTSuite variable
      f9a8958 [Feynman Liang] Remove old files
      f8fe794 [Feynman Liang] Merge branch 'master' into dct-features
      894d0b2 [Feynman Liang] Rename DiscreteCosineTransformer to DCT
      433dbc7 [Feynman Liang] Test refactoring
      91e9636 [Feynman Liang] Style guide and test helper refactor
      b5ac19c [Feynman Liang] Use Vector types, add Java test
      530983a [Feynman Liang] Tests for other numeric datatypes
      195d7aa [Feynman Liang] Implement support for arbitrary numeric types
      95d4939 [Feynman Liang] Working DCT for 1D Doubles
    • [SPARK-6602][Core] Update Master, Worker, Client, AppClient and related classes to use RpcEndpoint · 3bee0f14
      zsxwing authored
      This PR updates the remaining Actors in core to RpcEndpoint.

      Because there is no `ActorSelection` in RpcEnv, I changed the logic of `registerWithMaster` in Worker and AppClient to avoid blocking the message loop. These changes need to be reviewed carefully.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #5392 from zsxwing/rpc-rewrite-part3 and squashes the following commits:
      
      2de7bed [zsxwing] Merge branch 'master' into rpc-rewrite-part3
      f12d943 [zsxwing] Address comments
      9137b82 [zsxwing] Fix the code style
      e734c71 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
      2d24fb5 [zsxwing] Fix the code style
      5a82374 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
      fa47110 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
      72304f0 [zsxwing] Update the error strategy for AkkaRpcEnv
      e56cb16 [zsxwing] Always send failure back to the sender
      a7b86e6 [zsxwing] Use JFuture for java.util.concurrent.Future
      aa34b9b [zsxwing] Fix the code style
      bd541e7 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
      25a84d8 [zsxwing] Use ThreadUtils
      060ff31 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
      dbfc916 [zsxwing] Improve the docs and comments
      837927e [zsxwing] Merge branch 'master' into rpc-rewrite-part3
      5c27f97 [zsxwing] Merge branch 'master' into rpc-rewrite-part3
      fadbb9e [zsxwing] Fix the code style
      6637e3c [zsxwing] Merge remote-tracking branch 'origin/master' into rpc-rewrite-part3
      7fdee0e [zsxwing] Fix the return type to ExecutorService and ScheduledExecutorService
      e8ad0a5 [zsxwing] Fix the code style
      6b2a104 [zsxwing] Log error and use SparkExitCode.UNCAUGHT_EXCEPTION exit code
      fbf3194 [zsxwing] Add Utils.newDaemonSingleThreadExecutor and newDaemonSingleThreadScheduledExecutor
      b776817 [zsxwing] Update Master, Worker, Client, AppClient and related classes to use RpcEndpoint
    • [SPARK-8727] [SQL] Missing python api; md5, log2 · ccdb0522
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8727
      
      Author: Tarek Auel <tarek.auel@gmail.com>
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7114 from tarekauel/missing-python and squashes the following commits:
      
      ef4c61b [Tarek Auel] [SPARK-8727] revert dataframe change
      4029d4d [Tarek Auel] removed dataframe pi and e unit test
      66f0d2b [Tarek Auel] removed pi and e from python api and dataframe api; added _to_java_column(col) for strlen
      4d07318 [Tarek Auel] fixed python unit test
      45f2bee [Tarek Auel] fixed result of pi and e
      c39f47b [Tarek Auel] add python api
      bd50a3a [Tarek Auel] add missing python functions
    • [SPARK-8741] [SQL] Remove e and pi from DataFrame functions. · 8133125c
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7137 from rxin/SPARK-8741 and squashes the following commits:
      
      32c7e75 [Reynold Xin] [SPARK-8741][SQL] Remove e and pi from DataFrame functions.