  1. Nov 04, 2015
    • [SPARK-2960][DEPLOY] Support executing Spark from symlinks (reopen) · 8aff36e9
      jerryshao authored
      This PR is based on the work of roji to support running Spark scripts from symlinks. Thanks for the great work, roji. Would you mind taking a look at this PR? Thanks a lot.
      
      For releases like HDP and others, the Spark executables are normally exposed as symlinks placed on the `PATH`, but Spark's current scripts do not resolve the real path from a symlink recursively, so Spark fails to execute when invoked through a symlink. This PR tries to solve the issue by finding the absolute path behind the symlink.
      
      Instead of using `readlink -f` as the earlier PR (https://github.com/apache/spark/pull/2386) did, the path is resolved manually in a loop, since `-f` is not supported on Mac.
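      For illustration only, here is the same idea sketched in Scala rather than in the bash launcher scripts this PR actually changes: follow symlinks one hop at a time instead of relying on a one-shot canonicalizer such as `readlink -f`. The example path is hypothetical.

      ```scala
      import java.nio.file.{Files, Path, Paths}

      // Illustration of the manual-resolution idea: follow each symlink hop in a loop.
      def resolveSymlinks(start: Path, maxHops: Int = 64): Path = {
        var current = start.toAbsolutePath
        var hops = 0
        while (Files.isSymbolicLink(current) && hops < maxHops) { // hop limit guards against link cycles
          val target = Files.readSymbolicLink(current)
          // A relative link target is interpreted against the directory containing the link.
          current =
            if (target.isAbsolute) target
            else current.getParent.resolve(target).normalize()
          hops += 1
        }
        current
      }

      // Hypothetical usage; returns the path unchanged if it is not a symlink.
      println(resolveSymlinks(Paths.get("/usr/bin/spark-shell")))
      ```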
      
      I've tested on Mac and Linux (CentOS); it looks fine.
      
      This PR does not fix the scripts under the `sbin` folder; I'm not sure whether they need to be fixed as well.
      
      Please help review; any comments are greatly appreciated.
      
      Author: jerryshao <sshao@hortonworks.com>
      Author: Shay Rojansky <roji@roji.org>
      
      Closes #8669 from jerryshao/SPARK-2960.
  2. Nov 03, 2015
  3. Nov 02, 2015
    • [SPARK-11469][SQL] Allow users to define nondeterministic udfs. · 9cf56c96
      Yin Huai authored
      This is the first task (https://issues.apache.org/jira/browse/SPARK-11469) of https://issues.apache.org/jira/browse/SPARK-11438
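      A minimal sketch of the feature from the user's side, using the modern SparkSession API; the method shown, `asNondeterministic()`, is the name in later Spark releases, and the exact API introduced by this PR may differ.

      ```scala
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions.udf

      val spark = SparkSession.builder().master("local[*]").appName("nondeterministic-udf").getOrCreate()
      import spark.implicits._

      // A UDF whose output is not a pure function of its input (it adds random noise).
      val noisy = udf((x: Double) => x + scala.util.Random.nextGaussian())
        // Marking it nondeterministic tells the optimizer not to assume that repeated
        // calls with the same argument return the same value, so the call is not freely
        // reordered, deduplicated, or pushed through filters.
        .asNondeterministic()

      val df = Seq(1.0, 2.0, 3.0).toDF("x")
      df.select($"x", noisy($"x").as("x_noisy")).show()
      ```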
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #9393 from yhuai/udfNondeterministic.
    • [SPARK-11432][GRAPHX] Personalized PageRank shouldn't use uniform initialization · efaa4721
      Yves Raimond authored
      Changes the personalized PageRank initialization to be non-uniform.
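      A small GraphX usage sketch; the tiny graph and parameter values are made up for illustration. With a non-uniform start, the initial mass is concentrated on the source vertex rather than spread evenly over all vertices.

      ```scala
      import org.apache.spark.graphx.{Edge, Graph}
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().master("local[*]").appName("personalized-pagerank").getOrCreate()
      val sc = spark.sparkContext

      // A tiny illustrative graph: 1 -> 2 -> 3 -> 1.
      val edges = sc.parallelize(Seq(
        Edge(1L, 2L, 1.0),
        Edge(2L, 3L, 1.0),
        Edge(3L, 1L, 1.0)))
      val graph = Graph.fromEdges(edges, defaultValue = 1.0)

      // Personalized PageRank with respect to source vertex 1: ranks are biased
      // toward vertices reachable from that source.
      val ranks = graph.personalizedPageRank(1L, tol = 0.0001).vertices
      ranks.collect().foreach { case (id, rank) => println(s"$id -> $rank") }
      ```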
      
      Author: Yves Raimond <yraimond@netflix.com>
      
      Closes #9386 from moustaki/personalized-pagerank-init.
    • [SPARK-11329][SQL] Support star expansion for structs. · 9cb5c731
      Nong Li authored
      1. Support expanding structs in projections, i.e.
         "SELECT s.*" where s is a struct type.
         This is fixed by allowing the expand function to handle structs in addition to tables.

      2. Support expanding * of structs inside aggregate functions, e.g.
         "SELECT max(struct(col1, structCol.*))"
         This requires recursively expanding the expressions; in this case it is the aggregate
         expression "max(...)" whose children inputs need to be expanded recursively.
         (A usage sketch follows this list.)
      
      Author: Nong Li <nongli@gmail.com>
      
      Closes #9343 from nongli/spark-11329.
    • [SPARK-5354][SQL] Cached tables should preserve partitioning and ordering. · 2cef1bb0
      Nong Li authored
      For cached tables, we can just maintain the partitioning and ordering from the
      source relation.
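      A hedged sketch of where this matters, using the modern API; whether the planner actually elides an exchange depends on the query. A DataFrame that is repartitioned and sorted before caching keeps that physical layout, so later operators keyed the same way can reuse it.

      ```scala
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().master("local[*]").appName("cache-layout").getOrCreate()
      import spark.implicits._

      // Lay the data out by key, then cache; with this change the cached relation reports
      // the child's partitioning and ordering instead of treating them as unknown.
      val left = spark.range(0, 100000).toDF("id")
        .repartition($"id")
        .sortWithinPartitions($"id")
        .cache()
      left.count() // materialize the cache

      val right = spark.range(0, 100000).toDF("id")

      // Inspect the physical plan: the cached side may avoid a redundant shuffle/sort
      // for this join because its distribution and ordering are already known.
      left.join(right, "id").explain()
      ```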
      
      Author: Nong Li <nongli@gmail.com>
      
      Closes #9404 from nongli/spark-5354.
    • [MINOR][ML] removed the old `getModelWeights` function · 21ad8462
      DB Tsai authored
      Removed the old `getModelWeights` function, which was private and had been renamed to `getModelCoefficients`.
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #9426 from dbtsai/feature-minor.
    • [SPARK-11236] [TEST-MAVEN] [TEST-HADOOP1.0] [CORE] Update Tachyon dependency 0.7.1 -> 0.8.1 · 476f4348
      Calvin Jia authored
      This is a reopening of #9204, which failed the hadoop1 sbt tests.
      
      With the original PR, a classpath issue occurred because the MIMA plugin pulled in hadoop-2.2 dependencies regardless of the Hadoop version when building the `oldDeps` project. These dependencies affect the hadoop1 sbt build because they are placed in `lib_managed` and Tachyon 0.8.0's default hadoop version is 2.2.
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #9395 from calvinjia/spark-11236.
    • [SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead in ML models · c020f7d9
      vectorijk authored
      Deprecated `weights` in `LogisticRegression` and `LinearRegression` in favor of `coefficients`.
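      The PySpark deprecation mirrors the Scala ML API, where the vector is exposed as `coefficients`; a minimal Scala sketch with tiny made-up training data, reading the new name:

      ```scala
      import org.apache.spark.ml.classification.LogisticRegression
      import org.apache.spark.ml.linalg.Vectors
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().master("local[*]").appName("coefficients").getOrCreate()

      val training = spark.createDataFrame(Seq(
        (0.0, Vectors.dense(0.0, 1.1)),
        (0.0, Vectors.dense(0.1, 1.0)),
        (1.0, Vectors.dense(2.0, 0.1)),
        (1.0, Vectors.dense(2.2, 0.0))
      )).toDF("label", "features")

      val model = new LogisticRegression().setMaxIter(10).fit(training)

      // `coefficients` is the supported accessor; the old `weights` getter was
      // deprecated (and dropped in later releases) in favor of it.
      println(model.coefficients)
      println(model.intercept)
      ```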
      
      Author: vectorijk <jiangkai@gmail.com>
      
      Closes #9311 from vectorijk/spark-10592.
    • [SPARK-11343][ML] Allow float and double prediction/label columns in RegressionEvaluator · ec03866a
      Dominik Dahlem authored
      mengxr, felixcheung
      
      This pull request just relaxes the type of the prediction/label columns to accept float and double. Internally, these columns are cast to double. The other evaluators might need to be changed as well.
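      A short sketch of the relaxed behavior, with made-up scores and the modern API: the prediction column below is FloatType, and the evaluator casts it to double internally rather than rejecting it.

      ```scala
      import org.apache.spark.ml.evaluation.RegressionEvaluator
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().master("local[*]").appName("regression-eval").getOrCreate()
      import spark.implicits._

      // Label is DoubleType, prediction is FloatType; both are accepted and cast to
      // double before the metric is computed.
      val scored = Seq((1.0, 1.1f), (2.0, 1.9f), (3.0, 3.2f)).toDF("label", "prediction")

      val rmse = new RegressionEvaluator()
        .setLabelCol("label")
        .setPredictionCol("prediction")
        .setMetricName("rmse")
        .evaluate(scored)

      println(s"rmse = $rmse")
      ```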
      
      Author: Dominik Dahlem <dominik.dahlem@gmail.combination>
      
      Closes #9296 from dahlem/ddahlem_regression_evaluator_double_predictions_27102015.
    • [SPARK-10286][ML][PYSPARK][DOCS] Add @since annotation to pyspark.ml.param and pyspark.ml.* · ecfb3e73
      lihao authored
      Author: lihao <lihaowhu@gmail.com>
      
      Closes #9275 from lidinghao/SPARK-10286.
    • [SPARK-11383][DOCS] Replaced example code in mllib-naive-bayes.md/mllib-isotonic-regression.md using include_example · 2804674a
      Rishabh Bhardwaj authored
      I have made the required changes in mllib-naive-bayes.md/mllib-isotonic-regression.md and also verified them.
      Kindly review.
      
      Author: Rishabh Bhardwaj <rbnext29@gmail.com>
      
      Closes #9353 from rishabhbhardwaj/SPARK-11383.
    • [SPARK-11371] Make "mean" an alias for "avg" operator · db11ee5e
      tedyu authored
      From Reynold in the thread 'Exception when using some aggregate operators' (http://search-hadoop.com/m/q3RTt0xFr22nXB4/):
      
      I don't think these are bugs. The SQL standard for average is "avg", not "mean". Similarly, a distinct count is supposed to be written as "count(distinct col)", not "countDistinct(col)".
      We can, however, make "mean" an alias for "avg" to improve compatibility between DataFrame and SQL.
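      A quick sketch of the alias in both APIs, using the modern SparkSession API and toy data:

      ```scala
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions.{avg, mean}

      val spark = SparkSession.builder().master("local[*]").appName("mean-avg").getOrCreate()
      import spark.implicits._

      val df = Seq(1.0, 2.0, 3.0, 4.0).toDF("x")

      // DataFrame API: mean(...) and avg(...) compute the same aggregate.
      df.agg(avg($"x"), mean($"x")).show()

      // SQL: "avg" is the standard name; with this change "mean" is accepted as an alias.
      df.createOrReplaceTempView("t")
      spark.sql("SELECT avg(x), mean(x) FROM t").show()
      ```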
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #9332 from ted-yu/master.
    • [SPARK-11358][MLLIB] deprecate runs in k-means · 33ae7a35
      Xiangrui Meng authored
      This PR deprecates `runs` in k-means. `runs` introduces extra complexity and overhead in MLlib's k-means implementation. I haven't seen much usage with `runs` not equal to `1`. We don't have a unit test for it either. We can deprecate this method in 1.6, and void it in 1.7. It helps us simplify the implementation.
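      For reference, a sketch of the RDD-based API in question configured without `runs`, using toy vectors:

      ```scala
      import org.apache.spark.mllib.clustering.KMeans
      import org.apache.spark.mllib.linalg.Vectors
      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().master("local[*]").appName("kmeans-single-run").getOrCreate()
      val sc = spark.sparkContext

      val data = sc.parallelize(Seq(
        Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
        Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))

      // A single run; `setRuns` (several random restarts inside one call) is the knob
      // being deprecated, so restarts, if wanted, are left to the caller.
      val model = new KMeans()
        .setK(2)
        .setMaxIterations(20)
        .setSeed(42L)
        .run(data)

      model.clusterCenters.foreach(println)
      ```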
      
      cc: srowen
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #9322 from mengxr/SPARK-11358.
    • [SPARK-11456][TESTS] Remove deprecated junit.framework in Java tests · b3aedca6
      Sean Owen authored
      Replace use of `junit.framework` with `org.junit`, and touch up tests in question
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #9411 from srowen/SPARK-11456.
    • [SPARK-11437] [PYSPARK] Don't .take when converting RDD to DataFrame with provided schema · f92f334c
      Jason White authored
      When creating a DataFrame from an RDD in PySpark, `createDataFrame` calls `.take(10)` to verify the first 10 rows of the RDD match the provided schema. Similar to https://issues.apache.org/jira/browse/SPARK-8070, but that issue affected cases where a schema was not provided.
      
      Verifying the first 10 rows is of limited utility and causes the DAG to be executed non-lazily. If necessary, I believe this verification should be done lazily on all rows. However, since the caller is providing a schema to follow, I think it's acceptable to simply fail if the schema is incorrect.
      
      marmbrus, we chatted about this at Spark Summit EU. davies, you made a similar change for the infer-schema path in https://github.com/apache/spark/pull/6606
      
      Author: Jason White <jason.white@shopify.com>
      
      Closes #9392 from JasonMWhite/createDataFrame_without_take.
    • [SPARK-10997][CORE] Add "client mode" to netty rpc env. · 71d1c907
      Marcelo Vanzin authored
      "Client mode" means the RPC env will not listen for incoming connections.
      This allows certain processes in the Spark stack (such as Executors or
      the YARN client-mode AM) to act as pure clients when using the netty-based
      RPC backend, reducing the number of sockets needed by the app and also the
      number of open ports.
      
      Client connections are also preferred when endpoints that actually have
      a listening socket are involved; so, for example, if a Worker connects
      to a Master and the Master needs to send a message to a Worker endpoint,
      that client connection will be used, even though the Worker is also
      listening for incoming connections.
      
      With this change, the workaround for SPARK-10987 isn't necessary anymore, and
      is removed. The AM connects to the driver in "client mode", and that connection
      is used for all driver <-> AM communication, and so the AM is properly notified
      when the connection goes down.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9210 from vanzin/SPARK-10997.
    • [SPARK-9817][YARN] Improve the locality calculation of containers by taking pending container requests into consideration · a930e624
      jerryshao authored
      This is a follow-up PR to further improve the locality calculation by taking pending container requests into account. Since the locality preferences of tasks may shift from time to time, the localities of current pending container requests may not fully match the new preferences; this PR improves that by removing outdated, unmatched container requests and replacing them with new ones.
      
      sryza, please help review. Thanks a lot.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #8100 from jerryshao/SPARK-9817.