  1. May 25, 2014
    • Added PEP8 style configuration file. · 5c7faecd
      Reynold Xin authored
      This sets the max line length to 100 as a PEP8 exception.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #872 from rxin/pep8 and squashes the following commits:
      
      2f26029 [Reynold Xin] Added PEP8 style configuration file.
      5c7faecd
    • [SPARK-1822] SchemaRDD.count() should use query optimizer · 6052db9d
      Kan Zhang authored
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #841 from kanzhang/SPARK-1822 and squashes the following commits:
      
      2f8072a [Kan Zhang] [SPARK-1822] Minor style update
      cf4baa4 [Kan Zhang] [SPARK-1822] Adding Scaladoc
      e67c910 [Kan Zhang] [SPARK-1822] SchemaRDD.count() should use optimizer
      6052db9d
    • spark-submit: add exec at the end of the script · 6e9fb632
      Colin Patrick Mccabe authored
      Add an 'exec' at the end of the spark-submit script, to avoid keeping a
      bash process hanging around while it runs.  This makes ps look a little
      bit nicer.
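The effect of `exec` can be illustrated with a small Python sketch (nothing here is from the patch, and a POSIX system with `echo` on the PATH is assumed): `os.execvp` replaces the current process image, just as `exec` at the end of a shell script replaces the bash process, so no extra parent lingers in `ps`.

```python
import subprocess
import sys

# A child interpreter that *replaces itself* with `echo` via exec, mirroring
# what `exec java ...` does at the end of spark-submit: the launching process
# (here: the Python interpreter) does not stay around as a parent.
child_code = "import os; os.execvp('echo', ['echo', 'replaced'])"

out = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(out)  # output comes from the process that replaced the interpreter
```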
      
      Author: Colin Patrick Mccabe <cmccabe@cloudera.com>
      
      Closes #858 from cmccabe/SPARK-1907 and squashes the following commits:
      
      7023b64 [Colin Patrick Mccabe] spark-submit: add exec at the end of the script
      6e9fb632
  2. May 24, 2014
    • [SPARK-1913][SQL] Bug fix: column pruning error in Parquet support · 5afe6af0
      Cheng Lian authored
      JIRA issue: [SPARK-1913](https://issues.apache.org/jira/browse/SPARK-1913)
      
      When scanning Parquet tables, attributes referenced only in predicates that are pushed down are not passed to the `ParquetTableScan` operator, which causes an exception.
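The shape of the fix can be sketched in plain Python (hypothetical helper name, not Spark's planner code): the columns handed to the scan must cover both the projected attributes and the attributes referenced by pushed-down predicates.

```python
def columns_for_scan(projected, predicate_refs):
    """Columns the Parquet scan must read: attributes that appear only in
    pushed-down predicates still have to reach the scan operator, or
    evaluating the predicate there fails."""
    # Preserve order: projected columns first, then predicate-only ones.
    seen = list(projected)
    for col in predicate_refs:
        if col not in seen:
            seen.append(col)
    return seen

# 'age' is referenced only by a pushed-down filter, yet must be scanned.
print(columns_for_scan(["name"], ["age"]))  # ['name', 'age']
```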
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #863 from liancheng/spark-1913 and squashes the following commits:
      
      f976b73 [Cheng Lian] Addessed the readability issue commented by @rxin
      f5b257d [Cheng Lian] Added back comments deleted by mistake
      ae60ab3 [Cheng Lian] [SPARK-1913] Attributes referenced only in predicates pushed down should remain in ParquetTableScan operator
      5afe6af0
    • [SPARK-1886] check executor id existence when executor exit · 4e4831b8
      Zhen Peng authored
      Author: Zhen Peng <zhenpeng01@baidu.com>
      
      Closes #827 from zhpengg/bugfix-executor-id-not-found and squashes the following commits:
      
      cd8bb65 [Zhen Peng] bugfix: check executor id existence when executor exit
      4e4831b8
    • SPARK-1911: Emphasize that Spark jars should be built with Java 6. · 75a03277
      Patrick Wendell authored
      This commit requires the user to manually say "yes" when building Spark
      without Java 6. The prompt can be bypassed with a flag (e.g. if the user
      is scripting around make-distribution).
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #859 from pwendell/java6 and squashes the following commits:
      
      4921133 [Patrick Wendell] Adding Pyspark Notice
      fee8c9e [Patrick Wendell] SPARK-1911: Emphasize that Spark jars should be built with Java 6.
      75a03277
    • [SPARK-1900 / 1918] PySpark on YARN is broken · 5081a0a9
      Andrew Or authored
      If I run the following on a YARN cluster
      ```
      bin/spark-submit sheep.py --master yarn-client
      ```
      it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
      ```
      bin/spark-submit file:/path/to/sheep.py --master yarn-client
      ```
      However, this also fails. This time it is because python does not understand URI schemes.
      
      This PR fixes this by automatically resolving all paths passed as command line argument to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it.
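The two sides of the fix can be sketched with `urllib.parse` (illustrative function names; not Spark's actual `Utils.resolveURIs`): bare local paths get an explicit `file:` scheme for the JVM side, and the scheme is stripped again before handing a file to Python.

```python
from urllib.parse import urlparse

def resolve_uri(path):
    """Give a bare local path an explicit file: scheme; leave real URIs alone."""
    if urlparse(path).scheme:
        return path
    return "file:" + path

def strip_scheme(uri):
    """Python can't open 'file:/x.py' directly, so drop the scheme first."""
    parsed = urlparse(uri)
    return parsed.path if parsed.scheme else uri

print(resolve_uri("/path/to/sheep.py"))        # file:/path/to/sheep.py
print(resolve_uri("hdfs://nn/sheep.py"))       # unchanged
print(strip_scheme("file:/path/to/sheep.py"))  # /path/to/sheep.py
```

Note the Windows wrinkle several of the squashed commits deal with: `urlparse` treats a drive letter like `C:` as a URI scheme, so Windows paths need separate handling.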
      
      Much of the code is originally written by @mengxr. Tested on YARN cluster. More tests pending.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #853 from andrewor14/submit-paths and squashes the following commits:
      
      0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
      323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
      3c36587 [Andrew Or] Improve error messages (minor)
      854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
      6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
      3bb0359 [Andrew Or] Update more comments (minor)
      2a1f8a0 [Andrew Or] Update comments (minor)
      6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
      a68c4d1 [Andrew Or] Handle Windows python file path correctly
      427a250 [Andrew Or] Resolve paths properly for Windows
      a591a4a [Andrew Or] Update tests for resolving URIs
      6c8621c [Andrew Or] Move resolveURIs to Utils
      db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
      f542dce [Andrew Or] Fix outdated tests
      691c4ce [Andrew Or] Ignore special primary resource names
      5342ac7 [Andrew Or] Add missing space in error message
      02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly
      5081a0a9
  3. May 23, 2014
  4. May 22, 2014
    • Updated scripts for auditing releases · b2bdd0e5
      Tathagata Das authored
      - Added script to automatically generate change list CHANGES.txt
      - Added test for verifying linking against maven distributions of `spark-sql` and `spark-hive`
      - Added SBT projects for testing functionality of `spark-sql` and `spark-hive`
      - Fixed issues in existing tests that might have come up because of changes in Spark 1.0
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #844 from tdas/update-dev-scripts and squashes the following commits:
      
      25090ba [Tathagata Das] Added missing license
      e2e20b3 [Tathagata Das] Updated tests for auditing releases.
      b2bdd0e5
    • [SPARK-1896] Respect spark.master (and --master) before MASTER in spark-shell · cce77457
      Andrew Or authored
      The hierarchy for configuring the Spark master in the shell is as follows:
      ```
      MASTER > --master > spark.master (spark-defaults.conf)
      ```
      This is inconsistent with the way we run normal applications, which is:
      ```
      --master > spark.master (spark-defaults.conf) > MASTER
      ```
      
      I was trying to run a shell locally on a standalone cluster launched through the ec2 scripts, which automatically set `MASTER` in spark-env.sh. It was surprising to me that `--master` didn't take effect, considering that this is the way we tell users to set their masters [here](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark).
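The intended precedence is simple enough to sketch in a few lines of Python (hypothetical function; the `local[*]` fallback is an assumption for the sketch, not taken from this commit):

```python
def choose_master(cli_master, conf, env):
    """Resolve the master the way normal applications do:
    --master > spark.master (spark-defaults.conf) > MASTER."""
    return (cli_master
            or conf.get("spark.master")
            or env.get("MASTER")
            or "local[*]")

# --master wins even when the ec2 scripts exported MASTER in spark-env.sh.
env = {"MASTER": "spark://ec2-host:7077"}
print(choose_master("local[4]", {}, env))                  # local[4]
print(choose_master(None, {"spark.master": "yarn"}, env))  # yarn
print(choose_master(None, {}, env))                        # spark://ec2-host:7077
```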
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #846 from andrewor14/shell-master and squashes the following commits:
      
      2cb81c9 [Andrew Or] Respect spark.master before MASTER in REPL
      cce77457
    • [SPARK-1897] Respect spark.jars (and --jars) in spark-shell · 8edbee7d
      Andrew Or authored
      Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes except yarn-cluster, this means the `--jar` flag passed to `bin/spark-shell` is also discarded. However, in the [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark), we explicitly tell the users to add the jars this way.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #849 from andrewor14/shell-jars and squashes the following commits:
      
      928a7e6 [Andrew Or] ',' -> "," (minor)
      afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit
      c6da113 [Andrew Or] Do not set spark.jars to ""
      d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell
      8edbee7d
    • Fix UISuite unit test that fails under Jenkins contention · f9f5fd5f
      Aaron Davidson authored
      Due to what may be zombie processes on Jenkins, it seems that at least 10
      Spark ports are in use. It also doesn't matter whether the port increases
      when retried -- it could in fact go down. The only part that matters is
      that the test selects a different port rather than failing to bind.
      Changed the test to match this.
      
      Thanks to @andrewor14 for helping diagnose this.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #857 from aarondav/tiny and squashes the following commits:
      
      c199ec8 [Aaron Davidson] Fix UISuite unit test that fails under Jenkins contention
      f9f5fd5f
    • [SPARK-1870] Make spark-submit --jars work in yarn-cluster mode. · dba31402
      Xiangrui Meng authored
      Sent secondary jars to distributed cache of all containers and add the cached jars to classpath before executors start. Tested on a YARN cluster (CDH-5.0).
      
      `spark-submit --jars` also works in standalone server and `yarn-client`. Thanks to @andrewor14 for testing!
      
      I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested Mesos yet.
      
      CC: @dbtsai @sryza
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #848 from mengxr/yarn-classpath and squashes the following commits:
      
      23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods
      a40f6ed [Xiangrui Meng] standalone -> cluster
      65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
      11e5354 [Xiangrui Meng] minor changes
      3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
      dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn
      dba31402
  5. May 21, 2014
    • Configuration documentation updates · 2a948e7e
      Reynold Xin authored
      
      1. Add `<code>` to configuration options
      2. List env variables in tabular format to be consistent with other pages.
      3. Moved Viewing Spark Properties section up.
      
      This is against branch-1.0, but should be cherry picked into master as well.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #851 from rxin/doc-config and squashes the following commits:
      
      28ac0d3 [Reynold Xin] Add <code> to configuration options, and list env variables in a table.
      
      (cherry picked from commit 75af8bd3)
      Signed-off-by: Reynold Xin <rxin@apache.org>
      2a948e7e
    • [SPARK-1889] [SQL] Apply splitConjunctivePredicates to join condition while finding join keys. · bb88875a
      Takuya UESHIN authored
      
      When tables are equi-joined on multiple keys, `HashJoin` should be used, but instead `CartesianProduct` followed by `Filter` is used.
      The join keys are paired by `And` expression so we need to apply `splitConjunctivePredicates` to join condition while finding join keys.
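The splitting step itself can be sketched in Python, using nested tuples in place of Catalyst expression trees (a sketch of the idea only, not the Scala helper):

```python
def split_conjunctive_predicates(expr):
    """Flatten nested And expressions into a list of conjuncts,
    e.g. And(And(a, b), c) -> [a, b, c], so each equality can be
    considered as a join key."""
    if isinstance(expr, tuple) and expr[0] == "And":
        _, left, right = expr
        return (split_conjunctive_predicates(left)
                + split_conjunctive_predicates(right))
    return [expr]

# x.a = y.a AND x.b = y.b: both equalities become candidate join keys.
cond = ("And", ("Eq", "x.a", "y.a"), ("Eq", "x.b", "y.b"))
print(split_conjunctive_predicates(cond))
```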
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #836 from ueshin/issues/SPARK-1889 and squashes the following commits:
      
      fe1c387 [Takuya UESHIN] Apply splitConjunctivePredicates to join condition while finding join keys.
      bb88875a
    • [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark · f18fd05b
      Kan Zhang authored
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #697 from kanzhang/SPARK-1519 and squashes the following commits:
      
      4f8d1ed [Kan Zhang] [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark
      f18fd05b
    • [Typo] Stoped -> Stopped · ba5d4a99
      Andrew Or authored
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #847 from andrewor14/yarn-typo and squashes the following commits:
      
      c1906af [Andrew Or] Stoped -> Stopped
      ba5d4a99
    • [Minor] Move JdbcRDDSuite to the correct package · 7c79ef7d
      Andrew Or authored
      It was in the wrong package
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #839 from andrewor14/jdbc-suite and squashes the following commits:
      
      f948c5a [Andrew Or] cache -> cache()
      b215279 [Andrew Or] Move JdbcRDDSuite to the correct package
      7c79ef7d
    • [Docs] Correct example of creating a new SparkConf · 1014668f
      Andrew Or authored
      The example code on the configuration page currently does not compile.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #842 from andrewor14/conf-docs and squashes the following commits:
      
      aabff57 [Andrew Or] Correct example of creating a new SparkConf
      1014668f
    • [SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class · 6e337380
      Sumedh Mungee authored
      Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the script is looking for the Scala installation when in fact it is looking for Spark.
      
      Author: Sumedh Mungee <smungee@gmail.com>
      
      Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits:
      
      26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class
      6e337380
  6. May 20, 2014
    • [Hotfix] Blacklisted flaky HiveCompatibility test · 7f0cfe47
      Tathagata Das authored
      `lateral_view_outer` query sometimes returns a different set of 10 rows.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #838 from tdas/hive-test-fix2 and squashes the following commits:
      
      9128a0d [Tathagata Das] Blacklisted flaky HiveCompatibility test.
      7f0cfe47
    • [Spark 1877] ClassNotFoundException when loading RDD with serialized objects · 52eb54d0
      Tathagata Das authored
      Updated version of #821
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      Author: Ghidireac <bogdang@u448a5b0a73d45358d94a.ant.amazon.com>
      
      Closes #835 from tdas/SPARK-1877 and squashes the following commits:
      
      f346f71 [Tathagata Das] Addressed Patrick's comments.
      fee0c5d [Ghidireac] SPARK-1877: ClassNotFoundException when loading RDD with serialized objects
      52eb54d0
  7. May 19, 2014
    • [SPARK-1874][MLLIB] Clean up MLlib sample data · bcb9dce6
      Xiangrui Meng authored
      1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`.
      2. Embedded instructions in the help message of those example apps.
      
      Per discussion with Matei on the JIRA page, new example data is under `data/mllib`.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #833 from mengxr/mllib-sample-data and squashes the following commits:
      
      59f0a18 [Xiangrui Meng] add sample binary classification data
      3c2f92f [Xiangrui Meng] add linear regression data
      050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example
      bcb9dce6
    • SPARK-1689: Spark application should die when removed by Master · b0ce22e0
      Aaron Davidson authored
      scheduler.error() will mask the error if there are active tasks. Being removed is a cataclysmic event for Spark applications, and should probably be treated as such.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #832 from aarondav/i-love-u and squashes the following commits:
      
      9f1200f [Aaron Davidson] SPARK-1689: Spark application should die when removed by Master
      b0ce22e0
    • [SPARK-1875]NoClassDefFoundError: StringUtils when building with hadoop 1.x and hive · 6a2c5c61
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #824 from witgo/SPARK-1875_commons-lang-2.6 and squashes the following commits:
      
      ef7231d [witgo] review commit
      ead3c3b [witgo] SPARK-1875:NoClassDefFoundError: StringUtils when building against Hadoop 1
      6a2c5c61
    • SPARK-1879. Increase MaxPermSize since some of our builds have many classes · 5af99d76
      Matei Zaharia authored
      See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop2 and Hive ran out of PermGen space in spark-shell, when those things added up with the Scala compiler.
      
      Note that users can still override it by setting their own Java options with this change. Their options will come later in the command string than the -XX:MaxPermSize=128m.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #823 from mateiz/spark-1879 and squashes the following commits:
      
      6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
      5af99d76
    • SPARK-1878: Fix the incorrect initialization order · 1811ba8c
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1878
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #822 from zsxwing/SPARK-1878 and squashes the following commits:
      
      4a47e27 [zsxwing] SPARK-1878: Fix the incorrect initialization order
      1811ba8c
    • [SPARK-1876] Windows fixes to deal with latest distribution layout changes · 7b70a707
      Matei Zaharia authored
      - Look for JARs in the right place
      - Launch examples the same way as on Unix
      - Load datanucleus JARs if they exist
      - Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
      - Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #819 from mateiz/win-fixes and squashes the following commits:
      
      d558f96 [Matei Zaharia] Fix comment
      228577b [Matei Zaharia] Review comments
      d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
      144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
      7b70a707
  8. May 18, 2014
    • [WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0 · df0aa835
      Xiangrui Meng authored
      Some improvements to MLlib guide:
      
      1. [SPARK-1872] Update API links for unidoc.
      2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
      3. Add more Java/Python examples.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #816 from mengxr/mllib-doc and squashes the following commits:
      
      ec2e407 [Xiangrui Meng] format scala example for ALS
      cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
      4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
      d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
      561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
      195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
      9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
      7dad18e [Xiangrui Meng] update java api links in NB
      3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
      35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
      e4afaa8 [Xiangrui Meng] explicity state what might change
      df0aa835
    • SPARK-1873: Add README.md file when making distributions · 4ce47932
      Patrick Wendell authored
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #818 from pwendell/reamde and squashes the following commits:
      
      4020b11 [Patrick Wendell] SPARK-1873: Add README.md file when making distributions
      4ce47932
    • Fix spark-submit path in spark-shell & pyspark · ebcd2d68
      Neville Li authored
      Author: Neville Li <neville@spotify.com>
      
      Closes #812 from nevillelyh/neville/v1.0 and squashes the following commits:
      
      0dc33ed [Neville Li] Fix spark-submit path in pyspark
      becec64 [Neville Li] Fix spark-submit path in spark-shell
      ebcd2d68
  9. May 17, 2014
    • Make deprecation warning less severe · 442808a7
      Patrick Wendell authored
      Just a small change. I think it's good not to scare people who are using the old options.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #810 from pwendell/warnings and squashes the following commits:
      
      cb8a311 [Patrick Wendell] Make deprecation warning less severe
      442808a7
    • [SPARK-1824] Remove <master> from Python examples · cf6cbe9f
      Andrew Or authored
      A recent PR (#552) fixed this for all Scala / Java examples. We need to do it for python too.
      
      Note that this blocks on #799, which makes `bin/pyspark` go through Spark submit. With only the changes in this PR, the only way to run these examples is through Spark submit. Once #799 goes in, you can use `bin/pyspark` to run them too. For example,
      
      ```
      bin/pyspark examples/src/main/python/pi.py 100 --master local-cluster[4,1,512]
      ```
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #802 from andrewor14/python-examples and squashes the following commits:
      
      cf50b9f [Andrew Or] De-indent python comments (minor)
      50f80b1 [Andrew Or] Remove pyFiles from SparkContext construction
      c362f69 [Andrew Or] Update docs to use spark-submit for python applications
      7072c6a [Andrew Or] Merge branch 'master' of github.com:apache/spark into python-examples
      427a5f0 [Andrew Or] Update docs
      d32072c [Andrew Or] Remove <master> from examples + update usages
      cf6cbe9f
    • [SPARK-1808] Route bin/pyspark through Spark submit · 4b8ec6fc
      Andrew Or authored
      **Problem.** For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.
      
      **Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent.
      
      **Details.** `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply pass the file directly to Spark submit and let it handle the rest.
      
      For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change was to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.
      
      This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.
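The argument hand-off between the scripts can be sketched with `shlex` (a hypothetical helper, not the actual pyspark code): the shell side packs Spark submit arguments into an environment variable, and the Python side splits them back with shell quoting rules, guarding against the variable being unset (the hang one of the squashed commits below guards against, since older `shlex` falls back to reading stdin when given no string).

```python
import shlex

def parse_submit_args(env):
    """Split PYSPARK_SUBMIT_ARGS with shell quoting rules, tolerating
    an unset or empty variable instead of blocking on stdin."""
    raw = env.get("PYSPARK_SUBMIT_ARGS") or ""
    return shlex.split(raw)

print(parse_submit_args({"PYSPARK_SUBMIT_ARGS": '--master yarn --name "my app"'}))
print(parse_submit_args({}))  # []
```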
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #799 from andrewor14/pyspark-submit and squashes the following commits:
      
      bf37e36 [Andrew Or] Minor changes
      01066fa [Andrew Or] bin/pyspark for Windows
      c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
      1866f85 [Andrew Or] Windows is not cooperating
      456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
      7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
      b7ba0d8 [Andrew Or] Address a few comments (minor)
      06eb138 [Andrew Or] Use shlex instead of writing our own parser
      05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
      a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
      6fba412 [Andrew Or] Deal with quotes + address various comments
      fe4c8a7 [Andrew Or] Update --help for bin/pyspark
      afe47bf [Andrew Or] Fix spark shell
      f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
      a371d26 [Andrew Or] Route bin/pyspark through Spark submit
      4b8ec6fc
  10. May 16, 2014
    • Version bump of spark-ec2 scripts · c0ab85d7
      Patrick Wendell authored
      This will allow us to change things in spark-ec2 related to the 1.0 release.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #809 from pwendell/spark-ec2 and squashes the following commits:
      
      59117fb [Patrick Wendell] Version bump of spark-ec2 scripts
      c0ab85d7
    • SPARK-1864 Look in spark conf instead of system properties when propagating configuration to executors. · a80a6a13
      Michael Armbrust authored
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #808 from marmbrus/confClasspath and squashes the following commits:
      
      4c31d57 [Michael Armbrust] Look in spark conf instead of system properties when propagating configuration to executors.
      a80a6a13
    • Tweaks to Mesos docs · fed6303f
      Matei Zaharia authored
      - Mention Apache downloads first
      - Shorten some wording
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #806 from mateiz/doc-update and squashes the following commits:
      
      d9345cd [Matei Zaharia] typo
      a179f8d [Matei Zaharia] Tweaks to Mesos docs
      fed6303f
    • SPARK-1487 [SQL] Support record filtering via predicate pushdown in Parquet · 40d6acd6
      Andre Schumacher authored
      Simple filter predicates such as LessThan, GreaterThan, etc., where one side is a literal and the other one a NamedExpression are now pushed down to the underlying ParquetTableScan. Here are some results for a microbenchmark with a simple schema of six fields of different types where most records failed the test:
      
      |           | Uncompressed | Compressed |
      | --------- | ------------ | ---------- |
      | File size | 10 GB        | 2 GB       |
      | Speedup   | 2x           | 1.8x       |
      
      Since mileage may vary, I added a new option to SparkConf:
      
      `org.apache.spark.sql.parquet.filter.pushdown`
      
      Default value would be `true` and setting it to `false` disables the pushdown. When most rows are expected to pass the filter, or when there are few fields, performance can be better with pushdown disabled. The default should fit situations with a reasonable number of (possibly nested) fields where not too many records on average pass the filter.
      
      Because of an issue with Parquet ([see here](https://github.com/Parquet/parquet-mr/issues/371)), currently only predicates on non-nullable attributes are pushed down. If one knew that for a given table no optional fields have missing values, one could also allow overriding this.
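The benefit of pushdown is easy to see in a toy model (purely illustrative; the real filtering happens inside `ParquetTableScan` via Parquet's filter API): rows failing a pushed-down predicate are dropped inside the scan and never materialized downstream.

```python
def scan_with_pushdown(rows, predicate=None):
    """Yield rows from a mock scan, applying the filter inside the scan
    when a predicate was pushed down; with predicate=None every row is
    materialized and must be filtered by a separate downstream operator."""
    for row in rows:
        if predicate is None or predicate(row):
            yield row

rows = [{"n": i, "flag": i % 3 == 0} for i in range(10)]
# Pushdown: the scan itself drops rows where flag is False.
filtered = list(scan_with_pushdown(rows, lambda r: r["flag"]))
print([r["n"] for r in filtered])  # [0, 3, 6, 9]
```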
      
      Author: Andre Schumacher <andre.schumacher@iki.fi>
      
      Closes #511 from AndreSchumacher/parquet_filter and squashes the following commits:
      
      16bfe83 [Andre Schumacher] Removing leftovers from merge during rebase
      7b304ca [Andre Schumacher] Fixing formatting
      c36d5cb [Andre Schumacher] Scalastyle
      3da98db [Andre Schumacher] Second round of review feedback
      7a78265 [Andre Schumacher] Fixing broken formatting in ParquetFilter
      a86553b [Andre Schumacher] First round of code review feedback
      b0f7806 [Andre Schumacher] Optimizing imports in ParquetTestData
      85fea2d [Andre Schumacher] Adding SparkConf setting to disable filter predicate pushdown
      f0ad3cf [Andre Schumacher] Undoing changes not needed for this PR
      210e9cb [Andre Schumacher] Adding disjunctive filter predicates
      a93a588 [Andre Schumacher] Adding unit test for filtering
      6d22666 [Andre Schumacher] Extending ParquetFilters
      93e8192 [Andre Schumacher] First commit Parquet record filtering
      40d6acd6
    • [SQL] Implement between in hql · 032d6632
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #804 from marmbrus/between and squashes the following commits:
      
      ae24672 [Michael Armbrust] add golden answer.
      d9997ef [Michael Armbrust] Implement between in hql.
      9bd4433 [Michael Armbrust] Better error on parse failures.
      032d6632
    • bugfix: overflow of graphx Edge compare function · fa6de408
      Zhen Peng authored
      Author: Zhen Peng <zhenpeng01@baidu.com>
      
      Closes #769 from zhpengg/bugfix-graphx-edge-compare and squashes the following commits:
      
      8a978ff [Zhen Peng] add ut for graphx Edge.lexicographicOrdering.compare
      413c258 [Zhen Peng] there maybe a overflow for two Long's substraction
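The bug class is worth a sketch: a comparator that returns `a - b` overflows for 64-bit integers. Since Python ints don't overflow, the JVM's wraparound is simulated here (illustrative only; the actual fix lives in GraphX's `Edge.lexicographicOrdering`):

```python
def sub64(a, b):
    """64-bit two's-complement subtraction, as the JVM computes a - b on Longs."""
    return ((a - b + 2**63) % 2**64) - 2**63

LONG_MAX = 2**63 - 1

# A subtraction-based comparator reads the sign of (a - b).
# With wraparound it lies: clearly -2 < LONG_MAX, yet the 64-bit
# difference wraps around to a positive number.
print(sub64(-2, LONG_MAX) > 0)  # True -- the comparator says -2 > LONG_MAX

# The safe comparator compares directly instead of subtracting.
def compare(a, b):
    return (a > b) - (a < b)

print(compare(-2, LONG_MAX))  # -1
```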
      fa6de408