  1. Aug 11, 2014
    • Michael Armbrust's avatar
      [SPARK-2650][SQL] Build column buffers in smaller batches · bad21ed0
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1880 from marmbrus/columnBatches and squashes the following commits:
      
      0649987 [Michael Armbrust] add test
      4756fad [Michael Armbrust] fix compilation
      2314532 [Michael Armbrust] Build column buffers in smaller batches
      bad21ed0
    • Takuya UESHIN's avatar
      [SPARK-2968][SQL] Fix nullabilities of Explode. · c686b7dd
      Takuya UESHIN authored
      Output nullabilities of `Explode` can be determined by `ArrayType.containsNull` or `MapType.valueContainsNull`.
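
      A minimal sketch of the rule described above, written in plain Python rather than Catalyst code. The classes and the `explode_output` helper are hypothetical stand-ins, and the non-nullable map key is an assumption that the commit message does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    element_type: str
    contains_null: bool


@dataclass(frozen=True)
class MapType:
    key_type: str
    value_type: str
    value_contains_null: bool


def explode_output(data_type):
    """Return (type, nullable) pairs for the columns produced by exploding data_type."""
    if isinstance(data_type, ArrayType):
        # One output column; it is nullable only if the array may hold nulls.
        return [(data_type.element_type, data_type.contains_null)]
    if isinstance(data_type, MapType):
        # Assumption: map keys are never null, so only the value column can be nullable.
        return [
            (data_type.key_type, False),
            (data_type.value_type, data_type.value_contains_null),
        ]
    raise TypeError("explode expects an array or map type")
```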
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1888 from ueshin/issues/SPARK-2968 and squashes the following commits:
      
      d128c95 [Takuya UESHIN] Fix nullability of Explode.
      c686b7dd
    • Takuya UESHIN's avatar
      [SPARK-2965][SQL] Fix HashOuterJoin output nullabilities. · c9c89c31
      Takuya UESHIN authored
      Output attributes of opposite side of `OuterJoin` should be nullable.
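
      The rule can be sketched in plain Python as follows. The function name, the join-type strings, and the `(name, nullable)` tuples are illustrative assumptions, not the HashOuterJoin code.

```python
def outer_join_output(left, right, join_type):
    """left and right are lists of (name, nullable); return the joined output attributes."""

    def as_nullable(attrs):
        # Rows without a match are padded with nulls, so every attribute may be null.
        return [(name, True) for name, _ in attrs]

    if join_type == "left_outer":
        return left + as_nullable(right)
    if join_type == "right_outer":
        return as_nullable(left) + right
    if join_type == "full_outer":
        return as_nullable(left) + as_nullable(right)
    raise ValueError("not an outer join: %s" % join_type)
```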
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1887 from ueshin/issues/SPARK-2965 and squashes the following commits:
      
      bcb2d37 [Takuya UESHIN] Fix HashOuterJoin output nullabilities.
      c9c89c31
    • Yin Huai's avatar
      [SQL] A tiny refactoring in HiveContext#analyze · 647aeba3
      Yin Huai authored
      I should use `EliminateAnalysisOperators` in `analyze` instead of manually pattern matching.
      
      Author: Yin Huai <huaiyin.thu@gmail.com>
      
      Closes #1881 from yhuai/useEliminateAnalysisOperators and squashes the following commits:
      
      f3e1e7f [Yin Huai] Use EliminateAnalysisOperators.
      647aeba3
    • wangfei's avatar
      [sql]use SparkSQLEnv.stop() in ShutdownHook · e83fdcd4
      wangfei authored
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #1852 from scwf/patch-3 and squashes the following commits:
      
      ae28c29 [wangfei] use SparkSQLEnv.stop() in ShutdownHook
      e83fdcd4
    • Cheng Lian's avatar
      [SPARK-2590][SQL] Added option to handle incremental collection, disabled by default · 21a95ef0
      Cheng Lian authored
      JIRA issue: [SPARK-2590](https://issues.apache.org/jira/browse/SPARK-2590)
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1853 from liancheng/inc-collect-option and squashes the following commits:
      
      cb3ea45 [Cheng Lian] Moved incremental collection option to Thrift server
      43ce3aa [Cheng Lian] Changed incremental collect option name
      623abde [Cheng Lian] Added option to handle incremental collection, disabled by default
      21a95ef0
    • Ahir Reddy's avatar
      [SPARK-2844][SQL] Correctly set JVM HiveContext if it is passed into Python HiveContext constructor · 490ecfa2
      Ahir Reddy authored
      https://issues.apache.org/jira/browse/SPARK-2844
      
      Author: Ahir Reddy <ahirreddy@gmail.com>
      
      Closes #1768 from ahirreddy/python-hive-context-fix and squashes the following commits:
      
      7972d3b [Ahir Reddy] Correctly set JVM HiveContext if it is passed into Python HiveContext constructor
      490ecfa2
    • DB Tsai's avatar
      [SPARK-2934][MLlib] Adding LogisticRegressionWithLBFGS Interface · 6fab941b
      DB Tsai authored
      for training with LBFGS Optimizer which will converge faster than SGD.
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #1862 from dbtsai/dbtsai-lbfgs-lor and squashes the following commits:
      
      aa84b81 [DB Tsai] small change
      f852bcd [DB Tsai] Remove duplicate method
      f119fdc [DB Tsai] Formatting
      97776aa [DB Tsai] address more feedback
      85b4a91 [DB Tsai] address feedback
      3cf50c2 [DB Tsai] LogisticRegressionWithLBFGS interface
      6fab941b
    • Doris Xin's avatar
      [SPARK-2515][mllib] Chi Squared test · 32638b5e
      Doris Xin authored
      Author: Doris Xin <doris.s.xin@gmail.com>
      
      Closes #1733 from dorx/chisquare and squashes the following commits:
      
      cafb3a7 [Doris Xin] fixed p-value for extreme case.
      d286783 [Doris Xin] Merge branch 'master' into chisquare
      e95e485 [Doris Xin] reviewer comments.
      7dde711 [Doris Xin] ChiSqTestResult renaming and changed to Class
      80d03e2 [Doris Xin] Reviewer comments.
      c39eeb5 [Doris Xin] units passed with updated API
      e90d90a [Doris Xin] Merge branch 'master' into chisquare
      7eea80b [Doris Xin] WIP
      d64c2fb [Doris Xin] Merge branch 'master' into chisquare
      5686082 [Doris Xin] facelift
      bc7eb2e [Doris Xin] unit passed; still need docs and some refactoring
      50703a5 [Doris Xin] merge master
      4e4e361 [Doris Xin] WIP
      e6b83f3 [Doris Xin] reviewer comments
      3d61582 [Doris Xin] input names
      706d436 [Doris Xin] Added API for RDD[Vector]
      6598379 [Doris Xin] API and code structure.
      ff17423 [Doris Xin] WIP
      32638b5e
    • Josh Rosen's avatar
      [SPARK-2931] In TaskSetManager, reset currentLocalityIndex after recomputing locality levels · 7712e724
      Josh Rosen authored
      This addresses SPARK-2931, a bug where getAllowedLocalityLevel() could throw ArrayIndexOutOfBoundsException.  The fix here is to reset currentLocalityIndex after recomputing the locality levels.
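
      A rough Python sketch of the failure mode and the fix, under stated assumptions: `LocalityTracker`, `LEVEL_ORDER`, and the remapping rule in `_index_for` are hypothetical and only illustrate why the index must be reset when the list of levels changes.

```python
# Illustrative level names, ordered from most to least restrictive.
LEVEL_ORDER = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


class LocalityTracker:
    def __init__(self, levels):
        self.levels = list(levels)
        self.current_index = 0

    def recompute(self, new_levels):
        previous = self.levels[self.current_index]
        self.levels = list(new_levels)
        # Without this reset, current_index could point past the end of a shorter list.
        self.current_index = self._index_for(previous)

    def _index_for(self, previous):
        # Assumption: continue from the first new level that is at least as permissive.
        rank = LEVEL_ORDER.index(previous)
        for i, level in enumerate(self.levels):
            if LEVEL_ORDER.index(level) >= rank:
                return i
        return len(self.levels) - 1

    def allowed_level(self):
        return self.levels[self.current_index]
```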
      
      Thanks to kayousterhout, mridulm, and lirui-intel for helping me to debug this.
      
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #1896 from JoshRosen/SPARK-2931 and squashes the following commits:
      
      48b60b5 [Josh Rosen] Move FakeRackUtil.cleanUp() into beforeEach().
      6fec474 [Josh Rosen] Set currentLocalityIndex after recomputing locality levels.
      9384897 [Josh Rosen] Update SPARK-2931 test to reflect changes in 63bdb1f4.
      9ecd455 [Josh Rosen] Apply @mridulm's patch for reproducing SPARK-2931.
      7712e724
    • Reynold Xin's avatar
      [SPARK-2952] Enable logging actor messages at DEBUG level · 37338666
      Reynold Xin authored
      Example messages:
      ```
      14/08/09 21:37:01 DEBUG BlockManagerMasterActor: [actor] received message RegisterBlockManager(BlockManagerId(0, rxin-mbp, 58092, 0),278302556,Actor[akka.tcp://spark@rxin-mbp:58088/user/BlockManagerActor1#-63596539]) from Actor[akka.tcp://spark@rxin-mbp:58088/temp/$c]
      
      14/08/09 21:37:01 DEBUG BlockManagerMasterActor: [actor] handled message (0.279 ms) RegisterBlockManager(BlockManagerId(0, rxin-mbp, 58092, 0),278302556,Actor[akka.tcp://spark@rxin-mbp:58088/user/BlockManagerActor1#-63596539]) from Actor[akka.tcp://spark@rxin-mbp:58088/temp/$c]
      ```
      
      cc @mengxr @tdas @pwendell
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1870 from rxin/actorLogging and squashes the following commits:
      
      c531ee5 [Reynold Xin] Added license header for ActorLogReceive.
      f6b1ebe [Reynold Xin] [SPARK-2952] Enable logging actor messages at DEBUG level
      37338666
    • Josh Rosen's avatar
      [PySpark] [SPARK-2954] [SPARK-2948] [SPARK-2910] [SPARK-2101] Python 2.6 Fixes · db06a81f
      Josh Rosen authored
      - Modify python/run-tests to test with Python 2.6
      - Use unittest2 when running on Python 2.6.
      - Fix issue with namedtuple.
      - Skip TestOutputFormat.test_newhadoop on Python 2.6 until SPARK-2951 is fixed.
      - Fix MLlib _deserialize_double on Python 2.6.
      
      Closes #1868.  Closes #1042.
      
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #1874 from JoshRosen/python2.6 and squashes the following commits:
      
      983d259 [Josh Rosen] [SPARK-2954] Fix MLlib _deserialize_double on Python 2.6.
      5d18fd7 [Josh Rosen] [SPARK-2948] [SPARK-2910] [SPARK-2101] Python 2.6 fixes
      db06a81f
  2. Aug 10, 2014
    • Reynold Xin's avatar
      [SPARK-2936] Migrate Netty network module from Java to Scala · ba28a8fc
      Reynold Xin authored
      The Netty network module was originally written when Scala 2.9.x had a bug that prevented a pure Scala implementation, so a subset of the files was written in Java. We have since upgraded to Scala 2.10 and can now migrate all Java files to Scala.
      
      https://github.com/netty/netty/issues/781
      
      https://github.com/mesos/spark/pull/522
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1865 from rxin/netty and squashes the following commits:
      
      332422f [Reynold Xin] Code review feedback
      ca9eeee [Reynold Xin] Minor update.
      7f1434b [Reynold Xin] [SPARK-2936] Migrate Netty network module from Java to Scala
      ba28a8fc
    • Doris Xin's avatar
      [SPARK-2937] Separate out sampleByKeyExact as its own API in PairRDDFunctions · b715aa0c
      Doris Xin authored
      This keeps the API consistent with Python and allows the `sampleByKeyExact` API to carry the `Experimental` label.
      
      Author: Doris Xin <doris.s.xin@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1866 from dorx/stratified and squashes the following commits:
      
      0ad97b2 [Doris Xin] reviewer comments.
      2948aae [Doris Xin] remove unrelated changes
      e990325 [Doris Xin] Merge branch 'master' into stratified
      555a3f9 [Doris Xin] separate out sampleByKeyExact as its own API
      616e55c [Doris Xin] merge master
      245439e [Doris Xin] moved minSamplingRate to getUpperBound
      eaf5771 [Doris Xin] bug fixes.
      17a381b [Doris Xin] fixed a merge issue and a failed unit
      ea7d27f [Doris Xin] merge master
      b223529 [Xiangrui Meng] use approx bounds for poisson fix poisson mean for waitlisting add unit tests for Java
      b3013a4 [Xiangrui Meng] move math3 back to test scope
      eecee5f [Doris Xin] Merge branch 'master' into stratified
      f4c21f3 [Doris Xin] Reviewer comments
      a10e68d [Doris Xin] style fix
      a2bf756 [Doris Xin] Merge branch 'master' into stratified
      680b677 [Doris Xin] use mapPartitionWithIndex instead
      9884a9f [Doris Xin] style fix
      bbfb8c9 [Doris Xin] Merge branch 'master' into stratified
      ee9d260 [Doris Xin] addressed reviewer comments
      6b5b10b [Doris Xin] Merge branch 'master' into stratified
      254e03c [Doris Xin] minor fixes and Java API.
      4ad516b [Doris Xin] remove unused imports from PairRDDFunctions
      bd9dc6e [Doris Xin] unit bug and style violation fixed
      1fe1cff [Doris Xin] Changed fractionByKey to a map to enable arg check
      944a10c [Doris Xin] [SPARK-2145] Add lower bound on sampling rate
      0214a76 [Doris Xin] cleanUp
      90d94c0 [Doris Xin] merge master
      9e74ab5 [Doris Xin] Separated out most of the logic in sampleByKey
      7327611 [Doris Xin] merge master
      50581fc [Doris Xin] added a TODO for logging in python
      46f6c8c [Doris Xin] fixed the NPE caused by closures being cleaned before being passed into the aggregate function
      7e1a481 [Doris Xin] changed the permission on SamplingUtil
      1d413ce [Doris Xin] fixed checkstyle issues
      9ee94ee [Doris Xin] [SPARK-2082] stratified sampling in PairRDDFunctions that guarantees exact sample size
      e3fd6a6 [Doris Xin] Merge branch 'master' into takeSample
      7cab53a [Doris Xin] fixed import bug in rdd.py
      ffea61a [Doris Xin] SPARK-1939: Refactor takeSample method in RDD
      1441977 [Doris Xin] SPARK-1939 Refactor takeSample method in RDD to use ScaSRS
      b715aa0c
    • Davies Liu's avatar
      [SPARK-2898] [PySpark] fix bugs in daemon.py · 28dcbb53
      Davies Liu authored
      1. do not use a signal handler for SIGCHLD; it can easily cause deadlock
      2. handle EINTR during accept()
      3. pass errno into JVM
      4. handle EAGAIN during fork()
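
      Items 2 and 4 both amount to retrying a call when it fails with a transient errno. The helper below is a generic sketch of that pattern, not the code in the PySpark daemon; `retry_on_errno` and its parameters are hypothetical. On Python 3.5 and later the interpreter already retries most system calls interrupted by signals, so the EINTR case matters mainly on older versions.

```python
import errno
import time


def retry_on_errno(func, retry_errnos, max_attempts=5, delay=0.0):
    """Call func(), retrying when it raises OSError with one of retry_errnos."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except OSError as exc:
            # Give up on unrelated errors or once the attempts are used up.
            if exc.errno not in retry_errnos or attempt == max_attempts:
                raise
            time.sleep(delay)


# Hypothetical usage, assuming a listening socket named `listener`:
#   conn, addr = retry_on_errno(listener.accept, {errno.EINTR})
#   pid = retry_on_errno(os.fork, {errno.EAGAIN}, delay=1.0)
```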
      
      Now, it can pass 50k tasks tests in 180 seconds.
      
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #1842 from davies/qa and squashes the following commits:
      
      f0ea451 [Davies Liu] fix lint
      03a2e8c [Davies Liu] cleanup dead children every second
      32cb829 [Davies Liu] fix lint
      0cd0817 [Davies Liu] fix bugs in deamon.py
      28dcbb53
    • Shivaram Venkataraman's avatar
      [SPARK-2950] Add gc time and shuffle write time to JobLogger · 1d03a26a
      Shivaram Venkataraman authored
      The JobLogger is very useful for performing offline performance profiling of Spark jobs. GC Time and Shuffle Write time are available in TaskMetrics but are currently missing from the JobLogger output. This patch adds these two fields.
      
      ~~Since this is a small change, I didn't create a JIRA. Let me know if I should do that.~~
      
      cc kayousterhout
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #1869 from shivaram/job-logger and squashes the following commits:
      
      1b709fc [Shivaram Venkataraman] Add a space before GC_TIME
      c418105 [Shivaram Venkataraman] Add gc time and shuffle write time to JobLogger
      1d03a26a
    • GuoQiang Li's avatar
      Remove extra semicolon in Task.scala · 3570119c
      GuoQiang Li authored
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #1876 from witgo/remove_semicolon_in_Task_scala and squashes the following commits:
      
      c6ea732 [GuoQiang Li] Remove extra semicolon in Task.scala
      3570119c
    • Reynold Xin's avatar
      Turn UpdateBlockInfo into case class. · 482c5afb
      Reynold Xin authored
      This helps us log UpdateBlockInfo properly once #1870 is merged.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1872 from rxin/UpdateBlockInfo and squashes the following commits:
      
      0cee1c2 [Reynold Xin] Turn UpdateBlockInfo into case class.
      482c5afb
    • Reynold Xin's avatar
      Updated Spark SQL README to include the hive-thriftserver module · 5b6585de
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1867 from rxin/sql-readme and squashes the following commits:
      
      42a5307 [Reynold Xin] Updated Spark SQL README to include the hive-thriftserver module
      5b6585de
  3. Aug 09, 2014
    • Kousuke Saruta's avatar
      [SPARK-2894] spark-shell doesn't accept flags · 4f4a9884
      Kousuke Saruta authored
      As sryza reported, spark-shell doesn't accept any flags.
      The root cause is incorrect usage of spark-submit in spark-shell, which was brought to the surface by #1801.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1715, Closes #1864, and Closes #1861
      
      Closes #1825 from sarutak/SPARK-2894 and squashes the following commits:
      
      47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894
      2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py
      98287ed [Kousuke Saruta] Removed useless code from java_gateway.py
      513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces
      28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments
      5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway
      e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
      4f4a9884
    • Chris Cope's avatar
      [SPARK-1766] sorted functions to meet pedantic requirements · e45daf22
      Chris Cope authored
      Pedantry is underrated
      
      Author: Chris Cope <ccope@resilientscience.com>
      
      Closes #1859 from copester/master and squashes the following commits:
      
      0fb4499 [Chris Cope] [SPARK-1766] sorted functions to meet pedantic requirements
      e45daf22
    • Chandan Kumar's avatar
      [SPARK-2861] Fix Doc comment of histogram method · b431e674
      Chandan Kumar authored
      Tested and ready to merge.
      
      Author: Chandan Kumar <chandan.kumar@imaginea.com>
      
      Closes #1786 from nrchandan/spark-2861 and squashes the following commits:
      
      cb0bc1e [Chandan Kumar] [SPARK-2861] Fix a typo in the histogram doc comment
      6a2a71b [Chandan Kumar] SPARK-2861. Fix Doc comment of histogram method
      b431e674
    • li-zhihui's avatar
      [SPARK-2635] Fix race condition at SchedulerBackend.isReady in standalone mode · 28dbae85
      li-zhihui authored
      In SPARK-1946 (PR #900), the configuration `spark.scheduler.minRegisteredExecutorsRatio` was introduced. However, in standalone mode, there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set.

      Because the number of expected executors is uncertain in standalone mode, this PR uses CPU cores (`--total-executor-cores`) as the expected resources to judge whether SchedulerBackend is ready.
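
      A simplified sketch of a readiness check of this kind, in plain Python. The function and parameter names are assumptions; the point is that comparing against an expected total that has not been set yet trivially succeeds, which is the kind of race described above.

```python
def is_ready(registered, total_expected, min_registered_ratio):
    """Ready once the registered resources reach the required share of the expected total."""
    return registered >= total_expected * min_registered_ratio


# If the expected total is still unset (0), the check passes with nothing registered.
premature = is_ready(registered=0, total_expected=0, min_registered_ratio=0.8)

# With a known total (e.g. CPU cores requested up front), the check waits as intended.
waiting = is_ready(registered=2, total_expected=8, min_registered_ratio=0.5)
ready = is_ready(registered=4, total_expected=8, min_registered_ratio=0.5)
```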
      
      Author: li-zhihui <zhihui.li@intel.com>
      Author: Li Zhihui <zhihui.li@intel.com>
      
      Closes #1525 from li-zhihui/fixre4s and squashes the following commits:
      
      e9a630b [Li Zhihui] Rename variable totalExecutors and clean codes
      abf4860 [Li Zhihui] Push down variable totalExpectedResources to children classes
      ca54bd9 [li-zhihui] Format log with String interpolation
      88c7dc6 [li-zhihui] Few codes and docs refactor
      41cf47e [li-zhihui] Fix race condition at SchedulerBackend.isReady in standalone mode
      28dbae85
  4. Aug 08, 2014
    • Erik Erlandson's avatar
      [SPARK-2911] apply parent[T](j) to clarify UnionRDD code · 43af2817
      Erik Erlandson authored
      References to dependencies(j) for actually obtaining RDD parents are less common than I originally estimated.   It does clarify UnionRDD (also will clarify some of my other PRs)
      
      Use of firstParent[T] is ubiquitous, but I am less sure that it benefits from being replaced with parent[T](0).
      
      Author: Erik Erlandson <eerlands@redhat.com>
      
      Closes #1858 from erikerlandson/spark-2911-pr2 and squashes the following commits:
      
      7ffea74 [Erik Erlandson] [SPARK-2911] apply parent[T](j) to clarify UnionRDD code
      43af2817
    • WangTao's avatar
      [Web UI]Make decision order of Worker's WebUI port consistent with Master's · 1c84dba9
      WangTao authored
      The decision order of Worker's WebUI port is "--webui-port", SPARK_WORKER_WEBUI_PORT, 8081 (default), spark.worker.ui.port. But in Master, the order is "--webui-port", spark.master.ui.port, SPARK_MASTER_WEBUI_PORT and 8080 (default).

      So we change the order in Worker to keep it consistent with Master.
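
      The intended precedence for the Worker can be sketched as below. `resolve_webui_port` is a hypothetical helper, and the dictionaries stand in for the Spark configuration and the process environment.

```python
def resolve_webui_port(cli_port=None, conf=None, env=None,
                       conf_key="spark.worker.ui.port",
                       env_key="SPARK_WORKER_WEBUI_PORT",
                       default=8081):
    """Pick the port: command-line flag, then property, then environment, then default."""
    conf = conf or {}
    env = env or {}
    if cli_port is not None:
        return int(cli_port)
    if conf_key in conf:
        return int(conf[conf_key])
    if env_key in env:
        return int(env[env_key])
    return default
```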
      
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #1838 from WangTaoTheTonic/reOrder and squashes the following commits:
      
      460f4d4 [WangTao] Make decision order of Worker's WebUI consistent with Master's
      1c84dba9
    • GuoQiang Li's avatar
      [SPARK-2897][SPARK-2920]TorrentBroadcast does use the serializer class... · ec79063f
      GuoQiang Li authored
      [SPARK-2897][SPARK-2920]TorrentBroadcast does use the serializer class specified in the spark option "spark.serializer"
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #1836 from witgo/SPARK-2897 and squashes the following commits:
      
      23cdc5b [GuoQiang Li] review commit
      ada4fba [GuoQiang Li] TorrentBroadcast does not support broadcast compression
      fb91792 [GuoQiang Li] org.apache.spark.broadcast.TorrentBroadcast does use the serializer class specified in the spark option "spark.serializer"
      ec79063f
    • Xiangrui Meng's avatar
      [SPARK-1997][MLLIB] update breeze to 0.9 · 74d6f622
      Xiangrui Meng authored
      0.9 dependencies (this version doesn't depend on scalalogging and I excluded commons-math3 from its transitive dependencies):
      ~~~
      +-org.scalanlp:breeze_2.10:0.9 [S]
        +-com.github.fommil.netlib:core:1.1.2
        +-com.github.rwl:jtransforms:2.4.0
        +-net.sf.opencsv:opencsv:2.3
        +-net.sourceforge.f2j:arpack_combined_all:0.1
        +-org.scalanlp:breeze-macros_2.10:0.3.1 [S]
        | +-org.scalamacros:quasiquotes_2.10:2.0.0 [S]
        |
        +-org.slf4j:slf4j-api:1.7.5
        +-org.spire-math:spire_2.10:0.7.4 [S]
          +-org.scalamacros:quasiquotes_2.10:2.0.0 [S]
          |
          +-org.spire-math:spire-macros_2.10:0.7.4 [S]
            +-org.scalamacros:quasiquotes_2.10:2.0.0 [S]
      ~~~
      
      Closes #1749
      
      CC: witgo avati
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1857 from mengxr/breeze-0.9 and squashes the following commits:
      
      7fc16b6 [Xiangrui Meng] don't know why but exclude a private method for mima
      dcc502e [Xiangrui Meng] update breeze to 0.9
      74d6f622
    • chutium's avatar
      [SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) should be... · b7c89a7f
      chutium authored
      [SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile
      
      Author: chutium <teng.qiu@gmail.com>
      
      Closes #1691 from chutium/SPARK-2700 and squashes the following commits:
      
      b76ae8c [chutium] [SPARK-2700] [SQL] fixed styling issue
      d75a8bd [chutium] [SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile
      b7c89a7f
    • Yin Huai's avatar
      [SPARK-2919] [SQL] Basic support for analyze command in HiveQl · 45d8f4de
      Yin Huai authored
      The command we will support is
      ```
      ANALYZE TABLE tablename COMPUTE STATISTICS noscan
      ```
      Other cases shown in https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables will still be treated as Hive native commands.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-2919
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1848 from yhuai/sqlAnalyze and squashes the following commits:
      
      0b79d36 [Yin Huai] Typo and format.
      c59d94b [Yin Huai] Support "ANALYZE TABLE tableName COMPUTE STATISTICS noscan".
      45d8f4de
    • Yin Huai's avatar
      [SPARK-2877] [SQL] MetastoreRelation should use SparkClassLoader when creating the tableDesc · c874723f
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-2877
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1806 from yhuai/SPARK-2877 and squashes the following commits:
      
      4142bcb [Yin Huai] Use Spark's classloader.
      c874723f
    • Yin Huai's avatar
      [SPARK-2908] [SQL] JsonRDD.nullTypeToStringType does not convert all NullType to StringType · 0489cee6
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-2908
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1840 from yhuai/SPARK-2908 and squashes the following commits:
      
      86e833e [Yin Huai] Update test.
      cb11759 [Yin Huai] nullTypeToStringType should check columns with the type of array of structs.
      0489cee6
    • Yin Huai's avatar
      [SPARK-2888] [SQL] Fix addColumnMetadataToConf in HiveTableScan · 9016af3f
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-2888
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1817 from yhuai/fixAddColumnMetadataToConf and squashes the following commits:
      
      fba728c [Yin Huai] Fix addColumnMetadataToConf.
      9016af3f
    • Erik Erlandson's avatar
      [SPARK-2911]: provide rdd.parent[T](j) to obtain jth parent RDD · 9a54de16
      Erik Erlandson authored
      Author: Erik Erlandson <eerlands@redhat.com>
      
      Closes #1841 from erikerlandson/spark-2911-pr and squashes the following commits:
      
      4699e2f [Erik Erlandson] [SPARK-2911]: provide rdd.parent[T](j) to obtain jth parent RDD
      9a54de16
  5. Aug 07, 2014
    • Kousuke Saruta's avatar
      [SPARK-2904] Remove non-used local variable in SparkSubmitArguments · 9de6a42b
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #1834 from sarutak/SPARK-2904 and squashes the following commits:
      
      38e7d45 [Kousuke Saruta] Removed non-used variable in SparkSubmitArguments
      9de6a42b
    • Sandy Ryza's avatar
      SPARK-2565. Update ShuffleReadMetrics as blocks are fetched · 4c51098f
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1507 from sryza/sandy-spark-2565 and squashes the following commits:
      
      74dad41 [Sandy Ryza] SPARK-2565. Update ShuffleReadMetrics as blocks are fetched
      4c51098f
    • Matei Zaharia's avatar
      SPARK-2787: Make sort-based shuffle write files directly when there's no... · 6906b69c
      Matei Zaharia authored
      SPARK-2787: Make sort-based shuffle write files directly when there's no sorting/aggregation and # partitions is small
      
      As described in https://issues.apache.org/jira/browse/SPARK-2787, right now sort-based shuffle is more expensive than hash-based for map operations that do no partial aggregation or sorting, such as groupByKey. This is because it has to serialize each data item twice (once when spilling to intermediate files, and then again when merging these files object-by-object). This patch adds a code path to just write separate files directly if the # of output partitions is small, and concatenate them at the end to produce a sorted file.
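
      The bypass idea can be sketched in a few lines of Python, with in-memory lists standing in for the per-partition files. The function names and the `threshold` parameter are assumptions for illustration and are not taken from ExternalSorter.

```python
def should_bypass_merge_sort(num_partitions, has_aggregation, has_ordering, threshold):
    """The direct path only applies without aggregation or sorting and with few partitions."""
    return (not has_aggregation) and (not has_ordering) and num_partitions <= threshold


def write_partitioned(records, num_partitions, partitioner):
    """Append each record to its partition's buffer, then concatenate in partition order."""
    buffers = [[] for _ in range(num_partitions)]
    for key, value in records:
        buffers[partitioner(key)].append((key, value))

    output, offsets = [], []
    for buf in buffers:
        offsets.append(len(output))  # where this partition starts in the combined output
        output.extend(buf)
    return output, offsets
```

      Each record is handled once on this path, and the result is still grouped by partition id, which is what a reader of the combined file needs.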
      
      On the unit test side, I added some tests that force or don't force this bypass path to be used, and checked that our tests for other features (e.g. all the operations) cover both cases.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #1799 from mateiz/SPARK-2787 and squashes the following commits:
      
      88cf26a [Matei Zaharia] Fix rebase
      10233af [Matei Zaharia] Review comments
      398cb95 [Matei Zaharia] Fix looking up shuffle manager in conf
      ca3efd9 [Matei Zaharia] Add docs for shuffle manager properties, and allow short names for them
      d0ae3c5 [Matei Zaharia] Fix some comments
      90d084f [Matei Zaharia] Add code path to bypass merge-sort in ExternalSorter, and tests
      31e5d7c [Matei Zaharia] Move existing logic for writing partitioned files into ExternalSorter
      6906b69c
    • Prashant Sharma's avatar
      SPARK-2899 Doc generation is back to working in new SBT Build. · 32096c2a
      Prashant Sharma authored
      The reason for this bug was the introduction of the OldDeps project. It had to be excluded to prevent unidoc from trying to put it on the "docs compile" classpath.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1830 from ScrapCodes/doc-fix and squashes the following commits:
      
      e5d52e6 [Prashant Sharma] SPARK-2899 Doc generation is back to working in new SBT Build.
      32096c2a
    • Oleg Danilov's avatar
      SPARK-2905 Fixed path sbin => bin · 80ec5bad
      Oleg Danilov authored
      Author: Oleg Danilov <oleg.danilov@wandisco.com>
      
      Closes #1835 from dosoft/SPARK-2905 and squashes the following commits:
      
      4df423c [Oleg Danilov] SPARK-2905 Fixed path sbin => bin
      80ec5bad
    • Xiangrui Meng's avatar
      [SPARK-2852][MLLIB] Separate model from IDF/StandardScaler algorithms · b9e9e537
      Xiangrui Meng authored
      This is part of SPARK-2828:
      
      1. separate IDF model from IDF algorithm (which generates a model)
      2. separate StandardScaler model from StandardScaler
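
      The separation can be illustrated with a small Python sketch in which the algorithm object only fits and the returned model only transforms. The class shapes mirror the description above, but the method names and the smoothed formula `log((m + 1) / (df + 1))` are assumptions here, not a copy of the MLlib code.

```python
import math


class IDFModel:
    """Holds the fitted inverse document frequencies and applies them."""

    def __init__(self, idf):
        self.idf = list(idf)

    def transform(self, term_frequencies):
        return [tf * w for tf, w in zip(term_frequencies, self.idf)]


class IDF:
    """The algorithm: it computes statistics from the data and returns a model."""

    def fit(self, documents):
        if not documents:
            raise ValueError("need at least one document")
        m = len(documents)
        num_terms = len(documents[0])
        # Document frequency: the number of documents in which each term appears.
        doc_freq = [sum(1 for doc in documents if doc[j] > 0) for j in range(num_terms)]
        return IDFModel([math.log((m + 1.0) / (df + 1.0)) for df in doc_freq])
```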
      
      CC: dbtsai
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1814 from mengxr/feature-api-update and squashes the following commits:
      
      40d863b [Xiangrui Meng] move mean and variance to model
      48a0fff [Xiangrui Meng] separate Model from StandardScaler algorithm
      89f3486 [Xiangrui Meng] update IDF to separate Model from Algorithm
      b9e9e537
    • Joseph K. Bradley's avatar
      [mllib] DecisionTree Strategy parameter checks · 8d1dec4f
      Joseph K. Bradley authored
      Added some checks to Strategy to print out meaningful error messages when given invalid DecisionTree parameters.
      CC mengxr
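
      A sketch of what such checks might look like, in plain Python. The parameter names and the specific rules (for example, which impurity measures go with which algorithm) are assumptions for illustration; the commit message does not list the checks that were added.

```python
def check_strategy(algo, impurity, max_depth, num_classes, max_bins):
    """Raise ValueError with a descriptive message when the parameters are inconsistent."""
    if algo not in ("classification", "regression"):
        raise ValueError("unknown algo '%s'; expected classification or regression" % algo)
    if algo == "classification":
        if num_classes < 2:
            raise ValueError("classification needs num_classes >= 2, got %d" % num_classes)
        if impurity not in ("gini", "entropy"):
            raise ValueError("classification needs impurity gini or entropy, got '%s'" % impurity)
    elif impurity != "variance":
        raise ValueError("regression needs impurity variance, got '%s'" % impurity)
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0, got %d" % max_depth)
    if max_bins < 2:
        raise ValueError("max_bins must be >= 2, got %d" % max_bins)
```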
      
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      
      Closes #1821 from jkbradley/dt-robustness and squashes the following commits:
      
      4dc449a [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-robustness
      7a61f7b [Joseph K. Bradley] Added some checks to Strategy to print out meaningful error messages when given invalid DecisionTree parameters
      8d1dec4f