  1. Mar 15, 2015
    • [SPARK-6285][SQL]Remove ParquetTestData in SparkBuild.scala and in README.md · 62ede538
      OopsOutOfMemory authored
      This is a following clean up PR for #5010
      This will resolve issues when launching `hive/console` like below:
      ```
      <console>:20: error: object ParquetTestData is not a member of package org.apache.spark.sql.parquet
             import org.apache.spark.sql.parquet.ParquetTestData
      ```
      
      Author: OopsOutOfMemory <victorshengli@126.com>
      
      Closes #5032 from OopsOutOfMemory/SPARK-6285 and squashes the following commits:
      
      2996aeb [OopsOutOfMemory] remove ParquetTestData
      62ede538
  2. Mar 14, 2015
    • [SPARK-5790][GraphX]: VertexRDD's won't zip properly for `diff` capability (added tests) · c49d1566
      Brennon York authored
      Added tests that maropu [created](https://github.com/maropu/spark/blob/1f64794b2ce33e64f340e383d4e8a60639a7eb4b/graphx/src/test/scala/org/apache/spark/graphx/VertexRDDSuite.scala) for vertices with differing partition counts. Wanted to make sure his work got captured/merged, as it's not in the master branch and I don't believe there's a PR out for it already.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5023 from brennonyork/SPARK-5790 and squashes the following commits:
      
      83bbd29 [Brennon York] added maropu's tests for vertices with differing partition counts
      c49d1566
    • [SPARK-6329][Docs]: Minor doc changes for Mesos and TOC · 127268bc
      Brennon York authored
      Updated the configuration docs from the minor items that Reynold had left over from SPARK-1182; specifically I updated the `running-on-mesos` link to point directly to `running-on-mesos#configuration` and upgraded the `yarn`, `mesos`, etc. bullets to `<h5>` tags in hopes that they'll get pushed into the TOC.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5022 from brennonyork/SPARK-6329 and squashes the following commits:
      
      42a10a9 [Brennon York] minor doc fixes
      127268bc
    • [SPARK-6195] [SQL] Adds in-memory column type for fixed-precision decimals · 5be6b0e4
      Cheng Lian authored
      This PR adds a specialized in-memory column type for fixed-precision decimals.
      
      For all other column types, a single integer column type ID is enough to determine which column type to use. However, this doesn't apply to fixed-precision decimal types, which vary in their precision and scale parameters, and under the previous design there was no trivial way to encode precision and scale into the columnar byte buffer. On the other hand, we always know the data type of the column to be built / scanned ahead of time. This PR therefore no longer uses the column type ID to construct `ColumnBuilder`s and `ColumnAccessor`s, but dispatches on the actual column data type. In this way, we can pass precision / scale information along the way.
      
      The column type ID is no longer used and can be removed in a future PR.
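      The core idea can be sketched in a few lines. This is an illustrative miniature, not Spark's actual classes: by pattern matching on the full data type rather than an integer type ID, a decimal's precision and scale travel along with the dispatch.

      ```scala
      // Hypothetical stand-ins for the real column machinery.
      sealed trait ColType
      case object IntCol extends ColType
      case class FixedDecimalCol(precision: Int, scale: Int) extends ColType
      case object GenericCol extends ColType

      sealed trait DataType
      case object IntegerType extends DataType
      case class DecimalType(precision: Int, scale: Int) extends DataType
      case object StringType extends DataType

      // Dispatch on the data type itself instead of an integer type ID:
      // the DecimalType pattern carries its parameters into the column type.
      def columnTypeFor(dt: DataType): ColType = dt match {
        case IntegerType       => IntCol
        case DecimalType(p, s) => FixedDecimalCol(p, s)
        case _                 => GenericCol
      }
      ```

      With an integer ID alone, `FixedDecimalCol(10, 0)` and `FixedDecimalCol(38, 18)` would be indistinguishable; matching on the type makes the parameters available for free.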
      
      ### Micro benchmark result
      
      The following micro benchmark builds a simple table with 2 million decimals (precision = 10, scale = 0), caches it in memory, then counts all the rows. Code (simply paste it into the Spark shell):
      
      ```scala
      import sc._
      import sqlContext._
      import sqlContext.implicits._
      import org.apache.spark.sql.types._
      import com.google.common.base.Stopwatch
      
      def benchmark(n: Int)(f: => Long) {
        val stopwatch = new Stopwatch()
      
        def run() = {
          stopwatch.reset()
          stopwatch.start()
          f
          stopwatch.stop()
          stopwatch.elapsedMillis()
        }
      
        val records = (0 until n).map(_ => run())
      
        (0 until n).foreach(i => println(s"Round $i: ${records(i)} ms"))
        println(s"Average: ${records.sum / n.toDouble} ms")
      }
      
      // Explicit casting is required because ScalaReflection can't inspect decimal precision
      parallelize(1 to 2000000)
        .map(i => Tuple1(Decimal(i, 10, 0)))
        .toDF("dec")
        .select($"dec" cast DecimalType(10, 0))
        .registerTempTable("dec")
      
      sql("CACHE TABLE dec")
      val df = table("dec")
      
      // Warm up
      df.count()
      df.count()
      
      benchmark(5) {
        df.count()
      }
      ```
      
      With `FIXED_DECIMAL` column type:
      
      - Round 0: 75 ms
      - Round 1: 97 ms
      - Round 2: 75 ms
      - Round 3: 70 ms
      - Round 4: 72 ms
      - Average: 77.8 ms
      
      Without `FIXED_DECIMAL` column type:
      
      - Round 0: 1233 ms
      - Round 1: 1170 ms
      - Round 2: 1171 ms
      - Round 3: 1141 ms
      - Round 4: 1141 ms
      - Average: 1171.2 ms
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4938 from liancheng/decimal-column-type and squashes the following commits:
      
      fef5338 [Cheng Lian] Updates fixed decimal column type related test cases
      e08ab5b [Cheng Lian] Only resorts to FIXED_DECIMAL when the value can be held in a long
      4db713d [Cheng Lian] Adds in-memory column type for fixed-precision decimals
      5be6b0e4
    • [SQL] Delete some duplicate code in HiveThriftServer2 · ee15404a
      ArcherShao authored
      Author: ArcherShao <ArcherShao@users.noreply.github.com>
      Author: ArcherShao <shaochuan@huawei.com>
      
      Closes #5007 from ArcherShao/20150313 and squashes the following commits:
      
      ae422ae [ArcherShao] Updated
      459efbd [ArcherShao] [SQL]Delete some dupliate code in HiveThriftServer2
      ee15404a
    • [SPARK-6210] [SQL] use prettyString as column name in agg() · b38e073f
      Davies Liu authored
      Use prettyString instead of toString() (which includes the expression's id) as the column name in agg().
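      The naming problem in miniature (these are illustrative stand-ins, not Catalyst's actual classes): a default rendering of an expression includes its unique id, so the derived column name leaks internals like `SUM(x#123)`, while a "pretty" rendering omits the id and yields a stable `SUM(x)`.

      ```scala
      // Hypothetical sketch of an expression with a unique id.
      case class Attribute(name: String, exprId: Long) {
        override def toString: String = s"$name#$exprId"  // includes internal id
        def prettyString: String = name                   // human-readable form
      }
      case class Sum(child: Attribute) {
        override def toString: String = s"SUM($child)"
        def prettyString: String = s"SUM(${child.prettyString})"
      }

      val agg = Sum(Attribute("x", 123L))
      ```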
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5006 from davies/prettystring and squashes the following commits:
      
      cb1fdcf [Davies Liu] use prettyString as column name in agg()
      b38e073f
  3. Mar 13, 2015
    • [SPARK-6317][SQL]Fixed HIVE console startup issue · e360d5e4
      vinodkc authored
      Author: vinodkc <vinod.kc.in@gmail.com>
      Author: Vinod K C <vinod.kc@huawei.com>
      
      Closes #5011 from vinodkc/HIVE_console_startupError and squashes the following commits:
      
      b43925f [vinodkc] Changed order of import
      b4f5453 [Vinod K C] Fixed HIVE console startup issue
      e360d5e4
    • [SPARK-6285] [SQL] Removes unused ParquetTestData and duplicated TestGroupWriteSupport · cdc34ed9
      Cheng Lian authored
      All the contents in this file are not referenced anywhere and should have been removed in #4116 when I tried to get rid of the old Parquet test suites.
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #5010 from liancheng/spark-6285 and squashes the following commits:
      
      06ed057 [Cheng Lian] Removes unused ParquetTestData and duplicated TestGroupWriteSupport
      cdc34ed9
    • [SPARK-4600][GraphX]: org.apache.spark.graphx.VertexRDD.diff does not work · b943f5d9
      Brennon York authored
      Turns out, per the [convo on the JIRA](https://issues.apache.org/jira/browse/SPARK-4600), `diff` is acting exactly as it should. The misconception arose because I thought it meant set difference, when in fact it does not. To that extent I merely updated the `diff` documentation to, hopefully, better reflect its true intent going forward.
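      The documented semantics can be modeled with plain maps (an illustrative sketch, not the actual `VertexRDD` implementation): `diff` returns the vertices from the other set whose attribute differs from this set's value for the same id; it is not a set difference, and vertices present in only one side are dropped.

      ```scala
      // Hypothetical model of diff: keep entries of `other` whose key exists
      // in `self` but whose value differs. Not a set difference.
      def diff[V](self: Map[Long, V], other: Map[Long, V]): Map[Long, V] =
        other.filter { case (id, v) => self.get(id).exists(_ != v) }

      val a = Map(1L -> "x", 2L -> "y", 3L -> "z")
      val b = Map(1L -> "x", 2L -> "Y", 4L -> "w")
      // Only id 2 survives: present in both, with differing values.
      ```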
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5015 from brennonyork/SPARK-4600 and squashes the following commits:
      
      1e1d1e5 [Brennon York] reverted internal diff docs
      92288f7 [Brennon York] reverted both the test suite and the diff function back to its origin functionality
      f428623 [Brennon York] updated diff documentation to better represent its function
      cc16d65 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
      66818b9 [Brennon York] added small secondary diff test
      99ad412 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
      74b8c95 [Brennon York] corrected  method by leveraging bitmask operations to correctly return only the portions of  that are different from the calling VertexRDD
      9717120 [Brennon York] updated diff impl to cause fewer objects to be created
      710a21c [Brennon York] working diff given test case
      aa57f83 [Brennon York] updated to set ShortestPaths to run 'forward' rather than 'backward'
      b943f5d9
    • [SPARK-6278][MLLIB] Mention the change of objective in linear regression · 7f13434a
      Xiangrui Meng authored
      As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide. srowen
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4978 from mengxr/SPARK-6278 and squashes the following commits:
      
      fb3bbe6 [Xiangrui Meng] mention regularization parameter
      bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-6278
      375fd09 [Xiangrui Meng] address Sean's comments
      f87ae71 [Xiangrui Meng] mention step size change
      7f13434a
    • [SPARK-6252] [mllib] Added getLambda to Scala NaiveBayes · dc4abd4d
      Joseph K. Bradley authored
      Note: not relevant for Python API since it only has a static train method
      
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4969 from jkbradley/SPARK-6252 and squashes the following commits:
      
      a471d90 [Joseph K. Bradley] small edits from review
      63eff48 [Joseph K. Bradley] Added getLambda to Scala NaiveBayes
      dc4abd4d
    • [CORE][minor] remove unnecessary ClassTag in `DAGScheduler` · ea3d2eed
      Wenchen Fan authored
      This existed at the very beginning, but became unnecessary after [this commit](https://github.com/apache/spark/commit/37d8f37a8ec110416fba0d51d8ba70370ac380c1#diff-6a9ff7fb74fd490a50462d45db2d5e11L272). I think we should remove it if we don't plan to use it in the future.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #4992 from cloud-fan/small and squashes the following commits:
      
      e857f2e [Wenchen Fan] remove unnecessary ClassTag
      ea3d2eed
    • [SPARK-6197][CORE] handle json exception when history file not finished writing · 9048e810
      Zhang, Liye authored
      For details, please refer to [SPARK-6197](https://issues.apache.org/jira/browse/SPARK-6197)
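      The fix's intent can be sketched as follows (an illustrative miniature, not the actual history server code; `parseEvent` is a stand-in for real JSON deserialization): when replaying an event log that may still be in the middle of being written, a parse failure on the final line is tolerated as a half-written tail, while a failure anywhere else is a genuine error.

      ```scala
      import scala.util.{Try, Success, Failure}

      // Stand-in "parser": a line is valid only if its braces balance,
      // mimicking a JSON parser rejecting a truncated record.
      def parseEvent(line: String): Try[String] = Try {
        require(line.count(_ == '{') == line.count(_ == '}'), s"truncated: $line")
        line
      }

      // Replay events, dropping only a malformed final line.
      def replay(lines: Seq[String]): Seq[String] =
        lines.zipWithIndex.flatMap { case (line, i) =>
          parseEvent(line) match {
            case Success(e)                                => Some(e)
            case Failure(_) if i == lines.size - 1         => None  // half-written tail
            case Failure(ex)                               => throw ex
          }
        }
      ```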
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #4927 from liyezhang556520/jsonParseError and squashes the following commits:
      
      5cbdc82 [Zhang, Liye] without unnecessary wrap
      2b48831 [Zhang, Liye] small changes with sean owen's comments
      2973024 [Zhang, Liye] handle json exception when file not finished writing
      9048e810
    • [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide · 69ff8e8c
      Cheng Lian authored
      Also fixed a bunch of minor styling issues.
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #5001 from liancheng/parquet-doc and squashes the following commits:
      
      89ad3db [Cheng Lian] Addresses @rxin's comments
      7eb6955 [Cheng Lian] Docs for the new Parquet data source
      415eefb [Cheng Lian] Some minor formatting improvements
      69ff8e8c
    • [SPARK-5845][Shuffle] Time to cleanup spilled shuffle files not included in shuffle write time · 0af9ea74
      Ilya Ganelin authored
      I've added a timer in the right place to fix this inaccuracy.
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #4965 from ilganeli/SPARK-5845 and squashes the following commits:
      
      bfabf88 [Ilya Ganelin] Changed to using a foreach vs. getorelse
      3e059b0 [Ilya Ganelin] Switched to using getorelse
      b946d08 [Ilya Ganelin] Fixed error with option
      9434b50 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5845
      db8647e [Ilya Ganelin] Added update for shuffleWriteTime around spilled file cleanup in ExternalSorter
      0af9ea74
  4. Mar 12, 2015
  5. Mar 11, 2015
    • [SPARK-6128][Streaming][Documentation] Updates to Spark Streaming Programming Guide · cd3b68d9
      Tathagata Das authored
      Updates to the documentation are as follows:
      
      - Added information on Kafka Direct API and Kafka Python API
      - Added joins to the main streaming guide
      - Improved details on the fault-tolerance semantics
      
      Generated docs located here
      http://people.apache.org/~tdas/spark-1.3.0-temp-docs/streaming-programming-guide.html#fault-tolerance-semantics
      
      More things to add:
      - Configuration for Kafka receive rate
      - Maybe add concurrentJobs
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #4956 from tdas/streaming-guide-update-1.3 and squashes the following commits:
      
      819408c [Tathagata Das] Minor fixes.
      debe484 [Tathagata Das] Added DataFrames and MLlib
      380cf8d [Tathagata Das] Fix link
      04167a6 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-guide-update-1.3
      0b77486 [Tathagata Das] Updates based on Josh's comments.
      86c4c2a [Tathagata Das] Updated streaming guides
      82de92a [Tathagata Das] Add Kafka to Python api docs
      cd3b68d9
    • [SPARK-6274][Streaming][Examples] Added examples streaming + sql examples. · 51a79a77
      Tathagata Das authored
      Added Scala, Java and Python streaming examples showing DataFrame and SQL operations within streaming.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #4975 from tdas/streaming-sql-examples and squashes the following commits:
      
      705cba1 [Tathagata Das] Fixed python lint error
      75a3fad [Tathagata Das] Fixed python lint error
      5fbf789 [Tathagata Das] Removed empty lines at the end
      874b943 [Tathagata Das] Added examples streaming + sql examples.
      51a79a77
    • SPARK-6245 [SQL] jsonRDD() of empty RDD results in exception · 55c4831d
      Sean Owen authored
      Avoid `UnsupportedOperationException` from JsonRDD.inferSchema on empty RDD.
      
      Not sure whether this case is supposed to be an error (just a better one), but it seems it can come up if the input is down-sampled so much that nothing is sampled.
      
      Now stuff like this:
      ```
      sqlContext.jsonRDD(sc.parallelize(List[String]()))
      ```
      just results in
      ```
      org.apache.spark.sql.DataFrame = []
      ```
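      The failure mode reduces to an unguarded fold: `reduce` on an empty collection throws `UnsupportedOperationException`, so schema inference needs an explicit empty-input fallback. A minimal sketch, with illustrative names rather than Spark's actual code:

      ```scala
      // Infer a "schema" (here just the union of keys) from sampled records.
      // The guard returns an empty schema for empty input instead of letting
      // reduce throw UnsupportedOperationException.
      def inferSchema(sampled: Seq[Map[String, String]]): Set[String] =
        if (sampled.isEmpty) Set.empty[String]
        else sampled.map(_.keySet).reduce(_ union _)
      ```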
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4971 from srowen/SPARK-6245 and squashes the following commits:
      
      3699964 [Sean Owen] Set() -> Set.empty
      3c619e1 [Sean Owen] Avoid UnsupportedOperationException from JsonRDD.inferSchema on empty RDD
      55c4831d
    • SPARK-3642. Document the nuances of shared variables. · 2d87a415
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #2490 from sryza/sandy-spark-3642 and squashes the following commits:
      
      aae3340 [Sandy Ryza] SPARK-3642. Document the nuances of broadcast variables
      2d87a415
    • [SPARK-4423] Improve foreach() documentation to avoid confusion between local-... · 548643a9
      Ilya Ganelin authored
      [SPARK-4423] Improve foreach() documentation to avoid confusion between local- and cluster-mode behavior
      
      Hi all - I've added a writeup on how closures work within Spark to help clarify the general case for this problem and similar problems. I hope this addresses the issue and would love any feedback.
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #4696 from ilganeli/SPARK-4423 and squashes the following commits:
      
      c5dc498 [Ilya Ganelin] Fixed typo
      07b78e8 [Ilya Ganelin] Updated to fix capitalization
      48c1983 [Ilya Ganelin] Updated to fix capitalization and clarify wording
      2fd2a07 [Ilya Ganelin] Incoporated a few more minor fixes. Fixed a bug in python code. Added semicolons for java
      4772f99 [Ilya Ganelin] Incorporated latest feedback
      448bd79 [Ilya Ganelin] Updated some verbage and added section links
      5dbbda5 [Ilya Ganelin] Improved some wording
      d374d3a [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-4423
      2600668 [Ilya Ganelin] Minor edits
      c768ab2 [Ilya Ganelin] Updated documentation to add a section on closures. This helps understand confusing behavior of foreach and map functions when attempting to modify variables outside of the scope of an RDD action or transformation
      548643a9
    • [SPARK-6228] [network] Move SASL classes from network/shuffle to network... · 5b335bdd
      Marcelo Vanzin authored
      .../common.
      
      No code changes. Left the shuffle-related files in the shuffle module.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #4953 from vanzin/SPARK-6228 and squashes the following commits:
      
      664ef30 [Marcelo Vanzin] [SPARK-6228] [network] Move SASL classes from network/shuffle to network/common.
      5b335bdd
    • SPARK-6225 [CORE] [SQL] [STREAMING] Resolve most build warnings, 1.3.0 edition · 6e94c4ea
      Sean Owen authored
      Resolve javac, scalac warnings of various types -- deprecations, Scala lang, unchecked cast, etc.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4950 from srowen/SPARK-6225 and squashes the following commits:
      
      3080972 [Sean Owen] Ordered imports: Java, Scala, 3rd party, Spark
      c67985b [Sean Owen] Resolve javac, scalac warnings of various types -- deprecations, Scala lang, unchecked cast, etc.
      6e94c4ea
    • [SPARK-6279][Streaming]In KafkaRDD.scala, Miss expressions flag "s" at logging string · ec30c178
      zzcclp authored
      In KafkaRDD.scala, the logging string is missing the `s` interpolation flag. The log therefore prints the literal text `Beginning offset ${part.fromOffset} is the same as ending offset` instead of, e.g., `Beginning offset 111 is the same as ending offset`.
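      The bug in miniature: without the `s` prefix, Scala does not interpolate `${...}` placeholders, so they are emitted verbatim.

      ```scala
      val fromOffset = 111
      // Without the interpolator the placeholder is emitted literally:
      val wrong = "Beginning offset ${fromOffset} is the same as ending offset"
      // With the s interpolator the value is substituted:
      val right = s"Beginning offset ${fromOffset} is the same as ending offset"
      ```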
      
      Author: zzcclp <xm_zzc@sina.com>
      
      Closes #4979 from zzcclp/SPARK-6279 and squashes the following commits:
      
      768f88e [zzcclp] Miss expressions flag "s"
      ec30c178
    • [SQL][Minor] fix typo in comments · 40f49795
      Hongbo Liu authored
      Removed a repeated "from" in the comments.
      
      Author: Hongbo Liu <liuhb86@gmail.com>
      
      Closes #4976 from liuhb86/mine and squashes the following commits:
      
      e280e7c [Hongbo Liu] [SQL][Minor] fix typo in comments
      40f49795
    • [MINOR] [DOCS] Fix map -> mapToPair in Streaming Java example · 35b25640
      Sean Owen authored
      Fix map -> mapToPair in Java example. (And zap some unneeded "throws Exception" while here)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4967 from srowen/MapToPairFix and squashes the following commits:
      
      ded2bc0 [Sean Owen] Fix map -> mapToPair in Java example. (And zap some unneeded "throws Exception" while here)
      35b25640
    • [SPARK-4924] Add a library for launching Spark jobs programmatically. · 517975d8
      Marcelo Vanzin authored
      This change encapsulates all the logic involved in launching a Spark job
      into a small Java library that can be easily embedded into other applications.
      
      The overall goal of this change is twofold, as described in the bug:
      
      - Provide a public API for launching Spark processes. This is a common request
        from users and currently there's no good answer for it.
      
      - Remove a lot of the duplicated code and other coupling that exists in the
        different parts of Spark that deal with launching processes.
      
      A lot of the duplication was due to different code needed to build an
      application's classpath (and the bootstrapper needed to run the driver in
      certain situations), and also different code needed to parse spark-submit
      command line options in different contexts. The change centralizes those
      as much as possible so that all code paths can rely on the library for
      handling those appropriately.
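      The centralization idea can be sketched as a builder that collects options once and renders the submit command in a single place, so callers stop assembling argument lists by hand. This is an illustrative miniature with hypothetical names, not the actual `SparkLauncher` API:

      ```scala
      // Hypothetical builder: options are set fluently, and the command
      // line is rendered in exactly one place.
      class SubmitCommandBuilder {
        private var master: Option[String] = None
        private var mainClass: Option[String] = None
        private var appResource: Option[String] = None

        def setMaster(m: String): this.type = { master = Some(m); this }
        def setMainClass(c: String): this.type = { mainClass = Some(c); this }
        def setAppResource(r: String): this.type = { appResource = Some(r); this }

        def buildArgs(): Seq[String] =
          Seq("spark-submit") ++
            master.toSeq.flatMap(m => Seq("--master", m)) ++
            mainClass.toSeq.flatMap(c => Seq("--class", c)) ++
            appResource.toSeq
      }
      ```

      Every code path that launches a job then depends on one rendering of the command, which is the coupling this change removes.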
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3916 from vanzin/SPARK-4924 and squashes the following commits:
      
      18c7e4d [Marcelo Vanzin] Fix make-distribution.sh.
      2ce741f [Marcelo Vanzin] Add lots of quotes.
      3b28a75 [Marcelo Vanzin] Update new pom.
      a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      897141f [Marcelo Vanzin] Review feedback.
      e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      28cd35e [Marcelo Vanzin] Remove stale comment.
      b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide.
      5f4ddcc [Marcelo Vanzin] Better usage messages.
      92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage.
      6184c07 [Marcelo Vanzin] Rename field.
      4c19196 [Marcelo Vanzin] Update comment.
      7e66c18 [Marcelo Vanzin] Fix pyspark tests.
      0031a8e [Marcelo Vanzin] Review feedback.
      c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows.
      e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark.
      43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher.
      b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher.
      28b1434 [Marcelo Vanzin] Add a comment.
      304333a [Marcelo Vanzin] Fix propagation of properties file arg.
      bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong).
      8ec0243 [Marcelo Vanzin] Add missing newline.
      95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder.
      72da7ec [Marcelo Vanzin] Rename SparkClassLauncher.
      62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path.
      9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private.
      e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public.
      e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      de81da2 [Marcelo Vanzin] Fix CommandUtils.
      86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof.
      b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode."
      0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home.
      7cff919 [Marcelo Vanzin] Javadoc updates.
      eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows.
      e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs.
      f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature.
      7ed8859 [Marcelo Vanzin] Some more feedback.
      54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      61919df [Marcelo Vanzin] Clean leftover debug statement.
      aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode.
      e584fc3 [Marcelo Vanzin] Rework command building a little bit.
      525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines.
      8ac4e92 [Marcelo Vanzin] Minor test cleanup.
      e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher.
      c617539 [Marcelo Vanzin] Review feedback round 1.
      fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode.
      2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048.
      799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      a7936ef [Marcelo Vanzin] Fix pyspark tests.
      656374e [Marcelo Vanzin] Mima fixes.
      4d511e7 [Marcelo Vanzin] Fix tools search code.
      7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn.
      1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes.
      25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing.
      27be98a [Marcelo Vanzin] Modify Spark to use launcher lib.
      6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programatically.
      517975d8
    • [SPARK-5986][MLLib] Add save/load for k-means · 2d4e00ef
      Xusen Yin authored
      This PR adds save/load for K-means as described in SPARK-5986. Python version will be added in another PR.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #4951 from yinxusen/SPARK-5986 and squashes the following commits:
      
      6dd74a0 [Xusen Yin] rewrite some functions and classes
      cd390fd [Xusen Yin] add indexed point
      b144216 [Xusen Yin] remove invalid comments
      dce7055 [Xusen Yin] add save/load for k-means for SPARK-5986
      2d4e00ef
  6. Mar 10, 2015
    • [SPARK-5183][SQL] Update SQL Docs with JDBC and Migration Guide · 26723741
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4958 from marmbrus/sqlDocs and squashes the following commits:
      
      9351dbc [Michael Armbrust] fix parquet example
      6877e13 [Michael Armbrust] add sql examples
      d81b7e7 [Michael Armbrust] rxins comments
      e393528 [Michael Armbrust] fix order
      19c2735 [Michael Armbrust] more on data source load/store
      00d5914 [Michael Armbrust] Update SQL Docs with JDBC and Migration Guide
      26723741
    • Minor doc: Remove the extra blank line in data types javadoc. · 74fb4337
      Reynold Xin authored
      The extra blank line is preventing the first lines from showing up in the package summary page.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4955 from rxin/datatype-docs and squashes the following commits:
      
      1621114 [Reynold Xin] Minor doc: Remove the extra blank line in data types javadoc.
      74fb4337