  1. Feb 16, 2015
    • OopsOutOfMemory's avatar
      [SQL] Add fetched row count in SparkSQLCLIDriver · b4d7c703
      OopsOutOfMemory authored
      before this change:
      ```scala
      Time taken: 0.619 seconds
      ```
      
      after this change:
      ```scala
      Time taken: 0.619 seconds, Fetched: 4 row(s)
      ```
      
      Author: OopsOutOfMemory <victorshengli@126.com>
      
      Closes #4604 from OopsOutOfMemory/rowcount and squashes the following commits:
      
      7252dea [OopsOutOfMemory] add fetched row count
      b4d7c703
    • Michael Armbrust's avatar
      [SQL] Initial support for reporting location of error in sql string · 104b2c45
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4587 from marmbrus/position and squashes the following commits:
      
      0810052 [Michael Armbrust] fix tests
      395c019 [Michael Armbrust] Merge remote-tracking branch 'marmbrus/position' into position
      e155dce [Michael Armbrust] more errors
      f3efa51 [Michael Armbrust] Update AnalysisException.scala
      d45ff60 [Michael Armbrust] [SQL] Initial support for reporting location of error in sql string
      104b2c45
    • Daoyuan Wang's avatar
      [SPARK-5824] [SQL] add null format in ctas and set default col comment to null · 275a0c08
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #4609 from adrian-wang/ctas and squashes the following commits:
      
      0a75d5a [Daoyuan Wang] reorder import
      93d1863 [Daoyuan Wang] add null format in ctas and set default col comment to null
      275a0c08
    • Cheng Hao's avatar
      [SQL] [Minor] Update the SpecificMutableRow.copy · cc552e04
      Cheng Hao authored
      When profiling Join / Aggregate queries via VisualVM, I noticed that lots of `SpecificMutableRow` objects (and their `MutableValue`s) were created. `SpecificMutableRow` is mostly used in data source implementations, but its `copy` method can be called multiple times in upper modules (e.g. in joins / aggregations), so creating duplicate instances should be avoided.
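
      A hedged sketch of the idea (not necessarily the exact patch): have `copy` produce a plain `GenericRow` of boxed values instead of instantiating a new `SpecificMutableRow` plus one `MutableValue` wrapper per column:
      ```scala
      // sketch, assuming the Spark 1.3-era internal Row API
      override def copy(): Row = {
        val newValues = new Array[Any](values.length)
        var i = 0
        while (i < values.length) {
          newValues(i) = values(i).boxed  // extract each boxed value once
          i += 1
        }
        new GenericRow(newValues)
      }
      ```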
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4619 from chenghao-intel/specific_mutable_row and squashes the following commits:
      
      9300d23 [Cheng Hao] update the SpecificMutableRow.copy
      cc552e04
    • Sean Owen's avatar
      SPARK-5795 [STREAMING] api.java.JavaPairDStream.saveAsNewAPIHadoopFiles may not be friendly to Java · 8e25373c
      Sean Owen authored
      Revise the JavaPairDStream API declaration of the saveAs Hadoop methods, to allow them to be called directly as intended.
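
      A hedged sketch of the kind of revision involved: replace a generic type parameter with an existential `Class[_ <: ...]` bound so Java callers can pass `SomeOutputFormat.class` directly (signature abbreviated, not necessarily the exact final declaration):
      ```scala
      // before (awkward to call from Java):
      //   def saveAsNewAPIHadoopFiles[F <: NewOutputFormat[_, _]](
      //       prefix: String, suffix: String, keyClass: Class[_],
      //       valueClass: Class[_], outputFormatClass: Class[F]): Unit
      // after (callable directly from Java):
      def saveAsNewAPIHadoopFiles(
          prefix: String,
          suffix: String,
          keyClass: Class[_],
          valueClass: Class[_],
          outputFormatClass: Class[_ <: NewOutputFormat[_, _]]): Unit = ???
      ```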
      
      CC tdas for review
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4608 from srowen/SPARK-5795 and squashes the following commits:
      
      36f1ead [Sean Owen] Add code that shows compile problem and fix
      036bd27 [Sean Owen] Revise JavaPairDStream API declaration on saveAs Hadoop methods, to allow it to be called directly as intended.
      8e25373c
    • Liang-Chi Hsieh's avatar
      [SPARK-5799][SQL] Compute aggregation function on specified numeric columns · 5c78be7a
      Liang-Chi Hsieh authored
      Compute aggregation function on specified numeric columns. For example:
      
          val df = Seq(("a", 1, 0, "b"), ("b", 2, 4, "c"), ("a", 2, 3, "d")).toDataFrame("key", "value1", "value2", "rest")
          df.groupBy("key").min("value2")
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4592 from viirya/specific_cols_agg and squashes the following commits:
      
      9446896 [Liang-Chi Hsieh] For comments.
      314c4cd [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      353fad7 [Liang-Chi Hsieh] For python unit tests.
      54ed0c4 [Liang-Chi Hsieh] Address comments.
      b079e6b [Liang-Chi Hsieh] Remove duplicate codes.
      55100fb [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      880c2ac [Liang-Chi Hsieh] Fix Python style checks.
      4c63a01 [Liang-Chi Hsieh] Fix pyspark.
      b1a24fc [Liang-Chi Hsieh] Address comments.
      2592f29 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into specific_cols_agg
      27069c3 [Liang-Chi Hsieh] Combine functions and add varargs annotation.
      371a3f7 [Liang-Chi Hsieh] Compute aggregation function on specified numeric columns.
      5c78be7a
    • Sean Owen's avatar
      SPARK-5815 [MLLIB] Part 2. Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS · a3afa4a1
      Sean Owen authored
      Now deprecates runSVDPlusPlus and updates run; for 1.4.0 / master only.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4625 from srowen/SPARK-5815.2 and squashes the following commits:
      
      6fd2ca5 [Sean Owen] Now, deprecated runSVDPlusPlus and update run, for 1.4.0 / master only
      a3afa4a1
    • Xutingjun's avatar
      [SPARK-5831][Streaming]When the number of checkpoint files is bigger than 10, delete the old ones · 1115e8e7
      Xutingjun authored
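      A minimal sketch of the intended behavior, with hypothetical names (the real change lives in the streaming checkpoint-writing code):
      ```scala
      // sketch: after writing a new checkpoint, keep only the 10 most recent files
      val checkpointFiles = fs.listStatus(checkpointDir).map(_.getPath)
      if (checkpointFiles.length > 10) {
        checkpointFiles.sortBy(_.getName).dropRight(10).foreach(fs.delete(_, true))
      }
      ```
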
      Author: Xutingjun <1039320815@qq.com>
      
      Closes #4621 from XuTingjun/checkpointFile and squashes the following commits:
      
      b5f2816 [Xutingjun] fix bug
      1115e8e7
    • Cheng Lian's avatar
      [SPARK-4553] [SPARK-5767] [SQL] Wires Parquet data source with the newly... · 3ce58cf9
      Cheng Lian authored
      [SPARK-4553] [SPARK-5767] [SQL] Wires Parquet data source with the newly introduced write support for data source API
      
      This PR migrates the Parquet data source to the new data source write support API. Now users can also overwrite and append to existing tables. Notice that inserting into partitioned tables is not supported yet.
      
      When the Parquet data source is enabled, insertion into Hive Metastore Parquet tables is also fulfilled by the Parquet data source. This is done by the newly introduced `HiveMetastoreCatalog.ParquetConversions` rule, which is a "proper" implementation of the original hacky `HiveStrategies.ParquetConversion`. The latter is still preserved, and can be removed together with the old Parquet support in the future.
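
      A hedged usage sketch of what this enables, using the Spark 1.3-era DataFrame API (table and path names hypothetical):
      ```scala
      df.saveAsParquetFile("/tmp/parquet_table")        // write out as Parquet
      df.insertInto("parquet_table", overwrite = true)  // overwrite an existing table
      df.insertInto("parquet_table")                    // append to an existing table
      ```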
      
      TODO:
      
      - [x] Update outdated comments in `newParquet.scala`.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4563 from liancheng/parquet-refining and squashes the following commits:
      
      fa98d27 [Cheng Lian] Fixes test cases which should disable off Parquet data source
      2476e82 [Cheng Lian] Fixes compilation error introduced during rebasing
      a83d290 [Cheng Lian] Passes Hive Metastore partitioning information to ParquetRelation2
      3ce58cf9
    • Cheng Lian's avatar
      [Minor] [SQL] Renames stringRddToDataFrame to stringRddToDataFrameHolder for consistency · 199a9e80
      Cheng Lian authored
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4613 from liancheng/df-implicit-rename and squashes the following commits:
      
      db8bdd3 [Cheng Lian] Renames stringRddToDataFrame to stringRddToDataFrameHolder for consistency
      199a9e80
    • Peter Rudenko's avatar
      [Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop · d51d6ba1
      Peter Rudenko authored
      On a big dataset, explicitly unpersisting the train and validation folds allows more data to be loaded into memory in the next loop iteration. On my environment (single node, 8 GB worker RAM, 2 GB dataset file, 3 folds for cross-validation), this saved more than 5 minutes.
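
      A minimal sketch of the pattern inside the k-fold loop (names approximate, assuming the 1.3-era CrossValidator internals):
      ```scala
      splits.zipWithIndex.foreach { case ((training, validation), splitIndex) =>
        val trainingDataset = sqlCtx.applySchema(training, schema).cache()
        val validationDataset = sqlCtx.applySchema(validation, schema).cache()
        // ... fit candidate models on trainingDataset, score on validationDataset ...
        trainingDataset.unpersist()    // free memory before the next fold
        validationDataset.unpersist()
      }
      ```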
      
      Author: Peter Rudenko <petro.rudenko@gmail.com>
      
      Closes #4595 from petro-rudenko/patch-2 and squashes the following commits:
      
      66a7cfb [Peter Rudenko] Move validationDataset cache to declaration
      c5f3265 [Peter Rudenko] [Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop
      d51d6ba1
  2. Feb 15, 2015
    • Peter Rudenko's avatar
      [Ml] SPARK-5796 Don't transform data on a last estimator in Pipeline · c78a12c4
      Peter Rudenko authored
      If a stage is the last estimator in a Pipeline, there's no need to transform the data, since no next stage would consume it.
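
      A hedged, simplified sketch of the idea inside `Pipeline.fit` (`fitStage` is a hypothetical helper):
      ```scala
      var curDataset = dataset
      stages.zipWithIndex.foreach { case (stage, index) =>
        val transformer = fitStage(stage, curDataset)  // hypothetical helper
        if (index < stages.length - 1) {
          // only transform when a later stage will consume the result
          curDataset = transformer.transform(curDataset)
        }
      }
      ```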
      
      Author: Peter Rudenko <petro.rudenko@gmail.com>
      
      Closes #4590 from petro-rudenko/patch-1 and squashes the following commits:
      
      d13ec33 [Peter Rudenko] [Ml] SPARK-5796 Don't transform data on a last estimator in Pipeline
      c78a12c4
    • Sean Owen's avatar
      SPARK-5815 [MLLIB] Deprecate SVDPlusPlus APIs that expose DoubleMatrix from JBLAS · acf2558d
      Sean Owen authored
      Deprecate SVDPlusPlus.run and introduce SVDPlusPlus.runSVDPlusPlus with a return type that doesn't include DoubleMatrix.
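
      A hedged sketch of the deprecation pattern (signatures abbreviated; the exact types may differ):
      ```scala
      @deprecated("Use runSVDPlusPlus", "1.3.0")
      def run(edges: RDD[Edge[Double]], conf: Conf)
        : (Graph[(DoubleMatrix, DoubleMatrix, Double, Double), Double], Double) = ???

      def runSVDPlusPlus(edges: RDD[Edge[Double]], conf: Conf)
        : (Graph[(Array[Double], Array[Double], Double, Double), Double], Double) = ???
      ```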
      
      CC mengxr
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4614 from srowen/SPARK-5815 and squashes the following commits:
      
      288cb05 [Sean Owen] Clarify deprecation plans in scaladoc
      497458e [Sean Owen] Deprecate SVDPlusPlus.run and introduce SVDPlusPlus.runSVDPlusPlus with return type that doesn't include DoubleMatrix
      acf2558d
    • Xiangrui Meng's avatar
      [SPARK-5769] Set params in constructors and in setParams in Python ML pipelines · cd4a1536
      Xiangrui Meng authored
      This PR allows Python users to set params in constructors and in setParams, where we use the `keyword_only` decorator to force keyword arguments. The trade-off is discussed in the design doc of SPARK-4586.
      
      Generated doc:
      ![screen shot 2015-02-12 at 3 06 58 am](https://cloud.githubusercontent.com/assets/829644/6166491/9cfcd06a-b265-11e4-99ea-473d866634fc.png)
      
      CC: davies rxin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4564 from mengxr/py-pipeline-kw and squashes the following commits:
      
      fedf720 [Xiangrui Meng] use toDF
      d565f2c [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into py-pipeline-kw
      cbc15d3 [Xiangrui Meng] fix style
      5032097 [Xiangrui Meng] update pipeline signature
      950774e [Xiangrui Meng] simplify keyword_only and update constructor/setParams signatures
      fdde5fc [Xiangrui Meng] fix style
      c9384b8 [Xiangrui Meng] fix sphinx doc
      8e59180 [Xiangrui Meng] add setParams and make constructors take params, where we force keyword args
      cd4a1536
    • Sean Owen's avatar
      SPARK-5669 [BUILD] Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS · 836577b3
      Sean Owen authored
      Exclude libgfortran, libgcc bundled by JBLAS for Windows. This much is simple, and solves the essential license issue. But the more important question is whether MLlib works on Windows then.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4453 from srowen/SPARK-5669 and squashes the following commits:
      
      734dd86 [Sean Owen] Exclude libgfortran, libgcc bundled by JBLAS, affecting Windows / OS X / Linux 32-bit (not Linux 64-bit)
      836577b3
    • martinzapletal's avatar
      [MLLIB][SPARK-5502] User guide for isotonic regression · 61eb1267
      martinzapletal authored
      User guide for isotonic regression added to docs/mllib-regression.md including code examples for Scala and Java.
      
      Author: martinzapletal <zapletal-martin@email.cz>
      
      Closes #4536 from zapletal-martin/SPARK-5502 and squashes the following commits:
      
      67fe773 [martinzapletal] SPARK-5502 reworded model prediction rules to use more general language rather than the code/implementation specific terms
      80bd4c3 [martinzapletal] SPARK-5502 created docs page for isotonic regression, added links to the page, updated data and examples
      7d8136e [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
      504b5c3 [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
      61eb1267
    • Takeshi Yamamuro's avatar
      [SPARK-5827][SQL] Add missing import in the example of SqlContext · c771e475
      Takeshi Yamamuro authored
      If one runs the example via copy-and-paste, it throws an exception.
      
      Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
      
      Closes #4615 from maropu/AddMissingImportInSqlContext and squashes the following commits:
      
      ab21b66 [Takeshi Yamamuro] Add missing import in the example of SqlContext
      c771e475
  3. Feb 14, 2015
    • gli's avatar
      SPARK-5822 [BUILD] cannot import src/main/scala & src/test/scala into eclipse as source folder · ed5f4bb7
      gli authored
      When importing the whole project into Eclipse as a Maven project, src/main/scala and src/test/scala cannot be set as source folders by default, so add an "add-source" goal in scala-maven-plugin to make this work.
      
      Author: gli <gli@redhat.com>
      
      Closes #4531 from ligangty/addsource and squashes the following commits:
      
      4e4db4c [gli] [IDE] cannot import src/main/scala & src/test/scala into eclipse as source folder
      ed5f4bb7
    • Sean Owen's avatar
      Revise formatting of previous commit f80e2629 · 15a2ab5f
      Sean Owen authored
      15a2ab5f
    • gasparms's avatar
      [SPARK-5800] Streaming Docs. Change linked files according the selected language · f80e2629
      gasparms authored
      Currently, the Spark Streaming Programming Guide's updateStateByKey explanation links to the file stateful_network_wordcount.py and notes "For the complete Scala code ..." regardless of the selected language tab, which is inconsistent.
      
      I've changed the guide so each tab links to its pertinent example file. The JavaStatefulNetworkWordCount.java example did not exist, so I added it in this commit.
      
      Author: gasparms <gmunoz@stratio.com>
      
      Closes #4589 from gasparms/feature/streaming-guide and squashes the following commits:
      
      7f37f89 [gasparms] More style changes
      ec202b0 [gasparms] Follow spark style guide
      f527328 [gasparms] Improve example to look like scala example
      4d8785c [gasparms] Remove throw exception
      e92e6b8 [gasparms] Fix incoherence
      92db405 [gasparms] Fix Streaming Programming Guide. Change files according the selected language
      f80e2629
    • Reynold Xin's avatar
      [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames · e98dfe62
      Reynold Xin authored
      - The old implicit would convert RDDs directly to DataFrames, and that added too many methods. (A usage sketch of the renamed API follows the lists below.)
      - toDataFrame -> toDF
      - Dsl -> functions
      - implicits moved into SQLContext.implicits
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      
      Python changes:
      - toDataFrame -> toDF
      - Dsl -> functions package
      - addColumn -> withColumn
      - renameColumn -> withColumnRenamed
      - add toDF functions to RDD on SQLContext init
      - add flatMap to DataFrame
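
      A brief usage sketch of the renamed Scala API (column names hypothetical):
      ```scala
      import sqlContext.implicits._  // implicits now live on SQLContext

      val df = rdd.toDF("name", "age")                       // was toDataFrame
      val df2 = df.withColumn("agePlusOne", df("age") + 1)   // was addColumn
                  .withColumnRenamed("name", "firstName")    // was renameColumn
      ```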
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4556 from rxin/SPARK-5752 and squashes the following commits:
      
      5ef9910 [Reynold Xin] More fix
      61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
      ff5832c [Reynold Xin] Fix python
      749c675 [Reynold Xin] count(*) fixes.
      5806df0 [Reynold Xin] Fix build break again.
      d941f3d [Reynold Xin] Fixed explode compilation break.
      fe1267a [Davies Liu] flatMap
      c4afb8e [Reynold Xin] style
      d9de47f [Davies Liu] add comment
      b783994 [Davies Liu] add comment for toDF
      e2154e5 [Davies Liu] schema() -> schema
      3a1004f [Davies Liu] Dsl -> functions, toDF()
      fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      97dd47c [Davies Liu] fix mistake
      6168f74 [Davies Liu] fix test
      1fc0199 [Davies Liu] fix test
      a075cd5 [Davies Liu] clean up, toPandas
      663d314 [Davies Liu] add test for agg('*')
      9e214d5 [Reynold Xin] count(*) fixes.
      1ed7136 [Reynold Xin] Fix build break again.
      921b2e3 [Reynold Xin] Fixed explode compilation break.
      14698d4 [Davies Liu] flatMap
      ba3e12d [Reynold Xin] style
      d08c92d [Davies Liu] add comment
      5c8b524 [Davies Liu] add comment for toDF
      a4e5e66 [Davies Liu] schema() -> schema
      d377fc9 [Davies Liu] Dsl -> functions, toDF()
      6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
      807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
      e98dfe62
  4. Feb 13, 2015
    • Sean Owen's avatar
      SPARK-3290 [GRAPHX] No unpersist calls in SVDPlusPlus · 0ce4e430
      Sean Owen authored
      This just unpersist()s each RDD in this code that was cache()ed.
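
      The general pattern, as a minimal hedged sketch (names hypothetical):
      ```scala
      val cached = gradients.cache()
      val next = nextIteration(cached)  // hypothetical downstream computation
      next.count()                      // materialize before releasing the parent
      cached.unpersist()
      ```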
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4234 from srowen/SPARK-3290 and squashes the following commits:
      
      66c1e11 [Sean Owen] unpersist() each RDD that was cache()ed
      0ce4e430
    • Josh Rosen's avatar
      [SPARK-5227] [SPARK-5679] Disable FileSystem cache in WholeTextFileRecordReaderSuite · d06d5ee9
      Josh Rosen authored
      This patch fixes two difficult-to-reproduce Jenkins test failures in InputOutputMetricsSuite (SPARK-5227 and SPARK-5679).  The problem was that WholeTextFileRecordReaderSuite modifies the `fs.local.block.size` Hadoop configuration and this change was affecting subsequent test suites due to Hadoop's caching of FileSystem instances (see HADOOP-8490 for more details).
      
      The fix implemented here is to disable FileSystem caching in WholeTextFileRecordReaderSuite.
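
      A minimal sketch of how FileSystem caching can be disabled per scheme in the Hadoop configuration (the key pattern is `fs.<scheme>.impl.disable.cache`; here the `file` scheme):
      ```scala
      import org.apache.hadoop.conf.Configuration

      val conf = new Configuration()
      // ask Hadoop for fresh FileSystem instances instead of process-wide cached ones
      conf.setBoolean("fs.file.impl.disable.cache", true)
      ```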
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4599 from JoshRosen/inputoutputsuite-fix and squashes the following commits:
      
      47dc447 [Josh Rosen] [SPARK-5227] [SPARK-5679] Disable FileSystem cache in WholeTextFileRecordReaderSuite
      d06d5ee9
    • Xiangrui Meng's avatar
      [SPARK-5730][ML] add doc groups to spark.ml components · 4f4c6d5a
      Xiangrui Meng authored
      This PR adds three groups to the ScalaDoc: `param`, `setParam`, and `getParam`. Params will show up in the generated Scala API doc as the top group. Setters/getters will be at the bottom.
      
      Preview:
      
      ![screen shot 2015-02-13 at 2 47 49 pm](https://cloud.githubusercontent.com/assets/829644/6196657/5740c240-b38f-11e4-94bb-bd8ef5a796c5.png)
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4600 from mengxr/SPARK-5730 and squashes the following commits:
      
      febed9a [Xiangrui Meng] add doc groups to spark.ml components
      4f4c6d5a
    • Xiangrui Meng's avatar
      [SPARK-5803][MLLIB] use ArrayBuilder to build primitive arrays · d50a91d5
      Xiangrui Meng authored
      because ArrayBuffer is not specialized.
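
      A minimal illustration: `ArrayBuilder.make[Double]` fills an `Array[Double]` without boxing each element, whereas `ArrayBuffer[Double]` stores boxed values:
      ```scala
      import scala.collection.mutable.ArrayBuilder

      val builder = ArrayBuilder.make[Double]  // specialized: no boxing
      builder += 1.0
      builder += 2.0
      val arr: Array[Double] = builder.result()
      ```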
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4594 from mengxr/SPARK-5803 and squashes the following commits:
      
      1261bd5 [Xiangrui Meng] merge master
      a4ea872 [Xiangrui Meng] use ArrayBuilder to build primitive arrays
      d50a91d5
    • Xiangrui Meng's avatar
      [SPARK-5806] re-organize sections in mllib-clustering.md · cc56c872
      Xiangrui Meng authored
      Put example code close to the algorithm description.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4598 from mengxr/SPARK-5806 and squashes the following commits:
      
      a137872 [Xiangrui Meng] re-organize sections in mllib-clustering.md
      cc56c872
    • Yin Huai's avatar
      [SPARK-5789][SQL]Throw a better error message if JsonRDD.parseJson encounters... · 2e0c0845
      Yin Huai authored
      [SPARK-5789][SQL]Throw a better error message if JsonRDD.parseJson encounters unrecoverable parsing errors.
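
      A minimal illustration of the unrecoverable case, assuming the 1.3-era `jsonRDD` API: each `String` record must contain a complete JSON document, so an object split across records (e.g. the lines of a pretty-printed file) cannot be parsed:
      ```scala
      val ok  = sc.parallelize(Seq("""{"name": "Alice"}"""))        // one object per record
      val bad = sc.parallelize(Seq("""{"name":""", """"Alice"}""")) // object spans two records
      sqlContext.jsonRDD(ok)   // fine
      sqlContext.jsonRDD(bad)  // now fails with a clearer error message
      ```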
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4582 from yhuai/jsonErrorMessage and squashes the following commits:
      
      152dbd4 [Yin Huai] Update error message.
      1466256 [Yin Huai] Throw a better error message when a JSON object in the input dataset spans multiple records (lines for files or strings for an RDD of strings).
      2e0c0845
    • Daoyuan Wang's avatar
      [SPARK-5642] [SQL] Apply column pruning on unused aggregation fields · 2cbb3e43
      Daoyuan Wang authored
      In a query like `select k from (select key k, max(value) v from src group by k) t`, the aggregate field `v` is never referenced by the outer query, so the `max(value)` computation can be pruned.
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4415 from adrian-wang/groupprune and squashes the following commits:
      
      5d2d8a3 [Daoyuan Wang] address Michael's comments
      61f8ef7 [Daoyuan Wang] add a unit test
      80ddcc6 [Daoyuan Wang] keep project
      b69d385 [Daoyuan Wang] add a prune rule for grouping set
      2cbb3e43
    • Andrew Or's avatar
      5d3cc6b3
    • Reynold Xin's avatar
      [HOTFIX] Ignore DirectKafkaStreamSuite. · 378c7eb0
      Reynold Xin authored
      378c7eb0
    • Emre Sevinç's avatar
      SPARK-5805 Fixed the type error in documentation. · 9f31db06
      Emre Sevinç authored
      Fixes SPARK-5805: the type error in the final example given in the MLlib - Clustering documentation.
      
      Author: Emre Sevinç <emre.sevinc@gmail.com>
      
      Closes #4596 from emres/SPARK-5805 and squashes the following commits:
      
      1029f66 [Emre Sevinç] SPARK-5805 Fixed the type error in documentation.
      9f31db06
    • Josh Rosen's avatar
      [SPARK-5735] Replace uses of EasyMock with Mockito · 077eec2d
      Josh Rosen authored
      This patch replaces all uses of EasyMock with Mockito.  There are two motivations for this:
      
      1. We should use a single mocking framework in our tests in order to keep things consistent.
      2. EasyMock may be responsible for non-deterministic unit test failures due to its Objenesis dependency (see SPARK-5626 for more details).
      
      Most of these changes are fairly mechanical translations of EasyMock code to Mockito, although I made a small change that strengthens the assertions in one test in KinesisReceiverSuite.
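
      The translations look roughly like this hedged sketch (hypothetical mock target):
      ```scala
      import org.apache.hadoop.mapreduce.RecordReader
      import org.mockito.Mockito.{mock, when}

      // EasyMock style (before):
      //   val reader = EasyMock.createMock(classOf[RecordReader[String, String]])
      //   EasyMock.expect(reader.nextKeyValue()).andReturn(false)
      //   EasyMock.replay(reader)

      // Mockito style (after):
      val reader = mock(classOf[RecordReader[String, String]])
      when(reader.nextKeyValue()).thenReturn(false)
      ```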
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4578 from JoshRosen/SPARK-5735-remove-easymock and squashes the following commits:
      
      0ab192b [Josh Rosen] Import sorting plus two minor changes to more closely match old semantics.
      977565b [Josh Rosen] Remove EasyMock from build.
      fae1d8f [Josh Rosen] Remove EasyMock usage in KinesisReceiverSuite.
      7cca486 [Josh Rosen] Remove EasyMock usage in MesosSchedulerBackendSuite
      fc5e94d [Josh Rosen] Remove EasyMock in CacheManagerSuite
      077eec2d
    • Ryan Williams's avatar
      [SPARK-5783] Better eventlog-parsing error messages · fc6d3e79
      Ryan Williams authored
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #4573 from ryan-williams/history and squashes the following commits:
      
      a8647ec [Ryan Williams] fix test calls to .replay()
      98aa3fe [Ryan Williams] include filename in history-parsing error message
      8deecf0 [Ryan Williams] add line number to history-parsing error message
      b668b52 [Ryan Williams] add log info line to history-eventlog parsing
      fc6d3e79
    • sboeschhuawei's avatar
      [SPARK-5503][MLLIB] Example code for Power Iteration Clustering · e1a1ff81
      sboeschhuawei authored
      Author: sboeschhuawei <stephen.boesch@huawei.com>
      
      Closes #4495 from javadba/picexamples and squashes the following commits:
      
      3c84b14 [sboeschhuawei] PIC Examples updates from Xiangrui's comments round 5
      2878675 [sboeschhuawei] Fourth round with xiangrui on PICExample
      d7ac350 [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3
      d7f0cba [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3
      cef28f4 [sboeschhuawei] Further updates to PICExample from Xiangrui's comments
      f7ff43d [sboeschhuawei] Update to PICExample from Xiangrui's comments
      efeec45 [sboeschhuawei] Update to PICExample from Xiangrui's comments
      03e8de4 [sboeschhuawei] Added PICExample
      c509130 [sboeschhuawei] placeholder for pic examples
      5864d4a [sboeschhuawei] placeholder for pic examples
      e1a1ff81
    • uncleGen's avatar
      [SPARK-5732][CORE]:Add an option to print the spark version in spark script. · c0ccd256
      uncleGen authored
      Naturally, we may need an option to print the Spark version in the spark scripts; this is pretty common in command-line tools.
      ![9](https://cloud.githubusercontent.com/assets/7402327/6183331/cab1b74e-b38e-11e4-9daa-e26e6015cff3.JPG)
      
      Author: uncleGen <hustyugm@gmail.com>
      Author: genmao.ygm <genmao.ygm@alibaba-inc.com>
      
      Closes #4522 from uncleGen/master-clean-150211 and squashes the following commits:
      
      9f2127c [genmao.ygm] revert the behavior of "-v"
      015ddee [uncleGen] minor changes
      463f02c [uncleGen] minor changes
      c0ccd256
    • WangTaoTheTonic's avatar
      [SPARK-4832][Deploy]some other processes might take the daemon pid · 1768bd51
      WangTaoTheTonic authored
      Some other process might be using the pid saved in the pid file. In that case we should ignore it and launch the daemons anyway.

      JIRA is down for maintenance. I will file one once it returns.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #3683 from WangTaoTheTonic/otherproc and squashes the following commits:
      
      daa86a1 [WangTaoTheTonic] some bash style fix
      8befee7 [WangTaoTheTonic] handle the mistake scenario
      cf4ecc6 [WangTaoTheTonic] remove redundant condition
      f36cfb4 [WangTaoTheTonic] some other processes might take the pid
      1768bd51
    • tianyi's avatar
      [SPARK-3365][SQL]Wrong schema generated for List type · 1c8633f3
      tianyi authored
      This PR fixes issue SPARK-3365. The reason is that Spark generated a wrong schema for the type `List` in `ScalaReflection.scala`. For example:
      
      the generated schema for type `Seq[String]` is:
      ```
      {"name":"x","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}}`
      ```
      
      the generated schema for type `List[String]` is:
      ```
      {"name":"x","type":{"type":"struct","fields":[]},"nullable":true,"metadata":{}}`
      ```
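
      A hedged sketch of why and how: `List` is both a `Product` and a `Seq`, and the `Product` (case class) branch of `ScalaReflection.schemaFor` matched first, yielding an empty struct; resolving the `Seq` case earlier produces the array schema (simplified):
      ```scala
      // simplified: the Seq case must be tried before the Product case
      case t if t <:< typeOf[Seq[_]] =>
        val TypeRef(_, _, Seq(elementType)) = t
        Schema(ArrayType(schemaFor(elementType).dataType), nullable = true)
      ```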
      
      Author: tianyi <tianyi.asiainfo@gmail.com>
      
      Closes #4581 from tianyi/SPARK-3365 and squashes the following commits:
      
      a097e86 [tianyi] change the order of resolution in ScalaReflection.scala
      1c8633f3