  1. Jul 24, 2015
    • [SPARK-9270] [PYSPARK] allow --name option in pyspark · 9a113961
      Cheolsoo Park authored
      This is a continuation of #7512, which added the `--name` option to spark-shell. This PR adds the same option to pyspark.
      
      Note that `--conf spark.app.name` on the command line has no effect in spark-shell and pyspark; `--name` must be used instead. This is in fact inconsistent with spark-sql, which doesn't accept the `--name` option but does accept `--conf spark.app.name`. I am not fixing this inconsistency in this PR. IMO, only one of `--name` and `--conf spark.app.name` is needed, not both; but since I cannot decide which to choose, I am not making any change here.
      
      Author: Cheolsoo Park <cheolsoop@netflix.com>
      
      Closes #7610 from piaozhexiu/SPARK-9270 and squashes the following commits:
      
      763e86d [Cheolsoo Park] Update windows script
      400b7f9 [Cheolsoo Park] Allow --name option to pyspark
      9a113961
    • [SPARK-9261] [STREAMING] Avoid calling APIs that expose shaded classes. · 8399ba14
      Marcelo Vanzin authored
      Doing this may cause weird errors when tests are run with Maven, depending
      on the flags used. Instead, expose the needed functionality through methods
      that do not expose shaded classes.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7601 from vanzin/SPARK-9261 and squashes the following commits:
      
      4f64a16 [Marcelo Vanzin] [SPARK-9261] [streaming] Avoid calling APIs that expose shaded classes.
      8399ba14
    • [SPARK-9295] Analysis should detect sorting on unsupported column types · 6aceaf3d
      Josh Rosen authored
      This patch extends CheckAnalysis to throw errors for queries that try to sort on unsupported column types, such as ArrayType.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7633 from JoshRosen/SPARK-9295 and squashes the following commits:
      
      23b2fbf [Josh Rosen] Embed function in foreach
      bfe1451 [Josh Rosen] Update to allow sorting by null literals
      2f1b802 [Josh Rosen] Add analysis rule to detect sorting on unsupported column types (SPARK-9295)
      6aceaf3d
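      A hedged illustration of the kind of query this new analysis check (SPARK-9295) rejects. It assumes an existing `sqlContext` and the DataFrame API of this era; it is not the CheckAnalysis code itself.

```scala
import org.apache.spark.sql.AnalysisException
import sqlContext.implicits._

val df = Seq((Seq(1, 2), "a"), (Seq(3), "b")).toDF("arr", "id")

try {
  // Sorting on an ArrayType column is unsupported and should now fail analysis
  // instead of failing later at runtime.
  df.orderBy($"arr").collect()
} catch {
  case e: AnalysisException => println(s"Rejected at analysis time: ${e.getMessage}")
}
```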
    • [SPARK-9222] [MLlib] Make class instantiation variables in DistributedLDAModel private[clustering] · e2531245
      MechCoder authored
      This makes it easier to test all the class variables of the DistributedLDAModel.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7573 from MechCoder/lda_test and squashes the following commits:
      
      2f1a293 [MechCoder] [SPARK-9222] [MLlib] Make class instantiation variables in DistributedLDAModel private[clustering]
      e2531245
    • [SPARK-9292] Analysis should check that join conditions' data types are BooleanType · c2b50d69
      Josh Rosen authored
      This patch adds an analysis check to ensure that join conditions' data types are BooleanType. This check is necessary in order to report proper errors for non-boolean DataFrame join conditions.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7630 from JoshRosen/SPARK-9292 and squashes the following commits:
      
      aec6c7b [Josh Rosen] Check condition type in resolved()
      75a3ea6 [Josh Rosen] Fix SPARK-9292.
      c2b50d69
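      A hedged example of the kind of DataFrame join that this check (SPARK-9292) now reports with a proper error. It assumes an existing `sqlContext`; the column names are hypothetical.

```scala
import org.apache.spark.sql.AnalysisException
import sqlContext.implicits._

val left  = Seq((1, "a")).toDF("id", "l")
val right = Seq((1, "b")).toDF("id", "r")

try {
  // `+` produces an integer expression, not a BooleanType predicate,
  // so analysis should reject it with a clear error.
  left.join(right, left("id") + right("id")).collect()
} catch {
  case e: AnalysisException => println(s"Invalid join condition: ${e.getMessage}")
}
```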
    • [SPARK-9305] Rename org.apache.spark.Row to Item. · c8d71a41
      Reynold Xin authored
      It's a thing used in test cases, but it is named Row. This is pretty annoying because every time I search for Row, it shows up before the Spark SQL Row, which is what a developer wants most of the time.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7638 from rxin/remove-row and squashes the following commits:
      
      aeda52d [Reynold Xin] [SPARK-9305] Rename org.apache.spark.Row to Item.
      c8d71a41
    • [SPARK-9285][SQL] Remove InternalRow's inheritance from Row. · 431ca39b
      Reynold Xin authored
      I also changed InternalRow's size/length function to numFields, to make it more obvious that it is not about bytes, but the number of fields.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7626 from rxin/internalRow and squashes the following commits:
      
      e124daf [Reynold Xin] Fixed test case.
      805ceb7 [Reynold Xin] Commented out the failed test suite.
      f8a9ca5 [Reynold Xin] Fixed more bugs. Still at least one more remaining.
      76d9081 [Reynold Xin] Fixed data sources.
      7807f70 [Reynold Xin] Fixed DataFrameSuite.
      cb60cd2 [Reynold Xin] Code review & small bug fixes.
      0a2948b [Reynold Xin] Fixed style.
      3280d03 [Reynold Xin] [SPARK-9285][SQL] Remove InternalRow's inheritance from Row.
      431ca39b
    • [SPARK-9249] [SPARKR] local variable assigned but may not be used · 3aec9f4e
      Yu ISHIKAWA authored
      [[SPARK-9249] local variable assigned but may not be used - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9249)
      
      https://gist.github.com/yu-iskw/0e5b0253c11769457ea5
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #7640 from yu-iskw/SPARK-9249 and squashes the following commits:
      
      7a51cab [Yu ISHIKAWA] [SPARK-9249][SparkR] local variable assigned but may not be used
      3aec9f4e
    • [SPARK-9250] Make change-scala-version more helpful w.r.t. valid Scala versions · 428cde5d
      François Garillot authored
      Author: François Garillot <francois@garillot.net>
      
      Closes #7595 from huitseeker/issue/SPARK-9250 and squashes the following commits:
      
      80a0218 [François Garillot] [SPARK-9250] Make change-scala-version's usage more explicit, introduce a -h|--help option.
      428cde5d
    • [SPARK-9238] [SQL] Remove two extra useless entries for bytesOfCodePointInUTF8 · 846cf462
      zhichao.li authored
      This is only a trial change, and I am not sure I understand it correctly, but I believe 2 entries in `bytesOfCodePointInUTF8` are enough for the case of a 6-byte code point (1111110x).
      Details can be found from https://en.wikipedia.org/wiki/UTF-8 in "Description" section.
      
      Author: zhichao.li <zhichao.li@intel.com>
      
      Closes #7582 from zhichao-li/utf8 and squashes the following commits:
      
      8bddd01 [zhichao.li] two extra entries
      846cf462
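      For context, a minimal sketch (not Spark's actual `bytesOfCodePointInUTF8` table) of how the byte length of a UTF-8 code point follows from its leading byte, per the Wikipedia "Description" section referenced above. Only the two leading-byte values 0xFC and 0xFD match 1111110x, which is why two entries suffice for the 6-byte case.

```scala
def utf8BytesForLeadingByte(b: Byte): Int = {
  val u = b & 0xFF
  if (u < 0x80) 1        // 0xxxxxxx: ASCII, 1 byte
  else if (u < 0xC0) 1   // 10xxxxxx: continuation byte, treated defensively as 1
  else if (u < 0xE0) 2   // 110xxxxx
  else if (u < 0xF0) 3   // 1110xxxx
  else if (u < 0xF8) 4   // 11110xxx
  else if (u < 0xFC) 5   // 111110xx (historical 5-byte form)
  else 6                 // 1111110x: only 0xFC and 0xFD, i.e. two table entries
}
```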
    • [SPARK-9069] [SQL] follow up · dfb18be0
      Davies Liu authored
      Address comments for #7605
      
      cc rxin
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7634 from davies/decimal_unlimited2 and squashes the following commits:
      
      b2d8b0d [Davies Liu] add doc and test for DecimalType.isWiderThan
      65b251c [Davies Liu] fix test
      6a91f32 [Davies Liu] fix style
      ca9c973 [Davies Liu] address comments
      dfb18be0
    • [SPARK-9236] [CORE] Make defaultPartitioner not reuse a parent RDD's... · 6cd28cc2
      François Garillot authored
      [SPARK-9236] [CORE] Make defaultPartitioner not reuse a parent RDD's partitioner if it has 0 partitions
      
      See also comments on https://issues.apache.org/jira/browse/SPARK-9236
      
      Author: François Garillot <francois@garillot.net>
      
      Closes #7616 from huitseeker/issue/SPARK-9236 and squashes the following commits:
      
      217f902 [François Garillot] [SPARK-9236] Make defaultPartitioner not reuse a parent RDD's partitioner if it has 0 partitions
      6cd28cc2
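      A simplified sketch of the changed behavior, written against Spark's public RDD API (this is not the exact `Partitioner.defaultPartitioner` code): a parent's partitioner is only reused if it covers more than zero partitions.

```scala
import org.apache.spark.{HashPartitioner, Partitioner}
import org.apache.spark.rdd.RDD

def defaultPartitionerSketch(rdd: RDD[_], others: RDD[_]*): Partitioner = {
  val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.length).reverse
  // Skip any parent partitioner that has 0 partitions.
  bySize.flatMap(_.partitioner).find(_.numPartitions > 0).getOrElse {
    if (rdd.context.getConf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.length)
    }
  }
}
```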
    • [SPARK-8756] [SQL] Keep cached information and avoid re-calculating footers in ParquetRelation2 · 6a7e537f
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8756
      
      Currently, in ParquetRelation2, footers are re-read every time refresh() is called. But we can check whether anything might have changed before doing the reading, because reading all footers is expensive when there are too many partitions. This PR fixes this by keeping some cached information and using it for the check.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7154 from viirya/cached_footer_parquet_relation and squashes the following commits:
      
      92e9347 [Liang-Chi Hsieh] Fix indentation.
      ae0ec64 [Liang-Chi Hsieh] Fix wrong assignment.
      c8fdfb7 [Liang-Chi Hsieh] Fix it.
      a52b6d1 [Liang-Chi Hsieh] For comments.
      c2a2420 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cached_footer_parquet_relation
      fa5458f [Liang-Chi Hsieh] Use Map to cache FileStatus and do merging previously loaded schema and newly loaded one.
      6ae0911 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cached_footer_parquet_relation
      21bbdec [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cached_footer_parquet_relation
      12a0ed9 [Liang-Chi Hsieh] Add check of FileStatus's modification time.
      186429d [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cached_footer_parquet_relation
      0ef8caf [Liang-Chi Hsieh] Keep cached information and avoid re-calculating footers.
      6a7e537f
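      A conceptual sketch of the caching idea described above (the names are assumed, not ParquetRelation2's actual fields): remember each file's FileStatus and only re-read footers for paths whose modification time has changed.

```scala
import org.apache.hadoop.fs.FileStatus

class FooterCacheSketch {
  private var cachedStatuses = Map.empty[String, FileStatus]

  /** Returns only the files whose footers need to be (re)read. */
  def staleFiles(current: Seq[FileStatus]): Seq[FileStatus] = {
    val stale = current.filter { s =>
      cachedStatuses.get(s.getPath.toString)
        .forall(_.getModificationTime != s.getModificationTime)
    }
    cachedStatuses = current.map(s => s.getPath.toString -> s).toMap
    stale
  }
}
```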
    • [build] Enable memory leak detection for Tungsten. · 8fe32b4f
      Reynold Xin authored
      This was turned off accidentally in #7591.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7637 from rxin/enable-mem-leak-detect and squashes the following commits:
      
      34bc3ef [Reynold Xin] Enable memory leak detection for Tungsten.
      8fe32b4f
    • [SPARK-9200][SQL] Don't implicitly cast non-atomic types to string type. · cb8c241f
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7636 from rxin/complex-string-implicit-cast and squashes the following commits:
      
      3e67327 [Reynold Xin] [SPARK-9200][SQL] Don't implicitly cast non-atomic types to string type.
      cb8c241f
    • [SPARK-9294][SQL] cleanup comments, code style, naming typo for the new aggregation · 408e64b2
      Wenchen Fan authored
      fix some comments and code style for https://github.com/apache/spark/pull/7458
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7619 from cloud-fan/agg-clean and squashes the following commits:
      
      3925457 [Wenchen Fan] one more...
      cc78357 [Wenchen Fan] one more cleanup
      26f6a93 [Wenchen Fan] some minor cleanup for the new aggregation
      408e64b2
    • [SPARK-8092] [ML] Allow OneVsRest Classifier feature and label column names to be configurable. · d4d762f2
      Ram Sriharsha authored
      The base classifier's input and output columns are ignored in favor of the ones specified in OneVsRest.
      
      Author: Ram Sriharsha <rsriharsha@hw11853.local>
      
      Closes #6631 from harsha2010/SPARK-8092 and squashes the following commits:
      
      6591dc6 [Ram Sriharsha] add documentation for params
      b7024b1 [Ram Sriharsha] cleanup
      f0e2bfb [Ram Sriharsha] merge with master
      108d3d7 [Ram Sriharsha] merge with master
      4f74126 [Ram Sriharsha] Allow label/ features columns to be configurable
      d4d762f2
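      A hedged usage sketch of the configurability this adds (the column names are hypothetical, and it assumes the ml.classification API of this era): the columns set on OneVsRest itself are the ones that take effect, rather than whatever the base classifier was configured with.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

val ovr = new OneVsRest()
  .setClassifier(new LogisticRegression())
  .setFeaturesCol("myFeatures")     // hypothetical column names
  .setLabelCol("myLabel")
  .setPredictionCol("myPrediction")
// val model = ovr.fit(trainingDF)  // trainingDF assumed to exist
```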
  2. Jul 23, 2015
    • [SPARK-9216] [STREAMING] Define KinesisBackedBlockRDDs · d249636e
      Tathagata Das authored
      For more information see master JIRA: https://issues.apache.org/jira/browse/SPARK-9215
      Design Doc: https://docs.google.com/document/d/1k0dl270EnK7uExrsCE7jYw7PYx0YC935uBcxn3p0f58/edit
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #7578 from tdas/kinesis-rdd and squashes the following commits:
      
      543d208 [Tathagata Das] Fixed scala style
      5082a30 [Tathagata Das] Fixed scala style
      3f40c2d [Tathagata Das] Addressed comments
      c4f25d2 [Tathagata Das] Addressed comment
      d3d64d1 [Tathagata Das] Minor update
      f6e35c8 [Tathagata Das] Added retry logic to make it more robust
      8874b70 [Tathagata Das] Updated Kinesis RDD
      575bdbc [Tathagata Das] Fix scala style issues
      4a36096 [Tathagata Das] Add license
      5da3995 [Tathagata Das] Changed KinesisSuiteHelper to KinesisFunSuite
      528e206 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into kinesis-rdd
      3ae0814 [Tathagata Das] Added KinesisBackedBlockRDD
      d249636e
    • [SPARK-9122] [MLLIB] [PySpark] spark.mllib regression support batch predict · 52de3acc
      Yanbo Liang authored
      spark.mllib support batch predict for LinearRegressionModel, RidgeRegressionModel and LassoModel.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #7614 from yanboliang/spark-9122 and squashes the following commits:
      
      4e610c0 [Yanbo Liang] spark.mllib regression support batch predict
      52de3acc
    • [SPARK-9069] [SPARK-9264] [SQL] remove unlimited precision support for DecimalType · 8a94eb23
      Davies Liu authored
      Remove Decimal.Unlimited (changed to support precision up to 38, to match Hive and other databases).
      
      In order to keep backward source compatibility, Decimal.Unlimited is still there, but it is now changed to Decimal(38, 18).
      
      If no precision and scale are provided, it's Decimal(10, 0) as before.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7605 from davies/decimal_unlimited and squashes the following commits:
      
      aa3f115 [Davies Liu] fix tests and style
      fb0d20d [Davies Liu] address comments
      bfaae35 [Davies Liu] fix style
      df93657 [Davies Liu] address comments and clean up
      06727fd [Davies Liu] Merge branch 'master' of github.com:apache/spark into decimal_unlimited
      4c28969 [Davies Liu] fix tests
      8d783cc [Davies Liu] fix tests
      788631c [Davies Liu] fix double with decimal in Union/except
      1779bde [Davies Liu] fix scala style
      c9c7c78 [Davies Liu] remove Decimal.Unlimited
      8a94eb23
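      The follow-up commit earlier in this log (#7634) adds docs and tests for DecimalType.isWiderThan. A rough sketch of that notion, using the fixed-precision defaults described here (assumed logic, not Spark's exact code):

```scala
case class DecimalTypeSketch(precision: Int, scale: Int) {
  // A type is "wider" if it can hold at least as many integral digits
  // and at least as many fractional digits as the other type.
  def isWiderThan(other: DecimalTypeSketch): Boolean =
    (precision - scale) >= (other.precision - other.scale) && scale >= other.scale
}

val systemDefault = DecimalTypeSketch(38, 18)  // what Decimal.Unlimited now maps to
val userDefault   = DecimalTypeSketch(10, 0)   // used when no precision/scale is given
assert(systemDefault.isWiderThan(userDefault))
```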
    • [SPARK-9207] [SQL] Enables Parquet filter push-down by default · bebe3f7b
      Cheng Lian authored
      PARQUET-136 and PARQUET-173 have been fixed in parquet-mr 1.7.0. It's time to enable filter push-down by default now.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #7612 from liancheng/spark-9207 and squashes the following commits:
      
      77e6b5e [Cheng Lian] Enables Parquet filter push-down by default
      bebe3f7b
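      The behavior is controlled by the `spark.sql.parquet.filterPushdown` flag; a short sketch of opting back out if needed (assumes an existing `sqlContext`):

```scala
// Filter push-down is now on by default; it can still be disabled explicitly.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "false")
```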
    • [SPARK-9286] [SQL] Methods in Unevaluable should be final and... · b2f3aca1
      Josh Rosen authored
      [SPARK-9286] [SQL] Methods in Unevaluable should be final and AlgebraicAggregate should extend Unevaluable.
      
      This patch marks the Unevaluable.eval() and Unevaluable.genCode() methods as final and fixes two cases where they were overridden. It also updates AggregateFunction2 to extend Unevaluable.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7627 from JoshRosen/unevaluable-fix and squashes the following commits:
      
      8d9ed22 [Josh Rosen] AlgebraicAggregate should extend Unevaluable
      65329c2 [Josh Rosen] Do not have AggregateFunction1 inherit from AggregateExpression1
      fa68a22 [Josh Rosen] Make eval() and genCode() final
      b2f3aca1
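      A minimal sketch of the pattern (not Catalyst's actual trait): expressions that are never evaluated directly mark these methods final and throw, so subclasses cannot accidentally override them.

```scala
trait UnevaluableSketch {
  final def eval(input: Any = null): Any =
    throw new UnsupportedOperationException(s"Cannot evaluate expression: $this")

  final def genCode(): String =
    throw new UnsupportedOperationException(s"Cannot generate code for expression: $this")
}
```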
    • [SPARK-5447][SQL] Replace reference 'schema rdd' with DataFrame @rxin. · 662d60db
      David Arroyo Cazorla authored
      Author: David Arroyo Cazorla <darroyo@stratio.com>
      
      Closes #7618 from darroyocazorla/master and squashes the following commits:
      
      5f91379 [David Arroyo Cazorla] [SPARK-5447][SQL] Replace reference 'schema rdd' with DataFrame
      662d60db
    • [SPARK-9243] [Documentation] null -> zero in crosstab doc · ecfb3127
      Xiangrui Meng authored
      We forgot to update the doc. brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7608 from mengxr/SPARK-9243 and squashes the following commits:
      
      0ea3236 [Xiangrui Meng] null -> zero in crosstab doc
      ecfb3127
    • [SPARK-9183] confusing error message when looking up missing function in Spark SQL · d2666a3c
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-9183
      
      cc rxin
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7613 from yjshen/npe_udf and squashes the following commits:
      
      44f58f2 [Yijie Shen] add jira ticket number
      903c963 [Yijie Shen] add explanation comments
      f44dd3c [Yijie Shen] Change two hive class LogLevel to avoid annoying messages
      d2666a3c
    • [Build][Minor] Fix building error & performance · 19aeab57
      Cheng Hao authored
      1. When building the latest code with sbt, it throws an exception like:
      [error] /home/hcheng/git/catalyst/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala:78: match may not be exhaustive.
      [error] It would fail on the following input: UNKNOWN
      [error]       val classNameByStatus = status match {
      [error]
      
      2. Potential performance issue when implicitly converting an Array[Any] to a Seq[Any]
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7611 from chenghao-intel/toseq and squashes the following commits:
      
      cab75c5 [Cheng Hao] remove the toArray
      24df682 [Cheng Hao] fix building error & performance
      19aeab57
    • [SPARK-9082] [SQL] [FOLLOW-UP] use `partition` in `PushPredicateThroughProject` · 52ef76de
      Wenchen Fan authored
      a follow up of https://github.com/apache/spark/pull/7446
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7607 from cloud-fan/tmp and squashes the following commits:
      
      7106989 [Wenchen Fan] use `partition` in `PushPredicateThroughProject`
      52ef76de
    • [SPARK-9212] [CORE] upgrade Netty version to 4.0.29.Final · 26ed22ae
      Zhang, Liye authored
      related JIRA: [SPARK-9212](https://issues.apache.org/jira/browse/SPARK-9212) and [SPARK-8101](https://issues.apache.org/jira/browse/SPARK-8101)
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #7562 from liyezhang556520/SPARK-9212 and squashes the following commits:
      
      1917729 [Zhang, Liye] SPARK-9212 upgrade Netty version to 4.0.29.Final
      26ed22ae
    • Revert "[SPARK-8579] [SQL] support arbitrary object in UnsafeRow" · fb36397b
      Reynold Xin authored
      Reverts ObjectPool. As it stands, it has a few problems:
      
      1. ObjectPool doesn't work with spilling and memory accounting.
      2. I don't think an object pool is what we want to support in the long run, since it essentially goes back to unmanaged memory, creates pressure on the GC, and makes it hard to account for the total in-memory size.
      3. The ObjectPool patch removed the specialized getters for strings and binary, and as a result actually introduced branches when reading non-primitive data types.
      
      If we do want to support arbitrary user defined types in the future, I think we can just add an object array in UnsafeRow, rather than relying on indirect memory addressing through a pool. We also need to pick execution strategies that are optimized for those, rather than keeping a lot of unserialized JVM objects in memory during aggregation.
      
      This is probably the hardest thing I had to revert in Spark, due to recent patches that also change the same part of the code. Would be great to get a careful look.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7591 from rxin/revert-object-pool and squashes the following commits:
      
      01db0bc [Reynold Xin] Scala style.
      eda89fc [Reynold Xin] Fixed describe.
      2967118 [Reynold Xin] Fixed accessor for JoinedRow.
      e3294eb [Reynold Xin] Merge branch 'master' into revert-object-pool
      657855f [Reynold Xin] Temp commit.
      c20f2c8 [Reynold Xin] Style fix.
      fe37079 [Reynold Xin] Revert "[SPARK-8579] [SQL] support arbitrary object in UnsafeRow"
      fb36397b
    • [SPARK-9266] Prevent "managed memory leak detected" exception from masking original exception · ac3ae0f2
      Josh Rosen authored
      When a task fails with an exception and also fails to properly clean up its managed memory, the `spark.unsafe.exceptionOnMemoryLeak` memory leak detection mechanism's exceptions will mask the original exception that caused the task to fail. We should throw the memory leak exception only if no other exception occurred.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7603 from JoshRosen/SPARK-9266 and squashes the following commits:
      
      c268cb5 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-9266
      c1f0167 [Josh Rosen] Fix the error masking problem
      448eae8 [Josh Rosen] Add regression test
      ac3ae0f2
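      A simplified sketch of the fix described above (assumed structure, not the actual Executor code): the leak is surfaced as an exception only when the task did not already fail with one, so the original error is never masked.

```scala
def runWithLeakCheck(runTask: () => Unit, leakedBytes: () => Long,
                     exceptionOnLeak: Boolean): Unit = {
  var taskFailed = false
  try {
    runTask()
  } catch {
    case t: Throwable =>
      taskFailed = true
      throw t
  } finally {
    val leaked = leakedBytes()
    if (leaked > 0) {
      if (exceptionOnLeak && !taskFailed) {
        throw new IllegalStateException(s"Managed memory leak detected; size = $leaked bytes")
      } else {
        // Only log when the task already failed, so the real exception propagates.
        System.err.println(s"Managed memory leak detected; size = $leaked bytes")
      }
    }
  }
}
```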
    • [SPARK-8695] [CORE] [MLLIB] TreeAggregation shouldn't be triggered when it... · b983d493
      Perinkulam I. Ganesh authored
      [SPARK-8695] [CORE] [MLLIB] TreeAggregation shouldn't be triggered when it doesn't save wall-clock time.
      
      Author: Perinkulam I. Ganesh <gip@us.ibm.com>
      
      Closes #7397 from piganesh/SPARK-8695 and squashes the following commits:
      
      041620c [Perinkulam I. Ganesh] [SPARK-8695][CORE][MLlib] TreeAggregation shouldn't be triggered when it doesn't save wall-clock time.
      9ad067c [Perinkulam I. Ganesh] [SPARK-8695] [core] [WIP] TreeAggregation shouldn't be triggered for 5 partitions
      a6fed07 [Perinkulam I. Ganesh] [SPARK-8695] [core] [WIP] TreeAggregation shouldn't be triggered for 5 partitions
      b983d493
    • [SPARK-8935] [SQL] Implement code generation for all casts · 6d0d8b40
      Yijie Shen authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8935
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #7365 from yjshen/cast_codegen and squashes the following commits:
      
      ef6e8b5 [Yijie Shen] getColumn and setColumn in struct cast, autounboxing in array and map
      eaece18 [Yijie Shen] remove null case in cast code gen
      fd7eba4 [Yijie Shen] resolve comments
      80378a5 [Yijie Shen] the missing self cast
      611d66e [Yijie Shen] Bug fix: NullType & primitive object unboxing
      6d5c0fe [Yijie Shen] rebase and add Interval codegen
      9424b65 [Yijie Shen] tiny style fix
      4a1c801 [Yijie Shen] remove CodeHolder class, use function instead.
      3f5df88 [Yijie Shen] CodeHolder for complex dataTypes
      c286f13 [Yijie Shen] moved all the cast code into class body
      4edfd76 [Yijie Shen] [WIP] finished primitive part
      6d0d8b40
    • [SPARK-7254] [MLLIB] Run PowerIterationClustering directly on graph · 825ab1e4
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7254
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6054 from viirya/pic_on_graph and squashes the following commits:
      
      8b87b81 [Liang-Chi Hsieh] Fix scala style.
      a22fb8b [Liang-Chi Hsieh] For comment.
      ef565a0 [Liang-Chi Hsieh] Fix indentation.
      d249aa1 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into pic_on_graph
      82d7351 [Liang-Chi Hsieh] Run PowerIterationClustering directly on graph.
      825ab1e4
    • [SPARK-9268] [ML] Removed varargs annotation from Params.setDefault taking multiple params · 410dd41c
      Joseph K. Bradley authored
      Removed varargs annotation from Params.setDefault taking multiple params.
      
      Though varargs is technically correct, it often requires that developers do a clean assembly rather than an incremental (not clean) assembly, which is a nuisance during development.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #7604 from jkbradley/params-setdefault-varargs and squashes the following commits:
      
      6016dc6 [Joseph K. Bradley] removed varargs annotation from Params.setDefault taking multiple params
      410dd41c
  3. Jul 22, 2015
    • [SPARK-8364] [SPARKR] Add crosstab to SparkR DataFrames · 2f5cbd86
      Xiangrui Meng authored
      Add `crosstab` to SparkR DataFrames, which takes two column names and returns a local R data.frame. This is similar to `table` in R. However, `table` in SparkR is used for loading SQL tables as DataFrames. The return type is data.frame instead of table so that `crosstab` stays compatible with Scala/Python.
      
      I couldn't run the R tests successfully on my local machine; many unit tests failed. So let's try Jenkins.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7318 from mengxr/SPARK-8364 and squashes the following commits:
      
      d75e894 [Xiangrui Meng] fix tests
      53f6ddd [Xiangrui Meng] fix tests
      f1348d6 [Xiangrui Meng] update test
      47cb088 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8364
      5621262 [Xiangrui Meng] first version without test
      2f5cbd86
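      For reference, the Scala/Python analogue mentioned above looks roughly like this (a hedged sketch assuming a DataFrame `df` with columns "a" and "b"; in Scala, `crosstab` returns a regular DataFrame rather than an R table):

```scala
// Pairwise frequency table of two columns; the first result column is named "a_b".
val counts = df.stat.crosstab("a", "b")
counts.show()
```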
    • [SPARK-9144] Remove DAGScheduler.runLocallyWithinThread and spark.localExecution.enabled · b217230f
      Josh Rosen authored
      Spark has an option called spark.localExecution.enabled; according to the docs:
      
      > Enables Spark to run certain jobs, such as first() or take() on the driver, without sending tasks to the cluster. This can make certain jobs execute very quickly, but may require shipping a whole partition of data to the driver.
      
      This feature ends up adding quite a bit of complexity to DAGScheduler, especially in the runLocallyWithinThread method, but as far as I know nobody uses this feature (I searched the mailing list and haven't seen any recent mentions of the configuration or stack traces including the runLocally method). As a step towards reducing scheduler complexity, I propose that we remove this feature and all code related to it for Spark 1.5.
      
      This pull request simply brings #7484 up to date.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7585 from rxin/remove-local-exec and squashes the following commits:
      
      84bd10e [Reynold Xin] Python fix.
      1d9739a [Reynold Xin] Merge pull request #7484 from JoshRosen/remove-localexecution
      eec39fa [Josh Rosen] Remove allowLocal(); deprecate user-facing uses of it.
      b0835dc [Josh Rosen] Remove local execution code in DAGScheduler
      8975d96 [Josh Rosen] Remove local execution tests.
      ffa8c9b [Josh Rosen] Remove documentation for configuration
      b217230f
    • [SPARK-9262][build] Treat Scala compiler warnings as errors · d71a13f4
      Reynold Xin authored
      I've seen a few cases in the past few weeks where the compiler throws warnings that are caused by legitimate bugs. This patch upgrades warnings to errors, except deprecation warnings.
      
      Note that ideally we should be able to mark deprecation warnings as errors as well. However, due to the lack of ability to suppress individual warning messages in the Scala compiler, we cannot do that (since we do need to access deprecated APIs in Hadoop).
      
      Most of the work was done by ericl.
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #7598 from rxin/warnings and squashes the following commits:
      
      beb311b [Reynold Xin] Fixed tests.
      542c031 [Reynold Xin] Fixed one more warning.
      87c354a [Reynold Xin] Fixed all non-deprecation warnings.
      78660ac [Eric Liang] first effort to fix warnings
      d71a13f4
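      A hedged sbt-style sketch of the general technique (the actual change is wired into Spark's build and carves out deprecation warnings, which a plain scalac flag cannot do selectively):

```scala
// In a build.sbt: promote compiler warnings to errors.
scalacOptions ++= Seq(
  "-Xfatal-warnings",  // fail the build on any warning
  "-feature",
  "-unchecked"
)
```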
    • [SPARK-8484] [ML] Added TrainValidationSplit for hyper-parameter tuning. · a721ee52
      martinzapletal authored
      - [X] Added TrainValidationSplit for hyper-parameter tuning. It randomly splits the input dataset into train and validation sets and uses an evaluation metric on the validation set to select the best model. It should be similar to CrossValidator, but simpler and less expensive.
      - [X] Simplified replacement of https://github.com/apache/spark/pull/6996
      
      Author: martinzapletal <zapletal-martin@email.cz>
      
      Closes #7337 from zapletal-martin/SPARK-8484-TrainValidationSplit and squashes the following commits:
      
      cafc949 [martinzapletal] Review comments https://github.com/apache/spark/pull/7337.
      511b398 [martinzapletal] Merge remote-tracking branch 'upstream/master' into SPARK-8484-TrainValidationSplit
      f4fc9c4 [martinzapletal] SPARK-8484 Resolved feedback to https://github.com/apache/spark/pull/7337
      00c4f5a [martinzapletal] SPARK-8484. Styling.
      d699506 [martinzapletal] SPARK-8484. Styling.
      93ed2ee [martinzapletal] Styling.
      3bc1853 [martinzapletal] SPARK-8484. Styling.
      2aa6f43 [martinzapletal] SPARK-8484. Added TrainValidationSplit for hyper-parameter tuning. It randomly splits the input dataset into train and validation and use evaluation metric on the validation set to select the best model.
      21662eb [martinzapletal] SPARK-8484. Added TrainValidationSplit for hyper-parameter tuning. It randomly splits the input dataset into train and validation and use evaluation metric on the validation set to select the best model.
      a721ee52
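      A hedged usage sketch of the new API (the estimator, grid, and train ratio are illustrative choices, and a training DataFrame named `trainingDF` is assumed):

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}

val lr = new LinearRegression()
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()

val tvs = new TrainValidationSplit()
  .setEstimator(lr)
  .setEvaluator(new RegressionEvaluator())
  .setEstimatorParamMaps(paramGrid)
  .setTrainRatio(0.75)   // 75% train, 25% validation -- cheaper than k-fold CrossValidator

// val model = tvs.fit(trainingDF)
```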
    • [SPARK-9223] [PYSPARK] [MLLIB] Support model save/load in LDA · 5307c9d3
      MechCoder authored
      Since save / load has been merged in LDA, it takes no time to write the wrappers in Python as well.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7587 from MechCoder/python_lda_save_load and squashes the following commits:
      
      c8e4ea7 [MechCoder] [SPARK-9223] [PySpark] Support model save/load in LDA
      5307c9d3
    • [SPARK-9180] fix spark-shell to accept --name option · 430cd781
      Kenichi Maehashi authored
      This patch fixes [[SPARK-9180]](https://issues.apache.org/jira/browse/SPARK-9180).
      Users can now set the app name of spark-shell using `spark-shell --name "whatever"`.
      
      Author: Kenichi Maehashi <webmaster@kenichimaehashi.com>
      
      Closes #7512 from kmaehashi/fix-spark-shell-app-name and squashes the following commits:
      
      e24991a [Kenichi Maehashi] use setIfMissing instead of setAppName
      18aa4ad [Kenichi Maehashi] fix spark-shell to accept --name option
      430cd781