- Jul 25, 2015
-
-
Sean Owen authored
Add back change-version-to-X.sh scripts, as wrappers for new script, for backwards compatibility Author: Sean Owen <sowen@cloudera.com> Closes #7639 from srowen/SPARK-9304 and squashes the following commits: 9ab2681 [Sean Owen] Add deprecation message to wrappers 3c8c202 [Sean Owen] Add back change-version-to-X.sh scripts, as wrappers for new script, for backwards compatibility
-
Reynold Xin authored
The two are redundant. Once this patch is merged, I plan to remove the inbound conversions from unsafe aggregates. Author: Reynold Xin <rxin@databricks.com> Closes #7658 from rxin/unsafeconverters and squashes the following commits: ed19e6c [Reynold Xin] Updated support types. 2a56d7e [Reynold Xin] [SPARK-9334][SQL] Remove UnsafeRowConverter in favor of UnsafeProjection.
-
Reynold Xin authored
They were added to improve performance (so JIT can inline the JoinedRow calls). However, we can also just improve it by projecting output out to UnsafeRow in Tungsten variant of the operators. Author: Reynold Xin <rxin@databricks.com> Closes #7659 from rxin/remove-joinedrows and squashes the following commits: 7510447 [Reynold Xin] [SPARK-9336][SQL] Remove extra JoinedRows
-
JD authored
Author: JD <jd@csh.rit.edu> Author: Joseph Batchik <josephbatchik@gmail.com> Closes #7606 from JDrit/expr and squashes the following commits: ad7f607 [Joseph Batchik] fixing python linter error 9d6daea [Joseph Batchik] removed order by per @rxin's comment 707d5c6 [Joseph Batchik] Added expr to fuctions.py 79df83c [JD] added example to the docs b89eec8 [JD] moved function up as per @rxin's comment 4960909 [JD] updated per @JoshRosen's comment 2cb329c [JD] updated per @rxin's comment 9a9ad0c [JD] removing unused import 6dc26d0 [JD] removed split 7f2222c [JD] Adding expr function as per SPARK-8668
-
Patrick Wendell authored
-
- Jul 24, 2015
-
-
Reynold Xin authored
The generated expression code can be hard to read since they are not indented well. This patch adds a code formatter that formats code automatically when we output them to the screen. Author: Reynold Xin <rxin@databricks.com> Closes #7656 from rxin/codeformatter and squashes the following commits: 5ba0e90 [Reynold Xin] [SPARK-9331][SQL] Add a code formatter to auto-format generated code.
-
Reynold Xin authored
Also took the chance to rearrange some of the methods in UnsafeRow to group static/private/public things together. Author: Reynold Xin <rxin@databricks.com> Closes #7654 from rxin/getStruct and squashes the following commits: b491a09 [Reynold Xin] Fixed typo. 48d77e5 [Reynold Xin] [SPARK-9330][SQL] Create specialized getStruct getter in InternalRow.
-
MechCoder authored
Word2Vec used to convert from an Array[Float] representation to a Map[String, Array[Float]] and then back to an Array[Float] through Word2VecModel. This prevents this conversion while still supporting the older method of supplying a Map. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #5748 from MechCoder/spark-7045 and squashes the following commits: e308913 [MechCoder] move docs 5703116 [MechCoder] minor fa04313 [MechCoder] style fixes b1d61c4 [MechCoder] better errors and tests 3b32c8c [MechCoder] [SPARK-7045] Avoid intermediate representation when creating model
-
Liang-Chi Hsieh authored
JIRA: https://issues.apache.org/jira/browse/SPARK-9067 According to the description of the JIRA ticket, calling `reader.close()` only after the task is finished will cause memory and file open limit problem since these resources are occupied even we don't need that anymore. This PR simply closes the reader early when we know there is no more data to read. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #7424 from viirya/close_reader and squashes the following commits: 3ff64e5 [Liang-Chi Hsieh] For comments. 3d20267 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into close_reader e152182 [Liang-Chi Hsieh] For comments. 5116cbe [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into close_reader 3ceb755 [Liang-Chi Hsieh] For comments. e34d98e [Liang-Chi Hsieh] For comments. 50ed729 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into close_reader 216912f [Liang-Chi Hsieh] Fix it. f429016 [Liang-Chi Hsieh] Release reader if we don't need it. a305621 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into close_reader 67569da [Liang-Chi Hsieh] Close reader early if there is no more data.
-
Cheolsoo Park authored
This is continuation of #7512 which added `--name` option to spark-shell. This PR adds the same option to pyspark. Note that `--conf spark.app.name` in command-line has no effect in spark-shell and pyspark. Instead, `--name` must be used. This is in fact inconsistency with spark-sql which doesn't accept `--name` option while it accepts `--conf spark.app.name`. I am not fixing this inconsistency in this PR. IMO, one of `--name` and `--conf spark.app.name` is needed not both. But since I cannot decide which to choose, I am not making any change here. Author: Cheolsoo Park <cheolsoop@netflix.com> Closes #7610 from piaozhexiu/SPARK-9270 and squashes the following commits: 763e86d [Cheolsoo Park] Update windows script 400b7f9 [Cheolsoo Park] Allow --name option to pyspark
-
Marcelo Vanzin authored
Doing this may cause weird errors when tests are run on maven, depending on the flags used. Instead, expose the needed functionality through methods that do not expose shaded classes. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7601 from vanzin/SPARK-9261 and squashes the following commits: 4f64a16 [Marcelo Vanzin] [SPARK-9261] [streaming] Avoid calling APIs that expose shaded classes.
-
Josh Rosen authored
This patch extends CheckAnalysis to throw errors for queries that try to sort on unsupported column types, such as ArrayType. Author: Josh Rosen <joshrosen@databricks.com> Closes #7633 from JoshRosen/SPARK-9295 and squashes the following commits: 23b2fbf [Josh Rosen] Embed function in foreach bfe1451 [Josh Rosen] Update to allow sorting by null literals 2f1b802 [Josh Rosen] Add analysis rule to detect sorting on unsupported column types (SPARK-9295)
-
MechCoder authored
This makes it easier to test all the class variables of the DistributedLDAmodel. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #7573 from MechCoder/lda_test and squashes the following commits: 2f1a293 [MechCoder] [SPARK-9222] [MLlib] Make class instantiation variables in DistributedLDAModel private[clustering]
-
Josh Rosen authored
This patch adds an analysis check to ensure that join conditions' data types are BooleanType. This check is necessary in order to report proper errors for non-boolean DataFrame join conditions. Author: Josh Rosen <joshrosen@databricks.com> Closes #7630 from JoshRosen/SPARK-9292 and squashes the following commits: aec6c7b [Josh Rosen] Check condition type in resolved() 75a3ea6 [Josh Rosen] Fix SPARK-9292.
-
Reynold Xin authored
It's a thing used in test cases, but named Row. Pretty annoying because everytime I search for Row, it shows up before the Spark SQL Row, which is what a developer wants most of the time. Author: Reynold Xin <rxin@databricks.com> Closes #7638 from rxin/remove-row and squashes the following commits: aeda52d [Reynold Xin] [SPARK-9305] Rename org.apache.spark.Row to Item.
-
Reynold Xin authored
I also changed InternalRow's size/length function to numFields, to make it more obvious that it is not about bytes, but the number of fields. Author: Reynold Xin <rxin@databricks.com> Closes #7626 from rxin/internalRow and squashes the following commits: e124daf [Reynold Xin] Fixed test case. 805ceb7 [Reynold Xin] Commented out the failed test suite. f8a9ca5 [Reynold Xin] Fixed more bugs. Still at least one more remaining. 76d9081 [Reynold Xin] Fixed data sources. 7807f70 [Reynold Xin] Fixed DataFrameSuite. cb60cd2 [Reynold Xin] Code review & small bug fixes. 0a2948b [Reynold Xin] Fixed style. 3280d03 [Reynold Xin] [SPARK-9285][SQL] Remove InternalRow's inheritance from Row.
-
Yu ISHIKAWA authored
[[SPARK-9249] local variable assigned but may not be used - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9249) https://gist.github.com/yu-iskw/0e5b0253c11769457ea5 Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #7640 from yu-iskw/SPARK-9249 and squashes the following commits: 7a51cab [Yu ISHIKAWA] [SPARK-9249][SparkR] local variable assigned but may not be used
-
François Garillot authored
Author: François Garillot <francois@garillot.net> Closes #7595 from huitseeker/issue/SPARK-9250 and squashes the following commits: 80a0218 [François Garillot] [SPARK-9250] Make change-scala-version's usage more explicit, introduce a -h|--help option.
-
zhichao.li authored
Only a trial thing, not sure if I understand correctly or not but I guess only 2 entries in `bytesOfCodePointInUTF8` for the case of 6 bytes codepoint(1111110x) is enough. Details can be found from https://en.wikipedia.org/wiki/UTF-8 in "Description" section. Author: zhichao.li <zhichao.li@intel.com> Closes #7582 from zhichao-li/utf8 and squashes the following commits: 8bddd01 [zhichao.li] two extra entries
-
Davies Liu authored
Address comments for #7605 cc rxin Author: Davies Liu <davies@databricks.com> Closes #7634 from davies/decimal_unlimited2 and squashes the following commits: b2d8b0d [Davies Liu] add doc and test for DecimalType.isWiderThan 65b251c [Davies Liu] fix test 6a91f32 [Davies Liu] fix style ca9c973 [Davies Liu] address comments
-
François Garillot authored
[SPARK-9236] [CORE] Make defaultPartitioner not reuse a parent RDD's partitioner if it has 0 partitions See also comments on https://issues.apache.org/jira/browse/SPARK-9236 Author: François Garillot <francois@garillot.net> Closes #7616 from huitseeker/issue/SPARK-9236 and squashes the following commits: 217f902 [François Garillot] [SPARK-9236] Make defaultPartitioner not reuse a parent RDD's partitioner if it has 0 partitions
-
Liang-Chi Hsieh authored
JIRA: https://issues.apache.org/jira/browse/SPARK-8756 Currently, in ParquetRelation2, footers are re-read every time refresh() is called. But we can check if it is possibly changed before we do the reading because reading all footers will be expensive when there are too many partitions. This pr fixes this by keeping some cached information to check it. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #7154 from viirya/cached_footer_parquet_relation and squashes the following commits: 92e9347 [Liang-Chi Hsieh] Fix indentation. ae0ec64 [Liang-Chi Hsieh] Fix wrong assignment. c8fdfb7 [Liang-Chi Hsieh] Fix it. a52b6d1 [Liang-Chi Hsieh] For comments. c2a2420 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cached_footer_parquet_relation fa5458f [Liang-Chi Hsieh] Use Map to cache FileStatus and do merging previously loaded schema and newly loaded one. 6ae0911 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cached_footer_parquet_relation 21bbdec [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cached_footer_parquet_relation 12a0ed9 [Liang-Chi Hsieh] Add check of FileStatus's modification time. 186429d [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into cached_footer_parquet_relation 0ef8caf [Liang-Chi Hsieh] Keep cached information and avoid re-calculating footers.
-
Reynold Xin authored
This was turned off accidentally in #7591. Author: Reynold Xin <rxin@databricks.com> Closes #7637 from rxin/enable-mem-leak-detect and squashes the following commits: 34bc3ef [Reynold Xin] Enable memory leak detection for Tungsten.
-
Reynold Xin authored
Author: Reynold Xin <rxin@databricks.com> Closes #7636 from rxin/complex-string-implicit-cast and squashes the following commits: 3e67327 [Reynold Xin] [SPARK-9200][SQL] Don't implicitly cast non-atomic types to string type.
-
Wenchen Fan authored
fix some comments and code style for https://github.com/apache/spark/pull/7458 Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7619 from cloud-fan/agg-clean and squashes the following commits: 3925457 [Wenchen Fan] one more... cc78357 [Wenchen Fan] one more cleanup 26f6a93 [Wenchen Fan] some minor cleanup for the new aggregation
-
Ram Sriharsha authored
The base classifier input and output columns are ignored in favor of the ones specified in OneVsRest. Author: Ram Sriharsha <rsriharsha@hw11853.local> Closes #6631 from harsha2010/SPARK-8092 and squashes the following commits: 6591dc6 [Ram Sriharsha] add documentation for params b7024b1 [Ram Sriharsha] cleanup f0e2bfb [Ram Sriharsha] merge with master 108d3d7 [Ram Sriharsha] merge with master 4f74126 [Ram Sriharsha] Allow label/ features columns to be configurable
-
- Jul 23, 2015
-
-
Tathagata Das authored
For more information see master JIRA: https://issues.apache.org/jira/browse/SPARK-9215 Design Doc: https://docs.google.com/document/d/1k0dl270EnK7uExrsCE7jYw7PYx0YC935uBcxn3p0f58/edit Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #7578 from tdas/kinesis-rdd and squashes the following commits: 543d208 [Tathagata Das] Fixed scala style 5082a30 [Tathagata Das] Fixed scala style 3f40c2d [Tathagata Das] Addressed comments c4f25d2 [Tathagata Das] Addressed comment d3d64d1 [Tathagata Das] Minor update f6e35c8 [Tathagata Das] Added retry logic to make it more robust 8874b70 [Tathagata Das] Updated Kinesis RDD 575bdbc [Tathagata Das] Fix scala style issues 4a36096 [Tathagata Das] Add license 5da3995 [Tathagata Das] Changed KinesisSuiteHelper to KinesisFunSuite 528e206 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into kinesis-rdd 3ae0814 [Tathagata Das] Added KinesisBackedBlockRDD
-
Yanbo Liang authored
spark.mllib support batch predict for LinearRegressionModel, RidgeRegressionModel and LassoModel. Author: Yanbo Liang <ybliang8@gmail.com> Closes #7614 from yanboliang/spark-9122 and squashes the following commits: 4e610c0 [Yanbo Liang] spark.mllib regression support batch predict
-
Davies Liu authored
Romove Decimal.Unlimited (change to support precision up to 38, to match with Hive and other databases). In order to keep backward source compatibility, Decimal.Unlimited is still there, but change to Decimal(38, 18). If no precision and scale is provide, it's Decimal(10, 0) as before. Author: Davies Liu <davies@databricks.com> Closes #7605 from davies/decimal_unlimited and squashes the following commits: aa3f115 [Davies Liu] fix tests and style fb0d20d [Davies Liu] address comments bfaae35 [Davies Liu] fix style df93657 [Davies Liu] address comments and clean up 06727fd [Davies Liu] Merge branch 'master' of github.com:apache/spark into decimal_unlimited 4c28969 [Davies Liu] fix tests 8d783cc [Davies Liu] fix tests 788631c [Davies Liu] fix double with decimal in Union/except 1779bde [Davies Liu] fix scala style c9c7c78 [Davies Liu] remove Decimal.Unlimited
-
Cheng Lian authored
PARQUET-136 and PARQUET-173 have been fixed in parquet-mr 1.7.0. It's time to enable filter push-down by default now. Author: Cheng Lian <lian@databricks.com> Closes #7612 from liancheng/spark-9207 and squashes the following commits: 77e6b5e [Cheng Lian] Enables Parquet filter push-down by default
-
Josh Rosen authored
[SPARK-9286] [SQL] Methods in Unevaluable should be final and AlgebraicAggregate should extend Unevaluable. This patch marks the Unevaluable.eval() and UnevaluablegenCode() methods as final and fixes two cases where they were overridden. It also updates AggregateFunction2 to extend Unevaluable. Author: Josh Rosen <joshrosen@databricks.com> Closes #7627 from JoshRosen/unevaluable-fix and squashes the following commits: 8d9ed22 [Josh Rosen] AlgebraicAggregate should extend Unevaluable 65329c2 [Josh Rosen] Do not have AggregateFunction1 inherit from AggregateExpression1 fa68a22 [Josh Rosen] Make eval() and genCode() final
-
David Arroyo Cazorla authored
Author: David Arroyo Cazorla <darroyo@stratio.com> Closes #7618 from darroyocazorla/master and squashes the following commits: 5f91379 [David Arroyo Cazorla] [SPARK-5447][SQL] Replace reference 'schema rdd' with DataFrame
-
Xiangrui Meng authored
We forgot to update doc. brkyvz Author: Xiangrui Meng <meng@databricks.com> Closes #7608 from mengxr/SPARK-9243 and squashes the following commits: 0ea3236 [Xiangrui Meng] null -> zero in crosstab doc
-
Yijie Shen authored
JIRA: https://issues.apache.org/jira/browse/SPARK-9183 cc rxin Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #7613 from yjshen/npe_udf and squashes the following commits: 44f58f2 [Yijie Shen] add jira ticket number 903c963 [Yijie Shen] add explanation comments f44dd3c [Yijie Shen] Change two hive class LogLevel to avoid annoying messages
-
Cheng Hao authored
1. When build the latest code with sbt, it throws exception like: [error] /home/hcheng/git/catalyst/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala:78: match may not be exhaustive. [error] It would fail on the following input: UNKNOWN [error] val classNameByStatus = status match { [error] 2. Potential performance issue when implicitly convert an Array[Any] to Seq[Any] Author: Cheng Hao <hao.cheng@intel.com> Closes #7611 from chenghao-intel/toseq and squashes the following commits: cab75c5 [Cheng Hao] remove the toArray 24df682 [Cheng Hao] fix building error & performance
-
Wenchen Fan authored
a follow up of https://github.com/apache/spark/pull/7446 Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7607 from cloud-fan/tmp and squashes the following commits: 7106989 [Wenchen Fan] use `partition` in `PushPredicateThroughProject`
-
Zhang, Liye authored
related JIRA: [SPARK-9212](https://issues.apache.org/jira/browse/SPARK-9212) and [SPARK-8101](https://issues.apache.org/jira/browse/SPARK-8101) Author: Zhang, Liye <liye.zhang@intel.com> Closes #7562 from liyezhang556520/SPARK-9212 and squashes the following commits: 1917729 [Zhang, Liye] SPARK-9212 upgrade Netty version to 4.0.29.Final
-
Reynold Xin authored
Reverts ObjectPool. As it stands, it has a few problems: 1. ObjectPool doesn't work with spilling and memory accounting. 2. I don't think in the long run the idea of an object pool is what we want to support, since it essentially goes back to unmanaged memory, and creates pressure on GC, and is hard to account for the total in memory size. 3. The ObjectPool patch removed the specialized getters for strings and binary, and as a result, actually introduced branches when reading non primitive data types. If we do want to support arbitrary user defined types in the future, I think we can just add an object array in UnsafeRow, rather than relying on indirect memory addressing through a pool. We also need to pick execution strategies that are optimized for those, rather than keeping a lot of unserialized JVM objects in memory during aggregation. This is probably the hardest thing I had to revert in Spark, due to recent patches that also change the same part of the code. Would be great to get a careful look. Author: Reynold Xin <rxin@databricks.com> Closes #7591 from rxin/revert-object-pool and squashes the following commits: 01db0bc [Reynold Xin] Scala style. eda89fc [Reynold Xin] Fixed describe. 2967118 [Reynold Xin] Fixed accessor for JoinedRow. e3294eb [Reynold Xin] Merge branch 'master' into revert-object-pool 657855f [Reynold Xin] Temp commit. c20f2c8 [Reynold Xin] Style fix. fe37079 [Reynold Xin] Revert "[SPARK-8579] [SQL] support arbitrary object in UnsafeRow"
-
Josh Rosen authored
When a task fails with an exception and also fails to properly clean up its managed memory, the `spark.unsafe.exceptionOnMemoryLeak` memory leak detection mechanism's exceptions will mask the original exception that caused the task to fail. We should throw the memory leak exception only if no other exception occurred. Author: Josh Rosen <joshrosen@databricks.com> Closes #7603 from JoshRosen/SPARK-9266 and squashes the following commits: c268cb5 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-9266 c1f0167 [Josh Rosen] Fix the error masking problem 448eae8 [Josh Rosen] Add regression test
-
Perinkulam I. Ganesh authored
[SPARK-8695] [CORE] [MLLIB] TreeAggregation shouldn't be triggered when it doesn't save wall-clock time. Author: Perinkulam I. Ganesh <gip@us.ibm.com> Closes #7397 from piganesh/SPARK-8695 and squashes the following commits: 041620c [Perinkulam I. Ganesh] [SPARK-8695][CORE][MLlib] TreeAggregation shouldn't be triggered when it doesn't save wall-clock time. 9ad067c [Perinkulam I. Ganesh] [SPARK-8695] [core] [WIP] TreeAggregation shouldn't be triggered for 5 partitions a6fed07 [Perinkulam I. Ganesh] [SPARK-8695] [core] [WIP] TreeAggregation shouldn't be triggered for 5 partitions
-