- Dec 19, 2014
-
-
Eran Medan authored
the signature of registerKryoClasses is actually of Array[Class[_]] not Seq Author: Eran Medan <ehrann.mehdan@gmail.com> Closes #3747 from eranation/patch-1 and squashes the following commits: ee9885d [Eran Medan] change signature of example to match released code
-
Marcelo Vanzin authored
Currently the event logger uses a directory and several files to describe an app's event log, all but one of which are empty. This is not very HDFS-friendly, since creating lots of nodes in HDFS (especially when they don't contain any data) is frowned upon due to the node metadata being kept in the NameNode's memory. Instead, add a header section to the event log file that contains metadata needed to read the events. This metadata includes things like the Spark version (for future code that may need it for backwards compatibility) and the compression codec used for the event data. With the new approach, aside from reducing the load on the NN, there's also a lot less remote calls needed when reading the log directory. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #1222 from vanzin/hist-server-single-log and squashes the following commits: cc8f5de [Marcelo Vanzin] Store header in plain text. c7e6123 [Marcelo Vanzin] Update comment. 59c561c [Marcelo Vanzin] Review feedback. 216c5a3 [Marcelo Vanzin] Review comments. dce28e9 [Marcelo Vanzin] Fix log overwrite test. f91c13e [Marcelo Vanzin] Handle "spark.eventLog.overwrite", and add unit test. 346f0b4 [Marcelo Vanzin] Review feedback. ed0023e [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 3f4500f [Marcelo Vanzin] Unit test for SPARK-3697. 45c7a1f [Marcelo Vanzin] Version of SPARK-3697 for this branch. b3ee30b [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log a6d5c50 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 16fd491 [Marcelo Vanzin] Use unique log directory for each codec. 0ef3f70 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log d93c44a [Marcelo Vanzin] Add a newline to make the header more readable. 9e928ba [Marcelo Vanzin] Add types. bd6ba8c [Marcelo Vanzin] Review feedback. a624a89 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 04364dc [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log bb7c2d3 [Marcelo Vanzin] Fix scalastyle warning. 16661a3 [Marcelo Vanzin] Simplify some internal code. cc6bce4 [Marcelo Vanzin] Some review feedback. a722184 [Marcelo Vanzin] Do not encode metadata in log file name. 3700586 [Marcelo Vanzin] Restore log flushing. f677930 [Marcelo Vanzin] Fix botched rebase. ae571fa [Marcelo Vanzin] Fix end-to-end event logger test. 9db0efd [Marcelo Vanzin] Show prettier name in UI. 8f42274 [Marcelo Vanzin] Make history server parse old-style log directories. 6251dd7 [Marcelo Vanzin] Make event logger use a single file.
-
Josh Rosen authored
This patch upgrades `spark-ec2`'s Boto version to 2.34.0, since this is blocking several features. Newer versions of Boto don't work properly when they're loaded from a zipfile since they try to read a JSON file from a path relative to the Boto library sources. Therefore, this patch also changes spark-ec2 to automatically download Boto from PyPi if it's not present in `SPARK_EC2_DIR/lib`, similar to what we do in the `sbt/sbt` script. This shouldn't ben an issue for users since they already need to have an internet connection to launch an EC2 cluster. By performing the downloading in spark_ec2.py instead of the Bash script, this should also work for Windows users. I've tested this with Python 2.6, too. Author: Josh Rosen <joshrosen@databricks.com> Closes #3737 from JoshRosen/update-boto and squashes the following commits: 0aa43cc [Josh Rosen] Remove unused setup_standalone_cluster() method. f02935d [Josh Rosen] Enable Python deprecation warnings and fix one Boto warning: 587ae89 [Josh Rosen] [SPARK-4890] Upgrade Boto to 2.34.0; automatically download Boto from PyPi instead of packaging it
-
Ryan Williams authored
Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #2848 from ryan-williams/fetch-file and squashes the following commits: c14daff [Ryan Williams] Fix copy that was changed to a move inadvertently 8e39c16 [Ryan Williams] code review feedback 788ed41 [Ryan Williams] don’t redundantly overwrite executor JAR deps
-
Ryan Williams authored
Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #3736 from ryan-williams/hist and squashes the following commits: 421d8ff [Ryan Williams] add another random typo fix 76d6a4c [Ryan Williams] remove hdfs example a2d0f82 [Ryan Williams] code review feedback 9ca7629 [Ryan Williams] [SPARK-4889] update history server example cmds
-
Reynold Xin authored
This consolidates some code path and makes constructor arguments simpler for a few classes. Author: Reynold Xin <rxin@databricks.com> Closes #3738 from rxin/sparkEnvDepRefactor and squashes the following commits: 82e02cc [Reynold Xin] Fixed couple bugs. 217062a [Reynold Xin] Code review feedback. bd00af7 [Reynold Xin] Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor.
-
scwf authored
Author: scwf <wangfei1@huawei.com> Closes #3743 from scwf/abc and squashes the following commits: 7d98bc8 [scwf] removing spark-staging-1038
-
Cheng Hao authored
HiveInspectors.scala failed in compiling with Hadoop 1, as the BytesWritable.copyBytes is not available in Hadoop 1. Author: Cheng Hao <hao.cheng@intel.com> Closes #3742 from chenghao-intel/settable_oi_hotfix and squashes the following commits: bb04d1f [Cheng Hao] hot fix for ByteWritables.copyBytes
-
Sandy Ryza authored
Author: Sandy Ryza <sandy@cloudera.com> Closes #3684 from sryza/sandy-spark-3428 and squashes the following commits: cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing GC time metrics
-
- Dec 18, 2014
-
-
Liang-Chi Hsieh authored
The current version of `getCallSite` visits the collection of `StackTraceElement` twice. However, it is unnecessary since we can perform our work with a single visit. We also do not need to keep filtered `StackTraceElement`. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3532 from viirya/refactor_getCallSite and squashes the following commits: 62aa124 [Liang-Chi Hsieh] Fix style. e741017 [Liang-Chi Hsieh] Refactor getCallSite.
-
RJ Nowling authored
...ta generators This patch adds: * Exponential, gamma, and log normal generators that wrap Apache Commons math3 to the private API * Functions for generating exponential, gamma, and log normal RDDs and vector RDDs * Tests for the above Author: RJ Nowling <rnowling@gmail.com> Closes #3680 from rnowling/spark4728 and squashes the following commits: 455f50a [RJ Nowling] Add tests for exponential, gamma, and log normal samplers to JavaRandomRDDsSuite 3e1134a [RJ Nowling] Fix val/var, unncessary creation of Distribution objects when setting seeds, and import line longer than line wrap limits 58f5b97 [RJ Nowling] Fix bounds in tests so they scale with variance, not stdev 84fd98d [RJ Nowling] Add more values for testing distributions. 9f96232 [RJ Nowling] [SPARK-4728] Add exponential, gamma, and log normal sampling to MLlib data generators
-
wangfei authored
Remove ```Command``` and use ```RunnableCommand``` instead. Author: wangfei <wangfei1@huawei.com> Author: scwf <wangfei1@huawei.com> Closes #3712 from scwf/cmd and squashes the following commits: 51a82f2 [wangfei] fix test failure 0e03be8 [wangfei] address comments 4033bed [scwf] remove CreateTableAsSelect in hivestrategy 5d20010 [wangfei] address comments 125f542 [scwf] factory command in spark sql
-
Cheng Hao authored
Hive UDAF may create an customized object constructed by SettableStructObjectInspector, this is critical when integrate Hive UDAF with the refactor-ed UDAF interface. Performance issue in `wrap/unwrap` since more match cases added, will do it in another PR. Author: Cheng Hao <hao.cheng@intel.com> Closes #3429 from chenghao-intel/settable_oi and squashes the following commits: 9f0aff3 [Cheng Hao] update code style issues as feedbacks 2b0561d [Cheng Hao] Add more scala doc f5a40e8 [Cheng Hao] add scala doc 2977e9b [Cheng Hao] remove the timezone setting for test suite 3ed284c [Cheng Hao] fix the date type comparison f1b6749 [Cheng Hao] Update the comment 932940d [Cheng Hao] Add more unit test 72e4332 [Cheng Hao] Add settable StructObjectInspector support
-
ravipesala authored
Adding support to the partial aggregation of SumDistinct Author: ravipesala <ravindra.pesala@huawei.com> Closes #3348 from ravipesala/SPARK-2554 and squashes the following commits: fd28e4d [ravipesala] Fixed review comments e60e67f [ravipesala] Fixed test cases and made it as nullable 32fe234 [ravipesala] Supporting SumDistinct partial aggregation Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
-
YanTangZhai authored
[SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references The sql "select * from spark_test::for_test where abs(20141202) is not null" has predicates=List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)) and partitionKeyIds=AttributeSet(). PruningPredicates is List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)). Then the exception "java.lang.IllegalArgumentException: requirement failed: Partition pruning predicates only supported for partitioned tables." is thrown. The sql "select * from spark_test::for_test_partitioned_table where abs(20141202) is not null and type_id=11 and platform = 3" with partitioned key insert_date has predicates=List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202), (type_id#12 = 11), (platform#8 = 3)) and partitionKeyIds=AttributeSet(insert_date#24). PruningPredicates is List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)). Author: YanTangZhai <hakeemzhai@tencent.com> Author: yantangzhai <tyz0303@163.com> Closes #3556 from YanTangZhai/SPARK-4693 and squashes the following commits: 620ebe3 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references 37cfdf5 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references 70a3544 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references efa9b03 [YanTangZhai] Update HiveQuerySuite.scala 72accf1 [YanTangZhai] Update HiveQuerySuite.scala e572b9a [YanTangZhai] Update HiveStrategies.scala 6e643f8 [YanTangZhai] Merge pull request #11 from apache/master e249846 [YanTangZhai] Merge pull request #10 from apache/master d26d982 [YanTangZhai] Merge pull request #9 from apache/master 76d4027 [YanTangZhai] Merge pull request #8 from apache/master 03b62b0 [YanTangZhai] Merge pull request #7 from apache/master 8a00106 [YanTangZhai] Merge pull request #6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master
-
guowei2 authored
**sessionToActivePool** in **SparkSQLOperationManager** grow infinitely, even as sessions expire. we should remove the pool value when the session closed, even though **sessionToActivePool** would not exist in all of sessions. Author: guowei2 <guowei2@asiainfo.com> Closes #3617 from guowei2/SPARK-4756 and squashes the following commits: e9b97b8 [guowei2] fix compile bug with Shim12 cf0f521 [guowei2] Merge remote-tracking branch 'apache/master' into SPARK-4756 e070998 [guowei2] fix: remove active pool of the session when it expired
-
Thu Kyaw authored
...arquetFile accept hadoop glob pattern in path. Author: Thu Kyaw <trk007@gmail.com> Closes #3407 from tkyaw/master and squashes the following commits: 19115ad [Thu Kyaw] Merge https://github.com/apache/spark ceded32 [Thu Kyaw] [SPARK-3928][SQL] Support wildcard matches on Parquet files. d322c28 [Thu Kyaw] [SPARK-3928][SQL] Support wildcard matches on Parquet files. ce677c6 [Thu Kyaw] [SPARK-3928][SQL] Support wildcard matches on Parquet files.
-
Cheng Hao authored
Add support for `GROUPING SETS`, `ROLLUP`, `CUBE` and the the virtual column `GROUPING__ID`. More details on how to use the `GROUPING SETS" can be found at: https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup https://issues.apache.org/jira/secure/attachment/12676811/grouping_set.pdf The generic idea of the implementations are : 1 Replace the `ROLLUP`, `CUBE` with `GROUPING SETS` 2 Explode each of the input row, and then feed them to `Aggregate` * Each grouping set are represented as the bit mask for the `GroupBy Expression List`, for each bit, `1` means the expression is selected, otherwise `0` (left is the lower bit, and right is the higher bit in the `GroupBy Expression List`) * Several of projections are constructed according to the grouping sets, and within each projection(Seq[Expression), we replace those expressions with `Literal(null)` if it's not selected in the grouping set (based on the bit mask) * Output Schema of `Explode` is `child.output :+ grouping__id` * GroupBy Expressions of `Aggregate` is `GroupBy Expression List :+ grouping__id` * Keep the `Aggregation expressions` the same for the `Aggregate` The expressions substitutions happen in Logic Plan analyzing, so we will benefit from the Logical Plan optimization (e.g. expression constant folding, and map side aggregation etc.), Only an `Explosive` operator added for Physical Plan, which will explode the rows according the pre-set projections. A known issue will be done in the follow up PR: * Optimization `ColumnPruning` is not supported yet for `Explosive` node. Author: Cheng Hao <hao.cheng@intel.com> Closes #1567 from chenghao-intel/grouping_sets and squashes the following commits: fe65fcc [Cheng Hao] Remove the extra space 3547056 [Cheng Hao] Add more doc and Simplify the Expand a7c869d [Cheng Hao] update code as feedbacks d23c672 [Cheng Hao] Add GroupingExpression to replace the Seq[Expression] 414b165 [Cheng Hao] revert the unnecessary changes ec276c6 [Cheng Hao] Support Rollup/Cube/GroupingSets
-
Andrew Or authored
This is such that the `ExecutorAllocationManager` does not take in the `SparkContext` with all of its dependencies as an argument. This prevents future developers of this class to tie down this class further with the `SparkContext`, which has really become quite a monstrous object. cc'ing pwendell who originally suggested this, and JoshRosen who may have thoughts about the trait mix-in style of `SparkContext`. Author: Andrew Or <andrew@databricks.com> Closes #3614 from andrewor14/dynamic-allocation-sc and squashes the following commits: 187070d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 59baf6c [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 347a348 [Andrew Or] Refactor SparkContext into ExecutorAllocationClient
-
Aaron Davidson authored
This is used in NioBlockTransferService here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala#L66 Author: Aaron Davidson <aaron@databricks.com> Closes #3688 from aarondav/SPARK-4837 and squashes the following commits: ebd2007 [Aaron Davidson] [SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config
-
Ivan Vergiliev authored
Author: Ivan Vergiliev <ivan@leanplum.com> Closes #3605 from IvanVergiliev/change-serializer and squashes the following commits: a49b7cf [Ivan Vergiliev] Use serializer instead of closureSerializer in aggregate/foldByKey.
-
Madhu Siddalingaiah authored
Rewording was based on this discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-td9804.html This is the associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-4884 Author: Madhu Siddalingaiah <madhu@madhu.com> Closes #3722 from msiddalingaiah/master and squashes the following commits: 79e679f [Madhu Siddalingaiah] [DOC]: improve documentation 51d14b9 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 38faca4 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again) 332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code> cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions
-
Ernest authored
spark.locality.wait set to 100000 in examples/graphx/Analytics.scala. Should be left to the user. Author: Ernest <earneyzxl@gmail.com> Closes #3730 from Earne/SPARK-4880 and squashes the following commits: d79ed04 [Ernest] remove spark.locality.wait in Analytics
-
DB Tsai authored
The original test doesn't make sense since if you step in, the lossSum is already NaN, and the coefficients are diverging. That's because the step size is too large for SGD, so it doesn't work. The correct behavior is that you should get smaller coefficients than the one without regularization. Comparing the values using 20000.0 relative error doesn't make sense as well. Author: DB Tsai <dbtsai@alpinenow.com> Closes #3735 from dbtsai/mlortestfix and squashes the following commits: b1a3c42 [DB Tsai] first commit
-
Ilya Ganelin authored
Hi all - cleaned up the code to get rid of the unused parameter and added some discussion of the ThreadPoolExecutor parameters to explain why we can use a single threadCount instead of providing a min/max. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #3664 from ilganeli/SPARK-3607C and squashes the following commits: 3c05690 [Ilya Ganelin] Updated documentation and refactored code to extract shared variables
-
Timothy Chen authored
Author: Timothy Chen <tnachen@gmail.com> Closes #3349 from tnachen/mesos_doc and squashes the following commits: 737ef49 [Timothy Chen] Add TOC 5ca546a [Timothy Chen] Update description around cores requested. 26283a5 [Timothy Chen] Add mesos specific configurations into doc
-
Sandy Ryza authored
... changed to a time period Author: Sandy Ryza <sandy@cloudera.com> Closes #3471 from sryza/sandy-spark-3779 and squashes the following commits: 20b9887 [Sandy Ryza] Deprecate old property 42b5df7 [Sandy Ryza] Review feedback 9a959a1 [Sandy Ryza] SPARK-3779. yarn spark.yarn.applicationMaster.waitTries config should be changed to a time period
-
Zhan Zhang authored
Currently, there is no way to pass yarn am specific java options. It cause some potential issues when reading classpath from hadoop configuration file. Hadoop configuration actually replace variables in its property with the system property passed in java options. How to specify the value depends on different hadoop distribution. The new options are SPARK_YARN_JAVA_OPTS or spark.yarn.extraJavaOptions. I make it as spark global level, because typically we don't want user to specify this in their command line each time submitting spark job after it is setup in spark-defaults.conf. In addition, with this new extra options enabled to be passed to AM, it provides more flexibility. For example int the following valid mapred-site.xml file, we have the class path which specify values using system property. Hadoop can correctly handle it because it has java options passed in. This is the example, currently spark will break due to hadoop.version is not passed in. <property> <name>mapreduce.application.classpath</name> <value>/etc/hadoop/${hadoop.version}/mapreduce/*</value> </property> In the meantime, we cannot relies on mapreduce.admin.map.child.java.opts in mapred-site.xml, because it has its own extra java options specified, which does not apply to Spark. Author: Zhan Zhang <zhazhan@gmail.com> Closes #3409 from zhzhan/Spark-4461 and squashes the following commits: daec3d0 [Zhan Zhang] solve review comments 08f44a7 [Zhan Zhang] add warning in driver mode if spark.yarn.am.extraJavaOptions is configured 5a505d3 [Zhan Zhang] solve review comments 4ed43ad [Zhan Zhang] solve review comments ad777ed [Zhan Zhang] Merge branch 'master' into Spark-4461 3e9e574 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark e3f9abe [Zhan Zhang] solve review comments 8963552 [Zhan Zhang] rebase f8f6700 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark dea1692 [Zhan Zhang] change the option key name to client mode specific 90d5dff [Zhan Zhang] rebase 8ac9254 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 092a25f [Zhan Zhang] solve review comments bc5a9ae [Zhan Zhang] solve review comments 782b014 [Zhan Zhang] add new configuration to docs/running-on-yarn.md and remove it from spark-defaults.conf.template 6faaa97 [Zhan Zhang] solve review comments 369863f [Zhan Zhang] clean up unnecessary var 733de9c [Zhan Zhang] Merge branch 'master' into Spark-4461 a68e7f0 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 864505a [Zhan Zhang] Add extra java options to be passed to Yarn application master 15830fc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 685d911 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 03ebad3 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark 46d9e3d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark ebb213a [Zhan Zhang] revert b983ef3 [Zhan Zhang] test c4efb9b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 779d67b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 4daae6d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 12e1be5 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark ce0ca7b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 93f3081 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 3764505 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark a9d372b [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark a00f60f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark f6a8a40 [Zhan Zhang] revert ba14f28 [Zhan Zhang] test
-
- Dec 17, 2014
-
-
lewuathe authored
Modify python annotations for sphinx. There is no change to build process from. https://github.com/apache/spark/blob/master/docs/README.md Author: lewuathe <lewuathe@me.com> Closes #3685 from Lewuathe/sphinx-tag-for-pydoc and squashes the following commits: 88a0fd9 [lewuathe] [SPARK-4822] Fix DevelopApi and WARN tags 3d7a398 [lewuathe] [SPARK-4822] Use sphinx tags for Python doc annotations
-
Patrick Wendell authored
This commit exists to close the following pull requests on Github: Closes #3137 (close requested by 'marmbrus') Closes #3362 (close requested by 'marmbrus') Closes #2979 (close requested by 'JoshRosen') Closes #2223 (close requested by 'JoshRosen') Closes #2998 (close requested by 'marmbrus') Closes #3202 (close requested by 'marmbrus') Closes #3079 (close requested by 'marmbrus') Closes #3210 (close requested by 'marmbrus') Closes #2764 (close requested by 'marmbrus') Closes #3618 (close requested by 'marmbrus') Closes #3501 (close requested by 'marmbrus') Closes #2768 (close requested by 'marmbrus') Closes #3381 (close requested by 'marmbrus') Closes #3510 (close requested by 'marmbrus') Closes #3703 (close requested by 'marmbrus') Closes #2543 (close requested by 'marmbrus') Closes #2876 (close requested by 'marmbrus') Closes #1281 (close requested by 'JoshRosen')
-
Venkata Ramana Gollamudi authored
[SPARK-3891][SQL] Add array support to percentile, percentile_approx and constant inspectors support Supported passing array to percentile and percentile_approx UDAFs To support percentile_approx, constant inspectors are supported for GenericUDAF Constant folding support added to CreateArray expression Avoided constant udf expression re-evaluation Author: Venkata Ramana G <ramana.gollamudihuawei.com> Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com> Closes #2802 from gvramana/percentile_array_support and squashes the following commits: a0182e5 [Venkata Ramana Gollamudi] fixed review comment a18f917 [Venkata Ramana Gollamudi] avoid constant udf expression re-evaluation - fixes failure due to return iterator and value type mismatch c46db0f [Venkata Ramana Gollamudi] Removed TestHive reset 4d39105 [Venkata Ramana Gollamudi] Unified inspector creation, style check fixes f37fd69 [Venkata Ramana Gollamudi] Fixed review comments 47f6365 [Venkata Ramana Gollamudi] fixed test cb7c61e [Venkata Ramana Gollamudi] Supported ConstantInspector for UDAF Fixed HiveUdaf wrap object issue. 7f94aff [Venkata Ramana Gollamudi] Added foldable support to CreateArray
-
Cheng Hao authored
``` TestSQLContext.sparkContext.parallelize( """{"ip":"27.31.100.29","headers":{"Host":"1.abc.com","Charset":"UTF-8"}}""" :: """{"ip":"27.31.100.29","headers":{}}""" :: """{"ip":"27.31.100.29","headers":""}""" :: Nil) ``` As empty string (the "headers") will be considered as String in the beginning (in line 2 and 3), it ignores the real nested data type (struct type "headers" in line 1), and also take the line 1 (the "headers") as String Type, which is not our expected. Author: Cheng Hao <hao.cheng@intel.com> Closes #3708 from chenghao-intel/json and squashes the following commits: e7a72e9 [Cheng Hao] add more concise unit test 853de51 [Cheng Hao] NullType instead of StringType when sampling against empty string or null value
-
Michael Armbrust authored
Author: Michael Armbrust <michael@databricks.com> Closes #3727 from marmbrus/parquetNotEq and squashes the following commits: 2157bfc [Michael Armbrust] Fix parquet filter suite
-
Joseph K. Bradley authored
+ small doc edit + include edit to make IntelliJ happy CC: davies mengxr Note to davies -- this does not fix the "WARNING: Literal block expected; none found." warnings since that seems to involve spacing which IntelliJ does not like. (Those warnings occur when generating the Python docs.) Author: Joseph K. Bradley <joseph@databricks.com> Closes #3669 from jkbradley/python-warnings and squashes the following commits: 4587868 [Joseph K. Bradley] fixed warning 8cb073c [Joseph K. Bradley] Updated based on davies recommendation c51eca4 [Joseph K. Bradley] Updated rst file for pyspark.mllib.rand doc. Small doc edit. Small include edit to make IntelliJ happy.
-
Cheng Hao authored
In local mode, Hadoop/Hive will ignore the "mapred.map.tasks", hence for small table file, it's always a single input split, however, SparkSQL doesn't honor that in table scanning, and we will get different result when do the Hive Compatibility test. This PR will fix that. Author: Cheng Hao <hao.cheng@intel.com> Closes #2589 from chenghao-intel/source_split and squashes the following commits: dff38e7 [Cheng Hao] Remove the extra blank line 160a2b6 [Cheng Hao] fix the compiling bug 04d67f7 [Cheng Hao] Keep 1 split for small file in table scanning
-
Daoyuan Wang authored
Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #3616 from adrian-wang/sqrt and squashes the following commits: d877439 [Daoyuan Wang] fix NULLTYPE 3effa2c [Daoyuan Wang] sqrt(negative value) should return null
-
Cheng Lian authored
Predicates like `a = NULL` and `a < NULL` can't be pushed down since Parquet `Lt`, `LtEq`, `Gt`, `GtEq` doesn't accept null value. Note that `Eq` and `NotEq` can only be used with `null` to represent predicates like `a IS NULL` and `a IS NOT NULL`. However, normally this issue doesn't cause NPE because any value compared to `NULL` results `NULL`, and Spark SQL automatically optimizes out `NULL` predicate in the `SimplifyFilters` rule. Only testing code that intentionally disables the optimizer may trigger this issue. (That's why this issue is not marked as blocker and I do **NOT** think we need to backport this to branch-1.1 This PR restricts `Lt`, `LtEq`, `Gt` and `GtEq` to non-null values only, and only uses `Eq` with null value to pushdown `IsNull` and `IsNotNull`. Also, added support for Parquet `NotEq` filter for completeness and (tiny) performance gain, it's also used to pushdown `IsNotNull`. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3367) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #3367 from liancheng/filters-with-null and squashes the following commits: cc41281 [Cheng Lian] Fixes several styling issues de7de28 [Cheng Lian] Adds stricter rules for Parquet filters with null
-
Michael Armbrust authored
Based on #2543. Author: Michael Armbrust <michael@databricks.com> Closes #3724 from marmbrus/resolveGetField and squashes the following commits: 0a47aae [Michael Armbrust] Fix case insensitive resolution of GetField.
-
carlmartin authored
HiveThriftServer2 can not exit automactic when changing the standy resource manager in Yarn HA mode. The scheduler backend was aware of the AM had been exited so it call sc.stop to exit the driver process but there was a user thread(HiveThriftServer2 ) which was still alive and cause this problem. To fix it, make a demo thread to detect the sparkContext is null or not.If the sc is stopped, call the ThriftServer.stop to stop the user thread. Author: carlmartin <carlmartinmax@gmail.com> Closes #3576 from SaintBacchus/ThriftServer2ExitBug and squashes the following commits: 2890b4a [carlmartin] Use SparkListener instead of the demo thread to stop the hive server. c15da0e [carlmartin] HiveThriftServer2 can not exit automactic when changing the standy resource manager in Yarn HA mode
-
Cheng Hao authored
Add `sort by` support for both DSL & SqlParser. This PR is relevant with #3386, either one merged, will cause the other rebased. Author: Cheng Hao <hao.cheng@intel.com> Closes #3481 from chenghao-intel/sortby and squashes the following commits: 041004f [Cheng Hao] Add sort by for DSL & SimpleSqlParser
-