- Dec 22, 2014
-
-
Kostas Sakellis authored
This commit consolidates some of the exceptions thrown if compression codecs are not available. If a bad configuration string was passed in, a ClassNotFoundException was thrown. Also, if Snappy was not available, it would throw an InvocationTargetException when the codec was being used (not when it was being initialized). Now, an IllegalArgumentException is thrown when a codec is not available at creation time - either because the class does not exist or the codec itself is not available in the system. This will allow us to have a better message and fail faster. Author: Kostas Sakellis <kostas@cloudera.com> Closes #3119 from ksakellis/kostas-spark-4079 and squashes the following commits: 9709c7c [Kostas Sakellis] Removed unnecessary Logging class 63bfdd0 [Kostas Sakellis] Removed isAvailable to preserve binary compatibility 1d0ef2f [Kostas Sakellis] [SPARK-4079] [CORE] Added more information to exception 64f3d27 [Kostas Sakellis] [SPARK-4079] [CORE] Code review feedback 52dfa8f [Kostas Sakellis] [SPARK-4079] [CORE] Default to LZF if Snappy not available
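The fail-fast behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `Codec`, `PlainCodec`, `BrokenNativeCodec` and `CodecFactory` are hypothetical names, and the broken codec merely simulates a missing native library.

```scala
// Hypothetical stand-in for a compression codec interface; not Spark's actual trait.
trait Codec { def name: String }

// A codec that is always usable.
class PlainCodec extends Codec { def name: String = "plain" }

// A codec whose constructor fails, simulating a missing native library.
class BrokenNativeCodec extends Codec {
  def name: String = "broken"
  throw new UnsatisfiedLinkError("native library not found")
}

object CodecFactory {
  // Instantiate the codec by class name, converting both failure modes
  // into a single IllegalArgumentException raised at creation time.
  def create(className: String): Codec =
    try {
      Class.forName(className).getDeclaredConstructor().newInstance().asInstanceOf[Codec]
    } catch {
      case e: ClassNotFoundException =>
        throw new IllegalArgumentException(s"Codec [$className] is not available: class not found", e)
      case e: java.lang.reflect.InvocationTargetException =>
        // Constructor.newInstance wraps anything thrown by the constructor.
        throw new IllegalArgumentException(s"Codec [$className] is not available on this system", e.getCause)
    }
}
```

Callers then only need to handle one exception type, and a bad configuration string surfaces when the codec is created rather than on first use.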
-
Sandy Ryza authored
Author: Sandy Ryza <sandy@cloudera.com> Closes #3652 from sryza/sandy-spark-4447 and squashes the following commits: 2791158 [Sandy Ryza] Review feedback c23507b [Sandy Ryza] Strip margin from client arguments help string 18be7ba [Sandy Ryza] SPARK-4447
-
Takeshi Yamamuro authored
Add missing Javadoc comments in ShuffleDependency. Author: Takeshi Yamamuro <linguin.m.s@gmail.com> Closes #3594 from maropu/DependencyJavadocFix and squashes the following commits: 32129b4 [Takeshi Yamamuro] Fix comments in @aggregator and @mapSideCombine 303c75d [Takeshi Yamamuro] [SPARK-4733] Add missing prameter comments in ShuffleDependency
-
carlmartin authored
Use `val arr1 = (0 until num).toArray` instead of `val arr1 = new Array[Int](num)` followed by `for (i <- 0 until arr1.length) { arr1(i) = i }`, for brevity. Author: carlmartin <carlmartinmax@gmail.com> Closes #3750 from SaintBacchus/BroadcastTest and squashes the following commits: 43adb70 [carlmartin] Improve some code in BroadcastTest for short
-
zsxwing authored
Author: zsxwing <zsxwing@gmail.com> Closes #3734 from zsxwing/SPARK-4883 and squashes the following commits: e6f2b61 [zsxwing] Fix the name cc74727 [zsxwing] Add a name to the directoryCleaner thread
-
Zhang, Liye authored
Author: Zhang, Liye <liye.zhang@intel.com> Closes #3717 from liyezhang556520/version2Log and squashes the following commits: ccd30d7 [Zhang, Liye] delete log in sparkConf 330f70c [Zhang, Liye] move the log from SaprkConf to SparkContext 96dc115 [Zhang, Liye] remove curly brace e833330 [Zhang, Liye] add spark version to driver log
-
Tsuyoshi Ozawa authored
Author: Tsuyoshi Ozawa <ozawa.tsuyoshi@lab.ntt.co.jp> Closes #3757 from oza/SPARK-4915 and squashes the following commits: 3b0d6d6 [Tsuyoshi Ozawa] Fix classname to be specified for external shuffle service.
-
zsxwing authored
Reuse Text in saveAsTextFile to reduce GC. /cc rxin Author: zsxwing <zsxwing@gmail.com> Closes #3762 from zsxwing/SPARK-4918 and squashes the following commits: 59f03eb [zsxwing] Reuse Text in saveAsTextFile
-
zsxwing authored
`NullWritable` is a `Comparable` rather than `Comparable[NullWritable]` in Hadoop 1.+, so the compiler cannot find an implicit Ordering for it. It will generate different anonymous classes for `saveAsTextFile` in Hadoop 1.+ and Hadoop 2.+. Therefore, here we provide an Ordering for NullWritable so that the compiler will generate the same code. I used the following commands to confirm that the generated byte codes are the same.
```
mvn -Dhadoop.version=1.2.1 -DskipTests clean package -pl core -am
javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop1.txt
mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package -pl core -am
javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop2.txt
diff ~/hadoop1.txt ~/hadoop2.txt
```
However, the compiler will generate different code for the classes which call methods of `JobContext/TaskAttemptContext`. `JobContext/TaskAttemptContext` is a class in Hadoop 1.+, and calling its method will use `invokevirtual`, while it's an interface in Hadoop 2.+, and will use `invokeinterface`. To fix it, we can use reflection to call `JobContext/TaskAttemptContext.getConfiguration`. Author: zsxwing <zsxwing@gmail.com> Closes #3740 from zsxwing/SPARK-2075 and squashes the following commits: 39d9df2 [zsxwing] Fix the code style e4ad8b5 [zsxwing] Use null for the implicit Ordering 734bac9 [zsxwing] Explicitly set the implicit parameters ca03559 [zsxwing] Use reflection to access JobContext/TaskAttemptContext.getConfiguration fa40db0 [zsxwing] Add an Ordering for NullWritable to make the compiler generate same byte codes for RDD
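The reflection approach mentioned above can be illustrated with a small sketch. It does not use Hadoop at all: `ContextAsClass`, `ContextAsInterface` and `ContextImpl` are hypothetical stand-ins for a type that is a class in one library version and an interface in another. The call site only refers to `java.lang.reflect.Method`, so it should compile to the same byte code in both cases.

```scala
object ReflectiveAccess {
  // Hypothetical stand-in: the context type is a concrete class.
  class ContextAsClass { def getConfiguration: String = "conf-from-class" }

  // Hypothetical stand-in: the context type is an interface with an implementation.
  trait ContextAsInterface { def getConfiguration: String }
  class ContextImpl extends ContextAsInterface {
    def getConfiguration: String = "conf-from-interface"
  }

  // Look the method up by name at runtime instead of calling it directly,
  // so the caller has no compile-time dependency on class vs. interface.
  def getConfiguration(context: AnyRef): AnyRef = {
    val method = context.getClass.getMethod("getConfiguration")
    method.invoke(context)
  }
}
```

The trade-off is the usual one for reflection: the call is no longer checked by the compiler and is somewhat slower, which is acceptable for an infrequent lookup.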
-
- Dec 21, 2014
-
-
Sean Owen authored
Fix small Hadoop 1 compile error from SPARK-2261. In Hadoop 1.x, all we have is FileStatus.isDir, so these "is file" assertions are changed to "is not a dir". This is how similar checks are done so far in the code base. Author: Sean Owen <sowen@cloudera.com> Closes #3754 from srowen/SPARK-4910 and squashes the following commits: 52c5e4e [Sean Owen] Fix small Hadoop 1 compile error from SPARK-2261
-
- Dec 20, 2014
-
-
huangzhaowei authored
Maven build failed: value defaultProperties not found. Maybe related to this PR: https://github.com/apache/spark/commit/1d648123a77bbcd9b7a34cc0d66c14fa85edfecd andrewor14 can you look at this problem? Author: huangzhaowei <carlmartinmax@gmail.com> Closes #3749 from SaintBacchus/Mvn-Build-Fail and squashes the following commits: 8e2917c [huangzhaowei] Build Failed: value defaultProperties not found
-
- Dec 19, 2014
-
-
Andrew Or authored
Once the external shuffle service is also documented, the dynamic allocation section will link to it. Let me know if the whole dynamic allocation should be moved to its separate page; I personally think the organization might be cleaner that way. This patch builds on top of oza's work in #3689. aarondav pwendell Author: Andrew Or <andrew@databricks.com> Author: Tsuyoshi Ozawa <ozawa.tsuyoshi@gmail.com> Closes #3731 from andrewor14/document-dynamic-allocation and squashes the following commits: 1281447 [Andrew Or] Address a few comments b9843f2 [Andrew Or] Document the configs as well 246fb44 [Andrew Or] Merge branch 'SPARK-4839' of github.com:oza/spark into document-dynamic-allocation 8c64004 [Andrew Or] Add documentation for dynamic allocation (without configs) 6827b56 [Tsuyoshi Ozawa] Fixing a documentation of spark.dynamicAllocation.enabled. 53cff58 [Tsuyoshi Ozawa] Adding a documentation about dynamic resource allocation.
-
Daniel Darabos authored
My guess for fixing https://issues.apache.org/jira/browse/SPARK-4831. Author: Daniel Darabos <darabos.daniel@gmail.com> Closes #3678 from darabos/patch-1 and squashes the following commits: 36e1243 [Daniel Darabos] Do not include SPARK_CLASSPATH if empty.
-
Kanwaljit Singh authored
Since we can set Spark executor memory and executor cores using a properties file, we should also be able to set the number of executor instances. Author: Kanwaljit Singh <kanwaljit.singh@guavus.com> Closes #1657 from kjsingh/branch-1.0 and squashes the following commits: d8a5a12 [Kanwaljit Singh] SPARK-2641: Fixing how spark arguments are loaded from properties file for num executors Conflicts: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
-
Masayoshi TSUZUKI authored
Added a module equivalent to utils.sh and modified spark-shell2.cmd to use it to parse options. Now we can use application options, e.g. `bin\spark-shell.cmd --master spark://master:7077 -i path\to\script.txt` Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #3350 from tsudukim/feature/SPARK-3060 and squashes the following commits: 4551e56 [Masayoshi TSUZUKI] Modified too long line which defines the submission options to pass findstr command. 3a11361 [Masayoshi TSUZUKI] [SPARK-3060] spark-shell.cmd doesn't accept application options in Windows OS
-
Eran Medan authored
The signature of registerKryoClasses actually takes an Array[Class[_]], not a Seq. Author: Eran Medan <ehrann.mehdan@gmail.com> Closes #3747 from eranation/patch-1 and squashes the following commits: ee9885d [Eran Medan] change signature of example to match released code
-
Marcelo Vanzin authored
Currently the event logger uses a directory and several files to describe an app's event log, all but one of which are empty. This is not very HDFS-friendly, since creating lots of nodes in HDFS (especially when they don't contain any data) is frowned upon due to the node metadata being kept in the NameNode's memory. Instead, add a header section to the event log file that contains metadata needed to read the events. This metadata includes things like the Spark version (for future code that may need it for backwards compatibility) and the compression codec used for the event data. With the new approach, aside from reducing the load on the NN, there are also far fewer remote calls needed when reading the log directory. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #1222 from vanzin/hist-server-single-log and squashes the following commits: cc8f5de [Marcelo Vanzin] Store header in plain text. c7e6123 [Marcelo Vanzin] Update comment. 59c561c [Marcelo Vanzin] Review feedback. 216c5a3 [Marcelo Vanzin] Review comments. dce28e9 [Marcelo Vanzin] Fix log overwrite test. f91c13e [Marcelo Vanzin] Handle "spark.eventLog.overwrite", and add unit test. 346f0b4 [Marcelo Vanzin] Review feedback. ed0023e [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 3f4500f [Marcelo Vanzin] Unit test for SPARK-3697. 45c7a1f [Marcelo Vanzin] Version of SPARK-3697 for this branch. b3ee30b [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log a6d5c50 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 16fd491 [Marcelo Vanzin] Use unique log directory for each codec. 0ef3f70 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log d93c44a [Marcelo Vanzin] Add a newline to make the header more readable. 9e928ba [Marcelo Vanzin] Add types. bd6ba8c [Marcelo Vanzin] Review feedback. 
a624a89 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 04364dc [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log bb7c2d3 [Marcelo Vanzin] Fix scalastyle warning. 16661a3 [Marcelo Vanzin] Simplify some internal code. cc6bce4 [Marcelo Vanzin] Some review feedback. a722184 [Marcelo Vanzin] Do not encode metadata in log file name. 3700586 [Marcelo Vanzin] Restore log flushing. f677930 [Marcelo Vanzin] Fix botched rebase. ae571fa [Marcelo Vanzin] Fix end-to-end event logger test. 9db0efd [Marcelo Vanzin] Show prettier name in UI. 8f42274 [Marcelo Vanzin] Make history server parse old-style log directories. 6251dd7 [Marcelo Vanzin] Make event logger use a single file.
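The single-file layout described in this commit (a metadata header followed by the events) could look roughly like the sketch below. The `key=value` header lines, the `HeaderEnd` marker and the object name are assumptions made for illustration; they are not taken from the patch, and the sketch works on in-memory strings rather than HDFS files.

```scala
object EventLogHeader {
  // Hypothetical marker separating the metadata header from the event lines.
  val HeaderEnd = "=== HEADER END ==="

  // Serialize metadata as key=value lines, then the marker, then the events.
  def write(meta: Map[String, String], events: Seq[String]): String = {
    val header = meta.toSeq.sortBy(_._1).map { case (k, v) => s"$k=$v" }
    (header ++ Seq(HeaderEnd) ++ events).mkString("\n")
  }

  // Split a log back into its metadata and its event lines.
  def read(log: String): (Map[String, String], Seq[String]) = {
    val lines = log.split("\n").toSeq
    val (header, rest) = lines.span(_ != HeaderEnd)
    val meta = header.map { line =>
      val parts = line.split("=", 2) // split on the first '=' only
      parts(0) -> parts(1)
    }.toMap
    (meta, rest.drop(1)) // drop the marker line itself
  }
}
```

A reader only has to open one file and parse a few plain-text lines before it knows, for example, which codec to use for the rest of the stream.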
-
Josh Rosen authored
This patch upgrades `spark-ec2`'s Boto version to 2.34.0, since this is blocking several features. Newer versions of Boto don't work properly when they're loaded from a zipfile since they try to read a JSON file from a path relative to the Boto library sources. Therefore, this patch also changes spark-ec2 to automatically download Boto from PyPi if it's not present in `SPARK_EC2_DIR/lib`, similar to what we do in the `sbt/sbt` script. This shouldn't be an issue for users since they already need to have an internet connection to launch an EC2 cluster. By performing the downloading in spark_ec2.py instead of the Bash script, this should also work for Windows users. I've tested this with Python 2.6, too. Author: Josh Rosen <joshrosen@databricks.com> Closes #3737 from JoshRosen/update-boto and squashes the following commits: 0aa43cc [Josh Rosen] Remove unused setup_standalone_cluster() method. f02935d [Josh Rosen] Enable Python deprecation warnings and fix one Boto warning: 587ae89 [Josh Rosen] [SPARK-4890] Upgrade Boto to 2.34.0; automatically download Boto from PyPi instead of packaging it
-
Ryan Williams authored
Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #2848 from ryan-williams/fetch-file and squashes the following commits: c14daff [Ryan Williams] Fix copy that was changed to a move inadvertently 8e39c16 [Ryan Williams] code review feedback 788ed41 [Ryan Williams] don’t redundantly overwrite executor JAR deps
-
Ryan Williams authored
Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #3736 from ryan-williams/hist and squashes the following commits: 421d8ff [Ryan Williams] add another random typo fix 76d6a4c [Ryan Williams] remove hdfs example a2d0f82 [Ryan Williams] code review feedback 9ca7629 [Ryan Williams] [SPARK-4889] update history server example cmds
-
Reynold Xin authored
This consolidates some code path and makes constructor arguments simpler for a few classes. Author: Reynold Xin <rxin@databricks.com> Closes #3738 from rxin/sparkEnvDepRefactor and squashes the following commits: 82e02cc [Reynold Xin] Fixed couple bugs. 217062a [Reynold Xin] Code review feedback. bd00af7 [Reynold Xin] Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor.
-
scwf authored
Author: scwf <wangfei1@huawei.com> Closes #3743 from scwf/abc and squashes the following commits: 7d98bc8 [scwf] removing spark-staging-1038
-
Cheng Hao authored
HiveInspectors.scala failed to compile with Hadoop 1, as BytesWritable.copyBytes is not available in Hadoop 1. Author: Cheng Hao <hao.cheng@intel.com> Closes #3742 from chenghao-intel/settable_oi_hotfix and squashes the following commits: bb04d1f [Cheng Hao] hot fix for ByteWritables.copyBytes
-
Sandy Ryza authored
Author: Sandy Ryza <sandy@cloudera.com> Closes #3684 from sryza/sandy-spark-3428 and squashes the following commits: cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing GC time metrics
-
- Dec 18, 2014
-
-
Liang-Chi Hsieh authored
The current version of `getCallSite` visits the collection of `StackTraceElement` twice. This is unnecessary, since we can do the work in a single pass. We also do not need to keep the filtered `StackTraceElement`s. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3532 from viirya/refactor_getCallSite and squashes the following commits: 62aa124 [Liang-Chi Hsieh] Fix style. e741017 [Liang-Chi Hsieh] Refactor getCallSite.
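A single-pass traversal of this kind might be sketched as follows. The `Frame` case class and the `isInternal` rule are hypothetical simplifications introduced here; the real method works on `StackTraceElement` and has more cases.

```scala
object CallSiteSketch {
  // Hypothetical simplified frame; real code would inspect StackTraceElement.
  case class Frame(className: String, method: String, line: Int)

  // Hypothetical rule for deciding whether a frame belongs to the framework.
  def isInternal(f: Frame): Boolean = f.className.startsWith("org.apache.spark.")

  // One pass over the stack (innermost frame first): remember the last internal
  // method seen before the first user frame, and stop at that user frame.
  def callSite(stack: Seq[Frame]): (String, Option[Frame]) = {
    var lastInternalMethod = "<unknown>"
    var firstUser: Option[Frame] = None
    val it = stack.iterator
    while (firstUser.isEmpty && it.hasNext) {
      val f = it.next()
      if (isInternal(f)) lastInternalMethod = f.method else firstUser = Some(f)
    }
    (lastInternalMethod, firstUser)
  }
}
```

No intermediate filtered collection is built; the loop keeps only two variables and exits as soon as the first user frame is found.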
-
RJ Nowling authored
...ta generators This patch adds: * Exponential, gamma, and log normal generators that wrap Apache Commons math3 to the private API * Functions for generating exponential, gamma, and log normal RDDs and vector RDDs * Tests for the above Author: RJ Nowling <rnowling@gmail.com> Closes #3680 from rnowling/spark4728 and squashes the following commits: 455f50a [RJ Nowling] Add tests for exponential, gamma, and log normal samplers to JavaRandomRDDsSuite 3e1134a [RJ Nowling] Fix val/var, unncessary creation of Distribution objects when setting seeds, and import line longer than line wrap limits 58f5b97 [RJ Nowling] Fix bounds in tests so they scale with variance, not stdev 84fd98d [RJ Nowling] Add more values for testing distributions. 9f96232 [RJ Nowling] [SPARK-4728] Add exponential, gamma, and log normal sampling to MLlib data generators
-
wangfei authored
Remove `Command` and use `RunnableCommand` instead. Author: wangfei <wangfei1@huawei.com> Author: scwf <wangfei1@huawei.com> Closes #3712 from scwf/cmd and squashes the following commits: 51a82f2 [wangfei] fix test failure 0e03be8 [wangfei] address comments 4033bed [scwf] remove CreateTableAsSelect in hivestrategy 5d20010 [wangfei] address comments 125f542 [scwf] factory command in spark sql
-
Cheng Hao authored
A Hive UDAF may create a customized object constructed by a SettableStructObjectInspector; supporting this is critical when integrating Hive UDAFs with the refactored UDAF interface. There is a performance issue in `wrap/unwrap` since more match cases were added; it will be addressed in another PR. Author: Cheng Hao <hao.cheng@intel.com> Closes #3429 from chenghao-intel/settable_oi and squashes the following commits: 9f0aff3 [Cheng Hao] update code style issues as feedbacks 2b0561d [Cheng Hao] Add more scala doc f5a40e8 [Cheng Hao] add scala doc 2977e9b [Cheng Hao] remove the timezone setting for test suite 3ed284c [Cheng Hao] fix the date type comparison f1b6749 [Cheng Hao] Update the comment 932940d [Cheng Hao] Add more unit test 72e4332 [Cheng Hao] Add settable StructObjectInspector support
-
ravipesala authored
Adding support to the partial aggregation of SumDistinct Author: ravipesala <ravindra.pesala@huawei.com> Closes #3348 from ravipesala/SPARK-2554 and squashes the following commits: fd28e4d [ravipesala] Fixed review comments e60e67f [ravipesala] Fixed test cases and made it as nullable 32fe234 [ravipesala] Supporting SumDistinct partial aggregation Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
-
YanTangZhai authored
[SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references The sql "select * from spark_test::for_test where abs(20141202) is not null" has predicates=List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)) and partitionKeyIds=AttributeSet(). PruningPredicates is List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)). Then the exception "java.lang.IllegalArgumentException: requirement failed: Partition pruning predicates only supported for partitioned tables." is thrown. The sql "select * from spark_test::for_test_partitioned_table where abs(20141202) is not null and type_id=11 and platform = 3" with partitioned key insert_date has predicates=List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202), (type_id#12 = 11), (platform#8 = 3)) and partitionKeyIds=AttributeSet(insert_date#24). PruningPredicates is List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)). 
Author: YanTangZhai <hakeemzhai@tencent.com> Author: yantangzhai <tyz0303@163.com> Closes #3556 from YanTangZhai/SPARK-4693 and squashes the following commits: 620ebe3 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references 37cfdf5 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references 70a3544 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references efa9b03 [YanTangZhai] Update HiveQuerySuite.scala 72accf1 [YanTangZhai] Update HiveQuerySuite.scala e572b9a [YanTangZhai] Update HiveStrategies.scala 6e643f8 [YanTangZhai] Merge pull request #11 from apache/master e249846 [YanTangZhai] Merge pull request #10 from apache/master d26d982 [YanTangZhai] Merge pull request #9 from apache/master 76d4027 [YanTangZhai] Merge pull request #8 from apache/master 03b62b0 [YanTangZhai] Merge pull request #7 from apache/master 8a00106 [YanTangZhai] Merge pull request #6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master
-
guowei2 authored
**sessionToActivePool** in **SparkSQLOperationManager** grows without bound, even as sessions expire. We should remove the pool value when the session is closed, even though not every session has an entry in **sessionToActivePool**. Author: guowei2 <guowei2@asiainfo.com> Closes #3617 from guowei2/SPARK-4756 and squashes the following commits: e9b97b8 [guowei2] fix compile bug with Shim12 cf0f521 [guowei2] Merge remote-tracking branch 'apache/master' into SPARK-4756 e070998 [guowei2] fix: remove active pool of the session when it expired
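The cleanup described here amounts to removing the map entry when a session closes. A minimal sketch, assuming a plain mutable map keyed by a string session id (the `SessionPools` object and its method names are hypothetical, not the actual `SparkSQLOperationManager` API):

```scala
import scala.collection.mutable

object SessionPools {
  // Hypothetical registry mapping a session id to its active scheduler pool name.
  private val sessionToActivePool = mutable.Map.empty[String, String]

  def setPool(sessionId: String, pool: String): Unit = {
    sessionToActivePool.put(sessionId, pool)
  }

  // Called when a session closes. Removing a missing key is a no-op, so this is
  // safe for sessions that never set a pool.
  def closeSession(sessionId: String): Unit = {
    sessionToActivePool.remove(sessionId)
  }

  def size: Int = sessionToActivePool.size
}
```

Without the `closeSession` hook, every session that ever set a pool would leave an entry behind for the lifetime of the server.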
-
Thu Kyaw authored
...arquetFile accept hadoop glob pattern in path. Author: Thu Kyaw <trk007@gmail.com> Closes #3407 from tkyaw/master and squashes the following commits: 19115ad [Thu Kyaw] Merge https://github.com/apache/spark ceded32 [Thu Kyaw] [SPARK-3928][SQL] Support wildcard matches on Parquet files. d322c28 [Thu Kyaw] [SPARK-3928][SQL] Support wildcard matches on Parquet files. ce677c6 [Thu Kyaw] [SPARK-3928][SQL] Support wildcard matches on Parquet files.
-
Cheng Hao authored
Add support for `GROUPING SETS`, `ROLLUP`, `CUBE` and the virtual column `GROUPING__ID`. More details on how to use `GROUPING SETS` can be found at: https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup https://issues.apache.org/jira/secure/attachment/12676811/grouping_set.pdf
The general idea of the implementation is:
1 Replace `ROLLUP` and `CUBE` with `GROUPING SETS`
2 Explode each input row, and then feed the results to `Aggregate`
* Each grouping set is represented as a bit mask over the `GroupBy Expression List`; for each bit, `1` means the expression is selected, otherwise `0` (left is the lower bit, and right is the higher bit in the `GroupBy Expression List`)
* Several projections are constructed according to the grouping sets, and within each projection (`Seq[Expression]`), we replace an expression with `Literal(null)` if it is not selected in the grouping set (based on the bit mask)
* Output Schema of `Explode` is `child.output :+ grouping__id`
* GroupBy Expressions of `Aggregate` is `GroupBy Expression List :+ grouping__id`
* Keep the `Aggregation expressions` the same for the `Aggregate`
The expression substitutions happen during logical plan analysis, so we benefit from the logical plan optimizations (e.g. expression constant folding and map-side aggregation). Only an `Explosive` operator is added to the physical plan, which explodes the rows according to the pre-set projections.
A known issue will be addressed in a follow-up PR:
* Optimization `ColumnPruning` is not supported yet for the `Explosive` node.
Author: Cheng Hao <hao.cheng@intel.com> Closes #1567 from chenghao-intel/grouping_sets and squashes the following commits: fe65fcc [Cheng Hao] Remove the extra space 3547056 [Cheng Hao] Add more doc and Simplify the Expand a7c869d [Cheng Hao] update code as feedbacks d23c672 [Cheng Hao] Add GroupingExpression to replace the Seq[Expression] 414b165 [Cheng Hao] revert the unnecessary changes ec276c6 [Cheng Hao] Support Rollup/Cube/GroupingSets
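The bit-mask expansion described in this commit can be sketched on plain values. This is an illustration only: it operates on a `Seq[Any]` of group-by values rather than Catalyst expressions, the object and method names are hypothetical, and it assumes the grouping id is simply the mask itself.

```scala
object GroupingSetsSketch {
  // Expand one row of group-by values into one output row per grouping set.
  // Bit i of a mask set to 1 means the i-th group-by expression is selected
  // (index 0, the leftmost expression, is the lowest bit).
  def expand(groupByValues: Seq[Any], masks: Seq[Int]): Seq[Seq[Any]] =
    masks.map { mask =>
      val projected = groupByValues.zipWithIndex.map { case (v, i) =>
        if ((mask & (1 << i)) != 0) v else null // unselected columns become null
      }
      projected :+ mask // assumption: the mask doubles as the grouping id
    }

  // ROLLUP over n expressions as grouping sets: every prefix, longest first.
  def rollupMasks(n: Int): Seq[Int] = (n to 0 by -1).map(k => (1 << k) - 1)

  // CUBE over n expressions as grouping sets: every subset.
  def cubeMasks(n: Int): Seq[Int] = 0 until (1 << n)
}
```

For a row with values `("x", "y")`, `expand` with `rollupMasks(2)` yields one row per grouping set `(a, b)`, `(a)` and `()`, which an ordinary aggregate can then group by all columns plus the grouping id.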
-
Andrew Or authored
This is so that the `ExecutorAllocationManager` does not take in the `SparkContext`, with all of its dependencies, as an argument. This prevents future developers of this class from tying it further to the `SparkContext`, which has really become quite a monstrous object. cc'ing pwendell who originally suggested this, and JoshRosen who may have thoughts about the trait mix-in style of `SparkContext`. Author: Andrew Or <andrew@databricks.com> Closes #3614 from andrewor14/dynamic-allocation-sc and squashes the following commits: 187070d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 59baf6c [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 347a348 [Andrew Or] Refactor SparkContext into ExecutorAllocationClient
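The shape of this refactoring, depending on a narrow trait rather than on a large context object, can be sketched as below. The trait and class names and their signatures are simplified and hypothetical; they are not the actual `ExecutorAllocationClient` API.

```scala
object AllocationSketch {
  // Narrow interface exposing only what the allocation manager needs
  // (hypothetical, simplified signatures).
  trait AllocationClient {
    def requestExecutors(numAdditional: Int): Boolean
    def killExecutor(executorId: String): Boolean
  }

  // The manager depends on the narrow trait, not on a large context object.
  class AllocationManager(client: AllocationClient) {
    def scaleUp(n: Int): Boolean = client.requestExecutors(n)
  }

  // A trivial in-memory client, handy for unit tests.
  class RecordingClient extends AllocationClient {
    var requested = 0
    def requestExecutors(numAdditional: Int): Boolean = { requested += numAdditional; true }
    def killExecutor(executorId: String): Boolean = true
  }
}
```

A side benefit of this style is testability: the manager can be exercised with a stub client, without constructing a full context.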
-
Aaron Davidson authored
This is used in NioBlockTransferService here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala#L66 Author: Aaron Davidson <aaron@databricks.com> Closes #3688 from aarondav/SPARK-4837 and squashes the following commits: ebd2007 [Aaron Davidson] [SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config
-
Ivan Vergiliev authored
Author: Ivan Vergiliev <ivan@leanplum.com> Closes #3605 from IvanVergiliev/change-serializer and squashes the following commits: a49b7cf [Ivan Vergiliev] Use serializer instead of closureSerializer in aggregate/foldByKey.
-
Madhu Siddalingaiah authored
Rewording was based on this discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-td9804.html This is the associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-4884 Author: Madhu Siddalingaiah <madhu@madhu.com> Closes #3722 from msiddalingaiah/master and squashes the following commits: 79e679f [Madhu Siddalingaiah] [DOC]: improve documentation 51d14b9 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 38faca4 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again) 332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code> cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions
-
Ernest authored
spark.locality.wait set to 100000 in examples/graphx/Analytics.scala. Should be left to the user. Author: Ernest <earneyzxl@gmail.com> Closes #3730 from Earne/SPARK-4880 and squashes the following commits: d79ed04 [Ernest] remove spark.locality.wait in Analytics
-
DB Tsai authored
The original test doesn't make sense since, if you step in, the lossSum is already NaN and the coefficients are diverging. That's because the step size is too large for SGD, so it doesn't work. The correct behavior is that you should get smaller coefficients than those obtained without regularization. Comparing the values using a relative error of 20000.0 doesn't make sense either. Author: DB Tsai <dbtsai@alpinenow.com> Closes #3735 from dbtsai/mlortestfix and squashes the following commits: b1a3c42 [DB Tsai] first commit
-
Ilya Ganelin authored
Hi all - cleaned up the code to get rid of the unused parameter and added some discussion of the ThreadPoolExecutor parameters to explain why we can use a single threadCount instead of providing a min/max. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #3664 from ilganeli/SPARK-3607C and squashes the following commits: 3c05690 [Ilya Ganelin] Updated documentation and refactored code to extract shared variables
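One plausible reading of the `ThreadPoolExecutor` discussion mentioned above: when the work queue is unbounded, the executor only creates threads beyond `corePoolSize` once the queue is full, which never happens, so a separate maximum has no effect. The sketch below demonstrates that behaviour with the standard `java.util.concurrent` classes; the `FixedPoolSketch` object and its method names are hypothetical and not taken from the patch.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

object FixedPoolSketch {
  // Core and maximum size are both threadCount: with an unbounded queue the
  // pool would never grow past the core size anyway.
  def newPool(threadCount: Int): ThreadPoolExecutor =
    new ThreadPoolExecutor(
      threadCount, threadCount, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue[Runnable]())

  // Submit more blocking tasks than threads and report the largest pool size reached.
  def largestPoolSizeUnderLoad(threadCount: Int, tasks: Int): Int = {
    val pool = newPool(threadCount)
    val release = new CountDownLatch(1)
    (1 to tasks).foreach { _ =>
      pool.execute(new Runnable { def run(): Unit = release.await() })
    }
    val largest = pool.getLargestPoolSize
    release.countDown() // let the blocked tasks finish
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    largest
  }
}
```

Extra tasks simply wait in the queue, so a single `threadCount` parameter fully determines the pool's concurrency.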
-