  1. Jan 22, 2016
  2. Jan 21, 2016
  3. Jan 20, 2016
    • Sun Rui's avatar
      [SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. · 1b2a918e
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10201 from sun-rui/SPARK-12204.
      1b2a918e
    • smishra8's avatar
      [SPARK-12910] Fixes : R version for installing sparkR · d7415991
      smishra8 authored
      Testing code:
      ```
      $ ./install-dev.sh
      USING R_HOME = /usr/bin
      ERROR: this R is version 2.15.1, package 'SparkR' requires R >= 3.0
      ```
      
      Using the new argument:
      ```
      $ ./install-dev.sh /content/username/SOFTWARE/R-3.2.3
      USING R_HOME = /content/username/SOFTWARE/R-3.2.3/bin
      * installing *source* package ‘SparkR’ ...
      ** R
      ** inst
      ** preparing package for lazy loading
      Creating a new generic function for ‘colnames’ in package ‘SparkR’
      Creating a new generic function for ‘colnames<-’ in package ‘SparkR’
      Creating a new generic function for ‘cov’ in package ‘SparkR’
      Creating a new generic function for ‘na.omit’ in package ‘SparkR’
      Creating a new generic function for ‘filter’ in package ‘SparkR’
      Creating a new generic function for ‘intersect’ in package ‘SparkR’
      Creating a new generic function for ‘sample’ in package ‘SparkR’
      Creating a new generic function for ‘transform’ in package ‘SparkR’
      Creating a new generic function for ‘subset’ in package ‘SparkR’
      Creating a new generic function for ‘summary’ in package ‘SparkR’
      Creating a new generic function for ‘lag’ in package ‘SparkR’
      Creating a new generic function for ‘rank’ in package ‘SparkR’
      Creating a new generic function for ‘sd’ in package ‘SparkR’
      Creating a new generic function for ‘var’ in package ‘SparkR’
      Creating a new generic function for ‘predict’ in package ‘SparkR’
      Creating a new generic function for ‘rbind’ in package ‘SparkR’
      Creating a generic function for ‘lapply’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘Filter’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘alias’ from package ‘stats’ in package ‘SparkR’
      Creating a generic function for ‘substr’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘%in%’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘mean’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘unique’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘nrow’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘ncol’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘head’ from package ‘utils’ in package ‘SparkR’
      Creating a generic function for ‘factorial’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘atan2’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘ifelse’ from package ‘base’ in package ‘SparkR’
      ** help
      No man pages found in package  ‘SparkR’
      *** installing help indices
      ** building package indices
      ** testing if installed package can be loaded
      * DONE (SparkR)
      
      ```
      
      Author: Shubhanshu Mishra <smishra8@illinois.edu>
      
      Closes #10836 from napsternxg/master.
      d7415991
    • Yin Huai's avatar
      d60f8d74
    • wangfei's avatar
      [SPARK-8968][SQL] external sort by the partition columns when dynamic... · 015c8efb
      wangfei authored
      [SPARK-8968][SQL] external sort by the partition columns when dynamic partitioning to optimize the memory overhead
      
      The hash-based writer for dynamic partitioning currently shows poor performance on big data, producing many small files and heavy GC. With this patch we do an external sort first, so that only one writer needs to be open at a time.
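      A conceptual sketch of the sort-then-write idea (a hedged illustration only; the function and types below are made up and are not the actual Spark writer code):
      ```scala
      import java.io.Writer

      // Rows arrive already sorted by their dynamic partition key, so at most one
      // partition writer is open at a time instead of one open writer per partition.
      def writeSorted(sortedRows: Iterator[(String, String)], openWriter: String => Writer): Unit = {
        var current: Option[(String, Writer)] = None
        for ((partition, line) <- sortedRows) {
          if (!current.exists(_._1 == partition)) {
            current.foreach(_._2.close())                       // finish the previous partition
            current = Some(partition -> openWriter(partition))  // open the next one
          }
          current.foreach(_._2.write(line + "\n"))
        }
        current.foreach(_._2.close())
      }
      ```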
      
      before this patch:
      ![gc](https://cloud.githubusercontent.com/assets/7018048/9149788/edc48c6e-3dec-11e5-828c-9995b56e4d65.PNG)
      
      after this patch:
      ![gc-optimize-externalsort](https://cloud.githubusercontent.com/assets/7018048/9149794/60f80c9c-3ded-11e5-8a56-7ae18ddc7a2f.png)
      
      Author: wangfei <wangfei_hello@126.com>
      Author: scwf <wangfei1@huawei.com>
      
      Closes #7336 from scwf/dynamic-optimize-basedon-apachespark.
      015c8efb
    • Davies Liu's avatar
      [SPARK-12797] [SQL] Generated TungstenAggregate (without grouping keys) · b362239d
      Davies Liu authored
      As discussed in #10786, the generated TungstenAggregate does not support imperative functions.
      
      For a query
      ```
      sqlContext.range(10).filter("id > 1").groupBy().count()
      ```
      
      The generated code will look like:
      ```
      /* 032 */     if (!initAgg0) {
      /* 033 */       initAgg0 = true;
      /* 034 */
      /* 035 */       // initialize aggregation buffer
      /* 037 */       long bufValue2 = 0L;
      /* 038 */
      /* 039 */
      /* 040 */       // initialize Range
      /* 041 */       if (!range_initRange5) {
      /* 042 */         range_initRange5 = true;
             ...
      /* 071 */       }
      /* 072 */
      /* 073 */       while (!range_overflow8 && range_number7 < range_partitionEnd6) {
      /* 074 */         long range_value9 = range_number7;
      /* 075 */         range_number7 += 1L;
      /* 076 */         if (range_number7 < range_value9 ^ 1L < 0) {
      /* 077 */           range_overflow8 = true;
      /* 078 */         }
      /* 079 */
      /* 085 */         boolean primitive11 = false;
      /* 086 */         primitive11 = range_value9 > 1L;
      /* 087 */         if (!false && primitive11) {
      /* 092 */           // do aggregate and update aggregation buffer
      /* 099 */           long primitive17 = -1L;
      /* 100 */           primitive17 = bufValue2 + 1L;
      /* 101 */           bufValue2 = primitive17;
      /* 105 */         }
      /* 107 */       }
      /* 109 */
      /* 110 */       // output the result
      /* 112 */       bufferHolder25.reset();
      /* 114 */       rowWriter26.initialize(bufferHolder25, 1);
      /* 118 */       rowWriter26.write(0, bufValue2);
      /* 120 */       result24.pointTo(bufferHolder25.buffer, bufferHolder25.totalSize());
      /* 121 */       currentRow = result24;
      /* 122 */       return;
      /* 124 */     }
      /* 125 */
      ```
      
      cc nongli
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10840 from davies/gen_agg.
      b362239d
    • Herman van Hovell's avatar
      [SPARK-12848][SQL] Change parsed decimal literal datatype from Double to Decimal · 10173279
      Herman van Hovell authored
      The current parser turns a decimal literal, for example ```12.1```, into a Double. The problem with this approach is that we convert an exact literal into a non-exact ```Double```. This PR changes that behavior: a decimal literal is now converted into an exact ```BigDecimal```.
      
      The behavior for scientific decimals, for example ```12.1e01```, is unchanged. This will be converted into a Double.
      
      This PR replaces the ```BigDecimal``` literal with a ```Double``` literal, because ```BigDecimal``` is now the default. You can write a double literal by appending a 'D' to the value, for instance: ```3.141527D```
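      A hedged illustration of the new behavior (assumes a SQLContext is available; the exact printed type names may differ):
      ```scala
      import org.apache.spark.sql.SQLContext

      def showLiteralTypes(sqlContext: SQLContext): Unit = {
        sqlContext.sql("SELECT 12.1 AS x").printSchema()      // x: decimal after this change (was double)
        sqlContext.sql("SELECT 12.1e01 AS y").printSchema()   // y: still double (scientific notation)
        sqlContext.sql("SELECT 3.141527D AS z").printSchema() // z: double via the explicit 'D' suffix
      }
      ```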
      
      cc davies rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10796 from hvanhovell/SPARK-12848.
      10173279
    • Wenchen Fan's avatar
      [SPARK-12888][SQL] benchmark the new hash expression · f3934a8d
      Wenchen Fan authored
      Benchmarked it on 4 different schemas; the results:
      ```
      Intel(R) Core(TM) i7-4960HQ CPU  2.60GHz
      Hash For simple:                   Avg Time(ms)    Avg Rate(M/s)  Relative Rate
      -------------------------------------------------------------------------------
      interpreted version                       31.47           266.54         1.00 X
      codegen version                           64.52           130.01         0.49 X
      ```
      
      ```
      Intel(R) Core(TM) i7-4960HQ CPU  2.60GHz
      Hash For normal:                   Avg Time(ms)    Avg Rate(M/s)  Relative Rate
      -------------------------------------------------------------------------------
      interpreted version                     4068.11             0.26         1.00 X
      codegen version                         1175.92             0.89         3.46 X
      ```
      
      ```
      Intel(R) Core(TM) i7-4960HQ CPU  2.60GHz
      Hash For array:                    Avg Time(ms)    Avg Rate(M/s)  Relative Rate
      -------------------------------------------------------------------------------
      interpreted version                     9276.70             0.06         1.00 X
      codegen version                        14762.23             0.04         0.63 X
      ```
      
      ```
      Intel(R) Core(TM) i7-4960HQ CPU  2.60GHz
      Hash For map:                      Avg Time(ms)    Avg Rate(M/s)  Relative Rate
      -------------------------------------------------------------------------------
      interpreted version                    58869.79             0.01         1.00 X
      codegen version                         9285.36             0.06         6.34 X
      ```
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10816 from cloud-fan/hash-benchmark.
      f3934a8d
    • gatorsmile's avatar
      [SPARK-12616][SQL] Making Logical Operator `Union` Support Arbitrary Number of Children · 8f90c151
      gatorsmile authored
      The existing `Union` logical operator only supports two children. This PR adds a new logical operator, `Unions`, which can have an arbitrary number of children, to replace the existing one.
      
      The `Union` logical plan is a binary node. However, a typical use case for union is to union a very large number of input sources (DataFrames, RDDs, or files); it is not uncommon to union hundreds of thousands of files. In this case, our optimizer can become very slow due to the large number of logical unions. We should change the Union logical plan to support an arbitrary number of children, and add a single rule in the optimizer to collapse all adjacent `Unions` into a single `Unions`. Note that this problem doesn't exist in the physical plan, because the physical `Unions` already supports an arbitrary number of children.
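      A hedged sketch of the motivating use case (the paths and read format are made up): user code like this builds a deep chain of binary unions, which the new optimizer rule can collapse into a single n-ary `Unions` node.
      ```scala
      import org.apache.spark.sql.{DataFrame, SQLContext}

      def unionMany(sqlContext: SQLContext, paths: Seq[String]): DataFrame =
        paths.map(p => sqlContext.read.parquet(p)).reduce((a, b) => a.unionAll(b))
      ```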
      
      Author: gatorsmile <gatorsmile@gmail.com>
      Author: xiaoli <lixiao1983@gmail.com>
      Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
      
      Closes #10577 from gatorsmile/unionAllMultiChildren.
      8f90c151
    • Shixiong Zhu's avatar
      [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project · b7d74a60
      Shixiong Zhu authored
      Include the following changes:
      
      1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
      2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
      3. Update the ActorWordCount example and add the JavaActorWordCount example
      4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10744 from zsxwing/streaming-akka-2.
      b7d74a60
    • Shixiong Zhu's avatar
      [SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all... · 944fdadf
      Shixiong Zhu authored
      [SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events
      
      Including the following changes:
      
      1. Add StreamingListenerForwardingBus, which processes WrappedStreamingListenerEvents received in `onOtherEvent` and forwards them to the StreamingListeners
      2. Remove StreamingListenerBus
      3. Merge AsynchronousListenerBus and LiveListenerBus to the same class LiveListenerBus
      4. Add `logEvent` method to SparkListenerEvent so that EventLoggingListener can use it to ignore WrappedStreamingListenerEvents
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10779 from zsxwing/streaming-listener.
      944fdadf
    • Takahashi Hiroshi's avatar
      [SPARK-10263][ML] Add @Since annotation to ml.param and ml.* · e3727c40
      Takahashi Hiroshi authored
      Add Since annotations to ml.param and ml.*
      
      Author: Takahashi Hiroshi <takahashi.hiroshi@lab.ntt.co.jp>
      Author: Hiroshi Takahashi <takahashi.hiroshi@lab.ntt.co.jp>
      
      Closes #8935 from taishi-oss/issue10263.
      e3727c40
    • Rajesh Balamohan's avatar
      [SPARK-12898] Consider having dummyCallSite for HiveTableScan · ab4a6bfd
      Rajesh Balamohan authored
      Currently, HiveTableScan runs with getCallSite, which is really expensive and shows up when scanning through large tables with partitions (e.g. TPC-DS), slowing down the overall runtime of the job. It would be good to consider having a dummyCallSite in HiveTableScan.
      
      Author: Rajesh Balamohan <rbalamohan@apache.org>
      
      Closes #10825 from rajeshbalamohan/SPARK-12898.
      ab4a6bfd
    • Rajesh Balamohan's avatar
      [SPARK-12925][SQL] Improve HiveInspectors.unwrap for StringObjectIns… · e75e340a
      Rajesh Balamohan authored
      Text is in UTF-8, and converting it via "UTF8String.fromString" incurs decoding and encoding, which turns out to be expensive and redundant. Profiler snapshot details are attached in the JIRA (ref: https://issues.apache.org/jira/secure/attachment/12783331/SPARK-12925_profiler_cpu_samples.png)
      
      Author: Rajesh Balamohan <rbalamohan@apache.org>
      
      Closes #10848 from rajeshbalamohan/SPARK-12925.
      e75e340a
    • Imran Younus's avatar
      [SPARK-12230][ML] WeightedLeastSquares.fit() should handle division by zero... · 9753835c
      Imran Younus authored
      [SPARK-12230][ML] WeightedLeastSquares.fit() should handle division by zero properly if standard deviation of target variable is zero.
      
      This fixes the behavior of WeightedLeastSquares.fit() when the standard deviation of the target variable is zero. If fitIntercept is true, there is no need to train.
      
      Author: Imran Younus <iyounus@us.ibm.com>
      
      Closes #10274 from iyounus/SPARK-12230_bug_fix_in_weighted_least_squares.
      9753835c
    • Gábor Lipták's avatar
      [SPARK-11295][PYSPARK] Add packages to JUnit output for Python tests · 9bb35c5b
      Gábor Lipták authored
      This is #9263 from gliptak (improving grouping/display of test case results) with a small fix to the bisecting k-means unit test.
      
      Author: Gábor Lipták <gliptak@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #10850 from mengxr/SPARK-11295.
      9bb35c5b
    • Yu ISHIKAWA's avatar
      [SPARK-6519][ML] Add spark.ml API for bisecting k-means · 9376ae72
      Yu ISHIKAWA authored
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #9604 from yu-iskw/SPARK-6519.
      9376ae72
    • Davies Liu's avatar
      [SPARK-12881] [SQL] subexpression elimination in mutable projection · 8e4f894e
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10814 from davies/mutable_subexpr.
      8e4f894e
    • Reynold Xin's avatar
      [SPARK-12912][SQL] Add a test suite for EliminateSubQueries · 753b1945
      Reynold Xin authored
      Also updated documentation to explain why ComputeCurrentTime and EliminateSubQueries are in the optimizer rather than analyzer.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10837 from rxin/optimizer-analyzer-comment.
      753b1945
  4. Jan 19, 2016
    • hyukjinkwon's avatar
      [SPARK-12871][SQL] Support to specify the option for compression codec. · 6844d36a
      hyukjinkwon authored
      https://issues.apache.org/jira/browse/SPARK-12871
      This PR adds an option to specify the compression codec.
      It adds the option `codec` as an alias for `compression`, as filed in [SPARK-12668](https://issues.apache.org/jira/browse/SPARK-12668).
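      A hedged usage sketch (the codec value and the CSV write path are assumptions; only the option names follow the description above):
      ```scala
      import org.apache.spark.sql.DataFrame

      def writeCompressedCsv(df: DataFrame, path: String): Unit =
        df.write
          .format("csv")
          .option("compression", "gzip")  // `codec` should work as an alias
          .save(path)
      ```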
      
      Note that I did not add configurations for Hadoop 1.x, as this `CsvRelation` uses the Hadoop 2.x API and I expect Hadoop 1.x support will be dropped.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #10805 from HyukjinKwon/SPARK-12420.
      6844d36a
    • felixcheung's avatar
      [SPARK-12232][SPARKR] New R API for read.table to avoid name conflict · 488bbb21
      felixcheung authored
      shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table`
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10406 from felixcheung/readtable.
      488bbb21
    • Xiangrui Meng's avatar
      beda9014
    • Sun Rui's avatar
      [SPARK-12337][SPARKR] Implement dropDuplicates() method of DataFrame in SparkR. · 3ac64828
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10309 from sun-rui/SPARK-12337.
      3ac64828
    • felixcheung's avatar
      [SPARK-12168][SPARKR] Add automated tests for conflicted function in R · 37fefa66
      felixcheung authored
      Currently this is what is reported when loading the SparkR package in R (we would probably also add is.nan):
      ```
      Loading required package: methods
      
      Attaching package: ‘SparkR’
      
      The following objects are masked from ‘package:stats’:
      
          cov, filter, lag, na.omit, predict, sd, var
      
      The following objects are masked from ‘package:base’:
      
          colnames, colnames<-, intersect, rank, rbind, sample, subset,
          summary, table, transform
      ```
      
      Adding this test gives an automated way to track changes to masked methods.
      Also, the second part of this test checks for those functions that would not be accessible without a namespace/package prefix.
      
      Incidentally, this might point to how we would fix those inaccessible functions in base or stats.
      Looking for feedback on adding this test.
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10171 from felixcheung/rmaskedtest.
      37fefa66
    • Reynold Xin's avatar
      [SPARK-12770][SQL] Implement rules for branch elimination for CaseWhen · 3e84ef0a
      Reynold Xin authored
      The three optimization cases are:
      
      1. If the first branch's condition is a true literal, remove the CaseWhen and use the value from that branch.
      2. If a branch's condition is a false or null literal, remove that branch.
      3. If only the else branch is left, remove the CaseWhen and use the value from the else branch.
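      A toy sketch of the three rules above (illustrative only, not the Catalyst implementation): a branch condition is modelled as Some(bool) for a boolean literal and None for anything non-literal.
      ```scala
      def simplify(branches: List[(Option[Boolean], String)],
                   elseValue: String): Either[String, List[(Option[Boolean], String)]] = {
        // Rule 2: drop branches whose condition is a false literal (Catalyst also drops null literals).
        val kept = branches.filterNot(_._1 == Some(false))
        kept match {
          case (Some(true), value) :: _ => Left(value)      // Rule 1: first remaining branch is literally true
          case Nil                      => Left(elseValue)  // Rule 3: only the else branch is left
          case remaining                => Right(remaining) // cannot fully simplify
        }
      }
      ```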
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10827 from rxin/SPARK-12770.
      3e84ef0a
    • BenFradet's avatar
      [SPARK-9716][ML] BinaryClassificationEvaluator should accept Double prediction column · f6f7ca9d
      BenFradet authored
      This PR aims to allow the prediction column of `BinaryClassificationEvaluator` to be of double type.
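      A hedged sketch of what this enables (the column names and metric are assumptions): the raw prediction column can now hold plain DoubleType scores rather than requiring a Vector.
      ```scala
      import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
      import org.apache.spark.sql.DataFrame

      def auc(predictions: DataFrame): Double =
        new BinaryClassificationEvaluator()
          .setRawPredictionCol("score")   // a DoubleType column of scores
          .setLabelCol("label")
          .setMetricName("areaUnderROC")
          .evaluate(predictions)
      ```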
      
      Author: BenFradet <benjamin.fradet@gmail.com>
      
      Closes #10472 from BenFradet/SPARK-9716.
      f6f7ca9d
    • scwf's avatar
      [SPARK-2750][WEB UI] Add https support to the Web UI · 43f1d59e
      scwf authored
      Author: scwf <wangfei1@huawei.com>
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      Author: w00228970 <wangfei1@huawei.com>
      
      Closes #10238 from vanzin/SPARK-2750.
      43f1d59e
    • Michael Armbrust's avatar
      [BUILD] Runner for spark packages · efd7eed3
      Michael Armbrust authored
      This is a convenience method added to the SBT build for developers, though if people think it's useful we could consider adding an official script that runs using the assembly instead of compiling on demand. It simply compiles Spark (without requiring an assembly) and invokes Spark Submit to download/run the package.
      
      Example Usage:
      ```
      $ build/sbt
      > sparkPackage com.databricks:spark-sql-perf_2.10:0.2.4 com.databricks.spark.sql.perf.RunBenchmark --help
      ```
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #10834 from marmbrus/sparkPackageRunner.
      efd7eed3
    • Gábor Lipták's avatar
      [SPARK-11295] Add packages to JUnit output for Python tests · c6f971b4
      Gábor Lipták authored
      SPARK-11295 Add packages to JUnit output for Python tests
      
      This improves grouping/display of test case results.
      
      Author: Gábor Lipták <gliptak@gmail.com>
      
      Closes #9263 from gliptak/SPARK-11295.
      c6f971b4
    • Jakob Odersky's avatar
      [SPARK-12816][SQL] De-alias type when generating schemas · c78e2080
      Jakob Odersky authored
      Call `dealias` on local types to fix schema generation for abstract type members, such as
      
      ```scala
      type KeyValue = (Int, String)
      ```
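      A hedged illustration of the kind of case this fixes (the object and alias names are made up): the schema is derived through a type alias rather than the underlying tuple type, which dealiasing makes equivalent to using (Int, String) directly.
      ```scala
      import org.apache.spark.sql.SQLContext

      object Aliases { type KeyValue = (Int, String) }

      def example(sqlContext: SQLContext): Unit = {
        import Aliases.KeyValue
        val df = sqlContext.createDataFrame(Seq[KeyValue]((1, "a"), (2, "b")))
        df.printSchema() // resolves to the schema of (Int, String): _1 int, _2 string
      }
      ```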
      
      Add simple test
      
      Author: Jakob Odersky <jodersky@gmail.com>
      
      Closes #10749 from jodersky/aliased-schema.
      c78e2080
    • Imran Rashid's avatar
      [SPARK-12560][SQL] SqlTestUtils.stripSparkFilter needs to copy utf8strings · 4dbd3161
      Imran Rashid authored
      See https://issues.apache.org/jira/browse/SPARK-12560
      
      This isn't causing any problems at the moment because the tests for string predicate pushdown are currently disabled. I ran into this while trying to turn them back on with a different version of Parquet. Figured it was good to fix now in any case.
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #10510 from squito/SPARK-12560.
      4dbd3161
    • gatorsmile's avatar
      [SPARK-12867][SQL] Nullability of Intersect can be stricter · b72e01e8
      gatorsmile authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-12867
      
      When intersecting one nullable column with one non-nullable column, the result will not contain any null. Thus, we can make the nullability of `intersect` stricter.
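      The rule in one line (a sketch, not the actual Catalyst code): an output attribute of Intersect only needs to be nullable if the corresponding attribute is nullable on both sides, since a null can only survive the intersection if both inputs can produce it.
      ```scala
      def intersectOutputNullable(leftNullable: Boolean, rightNullable: Boolean): Boolean =
        leftNullable && rightNullable
      ```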
      
      liancheng Could you please check if the code changes are appropriate? Also added test cases to verify the results. Thanks!
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #10812 from gatorsmile/nullabilityIntersect.
      b72e01e8
    • Feynman Liang's avatar
      [SPARK-12804][ML] Fix LogisticRegression with FitIntercept on all same label training data · 2388de51
      Feynman Liang authored
      CC jkbradley mengxr dbtsai
      
      Author: Feynman Liang <feynman.liang@gmail.com>
      
      Closes #10743 from feynmanliang/SPARK-12804.
      2388de51
    • Andrew Or's avatar
      [SPARK-12887] Do not expose var's in TaskMetrics · b122c861
      Andrew Or authored
      This is a step in implementing SPARK-10620, which migrates TaskMetrics to accumulators.
      
      TaskMetrics has a bunch of var's, some are fully public, some are `private[spark]`. This is bad coding style that makes it easy to accidentally overwrite previously set metrics. This has happened a few times in the past and caused bugs that were difficult to debug.
      
      Instead, we should have get-or-create semantics, which are more readily understandable. This makes sense in the case of TaskMetrics because these are just aggregated metrics that we want to collect throughout the task, so it doesn't matter who's incrementing them.
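      A minimal sketch of the get-or-create pattern described above (illustrative only; the class and field names are not the actual TaskMetrics code):
      ```scala
      class InputMetrics { var bytesRead: Long = 0L }

      class TaskMetrics {
        private var _inputMetrics: Option[InputMetrics] = None

        def inputMetrics: Option[InputMetrics] = _inputMetrics

        // Callers obtain the existing metrics object, or create it exactly once, instead of
        // assigning to a public var and possibly clobbering metrics set elsewhere.
        def registerInputMetrics(): InputMetrics = _inputMetrics.getOrElse {
          val created = new InputMetrics
          _inputMetrics = Some(created)
          created
        }
      }
      ```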
      
      Parent PR: #10717
      
      Author: Andrew Or <andrew@databricks.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: andrewor14 <andrew@databricks.com>
      
      Closes #10815 from andrewor14/get-or-create-metrics.
      b122c861