  1. Sep 06, 2017
    • [SPARK-19357][ML] Adding parallel model evaluation in ML tuning · 16c4c03c
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Modified `CrossValidator` and `TrainValidationSplit` to be able to evaluate models in parallel for a given parameter grid.  The level of parallelism is controlled by a parameter `numParallelEval` used to schedule a number of models to be trained/evaluated so that the jobs can be run concurrently.  This is a naive approach that does not check the cluster for needed resources, so care must be taken by the user to tune the parameter appropriately.  The default value is `1` which will train/evaluate in serial.
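The scheduling idea can be sketched outside Spark as a bounded thread pool over the parameter grid. This is a minimal Python sketch under stated assumptions, not the patch's implementation: `evaluate`, `select_best` and `grid` are hypothetical names, and only the `parallelism` argument mirrors the role of `numParallelEval`.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(params):
    """Hypothetical stand-in for 'fit a model and score it on validation data'."""
    # Toy metric: best when reg_param is 0.1 and max_iter is 20.
    return -abs(params["reg_param"] - 0.1) - abs(params["max_iter"] - 20) / 100.0


def select_best(grid, parallelism=1):
    """Evaluate every parameter map and return the best one plus all metrics.

    parallelism=1 evaluates serially; larger values bound the number of
    evaluations in flight at the same time.
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() preserves input order, so the result does not depend on timing.
        metrics = list(pool.map(evaluate, grid))
    best_index = max(range(len(grid)), key=metrics.__getitem__)
    return grid[best_index], metrics


grid = [
    {"reg_param": r, "max_iter": m}
    for r in (0.01, 0.1, 1.0)
    for m in (10, 20)
]

serial_best, serial_metrics = select_best(grid, parallelism=1)
parallel_best, parallel_metrics = select_best(grid, parallelism=4)
```

Because the pool only bounds concurrency and never inspects available resources, picking a sensible `parallelism` remains the caller's job, which matches the caveat in the description above.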
      ## How was this patch tested?
      Added unit tests for CrossValidator and TrainValidationSplit to verify that model selection is the same when run in serial vs parallel.  Manual testing to verify tasks run in parallel when param is > 1. Added parameter usage to relevant examples.
      Author: Bryan Cutler <>
      Closes #16774 from BryanCutler/parallel-model-eval-SPARK-19357.
  2. Sep 01, 2017
    • [SPARK-21729][ML][TEST] Generic test for ProbabilisticClassifier to ensure consistent output columns · 900f14f6
      WeichenXu authored
      ## What changes were proposed in this pull request?
      Add test for prediction using the model with all combinations of output columns turned on/off.
      Make sure the output column values match, presumably by comparing vs. the case with all 3 output columns turned on.
      ## How was this patch tested?
      Test updated.
      Author: WeichenXu <>
      Closes #19065 from WeichenXu123/generic_test_for_prob_classifier.
    • [SPARK-14280][BUILD][WIP] Update and pom.xml to add Scala 2.12 profiles and enable 2.12 compilation · 12ab7f7e
      Sean Owen authored
      …build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure
      ## What changes were proposed in this pull request?
      This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.
      In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.
      It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release.
      - Scalatest 2.x -> 3.0.3
      - Chill 0.8.0 -> 0.8.4
      - Clapper 1.0.x -> 1.1.2
      - json4s 3.2.x -> 3.4.2
      - Jackson 2.6.x -> 2.7.9 (required by json4s)
      This change does _not_ fully enable a Scala 2.12 build:
      - It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here
      - It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too.
      What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.
      ## How was this patch tested?
      Existing tests and build. Manually tested with `./dev/ 2.12` to verify it compiles, modulo the exceptions above.
      Author: Sean Owen <>
      Closes #18645 from srowen/SPARK-14280.
  3. Aug 31, 2017
  4. Aug 30, 2017
    • [SPARK-21469][ML][EXAMPLES] Adding Examples for FeatureHasher · 4133c1b0
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      This PR adds ML examples for the FeatureHasher transform in Scala, Java, Python.
      ## How was this patch tested?
      Manually ran examples and verified that output is consistent for different APIs
      Author: Bryan Cutler <>
      Closes #19024 from BryanCutler/ml-examples-FeatureHasher-SPARK-21810.
    • [SPARK-21806][MLLIB] BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading · 734ed7a7
      Sean Owen authored
      ## What changes were proposed in this pull request?
      Prepend (0, p) to the precision-recall curve instead of (0, 1), where p is the precision at the lowest recall point.
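To see why (0, 1) can mislead, consider a ranking whose top-scored example is a negative. The following Python sketch is illustrative only (the function name `pr_curve` and the toy data are assumptions, and it assumes at least one positive label); it builds the curve and prepends (0, p).

```python
def pr_curve(scores_and_labels):
    """Return (recall, precision) points, one per distinct score threshold.

    Thresholds are visited from the highest score down, so recall is
    non-decreasing along the curve. Labels are 0 or 1.
    """
    ranked = sorted(scores_and_labels, key=lambda sl: -sl[0])
    total_pos = sum(label for _, label in ranked)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    # Prepend (0, p): reuse the precision of the lowest-recall point
    # instead of assuming perfect precision at recall 0.
    first_precision = points[0][1]
    return [(0.0, first_precision)] + points


data = [(0.9, 0), (0.8, 1), (0.7, 1), (0.4, 0), (0.2, 1)]
curve = pr_curve(data)
```

Here the highest-scored example is a negative, so the lowest-recall point has precision 0 and the curve starts at (0, 0); starting it at (0, 1) would suggest perfect precision that the classifier never achieves.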
      ## How was this patch tested?
      Updated tests.
      Author: Sean Owen <>
      Closes #19038 from srowen/SPARK-21806.
  5. Aug 29, 2017
  6. Aug 28, 2017
    • [SPARK-17139][ML] Add model summary for MultinomialLogisticRegression · c7270a46
      Weichen Xu authored
      ## What changes were proposed in this pull request?
      Add 4 traits, using the following hierarchy:
      LogisticRegressionTrainingSummary: LogisticRegressionSummary
      BinaryLogisticRegressionSummary: LogisticRegressionSummary
      BinaryLogisticRegressionTrainingSummary: LogisticRegressionTrainingSummary, BinaryLogisticRegressionSummary
      Public methods such as `def summary` return only the trait types listed above.
      Then implement 4 concrete classes:
      LogisticRegressionSummaryImpl (multiclass case)
      LogisticRegressionTrainingSummaryImpl (multiclass case)
      BinaryLogisticRegressionSummaryImpl (binary case).
      BinaryLogisticRegressionTrainingSummaryImpl (binary case).
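The shape of this hierarchy can be mirrored with plain Python classes. This is an illustrative sketch, not the Spark or PySpark API: the class bodies are empty, and the `training_summary` factory with its `num_classes == 2` rule is a hypothetical stand-in for a public method that exposes only the interface types.

```python
class LogisticRegressionSummary:
    """Base interface, available for multiclass and binary models."""


class LogisticRegressionTrainingSummary(LogisticRegressionSummary):
    """Adds training-only information."""


class BinaryLogisticRegressionSummary(LogisticRegressionSummary):
    """Adds binary-only information."""


class BinaryLogisticRegressionTrainingSummary(
        LogisticRegressionTrainingSummary, BinaryLogisticRegressionSummary):
    """Combines the training and binary interfaces."""


class LogisticRegressionTrainingSummaryImpl(LogisticRegressionTrainingSummary):
    """Concrete class for the multiclass case."""


class BinaryLogisticRegressionTrainingSummaryImpl(
        BinaryLogisticRegressionTrainingSummary):
    """Concrete class for the binary case."""


def training_summary(num_classes):
    """Hypothetical factory: callers only rely on the interface types."""
    if num_classes == 2:
        return BinaryLogisticRegressionTrainingSummaryImpl()
    return LogisticRegressionTrainingSummaryImpl()


binary = training_summary(2)
multiclass = training_summary(3)
```

Callers that need binary-only metrics can check for `BinaryLogisticRegressionSummary`, while code written against `LogisticRegressionSummary` works for both cases.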
      ## How was this patch tested?
      Existing tests & added tests.
      Author: WeichenXu <>
      Closes #15435 from WeichenXu123/mlor_summary.
    • [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSummarizer.variance generate negative result · 0456b405
      WeichenXu authored
      ## What changes were proposed in this pull request?
      Because of numerical error, `MultivariateOnlineSummarizer.variance` can return a negative variance.
      **This is a serious bug because many algorithms in MLlib**
      **use the standard deviation computed from** `sqrt(variance)`**;**
      **a negative variance produces NaN and crashes the whole algorithm.**
      The bug can be reproduced with the following code:
          val summarizer1 = (new MultivariateOnlineSummarizer)
            .add(Vectors.dense(3.0), 0.7)
          val summarizer2 = (new MultivariateOnlineSummarizer)
            .add(Vectors.dense(3.0), 0.4)
          val summarizer3 = (new MultivariateOnlineSummarizer)
            .add(Vectors.dense(3.0), 0.5)
          val summarizer4 = (new MultivariateOnlineSummarizer)
            .add(Vectors.dense(3.0), 0.4)
          val summarizer = summarizer1
      This PR fixes the bug in `mllib.stat.MultivariateOnlineSummarizer.variance` and `ml.stat.SummarizerBuffer.variance`, and in several places in `WeightedLeastSquares`.
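The failure mode and one common guard can be sketched in a few lines of Python. This is not the summarizer's actual update formula: `weighted_variance` is a hypothetical helper using the cancellation-prone `E[x^2] - mean^2` shortcut, and clamping at zero is shown as one plausible guard, not as a statement of what the patch does. On the JVM, `math.sqrt` of a negative number yields NaN; Python's `math.sqrt` raises `ValueError` instead.

```python
import math


def weighted_variance(values, weights):
    """Weighted population variance via E[x^2] - mean^2, clamped at zero.

    For (near-)constant data the subtraction can land slightly below zero
    in floating point, even though a true variance is never negative.
    """
    weight_sum = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / weight_sum
    mean_sq = sum(w * x * x for x, w in zip(values, weights)) / weight_sum
    raw = mean_sq - mean * mean
    # Guard: treat tiny negative results as rounding noise.
    return max(raw, 0.0)


# Same values and weights as the reproduction above: constant data.
variance = weighted_variance([3.0, 3.0, 3.0, 3.0], [0.7, 0.4, 0.5, 0.4])
std = math.sqrt(variance)  # safe, because variance is never negative

# Without the guard, even a tiny negative input breaks the square root.
try:
    math.sqrt(-1e-17)
    sqrt_of_negative_failed = False
except ValueError:
    sqrt_of_negative_failed = True
```

Clamping keeps downstream code that takes `sqrt(variance)` well defined; it does not remove the rounding error itself.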
      ## How was this patch tested?
      test cases added.
      Author: WeichenXu <>
      Closes #19029 from WeichenXu123/fix_summarizer_var_bug.
  7. Aug 25, 2017
    • [MINOR][BUILD] Fix build warnings and Java lint errors · de7af295
      Sean Owen authored
      ## What changes were proposed in this pull request?
      Fix build warnings and Java lint errors. This just helps a bit in evaluating (new) warnings in another PR I have open.
      ## How was this patch tested?
      Existing tests
      Author: Sean Owen <>
      Closes #19051 from srowen/JavaWarnings.
  8. Aug 24, 2017
  9. Aug 22, 2017
    • [SPARK-12664][ML] Expose probability in mlp model · d6b30edd
      Weichen Xu authored
      ## What changes were proposed in this pull request?
      Modify the MLP model to inherit `ProbabilisticClassificationModel` so that it can expose the probability column when transforming data.
      ## How was this patch tested?
      Test added.
      Author: WeichenXu <>
      Closes #17373 from WeichenXu123/expose_probability_in_mlp_model.
    • [ML][MINOR] Make sharedParams update. · 34296190
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      ```sharedParams.scala``` is generated by ```SharedParamsCodeGen```, but the generated file in master is out of date, possibly because someone edited ```sharedParams.scala``` manually. This PR fixes that.
      ## How was this patch tested?
      Offline check.
      Author: Yanbo Liang <>
      Closes #19011 from yanboliang/sharedParams.
    • [SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd contains zero · d56c2621
      Weichen Xu authored
      ## What changes were proposed in this pull request?
      Fix a bug where MLOR does not work correctly when `featureStd` contains zero.
      The bug can be reproduced with a dataset like the following, which includes a zero-variance feature; training produces a wrong result (all coefficients become 0):
          val multinomialDatasetWithZeroVar = {
            val nPoints = 100
            val coefficients = Array(
              -0.57997, 0.912083, -0.371077,
              -0.16624, -0.84355, -0.048509)
            val xMean = Array(5.843, 3.0)
            val xVariance = Array(0.6856, 0.0)  // including zero variance
            val testData = generateMultinomialLogisticInput(
              coefficients, xMean, xVariance, addIntercept = true, nPoints, seed)
            val df = sc.parallelize(testData, 4).toDF().withColumn("weight", lit(1.0))
      ## How was this patch tested?
      testcase added.
      Author: WeichenXu <>
      Closes #18896 from WeichenXu123/fix_mlor_stdvalue_zero_bug.
  10. Aug 21, 2017
    • [SPARK-19762][ML][FOLLOWUP] Add necessary comments to L2Regularization. · c108a5d3
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      MLlib ```LinearRegression/LogisticRegression/LinearSVC``` always standardizes the data during training to improve the rate of convergence, regardless of whether _standardization_ is true or false. If _standardization_ is false, we reverse the standardization by penalizing each component differently, which yields effectively the same objective function as training on the unstandardized dataset. We should keep these comments in the code so that developers understand how this is handled.
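The "penalize each component differently" step can be made concrete with a small Python sketch. It is illustrative only: `l2_penalty` and its arguments are hypothetical names, and skipping zero-deviation features is an assumption. A coefficient trained against the standardized feature `x_j / std_j` corresponds to `coef_j / std_j` on the original scale, so the unstandardized penalty divides by `std_j` before squaring.

```python
def l2_penalty(scaled_coefs, feature_std, reg_param, standardization):
    """L2 penalty evaluated on coefficients of the standardized problem.

    scaled_coefs[j] multiplies x_j / feature_std[j], so the coefficient on
    the original scale is scaled_coefs[j] / feature_std[j].
    """
    total = 0.0
    for coef, std in zip(scaled_coefs, feature_std):
        if standardization:
            # Penalize the coefficient as trained (standardized scale).
            total += coef * coef
        elif std != 0.0:
            # Reverse standardization: penalize the original-scale coefficient.
            # Assumption: features with zero deviation contribute nothing.
            total += (coef / std) ** 2
    return 0.5 * reg_param * total


# Original-scale coefficients (2.0, -1.0) with deviations (0.5, 4.0)
# become (1.0, -4.0) on the standardized scale.
scaled = [1.0, -4.0]
stds = [0.5, 4.0]
penalty_original_scale = l2_penalty(scaled, stds, 0.1, standardization=False)
penalty_standardized = l2_penalty(scaled, stds, 0.1, standardization=True)
```

With `standardization=False` the result equals `0.5 * 0.1 * (2.0**2 + (-1.0)**2)`, the penalty one would compute directly on the original-scale coefficients.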
      ## How was this patch tested?
      Existing tests, only adding some comments in code.
      Author: Yanbo Liang <>
      Closes #18992 from yanboliang/SPARK-19762.
    • [SPARK-21468][PYSPARK][ML] Python API for FeatureHasher · 988b84d7
      Nick Pentreath authored
      Add Python API for `FeatureHasher` transformer.
      ## How was this patch tested?
      New doc test.
      Author: Nick Pentreath <>
      Closes #18970 from MLnick/SPARK-21468-pyspark-hasher.
  11. Aug 20, 2017
  12. Aug 16, 2017
    • [SPARK-21680][ML][MLLIB] optimize Vector compress · a0345cbe
      Peng Meng authored
      ## What changes were proposed in this pull request?
      Converting a Vector to a SparseVector with `Vector.compressed` is much slower than with `Vector.toSparse`.
      This is because `Vector.compressed` scans the values three times, whereas `Vector.toSparse` needs only two scans.
      When the vector is long, the performance difference between the two methods is significant.
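One way to avoid the redundant scan is to count the non-zeros once and hand that count to the sparse conversion. The Python sketch below illustrates that idea under stated assumptions: the tuple-based vector representation, the optional `nnz` argument, and the `1.5 * (nnz + 1) < size` size rule are all assumptions for illustration, not a description of the patch.

```python
def to_sparse(values, nnz=None):
    """Build (size, indices, non_zero_values); count non-zeros only if needed."""
    if nnz is None:
        nnz = sum(1 for v in values if v != 0.0)  # counting scan
    indices = [0] * nnz
    data = [0.0] * nnz
    k = 0
    for i, v in enumerate(values):  # filling scan
        if v != 0.0:
            indices[k] = i
            data[k] = v
            k += 1
    return len(values), indices, data


def compressed(values):
    """Pick the smaller representation, counting non-zeros exactly once."""
    nnz = sum(1 for v in values if v != 0.0)
    # Assumed size rule: sparse wins when 1.5 * (nnz + 1) < size.
    if 1.5 * (nnz + 1.0) < len(values):
        return ("sparse",) + to_sparse(values, nnz)  # reuse the count
    return ("dense", list(values))


mostly_zero = compressed([0.0, 0.0, 3.0, 0.0, 0.0, 0.0])
mostly_full = compressed([1.0, 2.0, 0.0])
```

Passing `nnz` through means the sparse path costs one counting scan plus one filling scan, the same as calling `to_sparse` directly.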
      ## How was this patch tested?
      Existing unit tests.
      Author: Peng Meng <>
      Closes #18899 from mpjlu/optVectorCompress.
    • [SPARK-13969][ML] Add FeatureHasher transformer · 0bb8d1f3
      Nick Pentreath authored
      This PR adds a `FeatureHasher` transformer, modeled on scikit-learn and Vowpal Wabbit.
      The transformer operates on multiple input columns in one pass. Current behavior is:
      * for numerical columns, the values are assumed to be real values and the feature index is `hash(columnName)` while feature value is `feature_value`
      * for string columns, the values are assumed to be categorical and the feature index is `hash(column_name=feature_value)`, while feature value is `1.0`
      * For hash collisions, feature values will be summed
      * `null` (missing) values are ignored
      The following dataframe illustrates the basic semantics:
      |int|double|float|stringNum|string|features                                 |
      |3  |4.0   |5.0  |1        |foo   |(16,[0,8,11,12,15],[5.0,3.0,1.0,4.0,1.0])|
      |6  |7.0   |8.0  |2        |bar   |(16,[0,8,11,12,15],[8.0,6.0,1.0,7.0,1.0])|
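The rules listed above translate almost directly into code. The following Python sketch is illustrative, not the transformer's implementation: `bucket` and `hash_features` are hypothetical names, and CRC32 is a stand-in hash chosen only because it ships with the standard library, so the resulting indices will not match the table above.

```python
import zlib


def bucket(text, num_features):
    """Map a string to a feature index. CRC32 is a stand-in hash (assumption)."""
    return zlib.crc32(text.encode("utf-8")) % num_features


def hash_features(row, num_features=16):
    """Hash one row (dict of column name -> value) into a sparse dict."""
    features = {}
    for column, value in row.items():
        if value is None:
            continue  # null (missing) values are ignored
        if isinstance(value, str):
            # Categorical: index from "column=value", indicator value 1.0.
            index, amount = bucket(f"{column}={value}", num_features), 1.0
        else:
            # Numeric: index from the column name, value kept as-is.
            index, amount = bucket(column, num_features), float(value)
        # Colliding indices are summed.
        features[index] = features.get(index, 0.0) + amount
    return features


row = {
    "int": 3,
    "double": 4.0,
    "float": 5.0,
    "stringNum": "1",
    "string": "foo",
    "missing": None,
}
hashed = hash_features(row)
```

Whatever the hash, the values always sum to 3 + 4 + 5 + 1 + 1 = 14 for this row, because collisions add values together instead of overwriting them. Note that `stringNum` holds the string `"1"`, so it is treated as categorical.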
      ## How was this patch tested?
      New unit tests and manual experiments.
      Author: Nick Pentreath <>
      Closes #18513 from MLnick/FeatureHasher.
    • [SPARK-21723][ML] Fix writing LibSVM (key not found: numFeatures) · 8321c141
      Jan Vrsovsky authored
      ## What changes were proposed in this pull request?
      Check the option "numFeatures" only when reading LibSVM, not when writing. When writing, Spark was raising an exception. After the change it will ignore the option completely. liancheng HyukjinKwon
      (Maybe the usage should be forbidden when writing, in a major version change?).
      ## How was this patch tested?
      Manual test, that loading and writing LibSVM files work fine, both with and without the numFeatures option.
      Author: Jan Vrsovsky <>
      Closes #18872 from ProtD/master.
  13. Aug 15, 2017
    • [SPARK-19634][ML] Multivariate summarizer - dataframes API · 07549b20
      WeichenXu authored
      ## What changes were proposed in this pull request?
      This patch adds the DataFrames API to the multivariate summarizer (mean, variance, etc.). In addition to all the features of MultivariateOnlineSummarizer, it also allows the user to select a subset of the metrics.
      ## How was this patch tested?
      Testcases added.
      ## Performance
      Resolves several performance issues in #17419; further optimization is pending on the SQL team's work. One of the SQL-layer performance issues related to these features has been resolved in #18712, thanks liancheng and cloud-fan
      ### Performance data
      (Tested on my laptop with 2 partitions; tries out = 20, warm up = 10.)
      Results are in records per millisecond (higher is better).

      | Vector size / number of records | 1 / 10000000 | 10 / 1000000 | 100 / 1000000 | 1000 / 100000 | 10000 / 10000 |
      |---|---|---|---|---|---|
      | Dataframe | 15149 | 7441 | 2118 | 224 | 21 |
      | RDD from Dataframe | 4992 | 4440 | 2328 | 320 | 33 |
      | Raw RDD | 53931 | 20683 | 3966 | 528 | 53 |
      Author: WeichenXu <>
      Closes #18798 from WeichenXu123/SPARK-19634-dataframe-summarizer.
    • [SPARK-21731][BUILD] Upgrade scalastyle to 0.9. · 3f958a99
      Marcelo Vanzin authored
      This version fixes a few issues in the import order checker; it provides
      better error messages, and detects more improper ordering (thus the need
      to change a lot of files in this patch). The main fix is that it correctly
      complains about the order of packages vs. classes.
      As part of the above, I moved some "SparkSession" import in ML examples
      inside the "$example on$" blocks; that didn't seem consistent across
      different source files to start with, and avoids having to add more on/off blocks
      around specific imports.
      The new scalastyle also seems to have a better header detector, so a few
      license headers had to be updated to match the expected indentation.
      Author: Marcelo Vanzin <>
      Closes #18943 from vanzin/SPARK-21731.
  14. Aug 10, 2017
    • [SPARK-21638][ML] Fix RF/GBT Warning message error · ca695585
      Peng Meng authored
      ## What changes were proposed in this pull request?
      When training an RF model, many warning messages like this are logged:
      > WARN  RandomForest: Tree learning is using approximately 268492800 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 2622 nodes in this iteration.
      This warning is unnecessary and the figures it reports are not accurate.
      The warning is shown whenever the nodes cannot all be split in one iteration. In most cases they cannot, so the warning appears on every iteration.
      ## How was this patch tested?
      Existing unit tests.
      Author: Peng Meng <>
      Closes #18868 from mpjlu/fixRFwarning.
  15. Aug 09, 2017
  16. Aug 07, 2017
    • [SPARK-21542][ML][PYTHON] Python persistence helper functions · fdcee028
      Ajay Saini authored
      ## What changes were proposed in this pull request?
      Added DefaultParamsWritable, DefaultParamsReadable, DefaultParamsWriter, and DefaultParamsReader to Python to support Python-only persistence of JSON-serializable parameters.
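The general idea of persisting JSON-serializable parameters can be sketched without Spark. Everything below is hypothetical (the `ToyEstimator` class, its `save`/`load` methods, and the file layout are assumptions for illustration) and does not reproduce the helper classes named above.

```python
import json
import os
import tempfile


class ToyEstimator:
    """Hypothetical estimator whose parameters are all JSON-serializable."""

    def __init__(self, max_iter=100, reg_param=0.0):
        self.params = {"max_iter": max_iter, "reg_param": reg_param}

    def save(self, path):
        # Store the class name next to the params so a reader can check it.
        metadata = {"class": type(self).__name__, "params": self.params}
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(metadata, handle)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as handle:
            metadata = json.load(handle)
        if metadata["class"] != cls.__name__:
            raise ValueError(f"expected {cls.__name__}, found {metadata['class']}")
        return cls(**metadata["params"])


# Round trip: save, load, and keep both instances for comparison.
with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, "estimator.json")
    original = ToyEstimator(max_iter=10, reg_param=0.01)
    original.save(target)
    restored = ToyEstimator.load(target)
```

Comparing `restored.params` with `original.params` is the same kind of round-trip check that the testing notes for this change describe.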
      ## How was this patch tested?
      Instantiated an estimator with JSON-serializable parameters (e.g. LogisticRegression), saved it using the added helper functions, loaded it back, and compared it to the original instance to make sure it is the same. This test was done in the Python REPL and is also implemented in the unit tests.
      Note to reviewers: there are a few excess comments that I left in the code for clarity but will remove before the code is merged to master.
      Author: Ajay Saini <>
      Closes #18742 from ajaysaini725/PythonPersistenceHelperFunctions.
    • [SPARK-21623][ML] fix RF doc · 1426eea8
      Peng Meng authored
      ## What changes were proposed in this pull request?
      The comments on `parentStats` in RF are wrong.
      `parentStats` is not used only in the first iteration; it is used in every iteration for unordered features.
      ## How was this patch tested?
      Author: Peng Meng <>
      Closes #18832 from mpjlu/fixRFDoc.
  17. Aug 06, 2017
  18. Aug 01, 2017
  19. Jul 31, 2017
    • [SPARK-21381][SPARKR] SparkR: pass on setHandleInvalid for classification algorithms · 9570e81a
      wangmiao1981 authored
      ## What changes were proposed in this pull request?
      SPARK-20307 Added handleInvalid option to RFormula for tree-based classification algorithms. We should add this parameter for other classification algorithms in SparkR.
      This is a followup PR for SPARK-20307.
      ## How was this patch tested?
      New Unit tests are added.
      Author: wangmiao1981 <>
      Closes #18605 from wangmiao1981/class.
  20. Jul 27, 2017
    • [SPARK-21306][ML] OneVsRest should support setWeightCol · a5a31899
      Yan Facai (颜发才) authored
      ## What changes were proposed in this pull request?
      Add a `setWeightCol` method to OneVsRest.
      `weightCol` is ignored if the classifier doesn't inherit the `HasWeightCol` trait.
      ## How was this patch tested?
      + [x] Add a unit test.
      Author: Yan Facai (颜发才) <>
      Closes #18554 from facaiy/BUG/oneVsRest_missing_weightCol.
    • [SPARK-19270][ML] Add summary table to GLM summary · ddcd2e82
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      Add an R-like summary table to the GLM summary, which includes the feature name (if it exists), parameter estimate, standard error, t-stat and p-value. This allows Scala users to easily gather these commonly used inference results.
      srowen yanboliang  felixcheung
      ## How was this patch tested?
      New tests: one for the feature names and one for the summary table.
      Author: actuaryzhang <>
      Author: Wayne Zhang <>
      Author: Yanbo Liang <>
      Closes #16630 from actuaryzhang/glmTable.
  21. Jul 26, 2017
    • [SPARK-20988][ML] Logistic regression uses aggregator hierarchy · cf29828d
      sethah authored
      ## What changes were proposed in this pull request?
      This change pulls the `LogisticAggregator` class out of LogisticRegression.scala and makes it extend `DifferentiableLossAggregator`. It also changes logistic regression to use the generic `RDDLossFunction` instead of having its own.
      Other minor changes:
      * L2Regularization accepts `Option[Int => Double]` for features standard deviation
      * L2Regularization uses `Vector` type instead of Array
      * Some tests added to LeastSquaresAggregator
      ## How was this patch tested?
      Unit test suites are added.
      Author: sethah <>
      Closes #18305 from sethah/SPARK-20988.
    • [SPARK-21524][ML] unit test fix: ValidatorParamsSuiteHelpers generates wrong temp files · ae4ea5fe
      Yuhao Yang authored
      ## What changes were proposed in this pull request?
      ValidatorParamsSuiteHelpers.testFileMove() generates temp directories in the wrong place and does not delete them.
      ValidatorParamsSuiteHelpers.testFileMove() is invoked by TrainValidationSplitSuite and CrossValidatorSuite. Currently it uses `tempDir` from `TempDirectory`, which unfortunately is never initialized because the `beforeAll()` of `ValidatorParamsSuiteHelpers` is never invoked.
      On my system, it leaves temp directories in the assembly folder each time I run TrainValidationSplitSuite and CrossValidatorSuite.
      ## How was this patch tested?
      unit test fix
      Author: Yuhao Yang <>
      Closes #18728 from hhbyyh/tempDirFix.
  22. Jul 20, 2017
    • [MINOR][ML] Reorg RFormula params. · 5d1850d4
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      There are three main reasons for this reorg:
      * Some params are placed in ```RFormulaBase``` while others are placed in ```RFormula```, which is inconsistent.
      * ```RFormulaModel``` should have the params ```handleInvalid```, ```formula``` and ```forceIndexLabel```, so that users who only have a ```RFormulaModel``` can still get the invalid-value handling policy, the formula, and whether the label is force-indexed. These params therefore move to ```RFormulaBase```, which ```RFormulaModel``` also inherits.
      * ```RFormulaModel``` should support setting a different ```handleInvalid``` during cross validation.
      ## How was this patch tested?
      Existing tests.
      Author: Yanbo Liang <>
      Closes #18681 from yanboliang/rformula-reorg.
  23. Jul 18, 2017
    • [SPARK-15526][ML][FOLLOWUP] Make JPMML provided scope to avoid including unshaded JARs, and repromote to compile in MLlib · d3f4a211
      Sean Owen authored
      Following the comment at -- this change actually needed a little more work to be complete.
      This also marks JPMML as `provided` to make sure its JARs aren't included in the `jars` output, but then scopes to `compile` in `mllib`. This is how Guava is handled.
      Checked result in `assembly/target/scala-2.11/jars` to verify there are no JPMML jars. Maven and SBT builds still work.
      Author: Sean Owen <>
      Closes #18637 from srowen/SPARK-15526.2.
    • [SPARK-21415] Triage scapegoat warnings, part 1 · e26dac5f
      Sean Owen authored
      ## What changes were proposed in this pull request?
      Address scapegoat warnings for:
      - BigDecimal double constructor
      - Catching NPE
      - Finalizer without super
      - List.size is O(n)
      - Prefer Seq.empty
      - Prefer Set.empty
      - instead of reverseMap
      - Type shadowing
      - Unnecessary if condition.
      - Use .log1p
      - Var could be val
      In some instances like Seq.empty, I avoided making the change even where valid in test code to keep the scope of the change smaller. Those issues are concerned with performance and it won't matter for tests.
      ## How was this patch tested?
      Existing tests
      Author: Sean Owen <>
      Closes #18635 from srowen/Scapegoat1.
  24. Jul 17, 2017
    • [SPARK-21221][ML] CrossValidator and TrainValidationSplit Persist Nested Estimators such as OneVsRest · 7047f49f
      Ajay Saini authored
      ## What changes were proposed in this pull request?
      Added functionality for CrossValidator and TrainValidationSplit to persist nested estimators such as OneVsRest. Also added CrossValidator and TrainValidationSplit persistence to PySpark.
      ## How was this patch tested?
      Performed both cross validation and train validation split with a one vs. rest estimator and tested read/write functionality of the estimator parameter maps required by these meta-algorithms.
      Author: Ajay Saini <>
      Closes #18428 from ajaysaini725/MetaAlgorithmPersistNestedEstimators.
  25. Jul 15, 2017
    • [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should handle invalid for both features and label column. · 69e5282d
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      ```RFormula``` should handle invalid values in both the features and label columns.
      #18496 only handles invalid values in the features column. This PR adds handling of invalid values in the label column, along with test cases.
      ## How was this patch tested?
      Add test cases.
      Author: Yanbo Liang <>
      Closes #18613 from yanboliang/spark-20307.
  26. Jul 13, 2017
    • [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 · 425c4ada
      Sean Owen authored
      ## What changes were proposed in this pull request?
      - Remove Scala 2.10 build profiles and support
      - Replace some 2.10 support in scripts with commented placeholders for 2.12 later
      - Remove deprecated API calls from 2.10 support
      - Remove usages of deprecated context bounds where possible
      - Remove Scala 2.10 workarounds like ScalaReflectionLock
      - Other minor Scala warning fixes
      ## How was this patch tested?
      Existing tests
      Author: Sean Owen <>
      Closes #17150 from srowen/SPARK-19810.