  1. Mar 22, 2015
    • Calvin Jia's avatar
      [SPARK-6122][Core] Upgrade Tachyon client version to 0.6.1. · a41b9c60
      Calvin Jia authored
      Changes the Tachyon client version from 0.5 to 0.6 in Spark core and the distribution script.
      
      New dependencies in Tachyon 0.6.0 include
      
      commons-codec:commons-codec:jar:1.5:compile
      io.netty:netty-all:jar:4.0.23.Final:compile
      
      These are already included in Spark core.
      
      Author: Calvin Jia <jia.calvin@gmail.com>
      
      Closes #4867 from calvinjia/upgrade_tachyon_0.6.0 and squashes the following commits:
      
      eed9230 [Calvin Jia] Update tachyon version to 0.6.1.
      11907b3 [Calvin Jia] Use TachyonURI for tachyon paths instead of strings.
      71bf441 [Calvin Jia] Upgrade Tachyon client version to 0.6.0.
      a41b9c60
    • Kamil Smuga's avatar
      SPARK-6454 [DOCS] Fix links to pyspark api · 6ef48632
      Kamil Smuga authored
      Author: Kamil Smuga <smugakamil@gmail.com>
      Author: stderr <smugakamil@gmail.com>
      
      Closes #5120 from kamilsmuga/master and squashes the following commits:
      
      fee3281 [Kamil Smuga] more python api links fixed for docs
      13240cb [Kamil Smuga] resolved merge conflicts with upstream/master
      6649b3b [Kamil Smuga] fix broken docs links to Python API
      92f03d7 [stderr] Fix links to pyspark api
      6ef48632
    • Jongyoul Lee's avatar
      [SPARK-6453][Mesos] Some Mesos*Suite have a different package with their classes · adb2ff75
      Jongyoul Lee authored
      - Moved Suites from o.a.s.s.mesos to o.a.s.s.cluster.mesos
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #5126 from jongyoul/SPARK-6453 and squashes the following commits:
      
      4f24a3e [Jongyoul Lee] [SPARK-6453][Mesos] Some Mesos*Suite have a different package with their classes - Fixed imports orders
      8ab149d [Jongyoul Lee] [SPARK-6453][Mesos] Some Mesos*Suite have a different package with their classes - Moved Suites from o.a.s.s.mesos to o.a.s.s.cluster.mesos
      adb2ff75
    • Hangchen Yu's avatar
      [SPARK-6455] [docs] Correct some mistakes and typos · ab4f516f
      Hangchen Yu authored
      Correct some typos, and fix a mistake in lib/PageRank.scala: the first PageRank implementation uses the standalone Graph interface, but the second uses the Pregel interface. The original wording could mislead readers of the code.
      
      Author: Hangchen Yu <yuhc@gitcafe.com>
      
      Closes #5128 from yuhc/master and squashes the following commits:
      
      53e5432 [Hangchen Yu] Merge branch 'master' of https://github.com/yuhc/spark
      67b77b5 [Hangchen Yu] [SPARK-6455] [docs] Correct some mistakes and typos
      206f2dc [Hangchen Yu] Correct some mistakes and typos.
      ab4f516f
    • Ryan Williams's avatar
      [SPARK-6448] Make history server log parse exceptions · b9fe504b
      Ryan Williams authored
      This helped me to debug a parse error that was due to the event log format changing recently.
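The change itself isn't shown here, but the idea — wrap the event-log replay so parse failures are logged with their source and position instead of failing opaquely — can be sketched as follows (a hypothetical stand-in, not Spark's actual replay code):

```python
import json
import logging

logger = logging.getLogger("history-server")

def replay_event_log(lines, source_name):
    """Parse each JSON event line, logging malformed lines with their
    location rather than swallowing the error or aborting the replay."""
    events = []
    for lineno, line in enumerate(lines, start=1):
        try:
            events.append(json.loads(line))
        except ValueError as e:
            # Surface the bad line and where it came from for debugging.
            logger.warning("Malformed event in %s at line %d: %s",
                           source_name, lineno, e)
    return events
```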
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #5122 from ryan-williams/histerror and squashes the following commits:
      
      5831656 [Ryan Williams] line length
      c3742ae [Ryan Williams] Make history server log parse exceptions
      b9fe504b
    • ypcat's avatar
      [SPARK-6408] [SQL] Fix JDBCRDD filtering string literals · 9b1e1f20
      ypcat authored
      Author: ypcat <ypcat6@gmail.com>
      Author: Pei-Lun Lee <pllee@appier.com>
      
      Closes #5087 from ypcat/spark-6408 and squashes the following commits:
      
      1becc16 [ypcat] [SPARK-6408] [SQL] styling
      1bc4455 [ypcat] [SPARK-6408] [SQL] move nested function outside
      e57fa4a [ypcat] [SPARK-6408] [SQL] fix test case
      245ab6f [ypcat] [SPARK-6408] [SQL] add test cases for filtering quoted strings
      8962534 [Pei-Lun Lee] [SPARK-6408] [SQL] Fix filtering string literals
      9b1e1f20
  2. Mar 21, 2015
    • Reynold Xin's avatar
      [SPARK-6428][SQL] Added explicit type for all public methods for Hive module · b6090f90
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5108 from rxin/hive-public-type and squashes the following commits:
      
      a320328 [Reynold Xin] [SPARK-6428][SQL] Added explicit type for all public methods for Hive module.
      b6090f90
    • Yin Huai's avatar
      [SPARK-6250][SPARK-6146][SPARK-5911][SQL] Types are now reserved words in DDL parser. · 94a102ac
      Yin Huai authored
      This PR creates a trait `DataTypeParser` used to parse data types. This trait aims to be the single place providing the functionality of parsing a data type's string representation. It is currently mixed in with `DDLParser` and `SqlParser`. It is also used to parse the data type for `DataFrame.cast` and to convert Hive metastore's data type string back to a `DataType`.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-6250
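The trait itself is Scala, but the core idea — one shared routine turning a type's string representation back into a structured type — can be sketched minimally (hypothetical names; the real parser covers the full DDL grammar, including struct<> and decimal):

```python
def parse_data_type(s):
    """Minimal recursive sketch: a few primitives plus nested array<...>."""
    s = s.strip()
    if s.startswith("array<") and s.endswith(">"):
        # Recurse on the element type inside the angle brackets.
        return ("array", parse_data_type(s[len("array<"):-1]))
    if s in ("int", "string", "double", "boolean"):
        return s
    raise ValueError("unsupported type string: %s" % s)
```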
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #5078 from yhuai/ddlKeywords and squashes the following commits:
      
      0e66097 [Yin Huai] Special handle struct<>.
      fea6012 [Yin Huai] Style.
      c9733fb [Yin Huai] Create a trait to parse data types.
      94a102ac
    • Venkata Ramana Gollamudi's avatar
      [SPARK-5680][SQL] Sum function on all null values, should return zero · ee569a0c
      Venkata Ramana Gollamudi authored
      SELECT sum('a'), avg('a'), variance('a'), std('a') FROM src;
      should give the output
      0.0	NULL	NULL	NULL
      This fixes Hive's udaf_number_format.q.
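A minimal model of the fixed semantics (hypothetical helpers, not Hive's actual UDAF code): sum ignores NULLs and yields 0.0 when every input is NULL, while avg stays NULL:

```python
def sum_ignoring_nulls(values):
    """All-NULL (None) input yields 0.0 rather than NULL, matching the fix."""
    return float(sum(v for v in values if v is not None))

def avg_ignoring_nulls(values):
    """avg stays NULL (None) when there is nothing to average."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null) if non_null else None
```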
      
      Author: Venkata Ramana G <ramana.gollamudi@huawei.com>
      
      Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
      
      Closes #4466 from gvramana/sum_fix and squashes the following commits:
      
      42e14d1 [Venkata Ramana Gollamudi] Added comments
      39415c0 [Venkata Ramana Gollamudi] Handled the partitioned Sum expression scenario
      df66515 [Venkata Ramana Gollamudi] code style fix
      4be2606 [Venkata Ramana Gollamudi] Add udaf_number_format to whitelist and golden answer
      330fd64 [Venkata Ramana Gollamudi] fix sum function for all null data
      ee569a0c
    • x1-'s avatar
      [SPARK-5320][SQL]Add statistics method at NoRelation (override super). · 52dd4b2b
      x1- authored
      `NoRelation` did not override `statistics`, even though the superclass says 'LeafNode must override'.
      This fixes the issue.
      
      [SPARK-5320: Joins on simple table created using select gives error](https://issues.apache.org/jira/browse/SPARK-5320)
      
      Author: x1- <viva008@gmail.com>
      
      Closes #5105 from x1-/SPARK-5320 and squashes the following commits:
      
      e561aac [x1-] Add statistics method at NoRelation (override super).
      52dd4b2b
  3. Mar 20, 2015
    • Yanbo Liang's avatar
      [SPARK-5821] [SQL] JSON CTAS command should throw error message when delete path failure · e5d2c37c
      Yanbo Liang authored
      When using "CREATE TEMPORARY TABLE AS SELECT" to create a JSON table, we first delete the path (file or directory) and then generate a new directory with the same name. But if only read permission was granted, the delete fails.
      Here we throw an error message to let users know what happened.
      ParquetRelation2 may also hit this problem. Restricting JSONRelation and ParquetRelation2 to directories seems more reasonable for access control; that may be done in follow-up work.
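The check can be sketched as follows (a hypothetical helper; the real change lives in JSONRelation's insert path):

```python
import os
import shutil

def clear_output_path(path):
    """Delete an existing CTAS output path, raising a clear error when the
    delete fails (e.g. read-only permissions) instead of silently writing
    into a stale location."""
    if not os.path.exists(path):
        return
    try:
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    except OSError as e:
        raise IOError("Unable to clear output path %s before CTAS write: %s"
                      % (path, e))
```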
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      Author: Yanbo Liang <yanbohappy@gmail.com>
      
      Closes #4610 from yanboliang/jsonInsertImprovements and squashes the following commits:
      
      c387fce [Yanbo Liang] fix typos
      42d7fb6 [Yanbo Liang] add unittest & fix output format
      46f0d9d [Yanbo Liang] Update JSONRelation.scala
      e2df8d5 [Yanbo Liang] check path exisit when write
      79f7040 [Yanbo Liang] Update JSONRelation.scala
      e4bc229 [Yanbo Liang] Update JSONRelation.scala
      5a42d83 [Yanbo Liang] JSONRelation CTAS should check if delete is successful
      e5d2c37c
    • Cheng Lian's avatar
      [SPARK-6315] [SQL] Also tries the case class string parser while reading Parquet schema · 937c1e55
      Cheng Lian authored
      When writing Parquet files, Spark 1.1.x persists the schema string into Parquet metadata as the result of `StructType.toString`; Spark 1.2 deprecated this in favor of a schema string in JSON format. But we still need to take the old schema format into account while reading Parquet files.
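The fallback logic can be sketched like this (the legacy branch is a hypothetical stand-in; the real code hands the string to the case class string parser):

```python
import json

def deserialize_schema(schema_string):
    """Prefer the JSON format (Spark >= 1.2); fall back to the legacy
    case-class style string written by Spark 1.1.x."""
    try:
        return json.loads(schema_string)
    except ValueError:
        if schema_string.startswith("StructType("):
            # Stand-in for the case class string parser.
            return {"legacyString": schema_string}
        raise
```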
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #5034 from liancheng/spark-6315 and squashes the following commits:
      
      a182f58 [Cheng Lian] Adds a regression test
      b9c6dbe [Cheng Lian] Also tries the case class string parser while reading Parquet schema
      937c1e55
    • Yanbo Liang's avatar
      [SPARK-5821] [SQL] ParquetRelation2 CTAS should check if delete is successful · bc37c974
      Yanbo Liang authored
      Do the same check as #4610 for ParquetRelation2.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #5107 from yanboliang/spark-5821-parquet and squashes the following commits:
      
      7092c8d [Yanbo Liang] ParquetRelation2 CTAS should check if delete is successful
      bc37c974
    • MechCoder's avatar
      [SPARK-6025] [MLlib] Add helper method evaluateEachIteration to extract learning curve · 25e271d9
      MechCoder authored
      Added evaluateEachIteration to allow the user to manually extract the error for each iteration of GradientBoosting. The internal optimisation can be dealt with later.
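The shape of such a helper — cumulative ensemble error after each boosting iteration — can be sketched locally (hypothetical; the MLlib version runs over an RDD and handles tree weights internally):

```python
def evaluate_each_iteration(tree_predictions, labels, loss):
    """Return the mean loss of the growing ensemble after each iteration.

    tree_predictions: one list of (already weighted) predictions per tree.
    """
    n = len(labels)
    errors = []
    running = [0.0] * n
    for preds in tree_predictions:
        # Ensemble prediction after adding this iteration's tree.
        running = [r + p for r, p in zip(running, preds)]
        errors.append(sum(loss(r, y) for r, y in zip(running, labels)) / n)
    return errors
```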
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4906 from MechCoder/spark-6025 and squashes the following commits:
      
      67146ab [MechCoder] Minor
      352001f [MechCoder] Minor
      6e8aa10 [MechCoder] Made the following changes Used mapPartition instead of map Refactored computeError and unpersisted broadcast variables
      bc99ac6 [MechCoder] Refactor the method and stuff
      dbda033 [MechCoder] [SPARK-6025] Add helper method evaluateEachIteration to extract learning curve
      25e271d9
    • Reynold Xin's avatar
      [SPARK-6428][SQL] Added explicit type for all public methods in sql/core · a95043b1
      Reynold Xin authored
      Also implemented equals/hashCode when they are missing.
      
      This is done in order to enable automatic public method type checking.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5104 from rxin/sql-hashcode-explicittype and squashes the following commits:
      
      ffce6f3 [Reynold Xin] Code review feedback.
      8b36733 [Reynold Xin] [SPARK-6428][SQL] Added explicit type for all public methods.
      a95043b1
    • lewuathe's avatar
      [SPARK-6421][MLLIB] _regression_train_wrapper does not test initialWeights correctly · 257cde7c
      lewuathe authored
      Weight parameters must be initialized correctly even when a numpy array is passed as the initial weights.
      
      Author: lewuathe <lewuathe@me.com>
      
      Closes #5101 from Lewuathe/SPARK-6421 and squashes the following commits:
      
      7795201 [lewuathe] Fix lint-python errors
      21d4fe3 [lewuathe] Fix init logic of weights
      257cde7c
    • MechCoder's avatar
      [SPARK-6309] [SQL] [MLlib] Implement MatrixUDT · 11e02595
      MechCoder authored
      Utilities to serialize and deserialize Matrices in MLlib
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #5048 from MechCoder/spark-6309 and squashes the following commits:
      
      05dc6f2 [MechCoder] Hashcode and organize imports
      16d5d47 [MechCoder] Test some more
      6e67020 [MechCoder] TST: Test using Array conversion instead of equals
      7fa7a2c [MechCoder] [SPARK-6309] [SQL] [MLlib] Implement MatrixUDT
      11e02595
    • Jongyoul Lee's avatar
      [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set · 49a01c7e
      Jongyoul Lee authored
      - Fixed calculateTotalMemory to use spark.mesos.executor.memoryOverhead
      - Added testCase
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #5099 from jongyoul/SPARK-6423 and squashes the following commits:
      
      6747fce [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Changed a description of spark.mesos.executor.memoryOverhead
      475a7c8 [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Fit the import rules
      453c5a2 [Jongyoul Lee] [SPARK-6423][Mesos] MemoryUtils should use memoryOverhead if it's set - Fixed calculateTotalMemory to use spark.mesos.executor.memoryOverhead - Added testCase
      49a01c7e
    • Xiangrui Meng's avatar
      [SPARK-5955][MLLIB] add checkpointInterval to ALS · 6b36470c
      Xiangrui Meng authored
      Add checkpointInterval to ALS to prevent:
      
      1. StackOverflow exceptions caused by long lineage,
      2. large shuffle files generated during iterations,
      3. slow recovery when some nodes fail.
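The pattern is simply periodic checkpointing inside the iteration loop; a local sketch (the checkpoint hook is hypothetical — in ALS it checkpoints the factor RDDs to truncate lineage):

```python
def run_iterations(state, step, num_iterations, checkpoint_interval, checkpoint):
    """Apply `step` repeatedly, checkpointing every `checkpoint_interval`
    iterations (0 disables checkpointing)."""
    for i in range(1, num_iterations + 1):
        state = step(state)
        if checkpoint_interval > 0 and i % checkpoint_interval == 0:
            checkpoint(state, i)
    return state
```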
      
      srowen coderxiang
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #5076 from mengxr/SPARK-5955 and squashes the following commits:
      
      df56791 [Xiangrui Meng] update impl to reuse code
      29affcb [Xiangrui Meng] do not materialize factors in implicit
      20d3f7f [Xiangrui Meng] add checkpointInterval to ALS
      6b36470c
    • Xusen Yin's avatar
      [Spark 6096][MLlib] Add Naive Bayes load save methods in Python · 25636d98
      Xusen Yin authored
      See [SPARK-6096](https://issues.apache.org/jira/browse/SPARK-6096).
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #5090 from yinxusen/SPARK-6096 and squashes the following commits:
      
      bd0fea5 [Xusen Yin] fix style problem, etc.
      3fd41f2 [Xusen Yin] use hanging indent in Python style
      e83803d [Xusen Yin] fix Python style
      d6dbde5 [Xusen Yin] fix python call java error
      a054bb3 [Xusen Yin] add save load for NaiveBayes python
      25636d98
    • Shuo Xiang's avatar
      [MLlib] SPARK-5954: Top by key · 5e6ad24f
      Shuo Xiang authored
      This PR implements two functions
        - `topByKey(num: Int): RDD[(K, Array[V])]` finds the top-k values for each key in a pair RDD. This can be used, e.g., in computing top recommendations.
      
      - `takeOrderedByKey(num: Int): RDD[(K, Array[V])] ` does the opposite of `topByKey`
      
      `sorted` is used here because the `toArray` method of `PriorityQueue` does not necessarily return a sorted array.
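A single-machine sketch of `topByKey` with a bounded heap per key (hypothetical; the RDD version does this per partition with a bounded priority queue and then merges the partial results):

```python
import heapq
from collections import defaultdict

def top_by_key(pairs, num):
    """Top-`num` values per key, largest first."""
    heaps = defaultdict(list)
    for k, v in pairs:
        h = heaps[k]
        if len(h) < num:
            heapq.heappush(h, v)
        else:
            heapq.heappushpop(h, v)  # keep only the `num` largest
    # A heap's array form is not sorted, hence the explicit sort at the end.
    return {k: sorted(h, reverse=True) for k, h in heaps.items()}
```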
      
      Author: Shuo Xiang <shuoxiangpub@gmail.com>
      
      Closes #5075 from coderxiang/topByKey and squashes the following commits:
      
      1611c37 [Shuo Xiang] code clean up
      6f565c0 [Shuo Xiang] naming
      a80e0ec [Shuo Xiang] typo and warning
      82dded9 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into topByKey
      d202745 [Shuo Xiang] move to MLPairRDDFunctions
      901b0af [Shuo Xiang] style check
      70c6e35 [Shuo Xiang] remove takeOrderedByKey, update doc and test
      0895c17 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into topByKey
      b10e325 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into topByKey
      debccad [Shuo Xiang] topByKey
      5e6ad24f
    • Yanbo Liang's avatar
      [SPARK-6095] [MLLIB] Support model save/load in Python's linear models · 48866f78
      Yanbo Liang authored
      For Python's linear models, weights and intercept are stored in Python.
      This PR implements save/load functions for Python's linear models that do the same thing as the Scala ones.
      It also enables model import/export across languages.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #5016 from yanboliang/spark-6095 and squashes the following commits:
      
      d9bb824 [Yanbo Liang] fix python style
      b3813ca [Yanbo Liang] linear model save/load for Python reuse the Scala implementation
      48866f78
    • Marcelo Vanzin's avatar
      [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT. · a7456459
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5056 from vanzin/SPARK-6371 and squashes the following commits:
      
      63220df [Marcelo Vanzin] Merge branch 'master' into SPARK-6371
      6506f75 [Marcelo Vanzin] Use more fine-grained exclusion.
      178ba71 [Marcelo Vanzin] Oops.
      75b2375 [Marcelo Vanzin] Exclude VertexRDD in MiMA.
      a45a62c [Marcelo Vanzin] Work around MIMA warning.
      1d8a670 [Marcelo Vanzin] Re-group jetty exclusion.
      0e8e909 [Marcelo Vanzin] Ignore ml, don't ignore graphx.
      cef4603 [Marcelo Vanzin] Indentation.
      296cf82 [Marcelo Vanzin] [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.
      a7456459
    • WangTaoTheTonic's avatar
      [SPARK-6426][Doc]User could also point the yarn cluster config directory via YARN_CONF_DIR · 385b2ff1
      WangTaoTheTonic authored
      
      https://issues.apache.org/jira/browse/SPARK-6426
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #5103 from WangTaoTheTonic/SPARK-6426 and squashes the following commits:
      
      e6dd78d [WangTaoTheTonic] User could also point the yarn cluster config directory via YARN_CONF_DIR
      385b2ff1
    • mbonaci's avatar
      [SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample · 28bcb9e9
      mbonaci authored
      The docs for the `sample` method were insufficient, now less so.
      
      Author: mbonaci <mbonaci@gmail.com>
      
      Closes #5097 from mbonaci/master and squashes the following commits:
      
      a6a9d97 [mbonaci] [SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample method
      28bcb9e9
    • Reynold Xin's avatar
      [SPARK-6428][MLlib] Added explicit type for public methods and implemented... · db4d317c
      Reynold Xin authored
      [SPARK-6428][MLlib] Added explicit type for public methods and implemented hashCode when equals is defined.
      
      I want to add a checker to turn public type checking on, since future pull requests can accidentally expose a non-public type. This is the first cleanup task.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #5102 from rxin/mllib-hashcode-publicmethodtypes and squashes the following commits:
      
      617f19e [Reynold Xin] Fixed Scala compilation error.
      52bc2d5 [Reynold Xin] [MLlib] Added explicit type for public methods and implemented hashCode when equals is defined.
      db4d317c
    • Sean Owen's avatar
      SPARK-6338 [CORE] Use standard temp dir mechanisms in tests to avoid orphaned temp files · 6f80c3e8
      Sean Owen authored
      Use `Utils.createTempDir()` to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and to simplify the code.
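The idea — one helper that both creates the temp dir and guarantees cleanup — looks roughly like this in Python terms (a sketch, not Spark's `Utils.createTempDir`):

```python
import atexit
import shutil
import tempfile

def create_temp_dir(prefix="spark-test-"):
    """Create a temp directory and register its removal at interpreter exit,
    so tests cannot orphan it by forgetting an explicit delete."""
    path = tempfile.mkdtemp(prefix=prefix)
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```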
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5029 from srowen/SPARK-6338 and squashes the following commits:
      
      27b740a [Sean Owen] Fix hive-thriftserver tests that don't expect an existing dir
      4a212fa [Sean Owen] Standardize a bit more temp dir management
      9004081 [Sean Owen] Revert some added recursive-delete calls
      57609e4 [Sean Owen] Use Utils.createTempDir() to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify
      6f80c3e8
    • Sean Owen's avatar
      SPARK-5134 [BUILD] Bump default Hadoop version to 2+ · d08e3eb3
      Sean Owen authored
      Bump default Hadoop version to 2.2.0. (This is already the dependency version reported by published Maven artifacts.) See JIRA for further discussion.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #5027 from srowen/SPARK-5134 and squashes the following commits:
      
      acbee14 [Sean Owen] Bump default Hadoop version to 2.2.0. (This is already the dependency version reported by published Maven artifacts.)
      d08e3eb3
    • Jongyoul Lee's avatar
      [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR · 116c553f
      Jongyoul Lee authored
      - Added TaskState.isFailed to handle TASK_LOST and TASK_ERROR, synchronizing CoarseMesosSchedulerBackend and MesosSchedulerBackend
      - This is related to #5000
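The shared predicate amounts to something like this (an illustrative sketch; Spark's `TaskState` is a Scala enumeration and the exact set of failure states may differ):

```python
# States treated as terminal failures by both scheduler backends.
FAILED_STATES = frozenset(["TASK_FAILED", "TASK_LOST", "TASK_ERROR"])

def is_failed(state):
    """One place to classify failure states, so a newly added Mesos state
    like TASK_ERROR is handled consistently by every backend."""
    return state in FAILED_STATES
```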
      
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #5088 from jongyoul/SPARK-6286-1 and squashes the following commits:
      
      4f2362f [Jongyoul Lee] [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR - Fixed scalastyle
      ac4336a [Jongyoul Lee] [SPARK-6286][Mesos][minor] Handle missing Mesos case TASK_ERROR - Made TaskState.isFailed for handling TASK_LOST and TASK_ERROR and synchronizing CoarseMesosSchedulerBackend and MesosSchedulerBackend
      116c553f
  4. Mar 19, 2015
    • Reynold Xin's avatar
      Tighten up field/method visibility in Executor and made some code more clear to read. · 0745a305
      Reynold Xin authored
      I was reading Executor just now and found that some latest changes introduced some weird code path with too much monadic chaining and unnecessary fields. I cleaned it up a bit, and also tightened up the visibility of various fields/methods. Also added some inline documentation to help understand this code better.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4850 from rxin/executor and squashes the following commits:
      
      866fc60 [Reynold Xin] Code review feedback.
      020efbb [Reynold Xin] Tighten up field/method visibility in Executor and made some code more clear to read.
      0745a305
    • Nicholas Chammas's avatar
      [SPARK-6219] [Build] Check that Python code compiles · f17d43b0
      Nicholas Chammas authored
      This PR expands the Python lint checks so that they check for obvious compilation errors in our Python code.
      
      For example:
      
      ```
      $ ./dev/lint-python
      Python lint checks failed.
      Compiling ./ec2/spark_ec2.py ...
        File "./ec2/spark_ec2.py", line 618
          return (master_nodes,, slave_nodes)
                               ^
      SyntaxError: invalid syntax
      
      ./ec2/spark_ec2.py:618:25: E231 missing whitespace after ','
      ./ec2/spark_ec2.py:1117:101: E501 line too long (102 > 100 characters)
      ```
      
      This PR also bumps up the version of `pep8`. It ignores new types of checks introduced by that version bump while fixing problems missed by the older version of `pep8` we were using.
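A compile check of this kind is cheap to sketch (hypothetical helper; the actual lint script's mechanics may differ):

```python
import py_compile

def python_compiles(path):
    """True iff the file parses as valid Python — the kind of cheap check a
    lint script can run before any style checks."""
    try:
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False
```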
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      
      Closes #4941 from nchammas/compile-spark-ec2 and squashes the following commits:
      
      75e31d8 [Nicholas Chammas] upgrade pep8 + check compile
      b33651c [Nicholas Chammas] PEP8 line length
      f17d43b0
    • Wenchen Fan's avatar
      [Core][minor] remove unused `visitedStages` in `DAGScheduler.stageDependsOn` · 3b5aaa6a
      Wenchen Fan authored
      We define and update `visitedStages` in `DAGScheduler.stageDependsOn`, but never read it. So we can safely remove it.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #5086 from cloud-fan/minor and squashes the following commits:
      
      24663ea [Wenchen Fan] remove un-used variable
      3b5aaa6a
    • Brennon York's avatar
      [SPARK-5313][Project Infra]: Create simple framework for highlighting changes introduced in a PR · 8cb23a1f
      Brennon York authored
      Built a simple framework with a `dev/tests` directory to house all pull request related tests. I've moved the two original tests (`pr_merge_ability` and `pr_public_classes`) into the new `dev/tests` directory and tested to the best of my ability. At this point I need to test against Jenkins actually running the new `run-tests-jenkins` script to ensure things aren't broken down the path.
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #5072 from brennonyork/SPARK-5313 and squashes the following commits:
      
      8ae990c [Brennon York] added dev/run-tests back, removed echo
      5db4ed4 [Brennon York] removed the git checkout
      1b50050 [Brennon York] adding echos to see what jenkins is seeing
      b823959 [Brennon York] removed run-tests to further test the public_classes pr test
      2b9ce12 [Brennon York] added the dev/run-tests call back in
      ffd49c0 [Brennon York] remove -c from bash as that was removing the trailing args
      735d615 [Brennon York] removed the actual dev/run-tests command to further test jenkins
      d579662 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-5313
      aa48029 [Brennon York] removed echo lines for testing jenkins
      24cd965 [Brennon York] added test output to check within jenkins to verify
      3a38e73 [Brennon York] removed the temporary read
      9c881ff [Brennon York] updated test suite
      183b7ee [Brennon York] added documentation on how to create tests
      0bc2efe [Brennon York] ensure each test starts on the current pr branch
      1743378 [Brennon York] added tests in test suite
      abd7430 [Brennon York] updated to include test suite
      8cb23a1f
    • Yanbo Liang's avatar
      [SPARK-6291] [MLLIB] GLM toString & toDebugString · dda4dedc
      Yanbo Liang authored
      GLM toString prints out intercept, numFeatures.
      For LogisticRegression and SVM model, toString also prints out numClasses, threshold.
      GLM toDebugString prints out the whole weights, intercept.
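The split can be illustrated with a toy model class (hypothetical; the method names mirror the Scala `toString`/`toDebugString`):

```python
class LinearModelSketch:
    """Short summary via to_string, full weight dump via to_debug_string."""

    def __init__(self, weights, intercept):
        self.weights = weights
        self.intercept = intercept

    def to_string(self):
        # Compact: intercept and dimensionality only.
        return "Model: intercept = %s, numFeatures = %d" % (
            self.intercept, len(self.weights))

    def to_debug_string(self):
        # Verbose: everything in to_string plus the whole weight vector.
        return self.to_string() + ", weights = %s" % (self.weights,)
```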
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #5038 from yanboliang/spark-6291 and squashes the following commits:
      
      2f578b0 [Yanbo Liang] code format
      78b33f2 [Yanbo Liang] fix typos
      1e8a023 [Yanbo Liang] GLM toString & toDebugString
      dda4dedc
    • mcheah's avatar
      [SPARK-5843] [API] Allowing map-side combine to be specified in Java. · 3c4e486b
      mcheah authored
      Specifically, when calling JavaPairRDD.combineByKey(), there is a new
      six-parameter method that exposes the map-side-combine boolean as the
      fifth parameter and the serializer as the sixth parameter.
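A local sketch of what the map-side-combine flag controls (hypothetical single-partition model; the real JavaPairRDD method also takes a partitioner and serializer):

```python
from collections import defaultdict

def combine_by_key(pairs, create_combiner, merge_value, map_side_combine=True):
    """Single-partition model: with map-side combine, values are folded into
    combiners as they arrive (pre-aggregation before the shuffle); without it,
    raw values are kept and combined only afterwards."""
    if map_side_combine:
        combiners = {}
        for k, v in pairs:
            combiners[k] = (merge_value(combiners[k], v)
                            if k in combiners else create_combiner(v))
        return combiners
    # No map-side combine: ship raw values, combine on the "reduce" side.
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    out = {}
    for k, vs in grouped.items():
        acc = create_combiner(vs[0])
        for v in vs[1:]:
            acc = merge_value(acc, v)
        out[k] = acc
    return out
```

Either path yields the same result; the flag only changes where the work happens, which matters for shuffle size when combiners shrink the data.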
      
      Author: mcheah <mcheah@palantir.com>
      
      Closes #4634 from mccheah/pair-rdd-map-side-combine and squashes the following commits:
      
      5c58319 [mcheah] Fixing compiler errors.
      3ce7deb [mcheah] Addressing style and documentation comments.
      7455c7a [mcheah] Allowing Java combineByKey to specify Serializer as well.
      6ddd729 [mcheah] [SPARK-5843] Allowing map-side combine to be specified in Java.
      3c4e486b
    • Pierre Borckmans's avatar
      [SPARK-6402][DOC] - Remove some refererences to shark in docs and ec2 · 797f8a00
      Pierre Borckmans authored
      The EC2 script and job scheduling documentation still referred to Shark.
      I removed these references.
      
      I also removed a remaining `SHARK_VERSION` variable from `ec2-variables.sh`.
      
      Author: Pierre Borckmans <pierre.borckmans@realimpactanalytics.com>
      
      Closes #5083 from pierre-borckmans/remove_refererences_to_shark_in_docs and squashes the following commits:
      
      4e90ffc [Pierre Borckmans] Removed deprecated SHARK_VERSION
      caea407 [Pierre Borckmans] Remove shark reference from ec2 script doc
      196c744 [Pierre Borckmans] Removed references to Shark
      797f8a00
    • CodingCat's avatar
      [SPARK-4012] stop SparkContext when the exception is thrown from an infinite loop · 2c3f83c3
      CodingCat authored
      https://issues.apache.org/jira/browse/SPARK-4012
      
      This patch is a resubmission for https://github.com/apache/spark/pull/2864
      
      What I am proposing in this patch is that ***when an exception is thrown from an infinite loop, we should stop the SparkContext instead of letting the JVM throw exceptions forever***
      
      So, in the infinite loops that we originally wrapped with `logUncaughtExceptions`, I changed to `tryOrStopSparkContext`, so that the Spark component is stopped
      
      Stopping the JVM process early is helpful for HA scheme design. For example, a user may have a script that checks the existence of the Spark Streaming driver's pid to monitor availability; with the code before this patch, the JVM process remains alive but non-functional once the exceptions are thrown.
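The wrapper's behavior can be sketched as follows (a hypothetical stand-in for `tryOrStopSparkContext`, with a toy context):

```python
import logging

def try_or_stop_context(context, body):
    """Run `body`; if it throws, stop the context so the process dies visibly
    instead of lingering alive-but-broken, then re-raise."""
    try:
        return body()
    except Exception:
        logging.exception("Uncaught exception; stopping context")
        context.stop()
        raise

class StoppableContext(object):
    """Toy context recording whether stop() was called."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True
```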
      
      andrewor14, srowen , mind taking further consideration about the change?
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #5004 from CodingCat/SPARK-4012-1 and squashes the following commits:
      
      589276a [CodingCat] throw fatal error again
      3c72cd8 [CodingCat] address the comments
      6087864 [CodingCat] revise comments
      6ad3eb0 [CodingCat] stop SparkContext instead of quit the JVM process
      6322959 [CodingCat] exit JVM process when the exception is thrown from an infinite loop
      2c3f83c3
    • Tathagata Das's avatar
      [SPARK-6222][Streaming] Dont delete checkpoint data when doing pre-batch-start checkpoint · 645cf3fc
      Tathagata Das authored
      This is another alternative approach to https://github.com/apache/spark/pull/4964/
      I think this is a simpler fix that can be backported easily to other branches (1.2 and 1.3).
      
      All it does is introduce a flag so that the pre-batch-start checkpoint does not clear the checkpoint data.
      
      There is no unit test yet. I will add one when this approach is commented upon. Not sure if this is easily testable.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #5008 from tdas/SPARK-6222 and squashes the following commits:
      
      7315bc2 [Tathagata Das] Removed empty line.
      c438de4 [Tathagata Das] Revert unnecessary change.
      5e98374 [Tathagata Das] Added unit test
      50cb60b [Tathagata Das] Fixed style issue
      295ca5c [Tathagata Das] Fixing SPARK-6222
      645cf3fc
  5. Mar 18, 2015