  1. Feb 25, 2015
    • Davies Liu's avatar
      [SPARK-5944] [PySpark] fix version in Python API docs · f3f4c87b
      Davies Liu authored
      use RELEASE_VERSION when building the Python API docs
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4731 from davies/api_version and squashes the following commits:
      
      c9744c9 [Davies Liu] Update create-release.sh
      08cbc3f [Davies Liu] fix python docs
      f3f4c87b
    • Kay Ousterhout's avatar
      [SPARK-5982] Remove incorrect Local Read Time Metric · 838a4803
      Kay Ousterhout authored
      This metric is incomplete, because the files are memory mapped, so much of the read from disk occurs later as tasks actually read the file's data.
      
      This should be merged into 1.3, so that we never expose this incorrect metric to users.
      
      CC pwendell ksakellis sryza
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #4749 from kayousterhout/SPARK-5982 and squashes the following commits:
      
      9737b5e [Kay Ousterhout] More fixes
      a1eb300 [Kay Ousterhout] Removed one more use of local read time
      cf13497 [Kay Ousterhout] [SPARK-5982] Remove incorrect Local Read Time Metric
      838a4803
    • Brennon York's avatar
      [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing · 9f603fce
      Brennon York authored
      Fixes the issue where `diff`, `innerJoin`, or `leftJoin` on VertexRDDs with different partition sizes fails under the `zipPartitions` method. The fix tests whether the partitions are equal and, if not, repartitions the other VertexRDD to match the partition size of the calling VertexRDD.
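
The check-then-repartition idea can be sketched in plain Python. This is an illustration only, not GraphX code: partitions are modeled as dictionaries, `partition_by_id` and `inner_join` are hypothetical helpers, and comparing partition counts is a simplification of the equality test the fix performs.

```python
from typing import Dict, List, Tuple

Partition = Dict[int, str]  # vertex id -> attribute


def partition_by_id(vertices: Dict[int, str], num_partitions: int) -> List[Partition]:
    """Hash-partition vertices by id (a stand-in for a VertexRDD's layout)."""
    parts: List[Partition] = [{} for _ in range(num_partitions)]
    for vid, attr in vertices.items():
        parts[vid % num_partitions][vid] = attr
    return parts


def inner_join(left: List[Partition], right: List[Partition]) -> List[Dict[int, Tuple[str, str]]]:
    """Join partition by partition, repartitioning `right` first if layouts differ."""
    if len(left) != len(right):
        # Hypothetical repartition step: rebuild `right` with left's partition count.
        merged = {vid: attr for part in right for vid, attr in part.items()}
        right = partition_by_id(merged, len(left))
    # zip() pairs partitions positionally, which is only valid when both
    # sides use the same partitioning.
    return [
        {vid: (attr, r[vid]) for vid, attr in l.items() if vid in r}
        for l, r in zip(left, right)
    ]
```

Without the repartition step, the positional pairing would silently compare unrelated partitions, which is the kind of mismatch the commit guards against.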
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #4705 from brennonyork/SPARK-1955 and squashes the following commits:
      
      0882590 [Brennon York] updated to properly handle differently-partitioned vertexRDDs
      9f603fce
    • Milan Straka's avatar
      [SPARK-5970][core] Register directory created in getOrCreateLocalRootDirs for automatic deletion. · a777c65d
      Milan Straka authored
      As documented in createDirectory, the result of createDirectory is not registered for automatic removal. Currently there are 4 directories left in `/tmp` after just running `pyspark`.
      
      Author: Milan Straka <fox@ucw.cz>
      
      Closes #4759 from foxik/remove-tmp-dirs and squashes the following commits:
      
      280450d [Milan Straka] Use createTempDir in getOrCreateLocalRootDirs...
      a777c65d
    • Sean Owen's avatar
      SPARK-5930 [DOCS] Documented default of spark.shuffle.io.retryWait is confusing · 7d8e6a2e
      Sean Owen authored
      Clarify default max wait in spark.shuffle.io.retryWait docs
      
      CC andrewor14
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4769 from srowen/SPARK-5930 and squashes the following commits:
      
      ae2792b [Sean Owen] Clarify default max wait in spark.shuffle.io.retryWait docs
      7d8e6a2e
    • Michael Armbrust's avatar
      [SPARK-5996][SQL] Fix specialized outbound conversions · f84c799e
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4757 from marmbrus/udtConversions and squashes the following commits:
      
      3714aad [Michael Armbrust] [SPARK-5996][SQL] Fix specialized outbound conversions
      f84c799e
    • guliangliang's avatar
      [SPARK-5771] Number of Cores in Completed Applications of Standalone Master... · dd077abf
      guliangliang authored
      [SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
      
      In Standalone mode, the number of cores shown under Completed Applications on the Master Web Page is always zero if sc.stop() is called.
      The number is correct if sc.stop() is not called.
      The likely reason:
      after sc.stop() is called, the function removeExecutor of class ApplicationInfo is called, which reduces the variable coresGranted to zero. The variable coresGranted is used to display the number of cores on the Web Page.
      
      Author: guliangliang <guliangliang@qiyi.com>
      
      Closes #4567 from marsishandsome/Spark5771 and squashes the following commits:
      
      694796e [guliangliang] remove duplicate code
      a20e390 [guliangliang] change to Cores Using & Requested
      0c19c95 [guliangliang] change Cores to Cores (max)
      cfbd97d [guliangliang] [SPARK-5771] Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called
      dd077abf
    • Benedikt Linse's avatar
      [GraphX] fixing 3 typos in the graphx programming guide · 5b8480e0
      Benedikt Linse authored
      Corrected 3 Typos in the GraphX programming guide. I hope this is the correct way to contribute.
      
      Author: Benedikt Linse <benedikt.linse@gmail.com>
      
      Closes #4766 from 1123/master and squashes the following commits:
      
      8a63812 [Benedikt Linse] fixing 3 typos in the graphx programming guide
      5b8480e0
    • prabs's avatar
      [SPARK-5666][streaming][MQTT streaming] some trivial fixes · d51ed263
      prabs authored
      Modified to adhere to accepted coding standards, as pointed out by tdas in PR #3844
      
      Author: prabs <prabsmails@gmail.com>
      Author: Prabeesh K <prabsmails@gmail.com>
      
      Closes #4178 from prabeesh/master and squashes the following commits:
      
      bd2cb49 [Prabeesh K] address the comment
      ccc0765 [prabs] address the comment
      46f9619 [prabs] address the comment
      c035bdc [prabs] address the comment
      22dd7f7 [prabs] address the comments
      0cc67bd [prabs] address the comment
      838c38e [prabs] address the comment
      cd57029 [prabs] address the comments
      66919a3 [Prabeesh K] changed MqttDefaultFilePersistence to MemoryPersistence
      5857989 [prabs] modified to adhere to accepted coding standards
      d51ed263
  2. Feb 24, 2015
    • Davies Liu's avatar
      [SPARK-5994] [SQL] Python DataFrame documentation fixes · d641fbb3
      Davies Liu authored
      * select empty should NOT be the same as select; make sure selectExpr behaves the same way
      * join param documentation
      * link to source doesn't work in the Jekyll-generated file
      * cross reference of columns (i.e. enabling linking)
      * show(): move df example before df.show()
      * move tests in SQLContext out of the docstring, otherwise the doc is too long
      * Column.desc and .asc don't have any documentation
      * in documentation, sort functions.*
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4756 from davies/df_docs and squashes the following commits:
      
      f30502c [Davies Liu] fix doc
      32f0d46 [Davies Liu] fix DataFrame docs
      d641fbb3
    • Yin Huai's avatar
      [SPARK-5286][SQL] SPARK-5286 followup · 769e092b
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-5286
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4755 from yhuai/SPARK-5286-throwable and squashes the following commits:
      
      4c0c450 [Yin Huai] Catch Throwable instead of Exception.
      769e092b
    • Tathagata Das's avatar
      [SPARK-5993][Streaming][Build] Fix assembly jar location of kafka-assembly · 922b43b3
      Tathagata Das authored
      The published kafka-assembly JAR was empty in 1.3.0-RC1.
      This is because the Maven build generated two JARs:
      1. an empty JAR file (since kafka-assembly has no code of its own)
      2. an assembly JAR file containing everything, in a different location from 1
      The Maven publishing plugin uploaded 1 and not 2.
      If 2 is instead not configured to be generated in a different location, there is only one JAR, containing everything, and that JAR gets published.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #4753 from tdas/SPARK-5993 and squashes the following commits:
      
      c390db8 [Tathagata Das] Fix assembly jar location of kafka-assembly
      922b43b3
    • Reynold Xin's avatar
      [SPARK-5985][SQL] DataFrame sortBy -> orderBy in Python. · fba11c2f
      Reynold Xin authored
      Also added desc/asc function for constructing sorting expressions more conveniently. And added a small fix to lift alias out of cast expression.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4752 from rxin/SPARK-5985 and squashes the following commits:
      
      aeda5ae [Reynold Xin] Added Experimental flag to ColumnName.
      047ad03 [Reynold Xin] Lift alias out of cast.
      c9cf17c [Reynold Xin] [SPARK-5985][SQL] DataFrame sortBy -> orderBy in Python.
      fba11c2f
    • Reynold Xin's avatar
      [SPARK-5904][SQL] DataFrame Java API test suites. · 53a1ebf3
      Reynold Xin authored
      Added a new test suite to make sure Java DF programs can use varargs properly.
      Also moved all suites into test.org.apache.spark package to make sure the suites also test for method visibility.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4751 from rxin/df-tests and squashes the following commits:
      
      1e8b8e4 [Reynold Xin] Fixed imports and renamed JavaAPISuite.
      a6ca53b [Reynold Xin] [SPARK-5904][SQL] DataFrame Java API test suites.
      53a1ebf3
    • Cheng Lian's avatar
      [SPARK-5751] [SQL] [WIP] Revamped HiveThriftServer2Suite for robustness · f816e739
      Cheng Lian authored
      **NOTICE** Do NOT merge this, as we're waiting for #3881 to be merged.
      
      `HiveThriftServer2Suite` has been notorious for its flakiness for a while. This was mostly due to spawning and communicating with external server processes. This PR revamps this test suite for better robustness:
      
      1. Fixes a race condition that occurred while using `tail -f` to check the log file
      
         It's possible that the line we are looking for has already been printed into the log file before we start the `tail -f` process. This PR uses `tail -n +0 -f` to ensure all lines are checked.
      
      2. Retries up to 3 times if the server fails to start
      
         In most cases, the server fails to start because of a port conflict. This PR no longer asks the system to choose an available TCP port; instead it picks a random port and retries up to 3 times if the server fails to start.
      
      3. A server instance is reused among all test cases within a single suite
      
         The original `HiveThriftServer2Suite` is split into two test suites, `HiveThriftBinaryServerSuite` and `HiveThriftHttpServerSuite`. Each suite starts a `HiveThriftServer2` instance and reuses it for all of its test cases.
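
The random-port-with-retries strategy from item 2 can be sketched in Python. This is an illustration under assumptions, not the test suite's Scala code: `start_with_retries` and the `start_server` callback are hypothetical, the port range is arbitrary, and "retries up to 3 times" is read here as one initial attempt plus up to three retries.

```python
import random
from typing import Callable, Optional


def start_with_retries(
    start_server: Callable[[int], bool],
    max_retries: int = 3,
    rng: Optional[random.Random] = None,
) -> int:
    """Try random ports until the server starts; return the port that worked."""
    rng = rng or random.Random()
    # One initial attempt plus `max_retries` retries.
    for _ in range(max_retries + 1):
        port = rng.randint(10000, 60000)  # assumed range, not taken from the PR
        if start_server(port):
            return port
    raise RuntimeError(f"server failed to start after {max_retries} retries")
```

Passing the start routine as a callback keeps the retry policy separate from how the server process is actually spawned.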
      
      **TODO**
      
      - [ ] Starts the Thrift server in foreground once #3881 is merged (adding `--foreground` flag to `spark-daemon.sh`)
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4720 from liancheng/revamp-thrift-server-tests and squashes the following commits:
      
      d6c80eb [Cheng Lian] Relaxes server startup timeout
      6f14eb1 [Cheng Lian] Revamped HiveThriftServer2Suite for robustness
      f816e739
    • MechCoder's avatar
      [SPARK-5436] [MLlib] Validate GradientBoostedTrees using runWithValidation · 2a0fe348
      MechCoder authored
      Training can stop early if the decrease in validation error is less than a given tolerance, or if the error increases because the model has overfit the training data.

      This introduces a new method, runWithValidation, which takes a pair of RDDs: one for the training data and the other for validation.
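
The early-stopping rule can be sketched generically in Python. This is not the MLlib implementation: `best_iteration` is a hypothetical helper that takes a precomputed list of per-iteration validation errors, and the exact comparison used by `runWithValidation` may differ.

```python
from typing import Sequence


def best_iteration(validation_errors: Sequence[float], tol: float) -> int:
    """Return how many boosting iterations to keep.

    Stops at the first iteration whose improvement over the best error seen
    so far is smaller than `tol`, which also covers an increase in error.
    """
    best_error = validation_errors[0]
    best_m = 1
    for m, error in enumerate(validation_errors[1:], start=2):
        if best_error - error < tol:
            break  # improvement too small, or the error went up
        if error < best_error:
            best_error = error
            best_m = m
    return best_m
```

For example, with errors `[0.50, 0.40, 0.35, 0.349, 0.30]` and a tolerance of `0.01`, the loop stops at the fourth iteration and keeps the first three.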
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4677 from MechCoder/spark-5436 and squashes the following commits:
      
      1bb21d4 [MechCoder] Combine regression and classification tests into a single one
      e4d799b [MechCoder] Addresses indentation and doc comments
      b48a70f [MechCoder] COSMIT
      b928a19 [MechCoder] Move validation while training section under usage tips
      fad9b6e [MechCoder] Made the following changes 1. Add section to documentation 2. Return corresponding to bestValidationError 3. Allow negative tolerance.
      55e5c3b [MechCoder] One liner for prevValidateError
      3e74372 [MechCoder] TST: Add test for classification
      77549a9 [MechCoder] [SPARK-5436] Validate GradientBoostedTrees using runWithValidation
      2a0fe348
    • Davies Liu's avatar
      [SPARK-5973] [PySpark] fix zip with two RDDs with AutoBatchedSerializer · da505e59
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4745 from davies/fix_zip and squashes the following commits:
      
      2124b2c [Davies Liu] Update tests.py
      b5c828f [Davies Liu] increase the number of records
      c1e40fd [Davies Liu] fix zip with two RDDs with AutoBatchedSerializer
      da505e59
    • Michael Armbrust's avatar
      [SPARK-5952][SQL] Lock when using hive metastore client · a2b91379
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4746 from marmbrus/hiveLock and squashes the following commits:
      
      8b871cf [Michael Armbrust] [SPARK-5952][SQL] Lock when using hive metastore client
      a2b91379
    • Judy's avatar
      [Spark-5708] Add Slf4jSink to Spark Metrics · c5ba975e
      Judy authored
      Add Slf4jSink to Spark Metrics using Coda Hale's Slf4jReporter.
      This sends metrics to log4j, allowing Spark users to reuse their log4j pipeline for metrics collection.

      Reviewed existing unit tests and didn't see any sink-related tests. Please advise on whether tests should be added.
      
      Author: Judy <judynash@microsoft.com>
      Author: judynash <judynash@microsoft.com>
      
      Closes #4644 from judynash/master and squashes the following commits:
      
      57ef214 [judynash] doc clarification and indent fixes
      a751a66 [Judy] Spark-5708: Add Slf4jSink to Spark Metrics
      c5ba975e
    • Xiangrui Meng's avatar
      [MLLIB] Change x_i to y_i in Variance's user guide · 105791e3
      Xiangrui Meng authored
      Variance is calculated on labels/responses.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4740 from mengxr/patch-1 and squashes the following commits:
      
      673317b [Xiangrui Meng] [MLLIB] Change x_i to y_i in Variance's user guide
      105791e3
    • Andrew Or's avatar
      [SPARK-5965] Standalone Worker UI displays {{USER_JAR}} · 6d2caa57
      Andrew Or authored
      For screenshot see: https://issues.apache.org/jira/browse/SPARK-5965
      This was caused by 20a60131.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #4739 from andrewor14/user-jar-blocker and squashes the following commits:
      
      23c4a9e [Andrew Or] Use right argument
      6d2caa57
    • Tathagata Das's avatar
      [Spark-5967] [UI] Correctly clean JobProgressListener.stageIdToActiveJobIds · 64d2c01f
      Tathagata Das authored
      Patch should be self-explanatory
      pwendell JoshRosen
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #4741 from tdas/SPARK-5967 and squashes the following commits:
      
      653b5bb [Tathagata Das] Fixed the fix and added test
      e2de972 [Tathagata Das] Clear stages which have no corresponding active jobs.
      64d2c01f
    • Michael Armbrust's avatar
      [SPARK-5532][SQL] Repartition should not use external rdd representation · 20123662
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4738 from marmbrus/udtRepart and squashes the following commits:
      
      c06d7b5 [Michael Armbrust] fix compilation
      91c8829 [Michael Armbrust] [SQL][SPARK-5532] Repartition should not use external rdd representation
      20123662
    • Michael Armbrust's avatar
      [SPARK-5910][SQL] Support for as in selectExpr · 0a59e45e
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4736 from marmbrus/asExprs and squashes the following commits:
      
      5ba97e4 [Michael Armbrust] [SPARK-5910][SQL] Support for as in selectExpr
      0a59e45e
    • Cheng Lian's avatar
      [SPARK-5968] [SQL] Suppresses ParquetOutputCommitter WARN logs · 84033313
      Cheng Lian authored
      Please refer to the [JIRA ticket] [1] for the motivation.
      
      [1]: https://issues.apache.org/jira/browse/SPARK-5968
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4744 from liancheng/spark-5968 and squashes the following commits:
      
      caac6a8 [Cheng Lian] Suppresses ParquetOutputCommitter WARN logs
      84033313
    • Xiangrui Meng's avatar
      [SPARK-5958][MLLIB][DOC] update block matrix user guide · cf2e4165
      Xiangrui Meng authored
      * Removed SVD code from examples.
      * Corrected Java API doc link.
      * Updated variable names: `AtransposeA` -> `ata`.
      * Minor changes.
      
      brkyvz
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4737 from mengxr/update-block-matrix-user-guide and squashes the following commits:
      
      70f53ac [Xiangrui Meng] update block matrix user guide
      cf2e4165
  3. Feb 23, 2015
    • Michael Armbrust's avatar
      [SPARK-5873][SQL] Allow viewing of partially analyzed plans in queryExecution · 1ed57086
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4684 from marmbrus/explainAnalysis and squashes the following commits:
      
      afbaa19 [Michael Armbrust] fix python
      d93278c [Michael Armbrust] fix hive
      e5fa0a4 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explainAnalysis
      52119f2 [Michael Armbrust] more tests
      82a5431 [Michael Armbrust] fix tests
      25753d2 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explainAnalysis
      aee1e6a [Michael Armbrust] fix hive
      b23a844 [Michael Armbrust] newline
      de8dc51 [Michael Armbrust] more comments
      acf620a [Michael Armbrust] [SPARK-5873][SQL] Show partially analyzed plans in query execution
      1ed57086
    • Yin Huai's avatar
      [SPARK-5935][SQL] Accept MapType in the schema provided to a JSON dataset. · 48376bfe
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5935
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #4710 from yhuai/jsonMapType and squashes the following commits:
      
      3e40390 [Yin Huai] Remove unnecessary changes.
      f8e6267 [Yin Huai] Fix test.
      baa36e3 [Yin Huai] Accept MapType in the schema provided to jsonFile/jsonRDD.
      48376bfe
    • Joseph K. Bradley's avatar
      [SPARK-5912] [docs] [mllib] Small fixes to ChiSqSelector docs · 59536cc8
      Joseph K. Bradley authored
      Fixes:
      * typo in Scala example
      * Removed comment "usually applied on sparse data" since that is debatable
      * small edits to text for clarity
      
      CC: avulanov  I noticed a typo post-hoc and ended up making a few small edits.  Do the changes look OK?
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4732 from jkbradley/chisqselector-docs and squashes the following commits:
      
      9656a3b [Joseph K. Bradley] added Java example for ChiSqSelector to guide
      3f3f9f4 [Joseph K. Bradley] small fixes to ChiSqSelector docs
      59536cc8
    • Alexander Ulanov's avatar
      [MLLIB] SPARK-5912 Programming guide for feature selection · 28ccf5ee
      Alexander Ulanov authored
      Added a description of ChiSqSelector and a few words about feature selection in general. I could add a code example; however, it would not look reasonable in the absence of a feature discretizer or a dataset in the `data` folder that has redundant features.
      
      Author: Alexander Ulanov <nashb@yandex.ru>
      
      Closes #4709 from avulanov/SPARK-5912 and squashes the following commits:
      
      19a8a4e [Alexander Ulanov] Addressing reviewers comments @jkbradley
      58d9e4d [Alexander Ulanov] Addressing reviewers comments @jkbradley
      eb6b9fe [Alexander Ulanov] Typo
      2921a1d [Alexander Ulanov] ChiSqSelector example of use
      c845350 [Alexander Ulanov] ChiSqSelector docs
      28ccf5ee
    • Jacky Li's avatar
      [SPARK-5939][MLLib] make FPGrowth example app take parameters · 651a1c01
      Jacky Li authored
      Add parameter parsing to the FPGrowth example apps in Scala and Java.
      A sample data file is also added to the data/mllib folder.
      
      Author: Jacky Li <jacky.likun@huawei.com>
      
      Closes #4714 from jackylk/parameter and squashes the following commits:
      
      8c478b3 [Jacky Li] fix according to comments
      3bb74f6 [Jacky Li] make FPGrowth example app take parameters
      f0e4d10 [Jacky Li] make FPGrowth example app take parameters
      651a1c01
    • CodingCat's avatar
      [SPARK-5724] fix the misconfiguration in AkkaUtils · 242d4958
      CodingCat authored
      https://issues.apache.org/jira/browse/SPARK-5724
      
      In AkkaUtils, we set several failure-detector-related parameters as follows:
      
      ```
      val akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String])
            .withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString(
            s"""
            |akka.daemonic = on
            |akka.loggers = [""akka.event.slf4j.Slf4jLogger""]
            |akka.stdout-loglevel = "ERROR"
            |akka.jvm-exit-on-fatal-error = off
            |akka.remote.require-cookie = "$requireCookie"
            |akka.remote.secure-cookie = "$secureCookie"
            |akka.remote.transport-failure-detector.heartbeat-interval = $akkaHeartBeatInterval s
            |akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $akkaHeartBeatPauses s
            |akka.remote.transport-failure-detector.threshold = $akkaFailureDetector
            |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
            |akka.remote.netty.tcp.transport-class = "akka.remote.transport.netty.NettyTransport"
            |akka.remote.netty.tcp.hostname = "$host"
            |akka.remote.netty.tcp.port = $port
            |akka.remote.netty.tcp.tcp-nodelay = on
            |akka.remote.netty.tcp.connection-timeout = $akkaTimeout s
            |akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B
            |akka.remote.netty.tcp.execution-pool-size = $akkaThreads
            |akka.actor.default-dispatcher.throughput = $akkaBatchSize
            |akka.log-config-on-start = $logAkkaConfig
            |akka.remote.log-remote-lifecycle-events = $lifecycleEvents
            |akka.log-dead-letters = $lifecycleEvents
            |akka.log-dead-letters-during-shutdown = $lifecycleEvents
            """.stripMargin))
      
      ```
      
      However, Akka has no parameter named "akka.remote.transport-failure-detector.threshold"
      (see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html).
      The parameter that does exist is "akka.remote.watch-failure-detector.threshold".
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #4512 from CodingCat/SPARK-5724 and squashes the following commits:
      
      bafe56e [CodingCat] fix the grammar in configuration doc
      338296e [CodingCat] remove failure-detector related info
      8bfcfd4 [CodingCat] fix the misconfiguration in AkkaUtils
      242d4958
    • Saisai Shao's avatar
      [SPARK-5943][Streaming] Update the test to use new API to reduce the warning · 757b14b8
      Saisai Shao authored
      Author: Saisai Shao <saisai.shao@intel.com>
      
      Closes #4722 from jerryshao/SPARK-5943 and squashes the following commits:
      
      1b01233 [Saisai Shao] Update the test to use new API to reduce the warning
      757b14b8
    • Makoto Fukuhara's avatar
      [EXAMPLES] fix typo. · 93487674
      Makoto Fukuhara authored
      Author: Makoto Fukuhara <fukuo33@gmail.com>
      
      Closes #4724 from fukuo33/fix-typo and squashes the following commits:
      
      8c806b9 [Makoto Fukuhara] fix typo.
      93487674
    • Ilya Ganelin's avatar
      [SPARK-3885] Provide mechanism to remove accumulators once they are no longer used · 95cd643a
      Ilya Ganelin authored
      Instead of storing a strong reference to accumulators, I've replaced this with a weak reference and updated any code that uses these accumulators to check whether the reference resolves before using the accumulator. A weak reference is cleared once no other copy of the variable exists. With a soft reference, by contrast, accumulators would only be cleared when the JVM was running out of memory.
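
The weak-reference pattern can be illustrated with Python's `weakref` module. This is a sketch of the general technique, not Spark's Scala code: `Accumulator` and `AccumulatorRegistry` are hypothetical classes, and the immediate clearing after `del` relies on CPython's reference counting.

```python
import gc
import weakref


class Accumulator:
    """Minimal accumulator; real Spark accumulators carry more state."""

    def __init__(self, acc_id: int, value: int = 0) -> None:
        self.id = acc_id
        self.value = value


class AccumulatorRegistry:
    """Tracks accumulators by id without keeping them alive."""

    def __init__(self) -> None:
        # Entries vanish automatically once the accumulator is collected.
        self._originals = weakref.WeakValueDictionary()

    def register(self, acc: Accumulator) -> None:
        self._originals[acc.id] = acc

    def add(self, acc_id: int, delta: int) -> None:
        # Check that the weak reference still resolves before using it.
        acc = self._originals.get(acc_id)
        if acc is None:
            raise KeyError(f"accumulator {acc_id} has been garbage collected")
        acc.value += delta
```

Raising on a collected accumulator mirrors the squashed commit "Updated to throw Exception when accessing a GCd accumulator"; the exception type used here is an arbitrary choice.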
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #4021 from ilganeli/SPARK-3885 and squashes the following commits:
      
      4ba9575 [Ilya Ganelin]  Fixed error in test suite
      8510943 [Ilya Ganelin] Extra code
      bb76ef0 [Ilya Ganelin] File deleted somehow
      283a333 [Ilya Ganelin] Added cleanup method for accumulators to remove stale references within Accumulators.original to accumulators that are now out of scope
      345fd4f [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885
      7485a82 [Ilya Ganelin] Fixed build error
      c8e0f2b [Ilya Ganelin] Added working test for accumulator garbage collection
      94ce754 [Ilya Ganelin] Still not being properly garbage collected
      8722b63 [Ilya Ganelin] Fixing gc test
      7414a9c [Ilya Ganelin] Added test for accumulator garbage collection
      18d62ec [Ilya Ganelin] Updated to throw Exception when accessing a GCd accumulator
      9a81928 [Ilya Ganelin] Reverting permissions changes
      28f705c [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885
      b820ab4b [Ilya Ganelin] reset
      d78f4bf [Ilya Ganelin] Removed obsolete comment
      0746e61 [Ilya Ganelin] Updated DAGSchedulerSUite to fix bug
      3350852 [Ilya Ganelin] Updated DAGScheduler and Suite to correctly use new implementation of WeakRef Accumulator storage
      c49066a [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885
      cbb9023 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885
      a77d11b [Ilya Ganelin] Updated Accumulators class to store weak references instead of strong references to allow garbage collection of old accumulators
      95cd643a
    • Aaron Josephs's avatar
      [SPARK-911] allow efficient queries for a range if RDD is partitioned wi... · e4f9d03d
      Aaron Josephs authored
      ...th RangePartitioner
      
      Author: Aaron Josephs <ajoseph4@binghamton.edu>
      
      Closes #1381 from aaronjosephs/PLAT-911 and squashes the following commits:
      
      e30ade5 [Aaron Josephs] [SPARK-911] allow efficient queries for a range if RDD is partitioned with RangePartitioner
      e4f9d03d
  4. Feb 22, 2015
  5. Feb 21, 2015
    • Evan Yu's avatar
      [SPARK-5860][CORE] JdbcRDD: overflow on large range with high number of partitions · 7683982f
      Evan Yu authored
      Fix an overflow bug in JdbcRDD when calculating partitions for large BIGINT ids
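
The overflow can be reproduced in a Python sketch that emulates signed 64-bit arithmetic. This is an illustration under assumptions, not the JdbcRDD source: `wrap_int64`, `div_trunc`, and `partition_bounds` are hypothetical helpers, and the bound formula is a plausible reading of how an inclusive id range is split evenly across partitions.

```python
from typing import List, Tuple


def wrap_int64(x: int) -> int:
    """Reduce x to a signed 64-bit value, as JVM Long arithmetic would."""
    return (x + 2**63) % 2**64 - 2**63


def div_trunc(a: int, b: int) -> int:
    """Integer division that truncates toward zero, like the JVM."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def partition_bounds(
    lower: int, upper: int, num_partitions: int, use_big_int: bool
) -> List[Tuple[int, int]]:
    """Split the inclusive range [lower, upper] into contiguous sub-ranges."""
    length = 1 + upper - lower
    bounds = []
    for i in range(num_partitions):
        if use_big_int:
            # Python ints are arbitrary precision, so the products cannot overflow.
            start = lower + (i * length) // num_partitions
            end = lower + ((i + 1) * length) // num_partitions - 1
        else:
            # Emulate 64-bit multiplication that silently wraps around.
            start = lower + div_trunc(wrap_int64(i * length), num_partitions)
            end = lower + div_trunc(wrap_int64((i + 1) * length), num_partitions) - 1
        bounds.append((start, end))
    return bounds
```

With a range as large as `[0, 2**62]` and four partitions, the emulated 64-bit products wrap to negative values and the bounds no longer cover the range, while the arbitrary-precision path stays contiguous.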
      
      Author: Evan Yu <ehotou@gmail.com>
      
      Closes #4701 from hotou/SPARK-5860 and squashes the following commits:
      
      9e038d1 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level
      7883ad9 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level
      c88755a [Evan Yu] [SPARK-5860][CORE] switch to BigInt instead of BigDecimal
      4e9ff4f [Evan Yu] [SPARK-5860][CORE] JdbcRDD overflow on large range with high number of partitions
      7683982f