  1. Feb 18, 2015
    • SPARK-5669 [BUILD] [HOTFIX] Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS · 5aecdcf1
      Sean Owen authored
      SPARK-5669 [BUILD] [HOTFIX] Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS
      
      Correct exclusion path for JBLAS native libs.
      (More explanation coming soon on the mailing list re: 1.3.0 RC1)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4673 from srowen/SPARK-5669.2 and squashes the following commits:
      
      e29693c [Sean Owen] Correct exclusion path for JBLAS native libs
    • [SPARK-4949]shutdownCallback in SparkDeploySchedulerBackend should be enclosed by synchronized block · 82197ed3
      Kousuke Saruta authored
      [SPARK-4949]shutdownCallback in SparkDeploySchedulerBackend should be enclosed by synchronized block.
      
A variable `shutdownCallback` in SparkDeploySchedulerBackend can be accessed from multiple threads, so it should be enclosed by a synchronized block.
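
A minimal sketch of the pattern in plain Scala (this simplified class is illustrative, not the actual SparkDeploySchedulerBackend code):

```scala
// Illustrative sketch only; not the actual SparkDeploySchedulerBackend code.
class Backend {
  private var shutdownCallback: Option[() => Unit] = None

  def setCallback(cb: () => Unit): Unit = synchronized {
    shutdownCallback = Some(cb)
  }

  // Reads happen under the same lock as writes, so no stale or torn state.
  def stop(): Unit = synchronized {
    shutdownCallback.foreach(cb => cb())
  }
}
```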
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3781 from sarutak/SPARK-4949 and squashes the following commits:
      
      c146c93 [Kousuke Saruta] Removed "setShutdownCallback" method
      c7265dc [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4949
      42ca528 [Kousuke Saruta] Changed the declaration of the variable "shutdownCallback" as a volatile reference instead of AtomicReference
      552df7c [Kousuke Saruta] Changed the declaration of the variable "shutdownCallback" as a volatile reference instead of AtomicReference
      f556819 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4949
      1b60fd1 [Kousuke Saruta] Improved the locking logics
      5942765 [Kousuke Saruta] Enclosed shutdownCallback in SparkDeploySchedulerBackend by synchronized block
    • SPARK-4610 addendum: [Minor] [MLlib] Minor doc fix in GBT classification example · e79a7a62
      MechCoder authored
      numClassesForClassification has been renamed to numClasses.
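
For example, the GBT classification setup now reads as follows (a sketch against the MLlib API; `defaultParams` and the `treeStrategy` field are existing API):

```scala
import org.apache.spark.mllib.tree.configuration.BoostingStrategy

val boostingStrategy = BoostingStrategy.defaultParams("Classification")
boostingStrategy.treeStrategy.numClasses = 2 // formerly numClassesForClassification
```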
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4672 from MechCoder/minor-doc and squashes the following commits:
      
      d2ddb7f [MechCoder] Minor doc fix in GBT classification example
    • [SPARK-5878] fix DataFrame.repartition() in Python · c1b6fa98
      Davies Liu authored
      Also add tests for distinct()
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4667 from davies/repartition and squashes the following commits:
      
      79059fd [Davies Liu] add test
      cb4915e [Davies Liu] fix repartition
    • Avoid deprecation warnings in JDBCSuite. · de0dd6de
      Tor Myklebust authored
      This pull request replaces calls to deprecated methods from `java.util.Date` with near-equivalents in `java.util.Calendar`.
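
For reference, a sketch of the kind of replacement this involves, using only the JDK:

```scala
import java.sql.Timestamp
import java.util.{Calendar, GregorianCalendar}

val ts = Timestamp.valueOf("2015-02-17 10:20:30")

// Deprecated since Java 1.1: ts.getHours, ts.getMinutes, ...
// Near-equivalent via Calendar:
val cal = new GregorianCalendar()
cal.setTime(ts)
val hours = cal.get(Calendar.HOUR_OF_DAY)
```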
      
      Author: Tor Myklebust <tmyklebu@gmail.com>
      
      Closes #4668 from tmyklebu/master and squashes the following commits:
      
      66215b1 [Tor Myklebust] Use GregorianCalendar instead of Timestamp get methods.
    • [Minor] [SQL] Cleans up DataFrame variable names and toDF() calls · 61ab0854
      Cheng Lian authored
Although we've migrated to the DataFrame API, lots of code still uses `rdd` or `srdd` as local variable names. This PR tries to address these naming inconsistencies and some other minor DataFrame-related style issues.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4670 from liancheng/df-cleanup and squashes the following commits:
      
      3e14448 [Cheng Lian] Cleans up DataFrame variable names and toDF() calls
    • [SPARK-5731][Streaming][Test] Fix incorrect test in DirectKafkaStreamSuite · 3912d332
      Tathagata Das authored
The test was incorrect: instead of counting the number of records, it counted the number of partitions of the RDD generated by the DStream, which was not its intention. I will be testing this patch multiple times to understand its flakiness.
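
The distinction at play, sketched against a plain RDD (illustrative; assumes `sc: SparkContext` is in scope):

```scala
// rdd.partitions.size counts partitions; rdd.count() counts records.
val rdd = sc.parallelize(1 to 100, numSlices = 4)
assert(rdd.partitions.size == 4)  // partitions, regardless of data volume
assert(rdd.count() == 100)        // actual number of records
```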
      
      PS: This was caused by my refactoring in https://github.com/apache/spark/pull/4384/
      
      koeninger check it out.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #4597 from tdas/kafka-flaky-test and squashes the following commits:
      
      d236235 [Tathagata Das] Unignored last test.
      e9a1820 [Tathagata Das] fix test
  2. Feb 17, 2015
    • [SPARK-5723][SQL]Change the default file format to Parquet for CTAS statements. · e50934f1
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5723
      
      Author: Yin Huai <yhuai@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #4639 from yhuai/defaultCTASFileFormat and squashes the following commits:
      
      a568137 [Yin Huai] Merge remote-tracking branch 'upstream/master' into defaultCTASFileFormat
      ad2b07d [Yin Huai] Update tests and error messages.
      8af5b2a [Yin Huai] Update conf key and unit test.
      5a67903 [Yin Huai] Use data source write path for Hive's CTAS statements when no storage format/handler is specified.
    • [SPARK-5875][SQL]logical.Project should not be resolved if it contains aggregates or generators · d5f12bfe
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-5875 has a case to reproduce the bug and explain the root cause.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4663 from yhuai/projectResolved and squashes the following commits:
      
      472f7b6 [Yin Huai] If a logical.Project has any AggregateExpression or Generator, it's resolved field should be false.
    • [SPARK-4454] Revert getOrElse() cleanup in DAGScheduler.getCacheLocs() · a51fc7ef
      Josh Rosen authored
      This method is performance-sensitive and this change wasn't necessary.
    • [SPARK-4454] Properly synchronize accesses to DAGScheduler cacheLocs map · d46d6246
      Josh Rosen authored
      This patch addresses a race condition in DAGScheduler by properly synchronizing accesses to its `cacheLocs` map.
      
      This map is accessed by the `getCacheLocs` and `clearCacheLocs()` methods, which can be called by separate threads, since DAGScheduler's `getPreferredLocs()` method is called by SparkContext and indirectly calls `getCacheLocs()`.  If this map is cleared by the DAGScheduler event processing thread while a user thread is submitting a job and computing preferred locations, then this can cause the user thread to throw "NoSuchElementException: key not found" errors.
      
      Most accesses to DAGScheduler's internal state do not need synchronization because that state is only accessed from the event processing loop's thread.  An alternative approach to fixing this bug would be to refactor this code so that SparkContext sends the DAGScheduler a message in order to get the list of preferred locations.  However, this would involve more extensive changes to this code and would be significantly harder to backport to maintenance branches since some of the related code has undergone significant refactoring (e.g. the introduction of EventLoop).  Since `cacheLocs` is the only state that's accessed in this way, adding simple synchronization seems like a better short-term fix.
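
A minimal sketch of this short-term fix (simplified; DAGScheduler's real map is keyed by RDD id and holds task locations):

```scala
import scala.collection.mutable.HashMap

class Scheduler {
  // Guarded by `this`: touched by both the event loop and user threads.
  private val cacheLocs = new HashMap[Int, Seq[String]]

  def getCacheLocs(rddId: Int): Seq[String] = synchronized {
    cacheLocs.getOrElseUpdate(rddId, Seq.empty)
  }

  def clearCacheLocs(): Unit = synchronized {
    cacheLocs.clear()
  }
}
```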
      
      See #3345 for additional context.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4660 from JoshRosen/SPARK-4454 and squashes the following commits:
      
      12d64ba [Josh Rosen] Properly synchronize accesses to DAGScheduler cacheLocs map.
    • [SPARK-5811] Added documentation for maven coordinates and added Spark Packages support · ae6cfb3a
      Burak Yavuz authored
      Documentation for maven coordinates + Spark Package support. Added pyspark tests for `--packages`
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4662 from brkyvz/SPARK-5811 and squashes the following commits:
      
      56ccccd [Burak Yavuz] fixed broken test
      64cb8ee [Burak Yavuz] passed pep8 on local
      c07b81e [Burak Yavuz] fixed pep8
      a8bd6b7 [Burak Yavuz] submit PR
      4ef4046 [Burak Yavuz] ready for PR
      8fb02e5 [Burak Yavuz] merged master
      25c9b9f [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into python-jar
      560d13b [Burak Yavuz] before PR
      17d3f76 [Davies Liu] support .jar as python package
      a3eb717 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5811
      c60156d [Burak Yavuz] [SPARK-5811] Added documentation for maven coordinates
    • [SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark · c3d2b90b
      Davies Liu authored
Currently, PySpark does not support narrow dependencies during cogroup/join when the two RDDs have the same partitioner, so an unnecessary extra shuffle stage comes in.

The Python implementation of cogroup/join is different from the Scala one: it depends on union() and partitionBy(). This patch tries to use PartitionerAwareUnionRDD() in union() when all the RDDs have the same partitioner. It also fixes `reservePartitioner` in all the map() and mapPartitions() calls, so that partitionBy() can skip the unnecessary shuffle stage.
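
For comparison, the Scala API already behaves this way: co-partitioned RDDs join without an extra shuffle. A sketch, assuming `sc: SparkContext` is in scope:

```scala
import org.apache.spark.HashPartitioner

val part = new HashPartitioner(4)
val a = sc.parallelize(Seq(1 -> "a", 2 -> "b")).partitionBy(part)
val b = sc.parallelize(Seq(1 -> "x", 2 -> "y")).partitionBy(part)

// Both sides share the same partitioner, so the join is a narrow
// dependency and no extra shuffle stage is scheduled.
val joined = a.join(b)
```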
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4629 from davies/narrow and squashes the following commits:
      
      dffe34e [Davies Liu] improve test, check number of stages for join/cogroup
      1ed3ba2 [Davies Liu] Merge branch 'master' of github.com:apache/spark into narrow
      4d29932 [Davies Liu] address comment
      cc28d97 [Davies Liu] add unit tests
      940245e [Davies Liu] address comments
      ff5a0a6 [Davies Liu] skip the partitionBy() on Python side
      eb26c62 [Davies Liu] narrow dependency in PySpark
    • [SPARK-5852][SQL]Fail to convert a newly created empty metastore parquet table to a data source parquet table · 117121a4
      Yin Huai authored
      [SPARK-5852][SQL]Fail to convert a newly created empty metastore parquet table to a data source parquet table.
      
The problem is that after we create an empty Hive metastore parquet table (e.g. `CREATE TABLE test (a int) STORED AS PARQUET`), Hive will create an empty dir for us, which causes our data source `ParquetRelation2` to fail to get the schema of the table. See the JIRA for the case to reproduce the bug and the exception.
      
      This PR is based on #4562 from chenghao-intel.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-5852
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4655 from yhuai/CTASParquet and squashes the following commits:
      
      b8b3450 [Yin Huai] Update tests.
      2ac94f7 [Yin Huai] Update tests.
      3db3d20 [Yin Huai] Minor update.
      d7e2308 [Yin Huai] Revert changes in HiveMetastoreCatalog.scala.
      36978d1 [Cheng Hao] Update the code as feedback
      a04930b [Cheng Hao] fix bug of scan an empty parquet based table
      442ffe0 [Cheng Hao] passdown the schema for Parquet File in HiveContext
    • [SPARK-5872] [SQL] create a sqlCtx in pyspark shell · 4d4cc760
      Davies Liu authored
The sqlCtx will be a HiveContext if Hive is built into the assembly jar, or a SQLContext if not.

It also skips the Hive tests in pyspark.sql.tests if Hive is not available.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4659 from davies/sqlctx and squashes the following commits:
      
      0e6629a [Davies Liu] sqlCtx in pyspark
    • [SPARK-5871] output explain in Python · 3df85dcc
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4658 from davies/explain and squashes the following commits:
      
      db87ea2 [Davies Liu] output explain in Python
    • [SPARK-4172] [PySpark] Progress API in Python · 445a755b
      Davies Liu authored
This patch brings the pull-based progress API into Python, along with an example in Python.
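
On the Scala side the pull-based API is `SparkContext.statusTracker`; this commit exposes a similar interface to Python. A Scala sketch, assuming `sc: SparkContext` is in scope:

```scala
val tracker = sc.statusTracker
// Poll active jobs and print their current status.
for (jobId <- tracker.getActiveJobIds(); info <- tracker.getJobInfo(jobId)) {
  println(s"job $jobId: ${info.status}")
}
```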
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3027 from davies/progress_api and squashes the following commits:
      
      b1ba984 [Davies Liu] fix style
      d3b9253 [Davies Liu] add tests, mute the exception after stop
      4297327 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      969fa9d [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      25590c9 [Davies Liu] update with Java API
      360de2d [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      c0f1021 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      023afb3 [Davies Liu] add Python API and example for progress API
    • [SPARK-5868][SQL] Fix python UDFs in HiveContext and checks in SQLContext · de4836f8
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4657 from marmbrus/pythonUdfs and squashes the following commits:
      
      a7823a8 [Michael Armbrust] [SPARK-5868][SQL] Fix python UDFs in HiveContext and checks in SQLContext
    • [SQL] [Minor] Update the HiveContext Unittest · 9d281fa5
      Cheng Hao authored
In the unit tests, the table src(key INT, value STRING) is not the same as Hive's src(key STRING, value STRING):
https://github.com/apache/hive/blob/branch-0.13/data/scripts/q_test_init.sql

And in reflect.q, the test failed for the expression `reflect("java.lang.Integer", "valueOf", key, 16)`, which expects the argument `key` to be a STRING, not an INT.

This PR doesn't aim to change the `src` schema; we can do that after 1.3 is released. However, we would probably need to re-generate all the golden files.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4584 from chenghao-intel/reflect and squashes the following commits:
      
      e5bdc3a [Cheng Hao] Move the test case reflect into blacklist
      184abfd [Cheng Hao] revert the change to table src1
      d9bcf92 [Cheng Hao] Update the HiveContext Unittest
    • [Minor][SQL] Use same function to check path parameter in JSONRelation · ac506b7c
      Liang-Chi Hsieh authored
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4649 from viirya/use_checkpath and squashes the following commits:
      
      0f9a1a1 [Liang-Chi Hsieh] Use same function to check path parameter.
    • [SPARK-5862][SQL] Only transformUp the given plan once in HiveMetastoreCatalog · 4611de1c
      Liang-Chi Hsieh authored
The current `ParquetConversions` rule in `HiveMetastoreCatalog` will transformUp the given plan multiple times if there are many Metastore Parquet tables. Since the transformUp operation is recursive, it is better to perform it only once.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4651 from viirya/parquet_atonce and squashes the following commits:
      
      c1ed29d [Liang-Chi Hsieh] Fix bug.
      e0f919b [Liang-Chi Hsieh] Only transformUp the given plan once.
    • [Minor] fix typo in SQL document · 31efb39c
      CodingCat authored
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #4656 from CodingCat/fix_typo and squashes the following commits:
      
      b41d15c [CodingCat] recover
      689fe46 [CodingCat] fix typo
    • [SPARK-5864] [PySpark] support .jar as python package · fc4eb950
      Davies Liu authored
A jar file containing Python sources in it can be used as a Python package, just like a zip file.

spark-submit already puts the jar file on the PYTHONPATH; this patch also puts it on sys.path, so it can be used in the Python worker.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4652 from davies/jar and squashes the following commits:
      
      17d3f76 [Davies Liu] support .jar as python package
    • SPARK-5841 [CORE] [HOTFIX] Memory leak in DiskBlockManager · 49c19fdb
      Sean Owen authored
Avoid the call to remove a shutdown hook being made from within a shutdown hook.
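
The underlying trap, sketched with plain JVM APIs (the object below is hypothetical, not DiskBlockManager's actual code): `removeShutdownHook` throws `IllegalStateException` once shutdown has begun, so calling it from a hook must be avoided or guarded.

```scala
object Hooks {
  private val hook = new Thread(new Runnable {
    override def run(): Unit = println("cleaning up")
  })
  Runtime.getRuntime.addShutdownHook(hook)

  def remove(): Unit =
    try Runtime.getRuntime.removeShutdownHook(hook)
    catch { case _: IllegalStateException => () } // JVM already shutting down
}
```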
      
      CC pwendell JoshRosen MattWhelan
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4648 from srowen/SPARK-5841.2 and squashes the following commits:
      
      51548db [Sean Owen] Avoid call to remove shutdown hook being called from shutdown hook
    • MAINTENANCE: Automated closing of pull requests. · 24f358b9
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #3297 (close requested by 'andrewor14')
      Closes #3345 (close requested by 'pwendell')
      Closes #2729 (close requested by 'srowen')
      Closes #2320 (close requested by 'pwendell')
      Closes #4529 (close requested by 'andrewor14')
      Closes #2098 (close requested by 'srowen')
      Closes #4120 (close requested by 'andrewor14')
    • [SPARK-3381] [MLlib] Eliminate bins for unordered features in DecisionTrees · 9b746f38
      MechCoder authored
For unordered features, it is sufficient to use splits, since the threshold of the split corresponds to the threshold of the HighSplit of the bin and there is no use for the LowSplit.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4231 from MechCoder/spark-3381 and squashes the following commits:
      
      58c19a5 [MechCoder] COSMIT
      c274b74 [MechCoder] Remove unordered feature calculation in labeledPointToTreePoint
      b2b9b89 [MechCoder] COSMIT
      d3ee042 [MechCoder] [SPARK-3381] [MLlib] Eliminate bins for unordered features
    • [SPARK-5661]function hasShutdownDeleteTachyonDir should use shutdownDeleteTachyonPaths to determine whether contains file · b271c265
      xukun 00228947 authored
      [SPARK-5661]function hasShutdownDeleteTachyonDir should use shutdownDeleteTachyonPaths to determine whether contains file
      
hasShutdownDeleteTachyonDir(file: TachyonFile) should use shutdownDeleteTachyonPaths (not shutdownDeletePaths) to determine whether it contains the file. To fix this, two unused functions are also deleted.
      
      Author: xukun 00228947 <xukun.xu@huawei.com>
      Author: viper-kun <xukun.xu@huawei.com>
      
      Closes #4418 from viper-kun/deleteunusedfun and squashes the following commits:
      
      87340eb [viper-kun] fix style
      3d6c69e [xukun 00228947] fix bug
      2bc397e [xukun 00228947] deleteunusedfun
    • [SPARK-5778] throw if nonexistent metrics config file provided · d8f69cf7
      Ryan Williams authored
The previous behavior was to log an error. This is fine in the general case, where no `spark.metrics.conf` parameter was specified: a default `metrics.properties` is looked for, and the exception is logged and suppressed if it doesn't exist.

If the user has purposefully specified a metrics.conf file, however, it makes more sense to show them an error when said file doesn't exist.
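
For context, the property in question is set like this (the file path below is illustrative); after this change, a nonexistent explicitly-specified file fails fast instead of being logged and ignored:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.metrics.conf", "/etc/spark/metrics.properties") // illustrative path
```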
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #4571 from ryan-williams/metrics and squashes the following commits:
      
      5bccb14 [Ryan Williams] private-ize some MetricsConfig members
      08ff998 [Ryan Williams] rename METRICS_CONF: DEFAULT_METRICS_CONF_FILENAME
      f4d7fab [Ryan Williams] fix tests
      ad24b0e [Ryan Williams] add "metrics.properties" to .rat-excludes
      94e810b [Ryan Williams] throw if nonexistent Sink class is specified
      31d2c30 [Ryan Williams] metrics code review feedback
      56287db [Ryan Williams] throw if nonexistent metrics config file provided
    • [SPARK-5859] [PySpark] [SQL] fix DataFrame Python API · d8adefef
      Davies Liu authored
1. add explain()
2. add isLocal()
3. do not call show() in __repr__
4. add foreach() and foreachPartition()
5. add distinct()
6. fix functions.col()/column()/lit()
7. fix unit tests in sql/functions.py
8. fix unicode in showString()
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4645 from davies/df6 and squashes the following commits:
      
      6b46a2c [Davies Liu] fix DataFrame Python API
    • [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation · c74b07fa
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4642 from marmbrus/docs and squashes the following commits:
      
      d291c34 [Michael Armbrust] python tests
      9be66e3 [Michael Armbrust] comments
      d56afc2 [Michael Armbrust] fix style
      f004747 [Michael Armbrust] fix build
      c4a907b [Michael Armbrust] fix tests
      42e2b73 [Michael Armbrust] [SQL] Documentation / API Clean-up.
    • [SPARK-5858][MLLIB] Remove unnecessary first() call in GLM · c76da36c
      Xiangrui Meng authored
`numFeatures` is only used by multinomial logistic regression. Calling `.first()` for every GLM causes a performance regression, especially in Python.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4647 from mengxr/SPARK-5858 and squashes the following commits:
      
      036dc7f [Xiangrui Meng] remove unnecessary first() call
      12c5548 [Xiangrui Meng] check numFeatures only once
    • SPARK-5856: In Maven build script, launch Zinc with more memory · 3ce46e94
      Patrick Wendell authored
      I've seen out of memory exceptions when trying
      to run many parallel builds against the same Zinc
      server during packaging. We should use the same
      increased memory settings we use for Maven itself.
      
      I tested this and confirmed that the Nailgun JVM
      launched with higher memory.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4643 from pwendell/zinc-memory and squashes the following commits:
      
      717cfb0 [Patrick Wendell] SPARK-5856: Launch Zinc with larger memory options.
    • Revert "[SPARK-5363] [PySpark] check ending mark in non-block way" · ee6e3eff
      Josh Rosen authored
      This reverts commits ac6fe67e and c06e42f2.
    • [SPARK-5826][Streaming] Fix Configuration not serializable problem · a65766bf
      jerryshao authored
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #4612 from jerryshao/SPARK-5826 and squashes the following commits:
      
      7ec71db [jerryshao] Remove transient for conf statement
      88d84e6 [jerryshao] Fix Configuration not serializable problem
    • HOTFIX: Style issue causing build break · c06e42f2
      Patrick Wendell authored
      Caused by #4601
    • [SPARK-5802][MLLIB] cache transformed data in glm · fd84229e
      Xiangrui Meng authored
      If we need to transform the input data, we should cache the output to avoid re-computing feature vectors every iteration. dbtsai
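
The general pattern, as a minimal sketch (the toy transformation below is a stand-in for MLlib's real feature scaling; assumes `sc: SparkContext` is in scope):

```scala
val input = sc.parallelize(Seq(1.0, 2.0, 3.0))
// Cache the transformed data once; every iteration below scans it, and
// without cache() the map would be recomputed on each pass.
val transformed = input.map(x => x / 3.0).cache()

var weight = 0.0
for (_ <- 1 to 10) {
  weight += transformed.sum() / 10
}
```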
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4593 from mengxr/SPARK-5802 and squashes the following commits:
      
      ae3be84 [Xiangrui Meng] cache transformed data in glm
  3. Feb 16, 2015
    • [SPARK-5853][SQL] Schema support in Row. · d380f324
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4640 from rxin/SPARK-5853 and squashes the following commits:
      
      9c6f569 [Reynold Xin] [SPARK-5853][SQL] Schema support in Row.
    • SPARK-5850: Remove experimental label for Scala 2.11 and FlumePollingStream · a51d51ff
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4638 from pwendell/SPARK-5850 and squashes the following commits:
      
      386126f [Patrick Wendell] SPARK-5850: Remove experimental label for Scala 2.11 and FlumePollingStream.
    • [SPARK-5363] [PySpark] check ending mark in non-block way · ac6fe67e
      Davies Liu authored
There is a chance of deadlock in which the Python process is waiting for the ending mark from the JVM, but that mark has been eaten by a corrupted stream.

This PR checks the ending mark from Python in a non-blocking way, so the JVM will not be blocked by the Python process.

There is a small chance that the ending mark has been sent by the Python process but is not available right away; in that case the Python process will not be reused.
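
A sketch of the non-blocking idea with plain java.io (PythonRDD's actual implementation differs):

```scala
import java.io.DataInputStream

// Only read the 4-byte end marker if it is already buffered; otherwise
// return None instead of blocking on a possibly-corrupted stream.
def tryReadEndMark(in: DataInputStream): Option[Int] =
  if (in.available() >= 4) Some(in.readInt()) else None
```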
      
      cc JoshRosen pwendell
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4601 from davies/freeze and squashes the following commits:
      
      e15a8c3 [Davies Liu] update logging
      890329c [Davies Liu] Merge branch 'freeze' of github.com:davies/spark into freeze
      2bd2228 [Davies Liu] add more logging
      656d544 [Davies Liu] Update PythonRDD.scala
      05e1085 [Davies Liu] check ending mark in non-block way
    • [SQL] Various DataFrame doc changes. · 0e180bfc
      Reynold Xin authored
      Added a bunch of tags.
      
      Also changed parquetFile to take varargs rather than a string followed by varargs.
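
The signature shapes below are inferred from this description rather than quoted from the diff; plain varargs lets callers splat a sequence of paths directly:

```scala
// Inferred shapes (not quoted from the diff):
//   before: def parquetFile(path: String, paths: String*): DataFrame
//   after:  def parquetFile(paths: String*): DataFrame
// Assuming sqlContext is in scope (e.g. in the shell):
val paths = Seq("/data/part1.parquet", "/data/part2.parquet") // illustrative paths
val df = sqlContext.parquetFile(paths: _*)
```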
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4636 from rxin/df-doc and squashes the following commits:
      
      651f80c [Reynold Xin] Fixed parquetFile in PySpark.
      8dc3024 [Reynold Xin] [SQL] Various DataFrame doc changes.