  1. Nov 24, 2015
  2. Nov 23, 2015
    • Stephen Samuel's avatar
      Updated sql programming guide to include jdbc fetch size · 026ea2ea
      Stephen Samuel authored
      Author: Stephen Samuel <sam@sksamuel.com>
      
      Closes #9377 from sksamuel/master.
      026ea2ea
    • Bryan Cutler's avatar
      [SPARK-10560][PYSPARK][MLLIB][DOCS] Make StreamingLogisticRegressionWithSGD Python API equal to Scala one · 10574564
      Bryan Cutler authored
      This brings the API documentation of StreamingLogisticRegressionWithSGD and StreamingLinearRegressionWithSGD in line with the Scala versions.
      
      -Fixed the algorithm descriptions
      -Added default values to parameter descriptions
      -Changed StreamingLogisticRegressionWithSGD regParam to default to 0, as in the Scala version
      
      Author: Bryan Cutler <bjcutler@us.ibm.com>
      
      Closes #9141 from BryanCutler/StreamingLogisticRegressionWithSGD-python-api-sync.
      10574564
    • Josh Rosen's avatar
      [SPARK-9866][SQL] Speed up VersionsSuite by using persistent Ivy cache · 9db5f601
      Josh Rosen authored
      This patch attempts to speed up VersionsSuite by storing fetched Hive JARs in an Ivy cache that persists across test runs. If `SPARK_VERSIONS_SUITE_IVY_PATH` is set, that path will be used for the cache; if it is not set, VersionsSuite will create a temporary Ivy cache which is deleted after the test completes.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #9624 from JoshRosen/SPARK-9866.
      9db5f601
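The cache-location rule described in this commit can be sketched in Python as follows. This is only an illustration of the "use the environment variable if set, otherwise a temporary directory that is cleaned up" pattern; the helper name `ivy_cache_dir` is hypothetical, and the actual change lives in Spark's Scala test code, which is not reproduced here.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def ivy_cache_dir(env=os.environ):
    """Yield a cache directory following the rule in the commit message.

    Hypothetical helper: only the environment variable name comes from
    the commit; everything else is an assumption for illustration.
    """
    path = env.get("SPARK_VERSIONS_SUITE_IVY_PATH")
    if path:
        # Persistent cache: the caller-supplied path is left in place
        # so that later test runs can reuse the downloaded JARs.
        yield path
    else:
        # No path configured: fall back to a throwaway directory and
        # delete it once the test is done.
        tmp = tempfile.mkdtemp(prefix="ivy-cache-")
        try:
            yield tmp
        finally:
            shutil.rmtree(tmp, ignore_errors=True)
```

Under these assumptions, a test would wrap its body in `with ivy_cache_dir() as cache:` and pass `cache` to whatever resolves the JARs.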
    • Marcelo Vanzin's avatar
      [SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv. · c2467dad
      Marcelo Vanzin authored
      This change abstracts the code that serves jars / files to executors so that
      each RpcEnv can have its own implementation; the akka version uses the existing
      HTTP-based file serving mechanism, while the netty version uses the new
      stream support added to the network lib, which makes file transfers benefit
      from the easier security configuration of the network library, and should also
      reduce overhead overall.
      
      The change includes a small fix to TransportChannelHandler so that it propagates
      user events to downstream handlers.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9530 from vanzin/SPARK-11140.
      c2467dad
    • Marcelo Vanzin's avatar
      [SPARK-11865][NETWORK] Avoid returning inactive client in TransportClientFactory. · 7cfa4c6b
      Marcelo Vanzin authored
      There's a very narrow race here where it would be possible for the timeout handler
      to close a channel after the client factory verified that the channel was still
      active. This change makes sure the client is marked as being recently in use so
      that the timeout handler does not close it until a new timeout cycle elapses.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9853 from vanzin/SPARK-11865.
      7cfa4c6b
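The idea of "mark the client as recently used before handing it out" can be sketched in Python as follows. This is a simplified model under stated assumptions, not Spark's `TransportClientFactory` code: the `Client` class, `get_client` function, and the shared lock are all hypothetical names chosen for illustration.

```python
import threading
import time


class Client:
    """Hypothetical stand-in for a cached network client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self.active = True
        self.last_used = clock()

    def touch(self):
        # Record that the client is in use; shares a lock with the
        # idle check so the two cannot interleave.
        with self._lock:
            self.last_used = self._clock()

    def close_if_idle(self, timeout):
        # Called by the timeout handler. Returns True if the client
        # is closed after this call.
        with self._lock:
            if self.active and self._clock() - self.last_used >= timeout:
                self.active = False
            return not self.active


def get_client(cache, key, clock=time.monotonic):
    client = cache.get(key)
    if client is not None:
        # Touch first, then check: if the timeout handler ran before
        # the touch, we see active == False; if it runs afterwards, it
        # sees a fresh timestamp and waits another timeout cycle.
        client.touch()
        if client.active:
            return client
    client = Client(clock)
    cache[key] = client
    return client
```

The ordering matters in this sketch: checking `active` before touching would leave the window the commit message describes, where the handler closes the channel just after the check.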
    • Luciano Resende's avatar
      [SPARK-11910][STREAMING][DOCS] Update twitter4j dependency version · 242be7da
      Luciano Resende authored
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #9892 from lresende/SPARK-11910.
      242be7da
    • Davies Liu's avatar
      [SPARK-11836][SQL] udf/cast should not create new SQLContext · 1d912020
      Davies Liu authored
      They should use the existing SQLContext.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9914 from davies/create_udf.
      1d912020
    • Josh Rosen's avatar
      [SPARK-4424] Remove spark.driver.allowMultipleContexts override in tests · 1b6e938b
      Josh Rosen authored
      This patch removes `spark.driver.allowMultipleContexts=true` from our test configuration. The multiple SparkContexts check was originally disabled because certain test suites in SQL needed to create multiple contexts. As far as I know, this configuration change is no longer necessary, so we should remove it in order to make it easier to find test cleanup bugs.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #9865 from JoshRosen/SPARK-4424.
      1b6e938b
    • Mortada Mehyar's avatar
      [SPARK-11837][EC2] python3 compatibility for launching ec2 m3 instances · f6dcc6e9
      Mortada Mehyar authored
      This currently breaks on Python 3 because the `string` module no longer has `letters`; `ascii_letters` should be used instead.
      
      Author: Mortada Mehyar <mortada.mehyar@gmail.com>
      
      Closes #9797 from mortada/python3_fix.
      f6dcc6e9
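The incompatibility behind this fix is easy to reproduce. The snippet below is a minimal sketch, assuming a hypothetical helper `random_suffix`; how the EC2 script actually uses the letters is not shown in the commit message, so the helper is illustrative only.

```python
import random
import string


def random_suffix(length=8, rng=random):
    """Build a random alphabetic suffix (hypothetical helper).

    `string.letters` existed only in Python 2, whereas
    `string.ascii_letters` is available in both Python 2 and 3, so it
    is the portable choice.
    """
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))
```

On Python 3, accessing `string.letters` raises `AttributeError`, which matches the breakage the commit describes.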
    • Yanbo Liang's avatar
      [SPARK-11920][ML][DOC] ML LinearRegression should use correct dataset in examples and user guide doc · 98d7ec7d
      Yanbo Liang authored
      ML ```LinearRegression``` uses ```data/mllib/sample_libsvm_data.txt``` as the dataset in the examples and user guide doc, but it is actually a classification dataset rather than a regression dataset. We should use ```data/mllib/sample_linear_regression_data.txt``` instead.
      The deeper cause is that ```LinearRegression``` with the "normal" solver cannot solve this dataset correctly, possibly because it is ill-conditioned and has unreasonable labels. This issue has been reported at [SPARK-11918](https://issues.apache.org/jira/browse/SPARK-11918).
      Users will be confused if they run the example code and get an exception, so we should make this change, which clearly illustrates the usage of the ```LinearRegression``` algorithm.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9905 from yanboliang/spark-11920.
      98d7ec7d
    • Marcelo Vanzin's avatar
      [SPARK-11762][NETWORK] Account for active streams when counting outstanding requests. · 5231cd5a
      Marcelo Vanzin authored
      This way the timeout handling code can correctly close "hung" channels that are
      processing streams.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9747 from vanzin/SPARK-11762.
      5231cd5a
    • jerryshao's avatar
      [SPARK-7173][YARN] Add label expression support for application master · 5fd86e4f
      jerryshao authored
      Add label expression support for the AM so that it can be restricted to run on a specific set of nodes. I tested it locally and it works fine.
      
      sryza and vanzin, please help review. Thanks a lot.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #9800 from jerryshao/SPARK-7173.
      5fd86e4f
    • Wenchen Fan's avatar
      [SPARK-11913][SQL] support typed aggregate with complex buffer schema · 946b4065
      Wenchen Fan authored
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #9898 from cloud-fan/agg.
      946b4065
    • Wenchen Fan's avatar
      [SPARK-11921][SQL] fix `nullable` of encoder schema · f2996e0d
      Wenchen Fan authored
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #9906 from cloud-fan/nullable.
      f2996e0d
    • Wenchen Fan's avatar
      [SPARK-11894][SQL] fix isNull for GetInternalRowField · 1a5baaa6
      Wenchen Fan authored
      We should use `InternalRow.isNullAt` to check whether the field is null before calling `InternalRow.getXXX`.
      
      Thanks to gatorsmile, who discovered this bug.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #9904 from cloud-fan/null.
      1a5baaa6
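The "check for null before reading a typed field" rule can be illustrated with a small Python model. This is not Spark's `InternalRow`; the `PrimitiveRow` class and `read_int` function are hypothetical, and the sketch only assumes that a typed getter on a null slot returns a meaningless placeholder rather than signalling null.

```python
class PrimitiveRow:
    """Hypothetical row that stores ints next to a separate null mask."""

    def __init__(self, values):
        self._nulls = [v is None for v in values]
        # Null slots still hold a placeholder, so get_int() alone
        # cannot tell a real 0 from a missing value.
        self._ints = [0 if v is None else v for v in values]

    def is_null_at(self, i):
        return self._nulls[i]

    def get_int(self, i):
        # Result is meaningless when is_null_at(i) is True.
        return self._ints[i]


def read_int(row, i):
    # Consult the null mask first, then call the typed getter.
    return None if row.is_null_at(i) else row.get_int(i)
```

In this model, skipping the null check would silently turn a missing value into `0`, which is the kind of error the check guards against.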
    • Xiu Guo's avatar
      [SPARK-11628][SQL] support column datatype of char(x) to recognize HiveChar · 94ce65df
      Xiu Guo authored
      Can someone review my code to make sure I'm not missing anything? Thanks!
      
      Author: Xiu Guo <xguo27@gmail.com>
      Author: Xiu Guo <guoxi@us.ibm.com>
      
      Closes #9612 from xguo27/SPARK-11628.
      94ce65df