  1. Jul 15, 2014
    • Update README.md to include a slightly more informative project description. · 8f1d4226
      Reynold Xin authored
      
      (cherry picked from commit 401083be9f010f95110a819a49837ecae7d9c4ec)
      Signed-off-by: Reynold Xin <rxin@apache.org>
    • [SPARK-2477][MLlib] Using appendBias for adding intercept in GeneralizedLinearAlgorithm · 52beb20f
      DB Tsai authored
      Instead of using prependOne, as GeneralizedLinearAlgorithm currently does, we would like to use appendBias in order to 1) keep the indices of the original training set unchanged, by adding the intercept as the last element of the vector, and 2) use the same public API consistently for adding an intercept.
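
      A minimal usage sketch, assuming the `MLUtils.appendBias` helper this PR switches to:

      ```scala
      import org.apache.spark.mllib.linalg.Vectors
      import org.apache.spark.mllib.util.MLUtils

      // appendBias adds the bias/intercept term as the LAST element,
      // leaving the indices of the original features unchanged.
      val features = Vectors.dense(0.5, -1.2)
      val withBias = MLUtils.appendBias(features)
      println(withBias)  // [0.5,-1.2,1.0]
      ```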
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #1410 from dbtsai/SPARK-2477_intercept_with_appendBias and squashes the following commits:
      
      011432c [DB Tsai] From Alpine Data Labs
    • [SPARK-2399] Add support for LZ4 compression. · dd95abad
      Reynold Xin authored
      Based on Greg Bowyer's patch from JIRA https://issues.apache.org/jira/browse/SPARK-2399
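
      A hedged usage sketch (the codec is selected by class name, following the existing Snappy/LZF pattern):

      ```scala
      import org.apache.spark.SparkConf

      // Opt in to LZ4 for block compression (shuffle, broadcast, RDD storage).
      val conf = new SparkConf()
        .set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
      ```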
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1416 from rxin/lz4 and squashes the following commits:
      
      6c8fefe [Reynold Xin] Fixed typo.
      8a14d38 [Reynold Xin] [SPARK-2399] Add support for LZ4 compression.
    • discarded exceeded completedDrivers · 7446f5ff
      lianhuiwang authored
      When the number of completedDrivers exceeds the threshold, the first max(spark.deploy.retainedDrivers, 1) entries will be discarded.
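
      A minimal sketch of the retention idea (types simplified; the exact trim count is a detail of the PR):

      ```scala
      import scala.collection.mutable.ArrayBuffer

      val retainedDrivers = 200  // spark.deploy.retainedDrivers
      val completedDrivers = ArrayBuffer.empty[String]  // driver ids, oldest first

      def retire(driverId: String): Unit = {
        if (completedDrivers.size >= retainedDrivers) {
          completedDrivers.trimStart(1)  // discard the oldest completed driver
        }
        completedDrivers += driverId
      }
      ```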
      
      Author: lianhuiwang <lianhuiwang09@gmail.com>
      
      Closes #1114 from lianhuiwang/retained-drivers and squashes the following commits:
      
      8789418 [lianhuiwang] discarded exceeded completedDrivers
    • [SPARK-2485][SQL] Lock usage of hive client. · c7c7ac83
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1412 from marmbrus/lockHiveClient and squashes the following commits:
      
      4bc9d5a [Michael Armbrust] protected[hive]
      22e9177 [Michael Armbrust] Add comments.
      7aa8554 [Michael Armbrust] Don't lock on hive's object.
      a6edc5f [Michael Armbrust] Lock usage of hive client.
    • [SPARK-2390] Files in staging directory cannot be deleted and wastes the space of HDFS · c6d75745
      Kousuke Saruta authored
      When running jobs in YARN cluster mode with the HistoryServer, the files in the staging directory (~/.sparkStaging on HDFS) cannot be deleted.
      The HistoryServer uses the directory where the event log is written, and that directory is represented as an instance of o.a.h.f.FileSystem created via FileSystem.get.

      On the other hand, ApplicationMaster has an instance named fs, also created via FileSystem.get.

      FileSystem.get returns the same cached instance when the URI passed to it represents the same file system and the method is called by the same user.
      Because of this behavior, when the event log directory is on HDFS, the fs of ApplicationMaster and the fileSystem of FileLogger are the same instance.
      When ApplicationMaster shuts down, fileSystem.close is called in FileLogger#stop, which is invoked indirectly by SparkContext#stop.

      ApplicationMaster#cleanupStagingDir is also called by a JVM shutdown hook, and it invokes fs.delete(stagingDirPath).
      Because fs.delete in ApplicationMaster is called after fileSystem.close in FileLogger, fs.delete fails, and the files in the staging directory are never deleted.

      I think calling fileSystem.close is not needed.
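
      A small sketch of the Hadoop caching behavior behind the bug:

      ```scala
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      val conf = new Configuration()
      val fs1 = FileSystem.get(conf)
      val fs2 = FileSystem.get(conf)  // same cached instance as fs1
      assert(fs1 eq fs2)

      fs1.close()                           // closes the shared instance...
      fs2.delete(new Path("/tmp/x"), true)  // ...so this fails: "Filesystem closed"
      ```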
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #1326 from sarutak/SPARK-2390 and squashes the following commits:
      
      10e1a88 [Kousuke Saruta] Removed fileSystem.close from FileLogger.scala not to prevent any other FileSystem operation
    • Add/increase severity of warning in documentation of groupBy() · a2aa7beb
      Aaron Davidson authored
      groupBy()/groupByKey() is notorious for being a very convenient API that can lead to poor performance when used incorrectly.
      
      This PR just makes it clear that users should be cautious not to rely on this API when they really want a different (more performant) one, such as reduceByKey().
      
      (Note that one source of confusion is the name; this groupBy() is not the same as a SQL GROUP-BY, which is used for aggregation and is more similar in nature to Spark's reduceByKey().)
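
      An illustrative contrast on a toy pair RDD:

      ```scala
      val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

      // groupByKey shuffles every value across the network before aggregating...
      val sumsSlow = pairs.groupByKey().mapValues(_.sum)

      // ...while reduceByKey combines values map-side first, shuffling far less.
      val sumsFast = pairs.reduceByKey(_ + _)
      ```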
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #1380 from aarondav/warning and squashes the following commits:
      
      f60da39 [Aaron Davidson] Give better advice
      d0afb68 [Aaron Davidson] Add/increase severity of warning in documentation of groupBy()
    • SPARK-2486: Utils.getCallSite is now resilient to bogus frames · 1f99fea5
      William Benton authored
      When running Spark under certain instrumenting profilers,
      Utils.getCallSite could crash with an NPE.  This commit
      makes it more resilient to failures occurring while inspecting
      stack frames.
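
      A hedged illustration of the explicit-null-check idea (not the commit's exact code):

      ```scala
      // Instrumenting profilers can inject bogus frames, so guard each field
      // access on a frame before inspecting it.
      def isUsable(el: StackTraceElement): Boolean =
        el != null && el.getClassName != null && el.getMethodName != null

      def firstSparkFrame(trace: Array[StackTraceElement]): Option[StackTraceElement] =
        trace.find(el => isUsable(el) && el.getClassName.startsWith("org.apache.spark"))
      ```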
      
      Author: William Benton <willb@redhat.com>
      
      Closes #1413 from willb/spark-2486 and squashes the following commits:
      
      b7c0274 [William Benton] Use explicit null checks instead of Try()
      0f0c1ae [William Benton] Utils.getCallSite is now resilient to bogus frames
    • [SPARK-2467] Revert SparkBuild to publish-local to both .m2 and .ivy2. · e2255e4b
      Takuya UESHIN authored
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1398 from ueshin/issues/SPARK-2467 and squashes the following commits:
      
      7f01d58 [Takuya UESHIN] Revert SparkBuild to publish-local to both .m2 and .ivy2.
  2. Jul 14, 2014
    • [SPARK-2446][SQL] Add BinaryType support to Parquet I/O. · 9fe693b5
      Takuya UESHIN authored
      Note that this commit changes the semantics when loading in data that was created with prior versions of Spark SQL.  Before, we were writing out strings as Binary data without adding any other annotations. Thus, when data is read in from prior versions, data that was StringType will now become BinaryType.  Users that need strings can CAST that column to a String.  It was decided that while this breaks compatibility, it does make us compatible with other systems (Hive, Thrift, etc) and adds support for Binary data, so this is the right decision long term.
      
      To support `BinaryType`, the following changes are needed:
      - Make `StringType` use `OriginalType.UTF8`
      - Add `BinaryType` using `PrimitiveTypeName.BINARY` without `OriginalType`
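
      For data written by earlier versions, a hedged example of the suggested CAST (table and column names hypothetical):

      ```scala
      // `payload` was written as StringType by an older Spark SQL and now
      // reads back as BinaryType; CAST recovers the string view.
      val strings = sqlContext.sql("SELECT CAST(payload AS STRING) FROM old_parquet_table")
      ```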
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1373 from ueshin/issues/SPARK-2446 and squashes the following commits:
      
      ecacb92 [Takuya UESHIN] Add BinaryType support to Parquet I/O.
      616e04a [Takuya UESHIN] Make StringType use OriginalType.UTF8.
    • [SPARK-1946] Submit tasks after (configured ratio) executors have been registered · 3dd8af7a
      li-zhihui authored
      Because submitting tasks and registering executors are asynchronous, in most situations tasks of early stages run without preferred locality.

      A simple workaround is to sleep for a few seconds in the application so that executors have enough time to register.

      This PR adds two configuration properties that make the TaskScheduler submit tasks only after enough executors have registered:

          # Submit tasks only after (registered executors / total executors) reaches this ratio; default 0
          spark.scheduler.minRegisteredExecutorsRatio = 0.8

          # Regardless of whether minRegisteredExecutorsRatio has been reached, submit tasks after maxRegisteredWaitingTime milliseconds; default 30000
          spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000
      
      Author: li-zhihui <zhihui.li@intel.com>
      
      Closes #900 from li-zhihui/master and squashes the following commits:
      
      b9f8326 [li-zhihui] Add logs & edit docs
      1ac08b1 [li-zhihui] Add new configs to user docs
      22ead12 [li-zhihui] Move waitBackendReady to postStartHook
      c6f0522 [li-zhihui] Bug fix: numExecutors wasn't set & use constant DEFAULT_NUMBER_EXECUTORS
      4d6d847 [li-zhihui] Move waitBackendReady to TaskSchedulerImpl.start & some code refactor
      0ecee9a [li-zhihui] Move waitBackendReady from DAGScheduler.submitStage to TaskSchedulerImpl.submitTasks
      4261454 [li-zhihui] Add docs for new configs & code style
      ce0868a [li-zhihui] Code style, rename configuration property name of minRegisteredRatio & maxRegisteredWaitingTime
      6cfb9ec [li-zhihui] Code style, revert default minRegisteredRatio of yarn to 0, driver get --num-executors in yarn/alpha
      812c33c [li-zhihui] Fix driver lost --num-executors option in yarn-cluster mode
      e7b6272 [li-zhihui] support yarn-cluster
      37f7dc2 [li-zhihui] support yarn mode(percentage style)
      3f8c941 [li-zhihui] submit stage after (configured ratio of) executors have been registered
    • [SPARK-2443][SQL] Fix slow read from partitioned tables · d60b09bb
      Zongheng Yang authored
      This fix obtains a performance boost comparable to [PR #1390](https://github.com/apache/spark/pull/1390) by moving an array update and deserializer initialization out of a potentially very long loop. Suggested by yhuai. The results below are updated for this fix.
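
      A generic sketch of the hoisting pattern (not the PR's actual code):

      ```scala
      // Before: a decoder was (re)initialized inside a loop that runs once per row.
      // After: initialize once per partition, outside the potentially long loop.
      def readPartition(rows: Iterator[Array[Byte]]): Iterator[String] = {
        val charset = java.nio.charset.StandardCharsets.UTF_8  // stands in for a deserializer
        rows.map(bytes => new String(bytes, charset))
      }
      ```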
      
      ## Benchmarks
      Generated a local text file with 10M rows of simple key-value pairs. The data is loaded as a table through Hive. Results are obtained on my local machine using hive/console.
      
      Without the fix:
      
      Type | Non-partitioned | Partitioned (1 part)
      ------------ | ------------ | -------------
      First run | 9.52s end-to-end (1.64s Spark job) | 36.6s (28.3s)
      Stabilized runs | 1.21s (1.18s) | 27.6s (27.5s)
      
      With this fix:
      
      Type | Non-partitioned | Partitioned (1 part)
      ------------ | ------------ | -------------
      First run | 9.57s (1.46s) | 11.0s (1.69s)
      Stabilized runs | 1.13s (1.10s) | 1.23s (1.19s)
      
      Author: Zongheng Yang <zongheng.y@gmail.com>
      
      Closes #1408 from concretevitamin/slow-read-2 and squashes the following commits:
      
      d86e437 [Zongheng Yang] Move update & initialization out of potentially long loop.
    • move some test file to match src code · 38ccd6eb
      Daoyuan authored
      Just moves some test suites to their corresponding packages.
      
      Author: Daoyuan <daoyuan.wang@intel.com>
      
      Closes #1401 from adrian-wang/movetestfiles and squashes the following commits:
      
      d1a6803 [Daoyuan] move some test file to match src code
    • Made rdd.py pep8 compliant by using Autopep8 and a little manual editing. · aab53496
      Prashant Sharma authored
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1354 from ScrapCodes/pep8-comp-1 and squashes the following commits:
      
      9858ea8 [Prashant Sharma] Code Review
      d8851b7 [Prashant Sharma] Found # noqa works even inside comment blocks. Not sure if it works with all versions of python.
      10c0cef [Prashant Sharma] Made rdd.py pep8 complaint by using Autopep8 and a little manual tweaking.
  3. Jul 13, 2014
    • SPARK-2363. Clean MLlib's sample data files · 635888cb
      Sean Owen authored
      (Just made a PR for this; mengxr reported the issue:)

      MLlib has sample data under several folders:
      1) data/mllib
      2) data/
      3) mllib/data/*
      Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean up outdated files.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #1394 from srowen/SPARK-2363 and squashes the following commits:
      
      54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/
  4. Jul 12, 2014
    • SPARK-2462. Make Vector.apply public. · 4c8be64e
      Sandy Ryza authored
      Apologies if there's an already-discussed reason I missed for why this doesn't make sense.
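
      With `apply` public, elements can be read by index:

      ```scala
      import org.apache.spark.mllib.linalg.Vectors

      val v = Vectors.dense(1.0, 2.0, 3.0)
      println(v(1))  // 2.0
      ```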
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #1389 from sryza/sandy-spark-2462 and squashes the following commits:
      
      2e5e201 [Sandy Ryza] SPARK-2462.  Make Vector.apply public.
    • [SPARK-2405][SQL] Reuse same byte buffers when creating new instance of InMemoryRelation · 1a7d7cc8
      Michael Armbrust authored
      Reuse byte buffers when creating unique attributes for multiple instances of an InMemoryRelation in a single query plan.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1332 from marmbrus/doubleCache and squashes the following commits:
      
      4a19609 [Michael Armbrust] Clean up concurrency story by calculating buffersn the constructor.
      b39c931 [Michael Armbrust] Allocations are kind of a side effect.
      f67eff7 [Michael Armbrust] Reusue same byte buffers when creating new instance of InMemoryRelation
    • [SPARK-2441][SQL] Add more efficient distinct operator. · 7e26b576
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1366 from marmbrus/partialDistinct and squashes the following commits:
      
      12a31ab [Michael Armbrust] Add more efficient distinct operator.
    • [SPARK-2455] Mark (Shippable)VertexPartition serializable · 7a013529
      Ankur Dave authored
      VertexPartition and ShippableVertexPartition are contained in RDDs but are not marked Serializable, leading to NotSerializableExceptions when using Java serialization.
      
      The fix is simply to mark them as Serializable. This PR does that and adds a test for serializing them using Java and Kryo serialization.
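
      A minimal illustration of the failure mode and the fix, using a generic class rather than the GraphX one:

      ```scala
      import java.io.{ByteArrayOutputStream, ObjectOutputStream}

      // Without `extends Serializable`, writeObject below throws NotSerializableException.
      class PartitionLike(val ids: Array[Long]) extends Serializable

      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(new PartitionLike(Array(1L, 2L)))  // succeeds once marked Serializable
      ```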
      
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #1376 from ankurdave/SPARK-2455 and squashes the following commits:
      
      ed4a51b [Ankur Dave] Make (Shippable)VertexPartition serializable
      1fd42c5 [Ankur Dave] Add failing tests for Java serialization
    • Use the Executor's ClassLoader in sc.objectFile(). · 2245c87a
      Daniel Darabos authored
      This makes it possible to read classes from the object file which were specified in the user-provided jars. (By default ObjectInputStream uses latestUserDefinedLoader, which may or may not be the right one.)
      
      I created this because I ran into the following problem. I have x:RDD[X] with X being defined in the jar that I provide to SparkContext. I save it with x.saveAsObjectFile("x"). I try to load it with sc.objectFile[X]("x"). It fails with ClassNotFoundException.
      
      After a good while of debugging I figured out that Utils.deserialize() most likely uses the ClassLoader of Utils. This is the bootstrap ClassLoader, so it is not aware of the dynamically added jars. This patch fixes the issue.
      
      A more robust fix would be to always default to Thread.currentThread.getContextClassLoader. This would prevent this problem from biting anyone in the future. It would be a bit harder to test though. On the topic of testing, if you'd like to see tests for this, I will need some hand-holding. Thanks!
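
      A hedged sketch of the loader-aware deserialization this relies on:

      ```scala
      import java.io.{InputStream, ObjectInputStream, ObjectStreamClass}

      // Resolve classes against an explicit loader (e.g. the Executor's ClassLoader,
      // which knows the user-provided jars) instead of ObjectInputStream's default
      // latestUserDefinedLoader.
      class LoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
          extends ObjectInputStream(in) {
        override def resolveClass(desc: ObjectStreamClass): Class[_] =
          Class.forName(desc.getName, false, loader)
      }
      ```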
      
      Author: Daniel Darabos <darabos.daniel@gmail.com>
      
      Closes #181 from darabos/master and squashes the following commits:
      
      45a011a [Daniel Darabos] Add test for SPARK-1877. (Fixed in 52eb54d0.)
      e13e090 [Daniel Darabos] Merge branch 'master' of https://github.com/apache/spark
      61fe0d0 [Daniel Darabos] Fix style (line too long).
      1b5df2c [Daniel Darabos] Use the Executor's ClassLoader in sc.objectFile(). This makes it possible to read classes from the object file which were specified in the user-provided jars. (By default ObjectInputStream uses latestUserDefinedLoader, which may or may not be the right one.)
    • use specialized axpy in RowMatrix for SVD · d38887b8
      Li Pu authored
      After running some more tests on a large matrix, I found that the BV axpy (breeze/linalg/Vector.scala, axpy) is slower than the BSV axpy (breeze/linalg/operators/SparseVectorOps.scala, sv_dv_axpy): 8s vs. 2s for each multiplication. The BV axpy operates on an iterator while the BSV axpy operates directly on the underlying array. I think the overhead comes from creating the iterator (with a zip) and advancing the pointers.
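
      A small Breeze sketch of the distinction (using Breeze's generic `axpy` entry point):

      ```scala
      import breeze.linalg.{axpy, DenseVector, SparseVector}

      val x = SparseVector.zeros[Double](5)
      x(1) = 2.0
      val y = DenseVector.zeros[Double](5)

      // With a SparseVector x, dispatch picks the specialized sparse-into-dense
      // implementation that walks the underlying arrays directly.
      axpy(3.0, x, y)  // y(1) == 6.0
      ```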
      
      Author: Li Pu <lpu@twitter.com>
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Li Pu <li.pu@outlook.com>
      
      Closes #1378 from vrilleup/master and squashes the following commits:
      
      6fb01a3 [Li Pu] use specialized axpy in RowMatrix
      5255f2a [Li Pu] Merge remote-tracking branch 'upstream/master'
      7312ec1 [Li Pu] very minor comment fix
      4c618e9 [Li Pu] Merge pull request #1 from mengxr/vrilleup-master
      a461082 [Xiangrui Meng] make superscript show up correctly in doc
      861ec48 [Xiangrui Meng] simplify axpy
      62969fa [Xiangrui Meng] use BDV directly in symmetricEigs change the computation mode to local-svd, local-eigs, and dist-eigs update tests and docs
      c273771 [Li Pu] automatically determine SVD compute mode and parameters
      7148426 [Li Pu] improve RowMatrix multiply
      5543cce [Li Pu] improve svd api
      819824b [Li Pu] add flag for dense svd or sparse svd
      eb15100 [Li Pu] fix binary compatibility
      4c7aec3 [Li Pu] improve comments
      e7850ed [Li Pu] use aggregate and axpy
      827411b [Li Pu] fix EOF new line
      9c80515 [Li Pu] use non-sparse implementation when k = n
      fe983b0 [Li Pu] improve scala style
      96d2ecb [Li Pu] improve eigenvalue sorting
      e1db950 [Li Pu] SPARK-1782: svd for sparse matrix using ARPACK
    • [SPARK-1969][MLlib] Online summarizer APIs for mean, variance, min, and max · 55960869
      DB Tsai authored
      It basically moves the private ColumnStatisticsAggregator class from RowMatrix to a publicly available DeveloperApi, with documentation and unit tests.

      Changes:
      1) Moved the private implementation from org.apache.spark.mllib.linalg.ColumnStatisticsAggregator to org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
      2) When creating a MultivariateOnlineSummarizer object, the number of columns is no longer needed in the constructor; it's determined when users add the first sample.
      3) Added the API documentation for MultivariateOnlineSummarizer.
      4) Added unit tests for MultivariateOnlineSummarizer.
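
      A hedged usage sketch of the new API:

      ```scala
      import org.apache.spark.mllib.linalg.Vectors
      import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

      val summarizer = new MultivariateOnlineSummarizer()  // no column count needed
      summarizer.add(Vectors.dense(1.0, 10.0))
      summarizer.add(Vectors.dense(3.0, 20.0))
      println(summarizer.mean)      // [2.0,15.0]
      println(summarizer.variance)  // per-column sample variance
      ```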
      
      Author: DB Tsai <dbtsai@dbtsai.com>
      
      Closes #955 from dbtsai/dbtsai-summarizer and squashes the following commits:
      
      b13ac90 [DB Tsai] dbtsai-summarizer
  5. Jul 11, 2014
    • [SPARK-2457] Inconsistent description in README about build option · cbff1877
      Kousuke Saruta authored
      Now we should use -Pyarn instead of SPARK_YARN when building, but the README says the following:
      
          For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
          with YARN, also set `SPARK_YARN=true`:
      
            # Apache Hadoop 2.0.5-alpha
            $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
      
            # Cloudera CDH 4.2.0 with MapReduce v2
            $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
      
            # Apache Hadoop 2.2.X and newer
            $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #1382 from sarutak/SPARK-2457 and squashes the following commits:
      
      e7b2d64 [Kousuke Saruta] Replaced "SPARK_YARN=true" with "-Pyarn" in README
    • [SPARK-2437] Rename MAVEN_PROFILES to SBT_MAVEN_PROFILES and add SBT_MAVEN_PROPERTIES · b23e9c3e
      Prashant Sharma authored
      NOTE: It is not possible to use both the env variable `SBT_MAVEN_PROFILES` and the `-P` flag at the same time. If specified, `-P` takes precedence.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1374 from ScrapCodes/SPARK-2437/rename-MAVEN_PROFILES and squashes the following commits:
      
      8694bde [Prashant Sharma] [SPARK-2437] Rename MAVEN_PROFILES to SBT_MAVEN_PROFILES and add SBT_MAVEN_PROPERTIES
    • [Minor] Remove unused val in Master · f4f46dec
      Andrew Or authored
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1365 from andrewor14/master-fs and squashes the following commits:
      
      497f100 [Andrew Or] Sneak in a space and hope no one will notice
      05ba6da [Andrew Or] Remove unused val
    • fix Graph partitionStrategy comment · 282cca0e
      CrazyJvm authored
      Author: CrazyJvm <crazyjvm@gmail.com>
      
      Closes #1368 from CrazyJvm/graph-comment-1 and squashes the following commits:
      
      d47f3c5 [CrazyJvm] fix style
      e190d6f [CrazyJvm] fix Graph partitionStrategy comment
  6. Jul 10, 2014
    • [SPARK-2358][MLLIB] Add an option to include native BLAS/LAPACK loader in the build · 2f59ce7d
      Xiangrui Meng authored
      It would be easy for users to include the netlib-java jniloader in the spark jar, which is LGPL-licensed. We can follow the same approach as ganglia support in Spark, which could be enabled by turning on "-Pganglia-lgpl" at build time. We can use "-Pnetlib-lgpl" flag for this.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1295 from mengxr/netlib-lgpl and squashes the following commits:
      
      aebf001 [Xiangrui Meng] add a profile to optionally include native BLAS/LAPACK loader in mllib
    • [SPARK-2428][SQL] Add except and intersect methods to SchemaRDD. · 10b59ba2
      Takuya UESHIN authored
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1355 from ueshin/issues/SPARK-2428 and squashes the following commits:
      
      b6fa264 [Takuya UESHIN] Add except and intersect methods to SchemaRDD.
    • [SPARK-2415] [SQL] RowWriteSupport should handle empty ArrayType correctly. · f5abd271
      Takuya UESHIN authored
      `RowWriteSupport` doesn't write an empty `ArrayType` value, so the value read back becomes `null`.
      It should write an empty `ArrayType` value as is.
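
      A hedged round-trip sketch using the 1.0-era Spark SQL API (path and names hypothetical):

      ```scala
      case class Rec(key: String, values: Seq[Int])

      import sqlContext.createSchemaRDD  // implicit conversion to SchemaRDD
      val rdd = sc.parallelize(Seq(Rec("a", Seq.empty)))
      rdd.saveAsParquetFile("/tmp/rec.parquet")

      // Before the fix, `values` read back as null; after, as an empty array.
      sqlContext.parquetFile("/tmp/rec.parquet").collect()
      ```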
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1339 from ueshin/issues/SPARK-2415 and squashes the following commits:
      
      32afc87 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2415
      2f05196 [Takuya UESHIN] Fix RowWriteSupport to handle empty ArrayType correctly.
    • [SPARK-2431][SQL] Refine StringComparison and related codes. · f62c4272
      Takuya UESHIN authored
      Refine `StringComparison` and related codes as follows:
      - `StringComparison` could be similar to `StringRegexExpression` or `CaseConversionExpression`.
      - Nullability of `StringRegexExpression` could depend on children's nullabilities.
      - Add a case that the like condition includes no wildcard to `LikeSimplification`.
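
      To make the new no-wildcard case concrete, a hypothetical standalone illustration:

      ```scala
      // A LIKE pattern containing no wildcards can be evaluated as plain equality
      // instead of a regular-expression match.
      def simplifiableLike(pattern: String): Boolean =
        !pattern.exists(c => c == '%' || c == '_')

      simplifiableLike("foo%bar")  // false: still needs pattern matching
      simplifiableLike("foobar")   // true: col LIKE 'foobar' behaves like col = 'foobar'
      ```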
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1357 from ueshin/issues/SPARK-2431 and squashes the following commits:
      
      77766f5 [Takuya UESHIN] Add a case that the like condition includes no wildcard to LikeSimplification.
      b9da9d2 [Takuya UESHIN] Fix nullability of StringRegexExpression.
      680bb72 [Takuya UESHIN] Refine StringComparison.
    • SPARK-2427: Fix Scala examples that use the wrong command line arguments index · ae8ca4df
      Artjom-Metro authored
      The Scala examples HBaseTest and HdfsTest don't use the correct indexes for the command line arguments. This is due to the fix for JIRA 1565, where these examples were not correctly adapted to the new usage of the submit script.
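
      A hedged before/after illustration (example skeleton is hypothetical):

      ```scala
      object HdfsTestLike {
        def main(args: Array[String]): Unit = {
          // Before spark-submit, the master URL arrived as args(0) and user input
          // started at args(1); with spark-submit, user input starts at args(0).
          val file = args(0)  // previously (and now incorrectly) read from args(1)
          println(s"Processing $file")
        }
      }
      ```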
      
      Author: Artjom-Metro <Artjom-Metro@users.noreply.github.com>
      Author: Artjom-Metro <artjom31415@googlemail.com>
      
      Closes #1353 from Artjom-Metro/fix_examples and squashes the following commits:
      
      6111801 [Artjom-Metro] Reduce the default number of iterations
      cfaa73c [Artjom-Metro] Fix some examples that use the wrong index to access the command line arguments
    • [SPARK-1341] [Streaming] Throttle BlockGenerator to limit rate of data consumption. · 2dd67248
      Issac Buenrostro authored
      Author: Issac Buenrostro <buenrostro@ooyala.com>
      
      Closes #945 from ibuenros/SPARK-1341-throttle and squashes the following commits:
      
      5514916 [Issac Buenrostro] Formatting changes, added documentation for streaming throttling, stricter unit tests for throttling.
      62f395f [Issac Buenrostro] Add comments and license to streaming RateLimiter.scala
      7066438 [Issac Buenrostro] Moved throttle code to RateLimiter class, smoother pushing when throttling active
      ccafe09 [Issac Buenrostro] Throttle BlockGenerator to limit rate of data consumption.
    • [SPARK-1478].3: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 · 40a8fef4
      tmalaska authored
      This is a modified version of PR https://github.com/apache/spark/pull/1168 by @tmalaska; it adds MIMA binary check exclusions.
      
      Author: tmalaska <ted.malaska@cloudera.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1347 from tdas/FLUME-1915 and squashes the following commits:
      
      96065df [Tathagata Das] Added Mima exclusion for FlumeReceiver.
      41d5338 [tmalaska] Address line 57 that was too long
      12617e5 [tmalaska] SPARK-1478: Upgrade FlumeInputDStream's Flume...
    • name ec2 instances and security groups consistently · 369aa84e
      Nicholas Chammas authored
      Security groups created by `spark-ec2` do not prepend “spark-“ to the
      name.
      
      Since naming the instances themselves is new to `spark-ec2`, it’s better
      to change that pattern to match the existing naming pattern for the
      security groups, rather than the other way around.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      Author: nchammas <nicholas.chammas@gmail.com>
      
      Closes #1344 from nchammas/master and squashes the following commits:
      
      f7e4581 [Nicholas Chammas] unrelated pep8 fix
      a36eed0 [Nicholas Chammas] name ec2 instances and security groups consistently
      de7292a [nchammas] Merge pull request #4 from apache/master
      2e4fe00 [nchammas] Merge pull request #3 from apache/master
      89fde08 [nchammas] Merge pull request #2 from apache/master
      69f6e22 [Nicholas Chammas] PEP8 fixes
      2627247 [Nicholas Chammas] broke up lines before they hit 100 chars
      6544b7e [Nicholas Chammas] [SPARK-2065] give launched instances names
      69da6cf [nchammas] Merge pull request #1 from apache/master
    • HOTFIX: Minor doc update for sbt change · 88006a62
      Patrick Wendell authored
    • [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
      This patch introduces the new way of working while retaining the existing ways of doing things.

      For example, the build instruction for YARN in Maven is
      `mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`;
      in sbt it can become
      `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
      It also supports
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
      72651ca [Prashant Sharma] Addresses code reivew comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
      65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
    • SPARK-2115: Stage kill link is too close to stage details link · c2babc08
      Masayoshi TSUZUKI authored
      Moved the (kill) link to the right side and added a confirmation dialog that appears when the (kill) link is clicked.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #1350 from tsudukim/feature/SPARK-2115 and squashes the following commits:
      
      e2263b0 [Masayoshi TSUZUKI] Moved (kill) link to the right side. Add confirmation dialog when (kill) link is clicked.
    • Clean up SparkKMeans example's code · 2b18ea98
      Raymond Liu authored
      Removes unused code.
      
      Author: Raymond Liu <raymond.liu@intel.com>
      
      Closes #1352 from colorant/kmeans and squashes the following commits:
      
      ddcd1dd [Raymond Liu] Clean up SparkKMeans example's code