  1. Apr 24, 2014
    • Michael Armbrust's avatar
      SPARK-1494 Don't initialize classes loaded by MIMA excludes, attempt 2 · c5c1916d
      Michael Armbrust authored
      [WIP]
      
      Looks like scala reflection was invoking the static initializer:
      ```
      ...
      	at org.apache.spark.sql.test.TestSQLContext$.<init>(TestSQLContext.scala:25)
      	at org.apache.spark.sql.test.TestSQLContext$.<clinit>(TestSQLContext.scala)
      	at java.lang.Class.forName0(Native Method)
      	at java.lang.Class.forName(Class.java:270)
      	at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:500)
      	at scala.reflect.runtime.JavaMirrors$JavaMirror.tryJavaClass(JavaMirrors.scala:505)
      	at scala.reflect.runtime.SymbolLoaders$PackageScope.lookupEntry(SymbolLoaders.scala:109)
      ...
      ```
      
      Need to make sure that this doesn't change the exclusion semantics before merging.
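
      For reference, the JVM's three-argument `Class.forName` takes an `initialize` flag that controls whether the static initializer runs on load. The sketch below only illustrates that mechanism and is not the code in this patch; `LazyLoadDemo`, `Noisy`, and `initialized` are hypothetical names.

      ```scala
      object LazyLoadDemo {
        // Hypothetical flag, flipped when Noisy's static initializer runs.
        @volatile var initialized = false

        // Stand-in for a class whose initializer has side effects.
        object Noisy { LazyLoadDemo.initialized = true }

        // Build the JVM class name from the enclosing class so Noisy itself is never touched.
        private val noisyName = getClass.getName + "Noisy$"

        // initialize = false loads the class without running its static initializer.
        def load(initialize: Boolean): Class[_] =
          Class.forName(noisyName, initialize, getClass.getClassLoader)
      }
      ```

      Calling `LazyLoadDemo.load(initialize = false)` leaves `initialized` untouched, while the one-argument `Class.forName(name)` behaves like `initialize = true`.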
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #526 from marmbrus/mima and squashes the following commits:
      
      8168dea [Michael Armbrust] Spurious change
      afba262 [Michael Armbrust] Prevent Scala reflection from running static class initializer.
      c5c1916d
    • Thomas Graves's avatar
      Spark 1490 Add kerberos support to the HistoryServer · bd375094
      Thomas Graves authored
      Here I've added the ability for the History server to log in from a Kerberos keytab file so that the history server can be run as a super user and stay up for a long period of time while reading the history files from HDFS.
      
      Author: Thomas Graves <tgraves@apache.org>
      
      Closes #513 from tgravescs/SPARK-1490 and squashes the following commits:
      
      e204a99 [Thomas Graves] remove extra logging
      5418daa [Thomas Graves] fix typo in config
      0076b99 [Thomas Graves] Update docs
      4d76545 [Thomas Graves] SPARK-1490 Add kerberos support to the HistoryServer
      bd375094
    • zsxwing's avatar
      SPARK-1611: Fix incorrect initialization order in AppendOnlyMap · 78a49b25
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1611
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #534 from zsxwing/SPARK-1611 and squashes the following commits:
      
      96af089 [zsxwing] SPARK-1611: Fix incorrect initialization order in AppendOnlyMap
      78a49b25
    • Sean Owen's avatar
      SPARK-1488. Squash more language feature warnings in new commits by importing implicitConversion · 6338a93f
      Sean Owen authored
      A recent commit reintroduced some of the same warnings that SPARK-1488 resolved. These are just a few more of the same changes to remove these warnings.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #528 from srowen/SPARK-1488.2 and squashes the following commits:
      
      62d592c [Sean Owen] More feature warnings in tests
      4e2e94b [Sean Owen] Squash more language feature warnings in new commits by importing implicitConversion
      6338a93f
    • Patrick Wendell's avatar
      Small changes to release script · faeb761c
      Patrick Wendell authored
      faeb761c
    • Takuya UESHIN's avatar
      [SPARK-1610] [SQL] Fix Cast to use exact type value when cast from BooleanType to NumericTy... · 27b2821c
      Takuya UESHIN authored
      ...pe.
      
      `Cast` from `BooleanType` to any `NumericType` always produces an `Int` value.
      This causes a `ClassCastException` when the cast value is used by a subsequent evaluation, as in the code below:
      
      ``` scala
      scala> import org.apache.spark.sql.catalyst._
      import org.apache.spark.sql.catalyst._
      
      scala> import types._
      import types._
      
      scala> import expressions._
      import expressions._
      
      scala> Add(Cast(Literal(true), ShortType), Literal(1.toShort)).eval()
      java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Short
      	at scala.runtime.BoxesRunTime.unboxToShort(BoxesRunTime.java:102)
      	at scala.math.Numeric$ShortIsIntegral$.plus(Numeric.scala:72)
      	at org.apache.spark.sql.catalyst.expressions.Add$$anonfun$eval$2.apply(arithmetic.scala:58)
      	at org.apache.spark.sql.catalyst.expressions.Add$$anonfun$eval$2.apply(arithmetic.scala:58)
      	at org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:114)
      	at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:58)
      	at .<init>(<console>:17)
      	at .<clinit>(<console>)
      	at .<init>(<console>:7)
      	at .<clinit>(<console>)
      	at $print(<console>)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:483)
      	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
      	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
      	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
      	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
      	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
      	at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
      	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
      	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
      	at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
      	at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
      	at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
      	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
      	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
      	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
      	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
      	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
      	at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
      	at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
      	at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
      	at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
      ```
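
      The failure can be reproduced without Catalyst. The sketch below is a simplified stand-in, assuming values travel as boxed `Any` the way the stack trace suggests; `BooleanCastDemo` and its methods are hypothetical names, not the patched `Cast` code.

      ```scala
      object BooleanCastDemo {
        // Mimics the bug: the boolean becomes a boxed Int whatever the target type is.
        def castAsInt(b: Boolean): Any = if (b) 1 else 0

        // Mimics the fix: the boxed value has exactly the target type (Short here).
        def castAsShort(b: Boolean): Any = if (b) 1.toShort else 0.toShort

        // Stand-in for a typed evaluation that unboxes both operands as Short.
        def addShorts(left: Any, right: Any): Short =
          (left.asInstanceOf[Short] + right.asInstanceOf[Short]).toShort
      }
      ```

      `addShorts(castAsInt(true), 1.toShort)` throws a `ClassCastException` because a `java.lang.Integer` is unboxed as a `Short`, whereas `addShorts(castAsShort(true), 1.toShort)` evaluates normally.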
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #533 from ueshin/issues/SPARK-1610 and squashes the following commits:
      
      70f36e8 [Takuya UESHIN] Fix Cast to use exact type value when cast from BooleanType to NumericType.
      27b2821c
    • Reynold Xin's avatar
      SPARK-1601 & SPARK-1602: two bug fixes related to cancellation · 1fdf659d
      Reynold Xin authored
      This should go into 1.0 since it would return wrong data when the bug happens (which is pretty likely if cancellation is used). Test case attached.
      
      1. Do not put partially executed partitions into cache (in task killing).
      
      2. Iterator returned by CacheManager#getOrCompute was not an InterruptibleIterator, and was thus leading to uninterruptible jobs.
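
      Both points can be sketched together. This is a rough illustration under assumed names (`CheckedIterator`, `TaskKilled`, `computeFully`), not Spark's `InterruptibleIterator` or `CacheManager`: the wrapper checks a kill flag before every element, and a partition is handed back for caching only when it was consumed to the end.

      ```scala
      import java.util.concurrent.atomic.AtomicBoolean

      object InterruptibleDemo {
        // Hypothetical stand-in for the exception a killed task would raise.
        final class TaskKilled extends RuntimeException("task killed")

        // Wraps an iterator and checks a shared kill flag before each element.
        final class CheckedIterator[T](killed: AtomicBoolean, underlying: Iterator[T])
            extends Iterator[T] {
          def hasNext: Boolean = {
            if (killed.get()) throw new TaskKilled
            underlying.hasNext
          }
          def next(): T = underlying.next()
        }

        // Returns the materialized partition only if it was fully computed;
        // a partially executed partition yields None and is never cached.
        def computeFully[T](killed: AtomicBoolean, it: Iterator[T]): Option[Vector[T]] =
          try Some(new CheckedIterator(killed, it).toVector)
          catch { case _: TaskKilled => None }
      }
      ```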
      
      Thanks @aarondav and @ahirreddy for reporting and helping debug.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #521 from rxin/kill and squashes the following commits:
      
      401033f [Reynold Xin] Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/spark into kill
      7a7bdd2 [Reynold Xin] Add a new line in the end of JobCancellationSuite.scala.
      35cd9f7 [Reynold Xin] Fixed a bug that partially executed partitions can be put into cache (in task killing).
      1fdf659d
    • Mridul Muralidharan's avatar
      SPARK-1587 Fix thread leak · dd681f50
      Mridul Muralidharan authored
      `mvn test` fails intermittently due to a thread leak, since scalatest runs all tests in the same VM.
      
      Author: Mridul Muralidharan <mridulm80@apache.org>
      
      Closes #504 from mridulm/resource_leak_fixes and squashes the following commits:
      
      a5d10d0 [Mridul Muralidharan] Prevent thread leaks while running tests : cleanup all threads when SparkContext.stop is invoked. Causes tests to fail
      7b5e19c [Mridul Muralidharan] Prevent NPE while running tests
      dd681f50
    • Sandeep's avatar
      [Fix #79] Replace Breakable For Loops By While Loops · bb68f477
      Sandeep authored
      Author: Sandeep <sandeep@techaddict.me>
      
      Closes #503 from techaddict/fix-79 and squashes the following commits:
      
      e3f6746 [Sandeep] Style changes
      07a4f6b [Sandeep] for loop to While loop
      0a6d8e9 [Sandeep] Breakable for loop to While loop
      bb68f477
    • zsxwing's avatar
      SPARK-1589: Fix the incorrect compare · 6ab75780
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1589
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #508 from zsxwing/SPARK-1589 and squashes the following commits:
      
      570c67a [zsxwing] SPARK-1589: Fix the incorrect compare
      6ab75780
    • Ankur Dave's avatar
      Mark all fields of EdgePartition, Graph, and GraphOps transient · 1d6abe3a
      Ankur Dave authored
      These classes are only serializable to work around closure capture, so their fields should all be marked `@transient` to avoid wasteful serialization.
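
      As a small illustration of the effect (hypothetical `Holder` class, plain Java serialization rather than anything GraphX-specific), a `@transient` field is skipped when the object is written and comes back as `null` on the other side:

      ```scala
      import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

      object TransientDemo {
        // Hypothetical class: serializable, but its large field is not shipped.
        class Holder(val id: Int) extends Serializable {
          @transient val payload: Array[Int] = Array.fill(1000)(id)
        }

        def serialize(x: AnyRef): Array[Byte] = {
          val bytes = new ByteArrayOutputStream()
          val out = new ObjectOutputStream(bytes)
          out.writeObject(x)
          out.close()
          bytes.toByteArray
        }

        def deserialize[T](data: Array[Byte]): T = {
          val in = new ObjectInputStream(new ByteArrayInputStream(data))
          try in.readObject().asInstanceOf[T] finally in.close()
        }
      }
      ```

      The serialized form of a `Holder` stays small because the 1000-element array is never written.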
      
      This PR supersedes apache/spark#519 and fixes the same bug.
      
      Author: Ankur Dave <ankurdave@gmail.com>
      
      Closes #520 from ankurdave/graphx-transient and squashes the following commits:
      
      6431760 [Ankur Dave] Mark all fields of EdgePartition, Graph, and GraphOps `@transient`
      1d6abe3a
    • Aaron Davidson's avatar
      Update Java api for setJobGroup with interruptOnCancel · d485eecb
      Aaron Davidson authored
      Also adds a unit test.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #522 from aarondav/cancel2 and squashes the following commits:
      
      565c253 [Aaron Davidson] Update Java api for setJobGroup with interruptOnCancel
      65b33d8 [Aaron Davidson] Add unit test for Thread interruption on cancellation
      d485eecb
  2. Apr 23, 2014
    • Andrew Or's avatar
      [Hot Fix #469] Fix flaky test in SparkListenerSuite · 4b2bab1d
      Andrew Or authored
      The two modified tests may fail if the race condition does not bid in our favor...
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #516 from andrewor14/stage-info-test-fix and squashes the following commits:
      
      b4b6100 [Andrew Or] Add/replace missing waitUntilEmpty() calls to listener bus
      4b2bab1d
    • Matei Zaharia's avatar
      [SPARK-1540] Add an optional Ordering parameter to PairRDDFunctions. · 640f9a0e
      Matei Zaharia authored
      In https://issues.apache.org/jira/browse/SPARK-1540 we'd like to look at Spark's API to see if we can take advantage of Comparable keys in more places, which will make external spilling more efficient. This PR is a first step towards that: it shows how to pass an Ordering when available and still continue functioning otherwise. It does this using a new implicit parameter with a default value of null.
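
      The language feature involved can be sketched in isolation. `sortIfPossible` and `Opaque` below are hypothetical names and this is not the `PairRDDFunctions` signature; it only shows an implicit parameter that falls back to `null` when no `Ordering` is in scope.

      ```scala
      object OptionalOrderingDemo {
        // `ord` is supplied by the compiler when an Ordering[K] is available;
        // otherwise the default value null is used and the keys are left as-is.
        def sortIfPossible[K](keys: Seq[K])(implicit ord: Ordering[K] = null): Seq[K] =
          if (ord != null) keys.sorted(ord) else keys

        // A key type for which no Ordering is defined.
        final class Opaque(val n: Int)
      }
      ```

      `sortIfPossible(Seq(3, 1, 2))` picks up `Ordering[Int]` and sorts, while a `Seq[Opaque]` compiles just the same and is returned unchanged.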
      
      The API is currently only in Scala -- in Java we'd have to add new versions of mapToPair and such that take a Comparator, or a new method to add a "type hint" to an RDD. We can address those later though.
      
      Unfortunately requiring all keys to be Comparable would not work without requiring RDDs in general to contain only Comparable types. The reason is that methods such as distinct() and intersection() do a shuffle, but should be usable on RDDs of any type. So ordering will have to remain an optimization for the types that can be ordered. I think this isn't a horrible outcome though because one of the nice things about Spark's API is that it works on objects of *any* type, without requiring you to specify a schema or implement Writable or stuff like that.
      
      Author: Matei Zaharia <matei@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Reynold Xin <rxin@apache.org>
      
      Closes #487 from mateiz/ordered-keys and squashes the following commits:
      
      bd565f6 [Matei Zaharia] Pass an Ordering to only one version of groupBy because the Scala language spec doesn't allow having an optional parameter on all of them (this was only compiling in Scala 2.10 due to a bug).
      4629965 [Matei Zaharia] Add tests for other versions of groupBy
      3beae85 [Matei Zaharia] Added a test for implicit orderings
      80b7a3b [Matei Zaharia] Add an optional Ordering parameter to PairRDDFunctions.
      640f9a0e
    • Aaron Davidson's avatar
      SPARK-1582 Invoke Thread.interrupt() when cancelling jobs · 432201c7
      Aaron Davidson authored
      Sometimes executor threads are blocked waiting for IO or monitors, and the current implementation of job cancellation may never recover these threads. By simply invoking Thread.interrupt() during cancellation, we can often safely unblock the threads and use them for subsequent work.
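
      A minimal sketch of the JVM behaviour this relies on, using hypothetical names and no Spark classes: a thread blocked in an interruptible call is released as soon as `Thread.interrupt()` is invoked on it.

      ```scala
      import java.util.concurrent.CountDownLatch
      import java.util.concurrent.atomic.AtomicBoolean

      object InterruptDemo {
        // Returns true if a blocked worker was released by Thread.interrupt().
        def cancelBlockedWorker(): Boolean = {
          val neverReleased = new CountDownLatch(1)
          val sawInterrupt = new AtomicBoolean(false)
          val worker = new Thread(new Runnable {
            def run(): Unit =
              try neverReleased.await() // blocks indefinitely, like a stuck wait
              catch { case _: InterruptedException => sawInterrupt.set(true) }
          })
          worker.start()
          worker.interrupt() // the step cancellation would add
          worker.join(5000)  // bounded wait so the sketch itself cannot hang
          sawInterrupt.get() && !worker.isAlive
        }
      }
      ```

      Threads blocked in calls that do not respond to interruption are not helped by this, which is one reason the description says "often" rather than "always".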
      
      Note that this feature must remain optional for now because of a bug in HDFS where Thread.interrupt() may cause nodes to be marked as permanently dead (as the InterruptedException is reinterpreted as an IOException during communication with some node).
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #498 from aarondav/cancel and squashes the following commits:
      
      e52b829 [Aaron Davidson] Don't use job.properties when null
      82f78bb [Aaron Davidson] Update DAGSchedulerSuite
      b67f472 [Aaron Davidson] Add comment on why interruptOnCancel is in setJobGroup
      4cb9fd6 [Aaron Davidson] SPARK-1582 Invoke Thread.interrupt() when cancelling jobs
      432201c7
    • Marcelo Vanzin's avatar
      Honor default fs name when initializing event logger. · dd1b7a61
      Marcelo Vanzin authored
      This is related to SPARK-1459 / PR #375. Without this fix,
      FileLogger.createLogDir() may try to create the log dir on
      HDFS, while createWriter() will try to open the log file on
      the local file system, leading to interesting errors and
      confusion.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #450 from vanzin/event-file-2 and squashes the following commits:
      
      592cdb3 [Marcelo Vanzin] Honor default fs name when initializing event logger.
      dd1b7a61
    • Aaron Davidson's avatar
      SPARK-1572 Don't kill Executor if PythonRDD fails while computing parent · a967b005
      Aaron Davidson authored
      Previously, the behavior was that if the parent RDD threw any exception other than IOException or FileNotFoundException (which is quite possible for Hadoop input sources), the entire Executor would crash, because the default thread's uncaught exception handler calls System.exit().

      This patch makes two related fixes:
      
        1. Always catch exceptions in this reader thread.
        2. Don't mask readerException when Python throws an EOFError
           after worker.shutdownOutput() is called.
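
      The first point can be sketched as follows, with hypothetical names and no PythonRDD internals: the thread body catches every `Throwable` and records it, so nothing ever reaches the uncaught exception handler and the caller decides what to do with the failure.

      ```scala
      import java.util.concurrent.atomic.AtomicReference

      object ReaderThreadDemo {
        // Runs `body` on a separate thread and captures any failure instead of
        // letting it escape to the thread's uncaught exception handler.
        def runCaptured(body: () => Unit): Option[Throwable] = {
          val failure = new AtomicReference[Throwable](null)
          val reader = new Thread(new Runnable {
            def run(): Unit =
              try body()
              catch { case t: Throwable => failure.set(t) }
          })
          reader.start()
          reader.join()
          Option(failure.get())
        }
      }
      ```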
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #486 from aarondav/pyspark and squashes the following commits:
      
      fbb11e9 [Aaron Davidson] Make sure FileNotFoundExceptions are handled same as before
      b9acb3e [Aaron Davidson] SPARK-1572 Don't kill Executor if PythonRDD fails while computing parent
      a967b005
    • zsxwing's avatar
      SPARK-1583: Fix a bug that using java.util.HashMap by mistake · a6646066
      zsxwing authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-1583
      
      Does anyone know why `java.util.HashMap` is used here rather than `mutable.HashMap`? Some methods of `java.util.HashMap` are not generic, so the compiler cannot help us find similar problems.
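
      To illustrate the kind of mistake meant here (a made-up example, not the actual bug in SPARK-1583): `java.util.HashMap#containsKey` accepts any object, so a lookup with the wrong key type compiles and silently misses.

      ```scala
      import scala.collection.mutable

      object HashMapDemo {
        // Compiles even though the map is keyed by String: containsKey takes Object.
        def javaLookupWithIntKey(): Boolean = {
          val m = new java.util.HashMap[String, Int]()
          m.put("1", 100)
          m.containsKey(1) // boxed Integer never equals the String key
        }

        // mutable.HashMap#contains takes the key type, so m.contains(1) would be
        // rejected by the compiler; only a String key is accepted.
        def scalaLookupWithStringKey(): Boolean = {
          val m = mutable.HashMap[String, Int]("1" -> 100)
          m.contains("1")
        }
      }
      ```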
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #500 from zsxwing/SPARK-1583 and squashes the following commits:
      
      7bfd74d [zsxwing] SPARK-1583: Fix a bug that using java.util.HashMap by mistake
      a6646066
    • Patrick Wendell's avatar
      SPARK-1119 and other build improvements · cd4ed293
      Patrick Wendell authored
      1. Makes assembly and examples jar naming consistent in maven/sbt.
      2. Updates make-distribution.sh to use Maven and fixes some bugs.
      3. Updates the create-release script to call make-distribution script.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #502 from pwendell/make-distribution and squashes the following commits:
      
      1a97f0d [Patrick Wendell] SPARK-1119 and other build improvements
      cd4ed293
    • Michael Armbrust's avatar
      [SQL] SPARK-1571 Mistake in java example code · 39f85e03
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #496 from marmbrus/javaBeanBug and squashes the following commits:
      
      644fedd [Michael Armbrust] Bean methods must be public.
      39f85e03
    • Michael Armbrust's avatar
      SPARK-1494 Don't initialize classes loaded by MIMA excludes. · 8e950813
      Michael Armbrust authored
      [WIP]  Just seeing how Jenkins likes this...
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #494 from marmbrus/mima and squashes the following commits:
      
      6eec616 [Michael Armbrust] Force hive tests to run.
      acaf682 [Michael Armbrust] Don't initialize loaded classes.
      8e950813
  3. Apr 22, 2014
    • Michael Armbrust's avatar
      SPARK-1562 Fix visibility / annotation of Spark SQL APIs · aa77f8a6
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #489 from marmbrus/sqlDocFixes and squashes the following commits:
      
      acee4f3 [Michael Armbrust] Fix visibility / annotation of Spark SQL APIs
      aa77f8a6
    • Xiangrui Meng's avatar
      [FIX: SPARK-1376] use --arg instead of --args in SparkSubmit to avoid warning messages · 662c860e
      Xiangrui Meng authored
      Even if users use `--arg`, `SparkSubmit` still uses `--args` for child args internally, which triggers a warning message that may confuse users:
      
      ~~~
      --args is deprecated. Use --arg instead.
      ~~~
      
      @sryza Does it look good to you?
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #485 from mengxr/submit-arg and squashes the following commits:
      
      5e1b9fe [Xiangrui Meng] update test
      cebbeb7 [Xiangrui Meng] use --arg instead of --args in SparkSubmit to avoid warning messages
      662c860e
    • Tathagata Das's avatar
      [streaming][SPARK-1578] Removed requirement for TTL in StreamingContext. · f3d19a9f
      Tathagata Das authored
      Since shuffles and RDDs that are out of context are automatically cleaned by Spark core (using ContextCleaner) there is no need for setting the cleaner TTL while creating a StreamingContext.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #491 from tdas/ttl-fix and squashes the following commits:
      
      cf01dc7 [Tathagata Das] Removed requirement for TTL in StreamingContext.
      f3d19a9f
    • Andrew Or's avatar
      [Spark-1538] Fix SparkUI incorrectly hiding persisted RDDs · 2de57387
      Andrew Or authored
      **Bug**: After the command `sc.parallelize(1 to 1000).persist.map(_ + 1).count()` is run, the persisted RDD is missing from the storage tab of the SparkUI.
      
      **Cause**: The command creates two RDDs in one stage, a `ParallelCollectionRDD` and a `MappedRDD`. However, the existing StageInfo only keeps the RDDInfo of the last RDD associated with the stage (`MappedRDD`), and so all RDD information regarding the first RDD (`ParallelCollectionRDD`) is discarded. In this case, we persist the first RDD,  but the StorageTab doesn't know about this RDD because it is not encoded in the StageInfo.
      
      **Fix**: Record information of all RDDs in StageInfo, instead of just the last RDD (i.e. `stage.rdd`). Since stage boundaries are marked by shuffle dependencies, the solution is to traverse the last RDD's dependency tree, visiting only ancestor RDDs related through a sequence of narrow dependencies.
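
      A rough sketch of such a traversal, on a toy graph rather than real RDDs (`Node`, `Dep`, and `narrowAncestors` are hypothetical names, and excluding the starting node from the result is a choice made here for the sketch): only narrow edges are followed, and a visited set keeps the walk finite even if the graph contains a cycle.

      ```scala
      import scala.collection.mutable

      object NarrowAncestorsDemo {
        // Toy dependency edge: `narrow = false` stands for a shuffle dependency.
        final case class Dep(parent: Node, narrow: Boolean)

        // Toy stand-in for an RDD; compared by identity.
        final class Node(val name: String) { var deps: List[Dep] = Nil }

        // Names of all ancestors reachable through narrow dependencies only.
        def narrowAncestors(start: Node): Set[String] = {
          val seen = mutable.Set.empty[Node]
          def visit(n: Node): Unit =
            n.deps.foreach { d =>
              if (d.narrow && !seen.contains(d.parent)) {
                seen += d.parent
                visit(d.parent)
              }
            }
          visit(start)
          (seen - start).map(_.name).toSet
        }
      }
      ```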
      
      ---
      
      This PR also moves RDDInfo to its own file, includes a few style fixes, and adds a unit test for constructing StageInfos.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #469 from andrewor14/storage-ui-fix and squashes the following commits:
      
      07fc7f0 [Andrew Or] Add back comment that was accidentally removed (minor)
      5d799fe [Andrew Or] Add comment to justify testing of getNarrowAncestors with cycles
      9d0e2b8 [Andrew Or] Hide details of getNarrowAncestors from outsiders
      d2bac8a [Andrew Or] Deal with cycles in RDD dependency graph + add extensive tests
      2acb177 [Andrew Or] Move getNarrowAncestors to RDD.scala
      bfe83f0 [Andrew Or] Backtrace RDD dependency tree to find all RDDs that belong to a Stage
      2de57387
    • Patrick Wendell's avatar
      Assorted clean-up for Spark-on-YARN. · 995fdc96
      Patrick Wendell authored
      In particular, when HADOOP_CONF_DIR is not specified.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #488 from pwendell/hadoop-cleanup and squashes the following commits:
      
      fe95f13 [Patrick Wendell] Changes based on Andrew's feedback
      18d09c1 [Patrick Wendell] Review comments from Andrew
      17929cc [Patrick Wendell] Assorted clean-up for Spark-on-YARN.
      995fdc96
    • Kan Zhang's avatar
      [SPARK-1570] Fix classloading in JavaSQLContext.applySchema · ea8cea82
      Kan Zhang authored
      I think I hit a class loading issue when running JavaSparkSQL example using spark-submit in local mode.
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #484 from kanzhang/SPARK-1570 and squashes the following commits:
      
      feaaeba [Kan Zhang] [SPARK-1570] Fix classloading in JavaSQLContext.applySchema
      ea8cea82
    • Marcelo Vanzin's avatar
      Fix compilation on Hadoop 2.4.x. · 0ea0b1a2
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #483 from vanzin/yarn-2.4 and squashes the following commits:
      
      0fc57d8 [Marcelo Vanzin] Fix compilation on Hadoop 2.4.x.
      0ea0b1a2
    • Andrew Or's avatar
      [Fix #204] Eliminate delay between binding and log checking · 745e496c
      Andrew Or authored
      **Bug**: In the existing history server, there is a `spark.history.updateInterval` seconds delay before application logs show up on the UI.
      
      **Cause**: This is because the following events happen in this order: (1) The background thread that checks for logs starts, but realizes the server has not yet bound and so waits for N seconds, (2) server binds, (3) N seconds later the background thread finds that the server has finally bound to a port, and so finally checks for application logs.
      
      **Fix**: This PR forces the log checking thread to start immediately after binding. It also documents two relevant environment variables that are currently missing.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #441 from andrewor14/history-server-fix and squashes the following commits:
      
      b2eb46e [Andrew Or] Document SPARK_PUBLIC_DNS and SPARK_HISTORY_OPTS for the history server
      e8d1fbc [Andrew Or] Eliminate delay between binding and checking for logs
      745e496c
    • Xiangrui Meng's avatar
      [SPARK-1506][MLLIB] Documentation improvements for MLlib 1.0 · 26d35f3f
      Xiangrui Meng authored
      Preview: http://54.82.240.23:4000/mllib-guide.html
      
      Table of contents:
      
      * Basics
        * Data types
        * Summary statistics
      * Classification and regression
        * linear support vector machine (SVM)
        * logistic regression
        * linear least squares, Lasso, and ridge regression
        * decision tree
        * naive Bayes
      * Collaborative Filtering
        * alternating least squares (ALS)
      * Clustering
        * k-means
      * Dimensionality reduction
        * singular value decomposition (SVD)
        * principal component analysis (PCA)
      * Optimization
        * stochastic gradient descent
        * limited-memory BFGS (L-BFGS)
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #422 from mengxr/mllib-doc and squashes the following commits:
      
      944e3a9 [Xiangrui Meng] merge master
      f9fda28 [Xiangrui Meng] minor
      9474065 [Xiangrui Meng] add alpha to ALS examples
      928e630 [Xiangrui Meng] initialization_mode -> initializationMode
      5bbff49 [Xiangrui Meng] add imports to labeled point examples
      c17440d [Xiangrui Meng] fix python nb example
      28f40dc [Xiangrui Meng] remove localhost:4000
      369a4d3 [Xiangrui Meng] Merge branch 'master' into mllib-doc
      7dc95cc [Xiangrui Meng] update linear methods
      053ad8a [Xiangrui Meng] add links to go back to the main page
      abbbf7e [Xiangrui Meng] update ALS argument names
      648283e [Xiangrui Meng] level down statistics
      14e2287 [Xiangrui Meng] add sample libsvm data and use it in guide
      8cd2441 [Xiangrui Meng] minor updates
      186ab07 [Xiangrui Meng] update section names
      6568d65 [Xiangrui Meng] update toc, level up lr and svm
      162ee12 [Xiangrui Meng] rename section names
      5c1e1b1 [Xiangrui Meng] minor
      8aeaba1 [Xiangrui Meng] wrap long lines
      6ce6a6f [Xiangrui Meng] add summary statistics to toc
      5760045 [Xiangrui Meng] claim beta
      cc604bf [Xiangrui Meng] remove classification and regression
      92747b3 [Xiangrui Meng] make section titles consistent
      e605dd6 [Xiangrui Meng] add LIBSVM loader
      f639674 [Xiangrui Meng] add python section to migration guide
      c82ffb4 [Xiangrui Meng] clean optimization
      31660eb [Xiangrui Meng] update linear algebra and stat
      0a40837 [Xiangrui Meng] first pass over linear methods
      1fc8271 [Xiangrui Meng] update toc
      906ed0a [Xiangrui Meng] add a python example to naive bayes
      5f0a700 [Xiangrui Meng] update collaborative filtering
      656d416 [Xiangrui Meng] update mllib-clustering
      86e143a [Xiangrui Meng] remove data types section from main page
      8d1a128 [Xiangrui Meng] move part of linear algebra to data types and add Java/Python examples
      d1b5cbf [Xiangrui Meng] merge master
      72e4804 [Xiangrui Meng] one pass over tree guide
      64f8995 [Xiangrui Meng] move decision tree guide to a separate file
      9fca001 [Xiangrui Meng] add first version of linear algebra guide
      53c9552 [Xiangrui Meng] update dependencies
      f316ec2 [Xiangrui Meng] add migration guide
      f399f6c [Xiangrui Meng] move linear-algebra to dimensionality-reduction
      182460f [Xiangrui Meng] add guide for naive Bayes
      137fd1d [Xiangrui Meng] re-organize toc
      a61e434 [Xiangrui Meng] update mllib's toc
      26d35f3f
    • Tor Myklebust's avatar
      [SPARK-1281] Improve partitioning in ALS · bf9d49b6
      Tor Myklebust authored
      ALS was using HashPartitioner and explicit uses of `%` together.  Further, the naked use of `%` meant that, if the number of partitions corresponded with the stride of arithmetic progressions appearing in user and product ids, users and products could be mapped into buckets in an unfair or unwise way.
      
      This pull request:
      1) Makes the Partitioner an instance variable of ALS.
      2) Replaces the direct uses of `%` with calls to a Partitioner.
      3) Defines an anonymous Partitioner that scrambles the bits of the object's hashCode before reducing to the number of present buckets.
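
      The effect described above can be sketched with plain functions. The names are hypothetical, and the use of `scala.util.hashing.byteswap32` as the bit scrambler is an assumption made for this sketch, not a statement about what the patch uses.

      ```scala
      import scala.util.hashing.byteswap32

      object AlsBucketingDemo {
        private def nonNegativeMod(x: Int, mod: Int): Int = {
          val r = x % mod
          if (r < 0) r + mod else r
        }

        // Naked use of `%`: ids in an arithmetic progression can collide badly.
        def naiveBucket(id: Int, numBuckets: Int): Int =
          nonNegativeMod(id, numBuckets)

        // Scramble the bits first, then reduce to the number of buckets.
        def scrambledBucket(id: Int, numBuckets: Int): Int =
          nonNegativeMod(byteswap32(id), numBuckets)

        // Number of distinct buckets that a set of ids lands in.
        def bucketsUsed(ids: Seq[Int], numBuckets: Int, bucket: (Int, Int) => Int): Int =
          ids.map(bucket(_, numBuckets)).distinct.size
      }
      ```

      With ids `0, 10, 20, ...` and 10 buckets, `naiveBucket` sends every id to the same bucket, while `scrambledBucket` spreads them over several.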
      
      This pull request does not make the partitioner user-configurable.
      
      I'm not all that happy about the way I did (1).  It introduces an icky lifetime issue and dances around it by nulling something.  However, I don't know a better way to make the partitioner visible everywhere it needs to be visible.
      
      Author: Tor Myklebust <tmyklebu@gmail.com>
      
      Closes #407 from tmyklebu/master and squashes the following commits:
      
      dcf583a [Tor Myklebust] Remove the partitioner member variable; instead, thread that needle everywhere it needs to go.
      23d6f91 [Tor Myklebust] Stop making the partitioner configurable.
      495784f [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark
      674933a [Tor Myklebust] Fix style.
      40edc23 [Tor Myklebust] Fix missing space.
      f841345 [Tor Myklebust] Fix daft bug creating 'pairs', also for -> foreach.
      5ec9e6c [Tor Myklebust] Clean a couple of things up using 'map'.
      36a0f43 [Tor Myklebust] Make the partitioner private.
      d872b09 [Tor Myklebust] Add negative id ALS test.
      df27697 [Tor Myklebust] Support custom partitioners.  Currently we use the same partitioner for users and products.
      c90b6d8 [Tor Myklebust] Scramble user and product ids before bucketing.
      c774d7d [Tor Myklebust] Make the partitioner a member variable and use it instead of modding directly.
      bf9d49b6
    • Xusen Yin's avatar
      fix bugs of dot in python · c919798f
      Xusen Yin authored
      If `self.theta` is not transposed with `transpose()`, a

      *ValueError: matrices are not aligned*

      is raised. The previous test case simply ignored this situation.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #463 from yinxusen/python-naive-bayes and squashes the following commits:
      
      fcbe3bc [Xusen Yin] fix bugs of dot in python
      c919798f
    • Ahir Reddy's avatar
      [SPARK-1560]: Updated Pyrolite Dependency to be Java 6 compatible · 0f87e6ad
      Ahir Reddy authored
      Changed the Pyrolite dependency to a build which targets Java 6.
      
      Author: Ahir Reddy <ahirreddy@gmail.com>
      
      Closes #479 from ahirreddy/java6-pyrolite and squashes the following commits:
      
      8ea25d3 [Ahir Reddy] Updated maven build to use java 6 compatible pyrolite
      dabc703 [Ahir Reddy] Updated Pyrolite dependency to be Java 6 compatible
      0f87e6ad
    • CodingCat's avatar
      [HOTFIX] SPARK-1399: remove outdated comments · 87de2908
      CodingCat authored
      The original PR was merged before this mistake was found, so it is fixed here.
      
      Sorry about that @pwendell, @andrewor14, I will be more careful next time
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #474 from CodingCat/hotfix_1399 and squashes the following commits:
      
      f3a8ba9 [CodingCat] move outdated comments
      87de2908
    • Patrick Wendell's avatar
      SPARK-1496: Have jarOfClass return Option[String] · 83084d3b
      Patrick Wendell authored
      A simple change, mostly had to change a bunch of example code.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #438 from pwendell/jar-of-class and squashes the following commits:
      
      aa010ff [Patrick Wendell] SPARK-1496: Have jarOfClass return Option[String]
      83084d3b
    • Marcelo Vanzin's avatar
      [SPARK-1459] Use local path (and not complete URL) when opening local lo... · ac164b79
      Marcelo Vanzin authored
      ...g file.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #375 from vanzin/event-file and squashes the following commits:
      
      f673029 [Marcelo Vanzin] [SPARK-1459] Use local path (and not complete URL) when opening local log file.
      ac164b79
    • Andrew Or's avatar
      [Fix #274] Document + fix annotation usages · b3e5366f
      Andrew Or authored
      ... so that we don't follow an unspoken set of forbidden rules for adding **@AlphaComponent**, **@DeveloperApi**, and **@Experimental** annotations in the code.
      
      In addition, this PR
      (1) removes unnecessary `:: * ::` tags,
      (2) adds missing `:: * ::` tags, and
      (3) removes annotations for internal APIs.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #470 from andrewor14/annotations-fix and squashes the following commits:
      
      92a7f42 [Andrew Or] Document + fix annotation usages
      b3e5366f
  4. Apr 21, 2014
    • Matei Zaharia's avatar
      [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs · fc783847
      Matei Zaharia authored
      I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and to generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages in the Javadoc; there is an SBT task that identifies Java sources to run javadoc on, but it has been very difficult to modify it from outside to change what is set in the unidoc package. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and similar details, so we may want to look into that. We may decide not to post the Javadocs right now if they are too limited compared to the Scaladoc.
      
      Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/
      
      Author: Matei Zaharia <matei@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Patrick Wendell <pwendell@gmail.com>
      
      Closes #457 from mateiz/better-docs and squashes the following commits:
      
      a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package
      5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load
      f05abc0 [Matei Zaharia] Don't include java.lang package names
      995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc
      a14a93c [Matei Zaharia] typo
      76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java
      ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs
      acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced
      fc783847
    • Tathagata Das's avatar
      [SPARK-1332] Improve Spark Streaming's Network Receiver and InputDStream API [WIP] · 04c37b6f
      Tathagata Das authored
      The current Network Receiver API makes it slightly complicated to write a new receiver, as one needs to create an instance of BlockGenerator, as shown in SocketReceiver:
      https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L51
      
      Exposing the BlockGenerator interface has made it harder to improve the receiving process. The API of NetworkReceiver (which was not a very stable API anyway) needs to be changed if we are to ensure future stability.
      
      Additionally, functions like streamingContext.socketStream that create input streams return DStream objects. That makes it hard to expose functionality (say, rate limits) unique to input DStreams. They should return InputDStream or NetworkInputDStream. This is not yet implemented.
      
      This PR is blocked on the graceful shutdown PR #247
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #300 from tdas/network-receiver-api and squashes the following commits:
      
      ea27b38 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into network-receiver-api
      3a4777c [Tathagata Das] Renamed NetworkInputDStream to ReceiverInputDStream, and ActorReceiver related stuff.
      838dd39 [Tathagata Das] Added more events to the StreamingListener to report errors and stopped receivers.
      a75c7a6 [Tathagata Das] Address some PR comments and fixed other issues.
      91bfa72 [Tathagata Das] Fixed bugs.
      8533094 [Tathagata Das] Scala style fixes.
      028bde6 [Tathagata Das] Further refactored receiver to allow restarting of a receiver.
      43f5290 [Tathagata Das] Made functions that create input streams return InputDStream and NetworkInputDStream, for both Scala and Java.
      2c94579 [Tathagata Das] Fixed graceful shutdown by removing interrupts on receiving thread.
      9e37a0b [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into network-receiver-api
      3223e95 [Tathagata Das] Refactored the code that runs the NetworkReceiver into further classes and traits to make them more testable.
      a36cc48 [Tathagata Das] Refactored the NetworkReceiver API for future stability.
      04c37b6f
    • Patrick Wendell's avatar
      Dev script: include RC name in git tag · 5a5b3346
      Patrick Wendell authored
      5a5b3346