  1. Sep 28, 2015
      [SPARK-10833] [BUILD] Inline, organize BSD/MIT licenses in LICENSE · bf4199e2
      Sean Owen authored
      In the course of https://issues.apache.org/jira/browse/LEGAL-226 it came to light that the guidance at http://www.apache.org/dev/licensing-howto.html#permissive-deps on permissively-licensed dependencies has a different interpretation than the one we (er, I) had been operating under. "pointer ... to the license within the source tree" specifically means a copy of the license within Spark's distribution, whereas at the moment Spark's LICENSE points to the license in the other project's source tree.
      
      The remedy is simply to inline all such licenses (i.e. BSD/MIT licenses), or to include their text in a "licenses" subdirectory and point to that.
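To make the "point to a copy in the tree" remedy concrete, here is a minimal sketch of a check that every pointer in LICENSE resolves to a license file actually shipped in the source tree. It is hypothetical: the `(see licenses/LICENSE-<name>.txt)` pointer format, the regex, and the function name are illustrative assumptions, not Spark's actual build tooling.

```python
import re
from pathlib import Path

# Hypothetical pointer format inside LICENSE: "(see licenses/LICENSE-foo.txt)".
POINTER = re.compile(r"\(see (licenses/LICENSE-[\w.-]+\.txt)\)")


def missing_license_copies(license_text: str, root: Path) -> list[str]:
    """Return every pointer in LICENSE whose target file is absent under root."""
    missing = []
    for rel_path in POINTER.findall(license_text):
        # The remedy requires a real copy in the distribution, not a link elsewhere.
        if not (root / rel_path).is_file():
            missing.append(rel_path)
    return missing
```

Run against the repository root, an empty result would mean every pointer has a matching license copy in the distribution.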
      
      Along the way, we can also treat other BSD/MIT licenses, whose text has been inlined into LICENSE, in the same way.
      
      The LICENSE file can continue to provide a helpful list of BSD/MIT licensed projects and a pointer to their sites. This would be over and above including license text in the distro, which is the essential thing.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #8919 from srowen/SPARK-10833.
  2. May 14, 2014
      SPARK-1827. LICENSE and NOTICE files need a refresh to contain transitive dependency info · 2e5a7cde
      Sean Owen authored
      LICENSE and NOTICE policy is explained here:
      
      http://www.apache.org/dev/licensing-howto.html
      http://www.apache.org/legal/3party.html
      
      This leads to the following changes.
      
      First, this change enables two extensions to maven-shade-plugin in assembly/ that will try to include and merge all NOTICE and LICENSE files. This can't hurt.
      
      This generates a consolidated NOTICE file, whose contents I manually added to NOTICE.
      
      Next, a list of all dependencies and their licenses was generated:
      `mvn ... license:aggregate-add-third-party`
      which creates `target/generated-sources/license/THIRD-PARTY.txt`.
      
      Each dependency is listed with one or more licenses. For each dependency with more than one, I determined the most compatible license.
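A minimal sketch of the "most compatible license" step, assuming the license names for each dependency have already been extracted from THIRD-PARTY.txt (parsing that file is out of scope here). The ranking table and license strings are hypothetical placeholders for illustration, not a legal determination; unrecognized names return `None`, which corresponds to evaluating the license by hand.

```python
from typing import Optional

# Hypothetical preference order: a lower rank means fewer obligations for an
# Apache-licensed project. Illustrative only, not a legal determination.
RANK = {
    "Apache License, Version 2.0": 0,
    "Public Domain": 0,
    "BSD License": 1,
    "MIT License": 1,
    "Eclipse Public License 1.0": 2,
    "CDDL 1.0": 2,
    "GNU General Public License": 3,
}


def most_compatible(licenses: list[str]) -> Optional[str]:
    """Return the best-ranked known license, or None if none is recognized.

    Ties keep the license listed first; None means "evaluate by hand".
    """
    known = [name for name in licenses if name in RANK]
    if not known:
        return None
    return min(known, key=RANK.__getitem__)
```

For a dependency offered under GPL or MIT, the sketch picks MIT; for one under a license it does not know, it defers to a human.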
      
      For "unknown" license dependencies, I manually evaluated their license. Many are actually Apache projects or components of projects already covered. The only non-trivial one was Colt, which has its own (compatible) license.
      
      I ignored Apache-licensed and public domain dependencies as these require no further action (beyond NOTICE above).
      
      BSD and MIT licenses (permissive Category A licenses) are evidently supposed to be mentioned in LICENSE, so I added a section to LICENSE with the relevant output from the THIRD-PARTY.txt file.
      
      Everything else (Category B licenses) is evidently supposed to be mentioned in NOTICE (?), so I did the same there.
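The routing described in these paragraphs can be sketched as a lookup table, assuming each dependency has already been reduced to a single license name. The table entries, bucket names, and dependency names are hypothetical; licenses the table does not know fall into a "manual" bucket, as Colt's did.

```python
# Hypothetical license -> handling table (illustrative subset, not legal advice).
#   "none":    Apache-licensed or public domain, no action beyond NOTICE
#   "LICENSE": permissive Category A (BSD/MIT), listed in LICENSE
#   "NOTICE":  Category B, mentioned in NOTICE
HANDLING = {
    "Apache License, Version 2.0": "none",
    "Public Domain": "none",
    "BSD License": "LICENSE",
    "MIT License": "LICENSE",
    "Eclipse Public License 1.0": "NOTICE",
    "CDDL 1.0": "NOTICE",
}


def route(deps: dict[str, str]) -> dict[str, list[str]]:
    """Group dependency names (sorted) by the file that must mention them.

    Anything whose license is missing from the table lands in "manual".
    """
    out: dict[str, list[str]] = {"none": [], "LICENSE": [], "NOTICE": [], "manual": []}
    for dep, lic in sorted(deps.items()):
        out[HANDLING.get(lic, "manual")].append(dep)
    return out
```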
      
      LICENSE contained some license statements for source code that is redistributed. I left these in place, as I think that is the right place for them.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #770 from srowen/SPARK-1827 and squashes the following commits:
      
      a764504 [Sean Owen] Add LICENSE and NOTICE info for all transitive dependencies as of 1.0
  3. Mar 23, 2014
      [SPARK-1212] Adding sparse data support and update KMeans · 80c29689
      Xiangrui Meng authored
      Continue our discussions from https://github.com/apache/incubator-spark/pull/575
      
      This PR is WIP because it depends on a SNAPSHOT version of breeze.
      
      Per previous discussions and benchmarks, I switched to breeze for linear algebra operations. @dlwh and I made some improvements to breeze to keep its performance comparable to the bare-bones implementation, including norm computation and squared distance. This is why this PR needs to depend on a SNAPSHOT version of breeze.
      
      @fommil, please find the notice about using netlib-core in `NOTICE`. This follows Apache's instructions on appropriate labeling.
      
      I'm going to update this PR to include:
      
      1. Fast distance computation: use `||a||_2^2 + ||b||_2^2 - 2 a^T b`, with the squared norms pre-computed, when it doesn't introduce too much numerical error. Without this, computing the distance between a (dense) center and a (possibly sparse) point always takes O(n) time.
      
      2. Some numbers about the performance.
      
      3. A released version of breeze. @dlwh, a minor release of breeze will help this PR get merged early. Do you mind sharing breeze's release plan? Thanks!
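A plain-Python sketch of the fast distance computation in item 1, for a dense center and a sparse point stored as parallel `indices`/`values` lists. It is not MLlib's code: the function names, the `precision` threshold of 1e-6, and the exact form of the error bound are assumptions, chosen to show the idea of using the expanded form only when cancellation is unlikely to dominate. `EPSILON` is computed directly rather than taken from a library, in the spirit of commit 1da1033.

```python
import math


def machine_epsilon() -> float:
    """Find machine epsilon directly: halve until 1 + eps/2 rounds back to 1."""
    eps = 1.0
    while 1.0 + eps / 2.0 != 1.0:
        eps /= 2.0
    return eps


EPSILON = machine_epsilon()


def l2_norm(values: list[float]) -> float:
    """L2 norm, meant to be precomputed once per vector."""
    return math.sqrt(sum(v * v for v in values))


def sparse_dot(dense: list[float], indices: list[int], values: list[float]) -> float:
    """a^T b, touching only the stored nonzeros of the sparse vector b."""
    return sum(dense[i] * v for i, v in zip(indices, values))


def fast_squared_distance(center: list[float], center_norm: float,
                          indices: list[int], values: list[float],
                          point_norm: float, precision: float = 1e-6) -> float:
    """||a - b||^2 for a dense center a and a sparse point b with precomputed norms."""
    sum_sq = center_norm ** 2 + point_norm ** 2
    norm_diff = center_norm - point_norm
    # Since ||a - b||^2 >= (||a|| - ||b||)^2, this estimates the relative rounding
    # error of the expanded form; it is large when a and b have similar norms.
    bound = 2.0 * EPSILON * sum_sq / (norm_diff ** 2 + EPSILON)
    if bound < precision:
        # Cheap path: cost proportional to the number of nonzeros in b.
        return sum_sq - 2.0 * sparse_dot(center, indices, values)
    # Exact fallback: densify b and sum over all n coordinates.
    dense_point = [0.0] * len(center)
    for i, v in zip(indices, values):
        dense_point[i] = v
    return sum((x - y) ** 2 for x, y in zip(center, dense_point))
```

In a KMeans loop, the center norms would be computed once per iteration and the point norms once up front, so most distances cost O(nnz(b)) instead of O(n).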
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #117 from mengxr/sparse-kmeans and squashes the following commits:
      
      67b368d [Xiangrui Meng] fix SparseVector.toArray
      5eda0de [Xiangrui Meng] update NOTICE
      67abe31 [Xiangrui Meng] move ArrayRDDs to mllib.rdd
      1da1033 [Xiangrui Meng] remove dependency on commons-math3 and compute EPSILON directly
      9bb1b31 [Xiangrui Meng] optimize SparseVector.toArray
      226d2cd [Xiangrui Meng] update Java friendly methods in Vectors
      238ba34 [Xiangrui Meng] add VectorRDDs with a converter from RDD[Array[Double]]
      b28ba2f [Xiangrui Meng] add toArray to Vector
      e69b10c [Xiangrui Meng] remove examples/JavaKMeans.java, which is replaced by mllib/examples/JavaKMeans.java
      72bde33 [Xiangrui Meng] clean up code for distance computation
      712cb88 [Xiangrui Meng] make Vectors.sparse Java friendly
      27858e4 [Xiangrui Meng] update breeze version to 0.7
      07c3cf2 [Xiangrui Meng] change Mahout to breeze in doc; use a simple lower bound to avoid unnecessary distance computation
      6f5cdde [Xiangrui Meng] fix a bug in filtering finished runs
      42512f2 [Xiangrui Meng] Merge branch 'master' into sparse-kmeans
      d6e6c07 [Xiangrui Meng] add predict(RDD[Vector]) to KMeansModel
      42b4e50 [Xiangrui Meng] line feed at the end
      a4ace73 [Xiangrui Meng] Merge branch 'fast-dist' into sparse-kmeans
      3ed1a24 [Xiangrui Meng] add doc to BreezeVectorWithSquaredNorm
      0107e19 [Xiangrui Meng] update NOTICE
      87bc755 [Xiangrui Meng] tuned the KMeans code: changed some for loops to while, use view to avoid copying arrays
      0ff8046 [Xiangrui Meng] update KMeans to use fastSquaredDistance
      f355411 [Xiangrui Meng] add BreezeVectorWithSquaredNorm case class
      ab74f67 [Xiangrui Meng] add fastSquaredDistance for KMeans
      4e7d5ca [Xiangrui Meng] minor style update
      07ffaf2 [Xiangrui Meng] add dense/sparse vector data models and conversions to/from breeze vectors; use breeze to implement KMeans in order to support both dense and sparse data
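Commit 07c3cf2 mentions a simple lower bound that avoids unnecessary distance computations. The sketch below assumes that bound is (||c|| - ||p||)^2, a standard consequence of the reverse triangle inequality, and shows it pruning a closest-center search; `predict` mirrors the idea of d6e6c07 with a Python list standing in for `RDD[Vector]`. All names are hypothetical, not MLlib's API.

```python
import math


def squared_distance(a: list[float], b: list[float]) -> float:
    """Exact squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_closest(centers: list[list[float]], center_norms: list[float],
                 point: list[float], point_norm: float) -> tuple[int, float]:
    """Return (index, squared distance) of the nearest center.

    (||c|| - ||p||)^2 never exceeds ||c - p||^2, so a center whose bound already
    reaches the best distance found so far cannot win and is skipped.
    """
    best_index, best_dist = -1, math.inf
    for i, (center, c_norm) in enumerate(zip(centers, center_norms)):
        lower_bound = (c_norm - point_norm) ** 2
        if lower_bound < best_dist:
            dist = squared_distance(center, point)
            if dist < best_dist:
                best_index, best_dist = i, dist
    return best_index, best_dist


def predict(centers: list[list[float]], points: list[list[float]]) -> list[int]:
    """Assign each point to its closest center (a list stands in for an RDD)."""
    norms = [math.sqrt(sum(x * x for x in c)) for c in centers]
    return [find_closest(centers, norms, p, math.sqrt(sum(x * x for x in p)))[0]
            for p in points]
```

The bound costs one subtraction per center because the norms are already known, which is what makes it worth checking before every full distance.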
  4. Mar 18, 2014
      Update copyright year in NOTICE to 2014 · 79e547fe
      Matei Zaharia authored
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #174 from mateiz/update-notice and squashes the following commits:
      
      47fc1a5 [Matei Zaharia] Update copyright year in NOTICE to 2014
  5. Jul 16, 2013