    7beb227c
    erenavsarogullari authored
    [SPARK-17663][CORE] SchedulableBuilder should handle invalid data access via scheduler.allocation.file
    
    ## What changes were proposed in this pull request?
    
    If `spark.scheduler.allocation.file` contains invalid `minShare` and/or `weight` values, the following happens:
    - `toInt` throws a `NumberFormatException`.
    - `SparkContext` cannot be initialized.
    - No meaningful error message is shown to the user.
    
    In a nutshell, this functionality can be made more robust by choosing one of the following flows:
    
    **1-** Currently, if `schedulingMode` has an invalid value, a warning is logged and the default value `FIFO` is used. The same pattern can be applied to `minShare` (default: 0) and `weight` (default: 1).
    **2-** A meaningful error message can be shown to the user for all invalid cases.
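    
    Flow 1 can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `IntFallbackSketch` object and the `parseIntOrDefault` helper are hypothetical names, and the warning is printed to stderr instead of going through Spark's logging.
    
    ```scala
    import scala.util.Try
    
    object IntFallbackSketch {
      // Hypothetical helper (an assumption for illustration, not Spark's API).
      // Trims the raw text and parses it as an Int; on any parse failure it
      // prints a warning and returns the supplied default instead of throwing.
      def parseIntOrDefault(raw: String, propertyName: String, defaultValue: Int): Int =
        Try(raw.trim.toInt).getOrElse {
          Console.err.println(
            s"Warning: invalid $propertyName value '$raw'; using default $defaultValue")
          defaultValue
        }
    }
    ```
    
    With this shape, a `weight` of `invalid_weight` would resolve to the default `1` and a `minShare` of `2` would stay `2`, so `SparkContext` initialization would not be aborted by a `NumberFormatException`.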
    
    This PR offers:
    - `schedulingMode` currently handles only empty values. It also needs to handle **whitespace**, **non-uppercase** (fair, FaIr, etc.) and `SchedulingMode.NONE` values by falling back to the default (`FIFO`).
    - `minShare` and `weight` currently handle only empty values. They also need to handle **non-integer** values by falling back to their defaults.
    - Some refactoring of `PoolSuite`.
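    
    The `schedulingMode` handling listed above might look like the sketch below. It is an assumption-laden illustration, not the PR's implementation: `SchedulingModeSketch`, `Mode` and `resolveMode` are hypothetical names, and `Mode` is a local stand-in for the `FAIR` / `FIFO` / `NONE` values so that the snippet runs without Spark on the classpath.
    
    ```scala
    import scala.util.Try
    
    object SchedulingModeSketch {
      // Local stand-in enumeration (assumption: mirrors the FAIR / FIFO / NONE
      // values referred to in the description above).
      object Mode extends Enumeration {
        val FAIR, FIFO, NONE = Value
      }
    
      // Hypothetical helper: accepts only an exact, upper-case FAIR or FIFO.
      // Empty, whitespace-only, non-uppercase, unknown and NONE values all
      // fall back to the default mode with a warning on stderr.
      def resolveMode(raw: String, defaultMode: Mode.Value = Mode.FIFO): Mode.Value =
        Try(Mode.withName(raw)).toOption
          .filter(_ != Mode.NONE)
          .getOrElse {
            Console.err.println(
              s"Warning: unsupported schedulingMode '$raw'; using default $defaultMode")
            defaultMode
          }
    }
    ```
    
    `Enumeration.withName` throws `NoSuchElementException` for unknown names, which `Try` turns into the fallback path here.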
    
    **Code to Reproduce:**
    
    ```
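    import org.apache.spark.{SparkConf, SparkContext}
    
    // Enable the fair scheduler and point it at the invalid allocation file.
    val conf = new SparkConf().setAppName("spark-fairscheduler").setMaster("local")
    conf.set("spark.scheduler.mode", "FAIR")
    conf.set("spark.scheduler.allocation.file", "src/main/resources/fairscheduler-invalid-data.xml")
    // Fails here with NumberFormatException before the fix.
    val sc = new SparkContext(conf)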
    ```
    
    **fairscheduler-invalid-data.xml:**
    
    ```
    <allocations>
        <pool name="production">
            <schedulingMode>FIFO</schedulingMode>
            <weight>invalid_weight</weight>
            <minShare>2</minShare>
        </pool>
    </allocations>
    ```
    
    **Stacktrace:**
    
    ```
    Exception in thread "main" java.lang.NumberFormatException: For input string: "invalid_weight"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:580)
        at java.lang.Integer.parseInt(Integer.java:615)
        at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
        at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
        at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:127)
        at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:102)
    ```
    
    ## How was this patch tested?
    
    Added unit test cases.
    
    Author: erenavsarogullari <erenavsarogullari@gmail.com>
    
    Closes #15237 from erenavsarogullari/SPARK-17663.