Object Oriented Programming
Basics
- a java program can have any number of classes. the classes can have any name and the java program can have any name
- however, only one public class is allowed in a java program
- the name of both the public class and the java program should be the same
- when we compile a class using `javac Durga.java`, the number of class files generated = number of classes present in that program

```java
class A {
    public static void main(String args[]) {
        System.out.println("class A");
    }
}

class B {
    public static void main(String args[]) {
        System.out.println("class B");
    }
}

class C {
}
```

- compiling the above program generates three class files - A.class, B.class and C.class. running `java A` prints `class A`, while `java B` prints `class B`
- three pillars of object oriented programming
- encapsulation - helps with security
- data hiding (visibility)
- abstraction
- inheritance - helps with reusability
- polymorphism - helps with flexibility
- compile time -
- overloading
- method hiding
- variable hiding / shadowing
- runtime -
- overriding
Import Statements / Package
- two ways for using classes from libraries -
- fully qualified name of the class
- import statement at the top - preferred
- two kinds of import statements -
- explicit import - `import java.util.ArrayList;` - preferred
- implicit import - `import java.util.*;`
- note - implicit import does not include sub packages, e.g. `import java.util.*;` does not import classes under `java.util.concurrent`
- by default, all classes under the `java.lang` package are available and need not be imported
- all classes under the same package as the current class also need not be imported
- package - group of related java programs
- different packages can contain classes with the same name, e.g. `java.util.Date` and `java.sql.Date`
- universally accepted naming convention for package naming in java - reverse of domain name
- if we write the following program -

```java
package com.learning.java;

class Test {
    public static void main(String args[]) {
        System.out.println("hello world");
    }
}
```

- and try compiling using `javac Test.java`, it compiles. but when i tried running `java Test`, it failed -

```
Error: Could not find or load main class Test
Caused by: java.lang.NoClassDefFoundError: Test (wrong name: com/learning/java/Test)
```
- so we should actually compile using `javac -d . Test.java`
- this generates the entire directory structure com/learning/java in the current working directory and places Test.class there. we can then run it using `java com.learning.java.Test`
- packages help with implementing encapsulation - the entire complexity / functionality is viewed as one unit residing inside a package
Class and Method Access Modifiers
- top level classes have two different access modifiers - `public` and default (no modifier)
- default classes are only accessible from within the package
- public classes can be accessed from anywhere, outside the package as well
- for inner classes, apart from `abstract`, `final`, `public` and default, we can also have the `static`, `private` and `protected` modifiers
- members have four different access modifiers -
- public - access method from anywhere, inside or outside package
- default - can be accessed from inside package only, not outside the package
- private - can only be accessed from within the same class, not outside the class
- protected - can be accessed from anywhere inside package, and from subclasses if outside package as well
- small note - if accessing protected members from inside subclass but from outside package, only subclass reference can be used, not superclass (i.e. even polymorphism is not allowed)
- protected example -
- a/A.java -

```java
package a;

public class A {
    protected void a() {
        System.out.println("from a");
    }
}
```

- b/B.java -

```java
package b;

import a.A;

class B extends A {
    public static void main(String[] args) {
        A a = new A();
        a.a();
    }
}
```

- output - compile time error - `a() has protected access in A`
- solution - change to -

```java
B b = new B();
b.a();
```
therefore, summary in tabular form -

| visibility | public | protected | default | private |
|---|---|---|---|---|
| same class | ✅ | ✅ | ✅ | ✅ |
| subclass, same package | ✅ | ✅ | ✅ | |
| non subclass, same package | ✅ | ✅ | ✅ | |
| subclass, different package | ✅ | ✅ (subclass reference only) | | |
| non subclass, different package | ✅ | | | |

- access modifiers help with data hiding - a different class cannot "directly" access the fields of another class - it would have to use public methods
- note - think about member visibility only when class is visible first (recall default vs public)
- access modifiers also help achieve encapsulation - interact with data members via exposed methods, not directly
Abstract Classes And Interfaces
- the `abstract` modifier is applicable for both methods and classes
- an abstract method is used when we do not know the implementation of the method upfront. e.g. a Vehicle class can have an abstract method `getNumberOfWheels`. syntax -

```java
public abstract Integer getNumberOfWheels();
```
- if a class contains even one abstract method, it would have to be declared as abstract as well
- if a class is `abstract`, instantiation is not possible for the class
- also, if we would not like for it to be possible to instantiate a class, we can declare it as abstract even if it does not have abstract methods
- subclasses are responsible to provide the implementation of the abstract methods of super class
- we can have multiple levels of nesting for abstract classes as well - abstract class Vehicle -> abstract class Car -> class RangeRover
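the Vehicle -> Car -> RangeRover chain above could be sketched like this (the method bodies and the extra `getBrand` method are assumptions for illustration):

```java
abstract class Vehicle {
    public abstract Integer getNumberOfWheels();
}

// still abstract - does not have to implement getNumberOfWheels yet
abstract class Car extends Vehicle {
    public abstract String getBrand();
}

// first concrete class in the chain - must implement all pending abstract methods
class RangeRover extends Car {

    public Integer getNumberOfWheels() {
        return 4;
    }

    public String getBrand() {
        return "Range Rover";
    }
}

public class AbstractChain {

    public static void main(String[] args) {
        Vehicle vehicle = new RangeRover();
        System.out.println(vehicle.getNumberOfWheels()); // 4
    }
}
```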
- interface methods are implicitly `public` and `abstract` without us specifying anything - so, when overriding in the implementing class, the method should be declared `public`
- when implementing an interface -
- either override all the methods of the interface
- or make the class itself abstract
- code example -
```java
interface I {
    void m1();

    void m2();
}

abstract class A implements I {
    public void m1() {
    }
}
```
- abstract variables are not supported - so the `abstract` modifier applies only to one kind of member, i.e. methods, not variables
- so why use abstract classes and interfaces -
- mandating a structure for an implementation - “mandates” subclasses to provide implementation, else there will be compile time error
- acting as a specification / contract - e.g. we write servlet api compliant code, but that same code can run on different vendors like jetty, tomcat, weblogic, resin, oracle http server, etc which are all implementations of the same servlet api. same applies for jdbc and the different sql compliant drivers as well.
- abstraction - client will not know / need not care about the internal implementation
- note - all variables inside an interface are public static final, so they need to be initialized then and there. no instance variables can be created for an interface
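a small sketch of this rule (the interface and constant names are made up):

```java
// variables in an interface are implicitly public static final,
// so they must be initialized at the point of declaration
interface Limits {
    int MAX_USERS = 10; // same as: public static final int MAX_USERS = 10;
}

public class InterfaceVariables {

    public static void main(String[] args) {
        // accessed via the interface name, like any static final constant
        System.out.println(Limits.MAX_USERS); // 10
        // Limits.MAX_USERS = 20; // would not compile - cannot assign a value to final variable
    }
}
```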
Inheritance
- inheritance helps us implement the is-a relationship
- we use `extends` to implement this - members of the superclass are inherited by the subclass
- so, the subclass can use members of the superclass
- the other way around does not hold i.e. a superclass reference cannot access members specific to the subclass
- all classes are implicitly subclasses of `Object`
- main advantage of inheritance - superclass will contain common functionality, thus helping us avoid duplication of logic in subclasses
- types of inheritance -
- single inheritance - one superclass and one subclass. supported in java
- multilevel inheritance - one superclass has one subclass, and that subclass again acts as a superclass for yet another subclass. this too is supported in java
- multiple inheritance - multiple superclasses, one subclass. not supported in java for classes, but supported via interfaces
```java
class C1 extends C2, C3 {} // compilation error
interface I1 extends I2, I3 {} // works
class C1 implements I1, I2 {} // works
```
- hierarchical inheritance - one superclass, multiple subclasses
- hybrid inheritance - combination of multiple types of inheritance
- confusion cleared - we just said every class extends object. if a class C1 extends another class C2, it is extending both C2 and Object. then isn’t this multiple inheritance? why did we say java does not allow multiple inheritance?
- when we do not extend any class, we extend Object implicitly
- when we extend a different class, we do not extend Object directly. instead, the root class of the chain - the one without an explicit superclass - extends Object implicitly. so it is basically multilevel inheritance, not multiple inheritance, through which the subclass extends Object indirectly
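we can verify this chain via reflection - a quick sketch (class names are made up):

```java
class C2 {
}

class C1 extends C2 {
}

public class ObjectChain {

    public static void main(String[] args) {
        // C1's direct superclass is C2, not Object
        System.out.println(C1.class.getSuperclass().getSimpleName()); // C2
        // C2, having no explicit superclass, extends Object implicitly
        System.out.println(C2.class.getSuperclass().getSimpleName()); // Object
    }
}
```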
- note - a `final` class cannot have a subclass
Polymorphism - Overloading
- method signature - method name + argument types
- in java, return type is not a part of method signature
- when resolving method calls, method signature is what gets used
- so, it is a compile time error if we try to add two methods with same signature, even if they have different return types
- overloading - when a class has multiple methods with the same name but different argument types
- advantage - the same method name is being used for multiple implementations
- static polymorphism / compile time polymorphism / early binding - in case of overloading, the decision around which variation of method to use is made at compile time
- example -
```java
class Overloader {

    public void printer(int x) {
        System.out.println("printing an integer: " + x);
    }

    public void printer(String x) {
        System.out.println("printing a string: " + x);
    }
}

public class Overloading {

    public static void main(String[] args) {
        Overloader overloader = new Overloader();
        overloader.printer(1); // printing an integer: 1
        overloader.printer("hello"); // printing a string: hello
    }
}
```
- automatic promotion + overloading in java - when overloading, if an exact match is not found for a primitive type, java promotes it to the next available primitive type using the following rules -
- byte -> short -> int -> long -> float -> double
- char -> int -> …
- so, referring to the example above - there is no overloaded method for char, so we jump to the next type in the promotion chain as follows -

```java
overloader.printer('a'); // printing an integer: 97
```
- if no promotion is possible, we get a compile time error -
```java
overloader.printer(10.5); // Overloading.java:19: error: no suitable method found for printer(double)
```
- if there is a clash during overloading for superclass vs subclass, subclass gets priority
- e.g. `null` can be used both for `Object` and `String`. so, if a method is overloaded for both of them and we pass it `null`, it will call the `String` implementation, since `String` is the more specific type
- if there is a clash during overloading between two independent classes (neither is a subclass of the other), the compiler throws an "ambiguous reference" compile time error
- e.g. `null` can be used both for `String` and `StringBuffer`, and neither is a subclass of the other. so, if a method is overloaded for both of them and we pass it `null`, we get a compile time error -

```java
overloader.printer(null); // Overloading.java:24: error: reference to printer is ambiguous
```
- since method overloading is resolved at compile time, the decision is influenced by the reference type, not by the instance
- e.g. if i do `Object x = new String("s")`, and a method has two overloaded variations - `m1(Object obj)` and `m1(String str)` - then calling `m1(x)` invokes the `m1(Object obj)` variation, since the decision is made by the type of the reference
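putting both points above together in a runnable sketch (class and method names are made up):

```java
class OverloadResolution {

    static String m1(Object obj) {
        return "object version";
    }

    static String m1(String str) {
        return "string version";
    }

    public static void main(String[] args) {
        // null matches the most specific applicable type - String
        System.out.println(m1(null)); // string version

        // resolution is based on the reference type, not the runtime instance
        Object x = new String("s");
        System.out.println(m1(x)); // object version
    }
}
```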
Polymorphism - Overriding
- superclass reference can hold subclass instance
- the other way around does not hold i.e. subclass reference can not hold superclass instance
- overriding - subclass redefines method of superclass
- variations -
- superclass reference pointing to superclass instance - superclass method would be called
- subclass reference pointing to subclass instance - subclass method would be called
- superclass reference pointing to subclass instance - subclass method would be called
- the third variation is what interests us - compiler only checks if superclass has that method defined
- the method is actually called on the runtime instance during execution
- dynamic polymorphism / runtime polymorphism / late binding - in case of overriding, the decision around which variation of method to use is made at runtime
- covariant return types - when overriding, we can return a subclass of the type the superclass method returns
```java
class Parent {

    public Object m1() {
        return null;
    }
}

class Child extends Parent {

    public String m1() {
        return "hello world";
    }
}

class CoVariant {

    public static void main(String[] args) {
        Parent p = new Child();
        System.out.println("covariant response = " + p.m1()); // covariant response = hello world
    }
}
```
- if superclass method is final, we cannot override the method and we get a compile time error
- if superclass method is non final, we can override the method, and can also make it final in the subclass
- if method is private, there is no concept of overriding, since it is treated like an internal method. so, even if we redefine the method with the same name in the subclass, the compiler would not complain
- access modifiers + overriding - when overriding, we cannot reduce the scope, but we can increase the scope
```java
class Parent {

    public String m1() {
        return "from parent";
    }
}

class Child extends Parent {

    protected String m1() {
        return "from child";
    }
}
```
- output -
```
attempting to assign weaker access privileges; was public
```
- so, conclusion for access modifiers and overriding - for `private` methods, the overriding concept is not applicable. for the others -
- superclass `public` - subclass can be - `public`
- superclass `protected` - subclass can be - `protected`, `public`
- superclass default - subclass can be - default, `protected`, `public`
- exceptions - below is of course only applicable for checked exceptions and not unchecked exceptions. it will make sense automatically if we think about `Parent p = new Child(); p.m1();`
- if the subclass method does not throw a checked exception, the superclass method may or may not throw one
- if the subclass method throws a checked exception, the superclass method should declare the same exception or a superclass of that exception
- superclass `public static void m1()`, subclass `public void m1()` - compile time error
- subclass `public void m1()`, superclass `public static void m1()` - compile time error
- subclass `public static void m1()`, superclass `public static void m1()` - works, but it is not overriding. it is method hiding. this resolution happens at compile time, by reference, and the superclass version is called -

```java
class Parent {

    public static String m1() {
        return "hello from parent";
    }
}

class Child extends Parent {

    public static String m1() {
        return "hello from child";
    }
}

class MethodHiding {

    public static void main(String[] args) {
        Parent p = new Child();
        System.out.println("parent reference, child object responds with = " + p.m1());
    }
}
```
- output -
```
parent reference, child object responds with = hello from parent
```
- conclusion - method hiding is also example of compile time polymorphism / static polymorphism / early binding just like overloading
- variable hiding / shadowing - there is no concept of overriding for variable members. so, even if we redefine the variable in the child class, resolution happens via the reference type, i.e. the superclass
```java
class Parent {
    String s = "parent";
}

class Child extends Parent {
    String s = "child";
}

class VariableShadowing {

    public static void main(String[] args) {
        Parent p = new Child();
        System.out.println(p.s); // prints 'parent'
    }
}
```
- TODO: add double dispatch?
Object Type Casting
- syntax - `A b = (C) d;`
- three checks -
- compile time check 1 - C and the reference type of d should be somehow related - one should be a superclass of the other
- passes compilation -

```java
Object o = new String("hello world");
StringBuffer sb = (StringBuffer) o;
```

- fails compilation with `incompatible types: String cannot be converted to StringBuffer` -

```java
String str = new String("hello world");
StringBuffer sb = (StringBuffer) str;
```
- compile time check 2 - obvious - C should be a subclass of A or the same as A
- passes compilation -

```java
Object o = new String("hello world");
StringBuffer sb = (StringBuffer) o;
```

- fails compilation with `incompatible types: StringBuffer cannot be converted to String` -

```java
Object o = new String("hello world");
String s = (StringBuffer) o;
```
- runtime check 1 - the actual instance d holds should be a subclass of C or the same as C. understand how this is different from compile time check 1 - there, we were checking whether the reference type of d is somehow related to C. here, we check whether the actual runtime object that d holds is a subclass of C or the same as C
- passes at runtime -

```java
Object o = new String("hello world");
String s = (String) o;
```

- fails at runtime with `ClassCastException: class java.lang.String cannot be cast to class java.lang.StringBuffer` -

```java
Object o = new String("hello world");
StringBuffer sb = (StringBuffer) o;
```
Constructors
- a constructor helps with initialization, while the `new` keyword helps with instantiation
- the constructor's name should be the same as the name of the class
- only applicable modifiers for constructors are access modifiers
- use case - make the constructor `private`. now, an object of the class can only be created from inside the class. this can help us, for e.g., implement the singleton pattern
- if we do not add any constructor to a class, the compiler adds the default no args constructor for us automatically
- note - this default no args constructor is added for abstract classes as well
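the private-constructor use case above can be sketched as a minimal (eagerly initialized) singleton:

```java
class Singleton {

    // the only instance, created eagerly when the class is loaded
    private static final Singleton INSTANCE = new Singleton();

    // private constructor - no other class can instantiate Singleton
    private Singleton() {
    }

    public static Singleton getInstance() {
        return INSTANCE;
    }
}

public class SingletonDemo {

    public static void main(String[] args) {
        Singleton a = Singleton.getInstance();
        Singleton b = Singleton.getInstance();
        System.out.println(a == b); // true - both references point to the same instance
    }
}
```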
- the first line in our constructor should always be `super(...)` or `this(...)`
- if we do not add the super call ourselves, the compiler will automatically add `super()` for us
- note - this automatic adding of super happens both for constructors written by us and inside the default no args constructor
- convoluted example -
- our code -

```java
class Test {

    Test(int i) {
        this();
    }

    Test() {
    }
}
```

- what the compiler generates -

```java
class Test {

    Test(int i) {
        this();
    }

    Test() {
        super();
    }
}
```
- when we have code like below, we get a compilation error. the `super()` the compiler generates automatically is not enough, since the superclass has only one constructor - the one we manually wrote - and it requires an argument, which the compiler is not capable of defaulting

```java
class Parent {
    Parent(int i) {}
}

class Child extends Parent {
}
```

- error -

```
Test.java:5: error: constructor Parent in class Parent cannot be applied to given types;
class Child extends Parent {
^
  required: int
  found: no arguments
  reason: actual and formal argument lists differ in length
1 error
```
- note - super has to be the first line, otherwise we get `error: call to super must be first statement in constructor`
- note - this has to be the first line, otherwise we get `error: call to this must be first statement in constructor`
- so, conclusion - we can use either `super()` or `this()`, and that too only in the first line of the constructor
- `super()` or `this()` can only be called inside a constructor and not inside any other method
- the `this` and `super` keywords can also be used to reference instance variables
- note - `this` and `super` are always related to an instance, so they cannot be used inside `static` methods
- my doubt - how to handle variable hiding / shadowing for static variables? solution - maybe use the class name as a prefix instead of super?
- constructor + overloading is a common pattern. we can then use `this(...)` inside them to call each other with default values for missing arguments
- a constructor can throw exceptions
- however, if superclass constructor throws an exception, the subclass constructor should throw the same exception or superclass exception of that exception. we cannot wrap with try catch, since super or this should be the first call
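the constructor + overloading + `this(...)` pattern mentioned above could look like this (the class, fields and default values are made up):

```java
class Connection {

    String host;
    int port;

    // no args constructor delegates with a default host
    Connection() {
        this("localhost");
    }

    // single arg constructor delegates with a default port
    Connection(String host) {
        this(host, 8080);
    }

    // the "real" constructor - all others funnel into this one
    Connection(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

public class ConstructorChaining {

    public static void main(String[] args) {
        Connection c = new Connection();
        System.out.println(c.host + ":" + c.port); // localhost:8080
    }
}
```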
Strings
- string objects are immutable, while string buffer objects are mutable

```java
class Introduction {

    public static void main(String[] args) {
        String s = new String("Durga");
        s.concat(" Software");
        System.out.println(s); // Durga

        StringBuffer sb = new StringBuffer("Durga");
        sb.append(" Software");
        System.out.println(sb); // Durga Software
    }
}
```
- `==` is for reference comparison. `equals` in the `Object` class works just like `==`, but subclasses can override this method, e.g. the `String` class below overrides it for content comparison, while `StringBuffer` does not

```java
class Equality {

    public static void main(String[] args) {
        String s1 = new String("durga");
        String s2 = new String("durga");
        System.out.println(s1 == s2); // false
        System.out.println(s1.equals(s2)); // true

        StringBuffer sb1 = new StringBuffer("durga");
        StringBuffer sb2 = new StringBuffer("durga");
        System.out.println(sb1 == sb2); // false
        System.out.println(sb1.equals(sb2)); // false
    }
}
```
- the heap is used for storing objects. string objects are created when we use `new String()`, `str.concat("suffix")`, etc
- the scp (string constant pool) is used for storing string literals. java stores them in the hopes of reusing them later
- note - the scp is a section of the heap itself, though string literals there are managed differently from regular java objects
- while objects in heap are eligible for gc (garbage collection), objects in scp are not, because java internally maintains references to the string literals stored in scp
- deeper understanding - scp is used for storing string literals. if i do `str.concat("suffix")`, "suffix" is stored in the scp, not the concatenated result of str and "suffix". the concatenated result will however be stored in the heap
- so, it almost feels like that, albeit being in the heap, the scp is more of a compile time feature, while string objects are a runtime feature
- below creates 2 objects in the heap - (s1, String("durga")) and (s2, String("durga")) - and 1 entry in the scp - "durga". s3 and s4 point to the scp entry, while s1 and s2 point to the heap objects (the literal "durga" used to construct them lives in the scp). note how despite having the same string 4 times, it was stored only once in the scp

```java
String s1 = new String("durga");
String s2 = new String("durga");
String s3 = "durga";
String s4 = "durga";
```
- below creates 3 objects in the heap, of which the first 2 are eligible for gc - String("durga"), String("durga software") and (s, String("durga software solutions")) - and 3 entries in the scp - "durga", " software", " solutions"

```java
String s = new String("durga");
s.concat(" software");
s = s.concat(" solutions");
```
- in the below examples, we compare equality using `==` and not `equals`, because equals would anyway do content comparison - here, we want to see which references point to the same object
- equality of string literals - `==` evaluates to true since sl1 and sl2 are both pointing to the same object inside the scp

```java
String sl1 = "durga";
String sl2 = "durga";
System.out.println(sl1 == sl2); // true
```
- concatenation of string literals can happen at compile time as well, which is why slc1 and slc2 point to the same object stored in the scp. this happens due to constant folding optimizations performed by the compiler

```java
String slc1 = "durga software";
String slc2 = "durga " + "software";
System.out.println(slc1 == slc2); // true
```
- here, str2 is created at runtime, so str2 points to a string object in the heap, while str3 points to the string literal in the scp. str2 does not point to a corresponding object in the scp in this case

```java
String str1 = "durga";
String str2 = str1 + " software";
String str3 = "durga software";
System.out.println(str2 == str3); // false
```
- here, both strf2 and strf3 are resolved at compile time and hence point to the scp, because final variables with constant values are inlined at compile time. understand how this behavior changed compared to the example above, just by adding the `final` keyword

```java
final String strf1 = "durga";
String strf2 = strf1 + " software";
String strf3 = "durga software";
System.out.println(strf2 == strf3); // true
```
- the main advantage of scp - if a string is used multiple times, its instance need not be managed / tracked separately multiple times
- basically, jvm maintains a reference to strings in scp, so that there is no garbage collection happening there
- also, strings in scp cannot be mutated - when we make changes, new objects are stored in heap / new strings are stored in scp
- string buffers do not work like strings - there is no scp-like concept for string buffers, and they are mutable, so two string buffer references never implicitly share the same object
- in strings, `concat` and `+` both do the same thing
- other important methods in strings - `equalsIgnoreCase()`, `charAt()`, `length()`, `isEmpty()`, `substring()`, `replace()` (replace a certain character), `indexOf()`, `lastIndexOf()`, `toLowerCase()`, `toUpperCase()`, `trim()`
- string buffer - string is not meant for string content that can change frequently
- strings are immutable - for every change, a new object is created
- this is why we need string buffer. all changes we make happen on the same object
- since string buffer is mutable, it has two concepts - `capacity` and `length`. `capacity` determines how many characters the string buffer can hold, while `length` gives the current number of characters it holds
- when we run out of space, the capacity is roughly doubled - a new, bigger internal array is created and all the existing characters are copied over
- other important methods in string buffer - `capacity()` (get the current capacity), `setCharAt()`, `append()` (works with most primitive etc types), `insert()` (insert a string at a specific position), `delete()` (delete a substring between specified positions), `reverse()` (reverse contents of the string buffer), `ensureCapacity()` (increase capacity to a specified capacity upfront)
(increase capacity to specified capacity upfront) - note - all methods inside string buffer are synchronized - run
javap java.lang.StringBuffer
in terminal to view the profile of string buffer1 2 3 4
public synchronized int length(); public synchronized int capacity(); public synchronized void setCharAt(int, char); // and so on...
- so, at a time, only one thread can operate on a `StringBuffer`, thus affecting the performance of applications
- so, we can also use `StringBuilder` - its apis are almost the same as string buffer - it is like a "non synchronized version of string buffer" - run `javap java.lang.StringBuilder` in a terminal -

```
public void setCharAt(int, char);
public int capacity();
public int length();
// and so on...
```
- so higher performance at the cost of race conditions which we might have to take care of ourselves
- side note - strings are automatically thread safe since they are immutable
- method chaining - because most methods in `String`, `StringBuffer` and `StringBuilder` return the same type, we can use the method chaining technique
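a quick sketch of method chaining - `append` and `reverse` both return the same `StringBuilder`, so the calls can be strung together:

```java
public class ChainingDemo {

    public static void main(String[] args) {
        String result = new StringBuilder("durga")
                .append(" software")
                .reverse()
                .toString();
        System.out.println(result); // erawtfos agrud
    }
}
```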
Exceptions
- hierarchy -
    - Throwable
        - Exception
            - RuntimeException
                - ArithmeticException
                - NullPointerException
                - etc
            - IOException - used when doing file related operations etc
            - InterruptedException - used in multithreading related code etc
            - etc
        - Error
- unchecked exceptions -
    - RuntimeException and its subtree
    - Error and its subtree
- everything else is a checked exception
- try with resources - cleaner code, no need to call `close` explicitly, as long as the resource implements the `AutoCloseable` interface

```java
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
    // ...
}
```
- note - what we declare / assign inside the try statement is final, and cannot be reassigned
- `Closeable` extends `AutoCloseable`. `Closeable`'s `close` throws `IOException`, while `AutoCloseable`'s `close` throws `Exception`, which is more generic
which is more generic- when we are handling exceptions, it might happen that we lose track of the original exception, and throw another exception which is not that relevant. e.g.
- reading from a resource fails due to missing file
- closing the resource fails due to null pointer because the resource was never initialized properly
- the eventual exception we get is the null pointer, but the missing file exception would have helped us more in identifying the root cause
- so, we can use `ex.addSuppressed(Throwable t)` and `Throwable[] t = ex.getSuppressed()`. this way, we can also find the original cause behind the exception
- note - try with resources will automatically make use of suppressions for us behind the scenes
- another note - when using try with resources, the null pointer exception (from `close`) is added as a suppressed exception to the "could not read file" exception - the main exception is the one thrown in the try block, while the close exception happens during the implicit cleanup
```java
class CustomResource implements AutoCloseable {

    public void read() {
        throw new RuntimeException("could not read file");
    }

    @Override
    public void close() {
        throw new RuntimeException("a null pointer exception happened");
    }
}

public class SuppressionExample {

    public static void main(String[] args) {
        try {
            beforeJava7Way();
        } catch (Exception e) {
            System.out.println(e.getMessage());
            for (Throwable t : e.getSuppressed()) {
                System.out.println("suppress: " + t.getMessage());
            }
        }

        try {
            sinceJava7Way();
        } catch (Exception e) {
            System.out.println(e.getMessage());
            for (Throwable t : e.getSuppressed()) {
                System.out.println("suppress: " + t.getMessage());
            }
        }
    }

    private static void beforeJava7Way() {
        CustomResource customResource = null;
        try {
            customResource = new CustomResource();
            customResource.read();
        } finally {
            customResource.close();
        }
    }

    private static void sinceJava7Way() {
        try (CustomResource customResource = new CustomResource()) {
            customResource.read();
        }
    }
}
```
- output -
```
a null pointer exception happened
could not read file
suppress: a null pointer exception happened
```
- we can also catch multiple exceptions using a single catch block
```java
try {
    // ...
} catch (NullPointerException | ArrayIndexOutOfBoundsException e) {
    e.printStackTrace();
    triggerAlert(e); // triggerAlert - some custom handler
}
```
Multi Threading
Concepts
- there are two benefits of multithreading - responsiveness and performance
- repeat - remember multithreading gives both the features above
- concurrency means performing different tasks on the same core. instead of waiting for one task to entirely complete first, we perform both simultaneously in a time-shared manner. it increases responsiveness
- concurrency is also called multi tasking. remember - we do not even need different cores for this
- parallelism means performing different tasks on different cores. it increases performance
- throughput is the number of tasks completed per unit time
- latency is the time taken per unit task
- how are the two different
- for optimizing throughput, since the tasks themselves are different, they just need to be scheduled on different threads in parallel, and that automatically increases the throughput. therefore, fewer considerations exist
- for optimizing latency, we would probably break a single task into smaller subtasks. considerations -
- what parts of the original task can be performed in parallel and which parts have to be done sequentially
- how to aggregate the smaller chunks of results into the final result
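the latency idea above can be sketched with one big task (summing an array) split into two subtasks on two threads, then aggregated (the class name, thread count and workload are made up):

```java
public class SplitSum {

    // sums the array using two threads, one per half, then aggregates
    static long parallelSum(long[] numbers) throws InterruptedException {
        long[] partial = new long[2];
        int mid = numbers.length / 2;

        // each thread sums one half of the array in parallel
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < mid; i++) partial[0] += numbers[i];
        });
        Thread t2 = new Thread(() -> {
            for (int i = mid; i < numbers.length; i++) partial[1] += numbers[i];
        });

        t1.start();
        t2.start();
        t1.join(); // wait for both subtasks before aggregating
        t2.join();

        // aggregation step - combine the partial results
        return partial[0] + partial[1];
    }

    public static void main(String[] args) throws InterruptedException {
        long[] numbers = new long[1_000_000];
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = i + 1;
        }
        System.out.println(parallelSum(numbers)); // 500000500000
    }
}
```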
- in case of multithreading, components like heaps get shared across the threads, while components like stack and instruction pointer are scoped to a single thread
- what is stored inside the stack -
- local primitive types, e.g. if we declare an `int a = 1` inside a method
- primitive formal parameters - similar to above - `int add(int a, int b)`
- references created inside functions are stored in the stack
- what is stored inside the heap -
- all the objects (not references) are stored in heap
- members of a class - both primitive values and non primitive references - are stored in the heap - note how these, when declared inside functions, are stored in the stack as discussed above
- ideally this makes sense - remember, each thread is executing a different instruction, so each of them needs its own instruction pointer etc
- recall - a frame is created for every method call, and it can result with a stack overflow exception if we end up with too many frames
- heap belongs to a process, and all threads can write to / read from the heap at any given time. all objects are stored in the heap till there is a reference to them, after which they get garbage collected
- when we execute a program, it becomes a process i.e. it gets loaded into the memory from the disk and a thread is used to execute it
- there are often way more processes being executed than cores in a cpu. so, using context switching, one thread gets the cpu for a while, then gets paused, and another thread is scheduled on the cpu
- context switching has overhead, and doing a lot of it can lead to something called thrashing
- however, context switching between the threads of the same process is much cheaper than context switching between the threads of different processes, since a lot of components like heaps are reused
- when the operating system has to choose between scheduling multiple tasks, if for e.g. it schedules a computationally expensive task first, it can lead to the starvation of the smaller tasks
- so, to combat issues like this, there are various algorithms used by the operating system to calculate the priority of a task
- we can also programmatically provide a priority which gets used in the calculation above
- a thing that struck me - when writing applications, do not base your conclusions off the computer you are running your code on, base it off how it would work on the server
- number of threads = number of cores is the best way to start, since context switching as discussed earlier consumes resources
- however, it is only optimal if the threads are always performing some computation, and never in blocked state. if the threads perform some io, then a thread performing some computation can take its place
- also, modern day computers use hyper threading i.e. the same physical core is divided into multiple virtual cores. this means that a core can run more than one thread in modern cpus
Thread Creation
- we create an instance of `Thread` and to it, we pass an object of a class that implements `Runnable`. its `run` method needs to be overridden. all of this can be replaced by a lambda, java 8 onwards

```java
Thread thread = new Thread(() -> System.out.println("i am inside " + Thread.currentThread().getName()));
thread.start();
```
- if instead of using `Runnable`, we extend the `Thread` class, we get access to a lot of internal methods
- when we run `Thread.sleep`, we instruct the os to not schedule that thread until the timeout is over
    - note misconception - invoking this method does not consume any cpu i.e. it is not like a while loop that waits for 5 seconds
- we can set a name on a thread to help when debugging, using `thread.setName()`
- we can set a priority between 1 and 10 using `thread.setPriority()`
- we can use `thread.setUncaughtExceptionHandler` to catch "unchecked exceptions" that might have occurred during the execution of the thread, and thus clean up resources
- we can shut down the application entirely from any thread using `System.exit(0)`
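a minimal sketch tying the setters above together - the thread name, priority, and exception message here are illustrative, not from the notes:

```java
public class ThreadSetupSketch {

    public static void main(String[] args) throws InterruptedException {
        Thread thread = new Thread(() -> {
            throw new RuntimeException("boom"); // an unchecked exception inside the thread
        });

        thread.setName("worker-1"); // helpful when debugging
        thread.setPriority(Thread.MAX_PRIORITY); // a hint between 1 and 10

        // invoked for unchecked exceptions that escape run(), e.g. to clean up resources
        thread.setUncaughtExceptionHandler((t, e) ->
                System.out.println("cleaning up after " + t.getName() + ": " + e.getMessage()));

        thread.start();
        thread.join();
    }
}
```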
Thread Coordination
- the application will not terminate until all threads stop
- but, we might want to interrupt a thread so that the thread can maybe understand that the application wants to terminate, and accordingly handle cleaning up of resources
```java
Thread thread = new Thread(new Task());
thread.start();
thread.interrupt();
```
- the interruption can be handled gracefully in two ways as described below
- if our code throws an interrupted exception, calling `interrupt` will trigger it, and then we can handle it. other examples where this exception happens are calls like `thread.join()` and `object.wait()`

```java
public class Task implements Runnable {

    @Override
    public void run() {
        try {
            Thread.sleep(20000);
        } catch (InterruptedException e) {
            System.out.println("[inside catch] i was interrupted...");
        }
    }
}
```
- else we can check the property `isInterrupted` and handle it accordingly

```java
import java.util.Date;

public class Task implements Runnable {

    @Override
    public void run() {
        Date date = new Date();
        while ((new Date()).getTime() - date.getTime() < 10000) {
            if (Thread.currentThread().isInterrupted()) {
                System.out.println("[inside loop] i was interrupted...");
                break;
            }
        }
    }
}
```
- background / daemon threads - there might be a case when what the thread does need not be handled gracefully, and it is just an overhead for us to check for e.g. the `isInterrupted` flag continually. so, we can set the daemon property of the thread to true. this way when the thread is interrupted, it will be terminated without us having to handle it

```java
Thread thread = new Thread(new Task());
thread.setDaemon(true);
thread.start();
thread.interrupt();
```
- also, unlike normal threads, where the application does not close if any thread is running, a daemon thread does not prevent the application from terminating
- if we implement `Callable` instead of `Runnable`, we can also throw an `InterruptedException` when for e.g. we see that `isInterrupted` evaluates to true. this means the parent thread calling this thread will know that it was interrupted in an adhoc manner
- threads execute independent of each other. but what if thread b depends on the results of thread a?
- busy wait - one way could be we run a loop in thread b to monitor the status of thread a (assume thread a sets a boolean to true). this means thread b is also using resources, which is not ideal
- so, we can instead call `threadA.join()` from thread b - thread b goes into waiting state till thread a completes. we should also consider calling join with a timeout, e.g. `threadA.join(t)`
- my understanding - if for e.g. the main thread runs the below - first, we start threads t1 and t2 in parallel of the main thread. now, we block the main thread by calling `t1.join()`. the main thread will be stopped till t1 completes

```java
t1.start();
t2.start();
t1.join();
t2.join();
```
- scenario 1 - t1 completes before t2, the main thread resumes, and again will be stopped till t2 completes
- scenario 2 - t1 completes after t2. the main thread resumes and will not wait for t2 since it has already completed
Thread Pooling
- thread pooling - reusing threads instead of recreating them every time
- tasks are added to a queue, and the threads pick them up as and when they become free
- so, when tasks are cpu intensive, we should have number of threads closer to core size, and when tasks are io intensive, we should have higher number of threads, but remember that -
- too many threads can cause performance issues as well due to context switching
- threads are not trivial to create, they are resource intensive
- java provides 4 kinds of thread pools - `FixedThreadPool`, `CachedThreadPool`, `ScheduledThreadPool` and `SingleThreadedExecutor`
- fixed thread pool executor - polls for tasks stored in a queue. there can be many tasks, but a set number of threads which get reused. the queue should be thread safe i.e. blocking
```java
int numberOfProcessors = Runtime.getRuntime().availableProcessors();
ExecutorService executorService = Executors.newFixedThreadPool(numberOfProcessors);

executorService.execute(new Runnable() {...});
```
- cached thread pool executor - it looks at its threads to see if any of them are free, and if it is able to find one, it will schedule this task on the free thread. else, it will spawn a new thread. too many threads is not too big of a problem, thanks to the keep alive timeout discussed later. however, expect out of memory exceptions if too many tasks are added to the executor, because threads are resource intensive
```java
ExecutorService executorService = Executors.newCachedThreadPool();
```
- to remember - threads occupy a lot of space in main memory, hence can cause out of memory exceptions if not controlled properly
- scheduled thread pool executor - it uses a delay queue, so that the tasks get picked up by the threads after the specified delay or schedule. this means tasks might have to be reordered, which is done by the queue itself. `schedule` can help trigger the task after a certain delay, `scheduleAtFixedRate` can help trigger it like a cron at regular intervals, while `scheduleWithFixedDelay` can help schedule the next task a fixed time period after the previous task was completed

```java
ScheduledExecutorService executorService = Executors.newScheduledThreadPool(5);

executorService.schedule(
    () -> System.out.println("hi from " + Thread.currentThread().getName()),
    5,
    TimeUnit.SECONDS
);
```
- single thread pool executor - like fixed thread pool executor with size of pool as one. the advantage is for e.g. all the tasks will be run in order of creation
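the in-order guarantee of the single thread pool executor can be seen in a small sketch (task labels are my own):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadOrderSketch {

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService = Executors.newSingleThreadExecutor();

        // tasks run one at a time, in submission order, on the same thread
        for (int i = 1; i <= 3; i++) {
            int taskId = i;
            executorService.execute(() -> System.out.println("task " + taskId));
        }

        executorService.shutdown();
        executorService.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```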
- all thread pool executors create new threads if the previous thread is killed for some reason
- there are a variety of parameters that can be added to the executors
- core pool size - minimum number of threads that are always kept in the pool
- max pool size - maximum number of threads that can be present in the thread pool. it defaults to `Integer.MAX_VALUE` for the cached and scheduled thread pool executors, while it equals the core pool size for the fixed and single thread pool executors
- keep alive timeout - the time till an idle thread is kept in the pool, after which it is removed. keep alive is only applicable to the cached and scheduled thread pool executors, since in the fixed and single thread pool executors, the number of threads does not change
    - note that the keep alive timeout does not remove core pool threads. this behavior can however be changed using `allowCoreThreadTimeOut`
- queue - the different types of executors use different queues based on their requirements. the queues also need to be thread safe
    - e.g. a fixed and single thread pool executor has a fixed number of threads, so there can potentially be an infinite number of tasks that get queued up, because of which it uses a `LinkedBlockingQueue`
    - a cached thread pool spawns a number of threads equal to the number of tasks, so it uses a `SynchronousQueue`, which only needs to hold one task
    - a scheduled thread pool uses a `DelayedWorkQueue` so that the tasks are returned from the queue only if the condition of cron etc. is met
- rejection handler - assume all threads are occupied and the queue is full. in this case, the thread pool will reject the task that it gets. how it rejects the task is determined using the rejection policy. the different rejection policies are -
    - abort - submitting the new task throws a `RejectedExecutionException`, which is a runtime exception
    - discard - silently discard the incoming task
    - discard oldest - discard the oldest task from the queue to add this new task to the queue
    - caller runs - requests the caller thread itself to run this task
- till now, to obtain an instance of `ExecutorService`, we were using static methods on `Executors`. we can also use `new ThreadPoolExecutor()` and then pass our own core pool size, queue, etc. configuration parameters as the constructor arguments
- we need to shut down the executor in a clean way. we can initiate it using `executorService.shutdown()`. this will throw a `RejectedExecutionException` for any new tasks that are submitted to it, but at the same time will complete both all the currently executing tasks and the queued up tasks
- if we run `shutdownNow`, it will return a `List<Runnable>` of the queued up tasks and clear the queue, but complete all the currently executing tasks
- `awaitTermination(timeout)` blocks until all tasks have completed after a shutdown request, or until the timeout elapses, whichever comes first
- we also have helper methods like `isShutdown()` and `isTerminated()`
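a minimal sketch combining the two ideas above - constructing the executor ourselves and shutting it down cleanly. the parameter values (core size 2, max 4, 30s keep alive, queue capacity 10) are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CustomPoolSketch {

    public static void main(String[] args) throws InterruptedException {
        // core pool size 2, max pool size 4, 30s keep alive, a bounded queue of 10, abort on rejection
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 4,
                30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(10),
                new ThreadPoolExecutor.AbortPolicy());

        executor.execute(() -> System.out.println("task done"));

        executor.shutdown(); // no new tasks accepted, existing ones run to completion
        boolean finished = executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("finished = " + finished);
    }
}
```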
- if a task wants to return a value, we use `Callable` instead of `Runnable`
- however, the `execute` method on `ExecutorService` only works if we implement the `Runnable` interface. if we implement the `Callable` interface, we have to use `submit`
- the return value of a `Callable` is wrapped in a `Future`. `future.get()` is a blocking call i.e. the thread calling it will not move ahead until the future resolves. so, we can also use `future.get(timeout)`

```java
ExecutorService executorService = Executors.newFixedThreadPool(1);

Future<Integer> result = executorService.submit(() -> {
    Thread.sleep(4000);
    return (new Random()).nextInt();
});

Thread.sleep(3000); // this simulates that we were able to perform 3 seconds worth of operations
                    // in the main thread while the task thread was performing its blocking stuff

System.out.println("result = " + result.get());
```
- we can cancel the task using `future.cancel(false)`. this means that the thread pool will remove the task from the queue. the false means that if a thread is already running the task, it will not do anything. had we passed true, it would have tried to interrupt the task
- we also have helper methods like `future.isDone()` and `future.isCancelled()`
- suppose we have a list of items, and for each item (e.g. an order), we want to perform a series of processing steps

```java
Future<Package> package$ = executorService.submit(() -> pack(order));
Future<Delivery> delivery$ = executorService.submit(() -> deliver(package$.get()));
Future<Email> email$ = executorService.submit(() -> sendEmail(delivery$.get()));
```

notice how the calling thread is blocked by all the `get` calls on the futures. instead, we could use -

```java
CompletableFuture.supplyAsync(() -> pack(order))
    .thenApply(pkg -> deliver(pkg))
    .thenApply(delivery -> sendEmail(delivery))
    // ...
```
- in the above case, we have specified a series of steps to run one after another and since we do not care about the results in our main thread, the assigning of tasks to threads is managed by java itself. the main thread is not paused by the get calls. notice how we also do not need to specify any executor
- if we use `thenApplyAsync` instead of `thenApply`, a different thread can be used to execute the next operation instead of the previous one
- internally, `CompletableFuture` uses the fork join pool, but we can specify a custom executor as well, e.g. `thenApplyAsync(fn, executor)`
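a runnable sketch of the pipeline above with a custom executor - the stage names ("packed", "delivered", "emailed") are stand-ins for the real processing steps:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletableFutureSketch {

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);

        // supplyAsync starts the pipeline; each thenApplyAsync stage may run on a
        // different thread of the supplied executor
        String result = CompletableFuture
                .supplyAsync(() -> "packed", executor)
                .thenApplyAsync(s -> s + " -> delivered", executor)
                .thenApplyAsync(s -> s + " -> emailed", executor)
                .join(); // blocks only here, once, for the final result

        System.out.println(result);
        executor.shutdown();
    }
}
```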
Race Condition
- race condition - happens when a resource is shared and mutated across multiple threads
```java
public class SharedResourceProblem {

    public static void main(String[] args) throws Exception {
        Integer count = 10000000;
        Counter counter = new Counter();

        Thread a = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                counter.increment();
            }
        });

        Thread b = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                counter.decrement();
            }
        });

        a.start();
        b.start();
        a.join();
        b.join();

        System.out.println("shared resource value = " + counter.getCount()); // e.g. shared resource value = 15
    }
}

class Counter {

    private int count = 0;

    public void increment() {
        count += 1;
    }

    public void decrement() {
        count -= 1;
    }

    public int getCount() {
        return count;
    }
}
```
- the `count += 1` and `count -= 1` operations are not atomic - each comprises three individual operations -
    - getting the original value
    - incrementing / decrementing it by one
    - setting the new value
- solutions - identify critical sections and use locks, make operations atomic, etc
Synchronized
- we can wrap our code blocks with a critical section, which makes them atomic. this way, only one thread can access that block of code at a time, and any other thread trying to access it during this will be suspended till the critical section is freed
- say we use `synchronized` on multiple methods of a class - once a thread invokes one of the synchronized methods of this class, no other thread can invoke any other synchronized method of this class. this is because synchronized on an instance method locks on the instance (object) of the class
- the object referred to above is called a monitor. only one thread can acquire a monitor at a time
- method one - prefix method signature with synchronized (refer the counter example earlier. the shared resource print would now print 0)
```java
public synchronized void increment() {
    // ...
}
```
- another method is to use synchronized blocks
```java
synchronized (object) {
    // ...
}
```
- using blocks, the code is much more flexible since we can have different critical sections locked on different monitors
- if using synchronized on methods, two different methods of the same class cannot be executed in parallel - the monitor there is the instance itself
- however, when using synchronized blocks, we can do as follows inside different methods of the same class -
```java
Object lock1 = new Object();
Object lock2 = new Object();

// ...

synchronized (lock1) {
    // ...
}

synchronized (lock2) {
    // ...
}
```
- note - reduce critical section size for better performance
Atomic Operations
- so, assignments to references and primitive values in java are atomic
    - `this.name = name` inside for e.g. a constructor is atomic
    - `int a = 8` is atomic
- however, an exception to this is assignment to longs and doubles. since they are 64 bit, the assignment happens in 2 operations - one write for the lower 32 bits and another for the upper 32 bits
- the solution is to declare them with volatile, e.g. `volatile double a = 1.2` - using volatile makes assignments to longs and doubles atomic
- also, java has a lot of atomic classes under `java.util.concurrent.atomic` as well
- remember - when we use volatile, we make the assignment atomic, not compound operations like `a++`
- my doubt, probably cleared - then what is the use of for e.g. `AtomicReference`, if assignment to a reference is already an atomic operation? we can do as follows (a metric example is discussed later)

```java
AtomicReference<State> state$ = new AtomicReference<>();
state$.set(initialValue);

State currentState = state$.get();
State newState = computeNewState();
boolean isUpdateSuccess = state$.compareAndSet(currentState, newState);
```
Data Race
- remember - race condition and data race are two different problems
- data race - when the order of operations on variables do not match the sequential code we write. this happens mostly because there are optimizations like prefetching, vectorization, rearranging of instructions, etc
```java
class Pair {

    private int a = 0;
    private int b = 0;

    public void increment() {
        a++;
        b++;
    }

    public void check() {
        if (b > a) {
            System.out.println("well that doesn't seem right...");
        }
    }
}
```
calling the class -
```java
Pair pair = new Pair();

Thread t1 = new Thread(() -> { while (true) pair.increment(); });
Thread t2 = new Thread(() -> { while (true) pair.check(); });

t1.start();
t2.start();
t1.join();
t2.join();
```
- our expectation is that since b is read before a and a is incremented before b, there is no way even with a race condition that b can be bigger than a. however, due to data race, we do hit the print statement
- data race is also where we can use `volatile`. volatile guarantees the order of instructions being executed

```java
private volatile int a = 0;
private volatile int b = 0;
```
- this is called the visibility problem
    - basically, the two threads have their own local cache, but also share a common cache. a thread writes the value to its local cache, but this does not
        - either update the shared cache
        - or refresh the second thread's local cache from the shared cache
    - however, when we use volatile, reads and writes refresh / synchronize both the shared cache and the local cache of all threads
- basically, code before access to a volatile variable gets executed before it, and code after the access to a volatile variable after. this is called the happens before relationship
- while we could have just used synchronized for both the methods above, realize the advantage of using volatile over synchronized. with synchronization, we lose out on the multithreading, since our functions would have been invoked one at a time. in this case, the two methods are still being invoked concurrently
- if we have n cores, each core has its own registers and an associated l1 cache. an l2 cache can be shared across multiple cores, and finally we have only one l3 cache and the ram
- java memory model - it is an enforcement that jvm implementations have to follow so that java programs have similar behavior everywhere, and the different optimizations of instructions, cache, etc. do not affect the functioning of the program
Locking Strategies and Deadlocks
- coarse-grained locking - we use one lock for everything, e.g. making all methods of the class synchronized. simple, but not performant. its counterpart is fine-grained locking, with a separate lock per resource
- cons of fine-grained locking - we can run into deadlocks more often
- conditions for a deadlock -
- mutual exclusion - only one thread can hold the resource at a time
- hold and wait - the thread acquires the resource and is waiting for another resource to be freed up
- non-preemptive - the resource is released only when the thread is done using it and another thread cannot acquire it forcefully
- circular wait - a cyclic dependency is formed where threads wait for resources acquired by each other
- one way to prevent deadlocks is to acquire locks in our code in the same order. this need not be considered when releasing the locks
- another way can be to use techniques like `tryLock`, `lockInterruptibly`, etc (discussed later)
- reentrant lock - instead of having a synchronized block, we use this reentrant lock

```java
Lock lock = new ReentrantLock();
```
- unlike synchronized where the block signals the start and end of the critical section, locking and unlocking happens explicitly in case of reentrant locks
- to avoid deadlocks caused by for e.g. the method throwing exceptions, we should use it in the following way -
```java
lock.lock();
try {
    // critical section
} finally {
    lock.unlock();
}
```
- it provides a lot of methods for more advanced use cases like `getOwner`, `getQueuedThreads`, `isHeldByCurrentThread`, `isLocked`, etc
- the name `Reentrant` comes from the fact that the lock can be acquired by the same thread multiple times, which means it would have to free it multiple times as well, e.g. think about recursive calls. we can get the number of times it was acquired using `getHoldCount`
- another benefit of using reentrant locks is fairness - e.g. what if a thread repeatedly acquires the lock, leading to the starving of other threads? we can prevent this by instantiating it using `new ReentrantLock(true)`
- note that introducing fairness also has some overhead associated with it, thus impacting performance
- if we do not set to true, what we get is a barge in lock i.e. suppose there are three threads waiting for the lock in a queue. when the thread originally with the lock releases it, if a new thread not in the queue comes up to acquire the lock, it gets the lock and the threads in the queue continue to stay there. however, if we had set the fairness to true, the thread with the longest waiting time gets it first
- so, two problems - starvation ("fairness") and the "barge in" behavior - are solved by the fair variant of the reentrant lock
- if the lock is not available, the thread of course goes into the suspended state till it is able to acquire the lock
- we can use `lockInterruptibly` - this way, another thread can for e.g. call `this_thread.interrupt()`, and an interrupted exception is thrown. this "unblocks" the thread to help it proceed further. had we just used `lock`, the wait would have been indefinite

```java
try {
    lock.lockInterruptibly();
} catch (InterruptedException e) {
    // cleanup and exit
}
```
- similar to above, we also have the `tryLock` method, which returns a boolean that indicates whether the lock was successfully acquired. it also accepts a timeout as a parameter, which is self-explanatory
- this can help, for e.g. in realtime applications, to provide feedback continuously without pausing the application entirely

```java
while (true) {
    if (lock.tryLock()) {
        try {
            // critical section
        } finally {
            lock.unlock();
        }
    } else {
        // some logic
    }
    // some logic
}
```
- so, we saw how the reentrant lock, while it works like the synchronized keyword, has additional capabilities like telling the current owner and locking using different strategies like `lockInterruptibly` and `tryLock`
- when locking till now, we used mutual exclusion to its fullest. but, we can be a bit more flexible when the shared resource is just being read from and not written to
- multiple readers can access a resource concurrently but multiple writers or one writer with multiple readers cannot
- this is why we have `ReentrantReadWriteLock`

```java
ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
Lock readLock = lock.readLock();
Lock writeLock = lock.writeLock();
```
- fairness in `ReentrantReadWriteLock` works the same way as in `ReentrantLock`, except that if the thread waiting for the longest time was a reader, all reader threads in the queue are freed up to read
- of course, base decisions off the type of workload - if the workload is read intensive, the read write lock is better, otherwise we might be better off using the normal reentrant lock itself
Inter Thread Communication
- semaphore - it helps restrict the number of concurrent users of a resource
- remember - locks only allow one user per resource, but semaphores allow multiple users to acquire a resource
- so, we can think of a lock as a semaphore with a single permit
```java
Semaphore semaphore = new Semaphore(numberOfPermits);
```
- when we call `semaphore.acquire()`, we acquire a permit, and the number of available permits reduces by one. if no permits are available at the moment, the thread is blocked till a permit is released
- similarly, we have `semaphore.release()`
- optionally, both `acquire` and `release` accept n, the number of permits, as an argument, which can help acquire / release more than one permit at a time
- another major difference from locks - there is no notion of an owning thread in semaphores, unlike in locks - e.g. a semaphore acquired by thread a can be released by thread b. so, thread a can acquire it again without having ever released it
- this reason also makes semaphores a great choice for producer consumer problems. producer consumer problem using semaphores -
    - we need a lock so that multiple threads cannot touch the queue at one go
    - we start with the full semaphore being empty and the empty semaphore being full, since there are no items initially
    - look how we use the semaphore's philosophy to our advantage - consumer threads acquire the full semaphore while producer threads release it
    - my understanding of why we need two semaphores - e.g. if we only had the full semaphore - the producer releases it and the consumer acquires it - how would we have "stopped" the producer from producing when the rate of production > rate of consumption? it is almost like the two semaphores help with back pressure as well

```java
int CAPACITY = 50;
Semaphore empty = new Semaphore(CAPACITY);
Semaphore full = new Semaphore(0);
Queue<Item> queue = new ArrayDeque<>(CAPACITY);
Lock lock = new ReentrantLock();
```
- producer code -
```java
while (true) {
    empty.acquire();
    Item item = produce();
    lock.lock();
    try {
        queue.add(item);
    } finally {
        lock.unlock();
    }
    full.release();
}
```
- consumer code -
```java
while (true) {
    full.acquire();
    Item item;
    lock.lock();
    try {
        item = queue.poll();
    } finally {
        lock.unlock();
    }
    consume(item);
    empty.release();
}
```
- some different inter thread communication techniques we saw till now -
    - calling `interrupt` from one thread on another thread. this is then further used in techniques like `lockInterruptibly`
    - calling `join` for a thread to wait for another thread to complete its job
    - using `acquire` and `release` on a semaphore
- conditions flow -
- one thread checks a condition, and goes to sleep if the condition is not met
- a second thread can “mutate the state” and signal the first thread to check its condition again
- if the condition is met, the thread proceeds, else it can go back to sleep
- note - conditions come with a lock, so that the “state” being modified can be wrapped with a critical section
- note - when we call `await` on the condition, it also releases the lock before going to sleep, so that the second thread described in the flow above can acquire the lock to mutate the state. so, even though the thread which was waiting gets signaled to wake up, it also needs to be able to acquire the lock again, i.e. the other threads modifying state need to release the lock
- placing the condition inside a while loop helps so that even if signalled, the thread will start waiting again if the condition is not met yet
- first thread -
```java
ReentrantLock lock = new ReentrantLock();
Condition condition = lock.newCondition();

lock.lock();
try {
    while (condition x is not met) {
        condition.await();
    }
} finally {
    lock.unlock();
}
```
- second thread -
```java
lock.lock();
try {
    // modify variables used in condition x...
    condition.signal(); // despite signalling, thread one does not wake up until we unlock the lock
} finally {
    lock.unlock();
}
```
- conditions also have advanced methods like -
    - `await(timeout)` - just like locks have timeouts to prevent indefinite waiting
    - `signalAll` - using `signal`, only one of all the threads waiting on the condition wakes up, while `signalAll` wakes all of them up
- the class Object, and therefore all objects, have the methods `wait`, `notify` and `notifyAll`
- therefore, without using any special classes, we can -
    - simulate conditions using `wait`, `notify` and `notifyAll`
    - simulate locks using `synchronized`
- note - recall how when using conditions we were wrapping it via locks. we need to do the same thing here i.e. wrap using synchronized block in order to be able to call notify
- first thread -
```java
synchronized (this) {
    while (condition x is not met) {
        wait();
    }
}
```
- second thread - my understanding - this needs to happen on the same object, e.g. inside a different method -

```java
synchronized (this) {
    // modify variables used in condition x...
    notify();
}
```
- when we call `wait` on an object, the calling thread continues to be in waiting state until another thread calls `notify` on that object
- `notify` will wake up one random thread that was sleeping; to wake up all threads we can use `notifyAll`
- if we think about it, `lock.lock()` and `lock.unlock()` are like the start and end of a `synchronized` block respectively, `condition.await()` is like `wait()` and `condition.signal()` is like `notify()`
- introducing locks can make our code more error-prone and more subject to deadlocks if coded improperly. however, it makes the code more flexible, e.g. unlike synchronized blocks which have to exist within a single method, locks can be acquired and freed from different methods
- our main objective is to execute instructions as a single hardware operation
- we can achieve this by using Atomic classes provided by java
```java
AtomicInteger count = new AtomicInteger(initialValue);
count.incrementAndGet();
```
- recall how we had discussed that `a = a + 1` actually consists of three individual operations, which have all been condensed down into one atomic operation by these java helper classes
- so, recall the counter example in the shared resource section earlier, and how we had solved it using synchronized. we can now get rid of the `synchronized` and implement it as follows -

```java
public void increment() {
    count.incrementAndGet();
}
```
- the disadvantage of using these classes is of course that only each operation by itself is atomic, a series of such calls together is not atomic, so it may be good only for simpler use cases
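the "series of calls is not atomic" caveat can be seen in a small sketch - a hypothetical transfer between two counters, where each call is atomic but the pair is not:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCompositeSketch {

    public static void main(String[] args) {
        AtomicInteger from = new AtomicInteger(10);
        AtomicInteger to = new AtomicInteger(0);

        // each call below is atomic on its own, but the "transfer" as a whole is not -
        // another thread could observe the state between the two calls
        from.addAndGet(-5);
        to.addAndGet(5);

        System.out.println(from.get() + " " + to.get()); // 5 5
    }
}
```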
- a lot of operations use `compareAndSet` underneath, and we have access to it too. it sets the value to the new value if the current value matches the expected value; otherwise, the old value is retained. it also returns a boolean which is true if the update happened

```java
count.compareAndSet(expectedValue, newValue);
```
- `AtomicReference` can be used for any object type to get and set values in a thread safe i.e. atomic way, and we can use methods like `compareAndSet` on it
- e.g. notice how below, the synchronized keyword is not used for `addSample`, but we still have a thread safe implementation by using `compareAndSet`. note how and why we use a loop - if the old value stays the same before and after calculating the new value, the update succeeds; otherwise, we recalculate based on the "new old value"

```java
class Metric {
    int count = 0;
    int sum = 0;
}

class MetricAtomic {

    AtomicReference<Metric> metric$ = new AtomicReference<>(new Metric());

    public void addSample(int sample) {
        Metric currentMetric;
        Metric newMetric;

        do {
            currentMetric = metric$.get();
            newMetric = new Metric();
            newMetric.count = currentMetric.count + 1;
            newMetric.sum = currentMetric.sum + sample;
        } while (!metric$.compareAndSet(currentMetric, newMetric));
    }
}
```
- we often have a lot of tasks but not so many threads. some objects are not thread safe i.e. cannot be used by multiple threads. however, they can be used by multiple tasks being executed on the same thread. coding this ourselves can be tough, which is why we have `ThreadLocal`, which basically returns a new instance for every thread, and reuses that instance when a thread asks for it again

```java
public static ThreadLocal<Car> car = ThreadLocal.withInitial(() -> new Car());
```
- spring uses this concept via `ContextHolder`s, for instance `RequestContextHolder`, `TransactionContextHolder`, `SecurityContextHolder`, etc. my understanding - since spring follows a one thread per request model, any of the services, classes, etc. that need access to request information can get it easily this way. it is like setting and sharing state for a request
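a minimal sketch of the per-thread isolation described above - the `SimpleDateFormat` use case is a classic one, since `SimpleDateFormat` is not thread safe, while repeated `get()` calls from the same thread return the same instance -

```java
import java.text.SimpleDateFormat;

class ThreadLocalDemo {
    // each thread gets its own SimpleDateFormat via the initializer;
    // within a thread, every get() returns that same instance
    static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            SimpleDateFormat first = FORMAT.get();
            SimpleDateFormat second = FORMAT.get();
            System.out.println(Thread.currentThread().getName()
                    + " same instance: " + (first == second)); // true for each thread
        };

        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```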
High Performance IO
- what is blocking io - io during which the cpu is idle, e.g. when reading from a database etc
- such io bound tasks block the thread till they return the result
- io bound tasks are very common in web applications etc
- how it works internally -
- the controllers like network cards return the response to the dma (direct memory access)
- the dma writes it to the memory
- the dma notifies the cpu that the response is available
- the cpu can now access the memory for variables
- so, during this entire duration, the thread that was processing the request that involved the io task (and thus reaching out to the controller) was sitting idle and thus was blocked
- this is why number of threads = number of cores does not give us the best performance when we have more io bound instead of cpu intensive tasks
- this is why we have a “thread per request model” in spring mvc, which i believe caps at 200 threads to prevent out of memory errors etc
- it has caveats like -
- creating and managing threads are expensive - recall how it has its own stack etc
- number of context switching increases, which too is an expensive operation - recall thrashing
- assume that there are two kinds of calls a web server supports - one that makes a call to an external service and one that calls the database. assume the external service has a performance bug, which makes the first call very slow. this way, if we had for e.g. 150 requests for first call and 150 for the second call (assume 200 is the default thread pool size in embedded tomcat), the 150 instances of the second call would start to be affected because of the 150 instances of the first call now
- so, the newer model used by for e.g. spring web flux is asynchronous and non blocking
- the thread is no longer blocked waiting for the response - a callback is provided which is called once the request is resolved
- so now, we can go back to the thread per core model - which is much more optimal
- there can be problems like callback hell etc, which is solved by using libraries like project reactor for reactive style of programming, which is more declarative to write
Virtual Threads
- till now, the `Thread` class we saw was actually a wrapper around an actual os thread - these are also called platform threads, since they map one to one with os threads
- virtual threads - they are not directly related to os threads. they are managed by the jvm itself
- this makes them much less resource intensive
- the jvm manages a pool of platform threads, and schedules the virtual threads on these platform threads one by one
- once a virtual thread is mounted on a platform thread, that platform thread is called its carrier thread
- if a virtual thread cannot progress, it is unmounted from the platform thread, and the platform thread picks up a new virtual thread
- this way, the number of platform threads stays small and is influenced by the number of cores
- just like in reactive programming, there is no os level context switching overhead - what we are saving on here - frequent context switching of platform (hence os) threads is replaced by much cheaper virtual thread context switching inside the jvm
- creation techniques -
```java
Runnable runnable = () -> System.out.println("from thread: " + Thread.currentThread());

new Thread(runnable).start(); // platform thread (implicit)
// from thread: Thread[#19,Thread-0,5,main]

Thread.ofPlatform().unstarted(runnable).start(); // platform thread (explicit)
// from thread: Thread[#20,Thread-1,5,main]

Thread.ofVirtual().unstarted(runnable).start(); // virtual thread
// from thread: VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1
```
- note - virtual threads are only useful when we have blocking io calls, not when we have cpu intensive operations
- this happens because unlike the usual model where our thread had to sit idle for the blocking call, the platform thread never stops here and is always working, it is the virtual thread that is sitting idle, and hence we optimize our cpu usage because we are using our platform threads optimally
- so, developers still write the usual blocking code, which simplifies coding, as compared to say reactive programming
- underneath, the blocking calls have been refactored for us to make use of virtual threads so that the platform threads are not sitting idle
- e.g. the replacement for cached thread pools is the new virtual thread per task executor - we do not have to create pools of fixed size - we use a thread per task model, and all the complexity is now managed by the jvm for us behind the scenes
- when we are using normal threads for blocking calls e.g. using jpa, the thread cannot be used. what we can do is use context switching to utilize the cpu better. however, this model meant we needed a lot of platform threads, and managing them, context switching between them, etc has a lot of overhead, which is why maybe embedded tomcat for instance had a cap of about 200 threads. now with virtual threads, there is no cap needed, so it can be used via cached thread pool executor equivalent, but here there would never be any out of memory etc issues like in cached thread pool executor, since virtual threads are very lightweight
- some notes -
- virtual threads are always daemon, and making them non daemon will throw an exception
- virtual threads do not have a concept of priority
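both notes can be verified directly - a quick sketch (requires jdk 21+; making a virtual thread non daemon throws `IllegalArgumentException`) -

```java
class VirtualThreadNotes {
    public static void main(String[] args) {
        Thread vt = Thread.ofVirtual().unstarted(() -> {});

        System.out.println(vt.isDaemon()); // true - virtual threads are always daemon

        try {
            vt.setDaemon(false); // not allowed for virtual threads
        } catch (IllegalArgumentException e) {
            System.out.println("cannot make a virtual thread non daemon");
        }
    }
}
```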
Miscellaneous Notes
- io bound threads are prioritized more than computation based threads
- since most of the time of io bound threads is spent in waiting state
- and most of the time of cpu bound threads is spent in computation
- maybe this is related to concepts of starvation etc somehow
- why context switching is expensive - the entire state of the thread has to be saved in memory - all the stack, instruction pointer, local variables inside the method, etc
- thread yield - helps hint to the scheduler that the current thread wants to give up its processor
- can be used by computationally expensive threads to hint the scheduler that they want to give up the processor for another thread
- the priority we set manually only serves as a hint - the os can choose to accept / ignore it
- `thread.start()` is not the same as `thread.run()` - `thread.run()` simply runs the runnable we pass to it inside the calling thread

```java
public static void main(String[] args) {
    Thread thread = new Thread(() -> System.out.println("Hello from " + Thread.currentThread().getName()));
    thread.setName("New Thread");
    thread.run(); // Hello from main
}
```
- Thread.State - an enum, with the following states -
- NEW - created but not yet started
- RUNNABLE - thread is available for execution / already executing on some processor
- BLOCKED - blocked for a monitor / lock
- WAITING - a thread goes into this state after we call `object.wait()` or `some_thread.join()`
    - so, the idea is that the thread now waits for some other thread's action
    - a thread can also go "out" of this state after we call `object.notify()` from elsewhere to wake this thread up
- TIMED_WAITING - same as above but with timeouts. threads on calling `Thread.sleep` also go to this state
- TERMINATED - after the thread has finished execution
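the transitions above can be observed directly - a small sketch (the TIMED_WAITING observation is timing dependent, hence the short sleeps) -

```java
class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread thread = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        System.out.println(thread.getState()); // NEW - created but not started

        thread.start();
        Thread.sleep(50); // give the thread time to reach the sleep call
        System.out.println(thread.getState()); // TIMED_WAITING (timing dependent)

        thread.join();
        System.out.println(thread.getState()); // TERMINATED
    }
}
```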
- when we override the `start` method, we need to call `super.start()`
- when we say `Thread.sleep(x)`, first the thread goes into TIMED_WAITING state. after the time elapses and it goes back into RUNNABLE state, there is no guarantee that the thread will immediately be scheduled on a core - a core might be occupied by some other thread
- an `IllegalMonitorStateException` is thrown if we try to call `await` / `signal` on a `Condition` without locking the `lock` first
Generics
- what is generics -
- helps extend java’s type system - types now start acting like parameters that we as clients can provide
- to allow a type or method to operate on objects of various types to thus allow reusability. e.g. without generics, we would use overloading, which causes a lot of duplication of logic -
```java
class OverloadingProblem {
    public static Double add(Double a, Double b) {
        return a + b;
    }

    public static Integer add(Integer a, Integer b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(1, 5));
        System.out.println(add(1.2, 5.3));
    }
}
```
- while providing compile time safety - e.g. without using generics, we would use type casting, where an invalid cast only fails at runtime, and catching errors at compile time > catching them at runtime -
```java
class TypeCastingProblem {
    private static Object item = null;

    public static void setItem(Object item) {
        TypeCastingProblem.item = item;
    }

    public static Object getItem() {
        return item;
    }

    public static void main(String[] args) {
        setItem(1.4);
        Integer item = (Integer) getItem(); // compiles, but throws ClassCastException at runtime
    }
}
```
- we use the diamond operator for generics
```java
class Pair<K, V> {
    private K key;
    private V value;

    public Pair(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public String toString() {
        return "{ " + key + ": " + value + " }";
    }
}

class GenericExample {
    public static void main(String[] args) {
        Pair<String, Integer> score = new Pair<>("maths", 85);
        System.out.println(score);
    }
}
```
- generic method - my understanding - this is useful when the class itself is not generic / maybe method and class generics do not mean the same thing, so we can for e.g. use `T` for the class and `V` for the method

```java
class GenericMethod {
    public static <T> void printer(T arg) {
        System.out.println("value is: " + arg);
    }

    public static void main(String[] args) {
        printer(1);
        printer("hello");
    }
}
```
- while the return type above is void, we could have for e.g. returned `T` etc as well
- bounded generic types - bound the types that are allowed to be used, to get access to the additional functionality present in the types used in these bounds, e.g. only allow `Number` and its subclasses to be used for a generic class containing mathematical utilities
- we use the `extends` keyword to achieve bounded generic types, and the target type should be a subclass of the interface / class mentioned in this clause

```java
public static <T extends Comparable<T>> T calculateMin(T a, T b) {
    return (a.compareTo(b) < 0) ? a : b;
}
```
- e.g. `copy` is implemented as follows - my understanding - this is to help make use of dynamic polymorphism + bounded types. note - just because we can do `superclass_reference = subclass_reference`, does not mean we can do `List<superclass_reference> = List<subclass_reference>`

```java
public static <T> void copy(List<? super T> dest, List<? extends T> src)
```
- we can also specify multiple bounds using `&`
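a minimal sketch of multiple bounds - the hypothetical `min` below requires its argument type to satisfy both bounds (a class bound, if present, must come first in the list) -

```java
import java.io.Serializable;

class MultipleBounds {
    // T must be Comparable to itself as well as Serializable
    public static <T extends Comparable<T> & Serializable> T min(T a, T b) {
        return (a.compareTo(b) < 0) ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(min(3, 7));     // 3 - Integer satisfies both bounds
        System.out.println(min("b", "a")); // a - so does String
    }
}
```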
- type inference - determine the types automatically. some examples -
    - java can automatically guess "the most specific type" that both `String` and `ArrayList` can work with - `Serializable`

```java
import java.io.Serializable;
import java.util.ArrayList;

class TypeInference {
    public static <T> T getFirst(T a, T b) {
        return a;
    }

    public static void main(String[] args) {
        Serializable result = getFirst("hello world", new ArrayList<Integer>());
    }
}
```
    - we use `List<String> list = new ArrayList<>();` and not `new ArrayList<String>()`
    - we use `list.add("name")` and not `list.<String>add("name")`
- note - just because `Number` and `Integer` are related via inheritance, it does not mean `List<Number>` and `List<Integer>` are somehow related as well - this is the motivation behind wildcards
```java
import java.util.List;

class Wildcards {
    private static void print(List<Object> list) {
        list.forEach(System.out::println);
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 3);
        print(list);
    }
}

// error: incompatible types: List<Integer> cannot be converted to List<Object>
// solution - notice the use of ?
// private static void print(List<?> list) { ...
```
- upper bounded wildcards - when we use `?` and `extends`, e.g. allow all lists where the type of element is a subclass of the class specified in the generic method signature
- drawback - e.g. while we can print all elements of the list easily, we cannot add an element to the list - e.g. the list is actually of integer, and we might be trying to add a double to it. since java cannot rule this problem out, it gives a compile time error
- e.g. this works perfectly
```java
private static void printList(List<? extends Number> numbers) {
    numbers.forEach(System.out::println);
}

public static void main(String[] args) {
    printList(List.of(1, 2, 3));
    printList(List.of(1.1, 2.2, 3.3));
}
```
- however, if we add the below to the printList method -
```java
private static void printList(List<? extends Number> numbers) {
    numbers.forEach(System.out::println);
    numbers.add(7);
}
```
- we get the error below -
```
BoundedWildCardsExtends.java:7: error: incompatible types: int cannot be converted to CAP#1
        numbers.add(7);
                    ^
  where CAP#1 is a fresh type-variable:
    CAP#1 extends Number from capture of ? extends Number
```
- lower bounded wildcards - when we use `?` and `super`, e.g. allow all lists where the type of element is a superclass of the class specified in the generic method signature
- so now, since java knows that the list passed in has elements of a supertype of the specified type, we can add elements of that type to the list (dynamic polymorphism)
- drawback - we cannot read from the list - we have to treat the elements as type `Object`

```java
public static void addToList(List<? super Double> list) {
    list.add(1.4);
}

public static void main(String[] args) {
    List<Object> list = new ArrayList<>();
    list.add(1);
    list.add("shameek");
    addToList(list);
    System.out.println(list);
}
```
- use case of wildcards + bounded types - copy elements from one list to another -
```java
public <T> void copy(List<? extends T> source, List<? super T> destination) {
    source.forEach(destination::add);
}
```
- so, we should -
- use “lower bounded wildcards” when we want to perform some kind of mutation
- use “upper bounded wildcards” when we want to read values
- use “type parameters” when we want to do both reading and writing
- one difference between "type parameters" and "wildcards" is that type parameters allow for multiple bounds unlike wildcards, e.g. the following is valid - `<T extends XYZ & PQR>`
- rule of thumb? - use wildcards when possible, when not possible (e.g. we want to influence return type based on arguments), then use type parameters
- type erasure - java replaces all generic types we define with either Object, or the bound if a bound is specified
- as a part of this, java might introduce type casts etc as well
- e.g. the code below -
```java
List<Integer> list = new ArrayList<>();
list.add(1);
Integer ele = list.get(0);

class Store<T extends Serializable> {
    T item;
}
```
- is converted to this code due to type erasure -
```java
List list = new ArrayList();
list.add(1);
Integer ele = (Integer) list.get(0);

class Store {
    Serializable item;
}
```
Collections
List
- `ArrayList` allows for control over ordering of elements
- all items are identified by an index
- items are located right next to each other in ram, thus making random access via index o(1)
- searching for items based on value is however o(n)
- adding items at the end is o(1)
- adding items at random positions is o(n), since it requires shifting of items by one position
- same logic is applicable for removal of items - o(1) for removing items from the end and o(n) for removing items from arbitrary positions
- size of array lists in java can change dynamically - once the allocated memory runs out, a new backing array with double the capacity is provisioned, and all the items from the current array are copied over to it
- however, this ability to resize dynamically comes at a price - it takes o(n) time for this resize + copying over of items to the new location to happen
- however, when instantiating, we can provide the initial capacity, so that this resizing does not have to happen often
- disadvantage of array lists - when removing / adding items at random positions, a lot of shifting is needed to maintain the contiguous nature
- this problem is not there when using `LinkedList`
- since in linked lists, there is only a pointer to the next element that needs to be maintained
- disadvantage - linked lists don’t allow random access with given index at o(1) time
- note - linked list in java is optimized -
- implemented as a doubly linked list, which allows traversal in both directions
- maintains pointers to both head and tail - e.g. we can use both `addFirst` and `addLast` in o(1) time
- linked list vs array list performance for adding elements at the beginning -
```java
import java.util.List;
import java.util.ArrayList;
import java.util.LinkedList;

class ListPerformance {
    public static void main(String[] args) {
        perform("linked list", new LinkedList<>());
        perform("array list", new ArrayList<>());
    }

    private static void perform(String type, List<Integer> list) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 500000; i++) {
            list.add(0, i);
        }
        long end = System.currentTimeMillis();
        System.out.println("time taken by " + type + ": " + (end - start) + "ms");
    }
}
```
- output -
```
time taken by linked list: 75ms
time taken by array list: 43375ms
```
- note - while we compared linked list to array lists above, as discussed later, if removing or adding to one of the ends, the most performant option we have is array deque, not stacks, not linked lists
- vector - synchronized implementation of array list i.e. all operations like add etc will do acquiring and releasing of lock
- generally, doing this using our own locks might be better, since we get more flexibility, e.g. batch multiple operations under one acquiring + releasing of lock
- stack - lifo structure (last in first out)
- important operations include `push`, `pop` and `peek`
- note - stacks use vectors underneath, so they are inherently synchronized
```java
Stack<String> stack = new Stack<>();
stack.push("jane");
stack.push("jackson");
System.out.println(stack); // [jane, jackson]
System.out.println(stack.pop()); // jackson
```
- to avoid using synchronized version, we can use array dequeue instead
Queues
- queues - fifo structure (first in first out)
- important operations include `add` (enqueue), `remove` (dequeue) and `peek` (retrieve but not remove the head of the queue)
- queues are abstract just like stacks - below, the queue is implemented using a linked list
```java
Queue<String> queue = new LinkedList<>();
queue.add("jane");
queue.add("jackson");
System.out.println(queue); // [jane, jackson]
System.out.println(queue.remove()); // jane
```
- priority queue - objects being stored inside a priority queue should implement the `Comparable` interface
- this helps retrieve items from the structure in the order of their priority
- deque - double ended queue - o(1) for operating on either side of the collection. it is implemented by array deque, and just like normal queues, we can implement it using linked lists instead as well
- note - java calls it deque and not dequeue

```java
Deque<String> dequeOne = new LinkedList<>();
Deque<String> dequeTwo = new ArrayDeque<>();
```
- my doubt about performance - based on the fact that array dequeue might be using an array underneath, doing the typical “capacity resizing” that we discussed, would we have an even more performant solution if we were to use linked list? be it for implementing stacks or queues, logically performance of linked lists > array dequeues (dynamic capacity resizing issue) > stacks (synchronization issue)
- based on this answer, apparently not, because the main overhead that comes with linked lists is the extra creation of that node, garbage collection of that node, etc
- so, it is probably safe to conclude that in java, when we are looking for stack or queue implementation, we should use array dequeues almost always (over the stack since it is synchronized, and linked lists since it has memory overhead?)
- also, do not use array lists blindly - if we just have to remove and add elements to either ends, and do not need random access, array dequeues might be better than array lists (inserting at beginning of array list is o(n) and inserting at beginning of array deque is o(1))
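a quick sketch of `ArrayDeque` doubling as both structures, per the conclusion above -

```java
import java.util.ArrayDeque;

class ArrayDequeDemo {
    public static void main(String[] args) {
        // as a stack - push / pop operate on the head
        ArrayDeque<String> stack = new ArrayDeque<>();
        stack.push("jane");
        stack.push("jackson");
        System.out.println(stack.pop()); // jackson - last in, first out

        // as a queue - add at the tail, remove from the head
        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.add("jane");
        queue.add("jackson");
        System.out.println(queue.remove()); // jane - first in, first out
    }
}
```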
Maps
- key value pairs
- also called associative arrays
- with maps, we ensure times of o(1) for adding, removing and lookup
- hash based maps are unordered / do not support sorting
- the idea is that since keys in a map are unique, we transform the keys into an index between 0 to length - 1 of array using a hash function. then, accessing elements via the given key becomes o(1) - we just need to translate the key into an index using the hash function, and random access of elements in an array is an o(1) operation
- the hash function should be able to handle the type of key - e.g. if the key is an integer, using modulo operator with the length of array is enough, if the key is a string then ascii value of characters can be used and so on
- collision in hash tables - the hash function we use results in the same value for multiple keys
- overwrite - replace current value with new incoming value
- chaining - each bucket in the hash table can store a linked list. worst case scenario - all keys evaluate to the same value, so the entire map is just a single big linked list stored in one bucket, thus resulting in an o(n) complexity instead of o(1)
- open addressing -
- linear probing - try finding the next available empty slot - k + 1, k + 2, k + 3, … disadvantage - clusters are formed i.e. elements with same hash are clustered together
- quadratic probing - try finding the next available empty slot using a quadratic polynomial - k + 1, k + 4, k + 9, k + 16, …
- rehashing - perform another hashing on the key till an empty slot is found - h(h(h….(x)))
- so actually, the worst case in hash tables for all operations - insertions, deletions and lookups - is o(n)
- load factor - n / m, where n = number of items in the hash table and m = size of the array. if it is close to 1, the probability of collision will increase
- so, we can also do dynamic resizing of hash tables. disadvantage - this resizing is an o(n) operation
- in java, for `HashMap`, when the load factor reaches around 0.75, the dynamic resizing happens
- however, hash maps cannot be used in multithreaded scenarios, since they are not synchronized
- some important methods available in maps - `keySet()`, `entrySet()`, `values()`
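a small sketch of these accessors (the class and contents are made up for illustration; `entrySet` iteration is the idiomatic way to walk a map) -

```java
import java.util.HashMap;
import java.util.Map;

class MapMethodsDemo {
    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("maths", 85);
        scores.put("physics", 90);

        System.out.println(scores.keySet()); // set of keys (order not guaranteed)
        System.out.println(scores.values()); // collection of values

        // entrySet gives us the key value pairs together
        for (Map.Entry<String, Integer> entry : scores.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```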
- auto generated hash code example - look at how a prime number is used to generate a function with less collision chances
```java
class Person {
    private Integer age;
    private String name;

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((age == null) ? 0 : age.hashCode());
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        return result;
    }
}
```
- note - the `equals` needs to be overridden as well. it might happen that due to chaining discussed earlier, multiple items end up in the same bucket of the hash table. at that point, java needs to be able to differentiate between two different elements with the same hash
- so basically, java uses both chaining and dynamic resizing based on load factor by the looks of it
- `LinkedHashMap` vs `HashMap` - linked hash maps use a doubly linked list underneath to track the "order of insertion", so the keys are basically ordered according to insertion time

```java
Map<String, Integer> hashMap = new HashMap<>();
hashMap.put("aaa", 1);
hashMap.put("bbb", 2);
hashMap.put("ccc", 3);
System.out.println(hashMap); // {aaa=1, ccc=3, bbb=2}

Map<String, Integer> linkedHashMap = new LinkedHashMap<>();
linkedHashMap.put("aaa", 1);
linkedHashMap.put("bbb", 2);
linkedHashMap.put("ccc", 3);
System.out.println(linkedHashMap); // {aaa=1, bbb=2, ccc=3}
```
- balanced bst (binary search trees) - red black trees and avl trees
- tree rotations are used to maintain this structure
- tree maps use red black trees unlike in hash maps, where an array like structure is used
- so, the keys are stored in sorted order in tree maps. notice how it is done automatically for us below -

```java
Map<String, Integer> treeMap = new TreeMap<>();
treeMap.put("ccc", 3);
treeMap.put("bbb", 2);
treeMap.put("aaa", 1);
System.out.println(treeMap); // {aaa=1, bbb=2, ccc=3}
```
- because it uses trees, operations have a guaranteed complexity of o(log n) in tree maps, whereas operations have mostly o(1) but sometimes o(n) complexity in case of hash maps
- my understanding - since a bst is being used, concept of collision, load factor, etc do not exist in tree maps unlike in hash maps
- so, for huge workloads, while we might have to consider tuning the load factor in case of a hash map, we do not have to think about it in case of a tree map
- note - in newer versions, hash maps do not always use linked lists (chaining) for each bucket - once a bucket grows past a threshold, it is converted to a red black tree. this further optimizes hash maps
- because of the very nature - using a red black tree per bucket, using an array to store the multiple keys, etc - memory required by hash maps > tree maps
- but remember, reducing time > reducing memory with cloud etc
Sets
- they allow no duplicates
- hash sets and hash maps work in the same way - a one dimensional array is used to store the elements by performing a hash on the element
- some important functions - `add`, `remove`, `retainAll` (calling `set2.retainAll(set1)` will retain all the elements of set2 that are present in set1, and remove the other elements from set2)
will retain all the elements in the set2 present in set1, and remove other elements from set2) - so, operations are mostly are o(1) but can be o(log n) in worst case / o(n) when dynamic resizing is needed
- again, linked hash sets are to hash sets what linked hash maps are to hash maps - the insertion order is maintained with the help of an additional doubly linked list
- finally, tree sets are the same idea as tree maps - they maintain elements in a sorted order using a red black tree underneath, thus making operations o(log n) in general
- tree sets come with their own additional methods - e.g. `subSet(a, b)` will give us a new set with all values of the set present between a and b, `first` for getting the first element, etc
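a small sketch of these methods (note that for `subSet(a, b)`, `a` is inclusive and `b` is exclusive) -

```java
import java.util.TreeSet;

class TreeSetDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(30);
        set.add(10);
        set.add(20);
        set.add(40);

        System.out.println(set);                // [10, 20, 30, 40] - sorted automatically
        System.out.println(set.first());        // 10
        System.out.println(set.subSet(15, 35)); // [20, 30] - from inclusive, to exclusive
    }
}
```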
Sorting
- sort - notice “reverse order” below -
```java
List<Integer> list = new ArrayList<>();
list.add(3);
list.add(2);
list.add(1);
list.add(4);
list.add(5);
System.out.println(list); // [3, 2, 1, 4, 5]

Collections.sort(list);
System.out.println(list); // [1, 2, 3, 4, 5]

Collections.sort(list, Collections.reverseOrder());
System.out.println(list); // [5, 4, 3, 2, 1]
```
- we can implement `Comparable` on our custom classes to be able to sort them directly -

```java
class Person implements Comparable<Person> {
    String name;
    Integer age;

    Person(String name, Integer age) {
        this.name = name;
        this.age = age;
    }

    public int compareTo(Person person) {
        Integer nameDiff = name.compareTo(person.name);
        Integer ageDiff = age.compareTo(person.age);
        return ageDiff != 0 ? ageDiff : nameDiff;
    }

    public String toString() {
        return "Person(name=" + name + ", age=" + age + ")";
    }
}

class CustomSortComparable {
    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("ayan", 25));
        people.add(new Person("ruth", 5));
        people.add(new Person("jack", 25));
        people.add(new Person("jane", 25));
        people.add(new Person("mike", 20));

        System.out.println(people);
        // [Person(name=ayan, age=25), Person(name=ruth, age=5), Person(name=jack, age=25), Person(name=jane, age=25), Person(name=mike, age=20)]

        Collections.sort(people);
        System.out.println(people);
        // [Person(name=ruth, age=5), Person(name=mike, age=20), Person(name=ayan, age=25), Person(name=jack, age=25), Person(name=jane, age=25)]
    }
}
```
- `Comparator` use cases -
    - we want to sort using multiple techniques. `compareTo` can only have one implementation, therefore it lacks flexibility
    - we want to sort a class not in our control i.e. we cannot change the class to make it implement `Comparable`
    - also helps achieve separation of concerns

```java
class PersonAgeComparator implements Comparator<Person> {

    @Override
    public int compare(Person person1, Person person2) {
        return person2.age.compareTo(person1.age);
    }
}

Collections.sort(people, new PersonAgeComparator());
System.out.println(people);
// [Person(name=ayan, age=25), Person(name=jack, age=25), Person(name=jane, age=25), Person(name=mike, age=20), Person(name=ruth, age=5)]

Collections.sort(people, new PersonAgeComparator().reversed());
System.out.println(people);
// [Person(name=ruth, age=5), Person(name=mike, age=20), Person(name=ayan, age=25), Person(name=jack, age=25), Person(name=jane, age=25)]
```
- using lambdas - for a more functional style, we can use the following syntax as well 🤯 -
```java
Collections.sort(
    people,
    Comparator.comparing(Person::getAge).reversed().thenComparing(Person::getName)
);
System.out.println(people);
// [Person(name=ayan, age=25), Person(name=jack, age=25), Person(name=jane, age=25), Person(name=mike, age=20), Person(name=ruth, age=5)]
```
Miscellaneous
- some methods, refer docs for more -
```java
List<Integer> list = new ArrayList<>();
list.add(5);
list.add(1);
list.add(2);
list.add(4);
list.add(3);
System.out.println("original list = " + list); // original list = [5, 1, 2, 4, 3]

Collections.shuffle(list);
System.out.println("shuffled list = " + list); // shuffled list = [3, 1, 5, 4, 2]

Collections.reverse(list);
System.out.println("reversed list = " + list); // reversed list = [2, 4, 5, 1, 3]

System.out.println("min = " + Collections.min(list) + ", max = " + Collections.max(list)); // min = 1, max = 5
```
- since collections are pass by reference, make collections unmodifiable so that clients cannot mutate our collections
```java
List<Integer> unmodifiableList = Collections.unmodifiableList(list);
unmodifiableList.add(-1);
// Exception in thread "main" java.lang.UnsupportedOperationException
//     at java.base/java.util.Collections$UnmodifiableCollection.add(Collections.java:1091)
//     at MiscellaneousMethods.main(MiscellaneousMethods.java:20)
```
- if we want to obtain a synchronized version of the normal collections, we can use `List<Integer> synchronizedList = Collections.synchronizedList(normalList)`
  - drawback - coarse grained locking is used; effectively, all methods now use the `synchronized` keyword
  - so, a better solution is to use the concurrent collections that java provides, e.g. `ConcurrentHashMap`
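a small sketch contrasting the two approaches - both are thread-safe, but the synchronized wrapper locks the whole collection on every call, while `ConcurrentHashMap` uses finer-grained internal locking (the class name is hypothetical) -

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SynchronizedVsConcurrent {

    public static void main(String[] args) throws InterruptedException {
        List<Integer> syncList = Collections.synchronizedList(new ArrayList<>());
        Map<String, Integer> counts = new ConcurrentHashMap<>();

        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                syncList.add(i);                       // locks the entire list on every call
                counts.merge("hits", 1, Integer::sum); // atomic, finer-grained locking
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(syncList.size());    // 2000
        System.out.println(counts.get("hits")); // 2000
    }
}
```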
Maven
- maven is a build tool for java
- other alternatives are gradle, ant, etc
- build - process of building source code into artifacts that can be run
- maven has various plugins -
- jar plugin to create jars
- compiler plugin to help compile code
- surefire plugin to execute tests
- a plugin has various goals. goals represent a unit of work
- to examine a plugin, we can use the following command -

```shell
mvn help:describe -Dplugin=org.apache.maven.plugins:maven-compiler-plugin
```
- maven coordinates -
- group id - company / department name. domain name in reverse order is the convention
- artifact id - project name
- version - the version of the project
- packaging - there are two types of packaging - jar (mostly used nowadays and the default) and war (web application archive)
- classifier - e.g. we want to build for different versions of java but use the same pom. so, we can use classifiers like `jdk8` and `jdk11`. these then get appended to the version, so people can import the right dependency
- out of these, the gav (group id, artifact id and version) help us uniquely identify the project
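a minimal sketch of how these coordinates appear in a pom.xml (the group id, artifact id and version here are hypothetical) -

```xml
<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example.payments</groupId>   <!-- group id - domain name in reverse -->
    <artifactId>payment-service</artifactId>  <!-- artifact id - project name -->
    <version>1.2.0</version>
    <packaging>jar</packaging>                <!-- the default; could also be war -->
</project>
```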
- to use these libraries in our projects, we use repositories
- there are two kinds of repositories - the local repository and remote repositories
- basically, maven downloads from remote repositories and puts it into our local repository
- then, our projects running locally can use dependencies downloaded in this local repository
- default location for local repository is ~/.m2/repository
- default url for remote repository is https://repo1.maven.org/maven2/ (called maven central)
- we can configure remote repositories via settings.xml - so that we can use our own remote repository - use case - companies maintain their own remote repository, which is a mirror of maven central
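e.g. such a company mirror can be configured in settings.xml roughly like this (the id, name and url below are hypothetical) -

```xml
<settings>
    <mirrors>
        <mirror>
            <id>company-central</id>
            <name>Company mirror of maven central</name>
            <url>https://nexus.example.com/repository/maven-central/</url>
            <mirrorOf>central</mirrorOf>
        </mirror>
    </mirrors>
</settings>
```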
Plugin Management
- lifecycle has phases
- a phase has multiple goals attached to it
- if a phase does not have any goals attached to it, it would not be executed
- e.g. the clean lifecycle has three phases - pre-clean, clean and post-clean
- only the clean phase of the clean lifecycle is attached to a goal
- it is attached to the clean goal of maven-clean-plugin plugin 🤯
- when we say `mvn clean`, we are actually instructing maven to run the clean phase
- when we run a phase, all the phases before it in the lifecycle are executed as well - in this case, pre-clean would be executed first (if it has goals attached to it, which it does not by default), followed by the clean phase itself
- we just discussed that we typically invoke `mvn <<phase>>`, which runs all the goals attached to every phase of the lifecycle up to and including the specified phase. however, we can also invoke a particular goal directly, using the following syntax variations -
  - `mvn plugin_group_id:plugin_artifact_id:plugin_version:goal`
  - `mvn plugin_group_id:plugin_artifact_id:goal`
  - `mvn plugin_prefix:goal`
  - `mvn plugin_prefix:goal@execution_id`
- while executions help us tie goals to phases, we can also invoke these executions directly
```shell
mvn org.apache.maven.plugins:maven-clean-plugin:2.5:clean
mvn org.apache.maven.plugins:maven-clean-plugin:clean
mvn clean:clean
```
- there are two kinds of plugins -
- reporting plugins - run during site generation
- build plugins - run to help build the project
- below - we try to tie the run goal of maven-antrun-plugin to pre-clean and post-clean phases -
```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-antrun-plugin</artifactId>
    <version>3.0.0</version>
    <executions>
        <execution>
            <id>1</id>
            <phase>pre-clean</phase>
            <goals>
                <goal>run</goal>
            </goals>
            <configuration>
                <target>
                    <echo level="info">Learning Maven: pre-clean</echo>
                </target>
            </configuration>
        </execution>
        <execution>
            <id>2</id>
            <phase>post-clean</phase>
            <goals>
                <goal>run</goal>
            </goals>
            <configuration>
                <target>
                    <echo level="info">Learning Maven: post-clean</echo>
                </target>
            </configuration>
        </execution>
    </executions>
    <configuration>
        <target>
            <echo level="info">Learning Maven: standalone invoking</echo>
        </target>
    </configuration>
</plugin>
```
- so, now when we run post-clean phase, all three phases - pre-clean, clean and post-clean would be run
- configuring a plugin
- a plugin can have multiple execution blocks. each execution block specifies -
- what goal to run
- what phase to tie this goal to
- configuration for the goal
- a configuration element can be specified at the root of the plugin as well. the earlier point was about specifying multiple execution blocks, which helped us tie goals to phases; this point is about specifying configuration in the root block of the plugin, which applies to all executions. this can be useful when we invoke the plugin:goal directly
- dependencies - if a plugin has dependencies, we can for e.g. specify the version of that dependency using this block
- inherited - by default, the plugin configuration is inherited by the children. we can disable this behavior by setting inherited to false
- id should be unique across all executions for a plugin (not across plugins)
- apart from clean, the two other lifecycles are default and site
- the goals that are triggered for the default lifecycle depend on the packaging type (recall that packaging is part of the maven coordinates and can be e.g. jar, war or pom). for jar, this is the table -

  | phase | plugin:goal |
  | --- | --- |
  | process-resources | resources:resources |
  | compile | compiler:compile |
  | process-test-resources | resources:testResources |
  | test-compile | compiler:testCompile |
  | test | surefire:test |
  | package | jar:jar |
  | install | install:install |
  | deploy | deploy:deploy |

- when we specify dependencies in the dependency management section of the parent, child projects can get these dependencies if they want to, but do not get them unless added explicitly. plugin management works the same way - children inherit all the configuration related to a plugin specified in the plugin management section of the parent, but only if the plugin is added explicitly
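a sketch of how this plugin management inheritance might look - the parent pins the version and configuration, and the child opts in by declaring just the plugin (the plugin version and release number here are assumptions) -

```xml
<!-- parent pom - declares configuration, but does not activate the plugin -->
<build>
    <pluginManagement>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.11.0</version>
                <configuration>
                    <release>17</release>
                </configuration>
            </plugin>
        </plugins>
    </pluginManagement>
</build>

<!-- child pom - opts in; version and configuration are inherited -->
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
        </plugin>
    </plugins>
</build>
```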
- extra - executing scripts using exec maven plugin! -
```xml
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <version>3.1.1</version>
    <executions>
        <execution>
            <id>Renaming build artifacts</id>
            <phase>package</phase>
            <goals>
                <goal>exec</goal>
            </goals>
            <configuration>
                <executable>bash</executable>
                <commandlineArgs>handleResultJars.sh</commandlineArgs>
            </configuration>
        </execution>
    </executions>
</plugin>
```
Inheritance and Aggregation
- `<modelVersion>` helps determine the xsd (xml schema definition) version to use, i.e. what elements are allowed in the pom file, how they should be configured, etc
- multiple levels of inheritance are supported in pom
- all pom (directly or indirectly) inherit from the super pom
- this inheritance helps us extract out common functionality around plugins, plugin configuration, dependencies, etc to a parent pom from which all other projects can inherit
- we can print the effective pom like so - `mvn help:effective-pom`
- my understanding - the parent might be managed separately -
- parent would be downloaded from the remote repository into the local repository, post which it can be used
- for development purpose - build the parent, which will install it in the local repository, and then build the child
- the parent might be managed in the same project, in which case we can provide the `relativePath`. understand that this way, we do not have to build the parent project separately like above
- also, the packaging type in the parent should be specified as `pom` instead of relying on the default value i.e. `jar`
- till now, we discussed inheritance. we can also use aggregation in maven
- use case - when we run a phase e.g. `mvn clean`, `mvn install`, etc., it gets run for all the child projects as well
- not only that - in aggregator projects, if the child projects depend on each other, maven can automatically determine the right order to build them in for us
- we can also use the same pom for both aggregation and inheritance
- notes about versions -
- version property of parent gets inherited by the children as well
- for specifying the version of the parent in the child, we use `${revision}`
- for specifying interdependencies between children, we use `${project.version}`
- based on everything above, a simple multi module setup -
- parent -
```xml
<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.apache.maven.ci</groupId>
    <artifactId>ci-parent</artifactId>
    <version>${revision}</version>
    <!-- parent / aggregator projects must use pom packaging -->
    <packaging>pom</packaging>

    <properties>
        <revision>1.0.0-SNAPSHOT</revision>
    </properties>

    <modules>
        <module>child1</module>
        <module>child2</module>
    </modules>
</project>
```
- child -
```xml
<project>
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.apache.maven.ci</groupId>
        <artifactId>ci-parent</artifactId>
        <version>${revision}</version>
        <relativePath>../pom.xml</relativePath>
    </parent>

    <groupId>org.apache.maven.ci</groupId>
    <artifactId>ci-child</artifactId>

    <dependencies>
        <dependency>
            <groupId>org.apache.maven.ci</groupId>
            <artifactId>child2</artifactId>
            <version>${project.version}</version>
        </dependency>
    </dependencies>
</project>
```
Dependency Management
- we can specify a range of versions using `[3.8, 4.0)` (`[` / `]` for inclusive, `(` / `)` for exclusive)
- version format - `<<major-version>>.<<minor-version>>.<<patch>>-<<qualifier>>`
- the qualifier `SNAPSHOT` is used for unstable projects, which can change frequently
- this way, if we depend on a project with snapshot in its version, we always get access to the latest code
- my understanding - if for e.g. we do not use snapshot -
- if the local repository already has an existing copy, maven would not bother refetching it from the remote repository to refresh the local copy
- probably sometimes the remote repository also would not allow pushing artifacts again against the same version
- behind the scenes, this SNAPSHOT is converted to a timestamp automatically for us - so, `x.y.z-SNAPSHOT` basically becomes `x.y.z-timestamp`, and this way, maven always tries pulling the latest version for us
- maven is able to handle transitive dependencies for us - if our project depends on jar a, which in turn depends on jar b, maven downloads both jar a and jar b automatically when building the project
- classpath - location of classes and packages that our project is dependent on
- the different dependency scopes -
- compile - include dependency in all classpaths. the default if `scope` is not specified explicitly
- test - only required for compiling and executing tests; not required when the application executes, and therefore need not be included when packaging artifacts
- runtime - include dependency when the project executes or tests are being run, but not when compiling, e.g. a jdbc driver like mysql connector. use case - we as developers cannot mistakenly compile against classes from these libraries
- provided - dependencies provided by the environment. e.g. we are developing a web application, we would need to depend on the servlet api to compile, but we would not want to include this in the war file, since it would be provided to us by the servlet container
- system - like provided, but the path when compiling is specified manually
```xml
<dependency>
    <groupId>io.datajek</groupId>
    <artifactId>some-dependency</artifactId>
    <version>1.0</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/libs/dep-1.0.jar</systemPath>
</dependency>
```
- import - when a dependency is of type `pom` and has the scope `import`, it is replaced by the dependencies declared in its `dependencyManagement` section
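a sketch of the last two scopes - a servlet api dependency that the container provides, and a bom imported via the import scope (the artifact versions below are assumptions) -

```xml
<!-- provided - needed to compile, but the servlet container supplies it at runtime -->
<dependency>
    <groupId>jakarta.servlet</groupId>
    <artifactId>jakarta.servlet-api</artifactId>
    <version>6.0.0</version>
    <scope>provided</scope>
</dependency>

<!-- import - pulls in the dependencyManagement section of this pom-type artifact -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-dependencies</artifactId>
            <version>3.2.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```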
- dependency mediation - choosing what version of dependency to use
- default behavior -
- e.g. our project depends on A and B. A depends on D which again depends on E (version x). B directly depends on E (version y). our project would use E (version y), because if we imagine dependencies like a tree, E (version y) is the closest to root
- e.g. our project depends on B and A (B comes first in pom.xml). B depends on E (version x), while A depends on E (version y). our project would use E (version x), because B comes first
- so, one technique based on the above - if we would like to invariably use version x of E, place version x of dependency E as early as possible and directly inside the pom. this way, we always end up using version x of E
- when adding a dependency, if we use the `exclusions` tag along with it, the dependencies specified inside it are excluded from the dependency tree -

```xml
<dependency>
    <groupId>io.datajek</groupId>
    <artifactId>project9-projectb</artifactId>
    <version>1</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```
- this means that we should either expect gson to come as a transitive dependency from another project, or include gson manually inside our pom as another dependency, etc
- let's say our project name is xyz, and we mark a dependency in our pom as optional
- it excludes this dependency from being added as a transitive dependency in any project that has xyz itself as a dependency
- dependency management section - this way, all the projects in e.g. a team can specify, in one place, the versions of dependencies that work well with each other, and this gets inherited by all the child projects
- example -
- if a parent has the following section -
```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>2.8.6</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```
- the child can skip the version of gson when adding it as a dependency
```xml
<dependencies>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
    </dependency>
</dependencies>
```
- if a parent has the following section -
- another use case of dependency management section 🤯 - helps with transitive dependencies as well - e.g. if our project has a dependency A, which depends on C (version x), and has a dependency B, which again depends on C (version y). if we add the dependency C (version z) in the dependency management section, version z of dependency is the one that maven uses!
- note - we could also have included dependency C (version z) directly in the dependency section to force maven to use version z (default behavior - closest to the root wins). however, if another project then added our project as a dependency, it would get dependency C (version z) on its classpath even if it was not using C directly. this problem does not arise if we add dependency C (version z) to the dependency management section as described earlier
Build Portability
- build portability - having consistent ways to build across environments, machines, teams, etc
- variables - variables defined in parent are inherited by children
- however, children can override these variables
- pom values can be accessed using `project`, e.g. `${project.version}`, `${project.build.sourceDirectory}`, etc. the root element of our pom is `project`, so that is where these variables come from. another very useful one i found - `${project.parent.basedir}`, if e.g. a child project wants to access something from the parent directory
- whatever we define in the properties section can be accessed using the name of the property directly, e.g. `${MyString}`
- java system properties (what we access using `java.lang.System.getProperties()`) can be accessed using `java`, e.g. `${java.home}`
- environment variables can be accessed using `env`, e.g. `${env.PATH}`
- variables in settings.xml can be accessed using `settings`, e.g. `${settings.offline}`
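for instance, a property defined once can be referenced anywhere in the pom (the property name and versions below are hypothetical) -

```xml
<properties>
    <junit.version>5.10.0</junit.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter</artifactId>
        <version>${junit.version}</version>
        <scope>test</scope>
    </dependency>
</dependencies>
```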
- profiles - alternative configuration for overriding default values
- we can specify profiles either in the project specific pom.xml, or in settings.xml, which itself can be machine / project specific
```xml
<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>io.datajek</groupId>
    <artifactId>project14</artifactId>
    <version>1</version>
    <name>Project14</name>

    <profiles>
        <profile>
            <id>test</id>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>exec-maven-plugin</artifactId>
                        <executions>
                            <execution>
                                <id>my-special-exec</id>
                                <configuration>
                                    <executable>/Project14/testScript.sh</executable>
                                </configuration>
                            </execution>
                        </executions>
                    </plugin>
                </plugins>
            </build>
        </profile>
        <profile>
            <id>prod</id>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>exec-maven-plugin</artifactId>
                        <executions>
                            <execution>
                                <id>my-special-exec</id>
                                <configuration>
                                    <executable>/Project14/prodScript.sh</executable>
                                </configuration>
                            </execution>
                        </executions>
                    </plugin>
                </plugins>
            </build>
        </profile>
    </profiles>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>org.codehaus.mojo</groupId>
                    <artifactId>exec-maven-plugin</artifactId>
                    <version>3.0.0</version>
                    <executions>
                        <execution>
                            <id>my-special-exec</id>
                            <phase>clean</phase>
                            <goals>
                                <goal>exec</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>
```
- the most basic way in which we can specify which profile to use - `mvn clean -Ptest`
- another way of enabling a certain profile - inside the `profile` section we saw above, we can have an activation section like below -

```xml
<activation>
    <property>
        <name>testProp</name>
        <value>DataJek</value>
    </property>
</activation>
```
- this means that the profile expects the system property `testProp` to have the value `DataJek` - `mvn clean -DtestProp=DataJek`
- archetypes - a project templating toolkit so that a new project can be created easily with all the for e.g. firm specific standards established in the project from the get go
Working of Java
- java releases a new version every 6 months
- not all versions of java are lts (long term support) - java will provide patches and security updates for a long time only for versions marked lts - effort towards maintaining non lts versions would be much less
- open jdk is the underlying java source code, but it is not responsible for building and distributing installable binaries
- there are various vendors sitting on top of it, like adopt openjdk, oracle jdk, and then some providers like amazon have their own jdk distributions which work better with their own aws products