Object Oriented Programming
Basics
- a java program can have any number of classes. the classes can have any name and the java program can have any name
- however, only one public class is allowed in a java program
- the name of both the public class and the java program should be the same
- when we compile a class using `javac Durga.java`, the number of class files generated = number of classes present in that program

```java
class A {
    public static void main(String args[]) {
        System.out.println("class A");
    }
}

class B {
    public static void main(String args[]) {
        System.out.println("class B");
    }
}

class C {
}
```

- compiling the above program generates three class files - A.class, B.class and C.class. running `java A` prints `class A`, while `java B` prints `class B`
- three pillars of object oriented programming
- encapsulation - helps with security
- data hiding (visibility)
- abstraction
- inheritance - helps with reusability
- polymorphism - helps with flexibility
- compile time -
- overloading
- method hiding
- variable hiding / shadowing
- runtime -
- overriding
Import Statements / Package
- two ways for using classes from libraries -
- fully qualified name of the class
- import statement at the top - preferred
- two kinds of import statements -
- explicit import - `import java.util.ArrayList;` - preferred
- implicit import - `import java.util.*;`
- note - implicit import does not include sub packages, e.g. `import java.util.*;` does not import classes under `java.util.concurrent`
- by default, all classes under the `java.lang` package are available and need not be imported
- all classes under the same package as the current class also need not be imported
- package - group of related java programs
- different packages can contain classes with the same name, e.g. `java.util.Date` and `java.sql.Date`
- universally accepted naming convention for package naming in java - reverse of domain name
- if we write the following program -

```java
package com.learning.java;

class Test {
    public static void main(String args[]) {
        System.out.println("hello world");
    }
}
```

- and try compiling using `javac Test.java`, it compiles. but when i tried running `java Test`, it failed -

```
Error: Could not find or load main class Test
Caused by: java.lang.NoClassDefFoundError: Test (wrong name: com/learning/java/Test)
```
- so we should actually compile using `javac -d . Test.java`
- this generates the entire directory structure com/learning/java in the current working directory and places Test.class there. we can then run it using `java com.learning.java.Test`
- packages help with implementing encapsulation - the entire complexity / functionality is viewed as one unit residing inside a package
Class and Method Access Modifiers
- top level classes have two different access modifiers - `public` and default (no modifier)
- default classes are only accessible from within the package
- public classes can be accessed from anywhere, outside the package as well
- for inner classes, apart from `abstract`, `final`, `public` and default, we can also have the `static`, `private` and `protected` modifiers
- members have four different access modifiers -
- public - access method from anywhere, inside or outside package
- default - can be accessed from inside package only, not outside the package
- private - can only be accessed from within the same class, not outside the class
- protected - can be accessed from anywhere inside package, and from subclasses if outside package as well
- small note - if accessing protected members from inside subclass but from outside package, only subclass reference can be used, not superclass (i.e. even polymorphism is not allowed)
- protected example -
- a/A.java -

```java
package a;

public class A {
    protected void a() {
        System.out.println("from a");
    }
}
```

- b/B.java -

```java
package b;

import a.A;

class B extends A {
    public static void main(String[] args) {
        A a = new A();
        a.a();
    }
}
```

- output - compile time error - `a() has protected access in A`
- solution - change to -

```java
B b = new B();
b.a();
```
therefore, summary in tabular form -

| visibility | public | protected | default | private |
|---|---|---|---|---|
| same class | ✅ | ✅ | ✅ | ✅ |
| subclass, same package | ✅ | ✅ | ✅ | |
| non subclass, same package | ✅ | ✅ | ✅ | |
| subclass, different package | ✅ | ✅ (subclass reference only) | | |
| non subclass, different package | ✅ | | | |

- access modifiers help with data hiding - a different class cannot "directly" access the fields of another class - it would have to use public methods
- note - think about member visibility only when class is visible first (recall default vs public)
- access modifiers also help achieve encapsulation - interact with data members via exposed methods, not directly
Abstract Classes And Interfaces
- the `abstract` modifier is applicable for both methods and classes
- an abstract method is used when we do not know the implementation of the method upfront. e.g. a Vehicle class can have an abstract method `getNumberOfWheels`. syntax -

```java
public abstract Integer getNumberOfWheels();
```
- if a class contains even one abstract method, it would have to be declared as abstract as well
- if a class is `abstract`, instantiation is not possible for the class
- also, if we would not like for it to be possible to instantiate a class, we can declare it as abstract even if it does not have abstract methods
- subclasses are responsible to provide the implementation of the abstract methods of super class
- we can have multiple levels of nesting for abstract classes as well - abstract class Vehicle -> abstract class Car -> class RangeRover
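the Vehicle -> Car -> RangeRover chain above could be sketched like this (the method bodies and the extra `getBrand` method are assumptions for illustration):

```java
abstract class Vehicle {
    public abstract Integer getNumberOfWheels();
}

// still abstract - does not have to implement getNumberOfWheels yet
abstract class Car extends Vehicle {
    public abstract String getBrand();
}

// first concrete class in the chain - must implement all pending abstract methods
class RangeRover extends Car {

    public Integer getNumberOfWheels() {
        return 4;
    }

    public String getBrand() {
        return "Range Rover";
    }
}

public class AbstractChain {

    public static void main(String[] args) {
        Vehicle vehicle = new RangeRover();
        System.out.println(vehicle.getNumberOfWheels()); // 4
    }
}
```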
- interface methods are implicitly `public` and `abstract` without us specifying anything - so, when overriding in the implementing class, the method should be declared `public`
- when implementing an interface -
- either override all the methods of the interface
- or make the class itself abstract
- code example -
```java
interface I {
    void m1();

    void m2();
}

abstract class A implements I {
    public void m1() {
    }
}
```
- abstract variables are not supported - so the `abstract` modifier applies only to one kind of member, i.e. methods, not variables
- so why use abstract classes and interfaces -
- mandating a structure for an implementation - “mandates” subclasses to provide implementation, else there will be compile time error
- acting as a specification / contract - e.g. we write servlet api compliant code, but that same code can run on different vendors like jetty, tomcat, weblogic, resin, oracle http server, etc which are all implementations of the same servlet api. same applies for jdbc and the different sql compliant drivers as well.
- abstraction - client will not know / need not care about the internal implementation
- note - all variables inside an interface are public static final, so they need to be initialized then and there. no instance variables can be created for an interface
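a small sketch of this rule (the interface and constant names are made up):

```java
// variables in an interface are implicitly public static final,
// so they must be initialized at the point of declaration
interface Limits {
    int MAX_USERS = 10; // same as: public static final int MAX_USERS = 10;
}

public class InterfaceVariables {

    public static void main(String[] args) {
        // accessed via the interface name, like any static final constant
        System.out.println(Limits.MAX_USERS); // 10
        // Limits.MAX_USERS = 20; // would not compile - cannot assign a value to final variable
    }
}
```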
Inheritance
- inheritance helps us implement the is-a relationship
- we use `extends` to implement this - members of the superclass are inherited by the subclass
- so, the subclass can use members of the superclass
- the other way around does not hold i.e. a superclass reference cannot access members specific to the subclass
- all classes are implicitly subclasses of `Object`
- main advantage of inheritance - superclass will contain common functionality, thus helping us avoid duplication of logic in subclasses
- types of inheritance -
- single inheritance - one superclass and one subclass. supported in java
- multilevel inheritance - one superclass has one subclass, and that subclass again acts as a superclass for yet another subclass. this too is supported in java
- multiple inheritance - multiple superclasses, one subclass. not supported in java for classes, but supported via interfaces
```java
class C1 extends C2, C3 {} // compilation error
interface I1 extends I2, I3 {} // works
class C1 implements I1, I2 {} // works
```
- hierarchical inheritance - one superclass, multiple subclasses
- hybrid inheritance - combination of multiple types of inheritance
- confusion cleared - we just said every class extends object. if a class C1 extends another class C2, it is extending both C2 and Object. then isn’t this multiple inheritance? why did we say java does not allow multiple inheritance?
- when we do not extend any class, we extend Object implicitly
- when we extend a different class, we do not extend Object directly. instead, the root class of the chain - the one without an explicit superclass - extends Object implicitly. so it is basically multilevel inheritance, not multiple inheritance, through which the subclass extends Object indirectly
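we can verify this chain via reflection - a quick sketch (class names are made up):

```java
class C2 {
}

class C1 extends C2 {
}

public class ObjectChain {

    public static void main(String[] args) {
        // C1's direct superclass is C2, not Object
        System.out.println(C1.class.getSuperclass().getSimpleName()); // C2
        // C2, having no explicit superclass, extends Object implicitly
        System.out.println(C2.class.getSuperclass().getSimpleName()); // Object
    }
}
```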
- note - a `final` class cannot have a subclass
Polymorphism - Overloading
- method signature - method name + argument types
- in java, return type is not a part of method signature
- when resolving method calls, method signature is what gets used
- so, it is a compile time error if we try to add two methods with same signature, even if they have different return types
- overloading - when a class has multiple methods with the same name but different argument types
- advantage - the same method name is being used for multiple implementations
- static polymorphism / compile time polymorphism / early binding - in case of overloading, the decision around which variation of method to use is made at compile time
- example -
```java
class Overloader {

    public void printer(int x) {
        System.out.println("printing an integer: " + x);
    }

    public void printer(String x) {
        System.out.println("printing a string: " + x);
    }
}

public class Overloading {

    public static void main(String[] args) {
        Overloader overloader = new Overloader();
        overloader.printer(1); // printing an integer: 1
        overloader.printer("hello"); // printing a string: hello
    }
}
```
- automatic promotion + overloading in java - when overloading, if an exact match is not found for a primitive type, java promotes it to the next available primitive type using the following rules -
- byte -> short -> int -> long -> float -> double
- char -> int -> …
- so, referring to the example above - there is no overloaded method for char, so we jump to the next type in the promotion chain as follows -

```java
overloader.printer('a'); // printing an integer: 97
```
- if no promotion is possible, we get a compile time error -
```java
overloader.printer(10.5); // Overloading.java:19: error: no suitable method found for printer(double)
```
- if there is a clash during overloading for superclass vs subclass, subclass gets priority
- e.g. `null` can be used both for `Object` and `String`. so, if a method is overloaded for both of them and we pass it `null`, it will call the `String` implementation, since `String` is the more specific type
- if there is a clash during overloading between two independent classes (neither is a subclass of the other), the compiler throws an "ambiguous reference" compile time error
- e.g. `null` can be used both for `String` and `StringBuffer`, and neither is a subclass of the other. so, if a method is overloaded for both of them and we pass it `null`, we get a compile time error -

```java
overloader.printer(null); // Overloading.java:24: error: reference to printer is ambiguous
```
- since method overloading is resolved at compile time, the decision is influenced by the reference type, not by the instance
- e.g. if i do `Object x = new String("s")`, and a method has two overloaded variations - `m1(Object obj)` and `m1(String str)` - then calling `m1(x)` invokes the `m1(Object obj)` variation, since the decision is made by the type of the reference
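putting both points above together in a runnable sketch (class and method names are made up):

```java
class OverloadResolution {

    static String m1(Object obj) {
        return "object version";
    }

    static String m1(String str) {
        return "string version";
    }

    public static void main(String[] args) {
        // null matches the most specific applicable type - String
        System.out.println(m1(null)); // string version

        // resolution is based on the reference type, not the runtime instance
        Object x = new String("s");
        System.out.println(m1(x)); // object version
    }
}
```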
Polymorphism - Overriding
- superclass reference can hold subclass instance
- the other way around does not hold i.e. subclass reference can not hold superclass instance
- overriding - subclass redefines method of superclass
- variations -
- superclass reference pointing to superclass instance - superclass method would be called
- subclass reference pointing to subclass instance - subclass method would be called
- superclass reference pointing to subclass instance - subclass method would be called
- the third variation is what interests us - compiler only checks if superclass has that method defined
- the method is actually called on the runtime instance during execution
- dynamic polymorphism / runtime polymorphism / late binding - in case of overriding, the decision around which variation of method to use is made at runtime
- covariant return types - when overriding, we can return a subclass of the type the superclass method returns
```java
class Parent {

    public Object m1() {
        return null;
    }
}

class Child extends Parent {

    public String m1() {
        return "hello world";
    }
}

class CoVariant {

    public static void main(String[] args) {
        Parent p = new Child();
        System.out.println("covariant response = " + p.m1()); // covariant response = hello world
    }
}
```
- if superclass method is final, we cannot override the method and we get a compile time error
- if superclass method is non final, we can override the method, and can also make it final in the subclass
- if method is private, there is no concept of overriding, since it is treated like an internal method. so, even if we redefine the method with the same name in the subclass, the compiler would not complain
- access modifiers + overriding - when overriding, we cannot reduce the scope, but we can increase the scope
```java
class Parent {

    public String m1() {
        return "from parent";
    }
}

class Child extends Parent {

    protected String m1() {
        return "from child";
    }
}
```
- output -
```
attempting to assign weaker access privileges; was public
```
- so, conclusion for access modifiers and overriding - for `private` methods, the overriding concept is not applicable. for the others -
- superclass `public` - subclass can be - `public`
- superclass `protected` - subclass can be - `protected`, `public`
- superclass default - subclass can be - default, `protected`, `public`
- exceptions - below is of course only applicable for checked exceptions and not unchecked exceptions. it will make sense automatically if we think about `Parent p = new Child(); p.m1();`
- if the subclass method does not throw a checked exception, the superclass method may or may not throw one
- if the subclass method throws a checked exception, the superclass method should declare the same exception or a superclass of that exception
- superclass `public static void m1()`, subclass `public void m1()` - compile time error
- subclass `public void m1()`, superclass `public static void m1()` - compile time error
- subclass `public static void m1()`, superclass `public static void m1()` - works, but it is not overriding. it is method hiding. this resolution happens at compile time, by reference, and the superclass version is called -

```java
class Parent {

    public static String m1() {
        return "hello from parent";
    }
}

class Child extends Parent {

    public static String m1() {
        return "hello from child";
    }
}

class MethodHiding {

    public static void main(String[] args) {
        Parent p = new Child();
        System.out.println("parent reference, child object responds with = " + p.m1());
    }
}
```
- output -
```
parent reference, child object responds with = hello from parent
```
- conclusion - method hiding is also example of compile time polymorphism / static polymorphism / early binding just like overloading
- variable hiding / shadowing - there is no concept of overriding for variable members. so, even if we redefine the variable in the child class, resolution happens via the reference type, i.e. the superclass
```java
class Parent {
    String s = "parent";
}

class Child extends Parent {
    String s = "child";
}

class VariableShadowing {

    public static void main(String[] args) {
        Parent p = new Child();
        System.out.println(p.s); // prints 'parent'
    }
}
```
- TODO: add double dispatch?
Object Type Casting
- syntax - `A b = (C) d;`
- three checks -
- compile time check 1 - C and the reference type of d should be somehow related - one should be a superclass of the other
- passes compilation -

```java
Object o = new String("hello world");
StringBuffer sb = (StringBuffer) o;
```

- fails compilation with `incompatible types: String cannot be converted to StringBuffer` -

```java
String str = new String("hello world");
StringBuffer sb = (StringBuffer) str;
```
- compile time check 2 - obvious - C should be a subclass of A or the same as A
- passes compilation -

```java
Object o = new String("hello world");
StringBuffer sb = (StringBuffer) o;
```

- fails compilation with `incompatible types: StringBuffer cannot be converted to String` -

```java
Object o = new String("hello world");
String s = (StringBuffer) o;
```
- runtime check 1 - the actual instance d holds should be a subclass of C or the same as C. understand how this is different from compile time check 1 - there, we were checking whether the reference type of d is somehow related to C. here, we check whether the actual runtime object that d holds is a subclass of C or the same as C
- passes at runtime -

```java
Object o = new String("hello world");
String s = (String) o;
```

- fails at runtime with `ClassCastException: class java.lang.String cannot be cast to class java.lang.StringBuffer` -

```java
Object o = new String("hello world");
StringBuffer sb = (StringBuffer) o;
```
Constructors
- a constructor helps with initialization, while the `new` keyword helps with instantiation
- the constructor's name should be the same as the name of the class
- only applicable modifiers for constructors are access modifiers
- use case - make the constructor `private`. now, an object of the class can only be created from inside the class. this can help us, for e.g., implement the singleton pattern
- if we do not add any constructor to a class, the compiler adds the default no args constructor for us automatically
- note - this default no args constructor is added for abstract classes as well
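the private-constructor use case above can be sketched as a minimal (eagerly initialized) singleton:

```java
class Singleton {

    // the only instance, created eagerly when the class is loaded
    private static final Singleton INSTANCE = new Singleton();

    // private constructor - no other class can instantiate Singleton
    private Singleton() {
    }

    public static Singleton getInstance() {
        return INSTANCE;
    }
}

public class SingletonDemo {

    public static void main(String[] args) {
        Singleton a = Singleton.getInstance();
        Singleton b = Singleton.getInstance();
        System.out.println(a == b); // true - both references point to the same instance
    }
}
```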
- the first line in our constructor should always be `super(...)` or `this(...)`
- if we do not add the super call ourselves, the compiler will automatically add `super()` for us
- note - this automatic adding of super happens both for constructors written by us and inside the default no args constructor
- convoluted example -
- our code -

```java
class Test {

    Test(int i) {
        this();
    }

    Test() {
    }
}
```

- what the compiler generates -

```java
class Test {

    Test(int i) {
        this();
    }

    Test() {
        super();
    }
}
```
- when we have code like below, we get a compilation error. the `super()` the compiler generates automatically is not enough, since the superclass has only one constructor - the one we manually wrote - and it requires an argument, which the compiler is not capable of defaulting

```java
class Parent {
    Parent(int i) {}
}

class Child extends Parent {
}
```

- error -

```
Test.java:5: error: constructor Parent in class Parent cannot be applied to given types;
class Child extends Parent {
^
  required: int
  found: no arguments
  reason: actual and formal argument lists differ in length
1 error
```
- note - super has to be the first line, otherwise we get `error: call to super must be first statement in constructor`
- note - this has to be the first line, otherwise we get `error: call to this must be first statement in constructor`
- so, conclusion - we can use either `super()` or `this()`, and that too only in the first line of the constructor
- `super()` or `this()` can only be called inside a constructor and not inside any other method
- the `this` and `super` keywords can also be used to reference instance variables
- note - `this` and `super` are always related to an instance, so they cannot be used inside `static` methods
- my doubt - how to handle variable hiding / shadowing for static variables? solution - maybe use the class name as a prefix instead of super?
- constructor + overloading is a common pattern. we can then use `this(...)` inside them to call each other with default values for missing arguments
- a constructor can throw exceptions
- however, if superclass constructor throws an exception, the subclass constructor should throw the same exception or superclass exception of that exception. we cannot wrap with try catch, since super or this should be the first call
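the constructor + overloading + `this(...)` pattern mentioned above could look like this (the class, fields and default values are made up):

```java
class Connection {

    String host;
    int port;

    // no args constructor delegates with a default host
    Connection() {
        this("localhost");
    }

    // single arg constructor delegates with a default port
    Connection(String host) {
        this(host, 8080);
    }

    // the "real" constructor - all others funnel into this one
    Connection(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

public class ConstructorChaining {

    public static void main(String[] args) {
        Connection c = new Connection();
        System.out.println(c.host + ":" + c.port); // localhost:8080
    }
}
```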
Strings
- string objects are immutable, while string buffer objects are mutable

```java
class Introduction {

    public static void main(String[] args) {
        String s = new String("Durga");
        s.concat(" Software");
        System.out.println(s); // Durga

        StringBuffer sb = new StringBuffer("Durga");
        sb.append(" Software");
        System.out.println(sb); // Durga Software
    }
}
```
- `==` is for reference comparison. `equals` in the `Object` class works just like `==`, but subclasses can override this method, e.g. the `String` class below overrides it for content comparison, while `StringBuffer` does not

```java
class Equality {

    public static void main(String[] args) {
        String s1 = new String("durga");
        String s2 = new String("durga");
        System.out.println(s1 == s2); // false
        System.out.println(s1.equals(s2)); // true

        StringBuffer sb1 = new StringBuffer("durga");
        StringBuffer sb2 = new StringBuffer("durga");
        System.out.println(sb1 == sb2); // false
        System.out.println(sb1.equals(sb2)); // false
    }
}
```
- the heap is used for storing objects. string objects are created when we use `new String()`, `str.concat("suffix")`, etc
- the scp (string constant pool) is used for storing string literals. java stores them in the hopes of reusing them later
- note - the scp is a section of the heap itself, though string literals there are managed differently from regular java objects
- while objects in heap are eligible for gc (garbage collection), objects in scp are not, because java internally maintains references to the string literals stored in scp
- deeper understanding - scp is used for storing string literals. if i do `str.concat("suffix")`, "suffix" is stored in the scp, not the concatenated result of str and "suffix". the concatenated result will however be stored in the heap
- so, it almost feels like that, albeit being in the heap, the scp is more of a compile time feature, while string objects are a runtime feature
- below creates 2 objects in the heap - (s1, String("durga")) and (s2, String("durga")) - and 1 entry in the scp - "durga". s3 and s4 point to the scp entry, while s1 and s2 point to the heap objects (the literal "durga" used to construct them lives in the scp). note how despite having the same string 4 times, it was stored only once in the scp

```java
String s1 = new String("durga");
String s2 = new String("durga");
String s3 = "durga";
String s4 = "durga";
```
- below creates 3 objects in the heap, of which the first 2 are eligible for gc - String("durga"), String("durga software") and (s, String("durga software solutions")) - and 3 entries in the scp - "durga", " software", " solutions"

```java
String s = new String("durga");
s.concat(" software");
s = s.concat(" solutions");
```
- in the below examples, we compare equality using `==` and not `equals`, because equals would anyway do content comparison - here, we want to see which references point to the same object
- equality of string literals - `==` evaluates to true since sl1 and sl2 are both pointing to the same object inside the scp

```java
String sl1 = "durga";
String sl2 = "durga";
System.out.println(sl1 == sl2); // true
```
- concatenation of string literals can happen at compile time as well, which is why slc1 and slc2 point to the same object stored in the scp. this happens due to constant folding optimizations performed by the compiler

```java
String slc1 = "durga software";
String slc2 = "durga " + "software";
System.out.println(slc1 == slc2); // true
```
- here, str2 is created at runtime, so str2 points to a string object in the heap, while str3 points to the string literal in the scp. str2 does not point to a corresponding object in the scp in this case

```java
String str1 = "durga";
String str2 = str1 + " software";
String str3 = "durga software";
System.out.println(str2 == str3); // false
```
- here, both strf2 and strf3 are resolved at compile time and hence point to the scp, because final variables with constant values are inlined at compile time. understand how this behavior changed compared to the example above, just by adding the `final` keyword

```java
final String strf1 = "durga";
String strf2 = strf1 + " software";
String strf3 = "durga software";
System.out.println(strf2 == strf3); // true
```
- the main advantage of scp - if a string is used multiple times, its instance need not be managed / tracked separately multiple times
- basically, jvm maintains a reference to strings in scp, so that there is no garbage collection happening there
- also, strings in scp cannot be mutated - when we make changes, new objects are stored in heap / new strings are stored in scp
- string buffers do not work like strings - there is no scp-like concept for string buffers, and they are mutable, so two string buffer references never implicitly share the same object
- in strings, `concat` and `+` both do the same thing
- other important methods in strings - `equalsIgnoreCase()`, `charAt()`, `length()`, `isEmpty()`, `substring()`, `replace()` (replace a certain character), `indexOf()`, `lastIndexOf()`, `toLowerCase()`, `toUpperCase()`, `trim()`
- string buffer - string is not meant for string content that can change frequently
- strings are immutable - for every change, a new object is created
- this is why we need string buffer. all changes we make happen on the same object
- since string buffer is mutable, it has two concepts - `capacity` and `length`. `capacity` determines how many characters the string buffer can hold, while `length` gives the current number of characters it holds
- when we run out of space, the capacity is roughly doubled - a new, bigger internal array is created and all the existing characters are copied over
- other important methods in string buffer - `capacity()` (get the current capacity), `setCharAt()`, `append()` (works with most primitive etc types), `insert()` (insert a string at a specific position), `delete()` (delete a substring between specified positions), `reverse()` (reverse contents of the string buffer), `ensureCapacity()` (increase capacity to a specified capacity upfront)
(increase capacity to specified capacity upfront) - note - all methods inside string buffer are synchronized - run
javap java.lang.StringBuffer
in terminal to view the profile of string buffer1 2 3 4
public synchronized int length(); public synchronized int capacity(); public synchronized void setCharAt(int, char); // and so on...
- so, at a time, only one thread can operate on a `StringBuffer`, thus affecting the performance of applications
- so, we can also use `StringBuilder` - its apis are almost the same as string buffer - it is like a "non synchronized version of string buffer" - run `javap java.lang.StringBuilder` in a terminal -

```
public void setCharAt(int, char);
public int capacity();
public int length();
// and so on...
```
- so higher performance at the cost of race conditions which we might have to take care of ourselves
- side note - strings are automatically thread safe since they are immutable
- method chaining - because most methods in `String`, `StringBuffer` and `StringBuilder` return the same type, we can use the method chaining technique
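a quick sketch of method chaining - `append` and `reverse` both return the same `StringBuilder`, so the calls can be strung together:

```java
public class ChainingDemo {

    public static void main(String[] args) {
        String result = new StringBuilder("durga")
                .append(" software")
                .reverse()
                .toString();
        System.out.println(result); // erawtfos agrud
    }
}
```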
Exceptions
- hierarchy -
    - Throwable
        - Exception
            - RuntimeException
                - ArithmeticException
                - NullPointerException
                - etc
            - IOException - used when doing file related operations etc
            - InterruptedException - used in multithreading related code etc
            - etc
        - Error
- unchecked exceptions -
    - RuntimeException and its subtree
    - Error and its subtree
- everything else is a checked exception
- try with resources - cleaner code, no need to call `close` explicitly, as long as the resource implements the `AutoCloseable` interface

```java
try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
    // ...
}
```
- note - what we declare / assign inside the try statement is final, and cannot be reassigned
- `Closeable` extends `AutoCloseable`. `Closeable`'s `close` throws `IOException`, while `AutoCloseable`'s `close` throws `Exception`, which is more generic
which is more generic- when we are handling exceptions, it might happen that we lose track of the original exception, and throw another exception which is not that relevant. e.g.
- reading from a resource fails due to missing file
- closing the resource fails due to null pointer because the resource was never initialized properly
- the eventual exception we get is the null pointer, but the missing file exception would have helped us more in identifying the root cause
- so, we can use `ex.addSuppressed(Throwable t)` and `Throwable[] t = ex.getSuppressed()`. this way, we can also find the original cause behind the exception
- note - try with resources will automatically make use of suppressions for us behind the scenes
- another note - when using try with resources, the null pointer exception (from `close`) is added as a suppressed exception to the "could not read file" exception - the main exception is the one thrown in the try block, while the close exception happens during the implicit cleanup
```java
class CustomResource implements AutoCloseable {

    public void read() {
        throw new RuntimeException("could not read file");
    }

    @Override
    public void close() {
        throw new RuntimeException("a null pointer exception happened");
    }
}

public class SuppressionExample {

    public static void main(String[] args) {
        try {
            beforeJava7Way();
        } catch (Exception e) {
            System.out.println(e.getMessage());
            for (Throwable t : e.getSuppressed()) {
                System.out.println("suppress: " + t.getMessage());
            }
        }

        try {
            sinceJava7Way();
        } catch (Exception e) {
            System.out.println(e.getMessage());
            for (Throwable t : e.getSuppressed()) {
                System.out.println("suppress: " + t.getMessage());
            }
        }
    }

    private static void beforeJava7Way() {
        CustomResource customResource = null;
        try {
            customResource = new CustomResource();
            customResource.read();
        } finally {
            customResource.close();
        }
    }

    private static void sinceJava7Way() {
        try (CustomResource customResource = new CustomResource()) {
            customResource.read();
        }
    }
}
```
- output -
```
a null pointer exception happened
could not read file
suppress: a null pointer exception happened
```
- we can also catch multiple exceptions using a single catch block
```java
try {
    // ...
} catch (NullPointerException | ArrayIndexOutOfBoundsException e) {
    e.printStackTrace();
    triggerAlert(e); // triggerAlert - some custom handler
}
```
Multi Threading
Concepts
- there are two benefits of multithreading - responsiveness and performance
- repeat - remember multithreading gives both the features above
- concurrency means performing different tasks on the same core. instead of waiting for one task to entirely complete first, we perform both simultaneously in a time-shared manner. it increases responsiveness
- concurrency is also called multi tasking. remember - we do not even need different cores for this
- parallelism means performing different tasks on different cores. it increases performance
- throughput is the number of tasks completed per unit time
- latency is the time taken per unit task
- how are the two different
- for optimizing throughput, since the tasks themselves are different, they just need to be scheduled on different threads in parallel, and that automatically increases the throughput. therefore, fewer considerations exist
- for optimizing latency, we would probably break a single task into smaller subtasks. considerations -
- what parts of the original task can be performed in parallel and which parts have to be done sequentially
- how to aggregate the smaller chunks of results into the final result
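the latency idea above can be sketched with one big task (summing an array) split into two subtasks on two threads, then aggregated (the class name, thread count and workload are made up):

```java
public class SplitSum {

    // sums the array using two threads, one per half, then aggregates
    static long parallelSum(long[] numbers) throws InterruptedException {
        long[] partial = new long[2];
        int mid = numbers.length / 2;

        // each thread sums one half of the array in parallel
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < mid; i++) partial[0] += numbers[i];
        });
        Thread t2 = new Thread(() -> {
            for (int i = mid; i < numbers.length; i++) partial[1] += numbers[i];
        });

        t1.start();
        t2.start();
        t1.join(); // wait for both subtasks before aggregating
        t2.join();

        // aggregation step - combine the partial results
        return partial[0] + partial[1];
    }

    public static void main(String[] args) throws InterruptedException {
        long[] numbers = new long[1_000_000];
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = i + 1;
        }
        System.out.println(parallelSum(numbers)); // 500000500000
    }
}
```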
- in case of multithreading, components like heaps get shared across the threads, while components like stack and instruction pointer are scoped to a single thread
- what is stored inside the stack -
- local primitive types, e.g. if we declare an `int a = 1` inside a method
- primitive formal parameters - similar to above - `int add(int a, int b)`
- references created inside functions are stored in the stack
- what is stored inside the heap -
- all the objects (not references) are stored in heap
- members of a class - both primitive values and non primitive references - are stored in the heap - note how these, when declared inside functions, are stored in the stack as discussed above
- ideally this makes sense - remember, each thread is executing a different instruction, so each of them needs its own instruction pointer etc
- recall - a frame is created for every method call, and it can result with a stack overflow exception if we end up with too many frames
- heap belongs to a process, and all threads can write to / read from the heap at any given time. all objects are stored in the heap till there is a reference to them, after which they get garbage collected
- when we execute a program, it becomes a process i.e. it gets loaded into the memory from the disk and a thread is used to execute it
- there are often way more processes being executed than cores in a cpu. so, using context switching, one thread gets the cpu for a while, then gets paused, and another thread is scheduled on the cpu
- context switching has overhead, and doing a lot of it can lead to something called thrashing
- however, context switching between the threads of the same process is much cheaper than context switching between the threads of different processes, since a lot of components like heaps are reused
- when the operating system has to choose between scheduling multiple tasks, if for e.g. it schedules a computationally expensive task first, it can lead to the starvation of the smaller tasks
- so, to combat issues like this, there are various algorithms used by the operating system to calculate the priority of a task
- we can also programmatically provide a priority which gets used in the calculation above
- a thing that struck me - when writing applications, do not base your conclusions off the computer you are running your code on, base it off how it would work on the server
- number of threads = number of cores is the best way to start, since context switching as discussed earlier consumes resources
- however, it is only optimal if the threads are always performing some computation, and never in blocked state. if the threads perform some io, then a thread performing some computation can take its place
- also, modern day computers use hyper threading i.e. the same physical core is divided into multiple virtual cores. this means that a core can run more than one thread in modern cpus
Thread Creation
- we create an instance of `Thread` and to it, we pass an object of a class that implements `Runnable`. its `run` method needs to be overridden. all of this can be replaced by a lambda, java 8 onwards

```java
Thread thread = new Thread(() -> System.out.println("i am inside " + Thread.currentThread().getName()));
thread.start();
```
- if instead of using `Runnable`, we extend the `Thread` class, we get access to a lot of internal methods
- when we run `Thread.sleep`, we instruct the os to not schedule that thread until the timeout is over
    - note misconception - invoking this method does not consume any cpu i.e. it is not like a while loop that waits for 5 seconds
- we can set a name on a thread to help when debugging, using `thread.setName()`
- we can set a priority between 1 and 10 using `thread.setPriority()`
- we can use `thread.setUncaughtExceptionHandler` to catch "unchecked exceptions" that might have occurred during the execution of the thread, and thus clean up resources
- we can shut down the application entirely from any thread using `System.exit(0)`
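a minimal sketch tying the setters above together - the thread name, priority, and exception message here are illustrative, not from the notes:

```java
public class ThreadSetupSketch {

    public static void main(String[] args) throws InterruptedException {
        Thread thread = new Thread(() -> {
            throw new RuntimeException("boom"); // an unchecked exception inside the thread
        });

        thread.setName("worker-1"); // helpful when debugging
        thread.setPriority(Thread.MAX_PRIORITY); // a hint between 1 and 10

        // invoked for unchecked exceptions that escape run(), e.g. to clean up resources
        thread.setUncaughtExceptionHandler((t, e) ->
                System.out.println("cleaning up after " + t.getName() + ": " + e.getMessage()));

        thread.start();
        thread.join();
    }
}
```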
Thread Coordination
- the application will not terminate until all threads stop
- but, we might want to interrupt a thread so that the thread can maybe understand that the application wants to terminate, and accordingly handle cleaning up of resources
```java
Thread thread = new Thread(new Task());
thread.start();
thread.interrupt();
```
- the interruption can be handled gracefully in two ways as described below
- if our code throws an interrupted exception, calling `interrupt` will trigger it, and then we can handle it. other examples where this exception happens are calls like `thread.join()` and `object.wait()`

```java
public class Task implements Runnable {

    @Override
    public void run() {
        try {
            Thread.sleep(20000);
        } catch (InterruptedException e) {
            System.out.println("[inside catch] i was interrupted...");
        }
    }
}
```
- else we can check the property `isInterrupted` and handle it accordingly

```java
import java.util.Date;

public class Task implements Runnable {

    @Override
    public void run() {
        Date date = new Date();
        while ((new Date()).getTime() - date.getTime() < 10000) {
            if (Thread.currentThread().isInterrupted()) {
                System.out.println("[inside loop] i was interrupted...");
                break;
            }
        }
    }
}
```
- background / daemon threads - there might be a case when what the thread does need not be handled gracefully, and it is just an overhead for us to check for e.g. the `isInterrupted` flag continually. so, we can set the daemon property of the thread to true. this way when the thread is interrupted, it will be terminated without us having to handle it

```java
Thread thread = new Thread(new Task());
thread.setDaemon(true);
thread.start();
thread.interrupt();
```
- also, unlike normal threads, where the application does not close if any thread is running, a daemon thread does not prevent the application from terminating
- if we implement `Callable` instead of `Runnable`, we can also throw an `InterruptedException` when for e.g. we see that `isInterrupted` evaluates to true. this means the parent thread calling this thread will know that it was interrupted in an adhoc manner
- threads execute independent of each other. but what if thread b depends on the results of thread a?
- busy wait - one way could be we run a loop in thread b to monitor the status of thread a (assume thread a sets a boolean to true). this means thread b is also using resources, which is not ideal
- so, we can instead call `threadA.join()` from thread b - thread b goes into waiting state till thread a completes. we should also consider calling join with a timeout, e.g. `threadA.join(t)`
- my understanding - if for e.g. the main thread runs the below - first, we start threads t1 and t2 in parallel of the main thread. now, we block the main thread by calling `t1.join()`. the main thread will be stopped till t1 completes

```java
t1.start();
t2.start();
t1.join();
t2.join();
```
- scenario 1 - t1 completes before t2, the main thread resumes, and again will be stopped till t2 completes
- scenario 2 - t1 completes after t2. the main thread resumes and will not wait for t2 since it has already completed
Thread Pooling
- thread pooling - reusing threads instead of recreating them every time
- tasks are added to a queue, and the threads pick them up as and when they become free
- so, when tasks are cpu intensive, we should have number of threads closer to core size, and when tasks are io intensive, we should have higher number of threads, but remember that -
- too many threads can cause performance issues as well due to context switching
- threads are not trivial to create, they are resource intensive
- java provides 4 kinds of thread pools - `FixedThreadPool`, `CachedThreadPool`, `ScheduledThreadPool` and `SingleThreadedExecutor`
- fixed thread pool executor - polls for tasks stored in a queue. there can be many tasks, but a set number of threads which get reused. the queue should be thread safe i.e. blocking
```java
int numberOfProcessors = Runtime.getRuntime().availableProcessors();
ExecutorService executorService = Executors.newFixedThreadPool(numberOfProcessors);

executorService.execute(new Runnable() {...});
```
- cached thread pool executor - it looks at its threads to see if any of them are free, and if it is able to find one, it will schedule this task on the free thread. else, it will spawn a new thread. too many threads is not too big of a problem, thanks to the keep alive timeout discussed later. however, expect out of memory exceptions if too many tasks are added to the executor, because threads are resource intensive
```java
ExecutorService executorService = Executors.newCachedThreadPool();
```
- to remember - threads occupy a lot of space in main memory, hence can cause out of memory exceptions if not controlled properly
- scheduled thread pool executor - it uses a delay queue, so that the tasks get picked up by the threads after the specified delay or schedule. this means tasks might have to be reordered, which is done by the queue itself. `schedule` can help trigger the task after a certain delay, `scheduleAtFixedRate` can help trigger it like a cron at regular intervals, while `scheduleWithFixedDelay` can help schedule the next task a fixed time period after the previous task was completed

```java
ScheduledExecutorService executorService = Executors.newScheduledThreadPool(5);

executorService.schedule(
    () -> System.out.println("hi from " + Thread.currentThread().getName()),
    5,
    TimeUnit.SECONDS
);
```
- single thread pool executor - like fixed thread pool executor with size of pool as one. the advantage is for e.g. all the tasks will be run in order of creation
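the in-order guarantee of the single thread pool executor can be seen in a small sketch (task labels are my own):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadOrderSketch {

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService = Executors.newSingleThreadExecutor();

        // tasks run one at a time, in submission order, on the same thread
        for (int i = 1; i <= 3; i++) {
            int taskId = i;
            executorService.execute(() -> System.out.println("task " + taskId));
        }

        executorService.shutdown();
        executorService.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```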
- all thread pool executors create new threads if the previous thread is killed for some reason
- there are a variety of parameters that can be added to the executors
- core pool size - minimum number of threads that are always kept in the pool
- max pool size - maximum number of threads that can be present in the thread pool. it defaults to `Integer.MAX_VALUE` for the cached and scheduled thread pool executors, while it equals the core pool size for the fixed and single thread pool executors
- keep alive timeout - the time till an idle thread is kept in the pool, after which it is removed. keep alive is only applicable to the cached and scheduled thread pool executors, since in the fixed and single thread pool executors, the number of threads does not change
    - note that the keep alive timeout does not remove core pool threads. this behavior can however be changed using `allowCoreThreadTimeOut`
- queue - the different types of executors use different queues based on their requirements. the queues also need to be thread safe
    - e.g. a fixed and single thread pool executor has a fixed number of threads, so there can potentially be an infinite number of tasks that get queued up, because of which it uses a `LinkedBlockingQueue`
    - a cached thread pool spawns a number of threads equal to the number of tasks, so it uses a `SynchronousQueue`, which only needs to hold one task
    - a scheduled thread pool uses a `DelayedWorkQueue` so that the tasks are returned from the queue only if the condition of cron etc. is met
- rejection handler - assume all threads are occupied and the queue is full. in this case, the thread pool will reject the task that it gets. how it rejects the task is determined using the rejection policy. the different rejection policies are -
    - abort - submitting the new task throws a `RejectedExecutionException`, which is a runtime exception
    - discard - silently discard the incoming task
    - discard oldest - discard the oldest task from the queue to add this new task to the queue
    - caller runs - requests the caller thread itself to run this task
- till now, to obtain an instance of `ExecutorService`, we were using static methods on `Executors`. we can also use `new ThreadPoolExecutor()` and then pass our own core pool size, queue, etc. configuration parameters as the constructor arguments
- we need to shut down the executor in a clean way. we can initiate it using `executorService.shutdown()`. this will throw a `RejectedExecutionException` for any new tasks that are submitted to it, but at the same time will complete both all the currently executing tasks and the queued up tasks
- if we run `shutdownNow`, it will return a `List<Runnable>` of the queued up tasks and clear the queue, but complete all the currently executing tasks
- `awaitTermination(timeout)` blocks until all tasks have completed after a shutdown request, or until the timeout elapses, whichever comes first
- we also have helper methods like `isShutdown()` and `isTerminated()`
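a minimal sketch combining the two ideas above - constructing the executor ourselves and shutting it down cleanly. the parameter values (core size 2, max 4, 30s keep alive, queue capacity 10) are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CustomPoolSketch {

    public static void main(String[] args) throws InterruptedException {
        // core pool size 2, max pool size 4, 30s keep alive, a bounded queue of 10, abort on rejection
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 4,
                30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(10),
                new ThreadPoolExecutor.AbortPolicy());

        executor.execute(() -> System.out.println("task done"));

        executor.shutdown(); // no new tasks accepted, existing ones run to completion
        boolean finished = executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("finished = " + finished);
    }
}
```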
- if a task wants to return a value, we use `Callable` instead of `Runnable`
- however, the `execute` method on `ExecutorService` only works if we implement the `Runnable` interface. if we implement the `Callable` interface, we have to use `submit`
- the return value of a `Callable` is wrapped in a `Future`. `future.get()` is a blocking call i.e. the thread calling it will not move ahead until the future resolves. so, we can also use `future.get(timeout)`

```java
ExecutorService executorService = Executors.newFixedThreadPool(1);

Future<Integer> result = executorService.submit(() -> {
    Thread.sleep(4000);
    return (new Random()).nextInt();
});

Thread.sleep(3000); // this simulates that we were able to perform 3 seconds worth of operations
                    // in the main thread while the task thread was performing its blocking stuff

System.out.println("result = " + result.get());
```
- we can cancel the task using `future.cancel(false)`. this means that the thread pool will remove the task from the queue. the false means that if a thread is already running the task, it will not do anything. had we passed true, it would have tried to interrupt the task
- we also have helper methods like `future.isDone()` and `future.isCancelled()`
- suppose we have a list of items, and for each item (e.g. an order), we want to perform a series of processing steps

```java
Future<Package> package$ = executorService.submit(() -> pack(order));
Future<Delivery> delivery$ = executorService.submit(() -> deliver(package$.get()));
Future<Email> email$ = executorService.submit(() -> sendEmail(delivery$.get()));
```

notice how the calling thread is blocked by all the `get` calls on the futures. instead, we could use -

```java
CompletableFuture.supplyAsync(() -> pack(order))
    .thenApply(pkg -> deliver(pkg))
    .thenApply(delivery -> sendEmail(delivery))
    // ...
```
- in the above case, we have specified a series of steps to run one after another and since we do not care about the results in our main thread, the assigning of tasks to threads is managed by java itself. the main thread is not paused by the get calls. notice how we also do not need to specify any executor
- if we use `thenApplyAsync` instead of `thenApply`, a different thread can be used to execute the next operation instead of the previous one
- internally, `CompletableFuture` uses the fork join pool, but we can specify a custom executor as well, e.g. `thenApplyAsync(fn, executor)`
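a runnable sketch of the pipeline above with a custom executor - the stage names ("packed", "delivered", "emailed") are stand-ins for the real processing steps:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletableFutureSketch {

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);

        // supplyAsync starts the pipeline; each thenApplyAsync stage may run on a
        // different thread of the supplied executor
        String result = CompletableFuture
                .supplyAsync(() -> "packed", executor)
                .thenApplyAsync(s -> s + " -> delivered", executor)
                .thenApplyAsync(s -> s + " -> emailed", executor)
                .join(); // blocks only here, once, for the final result

        System.out.println(result);
        executor.shutdown();
    }
}
```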
Race Condition
- race condition - happens when a resource is shared and mutated across multiple threads
```java
public class SharedResourceProblem {

    public static void main(String[] args) throws Exception {
        Integer count = 10000000;
        Counter counter = new Counter();

        Thread a = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                counter.increment();
            }
        });

        Thread b = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                counter.decrement();
            }
        });

        a.start();
        b.start();
        a.join();
        b.join();

        System.out.println("shared resource value = " + counter.getCount()); // e.g. shared resource value = 15
    }
}

class Counter {

    private int count = 0;

    public void increment() {
        count += 1;
    }

    public void decrement() {
        count -= 1;
    }

    public int getCount() {
        return count;
    }
}
```
- the `count += 1` and `count -= 1` operations are not atomic - each comprises three individual operations -
    - getting the original value
    - incrementing / decrementing it by one
    - setting the new value
- solutions - identify critical sections and use locks, make operations atomic, etc
Synchronized
- we can wrap our code blocks with a critical section, which makes them atomic. this way, only one thread can access that block of code at a time, and any other thread trying to access it during this will be suspended till the critical section is freed
- say we use `synchronized` on multiple methods of a class - once a thread invokes one of the synchronized methods of this class, no other thread can invoke any other synchronized method of this class. this is because synchronized on an instance method locks on the instance (object) of the class
- the object referred to above is called a monitor. only one thread can acquire a monitor at a time
- method one - prefix method signature with synchronized (refer the counter example earlier. the shared resource print would now print 0)
```java
public synchronized void increment() {
    // ...
}
```
- another method is to use synchronized blocks
```java
synchronized (object) {
    // ...
}
```
- using blocks, the code is much more flexible since we can have different critical sections locked on different monitors
- if using synchronized on methods, two different methods of the same class cannot be executed in parallel - the monitor there is the instance itself
- however, when using synchronized blocks, we can do as follows inside different methods of the same class -
```java
Object lock1 = new Object();
Object lock2 = new Object();

// ...

synchronized (lock1) {
    // ...
}

synchronized (lock2) {
    // ...
}
```
- note - reduce critical section size for better performance
Atomic Operations
- so, assignments to references and primitive values in java are atomic
    - `this.name = name` inside for e.g. a constructor is atomic
    - `int a = 8` is atomic
- however, an exception to this is assignment to longs and doubles. since they are 64 bit, the assignment happens in 2 operations - one write for the lower 32 bits and another for the upper 32 bits
- the solution is to declare them with volatile, e.g. `volatile double a = 1.2` - using volatile makes assignments to longs and doubles atomic
- also, java has a lot of atomic classes under `java.util.concurrent.atomic` as well
- remember - when we use volatile, we make the assignment atomic, not compound operations like `a++`
- my doubt, probably cleared - then what is the use of for e.g. `AtomicReference`, if assignment to a reference is already an atomic operation? we can do as follows (a metric example is discussed later)

```java
AtomicReference<State> state$ = new AtomicReference<>();
state$.set(initialValue);

State currentState = state$.get();
State newState = computeNewState();
boolean isUpdateSuccess = state$.compareAndSet(currentState, newState);
```
Data Race
- remember - race condition and data race are two different problems
- data race - when the order of operations on variables do not match the sequential code we write. this happens mostly because there are optimizations like prefetching, vectorization, rearranging of instructions, etc
```java
class Pair {

    private int a = 0;
    private int b = 0;

    public void increment() {
        a++;
        b++;
    }

    public void check() {
        if (b > a) {
            System.out.println("well that doesn't seem right...");
        }
    }
}
```
calling the class -
```java
Pair pair = new Pair();

Thread t1 = new Thread(() -> { while (true) pair.increment(); });
Thread t2 = new Thread(() -> { while (true) pair.check(); });

t1.start();
t2.start();
t1.join();
t2.join();
```
- our expectation is that since b is read before a and a is incremented before b, there is no way even with a race condition that b can be bigger than a. however, due to data race, we do hit the print statement
- data race is also where we can use `volatile`. volatile guarantees the order of instructions being executed

```java
private volatile int a = 0;
private volatile int b = 0;
```
- this is called the visibility problem
    - basically, the two threads have their own local cache, but also share a common cache. a thread writes the value to its local cache, but this does not
        - either update the shared cache
        - or refresh the second thread's local cache from the shared cache
    - however, when we use volatile, reads and writes refresh / synchronize both the shared cache and the local cache of all threads
- basically, code before access to a volatile variable gets executed before it, and code after the access to a volatile variable after. this is called the happens before relationship
- while we could have just used synchronized for both the methods above, realize the advantage of using volatile over synchronized. with synchronization, we lose out on the multithreading, since our functions would have been invoked one at a time. in this case, the two methods are still being invoked concurrently
- if we have n cores, each core has its own registers and an associated l1 cache. an l2 cache can be shared across multiple cores, and finally we have only one l3 cache and the ram
- java memory model - it is an enforcement that jvm implementations have to follow so that java programs have similar behavior everywhere, and the different optimizations of instructions, cache, etc. do not affect the functioning of the program
Locking Strategies and Deadlocks
- coarse-grained locking - we use one lock for everything, e.g. making all methods of the class synchronized. simple, but not performant. its counterpart is fine-grained locking, with a separate lock per resource
- cons of fine-grained locking - we can run into deadlocks more often
- conditions for a deadlock -
- mutual exclusion - only one thread can hold the resource at a time
- hold and wait - the thread acquires the resource and is waiting for another resource to be freed up
- non-preemptive - the resource is released only when the thread is done using it and another thread cannot acquire it forcefully
- circular wait - a cyclic dependency is formed where threads wait for resources acquired by each other
- one way to prevent deadlocks is to acquire locks in our code in the same order. this need not be considered when releasing the locks
- another way can be to use techniques like `tryLock`, `lockInterruptibly`, etc (discussed later)
- reentrant lock - instead of having a synchronized block, we use this reentrant lock

```java
Lock lock = new ReentrantLock();
```
- unlike synchronized where the block signals the start and end of the critical section, locking and unlocking happens explicitly in case of reentrant locks
- to avoid deadlocks caused by for e.g. the method throwing exceptions, we should use it in the following way -
```java
lock.lock();
try {
    // critical section
} finally {
    lock.unlock();
}
```
- it provides a lot of methods for more advanced use cases like `getOwner`, `getQueuedThreads`, `isHeldByCurrentThread`, `isLocked`, etc
- the name `Reentrant` comes from the fact that the lock can be acquired by the same thread multiple times, which means it would have to free it multiple times as well, e.g. think about recursive calls. we can get the number of times it was acquired using `getHoldCount`
- another benefit of using reentrant locks is fairness - e.g. what if a thread repeatedly acquires the lock, leading to the starving of other threads? we can prevent this by instantiating it using `new ReentrantLock(true)`
- note that introducing fairness also has some overhead associated with it, thus impacting performance
- if we do not set to true, what we get is a barge in lock i.e. suppose there are three threads waiting for the lock in a queue. when the thread originally with the lock releases it, if a new thread not in the queue comes up to acquire the lock, it gets the lock and the threads in the queue continue to stay there. however, if we had set the fairness to true, the thread with the longest waiting time gets it first
- so, two problems - starvation ("fairness") and the "barge in" behavior - are solved by the fair variant of the reentrant lock
- if the lock is not available, the thread of course goes into the suspended state till it is able to acquire the lock
- we can use `lockInterruptibly` - this way, another thread can for e.g. call `this_thread.interrupt()`, and an interrupted exception is thrown. this "unblocks" the thread to help it proceed further. had we just used `lock`, the wait would have been indefinite

```java
try {
    lock.lockInterruptibly();
} catch (InterruptedException e) {
    // cleanup and exit
}
```
- similar to above, we also have the `tryLock` method, which returns a boolean that indicates whether the lock was successfully acquired. it also accepts a timeout as a parameter, which is self-explanatory
- this can help, for e.g. in realtime applications, to provide feedback continuously without pausing the application entirely

```java
while (true) {
    if (lock.tryLock()) {
        try {
            // critical section
        } finally {
            lock.unlock();
        }
    } else {
        // some logic
    }
    // some logic
}
```
- so, we saw how the reentrant lock, while it works like the synchronized keyword, has additional capabilities like telling the current owner and locking using different strategies like `lockInterruptibly` and `tryLock`
- when locking till now, we used mutual exclusion to its fullest. but, we can be a bit more flexible when the shared resource is just being read from and not written to
- multiple readers can access a resource concurrently but multiple writers or one writer with multiple readers cannot
- this is why we have `ReentrantReadWriteLock`

```java
ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
Lock readLock = lock.readLock();
Lock writeLock = lock.writeLock();
```
- fairness in `ReentrantReadWriteLock` works the same way as in `ReentrantLock`, except that if the thread waiting for the longest time was a reader, all reader threads in the queue are freed up to read
- of course, base decisions off the type of workload - if the workload is read intensive, the read write lock is better, otherwise we might be better off using the normal reentrant lock itself
Inter Thread Communication
- semaphore - it helps restrict the number of concurrent users of a resource
- remember - locks only allow one user per resource, but semaphores allow multiple users to acquire a resource
- so, we can think of a lock as a semaphore with a single permit
```java
Semaphore semaphore = new Semaphore(numberOfPermits);
```
- when we call `semaphore.acquire()`, we acquire a permit, and the number of available permits reduces by one. if no permits are available at the moment, the thread is blocked till a permit is released
- similarly, we have `semaphore.release()`
- optionally, both `acquire` and `release` accept n, the number of permits, as an argument, which can help acquire / release more than one permit at a time
- another major difference from locks - there is no notion of an owning thread in semaphores, unlike in locks - e.g. a semaphore acquired by thread a can be released by thread b. so, thread a can acquire it again without having ever released it
- this reason also makes semaphores a great choice for producer consumer problems. producer consumer problem using semaphores -
    - we need a lock so that multiple threads cannot touch the queue at one go
    - we start with the full semaphore being empty and the empty semaphore being full, since there are no items initially
    - look how we use the semaphore's philosophy to our advantage - consumer threads acquire the full semaphore while producer threads release it
    - my understanding of why we need two semaphores - e.g. if we only had the full semaphore - the producer releases it and the consumer acquires it - how would we have "stopped" the producer from producing when the rate of production > rate of consumption? it is almost like the two semaphores help with back pressure as well

```java
int CAPACITY = 50;
Semaphore empty = new Semaphore(CAPACITY);
Semaphore full = new Semaphore(0);
Queue<Item> queue = new ArrayDeque<>(CAPACITY);
Lock lock = new ReentrantLock();
```
- producer code -
```java
while (true) {
    empty.acquire();
    Item item = produce();
    lock.lock();
    try {
        queue.add(item);
    } finally {
        lock.unlock();
    }
    full.release();
}
```
- consumer code -
```java
while (true) {
    full.acquire();
    Item item;
    lock.lock();
    try {
        item = queue.poll();
    } finally {
        lock.unlock();
    }
    consume(item);
    empty.release();
}
```
- some different inter thread communication techniques we saw till now -
    - calling `interrupt` from one thread on another thread. this is then further used in techniques like `lockInterruptibly`
    - calling `join` for a thread to wait for another thread to complete its job
    - using `acquire` and `release` on a semaphore
- conditions flow -
- one thread checks a condition, and goes to sleep if the condition is not met
- a second thread can “mutate the state” and signal the first thread to check its condition again
- if the condition is met, the thread proceeds, else it can go back to sleep
- note - conditions come with a lock, so that the “state” being modified can be wrapped with a critical section
- note - when we call `await` on the condition, it also releases the lock before going to sleep, so that the second thread described in the flow above can acquire the lock to mutate the state. so, even though the thread which was waiting gets signaled to wake up, it also needs to be able to acquire the lock again, i.e. the other threads modifying state need to release the lock
- placing the condition inside a while loop helps so that even if signalled, the thread will start waiting again if the condition is not met yet
- first thread -
```java
ReentrantLock lock = new ReentrantLock();
Condition condition = lock.newCondition();

lock.lock();
try {
    while (condition x is not met) {
        condition.await();
    }
} finally {
    lock.unlock();
}
```
- second thread -
```java
lock.lock();
try {
    // modify variables used in condition x...
    condition.signal(); // despite signalling, thread one does not wake up until we unlock the lock
} finally {
    lock.unlock();
}
```
- conditions also have advanced methods like -
    - `await(timeout)` - just like locks have timeouts to prevent indefinite waiting
    - `signalAll` - using `signal`, only one of all the threads waiting on the condition wakes up, while `signalAll` wakes all of them up
- the class Object, and therefore all objects, have the methods `wait`, `notify` and `notifyAll`
- therefore, without using any special classes, we can -
    - simulate conditions using `wait`, `notify` and `notifyAll`
    - simulate locks using `synchronized`
- note - recall how when using conditions we were wrapping it via locks. we need to do the same thing here i.e. wrap using synchronized block in order to be able to call notify
- first thread -
```java
synchronized (this) {
    while (condition x is not met) {
        wait();
    }
}
```
- second thread - my understanding - this needs to happen on the same object, e.g. inside a different method -

```java
synchronized (this) {
    // modify variables used in condition x...
    notify();
}
```
- when we call `wait` on an object, the calling thread continues to be in waiting state until another thread calls `notify` on that object
- `notify` will wake up one random thread that was sleeping; to wake up all threads we can use `notifyAll`
- if we think about it, `lock.lock()` and `lock.unlock()` are like the start and end of a `synchronized` block respectively, `condition.await()` is like `wait()` and `condition.signal()` is like `notify()`
- introducing locks can make our code more error-prone and more subject to deadlocks if coded improperly. however, it makes the code more flexible, e.g. unlike synchronized blocks which have to exist within a single method, locks can be acquired and freed from different methods
- our main objective is to execute instructions as a single hardware operation
- we can achieve this by using Atomic classes provided by java
```java
AtomicInteger count = new AtomicInteger(initialValue);
count.incrementAndGet();
```
- recall how we had discussed that `a = a + 1` actually consists of three individual operations, which have all been condensed down into one atomic operation by these java helper classes
- so, recall the counter example in the shared resource section earlier, and how we had solved it using synchronized. we can now get rid of the `synchronized` and implement it as follows -

```java
public void increment() {
    count.incrementAndGet();
}
```
- the disadvantage of using these classes is of course that only each operation by itself is atomic, a series of such calls together is not atomic, so it may be good only for simpler use cases
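the "series of calls is not atomic" caveat can be seen in a small sketch - a hypothetical transfer between two counters, where each call is atomic but the pair is not:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCompositeSketch {

    public static void main(String[] args) {
        AtomicInteger from = new AtomicInteger(10);
        AtomicInteger to = new AtomicInteger(0);

        // each call below is atomic on its own, but the "transfer" as a whole is not -
        // another thread could observe the state between the two calls
        from.addAndGet(-5);
        to.addAndGet(5);

        System.out.println(from.get() + " " + to.get()); // 5 5
    }
}
```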
- a lot of operations use `compareAndSet` underneath, and we have access to it too. it sets the value to the new value if the current value matches the expected value; otherwise, the old value is retained. it also returns a boolean which is true if the update happened

```java
count.compareAndSet(expectedValue, newValue);
```
- `AtomicReference` can be used for any object type to get and set values in a thread safe i.e. atomic way, and we can use methods like `compareAndSet` on it
- e.g. notice how below, the synchronized keyword is not used for `addSample`, but we still have a thread safe implementation by using `compareAndSet`. note how and why we use a loop - if the old value stays the same before and after calculating the new value, the update succeeds; otherwise, we recalculate based on the "new old value"

```java
class Metric {
    int count = 0;
    int sum = 0;
}

class MetricAtomic {

    AtomicReference<Metric> metric$ = new AtomicReference<>(new Metric());

    public void addSample(int sample) {
        Metric currentMetric;
        Metric newMetric;

        do {
            currentMetric = metric$.get();
            newMetric = new Metric();
            newMetric.count = currentMetric.count + 1;
            newMetric.sum = currentMetric.sum + sample;
        } while (!metric$.compareAndSet(currentMetric, newMetric));
    }
}
```
- we often have a lot of tasks but not so many threads. some objects are not thread safe i.e. cannot be used by multiple threads. however, they can be used by multiple tasks being executed on the same thread. coding this ourselves can be tough, which is why we have `ThreadLocal`, which basically returns a new instance for every thread, and reuses that instance when a thread asks for it again

```java
public static ThreadLocal<Car> car = ThreadLocal.withInitial(() -> new Car());
```
- spring uses this concept via `ContextHolder`s, for instance `RequestContextHolder`, `TransactionContextHolder`, `SecurityContextHolder`, etc. my understanding - since spring follows a one thread per request model, any of the services, classes, etc. that need access to request information can get it easily this way. it is like setting and sharing state for a request
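a minimal sketch of the per-thread isolation described above - the `SimpleDateFormat` use case is a classic one, since `SimpleDateFormat` is not thread safe, while repeated `get()` calls from the same thread return the same instance -

```java
import java.text.SimpleDateFormat;

class ThreadLocalDemo {
    // each thread gets its own SimpleDateFormat via the initializer;
    // within a thread, every get() returns that same instance
    static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            SimpleDateFormat first = FORMAT.get();
            SimpleDateFormat second = FORMAT.get();
            System.out.println(Thread.currentThread().getName()
                    + " same instance: " + (first == second)); // true for each thread
        };

        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```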
High Performance IO
- what is blocking io - io during which the cpu is idle, e.g. when reading from a database etc
- such io bound tasks block the thread till they return the result
- io bound tasks are very common in web applications etc
- how it works internally -
- the controllers like network cards return the response to the dma (direct memory access)
- the dma writes it to the memory
- the dma notifies the cpu that the response is available
- the cpu can now access the memory for variables
- so, during this entire duration, the thread that was processing the request that involved the io task (and thus reaching out to the controller) was sitting idle and thus was blocked
- this is why number of threads = number of cores does not give us the best performance when we have more io bound instead of cpu intensive tasks
- this is why we have a “thread per request model” in spring mvc, which i believe caps at 200 threads to prevent out of memory errors etc
- it has caveats like -
- creating and managing threads are expensive - recall how it has its own stack etc
- number of context switching increases, which too is an expensive operation - recall thrashing
- assume that there are two kinds of calls a web server supports - one that makes a call to an external service and one that calls the database. assume the external service has a performance bug, which makes the first call very slow. this way, if we had for e.g. 150 requests for first call and 150 for the second call (assume 200 is the default thread pool size in embedded tomcat), the 150 instances of the second call would start to be affected because of the 150 instances of the first call now
- so, the newer model used by for e.g. spring web flux is asynchronous and non blocking
- the thread is no longer blocked waiting for the response - a callback is provided which is called once the request is resolved
- so now, we can go back to the thread per core model - which is much more optimal
- there can be problems like callback hell etc, which is solved by using libraries like project reactor for reactive style of programming, which is more declarative to write
Virtual Threads
- till now, the `Thread` class we saw was actually a wrapper around an actual os thread - these are also called platform threads, since they map one to one with os threads
- virtual threads - they are not directly related to os threads. they are managed by the jvm itself
- this makes them much less resource intensive
- the jvm manages a pool of platform threads, and schedules the virtual threads on these platform threads one by one
- once a virtual thread is mounted on a platform thread, that platform thread is called its carrier thread
- if a virtual thread cannot progress, it is unmounted from the platform thread, and the platform thread picks up a new virtual thread
- this way, the number of platform threads stays small and is influenced by the number of cores
- just like in reactive programming, there is no os level context switching overhead - what we are saving on here - frequent context switching of platform (hence os) threads is replaced by much cheaper virtual thread context switching inside the jvm
- creation techniques -
```java
Runnable runnable = () -> System.out.println("from thread: " + Thread.currentThread());

new Thread(runnable).start(); // platform thread (implicit)
// from thread: Thread[#19,Thread-0,5,main]

Thread.ofPlatform().unstarted(runnable).start(); // platform thread (explicit)
// from thread: Thread[#20,Thread-1,5,main]

Thread.ofVirtual().unstarted(runnable).start(); // virtual thread
// from thread: VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1
```
- note - virtual threads are only useful when we have blocking io calls, not when we have cpu intensive operations
- this happens because unlike the usual model where our thread had to sit idle for the blocking call, the platform thread never stops here and is always working, it is the virtual thread that is sitting idle, and hence we optimize our cpu usage because we are using our platform threads optimally
- so, developers still write the usual blocking code, which simplifies coding, as compared to say reactive programming
- underneath, the blocking calls have been refactored for us to make use of virtual threads so that the platform threads are not sitting idle
- e.g. the replacement for cached thread pools is the new virtual thread per task executor - we do not have to create pools of fixed size - we use a thread per task model, and all the complexity is now managed by the jvm for us behind the scenes
- when we are using normal threads for blocking calls e.g. using jpa, the thread cannot be used. what we can do is use context switching to utilize the cpu better. however, this model meant we needed a lot of platform threads, and managing them, context switching between them, etc has a lot of overhead, which is why maybe embedded tomcat for instance had a cap of about 200 threads. now with virtual threads, there is no cap needed, so it can be used via cached thread pool executor equivalent, but here there would never be any out of memory etc issues like in cached thread pool executor, since virtual threads are very lightweight
- some notes -
- virtual threads are always daemon, and making them non daemon will throw an exception
- virtual threads do not have a concept of priority
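both notes can be verified directly - a quick sketch (requires jdk 21+; making a virtual thread non daemon throws `IllegalArgumentException`) -

```java
class VirtualThreadNotes {
    public static void main(String[] args) {
        Thread vt = Thread.ofVirtual().unstarted(() -> {});

        System.out.println(vt.isDaemon()); // true - virtual threads are always daemon

        try {
            vt.setDaemon(false); // not allowed for virtual threads
        } catch (IllegalArgumentException e) {
            System.out.println("cannot make a virtual thread non daemon");
        }
    }
}
```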
Miscellaneous Notes
- io bound threads are prioritized more than computation based threads
- since most of the time of io bound threads is spent in waiting state
- and most of the time of cpu bound threads is spent in computation
- maybe this is related to concepts of starvation etc somehow
- why context switching is expensive - the entire state of the thread has to be saved in memory - all the stack, instruction pointer, local variables inside the method, etc
- thread yield - helps hint to the scheduler that the current thread wants to give up its processor
- can be used by computationally expensive threads to hint the scheduler that they want to give up the processor for another thread
- the priority we set manually only serves as a hint - the os can choose to accept / ignore it
- `thread.start()` is not the same as `thread.run()` - `thread.run()` simply runs the runnable we pass to it inside the calling thread

```java
public static void main(String[] args) {
    Thread thread = new Thread(() -> System.out.println("Hello from " + Thread.currentThread().getName()));
    thread.setName("New Thread");
    thread.run(); // Hello from main
}
```
- Thread.State - an enum, with the following states -
- NEW - created but not yet started
- RUNNABLE - thread is available for execution / already executing on some processor
- BLOCKED - blocked for a monitor / lock
- WAITING - a thread goes into this state after we call `object.wait()` or `some_thread.join()`
    - so, the idea is that the thread now waits for some other thread's action
    - a thread can also go "out" of this state after we call `object.notify()` from elsewhere to wake this thread up
- TIMED_WAITING - same as above but with timeouts. threads on calling `Thread.sleep` also go to this state
- TERMINATED - after the thread has finished execution
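the transitions above can be observed directly - a small sketch (the TIMED_WAITING observation is timing dependent, hence the short sleeps) -

```java
class ThreadStateDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread thread = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        System.out.println(thread.getState()); // NEW - created but not started

        thread.start();
        Thread.sleep(50); // give the thread time to reach the sleep call
        System.out.println(thread.getState()); // TIMED_WAITING (timing dependent)

        thread.join();
        System.out.println(thread.getState()); // TERMINATED
    }
}
```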
- when we override the `start` method, we need to call `super.start()`
- when we say `Thread.sleep(x)`, first the thread goes into TIMED_WAITING state. after the time elapses and it goes back into RUNNABLE state, there is no guarantee that the thread will immediately be scheduled on a core - a core might be occupied by some other thread
- an `IllegalMonitorStateException` is thrown if we try to call `await` / `signal` on a `Condition` without locking the `lock` first
Generics
- what is generics -
- helps extend java’s type system - types now start acting like parameters that we as clients can provide
- to allow a type or method to operate on objects of various types to thus allow reusability. e.g. without generics, we would use overloading, which causes a lot of duplication of logic -
```java
class OverloadingProblem {
    public static Double add(Double a, Double b) {
        return a + b;
    }

    public static Integer add(Integer a, Integer b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(1, 5));
        System.out.println(add(1.2, 5.3));
    }
}
```
- while providing compile time safety - e.g. without using generics, we would use type casting, where an invalid cast only fails at runtime, and catching errors at compile time > catching them at runtime -
```java
class TypeCastingProblem {
    private static Object item = null;

    public static void setItem(Object item) {
        TypeCastingProblem.item = item;
    }

    public static Object getItem() {
        return item;
    }

    public static void main(String[] args) {
        setItem(1.4);
        Integer item = (Integer) getItem(); // compiles, but throws ClassCastException at runtime
    }
}
```
- we use the diamond operator for generics
```java
class Pair<K, V> {
    private K key;
    private V value;

    public Pair(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public String toString() {
        return "{ " + key + ": " + value + " }";
    }
}

class GenericExample {
    public static void main(String[] args) {
        Pair<String, Integer> score = new Pair<>("maths", 85);
        System.out.println(score);
    }
}
```
- generic method - my understanding - this is useful when the class itself is not generic / maybe method and class generics do not mean the same thing, so we can for e.g. use `T` for the class and `V` for the method

```java
class GenericMethod {
    public static <T> void printer(T arg) {
        System.out.println("value is: " + arg);
    }

    public static void main(String[] args) {
        printer(1);
        printer("hello");
    }
}
```
- while the return type above is void, we could have for e.g. returned `T` etc as well
- bounded generic types - bound the types that are allowed to be used, to get access to the additional functionality present in the types used in these bounds, e.g. only allow `Number` and its subclasses to be used for a generic class containing mathematical utilities
- we use the `extends` keyword to achieve bounded generic types, and the target type should be a subclass of the interface / class mentioned in this clause

```java
public static <T extends Comparable<T>> T calculateMin(T a, T b) {
    return (a.compareTo(b) < 0) ? a : b;
}
```
- e.g. `copy` is implemented as follows - my understanding - this is to help make use of dynamic polymorphism + bounded types. note - just because we can do `superclass_reference = subclass_reference`, does not mean we can do `List<superclass_reference> = List<subclass_reference>`

```java
public static <T> void copy(List<? super T> dest, List<? extends T> src)
```
- we can also specify multiple bounds using `&`
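a minimal sketch of multiple bounds - the hypothetical `min` below requires its argument type to satisfy both bounds (a class bound, if present, must come first in the list) -

```java
import java.io.Serializable;

class MultipleBounds {
    // T must be Comparable to itself as well as Serializable
    public static <T extends Comparable<T> & Serializable> T min(T a, T b) {
        return (a.compareTo(b) < 0) ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(min(3, 7));     // 3 - Integer satisfies both bounds
        System.out.println(min("b", "a")); // a - so does String
    }
}
```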
- type inference - determine the types automatically. some examples -
    - java can automatically guess "the most specific type" that both `String` and `ArrayList` can work with - `Serializable`

```java
import java.io.Serializable;
import java.util.ArrayList;

class TypeInference {
    public static <T> T getFirst(T a, T b) {
        return a;
    }

    public static void main(String[] args) {
        Serializable result = getFirst("hello world", new ArrayList<Integer>());
    }
}
```
    - we use `List<String> list = new ArrayList<>();` and not `new ArrayList<String>()`
    - we use `list.add("name")` and not `list.<String>add("name")`
- note - just because `Number` and `Integer` are related via inheritance, it does not mean `List<Number>` and `List<Integer>` are somehow related as well - this is the motivation behind wildcards
```java
import java.util.List;

class Wildcards {
    private static void print(List<Object> list) {
        list.forEach(System.out::println);
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 3);
        print(list);
    }
}

// error: incompatible types: List<Integer> cannot be converted to List<Object>
// solution - notice the use of ?
// private static void print(List<?> list) { ...
```
- upper bounded wildcards - when we use `?` and `extends`, e.g. allow all lists where the type of element is a subclass of the class specified in the generic method signature
- drawback - e.g. while we can print all elements of the list easily, we cannot add an element to the list - e.g. the list is actually of integer, and we might be trying to add a double to it. since java cannot rule this problem out, it gives a compile time error
- e.g. this works perfectly
```java
private static void printList(List<? extends Number> numbers) {
    numbers.forEach(System.out::println);
}

public static void main(String[] args) {
    printList(List.of(1, 2, 3));
    printList(List.of(1.1, 2.2, 3.3));
}
```
- however, if we add the below to the printList method -
```java
private static void printList(List<? extends Number> numbers) {
    numbers.forEach(System.out::println);
    numbers.add(7);
}
```
- we get the error below -
```
BoundedWildCardsExtends.java:7: error: incompatible types: int cannot be converted to CAP#1
        numbers.add(7);
                    ^
  where CAP#1 is a fresh type-variable:
    CAP#1 extends Number from capture of ? extends Number
```
- lower bounded wildcards - when we use `?` and `super`, e.g. allow all lists where the type of element is a superclass of the class specified in the generic method signature
- so now, since java knows that the list passed in has elements of a supertype of the specified type, we can add elements of that type to the list (dynamic polymorphism)
- drawback - we cannot read from the list - we have to treat the elements as type `Object`

```java
public static void addToList(List<? super Double> list) {
    list.add(1.4);
}

public static void main(String[] args) {
    List<Object> list = new ArrayList<>();
    list.add(1);
    list.add("shameek");
    addToList(list);
    System.out.println(list);
}
```
- use case of wildcards + bounded types - copy elements from one list to another -
```java
public <T> void copy(List<? extends T> source, List<? super T> destination) {
    source.forEach(destination::add);
}
```
- so, we should -
- use “lower bounded wildcards” when we want to perform some kind of mutation
- use “upper bounded wildcards” when we want to read values
- use “type parameters” when we want to do both reading and writing
- one difference between "type parameters" and "wildcards" is that type parameters allow for multiple bounds unlike wildcards, e.g. the following is valid - `<T extends XYZ & PQR>`
- rule of thumb? - use wildcards when possible, when not possible (e.g. we want to influence return type based on arguments), then use type parameters
- type erasure - java replaces all generic types we define with either Object, or the bound if a bound is specified
- as a part of this, java might introduce type casts etc as well
- e.g. the code below -
```java
List<Integer> list = new ArrayList<>();
list.add(1);
Integer ele = list.get(0);

class Store<T extends Serializable> {
    T item;
}
```
- is converted to this code due to type erasure -
```java
List list = new ArrayList();
list.add(1);
Integer ele = (Integer) list.get(0);

class Store {
    Serializable item;
}
```
Collections
List
- `ArrayList` allows for control over ordering of elements
- all items are identified by an index
- items are located right next to each other in ram, thus making random access via index o(1)
- searching for items based on value is however o(n)
- adding items at the end is o(1)
- adding items at random positions is o(n), since it requires shifting of items by one position
- same logic is applicable for removal of items - o(1) for removing items from the end and o(n) for removing items from arbitrary positions
- size of array lists in java can change dynamically - once the allocated memory runs out, a new backing array with double the capacity is provisioned, and all the items from the current array are copied over to it
- however, this ability to resize dynamically comes at a price - it takes o(n) time for this resize + copying over of items to the new location to happen
- however, when instantiating, we can provide the initial capacity, so that this resizing does not have to happen often
- disadvantage of array lists - when removing / adding items at random positions, a lot of shifting is needed to maintain the contiguous nature
- this problem is not there when using `LinkedList`
- since in linked lists, there is only a pointer to the next element that needs to be maintained
- disadvantage - linked lists don’t allow random access with given index at o(1) time
- note - linked list in java is optimized -
- implemented as a doubly linked list, which allows traversal in both directions
- maintains pointers to both head and tail - e.g. we can use both `addFirst` and `addLast` in o(1) time
- linked list vs array list performance for adding elements at the beginning -
```java
import java.util.List;
import java.util.ArrayList;
import java.util.LinkedList;

class ListPerformance {
    public static void main(String[] args) {
        perform("linked list", new LinkedList<>());
        perform("array list", new ArrayList<>());
    }

    private static void perform(String type, List<Integer> list) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 500000; i++) {
            list.add(0, i);
        }
        long end = System.currentTimeMillis();
        System.out.println("time taken by " + type + ": " + (end - start) + "ms");
    }
}
```
- output -
```
time taken by linked list: 75ms
time taken by array list: 43375ms
```
- note - while we compared linked list to array lists above, as discussed later, if removing or adding to one of the ends, the most performant option we have is array deque, not stacks, not linked lists
- vector - synchronized implementation of array list i.e. all operations like add etc will do acquiring and releasing of lock
- generally, doing this using our own locks might be better, since we get more flexibility, e.g. batch multiple operations under one acquiring + releasing of lock
- stack - lifo structure (last in first out)
- important operations include `push`, `pop` and `peek`
- note - stacks use vectors underneath, so they are inherently synchronized
```java
Stack<String> stack = new Stack<>();
stack.push("jane");
stack.push("jackson");
System.out.println(stack); // [jane, jackson]
System.out.println(stack.pop()); // jackson
```
- to avoid using synchronized version, we can use array dequeue instead
Queues
- queues - fifo structure (first in first out)
- important operations include `add` (enqueue), `remove` (dequeue) and `peek` (retrieve but not remove the head of the queue)
- queues are abstract just like stacks - below, the queue is implemented using a linked list
```java
Queue<String> queue = new LinkedList<>();
queue.add("jane");
queue.add("jackson");
System.out.println(queue); // [jane, jackson]
System.out.println(queue.remove()); // jane
```
- priority queue - objects being stored inside a priority queue should implement the `Comparable` interface
- this helps retrieve items from the structure in the order of their priority
- deque - double ended queue - o(1) for operating on either side of the collection. it is implemented by array deque, and just like normal queues, we can implement it using linked lists instead as well
- note - java calls it deque and not dequeue

```java
Deque<String> dequeOne = new LinkedList<>();
Deque<String> dequeTwo = new ArrayDeque<>();
```
- my doubt about performance - based on the fact that array dequeue might be using an array underneath, doing the typical “capacity resizing” that we discussed, would we have an even more performant solution if we were to use linked list? be it for implementing stacks or queues, logically performance of linked lists > array dequeues (dynamic capacity resizing issue) > stacks (synchronization issue)
- based on this answer, apparently not, because the main overhead that comes with linked lists is the extra creation of that node, garbage collection of that node, etc
- so, it is probably safe to conclude that in java, when we are looking for stack or queue implementation, we should use array dequeues almost always (over the stack since it is synchronized, and linked lists since it has memory overhead?)
- also, do not use array lists blindly - if we just have to remove and add elements to either ends, and do not need random access, array dequeues might be better than array lists (inserting at beginning of array list is o(n) and inserting at beginning of array deque is o(1))
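a quick sketch of `ArrayDeque` doubling as both structures, per the conclusion above -

```java
import java.util.ArrayDeque;

class ArrayDequeDemo {
    public static void main(String[] args) {
        // as a stack - push / pop operate on the head
        ArrayDeque<String> stack = new ArrayDeque<>();
        stack.push("jane");
        stack.push("jackson");
        System.out.println(stack.pop()); // jackson - last in, first out

        // as a queue - add at the tail, remove from the head
        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.add("jane");
        queue.add("jackson");
        System.out.println(queue.remove()); // jane - first in, first out
    }
}
```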
Maps
- key value pairs
- also called associative arrays
- with maps, we ensure times of o(1) for adding, removing and lookup
- hash based maps are unordered / do not support sorting
- the idea is that since keys in a map are unique, we transform the keys into an index between 0 to length - 1 of array using a hash function. then, accessing elements via the given key becomes o(1) - we just need to translate the key into an index using the hash function, and random access of elements in an array is an o(1) operation
- the hash function should be able to handle the type of key - e.g. if the key is an integer, using modulo operator with the length of array is enough, if the key is a string then ascii value of characters can be used and so on
- collision in hash tables - the hash function we use results in the same value for multiple keys
- overwrite - replace current value with new incoming value
- chaining - each bucket in the hash table can store a linked list. worst case scenario - all keys evaluate to the same value, so the entire map is just a single big linked list stored in one bucket, thus resulting in an o(n) complexity instead of o(1)
- open addressing -
- linear probing - try finding the next available empty slot - k + 1, k + 2, k + 3, … disadvantage - clusters are formed i.e. elements with same hash are clustered together
- quadratic probing - try finding the next available empty slot using a quadratic polynomial - k + 1, k + 4, k + 9, k + 16, …
- rehashing - perform another hashing on the key till an empty slot is found - h(h(h….(x)))
- so actually, the worst case in hash tables for all operations - insertions, deletions and lookups - is o(n)
- load factor - n / m, where n = number of items in the hash table and m = size of the array. if it is close to 1, the probability of collision will increase
- so, we can also do dynamic resizing of hash tables. disadvantage - this resizing is an o(n) operation
- in java, for `HashMap`, when the load factor reaches around 0.75, the dynamic resizing happens
- however, hash maps cannot be used in multithreaded scenarios, since they are not synchronized
- some important methods available in maps - `keySet()`, `entrySet()`, `values()`
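a small sketch of these accessors (the class and contents are made up for illustration; `entrySet` iteration is the idiomatic way to walk a map) -

```java
import java.util.HashMap;
import java.util.Map;

class MapMethodsDemo {
    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("maths", 85);
        scores.put("physics", 90);

        System.out.println(scores.keySet()); // set of keys (order not guaranteed)
        System.out.println(scores.values()); // collection of values

        // entrySet gives us the key value pairs together
        for (Map.Entry<String, Integer> entry : scores.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```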
- auto generated hash code example - look at how a prime number is used to generate a function with less collision chances
```java
class Person {
    private Integer age;
    private String name;

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((age == null) ? 0 : age.hashCode());
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        return result;
    }
}
```
- note - the `equals` needs to be overridden as well. it might happen that due to chaining discussed earlier, multiple items end up in the same bucket of the hash table. at that point, java needs to be able to differentiate between two different elements with the same hash
- so basically, java uses both chaining and dynamic resizing based on load factor by the looks of it
- `LinkedHashMap` vs `HashMap` - linked hash maps use a doubly linked list underneath to track the "order of insertion", so the keys are basically ordered according to insertion time

```java
Map<String, Integer> hashMap = new HashMap<>();
hashMap.put("aaa", 1);
hashMap.put("bbb", 2);
hashMap.put("ccc", 3);
System.out.println(hashMap); // {aaa=1, ccc=3, bbb=2}

Map<String, Integer> linkedHashMap = new LinkedHashMap<>();
linkedHashMap.put("aaa", 1);
linkedHashMap.put("bbb", 2);
linkedHashMap.put("ccc", 3);
System.out.println(linkedHashMap); // {aaa=1, bbb=2, ccc=3}
```
- balanced bst (binary search trees) - red black trees and avl trees
- tree rotations are used to maintain this structure
- tree maps use red black trees unlike in hash maps, where an array like structure is used
- so, the keys are stored in sorted order in tree maps. notice how it is done automatically for us below -

```java
Map<String, Integer> treeMap = new TreeMap<>();
treeMap.put("ccc", 3);
treeMap.put("bbb", 2);
treeMap.put("aaa", 1);
System.out.println(treeMap); // {aaa=1, bbb=2, ccc=3}
```
- because it uses trees, operations have a guaranteed complexity of o(log n) in tree maps, whereas operations have mostly o(1) but sometimes o(n) complexity in case of hash maps
- my understanding - since a bst is being used, concept of collision, load factor, etc do not exist in tree maps unlike in hash maps
- so, for huge workloads, while we might have to consider tuning the load factor in case of a hash map, we do not have to think about it in case of a tree map
- note - in newer versions, hash maps do not always use linked lists (chaining) for each bucket - once a bucket grows past a threshold, it is converted to a red black tree. this further optimizes hash maps
- because of the very nature - using a red black tree per bucket, using an array to store the multiple keys, etc - memory required by hash maps > tree maps
- but remember, reducing time > reducing memory with cloud etc
Sets
- they allow no duplicates
- hash sets and hash maps work in the same way - a one dimensional array is used to store the elements by performing a hash on the element
- some important functions - `add`, `remove`, `retainAll` (calling `set2.retainAll(set1)` will retain all the elements of set2 that are present in set1, and remove the other elements from set2)
will retain all the elements in the set2 present in set1, and remove other elements from set2) - so, operations are mostly are o(1) but can be o(log n) in worst case / o(n) when dynamic resizing is needed
- again, linked hash sets are to hash sets what linked hash maps are to hash maps - the insertion order is maintained with the help of an additional doubly linked list
- finally, tree sets are the same idea as tree maps - they maintain elements in a sorted order using a red black tree underneath, thus making operations o(log n) in general
- tree sets come with their own additional methods - e.g. `subSet(a, b)` will give us a new set with all values of the set present between a and b, `first` for getting the first element, etc
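a small sketch of these methods (note that for `subSet(a, b)`, `a` is inclusive and `b` is exclusive) -

```java
import java.util.TreeSet;

class TreeSetDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(30);
        set.add(10);
        set.add(20);
        set.add(40);

        System.out.println(set);                // [10, 20, 30, 40] - sorted automatically
        System.out.println(set.first());        // 10
        System.out.println(set.subSet(15, 35)); // [20, 30] - from inclusive, to exclusive
    }
}
```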
Sorting
- sort - notice “reverse order” below -
```java
List<Integer> list = new ArrayList<>();
list.add(3);
list.add(2);
list.add(1);
list.add(4);
list.add(5);
System.out.println(list); // [3, 2, 1, 4, 5]

Collections.sort(list);
System.out.println(list); // [1, 2, 3, 4, 5]

Collections.sort(list, Collections.reverseOrder());
System.out.println(list); // [5, 4, 3, 2, 1]
```
- we can implement `Comparable` on our custom classes to be able to sort them directly -

```java
class Person implements Comparable<Person> {
    String name;
    Integer age;

    Person(String name, Integer age) {
        this.name = name;
        this.age = age;
    }

    public int compareTo(Person person) {
        Integer nameDiff = name.compareTo(person.name);
        Integer ageDiff = age.compareTo(person.age);
        return ageDiff != 0 ? ageDiff : nameDiff;
    }

    public String toString() {
        return "Person(name=" + name + ", age=" + age + ")";
    }
}

class CustomSortComparable {
    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("ayan", 25));
        people.add(new Person("ruth", 5));
        people.add(new Person("jack", 25));
        people.add(new Person("jane", 25));
        people.add(new Person("mike", 20));

        System.out.println(people);
        // [Person(name=ayan, age=25), Person(name=ruth, age=5), Person(name=jack, age=25), Person(name=jane, age=25), Person(name=mike, age=20)]

        Collections.sort(people);
        System.out.println(people);
        // [Person(name=ruth, age=5), Person(name=mike, age=20), Person(name=ayan, age=25), Person(name=jack, age=25), Person(name=jane, age=25)]
    }
}
```
- `Comparator` use cases -
    - we want to sort using multiple techniques. `compareTo` can only have one implementation, therefore it lacks flexibility
    - we want to sort a class not in our control i.e. we cannot change the class to make it implement `Comparable`
    - also helps achieve separation of concerns

```java
class PersonAgeComparator implements Comparator<Person> {

    @Override
    public int compare(Person person1, Person person2) {
        return person2.age.compareTo(person1.age);
    }
}

Collections.sort(people, new PersonAgeComparator());
System.out.println(people);
// [Person(name=ayan, age=25), Person(name=jack, age=25), Person(name=jane, age=25), Person(name=mike, age=20), Person(name=ruth, age=5)]

Collections.sort(people, new PersonAgeComparator().reversed());
System.out.println(people);
// [Person(name=ruth, age=5), Person(name=mike, age=20), Person(name=ayan, age=25), Person(name=jack, age=25), Person(name=jane, age=25)]
```
- using lambdas - for a more functional style, we can use the following syntax as well 🤯 -
```java
Collections.sort(
    people,
    Comparator.comparing(Person::getAge).reversed().thenComparing(Person::getName)
);
System.out.println(people);
// [Person(name=ayan, age=25), Person(name=jack, age=25), Person(name=jane, age=25), Person(name=mike, age=20), Person(name=ruth, age=5)]
```
Miscellaneous
- some methods, refer docs for more -
```java
List<Integer> list = new ArrayList<>();
list.add(5);
list.add(1);
list.add(2);
list.add(4);
list.add(3);
System.out.println("original list = " + list); // original list = [5, 1, 2, 4, 3]

Collections.shuffle(list);
System.out.println("shuffled list = " + list); // shuffled list = [3, 1, 5, 4, 2]

Collections.reverse(list);
System.out.println("reversed list = " + list); // reversed list = [2, 4, 5, 1, 3]

System.out.println("min = " + Collections.min(list) + ", max = " + Collections.max(list)); // min = 1, max = 5
```
- since collections are pass by reference, make collections unmodifiable so that clients cannot mutate our collections
```java
List<Integer> unmodifiableList = Collections.unmodifiableList(list);
unmodifiableList.add(-1);
// Exception in thread "main" java.lang.UnsupportedOperationException
//     at java.base/java.util.Collections$UnmodifiableCollection.add(Collections.java:1091)
//     at MiscellaneousMethods.main(MiscellaneousMethods.java:20)
```
- if we want to obtain a synchronized version of the normal collections, we can use `List<Integer> synchronizedList = Collections.synchronizedList(normalList)`
  - drawback - coarse grained locking is used; effectively, all methods now use the `synchronized` keyword
  - so, a better solution is to use the concurrent collections that java provides, e.g. `ConcurrentHashMap`
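a small sketch contrasting the two approaches - both are thread-safe, but the synchronized wrapper locks the whole collection on every call, while `ConcurrentHashMap` uses finer-grained internal locking (the class name is hypothetical) -

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SynchronizedVsConcurrent {

    public static void main(String[] args) throws InterruptedException {
        List<Integer> syncList = Collections.synchronizedList(new ArrayList<>());
        Map<String, Integer> counts = new ConcurrentHashMap<>();

        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                syncList.add(i);                       // locks the entire list on every call
                counts.merge("hits", 1, Integer::sum); // atomic, finer-grained locking
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(syncList.size());    // 2000
        System.out.println(counts.get("hits")); // 2000
    }
}
```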
Maven
- maven is a build tool for java
- other alternatives are gradle, ant, etc
- build - process of building source code into artifacts that can be run
- maven has various plugins -
- jar plugin to create jars
- compiler plugin to help compile code
- surefire plugin to execute tests
- a plugin has various goals. goals represent a unit of work
- to examine a plugin, we can use the following command -

```shell
mvn help:describe -Dplugin=org.apache.maven.plugins:maven-compiler-plugin
```
- maven coordinates -
- group id - company / department name. domain name in reverse order is the convention
- artifact id - project name
- version - the version of the project
- packaging - there are two types of packaging - jar (mostly used nowadays and the default) and war (web application archive)
- classifier - e.g. we want to build for different versions of java but use the same pom. so, we can use classifiers like `jdk8` and `jdk11`. these then get appended to the version, so people can import the right dependency
- out of these, the gav (group id, artifact id and version) help us uniquely identify the project
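a minimal sketch of how these coordinates appear in a pom.xml (the group id, artifact id and version here are hypothetical) -

```xml
<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example.payments</groupId>   <!-- group id - domain name in reverse -->
    <artifactId>payment-service</artifactId>  <!-- artifact id - project name -->
    <version>1.2.0</version>
    <packaging>jar</packaging>                <!-- the default; could also be war -->
</project>
```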
- to use these libraries in our projects, we use repositories
- there are two kinds of repositories - the local repository and remote repositories
- basically, maven downloads from remote repositories and puts it into our local repository
- then, our projects running locally can use dependencies downloaded in this local repository
- default location for local repository is ~/.m2/repository
- default url for remote repository is https://repo1.maven.org/maven2/ (called maven central)
- we can configure remote repositories via settings.xml - so that we can use our own remote repository - use case - companies maintain their own remote repository, which is a mirror of maven central
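e.g. such a company mirror can be configured in settings.xml roughly like this (the id, name and url below are hypothetical) -

```xml
<settings>
    <mirrors>
        <mirror>
            <id>company-central</id>
            <name>Company mirror of maven central</name>
            <url>https://nexus.example.com/repository/maven-central/</url>
            <mirrorOf>central</mirrorOf>
        </mirror>
    </mirrors>
</settings>
```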
Plugin Management
- lifecycle has phases
- a phase has multiple goals attached to it
- if a phase does not have any goals attached to it, it would not be executed
- e.g. the clean lifecycle has three phases - pre-clean, clean and post-clean
- only the clean phase of the clean lifecycle is attached to a goal
- it is attached to the clean goal of maven-clean-plugin plugin 🤯
- when we say `mvn clean`, we are actually instructing maven to run the clean phase
- when we run a phase, all the phases before it in the lifecycle are executed as well - in this case, pre-clean would be executed first (if it has goals attached to it, which it does not by default), followed by the clean phase itself
- we just discussed that we typically invoke `mvn <<phase>>`, which runs all the goals attached to every phase of the lifecycle up to and including the specified phase. however, we can also invoke a particular goal directly, using the following syntax variations -
  - `mvn plugin_group_id:plugin_artifact_id:plugin_version:goal`
  - `mvn plugin_group_id:plugin_artifact_id:goal`
  - `mvn plugin_prefix:goal`
  - `mvn plugin_prefix:goal@execution_id`
- while executions help us tie goals to phases, we can also invoke these executions directly
```shell
mvn org.apache.maven.plugins:maven-clean-plugin:2.5:clean
mvn org.apache.maven.plugins:maven-clean-plugin:clean
mvn clean:clean
```
- there are two kinds of plugins -
- reporting plugins - run during site generation
- build plugins - run to help build the project
- below - we try to tie the run goal of maven-antrun-plugin to pre-clean and post-clean phases -
```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-antrun-plugin</artifactId>
    <version>3.0.0</version>
    <executions>
        <execution>
            <id>1</id>
            <phase>pre-clean</phase>
            <goals>
                <goal>run</goal>
            </goals>
            <configuration>
                <target>
                    <echo level="info">Learning Maven: pre-clean</echo>
                </target>
            </configuration>
        </execution>
        <execution>
            <id>2</id>
            <phase>post-clean</phase>
            <goals>
                <goal>run</goal>
            </goals>
            <configuration>
                <target>
                    <echo level="info">Learning Maven: post-clean</echo>
                </target>
            </configuration>
        </execution>
    </executions>
    <configuration>
        <target>
            <echo level="info">Learning Maven: standalone invoking</echo>
        </target>
    </configuration>
</plugin>
```
- so, now when we run post-clean phase, all three phases - pre-clean, clean and post-clean would be run
- configuring a plugin
- a plugin can have multiple execution blocks. each execution block specifies -
- what goal to run
- what phase to tie this goal to
- configuration for the goal
- a configuration element can be specified at the root of the plugin as well. the earlier point was about specifying multiple execution blocks, which helped us tie goals to phases; this point is about specifying configuration in the root block of the plugin, which applies to all executions. this can be useful when we invoke the plugin:goal directly
- dependencies - if a plugin has dependencies, we can for e.g. specify the version of that dependency using this block
- inherited - by default, the plugin configuration is inherited by the children. we can disable this behavior by setting inherited to false
- id should be unique across all executions for a plugin (not across plugins)
- apart from clean, the two other lifecycles are default and site
- the goals that are triggered for the default lifecycle depend on the packaging type (recall that packaging is part of the maven coordinates and can be e.g. jar, war or pom). for jar, this is the table -

  | phase | plugin:goal |
  | --- | --- |
  | process-resources | resources:resources |
  | compile | compiler:compile |
  | process-test-resources | resources:testResources |
  | test-compile | compiler:testCompile |
  | test | surefire:test |
  | package | jar:jar |
  | install | install:install |
  | deploy | deploy:deploy |

- when we specify dependencies in the dependency management section of the parent, child projects can get these dependencies if they want to, but do not get them unless added explicitly. plugin management works the same way - children inherit all the configuration related to a plugin specified in the plugin management section of the parent, but only if the plugin is added explicitly
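a sketch of how this plugin management inheritance might look - the parent pins the version and configuration, and the child opts in by declaring just the plugin (the plugin version and release number here are assumptions) -

```xml
<!-- parent pom - declares configuration, but does not activate the plugin -->
<build>
    <pluginManagement>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.11.0</version>
                <configuration>
                    <release>17</release>
                </configuration>
            </plugin>
        </plugins>
    </pluginManagement>
</build>

<!-- child pom - opts in; version and configuration are inherited -->
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
        </plugin>
    </plugins>
</build>
```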
- extra - executing scripts using exec maven plugin! -
```xml
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <version>3.1.1</version>
    <executions>
        <execution>
            <id>Renaming build artifacts</id>
            <phase>package</phase>
            <goals>
                <goal>exec</goal>
            </goals>
            <configuration>
                <executable>bash</executable>
                <commandlineArgs>handleResultJars.sh</commandlineArgs>
            </configuration>
        </execution>
    </executions>
</plugin>
```
Inheritance and Aggregation
- `<modelVersion>` helps determine the xsd (xml schema definition) version to use, i.e. what elements are allowed in the pom file, how they should be configured, etc
- multiple levels of inheritance are supported in pom
- all pom (directly or indirectly) inherit from the super pom
- this inheritance helps us extract out common functionality around plugins, plugin configuration, dependencies, etc to a parent pom from which all other projects can inherit
- we can print the effective pom like so - `mvn help:effective-pom`
- my understanding - the parent might be managed separately -
- parent would be downloaded from the remote repository into the local repository, post which it can be used
- for development purpose - build the parent, which will install it in the local repository, and then build the child
- the parent might be managed in the same project, in which case we can provide the `relativePath`. understand that this way, we do not have to build the parent project separately like above
- also, the packaging type in the parent should be specified as `pom` instead of relying on the default value i.e. `jar`
- till now, we discussed inheritance. we can also use aggregation in maven
- use case - when we run a phase e.g. `mvn clean`, `mvn install`, etc., it gets run for all the child projects as well
- not only that - in aggregator projects, if the child projects depend on each other, maven can automatically determine the right order to build them in for us
- we can also use the same pom for both aggregation and inheritance
- notes about versions -
- version property of parent gets inherited by the children as well
- for specifying the version of the parent in the child, we use `${revision}`
- for specifying interdependencies between children, we use `${project.version}`
- based on everything above, a simple multi module setup -
- parent -
```xml
<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.apache.maven.ci</groupId>
    <artifactId>ci-parent</artifactId>
    <version>${revision}</version>
    <!-- parent / aggregator projects must use pom packaging -->
    <packaging>pom</packaging>

    <properties>
        <revision>1.0.0-SNAPSHOT</revision>
    </properties>

    <modules>
        <module>child1</module>
        <module>child2</module>
    </modules>
</project>
```
- child -
```xml
<project>
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.apache.maven.ci</groupId>
        <artifactId>ci-parent</artifactId>
        <version>${revision}</version>
        <relativePath>../pom.xml</relativePath>
    </parent>

    <groupId>org.apache.maven.ci</groupId>
    <artifactId>ci-child</artifactId>

    <dependencies>
        <dependency>
            <groupId>org.apache.maven.ci</groupId>
            <artifactId>child2</artifactId>
            <version>${project.version}</version>
        </dependency>
    </dependencies>
</project>
```
Dependency Management
- we can specify a range of versions using `[3.8, 4.0)` (`[` / `]` for inclusive, `(` / `)` for exclusive)
- version format - `<<major-version>>.<<minor-version>>.<<patch>>-<<qualifier>>`
- the qualifier `SNAPSHOT` is used for unstable projects, which can change frequently
- this way, if we depend on a project with snapshot in its version, we always get access to the latest code
- my understanding - if for e.g. we do not use snapshot -
- if the local repository already has an existing copy, maven would not bother refetching it from the remote repository to refresh the local copy
- probably sometimes the remote repository also would not allow pushing artifacts again against the same version
- behind the scenes, this SNAPSHOT is converted to a timestamp automatically for us - so, `x.y.z-SNAPSHOT` basically becomes `x.y.z-timestamp`, and this way, maven always tries pulling the latest version for us
- maven is able to handle transitive dependencies for us - if our project depends on jar a, which in turn depends on jar b, maven downloads both jar a and jar b automatically when building the project
- classpath - location of classes and packages that our project is dependent on
- the different dependency scopes -
- compile - include dependency in all classpaths. the default if `scope` is not specified explicitly
- test - only required for compiling and executing tests; not required when the application executes, and therefore need not be included when packaging artifacts
- runtime - include dependency when the project executes or tests are being run, but not when compiling, e.g. a jdbc driver like mysql connector. use case - we as developers cannot mistakenly compile against classes from these libraries
- provided - dependencies provided by the environment. e.g. we are developing a web application, we would need to depend on the servlet api to compile, but we would not want to include this in the war file, since it would be provided to us by the servlet container
- system - like provided, but the path when compiling is specified manually
```xml
<dependency>
    <groupId>io.datajek</groupId>
    <artifactId>some-dependency</artifactId>
    <version>1.0</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/libs/dep-1.0.jar</systemPath>
</dependency>
```
- import - when a dependency is of type `pom` and has the scope `import`, it is replaced by the dependencies declared in its `dependencyManagement` section
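a sketch of the last two scopes - a servlet api dependency that the container provides, and a bom imported via the import scope (the artifact versions below are assumptions) -

```xml
<!-- provided - needed to compile, but the servlet container supplies it at runtime -->
<dependency>
    <groupId>jakarta.servlet</groupId>
    <artifactId>jakarta.servlet-api</artifactId>
    <version>6.0.0</version>
    <scope>provided</scope>
</dependency>

<!-- import - pulls in the dependencyManagement section of this pom-type artifact -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-dependencies</artifactId>
            <version>3.2.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```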
- dependency mediation - choosing what version of dependency to use
- default behavior -
- e.g. our project depends on A and B. A depends on D which again depends on E (version x). B directly depends on E (version y). our project would use E (version y), because if we imagine dependencies like a tree, E (version y) is the closest to root
- e.g. our project depends on B and A (B comes first in pom.xml). B depends on E (version x), while A depends on E (version y). our project would use E (version x), because B comes first
- so, one technique based on the above - if we would like to invariably use version x of E, place version x of dependency E as early as possible and directly inside the pom. this way, we always end up using version x of E
- when adding a dependency, if we use the `exclusions` tag along with it, the dependencies specified inside it are excluded from the dependency tree -

```xml
<dependency>
    <groupId>io.datajek</groupId>
    <artifactId>project9-projectb</artifactId>
    <version>1</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```
- this means that we should either expect gson to come as a transitive dependency from another project, or include gson manually inside our pom as another dependency, etc
- let's say our project name is xyz, and we mark a dependency in our pom as optional
- it excludes this dependency from being added as a transitive dependency in any project that has xyz itself as a dependency
- dependency management section - this way, all the projects in e.g. a team can specify, in one place, the versions of dependencies that work well with each other, and this gets inherited by all the child projects
- example -
- if a parent has the following section -
```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>2.8.6</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```
- the child can skip the version of gson when adding it as a dependency
```xml
<dependencies>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
    </dependency>
</dependencies>
```
- if a parent has the following section -
- another use case of dependency management section 🤯 - helps with transitive dependencies as well - e.g. if our project has a dependency A, which depends on C (version x), and has a dependency B, which again depends on C (version y). if we add the dependency C (version z) in the dependency management section, version z of dependency is the one that maven uses!
- note - we could also have included dependency C (version z) directly in the dependency section to force maven to use version z (default behavior - closest to the root wins). however, if another project then added our project as a dependency, it would get dependency C (version z) on its classpath even if it was not using C directly. this problem does not arise if we add dependency C (version z) to the dependency management section as described earlier
Build Portability
- build portability - having consistent ways to build across environments, machines, teams, etc
- variables - variables defined in parent are inherited by children
- however, children can override these variables
- pom values can be accessed using `project`, e.g. `${project.version}`, `${project.build.sourceDirectory}`, etc. the root element of our pom is `project`, so that is where these variables come from. another very useful one i found - `${project.parent.basedir}`, if e.g. a child project wants to access something from the parent directory
- whatever we define in the properties section can be accessed using the name of the property directly, e.g. `${MyString}`
- java system properties (what we access using `java.lang.System.getProperties()`) can be accessed using `java`, e.g. `${java.home}`
- environment variables can be accessed using `env`, e.g. `${env.PATH}`
- variables in settings.xml can be accessed using `settings`, e.g. `${settings.offline}`
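for instance, a property defined once can be referenced anywhere in the pom (the property name and versions below are hypothetical) -

```xml
<properties>
    <junit.version>5.10.0</junit.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter</artifactId>
        <version>${junit.version}</version>
        <scope>test</scope>
    </dependency>
</dependencies>
```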
- profiles - alternative configuration for overriding default values
- we can specify profiles either in the project specific pom.xml, or in settings.xml, which itself can be machine / project specific
```xml
<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>io.datajek</groupId>
    <artifactId>project14</artifactId>
    <version>1</version>
    <name>Project14</name>

    <profiles>
        <profile>
            <id>test</id>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>exec-maven-plugin</artifactId>
                        <executions>
                            <execution>
                                <id>my-special-exec</id>
                                <configuration>
                                    <executable>/Project14/testScript.sh</executable>
                                </configuration>
                            </execution>
                        </executions>
                    </plugin>
                </plugins>
            </build>
        </profile>
        <profile>
            <id>prod</id>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>exec-maven-plugin</artifactId>
                        <executions>
                            <execution>
                                <id>my-special-exec</id>
                                <configuration>
                                    <executable>/Project14/prodScript.sh</executable>
                                </configuration>
                            </execution>
                        </executions>
                    </plugin>
                </plugins>
            </build>
        </profile>
    </profiles>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>org.codehaus.mojo</groupId>
                    <artifactId>exec-maven-plugin</artifactId>
                    <version>3.0.0</version>
                    <executions>
                        <execution>
                            <id>my-special-exec</id>
                            <phase>clean</phase>
                            <goals>
                                <goal>exec</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>
```
- the most basic way in which we can specify which profile to use - `mvn clean -Ptest`
- another way of enabling a certain profile - inside the `profile` section we saw above, we can have an activation section like below -

```xml
<activation>
    <property>
        <name>testProp</name>
        <value>DataJek</value>
    </property>
</activation>
```
- this means that the profile expects the system property `testProp` to have the value `DataJek` - `mvn clean -DtestProp=DataJek`
- archetypes - a project templating toolkit so that a new project can be created easily with all the for e.g. firm specific standards established in the project from the get go
Working of Java
- java releases a new version every 6 months
- not all versions of java are lts (long term support) - java will provide patches and security updates for a long time only for versions marked lts - effort towards maintaining non lts versions would be much less
- open jdk is the underlying java source code, but it is not responsible for building and distributing installable binaries
- there are various vendors sitting on top of it, like adopt openjdk, oracle jdk, and then some providers like amazon have their own jdk distributions which work better with their own aws products