From 7e225c9afaa0ed18a5c626502093e2facbcb35dc Mon Sep 17 00:00:00 2001 From: Daniel Arad Date: Sat, 24 Mar 2018 14:41:01 +0300 Subject: [PATCH] HTTP.md and Algorithm.md fixed and added --- notes-english/notes/Algorithm.md | 1650 +++++++++++++++-- notes-english/notes/HTTP.md | 599 +++++- ...剑指 offer 题解.md => 剑指 offer 题解.txt} | 0 3 files changed, 2051 insertions(+), 198 deletions(-) rename notes/{剑指 offer 题解.md => 剑指 offer 题解.txt} (100%) diff --git a/notes-english/notes/Algorithm.md b/notes-english/notes/Algorithm.md index 7943e324..557e477a 100644 --- a/notes-english/notes/Algorithm.md +++ b/notes-english/notes/Algorithm.md @@ -1,227 +1,1587 @@ -* [I. Basic Concepts] (#1 - Basic Concepts) -    * [Web infrastructure] (#web-based) -    * [URL] (#url) -    * [Request and Response Message] (#Request and Response Message) -* [2. HTTP method] (#2 http-method) -    * [GET: Get Resources] (#get Get Resources) -    * [POST: transport entity body] (#post transport entity body) -    * [HEAD: Get Message Header] (#head Get Message Header) -    * [PUT: upload file] (#put upload file) -    * [PATCH: Partially modify the resource] (#patch partially modified the resource) -    * [DELETE: delete file] (#delete delete file) -    * [OPTIONS: query support method] (#options query support method) -    * [CONNECT: Requires Tunneling Protocol to Connect to Proxy] (#connect requires a tunneling protocol to connect to the proxy) -    * [TRACE: Tracking Path] (#trace Tracking Path) -* [3. HTTP status code] (#3 http-status code) -    * [2XX Success] (#2xx-Success) -    * [3XX Redirect] (#3xx-Redirect) -    * [4XX Client Error] (#4xx - Client Error) -    * [5XX Server Error] (#5xx-Server Error) -* [4. HTTP Header] (#4 http-header) -    * [Universal Header Fields] (#Common Header Field) -    * [Request header field] (#Request header field) -    * [Response header field] (# response header field) -    * [Entity Header Field] (#Entity Header Field) -* [5. 
Specific application] (#5 specific application) -    * [Cookie](#cookie) -    * [Cache] (#Cache) -    * [Permanent connection] (# persistent connection) -    * [Code] (#Code) -    * [Block Transfer] (#Block Transfer) -    * [Multipart Object Collection] (#Multipart Object Collection) -    * [Scope Request] (#Scope Request) -    * [Content negotiation] (# content negotiation) -    * [virtual host] (# virtual host) -    * [communication data forwarding] (# communication data forwarding) -* [six, https] (#six https) -    * [encryption] (#encryption) -    * [authentication] (#certification) -    * [Integrity] (#Integrity) -* [VII. Comparison of versions] (Comparison of #7 versions) -    * [The difference between HTTP/1.0 and HTTP/1.1] (The difference between #http10- and -http11-) -    * The difference between HTTP/1.1 and HTTP/2.0 (the difference between #http11- and -http20-) -* [Reference materials] (#reference materials) +* [I. Algorithm Analysis] (Analyzing Algorithm #) + * [Function conversion] (# function conversion) + * [Mathematical Model] (#Mathematical Model) + * [ThreeSum](#threesum) + * [Magnification Experiment] (#Magnification Experiment) + * [Note] (#Notes) +* [2. Stack and Queue] (#2 stack and queue) + * [Stack] (#stack) + * [Queue] (#Queue) +* [three, union-find] (# three union-find) + * [quick-find] (#quick-find) + * [quick-union](#quick-union) + * [weighted quick-union] (#weighted-quick-union) + * [Weight quick-union for path compression] (weight-quick-union for # path compression) + * [comparison of various union-find algorithms] (comparison of various -union-find-algorithms) +* [4, Sort] (#4 Sort) + * [Select sort] (#Select sort) + * [insert sort] (#insert sort) + * [Hill Sort] (#Hill Sort) + * [Merge sort] (# merge sort) + * [Quick Sort] (#Quick Sort) + * [priority queue] (# priority queue) + * [Application] (#Application) +* [5. 
Find] (#5 search) + * [symbol table] (# symbol table) + * [Binary Search Tree] (# Binary Search Tree) + * [2-3 Search Tree] (#2-3 - Search Tree) + * [Red and Black Binary Search Tree] (# Red and Black Binary Search Tree) + * [Hash table] (# hash table) + * [Application] (#Application) -# I. Basic concepts +# I. Algorithm Analysis -## Web Fundamentals +## function conversion -- HTTP (HyperText Transfer Protocol). -- Three technologies of the WWW (World Wide Web): HTML, HTTP, URL. -- RFC (Request for Comments, Request for Comments), Internet Design Document. +The exponential function can be converted to a linear function so that it is more intuitive to display on the function image. E.g -## URL +

-- Uniform Resource Indentifier (URI) -- URL (Uniform Resource Locator) -- URN (Uniform Resource Name), for example urn:isbn:0-486-27557-4. +You can take the logarithms at both ends to get: -URIs contain URLs and URNs. Currently, WEBs only have URLs that are popular, so basically all URLs are seen. +

-

+

-## Request and Response Messages +## mathematical model -### 1. Request message +### 1. Approximate -

+Use \~f(N) to denote any function that, when divided by f(N), approaches 1 as N grows. For example, N<sup>3</sup>/6 - N<sup>2</sup>/2 + N/3 \~ N<sup>3</sup>/6.
-### 2. Response message
+
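As a worked check of the example above (a sketch added for illustration, not part of the original notes), dividing the polynomial by its leading term shows the ratio tending to 1:

```latex
\lim_{N\to\infty} \frac{N^3/6 - N^2/2 + N/3}{N^3/6}
  = \lim_{N\to\infty} \left( 1 - \frac{3}{N} + \frac{2}{N^2} \right)
  = 1
```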

-

+### 2. Order of growth
-# Second, the HTTP method
+The order of growth separates an algorithm from its implementation: an algorithm whose order of growth is N<sup>3</sup> keeps that order regardless of whether it is implemented in Java or which computer it runs on.
-The **Request message sent by the client** The first line of the request contains the method field.
+

-## GET: Get resources
+### 3. The inner loop
-## POST: Transport Entity Body
+The most frequently executed instructions determine a program's total running time. These instructions are called the program's inner loop.
-The main purpose of POST is not to obtain resources but to transfer data stored in content entities.
+### 4. Cost model
-Both GET and POST requests can use extra parameters, but GET's parameters appear in the URL as query strings, and POST's parameters are stored in content entities.
+An algorithm is evaluated under a cost model; the number of array accesses is one such cost model.
-```
-GET /test/demo_form.asp?name1=value1&name2=value2 HTTP/1.1
+## ThreeSum
+
+ThreeSum counts the number of triples in an array whose sum is 0.
+
+```java
+public class ThreeSum {
+    public static int count(int[] a) {
+        int N = a.length;
+        int cnt = 0;
+        for (int i = 0; i < N; i++) {
+            for (int j = i + 1; j < N; j++) {
+                for (int k = j + 1; k < N; k++) {
+                    if (a[i] + a[j] + a[k] == 0) {
+                        cnt++;
+                    }
+                }
+            }
+        }
+        return cnt;
+    }
+}
 ```
-```
-POST /test/demo_form.asp HTTP/1.1
-Host: w3schools.com
-Name1=value1&name2=value2
+The inner loop of the algorithm is the `if (a[i] + a[j] + a[k] == 0)` statement. It executes N(N-1)(N-2)/6 = N<sup>3</sup>/6 - N<sup>2</sup>/2 + N/3 times, so the approximate count is \~N<sup>3</sup>/6 and the order of growth is N<sup>3</sup>.
+
+ **Improvement**
+
+Sort the array first; then, for each pair of elements, use binary search to look for the negation of their sum. If it is found, the three elements sum to 0.
+
+This reduces the order of growth of ThreeSum to N<sup>2</sup>logN.
+
+```java
+public class ThreeSumFast {
+    public static int count(int[] a) {
+        Arrays.sort(a);
+        int N = a.length;
+        int cnt = 0;
+        for (int i = 0; i < N; i++) {
+            for (int j = i + 1; j < N; j++) {
+                // rank() returns the index of the key in the array, or -1 if the key is absent.
+                // The index must be greater than j; otherwise the same triple is counted more than once.
+                if (BinarySearch.rank(-a[i] - a[j], a) > j) {
+                    cnt++;
+                }
+            }
+        }
+        return cnt;
+    }
+}
 ```
-GET's pass arguments are less secure than POST because GET passed parameters are visible in the URL and may reveal private information. And GET only supports ASCII characters. If the parameter is Chinese, it may be garbled, and POST supports the standard character set.
+## Doubling ratio experiment
-## HEAD: Get Message Header
+If T(N) \~ aN<sup>b</sup>logN, then T(2N)/T(N) \~ 2<sup>b</sup>.
-Same as the GET method but does not return the body of the message entity.
+For example, the brute-force ThreeSum algorithm takes approximately \~N<sup>3</sup>/6 time. Run the algorithm several times, doubling N each time, record the running time of each run, and compute the ratio of each running time to the previous one. The results are as follows:
-It is mainly used to confirm the validity of the URL and the date and time of the resource update.
+
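The experiment described above can be sketched as follows. To keep the sketch deterministic, it counts executions of the brute-force inner loop (a cost model) instead of measuring wall-clock time; that substitution is an assumption of this illustration, and the class name `DoublingRatio` and its methods `innerLoopCount` and `ratio` are hypothetical names, not part of the original notes.

```java
public class DoublingRatio {
    // Counts how many times the inner-loop statement of brute-force ThreeSum
    // executes for an input of size n (assumed cost model, not wall-clock time).
    public static long innerLoopCount(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                for (int k = j + 1; k < n; k++) {
                    count++;
                }
            }
        }
        return count;
    }

    // Ratio of the cost at size 2n to the cost at size n.
    public static double ratio(int n) {
        return (double) innerLoopCount(2 * n) / innerLoopCount(n);
    }

    public static void main(String[] args) {
        // Double N on each round and print the ratio to the half-size run.
        for (int n = 50; n <= 400; n *= 2) {
            System.out.printf("N=%d ratio=%.3f%n", 2 * n, ratio(n));
        }
    }
}
```

Under this cost model the ratio equals 4(2N-1)/(N-2), which tends to 2<sup>3</sup> = 8 as N grows. Measured running times are noisier but should show the same trend.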

-## PUT: upload file
+We can see that T(2N)/T(N) \~ 2<sup>3</sup>, so b = 3 and T(N) \~ aN<sup>3</sup>logN.
-Since no authentication mechanism is available, anyone can upload a file, so there is a security problem and this method is generally not used.
+## Precautions
-```html
-PUT /new.html HTTP/1.1
-Host: example.com
-Content-type: text/html
-Content-length: 16
+### 1. Large constant
-

New File

+When the constant coefficient of a lower-order term is large, the approximation can be misleading.
+
+### 2. Cache
+
+Computer systems use caching to organize memory, so accessing adjacent array elements is much faster than accessing non-adjacent ones.
+
+### 3. Guarantee for worst-case performance
+
+For software in nuclear reactors, pacemakers, or brake controllers, worst-case performance is critical.
+
+### 4. Randomized algorithms
+
+Shuffling the input removes the algorithm's dependence on the input order.
+
+### 5. Amortized analysis
+
+Divide the total cost of all operations by the number of operations to spread the cost evenly. For example, N consecutive push() calls on an initially empty stack access the array N+4+8+16+...+2N = 5N-4 times (N accesses write the elements to the array; the rest come from resizing the array), so the average number of array accesses per operation is constant.
+
+# 2. Stacks and Queues
+
+## Stack
+
+First-in-last-out (FILO)
+
+

+ + **1. Array implementation**
+ +```java +Public class ResizeArrayStack implements Iterable { + Private Item[] a = (Item[]) new Object[1]; + Private int N = 0; + + Public void push(Item item) { + If (N >= a.length) { + Resize(2 * a.length); + } + a[N++] = item; + } + + Public Item pop() { + Item item = a[--N]; + If (N <= a.length / 4) { + Resize(a.length / 2); + } + Return item; + } + + // Resize the array to make the stack scalable + Private void resize(int size) { + Item[] tmp = (Item[]) new Object[size]; + For (int i = 0; i < N; i++) { + Tmp[i] = a[i]; + } + a = tmp; + } + + Public boolean isEmpty() { + Return N == 0; + } + + Public int size() { + Return N; + } + + @Override + Public Iterator iterator() { + // An iterator that needs to return a reversed traversal + Return new ReverseArrayIterator(); + } + + Private class ReverseArrayIterator implements Iterator { + Private int i = N; + + @Override + Public boolean hasNext() { + Return i > 0; + } + + @Override + Public Item next() { + Return a[--i]; + } + } +} ``` -## PATCH: Partially Modifying Resources +The above implementation uses generics. Java cannot directly create generic arrays. It can only be created using transformations. -PUT can also be used to modify resources, but can only completely replace the original resource, PATCH allows partial modification. - -```html -PATCH /file.txt HTTP/1.1 -Host: www.example.com -Content-Type: application/example -If-Match: "e0023aa4e" -Content-Length: 100 - -[description of changes] +```java +Item[] arr = (Item[]) new Object[N]; ``` -## DELETE: Delete Files + **2. List Implementation**
-Contrary to the PUT function, it also does not have an authentication mechanism. +Need to use the head of the linked list to achieve, because the last element of the head inserted into the stack at the beginning of the list, its next pointer points to the previous push into the stack of elements, in the pop-up element so that you can press the previous one The elements of the stack are called stack top elements. -```html -DELETE /file.html HTTP/1.1 +```java +Public class Stack { + + Private Node top = null; + Private int N = 0; + + Private class Node { + Item item; + Node next; + } + + Public boolean isEmpty() { + Return N == 0; + } + + Public int size() { + Return N; + } + + Public void push(Item item) { + Node newTop = new Node(); + newTop.item = item; + newTop.next = top; + Top = newTop; + N++; + } + + Public Item pop() { + Item item = top.item; + Top = top.next; + N--; + Return item; + } +} +``` +## Queue + +First-in-first-out(FIFO) + +

+ +The following is a list of the queue implementation, need to maintain the first and last node pointers, respectively, pointing to the head and tail. + +Here we need to consider which pointer to the list node and which pointer to the tail node of the list. Because the dequeue operation needs to make the next element of the first element of the team to be the head of the team, it needs to easily obtain the next element, and the next pointer of the head node of the linked list points to the next element, so let the head pointer be the first pointer to the beginning of the linked list. + +```java +Public class Queue { + Private Node first; + Private Node last; + Int N = 0; + Private class Node{ + Item item; + Node next; + } + + Public boolean isEmpty(){ + Return N == 0; + } + + Public int size(){ + Return N; + } + + // Into the queue + Public void enqueue(Item item){ + Node newNode = new Node(); + newNode.item = item; + newNode.next = null; + If(isEmpty()){ + Last = newNode; + First = newNode; + } else{ + Last.next = newNode; + Last = newNode; + } + N++; + } + + // Out of the queue + Public Item dequeue(){ + Node node = first; + First = first.next; + N--; + Return node.item; + } +} ``` -## OPTIONS: Query Support Methods +# three, union-find -Query the method that the specified URL can support. + **Overview**
-Will return something like Allow: GET, POST, HEAD, OPTIONS. +To solve dynamic connectivity problems, two points can be dynamically connected and whether two points are connected. -## CONNECT: Requires Tunneling Protocol Connection Agent +

-The requirement is to set up a tunnel when the proxy server communicates, and use SSL (Secure Sockets Layer, Secure Sockets) and TLS (Transport Layer Security) protocols to encrypt the communication content and tunnel it over the network. + **API**
-```html -CONNECT www.example.com:443 HTTP/1.1 +

+ + **Basic Data Structure**
+ +```java +Public class UF { + // Use id array to save point connectivity information + Private int[] id; + + Public UF(int N) { + Id = new int[N]; + For (int i = 0; i < N; i++) { + Id[i] = i; + } + } + + Public boolean connected(int p, int q) { + Return find(p) == find(q); + } +} ``` -

+## quick-find -## TRACE: Tracking Path +Ensure that the id values ​​of all contacts in the same connected component are equal. -The server will return the communication path to the client. +This method can quickly obtain the id value of a contact, and determine whether the two contacts are connected, but the operating cost of union is very high. It is necessary to modify the value of all nodes in one connected component to the value of another node. Id value. -When sending a request, enter the value in the Max-Forwards header field, subtract 1 from each server, and stop transmission when the value is 0. +```java + Public int find(int p) { + Return id[p]; + } + Public void union(int p, int q) { + Int pID = find(p); + Int qID = find(q); -TRACE is not usually used, and it is vulnerable to XST attacks (Cross-Site Tracing), so it will not be used. + If (pID == qID) return; + For (int i = 0; i < id.length; i++) { + If (id[i] == pID) id[i] = qID; + } + } +``` -

+## quick-union -#3. HTTP status code +In union, only the id value of the contact point to another contact id value, do not use id directly to store the connected component. This forms an inverted tree structure with the root node pointing to itself. When searching for a connected component to which a node belongs, it is always up until the root node and the id value of the root node is used as the id value of the connected component. -The first line of status in the **response message** returned by the server contains the status code and the reason phrase used to inform the client of the requested result. +

-| Status Code | Categories | Reason Phrases | -| --- | --- | --- | -| 1XX | Informational (Informational Status Code) | Receiving requests are being processed | -| 2XX | Success (Success Status Code) | Request Normal Processing Complete | -| 3XX | Redirection | Additional actions required to complete the request | -| 4XX | Client Error (Client Error Status Code) | Server Cannot Process Request | -| 5XX | Server Error (Server Error Status Code) | Server Processing Request Error | +```java + Public int find(int p) { + While (p != id[p]) p = id[p]; + Return p; + } -## 2XX Success + Public void union(int p, int q) { + Int pRoot = find(p); + Int qRoot = find(q); + If (pRoot == qRoot) return; + Id[pRoot] = qRoot; + } +``` -- **200 OK** +This method can perform union operations quickly, but the find operation is proportional to the height of the tree. In the worst case, the height of the tree is the number of contacts. -- **204 No Content** : The request has been successfully processed, but the returned response does not contain the main part of the entity. It is generally used when sending information from the client to the server only, without returning data. +

-- **206 Partial Content** : Indicates that the client has made a scope request. The response message contains the content of the entity specified by Content-Range. +## Weighted quick-union -## 3XX Redirects +In order to solve the problem that the quick-union tree is usually very high, the weighted quick-union will make the smaller tree join the larger tree when the union operation. -- **301 Moved Permanently** : Permanent Redirect +Theoretical studies have shown that the tree depth constructed by the weighted quick-union algorithm does not exceed logN at most. -- **302 Found** : Temporary Redirection +

-- **303 See Other** : Same as the 302, but 303 explicitly requires that the client should use the GET method to get the resource. +```java +Public class WeightedQuickUnionUF { + Private int[] id; + // Save node number information + Private int[] sz; -- Note: Although the HTTP protocol specifies that the POST method should not be changed to the GET method when redirected in the 301 or 302 state, most browsers redirect the POST method to the GET method in the 301, 302, and 303 state redirection. + Public WeightedQuickUnionUF(int N) { + Id = new int[N]; + Sz = new int[N]; + For (int i = 0; i < N; i++) { + Id[i] = i; + Sz[i] = 1; + } + } -- ** 304 Not Modified** : If the request packet header contains some conditions, such as: If-Match, If-ModifiedSince, If-None-Match, If-Range, If-Unmodified-Since, but does not meet the conditions, then The server returns a 304 status code. + Public boolean connected(int p, int q) { + Return find(p) == find(q); + } -- **307 Temporary Redirect** : Temporary redirect, similar to the meaning of 302, but 307 requires the browser not to change the redirect request's POST method to the GET method. + Public int find(int p) { + While (p != id[p]) p = id[p]; + Return p; + } -## 4XX Client Error + Public void union(int p, int q) { + Int i = find(p); + Int j = find(q); + If (i == j) return; + If (sz[i] < sz[j]) { + Id[i] = j; + Sz[j] += sz[i]; + } Else { + Id[j] = i; + Sz[i] += sz[j]; + } + } +} +``` -- **400 Bad Request** : There is a syntax error in the request message. +## Weighted quick-union of path compression -- **401 Unauthorized** : This status code indicates that the sent request requires authentication information (BASIC authentication, DIGEST authentication). If a request has been made before, the user authentication has failed. +While checking the nodes and linking them directly to the root node, you only need to add a loop to find. -
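The path-compression step described above can be sketched as follows. This is a minimal illustration, assuming the same `id` and `sz` arrays as the weighted quick-union class; the class name `PathCompressionUF` is hypothetical, and the second loop in `find()` is the added loop that links every node on the search path directly to the root.

```java
public class PathCompressionUF {
    private final int[] id; // id[i] is the parent of i; a root points to itself
    private final int[] sz; // sz[r] is the number of nodes in the tree rooted at r

    public PathCompressionUF(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    public int find(int p) {
        // First pass: walk up to the root.
        int root = p;
        while (root != id[root]) {
            root = id[root];
        }
        // Added loop: link every node on the path directly to the root.
        while (p != root) {
            int next = id[p];
            id[p] = root;
            p = next;
        }
        return root;
    }

    public boolean connected(int p, int q) {
        return find(p) == find(q);
    }

    public void union(int p, int q) {
        int i = find(p);
        int j = find(q);
        if (i == j) return;
        // Attach the smaller tree to the root of the larger tree.
        if (sz[i] < sz[j]) {
            id[i] = j;
            sz[j] += sz[i];
        } else {
            id[j] = i;
            sz[i] += sz[j];
        }
    }
}
```

After a call to `find(p)`, every node that was on the path from p to the root points directly at the root, so later lookups along that path take a single step.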

+## Comparison of various union-find algorithms -- **403 Forbidden**: The request was rejected. The server did not need to give the detailed reason for the rejection. +

-- **404 Not Found** +#4. Sorting -## 5XX server error + **Convention**
-- **500 Internal Server Error** : An error occurred while the server was executing the request. +The elements to be sorted need to implement Java's Comparable interface, which has a compareTo() method. -- **503 Service Unavilable** : The server is temporarily overloaded or maintenance is taking place. It is now unable to process the request. +When studying the cost model of the sorting algorithm, the number of comparisons and exchanges is calculated. -#4. HTTP Header +Use the helper functions less() and exch() for comparisons and exchanges to make the code more readable and portable. -There are 4 types of header fields: the generic header field, the request header field, the response header field, and the entity header field. +```java +Private boolean less(Comparable v, Comparable w){ + Return v.compareTo(w) < 0; +} -The various header fields and their meanings are as follows (do not need to be recorded in all for review): +Private void exch(Comparable[] a, int i, int j){ + Comparable t = a[i]; + a[i] = a[j]; + a[j] = t; +} +``` -## Common Header Fields +## Select sort + +Find the smallest element in the array and swap it with the first element of the array. Find the smallest element from the remaining elements and swap it with the second element of the array. Keep doing this until you sort the entire array. + +

+ +```java +Public class Selection { + Public static void sort(Comparable[] a) { + Int N = a.length; + For (int i = 0; i < N; i++) { + Int min = i; + For (int j = i + 1; j < N; j++) { + If (less(a[j], a[min])) min = j; + } + Exch(a, i, min); + } + } +} +``` + +Select sorting requires \~N2/2 comparisons and \~N swaps. Its running time is independent of input. This feature makes it require so many comparisons to an already sorted array. Exchange operation. + +## Insert sort + +The sorting is done from left to right, each time the current element is inserted into the left already sorted array, so that the left array is still ordered after insertion. + +

+
+```java
+public class Insertion {
+    public static void sort(Comparable[] a) {
+        int N = a.length;
+        for (int i = 1; i < N; i++) {
+            for (int j = i; j > 0 && less(a[j], a[j - 1]); j--) {
+                exch(a, j, j - 1);
+            }
+        }
+    }
+}
+```
+
+The cost of insertion sort depends on the initial order of the array: if the array is already partially sorted, insertion sort is very fast. On average it needs \~N<sup>2</sup>/4 comparisons and \~N<sup>2</sup>/4 exchanges. The worst case, a reverse-sorted array, needs \~N<sup>2</sup>/2 comparisons and \~N<sup>2</sup>/2 exchanges. The best case, an already sorted array, needs N-1 comparisons and 0 exchanges.
+
+Insertion sort is particularly efficient for partially sorted arrays and small arrays.
+
+ **Comparison of selection sort and insertion sort**
+
+For randomly ordered arrays with distinct keys, the running times of insertion sort and selection sort are both quadratic, and their ratio is a small constant.
+
+## Shell sort
+
+For large arrays, insertion sort is slow because it only exchanges adjacent elements; moving an element from one end of the array to the other takes many operations.
+
+Shell sort addresses this limitation of insertion sort by exchanging non-adjacent elements, which moves elements to their correct positions faster.
+
+Shell sort uses insertion sort to sort the subsequences of elements spaced h apart. When h is large, elements can move long distances. By decreasing h step by step until h = 1, the entire array becomes sorted.
+
+

+ +```java +Public class Shell { + Public static void sort(Comparable[] a) { + Int N = a.length; + Int h = 1; + While (h < N / 3) { + h = 3 * h + 1; // 1, 4, 13, 40, ... + } + While (h >= 1) { + For (int i = h; i < N; i++) { + For (int j = i; j >= h && less(a[j], a[j - h]); j -= h) { + Exch(a, j, j - h); + } + } + h = h / 3; + } + } +} +``` + +The running time of the Hill sort does not reach the square level, and the number of comparisons required by the Hill sequence using the ascending sequence 1, 4, 13, 40, ... will not exceed several times the N times the length of the ascending sequence. The advanced sorting algorithm described later will only be about three times faster than the Hill sorting. + +## merge sort + +The idea of ​​merging and sorting is to divide the array into two parts, sort them separately, and merge them together. + +

+
+### 1. Merge method
+
+The merge method combines two sorted halves of the array into one sorted range.
+
+```java
+public class MergeSort {
+    private static Comparable[] aux;
+
+    private static void merge(Comparable[] a, int lo, int mid, int hi) {
+        int i = lo, j = mid + 1;
+
+        for (int k = lo; k <= hi; k++) {
+            aux[k] = a[k]; // copy data to the auxiliary array
+        }
+
+        for (int k = lo; k <= hi; k++) {
+            if (i > mid) a[k] = aux[j++];        // left half exhausted
+            else if (j > hi) a[k] = aux[i++];    // right half exhausted
+            // On ties take from the left half first, which keeps the sort stable
+            else if (aux[i].compareTo(aux[j]) <= 0) a[k] = aux[i++];
+            else a[k] = aux[j++];
+        }
+    }
+}
+```
+
+### 2. Top-down merge sort
+
+```java
+public static void sort(Comparable[] a) {
+    aux = new Comparable[a.length];
+    sort(a, 0, a.length - 1);
+}
+
+private static void sort(Comparable[] a, int lo, int hi) {
+    if (hi <= lo) return;
+    int mid = lo + (hi - lo) / 2;
+    sort(a, lo, mid);
+    sort(a, mid + 1, hi);
+    merge(a, lo, mid, hi);
+}
+```
+
+

+ +

+
+Each call splits the problem into two half-size subproblems and then merges them in linear time. Algorithms of this divide-by-half form generally take O(NlogN), so the time complexity of merge sort is O(NlogN).
+
+Because recursion makes very many calls on small subarrays, switching to insertion sort for small subarrays improves performance.
+
+### 3. Bottom-up merge sort
+
+First merge the smallest subarrays pairwise, then merge the resulting subarrays pairwise, and so on.
+
+

+ +```java +Public static void busort(Comparable[] a) { + Int N = a.length; + Aux = new Comparable[N]; + For (int sz = 1; sz < N; sz += sz) { + For (int lo = 0; lo < N - sz; lo += sz + sz) { + Merge(a, lo, lo + sz - 1, Math.min(lo + sz + sz - 1, N - 1)); + } + } +} +``` + +## quick sort + +### 1. Basic algorithm + +Merge sorting divides the array into two sub-arrays and sorts them, and sorts the ordered sub-arrays to make the entire array sort. Quick sorting divides the array into two sub-arrays by a splitting element, and the left sub-array is less than or equal to the splitting element. The subarray is greater than or equal to the split element. Sorting the two subarrays also sorts the entire array. + +

+ +```java +Public class QuickSort { + Public static void sort(Comparable[] a) { + Shuffle(a); + Sort(a, 0, a.length - 1); + } + + Private static void sort(Comparable[] a, int lo, int hi) { + If (hi <= lo) return; + Int j = partition(a, lo, hi); + Sort(a, lo, j - 1); + Sort(a, j + 1, hi); + } +} +``` + +### 2. Slicing + +Take a[lo] as the slice element, then scan from the left end of the array to the right until the first element is found to be greater than or equal to it, then scan from the right end of the array to the left to find the first element that is less than or equal to it, swap the two Elements, and continue to continue this process, you can ensure that the left hand of the left side of the element is not greater than the splitting element, the right side of the right hand pointer j is not smaller than the splitting element. When the two pointers meet, the split element a[lo] is swapped with the rightmost element a[j] of the left subarray and returns j. + +

+ +```java +Private static int partition(Comparable[] a, int lo, int hi) { + Int i = lo, j = hi + 1; + Comparable v = a[lo]; + While (true) { + While (less(a[++i], v)) if (i == hi) break; + While (less(v, a[--j])) if (j == lo) break; + If (i >= j) break; + Exch(a, i, j); + } + Exch(a, lo, j); + Return j; +} +``` + +### 3. Performance Analysis + +Quick sorts are in-place sorts and do not require an auxiliary array, but recursive calls require a secondary stack. + +The best case for quick sorting is to just split the array half each time, so that the number of recursive calls is the least. The number of comparisons in this case is CN=2CN/2+N, ie, the complexity is O(NlogN). + +In the worst case, the first time it is divided from the smallest element, and the second time is divided from the second smallest element, and so on. So in the worst case you need to compare N2/2. In order to prevent the array from being initially ordered, random arrays need to be randomly shuffled during quick sorting. + +### 4. Algorithm Improvements + +#### 4.1 Switch to Insert Sort + +Because quick sorting also calls itself in small arrays, insert sorts perform better than quick sorts for small arrays, so you can switch to insert sorting in small arrays. + +#### Three samples + +The best case is to use the median of the array as the segmentation element each time, but the cost of calculating the median is high. People find it best to take 3 elements and use the centered element as a split element. + +#### 4.3 Three-way segmentation + +For an array with a large number of repeating elements, the array can be divided into three parts, corresponding to less than, equal to, and greater than the segmentation element. + +Three-way splitting quick sorting A random array with only a few different primary keys can be ordered in linear time. + +

+ +```java +Public class Quick3Way { + Public static void sort(Comparable[] a, int lo, int hi) { + If (hi <= lo) return; + Int lt = lo, i = lo + 1, gt = hi; + Comparable v = a[lo]; + While (i <= gt) { + Int cmp = a[i].compareTo(v); + If (cmp < 0) exch(a, lt++, i++); + Else if (cmp > 0) exch(a, i, gt--); + Else i++; + } + Sort(a, lo, lt - 1); + Sort(a, gt + 1, hi); + } +} +``` + +## priority queue + +The priority queue is mainly used to handle the largest element. + +### 1. Stack + +Definition: Each node of a binary tree is greater than or equal to its two child nodes. + +The heap can be represented by an array because the heap is a complete binary tree, and a complete binary tree is easily stored in an array. The position of the parent node of position k is k/2, and the positions of its two child nodes are 2k and 2k+1, respectively. Here we do not use the position where the array index is 0, in order to understand the relationship of the nodes more clearly. + +

+ +```java +Public class MaxPQ { + Private Key[] pq; + Private int N = 0; + + Public MaxPQ(int maxN) { + Pq = (Key[]) new Comparable[maxN + 1]; + } + + Public boolean isEmpty() { + Return N == 0; + } + + Public int size() { + Return N; + } + + Private boolean less(int i, int j) { + Return pq[i].compareTo(pq[j]) < 0; + } + + Private void exch(int i, int j) { + Key t = pq[i]; + Pq[i] = pq[j]; + Pq[j] = t; + } +} +``` + +### 2. Floating and sinking + +In a heap, when a node is larger than a parent node, the two nodes need to be exchanged. The exchange may also be larger than its new parent, so it requires constant comparisons and exchanges. Call this operation floating. + +```java +Private void swim(int k) { + While (k > 1 && less(k / 2, k)) { + Exch(k / 2, k); + k = k / 2; + } +} +``` + +Similarly, when a node is smaller than a child node, it also needs constant downward comparison and exchange operations, and this operation is called sinking. A node has two child nodes and should exchange with the largest of the two child nodes. + +```java +Private void sink(int k) { + While (2 * k <= N) { + Int j = 2 * k; + If (j < N && less(j, j + 1)) j++; + If (!less(k, j)) break; + Exch(k, j); + k = j; + } +} +``` + +### 3. Insert elements + +Place the new element at the end of the array and then float it to the right position. + +```java +Public void insert(Key v) { + Pq[++N] = v; + Swim(N); +} +``` + +### 4. Remove the largest element + +Remove the largest element from the top of the array, and put the last element of the array to the top, and let this element sink to the right place. + +```java +Public Key delMax() { + Key max = pq[1]; + Exch(1, N--); + Pq[N + 1] = null; + Sink(1); + Return max; +} +``` + +### 5. Heap sort + +Since the heap can easily get the largest element and delete it, doing this operation continuously can get a descending sequence. 
If the largest element is instead exchanged with the last element of the current heap and left in the array rather than removed, the array ends up in descending order from tail to head, which is ascending order from head to tail. The heap can therefore be used for sorting, and heap sort works in place without extra space.
+
+Heap sort has two phases. The first builds a heap from the unordered array; the second repeatedly exchanges the largest element with the last element of the current heap and sinks the new root to restore heap order.
+
+The most straightforward way to build a heap from an unordered array is to scan the array from left to right and swim each element. A more efficient method is to scan from right to left and sink each element: if both children of a node are already roots of heaps, sinking that node makes it the root of a heap. Leaf nodes do not need to sink, so only the first half of the array has to be traversed.
+
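A minimal self-contained sketch of the two phases, assuming an `int` array and helpers that take 1-based heap positions and translate them to 0-based array indices. The class name `HeapSortSketch` is hypothetical and used only for illustration.

```java
// Hypothetical sketch: in-place heap sort for int arrays.
// Heap position p (1-based) is stored at array index p - 1.
final class HeapSortSketch {
    static void sort(int[] a) {
        int n = a.length;
        // Phase 1: build the heap by sinking from right to left,
        // skipping the leaves (positions n/2 + 1 .. n).
        for (int k = n / 2; k >= 1; k--) sink(a, k, n);
        // Phase 2: move the current maximum to the end, shrink the heap,
        // and restore heap order from the root.
        while (n > 1) {
            exch(a, 1, n--);
            sink(a, 1, n);
        }
    }

    private static void sink(int[] a, int k, int n) {
        while (2 * k <= n) {
            int j = 2 * k;
            if (j < n && a[j - 1] < a[j]) j++;   // positions j and j + 1: pick the larger child
            if (a[k - 1] >= a[j - 1]) break;
            exch(a, k, j);
            k = j;
        }
    }

    private static void exch(int[] a, int i, int j) {
        int t = a[i - 1];
        a[i - 1] = a[j - 1];
        a[j - 1] = t;
    }
}
```

Calling `HeapSortSketch.sort(array)` sorts the array in ascending order without allocating a second array.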

+
+```java
+// sink() and exch() are assumed to take 1-based heap positions
+// and translate them to 0-based indices of a.
+public static void sort(Comparable[] a) {
+    int N = a.length;
+    for (int k = N / 2; k >= 1; k--) {
+        sink(a, k, N);
+    }
+    while (N > 1) {
+        exch(a, 1, N--);
+        sink(a, 1, N);
+    }
+}
+```
+
+### 6. Analysis
+
+The height of a heap is logN, so inserting an element and removing the largest element both take time proportional to logN.
+
+Heap sort takes time proportional to NlogN, because it sinks N nodes.
+
+Heap sort is an in-place sort and uses no additional space.
+
+Heap sort is rarely used on modern systems because it makes poor use of the cache: array entries are rarely compared with nearby entries.
+
+## Application
+
+### 1. Comparison of sorting algorithms
+

+
+Quicksort is the fastest general-purpose sorting algorithm. Its inner loop has very few instructions, and it uses the cache well because it mostly accesses data sequentially. Its running time is \~cNlogN, where c is smaller than the constant of other linearithmic sorting algorithms. With three-way partitioning, quicksort is linear for some input distributions that occur in practice, whereas other sorting algorithms still need linearithmic time.
+
+### 2. Java Sorting Algorithm Implementation
+
+The main sorting method in the Java system library is java.util.Arrays.sort(), which uses three-way partitioning quicksort for primitive types and merge sort for reference types.
+
+### 3. Partition-based quick select
+
+The partition() method of quicksort returns an integer j such that every entry of a[lo..j-1] is less than or equal to a[j] and every entry of a[j+1..hi] is greater than or equal to a[j]. a[j] is therefore in its final sorted position.
+
+This property can be used to find the k-th smallest element of an array.
+
+```java
+public static Comparable select(Comparable[] a, int k) {
+    int lo = 0, hi = a.length - 1;
+    while (hi > lo) {
+        int j = partition(a, lo, hi);
+        if (j == k) return a[k];
+        else if (j > k) hi = j - 1;
+        else lo = j + 1;
+    }
+    return a[k];
+}
+```
+
+The algorithm is linear on average: assuming each partition splits the array roughly in half, the total number of comparisons is N + N/2 + N/4 + ... until the k-th element is found, and this sum is less than 2N.
+
+# V. Finding
+
+This chapter uses three classical data structures to implement efficient symbol tables: binary search trees, red-black trees, and hash tables.
+
+## Symbol table
+
+**1. Unordered symbol table**
+ +

+
+**2. Ordered symbol table**
+ +

+
+The keys of an ordered symbol table must implement the Comparable interface.
+
+Search cost model: the number of key comparisons; when an operation performs no comparisons, the number of array accesses is counted instead.
+
+**3. Ordered symbol table implemented with binary search**
+
+Use a pair of parallel arrays, one storing the keys and one storing the values.
+
+Because generic arrays cannot be created directly, the key array is created as an array of Comparable and cast to Key[], and the value array is created as an array of Object and cast to Value[].
+
+The rank() method is crucial. When the key is in the table, rank() gives its position; when the key is not in the table, rank() gives the position where the new key should be inserted.
+
+Complexity: binary search needs at most logN + 1 comparisons, so a search in a symbol table implemented with binary search takes at most logarithmic time. An insert, however, has to shift array elements and takes linear time.
+
+```java
+public class BinarySearchST<Key extends Comparable<Key>, Value> {
+    private Key[] keys;
+    private Value[] values;
+    private int N;
+
+    public BinarySearchST(int capacity) {
+        keys = (Key[]) new Comparable[capacity];
+        values = (Value[]) new Object[capacity];
+    }
+
+    public int size() {
+        return N;
+    }
+
+    public Value get(Key key) {
+        int i = rank(key);
+        if (i < N && keys[i].compareTo(key) == 0) {
+            return values[i];
+        }
+        return null;
+    }
+
+    // Number of keys in the table that are smaller than key
+    public int rank(Key key) {
+        int lo = 0, hi = N - 1;
+        while (lo <= hi) {
+            int mid = lo + (hi - lo) / 2;
+            int cmp = key.compareTo(keys[mid]);
+            if (cmp == 0) return mid;
+            else if (cmp < 0) hi = mid - 1;
+            else lo = mid + 1;
+        }
+        return lo;
+    }
+
+    public void put(Key key, Value value) {
+        int i = rank(key);
+        if (i < N && keys[i].compareTo(key) == 0) {
+            values[i] = value;
+            return;
+        }
+        // Shift larger keys one position to the right to make room
+        for (int j = N; j > i; j--) {
+            keys[j] = keys[j - 1];
+            values[j] = values[j - 1];
+        }
+        keys[i] = key;
+        values[i] = value;
+        N++;
+    }
+
+    public Key ceiling(Key key) {
+        int i = rank(key);
+        if (i == N) return null;   // no key in the table is >= key
+        return keys[i];
+    }
+}
+```
+
+## Binary Search Tree
+
+A **binary tree** is either an empty link or a node with a left link and a right link, each pointing to a binary subtree. 
+
+A **binary search tree** (BST) is a binary tree in which each node's key is greater than every key in its left subtree and less than every key in its right subtree.
+
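The ordering condition in this definition can be checked recursively by passing down the range of keys allowed in each subtree. The sketch below assumes `int` keys and a hypothetical `BstCheck.Node` type introduced only for this illustration.

```java
// Hypothetical sketch: verify the BST ordering condition for int keys.
final class BstCheck {
    static final class Node {
        final int key;
        final Node left, right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns true if every key in the subtree rooted at x lies strictly
    // between lo and hi; a null bound means "unbounded on that side".
    static boolean isBST(Node x, Integer lo, Integer hi) {
        if (x == null) return true;
        if (lo != null && x.key <= lo) return false;
        if (hi != null && x.key >= hi) return false;
        return isBST(x.left, lo, x.key) && isBST(x.right, x.key, hi);
    }
}
```

Comparing a node only with its direct children is not enough: a key deep in the left subtree could still be larger than the root, which is why the bounds are carried down the recursion.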

+
+In a well-balanced binary search tree, each step of a search roughly halves the remaining range, much like binary search.
+
+```java
+public class BST<Key extends Comparable<Key>, Value> {
+    private Node root;
+
+    private class Node {
+        private Key key;
+        private Value val;
+        private Node left, right;
+        // The total number of nodes in the subtree rooted at this node
+        private int N;
+
+        public Node(Key key, Value val, int N) {
+            this.key = key;
+            this.val = val;
+            this.N = N;
+        }
+    }
+
+    public int size() {
+        return size(root);
+    }
+
+    private int size(Node x) {
+        if (x == null) return 0;
+        return x.N;
+    }
+}
+```
+
+### 1. get()
+
+If the tree is empty, the search is a miss; if the search key equals the root's key, it is a hit; otherwise the search continues recursively in a subtree: the left subtree if the search key is smaller, the right subtree if it is larger.
+
+```java
+public Value get(Key key) {
+    return get(root, key);
+}
+private Value get(Node x, Key key) {
+    if (x == null) return null;
+    int cmp = key.compareTo(x.key);
+    if (cmp == 0) return x.val;
+    else if (cmp < 0) return get(x.left, key);
+    else return get(x.right, key);
+}
+```
+
+### 2. put()
+
+When the key to insert is not in the tree, a new node is created and the link from its parent is updated so that the node is attached to the tree.
+
+```java
+public void put(Key key, Value val) {
+    root = put(root, key, val);
+}
+private Node put(Node x, Key key, Value val) {
+    if (x == null) return new Node(key, val, 1);
+    int cmp = key.compareTo(x.key);
+    if (cmp == 0) x.val = val;
+    else if (cmp < 0) x.left = put(x.left, key, val);
+    else x.right = put(x.right, key, val);
+    x.N = size(x.left) + size(x.right) + 1;
+    return x;
+}
+```
+
+### 3. Analysis
+
+The running time of binary search tree algorithms depends on the shape of the tree, which in turn depends on the order in which the keys were inserted. 
In the best case the tree is perfectly balanced, and the distance from every empty link to the root is \~logN. In the worst case the height of the tree is N.
+
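The dependence on insertion order can be seen with a small experiment. This is a sketch under stated assumptions: `BstHeightDemo` is a hypothetical stripped-down BST with `int` keys and no values, and height is counted in nodes along the longest path from the root.

```java
// Hypothetical sketch: the same keys produce trees of different height
// depending on the order in which they are inserted.
final class BstHeightDemo {
    private static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    private Node root;

    void put(int key) {
        root = put(root, key);
    }

    private static Node put(Node x, int key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        return x;   // duplicate keys are ignored
    }

    // Number of nodes on the longest path from the root to a leaf
    int height() {
        return height(root);
    }

    private static int height(Node x) {
        if (x == null) return 0;
        return 1 + Math.max(height(x.left), height(x.right));
    }
}
```

Inserting the keys 1 to 7 in ascending order degenerates into a chain of height 7, while inserting them in the order 4, 2, 6, 1, 3, 5, 7 gives a perfectly balanced tree of height 3.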

+
+Complexity: on average, both search and insert take logarithmic time.
+
+### 4. floor()
+
+If key is less than the root's key, the floor of key (the largest key less than or equal to key) must be in the left subtree. If key is greater than the root's key, the floor is in the right subtree only if that subtree contains a key less than or equal to key; otherwise the root's key is the floor.
+
+```java
+public Key floor(Key key) {
+    Node x = floor(root, key);
+    if (x == null) return null;
+    return x.key;
+}
+private Node floor(Node x, Key key) {
+    if (x == null) return null;
+    int cmp = key.compareTo(x.key);
+    if (cmp == 0) return x;
+    if (cmp < 0) return floor(x.left, key);
+    Node t = floor(x.right, key);
+    if (t != null) {
+        return t;
+    } else {
+        return x;
+    }
+}
+```
+
+### 5. rank()
+
+```java
+public int rank(Key key) {
+    return rank(key, root);
+}
+private int rank(Key key, Node x) {
+    if (x == null) return 0;
+    int cmp = key.compareTo(x.key);
+    if (cmp == 0) return size(x.left);
+    else if (cmp < 0) return rank(key, x.left);
+    else return 1 + size(x.left) + rank(key, x.right);
+}
+```
+
+### 6. min()
+
+```java
+private Node min(Node x) {
+    if (x.left == null) return x;
+    return min(x.left);
+}
+```
+
+### 7. deleteMin()
+
+Make the link that points to the smallest node point to that node's right subtree instead.
+

+
+```java
+public void deleteMin() {
+    root = deleteMin(root);
+}
+public Node deleteMin(Node x) {
+    if (x.left == null) return x.right;
+    x.left = deleteMin(x.left);
+    x.N = size(x.left) + size(x.right) + 1;
+    return x;
+}
+```
+
+### 8. delete()
+
+If the node to be deleted has only one subtree, the link that points to it is simply redirected to that subtree; otherwise the node is replaced by the smallest node of its right subtree.
+

+
+```java
+public void delete(Key key) {
+    root = delete(root, key);
+}
+private Node delete(Node x, Key key) {
+    if (x == null) return null;
+    int cmp = key.compareTo(x.key);
+    if (cmp < 0) x.left = delete(x.left, key);
+    else if (cmp > 0) x.right = delete(x.right, key);
+    else {
+        if (x.right == null) return x.left;
+        if (x.left == null) return x.right;
+        Node t = x;
+        x = min(t.right);               // successor replaces the deleted node
+        x.right = deleteMin(t.right);
+        x.left = t.left;
+    }
+    x.N = size(x.left) + size(x.right) + 1;
+    return x;
+}
+```
+
+### 9. keys()
+
+An in-order traversal of a binary search tree visits the keys in sorted order; this property is used here to collect the keys in a given range.
+
+```java
+public Iterable<Key> keys(Key lo, Key hi) {
+    Queue<Key> queue = new LinkedList<>();
+    keys(root, queue, lo, hi);
+    return queue;
+}
+private void keys(Node x, Queue<Key> queue, Key lo, Key hi) {
+    if (x == null) return;
+    int cmpLo = lo.compareTo(x.key);
+    int cmpHi = hi.compareTo(x.key);
+    if (cmpLo < 0) keys(x.left, queue, lo, hi);
+    if (cmpLo <= 0 && cmpHi >= 0) queue.add(x.key);
+    if (cmpHi > 0) keys(x.right, queue, lo, hi);
+}
+```
+
+### 10. Performance Analysis
+
+Complexity: in the worst case, the time required by every binary search tree operation is proportional to the height of the tree.
+
+## 2-3 Search Tree
+

+
+In a perfectly balanced 2-3 search tree, all empty links are the same distance from the root.
+
+### 1. Insert operation
+
+When an insertion creates a temporary 4-node, the 4-node is split into three 2-nodes and the middle one is moved up into the parent. If this creates a temporary 4-node in the parent, the split-and-move-up step is repeated until no temporary 4-node remains.
+

+
+### 2. Properties
+
+The transformations performed by insertion into a 2-3 search tree are local: apart from the nodes and links involved, no other part of the tree needs to be modified or checked. These local transformations do not affect the global order and balance of the tree.
+
+**The cost of search and insert in a 2-3 search tree does not depend on the insertion order.** In the worst case, search and insert visit no more than logN nodes; in a 2-3 search tree with 1 billion nodes, any search or insert visits at most about 30 nodes.
+
+## Red-Black Binary Search Tree
+
+A 2-3 search tree needs both 2-nodes and 3-nodes; a red-black tree uses red links to represent 3-nodes. If the link pointing to a node is red, that node and its parent together represent a 3-node; black links are ordinary links.
+

+
+A red-black tree has the following properties:
+
+1. Red links are always left links;
+2. Perfect black balance: every path from an empty link to the root has the same number of black links.
+
+When drawing a red-black tree, the red links can be drawn horizontally.
+

+
+```java
+public class RedBlackBST<Key extends Comparable<Key>, Value> {
+    private Node root;
+    private static final boolean RED = true;
+    private static final boolean BLACK = false;
+
+    private class Node {
+        Key key;
+        Value val;
+        Node left, right;
+        int N;
+        boolean color;   // color of the link from the parent to this node
+
+        Node(Key key, Value val, int n, boolean color) {
+            this.key = key;
+            this.val = val;
+            this.N = n;
+            this.color = color;
+        }
+    }
+
+    private boolean isRed(Node x) {
+        if (x == null) return false;
+        return x.color == RED;
+    }
+}
+```
+
+### 1. Rotate left
+
+Because legal red links are left links, a red right link must be converted by a left rotation.
+

+ +

+
+```java
+public Node rotateLeft(Node h) {
+    Node x = h.right;
+    h.right = x.left;
+    x.left = h;
+    x.color = h.color;
+    h.color = RED;
+    x.N = h.N;
+    h.N = 1 + size(h.left) + size(h.right);
+    return x;
+}
+```
+
+### 2. Rotate Right
+
+A right rotation is used to convert two consecutive red left links, as discussed later in the insert operation.
+

+ +

+
+```java
+public Node rotateRight(Node h) {
+    Node x = h.left;
+    h.left = x.right;
+    x.right = h;   // h becomes the right child of x
+    x.color = h.color;
+    h.color = RED;
+    x.N = h.N;
+    h.N = 1 + size(h.left) + size(h.right);
+    return x;
+}
+```
+
+### 3. Color Conversion
+
+A temporary 4-node appears in a red-black tree as a node whose left and right links are both red. Splitting the 4-node means changing both children from red to black and also changing the parent from black to red, which from the 2-3 tree's point of view moves the middle key up into the parent.
+

+ +

+
+```java
+void flipColors(Node h) {
+    h.color = RED;
+    h.left.color = BLACK;
+    h.right.color = BLACK;
+}
+```
+
+### 4. Insert
+
+First insert the node at the correct position as in an ordinary binary search tree, then apply the following operations:
+
+- If the right child is red and the left child is black, rotate left;
+- If the left child is red and its left child is also red, rotate right;
+- If both the left and right children are red, flip colors.
+

+
+```java
+public void put(Key key, Value val) {
+    root = put(root, key, val);
+    root.color = BLACK;
+}
+
+private Node put(Node x, Key key, Value val) {
+    if (x == null) return new Node(key, val, 1, RED);
+    int cmp = key.compareTo(x.key);
+    if (cmp == 0) x.val = val;
+    else if (cmp < 0) x.left = put(x.left, key, val);
+    else x.right = put(x.right, key, val);
+
+    if (isRed(x.right) && !isRed(x.left)) x = rotateLeft(x);
+    if (isRed(x.left) && isRed(x.left.left)) x = rotateRight(x);
+    if (isRed(x.left) && isRed(x.right)) flipColors(x);
+
+    x.N = size(x.left) + size(x.right) + 1;
+    return x;
+}
+```
+
+The insert operation is similar to insertion into a binary search tree, except that rotations and color flips are applied at the end.
+
+The root must be black: it has no parent, so there is no link from a parent to the root that could be red. flipColors() may turn the root red, and each time the root is changed from red back to black the black height of the tree increases by 1.
+
+### 5. Delete the minimum key
+
+If the minimum key is in a 2-node, deleting it leaves an empty link and breaks the balance, so the minimum key must be made to lie in a node that is not a 2-node. There are two ways to turn a 2-node into a 3-node or 4-node: borrow a key from the parent, or borrow a key from a sibling. If the parent is itself a 2-node, no key can be borrowed from it, so every node on the deletion path must be kept from being a 2-node. On the way down the tree, one of the following cases must hold:
+
+1. If the left child of the current node is not a 2-node, nothing needs to be done;
+2. If the left child is a 2-node and its sibling is not a 2-node, move a key from the sibling to the left child;
+3. 
If the left child and its sibling are both 2-nodes, merge the left child, the smallest key in the parent, and the nearest sibling into a 4-node.
+

+
+In the end the smallest key lies in a 3-node or 4-node and can be removed directly. Then, on the way back up the tree, split any remaining temporary 4-nodes.
+

+
+### 6. Analysis
+
+The height of a red-black tree with N nodes is at most 2logN. The worst case occurs when, in the corresponding 2-3 tree, the leftmost path consists entirely of 3-nodes and all other nodes are 2-nodes.
+
+Most operations on a red-black tree take logarithmic time.
+
+## hash table
+
+A hash table is similar to an array: the hash value of a key can be thought of as an array index, so accessing a hash table entry is as fast as accessing an array element. A symbol table built on a hash table supports search and insert in constant time.
+
+Because the relative order of keys cannot be recovered from their hash values, a hash table cannot support ordered operations.
+
+### 1. Hash function
+
+For a hash table of size M, the hash function converts any key into a non-negative integer in [0, M-1]; this integer is the hash value.
+
+Collisions occur in a hash table: two different keys may have the same hash value.
+
+A hash function should satisfy the following three conditions:
+
+1. Consistency: equal keys must have equal hash values, where two keys are equal if equals() returns true for them.
+2. Efficiency: the computation should be cheap; if necessary, the hash value can be cached and returned directly when the hash function is called.
+3. Uniformity: the hash values of all keys should be evenly distributed over [0, M-1]. This condition is critical and directly affects the performance of the hash table.
+
+Modular hashing maps integers to [0, M-1]: for a positive integer k, k%M is a hash value in [0, M-1]. M should be a prime number; otherwise not all the information in the key is used. For example, if M is 10^k, only the last k digits of the key are used.
+
+Other kinds of numbers can be converted to integers and then hashed with modular hashing. 
For example, a floating-point number can be taken in its binary representation, and the integer value of that representation can then be hashed with modular hashing.
+
+For a key composed of several parts, the hash value of each part is computed and then combined so that every part contributes equally. The key can be viewed as an integer in base R, in which each part has a different weight.
+
+For example, a string hash function can be implemented as follows:
+
+```java
+int hash = 0;
+for (int i = 0; i < s.length(); i++)
+    hash = (R * hash + s.charAt(i)) % M;
+```
+
+As another example, the hash function of a custom class with several fields can be written as follows:
+
+```java
+int hash = (((day * R + month) % M) * R + year) % M;
+```
+
+The exact value of R is not very important; 31 is a common choice.
+
+hashCode() in Java serves as the hash function; the default implementation typically derives the value from the object's memory address. The result of hashCode() should be combined with modular hashing. Because hashCode() returns a 32-bit signed integer and a non-negative value is needed, the sign bit is masked off to obtain a 31-bit non-negative integer before taking the remainder.
+
+```java
+int hash = (x.hashCode() & 0x7fffffff) % M;
+```
+
+When using a built-in hash table such as Java's HashMap, it is enough to implement hashCode() (consistently with equals()) for the key type. The expectation is that hashCode() distributes keys evenly over the 32-bit integers; the hashCode() implementations of types such as String and Integer are designed to do so. The following shows how a custom type can implement hashCode().
+
+```java
+public class Transaction {
+    private final String who;
+    private final Date when;
+    private final double amount;
+
+    public int hashCode() {
+        int hash = 17;
+        hash = 31 * hash + who.hashCode();
+        hash = 31 * hash + when.hashCode();
+        hash = 31 * hash + ((Double) amount).hashCode();
+        return hash;
+    }
+}
+```
+
+### 2. 
Hash table with separate chaining
+
+Separate chaining resolves collisions by storing the keys that have the same hash value in a linked list. A search therefore takes two steps: first locate the linked list for the key's hash value, then search that list sequentially.
+
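The two-step search can be sketched as follows. `ChainingHashSketch` is a hypothetical class written for this illustration, assuming a fixed number of chains `m` and no resizing; it is not the implementation from the notes.

```java
// Hypothetical sketch: a hash table that resolves collisions with
// singly linked chains. The number of chains m is fixed.
final class ChainingHashSketch<K, V> {
    private static final class Node<K, V> {
        final K key;
        V val;
        final Node<K, V> next;

        Node(K key, V val, Node<K, V> next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    private final int m;
    private final Node<K, V>[] chains;

    @SuppressWarnings("unchecked")
    ChainingHashSketch(int m) {
        this.m = m;
        this.chains = (Node<K, V>[]) new Node[m];
    }

    // Mask off the sign bit, then reduce to [0, m-1]
    private int hash(K key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    // Step 1: pick the chain by hash value. Step 2: scan it sequentially.
    V get(K key) {
        for (Node<K, V> x = chains[hash(key)]; x != null; x = x.next) {
            if (x.key.equals(key)) return x.val;
        }
        return null;
    }

    void put(K key, V val) {
        int i = hash(key);
        for (Node<K, V> x = chains[i]; x != null; x = x.next) {
            if (x.key.equals(key)) {
                x.val = val;   // key already present: overwrite its value
                return;
            }
        }
        chains[i] = new Node<>(key, val, chains[i]);   // prepend to the chain
    }
}
```

With `m = 1` every key collides and the table degrades to a single linked list, which makes the cost of long chains easy to observe.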

+
+For N keys and M lists (N > M), if the hash function satisfies the uniformity condition, the length of each list tends to N/M, so the number of comparisons needed for a search miss or an insert is \~N/M.
+
+### 3. A hash table based on linear probing
+
+Linear probing uses the empty slots of the array to resolve collisions. When a collision occurs, the following slots are probed one by one until an empty slot is found, and the colliding key is stored there. With linear probing, the array size M must be greater than the number of keys N (M>N).
+

+
+```java
+public class LinearProbingHashST<Key, Value> {
+    private int N;        // number of key-value pairs
+    private int M = 16;   // size of the probing table
+    private Key[] keys;
+    private Value[] vals;
+
+    public LinearProbingHashST() {
+        init();
+    }
+
+    public LinearProbingHashST(int M) {
+        this.M = M;
+        init();
+    }
+
+    private void init() {
+        keys = (Key[]) new Object[M];
+        vals = (Value[]) new Object[M];
+    }
+
+    private int hash(Key key) {
+        return (key.hashCode() & 0x7fffffff) % M;
+    }
+}
+```
+
+#### 3.1 Find
+
+```java
+public Value get(Key key) {
+    for (int i = hash(key); keys[i] != null; i = (i + 1) % M) {
+        if (keys[i].equals(key)) {
+            return vals[i];
+        }
+    }
+    return null;
+}
+```
+
+#### 3.2 Insert
+
+```java
+public void put(Key key, Value val) {
+    int i;
+    for (i = hash(key); keys[i] != null; i = (i + 1) % M) {
+        if (keys[i].equals(key)) {
+            vals[i] = val;
+            return;
+        }
+    }
+    keys[i] = key;
+    vals[i] = val;
+    N++;
+    resize();
+}
+```
+
+#### 3.3 Delete
+
+After a key is deleted, every key that follows it in the same cluster must be reinserted into the table; simply setting the slot to null would make those keys unreachable.
+
+```java
+public void delete(Key key) {
+    if (!contains(key)) return;
+    int i = hash(key);
+    while (!key.equals(keys[i])) {
+        i = (i + 1) % M;
+    }
+    keys[i] = null;
+    vals[i] = null;
+    i = (i + 1) % M;
+    // Reinsert the rest of the cluster
+    while (keys[i] != null) {
+        Key keyToRedo = keys[i];
+        Value valToRedo = vals[i];
+        keys[i] = null;
+        vals[i] = null;
+        N--;
+        put(keyToRedo, valToRedo);
+        i = (i + 1) % M;
+    }
+    N--;
+    resize();
+}
+```
+
+#### 3.4 Adjusting the Array Size
+
+The cost of linear probing depends on the length of runs of consecutive occupied entries, which are called clusters. When clusters are long, search and insert need many probes.
+
+α = N/M is called the load factor. It has been proved that when α is less than 1/2, the expected number of probes is only between 1.5 and 2.5.
+

+
+To guarantee the performance of the hash table, the array size should be adjusted so that α stays within [1/4, 1/2].
+
+```java
+private void resize() {
+    if (N >= M / 2) resize(2 * M);
+    else if (N <= M / 8) resize(M / 2);
+}
+
+private void resize(int cap) {
+    LinearProbingHashST<Key, Value> t = new LinearProbingHashST<>(cap);
+    for (int i = 0; i < M; i++) {
+        if (keys[i] != null) {
+            t.put(keys[i], vals[i]);
+        }
+    }
+    keys = t.keys;
+    vals = t.vals;
+    M = t.M;
+}
+```
+
+Every time the array is resized, each key-value pair has to be reinserted into the hash table. From the point of view of amortized analysis, however, this cost is small. As the figure below shows, each time the array length doubles the cumulative average rises by about 1, because every key in the table has its hash value recomputed, and then the average falls again.
+
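The amortized argument can be checked by counting reinsertions. The sketch below is an illustration under stated assumptions: `ResizeCostSketch` is a hypothetical helper that models only the doubling branch of `resize()` (the table doubles as soon as N >= M/2 after an insert) and ignores shrinking and deletions.

```java
// Hypothetical sketch: count how many keys are reinserted in total when
// n keys are added to a table that doubles whenever N >= M / 2.
final class ResizeCostSketch {
    static long reinsertions(int n, int initialCapacity) {
        long moved = 0;
        int m = initialCapacity;
        for (int size = 1; size <= n; size++) {
            if (size >= m / 2) {
                moved += size;   // every key currently in the table is rehashed
                m *= 2;
            }
        }
        return moved;
    }
}
```

Each doubling rehashes as many keys as the previous doublings combined plus a constant, so the total stays below about twice the number of inserted keys, which is a constant extra cost per insert on average.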

+ +## Application + +### 1. Comparison of various symbol tables + +

+ +Priority should be given to hash tables, and red-black trees are used when ordered operations are required. + +### 2. Java Symbol Table Implementation + +Java's java.util.TreeMap and java.util.HashMap are symbol table implementations based on the red-black tree and zipper hash tables, respectively. + +### 3. Collection Type + +In addition to the symbol table, the collection type is often used. It only has no value for the key. You can use the collection type to store a series of keys and then determine if a key is in the collection. + +### 4. Sparse vector multiplication + +When the vector is a sparse vector, you can use the symbol table to store non-zero indices and values ​​in the vector, so that the multiplication operation only needs to be performed on those non-zero elements. + +```java +Import java.util.HashMap; + +Public class SparseVector { + Private HashMap hashMap; + + Public SparseVector(double[] vector) { + hashMap = new HashMap<>(); + For (int i = 0; i < vector.length; i++) { + If (vector[i] != 0) { + hashMap.put(i, vector[i]); + } + } + } + + Public double get(int i) { + Return hashMap.getOrDefault(i, 0.0); + } + + Public double dot(SparseVector other) { + Double sum = 0; + For (int i : hashMap.keySet()) { + Sum += this.get(i) * other.get(i); + } + Return sum; + } +} +``` -| First field name | diff --git a/notes-english/notes/HTTP.md b/notes-english/notes/HTTP.md index 7943e324..09eed38c 100644 --- a/notes-english/notes/HTTP.md +++ b/notes-english/notes/HTTP.md @@ -1,46 +1,52 @@ * [I. Basic Concepts] (#1 - Basic Concepts) -    * [Web infrastructure] (#web-based) -    * [URL] (#url) -    * [Request and Response Message] (#Request and Response Message) + * [Web infrastructure] (#web-based) + * [URL] (#url) + * [Request and Response Message] (#Request and Response Message) * [2. 
HTTP method] (#2 http-method) -    * [GET: Get Resources] (#get Get Resources) -    * [POST: transport entity body] (#post transport entity body) -    * [HEAD: Get Message Header] (#head Get Message Header) -    * [PUT: upload file] (#put upload file) -    * [PATCH: Partially modify the resource] (#patch partially modified the resource) -    * [DELETE: delete file] (#delete delete file) -    * [OPTIONS: query support method] (#options query support method) -    * [CONNECT: Requires Tunneling Protocol to Connect to Proxy] (#connect requires a tunneling protocol to connect to the proxy) -    * [TRACE: Tracking Path] (#trace Tracking Path) + * [GET] (#get) + * [POST](#post) + * [HEAD](#head) + * [PUT](#put) + * [PATCH](#patch) + * [DELETE] (#delete) + * [OPTIONS] (#options) + * [CONNECT](#connect) + * [TRACE](#trace) * [3. HTTP status code] (#3 http-status code) -    * [2XX Success] (#2xx-Success) -    * [3XX Redirect] (#3xx-Redirect) -    * [4XX Client Error] (#4xx - Client Error) -    * [5XX Server Error] (#5xx-Server Error) + * [2XX Success] (#2xx-Success) + * [3XX Redirect] (#3xx-Redirect) + * [4XX Client Error] (#4xx - Client Error) + * [5XX Server Error] (#5xx-Server Error) * [4. HTTP Header] (#4 http-header) -    * [Universal Header Fields] (#Common Header Field) -    * [Request header field] (#Request header field) -    * [Response header field] (# response header field) -    * [Entity Header Field] (#Entity Header Field) + * [Universal Header Fields] (#Common Header Field) + * [Request header field] (#Request header field) + * [Response header field] (# response header field) + * [Entity Header Field] (#Entity Header Field) * [5. 
Specific application] (#5 specific application) -    * [Cookie](#cookie) -    * [Cache] (#Cache) -    * [Permanent connection] (# persistent connection) -    * [Code] (#Code) -    * [Block Transfer] (#Block Transfer) -    * [Multipart Object Collection] (#Multipart Object Collection) -    * [Scope Request] (#Scope Request) -    * [Content negotiation] (# content negotiation) -    * [virtual host] (# virtual host) -    * [communication data forwarding] (# communication data forwarding) + * [Cookie] (#cookie) + * [Cache] (#Cache) + * [Permanent connection] (# persistent connection) + * [Code] (#Code) + * [Block Transfer] (#Block Transfer) + * [Multipart Object Collection] (#Multipart Object Collection) + * [Scope Request] (#Scope Request) + * [Content negotiation] (# content negotiation) + * [virtual host] (# virtual host) + * [communication data forwarding] (# communication data forwarding) * [six, https] (#six https) -    * [encryption] (#encryption) -    * [authentication] (#certification) -    * [Integrity] (#Integrity) -* [VII. 
Comparison of versions] (Comparison of #7 versions) -    * [The difference between HTTP/1.0 and HTTP/1.1] (The difference between #http10- and -http11-) -    * The difference between HTTP/1.1 and HTTP/2.0 (the difference between #http11- and -http20-) + * [encryption] (#encryption) + * [authentication] (#certification) + * [Integrity] (#Integrity) +* [seventh, Web attack technology] (#7 web-attack technology) + * [Attack Mode] (# Attack Mode) + * [Cross-site scripting attack] (# cross-site scripting attack) + * [SQL Injection Attack] (#sql-injection attack) + * [Cross Site Request Forgery] (# Cross Site Request Forgery) + * [Denial of Service Attack] (# Denial of Service Attack) +* [eight, each version comparison] (# eight versions compared) + * [The difference between HTTP/1.0 and HTTP/1.1] (The difference between #http10- and -http11-) + * The difference between HTTP/1.1 and HTTP/2.0 (the difference between #http11- and -http20-) * [Reference materials] (#reference materials) @@ -77,9 +83,15 @@ URIs contain URLs and URNs. Currently, WEBs only have URLs that are popular, so The **Request message sent by the client** The first line of the request contains the method field. -## GET: Get resources +## GET -## POST: Transport Entity Body +> Get resources + +Most of the current web requests use the GET method. + +## POST + +> Transmission entity body The main purpose of POST is not to obtain resources but to transfer data stored in content entities. @@ -97,13 +109,19 @@ Name1=value1&name2=value2 GET's pass arguments are less secure than POST because GET passed parameters are visible in the URL and may reveal private information. And GET only supports ASCII characters. If the parameter is Chinese, it may be garbled, and POST supports the standard character set. -## HEAD: Get Message Header +Another difference between GET and POST is that with the GET method, the browser sends the HTTP Header and Data together, and the server responds with 200 (OK) and returns the data. 
Using the POST method, the browser first sends a header. After the server responds to 100 (Continue), the browser sends the data again. Finally, the server responds with 200 (OK) and returns the data. + +## HEAD + +> Get message header Same as the GET method but does not return the body of the message entity. It is mainly used to confirm the validity of the URL and the date and time of the resource update. -## PUT: upload file +## PUT + +> Upload file Since no authentication mechanism is available, anyone can upload a file, so there is a security problem and this method is generally not used. @@ -116,9 +134,11 @@ Content-length: 16

New File

``` -## PATCH: Partially Modifying Resources +## PATCH -PUT can also be used to modify resources, but can only completely replace the original resource, PATCH allows partial modification. +> Partially modify the resource + +PUT can also be used to modify resources, but can only completely replace the original resources, PATCH allows partial modification. ```html PATCH /file.txt HTTP/1.1 @@ -130,7 +150,9 @@ Content-Length: 100 [description of changes] ``` -## DELETE: Delete Files +## DELETE + +> Delete file Contrary to the PUT function, it also does not have an authentication mechanism. @@ -138,15 +160,19 @@ Contrary to the PUT function, it also does not have an authentication mechanism. DELETE /file.html HTTP/1.1 ``` -## OPTIONS: Query Support Methods +## OPTIONS + +> Query Support Methods Query the method that the specified URL can support. Will return something like Allow: GET, POST, HEAD, OPTIONS. -## CONNECT: Requires Tunneling Protocol Connection Agent +## CONNECT -The requirement is to set up a tunnel when the proxy server communicates, and use SSL (Secure Sockets Layer, Secure Sockets) and TLS (Transport Layer Security) protocols to encrypt the communication content and tunnel it over the network. +> Requires tunneling protocol to connect proxy + +The requirement is to set up a tunnel when the proxy server communicates, encrypt the communication content using the SSL (Secure Sockets Layer, Secure Sockets) and TLS (Transport Layer Security) protocols, and then tunnel over the network. ```html CONNECT www.example.com:443 HTTP/1.1 @@ -154,11 +180,13 @@ CONNECT www.example.com:443 HTTP/1.1

-## TRACE: Tracking Path +## TRACE + +> Tracking path The server will return the communication path to the client. -When sending a request, enter the value in the Max-Forwards header field, subtract 1 from each server, and stop transmission when the value is 0. +When sending a request, enter the value in the Max-Forwards header field, decrement each time a server passes, and stop transmission when the value is zero. TRACE is not usually used, and it is vulnerable to XST attacks (Cross-Site Tracing), so it will not be used. @@ -169,7 +197,7 @@ TRACE is not usually used, and it is vulnerable to XST attacks (Cross-Site Traci The first line of status in the **response message** returned by the server contains the status code and the reason phrase used to inform the client of the requested result. | Status Code | Categories | Reason Phrases | -| --- | --- | --- | +| :---: | :---: | :---: | | 1XX | Informational (Informational Status Code) | Receiving requests are being processed | | 2XX | Success (Success Status Code) | Request Normal Processing Complete | | 3XX | Redirection | Additional actions required to complete the request | @@ -190,13 +218,13 @@ The first line of status in the **response message** returned by the server cont - **302 Found** : Temporary Redirection -- **303 See Other** : Same as the 302, but 303 explicitly requires that the client should use the GET method to get the resource. +- **303 See Other** : Same function as 302, but 303 explicitly requires that the client should use the GET method to get the resource. -- Note: Although the HTTP protocol specifies that the POST method should not be changed to the GET method when redirected in the 301 or 302 state, most browsers redirect the POST method to the GET method in the 301, 302, and 303 state redirection. 
+- Note: Although the HTTP protocol specifies that the POST method should not be changed to the GET method when redirecting in the 301 or 302 state, most browsers redirect the POST method to the GET method in the 301, 302, and 303 state redirection. - ** 304 Not Modified** : If the request packet header contains some conditions, such as: If-Match, If-ModifiedSince, If-None-Match, If-Range, If-Unmodified-Since, but does not meet the conditions, then The server returns a 304 status code. -- **307 Temporary Redirect** : Temporary redirect, similar to the meaning of 302, but 307 requires the browser not to change the redirect request's POST method to the GET method. +- **307 Temporary Redirect** : Temporary redirect, similar to the meaning of 302, but 307 requires that the browser does not change the POST method of the redirect request to the GET method. ## 4XX Client Error @@ -224,4 +252,469 @@ The various header fields and their meanings are as follows (do not need to be r ## Common Header Fields -| First field name | +| First field name | Description | +| :--: | :--: | +| Cache-Control | Controlling Cache Behavior | +| Connection | Control header fields that are no longer forwarded to the agent, manage persistent connections | +Date | Date the message was created | +| Pragma | Message instructions | +Trailer | List of Heads of Message Ends | +| Transfer-Encoding | Specifies the transmission encoding of the message body | +Upgrade | Upgrade to Other Agreements | +| Via | Proxy Server Information | +| Warning | Error Notification | + +## request header fields + +| First field name | Description | +| :--: | :--: | +| Accept | User-agent-processable media types | +| Accept-Charset | Priority character set | +| Accept-Encoding | Priority Content Encoding | +| Accept-Language | Preferred Language (Natural Language) | +| Authorization | Web Authentication Information | +| Expect | Expected Server Specific Behavior | +| From | User's email address | +| Host | Requested 
Resource Server | +| If-Match | Compare Entity Tag (ETag) | +| If-Modified-Since | Compare Resources Updated | +| If-None-Match | Compare Entity Tag (as opposed to If-Match) | +| If-Range | Scope Request for Send Entity Byte When Resource is Not Updated | +| If-Unmodified-Since | Compare resource update time (opposite If-Modified-Since) | +Max-Forwards | Maximum number of hops per transmission | +Proxy-Authorization | Proxy Server Requirements Client Authentication Information | +Range | Entity byte range request | +Referer | The original source of the URI in the request | +| TE | Transmission Coding Priority | +User-Agent | HTTP Client Program Information | + +## Response header field + +| First field name | Description | +| :--: | :--: | +Accept-Ranges | Accepts byte range requests | +| Age | Estimated resource creation elapsed time | +| ETag | Resource Matching Information | +| Location | Redirect the client to the specified URI | +Proxy-Authenticate | Proxy server-to-client authentication | +| Retry-After | Opportunity to Re-initiate Request | +Server | HTTP Server Installation Information | +Vary | Proxy Server Cache Management Information | +WWW-Authenticate | Server-to-Client Authentication | + +## Entity header fields + +| First field name | Description | +| :--: | :--: | +Allow | Resources Supported HTTP Methods | +| Content-Encoding | Encoding of the entity body | +Content-Language | Natural Language for Entity Subjects | +| Content-Length | Size of entity body | +| Content-Location | URI to Replace Corresponding Resource | +| Content-MD5 | Entity Body Message Summary | +| Content-Range | Entity Body Location Range | +| Content-Type | Entity Body Media Type | +| Expires | Date and time when the entity expired | +| Last-Modified | Last Modified Date | + +#5. Application + +## Cookie + +The HTTP protocol is stateless, mainly to make the HTTP protocol as simple as possible, so that it can handle a large number of transactions. 
HTTP/1.1 introduces cookies to save state information. + +A cookie is data that the server sends to the client. The data is stored in the browser and is included in the next request. Cookies allow the server to know if two requests are from the same client, enabling the ability to stay logged in. + +### 1. Creating Process + +The response packet sent by the server contains the Set-Cookie field. After the client obtains the response packet, the cookie content is saved to the browser. + +```html +HTTP/1.0 200 OK +Content-type: text/html +Set-Cookie: yummy_cookie=choco +Set-Cookie: tasty_cookie=strawberry + +[page content] +``` + +When the client sends a request later, it reads the cookie value from the browser and includes the cookie field in the request message. + +```html +GET /sample_page.html HTTP/1.1 +Host: www.example.org +Cookie: yummy_cookie=choco; tasty_cookie=strawberry +``` + +### 2. Set-Cookie + +| Properties | Description | +| :--: | -- | +NAME=VALUE | Name and Value Assigned to Cookies (Required) | +Expires=DATE | The expiration date of the cookie (if not explicitly specified, the default is until the browser is closed) | +Path=PATH | Use the file directory on the server as the cookie's applicable object (If not specified, the default is the file directory where the document is located) | +|domain=domain name | Domain name that is the target of the cookie (If not specified, it defaults to the domain name of the server that created the cookie) | +| Secure | Sending Cookies Only When Securely Communicating Over HTTPS | +HttpOnly | restricts cookies from access by JavaScript scripts | + +### 3. Differences between Session and Cookie + +Session is a means by which the server tracks users. Each Session has a unique identifier: Session ID. When the server creates a Session, the response packet sent to the client contains the Set-Cookie field. There is a key-value pair called sid. This key-value pair is the Session ID. 
After receiving the cookie, the client saves the cookie in the browser, and the request packet sent later contains the Session ID. HTTP is used to implement tracking of user status through Session and Cookie. Session is used on the server and Cookie is used on the client. + +### 4. The browser disables cookies + +URL rewriting technology is used, with sid=xxx appended to the URL. + +### 5. Use cookies to automatically fill in user names and passwords + +The website script automatically reads the username and password from the cookies stored in the browser for automatic filling. + +## Cache + +### 1. Advantages + +1. Reduce the burden on the server; +2. Increase the speed of response (cache resources are closer to the client than resources on the server). + +### 2. How to achieve it + +1. Let the proxy server cache; +2. Have the client browser cache it. + +### 3. Cache-Control field + +HTTP controls the cache through the Cache-Control header field. + +```html +Cache-Control: private, max-age=0, no-cache +``` + +### 4. no-cache directive + +The command appears in the Cache-Control field of the request packet, indicating that the cache server needs to verify with the original server whether the cache resource expires. + +The command appears in the Cache-Control field of the response message, indicating that the cache server needs to verify the validity of the cache resource before caching. + +### 5. no-store directive + +This instruction indicates that the cache server cannot cache any part of the request or response. + +No-cache does not mean not to cache, but it needs to be verified before caching, and no-store is not cached. + +### 6. The max-age directive + +This instruction appears in the Cache-Control field of the request message. If the cache time of the cache resource is less than the time specified by the instruction, the cache can be accepted. 
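The Cache-Control directives described above can be read out of a header value with a few lines of Python. This is only a minimal sketch: the helper names `parse_cache_control`, `may_store`, and `must_revalidate_first` are hypothetical, and quoted arguments that contain commas are deliberately not handled.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict.

    Directives without an argument (no-cache, no-store, private) map to None.
    Simplification: quoted arguments containing commas are not supported.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(directives):
    """no-store forbids keeping any part of the request or response."""
    return "no-store" not in directives


def must_revalidate_first(directives):
    """no-cache permits storing, but requires validation with the origin server before reuse."""
    return "no-cache" in directives


if __name__ == "__main__":
    parsed = parse_cache_control("private, max-age=0, no-cache")
    print(parsed, may_store(parsed), must_revalidate_first(parsed))
```

The two predicates mirror the distinction drawn in the text: no-cache still allows a stored copy, while no-store does not.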
+ +This directive appears in the Cache-Control field of the response message and indicates how long the resource may be kept in the cache server before it is considered stale. + +The Expires field can also be used to tell the cache server when the resource will expire. In HTTP/1.1, the Cache-Control: max-age directive takes precedence; in HTTP/1.0, the Cache-Control: max-age directive is ignored. + +## Persistent connection + +When a browser accesses an HTML page containing multiple pictures, it requests the picture resources in addition to the HTML page itself. If the TCP connection were closed after every HTTP exchange, the overhead of establishing and tearing down connections would be very high. A persistent connection needs only one TCP connection to carry multiple HTTP exchanges. + +

+ +Persistent connections are managed with the Connection header field. Starting with HTTP/1.1, connections are persistent by default; to close the TCP connection, the client or the server sends Connection: close. Before HTTP/1.1, connections are non-persistent by default; to keep a connection open, use Connection: Keep-Alive. + +**Pipelining** sends multiple requests on one connection without waiting for each response before sending the next request. + +
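The effect of a persistent connection can be observed with the Python standard library. The sketch below is illustrative only: it starts a throwaway local server, and the handler name `KeepAliveHandler`, the function `fetch_twice`, and the request paths are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Content-Length tells the client where the body ends, so the
        # connection can be reused for the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_twice():
    """Send two GET requests over one TCP connection.

    Returns a list of (status, local socket address) pairs, one per request.
    """
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    try:
        results = []
        for path in ("/page.html", "/picture.png"):
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body before reusing the connection
            results.append((resp.status, conn.sock.getsockname()))
        return results
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_twice())
```

If both entries report the same local address and port, the second request travelled over the TCP connection opened for the first one.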

+ +## Encoding + +Content encoding is mainly used to compress the entity body. Commonly used encodings are gzip, compress, deflate, and identity, where identity means that no compression is performed. + +## Chunked transfer + +Chunked Transfer Coding splits the data into multiple chunks, allowing the browser to display a page gradually. + +## Multipart object collection + +A message body can contain multiple types of entities that are sent at the same time. The parts are separated by the delimiter defined in the boundary parameter, and each part can have its own header fields. + +For example, when submitting a form that includes a file upload, you can use the following method: + +```html +Content-Type: multipart/form-data; boundary=AaB03x + +--AaB03x +Content-Disposition: form-data; name="submit-name" + +Larry +--AaB03x +Content-Disposition: form-data; name="files"; filename="file1.txt" +Content-Type: text/plain + +... contents of file1.txt ... +--AaB03x-- +``` + +## Range request + +If the network is interrupted, the client may have received only part of the data. A range request lets the client request only the part that it has not yet received, so the server does not have to resend all the data. + +Add the Range field to the request header and specify the requested range, for example Range: bytes=5001-10000. The server responds with the 206 Partial Content status if the request succeeds. + +```html +GET /z4d4kWk.jpg HTTP/1.1 +Host: i.imgur.com +Range: bytes=0-1023 +``` + +```html +HTTP/1.1 206 Partial Content +Content-Range: bytes 0-1023/146515 +Content-Length: 1024 +... +(binary content) +``` + +## Content negotiation + +Content negotiation returns the most appropriate content, for example a Chinese or an English interface depending on the default language of the browser. + +The following header fields are involved: Accept, Accept-Charset, Accept-Encoding, Accept-Language, Content-Language. 
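As an illustration of content negotiation, a server could rank the languages in an Accept-Language header by their q weights and pick the first one it can serve. The following is a simplified sketch, not a complete implementation of the matching rules in the HTTP specification; the function names `parse_accept_language` and `choose_language` and the default of `"en"` are assumptions made for this example.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language value, most preferred first.

    Tags with q=0 are dropped because q=0 means "not acceptable".
    """
    ranked = []
    for index, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # default weight when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as not acceptable
        # Sort by descending quality, then by original position.
        ranked.append((-quality, index, tag.strip().lower()))
    ranked.sort()
    return [tag for neg_quality, _, tag in ranked if neg_quality < 0]


def choose_language(header, available, default="en"):
    """Pick the first available language accepted by the client, else the default."""
    for tag in parse_accept_language(header):
        if tag == "*":
            return default
        for candidate in available:
            lowered = candidate.lower()
            # "zh" in the header matches both "zh" and "zh-CN" on the server.
            if lowered == tag or lowered.startswith(tag + "-"):
                return candidate
    return default


if __name__ == "__main__":
    print(choose_language("zh-CN,zh;q=0.9,en;q=0.8", ["en", "zh"]))
```

A real server would normally also echo the chosen language in Content-Language, one of the header fields listed above.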
+ +## Virtual Hosts + +Using virtual host technology, a single server can serve multiple domain names and can logically be viewed as multiple servers. + +## Communication Data Forwarding + +### 1. Proxy + +The proxy server accepts the client's request and forwards it to other servers. + +The proxy server is generally transparent and does not change the URL. + +The main purposes of using a proxy are caching, network access control, and access logging. + +

+ +### 2. Gateway + +Unlike proxy servers, gateway servers translate HTTP to other protocols for communication, requesting services from other non-HTTP servers. + +

+ +### 3. Tunnel + +Use encryption methods such as SSL to establish a secure communication line between the client and the server. + +

+ +#6. HTTPS + +HTTP has the following security issues: + +1. Communication is in clear text, so the content may be eavesdropped; +2. The identity of the communicating party is not verified, so it may be impersonated; +3. The integrity of the message cannot be proved, so the message may be tampered with. + +HTTPS is not a new protocol: HTTP first communicates with SSL (Secure Sockets Layer), and SSL then communicates with TCP. By using SSL, HTTPS provides encryption, authentication, and integrity protection. + +## Encryption + +There are two types of encryption: symmetric key encryption and public key encryption. Symmetric key encryption uses the same key for encryption and decryption, while public key encryption uses a pair of keys, the public key and the private key. The public key is available to everyone. After the sender obtains the receiver's public key, it encrypts the content with that public key, and the receiver decrypts the received content with the private key. + +The disadvantage of symmetric key encryption is that the key cannot be transmitted securely; the disadvantage of public key encryption is that it is relatively time-consuming. + +HTTPS employs a **hybrid encryption mechanism** that uses public key encryption to transmit the symmetric key and then uses symmetric key encryption for communication. (In the figure below, the shared key is the symmetric key) + +

+ +## Certification + +A **certificate** is used to authenticate the communicating party. + +A digital certificate authority (CA) is a third party trusted by both the client and the server. The operator of the server submits its public key to the CA. After verifying the identity of the applicant, the CA digitally signs the submitted public key, then places the public key and the signature together in a public key certificate, which binds them to each other. + +When performing HTTPS communication, the server sends the certificate to the client. After the client obtains the public key from the certificate, it first verifies it using the CA's digital signature. If the verification succeeds, communication can start. + +In addition to the server certificates described above, there are client certificates. The purpose of a client certificate is to let the server authenticate the client. The client certificate needs to be installed by the user, so it is used only when the business requires very high security, such as online banking. + +Using the open-source OpenSSL program, anyone can set up their own certificate authority and issue their own server certificates. When a browser accesses a server that uses such a self-issued certificate, a warning message such as "Unable to confirm connection security" or "There is a problem with the site's security certificate" is displayed. + +## Integrity + +SSL provides digest functionality to verify integrity. + +#7. Web Attack Technology + +The main targets of Web attacks are Web applications that use the HTTP protocol. + +## Attack modes + +### 1. Active attack + +The attacker attacks the server directly. Typical examples are SQL injection and OS command injection. + +### 2. Passive attack + +The attacker sets a trap that makes users send HTTP requests containing attack code. After the user sends the HTTP request, personal information such as cookies is leaked. Typical examples are cross-site scripting and cross-site request forgery. 
+ +## Cross-Site Scripting Attacks + +### 1. Concept + +(Cross-Site Scripting, XSS), which injects code into the web pages the user is browsing. This code contains HTML and JavaScript. By exploiting vulnerabilities left in the development of web pages, malicious instructions are injected into web pages through clever methods to allow users to load and execute malicious web pages created by attackers. After the attack is successful, the attacker may be given higher rights (such as performing some operations), private web content, conversations, and cookies. + +### 2. Harmful + +- Forging false input forms to cheat personal information +- Stealing user's cookie value +- Display forged articles or pictures + +### 3. Preventive measures + +**(1) Filter special characters** + +Many languages ​​provide filtering of HTML: + +- PHP's htmlentities() or htmlspecialchars(). +- Python's cgi.escape(). +- Java's xssprotect (Open Source Library). +- The node-validator of Node.js. + +** (b) Specifying HTTP Content-Type** + +In this way, you can prevent content from being parsed as HTML. For example, the PHP language can use the following code: + +```php + +``` + +## SQL Injection Attack + +### 1. Concept + +The database on the server runs an illegal SQL statement. + +### 2. Attack Principle + +For example, a website login verification SQL query code is: + +```sql +strSQL = "SELECT * FROM users WHERE (name = '" + userName + "') and (pw = '"+ passWord +"');" +``` + +If you fill in the following: + +```sql +userName = "1' OR '1'='1"; +passWord = "1' OR '1'='1"; +``` + +Then the SQL query string is: + +```sql +strSQL = "SELECT * FROM users WHERE (name = '1' OR '1' = '1') and (pw = '1' OR '1' = '1');" +``` + +You can execute the following query without verifying the pass: + +```sql +strSQL = "SELECT * FROM users;" +``` + +### 3. Hazards + +- The data in the data sheet is leaked, such as personal confidential data, account data, passwords, etc. 
+- The data structure was hacked to make further attacks (eg SELECT * FROM sys.tables). +- The database server was attacked and the system administrator account was tampered with (eg ALTER LOGIN sa WITH PASSWORD='xxxxxx'). +- After gaining access to the system, malicious links, malicious code, and XSS may be added to the web page. +- Operating system support provided by the database server allows hackers to modify or control the operating system (for example, xp_cmdshell "net stop iisadmin" can stop the server's IIS service). +- Destroys the hard disk data and saves the entire system (for example, xp_cmdshell "FORMAT C:"). + +### 4. Preventive measures + +- When designing an application, use a fully parameterized query to design data access capabilities. +- When combining SQL strings, use character substitution for the passed in parameters (single quote characters are replaced by two consecutive single quote characters). +- If you use PHP to develop web applications, you can also turn on the PHP Magic quote feature (automatically pass all web pages parameters, replace single quote characters with two consecutive single quote characters). +- Others, use other more secure ways to connect to SQL databases. For example, database connectivity components that have fixed SQL injection issues, such as ASP.NET's SqlDataSource object or LINQ to SQL. +- Use SQL anti-injection system. + +## Cross Site Request Forgery + +### 1. Concept + +(Cross-site request forgery, XSRF) is an attacker who deceives a user's browser through a number of technical means to access a self-certified website and perform some operations (such as sending e-mail, sending messages, or even property operations such as transfer and purchase). commodity). Since the browser has been authenticated, the visited website will be considered as a real user operation. 
This exploits a vulnerability in user authentication on the Web: Simple authentication only guarantees that the request was sent from a user's browser, but there is no guarantee that the request itself was made voluntarily by the user. + +XSS utilizes the user's trust in the specified website, and CSRF uses the website's trust in the user's web browser. + +If the address of a URL used by a bank to perform a transfer operation is: `http://www.examplebank.com/withdraw?account=AccoutName&amount=1000&for=PayeeName`. + +
Then, a malicious attacker can place code on another website that causes a visitor's browser to request that URL, for example by embedding it as the source of an image.

+ +If a user with an account named Alice visits a malicious site and she has just visited the bank shortly before and the login information has not expired, she will lose 1000 funds. + +This malicious URL can take many forms and hide in many places on the page. In addition, attackers do not need to control websites that have malicious URLs. For example, he can hide such addresses in forums, blogs, and other user-generated content sites. This means that if the server does not have appropriate defenses, users will be at risk of attack even if they visit familiar trusted websites. + +By way of example, it can be seen that an attacker cannot directly gain control of the user's account through the CSRF attack, nor can he directly steal any information from the user. What they can do is deceive the user's browser and let it perform operations on behalf of the user. + +### 2. Preventive measures + +**(a) Check the Referer field** + +There is a Referer field in the HTTP header that indicates which address the request came from. When processing sensitive data requests, the Referer field should normally be in the same domain as the requested address. + +This approach is simple, low-effort, and requires only one step of verification at key access points. However, this approach also has its limitations because it completely relies on the browser to send the correct Referer field. Although the HTTP protocol specifies the content of this field, it does not guarantee the specific implementation of the browser you are visiting, nor does it guarantee that the browser has no security flaws affecting this field. There is also the possibility that an attacker attacks some browsers and falsifies their Referer fields. 
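The Referer check described above can be sketched as a small predicate. This is an illustrative sketch under stated assumptions: the function name `referer_is_same_origin` is hypothetical, only the host name is compared, IPv6 literals in the Host header are not handled, and rejecting requests without a Referer is a policy choice of this example rather than something the text prescribes.

```python
from urllib.parse import urlsplit


def referer_is_same_origin(referer, request_host):
    """Return True only when the Referer header names the same host as the request.

    referer      -- value of the Referer header, or None if it was not sent
    request_host -- value of the Host header, e.g. "www.examplebank.com"
    """
    if not referer:
        return False  # assumption: a missing Referer is treated as suspicious
    try:
        parts = urlsplit(referer)
    except ValueError:
        return False  # unparsable Referer
    if parts.scheme not in ("http", "https") or parts.hostname is None:
        return False
    # Compare whole host names; a prefix or substring test would accept
    # hosts such as "www.examplebank.com.evil.example".
    expected = request_host.split(":")[0].lower()
    return parts.hostname.lower() == expected


if __name__ == "__main__":
    print(referer_is_same_origin("http://www.examplebank.com/account", "www.examplebank.com"))
    print(referer_is_same_origin("http://evil.example/page", "www.examplebank.com"))
```

As the text notes, this check is only as reliable as the Referer value the browser sends, so it is usually combined with another defense.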
+ +** (b) Add verification Token** + +Because the essence of CSRF is that an attacker tricks a user to access an address set by himself, if he requests the user's browser to provide data that is not stored in a cookie and the attacker cannot forge it as a check when accessing a request for sensitive data, then the attack The person can no longer perform CSRF attacks. This data is usually a data item in the form. The server generates it and attaches it to the form. Its content is a fake number. When the client submits the request through the form, the pseudo-random number is also submitted for verification. During normal access, the client browser can correctly obtain and return this pseudo-random number. In the deceptive attack from the CSRF, the attacker does not first learn the value of the pseudo-random number, and the server will check the token because of the verification token. The value is null or false, rejecting this suspicious request. + +## Denial of Service Attack + +### 1. Concept + +Denial-of-service attack (DoS), also known as flood attack, is designed to exhaust the network or system resources of the target computer and temporarily interrupt or stop the service, causing its normal users to lose access. + +(distributed denial-of-service attack, DDoS), where an attacker uses two or more compromised computers on the network as a "zombie" to launch a "denial of service" attack on a specific target. + +> [Wikipedia: Denial of Service Attack] (https://en.wikipedia.org/wiki/%E9%98%BB%E6%96%B7%E6%9C%8D%E5%8B%99%E6%94 %BB%E6%93%8A) + +#8. Comparison of versions + +Difference between ## HTTP/1.0 and HTTP/1.1 + +HTTP/1.1 added the following: + +- The default is long connection; +- Provides scope request function; +- Provides the functionality of a virtual host; +- More cache processing fields; +- More status codes. + +## Differences Between HTTP/1.1 and HTTP/2.0 + +### 1. 
Multiplexing + +HTTP/2.0 uses multiplexing to use the same TCP connection to handle multiple requests. + +### 2. Header compression + +The header of HTTP/1.1 carries a lot of information and is repeated every time. HTTP/2.0 requires that both parties to the communication buffer their own header table, thereby avoiding duplicate transmissions. + +### 3. Server push + +When the client requests a resource, the relevant resources are sent to the client together, and the client does not need to initiate the request again. For example, the client requests the index.html page and the server sends the index.js to the client. + +### 4. Binary format + +HTTP/1.1 resolution is text-based, and HTTP/2.0 is in binary format. + +# References + +- [Illustrated HTTP] (https://pan.baidu.com/s/1M0AHXqG9sP9Bxne6u0JK8A) +- [MDN: HTTP] (https://developer.mozilla.org/en-US/docs/Web/HTTP) +- [Wikipedia: Cross-site scripting] (https://zh.wikipedia.org/wiki/%E8%B7%A8%E7%B6%B2%E7%AB%99%E6%8C%87%E4%BB %A4%E7%A2%BC) +- [Wikipedia: SQL Injection Attack] (https://zh.wikipedia.org/wiki/SQL%E8%B3%87%E6%96%99%E9%9A%B1%E7%A2%BC%E6% 94%BB%E6% +93%8A) +- [Wikipedia: Cross Site Request Forgery] (https://zh.wikipedia.org/wiki/%E8%B7%A8%E7%AB%99%E8%AF%B7%E6%B1%82%E4% BC%AA%E9%80%A0) +- [Wikipedia: Denial of Service Attack] (https://zh.wikipedia.org/wiki/%E9%98%BB%E6%96%B7%E6%9C%8D%E5%8B%99%E6%94 %BB%E6%93%8A) + diff --git a/notes/剑指 offer 题解.md b/notes/剑指 offer 题解.txt similarity index 100% rename from notes/剑指 offer 题解.md rename to notes/剑指 offer 题解.txt