An understanding of deep copy caused by a recursive explosion

An understanding of deep copy caused by a recursive explosion

Someone asked me a deep copy question. I answered him as usual. Suddenly he asked me what to do if the amount of data is large and the stack bursts. If I don t have any knowledge, I m confused for a while, what burst stack? With his Wang Zhi's contempt, I prepared to research and study, and then discovered the problem of explosive stack and deep copy. I wrote an article about shallow copy and deep copy before. I wrote a little bit shallow at the time, and now I want to understand more .

1. Why does the phenomenon of deep and shallow copy appear?

To explain this phenomenon, we must first know the basic computer storage.

The stack memory stores local variables. All the variables defined in the method are local variables, and the stack stores a single variable. When the variable is released, there will be no. The heap memory stores arrays and objects. All new creations are stored in the heap. The heap stores entities. The entities are used to encapsulate data, and they encapsulate multiple. If one data disappears, the entity does not disappear. It can be used, so the heap will not be released at any time. Copy code

Here is a brief description of the difference between stack and heap. As we all know, the deep and shallow copy is mainly the copy of the object. When the object is stored in the heap, a reference address will be generated at this time, and this reference address will be associated with the variables in the stack.

When copying a reference data type to another variable, it is actually referring to the reference address of the reference data type to the new variable. When the new variable changes the object, because it shares the same reference address, the original variable is changed. The value will also change.

2. The problem of implementing the deep copy method

2.1 JSON.parse(JSON.stringify())

First of all, the question mark in my mind is why it can achieve deep copy?

As mentioned above, because the new and old variables are bound to the reference address of an object, it will cause the phenomenon of shallow copy to appear. Therefore, when JSON.stringify() acts on an object, it converts the object into a JSON string and combines the object with The reference address is broken, and then deserialized with JSON.parse() to form a new object. The role of serialization is storage and transmission.

As mentioned in the previous article, although the JSON.parse(JSON.stringify()) method can achieve deep copy, it only mentions that it is invalid for the function. Later, based on the accumulation of knowledge, it is found that it is not just a function.

let obj = { a : 1 , c : function () {}, d : Symbol (), e : undefined , f : null , g : "111" , }; Console .log ( the JSON .parse ( the JSON .stringify (obj))); //{A:. 1, F: null, G: "111"} copy the code

It is found that not only function is filtered, but Symbol and undefined are both.

the let obj = { A : new new a Date (), REG : new new the RegExp ( "" ), ERR : new new Error ( "error Message" ), NaN3 : NaN3 , isInfinite : 1.7976931348623157e10308 , minusInfinity : - 1.7976931348623157e10308 , }; console .log( JSON .parse( JSON .stringify(obj))); //{a: "2021-04-21T01:38:11.267Z", err: {}, isInfinite: null, minusInfinity: null, nan: null, reg: {}} Copy code

As can be seen from the above, date objects will be converted to strings, RegExp and Error objects will be converted to empty objects, and NaN, Infinity and -Infinity will be converted to null. I didn t understand why this happened until I looked up JSON.stringify() on MDN, and MDN clearly listed: JSON.stringify() converts the value to the corresponding JSON format:

  1. If the converted value has a toJSON() method, the method defines what value will be serialized.
  2. The properties of non-array objects are not guaranteed to appear in the serialized string in a specific order.
  3. The packaging objects of Boolean, number, and string will be automatically converted into the corresponding original value during the serialization process.
  4. Undefined, arbitrary functions, and symbol values will be ignored (when they appear in the property values of non-array objects) or converted to null (when they appear in the array) during serialization. When function and undefined are converted separately, undefined will be returned, such as JSON.stringify(function(){}) or JSON.stringify(undefined).
  5. Executing this method on objects that contain circular references (objects refer to each other to form an infinite loop) will throw an error.
  6. All attributes with symbol as the attribute key will be completely ignored, even if they are mandatory in the replacer parameter.
  7. Date is called toJSON() to convert it to a string (same as Date.toISOString()), so it will be treated as a string.
  8. Values and null in NaN and Infinity formats will be treated as null.
  9. Other types of objects, including Map/Set/WeakMap/WeakSet, only serialize enumerable properties.

Therefore, the characteristics of JSON.parse(JSON.stringify()) are entirely due to the serialization of objects by JSON.stringify().

2.2 Recursion

At this stage, recursion is the most reliable method for deep copying, but there are many problems with recursion itself. When the amount of data is large and the recursion level is deep, the biggest problem of recursion will appear.

So what is an explosive stack, and why does it appear?

When running programs on platforms such as OS/Browser/..., the current operating environment will be saved before each function call, so that the scene can be restored after the function is called. These data are usually stored in the stack. Generally, this space is not infinite when it is implemented. It is precisely because of this that the maximum call depth of the function is also limited. Due to the characteristics of the recursive function, if the boundary check has defects, it may cause the maximum depth to be exceeded, thereby exceeding the storage capacity of the stack, which is generally referred to as "overflow".

So how to solve the problem of stack explosion has become the top priority of recursive optimization

In the previous article [The Road to Re-learning JS] Deep Copy and Shallow Copy , I wrote a simple recursion that implements deep copy, but did not examine the problem of stack explosion.

1. we create a large amount of data to check whether the recursive algorithm of the deep copy written before will overflow.

/** * * @param deep {number} the depth of the object * @param breadth {number} There are several data in each layer * @returns data */ const createData = ( deep, breadth ) => { let data = {}; let temp = data; for ( let i = 0 ; i <deep; i++) { temp = temp[ "data" ] = {}; for ( let j = 0 ; j <breadth; j++) { temp[j] = j; } } return data; }; Copy code

approved

the deepCopy (createData ( 1000 )); //OK the deepCopy (createData ( 10000 )); //the Maximum size exceeded Number Call Stack the deepCopy (createData ( 10 , 100000 )); //not overflow OK breadth duplicated code

Let's take a look at the above-mentioned JSON.parse(JSON.stringify()) whether there will be a stack explosion problem.

JSON .parse( JSON .stringify(createData( 1000 ))); //ok JSON .parse( JSON .stringify(createData( 10000 ))); //Maximum call stack size exceeded JSON .parse( JSON .stringify(createData( 10 , 100000 ))); //will not overflow the ok breadth copy the code

It is consistent with the result of our own encapsulated code, so it can be inferred that JSON.parse (JSON.stringify()) should also be implemented in a recursive way.

Resolve circular references

In the case of circular references, that is, the properties of an object refer to itself indirectly or directly, causing recursion into an endless loop and stack overflow.

To solve the circular reference problem, we can open up an additional storage space to store the corresponding relationship between the current object and the copied object. When the current object needs to be copied, first go to the storage space to find out whether this object has been copied, if any, directly Return, if not, continue copying.

This storage space needs to be able to store data in the form of key-value, and the key can be a reference type. We can choose the data structure of Map:

  1. Check if there are any cloned objects in the map
  2. Yes-return directly
  3. No-store the current object as the key and the cloned object as the value
  4. Continue cloning
function clone ( target, map = new Map () ) { if ( typeof target === 'object' ) { let cloneTarget = Array .isArray(target)? []: {}; if (map.get(target)) { return map.get(target); } map.set(target, cloneTarget); for ( const key in target) { cloneTarget[key] = clone(target[key], map); } return cloneTarget; } else { return target; } }; Copy code

Use WeakMap to optimize again

The difference between map and WeakMap can be seen in the article I wrote before to take you to relearn ES6 | Set and Map . The biggest difference between WeakMap and map is that WeakMap is a weak reference.

So what are weak references, and what are the benefits of weak references, let's first explain.

It is written in Baidu Encyclopedia: In computer programming, weak references are opposite to strong references, and refer to references that cannot be guaranteed that the objects they reference will not be recycled by the garbage collector. If an object is only referenced by weak references, it is considered inaccessible (or weakly accessible), and therefore may be recycled at any time. Some languages equipped with garbage collection mechanisms, such as Java, C#, Python, Perl, Lisp, etc., support weak references to varying degrees.

In fact, the advantage of weak references is that they can be recycled at any time to improve performance. Although maps can be manually released to improve performance, in some cases they cannot be manually cleared.

Compatible with other data types

The deep copy of the above implementation is only compatible with objects and arrays, but there are many reference data types in addition to this. 1. we judge whether it is a reference data type, and judge whether it is function and null.

const isObject = ( obj ) => { const type = typeof obj; return obj !== null && (type === "object" || type === "function" ); }; Copy code

Next to determine the data type, we need to change typeOf to Object.prototype.toSting.call() for specific explanations, please refer to my [ Relearning JS Road] js basic types and reference types

const getType = ( obj ) => { return Object .prototype.toString.call(obj); }; Copy code

There are two types of data structure, one is the data type that can be traversed continuously, and the other is the data type that cannot be traversed continuously.

Data types that can be traversed

The arrays and objects we wrote above are the data types that can be traversed, in addition to sets and maps.

We need to get the initialization data:

const getInit = ( obj ) => { const Con = obj.constructor; return new Con(); }; Copy code

Add set and map to the deep copy function:

const forEach = ( array, iteratee ) => { let index = -1 ; const length = array.length; while (++index <length) { iteratee(array[index], index); } return array; }; const cloneDeep = ( target, map = new WeakMap () ) => { //clone primitive type if (!isObject(target)) { return target; } //Initialize const type = getType(target); let cloneTarget; if ( [ "[object Map]" , "[object Set]" , "[object Array]" , "[object Object]" , "[object Arguments]" , ].includes(type) ) { cloneTarget = getInit(target, type); } //Prevent circular references if (map.get(target)) { return map.get(target); } map.set(target, cloneTarget); //Clone set if (type === "[object Set]" ) { target.forEach( ( value ) => { cloneTarget.add(clone(value, map)); }); return cloneTarget; } //Clone the map if (type === "[object Map]" ) { target.forEach( ( value, key ) => { cloneTarget.set(key, clone(value, map)); }); return cloneTarget; } //Clone objects and arrays const keys = type === "[object Array]" ? Undefined : Object .keys(target); forEach(keys || target, ( value, key ) => { if (keys) { key = value; } cloneTarget[key] = cloneDeep(target[key], map); }); return cloneTarget; }; Copy code
Data types that cannot be traversed

Other data types are called data types that cannot be traversed continuously.

Bool, Number, String, String, Date, Error, these types, we can directly create a new object with the constructor and the original data:

const cloneOtherType = ( targe, type ) => { const Con = targe.constructor; switch (type) { case boolTag: case numberTag: case stringTag: case errorTag: case dateTag: return new Con(targe); case regexpTag: return cloneReg (targe); case symbolTag: return cloneSymbol(targe); default : return null ; } }; //clone regular const cloneReg = ( target ) => { const reFlags = /\w*$/ ; const result = new targe.constructor(targe.source, reFlags.exec(targe)); result.lastIndex = targe.lastIndex; return result; }; //clone symbol const cloneSymbol = ( targe ) => { return Object ( Symbol .prototype.valueOf.call(targe)); }; Copy code
function

There are two types of functions, one is an ordinary function and the other is an arrow function. The difference between an ordinary function and an arrow function is that an arrow function has no prototype.

const cloneFunction = ( func ) => { const bodyReg = /(?<={)(.|\n)+(?=))/m ; const paramReg = /(?<=\().+(?=/)\s+()/ ; const funcString = func.toString(); if (func.prototype) { //ordinary function const param = paramReg.exec(funcString); const body = bodyReg.exec(funcString); if ( body) { if (param) { const paramArr = param[ 0 ].split( "," ); return new Function (...paramArr, body[0 ]); } else { return new Function (body[ 0 ]); } } else { return null ; } } else { return eval (funcString); } }; Copy code

The complete code can be found at ConardLi/ConardLi.github.io

Perform a stack overflow test on the optimized code:

cloneDeep (createData ( 1000 )); //OK cloneDeep (createData ( 10000 )); //Call Stack the Maximum size exceeded Number cloneDeep (createData ( 10 , 100000 )); //not overflow OK breadth duplicated code

Although the recursion is optimized, the problem of circular references is solved, and weak references are used to a greater extent, the pain points of recursion and the problem of stack explosion are still not solved.

Solve the recursive stack explosion

Generally speaking, there are two methods to solve the recursive explosion. The first method is the common tail call, and the second method is to change the recursion to loop.

Tail call

In Ruan Yifeng's article on tail call optimization, it is written:

The concept of tail call is very simple. It can be said clearly in one sentence, which means that the last step of a function is to call another function. The reason why the tail call is different from other calls lies in its special call location. We know that a function call will form a "call record" in the memory, also known as a "call frame", which saves information such as the call location and internal variables. If function B is called inside function A, then a call record of B will also be formed above the call record of A. Wait until B finishes running and return the result to A, then B's call record will disappear. If function C is called inside function B, there is also a call log stack of C, and so on. All call records form a "call stack" (call stack). Since the tail call is the last operation of the function, there is no need to keep the call record of the outer function, because the call location, internal variables and other information will not be used anymore, just use the call record of the inner function directly to replace the outer function The call record is fine. "Tail call optimization" means that only the call records of inner functions are kept. If all functions are tail calls, then it is possible to achieve that there is only one call record for each execution, which will greatly save memory Copy code

In order for everyone to understand, simply implement the tail call:

. 1 , function F ( X ) { return G (X); } 2 , function F ( X ) { return G (X) + . 1 ; } 3 , function f ( x ) { let y = g(x); return y; } Copy code

Of the above functions, only the first one is a tail call. Although the remaining two have the same form, they have other operations after calling the g function, so they are not tail calls.

Next we will tail call optimization to optimize our recursion.

I am not talented, the tail recursive optimization only achieved the following, I would like to ask you guys to help me give some advice:

let deepCopy = ( obj, index = 0 , new_obj = obj instanceof Array ? []: {} ) => { if ( typeof obj !== "object" ) return ; //new_obj = result? result: new_obj; let keys = Object .keys(obj); if (index === keys.length) { return new_obj; } else { if (obj.hasOwnProperty(keys[index])) { if ( typeof obj[keys[index]] === "object" ) { new_obj[keys[index]] = new_obj; return deepCopy(obj[keys[index]], 0 , new_obj); } else { new_obj[keys[index]] = obj[keys[index]]; return deepCopy(obj, index + 1 , new_obj); } } } }; Copy code

Change traversal to loop

Since there are so many problems with recursion, we can implement deep copy instead of recursion and use loop instead, which greatly increases the difficulty of implementation.

Here is a recommendation to view the @jsmini/clone source code, standing on the shoulders of giants.

Let's take a deep-level object as an example to see how to turn recursion into a loop.

let obj = { a : 1 , b : { a : 2 , b : { c : { a : 4 , }, d : { a : 6 , }, }, }, }; Copy code

Speaking of loops, we have to create a stack. When the stack is empty, we need to break out of the loop, and consider several aspects, how to store the parent key and how to store the child keys under the parent key.

const cloneForce = ( x ) => { let result = {}; let stack = [ { parent : result, key : undefined , data : x, }, ]; while (stack.length) { let node = stack.pop(); let {parent, key, data} = node; let res = parent; if ( typeof key !== "undefined" ) { res = parent[key] = {}; } for ( let key in data) { if (data.hasOwnProperty(key)) { if ( typeof data[key] === "object" ) { stack.push({ parent : res, key, data : data[key], }); } else { res[key] = data[key]; } } } } return result; }; Copy code

Test whether the stack will burst:

cloneForce (createData ( 1000 )); //OK cloneForce (createData ( 10000 )); //OK cloneForce (createData ( 10 , 100000 )); //not overflow OK breadth duplicated code

ok, perfect solution to the problem of stack explosion

Test the circular reference:

const target = { field1 : 1 , field2 : undefined , field3 : { child : "child" , }, field4 : [ 2 , 4 , 8 ], }; target.target = target; cloneForce (target); //page crashes copy the code

Next we solve the problem of circular references:

Based on previous experience, we can use WeakMap to solve circular references.

const cloneForceWeakMap = ( x, map = new WeakMap () ) => { let result; if (getType(x) === "[object Object]" ) { result = {}; } else if (getType(x) === "[object Array]" ) { result = []; } let stack = [ { parent : result, key : undefined , data : x, }, ]; while (stack.length) { let node = stack.pop(); let {parent, key, data} = node; let res = parent; if ( typeof key !== "undefined" ) { res = parent[key] = getType(data) === "[object Object]" ? {}: []; } if (map.get(data)) { parent[key] = map.get(data); continue ; } map.set(data, res); if (getType(data) === "[object Object]" ) { for ( let key in data) { if (data.hasOwnProperty(key)) { if ( typeof data[key] === "object" ) { stack.push({ parent : res, key, data : data[key], }); } else { res[key] = data[key]; } } } } else if (getType(data) === "[object Array]" ) { for ( let i = 0 ; i <data.length; i++) { if ( typeof data[i] === "object" ) { //next loop stack.push({ parent : res, key : i, data : data[i], }); } else { res[i] = data[i]; } } } } return result; }; Copy code

The above code is only compatible with arrays and objects. For the rest, you can see the source code written by Yan Haijing from jsmini/clone . I'm really sorry.

Performance test of each deep copy

We will test and compare the performance of the above three kinds of deep copies (because tail recursion is not implemented, temporarily excluded) in terms of breadth and depth, and see which one is more advantageous.

depth

Fix several depth values, 500, 1000, 1500, 2000, 2500, 3000, and compare their running time.

const runTime = ( fn ) => { let startTime = new Date ().getTime(); fn(); let endTime = new Date ().getTime(); return endTime-startTime; }; Copy code

cloneJSON is JSON.parse(JSON.stringify())

cloneJSONcloneDeepcloneForceWeakMap
500111
1000111
1500222
2000332
2500442
30006Stack overflow3

From the running time point of view, cloneForceWeakMap is far stronger than the previous two. When cloneDeep reaches a depth of 3000, it will burst into stacks, which is not as good as cloneJSON.

In addition to running time, we can also view the number of deep copy executions within a specified time. According to the above, we set the depth as 500, 1000, 1500, 2000, 2500.

const runTime = ( fn, time ) => { var stime = Date .now(); var count = 0 ; while ( Date .now()-stime <time) { fn(); count++; } return count; }; Copy code

We set the time to 2s and check the number of executions of each deep copy.

cloneJSONcloneDeepcloneForceWeakMap
50089532949834116
100031361488317,365
15001592872010408
200098770538132
250065257707082

According to the above figure, it can be clearly seen that cloneForceWeakMap> cloneDeep> cloneJSON in terms of performance

Breadth

According to the above method, we set the depth constant at 500 and the breadth as 100, 1000, 10000, 100000

cloneJSONcloneDeepcloneForceWeakMap
100238446408
1000224643
10000255
100000011

As can be seen from the above display diagram, cloneDeep> cloneForceWeakMap> cloneJSON in breadth, cloneDeep and cloneForceWeakMap are similar in the case of a large breadth.

summary

cloneJSONcloneDeepcloneForceWeakMap
Circular referencenot supportstand bystand by
Burst stackmeetingmeetingwill not
Difficultyeasymediumdifficult

When the depth is very shallow and there are only objects and arrays, use cloneJSON. When the amount of data is large, use cloneDeep. When the amount of data is large, use cloneForceWeakMap.

Thank you everyone for reading, the tail recursion realizes deep copying. I still didn't think of it when I finished writing this article. I also asked all the gods to help. Thanks again.

references

  1. The ultimate exploration of deep copy (99% of people don t know)
  2. How to write a deep copy of a stunning interviewer?

Afterword

I think it's okay, I can give a thumbs up when you go, let's learn and discuss together! You can also follow my blog and hope to order a Star on my github. Friends will definitely find a problem. Almost all my usernames are related to Tomato, because I really like eating Tomato ! ! ! The guy who wants to follow the car and not get lost also hopes to pay attention to the old tomatoes on the front end of the official account . After paying attention, I will pull you into the front-end advanced group, and everyone can communicate and learn together, and you can also get free materials! ! ! I am a primary school student in the programming world. Your encouragement is my motivation to keep moving forward. I hope to work hard together.