- Original address: Highlights from Git 2.31
- Original Author: Taylor Blau
- Translation from: Nuggets Translation Project
- Permanent link to this article: github.com/xitu/gold-m...
- Translator: Badd
- Proofreaders: PassionPenguin, PingHGao
Highlights in Git 2.31
The open source Git project recently released Git 2.31, with features and bug fixes from 85 contributors, 23 of them new. The last time we caught up with you was when Git 2.29 had just been released. Two more releases have shipped since then, so let's take a look at the most interesting features and changes.
Introducing Git maintenance
Picture this: you are in your terminal, making commits, pulling from other repositories, and pushing the results to a remote, when suddenly you run into this unwelcome message:
Now you are stuck. All you can do is wait for Git to finish running
What is going on here? In normal use, Git writes a lot of data: objects, packfiles, references, and so on. For some of these, Git optimizes for write performance. For example, writing a single "loose" object is faster, but reading from a packfile is faster.
To keep you productive, Git makes a trade-off: it usually favors the write path while you work, and pauses every so often to reorganize its internal data structures so that they are more efficient to read. The goal is to keep you productive over the long run.
Git has its own heuristics for deciding when to take this "pause", but sometimes those heuristics trigger a blocking cleanup at an inconvenient time.
Starting with Git 2.31, background maintenance gives you the best of both worlds. This cross-platform feature keeps your repository healthy without blocking any of your interactive work. Notably, Git will prefetch the latest objects from your remotes once an hour, which should shorten the execution
Getting started with background maintenance couldn't be easier. In your terminal, navigate to the repository where you want to enable it, then run the following command:
$ git maintenance start
Git takes care of the rest. Besides prefetching the latest objects every hour, Git also keeps its own data in order. Every hour it updates the
file, and every night it packs loose objects (and repacks objects that are already packed). In the documentation, you can read more about this feature and learn how to use it. [Source code, source code, source code, source code]
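To get a feel for what background maintenance does, individual tasks can also be run once in the foreground. The following is a minimal sketch in a throwaway repository; the temporary directory, the demo identity, and the choice of the `commit-graph` and `loose-objects` tasks are assumptions made for illustration, based on the task names listed in the git-maintenance documentation.

```shell
set -eu

# Throwaway repository so nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Run two maintenance tasks once, in the foreground.
# `git maintenance start` would instead register the repository and
# let the system scheduler run tasks in the background.
git maintenance run --task=commit-graph --task=loose-objects

# The loose commit object should now also be stored in a packfile.
packs=$(find .git/objects/pack -name '*.pack' | wc -l | tr -d ' ')
echo "packfiles after maintenance: $packs"
```

Running tasks by hand like this is mostly useful for experimenting; for everyday use, the scheduled background mode described above is the intended path.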
An on-disk reverse index
As you may already know, Git stores all of its data as "objects": commits, trees, and blobs, which hold the contents of each file. For efficiency, Git puts many objects together in a packfile, which is essentially a stream of objects (
But what if we want to go the other way? That is, if Git only knows which byte of the packfile it is looking at, how does it find out which object that byte belongs to?
To do this, Git uses the aptly named reverse index: an opaque mapping between locations in the packfile and the object each location belongs to. Before Git 2.31, there was no on-disk format for the reverse index (like
But that takes time, and when a repository's packfiles are large, it can take a long time. To get a feel for how much size matters, we can run an experiment that compares the time it takes to print an object's contents with the time it takes to print the same object's size. To print only the contents of an object, Git uses the forward index to locate the object in the packfile. But to print the size of an object in the packfile, Git has to locate not only that object but also the one immediately after it, then subtract the two positions to find out how much space the object occupies. To find the position of the first byte of the adjacent object, Git needs the reverse index.
Comparing the two, printing the size of an object turns out to be 62 times slower than printing the entire contents of that object. You can try it yourself with hyperfine:
$ git rev-parse HEAD >tip
$ hyperfine --warmup=3 \
  'git cat-file --batch <tip' \
  'git cat-file --batch-check="%(objectsize:disk)" <tip'
In version 2.31, Git can finally serialize the reverse index into a new on-disk format. The file extension of this format is
An observant reader may wonder why Git bothers with the reverse index here at all. After all, if you can already print the contents of an object, then printing its size surely can't be harder than counting the bytes you just printed. However, that depends on the size of the object. If the object is very large, counting all of its bytes is far more expensive than a simple subtraction.
Beyond contrived experiments like the one above, the reverse index is useful elsewhere too. For example, when objects are transferred during a fetch or push, the reverse index is used to send object bytes directly from disk. Computing the reverse index ahead of time makes this process faster.
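The subtraction described above can be sketched with ordinary shell tools. The object names, the offsets, and the end offset below are made up for illustration, and the sketch mimics only the idea of a reverse index, not Git's actual file formats.

```shell
# Hypothetical forward index: object name -> byte offset in the pack,
# sorted by object name (which is how lookups by name stay fast).
forward_index='aaa111 9000
bbb222 12
ccc333 4200
ddd444 150'
pack_end=12000   # hypothetical offset where the object data ends

# Reverse index: the same entries, sorted by offset instead of by name.
reverse_index=$(printf '%s\n' "$forward_index" | sort -k2,2n)

# On-disk size of an object = offset of the next object - its own offset.
# The last object is measured against the end of the object data.
sizes=$(printf '%s\n' "$reverse_index" | awk -v end="$pack_end" '
  NR > 1 { print name, $2 - off }
  { name = $1; off = $2 }
  END { print name, end - off }')

printf '%s\n' "$sizes"
```

The sort is the expensive part for a large pack, which is why having the sorted order already stored on disk saves time for operations that only need a size.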
Git does not generate by default
[Source code, source code]
Tidbits
- In a previous post, we mentioned the commit-graph file. It is a very useful serialization of common information about commits, such as which parents they have, what their root tree is, and so on. (For more detail, this series of articles explains it thoroughly.) The commit-graph also stores a generation number for each commit, which helps speed up many kinds of commit walks. Git 2.31 uses a new kind of generation number, which can further improve performance in certain situations. This work was contributed by Abhishek Kumar, a Google Summer of Code student. [Source code]
- In recent versions of Git, a configuration option makes it easier to change the default name of the main branch in new repositories. Git has always tried to check out the branch that the remote's HEAD points to (for example, if the remote's default branch is "foo", then git clone tries to check out foo locally), but this did not work for empty repositories. In Git 2.31, it works for empty repositories too. Now, if you clone a newly created repository and start writing the first code locally, your local copy follows the remote's default branch name, even though the remote has no commits yet. [Source code]
- Speaking of renaming, Git 2.30 also makes it easier to change another default name: the name of a repository's first remote. When you clone a repository, the initial remote is always called "origin". Before Git 2.30, the only way to change that was to run git remote rename origin <newname> afterwards. Git 2.30 lets you configure a custom default name instead of always using "origin". You can try it yourself by setting the clone.defaultRemoteName configuration option. [Source code]
- As a repository grows larger, it can be hard to work out which branches are responsible for the growth. In Git 2.31, git rev-list gained a --disk-usage option, which makes adding up the size of objects easier and faster than with existing tools. The examples section of the rev-list manual shows some use cases (and the timing section of the source link below shows the "traditional" way of doing this). [Source code]
- You may have used the -G<regex> option to find commits that change specific code (for example, git log -G'foo\(' finds commits that touch calls to the foo() function, whether they were added, deleted, or modified). But you may also want to ignore changes that match a particular pattern. Git 2.30 introduced -I<regex>, which lets you ignore changes whose lines all match a given regular expression. For example, git log -p -I'//' omits changes that only touch comments (lines containing //). [Source code]
- To pave the way for a new merge backend, rename detection has also been significantly optimized. For more details, see the author's articles Optimizing git's merge machinery, #1 and Optimizing git's merge machinery, #2.
The above is just a sample of the latest changes. To learn more, you can read the release notes for 2.30, 2.31, or any earlier release in the Git repository.
If you find a translation error or anything else that could be improved, you are welcome to submit a correction as a PR to the Nuggets Translation Project, which also earns you the corresponding reward points. The permanent link at the beginning of this article points to the Markdown version of this article on GitHub.