Worried about interviews? Master these MySQL interview questions and win offers from top companies

Hello everyone, today I'm sharing a set of MySQL interview questions. Grab a notebook and take notes.

Database normal forms

First normal form (1NF): no repeating columns

First normal form (1NF) means that every column of a database table is an indivisible atomic data item; a single column cannot hold multiple values, i.e. an attribute of an entity cannot have multiple values or repeating attributes. If repeating attributes exist, a new entity may need to be defined; the new entity consists of the repeating attributes and has a one-to-many relationship with the original entity. Under 1NF, each row of a table contains only one instance of information. In short, first normal form means no repeating columns.

Second normal form (2NF): non-key attributes are fully functionally dependent on the primary key [eliminate partial functional dependencies]

Suppose the course-selection table is SelectCourse (student ID, name, age, course name, grade, credits), with the composite key (student ID, course name), because the following functional dependency holds:

(student ID, course name) → (name, age, grade, credits)

This table does not satisfy second normal form, because the following partial dependencies also hold:

(course name) → (credits)

(student ID) → (name, age)

Third normal form (3NF): non-key attributes do not depend on other non-key attributes [eliminate transitive dependencies]

To satisfy third normal form (3NF), a table must first satisfy second normal form (2NF). 3NF additionally requires that a table does not contain non-key information that is already stored in another table (i.e. no transitive dependencies on the key).
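As a minimal sketch (table and column names are illustrative, not from the original), the SelectCourse example above can be decomposed into tables that satisfy 2NF/3NF like this:

    -- Hypothetical decomposition of SelectCourse to remove the partial dependencies:
    -- student attributes depend only on student_id, credits depend only on course_name.
    CREATE TABLE student (
      student_id INT PRIMARY KEY,
      name       VARCHAR(50),
      age        INT
    );

    CREATE TABLE course (
      course_name VARCHAR(50) PRIMARY KEY,
      credits     INT
    );

    CREATE TABLE select_course (
      student_id  INT,
      course_name VARCHAR(50),
      grade       INT,
      PRIMARY KEY (student_id, course_name)
    );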

What is a transaction? What are its properties?

A transaction is a logical unit of work made up of a group of SQL statements; it is a set of operations that satisfies the ACID properties. A transaction can be committed with Commit or rolled back with Rollback. A transaction has the following four properties:

  • A - Atomicity: a transaction is an atomic unit of work; its modifications to the data either all take effect or none of them do
  • C - Consistency: the data must be in a consistent state before the transaction starts and after it completes (consistency here means the system moves from one correct state to another correct state)
  • I - Isolation: the database provides an isolation mechanism that guarantees a transaction executes in an "independent" environment unaffected by outside concurrent operations. Isolation means that when multiple users access the database concurrently, e.g. operating on the same table, the transaction opened for each user must not be disturbed by other transactions' operations; concurrent transactions must be isolated from one another
  • D - Durability: once a transaction completes, its modifications to the data are permanent and survive even a system failure

Supplement

These properties are not of equal standing

Only when consistency is satisfied is the result of executing the transaction correct

Without concurrency, transactions execute serially and isolation is automatically satisfied; in that case, as long as atomicity holds, consistency holds

With concurrency, multiple transactions execute at the same time; to satisfy consistency a transaction must satisfy not only atomicity but also isolation

A transaction satisfies durability so that it can cope with a database crash (via the log system)
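A minimal sketch of committing and rolling back a transaction (the account table and amounts are illustrative):

    -- Transfer money atomically: either both updates apply, or neither does.
    START TRANSACTION;
    UPDATE account SET balance = balance - 100 WHERE id = 1;
    UPDATE account SET balance = balance + 100 WHERE id = 2;
    COMMIT;        -- make both changes permanent

    -- If anything goes wrong mid-way, undo everything instead:
    -- ROLLBACK;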

What are the concurrency consistency issues?

Lost update

Transactions T1 and T2 both modify the same data. T1 modifies it first, T2 modifies it later, and T2's modification overwrites T1's

Solution: pessimistic locking (for low-concurrency scenarios) or optimistic locking (for high-concurrency scenarios)

Dirty read

Transaction B reads data that transaction A has modified but not yet committed and performs operations based on it. If transaction A then rolls back, the data B read becomes invalid, which violates consistency

Non-repeatable read

Within one transaction, the same data is read multiple times. Before that transaction ends, another transaction accesses and modifies the same data. Because of the second transaction's modification, the data read by the first transaction may differ between its two reads, so the same transaction receives different data from the same query

For example, T2 reads a row, T1 modifies it, and when T2 reads the row again the result differs from the first read

Solution: if data can only be read after the modifying transaction has fully committed, this problem is avoided; raise the database isolation level to REPEATABLE READ

Phantom read

Transaction A reads rows newly inserted and committed by transaction B, which violates isolation

Solution: if no other transaction can insert new rows until the operating transaction has finished processing its data, this problem is avoided; raise the transaction isolation level to SERIALIZABLE
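A minimal sketch of changing the isolation level in MySQL (session scope shown; GLOBAL works the same way):

    -- Check the current isolation level (the variable is transaction_isolation in MySQL 8.0, tx_isolation before that)
    SELECT @@transaction_isolation;

    -- Raise the isolation level for the current session
    SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;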

What are the transaction isolation levels?

  • Read Uncommitted
  • Read Committed
  • Repeatable Read
  • Serializable

How is transaction durability achieved?

Redo log and binlog

The redo log is specific to the InnoDB engine; the binlog is implemented at the MySQL Server layer and is available to all engines

The redo log is a physical log, recording "such-and-such a modification was made on such-and-such a page"; the binlog is a logical log, e.g. "add 1 to the c field of the row with id = 2"

The redo log has a fixed size, so its space can run out; when it is full, it must be flushed to disk before writing can continue. The binlog is appended to, i.e. it has no such space limit and simply keeps being written
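A minimal sketch for inspecting the relevant settings on a running server (variable names are standard MySQL; values will differ, and SHOW BINARY LOGS only works when the binlog is enabled):

    -- Redo log configuration (InnoDB)
    SHOW VARIABLES LIKE 'innodb_log_file_size';
    SHOW VARIABLES LIKE 'innodb_log_files_in_group';

    -- Binlog status and contents (requires log_bin = ON)
    SHOW VARIABLES LIKE 'log_bin';
    SHOW BINARY LOGS;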

Still can't tell the redo log and binlog apart?

What locks does MySQL have? What granularities? What is the relationship between isolation levels and locks?

Locks

1. Shared lock

S lock, read lock (lock in share mode)

The locked data can be read by other transactions, but cannot be modified or deleted

2. Exclusive lock

X lock, write lock

The locked data cannot be read or written by other transactions

If the locked column has no index, the lock becomes a table lock

InnoDB row locks are implemented by locking index entries on the index
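A minimal sketch of taking the two lock types explicitly (the table and row are illustrative):

    -- Shared (S) lock: other transactions can still read the row, but cannot modify or delete it
    SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;   -- FOR SHARE in MySQL 8.0

    -- Exclusive (X) lock: other transactions can neither lock-read nor modify the row
    SELECT * FROM account WHERE id = 1 FOR UPDATE;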

3. Intention shared lock

IS: when a shared lock is placed on a row, the database automatically places an intention shared lock on the table

4. Intention exclusive lock

IX: when an exclusive lock is placed on a row, the database automatically places an intention exclusive lock on the table

Intention locks are managed automatically by the system. When another transaction wants to lock the whole table, it can first check whether an intention lock exists, which would indicate that some rows in the table are already row-locked, thereby avoiding a full scan of the table

5. Auto-increment lock (AUTO-INC lock)

A special table-level lock used for INSERT operations on auto-increment columns

About next-key locks, gap locks, and record locks

Gap locks and next-key locks exist only at the REPEATABLE READ (RR) isolation level

1) Record lock: MySQL's default row lock is the next-key lock. When an equality query on a unique index matches a record, it degenerates into a record lock.

2) Gap lock: MySQL's default row lock is the next-key lock. When an index query matches no record, it degenerates into a gap lock. E.g. with records id=1 and id=4, select * from t where id = 3 for update; and select * from tb_temp where id > 1 and id < 4 for update; will both lock the interval (1, 4)

3) Next-key lock: locks the record itself and also the gap between records. E.g. select * from tb_temp where id > 2 and id <= 7 for update; will lock (2, 7] and (7, +∞)

When no record is matched, it degenerates into a gap lock.

4) Range query: when some records are hit, next-key locks are used. E.g. select * from tb_temp where id > 2 and id <= 7 for update; will lock (2, 7] and (7, +∞) [the locked range also includes the gap to the right of the last matched record]

Gap lock example

1) Gap lock (on a secondary index): takes (key column, secondary index column) pairs as gap points and locks the data range between two gap points.

Next-key lock: combines a record lock and a gap lock, i.e. it locks a range and also locks the record itself. InnoDB's default locking method is the next-key lock.

2) InnoDB row locks are implemented by locking index entries. InnoDB uses row-level locks only when data is retrieved through an index condition; otherwise it uses table locks.

Any lock on a secondary index, or on a non-indexed column, must eventually be traced back to the primary key, and a lock is also placed on the primary key record.

3) The purpose of the gap lock is to prevent phantom reads: it stops new rows from being inserted into the gap and stops existing rows from being updated into values that fall inside the gap.

6. Next-key lock

7. Gap lock

Granularity

By granularity, locks can be divided into row locks, table locks, record locks, page locks, and database locks (rare).

Read/write lock on a column without an index ------> table lock

Auto-increment lock, intention lock ------> table lock

Read/write lock on an indexed column ------> row lock

Record lock: a type of row lock whose scope is a single record in the table; after the transaction takes the lock, only that one record is locked.

It applies when an exact-match condition hits a record and the condition uses a unique index.

Page lock: a page-level lock in MySQL whose granularity sits between row-level locks and table-level locks. Table-level locks are fast but conflict often; row-level locks conflict less but are slower.

Characteristics: locking and unlocking cost is between that of table locks and row locks; deadlocks can occur; the lock granularity is between table locks and row locks; the degree of concurrency is moderate.

Gap lock: a range lock

Read Uncommitted

Dirty reads, non-repeatable reads, and phantom reads can all occur

Read Committed

Dirty reads are solved with write locks

While a transaction is writing a record, other transactions that query that record are blocked and can only see it after the write has been committed

Non-repeatable reads and phantom reads can still occur

Repeatable Read

Non-repeatable reads are solved with long-held read locks and long-held write locks

Read lock: data I have read cannot be modified by others, which solves the non-repeatable read

Phantom reads are solved with next-key locks

Next-key locks mainly guard against INSERT operations

Serializable

All reads and writes take locks and transactions effectively execute one after another, so dirty reads, non-repeatable reads, and phantom reads are all eliminated

What guarantees do the ACID properties rely on?

A - Atomicity is guaranteed by the undo log, which records the information needed for rollback; when a transaction rolls back, the SQL that has already executed successfully is undone

C - Consistency is guaranteed by the other three properties, plus application code that ensures business-level consistency

I - Isolation is guaranteed by MVCC

D - Durability is guaranteed by memory plus the redo log: when MySQL modifies data it records the operation in memory and in the redo log at the same time, and it can recover from the redo log after a crash

Two-phase commit: InnoDB writes the redo log to disk and the transaction enters the prepare state; if the prepare succeeds, the binlog is written to disk and the transaction log is persisted to the binlog; if that persistence succeeds, the InnoDB transaction enters the commit state (and a commit record is written to the redo log)

Do you understand MVCC?

Multi-Version Concurrency Control: reads are served from a snapshot of the data, so read locks and write locks do not conflict, and each transaction session sees its own specific version of the data through a version chain

Committing a modification never directly overwrites the previous data; instead a new version is generated that coexists with the old one, so reads need no locks at all. MVCC therefore mainly solves the performance problem of concurrent reads

MVCC works only under the READ COMMITTED and REPEATABLE READ isolation levels. The other two levels are incompatible with MVCC: READ UNCOMMITTED always reads the latest row rather than the row version matching the current transaction, and SERIALIZABLE locks every row it reads

Every clustered index record carries two essential hidden columns

trx_id: stores the id of the transaction that last modified the clustered index record

roll_pointer: every time a clustered index record is modified, the old version is written to the undo log; roll_pointer stores a pointer to the previous version of the record, through which the previous version's data can be retrieved (note: the undo log of an INSERT has no such pointer, because there is no older version)
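A minimal two-session sketch of snapshot reads under REPEATABLE READ (the table and values are illustrative); each session runs in its own connection:

    -- Session A
    START TRANSACTION;
    SELECT balance FROM account WHERE id = 1;   -- reads the currently committed version

    -- Session B (separate connection)
    UPDATE account SET balance = balance + 100 WHERE id = 1;
    COMMIT;

    -- Session A again
    SELECT balance FROM account WHERE id = 1;   -- under REPEATABLE READ, still sees the old
                                                -- version via the undo-log version chain
    COMMIT;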

What is the difference between DELETE, TRUNCATE, and DROP?

DROP deletes the entire table structure; to store data again, the table must be rebuilt

TRUNCATE: clears the table's data and releases the space; the table structure remains, the indexes are cleared, and it cannot be rolled back

DELETE: deletes the specified rows from the table; it does not release space, does not clear the indexes, and can be rolled back
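A minimal sketch contrasting the three statements (the table name is illustrative):

    DELETE FROM article WHERE id = 10;   -- row-by-row delete, logged, can be rolled back
    TRUNCATE TABLE article;              -- empties the whole table, keeps the structure, cannot be rolled back
    DROP TABLE article;                  -- removes the table structure and data entirely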

How to locate inefficient SQL?

Slow query log

In a business system, apart from queries that use the primary key, other queries are timed on a test database, and statistics on slow queries are mainly collected by the operations team.
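A minimal sketch of turning on the slow query log at runtime (the threshold is illustrative):

    SET GLOBAL slow_query_log = 'ON';          -- enable the slow query log
    SET GLOBAL long_query_time = 1;            -- log statements slower than 1 second
    SHOW VARIABLES LIKE 'slow_query_log_file'; -- where the log is written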

To optimize a slow query you must first understand why it is slow: does the query condition miss the index? Are columns that are not needed being loaded? Or is the amount of data simply too large?

First analyze the statement to see whether extra data is being loaded: perhaps redundant rows are queried and then discarded, or many columns that the result does not need are loaded; analyze and rewrite the statement accordingly

Then analyze the statement's execution plan to see how it uses indexes, and modify the statement or the indexes so the statement hits an index as much as possible

If the statement can no longer be optimized, consider whether the table simply holds too much data; if so, split the table horizontally or vertically.

Horizontal and vertical table splitting

Vertical splitting

Content that could live in a single table is deliberately split into multiple tables

In a blog system, the article title, author, category, creation time and so on change infrequently, are queried often, and should ideally be served with good real-time behaviour; we call this cold data. View counts, reply counts and similar statistics, or other data that changes frequently, we call active data. So when designing the schema, consider splitting tables, starting with vertical splitting.

Horizontal splitting

The user table user is split into user1 and user2, with rows routed by special handling of the id (for example, id modulo 2), as sketched below
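A minimal sketch of routing by id modulo 2 (table names follow the user1/user2 example above; the modulo rule is an assumption for illustration, applied in the application layer):

    -- Two identical shard tables
    CREATE TABLE user1 (id BIGINT PRIMARY KEY, name VARCHAR(50));
    CREATE TABLE user2 (id BIGINT PRIMARY KEY, name VARCHAR(50));

    -- Routing rule (application side): rows with id % 2 = 1 go to user1,
    -- rows with id % 2 = 0 go to user2, e.g.:
    INSERT INTO user1 (id, name) VALUES (3, 'alice');   -- 3 % 2 = 1
    INSERT INTO user2 (id, name) VALUES (4, 'bob');     -- 4 % 2 = 0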

The difference between InnoDB and MyISAM

  1. InnoDB supports transactions, MyISAM does not. In InnoDB every SQL statement is wrapped in a transaction and auto-committed by default, which hurts performance, so it is best to group multiple statements between begin and commit to form a single transaction;
  2. InnoDB supports foreign keys, MyISAM does not. Converting an InnoDB table that contains foreign keys to MyISAM will fail;
  3. InnoDB uses a clustered index: the data file is bound to the index, and a primary key is required. Lookups through the primary key index are very efficient, but a secondary index needs two lookups, first to find the primary key and then to fetch the data by primary key. The primary key should therefore not be too large,
  4. because a large primary key makes all other indexes large as well. MyISAM uses a non-clustered index: the data file is separate, and the index stores a pointer to the data file; the primary key index and secondary indexes are independent. InnoDB does not store the exact row count of a table, so executing select count(*) from table requires a full table scan; MyISAM keeps the total row count in a variable and only needs to read that variable, which is very fast;
  5. InnoDB did not support full-text indexes before MySQL 5.6, while MyISAM supports full-text indexes and has higher query efficiency for them;

Basic principles of indexing

An index is a data structure that the storage engine uses to find records quickly.

Indexes are used to quickly find records with specific values; without an index, executing a query generally means scanning the entire table.

The principle of an index is to turn unordered data into an ordered structure that can be queried efficiently:

  1. Sort the contents of the indexed column
  2. Build an inverted list from the sorted results
  3. Attach the data address chain to the entries of the inverted list
  4. When querying, first look up the inverted list entry, then follow the data address chain to fetch the actual data

Why are hash indexes not commonly used in MySQL?

From a memory perspective, database indexes are generally stored on disk. A hash index requires building a hash table in memory, and when the table is large it may not be possible to load all of the index column's data into memory at once. Each node of a B+ tree, by contrast, can be sized to one data page, so each query loads only the few data pages that satisfy the condition instead of the entire index.

From a business perspective, if you only ever fetch a single row by an exact condition, a hash really is faster. But real workloads often fetch multiple rows or a range of values. A B+ tree index is ordered and its leaves are linked, so you can find the first match and then walk the linked list to collect every row in the range in one pass; a hash index cannot serve such range queries, because it is unordered and can only be matched entry by entry.

Why don't MySQL indexes use red-black trees?

A tree's query time depends on its height. The B+ tree is a multi-way search tree, which keeps the tree shallow and lookups efficient. In addition, the smallest unit the operating system reads from and writes to disk is a block, typically 4 KB, so at least 4 KB is read at a time. A red-black tree is a binary tree with only two children per node, so loading its nodes requires many random disk I/O operations, which is very inefficient.

The difference between MySQL clustered and non-clustered indexes

Both are B+ tree data structures

Clustered index: data and index are stored together and organized in a given order, so finding the index means finding the data. The physical storage order of the data matches the index order, i.e. if index entries are adjacent, the corresponding rows are also stored adjacently on disk.

Non-clustered index: leaf nodes do not store the data itself but the address of the data row, i.e. the index is used to locate the row and the data is then fetched from disk, somewhat like a book's table of contents

Advantages

A query can obtain the data directly from the clustered index, which is more efficient than a non-clustered index that needs a second lookup (when the index is not a covering index)

A clustered index is very efficient for range queries, because its data is stored in sorted order

A clustered index suits sorting scenarios; a non-clustered index does not

Disadvantages

Maintaining the index is expensive, especially when inserting new rows or updating the primary key causes page splits. After a large number of inserts it is recommended to pick a low-load period and rebuild the table with optimize table, because the rows that have to be moved may cause fragmentation; using independent (file-per-table) tablespaces can reduce fragmentation

If the table uses a UUID (random id) as the primary key, data storage becomes very sparse, and the clustered index may even end up slower than a full table scan, so it is recommended to use an auto_increment INT as the primary key

If the primary key is large, secondary indexes also become large, because the leaf nodes of secondary indexes store the primary key value; an overly long primary key also makes non-leaf nodes occupy more physical space

InnoDB must have a primary key, and the primary key is the clustered index. If one is not set explicitly, a unique index is used; without a unique index, a hidden row id is used as the clustered index. Indexes built on top of the clustered index are called secondary indexes, and accessing data through a secondary index always requires a second lookup. Non-clustered indexes are all secondary indexes, such as composite indexes, prefix indexes, and unique indexes; a secondary index's leaf node no longer stores the row's physical location but the primary key value.

MyISAM uses non-clustered indexes; there is no clustered index. Its two B+ trees look structurally identical, with exactly the same node layout; only the content differs. The primary key index B+ tree stores primary keys, the secondary index B+ tree stores secondary keys, and the table data is stored separately. The leaf nodes of both trees point to the actual table data with an address; as far as the table data is concerned there is no difference between the two keys. The index trees are independent, and a lookup through a secondary key does not need to go through the primary key's index tree

For operations such as sorting large amounts of data, full table scans, and count, MyISAM still has the advantage, because its indexes take up less space and these operations have to be completed in memory

The data structure of MySQL indexes, and its advantages and disadvantages

B+ tree ----- a balanced multi-way search tree

Principles of Index Design

Make queries faster while taking up as little space as possible

How to read the MySQL execution plan? (explain)

explain select * from A where X=? and Y=?

id: the sequence number of the select query, a set of numbers indicating the order in which the select clauses or table operations are executed in the query

If the ids are the same, execution order is top to bottom

If the ids differ, as with subqueries, the id number increases; the larger the id value, the higher its priority and the earlier it executes

Identical and differing ids can coexist in the same plan, in which case the two rules above apply together

Each distinct id represents an independent query; the fewer independent queries a SQL statement needs, the better

select_type: the type of query, mainly used to distinguish ordinary queries from complex queries such as union queries and subqueries

SIMPLE: simple select query, the query does not contain subqueries or UNION

PRIMARY: If the query contains any complex sub-parts, the outermost query is marked as Primary

DERIVED: Subqueries included in the FROM list are marked as DERIVED (derivative)

MySQL will execute these subqueries recursively, placing the results in a temporary table.

SUBQUERY: Subqueries are included in the SELECT or WHERE list

DEPENDENT SUBQUERY: a subquery in the SELECT or WHERE list that depends on the outer query

UNCACHEABLE SUBQUERY: a subquery whose result cannot be cached

UNION: If the second SELECT appears after UNION, it is marked as UNION;

If UNION is included in the subquery of the FROM clause, the outer SELECT will be marked as: DERIVED

UNION RESULT: the SELECT that retrieves the result from the UNION temporary table

table: shows which table this row of the plan refers to

partitions: shows which partitions are hit for a partitioned table; for a non-partitioned table this field is NULL

type: the access type, an important field for optimizing SQL and a key indicator of query performance and how much optimization remains

possible_keys: Shows the indexes that may be applied to this table, one or more.

If there is an index on the field involved in the query, the index will be listed, but it may not be actually used by the query

key: The index actually used. If it is NULL, the index is not used

key_len: indicates the number of bytes used in the index, and the length of the index used in the query can be calculated through this column. The key_len field can help you check whether the index is fully utilized

ref: shows which columns or constants are compared against the index, i.e. which columns or constants are used to look up values on the indexed column

rows: the number of rows MySQL estimates it must examine to execute the query

filtered: the percentage of records that satisfy the query after the rows returned by the storage engine are filtered at the server level. Note that it is a percentage, not a specific record count

Extra: additional important information that does not fit in the other columns
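A minimal sketch of using EXPLAIN on an illustrative table (the output values depend on your data and indexes):

    -- Hypothetical table with a composite secondary index on (X, Y)
    CREATE TABLE A (id INT PRIMARY KEY, X INT, Y INT, INDEX idx_x_y (X, Y));

    EXPLAIN SELECT * FROM A WHERE X = 1 AND Y = 2;
    -- Check in the output: type (ideally ref/range rather than ALL),
    -- key (should show idx_x_y), rows (estimated rows examined), Extra.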

Scenarios where the index fails

  1. When the where clause contains OR, the index may fail (OR does not always invalidate the index; it depends on whether the columns on both sides of the OR hit the same index).
  2. A negative condition on an indexed column in the where clause may invalidate the index. Negative conditions include NOT, !=, <>, !<, !>, NOT IN, NOT LIKE, etc.
  3. If an indexed column is nullable, using IS NULL or IS NOT NULL may invalidate the index
  4. Applying a built-in function to an indexed column will definitely invalidate the index
  5. Performing arithmetic on an indexed column will definitely invalidate the index
  6. A LIKE pattern with a leading wildcard may invalidate the index
  7. In a composite index, a where clause that violates the leftmost-prefix matching rule will definitely invalidate the index
  8. Ultimately the MySQL optimizer may simply choose not to use the index

A few of these patterns are sketched below.
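A minimal sketch of index-defeating patterns and their rewrites (table and index names are illustrative):

    -- Assume: CREATE INDEX idx_created ON orders (created_at); CREATE INDEX idx_name ON users (name);

    -- Function on the indexed column: the index is not used
    SELECT * FROM orders WHERE DATE(created_at) = '2023-01-01';
    -- Rewrite as a range so the index can be used
    SELECT * FROM orders WHERE created_at >= '2023-01-01' AND created_at < '2023-01-02';

    -- Leading wildcard: the index is not used
    SELECT * FROM users WHERE name LIKE '%abc';
    -- Prefix match: the index can be used
    SELECT * FROM users WHERE name LIKE 'abc%';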

MySQL master-slave synchronization principle

The process of MySQL master-slave synchronization

MySQL master-slave replication involves three main threads: the master's binlog dump thread and the slave's I/O thread and SQL thread; one thread on the master and two on the slave.

Master node binlog: the basis of master-slave replication is that the master records every change to the database in its binlog. The binlog is a file that stores all changes to the database structure or content from the moment the database server starts.

Master node log dump thread: when the binlog changes, the log dump thread reads the new content and sends it to the slave node.

Slave node I/O thread: receives the binlog content and writes it to the relay log.

Slave node SQL thread: reads the relay log file and replays the data changes, ultimately keeping the master and slave databases consistent.

Note: the master and slave locate the synchronization point with the binlog file name plus a position offset. The slave saves the offset it has received, so if the slave crashes and restarts it automatically resumes synchronization from that position.
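A minimal sketch of pointing a replica at the master using the file + position coordinates (host, credentials, file name, and position are placeholders; MySQL 8.0.23+ also offers the newer CHANGE REPLICATION SOURCE TO / START REPLICA syntax):

    -- On the slave: set the binlog file and position obtained from the master
    CHANGE MASTER TO
      MASTER_HOST = '192.0.2.10',
      MASTER_USER = 'repl',
      MASTER_PASSWORD = '***',
      MASTER_LOG_FILE = 'mysql-bin.000001',
      MASTER_LOG_POS = 4;

    START SLAVE;
    SHOW SLAVE STATUS\G   -- check Slave_IO_Running / Slave_SQL_Running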

Because MySQL's default replication mode is asynchronous, after the master sends the log to the slave it does not care whether the slave has processed it. If the master then crashes and a slave that has not yet received all of the log is promoted to master, that log is lost. This leads to two other concepts:

Fully synchronous replication

After the master writes the binlog, the log is forcibly synchronized to the slaves, and the client only gets a response after all the slaves have executed it; obviously performance suffers severely this way.

Semi-synchronous replication

Unlike full synchronization, the logic of semi-synchronous replication is that a slave returns an ACK to the master once it has written the log successfully, and the master considers the write complete as soon as it receives confirmation from at least one slave.

Briefly describe MySQL index types and their impact on database performance

Ordinary index: allows the indexed column to contain duplicate values

Unique index: guarantees the uniqueness of data records

Primary key: a special index; a table can define only one primary key index, which uniquely identifies a record and is created with the PRIMARY KEY keyword

Composite index: an index can cover multiple columns, e.g. INDEX(columnA, columnB)

Full-text index: builds an inverted index, which greatly improves retrieval efficiency for "does this field contain X" problems and is a key technology of today's search engines
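A minimal sketch of creating each index type (table and column names are illustrative):

    CREATE TABLE article (
      id      INT AUTO_INCREMENT,
      title   VARCHAR(200),
      author  VARCHAR(50),
      body    TEXT,
      PRIMARY KEY (id),                       -- primary key index
      INDEX idx_author (author),              -- ordinary index
      UNIQUE INDEX uk_title (title),          -- unique index
      INDEX idx_author_title (author, title), -- composite index
      FULLTEXT INDEX ft_body (body)           -- full-text index (supported by InnoDB since MySQL 5.6)
    );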

Indexes can greatly speed up data queries.

By using indexes, the query optimizer can be taken advantage of during query processing, improving system performance.

But indexes slow down inserts, deletes, and updates, because these write operations must also maintain the index file.

Indexes also occupy physical space: besides the space taken by the data itself, every index takes some space, and a clustered index requires even more. If there are many non-clustered indexes, then once the clustered index changes, all the non-clustered indexes change with it.

Alright, that's it for today's article. I hope it helps those of you sitting puzzled in front of your screens.