Talking about some pits when using ArrayList and LinkedList

foreword

We generally know ArrayListthe LinkedListintuitive difference between and:

1. The underlying implementation of [ArrayList] is an array, and ArrayList implements RandomAccess, which means that it can quickly access stored elements at random, through subscript indexaccess , but we need to use get()the form of the method, the [array] supports random access, the query speed is fast, and the addition and deletion of elements is slow;

2. The LinkedListbottom layer uses a doubly [linked list] data structure (circular linked list before JDK1.6, JDK1.7 cancels the cycle), LinkedListno RandomAccessinterface , the linked list supports sequential access, the query speed is slow, and the addition and deletion of elements is fast;

ArrayList steps on the pit

List< String > temp = new ArrayList();
 //Get a collection data 
List< String > all = getData();
 for ( String str:all){
   temp.add(str);
}

This code does not seem to be a problem, indeed, in general, there is no problem; however, when the getData()amount of returned data is large, the following temp.add(str)problems will become prominent.

ArrayList fills the pit

Everyone knows that ArrayList is implemented by an array. According to the following table, it is efficient to randomly access the elements of the array, and it is efficient to add elements to the end of the array. However, it is inefficient to delete data in the array and add data to the middle of the array, because the array needs to be moved. For example, the worst case is to delete the first array element, you need to move the 2nd to nth array elements forward by one.

add(int index,E element) writes data to the specified location.
Here is an add(E e)example :

ArrayList temp = new ArrayList<>( 2 ); //Define the length of the array as 2 // 
Add 3 data to the array 
temp.add ( "1" );
temp.add("2");
temp.add("3");

When we initialize an ArrayList with a length of 2 and write three pieces of data into it, the ArrayList has to be expanded, that is, copy the previous data to a new array with a length of 3.

The reason it is 3 is because the new length = original length * 1.5

At the same time, the length of the data is limited; the array needs to be expanded at the right time.
From the source code, we can know that the default length of ArrayList is 10.
But it is not DEFAULT_CAPACITY=10an .
Instead, it will expand to 10 when addthe first data is entered.

We know that the default length is 10, which means that once it is written to the ninth element, it will expand to 10*1.5=15. This step is to copy the array, that is, to re-open a new memory space to store the 15 arrays.

Once we write frequently and in large numbers, it will cause many array copies, which is extremely inefficient.

But if we predict in advance how many pieces of data may be written, we can avoid this problem in advance.

For example, if we write 1000W pieces of data into it, there is a huge gap in performance between the given array length and the default length of 10 during initialization.

Here is a simple test:

@Warmup ( iterations = 5 , TIME = 1 , timeUnit = TimeUnit . SECONDS )
  @Measurement ( iterations = 5 , TIME = 1 , timeUnit = TimeUnit . SECONDS ) 
  public class CollectionsTest { 

  private static final int TEN_MILLION = 10000000 ;

  @Benchmark
  @BenchmarkMode ( MODE . AverageTime )
  @OutputTimeUnit ( TimeUnit . MICROSECONDS ) 
  public void arrayList () {

  List<String> array = new ArrayList <>();

  for( int i = 0 ;i < TEN_MILLION ;i ++) { 
       array . add("123");
     } 
 }

  @Benchmark
  @BenchmarkMode ( MODE . AverageTime )
  @OutputTimeUnit ( TimeUnit . MICROSECONDS ) 
  public void arrayListSize () {
  List < String > array = NEW ArrayList <>( TEN_MILLION );

    for( int i = 0 ;i < TEN_MILLION ;i ++) { 
            array . add("123");
     }
 } 

  public static void main (String [] args ) throws RunnerException { 
     Options opt = new OptionsBuilder() 
        . include (CollectionsTest.class.getSimpleName()) 
        . forks ( 1 ) 
        . build ();

 new Runner ( opt ).run ();
    } 
}

According to the results, it can be seen that the efficiency of the preset length will be much higher than that of the default (Score here refers to the time it takes to execute the function).

Therefore, it is suggested here that you must initialize the specified length when a large amount of data is known to be written .ArrayList

》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》

Another is to be careful add(int index,E element)to write data to the specified location.
From the source code, we can see that each write will indexmove the data back again, in fact, the essence is to copy the array;

But it is different from writing data to the end of the array in the conventional way, it will copy the array every time, which is extremely inefficient.

LinkedList

LinkedListIt is composed of a linked list, and each node has two nodes, the head and the tail, which refer to the front and rear nodes respectively; therefore, it is also a doubly linked list.
In theory it is very efficient to write, and there will be ArrayListno array copying that is extremely inefficient, just moving the pointer each time.

[The writing efficiency of LinkedList] is higher than that of ArrayList, so we think that it is very suitable for LinkedList when writing is greater than reading.

@Benchmark
  @BenchmarkMode (MODE.AverageTime)
  @OutputTimeUnit (TimeUnit.MICROSECONDS) 
  public void linkedList () {
  List < String > array = NEW LinkedList <>();

  for( INT i = 0 ;i < TEN_MILLION ;i ++) { 
      array.add ("123");
        } 
    }

Here is a test to see whether the conclusion is consistent; the same is also true for LinkedListwriting 1000W data, and the ArrayListefficiency of initializing the length of the array is obviously higher from the results LinkedList.

But the premise here is to preset ArrayListthe avoid the expansion of the array. This kind ArrayListof writing efficiency is very high, and although LinkedListthe memory does not need to be copied, it needs to create objects, transform pointers and other operations.

Needless to say, the query ArrayListcan support subscript random access, which is very efficient.

LinkedListSince the bottom layer is not an array, subscript access is not supported, but it is necessary to judge whether to traverse from the beginning or the end according to indexthe .
But either way, you need to move the pointer to traverse one by one, especially indexnear the middle position will be very slow.

Summarize

Efficiency issues:

ArrayList is a linear table (array)
get() directly reads the first few subscripts, complexity O(1);
add(E) adds elements, and adds them directly behind, complexity O(1);
add(index,E ) Add elements, insert them after the first few elements, the latter elements need to be moved backward, the complexity is O(n);
remove() deletes the elements, the latter elements need to be moved one by one, and the complexity is O(n);

LinkedList is the operation of the linked list
get() is to get the first element, traverse it in turn, the complexity is O(n);
add(E) is added to the end, and the complexity is O(1);
add(index,E) is added to the first After a few elements, you need to find the first element first, the direct pointer points to the operation, the complexity is O(n);
remove() deletes the element, the direct pointer points to the operation, the complexity is O(1);

Therefore,
for random access get and set, ArrayList is better than LinkedList, because LinkedList has to move the pointer.
For the add and remove operations add and remove, LinedList has the advantage because ArrayList moves data.

At the same time, when we

1.ArrayList If the size of the data can be predicted in advance when using again , be sure to specify its length when it is larger.

2. Avoid using add(index,e)api as much as possible, which will lead to duplication of arrays and reduce efficiency.

3. In addition, the Mapcontainer HashMapis also recommended to initialize the length to avoid expansion.

Leave a Comment

Your email address will not be published. Required fields are marked *