常用集合类数据结构使用小结

文章目录

1. 集合的分类

集合、容器应该是每个工程师打交道高频的几个包中的top3，尤其是在不同语言间进行切换时时长会把集合类用错，为了避免这种尴尬，想了想还是把不同语言下的集合对应关系罗列一下。

集合的分类

列表

列表应该是使用频率很高的一部分，所以对这部分的集合分类标准也有很多。
如，按照可动态变长和不能动态变长分为：

array (Java) / tuple (Python)
ArrayList (Java) / List (Python)

还有一种分法是按照List使用特点区分：

访问快，插入、删除慢 – ArrayList
插入、删除快，访问慢 – LinkedList

ArrayList

或者换一个角度说，使用场景是 访问操作多，而添加、删除的修改操作少。
具体来说：

操作	复杂度
访问元素	$O(1)$
添加、删除元素	$O(n)$

具体的不同的语言有：

语言	类、模块	用法
Java	ArrayList	ArrayList list = new ArrayList();
Python	list	list = []

LinkedList

使用场景 添加、删除的修改操作多，而访问操作少
具体来说：

操作	复杂度
访问元素	$O(n)$
添加、删除元素	$O(1)$

具体到不同的语言有：

语言	类、模块	用法
Java	LinkedList	LinkedList list = new LinkedList();
Python	deque	import collections list = deque()

栈

众所周知的抽象数据结构，特点：后入先出（LIFO）
具体来说：

操作	复杂度
push	$O(1)$
pop	$O(1)$
peek	$O(1)$
isEmpty	$O(1)$

具体到不同的语言有：

语言	类、模块	用法
Java	Stack	Stack list = new Stack();
Python	list	list = []
这里Python将list用做了栈，对应的：
push操作是使用`append()`方法
pop操作是使用`pop()`方法
peek可以使用 `return list[-1]`
isEmpty可以判断对象是否为None或者长度为0

队列

与栈对立的抽象数据结构，特点：先入先出（FIFO）
具体来说：

操作	复杂度
add	$O(1)$
poll	$O(1)$
peek	$O(1)$
isEmpty	$O(1)$

具体到不同的语言有：

语言	类、模块	用法
Java	LinkedList	LinkedList list = new LinkedList();
Python	deque	import collections list = deque()

这里Python将list用成了队列，特别的这里的list是双端队列deque，对应的操作是：
add –> list.append()
poll –> list.popleft()
peek –> list[1]
isEmpty可以判断对象是否为None或者长度为0

优先队列

牛逼闪闪的集合类，按照估价函数自定义先出。
由于优先队列有多种实现方式，我们这里讨论最常见的堆实现。
具体来说：

操作	复杂度
add	$O(log^n)$
poll	$O(log^n)$
peek	$O(1)$

具体到不同的语言有：

语言	类、模块	用法
Java	PriorityQueue	PriorityQueue pqueue = new PriorityQueue();
Python	PriorityQueue	import collections pqueue = PriorityQueue()

Python下使用collections模块下的 PriorityQueue 来实现优先队列，当然也可以使用heapq来实现。
Python使用PriorityQueue来实现优先队列的话：
add –> pqueue.put()
poll –> pqueue.get()

具体的实现参见这里

集合（set）

众所周知的抽象数据结构，key型容器，原生set不允许重复的key。
按照操作的复杂度和是否需要保证顺序将set区分为

hashset

基本操作（查找、添加、删除元素）性能超群 $O(1)$
具体的的

操作	复杂度
add	$O(1)$
remove	$O(1)$
contains	$O(1)$

具体到不同的语言有：

语言	类、模块	用法
Java	HashSet	HashSet set = new HashSet();
Python	set	set = set()

我去，性能牛逼完了，你会问：这么牛逼的set还要treeset干啥？
你这么说treeset会生气的，你让hashset按插入的顺序输出下试试？

是的，如果要支持顺序的话还是要看treeset

treeset

虽然在性能上有所妥协，但是支持Order
具体说来，性能是这样的：

操作	复杂度
add	$O(log^n)$
remove	$O(log^n)$
contains	$O(log^n)$

性能上从$O(1)$到$log^n$还是很大的退步的，有没有更好的选择呢？
其实有linkedHashSet

linkedHashSet

拥有$O(1)$的操作效率，而且支持顺序。

具体到不同的语言有：

语言	类、模块	用法
Java	linkedHashSet	LinkedHashSet set = new LinkedHashSet();
Python	OrderedDict	import collections set = OrderedDict()

这里Python使用了K-V类型的Dict作为set的内部实现，其实想想，set就是一种特殊的Dict，去看看JDK的HashSet和LinkedHashSet就知道了。
附Sun JDK HashSet部分代码：

private transient HashMap<E,Object> map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

/**
 * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
 * default initial capacity (16) and load factor (0.75).
 */
public HashSet() {
    map = new HashMap<>();
}
/**
 * Returns <tt>true</tt> if this set contains the specified element.
 * More formally, returns <tt>true</tt> if and only if this set
 * contains an element <tt>e</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;e==null&nbsp;:&nbsp;o.equals(e))</tt>.
 *
 * @param o element whose presence in this set is to be tested
 * @return <tt>true</tt> if this set contains the specified element
 */
public boolean contains(Object o) {
    return map.containsKey(o);
}

/**
 * Adds the specified element to this set if it is not already present.
 * More formally, adds the specified element <tt>e</tt> to this set if
 * this set contains no element <tt>e2</tt> such that
 * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
 * If this set already contains the element, the call leaves the set
 * unchanged and returns <tt>false</tt>.
 *
 * @param e element to be added to this set
 * @return <tt>true</tt> if this set did not already contain the specified
 * element
 */
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

/**
 * Removes the specified element from this set if it is present.
 * More formally, removes an element <tt>e</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;e==null&nbsp;:&nbsp;o.equals(e))</tt>,
 * if this set contains such an element.  Returns <tt>true</tt> if
 * this set contained the element (or equivalently, if this set
 * changed as a result of the call).  (This set will not contain the
 * element once the call returns.)
 *
 * @param o object to be removed from this set, if present
 * @return <tt>true</tt> if the set contained the specified element
 */
public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
}

字典（map）

Key-Value型容器，原生字典类型不支持重复的key。
因为map是set的更一般情况，所以自然地
类似Set，有下面这三种map：

HashMap
TreeMap
LinkedHashMap

对应的操作性能参照set系列
Python下的对照可以这样参照：

HashMap –> dict ({'key':'value'})
TreeMap, LinkedHashMap –> OrderedDict()