PowerShell Journeyman - Generic Collections

Learn about some generic collections that can make your life much easier

Introduction

This is the first post in my PowerShell Journeyman series. In traditional trades, the Journeyman (often called Jman) is someone who has experience and can get stuff done effectively. They will forsee problems and adequately protect assets (especially coworkers) to keep things running smoothly.

My expectation is that before you can be a “PowerShell Journeyman”, you should have experience with many aspects that can help your projects get done quickly and effectively. This post will cover some of the generic collections that are available in PowerShell (and .net) that you should be familiar with.

Overview

Here is a quick list and description of each of the classes that you will read about today:

  1. Queue - First-in/First-out collection
  2. Stack - First-in/Last-out collection
  3. List - The best common PowerShell array there is
  4. Hashset - A keyed dictionary without the lookup/value portion

Keep on reading to learn more about each of these collection types!

Commands

Each of these come from the [System.Collections.Generic] library. You can read about everything in that namespace easily on the Microsoft documentation site.

These classes are each coded to a specific type during creation. This means that anything you put into these arrays will get cast to that destination type when inserted. If it cannot be cast, then the insertion will fail and error.

Because these arrays enforce a specific type, they are often pronounced as “of T”. For example, [List<T>] is spoken “List of T”, meaning, list with a specific type. Similarly, you will learn about [Queue<T>], [Stack<T>], etc.

Any type that you have in your PowerShell session can be used as the type when creating these arrays. In this post, you will only read about using the type [string], but remember that you can put anything for the type, including [object] or [psobject].

Queue

The Queue collection provides an easy interface for adding or removing one item at a time in a normally ordered array. Adding is called via Enqueue() and removing is called via Dequeue(). Sometimes it is easiest to see examples. Go run this yourself and play around:

$Queue = [System.Collections.Generic.Queue[string]]@()
$Queue.Enqueue("First")
$Queue.Enqueue("Second")
$Queue.Enqueue("Third")
$Queue

You can see that the queue does stay ordered. Now, remove an entry and Peek at what is next up in the queue after your dequeue. Notice that when you dequeue an item, it is returned in the pipeline and removed from the queue.

$Queue.Dequeue() # First
$Queue.Peek() # Second

Since you only did a “peek”, you didn’t actually remove the next item from the array. Run another dequeue and see that you get “second” again.

$Queue.Dequeue() # Second

Hopefully that’s all making sense. If you need to clear the queue, you can use the handy method called .Clear()

$Queue.Clear()

Stack

The Queue collection provides an easy interface for adding or removing one item at a time in a “most recent” ordered array. Adding is called via Push() and removing is called via Pop(). If you’ve ever used Push-Location or Pop-Location (pushd, popd), then you may already be familiar with this first-in/last-out method. Go hit an example to check it out in action:

$Stack = [System.Collections.Generic.Stack[string]]@()
$Stack.Push("First")
$Stack
$Stack.Push("Second")
$Stack
$Stack.Push("Third")
$Stack

You can see that the most recent item pushed onto the stack is at the top of the array when you display the whole thing. When you pop from the array, it takes that first item off the array and spits it out to you:

$Stack.Peek()
$Stack.Pop()

The previously-second item in the array is now the first position:

$Stack

Its good to remember that you can check the size of the stack: $Stack.Count

List

The generic list will feel quite familiar and is generally considered the best collection to use for operations due to its superior performance. There are other arrays in PowerShell like @() and [Collections.ArrayList]. PowerShell is always evolving! But one of the team’s design goals was not to break existing code where possible. As such, there are these older ways to hold data that remain in PowerShell and work fine despite having a better alternative.

The problem with @() is that as you add items to it via $array += "foo", it has to create a copy of the entire array and add in the extra data all into a new array. If you will add somewhere between 10k and 100k+ items to it via +=, it becomes noticibly slow performance. This is because it is an array of “fixed” size. Similarly, you can’t remove items from the array at all.

The problem with [Collections.ArrayList] is that it has been completely superseded. Here’s what MS says about it:

We don’t recommend that you use the ArrayList class for new development. Instead, we recommend that you use the generic List class.

Time to try out using the generic list class [List<T>]!

$List = [System.Collections.Generic.List[string]]@()
$List.Add("First")
$List.Add("Second")
$List.Add("Third")
$List.Add("Last")
$List

Pretty simple so far. Now, remove the first item who equals ‘Second’. Notice that it returns $true, indicating that it found something to remove and successfully removed it.

$List.Remove("Second")

Try removing it again. Notice that it returns $false because it couldn’t find anything to remove that matched that object.

$List.Remove("Second")

Its worth being familiar with other methods in the list like RemoveAll(), RemoveAt(), Reverse(), and IndexOf().

Hashset

This is a very fun collection to know about! Are you familiar with a a hashtable (a keyed dictionary)? If not, go run this code to explore a hashtable:

$ht = @{}
$ht.first = 1
$ht.second = 3
$ht.third = 3
$ht.second = 2
$ht

Notice that the hashtable has no apparent ordering and that setting the key “second” multiple times will update any previous instance of that key to the new value (second is 2, not 2 and 3).

Now check out maintaining a keyed list in a hashset. Notice that it returns true or false as you attempt to add each item to the array, indicating whether or not it added the item (such as if the array already includes that item):

$Hashset = [System.Collections.Generic.Hashset[string]]@()
$Hashset.Add("first")
$Hashset.Add("second")
$Hashset.Add("second")
$Hashset.Add("third")

Though it may appeared ordered on viewing the hashset (and i don’t know that I’ve ever seen it be unordered), I have it on good authority that a hashset is not guarenteed to be ordered.

Because a Hashset is keyed, it means that there can’t be duplicates in it. This is a GREAT collection to use when you want to de-duplicate a list. It can be MUCH faster than the -Unique parameter from Select-Object or Sort-Object.

WARNING: Hashsets are case sensitive. Notice that your Hashset contains “first”, but not “First”:

$Hashset.Contains("first") #true
$Hashset.Contains("First") #false

Wrapping Up

There you have it, some really nifty patterns to use when the situation calls for it. As usual my priorities when scripting are:

  1. Make it work first.
  2. Make it work fast enough.
  3. Make it clean enough to work on next time.

Discuss on reddit