Deep Learning Week1 Notes

1. Tensors

\(\text{A tensor is a generalized matrix:}\)

\(\text{An element of }\mathbb{R}^3 \text{ is a 3-dimensional vector, but it is a 1-dimensional tensor.}\)

\(\large \text{The 'dimension' of a tensor is the number of indices.}\)
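A quick check of this in PyTorch, as a minimal sketch with arbitrary example shapes (illustrative, not from the notes): `dim()` counts the indices needed to address one entry, while `shape` gives the range of each index.

```python
import torch

# Illustrative tensors: a scalar, a vector in R^3, a 2x3 matrix, a 2x3x4 array
s = torch.tensor(3.0)
v = torch.tensor([1.0, 2.0, 3.0])   # 3 components, but addressed by a single index
m = torch.zeros(2, 3)
t = torch.zeros(2, 3, 4)

for name, x in [("s", s), ("v", v), ("m", m), ("t", t)]:
    # dim() = number of indices; shape = range of each index
    print(name, x.dim(), tuple(x.shape))
# s 0 ()
# v 1 (3,)
# m 2 (2, 3)
# t 3 (2, 3, 4)
```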

2. PyTorch operations

@ \(\text{ corresponds to matrix/vector or matrix/matrix multiplication.}\)

* \(\text{ is the component-wise (element-wise) product.}\)
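A small sketch contrasting the two operators on hand-picked 2×2 values (illustrative, not from the notes); the comments give the results worked out by hand.

```python
import torch

# Illustrative example values
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0], [7.0, 8.0]])
v = torch.tensor([1.0, -1.0])

mm = a @ b   # matrix/matrix product: [[19, 22], [43, 50]]
mv = a @ v   # matrix/vector product: [-1, -1]
cw = a * b   # component-wise product: [[5, 12], [21, 32]]

print(mm)
print(mv)
print(cw)
```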

lstsq:\(\text{ least squares: find }q\text{ minimizing }\|mq - y\|_2\text{, i.e. the best approximate solution of }mq = y\)

>>> y = torch.randn(3)
>>> y
tensor([ 1.3663, -0.5444, -1.7488])
>>> m = torch.randn(3,3)
>>> q = torch.linalg.lstsq(m,y).solution
>>> m@q
tensor([ 1.3663, -0.5444, -1.7488])
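With a square, (almost surely) invertible m the least-squares solution is exact, which is why m@q reproduces y above. The more typical use is an overdetermined system with more equations than unknowns. The sketch below assumes such a hypothetical 6×3 system (random data, float64 for accuracy, y passed as a column of shape (6, 1)) and checks the result against the normal equations \(m^T m\, q = m^T y\) and against the orthogonality of the residual to the columns of m.

```python
import torch

torch.manual_seed(0)
# Hypothetical overdetermined system: 6 equations, 3 unknowns,
# so in general no q satisfies m q = y exactly.
m = torch.randn(6, 3, dtype=torch.float64)
y = torch.randn(6, 1, dtype=torch.float64)

q = torch.linalg.lstsq(m, y).solution            # shape (3, 1)

# Same minimiser from the normal equations m^T m q = m^T y
q_normal = torch.linalg.solve(m.T @ m, m.T @ y)

# At the optimum, the residual is orthogonal to every column of m
r = m @ q - y
print(torch.allclose(q, q_normal))
print(m.T @ r)                                    # entries close to zero
```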

3. Data Sharing

>>> a = torch.full((2, 3), 1)
>>> a
tensor([[1, 1, 1],
        [1, 1, 1]])
>>> b = a.view(-1)
>>> b
tensor([1, 1, 1, 1, 1, 1])
>>> a[1, 1] = 2
>>> a
tensor([[1, 1, 1],
        [1, 2, 1]])
>>> b
tensor([1, 1, 1, 1, 2, 1])
>>> b[0] = 9
>>> a
tensor([[9, 1, 1],
        [1, 2, 1]])
>>> b
tensor([9, 1, 1, 1, 2, 1])

\(\large \text{Note: many operations return a new tensor that shares the underlying storage of the original tensor, so changing the values of one also changes the other:}\) view, transpose,
squeeze, unsqueeze, expand, permute.
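One way to see the sharing, as a sketch: `data_ptr()` returns the memory address of a tensor's first element, so the results of the operations listed above report the same address as the original, while `clone()` allocates new memory. A write through the original is then visible through every view. The tensor values and the dictionary name `views` are illustrative.

```python
import torch

a = torch.arange(6).reshape(2, 3)      # illustrative: [[0, 1, 2], [3, 4, 5]]

views = {                              # illustrative name
    "view": a.view(3, 2),
    "transpose": a.transpose(0, 1),
    "permute": a.permute(1, 0),
    "unsqueeze": a.unsqueeze(0),       # shape (1, 2, 3)
    "squeeze": a.unsqueeze(0).squeeze(0),
    "expand": a[:1].expand(4, 3),      # row 0 repeated 4 times, nothing copied
}

for name, t in views.items():
    # Same address as a: no data was copied
    print(name, t.data_ptr() == a.data_ptr())

c = a.clone()                          # a real copy, for contrast
print("clone", c.data_ptr() == a.data_ptr())

a[0, 0] = 100
print(views["transpose"][0, 0], views["expand"][3, 0], c[0, 0])
```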

4. Einstein summation convention

torch.einsum:
\(\text{Matrix multiplication:}\)

>>> p = torch.rand(2, 5)
>>> q = torch.rand(5, 4)
>>> torch.einsum('ij,jk->ik', p, q)
tensor([[2.0833, 1.1046, 1.5220, 0.4405],
        [2.1338, 1.2601, 1.4226, 0.8641]])
>>> p@q
tensor([[2.0833, 1.1046, 1.5220, 0.4405],
        [2.1338, 1.2601, 1.4226, 0.8641]])

\(\text{Matrix-Vector product:}\)

w = torch.einsum('ij,j->i', m, v) 

\(\text{Component-wise Product:}\)

m = torch.einsum('ij,ij->ij', p, q) 
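A sketch checking the two one-liners above against the operators from section 2, with random illustrative inputs. Note that 'ij,ij->ij' needs p and q of the same shape, unlike the 2×5 and 5×4 tensors in the matrix-multiplication example; the result is named `c` here (an illustrative name) so it does not overwrite m.

```python
import torch

torch.manual_seed(0)
# Illustrative inputs
m = torch.rand(3, 4)
v = torch.rand(4)
p = torch.rand(2, 5)
q = torch.rand(2, 5)   # same shape as p, required for the component-wise product

# 'ij,j->i': j appears in the inputs but not the output, so it is summed: m @ v
w = torch.einsum('ij,j->i', m, v)
# 'ij,ij->ij': every index is kept, so nothing is summed: p * q
c = torch.einsum('ij,ij->ij', p, q)

print(torch.allclose(w, m @ v), torch.allclose(c, p * q))
```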

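\(\text{Diagonal:}\)

v = torch.einsum('ii->i', m) 

\(\text{Trace (the repeated index is summed away, giving a scalar):}\)

s = torch.einsum('ii->', m) 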
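Whether the repeated index survives in the output decides what 'ii' computes: kept ('ii->i') it gives the diagonal, summed away ('ii->') it gives the trace. A small sketch on an illustrative 3×3 matrix, compared against `torch.diagonal` and `torch.trace`:

```python
import torch

m = torch.arange(9.0).reshape(3, 3)   # illustrative: [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

d = torch.einsum('ii->i', m)   # i kept in the output: the diagonal
t = torch.einsum('ii->', m)    # i summed away: the trace

print(d)   # tensor([0., 4., 8.])
print(t)   # tensor(12.)
print(torch.equal(d, torch.diagonal(m)), torch.equal(t, torch.trace(m)))
```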

5. Storage

>>> x = torch.zeros(2, 4)
>>> x.storage()
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
[torch.FloatStorage of size 8]
>>> q = x.storage()
>>> q[4] = 1.0
>>> x
tensor([[ 0., 0., 0., 0.],
        [ 1., 0., 0., 0.]])

\(\large \text{The main idea behind functions like }\)view, narrow, transpose, \(\large\text{ etc., and behind operations involving broadcasting, is never to copy data in memory, but to build a new tensor that reads the same storage with a different offset and different strides.}\)
\(\text{The catch is that not every reshaping can be expressed this way:}\)

>>> x = torch.empty(100, 100)
>>> x.t().view(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view()

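x.t() \(\text{ shares its storage with }\)x\(\text{, but its strides are (1, 100) rather than (100, 1), so it is not contiguous and cannot be flattened into a 1D view of that storage. Use }\)reshape()\(\text{, which returns a view when possible and a copy otherwise, or call }\).contiguous()\(\text{ before }\).view()\(\text{.}\)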
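To make the stride argument concrete, a sketch on a small illustrative 2×4 tensor: transposing and narrowing only change the strides and the storage offset, and `reshape()` copies only when no view is possible. Comparing `data_ptr()` of the flattened result with that of x tells a view (same address) from a copy. The names `xt`, `n`, `flat`, `flat2` and `same` are illustrative.

```python
import torch

x = torch.arange(8.0).reshape(2, 4)   # illustrative: [[0, 1, 2, 3], [4, 5, 6, 7]]
xt = x.t()                            # shape (4, 2), same storage as x

# Only the strides change: one step along dim 1 of xt jumps 4 elements in storage
print(x.stride(), xt.stride())        # (4, 1) (1, 4)
print(xt.is_contiguous())             # False

# narrow also shares the storage; it just moves the starting offset
n = x.narrow(1, 2, 2)                 # columns 2 and 3
print(n.storage_offset(), n.stride()) # 2 (4, 1)

# reshape copies when a view is impossible
flat = xt.reshape(-1)                 # [0, 4, 1, 5, 2, 6, 3, 7]
print(flat.data_ptr() == x.data_ptr())   # False: new memory
flat2 = xt.contiguous().view(-1)         # explicit equivalent
print(torch.equal(flat, flat2))          # True

# ...and returns a view when x is already contiguous
same = x.reshape(-1)
print(same.data_ptr() == x.data_ptr())   # True: no copy
```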