Deep Learning Week 1 Notes
1. Tensors
\(\text{A tensor is a generalized matrix:}\)
\(\text{an element of }\mathbb{R}^3\text{ is a 3-dimensional vector, but it is a 1-dimensional tensor.}\)
\(\large \text{The 'dimension' of a tensor is the number of indices.}\)
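A quick PyTorch check of this distinction (a minimal sketch; the tensors are arbitrary examples, not from the notes above):

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])  # an element of R^3
print(v.shape)   # torch.Size([3]) -> a 3-dimensional vector
print(v.dim())   # 1               -> but a 1-dimensional tensor (one index)

m = torch.zeros(2, 3)              # a matrix needs two indices i, j
print(m.dim())   # 2
```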
2. PyTorch operations
@ \(\text{ corresponds to matrix/vector or matrix/matrix multiplication.}\)
* \(\text{ is the component-wise product.}\)
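A small example contrasting the two operators (a sketch; the values are arbitrary):

```python
import torch

m = torch.tensor([[1., 2.], [3., 4.]])
v = torch.tensor([10., 20.])

print(m @ v)   # matrix/vector product: tensor([ 50., 110.])
print(m @ m)   # matrix/matrix product
print(m * m)   # component-wise product: tensor([[ 1.,  4.], [ 9., 16.]])
```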
lstsq:\(\text{ least squares: finds the }q\text{ minimizing }\lVert mq - y\rVert.\)
```
>>> y = torch.randn(3)
>>> y
tensor([ 1.3663, -0.5444, -1.7488])
>>> m = torch.randn(3, 3)
>>> q = torch.linalg.lstsq(m, y).solution
>>> m@q
tensor([ 1.3663, -0.5444, -1.7488])
```
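The square system above happens to have an exact solution; the more typical least-squares situation is an overdetermined system, where lstsq returns the \(q\) minimizing \(\lVert mq - y\rVert\). A sketch with arbitrary random data, not part of the original session:

```python
import torch

m = torch.randn(5, 2)                  # 5 equations, 2 unknowns: no exact solution in general
y = torch.randn(5)
q = torch.linalg.lstsq(m, y).solution  # shape (2,)
print((m @ q - y).norm())              # residual, generally > 0
```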
3. Data Sharing

```
>>> a = torch.full((2, 3), 1)
>>> a
tensor([[1, 1, 1], [1, 1, 1]])
>>> b = a.view(-1)
>>> b
tensor([1, 1, 1, 1, 1, 1])
>>> a[1, 1] = 2
>>> a
tensor([[1, 1, 1], [1, 2, 1]])
>>> b
tensor([1, 1, 1, 1, 2, 1])
>>> b[0] = 9
>>> a
tensor([[9, 1, 1], [1, 2, 1]])
>>> b
tensor([9, 1, 1, 1, 2, 1])
```

\(\large \text{Note: many operations return a new tensor which shares the same underlying storage as the original tensor, so changing the values of one will change the other as well: }\)view, transpose,
squeeze, unsqueeze, expand, permute.
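For example, transpose shares storage just like view, while clone() makes an independent copy (a minimal sketch, not from the original notes):

```python
import torch

a = torch.full((2, 3), 1)
t = a.t()            # transpose: new tensor, same underlying storage
t[0, 1] = 7          # writes through to a[1, 0]
print(a)             # tensor([[1, 1, 1], [7, 1, 1]])

c = a.clone()        # clone() copies the data
c[0, 0] = 9
print(a)             # unchanged by the write to c
```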
4. Einstein summation convention
torch.einsum:
\(\text{Matrix Multiplication:}\)
```
>>> p = torch.rand(2, 5)
>>> q = torch.rand(5, 4)
>>> torch.einsum('ij,jk->ik', p, q)
tensor([[2.0833, 1.1046, 1.5220, 0.4405],
        [2.1338, 1.2601, 1.4226, 0.8641]])
>>> p@q
tensor([[2.0833, 1.1046, 1.5220, 0.4405],
        [2.1338, 1.2601, 1.4226, 0.8641]])
```

\(\text{Matrix-Vector product:}\)
```python
w = torch.einsum('ij,j->i', m, v)
```

\(\text{Component-wise Product:}\)
```python
m = torch.einsum('ij,ij->ij', p, q)
```

\(\text{Trace:}\)
```python
t = torch.einsum('ii->', m)    # trace: sums the diagonal entries
d = torch.einsum('ii->i', m)   # 'ii->i' extracts the diagonal instead of summing it
```
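The same convention extends to batched and outer products; a sketch with arbitrary shapes, added here for illustration:

```python
import torch

# Batched matrix product: contract over j independently for each batch index b
a = torch.rand(10, 2, 5)
b = torch.rand(10, 5, 4)
torch.einsum('bij,bjk->bik', a, b)   # same result as torch.bmm(a, b), shape (10, 2, 4)

# Outer product: no repeated index, so nothing is summed
u = torch.rand(3)
v = torch.rand(4)
torch.einsum('i,j->ij', u, v)        # shape (3, 4)
```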
5. Storage

```
>>> x = torch.zeros(2, 4)
>>> x.storage()
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
[torch.FloatStorage of size 8]
>>> q = x.storage()
>>> q[4] = 1.0
>>> x
tensor([[ 0., 0., 0., 0.],
        [ 1., 0., 0., 0.]])
```

\(\large \text{The main idea of functions like }\)view, narrow, transpose, \(\large\text{ etc. and of operations involving broadcasting is to never replicate data in memory, but to “play” with the offsets and strides of the underlying storage.}\)
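These strides can be inspected directly (a small sketch, not from the original notes):

```python
import torch

x = torch.zeros(2, 4)
print(x.stride())                    # (4, 1): stepping one row skips 4 storage elements
y = x.t()                            # transpose of x
print(y.stride())                    # (1, 4): same storage, only the strides differ
print(x.data_ptr() == y.data_ptr())  # True: no data was copied
```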
\(\text{Therefore:}\)
```
>>> x = torch.empty(100, 100)
>>> x.t().view(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view()
```

\(\text{Because }\)x.t()\(\text{ shares its storage with }\)x\(\text{, it cannot be flattened to 1d with }\)view()\(\text{. We can use the function }\)reshape()\(\text{ instead.}\)
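Both workarounds in code (a sketch):

```python
import torch

x = torch.empty(100, 100)
a = x.t().reshape(-1)            # reshape() copies the data only when a view is impossible
b = x.t().contiguous().view(-1)  # or make an explicit contiguous copy first, then view
```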