====== EGPU ======
{{ :tamiwiki:projects:pasted:20230618-183833.png}}
We are using the [[https://egpu.io/best-egpu-buyers-guide/|TH3P4G3]] external Thunderbolt eGPU adapter.\\
Linux kernel notes: https://docs.kernel.org/admin-guide/thunderbolt.html\\
[[https://realtechtalk.com/Nvidia_Tesla_GPUs_K40K80M40P40P100V100_at_homedesktop_hacking_cooling_powering_cable_solutions_Tutorial_AIO_Solutions-2465-articles|realtechtalk guide]], [[https://archive.is/Kgj7E|mirror]]
=== Thunderbolt check and setup ===
TL;DR
  - upgrade the kernel (not sure this is needed)
  - install the graphics drivers (NVIDIA or AMD)
  - plug in the card
  - reboot
  - authorize (trust) the Thunderbolt device
If the ''authorized'' attribute reads 0, no PCIe tunnels have been created yet. As root, authorize the device with:
<code>
# echo 1 > /sys/bus/thunderbolt/devices/0-1/authorized
</code>
This creates the PCIe tunnels and the device is connected. ''0-1'' is the device path on our machine; yours may differ.
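A minimal Python sketch for checking which Thunderbolt devices still need authorizing before writing to ''authorized''. It assumes the sysfs layout from the kernel admin guide linked above (an ''authorized'' attribute per device, plus ''vendor_name'' and ''device_name''); ''list_devices'' is a hypothetical helper, not part of any existing tool.

```python
from __future__ import annotations

from pathlib import Path

# sysfs location described in the kernel Thunderbolt admin guide
TB_DEVICES = Path("/sys/bus/thunderbolt/devices")


def read_attr(dev: Path, name: str) -> str | None:
    """Return a stripped sysfs attribute, or None if the entry lacks it."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return None


def list_devices(root: Path = TB_DEVICES) -> list[dict]:
    """Return id, vendor, name and authorized state for each Thunderbolt device."""
    if not root.is_dir():
        return []  # no Thunderbolt bus visible on this machine
    devices = []
    for dev in sorted(root.iterdir()):
        authorized = read_attr(dev, "authorized")
        if authorized is None:
            continue  # entries without 'authorized' (e.g. the domain) are skipped
        devices.append({
            "id": dev.name,
            "vendor": read_attr(dev, "vendor_name") or "?",
            "name": read_attr(dev, "device_name") or "?",
            # "0" means not authorized; any other value means authorized
            "authorized": authorized != "0",
        })
    return devices


if __name__ == "__main__":
    for d in list_devices():
        state = "authorized" if d["authorized"] else "NOT authorized (no PCIe tunnels yet)"
        print(f'{d["id"]}: {d["vendor"]} {d["name"]} -> {state}')
```

Reading sysfs needs no special rights, so this can run as a normal user; only the write in the next step needs root.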
==== upgrade kernel ====
Install a mainline kernel; the kernel from the Ubuntu ''dist-upgrade'' (5.19) is too conservative.
<code bash>
cd /tmp
rm -i *deb
wget -c https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-headers-6.3.7-060307-generic_6.3.7-060307.202306090936_amd64.deb
wget -c https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-headers-6.3.7-060307_6.3.7-060307.202306090936_all.deb
wget -c https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-image-unsigned-6.3.7-060307-generic_6.3.7-060307.202306090936_amd64.deb
wget -c https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.3.7/amd64/linux-modules-6.3.7-060307-generic_6.3.7-060307.202306090936_amd64.deb
sudo dpkg -i *.deb
</code>
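The four package URLs above follow one pattern: kernel version, ABI tag and build timestamp. A hypothetical Python helper that rebuilds them for another mainline build; the pattern is inferred only from these four URLs, so check the directory listing on kernel.ubuntu.com before trusting it for other versions.

```python
# Hypothetical helper: the naming pattern is inferred from the four URLs above.
BASE = "https://kernel.ubuntu.com/~kernel-ppa/mainline"


def mainline_debs(version: str, abi: str, stamp: str, arch: str = "amd64") -> list:
    """Return header, image and module .deb URLs for one mainline kernel build."""
    full = f"{version}-{abi}.{stamp}"       # e.g. 6.3.7-060307.202306090936
    generic = f"{version}-{abi}-generic"    # e.g. 6.3.7-060307-generic
    folder = f"{BASE}/v{version}/{arch}"
    return [
        f"{folder}/linux-headers-{generic}_{full}_{arch}.deb",
        f"{folder}/linux-headers-{version}-{abi}_{full}_all.deb",  # arch-independent headers
        f"{folder}/linux-image-unsigned-{generic}_{full}_{arch}.deb",
        f"{folder}/linux-modules-{generic}_{full}_{arch}.deb",
    ]


if __name__ == "__main__":
    # Values for the 6.3.7 build used on this page
    for url in mainline_debs("6.3.7", "060307", "202306090936"):
        print(url)
```

Printing one URL per line lets the output feed straight into ''wget -c -i -'', which reads the URL list from standard input.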
==== trust ====
Note: the eGPU has to be connected before booting.\\
Then set the permissions:
<code>
$ sudo dmesg
...
[ 7.888207] audit: type=1400 audit(1686781044.331:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=563 comm="apparmor_parser"
</code>
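Authorized the tamala! Note that ''sudo echo 1 > ...'' would fail, because the redirection is performed by the unprivileged shell; piping into ''sudo tee'' avoids that:
<code bash>
$ echo 1 | sudo tee /sys/bus/thunderbolt/devices/0-1/authorized
</code>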
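The same write as a small Python function. This is a sketch assuming the ''0-1'' device path used on this page (yours may differ) and that it runs as root, e.g. via ''sudo python3''; ''authorize'' is a hypothetical name.

```python
import sys
from pathlib import Path

TB_DEVICES = Path("/sys/bus/thunderbolt/devices")


def authorize(device: str = "0-1", root: Path = TB_DEVICES) -> bool:
    """Authorize one Thunderbolt device; return True if it ends up authorized."""
    attr = root / device / "authorized"
    if attr.read_text().strip() != "0":
        return True  # already authorized, nothing to write
    attr.write_text("1")  # same effect as: echo 1 | sudo tee .../authorized
    return attr.read_text().strip() != "0"  # read back to confirm


if __name__ == "__main__":
    # Device id defaults to 0-1 (the path on our machine); pass another as argv[1]
    dev = sys.argv[1] if len(sys.argv) > 1 else "0-1"
    print(dev, "authorized" if authorize(dev) else "still NOT authorized")
```

Checking first and reading back afterwards makes the script safe to re-run and shows whether the kernel actually accepted the write.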
==== 1080Ti ====
{{ :tamiwiki:projects:pxl_20230616_212235630.jpg?400|}}
Looks legit:
<code>
$ sudo dmesg -w
[96236.873213] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[96236.874544] nvidia 0000:09:00.0: enabling device (0006 -> 0007)
[96236.874646] nvidia 0000:09:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[96236.991272] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 535.43.02 Mon May 22 20:46:13 UTC 2023
[96237.009537] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 535.43.02 Mon May 22 20:25:24 UTC 2023
[96237.013346] [drm] [nvidia-drm] [GPU ID 0x00000900] Loading driver
[96238.239429] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:09:00.0 on minor 1
[96238.269008] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[96238.330257] nvidia-uvm: Loaded the UVM driver, major device number 507.
[96238.399348] NVRM: API mismatch: the client has the version 390.157, but
NVRM: this kernel module has the version 535.43.02. Please
NVRM: make sure that this kernel module and all NVIDIA driver
</code>
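The ''NVRM: API mismatch'' lines mean the userspace driver (390.157) and the loaded kernel module (535.43.02) come from different driver releases. A hedged sketch that pulls both versions out of ''dmesg'' output; the regex is written against the message shown above and may miss other wordings.

```python
import re
import sys

# Written against the NVRM message above; other driver versions may word it differently.
MISMATCH = re.compile(
    r"client has the version (\d+(?:\.\d+)+).*?"
    r"kernel module has the version (\d+(?:\.\d+)+)",
    re.DOTALL,  # the message spans several log lines
)


def driver_mismatch(log: str):
    """Return (client_version, kernel_module_version) if dmesg reports a mismatch, else None."""
    m = MISMATCH.search(log)
    return (m.group(1), m.group(2)) if m else None


if __name__ == "__main__":
    found = driver_mismatch(sys.stdin.read())
    if found:
        print(f"userspace driver {found[0]} != kernel module {found[1]}")
    else:
        print("no NVRM API mismatch found")
```

Usage: ''sudo dmesg | python3 nvrm_check.py'' (the file name is arbitrary).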
The userspace driver (390.157) does not match the kernel module (535.43.02), so update the driver to match:
<code>
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:1c.4/0000:04:00.0/0000:05:01.0/0000:07:00.0/0000:08:01.0/0000:09:00.0 ==
modalias : pci:v000010DEd00001B06sv00001458sd0000377Abc03sc00i00
vendor : NVIDIA Corporation
model : GP102 [GeForce GTX 1080 Ti]
manual_install: True
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-510 - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-525 - distro non-free
driver : nvidia-driver-535 - third-party non-free recommended
driver : nvidia-driver-515 - distro non-free
driver : nvidia-driver-515-server - distro non-free
driver : nvidia-driver-530 - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
</code>
''autoinstall'' installs the driver marked ''recommended'' (here ''nvidia-driver-535''):
<code bash>
$ sudo ubuntu-drivers autoinstall
</code>
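To see which package is tagged ''recommended'' without installing anything, a sketch that scans ''ubuntu-drivers devices'' output. It assumes the ''driver : <package> - ... recommended'' layout shown above; ''recommended_driver'' is a hypothetical helper.

```python
import subprocess


def recommended_driver(output: str):
    """Return the package tagged 'recommended' in `ubuntu-drivers devices` output, or None."""
    for line in output.splitlines():
        key, _, value = line.partition(":")
        fields = value.split()
        # e.g. "driver : nvidia-driver-535 - third-party non-free recommended"
        if key.strip() == "driver" and "recommended" in fields:
            return fields[0]
    return None


if __name__ == "__main__":
    out = subprocess.run(["ubuntu-drivers", "devices"],
                         capture_output=True, text=True, check=True).stdout
    print(recommended_driver(out) or "no recommended driver listed")
```

Splitting on the first '':'' only keeps the ''modalias'' and ''== /sys/...'' lines (which contain more colons) from being mistaken for driver entries.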
==== P40 ====
This doesn't work on our test machine.
{{ :tamiwiki:projects:pxl_20230615_095016755.jpg?400|}}
The P40 needs a modern motherboard whose BIOS allows ''Enable Above 4G memory'' (see [[https://github.com/JingShing/How-to-use-tesla-p40#bios-settings|BIOS settings]]); see the [[tamiwiki:projects:P40a|P40]] page for info on the dedicated machine.
NVIDIA Tesla P40 24 GB GDDR5 GPU accelerator card, dual-slot, PCIe 3.0 x16.\\
It is passively cooled and ships without a fan, so one has to be retrofitted.\\
Got one on eBay for $200 (+ shipping) ([[https://archive.md/SL4Kq|eBay listing mirror]]).\\
Someone got it working: [[https://github.com/JingShing/How-to-use-tesla-p40|How-to-use-tesla-p40]]
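Our understanding (not verified on this machine) is that the ''Above 4G'' setting lets the firmware map a card's large 64-bit BARs above the 4 GiB address boundary, which big-memory compute cards like the P40 need. A sketch for inspecting where a card's regions ended up, assuming the standard ''/sys/bus/pci/devices/<address>/resource'' format (one ''start end flags'' hex triple per region). The ''0000:09:00.0'' default is the eGPU slot seen elsewhere on this page; substitute your card's address from ''lspci -D''.

```python
import sys
from pathlib import Path

FOUR_GIB = 1 << 32


def parse_bars(text: str) -> list:
    """Parse a PCI sysfs 'resource' file into (region_index, start, size) tuples."""
    bars = []
    for index, line in enumerate(text.splitlines()):
        start, end, _flags = (int(field, 16) for field in line.split())
        if end == 0:
            continue  # unused region: start, end and flags are all zero
        bars.append((index, start, end - start + 1))
    return bars


if __name__ == "__main__":
    # 0000:09:00.0 is the eGPU slot seen on this page; pass your own as argv[1]
    bdf = sys.argv[1] if len(sys.argv) > 1 else "0000:09:00.0"
    text = Path(f"/sys/bus/pci/devices/{bdf}/resource").read_text()
    for index, start, size in parse_bars(text):
        where = "above" if start >= FOUR_GIB else "below"
        print(f"region {index}: {size / 2**20:.0f} MiB at {start:#x} ({where} 4 GiB)")
```

If the addresses read as zero, try running it with ''sudo''.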
=== Specifications ===
  * GPU architecture: NVIDIA Pascal
  * Single-precision performance: 12 TFLOPS (with boost clock)
  * Integer operations (INT8): 47 TOPS (tera-operations per second, with boost clock)
  * GPU memory: 24 GB
  * Memory bandwidth: 346 GB/s
  * System interface: PCI Express 3.0 x16
  * Form factor: 4.4” H x 10.5” L, dual slot, full height
  * Max power: 250 W
  * Enhanced programmability with Page Migration Engine: yes
  * ECC protection: yes
  * Server-optimized for data center deployment: yes
  * Hardware-accelerated video engine: 1x decode engine, 2x encode engines
  * NVPN: 699-2G610-0200-100
  * NVIDIA® CUDA® cores: 3840
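A quick sanity check that the headline numbers hang together. The boost clock (1531 MHz), the 384-bit memory bus and the 7.2 Gbit/s per-pin GDDR5 data rate are assumptions taken from commonly published P40 figures, not from the list above.

```python
# Assumptions (not from the spec list above): boost clock, bus width, data rate.
CUDA_CORES = 3840
BOOST_CLOCK_HZ = 1.531e9   # assumed
BUS_WIDTH_BITS = 384       # assumed
DATA_RATE_BPS = 7.2e9      # assumed, per pin

# Each CUDA core can retire one fused multiply-add (2 FLOPs) per clock.
fp32_tflops = CUDA_CORES * 2 * BOOST_CLOCK_HZ / 1e12
# Pascal's DP4A instruction does a 4-way INT8 dot product: 4x the FP32 op rate.
int8_tops = 4 * fp32_tflops
# Bandwidth = bus width in bytes x per-pin data rate.
bandwidth_gbs = BUS_WIDTH_BITS / 8 * DATA_RATE_BPS / 1e9

print(f"FP32      ~ {fp32_tflops:.2f} TFLOPS")  # 11.76, listed as 12
print(f"INT8      ~ {int8_tops:.1f} TOPS")      # 47.0, listed as 47
print(f"bandwidth ~ {bandwidth_gbs:.1f} GB/s")  # 345.6, listed as 346
```

Under these assumptions all three derived values round to the datasheet figures.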
Installing the headless driver:
<code bash>
sudo apt install nvidia-headless-535
</code>
There is some issue: unlike with the other cards, the blue LED doesn't turn green when Thunderbolt connects, and no power reaches the GPU.\\
:(
==== misc ====
<code>
$ lspci -v | grep -A 2 -E "(VGA comp|3D)"
00:02.0 VGA compatible controller: Intel Corporation Iris Pro Graphics 580 (rev 09) (prog-if 00 [VGA controller])
	DeviceName: CPU
	Subsystem: Intel Corporation Iris Pro Graphics 580
--
09:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation GF106GL [Quadro 2000]
	Flags: bus master, fast devsel, latency 0
</code>
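A Python take on the ''grep'' one-liner above that returns the slot, device class and description of every display device. The ''3D controller'' alternative matters for compute cards such as Tesla boards, which typically enumerate as 3D controllers rather than VGA. ''list_gpus'' is a hypothetical helper; it parses plain ''lspci'' output.

```python
import re
import subprocess

# Matches lines such as "09:00.0 VGA compatible controller: NVIDIA Corporation ..."
GPU_LINE = re.compile(r"^(\S+) (VGA compatible controller|3D controller): (.+)$")


def list_gpus(lspci_output: str) -> list:
    """Return (slot, class, description) for every display device in lspci output."""
    gpus = []
    for line in lspci_output.splitlines():
        m = GPU_LINE.match(line)
        if m:  # indented detail lines (Subsystem:, Flags:) never match
            gpus.append(m.groups())
    return gpus


if __name__ == "__main__":
    out = subprocess.run(["lspci"], capture_output=True, text=True, check=True).stdout
    for slot, kind, desc in list_gpus(out):
        print(f"{slot}  {kind}: {desc}")
```

The slot (e.g. ''09:00.0'') is the same address that shows up in the ''nvidia 0000:09:00.0'' kernel messages above.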
Power comes from a 12 V DC plug (150 W?).\\
See also:
  * [[https://www.reddit.com/r/eGPU/comments/ukqto9/comment/ige1rwv|r/eGPU comment]]
  * [[https://egpu.io/forums/thunderbolt-linux-setup/|egpu.io forum: Thunderbolt Linux setup]]