Lead Architect Network Infrastructure

💰 $35k

Job description

Company Description

Our brand Deutsche Telekom IT Solutions Slovakia entered the life of the Košice region in 2006 under the name T-Systems Slovakia and has been inextricably linked with the region ever since, having become one of the founding members of Košice IT Valley. We have grown from scratch into the second largest employer in the eastern part of the country, with more than 3900 employees. Our goal is to proactively find new ways to improve and to continuously transform into a company that provides innovative information and communication technology services.

Job Description

NVIDIA and Deutsche Telekom are jointly developing the world’s first industrial AI cloud for European manufacturers. This AI factory in Germany will host 10,000 GPUs across NVIDIA DGX B200 systems and RTX Pro Servers. Deutsche Telekom provides secure, sovereign and fast infrastructure, including data centers, operations, security, and AI solutions.

Role Overview:

We are seeking a Lead Architect for Network Infrastructure at the Industrial AI Cloud to design, build, and automate the network platform, covering the automation and operation of network components such as switches, firewalls, routers, and border gateways that form part of the core environment of the Industrial AI Cloud. In this role you will design, provision, and manage the above-mentioned stack, implement and fine-tune monitoring, and deploy additional components if necessary. You will work with and coordinate between multiple teams (such as Infrastructure and Platform) to deliver and continuously improve infrastructure services following ITIL processes.

The Lead Architect considers and defines the design that enables automated configuration management, release management, build, test, and deployment activities. This is a customer-facing role that involves tailor-made solutions and implementations for the customer, including consultancy. Proprietary technologies used for managing the above scope: InfiniBand, Cumulus OS, RoCE, UFM, FortiGate firewalls, Cisco border gateways.

WHAT WILL YOU DO?

  • Coordinate Operations together with Data Center, IaaS & PaaS layer: Coordinate and support network lifecycle activities (installs, upgrades, changes, firmware updates) and manage network interconnections and related documentation.
  • Switch & Firewall Management: Provision and maintain InfiniBand switches according to ITIL Standards
  • Automation: Develop and maintain automation scripts to orchestrate the overall scope. Perform fine-tuning and configuration changes throughout the whole project lifetime.
  • OS & Firmware Management: Maintain network-based environments, apply patches, and manage firmware upgrades at scale.
  • Monitoring & Observability:
  • ITIL Processes: Follow and improve incident, problem, and change management workflows; document runbooks and standard operating procedures. Adhere to ZERO Outage guidelines.
  • Cross-Team Collaboration: Work closely with Platform Engineers and AI solution teams to ensure smooth deployments and operations.
  • Manage High-Speed Fabric: A unified network fabric utilizing both InfiniBand and Ethernet / RoCE technologies.
  • Management Network: A separate 1 Gbps Ethernet and serial console for out-of-band (OOB) network management.
  • PE/CE datacenter connectivity: CE routers, firewalls.
  • Design, develop, test, implement and support ICT components and applications in order to deliver quality standard product portfolio on AI Factory Cloud platform.
  • Build and develop concepts, processes and methods for automation, optimization, and standardization to satisfy efficiency and automation requirements.
  • Provide advice or information at request or at own initiative to all relevant employees or customers regarding technical aspects of products.
  • Provide project deliverables to fulfil the project scope.
  • Consult and implement new innovative technologies to satisfy innovation strategy.
  • Provide overall solutions and principles in planning, developing, and implementing new products to satisfy business requirements.
  • Design, develop, and implement architecture of services based on AI Factory Cloud platform requirements.
  • Mentor and train co-workers to share knowledge and develop their skills.
  • Act as a key technical lead, solving problems and coordinating activities across related technologies and outside your own team.
  • Provide consulting services to project teams on areas of expertise.
  • Research and development in assigned technology, determine business requirements, propose changes and develop implementation plans.

Qualifications

YOU WILL SUCCEED IF YOU:

  • Hold a Master’s degree in Information Technologies.

  • Bring hands-on experience in network installation, maintenance, and operations.

  • Deeply understand InfiniBand architecture, RDMA over Converged Ethernet (RoCE), and low-latency high-throughput networking for AI/HPC workloads.

  • Have experience with NVIDIA/Mellanox switch configuration and UFM (Unified Fabric Manager) management.

  • Have solid knowledge of data center routing and border gateway protocols: understanding of Cisco or Juniper routers (e.g., CR-8608, PTX 10004) and BGP/OSPF routing, plus knowledge of ASNs, IP transit, peering, and failover connectivity.

  • Demonstrate Linux networking skills (Cumulus OS / Ubuntu / Debian): command-line networking on Linux-based systems, especially Cumulus Linux, and experience configuring bridges, bonds, VLANs, and routing tables.

  • Have experience using tools such as iperf, ethtool, nvidia-smi for network devices, perfquery, and Mellanox/NVIDIA diagnostics.

  • Are skilled in continuous monitoring, incident detection, and root-cause analysis for large-scale data center networks.

  • Are familiar with NOC/SOC operational procedures and on-call rotation models.

  • Possess experience in firewall and security management, including proficiency in FortiGate firewall administration (policies, NAT, VPNs, IDS/IPS, and HA configuration) and an understanding of security segmentation, DDoS mitigation, and zero-trust networking.

  • Have hands-on experience in configuration and lifecycle management, including switch provisioning, firmware/OS upgrades, patch management, and configuration backups.

  • Possess working knowledge of ITIL processes (incident, problem, change).

  • Have 5+ years of experience in the design and delivery of systems based on IaaS, PaaS and SaaS.

  • Have 3+ years of experience building CI/CD pipelines and serverless architectures.

  • Have knowledge of the Industrial AI Factory Cloud platform software stack, including its dependencies on the underlying layers.

  • Demonstrate knowledge of NVIDIA GPU-Accelerated server platform.

  • Have knowledge of Data Engineering, Data Transformation, Data Migration Tools.

  • Have solid knowledge of Kubernetes (or similar) container-based technologies.

  • Have experience with VMware environments (VMware Tanzu Kubernetes).

  • Write scripts in Go, Python, and/or Bash.

  • Use automation tools effectively (Ansible, SaltStack, Terraform, Helm) for deployments.

  • Work with CI/CD in Kubernetes environments and manage repositories.

  • Are expert in automation using Git (GitHub, GitLab) and CI/CD tools like GitHub Actions or GitLab CI/CD.

  • Have practical experience with Linux operating systems.

  • Understand Software-Defined Networking (SDN) principles.

  • Use monitoring and visualization tools such as Grafana and Prometheus.

  • Communicate clearly and demonstrate analytical thinking, teamwork, presentation, and negotiation skills.

  • Speak English at an advanced (C1) level; German is an advantage.

  • Apply basic project management and leadership skills.

  • Demonstrate intermediate knowledge of quality management and financial literacy.

Additional Information

WHY SHOULD YOU CHOOSE US?

We believe in a balance between work and personal life. An attractive and extensive work-life balance portfolio guarantees lasting motivation for employees and thus a better quality of life, promotes physical and mental well-being, and contributes to a positive work environment. All this aims to provide more freedom in reconciling work, career growth, private life, and individual lifestyle. Therefore, we offer our employees over 25 different benefits to improve their personal and professional lives in these areas:

  • Financial benefits
  • Benefits with focus on learning and development
  • Benefits with focus on health and sport
  • Benefits with focus on family and work-life balance
  • Other benefits

For more information about our benefits click to Benefits

Salary

Final salary is negotiable.

We offer a base salary that depends on the candidate's seniority level and previous experience. In addition to the base salary, we provide a variable part and other financial benefits. The base salary will not be lower than 2 500 € gross.

Additional information

* Please be informed that our remote working possibility is only available within Slovakia due to European taxation regulation.
