Principal Site Reliability Engineer
Salary range: 1,800,000 ~ 2,400,000 TWD / year
Job Overview
We are seeking a self-driven Principal Site Reliability Engineer with a strong technical background and excellent communication skills. This individual will lead the development, construction, and management of reliable and distributed systems that support our business operations.
In this role, you will play a vital part in supporting the following businesses:
- IXT: An insurance core system solution for APAC insurance markets.
- OneDegree HK: A user-friendly digital insurance platform for individuals and businesses in Hong Kong.
- Cymetrics: A cybersecurity platform designed specifically for small and medium enterprises in the APAC region.
Responsibilities
- Implement and enhance system reliability, availability, scalability, performance, and efficiency by leveraging monitoring, alerting, and automation tools on public cloud platforms.
- Participate in capacity planning, analyze software performance, and fine-tune systems to ensure optimal operation.
- Develop and enhance the GitLab CI/CD process and toolset to streamline software delivery and deployment.
- Define and monitor key metrics to assess and enhance system reliability.
- Collaborate closely with the engineering team to improve reliability and operational efficiency at every stage of the software development life cycle (SDLC).
- Troubleshoot and optimize infrastructure, and automate repetitive tasks to increase efficiency and effectiveness.
- Develop departmental annual OKRs.
- Optimize cloud cost.
Requirements
- 8+ years of technical experience in software engineering, network engineering, or systems administration.
- 6+ years of experience running large-scale cloud services.
- 2+ years in an SRE team leadership role.
- Fluent in English at a business level or higher.
- Capable of planning infrastructure upgrades and optimizations.
- Skilled in budget planning and ensuring cloud expenses remain within the allocated budget.
- Skilled in OKR planning and ensuring that key results align with company objectives.
- Advanced knowledge of monitoring solutions such as Prometheus, Grafana, and the ELK stack (Elasticsearch, Logstash, Kibana).
- Experience in the complete software development life cycle (SDLC).
- In-depth understanding of networking concepts, with a particular focus on security.
- Hands-on experience implementing GitLab CI/CD processes.
- Proficiency in automation platforms like Ansible and Terraform.
- Knowledge of orchestration tools like Kubernetes.
- Familiarity with container technologies like Docker.
- Experience with Git for source code version control.
- Experience with AI pair programming tools, such as those from OpenAI.
- Proficiency in programming languages such as Bash, Python, or Go.
- Team player with good interpersonal skills.
Company address:
7F, No. 460, Sec. 4, Xinyi Dist., Taipei City
Other:
- HR phone interview
- 1st Interview: 1.5~2 hours (1~1.5 hours with the hiring team + 0.5 hours with HR)
- 2nd Interview: 0.5~1 hour, meet with the CTO
2025-04-01