Site Reliability Engineer (TS/SCI + CI Poly)
Company: The DarkStar Group
Location: Herndon
Posted on: February 1, 2025
Job Description:
The DarkStar Group is seeking a Site Reliability
Engineer with a TS/SCI + CI Poly clearance to join one of our top
projects in Herndon, VA. Below is an overview of the project, as
well as information on our company and our benefits.

THE PROJECT
The
DarkStar Group's team solves unique and challenging intelligence
problems for a Special Operations customer. This work is as close
to the mission as a technologist can get, so the environment is
fast-paced: team members face rapidly changing requirements and
priorities as mission needs evolve. If you hate monotony and want
to use your skills to have a direct impact on real-world
operational success, this is the project for you.

We are a
multi-faceted software development and systems administration team
working to build and maintain software applications backed by a
self-managed cloud infrastructure (OpenStack) with a true big-data
footprint (over 10 petabytes). Our diverse background of experience
in mission support and software development serves as a catalyst to
solve unique and challenging intelligence problems in support of
special operations analysts and their ongoing activities.
Prototyping and frequent, iterative feedback are core to our
delivery approach, anchored by a need to work quickly in support of
our missions.

The technical stack is quite robust and includes Java,
Python, C#, C/C++, Geospatial tools, Big Data and Graph Products
(Hadoop, MapReduce, Spark, Elasticsearch, Neo4j), Linux, OpenStack,
AWS, Ansible, SQL/NoSQL, Text Processing, Cloud Services,
Containerization, Infrastructure as Code (IaC), and more.

Work on
this program takes place in the Herndon, VA area (we cannot support
remote work) and requires a TS clearance and a willingness to
obtain a CI Poly; a current TS/SCI + CI Poly is preferred.

THE ROLE
The DarkStar Group is seeking a Site Reliability Engineer (SRE)
for our OpenShift PaaS organization. You will be responsible for
ensuring the availability, performance, and scalability of our
OpenShift environments. You will collaborate with development,
operations, and product teams to automate processes, build robust
monitoring systems, and enhance the overall reliability of our
platforms.

Key Responsibilities:
- System Reliability & Scalability: Design, implement, and
maintain highly available OpenShift clusters to support
mission-critical applications.
- Automation & Infrastructure as Code (IaC): Develop and maintain
automation scripts and tools to streamline deployment, scaling, and
recovery processes using tools like Ansible, Terraform, and
Helm.
- Monitoring & Incident Management: Build and enhance monitoring
and alerting systems (e.g., Prometheus, Grafana, ELK). Respond to
and resolve incidents, conducting post-mortem analyses to identify
root causes.
- Performance Optimization: Analyze and optimize system
performance, ensuring minimal latency and maximum throughput.
- Collaboration: Work closely with development teams to implement
DevOps best practices, CI/CD pipelines, and platform
enhancements.
- Security & Compliance: Ensure platforms meet security and
compliance requirements by integrating tools for vulnerability
scanning, policy enforcement, and logging.

Required Skills:
- Bachelor's degree in Computer Science, Engineering, or
equivalent experience.
- Minimum of 5 years of experience as an SRE, DevOps Engineer, or
related role.
- Expertise in OpenShift or Kubernetes platform
administration.
- Strong knowledge of Linux systems, networking, and
containerization technologies (Docker).
- Proficiency in scripting languages such as Python, Bash, or
Go.
- Experience with CI/CD pipelines (e.g., Jenkins, GitLab
CI/CD).
- Familiarity with monitoring and logging tools like Prometheus,
Grafana, ELK, or Splunk.

Desired Skills (Optional):
- OpenShift certification (e.g., Red Hat Certified Specialist in
OpenShift Administration).
- Experience with cloud platforms (AWS, Azure, or GCP).
- Knowledge of service mesh technologies (Istio, Linkerd).
- Strong understanding of microservices and distributed systems
architecture.

About The DarkStar Group
Our Company
The DarkStar Group
is a small business that solves BIG problems. We're one of the Inc.
5000 fastest-growing private companies in the US, and our engineers
and scientists support the most critical national security missions
in Virginia, Maryland, and elsewhere. Data Science, Software
Engineering, Cloud/AWS Infrastructure, and Cyber/CNO are our core
areas of expertise. We offer interesting and important work, job
security, some of the best and most flexible benefits you'll find
in the IC, and salaries so strong that they'll likely surprise
you.

Our Benefits
The DarkStar Group offers exceptional compensation
and benefits:
- very strong salaries;
- 100% company-paid medical, dental, and vision premiums for you
and all dependents;
- the ability to get increased salary if you don't need
medical/dental/vision;
- 100% company-paid disability and life insurance benefits;
- a generously-funded HSA;
- an 8% 401(k) contribution;
- 31 days of PTO/holidays to start (more with tenure);
- the ability to flex time across pay periods without using your
PTO;
- a generous training budget;
- employee referral bonuses of up to $25,000;
- business development / growth incentives; and
- top-notch company swag.

** We have a huge growth opportunity, so
we are offering up to a $25,000 reward for anyone new you refer
whom we hire. **

All qualified applicants will receive consideration
for employment without regard to race, color, religion, sex, sexual
orientation, gender identity, national origin, disability, or
status as a protected veteran.