Overview
An introduction to the Freestyle Git service for managing source code in AI projects, followed by a comparison of alternative source control methods and their relative strengths.
Freestyle Git Service is a hosted Git platform designed specifically for multi-tenant applications, enabling seamless management of repositories across numerous users and organizations without the burden of maintaining your own Git infrastructure. It provides an API for programmatically creating and managing repositories, controlling access through identity-based permissions tailored for diverse roles like CI/CD pipelines or team members, and setting up event-driven automation triggers. The service further enhances workflows with features for CI/CD integration, direct application deployments from repositories, Git objects access for inspection, bidirectional synchronization with GitHub repositories, and compatibility with the GitHub Files API.
Thinking about Source Control
In multi-tenant applications, where you're managing codebases for numerous users and organizations, maintaining a centralized source of truth for code becomes critical to handle version tracking, collaboration, and secure access. A well-designed system delivers reliable version history, seamless collaboration, and robust security, while avoiding issues like data loss, access breaches, or workflow inefficiencies. Key considerations when choosing a system include:
- Multi-tenant support for segregated repositories
- Fine-grained permissions and identity management
- Data accessibility for efficient retrieval, sharing, and manipulation of code across different users and environments
- API integration for programmatic control and CI/CD support for automated workflows
- "Time travel" capabilities to navigate backward and forward through code history
- Debuggability and observability for troubleshooting and monitoring changes
- Support for branching and forking to facilitate parallel development and experimentation
- Ease of integration with existing tools and services
To help you choose, here is a range of commonly used source control techniques, with their benefits and shortcomings.
1. The VM as the Source of Truth
A common naive approach is to use a virtual machine (VM) or container as the source of truth for your code. In this model, your users/AI develop directly on the VM, and it serves as the central repository for your customers' code.
This is a simple and straightforward approach from an implementation perspective, but it has several significant drawbacks:
- Lack of Version Control: You lose the ability to track changes, revert to previous versions, and collaborate effectively. If something goes wrong, you have no history to fall back on.
- Lack of Automation: Developing directly on a VM often leads to missed opportunities for automation, as you may not integrate with CI/CD pipelines effectively.
- Security Risks: Sensitive information may be stored directly on the VM, making it harder to manage access and permissions securely.
- Lack of Portability: If you need to move your code to a different environment or share it with others, you have to manually copy files, which is error-prone and cumbersome.
- No Backup: If the VM fails or is lost, you risk losing all your code and data without any backup.
- Difficulty in Testing: Testing changes becomes more challenging, as you may not have a clear way to isolate and test specific features or bug fixes.
- Complex and Expensive APIs: Building APIs becomes inherently complex, slow, brittle, and expensive since the VM must be awakened for every request, requiring custom scripts for basic operations and creating significant latency and resource overhead.
- Hard to Debug: Debugging issues becomes difficult, as you may not have a clear view of the code's history or an easy way to access the code outside of the VM.
1b. Git on the VM as the Source of Truth
Using Git on the VM as the source of truth is a step up from the previous approach, as it introduces version control. However, it still suffers from many of the same drawbacks.
While you can track changes and collaborate better, you still face issues with scalability, portability, and backup. The VM must still be accessed for every Git operation, which can lead to performance bottlenecks and increased runtime cost.
2. S3 + Database as the Source of Truth
A more advanced approach is to use a combination of S3 (or similar object storage) and a database as the source of truth. In this model, your customers' code is stored in S3, and you use a database to track metadata, versions, and changes.
When the VM needs to access the code, it retrieves it from S3, and any changes are written back to S3.
This approach has several advantages compared to using the VM state as the source of truth:
- Version Control: You can track changes, revert to previous versions, and collaborate more effectively.
- Security: Storing your code in S3 and using a database enhances security by managing access and permissions more effectively.
- Portability: Moving or sharing your code is simplified as S3 handles file storage independently.
- Durability: Utilizing S3 ensures that your code is highly available and can be backed up and restored in case of data loss.
- API Simplicity: Building APIs becomes simpler, as you can directly access code files in S3 and metadata from the database without needing to wake up a VM for every request.
However, this approach has drawbacks of its own:
- Data Transfer Overhead: Full files/directories need to be regularly read from and written to S3, which can be slow and costly due to data transfer and storage costs.
- Complexity: Managing the interaction between S3, the database, and the VM can introduce complexity, especially when handling concurrent updates.
- Manual versioning: You need to implement your own versioning system, which can be error-prone and requires additional development effort.
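To make the "manual versioning" drawback concrete, here is a minimal sketch of the bookkeeping this approach asks you to write yourself. It is an illustration only: two in-memory dictionaries stand in for the S3 bucket and the database, and `VersionedStore`, `write`, and `read` are hypothetical names, not part of any real service.

```python
import hashlib
import time
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy stand-in for 'S3 + database'; both are plain in-memory dicts (an assumption)."""

    # object key -> file bytes (plays the role of the bucket)
    objects: dict = field(default_factory=dict)
    # (tenant, path) -> list of version rows (plays the role of the database)
    versions: dict = field(default_factory=dict)

    def write(self, tenant: str, path: str, content: bytes) -> int:
        """Store a new version of one file and return its version number."""
        digest = hashlib.sha256(content).hexdigest()
        key = f"{tenant}/{digest}"
        # Identical content is stored only once per tenant.
        self.objects.setdefault(key, content)
        history = self.versions.setdefault((tenant, path), [])
        history.append({"version": len(history) + 1, "key": key, "ts": time.time()})
        return len(history)

    def read(self, tenant: str, path: str, version=None) -> bytes:
        """Return the latest content, or a specific 1-based version if given."""
        history = self.versions[(tenant, path)]
        row = history[-1] if version is None else history[version - 1]
        return self.objects[row["key"]]


store = VersionedStore()
store.write("acme", "app.py", b"print('v1')")
store.write("acme", "app.py", b"print('v2')")
previous = store.read("acme", "app.py", version=1)  # roll back by reading an older row
```

Even this small sketch only versions single files. Atomic multi-file commits, branches, merges, and safe concurrent writers would all need to be built on top, which is the development effort the list above refers to.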
3. Git as the Source of Truth
Using a hosted Git API service (like GitHub or Freestyle Git) as the source of truth is a popular and effective approach. In this model, your customers' code is stored in Git repositories, and you use Git's built-in version control features to manage changes. The VM maintains a local clone of the repository, which it syncs with the remote repository when changes are made.
Git (hosted) as the source of truth has several advantages:
- Easy Integration: Manage Git repositories via API and SDKs, without managing the underlying infrastructure. This shortens development cycles and eases integration with existing tools.
- Optimized Data Transfer: Git uses efficient data transfer mechanisms, such as delta encoding and compression, to minimize the amount of data transferred during operations.
- Stateless Operations: Operations against a hosted remote are independent requests, meaning you can clone, push, and pull without keeping a VM running or maintaining a persistent connection to one.
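The data transfer point above can be illustrated with a rough sketch that compares sending a whole compressed file against sending only a compressed diff after a one-line edit. This is an analogy under stated assumptions: Git's actual packfile deltas are a binary format, not the unified diffs that `difflib` produces here, and `make_file` and `transfer_sizes` are hypothetical helper names.

```python
import difflib
import hashlib
import zlib


def make_file(n_lines: int) -> list[str]:
    """Build a deterministic file with one hex digest per line (compresses poorly)."""
    return [hashlib.sha256(str(i).encode()).hexdigest() + "\n" for i in range(n_lines)]


def transfer_sizes(old: list[str], new: list[str]) -> tuple[int, int]:
    """Return (bytes for the whole new file, bytes for a diff only), both zlib-compressed."""
    full = zlib.compress("".join(new).encode())
    # Stand-in for a delta: a textual unified diff, NOT Git's real packfile delta format.
    delta = zlib.compress("".join(difflib.unified_diff(old, new)).encode())
    return len(full), len(delta)


old = make_file(200)
new = list(old)
new[100] = "a single edited line\n"  # one small change in a 200-line file

full_size, delta_size = transfer_sizes(old, new)
print(f"full file: {full_size} bytes, diff only: {delta_size} bytes")
```

For a small edit to a larger file, the diff is much smaller than the full file. That is the same intuition behind syncing a Git clone instead of re-uploading whole directories to object storage.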
3a. GitHub
GitHub is a popular choice for hosting Git repositories, but it has some limitations for AI app builders:
- Lack of Ownership: You do not own the infrastructure or the data, which can lead to vendor lock-in and potential data loss if GitHub changes its policies or services.
- Cost/Licensing: You are dependent on GitHub's pricing, licensing terms, and policies, which may not align with your needs.
- Complex API: GitHub Apps require managing multi-step authentication flows, handling token expiration, and coordinating installations across different organizations, adding significant complexity for programmatic repository management.
- Limited Customization: You may have restricted control over the repository setup and features.
3b. Freestyle Git
Freestyle provides a comprehensive Git API that enables you to manage Git repositories, control access permissions, and set up automation triggers.
Freestyle's Git API offers unique advantages tailored for AI app builders. Key features include:
- Natively Multi-Tenant: Freestyle's Git API is designed for multi-tenant applications, allowing you to manage repositories for multiple users and organizations seamlessly.
- Robust Identity Management: Freestyle provides built-in identity management, allowing you to create and manage identities for different purposes (e.g., CI/CD, team members) with fine-grained access control.
- Seamless Integration: Freestyle's trigger system makes it easy to integrate with CI/CD systems and external services.
- GitHub Sync: Built-in synchronization with GitHub, including app/auth management, allowing you to maintain synchronized code across both platforms while leveraging Freestyle's infrastructure.
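As a way to picture identity-based, fine-grained permissions in a multi-tenant setup, here is a small conceptual sketch. It is not Freestyle's API: `Identity`, `Access`, `grant`, and `can` are hypothetical names chosen for illustration, and the repository IDs are made up.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Access(IntEnum):
    """Ordered access levels, so WRITE implies READ."""
    NONE = 0
    READ = 1
    WRITE = 2


@dataclass
class Identity:
    """A principal such as a CI/CD pipeline or a team member (hypothetical model)."""
    name: str
    grants: dict = field(default_factory=dict)  # repo id -> Access

    def grant(self, repo_id: str, level: Access) -> None:
        """Give this identity the stated access level on one repository."""
        self.grants[repo_id] = level

    def can(self, repo_id: str, needed: Access) -> bool:
        """Check access; repositories without an explicit grant are denied."""
        return self.grants.get(repo_id, Access.NONE) >= needed


# One identity per purpose, each scoped to specific repositories.
ci = Identity("ci-pipeline")
ci.grant("tenant-a/app", Access.READ)

dev = Identity("team-member")
dev.grant("tenant-a/app", Access.WRITE)

ci_may_push = ci.can("tenant-a/app", Access.WRITE)
dev_sees_other_tenant = dev.can("tenant-b/app", Access.READ)
```

The design choice worth noting is deny by default: an identity has no access to a tenant's repository unless a grant exists, which keeps tenants segregated without per-request special cases.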
All Together
Put together, depending on your needs and the scale of your application, you can choose from various source control methods. Below is a comparison table summarizing the strengths and weaknesses of each approach:
Feature | VM | Git on VM | S3 + DB | GitHub | Freestyle Git |
---|---|---|---|---|---|
Version Control | ~ | ||||
Time Travel | ~ | ||||
Branching/Forking | |||||
Fine-Grained Permissions | ~ | ||||
Identity Management | ~ | ||||
Data Accessibility | ~ | ||||
API Integration | ~ | ~ | |||
CI/CD Support | ~ | ~ | |||
Debuggability | ~ | ~ | ~ | ||
Security | |||||
Portability | ~ | ||||
Backup/Durability | |||||
Multi-Tenant Support | ~ | ||||
Data Ownership | |||||
Cost-Effectiveness | ~ | ~ | ~ | ||
Customization | ~ | ~ | |||
GitHub Sync | ~ | N/A | |||
Automation Triggers | ~ |
We've built this API specifically for multi-tenant apps, ensuring that it meets the unique needs of managing codebases on behalf of users and organizations. It provides a powerful and flexible solution for source control, enabling you to focus on building your AI applications without worrying about the underlying infrastructure.
If you're interested in trying it, read the Using Git Guide.