GitHub Copilot
Here are some notes about GitHub Copilot. The concepts were extracted from documentation and Udemy training.

Definition
GitHub Copilot is an AI coding assistant that helps you write code faster and with less effort, allowing you to focus more energy on problem solving and collaboration [What is][QuickStart][Immersive view]
Machine Learning
- Supervised: trained on labeled input - e.g. algorithms: classification, regression
- Unsupervised: trained on unlabeled input - e.g. algorithm: clustering
- Reinforcement: learns from feedback - e.g. use: decision making
- Process: input text -> tokenization ([create, a, file]) -> embedding generation (each token is converted into an embedding vector) -> model processing
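The process line above can be illustrated with a toy sketch. It assumes whitespace tokenization and hash-derived vectors, both of which are stand-ins: production models use subword tokenizers and embeddings learned during training. The names `tokenize` and `embed` are made up for this illustration.

```python
import hashlib


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a stand-in for a real subword tokenizer)."""
    return text.lower().split()


def embed(token: str, dim: int = 4) -> list[float]:
    """Map a token to a deterministic pseudo-embedding derived from its hash.

    Real embeddings are learned; hashing only shows the token -> vector step.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Scale each of the first `dim` bytes from 0..255 into the range [-1.0, 1.0].
    return [byte / 127.5 - 1.0 for byte in digest[:dim]]


tokens = tokenize("Create a file")
vectors = [embed(token) for token in tokens]

print(tokens)  # ['create', 'a', 'file']
for token, vector in zip(tokens, vectors):
    print(token, [round(x, 2) for x in vector])
```

The list of vectors is what a model would consume in the final "model processing" step.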
Characteristics
- Probabilistic: may generate different outputs for the same input
- Coding related questions
- Primarily English
- Originally built on OpenAI’s Codex model; newer models can now be selected (see Chat)
- Can generate source code, documentation, .gitignore files, commit messages, and unit tests
- Available in IDEs, GitHub Mobile, the command line, and GitHub.com (Enterprise only)
Features [1][2]
- Code suggestions
- Code Review
- Understand the context of the code
- Multi language support
- Intelligent debugging
- Code refactoring
- Security assistance
- Writing documentation
- Autocomplete
- Automate the creation of projects and related directories
- Chat
- Support in the CLI
- Generate shell commands by describing intent in plain language
- Suggesting shell commands based on natural language input, helping with syntax and automating repetitive tasks
- Install GitHub CLI (gh)
- Authenticate with 'gh auth login'
- Run 'gh extension install github/gh-copilot'
- 'gh extension install github/copilot-cli' (integrates Copilot CLI into the GitHub CLI ecosystem)
- Verify that it is correctly installed: 'gh extension list'
- AI-generated PR summaries (only for Enterprise). It analyzes the diff of the code changes, the commit history, and comments
- Other uses: refactoring, migrating a project, writing tests, modernizing legacy code, upgrading Java projects, and choosing the right AI tool for a task
Training
- It is trained on all languages that appear in public repositories (including open-source repositories). The quality of suggestions depends on the volume and diversity of training data for each language
- Large training dataset from public repositories > transformer-based neural network architecture trained with unsupervised learning (learns patterns and structure without labels) > supervised learning during the fine-tuning process (learning from examples helps the model understand context and improves accuracy) > outcome: Codex model (descendant of GPT-3; based on the transformer architecture)
- GitHub Copilot’s model is trained on a static dataset that includes publicly available code and is not updated in real time. Suggestions may be outdated because the training data can contain old or deprecated code
- Proprietary code from private repositories is explicitly excluded from GitHub Copilot’s training dataset to protect user privacy and ensure compliance with ethical standards.
How it works
- Transmits the code and its surrounding context to a large language model (e.g. Codex), which is hosted remotely (cloud-based models)
- Only the code currently being worked on (plus metadata), such as a few lines or function definitions, is sent to OpenAI’s servers for processing. The model generates suggestions based on the context being actively edited, without needing to access the entire project or any other unrelated data
- Request flow: (1) the request is sent to GitHub Copilot's servers (anonymized, encrypted data); (2) it is forwarded to a proxy server that pre-processes the data (such as context and completion suggestions), filtering user inputs and removing sensitive or personally identifiable information, so suggestions are based on a sanitized version of the data and no private code or sensitive data is leaked; (3) the sanitized data is passed to the cloud-based model; (4) the generated suggestion undergoes post-processing through the proxy server; (5) the result is sent back to your IDE
- It sends small snippets of code (a few lines around the cursor) to GitHub’s servers, where an AI model processes the data and generates relevant code completions. These snippets are temporarily processed to provide real-time suggestions
- No user-specific data is stored or logged persistently. The processing is done in memory and the data is discarded once the suggestions are generated
- Duplication:
- GitHub Copilot is designed to avoid suggesting code that matches more than 150 characters from any single block in publicly available repositories
- The duplication detector filter compares generated code suggestions against a database of popular, open-source repositories and flags suggestions if a certain similarity threshold is met.
- The duplication filter prevents Copilot from suggesting exact matches to publicly available code, ensuring that its suggestions are not direct copies of existing repositories. However, it does not completely eliminate similar code from being suggested.
- Telemetry:
- GitHub Copilot collects anonymized telemetry data, which may be shared with Microsoft and GitHub to improve services.
- Telemetry data may be collected, and snippets of user code can be logged for debugging purposes
- Telemetry collection from Copilot can offer insights into how much time is saved, how often suggestions are accepted, and even areas where Copilot may not be helpful
- The GitHub Productivity API allows teams to track how frequently developers accept Copilot’s suggestions
- Since Copilot is trained on vast datasets of publicly available code, it tends to generate widely used, conventional coding patterns. This makes it useful for common programming tasks but less innovative for highly specific, custom, or domain-specific solutions. Developers may need to modify or refine the suggestions to better fit unique use cases.
- Regularly reviewing AI-generated code for bias is a critical step in responsible AI use
- Using Copilot in private mode limits the scope of suggestions and reduces the risk of sensitive data leakage. Excluding sensitive files from Copilot’s context is a safeguard that ensures Copilot doesn’t suggest code based on sensitive internal data. This is a best practice for maintaining privacy while still using Copilot's benefits.
- GitHub Copilot generates suggestions from a model trained on publicly available code. It uses natural language processing (NLP) and pattern recognition to identify relevant code based on the context of the user's current input, generating suggestions that match the programming style and libraries in use.
- GitHub Copilot uses a limited context window to process a portion of the code at a time. The suggestions are based on the code within this window, which means that in longer files or complex codebases, suggestions may not fully reflect the overall structure or intent of the project because parts of the code fall outside the context window.
- Pressing Ctrl+Space manually requests GitHub Copilot to generate code suggestions in supported IDEs like Visual Studio Code. This works in scenarios where Copilot does not auto-suggest completions but the user still wants assistance.
- GitHub Copilot Chat prioritizes inline comments and existing code context when generating responses
- GitHub Copilot provides multiple code suggestions when invoked via the appropriate keyboard shortcut (e.g., pressing Ctrl+Enter or Alt+] in some IDEs)
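The duplication filter mentioned earlier in this section can be pictured as a longest-shared-run check. The sketch below is a naive illustration under stated assumptions, not GitHub's implementation: it compares raw characters with Python's `difflib`, and the names `longest_shared_run` and `is_flagged`, along with the sample corpus, are hypothetical. Only the 150-character figure comes from these notes.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 150  # characters, the figure quoted in these notes


def longest_shared_run(suggestion: str, public_snippet: str) -> int:
    """Length of the longest contiguous character run present in both strings."""
    # autojunk=False disables difflib's "popular element" heuristic, which would
    # otherwise ignore frequent characters in sequences of 200+ items.
    matcher = SequenceMatcher(None, suggestion, public_snippet, autojunk=False)
    match = matcher.find_longest_match(0, len(suggestion), 0, len(public_snippet))
    return match.size


def is_flagged(suggestion: str, public_corpus: list[str]) -> bool:
    """Flag a suggestion that shares a run of MATCH_THRESHOLD+ characters with any snippet."""
    return any(
        longest_shared_run(suggestion, snippet) >= MATCH_THRESHOLD
        for snippet in public_corpus
    )


# Hypothetical "public" snippet: 6 repetitions of a 32-character function.
public_code = "def add(a, b):\n    return a + b\n" * 6
corpus = [public_code]

print(is_flagged(public_code, corpus))                             # True
print(is_flagged("def add(a, b):\n    return a + b\n", corpus))    # False
```

A short suggestion can still resemble public code without being flagged, which matches the note that the filter does not eliminate similar code entirely.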
Chat
- Inputs are sent directly to OpenAI’s Codex API for processing
- You can select the AI model [1]
- Coding-related questions, explanations for code snippets, debugging help, and real-time code suggestions based on the current coding environment [1]
- Prompt engineering: start general, then get specific; give examples; break complex tasks into simpler tasks; avoid ambiguity; indicate relevant code; experiment and iterate; keep history relevant; follow good coding practices [1]
- Very effective for helping developers understand unfamiliar code by explaining its functionality, dependencies, and usage
- Includes built-in feedback mechanisms (rate by clicking thumbs-up or thumbs-down buttons)
- Feedback about GitHub Copilot Chat is through the in-editor feedback button, which allows users to send context-specific feedback while using Copilot
- Builds a prompt by extracting relevant portions of the currently open file, taking into account the user’s cursor position, function signatures, surrounding comments, and contextual code
- It cannot execute code directly within the chat interface
- GitHub Mobile includes the chat feature, but with some quality limitations
- Edit mode is used for more granular control: choose the files in which Copilot may make changes [1]
- Chat for debugging: the developer describes an issue in natural language and, based on the context of the surrounding code, Copilot suggests how to address potential bugs
- Improve Copilot's response relevance and performance in large codebases by limiting the number of open files or tabs in the editor
- GitHub Copilot Chat is available for both public and private repositories for individual and enterprise plans, with the necessary permissions.
Agent [1]
- Copilot can work like a developer: fix bugs, implement features, create PRs, etc. [1]
Subscription [1][2]
- Non-GitHub users can access GitHub Copilot through Microsoft Visual Studio and Visual Studio Code if they have an Azure subscription
- GitHub Copilot requires users to have a GitHub account for authentication and subscription purposes. However, it does not require users to host their repositories on GitHub
- Free: code completion (2,000 completions/month), chat (50 messages/month), block suggestions, access to Claude Sonnet and GPT models
- Pro: code completion (no limit), chat (no limit), chat in GitHub Mobile
- Pro+: Pro + full access to all available models in Copilot Chat; up to 1,500 premium requests per month; priority access to advanced AI capabilities
- Business:
- All previous + file exclusion, organization wide policy, audit logs, support for public and private repositories, manage policies at the enterprise or organization level
- Enables admin-level control, telemetry for auditing, and organization-wide enforcement of code matching filters, which scan for snippets that closely match public GitHub content—crucial for enterprise risk mitigation
- Allows the organization to configure the service to meet company-wide policies and exclude specific files from being evaluated
- Organization-wide policies, such as disabling suggestions that match public code
- Designed for organizations that need data privacy and security features
- Enterprise-grade security practices, including end-to-end encryption of data in transit and at rest
- GitHub Copilot Business is designed with enterprise clients in mind, offering key security and compliance features such as SOC 2 Type 2 and GDPR compliance
- It has the ability to restrict AI-generated code suggestions based on organization policies
- Provides an IP (intellectual property) indemnity clause that covers claims against the generated code in certain scenarios
- SSO integration
- Private code generation
- Vulnerability scanning
- The "block matching public code" feature mitigates copyright and licensing risks: when it is enabled, Copilot does not show suggestions that match code in public repositories
- REST API:
- Automate the management of subscriptions using GitHub’s REST API to list, add, and remove GitHub Copilot seats for users in organization
- Users are managed and Copilot seats are assigned per user; assignment is not role-based
- Endpoint to add users to the organization's subscription: POST /orgs/{org}/copilot/billing/selected_users
- Remove a user: the administrator sends a DELETE request to /orgs/{org}/copilot/billing/selected_users with the usernames in the request body
- The admin can list seat assignments with GET /orgs/{org}/copilot/billing/seats, using a token with the admin:org (or manage_billing:copilot) scope. An OAuth2 token is necessary to provide the required authentication for organizational access
- Enterprise:
- Everything in Business + Copilot knowledge bases (improve accuracy), fine-tuning a custom LLM
- Knowledge base: a dedicated repository that holds all the relevant documentation, code, and libraries. Its contents are made available for enhanced coding suggestions, ensuring that organization-specific practices are reflected in Copilot’s output. The most useful types of knowledge to store are code snippets, standardized functions, and reusable components from internal repositories
- Provides the ability to manage licenses and users at scale
- Centralized administrative controls (license management, security policies, billing)
- Analyzes commit messages, file changes, and project context to generate a concise pull request summary
- Best option for large organizations with strict privacy and security concerns
- It includes advanced privacy controls, like the ability to configure context exclusions, enforce corporate policy integration, and ensure that sensitive codebases are handled securely. It also allows admins to configure policies at the organization level to specify which repositories are enabled for Copilot
- GitHub Copilot Enterprise includes features like enhanced telemetry and more granular data retention policies. Telemetry data makes it possible to monitor usage while complying with internal security policies
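The seat-management REST calls listed under Business can be sketched as request builders. This is a minimal sketch under assumptions, not an official client: the endpoint paths and the `selected_usernames` body field are as I recall them from GitHub's REST API documentation for Copilot user management, so confirm them against the current docs before relying on them. The helper `seat_request`, the organization `my-org`, and the placeholder token are hypothetical, and nothing is sent over the network.

```python
import json
from typing import Optional
from urllib.request import Request

API_ROOT = "https://api.github.com"


def seat_request(method: str, org: str, token: str,
                 usernames: Optional[list[str]] = None) -> Request:
    """Build (but do not send) a Copilot seat-management request.

    Without usernames the request targets the seat listing; with usernames it
    targets the add/remove endpoint, depending on the HTTP method.
    """
    if usernames is None:
        path, body = f"/orgs/{org}/copilot/billing/seats", None
    else:
        path = f"/orgs/{org}/copilot/billing/selected_users"
        body = json.dumps({"selected_usernames": usernames}).encode("utf-8")
    headers = {
        "Accept": "application/vnd.github+json",
        # The token needs an organization admin scope, as the notes describe.
        "Authorization": f"Bearer {token}",
    }
    return Request(API_ROOT + path, data=body, headers=headers, method=method)


list_req = seat_request("GET", "my-org", "TOKEN")
add_req = seat_request("POST", "my-org", "TOKEN", ["octocat"])
remove_req = seat_request("DELETE", "my-org", "TOKEN", ["octocat"])

for req in (list_req, add_req, remove_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would perform the call; keeping construction separate makes the requests easy to inspect first.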
Access
- GitHub Copilot Individual is designed for single-user environments and does not include team management features like access control. Access is also possible as a member of an organization that has a subscription.
- Copilot Pro is free for students, teachers, and maintainers of popular open-source projects
Configuration
- '.copilot' can be used to disable suggestions
- 'copilot.yaml' can be used to configure content exclusion
- 'GitHub Copilot editor config': to ensure that code suggestions from Copilot do not incorporate sensitive internal code, use the exclude directive ("exclude": true) in the Copilot editor config file for directories or files
- The "excludePatterns" directive in the editor config file excludes specific directories, files, or repositories from GitHub Copilot’s completions, protecting proprietary or internal code from being suggested
- Exclude sensitive content within a repository: Enable "Copilot Exclusion Rules" in the repository's settings, specifying the files and directories containing sensitive information
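The idea behind exclusion rules can be sketched as path matching against a list of patterns. This is an illustration only: the pattern list, the paths, and the helper `is_excluded` are hypothetical, and Copilot's own matching syntax may differ from Python's `fnmatch` rules.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns for a repository.
EXCLUDED_PATTERNS = [
    "secrets/*",                 # everything under secrets/
    "*.env",                     # any file ending in .env
    "config/credentials.json",   # one specific file
]


def is_excluded(path: str, patterns: list[str] = EXCLUDED_PATTERNS) -> bool:
    """Return True if the path matches at least one exclusion pattern.

    Note: in fnmatch, '*' also matches '/' characters, unlike many glob
    implementations, so 'secrets/*' covers nested directories too.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)


for path in ["secrets/api/key.pem", "app/.env", "config/credentials.json", "src/main.py"]:
    print(path, "->", "excluded" if is_excluded(path) else "visible to Copilot")
```

Files reported as excluded would be left out of the context used for suggestions; everything else stays available.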
Security
- Audit Logs: track Copilot usage at a high level (organizational level - user and admin activities related to GitHub Copilot) such as when users enable or disable Copilot in their settings, subscription updates, unauthorized access to GitHub Copilot
- Search for audit events: filters or search queries can track administrative actions on Copilot access, e.g., action:copilot.access_enabled identifies events where Copilot access was granted or enabled for organization members
- Admins can search for Copilot-related events by using the query "copilot" within the GitHub organization audit log, filtering results by actor, event type, and date range
- GitHub provides options to configure repository-level exclusion rules to prevent sensitive files and directories from being accessed by GitHub Copilot
- If a policy is applied at the enterprise level, all organizations within the enterprise inherit the policy
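The audit-log searches described in this section (by action, actor, and date range) can be mimicked on exported events. The sketch below makes assumptions: the event dictionaries, their field names, and the helper `search_events` are hypothetical sample data, and only the action name `copilot.access_enabled` comes from these notes.

```python
from datetime import datetime

# Hypothetical exported audit events; real entries carry many more fields.
SAMPLE_EVENTS = [
    {"action": "copilot.access_enabled", "actor": "alice",
     "created_at": "2024-05-01T10:00:00+00:00"},
    {"action": "repo.create", "actor": "bob",
     "created_at": "2024-05-02T09:30:00+00:00"},
    {"action": "copilot.access_enabled", "actor": "carol",
     "created_at": "2024-04-15T08:00:00+00:00"},
]


def search_events(events, action=None, actor=None, since=None):
    """Filter events by exact action, actor, and earliest timestamp (all optional)."""
    results = []
    for event in events:
        if action and event["action"] != action:
            continue
        if actor and event["actor"] != actor:
            continue
        if since and datetime.fromisoformat(event["created_at"]) < since:
            continue
        results.append(event)
    return results


since = datetime.fromisoformat("2024-05-01T00:00:00+00:00")
hits = search_events(SAMPLE_EVENTS, action="copilot.access_enabled", since=since)
print([event["actor"] for event in hits])  # ['alice']
```

Each keyword argument plays the role of one search qualifier, so combining them narrows the result the same way stacked filters do in the audit log view.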
Developer Responsibility [1]
- Developers should review AI-generated code for security vulnerabilities and licensing issues, and ensure compliance with internal policies, before using it in production. This review helps mitigate copyright violations and security risks
- The code generated by Copilot can be used, modified, and distributed by developers as if they had written it manually
- Copilot can generate code that is similar to existing public code, but it does not track or enforce license compliance. Developers must verify license compliance themselves
- Bias: the developer is responsible for reviewing code to avoid biased suggestions and for making appropriate edits, so that the open-source project remains inclusive and avoids reinforcing harmful stereotypes