fix: fix typo and add linguist override (#153)
Co-authored-by: guofei <[email protected]>
Fei-Guo committed Nov 8, 2023
1 parent c997133 commit b53e3ae
Showing 3 changed files with 7 additions and 6 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
+presets/test/** linguist-vendored
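The `linguist-vendored` attribute tells GitHub's Linguist to exclude the matched paths from the repository's language statistics, which keeps bulk test fixtures under `presets/test/` from skewing the reported language breakdown. A minimal sketch of related overrides (illustrative paths only, not part of this commit):

```gitattributes
# Exclude documentation and generated files from language statistics
docs/** linguist-documentation
*.pb.go linguist-generated
```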
6 changes: 3 additions & 3 deletions README.md
@@ -31,7 +31,7 @@ Note that the *gpu-provisioner* is not an open sourced component. It can be repl
The following guidance assumes **Azure Kubernetes Service (AKS)** is used to host the Kubernetes cluster.

#### Enable Workload Identity and OIDC Issuer features
-The *gpu-povisioner* controller requires the [workload identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=dotnet) feature to acquire the access token to the AKS cluster.
+The *gpu-provisioner* controller requires the [workload identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=dotnet) feature to acquire the access token to the AKS cluster.

```bash
export RESOURCE_GROUP="myResourceGroup"
@@ -40,7 +40,7 @@
az aks update -g $RESOURCE_GROUP -n $MY_CLUSTER --enable-oidc-issuer --enable-workload-identity
```
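Not part of this commit, but a common follow-up once the OIDC issuer is enabled is to capture the issuer URL for the later federated-credential setup; a minimal sketch, assuming the variables exported above:

```bash
# Retrieve the cluster's OIDC issuer URL (used when federating the identity)
export AKS_OIDC_ISSUER="$(az aks show -g $RESOURCE_GROUP -n $MY_CLUSTER \
  --query "oidcIssuerProfile.issuerUrl" -o tsv)"
```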

#### Create an identity and assign permissions
-The identity `kaitoprovisioner` is created for the *gpu-povisioner* controller. It is assigned Contributor role for the managed cluster resource to allow changing `$MY_CLUSTER` (e.g., provisioning new nodes in it).
+The identity `kaitoprovisioner` is created for the *gpu-provisioner* controller. It is assigned Contributor role for the managed cluster resource to allow changing `$MY_CLUSTER` (e.g., provisioning new nodes in it).
```bash
export SUBSCRIPTION="mySubscription"
az identity create --name kaitoprovisioner -g $RESOURCE_GROUP
@@ -91,7 +91,7 @@
helm uninstall workspace
```
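The Contributor role assignment described above sits in a collapsed part of this diff. A hedged sketch of what that step might look like, assuming the variables already exported (command shape assumed, not taken from this commit):

```bash
# Look up the identity's principal ID and the managed cluster's resource ID,
# then grant Contributor on the cluster so the controller can provision nodes
IDENTITY_PRINCIPAL_ID="$(az identity show --name kaitoprovisioner -g $RESOURCE_GROUP \
  --query 'principalId' -o tsv)"
AKS_RESOURCE_ID="$(az aks show -g $RESOURCE_GROUP -n $MY_CLUSTER --query 'id' -o tsv)"
az role assignment create --assignee "$IDENTITY_PRINCIPAL_ID" \
  --role "Contributor" --scope "$AKS_RESOURCE_ID"
```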

## Quick start
-After installing Kaito, one can try following commands to start a faclon-7b inference service.
+After installing Kaito, one can try following commands to start a falcon-7b inference service.
```
$ cat examples/kaito_workspace_falcon_7b.yaml
apiVersion: kaito.sh/v1alpha1
...
```
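The example manifest is truncated above. Assuming the file ships in the repository as shown, applying it would follow the usual pattern (the `workspace` resource name is an assumption based on the `kaito.sh/v1alpha1` API group):

```bash
# Create the workspace and watch for the falcon-7b inference service to come up
kubectl apply -f examples/kaito_workspace_falcon_7b.yaml
kubectl get workspace -w
```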
6 changes: 3 additions & 3 deletions presets/README.md
@@ -16,7 +16,7 @@ The current supported models with preset configurations are listed below. For mo


## Validation
-Each model has its own hardware requirements in terms of GPU count and GPU memory. Kaito controller performs a validation check to whether the specified SKU and node count are sufficient to run the model or not. In case the provided SKU in not in the known list, the controller bypasses the validation check which means users need to ensure the model can run with the provided SKU.
+Each model has its own hardware requirements in terms of GPU count and GPU memory. Kaito controller performs a validation check of whether the specified SKU and node count are sufficient to run the model or not. In case the provided SKU is not in the known list, the controller bypasses the validation check which means users need to ensure the model can run with the provided SKU.
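For SKUs outside the known list, one way to verify GPU capacity yourself is to query the SKU's advertised capabilities; a sketch, with region and SKU name as placeholders:

```bash
# Print the GPU count advertised for a given VM size in a region
az vm list-skus --location eastus --size Standard_NC12s_v3 \
  --query "[0].capabilities[?name=='GPUs'].value" -o tsv
```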

## Build private images
Kaito has built-in images for the supported falcon models which are hosted in a public registry (MCR). For llama2 models, due to the license constraint, users need to containerize the model inference service manually.
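Containerizing llama2 generally means baking the downloaded weights and the inference server into an image pushed to a private registry the cluster can pull from; a rough sketch under those assumptions (registry, tag, and build context are hypothetical):

```bash
# Build an image bundling the inference server with the llama2 weights,
# then push it to a private registry
docker build -t myregistry.azurecr.io/llama-2-7b:latest .
docker push myregistry.azurecr.io/llama-2-7b:latest
```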
@@ -29,7 +29,7 @@ The sample docker files and the source code of the inference API server can be f

#### 2. Download models

-This step has to be done manually. Llama2 model weights can be downloaded by following the instructions [here](https://github.com/facebookresearch/llama#download).
+This step must be done manually. Llama2 model weights can be downloaded by following the instructions [here](https://github.com/facebookresearch/llama#download).
```
export LLAMA_MODEL_NAME=<one of the supported llama2 model names listed above>
export LLAMA_WEIGHTS_PATH=<path to your downloaded model weight files>
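# Hypothetical concrete values (editor's illustration, not from this commit;
# see the supported-model list referenced above for valid names):
#   export LLAMA_MODEL_NAME=llama-2-7b
#   export LLAMA_WEIGHTS_PATH=/data/llama/llama-2-7b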
...
```
@@ -70,7 +70,7 @@
inference:

## Use inference API servers

-The inference API server uses ports 80 and exposes model health check endpoint `/healthz` and server health check endpoint `/`. The inference service is exposed by a Kubernetes service with ClusterIP type by default.
+The inference API server uses port 80 and exposes model health check endpoint `/healthz` and server health check endpoint `/`. The inference service is exposed by a Kubernetes service with ClusterIP type by default.
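Since the service defaults to ClusterIP, a quick way to probe the endpoints from outside the cluster is a port-forward; a sketch assuming the service carries the workspace name:

```bash
# Forward local port 8080 to the inference service, then hit both health endpoints
kubectl port-forward svc/workspace-falcon-7b 8080:80 &
curl http://localhost:8080/healthz   # model health check
curl http://localhost:8080/          # server health check
```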

### Case 1: Llama-2 models
| Type | Endpoint |
