03-26-2026, 11:23 PM
I finally have it working with ESPHome. Unfortunately, I can't use it, or any other local device, as a voice assistant: my host is too old and lacks the AVX support needed to run Whisper. I do have it working as a smart speaker that plays media and TTS just fine (not easy!). The problem is that the volume is too low no matter what I do; I can hear it, but barely. From the marketing pictures, the speaker appears to be 8 ohm, 3 W. I have a few Gen 1 Amazon Dots that appear to be only about 1.2 W, and they are much, much louder than the AS. Are there any hints for maximizing the volume? I'm running the most current versions of ESPHome Builder and Home Assistant as of the date of this post.
I've commented out volume_max, since the default is said to be 100%, but I've tried many values with no improvement.
Thanks for any help.
Martin
Here's the YAML:
Code:
# Whisper addon cannot be started on the HP Microserver
# Voice Assistant will not work
# AVX instruction set required. Not avail on Turion II
# only Instructionsets: MMX, 3DNow!, SSE, SSE2, SSE3, SSE4A,
# AMD64, AMD-V (AMD Virtualization), and EVP (Enhanced Virus Protection).
esphome:
  name: as
  friendly_name: AS
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
    - light.turn_on:
        id: led_ww
        blue: 100%
        brightness: 60%
        effect: fast pulse

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_AUDIO_BOARD_CUSTOM: "y"

psram:
  mode: octal # quad for N8R2 and octal for N16R8
  speed: 80MHz

# Enable logging
logger:
  hardware_uart: USB_SERIAL_JTAG

# Enable Home Assistant API
api:
  encryption:
    key: "TFpb+pBAvQIS1MVwaA7EoJ2DkpWE+79UvVro7yMyGdU="

ota:
  - platform: esphome
    password: "******************"

wifi:
  ssid: "wifi"
  password: "***************************"
  fast_connect: True
  manual_ip:
    static_ip: 192.168.55.127 #WiFi
    gateway: 192.168.55.1
    subnet: 255.255.255.0
    dns1: 192.168.55.1
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Esp32-S3-Wake-Word"
    password: "*************"

captive_portal:

button:
  - platform: restart
    name: "Restart"
    id: but_rest

light:
  - platform: esp32_rmt_led_strip
    id: led_ww
    rgb_order: GRB
    pin: GPIO16
    num_leds: 1
    chipset: ws2812
    name: "on board light"
    effects:
      - pulse:
      - pulse:
          name: "Fast Pulse"
          transition_length: 0.5s
          update_interval: 0.5s
          min_brightness: 0%
          max_brightness: 100%

i2s_audio:
  - id: i2s_output
    i2s_lrclk_pin: GPIO6 #LRC
    i2s_bclk_pin: GPIO7 #BCLK

speaker:
  - platform: i2s_audio
    id: i2s_audio_speaker
    dac_type: external
    sample_rate: 48000
    i2s_dout_pin:
      number: GPIO8
    bits_per_sample: 32bit
    i2s_audio_id: i2s_output
    timeout: never
    buffer_duration: 100ms
    channel: mono
    # sample_rate: 16000
    # bits_per_sample: 32bit

  # Virtual speakers to combine the announcement and media streams together into one output
  - platform: mixer
    id: mixing_speaker
    output_speaker: i2s_audio_speaker
    # num_channels: 2
    num_channels: 1
    source_speakers:
      - id: announcement_mixing_input
        timeout: never
      - id: media_mixing_input
        timeout: never

  # Virtual speakers to resample each pipeline's audio, if necessary, as the mixer speaker requires the same sample rate
  - platform: resampler
    id: announcement_resampling_speaker
    output_speaker: announcement_mixing_input
    sample_rate: 48000
    bits_per_sample: 16
  - platform: resampler
    id: media_resampling_speaker
    output_speaker: media_mixing_input
    sample_rate: 48000
    bits_per_sample: 16

media_player:
  - platform: speaker
    id: external_media_player
    name: Media Player
    internal: False
    volume_increment: 0.05
    volume_min: 0.4
    # volume_max: 0.85 # when amp gain connected to ground. Avoids cutting out.
    icon: mdi:speaker-wireless
    announcement_pipeline:
      speaker: announcement_resampling_speaker
      format: FLAC # FLAC is the least processor intensive codec
      num_channels: 1 # Stereo audio is unnecessary for announcements
      sample_rate: 48000
    media_pipeline:
      speaker: media_resampling_speaker
      format: FLAC # FLAC is the least processor intensive codec
      # num_channels: 2
      num_channels: 1
      sample_rate: 48000
    on_announcement:
      - mixer_speaker.apply_ducking:
          id: media_mixing_input
          decibel_reduction: 20
          duration: 0.0s
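For a rough sense of scale, here is a small Python sketch of the decibel arithmetic behind the numbers in this post. It assumes both speakers have the same sensitivity and are driven at their rated power, neither of which is confirmed, and that decibel_reduction follows the usual amplitude convention (20 dB per factor of 10). The function names are only illustrative.

```python
import math


def power_ratio_db(p1_watts: float, p2_watts: float) -> float:
    """Level difference in dB between two electrical powers."""
    return 10 * math.log10(p1_watts / p2_watts)


def db_to_amplitude(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


# Rated power of the AS speaker (3 W) versus the estimated Dot speaker (1.2 W).
# Assumption: equal sensitivity, both driven at rated power.
headroom_db = power_ratio_db(3.0, 1.2)
print(f"{headroom_db:.2f} dB")  # prints "3.98 dB"

# Amplitude factor applied to the media stream by decibel_reduction: 20.
# Assumption: the reduction is applied to signal amplitude.
ducking_gain = db_to_amplitude(-20)
print(f"{ducking_gain:.3f}")
```

If those assumptions held, the 3 W speaker would have only about 4 dB more headroom than a 1.2 W one, so rated power alone says little about how loud either device actually plays.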
